covered with a magnetic material for recording information. Each platter contains a number of circular recording tracks. A sector is a unit of a track that is physically read or written at the same time. In traditional magnetic disks, the constant angular rotation of the platters dictates that sectors on inner tracks are recorded more densely than sectors on the outer tracks. Thus, the platter can spin at a constant rate and the same amount of data can be recorded on the inner and outer tracks.1 Some modern disks use zone recording techniques to record data more densely on the outer tracks, but this requires more sophisticated read/write electronics.

1 Some optical disks use a technique called constant linear velocity (CLV), where the platter rotates at a speed that depends on the radial position of the track being accessed. This allows more data to be stored on the outer tracks than on the inner tracks, but because varying the rotation speed introduces additional delay, the technique is better suited to sequential than to random access.

The read/write head is an electromagnet that produces switchable magnetic fields to read and record bit streams on a platter’s track. It is associated with a disk arm, attached to an actuator. The head “flies” close to, but never touches, the rotating platter (except perhaps when powered down). This is the classical definition of a Winchester disk. The actuator is a mechanical assembly that positions the head electronics over the appropriate track. It is possible to have multiple read/write mechanisms per surface, e.g., multiple heads per arm—at one extreme, one could have a head-per-track position, that is, the disk equivalent of a magnetic drum—or multiple arms per surface through multiple actuators.

FIGURE 80.10 Maximal areal density law. Squares represent predicted density; triangles are the MAD reported for the indicated products.

FIGURE 80.11 Disk terminology. Heads reside on arms, which are positioned by actuators. Tracks are concentric rings on platters. A sector is the basic unit of read/write. A cylinder is a stack of tracks at one actuator position. An HDA is everything in the figure plus the air-tight casing. In some devices it is possible to transfer from multiple surfaces simultaneously; the collection of heads that participates in a single logical transfer spread over multiple surfaces is called a head group.
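To make the constant-angular-velocity (CAV) versus zone-recording trade-off above concrete, here is a minimal sketch. The radii and linear bit density are illustrative assumptions, not figures for any particular drive.

```python
# Sketch of the CAV vs. zone-recording trade-off. All numbers are assumed.
from math import pi

LINEAR_DENSITY = 50_000   # maximum recordable bits per inch of track (assumed)
INNER_RADIUS = 1.0        # innermost track radius in inches (assumed)
OUTER_RADIUS = 2.0        # outermost track radius in inches (assumed)

def track_bits_cav(inner_radius, density):
    # Under CAV every track stores what the innermost track can hold: the
    # inner track is written at maximum density, outer tracks more sparsely.
    return 2 * pi * inner_radius * density

def track_bits_zoned(radius, density):
    # Zone recording keeps linear density roughly constant, so a track's
    # capacity grows with its circumference.
    return 2 * pi * radius * density

cav = track_bits_cav(INNER_RADIUS, LINEAR_DENSITY)
zoned = track_bits_zoned(OUTER_RADIUS, LINEAR_DENSITY)
print(f"CAV outer track:   {cav / 8 / 1024:5.1f} KiB")
print(f"zoned outer track: {zoned / 8 / 1024:5.1f} KiB")  # 2x with these radii
```

With these assumed radii, zoning doubles the outermost track's capacity, which is exactly the gain the more sophisticated read/write electronics buy.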
Due to costs and technical limitations, it is usually uneconomical to build a device with a large number of actuators and heads. A cylinder is a stack of tracks at one actuator position. A head disk assembly (HDA) is the collection of platters, heads, arms, and actuators, plus the air-tight casing. A disk drive is an HDA plus all associated electronics. A disk might be a platter, an actuator, or a drive, depending on the context.

We can illustrate these concepts by describing two first-generation supercomputer disks, the Cray DD-19 and the CDC 819 [Bucher and Hayes, 1980]. These were state-of-the-art disks around 1980. Each disk has 40 recording surfaces (20 platters), 411 cylinders, and 18 (DD-19) or 20 (CDC 819) sectors of 512 bytes per track. Both disks possess a limited “parallel read-out” capability. A given data word is actually byte-interleaved over four surfaces. Rather than a single set of read/write electronics for the actuator, these disks have four sets, so it is possible to read or write with four heads at a time. Four heads on adjacent arms are called a head group. A disk track is thus composed of the stacked recording tracks of four adjacent surfaces, and there are 10 tracks per cylinder, spread over 40 surfaces.

The advances over the last decade can be illustrated by the Cray DD-49, a typical high-end supercomputer disk of today. It consists of 16 recording surfaces (9 platters), 886 cylinders, and 42 sectors of 4096 bytes per track, with 32 read/write heads organized into eight head groups, four groups on each of two independent actuators. Each actuator can sweep the entire range of tracks, and by “scheduling” the arms to position the actuator closest to the target track of the pending request, the average seek time can be reduced. The DD-49 has a capacity of 1.2 Gbytes of storage and can transfer at a sustained rate of 9.6 Mbytes/s.

A variety of standard and proprietary interfaces are defined for transferring the data recorded on the disk to or from the host. We concentrate on industry standards here. On the disk surface, information is represented as alternating polarities of magnetic fields. These signals need to be sensed, amplified, and decoded into synchronized pulses by the read electronics. For example, the pulse-level ST506/412 standard describes the way pulses can be extracted from the alternating flux fields. The bit-level ESDI, SMD, and IPI-2 standards describe the bit encoding of signals. At the packet level, these bits must be aligned into bytes, error-correcting codes need to be applied, and the extracted data must be delivered to the host. These “intelligent” standards include SCSI (small computer system interface) and IPI-3.

The ST506 is a low-cost but primitive interface, most appropriate for interfacing floppy disks to personal computers and low-end workstations. For example, the controller must perform data separation on its own; this is not done for it by the disk device. As a result, its transfer rate is limited to 0.625 Mbytes/s. The SMD interface is higher performance and is used extensively in connecting disks to mainframe disk controllers. ESDI is similar, but geared more towards smaller disk systems. One of its innovations over the ST506 is its ability to specify a seek to a particular track number rather than requiring track positioning via step-by-step pulses. Its performance is in the range of 1.25–1.875 Mbytes/s.
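The capacities quoted above follow from simple arithmetic over the drive geometry. A minimal sketch; reading the DD-49's eight head groups as eight logical tracks per cylinder is our inference from the text, so treat it as an assumption.

```python
# Sketch: deriving drive capacity from the geometry quoted in the text.
def capacity_bytes(cylinders, tracks_per_cylinder, sectors_per_track, bytes_per_sector):
    """Total formatted capacity of a drive with uniform geometry."""
    return cylinders * tracks_per_cylinder * sectors_per_track * bytes_per_sector

# DD-49: 886 cylinders, 8 head groups (assumed = 8 logical tracks per
# cylinder), 42 sectors of 4096 bytes per track.
dd49 = capacity_bytes(cylinders=886, tracks_per_cylinder=8,
                      sectors_per_track=42, bytes_per_sector=4096)
print(f"DD-49: {dd49 / 1e9:.2f} GB")   # ~1.22 GB, matching the quoted 1.2 Gbytes

# DD-19: 411 cylinders, 10 four-surface tracks per cylinder,
# 18 sectors of 512 bytes per track.
dd19 = capacity_bytes(cylinders=411, tracks_per_cylinder=10,
                      sectors_per_track=18, bytes_per_sector=512)
print(f"DD-19: {dd19 / 1e6:.0f} MB")   # ~38 MB, a factor of ~30 in a decade
```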
SCSI has so far been used primarily with workstations and minicomputers, but offers the highest degree of integration and intelligence. Implementations with performance at the level of 1.5–4 Mbytes/s are common. The newer IPI-3 standard has the advantages of SCSI, but provides even higher performance at a higher cost. It is beginning to make inroads into mainframe systems. However, because of the very widespread use of SCSI, many believe that SCSI-2, an extension of SCSI to wider signal paths, will become the de facto standard for high-performance small disks.

The connection pathway between the host and the disk device varies widely depending on the desired level of performance. A low-end workstation or personal computer would use a SCSI interface to connect the device directly to the host. A higher-end file server or minicomputer would typically use a separate disk controller to manage several devices at the same time. These devices attach to the controller through SMD interfaces. It is the controller’s responsibility to implement error checking and correction and direct memory transfers to the host.

Mainframes tend to have more devices and more complex interconnection schemes to access them. In IBM terminology [Buzen and Shum, 1986], the channel path, i.e., the set of cables and associated electronics that transfer data and control information between an I/O device and main memory, consists of a channel, a storage director, and a head of string (see Fig. 80.12). The collection of disks that share the same pathway to the head of string is called a string. In earlier IBM systems, a channel path and channel are essentially the same thing. The channel processor is the hardware that executes channel programs, which are fetched from the host’s memory. A subchannel is the execution environment of a channel program, similar to a process on a conventional CPU.
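The interface rates quoted over the last two paragraphs bound how quickly even a single request can cross the wire. A small sketch; the 4-KB request size is an assumption.

```python
# Sketch: wire-transfer time per request at the interface rates quoted above.
RATES_MBYTES_PER_S = {
    "ST506": 0.625,   # as quoted above
    "ESDI":  1.25,    # low end of the quoted 1.25-1.875 Mbytes/s range
    "SCSI":  1.5,     # low end of the quoted 1.5-4 Mbytes/s range
}
REQUEST_BYTES = 4096  # assumed request size

for name, rate in RATES_MBYTES_PER_S.items():
    ms = REQUEST_BYTES / (rate * 1e6) * 1e3
    print(f"{name}: {ms:.2f} ms on the wire per 4-KB request")
# This is only the transfer component; for random access, seek and rotational
# latency (see the service-time discussion later) usually dominate.
```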
Formerly, a subchannel was statically assigned for execution to a particular channel, but a major innovation in high-end IBM systems (308X and 3090) allows subchannels to be dynamically switched among channel paths. This is like allocating a process to a new processor within a multiprocessor system every time it is rescheduled for execution.

I/O program control statements, e.g., transfer in channel, are interpreted by the channel, while the storage director (also known as the device controller or control unit) handles seek and data-transfer requests. Besides these control functions, it may also perform certain datapath functions, such as error detection/correction and mapping between serial and parallel data. In response to requests from the storage director, the device will position the access mechanism, select the appropriate head, and perform the read or write. If the storage director is simply a control unit, then the datapath functions will be handled by the head of string (also known as a string controller).

To minimize the latency caused by copying into and out of buffers, the IBM I/O system uses little buffering between the device and memory.1 In a high-performance environment, devices spend a good deal of time waiting for the pathway’s resources to become free. These resources are held for time periods related to disk transfer speeds, measured in milliseconds. One possible method for improving utilization is to support disconnect/reconnect. A subchannel can connect to a device, issue a seek, disconnect to free the channel path for other requests, and reconnect later to perform the transfer when the seek is completed. Unfortunately, not all reconnects can be serviced immediately, because the control units are busy servicing other devices. These RPS misses (to be described in more detail in the next section) are a major source of delay in heavily utilized IBM storage subsystems [Buzen and Shum, 1987]. Performance can be further improved by providing multiple paths between memory and devices. To this purpose, IBM’s high-end systems support dynamic path reconnect, a mechanism that allows a subchannel to change its channel path each time it cycles through a disconnect/reconnect with a given device. Rather than wait for its currently allocated path to become free, it can be assigned to another available path.

1 Only the most recent generation of storage directors (e.g., IBM 3880, 3990) incorporates disk caches, but care must be taken to avoid cache management-related delays [Buzen, 1982].

FIGURE 80.12 Host-to-device pathways. For large IBM mainframes, the connection between host and device must pass through a channel, storage director, and string controller. Note that multiple storage directors can be attached to a channel, multiple string controllers per storage director, and multiple devices per string controller. This multipathing approach makes it possible to share devices among hosts and to provide alternative pathways to better utilize the drives and controllers. While logically correct, the figure does not reflect the true physical components of high-end IBM systems (308X, 3090). The concept of channel has disappeared from these systems and has been replaced by a channel path.
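The cost of the RPS misses described above can be estimated with a simple model: a device that misses its reconnect must wait a full revolution before trying again, so with a fixed, independent miss probability per attempt the retries are geometric. This is a minimal sketch under those assumptions, not a description of measured IBM subsystem behavior; the rotation speed is also assumed.

```python
# Sketch: expected rotational delay added by RPS misses under
# disconnect/reconnect, assuming independent attempts with a fixed
# miss probability p (a load-dependent parameter we assume here).
ROTATION_MS = 16.7   # one revolution at an assumed 3600 RPM

def expected_rps_delay_ms(p_miss, rotation_ms=ROTATION_MS):
    """Geometric retries: expected extra delay = rotation * p / (1 - p)."""
    assert 0 <= p_miss < 1
    return rotation_ms * p_miss / (1 - p_miss)

for p in (0.1, 0.3, 0.5):
    print(f"path busy {p:.0%} of the time -> +{expected_rps_delay_ms(p):.1f} ms")
```

At 50% path utilization the miss penalty alone (about 16.7 ms here) rivals a typical seek, which is why dynamic path reconnect pays off.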
Turning to supercomputer I/O systems, we will now examine the I/O architecture of the Cray machines. Because the Cray I/O system (IOS) varies from model to model, the following discussion concentrates on the IOS found on the Cray X-MP and Y-MP [Cray, 1988]. In general, the IOS consists of two to four I/O processors (IOPs), each with its own local memory and sharing a common buffer memory with the other IOPs. The IOP is designed to be a simple, fast machine for controlling data transfers between devices and the central memory of the Cray main processors. Since it executes the control statements of an I/O program, it is not unlike the IBM channel processor in terms of its functionality, except that I/O programs reside in its local memory rather than in the host’s. An IOP’s local memory is connected through a high-speed communications interface, called a channel in Cray terminology, to a disk control unit (DCU). A given port into the local memory can be time multiplexed among multiple channels. Data is transferred back and forth between devices and the main processors through the IOP’s local memory, which is interfaced to central memory through a 100-Mbyte/s channel pair (one pathway for each direction of transfer).

The DCU provides the interface between the IOP and the disk drives and is similar in functionality to IBM’s storage director. It oversees the data transfers between devices and the IOP’s local memory, provides speed-matching buffer storage, and transmits control signals and status information between the IOP and the devices. Disk storage units (DSUs) are attached to the DCU through point-to-point connections. The DSU contains the disk device and is responsible for dealing with its own defect management, using a technique called sector slipping. Figure 80.13 summarizes the elements of the Cray I/O system.

Digital Equipment Corporation’s high-end I/O strategy is described in terms of the Digital Storage Architecture (DSA) and is embodied in system configurations such as the VAXCluster shared disk system (see Fig. 80.14). The architecture provides a rigorous definition of how storage subsystems and host computers interact. It achieves this by defining a client/server message-based model for I/O interaction based on device-independent interfaces [Massiglia, 1986; Kronenberg et al., 1986]. A mass storage subsystem is viewed at the architectural level as consisting of logical block machines capable of storing and retrieving fixed blocks of data, i.e., the I/O system supports the transfer of logical blocks between CPUs and devices given a logical block number. From the viewpoint of physical components, a subsystem consists of controllers which connect computers to drives. The software architecture is divided into four levels: the Operating System Client (also called the Class Driver), the Class Server (Controller), the Device Client (Data Controller), and the Device Server (Device). The Disk Class Driver, resident on a host CPU, accepts requests for disk I/O service from applications, packages these requests into messages, and transmits them via a communications interface (such as the Computer Interconnect port driver) to the Disk Class Server resident within a controller in the I/O subsystem.

FIGURE 80.13 Elements of the Cray I/O system for the Y-MP. An IOS contains up to four IOPs. The MIOP connects to the operator workstation and performs mainly maintenance functions.
The XIOP supports block multiplexing and is most appropriate for controlling relatively slow-speed devices, such as tapes. The BIOP and DIOP are designed for controlling high-speed devices like disks. Up to four disk storage units (DSUs) can be attached through the disk control unit (DCU) to the IOP. Three DCUs can be connected to each of the BIOP and DIOP, leading to a total of 24 disks per IOS. The Y-MP can be configured with two IOSs, for a system total of 48 devices.
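The device fan-out stated in the Figure 80.13 caption is just a product of the per-level limits; a few lines make the arithmetic explicit.

```python
# Sketch: the Cray Y-MP device fan-out implied by the Figure 80.13 caption.
DSUS_PER_DCU = 4        # disk storage units per disk control unit
DCUS_PER_IOP = 3        # DCUs on each disk-oriented IOP
DISK_IOPS_PER_IOS = 2   # the BIOP and DIOP handle disks
IOS_PER_SYSTEM = 2      # maximum Y-MP configuration

disks_per_ios = DSUS_PER_DCU * DCUS_PER_IOP * DISK_IOPS_PER_IOS
print(disks_per_ios)                    # 24 disks per IOS, as stated
print(disks_per_ios * IOS_PER_SYSTEM)   # 48 devices system-wide
```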
The command set supported by the Class Server includes such relatively device-independent operations as read logical block, write logical block, bring on-line, and request status. The Disk Class Server1 interprets the transmitted commands, handles the scheduling of command execution, tracks their progress, and reports status back to the Class Driver. Note the absence of seek or select head commands. This interface can be used equally well for solid-state disks as for conventional magnetic disks. Device-specific commands are issued at a lower level of the architecture, i.e., between the Device Client (disk controller) and Device Server (disk device). The former provides the path for moving commands and data between hosts and drives, and it is usually realized physically by a piece of hardware that corresponds to the device controller. The latter coincides with the physical drives used for storing and retrieving data.

It is interesting to contrast these proprietary approaches with an industry standard approach like SCSI, admittedly targeted for the low-to-mid range of performance. SCSI defines the logical and physical interface between a host bus adapter (HBA) and a disk controller, usually embedded within the assembly of the disk device. The HBA accepts I/O requests from the host, initiates I/O actions by communicating with the controllers, and performs direct memory access transfers between its own buffers and the memory of the host. Requesters of service are called initiators, while providers of service are called targets. Up to eight nodes can reside on a single SCSI string, sharing a common pathway to the HBA. The embedded controller performs device handling and error recovery. Physically, the interface is implemented with a single daisy-chained cable, and the 8-bit datapath is used to communicate control and status information, as well as data. SCSI defines a layered communications protocol, including a message layer for protocol and status, and a command/status layer for target operation execution. The HBA roughly corresponds to the function of the IBM channel processor or Cray IOP, while the embedded controller is similar to the IBM storage director/string controller or the Cray DCU. Despite the differences in terminology, the systems we have surveyed exhibit significant commonality of function and similar approaches for partitioning these functions among hardware components.

Characterization of I/O Workloads

Before characterizing the I/O behavior of different workloads, it is necessary to first understand the elements of disk performance. Disk performance is a function of the service time, which consists of three main components: seek time, rotational latency, and data transfer time.2 Seek time is the time needed to position the heads

1 Other kinds of class servers are also supported, such as for tape drives.
2 In a heavily utilized system, delays waiting for a device can match actual disk service times, which in reality comprise device queuing, controller overhead, seek, rotational latency, reconnect misses, error retries, and data transfer.

FIGURE 80.14 VAXCluster architecture. CPUs are connected to HSCs (hierarchical storage controllers) through a dual CI (computer interconnect) bus. Thirty-one hosts and 31 HSCs can be connected to a CI.
Up to 32 disks can be connected to an HSC-70.
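The three-way decomposition of service time just introduced translates directly into a back-of-the-envelope model. A minimal sketch; the parameter values in the example call are assumptions typical of drives of the era, not figures from the text, and queuing, controller overhead, and RPS misses (footnote 2) are deliberately ignored.

```python
# Sketch: service time = seek + rotational latency + data transfer,
# the three components named above. Example parameters are assumed.
def service_time_ms(avg_seek_ms, rpm, transfer_mb_s, request_bytes):
    """Average disk service time in milliseconds.
    Rotational latency averages half a revolution; transfer time is
    the request size over the sustained media rate."""
    rotation_ms = 60_000.0 / rpm        # duration of one full revolution
    latency_ms = rotation_ms / 2        # expected wait for the target sector
    transfer_ms = request_bytes / (transfer_mb_s * 1e6) * 1e3
    return avg_seek_ms + latency_ms + transfer_ms

# E.g., assuming a 16-ms average seek, 3600 RPM, 2 Mbytes/s, 4-KB request:
print(f"{service_time_ms(16, 3600, 2, 4096):.1f} ms")  # ~26.4 ms
```

Note how, under these assumptions, seek and rotational latency account for over 90% of the total, which is why the workload characterization that follows focuses on access patterns rather than raw transfer rates.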