Boykin, J. “Operating Systems.” The Electrical Engineering Handbook. Ed. Richard C. Dorf. Boca Raton: CRC Press LLC, 2000.
© 2000 by CRC Press LLC

96 Operating Systems

96.1 Introduction
96.2 Types of Operating Systems
96.3 Distributed Computing Systems
96.4 Fault-Tolerant Systems
96.5 Parallel Processing
96.6 Real-Time Systems
96.7 Operating System Structure
96.8 Industry Standards
96.9 Conclusions

96.1 Introduction

An operating system is just another program running on a computer. It is unlike any other program, however. An operating system’s primary function is the management of all hardware and software resources. It manages processors, memory, I/O devices, and networks. It enforces policies such as protection of one program from another and fairness to ensure that users have equal access to system resources. It is privileged in that it is the only program that can perform specialized hardware operations. The operating system is the primary program upon which all other programs rely.

To understand modern operating systems we must begin with some history [Boykin and LoVerso, 1990]. The modern digital computer is only about 40 years old. The first machines were giant monoliths housed in special rooms, and access to them was carefully controlled. To program one of these systems the user scheduled access time well in advance, for in those days the user had sole access to the machine. The program such a user ran was the only program running on the machine.

It did not take long to recognize the need for better control over computer resources. This began in the mid-1950s with the dawn of batch processing and early operating systems that did little more than load programs and manage I/O devices. In the 1960s we saw more general-purpose systems. New operating systems that provided time-sharing and real-time computing were developed. This was the time when the foundation for all modern operating systems was laid.

Today’s operating systems are sophisticated pieces of software.
They may contain millions of lines of code and provide such services as distributed file access, security, fault tolerance, and real-time scheduling. In this chapter we examine many of these features of modern operating systems and their use to the practicing engineer.

Joseph Boykin, Clarion Advanced Storage

96.2 Types of Operating Systems

Different operating systems (OS) provide a wide range of functionality. Some are designed as single-user systems and some for multiple users. The operating system, with appropriate hardware support, can protect one executing program from malicious or inadvertent attempts of another to modify or examine its memory. When connected to a storage device such as a disk drive, the OS implements a file system to permit storage of files.
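The kind of protection policy mentioned above can be made concrete with a small sketch of UNIX-style permission bits. The check_access function and its simplified encoding are invented for illustration and do not reproduce any real kernel's implementation:

```python
# Hypothetical sketch: a simplified UNIX-style permission check.
# Each file carries an owner and a mode whose octal digits hold rwx bits
# for owner/group/other; the OS consults these bits before allowing access.

READ, WRITE, EXECUTE = 4, 2, 1

def check_access(mode, file_uid, uid, wanted):
    """Return True if user `uid` may perform `wanted` access on the file."""
    if uid == file_uid:
        bits = (mode >> 6) & 0o7   # owner bits
    else:
        bits = mode & 0o7          # "other" bits (group handling omitted)
    return (bits & wanted) == wanted

# Mode 0o600: the owner may read and write, everyone else is denied.
print(check_access(0o600, file_uid=100, uid=100, wanted=READ))   # True
print(check_access(0o600, file_uid=100, uid=200, wanted=READ))   # False
```

Because the check happens inside the privileged operating system rather than in the application, a user program cannot simply skip it; this is the essence of OS-enforced protection.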
The file system often includes security features to protect against file access by unauthorized users. The system may be connected to other computers via a network and thus provide access to remote system resources.

Operating systems are often categorized by the major functionality they provide. This functionality includes distributed computing, fault tolerance, parallel processing, real-time, and security. While no operating system incorporates all of these capabilities, many have characteristics from each category.

An operating system does not need to contain every modern feature to be useful. For example, MS-DOS1 is a single-user system with few of the features now common in other systems. Indeed, this system is little more than a program loader reminiscent of operating systems from the early 1960s. Unlike those vintage systems, there are numerous applications that run under MS-DOS. It is the abundance of programs that solve problems from word processing to spreadsheets to graphics that has made MS-DOS popular. The simplicity of these systems is exactly what makes them popular for the average person.

Systems capable of supporting multiple users are termed time-sharing systems; the system is shared among all users, with each user having the view that he or she has all system resources available. Multiuser operating systems provide protection for both the file system and the contents of main memory. The operating system must also mediate access to peripheral devices. For example, only one user may have access to a tape drive at a time.

Fault-tolerant systems rely on both hardware and software to ensure that the failure of any single hardware component, or even multiple components, does not cause the system to cease operation. To build such a system requires that each critical hardware component be replicated at least once.
The operating system must be able to dynamically determine which resources are available and, if a resource fails, move a running program to an operational unit.

Security has become more important during recent years. Theft of data and unauthorized access to data are prevented in secure systems. Within the United States, levels of security are defined by a government-produced document known as the Orange Book. This document defines seven levels of security, denoted from lowest to highest as D, C1, C2, B1, B2, B3, and A1. Many operating systems provide no security and are labeled D. Most time-sharing systems are secure enough that they could be classified at the C1 level. The C2 and B1 levels are similar, and this is where most secure operating systems are currently classified. During the 1990s B2 and B3 systems will become readily available from vendors. The A1 level is extremely difficult to achieve, although several such systems are being worked on.

In the next several sections we expand upon the topics of distributed computing, fault-tolerant systems, parallel processing, and real-time systems.

96.3 Distributed Computing Systems

The ability to connect multiple computers through a communications network has existed for many years. Initially, computer-to-computer communication consisted of a small number of systems performing bulk file transfers. The 1980s brought the invention of high-speed local area networks, or LANs. A LAN allows hundreds of machines to be connected together. New capabilities began to emerge, such as virtual terminals that allowed a user to log on to a computer without being physically connected to that system. Networks were used to provide remote access to printers, disks, and other peripherals. The drawback to these systems was the software; it was not sophisticated enough to provide a totally integrated environment. Only small, well-defined interactions among machines were permitted.
Distributed systems provide the view that all resources from every computer on the network are available to the user. What’s more, access to resources on a remote computer is viewed in the same way as access to resources on the local computer. For example, a file system that implements a directory hierarchy, such as UNIX,2 may have some directories on a local disk while one or more directories are on a remote system. Figure 96.1 illustrates how much of the directory hierarchy would be on the local system, while user directories (shaded directories) could be on a remote system.

1 MS-DOS is a trademark of Microsoft, Inc.
2 UNIX is a trademark of UNIX Software Laboratories (USL).
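The idea of a single directory hierarchy spanning local and remote machines can be sketched with a mount table that maps directory prefixes to the machines storing them. The table entries, server names, and resolve function below are invented for illustration and are not how any particular system implements name resolution:

```python
# Hypothetical sketch: resolving a path in a distributed file system.
# A mount table maps directory prefixes to the machine that stores them;
# any path not covered by an entry is assumed to live on the local disk.

MOUNT_TABLE = {
    "/usr/users": "fileserver1",   # user directories on a remote system
    "/usr/share": "fileserver2",
}

def resolve(path):
    """Return (machine, path), choosing the longest matching mount prefix."""
    best = ""
    machine = "local"
    for prefix, server in MOUNT_TABLE.items():
        if path == prefix or path.startswith(prefix + "/"):
            if len(prefix) > len(best):
                best, machine = prefix, server
    return machine, path

print(resolve("/usr/users/joe/notes.txt"))  # handled by fileserver1
print(resolve("/bin/ls"))                   # handled by the local disk
```

The point of the sketch is transparency: the program asks for a path in the ordinary way, and the lookup decides, invisibly to the user, whether the request is served locally or forwarded to a remote machine.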
There are many advantages of distributed systems. Advantages over centralized systems include [Tanenbaum, 1992]:

• Economics: Microprocessors offer a better price/performance than mainframes.
• Speed: A distributed system may have more total computing power than a mainframe.
• Reliability: If one machine crashes, the system as a whole can still survive.
• Incremental growth: Computing power can be added in small increments.

Advantages over nonnetworked personal computers include [Tanenbaum, 1992]:

• Data sharing: Allow many users access to a common database.
• Device sharing: Allow many users to share expensive peripherals like color printers.
• Communication: Make human-to-human communication easier, for example, by electronic mail.
• Flexibility: Spread the workload over the available machines in the most cost-effective way.

FIGURE 96.1 UNIX file system hierarchy in a distributed environment.
While there are many advantages to distributed systems, there are also several disadvantages. The primary difficulty is that software for implementing distributed systems is large and complex. Small personal computers could not effectively run modern distributed applications. Software development tools for this environment are not well advanced. Thus, application developers are having a difficult time working in this environment.

An additional problem is network speed. Most office networks are currently based on IEEE standard 802.3 [IEEE, 1985], commonly (although erroneously) called Ethernet, which operates at 10 Mb/s (ten million bits per second). With this limited bandwidth, it is easy to saturate the network. While higher-speed networks such as FDDI1 and ATM2 networks do exist, they are not yet in common use.

While distributed computing has many advantages, we must also understand that without appropriate safeguards, our data may not be secure. Security is a difficult problem in a distributed environment. Whom do you trust when there are potentially thousands of users with access to your local system? A network is subject to security attack by a number of mechanisms. It is possible to monitor all packets going across the network; hence, unencrypted data are easily obtained by an unauthorized user. A malicious user may cause a denial-of-service attack by flooding the network with packets, making all systems inaccessible to legitimate users.

Finally, we must deal with the problem of scale. To connect a few dozen or even a few hundred computers together may not cause a problem with current software. However, global networks of computers are now being installed. Scaling our current software to work with tens of thousands of computers running across large geographic boundaries with many different types of networks is a challenge that has not yet been met.

96.4 Fault-Tolerant Systems

Most computers simply stop running when they break.
We take this as a given. There are many environments, however, where it is not acceptable for the computer to stop working. The space shuttle is a good example. There are other environments where you would simply prefer if the system continued to operate. A business using a computer for order entry can continue to operate if the computer breaks, but the cost and inconvenience may be high. Fault-tolerant systems are composed of specially designed hardware and software that are capable of continuous operation.

To build a fault-tolerant system requires both hardware and software modifications. Let’s take a look at an example of a small problem that illustrates the type of changes that must be made. Remember, the goal of such a system is to achieve continuous operation. That means we can never purposely shut the computer off. How then do we repair the system if we cannot shut it off? First, the hardware must be capable of having circuit boards plugged and unplugged while the system is running; this is not possible on most computers. Second, removing a board must be detected by the hardware and reported to the operating system. The operating system, the manager of resources, must then discontinue use of that resource. Each component of the computer system, both hardware and software, must be specially built to handle failures. It should also be obvious that a fault-tolerant system must have redundant hardware. If, for example, a disk controller should fail, there must be another controller communicating with the disks that can take over.

One problem with implementing a fault-tolerant system is knowing when something has failed. If a circuit board totally ceases operation, we can determine the failure by its lack of response to commands. Another failure mode exists where the failing component appears to work but is operating incorrectly. A common approach to detect this problem is a voting mechanism.
By implementing three hardware replicas the system can detect when any one has failed by its producing output inconsistent with the other two. In that case, the output of the two components in agreement is used.

The operating system must be capable of restarting a program from a known point when a component on which the program was running has failed. The system can use checkpoints for this purpose. When an application program reaches a known state, such as when it completes a transaction, it stores the current state of the

1 Fiber distributed data interface. The FDDI standard specifies an optical fiber ring with a data rate of 100 Mb/s.
2 Asynchronous transfer mode. A packet-oriented transfer mode moving data in fixed-size packets called cells. There is no fixed speed for ATM. Typical speed is currently 155 Mb/s, although there are implementations running at 2 Gb/s.
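The voting mechanism described above, often called triple modular redundancy, can be sketched in a few lines. The vote function and the sample replica outputs below are invented for illustration; in a real system the comparison is typically done in hardware on every cycle:

```python
# Hypothetical sketch of majority voting across three hardware replicas.
# Each replica computes the same result; if one fails and produces a
# different output, the two replicas that still agree outvote it.

from collections import Counter

def vote(outputs):
    """Return the majority output of the replicas; raise if no two agree."""
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no two replicas agree; fault cannot be masked")
    return value

# Replica B has failed and produces a wrong answer; A and C outvote it.
print(vote([42, 41, 42]))  # 42
```

Note that voting masks a single incorrect replica but cannot distinguish which of three mutually disagreeing replicas is right, which is why failed units must be repaired or replaced promptly, before a second fault occurs.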