Chapter 1 Introduction to the Linux Kernel

One of Linux’s most interesting features is that it is not a commercial product; instead, it is a collaborative project developed over the Internet. Although Linus remains the creator of Linux and the maintainer of the kernel, progress continues through a loose-knit group of developers. Anyone can contribute to Linux. The Linux kernel, as with much of the system, is free or open source software.[3] Specifically, the Linux kernel is licensed under the GNU General Public License (GPL) version 2.0. Consequently, you are free to download the source code and make any modifications you want. The only caveat is that if you distribute your changes, you must continue to provide the recipients with the same rights you enjoyed, including the availability of the source code.[4]

[3] I will leave the free versus open debate to you. See http://www.fsf.org and http://www.opensource.org.

[4] You should read the GNU GPL version 2.0. There is a copy in the file COPYING in your kernel source tree. You can also find it online at http://www.fsf.org. Note that the latest version of the GNU GPL is version 3.0; the kernel developers have decided to remain with version 2.0.

Linux is many things to many people. The basics of a Linux system are the kernel, C library, toolchain, and basic system utilities, such as a login process and shell. A Linux system can also include a modern X Window System implementation including a full-featured desktop environment, such as GNOME. Thousands of free and commercial applications exist for Linux. In this book, when I say Linux I typically mean the Linux kernel. Where it is ambiguous, I try explicitly to point out whether I am referring to Linux as a full system or just the kernel proper. Strictly speaking, the term Linux refers only to the kernel.

Overview of Operating Systems and Kernels

Because of the ever-growing feature set and poor design of some modern commercial operating systems, the notion of what precisely defines an operating system is not universal. Many users consider whatever they see on the screen to be the operating system. Technically speaking, and in this book, the operating system is considered to be the parts of the system responsible for basic use and administration. This includes the kernel and device drivers, boot loader, command shell or other user interface, and basic file and system utilities. It is the stuff you need, not a web browser or music player. The term system, in turn, refers to the operating system and all the applications running on top of it.

Of course, the topic of this book is the kernel. Whereas the user interface is the outermost portion of the operating system, the kernel is the innermost. It is the core internals: the software that provides basic services for all other parts of the system, manages hardware, and distributes system resources. The kernel is sometimes referred to as the supervisor, core, or internals of the operating system. Typical components of a kernel are interrupt handlers to service interrupt requests, a scheduler to share processor time among multiple processes, a memory management system to manage process address spaces, and system services such as networking and interprocess communication. On
modern systems with protected memory management units, the kernel typically resides in an elevated system state compared to normal user applications. This includes a protected memory space and full access to the hardware. This system state and memory space are collectively referred to as kernel-space. Conversely, user applications execute in user-space. They see a subset of the machine’s available resources and cannot perform certain system functions, directly access hardware, access memory outside of that allotted them by the kernel, or otherwise misbehave. When executing kernel code, the system is in kernel-space executing in kernel mode. When running a regular process, the system is in user-space executing in user mode.

Applications running on the system communicate with the kernel via system calls (see Figure 1.1). An application typically calls functions in a library (for example, the C library) that in turn rely on the system call interface to instruct the kernel to carry out tasks on the application’s behalf. Some library calls provide many features not found in the system call, and thus, calling into the kernel is just one step in an otherwise large function. For example, consider the familiar printf() function. It provides formatting and buffering of the data; only one step in its work is invoking write() to write the data to the console. Conversely, some library calls have a one-to-one relationship with the kernel. For example, the open() library function does little except call the open() system call. Still other C library functions, such as strcpy(), should (one hopes) make no direct use of the kernel at all. When an application executes a system call, we say that the kernel is executing on behalf of the application.
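The split between library work and kernel work described above can be sketched in C. This is only an illustration under stated assumptions, not how any particular C library implements printf(): the helper name formatted_write() and its "name=value" output format are invented for the example. The formatting happens entirely in user-space through snprintf(), and only the final write() enters the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * formatted_write() is a hypothetical helper, not a real C library function.
 * It returns the number of bytes written (which may be fewer than requested),
 * or -1 on error.
 */
ssize_t formatted_write(int fd, const char *name, int value)
{
    char buf[128];
    int len;

    assert(name != NULL);

    /* Step 1: formatting and buffering, done entirely in user-space. */
    len = snprintf(buf, sizeof(buf), "%s=%d\n", name, value);
    if (len < 0 || (size_t)len >= sizeof(buf))
        return -1; /* encoding error, or the text would not fit in buf */

    /* Step 2: the only step that enters the kernel, the write() system call. */
    return write(fd, buf, (size_t)len);
}
```

For instance, calling formatted_write(STDOUT_FILENO, "answer", 42) would emit the line answer=42 on standard output. A function such as strcpy() corresponds to step 1 alone: it never needs step 2.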
Furthermore, the application is said to be executing a system call in kernel-space, and the kernel is running in process context. This relationship, in which applications call into the kernel via the system call interface, is the fundamental manner in which applications get work done.

The kernel also manages the system’s hardware. Nearly all architectures, including all systems that Linux supports, provide the concept of interrupts. When hardware wants to communicate with the system, it issues an interrupt that literally interrupts the processor, which in turn interrupts the kernel. Each interrupt is identified by a number, and the kernel uses this number to execute a specific interrupt handler to process and respond to the interrupt. For example, as you type, the keyboard controller issues an interrupt to let the system know that there is new data in the keyboard buffer. The kernel notes the interrupt number of the incoming interrupt and executes the correct interrupt handler. The interrupt handler processes the keyboard data and lets the keyboard controller know it is ready for more data. To provide synchronization, the kernel can disable interrupts, either all interrupts or just one specific interrupt number. In many operating systems, including Linux, the interrupt handlers do not run in a process context. Instead, they run in a special interrupt context that is not associated with any process. This special context exists solely to let an interrupt handler quickly respond to an interrupt, and then exit.

These contexts represent the breadth of the kernel’s activities. In fact, in Linux, we can generalize that each processor is doing exactly one of three things at any given moment:

- In user-space, executing user code in a process
- In kernel-space, in process context, executing on behalf of a specific process
- In kernel-space, in interrupt context, not associated with a process, handling an interrupt

This list is inclusive. Even corner cases fit into one of these three activities: For example, when idle, it turns out that the kernel is executing an idle process in process context in the kernel.

[Figure 1.1 Relationship between applications, the kernel, and hardware. Diagram labels, top to bottom: Application 1, Application 2, Application 3 (user-space); System Call Interface; Kernel Subsystems; Device Drivers (kernel-space); hardware.]

Linux Versus Classic Unix Kernels

Owing to their common ancestry and same API, modern Unix kernels share various design traits. (See the Bibliography for my favorite books on the design of the classic Unix kernels.) With few exceptions, a Unix kernel is typically a monolithic static binary. That is, it exists as a single, large, executable image that runs in a single address space. Unix systems typically require a system with a paged memory-management unit (MMU); this hardware enables the system to enforce memory protection and to provide a unique virtual address space to each process. Linux historically has required an MMU, but
special versions can actually run without one. This is a neat feature, enabling Linux to run on very small MMU-less embedded systems, but otherwise more academic than practical: even simple embedded systems nowadays tend to have advanced features such as memory-management units. In this book, we focus on MMU-based systems.

Monolithic Kernel Versus Microkernel Designs

We can divide kernels into two main schools of design: the monolithic kernel and the microkernel. (A third camp, exokernel, is found primarily in research systems.)

Monolithic kernels are the simpler design of the two, and all kernels were designed in this manner until the 1980s. Monolithic kernels are implemented entirely as a single process running in a single address space. Consequently, such kernels typically exist on disk as single static binaries. All kernel services exist and execute in the large kernel address space. Communication within the kernel is trivial because everything runs in kernel mode in the same address space: The kernel can invoke functions directly, as a user-space application might. Proponents of this model cite the simplicity and performance of the monolithic approach. Most Unix systems are monolithic in design.

Microkernels, on the other hand, are not implemented as a single large process. Instead, the functionality of the kernel is broken down into separate processes, usually called servers. Ideally, only the servers absolutely requiring such capabilities run in a privileged execution mode. The rest of the servers run in user-space. All the servers, though, are separated into different address spaces. Therefore, direct function invocation as in monolithic kernels is not possible. Instead, microkernels communicate via message passing: An interprocess communication (IPC) mechanism is built into the system, and the various servers communicate with and invoke “services” from each other by sending messages over the IPC mechanism. The separation of the various servers prevents a failure in one server from bringing down another. Likewise, the modularity of the system enables one server to be swapped out for another.

Because the IPC mechanism involves quite a bit more overhead than a trivial function call, however, and because a context switch from kernel-space to user-space or vice versa is often involved, message passing includes a latency and throughput hit not seen on monolithic kernels with simple function invocation. Consequently, all practical microkernel-based systems now place most or all of the servers in kernel-space, to remove the overhead of frequent context switches and potentially enable direct function invocation. The Windows NT kernel (on which Windows XP, Vista, and 7 are based) and Mach (on which part of Mac OS X is based) are examples of microkernels. Neither Windows NT nor Mac OS X runs any microkernel servers in user-space in its latest iteration, defeating the primary purpose of microkernel design altogether.

Linux is a monolithic kernel; that is, the Linux kernel executes in a single address space entirely in kernel mode. Linux, however, borrows much of the good from microkernels: Linux boasts a modular design, the capability to preempt itself (called kernel preemption), support for kernel threads, and the capability to dynamically load separate binaries (kernel modules) into the kernel image. Conversely, Linux has none of the performance-sapping features that curse microkernel design: Everything runs in kernel mode, with direct function invocation, not message passing, as the modus of communication. Nonetheless, Linux is modular, threaded, and the kernel itself is schedulable. Pragmatism wins again.
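The contrast between direct function invocation and message passing can be sketched in user-space C. This is a simplified illustration under loud assumptions: every name here (service_add, monolithic_request, message_request, and the two message structs) is hypothetical, the "server" runs in the same process rather than in a separate address space, and ordinary POSIX pipes stand in for whatever IPC mechanism a real microkernel provides. The sketch only shows the shape of the extra work: marshaling a request, copying it through the kernel, and copying a reply back.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical service logic that both designs ultimately run. */
static uint32_t service_add(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Monolithic style: caller and service share one address space,
 * so a request is an ordinary function call. */
uint32_t monolithic_request(uint32_t a, uint32_t b)
{
    return service_add(a, b);
}

/* Hypothetical message formats for the message-passing style. */
struct request { uint32_t a, b; };
struct reply   { uint32_t result; };

/* Server side: receive one request, handle it, and send one reply.
 * Returns 0 on success, -1 on error. */
static int server_handle_one(int req_fd, int rep_fd)
{
    struct request req;
    struct reply rep;

    if (read(req_fd, &req, sizeof(req)) != (ssize_t)sizeof(req))
        return -1;
    rep.result = service_add(req.a, req.b);
    if (write(rep_fd, &rep, sizeof(rep)) != (ssize_t)sizeof(rep))
        return -1;
    return 0;
}

/* Client side: marshal the arguments into a message, send it, and wait
 * for the reply. Each request costs several system calls and data copies
 * instead of a single call instruction. Returns 0 on success, -1 on error. */
int message_request(uint32_t a, uint32_t b, uint32_t *result)
{
    int req_pipe[2], rep_pipe[2];
    struct request req = { a, b };
    struct reply rep;
    int status = -1;

    assert(result != NULL);

    if (pipe(req_pipe) != 0)
        return -1;
    if (pipe(rep_pipe) != 0) {
        close(req_pipe[0]);
        close(req_pipe[1]);
        return -1;
    }

    if (write(req_pipe[1], &req, sizeof(req)) == (ssize_t)sizeof(req) &&
        server_handle_one(req_pipe[0], rep_pipe[1]) == 0 &&
        read(rep_pipe[0], &rep, sizeof(rep)) == (ssize_t)sizeof(rep)) {
        *result = rep.result;
        status = 0;
    }

    close(req_pipe[0]);
    close(req_pipe[1]);
    close(rep_pipe[0]);
    close(rep_pipe[1]);
    return status;
}
```

Both monolithic_request(2, 3) and message_request(2, 3, &r) arrive at the same answer, but the second path goes through pipe setup, two writes, and two reads. In a real microkernel the client and server would also be separate processes, adding context switches on top of the copies.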
As Linus and other kernel developers contribute to the Linux kernel, they decide how best to advance Linux without neglecting its Unix roots (and, more important, the Unix API). Consequently, because Linux is not based on any specific Unix variant, Linus and company can pick and choose the best solution to any given problem or, at times, invent new solutions! A handful of notable differences exist between the Linux kernel and classic Unix systems:

- Linux supports the dynamic loading of kernel modules. Although the Linux kernel is monolithic, it can dynamically load and unload kernel code on demand.
- Linux has symmetrical multiprocessor (SMP) support. Although most commercial variants of Unix now support SMP, most traditional Unix implementations did not.
- The Linux kernel is preemptive. Unlike traditional Unix variants, the Linux kernel can preempt a task even as it executes in the kernel. Of the other commercial Unix implementations, Solaris and IRIX have preemptive kernels, but most Unix kernels are not preemptive.
- Linux takes an interesting approach to thread support: It does not differentiate between threads and normal processes. To the kernel, all processes are the same; some just happen to share resources.
- Linux provides an object-oriented device model with device classes, hot-pluggable events, and a user-space device filesystem (sysfs).
- Linux ignores some common Unix features that the kernel developers consider poorly designed, such as STREAMS, or standards that are impossible to cleanly implement.
- Linux is free in every sense of the word. The feature set Linux implements is the result of the freedom of Linux’s open development model. If a feature is without merit or poorly thought out, Linux developers are under no obligation to implement it. To the contrary, Linux has adopted an elitist attitude toward changes: Modifications must solve a specific real-world problem, derive from a clean design, and have a solid implementation. Consequently, features of some other modern Unix variants that are more marketing bullet points or one-off requests, such as pageable kernel memory, have received no consideration.

Despite these differences, however, Linux remains an operating system with a strong Unix heritage.

Linux Kernel Versions

Linux kernels come in two flavors: stable and development. Stable kernels are production-level releases suitable for widespread deployment. New stable kernel versions are released typically only to provide bug fixes or new drivers. Development kernels, on the other hand, undergo rapid change where (almost) anything goes. As developers experiment with new solutions, the kernel code base changes in often drastic ways.