Safari offers a solution that's better than e-books. It's a virtual library that lets you easily search thousands of top technology books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http://safari.oreilly.com.

Acknowledgments

This book would not have been written without the precious help of the many students of the University of Rome school of engineering "Tor Vergata" who took our course and tried to decipher lecture notes about the Linux kernel. Their strenuous efforts to grasp the meaning of the source code led us to improve our presentation and correct many mistakes.

Andy Oram, our wonderful editor at O'Reilly Media, deserves a lot of credit. He was the first at O'Reilly to believe in this project, and he spent a lot of time and energy deciphering our preliminary drafts. He also suggested many ways to make the book more readable, and he wrote several excellent introductory paragraphs.

We had some prestigious reviewers who read our text quite carefully. The first edition was checked by (in alphabetical order by first name) Alan Cox, Michael Kerrisk, Paul Kinzelman, Raph Levien, and Rik van Riel.

The second edition was checked by Erez Zadok, Jerry Cooperstein, John Goerzen, Michael Kerrisk, Paul Kinzelman, Rik van Riel, and Walt Smith.

This edition has been reviewed by Charles P. Wright, Clemens Buchacher, Erez Zadok, Raphael Finkel, Rik van Riel, and Robert P. J. Day. Their comments, together with those of many readers from all over the world, helped us to remove several errors and inaccuracies and have made this book stronger.

Marco Cesati
Daniel P. Bovet
July 2005
Chapter 1. Introduction

Linux[*] is a member of the large family of Unix-like operating systems. A relative newcomer experiencing sudden spectacular popularity starting in the late 1990s, Linux joins such well-known commercial Unix operating systems as System V Release 4 (SVR4), developed by AT&T (now owned by the SCO Group); the 4.4 BSD release from the University of California at Berkeley (4.4BSD); Digital UNIX from Digital Equipment Corporation (now Hewlett-Packard); AIX from IBM; HP-UX from Hewlett-Packard; Solaris from Sun Microsystems; and Mac OS X from Apple Computer, Inc. Besides Linux, a few other open source Unix-like kernels exist, such as FreeBSD, NetBSD, and OpenBSD.

[*] LINUX® is a registered trademark of Linus Torvalds.

Linux was initially developed by Linus Torvalds in 1991 as an operating system for IBM-compatible personal computers based on the Intel 80386 microprocessor. Linus remains deeply involved with improving Linux, keeping it up-to-date with various hardware developments and coordinating the activity of hundreds of Linux developers around the world. Over the years, developers have worked to make Linux available on other architectures, including Hewlett-Packard's Alpha, Intel's Itanium, AMD's AMD64, PowerPC, and IBM's zSeries.

One of the more appealing benefits of Linux is that it isn't a commercial operating system: its source code under the GNU General Public License (GPL)[†] is open and available to anyone to study (as we will in this book); if you download the code (the official site is http://www.kernel.org) or check the sources on a Linux CD, you will be able to explore, from top to bottom, one of the most successful modern operating systems. This book, in fact, assumes you have the source code on hand and can apply what we say to your own explorations.

[†] The GNU project is coordinated by the Free Software Foundation, Inc. (http://www.gnu.org); its aim is to implement a whole operating system freely usable by everyone. The availability of a GNU C compiler has been essential for the success of the Linux project.

Technically speaking, Linux is a true Unix kernel, although it is not a full Unix operating system because it does not include all the Unix applications, such as filesystem utilities, windowing systems and graphical desktops, system administrator commands, text editors, compilers, and so on. However, because most of these programs are freely available under the GPL, they can be installed in every Linux-based system.

Because the Linux kernel requires so much additional software to provide a useful environment, many Linux users prefer to rely on commercial distributions, available on CD-ROM, to get the code included in a standard Unix system. Alternatively, the code may be obtained from several different
sites, for instance http://www.kernel.org. Several distributions put the Linux source code in the /usr/src/linux directory. In the rest of this book, all file pathnames will refer implicitly to the Linux source code directory.

1.1. Linux Versus Other Unix-Like Kernels

The various Unix-like systems on the market, some of which have a long history and show signs of archaic practices, differ in many important respects. All commercial variants were derived from either SVR4 or 4.4BSD, and all tend to agree on some common standards like IEEE's Portable Operating Systems based on Unix (POSIX) and X/Open's Common Applications Environment (CAE).

The current standards specify only an application programming interface (API), that is, a well-defined environment in which user programs should run. Therefore, the standards do not impose any restriction on internal design choices of a compliant kernel.[*]

[*] As a matter of fact, several non-Unix operating systems, such as Windows NT and its descendants, are POSIX-compliant.

To define a common user interface, Unix-like kernels often share fundamental design ideas and features. In this respect, Linux is comparable with the other Unix-like operating systems. Reading this book and studying the Linux kernel, therefore, may help you understand the other Unix variants, too.

The 2.6 version of the Linux kernel aims to be compliant with the IEEE POSIX standard. This, of course, means that most existing Unix programs can be compiled and executed on a Linux system with very little effort or even without the need for patches to the source code. Moreover, Linux includes all the features of a modern Unix operating system, such as virtual memory, a virtual filesystem, lightweight processes, Unix signals, SVR4 interprocess communications, support for Symmetric Multiprocessor (SMP) systems, and so on.
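To make the idea of a standard API more concrete, here is a minimal sketch of a routine written only against POSIX interfaces (pipe(), fork(), read(), write(), waitpid()). The function name echo_through_child and its parameters are hypothetical, chosen for this illustration; they do not come from the kernel sources. Under the assumption that the target system implements these calls as POSIX describes, the same source should compile unchanged on Linux and on other compliant systems, whatever their internal kernel design.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): fork a child process that
 * writes `message` into a pipe, then read the message back in the parent.
 * Returns the number of bytes stored in `buf` (excluding the terminating
 * NUL), or -1 if the pipe or the child could not be created.
 */
ssize_t echo_through_child(const char *message, char *buf, size_t buflen)
{
    int fds[2];

    assert(message != NULL && buf != NULL);
    if (buflen == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: send the message and terminate without returning. */
        close(fds[0]);
        ssize_t written = write(fds[1], message, strlen(message));
        close(fds[1]);
        _exit(written < 0 ? 1 : 0);
    }

    /* Parent: close the unused write end, then read until end-of-file. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);

    /* Reap the child so that it does not remain a zombie. */
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

A caller would pass a buffer large enough for the message plus the terminating NUL; nothing in the routine refers to Linux-specific headers or system calls.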
When Linus Torvalds wrote the first kernel, he referred to some classical books on Unix internals, like Maurice Bach's The Design of the Unix Operating System (Prentice Hall, 1986). Actually, Linux still has some bias toward the Unix baseline described in Bach's book (i.e., SVR2). However, Linux doesn't stick to any particular variant. Instead, it tries to adopt the best features and design choices of several different Unix kernels. The following list describes how Linux competes against some well-known commercial Unix kernels:
Monolithic kernel
    It is a large, complex do-it-yourself program, composed of several logically different components. In this, it is quite conventional; most commercial Unix variants are monolithic. (Notable exceptions are the Apple Mac OS X and the GNU Hurd operating systems, both derived from Carnegie-Mellon's Mach, which follow a microkernel approach.)

Compiled and statically linked traditional Unix kernels
    Most modern kernels can dynamically load and unload some portions of the kernel code (typically, device drivers), which are usually called modules. Linux's support for modules is very good, because it is able to automatically load and unload modules on demand. Among the main commercial Unix variants, only the SVR4.2 and Solaris kernels have a similar feature.

Kernel threading
    Some Unix kernels, such as Solaris and SVR4.2/MP, are organized as a set of kernel threads. A kernel thread is an execution context that can be independently scheduled; it may be associated with a user program, or it may run only some kernel functions. Context switches between kernel threads are usually much less expensive than context switches between ordinary processes, because the former usually operate on a common address space. Linux uses kernel threads in a very limited way to execute a few kernel functions periodically; however, they do not represent the basic execution context abstraction. (That's the topic of the next item.)

Multithreaded application support
    Most modern operating systems have some kind of support for multithreaded applications, that is, user programs that are designed in terms of many relatively independent execution flows that share a large portion of the application data structures. A multithreaded user application could be composed of many lightweight processes (LWP), which are processes that can operate on a common address space, common physical memory pages, common opened files, and so on. Linux defines its own version of lightweight processes, which is different from the types used on other systems such as SVR4 and Solaris. While all the commercial Unix variants of LWP are based on kernel threads, Linux regards lightweight processes as the basic execution context and handles them via the nonstandard clone( ) system call.

Preemptive kernel
    When compiled with the "Preemptible Kernel" option, Linux 2.6 can arbitrarily interleave execution flows while they are in privileged mode. Besides Linux 2.6, a few other conventional, general-purpose Unix systems, such as Solaris and Mach 3.0, are fully preemptive kernels. SVR4.2/MP introduces some fixed preemption points as a method to get limited preemption capability.

Multiprocessor support
    Several Unix kernel variants take advantage of multiprocessor systems. Linux 2.6 supports symmetric multiprocessing (SMP) for different memory models, including NUMA: the system can use multiple processors and each processor can handle any task; there is no discrimination among them. Although a few parts of the kernel code are still serialized by means of a single "big kernel lock," it is fair to say that Linux 2.6 makes a near optimal use of SMP.

Filesystem
    Linux's standard filesystems come in many flavors. You can use the plain old Ext2 filesystem if you don't have specific needs. You might switch to Ext3 if you want to avoid lengthy filesystem checks after a system crash. If you'll have to deal with many small files, the ReiserFS filesystem is likely to be the best choice. Besides Ext3 and ReiserFS, several other journaling filesystems can be used in Linux; they include IBM AIX's Journaling File System (JFS) and Silicon Graphics IRIX's XFS filesystem. Thanks to a powerful object-oriented Virtual File System technology (inspired by Solaris and SVR4), porting a foreign filesystem to Linux is generally easier than porting to other kernels.

STREAMS
    Linux has no analog to the STREAMS I/O subsystem introduced in SVR4, although it is included now in most Unix kernels and has become the preferred interface for writing device drivers, terminal drivers, and network protocols.

This assessment suggests that Linux is fully competitive nowadays with commercial operating systems. Moreover, Linux has several features that make it an exciting operating system.
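Returning to the "Multithreaded application support" item in the list above, the following is a minimal user-space sketch of a lightweight process created through the glibc wrapper for clone( ). It assumes a Linux system with glibc; the names run_lightweight_process, lwp_body, shared_counter, and LWP_STACK_SIZE are hypothetical and chosen only for this illustration. The CLONE_VM flag asks the kernel to let the child share the parent's address space, so the child's update to a global variable is visible to the parent once the child has exited.

```c
#define _GNU_SOURCE            /* must precede all includes to expose clone() */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define LWP_STACK_SIZE (64 * 1024)   /* arbitrary stack size for this sketch */

/* Written by the child and read by the parent; volatile forces the
 * parent to re-read the value from memory. */
static volatile int shared_counter;

/* Entry point of the lightweight process: add *arg to the shared counter. */
static int lwp_body(void *arg)
{
    shared_counter += *(int *)arg;
    return 0;                        /* becomes the child's exit status */
}

/*
 * Hypothetical helper: create one lightweight process sharing our address
 * space, wait for it, and return the counter value the parent observes.
 * Returns -1 on failure (so the sketch assumes a nonnegative increment).
 */
int run_lightweight_process(int increment)
{
    char *stack = malloc(LWP_STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_counter = 0;

    /* The stack grows downward on x86, so pass the top of the buffer.
     * SIGCHLD lets the parent wait for the child with waitpid(). */
    pid_t pid = clone(lwp_body, stack + LWP_STACK_SIZE,
                      CLONE_VM | SIGCHLD, &increment);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);                     /* safe: the child has terminated */
    if (waited == -1)
        return -1;

    assert(WIFEXITED(status));       /* the child is expected to exit normally */
    return shared_counter;
}
```

If CLONE_VM were omitted, the child would work on its own copy of the address space, as after fork( ), and the parent would still read 0 from shared_counter. That difference is what distinguishes a lightweight process from an ordinary one in this sketch.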
Commercial Unix kernels often introduce new features to gain a larger slice of the market, but these features are not necessarily useful, stable, or productive. As a matter of fact, modern Unix kernels tend to be quite bloated. By contrast, Linux, together with the other open source operating systems, doesn't suffer from the restrictions and the conditioning imposed by the market, hence it can freely evolve according to the ideas of its designers (mainly Linus Torvalds). Specifically, Linux offers the following advantages over its commercial competitors: