Mauerer flast.tex V2 - 09/05/2008 12:08pm Page xxix

Introduction

What This Book Covers

This book discusses the concepts, structure, and implementation of the Linux kernel. In particular, the individual chapters cover the following topics:

❑ Chapter 1 provides an overview of the Linux kernel and describes the big picture that is investigated more closely in the following chapters.
❑ Chapter 2 talks about the basics of multitasking, scheduling, and process management, and investigates how these fundamental techniques and abstractions are implemented.
❑ Chapter 3 discusses how physical memory is managed. Both the interaction with hardware and the in-kernel distribution of RAM via the buddy system and the slab allocator are covered.
❑ Chapter 4 proceeds to describe how userland processes experience virtual memory, and the comprehensive data structures and actions required from the kernel to implement this view.
❑ Chapter 5 introduces the mechanisms required to ensure proper operation of the kernel on multiprocessor systems. Additionally, it covers the related question of how processes can communicate with each other.
❑ Chapter 6 walks you through the means for writing device drivers that are required to add support for new hardware to the kernel.
❑ Chapter 7 explains how modules allow for dynamically adding new functionality to the kernel.
❑ Chapter 8 discusses the virtual filesystem, a generic layer of the kernel that allows for supporting a wide range of different filesystems, both physical and virtual.
❑ Chapter 9 describes the extended filesystem family, that is, the Ext2 and Ext3 filesystems that are the standard workhorses of many Linux installations.
❑ Chapter 10 goes on to discuss procfs and sysfs, two filesystems that are not designed to store information but to present meta-information about the kernel to userland. Additionally, a number of means to ease writing filesystems are presented.
❑ Chapter 11 shows how extended attributes and access control lists, which can help to improve system security, are implemented.
❑ Chapter 12 discusses the networking implementation of the kernel, with a specific focus on IPv4, TCP, UDP, and netfilter.
❑ Chapter 13 introduces how system calls, the standard way to request a kernel action from userland, are implemented.
❑ Chapter 14 analyzes how kernel activities are triggered with interrupts, and presents means of deferring work to a later point in time.
❑ Chapter 15 shows how the kernel handles all time-related requirements, at both low and high resolution.
❑ Chapter 16 talks about speeding up kernel operations with the help of the page and buffer caches.
❑ Chapter 17 discusses how cached data in memory are synchronized with their sources on persistent storage devices.
❑ Chapter 18 introduces how page reclaim and swapping work.
❑ Chapter 19 gives an introduction to the audit implementation, which allows for observing in detail what the kernel is doing.
❑ Appendix A discusses peculiarities of various architectures supported by the kernel.
❑ Appendix B walks through various tools and means of working efficiently with the kernel sources.
❑ Appendix C provides some technical notes about the programming language C, and also discusses how the GNU C compiler is structured.
❑ Appendix D describes how the kernel is booted.
❑ Appendix E gives an introduction to the ELF binary format.
❑ Appendix F discusses numerous social aspects of kernel development and the Linux kernel community.
Mauerer runc01.tex V2 - 09/04/2008 4:13pm Page 1

Introduction and Overview

Operating systems are not only regarded as a fascinating part of information technology, but are also the subject of controversial discussion among the wider public.¹ Linux has played a major role in this development. Whereas just 10 years ago a strict distinction was made between relatively simple academic systems available in source code and commercial variants with varying performance capabilities whose sources were a well-guarded secret, nowadays anybody can download the sources of Linux (or of any other free system) from the Internet in order to study them.

¹ It is not the intention of this book to participate in ideological discussions such as whether Linux can be regarded as a full operating system, although it is, in fact, just a kernel that cannot function productively without relying on other components. When I speak of Linux as an operating system without explicitly mentioning the acronyms of similar projects (primarily the GNU project, which, despite strong initial resistance regarding the kernel, reacts extremely sensitively when Linux is used instead of GNU/Linux), this should not be taken to mean that I do not appreciate the importance of the work done by this project. Our reasons are simple and pragmatic: where do we draw the line when citing those involved without generating such lengthy constructs as GNU/IBM/RedHat/HP/KDE/Linux? If this footnote makes little sense, refer to www.gnu.org/gnu/linux-and-gnu.html, where you will find a summary of the positions of the GNU project. After all ideological questions have been settled, I promise to refrain from using half-page footnotes in the rest of this book.

Linux is now installed on millions of systems and is used by home users and professionals alike for a wide range of tasks. From miniature embedded systems in wristwatches to massively parallel mainframes, there are countless ways of exploiting Linux productively. And this makes the sources so interesting. A sound, well-established concept (Unix) melded with powerful innovations and a strong penchant for dealing with problems that do not arise in academic teaching systems: this is what makes Linux so fascinating.

This book describes the central functions of the kernel, explains its underlying structures, and examines its implementation. Because complex subjects are discussed, I assume that the reader already has some experience in operating systems and systems programming in C (it goes without saying that I assume some familiarity with using Linux systems). I touch briefly on several general concepts relevant to common operating system problems, but my prime focus is on the implementation of the Linux kernel. Readers unfamiliar with a particular topic will find explanations of the relevant basics in one of the many general texts on operating systems; for example, in Tanenbaum’s outstanding
introductions ([TW06] and [Tan07]). A solid foundation of C programming is required. Because the kernel makes use of many advanced techniques of C and, above all, of many special features of the GNU C compiler, Appendix C discusses the finer points of C with which even good programmers may not be familiar.

A basic knowledge of computer structures will be useful, as Linux necessarily interacts very directly with system hardware, particularly with the CPU. There are also a large number of introductory works dealing with this subject; some are listed in the reference section. When I deal with CPUs in greater depth (in most cases I take the IA-32 or AMD64 architecture as an example because Linux is used predominantly on these system architectures), I explain the relevant hardware details. When I discuss mechanisms that are not ubiquitous in daily life, I explain the general concept behind them, but expect readers to also consult the cited manual pages for more advice on how a particular feature is used from userspace.

The present chapter is designed to provide an overview of the various areas of the kernel and to illustrate their fundamental relationships before moving on to lengthier descriptions of the subsystems in the following chapters.

Since the kernel evolves quickly, one question that naturally comes to mind is which version is covered in this book. I have chosen kernel 2.6.24, which was released at the end of January 2008. The dynamic nature of kernel development implies that a new kernel version will be available by the time you read this, and naturally, some details will have changed; this is unavoidable. If it were not the case, Linux would be a dead and boring system, and chances are that you would not want to read the book. While some of the details will have changed, the concepts will not have varied essentially.
This is particularly true because 2.6.24 has seen some very fundamental changes as compared to earlier versions. Developers do not rip out such things overnight, naturally.

1.1 Tasks of the Kernel

On a purely technical level, the kernel is an intermediary layer between the hardware and the software. Its purpose is to pass application requests to the hardware and to act as a low-level driver to address the devices and components of the system. Nevertheless, there are other interesting ways of viewing the kernel.

❑ The kernel can be regarded as an enhanced machine that, in the view of the application, abstracts the computer on a high level. For example, when the kernel addresses a hard disk, it must decide which path to use to copy data from disk to memory, where the data reside, which commands must be sent to the disk via which path, and so on. Applications, on the other hand, need only issue the command that data are to be transferred. How this is done is irrelevant to the application; the details are abstracted by the kernel. Application programs have no contact with the hardware itself,² only with the kernel, which, for them, represents the lowest level in the hierarchy they know and is therefore an enhanced machine.
❑ Viewing the kernel as a resource manager is justified when several programs are run concurrently on a system. In this case, the kernel is an instance that shares available resources (CPU time, disk space, network connections, and so on) between the various system processes while at the same time ensuring system integrity.

² The CPU is an exception since it is obviously unavoidable that programs access it. Nevertheless, the full range of possible instructions is not available to applications.
❑ Another view of the kernel is as a library providing a range of system-oriented commands. As is generally known, system calls are used to send requests to the kernel; with the help of the C standard library, these appear to the application programs as normal functions that are invoked in the same way as any other function.

1.2 Implementation Strategies

Currently, there are two main paradigms on which the implementation of operating systems is based:

1. Microkernels: In these, only the most elementary functions are implemented directly in a central kernel, the microkernel. All other functions (for example, the various filesystems, memory management, and so on) are delegated to autonomous processes that communicate with the central kernel via clearly defined communication interfaces. (Of course, the most elementary level of memory management that controls communication with the system itself is in the microkernel. However, handling on the system call level is implemented in external servers.) Theoretically, this is a very elegant approach because the individual parts are clearly segregated from each other, and this forces programmers to use “clean” programming techniques. Other benefits of this approach are dynamic extensibility and the ability to swap important components at run time. However, owing to the additional CPU time needed to support complex communication between the components, microkernels have not really established themselves in practice, although they have been the subject of active and varied research for some time now.
2. Monolithic Kernels: They are the alternative, traditional concept. Here, the entire code of the kernel, including all its subsystems such as memory management, filesystems, or device drivers, is packed into a single file.
   Each function has access to all other parts of the kernel; this can result in elaborately nested source code if programming is not done with great care.

Because, at the moment, the performance of monolithic kernels is still greater than that of microkernels, Linux was and still is implemented according to this paradigm. However, one major innovation has been introduced. Modules with kernel code that can be inserted or removed while the system is up and running support the dynamic addition of a whole range of functions to the kernel, thus compensating for some of the disadvantages of monolithic kernels. This is assisted by elaborate means of communication between the kernel and userland that allow for implementing hotplugging and dynamic loading of modules.

1.3 Elements of the Kernel

This section provides a brief overview of the various elements of the kernel and outlines the areas we will examine in more detail in the following chapters. Despite its monolithic approach, Linux is surprisingly well structured. Nevertheless, it is inevitable that its individual elements interact with each other; they share data structures and (for performance reasons) cooperate with each other via more functions than would be necessary in a strictly segregated system. In the following chapters, I am obliged to make frequent reference to the other elements of the kernel and therefore to other chapters, although I have tried to keep the number of forward references to a minimum. For this reason, I introduce the individual elements briefly here so that you can form an impression of their role and their place in the overall