Mauerer fcredit.tex V2 - 08/22/2008 4:53am Page vii

Credits

Executive Editor
Carol Long

Senior Development Editor
Tom Dinse

Production Editor
Debra Banninger

Copy Editors
Cate Caffrey
Kathryn Duggan

Editorial Manager
Mary Beth Wakefield

Production Manager
Tim Tate

Vice President and Executive Group Publisher
Richard Swadley

Vice President and Executive Publisher
Joseph B. Wikert

Project Coordinator, Cover
Lynsey Stanford

Proofreader
Publication Services, Inc.

Indexer
Jack Lewis
Acknowledgments

First and foremost, I have to thank the thousands of programmers who have created the Linux kernel over the years, most of them commercially based, but some also just for their own private or academic joy. Without them, there would be no kernel, and I would have had nothing to write about. Please accept my apologies that I cannot list all several hundred names here, but in true UNIX style, you can easily generate the list by:

    for file in $ALL_FILES_COVERED_IN_THIS_BOOK; do git log --pretty="format:%an" $file; done | sort -u -k 2,2

It goes without saying that I admire your work very much: you are all the true heroes in this story!

What you are reading right now is the result of an evolution over more than seven years: After two years of writing, the first edition was published in German by Carl Hanser Verlag in 2003. It then described kernel 2.6.0. The text was used as a basis for the low-level design documentation for the EAL4+ security evaluation of Red Hat Enterprise Linux 5, which required updating it to kernel 2.6.18 (if the EAL acronym does not mean anything to you, then Wikipedia is once more your friend). Hewlett-Packard sponsored the translation into English and has, thankfully, granted the rights to publish the result. Updates to kernel 2.6.24 were then performed specifically for this book.

Several people were involved in this evolution, and my appreciation goes to all of them: Leslie Mackay-Poulton, with support from David Jacobs, did a tremendous job of translating a huge pile of text into English. I’m also indebted to Sal La Pietra of atsec information security for pulling the strings to get the translation project rolling, and especially to Stephan Müller for close cooperation during the evaluation. My cordial thanks also go to all other HP and Red Hat people involved in this evaluation, and also to Claudio Kopper and Hans Löhr for our very enjoyable cooperation during this project. Many thanks also go to the people at Wiley, both visible and invisible to me, who helped to shape the book into its current form.

The German edition was well received by readers and reviewers, but nevertheless comments about inaccuracies and suggestions for improvements were provided. I’m glad for all of them, and would also like to mention the instructors who answered the publisher’s survey for the original edition. Some of their suggestions were very valuable for improving the current publication. The same goes for the referees for this edition, especially Dr. Xiaodong Zhang, who provided numerous suggestions for Appendix F.4.

Furthermore, I express my gratitude to Dr. Christine Silberhorn for granting me the opportunity to suspend my regular research work at the Max Planck Research Group for four weeks to work on this project. I hope you enjoyed the peace during this time when nobody was trying to install Linux on your MacBook!

As with every book, I owe my deepest gratitude to my family for supporting me in every aspect of life; I more than appreciate this indispensable aid.

Finally, I have to thank Hariet Fabritius for infinite patience with an author whose work cycle not only perfectly matched the most alarming forms of sleep dyssomnias, but who was always right on the brink of confusing his native tongue with “C,” and whom she consequently had to rescue from numerous situations where he seemingly had lost his mind (see below...). Now that I have more free time again, I’m not only looking forward to our well-deserved holiday, but can finally embark upon the project of giving your laptop all the joys of a proper operating system! (Writing these acknowledgments, I all of a sudden realize why people always hasten to lock away their laptops when they see me approaching....)
Contents

Introduction xxvii

Chapter 1: Introduction and Overview 1
  Tasks of the Kernel 2
  Implementation Strategies 3
  Elements of the Kernel 3
  Processes, Task Switching, and Scheduling 4
  Unix Processes 4
  Address Spaces and Privilege Levels 7
  Page Tables 11
  Allocation of Physical Memory 13
  Timing 16
  System Calls 17
  Device Drivers, Block and Character Devices 17
  Networks 18
  Filesystems 18
  Modules and Hotplugging 18
  Caching 20
  List Handling 20
  Object Management and Reference Counting 22
  Data Types 25
  ... and Beyond the Infinite 27
  Why the Kernel Is Special 28
  Some Notes on Presentation 29
  Summary 33

Chapter 2: Process Management and Scheduling 35
  Process Priorities 36
  Process Life Cycle 38
  Preemptive Multitasking 40
  Process Representation 41
  Process Types 47
  Namespaces 47
  Process Identification Numbers 54
  Task Relationships 62
  Process Management System Calls 63
  Process Duplication 63
  Kernel Threads 77
  Starting New Programs 79
  Exiting Processes 83
  Implementation of the Scheduler 83
  Overview 84
  Data Structures 86
  Dealing with Priorities 93
  Core Scheduler 99
  The Completely Fair Scheduling Class 106
  Data Structures 106
  CFS Operations 107
  Queue Manipulation 112
  Selecting the Next Task 113
  Handling the Periodic Tick 114
  Wake-up Preemption 115
  Handling New Tasks 116
  The Real-Time Scheduling Class 117
  Properties 118
  Data Structures 118
  Scheduler Operations 119
  Scheduler Enhancements 121
  SMP Scheduling 121
  Scheduling Domains and Control Groups 126
  Kernel Preemption and Low Latency Efforts 127
  Summary 132

Chapter 3: Memory Management 133
  Overview 133
  Organization in the (N)UMA Model 136
  Overview 136
  Data Structures 138
  Page Tables 153
  Data Structures 154
  Creating and Manipulating Entries 161
  Initialization of Memory Management 161
  Data Structure Setup 162
  Architecture-Specific Setup 169
  Memory Management during the Boot Process 191