Contents199ManagementofPhysical Memory199StructureoftheBuddySystem201AvoidingFragmentation209InitializingtheZoneandNodeData Structures215AllocatorAPI222ReservingPages240Freeing Pages244Allocationof DiscontiguousPages intheKernel251KernelMappings256TheSlabAllocator258AlternativeAllocators259MemoryManagementintheKernel261ThePrincipleofSlabAllocation265ImplementationGeneralCaches283285ProcessorCacheandTLBControl287Summary289Chapter4:VirtualProcessMemory289Introduction290VirtualProcessAddressSpace290LayoutoftheProcessAddressSpace294Creating the Layout297PrincipleofMemoryMappings298DataStructuresTrees and Lists299300RepresentationofRegions302ThePrioritySearchTree306OperationsonRegions306AssociatingVirtual AddresseswithaRegion308Merging Regions309InsertingRegions310Creating Regions312AddressSpaces314MemoryMappings314Creating Mappings317RemovingMappings319NonlinearMappings322ReverseMapping323DataStructures324CreatingaReverseMapping325UsingReverseMappingxili
Mauerer ftoc.tex V4 - 09/03/2008 11:13pm Page xiii Contents Management of Physical Memory 199 Structure of the Buddy System 199 Avoiding Fragmentation 201 Initializing the Zone and Node Data Structures 209 Allocator API 215 Reserving Pages 222 Freeing Pages 240 Allocation of Discontiguous Pages in the Kernel 244 Kernel Mappings 251 The Slab Allocator 256 Alternative Allocators 258 Memory Management in the Kernel 259 The Principle of Slab Allocation 261 Implementation 265 General Caches 283 Processor Cache and TLB Control 285 Summary 287 Chapter 4: Virtual Process Memory 289 Introduction 289 Virtual Process Address Space 290 Layout of the Process Address Space 290 Creating the Layout 294 Principle of Memory Mappings 297 Data Structures 298 Trees and Lists 299 Representation of Regions 300 The Priority Search Tree 302 Operations on Regions 306 Associating Virtual Addresses with a Region 306 Merging Regions 308 Inserting Regions 309 Creating Regions 310 Address Spaces 312 Memory Mappings 314 Creating Mappings 314 Removing Mappings 317 Nonlinear Mappings 319 Reverse Mapping 322 Data Structures 323 Creating a Reverse Mapping 324 Using Reverse Mapping 325 xiii
Contents327Managingthe Heap330HandlingofPageFaults336CorrectionofUserspacePageFaults337DemandAllocation/Paging339AnonymousPages340Copyon Write341Getting Nonlinear Mappings341Kernel PageFaults344CopyingDatabetweenKernelandUserspace345Summary347Chapter5:Locking_andInterprocess Communication348Control Mechanisms348RaceConditions349Critical Sections351Kernel LockingMechanisms352AtomicOperationsonIntegers354Spinlocks355Semaphores357TheRead-Copy-Update Mechanism359MemoryandOptimizationBarriers361Reader/WriterLocks361The BigKernel Lock362Mutexes364ApproximatePer-CPUCounters365LockContentionandFine-GrainedLocking366SystemVInterprocessCommunication366SystemVMechanisms367Semaphores376MessageQueues380Shared Memory381OtherIPCMechanisms381SignalsPipes and Sockets389390Summary391Chapter6:DeviceDrivers3911/Architecture392Expansion Hardware397AccesstoDevices397DeviceFiles397Character,Block,and OtherDevicesxiv
Mauerer ftoc.tex V4 - 09/03/2008 11:13pm Page xiv Contents Managing the Heap 327 Handling of Page Faults 330 Correction of Userspace Page Faults 336 Demand Allocation/Paging 337 Anonymous Pages 339 Copy on Write 340 Getting Nonlinear Mappings 341 Kernel Page Faults 341 Copying Data between Kernel and Userspace 344 Summary 345 Chapter 5: Locking and Interprocess Communication 347 Control Mechanisms 348 Race Conditions 348 Critical Sections 349 Kernel Locking Mechanisms 351 Atomic Operations on Integers 352 Spinlocks 354 Semaphores 355 The Read-Copy-Update Mechanism 357 Memory and Optimization Barriers 359 Reader/Writer Locks 361 The Big Kernel Lock 361 Mutexes 362 Approximate Per-CPU Counters 364 Lock Contention and Fine-Grained Locking 365 System V Interprocess Communication 366 System V Mechanisms 366 Semaphores 367 Message Queues 376 Shared Memory 380 Other IPC Mechanisms 381 Signals 381 Pipes and Sockets 389 Summary 390 Chapter 6: Device Drivers 391 I/O Architecture 391 Expansion Hardware 392 Access to Devices 397 Device Files 397 Character, Block, and Other Devices 397 xiv
Contents400DeviceAddressing Using loctls401Representationof MajorandMinorNumbers403Registration406AssociationwiththeFilesystem406DeviceFileElements inInodes407StandardFileOperations407StandardOperationsforCharacterDevices408StandardOperationsforBlockDevices409CharacterDeviceOperations409RepresentingCharacterDevices409OpeningDeviceFiles412Reading and Writing412BlockDeviceOperations413RepresentationofBlockDevices415Data Structures423AddingDisksandPartitionstotheSystem425OpeningBlockDeviceFiles427RequestStructureBIOs430432SubmittingRequests4381/0 Scheduling441Implementationofloctls442ResourceReservation442ResourceManagement1/0 Memory4451/0 Ports446448BusSystems449TheGenericDriverModel454ThePCIBusUSB463471Summary473Chapter Z:Modules473Overview474UsingModules474Adding and Removing477DependenciesQueryingModuleInformation478480Automatic Loading483Inserting and Deleting Modules483ModuleRepresentation488DependenciesandReferencesXV
Mauerer ftoc.tex V4 - 09/03/2008 11:13pm Page xv Contents Device Addressing Using Ioctls 400 Representation of Major and Minor Numbers 401 Registration 403 Association with the Filesystem 406 Device File Elements in Inodes 406 Standard File Operations 407 Standard Operations for Character Devices 407 Standard Operations for Block Devices 408 Character Device Operations 409 Representing Character Devices 409 Opening Device Files 409 Reading and Writing 412 Block Device Operations 412 Representation of Block Devices 413 Data Structures 415 Adding Disks and Partitions to the System 423 Opening Block Device Files 425 Request Structure 427 BIOs 430 Submitting Requests 432 I/O Scheduling 438 Implementation of Ioctls 441 Resource Reservation 442 Resource Management 442 I/O Memory 445 I/O Ports 446 Bus Systems 448 The Generic Driver Model 449 The PCI Bus 454 USB 463 Summary 471 Chapter 7: Modules 473 Overview 473 Using Modules 474 Adding and Removing 474 Dependencies 477 Querying Module Information 478 Automatic Loading 480 Inserting and Deleting Modules 483 Module Representation 483 Dependencies and References 488 xv
Contents491BinaryStructureofModules496Inserting Modules505RemovingModules506Automation andHotplugging507AutomaticLoadingwithkmod508Hotplugging511VersionControl512ChecksumMethods515VersionControlFunctions517Summary519Chapter8:TheVirtualFilesystem520FilesystemTypes521TheCommonFileModelInodes522Links522523Programming Interface524Files as aUniversal Interface525StructureoftheVFS525StructuralOverview527inodes532Process-SpecificInformation537File Operations542DirectoryEntryCache547WorkingwithVFSObjects548FilesystemOperations565FileOperationsStandard Functions572573Generic Read Routine576ThefaultMechanism578Permission-Checking581Summary583Chapter9:TheExtendedFilesystemFamily583Introduction584SecondExtendedFilesystem585Physical Structure592Data Structuresxvi
Mauerer ftoc.tex V4 - 09/03/2008 11:13pm Page xvi Contents Binary Structure of Modules 491 Inserting Modules 496 Removing Modules 505 Automation and Hotplugging 506 Automatic Loading with kmod 507 Hotplugging 508 Version Control 511 Checksum Methods 512 Version Control Functions 515 Summary 517 Chapter 8: The Virtual Filesystem 519 Filesystem Types 520 The Common File Model 521 Inodes 522 Links 522 Programming Interface 523 Files as a Universal Interface 524 Structure of the VFS 525 Structural Overview 525 Inodes 527 Process-Specific Information 532 File Operations 537 Directory Entry Cache 542 Working with VFS Objects 547 Filesystem Operations 548 File Operations 565 Standard Functions 572 Generic Read Routine 573 The fault Mechanism 576 Permission-Checking 578 Summary 581 Chapter 9: The Extended Filesystem Family 583 Introduction 583 Second Extended Filesystem 584 Physical Structure 585 Data Structures 592 xvi
Contents608CreatingaFilesystem610Filesystem Actions637ThirdExtendedFilesystem638ConceptsData Structures639642Summary643Chapter10:FilesystemswithoutPersistentStorage644TheprocFilesystem644Contentsof/proc652Data Structures655Initialization657MountingtheFilesystem660Managing /proc Entries664ReadingandWritingInformation666Task-RelatedInformation671SystemControlMechanism680SimpleFilesystems680SequentialFiles684WritingFilesystemswithLibfs687TheDebugFilesystem689PseudoFilesystemsSysfs689Overview690690DataStructures695MountingtheFilesystem697FileandDirectoryOperations704PopulatingSysfs706Summary707Chapter11:ExtendedAttributesandAccessControlLists707ExtendedAttributes708InterfacetotheVirtualFilesystem714ImplementationinExt3721ImplementationinExt2722AccessControl Lists722GenericImplementation726ImplementationinExt3732ImplementationinExt2732Summaryxvii
Mauerer ftoc.tex V4 - 09/03/2008 11:13pm Page xvii Contents Creating a Filesystem 608 Filesystem Actions 610 Third Extended Filesystem 637 Concepts 638 Data Structures 639 Summary 642 Chapter 10: Filesystems without Persistent Storage 643 The proc Filesystem 644 Contents of /proc 644 Data Structures 652 Initialization 655 Mounting the Filesystem 657 Managing /proc Entries 660 Reading and Writing Information 664 Task-Related Information 666 System Control Mechanism 671 Simple Filesystems 680 Sequential Files 680 Writing Filesystems with Libfs 684 The Debug Filesystem 687 Pseudo Filesystems 689 Sysfs 689 Overview 690 Data Structures 690 Mounting the Filesystem 695 File and Directory Operations 697 Populating Sysfs 704 Summary 706 Chapter 11: Extended Attributes and Access Control Lists 707 Extended Attributes 707 Interface to the Virtual Filesystem 708 Implementation in Ext3 714 Implementation in Ext2 721 Access Control Lists 722 Generic Implementation 722 Implementation in Ext3 726 Implementation in Ext2 732 Summary 732 xvii