Linux Kernel Internals
Table of Contents

1. Booting
   1.1 Building the Linux Kernel Image
   1.2 Booting: Overview
   1.3 Booting: BIOS POST
   1.4 Booting: bootsector and setup
   1.5 Using LILO as a bootloader
   1.6 High level initialisation
   1.7 SMP Bootup on x86
   1.8 Freeing initialisation data and code
   1.9 Processing kernel command line
2. Process and Interrupt Management
   2.1 Task Structure and Process Table
   2.2 Creation and termination of tasks and kernel threads
   2.3 Linux Scheduler
   2.4 Linux linked list implementation
   2.5 Wait Queues
   2.6 Kernel Timers
   2.7 Bottom Halves
   2.8 Task Queues
   2.9 Tasklets
   2.10 Softirqs
   2.11 How System Calls Are Implemented on i386 Architecture?
   2.12 Atomic Operations
   2.13 Spinlocks, Read-write Spinlocks and Big-Reader Spinlocks
   2.14 Semaphores and read/write Semaphores
   2.15 Kernel Support for Loading Modules
3. Virtual Filesystem (VFS)
   3.1 Inode Caches and Interaction with Dcache
   3.2 Filesystem Registration/Unregistration
   3.3 File Descriptor Management
   3.4 File Structure Management
   3.5 Superblock and Mountpoint Management
   3.6 Example Virtual Filesystem: pipefs
   3.7 Example Disk Filesystem: BFS
   3.8 Execution Domains and Binary Formats
Tigran Aivazian tigran@veritas.com
22 August 2000

Introduction to the Linux 2.4 kernel. The latest copy of this document can always be downloaded from: http://www.moses.uklinux.net/patches/lki.sgml

This documentation is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

The author is working as a senior Linux kernel engineer at VERITAS Software Ltd and wrote this book to support the short training course/lectures he gave on this subject internally at VERITAS.

1. Booting
• 1.1 Building the Linux Kernel Image
• 1.2 Booting: Overview
• 1.3 Booting: BIOS POST
• 1.4 Booting: bootsector and setup
• 1.5 Using LILO as a bootloader
• 1.6 High level initialisation
• 1.7 SMP Bootup on x86
• 1.8 Freeing initialisation data and code
• 1.9 Processing kernel command line
2. Process and Interrupt Management
• 2.1 Task Structure and Process Table
• 2.2 Creation and termination of tasks and kernel threads
• 2.3 Linux Scheduler
• 2.4 Linux linked list implementation
• 2.5 Wait Queues
• 2.6 Kernel Timers
• 2.7 Bottom Halves
• 2.8 Task Queues
• 2.9 Tasklets
• 2.10 Softirqs
• 2.11 How System Calls Are Implemented on i386 Architecture?
• 2.12 Atomic Operations
• 2.13 Spinlocks, Read-write Spinlocks and Big-Reader Spinlocks
• 2.14 Semaphores and read/write Semaphores
• 2.15 Kernel Support for Loading Modules
3. Virtual Filesystem (VFS)
• 3.1 Inode Caches and Interaction with Dcache
• 3.2 Filesystem Registration/Unregistration
• 3.3 File Descriptor Management
• 3.4 File Structure Management
• 3.5 Superblock and Mountpoint Management
• 3.6 Example Virtual Filesystem: pipefs
• 3.7 Example Disk Filesystem: BFS
• 3.8 Execution Domains and Binary Formats

1. Booting

1.1 Building the Linux Kernel Image

This section explains the steps taken during compilation of the Linux kernel and the output produced at each stage. The build process depends on the architecture, so I would like to emphasize that we only consider building a Linux/x86 kernel.

When the user types 'make zImage' or 'make bzImage', the resulting bootable kernel image is stored as arch/i386/boot/zImage or arch/i386/boot/bzImage respectively. Here is how the image is built:

1. C and assembly source files are compiled into ELF relocatable object format (.o), and some of them are grouped logically into archives (.a) using ar(1).
2. Using ld(1), the above .o and .a files are linked into 'vmlinux', which is a statically linked, non-stripped ELF 32-bit LSB 80386 executable file.
3. System.map is produced by 'nm vmlinux'; irrelevant or uninteresting symbols are grepped out.
4. Enter directory arch/i386/boot.
5. Bootsector asm code bootsect.S is preprocessed either with or without -D__BIG_KERNEL__, depending on whether the target is bzImage or zImage, into bbootsect.s or bootsect.s respectively.
6. bbootsect.s is assembled and then converted into 'raw binary' form called bbootsect (or bootsect.s is assembled and raw-converted into bootsect for zImage).
7. Setup code setup.S (setup.S includes video.S) is preprocessed into bsetup.s for bzImage or setup.s for zImage. As with the bootsector code, the difference is marked by -D__BIG_KERNEL__ being present for bzImage. The result is then converted into 'raw binary' form called bsetup.
8. Enter directory arch/i386/boot/compressed and convert /usr/src/linux/vmlinux to $tmppiggy (tmp filename) in raw binary format, removing the .note and .comment ELF sections.
9. gzip -9 < $tmppiggy > $tmppiggy.gz
10. Link $tmppiggy.gz into ELF relocatable (ld -r) piggy.o.
11. Compile the compression routines head.S and misc.c (still in the arch/i386/boot/compressed directory) into ELF objects head.o and misc.o.
12. Link together head.o, misc.o and piggy.o into bvmlinux (or vmlinux for zImage; don't mistake this for /usr/src/linux/vmlinux!). Note the difference between -Ttext 0x1000 used for vmlinux and -Ttext 0x100000 for bvmlinux, i.e. for bzImage the compression loader is high-loaded.
13. Convert bvmlinux to 'raw binary' bvmlinux.out, removing the .note and .comment ELF sections.
14. Go back to the arch/i386/boot directory and, using the program tools/build, cat together bbootsect + bsetup + compressed/bvmlinux.out into bzImage (delete the extra 'b' above for zImage). This writes important variables like setup_sects and root_dev at the end of the bootsector.

The size of the bootsector is always 512 bytes. The size of setup must be greater than 4 sectors but is limited above by about 12K; the rule is:

    0x4000 bytes >= 512 + setup_sects * 512 + room for stack while running bootsector/setup

We will see later where this limitation comes from.

The upper limit on the bzImage size produced at this step is about 2.5M for booting with LILO and 0xFFFF paragraphs (0xFFFF0 = 1048560 bytes) for booting a raw image, e.g. from floppy disk or CD-ROM (El-Torito emulation mode).

Note that tools/build validates the size of the bootsector, the size of the kernel image and the lower bound on the size of setup, but not the upper bound of setup, so it is easy to build a broken kernel by adding some large ".space" at the end of setup.S.

1.2 Booting: Overview

The boot process details are architecture-specific, so we shall focus our attention on the IBM PC/IA32 architecture. Due to old design and backward compatibility, the PC firmware boots the operating system in an old-fashioned manner. This process can be separated into the following six logical stages:

1. BIOS selects the boot device
2. BIOS loads the bootsector from the boot device
3. Bootsector loads setup, decompression routines and compressed kernel image
4. The kernel is uncompressed in protected mode
5. Low-level initialisation is performed by asm code
6. High-level C initialisation

1.3 Booting: BIOS POST

1. The power supply starts the clock generator and asserts the #POWERGOOD signal on the bus
2. CPU #RESET line is asserted (CPU now in real 8086 mode)
3. %ds=%es=%fs=%gs=%ss=0, %cs:%eip = 0xFFFF:0000 (ROM BIOS POST code)
4. All the checks are performed by POST with interrupts disabled
5. IVT (interrupt vector table) initialised at address 0