partition. The final stage of the bootloader loads the compressed kernel image and passes control to it. The kernel uncompresses itself and turns on the ignition.

Figure 2.1. Linux boot sequence on x86-based hardware.
[Figure: Power On -> BIOS -> Bootloader (GRUB/LILO/...) (x86 real mode) -> Real Mode Kernel -> arch/x86/boot/pm.c -> Protected Mode Kernel (x86 protected mode) -> The init Process -> User Processes and Daemons]

x86-based processors have two modes of operation, real mode and protected mode. In real mode, you can access only the first 1MB of memory, and even that without any protection. Protected mode is more sophisticated and lets you tap into many advanced features of the processor, such as paging. The CPU has to pass through real mode en route to protected mode. As far as the boot sequence is concerned, this road is a one-way street: once the kernel enters protected mode, it does not switch back to real mode.

The first-level kernel initializations are done in real mode assembly. Subsequent startup is performed in protected mode by the function start_kernel() defined in init/main.c, the source file you modified in the previous chapter. start_kernel() begins by initializing the CPU subsystem. Memory and process management are put in place soon after. Peripheral buses and I/O devices are started next. As the last step in the boot sequence, the init program, the parent of all Linux processes, is invoked. Init executes user-space scripts that start necessary kernel services. Finally, it spawns terminals on consoles and displays the login prompt.

Each following section header is a message from Figure 2.2 generated during boot progression on an x86-based
laptop. The semantics and the messages may change if you are booting the kernel on other architectures. If some explanations in this section sound rather cryptic, don't worry; the intent here is only to give you a picture from 100 feet above and to let you savor a first taste of the kernel's flavor. Many concepts mentioned here in passing are covered in depth later on.

Figure 2.2. Kernel boot messages.

    Linux version 2.6.23.1y (root@localhost.localdomain) (gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)) #7 SMP PREEMPT Thu Nov 1 11:39:30 IST 2007
    BIOS-provided physical RAM map:
     BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
     BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
    ...
    758MB LOWMEM available.
    ...
    Kernel command line: ro root=/dev/hda1
    ...
    Console: colour VGA+ 80x25
    ...
    Calibrating delay using timer specific routine.. 1197.46 BogoMIPS (lpj=2394935)
    ...
    CPU: L1 I cache: 32K, L1 D cache: 32K
    CPU: L2 cache: 1024K
    ...
    Checking 'hlt' instruction... OK.
    ...
    Setting up standard PCI resources
    ...
    NET: Registered protocol family 2
    IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
    TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
    ...
    checking if image is initramfs... it is
    Freeing initrd memory: 387k freed
    ...
    io scheduler noop registered
    io scheduler anticipatory registered (default)
    ...
    00:0a: ttyS0 at I/O 0x3f8 (irq = 4) is a NS16550A
    ...
    Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
    ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
    ICH4: IDE controller at PCI slot 0000:00:1f.1
    Probing IDE interface ide0...
    hda: HTS541010G9AT00, ATA DISK drive
    hdc: HL-DT-STCD-RW/DVD DRIVE GCC-4241N, ATAPI CD/DVD-ROM drive
    ...
    serio: i8042 KBD port at 0x60,0x64 irq 1
    mice: PS/2 mouse device common for all mice
    ...
    Synaptics Touchpad, model: 1, fw: 5.9, id: 0x2c6ab1, caps: 0x884793/0x0
    ...
    agpgart: Detected an Intel 855GM Chipset.
    ...
    Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
    ...
    ehci_hcd 0000:00:1d.7: EHCI Host Controller
    Yenta: CardBus bridge found at 0000:02:00.0 [1014:0560]
    ...
    Non-volatile memory driver v1.2
    ...
    kjournald starting. Commit interval 5 seconds
    EXT3 FS on hda2, internal journal
    EXT3-fs: mounted filesystem with ordered data mode.
    ...
    INIT: version 2.85 booting
    ...

BIOS-Provided Physical RAM Map

The kernel assembles the system memory map from the BIOS, and this is one of the first boot messages you will see:

    BIOS-provided physical RAM map:
     BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
    ...
     BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)

Real mode initialization code uses the BIOS int 0x15 service with function number 0xe820 (hence the string BIOS-e820 in the preceding message) to obtain the system memory map. The memory map indicates reserved and usable memory ranges, and the kernel subsequently uses it to create its free memory pool. We discuss the BIOS-supplied memory map further in the section "Real Mode Calls" in Appendix B, "Linux and the BIOS."

758MB LOWMEM Available

The normally addressable kernel memory region (< 896MB) is called low memory. The kernel memory allocator, kmalloc(), returns memory from this region. Memory beyond 896MB (called high memory) can be accessed only using special mappings.

During boot, the kernel calculates and displays the total pages present in these memory zones. We take a deeper look at memory zones later in this chapter.

Kernel Command Line: ro root=/dev/hda1

Linux bootloaders usually pass a command line to the kernel. Arguments in the command line are similar to the argv[] list passed to the main() function in C programs, except that they are passed to the kernel instead. You may add command-line arguments to the bootloader configuration file or supply them from the bootloader prompt at runtime.[1] If you are using the GRUB bootloader, the configuration file is either /boot/grub/grub.conf or /boot/grub/menu.lst, depending on your distribution. If you are a LILO user, the configuration file is /etc/lilo.conf.
An example grub.conf file (with comments added) is listed here. You can figure out the genesis of the preceding boot message if you look at the line following title kernel 2.6.23:

    default 0    #Boot the 2.6.23 kernel by default
    timeout 5    #5 seconds to alter boot order or parameters

    title kernel 2.6.23    #Boot Option 1
      #The boot image resides in the first partition of the first disk
      #under the /boot/ directory and is named vmlinuz-2.6.23. 'ro'
      #indicates that the root partition should be mounted read-only.
      kernel (hd0,0)/boot/vmlinuz-2.6.23 ro root=/dev/hda1

      #Look under section "Freeing initrd memory:387k freed"
      initrd (hd0,0)/boot/initrd
    #...

[1] Bootloaders on embedded devices are usually "slim" and do not support configuration files or equivalent mechanisms. Because of this, many non-x86 architectures support a kernel configuration option called CONFIG_CMDLINE that you can use to supply the kernel command line at build time.

Command-line arguments affect the code path traversed during boot. As a simple example, assume that the command-line argument of interest is called bootmode. If this parameter is set to 1, you would like to print some debug messages during boot and switch to a runlevel of 3 at the end of the boot. (Wait until the boot messages are printed out by the init process to learn the semantics of runlevels.) If bootmode is instead set to 0, you would prefer the boot to be relatively laconic, and the runlevel set to 2. Because you are already familiar with init/main.c, let's add the following modification to it:

    static unsigned int bootmode = 1;

    static int __init is_bootmode_setup(char *str)
    {
      get_option(&str, &bootmode);
      return 1;
    }

    /* Handle parameter "bootmode=" */
    __setup("bootmode=", is_bootmode_setup);

    if (bootmode) {
      /* Print verbose output */
      /* ... */
    }

    /* ... */

    /* If bootmode is 1, choose an init runlevel of 3, else
       switch to a run level of 2 */
    if (bootmode) {
      argv_init[++args] = "3";
    } else {
      argv_init[++args] = "2";
    }

    /* ... */

Rebuild the kernel as you did earlier and try out the change. We discuss more about kernel command-line arguments in the section "Memory Layout" in Chapter 18, "Embedding Linux."
Calibrating Delay...1197.46 BogoMIPS (lpj=2394935)

During boot, the kernel calculates the number of times the processor can execute an internal delay loop in one jiffy, which is the time interval between two consecutive ticks of the system timer. As you would expect, the calculation has to be calibrated to the processing speed of your CPU. The result of this calibration is stored in a kernel variable called loops_per_jiffy. One place where the kernel makes use of loops_per_jiffy is when a device driver needs to delay execution for small durations on the order of microseconds.

To understand the delay-loop calibration code, let's take a peek inside calibrate_delay(), defined in init/calibrate.c. This function cleverly derives floating-point precision using only the kernel's integer arithmetic. The following snippet (with some comments added) shows the initial portion of the function that carves out a coarse value for loops_per_jiffy:

    loops_per_jiffy = (1 << 12); /* Initial approximation = 4096 */
    printk(KERN_DEBUG "Calibrating delay loop... ");
    while ((loops_per_jiffy <<= 1) != 0) {
      ticks = jiffies;  /* As you will find out in the section, "Kernel
                           Timers," the jiffies variable contains the
                           number of timer ticks since the kernel
                           started, and is incremented in the timer
                           interrupt handler */

      while (ticks == jiffies); /* Wait until the start of the next jiffy */
      ticks = jiffies;
      /* Delay */
      __delay(loops_per_jiffy);
      /* Did the wait outlast the current jiffy? Continue if it didn't */
      ticks = jiffies - ticks;
      if (ticks) break;
    }

    loops_per_jiffy >>= 1; /* This fixes the most significant bit and is
                              the lower-bound of loops_per_jiffy */

The preceding code begins by assuming that loops_per_jiffy is greater than 4096, which translates to a processor speed of roughly one million instructions per second (MIPS). It then waits for a fresh jiffy to start and executes the delay loop, __delay(loops_per_jiffy). If the delay loop outlasts the jiffy, the previous value of loops_per_jiffy (obtained by bitwise right-shifting it by one) fixes its most significant bit (MSB). Otherwise, the function left-shifts loops_per_jiffy by one bit and repeats the check to see whether the new value yields the MSB. When the kernel thus figures out the MSB of loops_per_jiffy, it works on the lower-order bits and fine-tunes its precision as follows:

    loopbit = loops_per_jiffy;

    /* Gradually work on the lower-order bits */
    while (lps_precision-- && (loopbit >>= 1)) {
      loops_per_jiffy |= loopbit;
      ticks = jiffies;
      while (ticks == jiffies); /* Wait until the start of the next jiffy */
      ticks = jiffies;