This is the Title of the Book, eMatter Edition Copyright © 2005 O’Reilly & Associates, Inc. All rights reserved. 18 | Chapter 2: Building and Running Modules
varies between Linux distributions). The mechanism used to deliver kernel messages is described in Chapter 4.
As you can see, writing a module is not as difficult as you might expect, at least as long as the module is not required to do anything worthwhile. The hard part is understanding your device and how to maximize performance. We go deeper into modularization throughout this chapter and leave device-specific issues for later chapters.
Kernel Modules Versus Applications
Before we go further, it’s worth underlining the various differences between a kernel module and an application.
While most small and medium-sized applications perform a single task from beginning to end, every kernel module just registers itself in order to serve future requests, and its initialization function terminates immediately. In other words, the task of the module’s initialization function is to prepare for later invocation of the module’s functions; it’s as though the module were saying, “Here I am, and this is what I can do.” The module’s exit function (hello_exit in the example) gets invoked just before the module is unloaded. It should tell the kernel, “I’m not there anymore; don’t ask me to do anything else.” This kind of approach to programming is similar to event-driven programming, but while not all applications are event-driven, each and every kernel module is. Another major difference between event-driven applications and kernel code is in the exit function: whereas an application that terminates can be lazy in releasing resources or avoid cleanup altogether, the exit function of a module must carefully undo everything the init function built up, or the pieces will linger until the system is rebooted.
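Drivers commonly implement this init/exit symmetry with a goto-based unwind ladder: init acquires resources in order and, if a step fails, releases only what it already obtained, in reverse order; exit releases everything, also in reverse. The sketch below is a user-space simulation so that it can be compiled and tested outside the kernel; acquire(), release(), sketch_init(), sketch_exit(), and the fail_at_step test hook are hypothetical names standing in for real registration and allocation calls.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel registrations: every successful
 * acquire() must be paired with exactly one release().
 * live_resources counts what is still held. */
static int live_resources = 0;
static int fail_at_step = -1;   /* hypothetical hook: step whose acquire fails; -1 = none */

static bool acquire(int step)
{
    if (step == fail_at_step)
        return false;           /* simulate, e.g., an allocation failure */
    live_resources++;
    return true;
}

static void release(int step)
{
    (void)step;                 /* a real driver would free the resource for this step */
    live_resources--;
}

/* Init: acquire resources in order; on failure, undo only what succeeded,
 * in reverse order, by falling through the labels below. */
int sketch_init(void)
{
    if (!acquire(0))
        goto fail0;
    if (!acquire(1))
        goto fail1;
    if (!acquire(2))
        goto fail2;
    return 0;

fail2:
    release(1);
fail1:
    release(0);
fail0:
    return -1;                  /* a real module returns a negative errno */
}

/* Exit: undo everything init built up, in reverse order. */
void sketch_exit(void)
{
    release(2);
    release(1);
    release(0);
}
```

In a real module, the two functions would be registered with module_init() and module_exit(), and the failure path would return a negative error code such as -ENOMEM rather than -1.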
Incidentally, the ability to unload a module is one of the features of modularization that you’ll most appreciate, because it helps cut down development time; you can test successive versions of your new driver without going through the lengthy shutdown/reboot cycle each time.
As a programmer, you know that an application can call functions it doesn’t define: the linking stage resolves external references using the appropriate library of functions. printf is one of those callable functions and is defined in libc. A module, on the other hand, is linked only to the kernel, and the only functions it can call are the ones exported by the kernel; there are no libraries to link to. The printk function used in hello.c earlier, for example, is the version of printf defined within the kernel and exported to modules. It behaves similarly to the original function, with a few minor differences, the main one being lack of floating-point support.
Figure 2-1 shows how function calls and function pointers are used in a module to add new functionality to a running kernel.
,ch02.6536 Page 18 Friday, January 21, 2005 10:32 AM
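Because printk has no floating-point conversions, a common workaround is to keep quantities as scaled integers and print the integer and fractional parts separately. Here is a minimal sketch, assuming the value is stored in thousandths. format_milli() is a hypothetical helper, written against the C library's snprintf so it can be tested in user space; the kernel provides its own snprintf with the same integer conversions.

```c
#include <stdio.h>

/* Format a value stored in thousandths (e.g. 3141 means 3.141) using only
 * integer arithmetic and integer conversions, with no floating point.
 * Returns what snprintf returns: the length of the formatted string. */
int format_milli(char *buf, size_t size, long milli)
{
    const char *sign = milli < 0 ? "-" : "";
    /* Negate in unsigned arithmetic so that LONG_MIN does not overflow. */
    unsigned long mag = milli < 0 ? 0UL - (unsigned long)milli
                                  : (unsigned long)milli;

    return snprintf(buf, size, "%s%lu.%03lu", sign, mag / 1000, mag % 1000);
}
```

Handling the sign separately matters for values between -1 and 0: dividing -500 by 1000 yields 0, so the minus sign would otherwise be lost.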
Figure 2-1. Linking a module to the kernel (diagram: insmod runs the module’s init function, which calls blk_init_queue() and add_disk(); rmmod runs the cleanup function, which calls del_gendisk() and blk_cleanup_queue())
Because no library is linked to modules, source files should never include the usual header files, <stdarg.h> and very special situations being the only exceptions. Only functions that are actually part of the kernel itself may be used in kernel modules. Anything related to the kernel is declared in headers found in the kernel source tree you have set up and configured; most of the relevant headers live in include/linux and include/asm, but other subdirectories of include have been added to host material associated with specific kernel subsystems.
The role of individual kernel headers is introduced throughout the book as each of them is needed.
Another important difference between kernel programming and application programming is in how each environment handles faults: whereas a segmentation fault is harmless during application development and a debugger can always be used to trace the error to the problem in the source code, a kernel fault kills the current process at least, if not the whole system. We see how to trace kernel errors in Chapter 4.
User Space and Kernel Space
A module runs in kernel space, whereas applications run in user space. This concept is at the base of operating systems theory.
The role of the operating system, in practice, is to provide programs with a consistent view of the computer’s hardware. In addition, the operating system must account for independent operation of programs and protection against unauthorized access to resources. This nontrivial task is possible only if the CPU enforces protection of system software from the applications.
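A user program cannot simply call into kernel code; the CPU lets it enter kernel space only through controlled entry points, the most common being a system call. As a Linux-specific user-space illustration (not module code), the sketch below asks the kernel for the calling process's ID in two ways: through the C library's getpid() wrapper and through glibc's generic syscall() entry point with the SYS_getpid number. The helper names are hypothetical.

```c
#define _GNU_SOURCE            /* exposes syscall() in glibc's <unistd.h> */
#include <sys/syscall.h>       /* SYS_getpid */
#include <sys/types.h>         /* pid_t */
#include <unistd.h>            /* getpid(), syscall() */

/* Ordinary route: the C library wrapper for the getpid system call. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* Explicit route: issue the system call by number. The kernel services it
 * in the context of the calling process and returns that process's ID. */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both routes should report the same process ID. The wrapper exists so that applications never have to deal with system-call numbers directly.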
Every modern processor is able to enforce this behavior. The chosen approach is to implement different operating modalities (or levels) in the CPU itself. The levels have different roles, and some operations are disallowed at the lower levels; program code can switch from one level to another only through a limited number of gates. Unix systems are designed to take advantage of this hardware feature, using two such levels. All current processors have at least two protection levels, and some, like the x86 family, have more levels; when several levels exist, the highest and lowest levels are used. Under Unix, the kernel executes in the highest level (also called supervisor mode), where everything is allowed, whereas applications execute in the lowest level (the so-called user mode), where the processor regulates direct access to hardware and unauthorized access to memory.
We usually refer to the execution modes as kernel space and user space. These terms encompass not only the different privilege levels inherent in the two modes, but also the fact that each mode can have its own memory mapping (its own address space) as well.
Unix transfers execution from user space to kernel space whenever an application issues a system call or is suspended by a hardware interrupt. Kernel code executing a system call is working in the context of a process: it operates on behalf of the calling process and is able to access data in the process’s address space. Code that handles interrupts, on the other hand, is asynchronous with respect to processes and is not related to any particular process.
The role of a module is to extend kernel functionality; modularized code runs in kernel space. Usually a driver performs both of the tasks outlined previously: some functions in the module are executed as part of system calls, and some are in charge of interrupt handling.
Concurrency in the Kernel
One way in which kernel programming differs greatly from conventional application programming is the issue of concurrency. Most applications, with the notable exception of multithreaded applications, typically run sequentially, from the beginning to the end, without any need to worry about what else might be happening to change their environment. Kernel code does not run in such a simple world, and even the simplest kernel modules must be written with the idea that many things can be happening at once.
There are a few sources of concurrency in kernel programming. Naturally, Linux systems run multiple processes, more than one of which can be trying to use your driver at the same time. Most devices are capable of interrupting the processor; interrupt handlers run asynchronously and can be invoked at the same time that your driver is trying to do something else. Several software abstractions (such as kernel timers, introduced in Chapter 7) run asynchronously as well. Moreover, of course, Linux can run on symmetric multiprocessor (SMP) systems, with the result that your driver could be executing concurrently on more than one CPU. Finally, in 2.6, kernel code has been made preemptible; this change causes even uniprocessor systems to have many of the same concurrency issues as multiprocessor systems.
As a result, Linux kernel code, including driver code, must be reentrant: it must be capable of running in more than one context at the same time. Data structures must be carefully designed to keep multiple threads of execution separate, and the code must take care to access shared data in ways that prevent corruption of the data.
Writing code that handles concurrency and avoids race conditions (situations in which an unfortunate order of execution causes undesirable behavior) requires thought and can be tricky. Proper management of concurrency is required to write correct kernel code; for that reason, every sample driver in this book has been written with concurrency in mind. The techniques used are explained as we come to them; Chapter 5 is also dedicated to this issue and to the kernel primitives available for concurrency management.
A common mistake made by driver programmers is to assume that concurrency is not a problem as long as a particular segment of code does not go to sleep (or “block”). Even in previous kernels (which were not preemptive), this assumption was not valid on multiprocessor systems. In 2.6, kernel code can (almost) never assume that it can hold the processor over a given stretch of code. If you do not write your code with concurrency in mind, it will be subject to catastrophic failures that can be exceedingly difficult to debug.
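A shared counter is the classic illustration of such a race. The sketch below is a user-space analogy using POSIX threads, not kernel code. The kernel has its own primitives, covered in Chapter 5, but the principle is the same: every read-modify-write of shared data happens while holding a lock. The names count_concurrently(), worker(), and MAX_THREADS are hypothetical.

```c
#include <pthread.h>

#define MAX_THREADS 16

static long counter;                         /* shared data */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long iterations_per_thread;           /* set before any thread starts */

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < iterations_per_thread; i++) {
        /* counter++ is a read-modify-write; holding the lock keeps another
         * thread from interleaving between the read and the write. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run nthreads workers concurrently and return the final counter value,
 * or -1 if nthreads is out of range or a thread cannot be created. */
long count_concurrently(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    counter = 0;
    iterations_per_thread = iterations;

    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? counter : -1;
}
```

If the lock and unlock calls are removed, the total often comes out short, and by a different amount from run to run. That is exactly the kind of intermittent failure the text warns about.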
The Current Process
Although kernel modules don’t execute sequentially as applications do, most actions performed by the kernel are done on behalf of a specific process. Kernel code can refer to the current process by accessing the global item current, defined in <asm/current.h>, which yields a pointer to struct task_struct, defined by <linux/sched.h>. The current pointer refers to the process that is currently executing. During the execution of a system call, such as open or read, the current process is the one that invoked the call. Kernel code can use process-specific information by using current, if it needs to do so. An example of this technique is presented in Chapter 6.
Actually, current is not truly a global variable. The need to support SMP systems forced the kernel developers to devise a mechanism that finds the current process on the relevant CPU. This mechanism must also be fast, since references to current happen frequently. The result is an architecture-dependent mechanism that, usually, hides a pointer to the task_struct structure on the kernel stack. The details of the implementation remain hidden from other kernel subsystems, though, and a device driver can just include <linux/sched.h> and refer to the current process. For example, the following statement prints the process ID and the command name of the current process by accessing certain fields in struct task_struct:
    printk(KERN_INFO "The process is \"%s\" (pid %i)\n",
            current->comm, current->pid);
The command name stored in current->comm is the base name of the program file (trimmed to 15 characters if need be) that is being executed by the current process.
A Few Other Details
Kernel programming differs from user-space programming in many ways. We’ll point things out as we get to them over the course of the book, but there are a few fundamental issues which, while not warranting a section of their own, are worth a mention. So, as you dig into the kernel, the following issues should be kept in mind.
Applications are laid out in virtual memory with a very large stack area. The stack, of course, is used to hold the function call history and all automatic variables created by currently active functions. The kernel, instead, has a very small stack; it can be as small as a single, 4096-byte page. Your functions must share that stack with the entire kernel-space call chain. Thus, it is never a good idea to declare large automatic variables; if you need larger structures, you should allocate them dynamically at call time.
Often, as you look at the kernel API, you will encounter function names starting with a double underscore (__). Functions so marked are generally a low-level component of the interface and should be used with caution. Essentially, the double underscore says to the programmer: “If you call this function, be sure you know what you are doing.”
Kernel code cannot do floating point arithmetic. Enabling floating point would require that the kernel save and restore the floating point processor’s state on each entry to, and exit from, kernel space, at least on some architectures. Given that there really is no need for floating point in kernel code, the extra overhead is not worthwhile.
Compiling and Loading
The “hello world” example at the beginning of this chapter included a brief demonstration of building a module and loading it into the system. There is, of course, a lot more to that whole process than we have seen so far. This section provides more detail on how a module author turns source code into an executing subsystem within the kernel.
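As a preview of the build machinery, here is a minimal sketch of an out-of-tree kbuild Makefile for the hello.c module. It assumes that the build tree for the running kernel is installed under /lib/modules/$(uname -r)/build. That location is common, but it depends on the distribution. The recipe lines must begin with a tab.

```make
# Ask kbuild to build hello.ko from hello.c.
obj-m := hello.o

# Assumption: configured kernel build tree for the running kernel.
KDIR ?= /lib/modules/$(shell uname -r)/build

all:
	$(MAKE) -C $(KDIR) M=$(CURDIR) modules

clean:
	$(MAKE) -C $(KDIR) M=$(CURDIR) clean
```

Running make in the module's directory should produce hello.ko. You can then load it with insmod ./hello.ko and remove it with rmmod hello, both as root.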