xxvi Preface New to This Edition The first edition of this book was published with a copyright of 2003,while the second had a copyright of 2011.Considering the rapid evolution of computer technology,the book content has held up surprisingly well.Intel x86 machines running C programs under Linux(and related operating systems)has proved to be a combination that continues to encompass many systems today.However, changes in hardware technology,compilers,program library interfaces,and the experience of many instructors teaching the material have prompted a substantial revision The biggest overall change from the second edition is that we have switched our presentation from one based on a mix of IA32 and x86-64 to one based exclusively on x86-64.This shift in focus affected the contents of many of the chapters.Here is a summary of the significant changes. Chapter 1:A Tour of Computer Systems We have moved the discussion of Am- dahl's Law from Chapter 5 into this chapter. Chapter 2:Representing and Manipulating Information.A consistent bit of feed- back from readers and reviewers is that some of the material in this chapter can be a bit overwhelming.So we have tried to make the material more ac- cessible by clarifying the points at which we delve into a more mathematical style of presentation.This enables readers to first skim over mathematical details to get a high-level overview and then return for a more thorough reading Chapter 3:Machine-Level Representation of Programs.We have converted from the earlier presentation based on a mix of IA32 and x86-64 to one based entirely on x86-64.We have also updated for the style of code generated by more recent versions of GcC.The result is a substantial rewriting,including changing the order in which some of the concepts are presented.We also have included,for the first time,a presentation of the machine-level support for programs operating on floating-point data.We have created a Web aside describing IA32 machine code for legacy reasons. Chapter 4:Processor Architecture.We have revised the earlier processor design, based on a 32-bit architecture,to one that supports 64-bit words and oper- ations. Chapter 5:Optimizing Program Performance.We have updated the material to reflect the performance capabilities of recent generations of x86-64 proces- sors.With the introduction of more functional units and more sophisticated control logic,the model of program performance we developed based on a data-flow representation of programs has become a more 'reliable predictor of performance than it was before. Chapter 6:The Memory Hierarchy.We have updated the material to reflect more recent technology
II 1 'i xxvi Preface New to This Edition Th~ first edition of this book was published with a copyright of 2003, while the second had a copyright of 2011. Considering the rapid evolution of computer technology, the book content has held up surprisingly well. Intel x86 machines running C programs upder Linux (and related operating systems) has P,roved to be a combination that continues to encompass many systems today. However, changes in hardware technology, compilers, program library interfaces, and the experience of many instructors teacl).ing the material have prompted a substantial revision. The biggest overall change from the second edition is that we have switched our presentation from one based on a mix of IA32 and x86-64 to one based exclusively on x86-64. This shift in focus affected the contents of many of the chapters. Here is a summary of the significant changes. Chapter 1: A Tour of Computer Systems We have moved the discussion of Amdahl's Law from Chapter 5 into this chapter. Chapter 2: Representing an/i Manipulating Information. A consistent bit of feedback from readers and reviewers is that some of the material in this chapter can be a bit overwhelming. So we have tried to make.the material more accessible by clarifying the points at which we delve into a more mathematical style of presentation. This enables readers to first skim over mathematical details to get a high-level overview and then return for a more thorough reading. Chapter 3: Machine-Level Representation of Programs. We have converted from the earlier presentation based on a mix of IA32 and x86-64 to one based entirely on x86-64. We have also updated for the style of code generated by more recent versions of Gee. The result is a substantial rewriting, including changing the order in which some of the concepts are presented. We also have included, for the first time, a presen\ation of the machine-level support for programs operating on floating-point data. We have created a Web aside describing IA32 machine code for legacy reasons. Chapter 4: Processor Architecture. We have revised the earlier processor design, based on a 32-bit architecture, to one that supports 64-bit words and operations. Chapter 5: Optimizing Program Performance. We have updated the material to reflect the performance capabilities of recent generations of x86-64 processors. With the introduction of more functional units and more sophisticated control logic, the model of program performance we developed based on a data-flow representation of programs has become a more 'reliable predictor of performance than it was before. Chapter 6: The Memory Hierarchy. We have updated the material to refle.ct more recent technology. -
Preface xxvii Chapter 7:Linking.We have rewritten this chapter,for x86-64,expanded the discussion of using the GOT and PLT to create position-independent code, and added a new section on a powerful linking technique known as library interpositioning. ChpterExeptional Control Flow.We have added a more rigorous treatment of sinahandrs including asyncsi functions specific guidis for writing,signal handlers,and ising sigsupend to wait for handlers. Chapter 9:Virtual Memory.This chapter has changed only slightly. Chapter 10.SyemLevel I/We have added a new section on files and the file hierarchy,but otherwise,this chapter has changed only slightly. Chapter 11:Network Programming.We have introduced techniques for protocol- independent and thread-safe network programming using the modern getaddrinfo and getnameinfo functions,which replace the obsolete and non-reentrant gethostbyname and gethostbyaddr functions. Chapter 12:Concurrent Programming.We have increased our coverage of using thread-level parallelism to make programs run faster on multi-core ma- chines. In addition,we have added and revised a number of practice and homework problems throughout the text. Origins of the Book This book stems from an introductory course that we developed at Carnegie Mel- lon University in the fall of 1998,called 15-213:Introduction to Computer Systems (ICS)[14].The ICS course has'been taught every semester since then.Over 400 students take the course each semester.The students range from sophomores to graduate students in a wide variety of majors.It is a required core course for all undergraduates in the CS and ECE departments at Carnegie Mellon,and it has become a prerequisite for most upper-level systems courses in CS and ECE. The idea with ICS was to introduce students to computers in a different way. Few of our studerits would have the opportunity to build a computer system.On the other hand,most students,including all computer scientists and computer engineers,Would be required to use and program computers on a daily basis.So we decided to teach about systems from the point of view of the programmer,using the following filter:we would cover a topic only if it affected the performance, correctness,or.utility of user-level C programs. For example,topics such as hardware adder and bus designs were out.Top- ics such as machine language were in;but instead of focusing on how to write assembly language by hand,we would look at how a C compiler translates C con- structs into.machine code,including pointers,loops,procedure calls,and switch statements.Further,we would take a broader and more holistic view of the system as both hardware and systems software,covering such topics as linking,loading
Chapter 7: Linking. We have rewritten this chapter. for x86-64, expanded the discussion of using the GOT and PLT to create position-independent code, ansl.added a new section on a powerful linking techniqiie known as library interpositioning. Chppt~r 8.!'Exr~ptio'!'al Control Flow .. We have addpd a more rigorous treatment ,,1 of sig~a.l',hand~~rs, including as~c-sig9al-sa\f functions'. specific guidelines {or wntm&,sll!J!~l ):landlers, and usmg sigsuspend to wait for handlers. Chapter 9: Virtual Memory. This chapter has changed only slightly. <;f'apter,10: Sy-;iem-Level 110. We hav,e added 'I new section on files and the file (~ ~ } ' U I hierarchy, but 9therwise, this chapter has changed only slightly. Chapter JI: Network Programming. We have introduced techniques for protocolindependent and thread-safe network programming using the modern getaddrinfo and getnameinfo functions, which replace the obsolete and non-reentrant gethostbyname and gethostbyaddr functions. Chapter 12: Concurrent Programming. We have increas,ed our coverage of using thread-level parallelism to make programs run faster on multi-core machines. In addition, we have added and revised a number of practice and homework problems throughout the text. Origins of the ~ook This book stems from an introductory course that we developed at Carnegie Mellon University in the fall of1998, called 15-213: Introduction to Computer Systems (rCS) [14]. The res cburse has' been taught every semester since then. Over 400 students.take the; course each semester. Jbe students range from sophomores to gra<;luate ~tudents ip a wide variety of majors. It is a required core course for all undergradu~t<;s in the C~ and ECE departments at Carnegie Mellon, and it has become a prerequisite for most upper-level systems courses in CS and ECE. The idea with res was to introduce students to computers in a different way. Few of our students would have the opportunity to build a computer system. On the other frand, most students, inchtding all computer scientists and computer engineers, w'ould be required to use arid program computers on a daily basis. So we decided to teach about systems from the point of view of the programmer, using the following filter: we would cover a iOpic only if it affected the performance, correctness, or. utility of user-level C programs. For example, topics such as hardwate adder and bus designs were out. Topics such as machine language were in; but instead of focusing on how to write assembly language by hand, we would look at how a C compiler translates C constructs into.machine code, includipg pointers, loops, procedure calls, and switch statements. Further, we wo'uld take a broader and more holistic view of the system as both hardware and systems software, covering such topics as linking, loading, Preface xxvii
xxviii Preface processes,signals,performance optimization,virtual memory,I/O,and network and concurrent programming. This approach allowed us to teach the ICS course in a way that is practical, concrete,hands-on,and exciting for the students.The response from our students and faculty colleagues was immediate and overwhelmingly positive,and we real- ized that others outside of CMU might benefit from using our approach.Hence this book,which we developed from the ICS lecture notes,and which we have now revised to reflect changes in technology and in how computer systems are implemented. Via the multiple editions and multiple translations of this book,ICS and many variants have become part of the computer science.and computer engineering curricula at hundreds of colleges and universities worldwide For Instructors:Courses Based on the Book Instructors can use the CS:APP book to teach a number of different types of systems courses.Five categories of these courses are illustrated in Figure 2.The particular course depends on curriculum requirements,personal taste,and the backgrounds and abilities of the students.From left to right in the figure, the courses are characterized by an increasing emphasis on the programmer's perspective of a system.Here is a brief description. ORG.A computer organization course with traditional topics covered in an un- traditional style.Traditional topics such as logic design,processor architec- ture,assembly language,and memory systems are covered.However,there is more emphasis on the impact for the programmer.For example,data rep- resentations are related back to the data types and operations of Cprograms, and the presentation on assembly code is based on machine code generated by a C compiler rather than handwritten assembly code. ORG+.The ORG course with additional emphasis on the impact of hardware on the performance of application programs.Compared to ORG,students learn more about code optimization and about improving the memory per- formance of their C programs. ICS.The baseline ICS course,designed to produce enlightened programmers who understand the impact of the hardware,operating system,and compilation system on the performance and correctness of their application programs. A significant difference from ORG+is that low-level processor architecture is not covered.Instead,programmers work with a higher-level model of a modern out-of-order processor.The ICS course fits nicely into a 10-week quarter,and can also be stretched to a 15-week semester if covered at a more leisurely pace. ICS+.The baseline ICS course with additional coverage of systems programming topics such as system-level I/O,network programming,and concurrent pro- gramming.This is the semester-long Carnegie Mellon course,which covers every chapter in CS:APP except low-level processor architecture
11 '• '· xxviii Preface processes, signals, performance optimization, virtual memory, I/O, and network and concurrent programming. This approach allowed us to teach the res course in a way that is practical, concrete, hands-on, and exciting for the students. The response from our students and faculty colleagues was immediate and overwhelmingly positive, and we realized that others outside of CMU might benefit from using our approach. Hence this book, which we developed from the res lecture notes, and which we have now revised to reflect changes in technology and in how computer systems are implemented. Via the multiple editions and multiple translations of this book, ICS and many variants have become part of the computer science .and computer engineering curricula at hundreds of colleges and universities worldwide. For Instructors: Courses Based on the Book Instructors can use the CS:APP book to teach a number of different types of systems courses. Five categories of these courses are illustrated in Figure 2. The particular course depends on curriculum requirements, personal taste, and the backgrounds and abilities of the students. From left to right in the figure, the courses are characterized by an increasing emphasis on the programmer's perspective of a system. Here is a brief description. ORG. A computer organization course with traditional topics covered in an untraditional style. Traditional topics such as logic design, processor architecture, assembly language, and memory systems are covered. However, there is more emphasis on the impact for the programmer. For example, data representations are related back to the data types and operations of C programs, and the presentation on assembly code is based on machine code generated by a C compiler rather than handwritten assembly code. ORG+. The ORG course with additional emphasis on the impact of hardware on the performance of application programs. Compared to ORG, students learn more about code optimization and about improving the memory performance of their C programs. ICS. The baseline JCS course, designed to produce enlightene9 programmers who understand the impact of the hardware, operating system, and compilation system on the performance and correctness of their application programs. A significant difference from ORG+ is that low-level processor architecture is not covered. Instead, programmers work with a higher-level model of a modern out-of-order processor. The JCS course fits nicely into a 10-week quarter, and can also be stretched to a 15-week semester if covered at a more leisurely pace. JCS+. The baseline ICS course with additional coverage of systems programming topics such as system-level 1/0, network programming, and concurrent programming. This is the semester-long Carnegie Mellon course, which covers every chapter in CS:APP except low-level processor architecture
Preface xxix Course Chapter Topic ORG QRG+ ICS ICS+ SP 1 Tour of systems 2 Data representation ⊙ 3 Machine language, ●4 4 Processor architecture 5 Code optimization 6 Memory hierarchy ⊙(a) ⊙ 7 Linking ⊙(g ⊙e 8 Exceptional control.flow. 9 Virtual memory ⊙6, 10 System-level I/O ● 11 Network programming 12 Concurrent programming Figure 2 Five systems courses based on the CS:APP book.ICS+'is the 15-213 course from Carnegie Mellon.Notes:The symbol denotes partial coverage of a chapter,as follows:(a)hardware only;(b)no dynamic storage allocation;(c)no dynamic'linking; (d)no floating point. SP.A systems programming course.This course is'similar to ICS+,but it'drops floating point and performance 'optimization,and it places more empha- sis on systems programming,including process control,dynamic linking, system-level I/O,network programming,and concurrent programming.In- structors might,want to supplement from other sources for advanced topics such as daemons,;terminal control,and Unix IPC. The main message of Figure 2 is that the CS:APP'book gives a lot of options to students and instructors.If you want your students to be exposed to lower- level processor architecture,then that option is available via the ORG and ORG+ courses.On the other hand,if you want to switch from your current computer organization course to an ICS or ICS+course,but are wary of making such a drastic change all at once,then you,can move toward ICS incrementally.You can start with ORG,which teaches the traditional topics in a nontraditional way. Once you are comfortable with that material,then you can move to ORG+, and eventually to'ICS:If students have no experience in C(e.g.,they have only programmed in Java),.you could spend several weeks on C and then cover the material of ORG or ICS. Finally,we note that the ORG+anid SP courses would make a nice two-term sequence (either quarters or semesters).Or you might consider offering ICS+as one term of ICS and one term of SP
!=hapter ·1 2 3 4 5 10 11 12 Topic ~Tour of systems ' Data repres'en'tation Machine language, Processor architecttlre Code optimization 11 1 T Memo)"Y hierarchy ' j .I Linking I!, , Exception\tl ~pntrol.flow Virtual memory System-level r/O Network .Programming Concurrent programming . ' ORQ • " • • • 0<•) Q,RG+ • ·• • • • • • • res • .; • • • 0(c) • • res+ SP • •• • 0(d) • • • • 0<•> 0(c) • • • • • • • • • • • Figure 2 FiV'e systems ~ourses based on the CS:APP book. ICS+'is the 15-213 course from Carnegie Mellon'. Notes: The 0 symbol denotes ·partial coverage of a chapter, as follows: (a) hardware only; (bl no dynamic storage ~llocaHon; (c) no dynamic'linking; (d) no floating poini. " ' 'I SP. A sys\ems pro'~tariuhing course. Th~s course is' similar to JCS+, 6ut if drops floating point "and performance 'optimiZati:on, 'and it places more emphasis on systems 'PJOgramJlling, iµclm~ing proces§ ,control, dynamic linking, system-ley<;>) 1/0, qe~work; Jlf9gramming, qnd con01rrent pr9gramming. Instructqr~ mjght,w,ant tq supply,n;ient fr,9111 othyr sources for advanced topics ,such as dayl1JO!}~,;terprinal ~ontr9l, anq U,Q,qJPC. The main mes~ag;'ofFigu)'~ 2 is that t)l~ CS:APP'boqj< giyrs a lot 9f options to students anil instructors. If you wai:it your students to be exposed to lo.werlevel processor architecture, then that option is aval!able via the ORG and ORG+ courses. On the other hand, if you want to switch from your current computer organization course !lo an JCS or JCS+ course, but are wary of making such a drastic change all at once, then you •can move toward JCS incrementally. You can start with ORG,.which teaches the traditional topics in a nontraditional way. Once you are comfortable with that material, then you can move to ORG+, and eventually to·tcs: If students have no experience iii C (e.g., they have only pro~rJ!mme~Iin Java),.you could spend several weeks on C and then cover the matetial of ORG or res. Finally, we note that the ORG+ aiid SP courses would make a nice two-term sequence (either quarters or semesters). Or you might consider offering res+ as one term of res and one term of SP. Preface xxix
xxx Preface For Instructors:Classroom-Tested Laboratory Exercises The ICS+course at Carnegie Mellon receives very high evaluations from students. Median scores of 5.0/5.0 and means of 4.6/5.0 are typical for the student course evaluations.Students cite the fun,exciting,and relevant laboratory exercises as the primary reason.The labs are available from the CS:APP Web page.Here are examples of the labs that are provided with the book. Data Lab.This lab requires students to implement simple logical and arithmetic functions,but using a highly restricted subset of C.For example,they must compute the absolute value of a number using only bit-level operations.This lab helps students understand the bit-level representations of C data types and the bit-level behavior of the operations on data. Binary Bomb Lab.A binary bomb is a program provided to students as an object- code file.When run,it prompts the user to type in six different strings.If any of these are incorrect,the bomb"explodes,"printing an error message and logging the event on a grading server.Students must "defuse"their own unique bombs by disassembling and reverse engineering the programs to determine what the six strings should be.The lab teaches students to understand assembly language and also forces them to learn how to use a debugger. Buffer Overflow Lab.Students are required to modify the run-time behavior of a binary executable by exploiting a buffer overflow vulnerability.This lab teaches the students about the stack discipline and about the danger of writing code that is vulnerable to buffer overflow attacks. Architecture Lab.Several of the homework problems of Chapter 4 can be com- bined into a lab assignment,where students modify the HCL description of a processor to add new instructions,change the branch prediction policy,or add or remove bypassing paths and register ports.The resulting processors can be simulated and run through automated tests that will detect most of the possible bugs.This lab lets students experience the exciting parts of pro- cessor design without requiring a complete background in logic design and hardware description languages. Performance Lab.Students must optimize the performance of an application ker- nel function such as convolution or matrix transposition.This lab provides a very clear demonstration of the properties of cache memories and gives students experience with low-level program optimization. Cache Lab.In this alternative to the performance lab,students write a general- purpose cache simulator,and then optimize a small matrix transpose kernel to minimize the number of misses on a simulated cache.We use the Valgrind tool to generate real address traces for the matrix transpose kernel. Shell Lab.Students implement their own Unix shell program with job control, including the Ctrl+C and Ctrl+Z keystrokes and the fg,bg,and jobs com-
I ,, j xxx Preface For Instructors: Classroom-Tested Laboratory Exercises The ICS+ course at Carnegie Mellon receives very high evaluations from students. Median scores of 5.0/5.0 and means of 4.6/5.0 are typical for the student course evaluations. Students cite the fun, exciting, and relevant laboratory exercises as the primary reason. The Jabs are available from the CS:APP Web page. Here are examples of the Jabs that are provided with the book. Data Lab. This Jab requires students to implement simple logical and arithmetic functions, but using a highly restricted subset of C. For example, they must compute the absolute value of a number using only bit-level operations. This Jab helps students understand the bit-level representations of C data types and the bit-level behavior of the operations on data. Binary Bomb Lab. A binary bomb is a program provided to students as an objectcode file. When run, it prompts the user to type in six different strings. If any of these are incorrect, the bomb "explodes," printing an error message and Jogging the event on a grading, server. Students must "defuse" their own unique bombs by disassembling and reverse engineering the programs to determine what the Si)\ strings shoqld be. The lab teaches students to understand assembly language and also forces them to learn ,how to use a debugger. Buffer Overflow Lab. Students are required to modify the run-time behavior of a binary executable by exploiting a buffer overflow vulnerability. This Jab teaches the students about the stack discipline and aJ;>,out the ,danger of writing code that is vulnerable to buffer overflow attacks. Architecture Lab. Several of the homework problems of Chapter 4 can be combined into a lab assignment, where students modify the HCL description of a processor to add new instructions, change the branch prediction policy, or add or remove bypassing paths and register ports. The resulting processors can be simulated and run through automated tests that will detect most of the possible bugs. This Jab lets students experience the exciting parts of processor design without requiring a complete background in logic design and hardware description languages. Performance Lab. Students must optimize the performance of an application kernel function such as convolution or matrix transposition. This Jab provides a very clear demonstration of the properties of cache memories and gives students experience with low-level program optimization. Cache Lab. In this alternative to the performance lab, students write a generalpurpose cache simulator, and then optimize a small matrix transpose kernel to minimize the number of misses on a simulated each~. We use the Valgrind tool to generate real address traces for the matri/< transpose kernel. Shell L'ab. Students implement their own Unix shell program with job control, including the Ctrl+C and Ctrl+Z keystrokes and the fg, bg, and jobs com- • 'I i I ' j l i