Preface xvii

Chapter or appendix                               Sections
1. Computer Abstractions and Technology           1.1 to 1.9; 1.10 (History)
2. Instructions: Language of the Computer         2.1 to 2.14; 2.15 (Compilers & Java); 2.16 to 2.19; 2.20 (History)
E. RISC Instruction-Set Architectures             E.1 to E.19
3. Arithmetic for Computers                       3.1 to 3.9; 3.10 (History)
C. The Basics of Logic Design                     C.1 to C.13
4. The Processor                                  4.1 (Overview); 4.2 (Logic Conventions); 4.3 to 4.4 (Simple Implementation); 4.5 (Pipelining Overview); 4.6 (Pipelined Datapath); 4.7 to 4.9 (Hazards, Exceptions); 4.10 to 4.11 (Parallel, Real Stuff); 4.12 (Verilog Pipeline Control); 4.13 to 4.14 (Fallacies); 4.15 (History)
D. Mapping Control to Hardware                    D.1 to D.6
5. Large and Fast: Exploiting Memory Hierarchy    5.1 to 5.8; 5.9 (Verilog Cache Controller); 5.10 to 5.12; 5.13 (History)
6. Storage and Other I/O Topics                   6.1 to 6.10; 6.11 (Networks); 6.12 to 6.13; 6.14 (History)
7. Multicores, Multiprocessors, and Clusters      7.1 to 7.13; 7.14 (History)
A. Graphics Processor Units                       A.1 to A.12
B. Assemblers, Linkers, and the SPIM Simulator    B.1 to B.12

Reading guide: Read carefully / Review or read / Read if have time / Read for culture / Reference.
[The original figure marks each group of sections with separate software-focus and hardware-focus reading recommendations; those per-section symbols did not survive text extraction.]
languages. It includes material from Chapter 3 in the third edition so that the complete MIPS architecture is now in a single chapter, minus the floating-point instructions. Chapter 3 is for readers interested in constructing a datapath or in learning more about floating-point arithmetic. Some will skip Chapter 3, either because they don't need it or because it is a review. Chapter 4 combines two chapters from the third edition to explain pipelined processors. Sections 4.1, 4.5, and 4.10 give overviews for those with a software focus. Those with a hardware focus, however, will find that this chapter presents core material; they may also, depending on their background, want to read Appendix C on logic design first. Chapter 6 on storage is critical to readers with a software focus, and should be read by others if time permits. The last chapter on multicores, multiprocessors, and clusters is mostly new content and should be read by everyone.

The first goal was to make parallelism a first-class citizen in this edition, whereas it was a separate chapter on the CD in the last edition. The most obvious example is Chapter 7. In particular, this chapter introduces the Roofline performance model and shows its value by evaluating four recent multicore architectures on two kernels. This model could prove to be as insightful for multicore microprocessors as the 3Cs model is for caches.

Given the importance of parallelism, it wasn't wise to wait until the last chapter to talk about it, so there is a section on parallelism in each of the preceding six chapters:

■ Chapter 1: Parallelism and Power. It shows how power limits have forced the industry to switch to parallelism, and why parallelism helps.

■ Chapter 2: Parallelism and Instructions: Synchronization. This section discusses locks for shared variables, specifically the MIPS instructions Load Linked and Store Conditional.

■ Chapter 3: Parallelism and Computer Arithmetic: Floating-Point Associativity.
This section discusses the challenges of numerical precision and floating-point calculations.

■ Chapter 4: Parallelism and Advanced Instruction-Level Parallelism. It covers advanced ILP—superscalar, speculation, VLIW, loop unrolling, and OOO—as well as the relationship between pipeline depth and power consumption.

■ Chapter 5: Parallelism and Memory Hierarchies: Cache Coherence. It introduces coherency, consistency, and snooping cache protocols.

■ Chapter 6: Parallelism and I/O: Redundant Arrays of Inexpensive Disks. It describes RAID as a parallel I/O system as well as a highly available I/O system.
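The floating-point associativity issue behind the Chapter 3 bullet is easy to demonstrate; a minimal sketch (not from the book, using Python for brevity):

```python
# Floating-point addition is not associative: each operation rounds
# its result, so the grouping of operands changes the final answer.
left = (0.1 + 0.2) + 0.3    # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)   # 0.6
print(left == right)        # False
```

Because a parallel reduction reorders the additions across processors, the same program can produce different sums on different runs, which is exactly the challenge that section examines.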
Chapter 7 concludes with reasons for optimism about why this foray into parallelism should be more successful than those of the past.

I am particularly excited about the addition of an appendix on Graphics Processing Units written by NVIDIA's chief scientist, David Kirk, and chief architect, John Nickolls. Appendix A is the first in-depth description of GPUs, a new and interesting thrust in computer architecture. The appendix builds upon the parallel themes of this edition to present a style of computing that allows the programmer to think MIMD while the hardware tries to execute in SIMD style whenever possible. As GPUs are both inexpensive and widely available—they are even found in many laptops—and their programming environments are freely available, they provide a parallel hardware platform that many could experiment with.

The second goal was to streamline the book to make room for new material on parallelism. The first step was simply going through all the paragraphs accumulated over three editions with a fine-toothed comb to see if they were still necessary. The coarse-grained changes were the merging of chapters and dropping of topics. Mark Hill suggested dropping the multicycle processor implementation and instead adding a multicycle cache controller to the memory hierarchy chapter. This allowed the processor to be presented in a single chapter instead of two, enhancing the processor material by omission. The performance material from a separate chapter in the third edition is now blended into the first chapter.

The third goal was to improve the pedagogy of the book. Chapter 1 is now meatier, including performance, integrated circuits, and power, and it sets the stage for the rest of the book. Chapters 2 and 3 were originally written in an evolutionary style, starting with a "single-celled" architecture and ending up with the full MIPS architecture by the end of Chapter 3. This leisurely style is not a good match for the modern reader.
This edition merges all of the instruction set material for the integer instructions into Chapter 2—making Chapter 3 optional for many readers—and each section now stands on its own. The reader no longer needs to read all of the preceding sections. Hence, Chapter 2 is now even better as a reference than it was in prior editions. Chapter 4 works better since the processor is now a single chapter, as the multicycle implementation is a distraction today. Chapter 5 has a new section on building cache controllers, along with a new CD section containing the Verilog code for that cache.

The accompanying CD-ROM introduced in the third edition allowed us to reduce the cost of the book by saving pages as well as to go into greater depth on topics that were of interest to some but not all readers. Alas, in our enthusiasm to save pages, readers sometimes found themselves going back and forth between the CD and book more often than they liked. This should not be the case in this edition. Each chapter now has the Historical Perspectives section on the CD, and four chapters also have one advanced material section on the CD. Additionally, all
exercises are in the printed book, so flipping between book and CD should be rare in this edition.

For those of you who wonder why we include a CD-ROM with the book, the answer is simple: the CD contains content that we feel should be easily and immediately accessible to the reader no matter where they are. If you are interested in the advanced content, or would like to review a VHDL tutorial (for example), it is on the CD, ready for you to use. The CD-ROM also includes a feature that should greatly enhance your study of the material: a search engine that allows you to search for any string of text in the printed book or on the CD itself. If you are hunting for content that may not be included in the book's printed index, you can simply enter the text you are searching for, and the page number it appears on will be displayed in the search results. This is a very useful feature that we hope you make frequent use of as you read and review the book.

This is a fast-moving field, and as is always the case for our new editions, an important goal is to update the technical content. The AMD Opteron X4 model 2356 (code-named "Barcelona") serves as a running example throughout the book and is found in Chapters 1, 4, 5, and 7. Chapters 1 and 6 add results from the new power benchmark from SPEC. Chapter 2 adds a section on the ARM architecture, which is currently the world's most popular 32-bit ISA. Chapter 5 adds a new section on Virtual Machines, which are resurging in importance. Chapter 5 has detailed cache performance measurements on the Opteron X4 multicore and a few details on its rival, the Intel Nehalem, which will not be announced until after this edition is published. Chapter 6 describes Flash Memory for the first time, as well as a remarkably compact server from Sun that crams 8 cores, 16 DIMMs, and 8 disks into a single 1U box. It also includes the recent results on long-term disk failures.
Chapter 7 covers a wealth of topics regarding parallelism—including multithreading, SIMD, vector processing, GPUs, performance models, benchmarks, and multiprocessor networks—and describes three multicores in addition to the Opteron X4: the Intel Xeon model e5345 (Clovertown), the IBM Cell model QS20, and the Sun Microsystems T2 model 5120 (Niagara 2).

The final goal was to try to make the exercises useful to instructors in this Internet age, for homework assignments have long been an important way to learn material. Alas, answers are posted today almost as soon as the book appears. We have a two-part approach. First, expert contributors have worked to develop entirely new exercises for each chapter in the book. Second, most exercises have a qualitative description supported by a table that provides several alternative quantitative parameters needed to answer the question. The sheer number of exercises, plus the flexibility in how the instructor can choose to assign variations, will make it hard for students to find the matching solutions online. Instructors will also be able to change these quantitative parameters as they wish, again frustrating those students who have come to rely on the Internet to provide solutions for a static and unchanging set of exercises. We feel this new approach is a valuable addition to the book—please let us know how well it works for you, either as a student or an instructor!
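Among Chapter 7's performance models is the Roofline model mentioned earlier, which caps a kernel's attainable throughput at the lesser of the machine's peak compute rate and its memory bandwidth times the kernel's operational intensity. A minimal sketch of the idea, with hypothetical machine numbers (not from the book):

```python
def roofline(peak_gflops, peak_gbytes_per_s, intensity):
    """Attainable GFLOP/s under the Roofline model: a kernel is
    limited either by peak compute or by memory bandwidth times
    its operational intensity (flops per byte moved from memory)."""
    return min(peak_gflops, peak_gbytes_per_s * intensity)

# Hypothetical machine: 64 GFLOP/s peak, 16 GB/s memory bandwidth.
print(roofline(64.0, 16.0, 0.5))  # 8.0  -> bandwidth-bound kernel
print(roofline(64.0, 16.0, 8.0))  # 64.0 -> compute-bound kernel
```

Plotting attainable performance against intensity on log-log axes produces the model's namesake "roofline": a sloped bandwidth ceiling meeting a flat compute ceiling.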
We have preserved useful book elements from prior editions. To make the book work better as a reference, we still place definitions of new terms in the margins at their first occurrence. The "Understanding Program Performance" sections help readers understand the performance of their programs and how to improve it, just as the "Hardware/Software Interface" sections help readers understand the tradeoffs at this interface. "The Big Picture" section remains so that the reader sees the forest despite all the trees. "Check Yourself" sections help readers confirm their comprehension of the material on the first time through, with answers provided at the end of each chapter. This edition also includes the green MIPS reference card, which was inspired by the "Green Card" of the IBM System/360. The removable card has been updated and should be a handy reference when writing MIPS assembly language programs.

Instructor Support

We have collected a great deal of material to help instructors teach courses using this book. Solutions to exercises, chapter quizzes, figures from the book, lecture notes, lecture slides, and other materials are available to adopters from the publisher. Check the publisher's Web site for more information:

textbooks.elsevier.com/9780123747501

Concluding Remarks

If you read the following acknowledgments section, you will see that we went to great lengths to correct mistakes. Since a book goes through many printings, we have the opportunity to make even more corrections. If you uncover any remaining, resilient bugs, please contact the publisher by electronic mail at cod4bugs@mkp.com or by low-tech mail using the address found on the copyright page.
This edition marks a break in the long-standing collaboration between Hennessy and Patterson, which started in 1989. The demands of running one of the world's great universities meant that President Hennessy could no longer make the substantial commitment to create a new edition. The remaining author felt like a juggler who had always performed with a partner and who is suddenly thrust on the stage as a solo act. Hence, the people in the acknowledgments and Berkeley colleagues played an even larger role in shaping the contents of this book. Nevertheless, this time around there is only one author to blame for the new material in what you are about to read.

Acknowledgments for the Fourth Edition

I'd like to thank David Kirk, John Nickolls, and their colleagues at NVIDIA (Michael Garland, John Montrym, Doug Voorhies, Lars Nyland, Erik Lindholm, Paulius Micikevicius, Massimiliano Fatica, Stuart Oberman, and Vasily Volkov) for writing