Contents

7.4 Clusters and Other Message-Passing Multiprocessors 641
7.5 Hardware Multithreading 645
7.6 SISD, MIMD, SIMD, SPMD, and Vector 648
7.7 Introduction to Graphics Processing Units 654
7.8 Introduction to Multiprocessor Network Topologies 660
7.9 Multiprocessor Benchmarks 664
7.10 Roofline: A Simple Performance Model 667
7.11 Real Stuff: Benchmarking Four Multicores Using the Roofline Model 675
7.12 Fallacies and Pitfalls 684
7.13 Concluding Remarks 686
7.14 Historical Perspective and Further Reading 688
7.15 Exercises 688

APPENDICES

A Graphics and Computing GPUs A-2
A.1 Introduction A-3
A.2 GPU System Architectures A-7
A.3 Programming GPUs A-12
A.4 Multithreaded Multiprocessor Architecture A-25
A.5 Parallel Memory System A-36
A.6 Floating Point Arithmetic A-41
A.7 Real Stuff: The NVIDIA GeForce 8800 A-46
A.8 Real Stuff: Mapping Applications to GPUs A-55
A.9 Fallacies and Pitfalls A-72
A.10 Concluding Remarks A-76
A.11 Historical Perspective and Further Reading A-77

B Assemblers, Linkers, and the SPIM Simulator B-2
B.1 Introduction B-3
B.2 Assemblers B-10
B.3 Linkers B-18
B.4 Loading B-19
B.5 Memory Usage B-20
B.6 Procedure Call Convention B-22
B.7 Exceptions and Interrupts B-33
B.8 Input and Output B-38
B.9 SPIM B-40
B.10 MIPS R2000 Assembly Language B-45
B.11 Concluding Remarks B-81
B.12 Exercises B-82

Index I-1

CD-ROM CONTENT

C The Basics of Logic Design C-2
C.1 Introduction C-3
C.2 Gates, Truth Tables, and Logic Equations C-4
C.3 Combinational Logic C-9
C.4 Using a Hardware Description Language C-20
C.5 Constructing a Basic Arithmetic Logic Unit C-26
C.6 Faster Addition: Carry Lookahead C-38
C.7 Clocks C-48
C.8 Memory Elements: Flip-Flops, Latches, and Registers C-50
C.9 Memory Elements: SRAMs and DRAMs C-58
C.10 Finite-State Machines C-67
C.11 Timing Methodologies C-72
C.12 Field Programmable Devices C-78
C.13 Concluding Remarks C-79
C.14 Exercises C-80

D Mapping Control to Hardware D-2
D.1 Introduction D-3
D.2 Implementing Combinational Control Units D-4
D.3 Implementing Finite-State Machine Control D-8
D.4 Implementing the Next-State Function with a Sequencer D-22
D.5 Translating a Microprogram to Hardware D-28
D.6 Concluding Remarks D-32
D.7 Exercises D-33

E A Survey of RISC Architectures for Desktop, Server, and Embedded Computers E-2
E.1 Introduction E-3
E.2 Addressing Modes and Instruction Formats E-5
E.3 Instructions: The MIPS Core Subset E-9
E.4 Instructions: Multimedia Extensions of the Desktop/Server RISCs E-16
E.5 Instructions: Digital Signal-Processing Extensions of the Embedded RISCs E-19
E.6 Instructions: Common Extensions to MIPS Core E-20
E.7 Instructions Unique to MIPS-64 E-25
E.8 Instructions Unique to Alpha E-27
E.9 Instructions Unique to SPARC v.9 E-29
E.10 Instructions Unique to PowerPC E-32
E.11 Instructions Unique to PA-RISC 2.0 E-34
E.12 Instructions Unique to ARM E-36
E.13 Instructions Unique to Thumb E-38
E.14 Instructions Unique to SuperH E-39
E.15 Instructions Unique to M32R E-40
E.16 Instructions Unique to MIPS-16 E-40
E.17 Concluding Remarks E-43

Glossary G-1
Further Reading FR-1

For the convenience of readers who have purchased an ebook edition, all CD-ROM content is available as a download from the book’s companion page. Visit http://www.elsevierdirect.com/companion.jsp?ISBN=9780123747501 to download your CD-ROM files.
Preface

The most beautiful thing we can experience is the mysterious. It is the source of all true art and science.
Albert Einstein, What I Believe, 1930

About This Book

We believe that learning in computer science and engineering should reflect the current state of the field, as well as introduce the principles that are shaping computing. We also feel that readers in every specialty of computing need to appreciate the organizational paradigms that determine the capabilities, performance, and, ultimately, the success of computer systems.

Modern computer technology requires professionals of every computing specialty to understand both hardware and software. The interaction between hardware and software at a variety of levels also offers a framework for understanding the fundamentals of computing. Whether your primary interest is hardware or software, computer science or electrical engineering, the central ideas in computer organization and design are the same. Thus, our emphasis in this book is to show the relationship between hardware and software and to focus on the concepts that are the basis for current computers.

The recent switch from uniprocessor to multicore microprocessors confirmed the soundness of this perspective, given since the first edition. While programmers could ignore the advice and rely on computer architects, compiler writers, and silicon engineers to make their programs run faster without change, that era is over. For programs to run faster, they must become parallel. While the goal of many researchers is to make it possible for programmers to be unaware of the underlying parallel nature of the hardware they are programming, it will take many years to realize this vision. Our view is that for at least the next decade, most programmers are going to have to understand the hardware/software interface if they want programs to run efficiently on parallel computers.

The audience for this book includes those with little experience in assembly language or logic design who need to understand basic computer organization as well as readers with backgrounds in assembly language and/or logic design who want to learn how to design a computer or understand how a system works and why it performs as it does.
About the Other Book

Some readers may be familiar with Computer Architecture: A Quantitative Approach, popularly known as Hennessy and Patterson. (This book in turn is often called Patterson and Hennessy.) Our motivation in writing the earlier book was to describe the principles of computer architecture using solid engineering fundamentals and quantitative cost/performance tradeoffs. We used an approach that combined examples and measurements, based on commercial systems, to create realistic design experiences. Our goal was to demonstrate that computer architecture could be learned using quantitative methodologies instead of a descriptive approach. It was intended for the serious computing professional who wanted a detailed understanding of computers.

A majority of the readers for this book do not plan to become computer architects. The performance and energy efficiency of future software systems will be dramatically affected, however, by how well software designers understand the basic hardware techniques at work in a system. Thus, compiler writers, operating system designers, database programmers, and most other software engineers need a firm grounding in the principles presented in this book. Similarly, hardware designers must understand clearly the effects of their work on software applications.

Thus, we knew that this book had to be much more than a subset of the material in Computer Architecture, and the material was extensively revised to match the different audience. We were so happy with the result that the subsequent editions of Computer Architecture were revised to remove most of the introductory material; hence, there is much less overlap today than with the first editions of both books.

Changes for the Fourth Edition

We had five major goals for the fourth edition of Computer Organization and Design: given the multicore revolution in microprocessors, highlight parallel hardware and software topics throughout the book; streamline the existing material to make room for topics on parallelism; enhance pedagogy in general; update the technical content to reflect changes in the industry since the publication of the third edition in 2004; and restore the usefulness of exercises in this Internet age.

Before discussing the goals in detail, let’s look at the table on the next page. It shows the hardware and software paths through the material. Chapters 1, 4, 5, and 7 are found on both paths, no matter what the experience or the focus. Chapter 1 is a new introduction that includes a discussion on the importance of power and how it motivates the switch from single core to multicore microprocessors. It also includes performance and benchmarking material that was a separate chapter in the third edition. Chapter 2 is likely to be review material for the hardware-oriented, but it is essential reading for the software-oriented, especially for those readers interested in learning more about compilers and object-oriented programming