Revised Fourth Edition

Computer Organization and Design
The Hardware/Software Interface

David A. Patterson
University of California, Berkeley

John L. Hennessy
Stanford University

With contributions by

Perry Alexander, The University of Kansas
Peter J. Ashenden, Ashenden Designs Pty Ltd
Javier Bruguera, Universidade de Santiago de Compostela
Jichuan Chang, Hewlett-Packard
Matthew Farrens, University of California, Davis
David Kaeli, Northeastern University
Nicole Kaiyan, University of Adelaide
David Kirk, NVIDIA
James R. Larus, Microsoft Research
Jacob Leverich, Hewlett-Packard
Kevin Lim, Hewlett-Packard
John Nickolls, NVIDIA
John Oliver, Cal Poly, San Luis Obispo
Milos Prvulovic, Georgia Tech
Partha Ranganathan, Hewlett-Packard

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an imprint of Elsevier
Acquiring Editor: Todd Green
Development Editor: Nate McFadden
Project Manager: Jessica Vaughan
Designer: Eric DeCicco

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

© 2012 Elsevier, Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

Patterson, David A.
Computer organization and design: the hardware/software interface / David A. Patterson, John L. Hennessy. -- 4th ed.
p. cm. -- (The Morgan Kaufmann series in computer architecture and design)
Rev. ed. of: Computer organization and design / John L. Hennessy, David A. Patterson. 1998.
Summary: “Presents the fundamentals of hardware technologies, assembly language, computer arithmetic, pipelining, memory hierarchies and I/O” -- Provided by publisher.
ISBN 978-0-12-374750-1 (pbk.)
1. Computer organization. 2. Computer engineering. 3. Computer interfaces. I. Hennessy, John L. II. Hennessy, John L. Computer organization and design. III. Title.
QA76.9.C643H46 2011
004.2´2--dc23
2011029199

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-374750-1

Printed in the United States of America
12 13 14 15 16 10 9 8 7 6 5 4 3 2

For information on all MK publications visit our website at www.mkp.com
Contents

Preface

CHAPTERS

1 Computer Abstractions and Technology
1.1 Introduction
1.2 Below Your Program
1.3 Under the Covers
1.4 Performance
1.5 The Power Wall
1.6 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.7 Real Stuff: Manufacturing and Benchmarking the AMD Opteron X4
1.8 Fallacies and Pitfalls
1.9 Concluding Remarks
1.10 Historical Perspective and Further Reading
1.11 Exercises

2 Instructions: Language of the Computer
2.1 Introduction
2.2 Operations of the Computer Hardware
2.3 Operands of the Computer Hardware
2.4 Signed and Unsigned Numbers
2.5 Representing Instructions in the Computer
2.6 Logical Operations
2.7 Instructions for Making Decisions
2.8 Supporting Procedures in Computer Hardware
2.9 Communicating with People
2.10 MIPS Addressing for 32-Bit Immediates and Addresses
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
2.13 A C Sort Example to Put It All Together
2.14 Arrays versus Pointers
2.15 Advanced Material: Compiling C and Interpreting Java
2.16 Real Stuff: ARM Instructions
2.17 Real Stuff: x86 Instructions
2.18 Fallacies and Pitfalls
2.19 Concluding Remarks
2.20 Historical Perspective and Further Reading
2.21 Exercises

3 Arithmetic for Computers
3.1 Introduction
3.2 Addition and Subtraction
3.3 Multiplication
3.4 Division
3.5 Floating Point
3.6 Parallelism and Computer Arithmetic: Associativity
3.7 Real Stuff: Floating Point in the x86
3.8 Fallacies and Pitfalls
3.9 Concluding Remarks
3.10 Historical Perspective and Further Reading
3.11 Exercises

4 The Processor
4.1 Introduction
4.2 Logic Design Conventions
4.3 Building a Datapath
4.4 A Simple Implementation Scheme
4.5 An Overview of Pipelining
4.6 Pipelined Datapath and Control
4.7 Data Hazards: Forwarding versus Stalling
4.8 Control Hazards
4.9 Exceptions
4.10 Parallelism and Advanced Instruction-Level Parallelism
4.11 Real Stuff: the AMD Opteron X4 (Barcelona) Pipeline
4.12 Advanced Topic: an Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
4.13 Fallacies and Pitfalls
4.14 Concluding Remarks
4.15 Historical Perspective and Further Reading
4.16 Exercises

5 Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
5.2 The Basics of Caches
5.3 Measuring and Improving Cache Performance
5.4 Virtual Memory
5.5 A Common Framework for Memory Hierarchies
5.6 Virtual Machines
5.7 Using a Finite-State Machine to Control a Simple Cache
5.8 Parallelism and Memory Hierarchies: Cache Coherence
5.9 Advanced Material: Implementing Cache Controllers
5.10 Real Stuff: the AMD Opteron X4 (Barcelona) and Intel Nehalem Memory Hierarchies
5.11 Fallacies and Pitfalls
5.12 Concluding Remarks
5.13 Historical Perspective and Further Reading
5.14 Exercises

6 Storage and Other I/O Topics
6.1 Introduction
6.2 Dependability, Reliability, and Availability
6.3 Disk Storage
6.4 Flash Storage
6.5 Connecting Processors, Memory, and I/O Devices
6.6 Interfacing I/O Devices to the Processor, Memory, and Operating System
6.7 I/O Performance Measures: Examples from Disk and File Systems
6.8 Designing an I/O System
6.9 Parallelism and I/O: Redundant Arrays of Inexpensive Disks
6.10 Real Stuff: Sun Fire x4150 Server
6.11 Advanced Topics: Networks
6.12 Fallacies and Pitfalls
6.13 Concluding Remarks
6.14 Historical Perspective and Further Reading
6.15 Exercises

7 Multicores, Multiprocessors, and Clusters
7.1 Introduction
7.2 The Difficulty of Creating Parallel Processing Programs
7.3 Shared Memory Multiprocessors