Preface Database management has evolved from a specialized computer application to a doral component of a modern computing environment,and,as a resu central co sult kr edge art of an edu compute science.In this text.v A management.The e aspects of databasedesign database lan guages,and database-system implementation. This text is intended for a first course in databases at the junior or senior undergraduate,or first-year graduate,level.In addition to basic material for a first course,the text contains advanced material that can be used for course supplements,or as introductory material for an advanced course. We assume only a familiarity with basic data structures,computer organi- zation,and a high-level programming language such as Java,C,or Pascal.We hy a mitted.In place ormal d proofs,figures iption and xam s are sed to ugges s tru be foun in research papers and advanced texts that are referenced in the biblio graphical notes The fundamenta l concepts and algorithms covered in the book are often based on those used in existing commercial or experimental database systems Our aim is to present these concepts and algorithms in a general setting that is not tied to one particular database system.Details of particular database systems are discussed in Part 9,"Case Studies." In this.the sixth edition of Database sustem Concents.we have retained the overall style of the prior editions while evolving the content and organization to reflect the changes that areoccurring in the way databases are designed,managed, and used we have also taken in ount trends in the teachir ng of database concepts and made adaptations to facilitate th se trends where appropriate
Preface Database management has evolved from a specialized computer application to a central component of a modern computing environment, and, as a result, knowledge about database systems has become an essential part of an education in computer science. In this text, we present the fundamental concepts of database management. These concepts include aspects of database design, database languages, and database-system implementation. This text is intended for a first course in databases at the junior or senior undergraduate, or first-year graduate, level. In addition to basic material for a first course, the text contains advanced material that can be used for course supplements, or as introductory material for an advanced course. We assume only a familiarity with basic data structures, computer organization, and a high-level programming language such as Java, C, or Pascal. We present concepts as intuitive descriptions, many of which are based on our running example of a university. Important theoretical results are covered, but formal proofs are omitted. In place of proofs, figures and examples are used to suggest why a result is true. Formal descriptions and proofs of theoretical results may be found in research papers and advanced texts that are referenced in the bibliographical notes. The fundamental concepts and algorithms covered in the book are often based on those used in existing commercial or experimental database systems. Our aim is to present these concepts and algorithms in a general setting that is not tied to one particular database system. Details of particular database systems are discussed in Part 9, “Case Studies.” In this, the sixth edition of Database System Concepts, we have retained the overall style of the prior editions while evolving the content and organization to reflect the changes that are occurring in the way databases are designed, managed, and used. We have also taken into account trends in the teaching of database concepts and made adaptations to facilitate these trends where appropriate. xv
xvi Preface Organization The text is organized in nine major parts,plus five appendices. and purpose of database systems. system has developed,what the common features of dat se systems are, what a database system does for the user,and how a database system in- terfaces with operating systems.We also introduce an example database application:a university organization consisting of multiple departments, instructors,students,and courses.This application is used as a running ex- ample throughout the book.This chapter is motivational,historical,and ex- planatory in nature. Part 1:Relational databases(Chapters 2 through 6).Chapter 2 introduces the relational model of data,covering basic concepts such as the structure of relational databases database schemas,keys,sch adia 10 ns. 2ptes34an5 mos Chapter 6 cov ers the formal relational query languages:relational algebra,tuple relational calculus,and domain relational calculus. The chapters in this part describe data manipulation:queries,updates,in- sertions,and deletions,assuming a schema design has been provided.Schema design issues are deferred to Part 2. Part 2:Database Design(Chapters 7 through 9).Chapter 7 provides an overview of the databas process,with major emphasis on database design using the entity-relationship data model.The e entity-relationship data model provides a high-level view of the issues in database design,and of the s that we encounte UML also covered i Chapter 8 introduces the theory of relational database design.The the- ory of functional dependencies and normalization is covered,with emphasis on the motivation and intuitive understanding of each normal form.This chapter begins with an overview of relational design and relies on an intu- itive understanding of logical implication of functional dependencies.This allows the concept of normalization to be introduced prior to full coverage of functional-dependency theory.which is presented later in the chapter.In- structors may choose to use only this initial coverage in Sections 8.1 through 8.3 without loss of continuity.Instructors co od u derstandi enging concept Web-basedn 0 eae the construction of databas
xvi Preface Organization The text is organized in nine major parts, plus five appendices. • Overview (Chapter 1). Chapter 1 provides a general overview of the nature and purpose of database systems. We explain how the concept of a database system has developed, what the common features of database systems are, what a database system does for the user, and how a database system interfaces with operating systems. We also introduce an example database application: a university organization consisting of multiple departments, instructors, students, and courses. This application is used as a running example throughout the book. This chapter is motivational, historical, and explanatory in nature. • Part 1: Relational Databases (Chapters 2 through 6). Chapter 2 introduces the relational model of data, covering basic concepts such as the structure of relational databases, database schemas, keys, schema diagrams, relational query languages, and relational operations. Chapters 3, 4, and 5 focus on the most influential of the user-oriented relational languages: SQL. Chapter 6 covers the formal relational query languages: relational algebra, tuple relational calculus, and domain relational calculus. The chapters in this part describe data manipulation: queries, updates, insertions, and deletions, assuming a schema design has been provided. Schema design issues are deferred to Part 2. • Part 2: Database Design (Chapters 7 through 9). Chapter 7 provides an overview of the database-design process, with major emphasis on database design using the entity-relationship data model. The entity-relationship data model provides a high-level view of the issues in database design, and of the problems that we encounter in capturing the semantics of realistic applications within the constraints of a data model. UML class-diagram notation is also covered in this chapter. Chapter 8 introduces the theory of relational database design. The theory of functional dependencies and normalization is covered, with emphasis on the motivation and intuitive understanding of each normal form. This chapter begins with an overview of relational design and relies on an intuitive understanding of logical implication of functional dependencies. This allows the concept of normalization to be introduced prior to full coverage of functional-dependency theory, which is presented later in the chapter. Instructors may choose to use only this initial coverage in Sections 8.1 through 8.3 without loss of continuity. Instructors covering the entire chapter will benefit from students having a good understanding of normalization concepts to motivate some of the challenging concepts of functional-dependency theory. Chapter 9 covers application design and development. This chapter emphasizes the construction of database applications with Web-based interfaces. In addition, the chapter covers application security
Preface xvii Part 3:Datatornhapters 10 through 13).Chapter 1 als with A variet acc tree indice and hashing.Chapters 12 and 13 address query-evaluati on algo rithms and query optimization.These chapters provide an understanding of the internals of the storage and retrieval components of a database. Part 4:Transaction Management(Chapters 14 through 16).Chapter 14 fo- cuses on the fundamentals of a transaction-processing system:atomicity, consistency,isolation,and durability.It provides an overview of the methods used to en ure these properties,including locking and snapshot isolation. Chapter 15 focuses on concurrency control and for serializability,includi ng locking,ti and ation ch The ping apte tives to ser widely-use ho y are cover h is disc ssed in detail Chapter 16 covers the primary techniques for ensuring correct transac. tion execution despite system crashes and storage failures.These techniques include logs,checkpoints,and database dumps.The widely-used ARIES al- gorithm is presented. Part 5:System Architecture (Chapters 17 through 19).Chapter 17 covers compute system architecture.and describes the influence of the underly puter system on the database system.We discuss centralized sy ms,client-server systems,and paralleland distributed architectures in this 18,on parallel datab expl ores a variety of parallelization tec iques,incl ding I/parallelism,interquery and intraquery parallelism and interoperation and intraoperation parallelism.The chapter also describes Paral 1-system design. Chapter 19 covers distributed database systems,revisiting the issues of database design,transaction management,and query evaluation and op timization,in the context of distributed databases.The chapter also cov ers issues of system availability during failures,heterogeneous distributed databases,cloud-based databases,and distributed directory systems. Part 6:Data warehousin Data Mining and Information Retrieval(Chap s20and21). Ch 20 and data apte epts of data min 21 d querying textua data,including hyperlink-based techniques used search engines Part 6 uses the modeling and language concepts from Parts 1 and 2,but does not depend on Parts 3,4,or 5.It can therefore be incorporated easily into a course that focuses on SQL and on database design
Preface xvii • Part 3: Data Storage and Querying (Chapters 10 through 13). Chapter 10 deals with storage devices, files, and data-storage structures. A variety of data-access techniques are presented in Chapter 11, including B+-tree indices and hashing. Chapters 12 and 13 address query-evaluation algorithms and query optimization. These chapters provide an understanding of the internals of the storage and retrieval components of a database. • Part 4: Transaction Management (Chapters 14 through 16). Chapter 14 focuses on the fundamentals of a transaction-processing system: atomicity, consistency, isolation, and durability. It provides an overview of the methods used to ensure these properties, including locking and snapshot isolation. Chapter 15 focuses on concurrency control and presents several techniques for ensuring serializability, including locking, timestamping, and optimistic (validation) techniques. The chapter also covers deadlock issues. Alternatives to serializability are covered, most notably the widely-used snapshot isolation, which is discussed in detail. Chapter 16 covers the primary techniques for ensuring correct transaction execution despite system crashes and storage failures. These techniques include logs, checkpoints, and database dumps. The widely-used ARIES algorithm is presented. • Part 5: System Architecture (Chapters 17 through 19). Chapter 17 covers computer-system architecture, and describes the influence of the underlying computer system on the database system. We discuss centralized systems, client–server systems, and parallel and distributed architectures in this chapter. Chapter 18, on parallel databases, explores a variety of parallelization techniques, including I/O parallelism, interquery and intraquery parallelism, and interoperation and intraoperation parallelism. The chapter also describes parallel-system design. Chapter 19 covers distributed database systems, revisiting the issues of database design, transaction management, and query evaluation and optimization, in the context of distributed databases. The chapter also covers issues of system availability during failures, heterogeneous distributed databases, cloud-based databases, and distributed directory systems. • Part 6: Data Warehousing, Data Mining, and Information Retrieval (Chapters 20 and 21). Chapter 20 introduces the concepts of data warehousing and data mining. Chapter 21 describes information-retrieval techniques for querying textual data, including hyperlink-based techniques used in Web search engines. Part 6 uses the modeling and language concepts from Parts 1 and 2, but does not depend on Parts 3, 4, or 5. It can therefore be incorporated easily into a course that focuses on SQL and on database design
xviii Preface Part 7:Specialty Databases(Chapters 22 and 23).Chapter 22 covers object- based databases.The chapter describes the object-relational data model, object-oriented p rogramming lanc apter 23c Mstandard representation,which is see incr 8 ing use in the exchange and storage of complex data.The chapterals describes query languages for XMI .Part 8:Advanced Topics(Chapters 24 through 26).Chapter 24 covers ad- vanced issues in application development,including performance tuning, performance benchmarks,database-application testing,and standardization. Chapter 25 covers spatial and geographic data,temporal data,multimedia data,and issues in the management of mobile and personal databases. Finally,Chapter 26 deals with advanced transaction processing.Top ics covered in the chapter include transaction-processing monitors,transac- tional workflo ws.electro onic commerce,high erformance transaction sys tems,real-time ansaction n syste ms,and long-duration transactions Part 9:Case Studies(Cha apters 27 through 30).In this die a we pre s of fo resent cas r of e lead ng datab system Oracle and Microsoft sQ erver nese chapter s outline unique features of ea these systems,and describe their internal structure.They provide a wealth of interesting information about the respective products,and help you see how the various implementation techniques described in earlier parts are used in real systems.They also cover several interesting practical aspects in the design of real systems. Appendices.We provide five appendices that cover material that is of histor nature or is advanced:thes ppendices are available Web site of the book(http:// m).An otion is A ndix A our unive sity sch the full schema bles.Thi appears in t e actua 1 App r relational query languages,including QBE and Datal 0g Appendix de es advanced relational database design,including the theory of multivalued dependencies,join dependencies,and the project-join and domain-kev normal forms.This appendix is for the benefit of individuals who wish to study the theory of relational database design in more detail, and instructors who wish to do so in their courses.This appendix,too,is available only online.on the web site of the book. Although most new database ostew database applcations use either the relational etwork and hierarchical data models are still in use in s ome lega and habout dat Yelications,Eorthebenehtofreadswhodna eprovide erarc ata mc odels,in Appendic and E respectively
xviii Preface • Part 7: Specialty Databases (Chapters 22 and 23). Chapter 22 covers objectbased databases. The chapter describes the object-relational data model, which extends the relational data model to support complex data types, type inheritance, and object identity. The chapter also describes database access from object-oriented programming languages. Chapter 23 covers the XML standard for data representation, which is seeing increasing use in the exchange and storage of complex data. The chapter also describes query languages for XML. • Part 8: Advanced Topics (Chapters 24 through 26). Chapter 24 covers advanced issues in application development, including performance tuning, performance benchmarks, database-application testing, and standardization. Chapter 25 covers spatial and geographic data, temporal data, multimedia data, and issues in the management of mobile and personal databases. Finally, Chapter 26 deals with advanced transaction processing. Topics covered in the chapter include transaction-processing monitors, transactional workflows, electronic commerce, high-performance transaction systems, real-time transaction systems, and long-duration transactions. • Part 9: Case Studies (Chapters 27 through 30). In this part, we present case studies of four of the leading database systems, PostgreSQL, Oracle, IBM DB2, and Microsoft SQL Server. These chapters outline unique features of each of these systems, and describe their internal structure. They provide a wealth of interesting information about the respective products, and help you see how the various implementation techniques described in earlier parts are used in real systems. They also cover several interesting practical aspects in the design of real systems. • Appendices. We provide five appendices that cover material that is of historical nature or is advanced; these appendices are available only online on the Web site of the book (http://www.db-book.com). An exception is Appendix A, which presents details of our university schema including the full schema, DDL, and all the tables. This appendix appears in the actual text. Appendix B describes other relational query languages, including QBE Microsoft Access, and Datalog. Appendix C describes advanced relational database design, including the theory of multivalued dependencies, join dependencies, and the project-join and domain-key normal forms. This appendix is for the benefit of individuals who wish to study the theory of relational database design in more detail, and instructors who wish to do so in their courses. This appendix, too, is available only online, on the Web site of the book. Although most new database applications use either the relational model or the object-relational model, the network and hierarchical data models are still in use in some legacy applications. For the benefit of readers who wish to learn about these data models, we provide appendices describing the network and hierarchical data models, in Appendices D and E respectively
Preface xix The Sixth Edition The production of this sixth edition has been guided by the many comments and suggestions we received concerning the earlier editions,by our own observations while teaching at Yale University,Lehigh University,and IIT Bombay,and by our analysis of the directions in which database technology is evolving We have replaced the earlier running example of bank enterprise with a uni- versity example.This example has an immediate intuitive connection to students ing deeper insight into the various desigr ve re nized the book so as to collect all of our together e it early in ok.Cha ers 3,4,and 5pr sent com nts the basics of the language,with Chapter 4.In Chapter 5,we present JDBC along with other means of acces SQL from a general-purpose programming language.We present triggers and re cursion,and then conclude with coverage of online analytic proces ng (OLAP) Introductory courses may choose to cover only certain sections of Chapter 5 or defer sections until after the coverage of database design without loss of continu- Beyond these two major changes,we revised the material in each chapter, bringi d diffic ult to d.We h dded ises and updated fspecific changes in s the foll ·Earlier coverage of SQL.Many instruc rs us SQL as a key component of term projects see our We eb site,www k.com,for sample projects).In order to give students ample time for the projects,particularly for universities and colleges on the quarter system,it is essential to teach SQL as early as possible With this in mind,we have undertaken several changes in organization: A new chapter on the relational model (Chapter 2)precedes QL,laying the conceptual foundation,without getting lost in details of relational algebra. o Chapters 3,4,and 5 provide detailed coverage of SQL.These chapters also discuss variants supported by different database systems,to minimize problems that students face when they execute queries on actual database systems.These chapters cover all asp cts of soI includin queries,data definition constraint s ecification,OLAP,and the use of SOL fr rom within a variety of languages, ncluding Java/JDBC. o Formal lar 9 t er 6)have been postpon ed to after SQL and can ted other chapters.O discussion of query optimization in Chapter 13 depends on the relational my ou algebra coverage of Chapter 6
Preface xix The Sixth Edition The production of this sixth edition has been guided by the many comments and suggestions we received concerning the earlier editions, by our own observations while teaching at Yale University, Lehigh University, and IIT Bombay, and by our analysis of the directions in which database technology is evolving. We have replaced the earlier running example of bank enterprise with a university example. This example has an immediate intuitive connection to students that assists not only in remembering the example, but, more importantly, in gaining deeper insight into the various design decisions that need to be made. We have reorganized the book so as to collect all of our SQL coverage together and place it early in the book. Chapters 3, 4, and 5 present complete SQL coverage. Chapter 3 presents the basics of the language, with more advanced features in Chapter 4. In Chapter 5, we present JDBC along with other means of accessing SQL from a general-purpose programming language. We present triggers and recursion, and then conclude with coverage of online analytic processing (OLAP). Introductory courses may choose to cover only certain sections of Chapter 5 or defer sections until after the coverage of database design without loss of continuity. Beyond these two major changes, we revised the material in each chapter, bringing the older material up-to-date, adding discussions on recent developments in database technology, and improving descriptions of topics that students found difficult to understand. We have also added new exercises and updated references. The list of specific changes includes the following: • Earlier coverage of SQL. Many instructors use SQL as a key component of term projects (see our Web site, www.db-book.com, for sample projects). In order to give students ample time for the projects, particularly for universities and colleges on the quarter system, it is essential to teach SQL as early as possible. With this in mind, we have undertaken several changes in organization: ◦ A new chapter on the relational model (Chapter 2) precedes SQL, laying the conceptual foundation, without getting lost in details of relational algebra. ◦ Chapters 3, 4, and 5 provide detailed coverage of SQL. These chapters also discuss variants supported by different database systems, to minimize problems that students face when they execute queries on actual database systems. These chapters cover all aspects of SQL, including queries, data definition, constraint specification, OLAP, and the use of SQL from within a variety of languages, including Java/JDBC. ◦ Formal languages (Chapter 6) have been postponed to after SQL, and can be omitted without affecting the sequencing of other chapters. Only our discussion of query optimization in Chapter 13 depends on the relational algebra coverage of Chapter 6