Contents XXI 28.3 Mobile Databases 28.4 Main Memory Databases 28.5 Multimedia Databases 28.6 Geographic Information Systems 28.7 Temporal and Sequence Databases 28 8 Information Visualization 28.9 Summary a databaSE DESIGN CASE STUDY: THE INTERNET SHOP A 1 Requirements Analy A2 Conceptual Design A 3 Logical database design A 4 Schema Refinement A.5 Physical Database Design A.5.1 Tuning the Database A 6 Security A7 Application Layers B THE MINIBASE SOFTWARE 842 B. 1 What's Available B2 Overview of Minibase Assignments 843 B.2.1 Overview of Programming Projects B.2.2 Overview of Nonprogramming Assignme B3 Acknowledgments REFERENCES SUBJECT INDEX AUTHOR INDEX
Contents xxi 28.3 Mobile Databases 825 28.4 Main Memory Databases 825 28.5 Multimedia Databases 826 28.6 Geographic Information Systems 827 28.7 Temporal and Sequence Databases 828 28.8 Information Visualization 829 28.9 Summary 829 A DATABASE DESIGN CASE STUDY: THE INTERNET SHOP 831 A.1 Requirements Analysis 831 A.2 Conceptual Design 832 A.3 Logical Database Design 832 A.4 Schema Refinement 835 A.5 Physical Database Design 836 A.5.1 Tuning the Database 838 A.6 Security 838 A.7 Application Layers 840 B THE MINIBASE SOFTWARE 842 B.1 What’s Available 842 B.2 Overview of Minibase Assignments 843 B.2.1 Overview of Programming Projects 843 B.2.2 Overview of Nonprogramming Assignments 844 B.3 Acknowledgments 845 REFERENCES 847 SUBJECT INDEX 879 AUTHOR INDEX 896
PREFACE dvantage of doing one's praising for oneself is that one can lay it on so thick xactly in the right places aging information, and a course on the principles and practice of database SybLeBsR Database management systems have become ubiquitous as a fundamental tool for ma now an integral part of computer science curricula. This book covers the fundamentals of modern database management systems, in particular relational database systems It is intended as a text for an introductory database course for undergraduates, and we have attempted to present the material in a clear, simple style a quantitative approach is used throughout and detailed examples abound. An exten- sive set of exercises(for which solutions are available online to instructors)accompanies each chapter and reinforces students'ability to apply the concepts to real problems. The book contains enough material to support a second course, ideally supplemented by selected research papers. It can be used, with the accompanying software and SQL programming assignments, in two distinct kinds of introductory courses 1. A course that aims to present the principles of database systems, with a practical focus but without any implementation assignments. The SQL programming as- signments are a useful supplement for such a course. The supplementary minibase software can be used to create exercises and experiments with no programming 2. A course that has a strong systems emphasis and assumes that students have good programming skills in C and C++. In this case the software can be used as the basis for projects in which students are asked to implement various parts of a relational DBMS. Several central modules in the project software (e.g, heap files, buffer manager, B+ trees, hash indexes, various join methods, concurrency control, and recovery algorithms) are described in sufficient detail in the text te enable students to implement them, given the(c++) class interface Many instructors will no doubt teach a course that falls between these two extremes
PREFACE The advantage of doing one’s praising for oneself is that one can lay it on so thick and exactly in the right places. —Samuel Butler Database management systems have become ubiquitous as a fundamental tool for managing information, and a course on the principles and practice of database systems is now an integral part of computer science curricula. This book covers the fundamentals of modern database management systems, in particular relational database systems. It is intended as a text for an introductory database course for undergraduates, and we have attempted to present the material in a clear, simple style. A quantitative approach is used throughout and detailed examples abound. An extensive set of exercises (for which solutions are available online to instructors) accompanies each chapter and reinforces students’ ability to apply the concepts to real problems. The book contains enough material to support a second course, ideally supplemented by selected research papers. It can be used, with the accompanying software and SQL programming assignments, in two distinct kinds of introductory courses: 1. A course that aims to present the principles of database systems, with a practical focus but without any implementation assignments. The SQL programming assignments are a useful supplement for such a course. The supplementary Minibase software can be used to create exercises and experiments with no programming. 2. A course that has a strong systems emphasis and assumes that students have good programming skills in C and C++. In this case the software can be used as the basis for projects in which students are asked to implement various parts of a relational DBMS. Several central modules in the project software (e.g., heap files, buffer manager, B+ trees, hash indexes, various join methods, concurrency control, and recovery algorithms) are described in sufficient detail in the text to enable students to implement them, given the (C++) class interfaces. Many instructors will no doubt teach a course that falls between these two extremes. xxii
Preface Choice of Topics The choice of material has been infuenced by these considerations: I To concentrate on issues central to the design, tuning, and implementation of rela- tional database applications. However, many of the issues discussed(e. g, buffering d access methods) are not specific to relational systems, and additional topics such as decision support and object-database systems are covered in later chapters To provide adequate coverage of implementation topics to support a concurrent laboratory section or course project. For example, implementation of relational operations has been covered in more detail than is necessary in a first course However, the variety of alternative implementation techniques permits a wide choice of project assignments. An instructor who wishes to assign implementation of sort-merge join might cover that topic in depth, whereas another might choose a To provide in-depth coverage of the state of the art in currently available commer- cial systems, rather than a broad coverage of several alternatives. For example we discuss the relational data model, B+ trees, SQL, System R style query op- timization, lock-based concurrency control, the ARIES recovery algorithm, the two-phase commit protocol, asynchronous replication in distributed databases and object-relational DBMSs in detail, with numerous illustrative examples. This is made possible by omitting or briefly covering some related topics such as the hierarchical and network models, B tree variants, Quel, semantic query optimiza- tion, view serializability, the shadow-page recovery algorithm, and the three-phase commit protocoL. The same preference for in-depth coverage of selected topics governed our choice of topics for chapters on advanced material. Instead of covering a broad range of topics briefly, we have chosen topics that we believe to be practically important and at the cutting edge of current thinking in database systems, and we have covered them in depth New in the second edition Based on extensive user surveys and feedback, we have refined the books organization. The major change is the early introduction of the ER model, together with a discussion of conceptual database design. As in the first edition, we introduce SQL-92's data definition features together with the relational model (in Chapter 3), and whenever appropriate, relational model concepts(e.g, definition of a relation, updates, views, ER to relational mapping) are illustrated and discussed in the context of SQL. Of course, we maintain a careful separation between the concepts and their SQL realization. The naterial on data storage, file organization, and indexes has been moved back, and the
Preface xxiii Choice of Topics The choice of material has been influenced by these considerations: To concentrate on issues central to the design, tuning, and implementation of relational database applications. However, many of the issues discussed (e.g., buffering and access methods) are not specific to relational systems, and additional topics such as decision support and object-database systems are covered in later chapters. To provide adequate coverage of implementation topics to support a concurrent laboratory section or course project. For example, implementation of relational operations has been covered in more detail than is necessary in a first course. However, the variety of alternative implementation techniques permits a wide choice of project assignments. An instructor who wishes to assign implementation of sort-merge join might cover that topic in depth, whereas another might choose to emphasize index nested loops join. To provide in-depth coverage of the state of the art in currently available commercial systems, rather than a broad coverage of several alternatives. For example, we discuss the relational data model, B+ trees, SQL, System R style query optimization, lock-based concurrency control, the ARIES recovery algorithm, the two-phase commit protocol, asynchronous replication in distributed databases, and object-relational DBMSs in detail, with numerous illustrative examples. This is made possible by omitting or briefly covering some related topics such as the hierarchical and network models, B tree variants, Quel, semantic query optimization, view serializability, the shadow-page recovery algorithm, and the three-phase commit protocol. The same preference for in-depth coverage of selected topics governed our choice of topics for chapters on advanced material. Instead of covering a broad range of topics briefly, we have chosen topics that we believe to be practically important and at the cutting edge of current thinking in database systems, and we have covered them in depth. New in the Second Edition Based on extensive user surveys and feedback, we have refined the book’s organization. The major change is the early introduction of the ER model, together with a discussion of conceptual database design. As in the first edition, we introduce SQL-92’s data definition features together with the relational model (in Chapter 3), and whenever appropriate, relational model concepts (e.g., definition of a relation, updates, views, ER to relational mapping) are illustrated and discussed in the context of SQL. Of course, we maintain a careful separation between the concepts and their SQL realization. The material on data storage, file organization, and indexes has been moved back, and the
DATABASE MANAGEMENT SYSTEMS material on relational queries has been moved forward. Nonetheless, the two parts (storage and organization vs. queries) can still be taught in either order based on the instructor's preferences In order to facilitate brief coverage in a first course. the second edition contains overview chapters on transaction processing and query optimization. Most chapters have been revised extensively, and additional explanations and figures have been added in many places. For example, the chapters on query languages now contain a uniform numberin of all queries to facilitate comparisons of the same query (in algebra, calculus, and SQL), and the results of several queries are shown in figures. JDBC and ODBC coverage has been added to the sQL query chapter and SQL: 1999 features are discussed both in this chapter and the chapter on object-relational databases. A discussion of RAID has been added to Chapter 7. We have added a new database design case study, illustrating the entire design cycle, as an appendix. Two new pedagogical features have been introduced. First, "floating boxes provide ad- ditional perspective and relate the concepts to real systems, while keeping the cussion free of product-specific details. Second, each chapter concludes with a'Point to Review' section that summarizes the main ideas introduced in the chapter and ncludes pointers to the sections where they are discussed For use in a second course, many advanced chapters from the first edition have been extended or split into multiple chapters to provide thorough coverage of current top- ics. In particular, new material has been added to the chapters on decision support, deductive databases, and object databases. New chapters on Internet databases, data mining, and spatial databases have been added, greatly expanding the coverage of these topics. The material can be divided into roughly seven parts, as indicated in Figure 0.1, which also shows the dependencies between chapters. An arrow from Chapter I to Chapter J means that I depends on material in J. The broken arrows indicate a weak dependency, which can be ignored at the instructors discretion. It is recommended that Part I be covered first, followed by Part II and Part III (in either order). Other than these three parts, dependencies across parts are minimal Order of presentation The book's modular organization offers instructors a variety of choices. For exam- sIe, some instructors will want to cover SQL and get students to use a relational database, before discussing file organizations or indexing: they should cover Part II before Part Ill. In fact, in a course that emphasizes concepts and SQL, many of the implementation-oriented chapters might be skipped. On the other hand, instructor assigning implementation projects based on file organizations may want to cover Part
xxiv Database Management Systems material on relational queries has been moved forward. Nonetheless, the two parts (storage and organization vs. queries) can still be taught in either order based on the instructor’s preferences. In order to facilitate brief coverage in a first course, the second edition contains overview chapters on transaction processing and query optimization. Most chapters have been revised extensively, and additional explanations and figures have been added in many places. For example, the chapters on query languages now contain a uniform numbering of all queries to facilitate comparisons of the same query (in algebra, calculus, and SQL), and the results of several queries are shown in figures. JDBC and ODBC coverage has been added to the SQL query chapter and SQL:1999 features are discussed both in this chapter and the chapter on object-relational databases. A discussion of RAID has been added to Chapter 7. We have added a new database design case study, illustrating the entire design cycle, as an appendix. Two new pedagogical features have been introduced. First, ‘floating boxes’ provide additional perspective and relate the concepts to real systems, while keeping the main discussion free of product-specific details. Second, each chapter concludes with a ‘Points to Review’ section that summarizes the main ideas introduced in the chapter and includes pointers to the sections where they are discussed. For use in a second course, many advanced chapters from the first edition have been extended or split into multiple chapters to provide thorough coverage of current topics. In particular, new material has been added to the chapters on decision support, deductive databases, and object databases. New chapters on Internet databases, data mining, and spatial databases have been added, greatly expanding the coverage of these topics. The material can be divided into roughly seven parts, as indicated in Figure 0.1, which also shows the dependencies between chapters. An arrow from Chapter I to Chapter J means that I depends on material in J. The broken arrows indicate a weak dependency, which can be ignored at the instructor’s discretion. It is recommended that Part I be covered first, followed by Part II and Part III (in either order). Other than these three parts, dependencies across parts are minimal. Order of Presentation The book’s modular organization offers instructors a variety of choices. For example, some instructors will want to cover SQL and get students to use a relational database, before discussing file organizations or indexing; they should cover Part II before Part III. In fact, in a course that emphasizes concepts and SQL, many of the implementation-oriented chapters might be skipped. On the other hand, instructors assigning implementation projects based on file organizations may want to cover Part
Preface XXV 9 Relational Algebra Introduction, Data Storage 8 ER Model Introduction to 10 nceptual Design SQL Queries, etc. File Organizations( Hash Indexes 3 RelationalModel SQL DDL 12 14 External Sorting Introduction to A Typical Relational Operat Query Optimization(Relational Optimizer 15 Ds Normalization Design, Tuning Distributed Dbs 18 19 20 Transaction Mgmt Concurrency Internet Overview Control Databases VIl Decision Spatial Deductive Additional Databases Figure 0.1 Chapter Organization and Dependence Ill early to space assignments. As another example, it is not necessary to cover all the alternatives for a given operator(e.g, various techniques for joins)in Chapter 12 in order to cover later related material (e.g, on optimization or tuning) adequately. The database design case study in the appendix can be discussed concurrently with the appropriate design chapters, or it can be discussed after all design topics have been covered. as a review. Several section headings contain an asterisk. This symbol does not necessarily indicate a higher level of difficulty. Rather, omitting all asterisked sections leaves about the right amount of material in Chapters 1-18, possibly omitting Chapters 6, 10, and 14 for a broad introductory one-quarter or one-semester course(depending on the depth at which the remaining material is discussed and the nature of the course assignments)
Preface xxv Introduction, 2 ER Model Conceptual Design 1 QBE 5 4 Relational Algebra and Calculus 6 7 8 Introduction to File Organizations Hash Indexes 10 Tree Indexes 9 II III I V Schema Refinement, 16 17 Database Security Physical DB Design, Tuning 15 VI Transaction Mgmt 19 20 Concurrency 18 Overview Control Crash Recovery 13 Introduction to 11 External Sorting 14 Relational Optimizer IV A Typical 3 Relational Model SQL DDL VII Parallel and Distributed DBs 21 22 FDs, Normalization Evaluation of Relational Operators 12 Query Optimization Data Storage Internet Databases Decision 23 24 Object-Database Systems 25 Databases Spatial 26 Additional Topics 27 28 Mining Data Support Deductive Databases SQL Queries, etc. Figure 0.1 Chapter Organization and Dependencies III early to space assignments. As another example, it is not necessary to cover all the alternatives for a given operator (e.g., various techniques for joins) in Chapter 12 in order to cover later related material (e.g., on optimization or tuning) adequately. The database design case study in the appendix can be discussed concurrently with the appropriate design chapters, or it can be discussed after all design topics have been covered, as a review. Several section headings contain an asterisk. This symbol does not necessarily indicate a higher level of difficulty. Rather, omitting all asterisked sections leaves about the right amount of material in Chapters 1–18, possibly omitting Chapters 6, 10, and 14, for a broad introductory one-quarter or one-semester course (depending on the depth at which the remaining material is discussed and the nature of the course assignments)