INTRODUCTION TO DATABASE SYSTEMS Has everyone noticed that all the letters of the word database are typed with the left hand? Now the layout of the QWErTY typewriter keyboard was designed, among other things, to facilitate the even use of both hands. It follows, therefore, that writing about databases is not only unnatural, but a lot harder than it appears anonymous Today, more than at any previous time, the success of an organization depends its ability to acquire accurate and timely data about its operations, to manage this data effectively, and to use it to analyze and guide its activities. Phrases such as the information superhighway have become ubiquitous, and information processing is a rapidly growing multibillion dollar industry. The amount of information available to us is literally exploding, and the value of data as an organizational asset is widely recognized. Yet without the ability to manage this vast amount of data, and to quickly find the information that is relevant to a given question, as the amount of information increases, it tends to become a distraction and a liability, rather than an asset. This paradox drives the need for increasingly powerful and flexible data management systems. To get the most out of their large and complex datasets, users must have tools that simplify the tasks of managing the data and extracting useful information in a timely fashion. Otherwise, data can become a liability, with the cost of acquiring it and managing it far exceeding the value that is derived from it A database is a collection of data, typically describing the activities of one or more related organizations. For example, a university database might contain information about the followin a Entities such as students. faculty courses and classrooms Relationships between entities, such as students'enrollment in courses, faculty teaching courses, and the use of rooms for courses A database management system, or DBMS, is software designed to assist in maintaining and utilizing large collections of data, and the need for such systems, as well as their use, is growing rapidly. The alternative to using a DBMs is to use ad
1 INTRODUCTION TO DATABASE SYSTEMS Has everyone noticed that all the letters of the word database are typed with the left hand? Now the layout of the QWERTY typewriter keyboard was designed, among other things, to facilitate the even use of both hands. It follows, therefore, that writing about databases is not only unnatural, but a lot harder than it appears. —Anonymous Today, more than at any previous time, the success of an organization depends on its ability to acquire accurate and timely data about its operations, to manage this data effectively, and to use it to analyze and guide its activities. Phrases such as the information superhighway have become ubiquitous, and information processing is a rapidly growing multibillion dollar industry. The amount of information available to us is literally exploding, and the value of data as an organizational asset is widely recognized. Yet without the ability to manage this vast amount of data, and to quickly find the information that is relevant to a given question, as the amount of information increases, it tends to become a distraction and a liability, rather than an asset. This paradox drives the need for increasingly powerful and flexible data management systems. To get the most out of their large and complex datasets, users must have tools that simplify the tasks of managing the data and extracting useful information in a timely fashion. Otherwise, data can become a liability, with the cost of acquiring it and managing it far exceeding the value that is derived from it. A database is a collection of data, typically describing the activities of one or more related organizations. For example, a university database might contain information about the following: Entities such as students, faculty, courses, and classrooms. Relationships between entities, such as students’ enrollment in courses, faculty teaching courses, and the use of rooms for courses. A database management system, or DBMS, is software designed to assist in maintaining and utilizing large collections of data, and the need for such systems, as well as their use, is growing rapidly. The alternative to using a DBMS is to use ad 3
CHAPTER 1 hoc paches that do not carry over from one application to another; for example, to store the data in files and write application-specific code to manage it. The use of a DBMs has several important advantages, as we will see in Section 1.4 The area of database management systems is a microcosm of computer science in gen eral. The issues addressed and the techniques used span a wide spectrum, including languages, object-orientation and other programming paradigms, compilation, oper ems, concurrent programming, data structures, algorithms, theory, parallel and distributed systems, user interfaces, expert systems and artificial intelligence, sta- tistical techniques, and dynamic programming. We will not be able to go into all these aspects of database management in this book, but it should be clear that this is a rich and vibrant discipline 1.1 OVERVIEW The goal of this book is to present an in-depth introduction to database management systems, with an emphasis on how to organize information in a DBMS and to main- tain it and retrieve it efficiently, that is, how to design a database and use a DBMs effectively. Not surprisingly, many decisions about how to use a DBMS for a given application depend on what capabilities the DBMS supports efficiently. Thus, to use a DBMS well, it is necessary to also understand how a DBMS works. The approach taken in this book is to emphasize how to use a dBms, while covering dBMs implementation and architecture in sufficient detail to understand how to design a database Many kinds of database management systems are in use, but this book concentrates or relational systems, which are by far the dominant type of DBMS today. The following questions are addressed in the core chapters of this book 1. Datab gn: How can a user describe a real-world enterprise(e.g,a of the data stored in a dbms? what factors must be conside 2,3,15,16,and17.) 2. Data Analysis: How can a user answer questions about the enterprise by posing queries over the data in the DBMS?(Chapters 4, 5, 6, and 23) 3. Concurrency and Robustness: How does a DBMs allow many users to access data concurrently, and how does it protect the data in the event of system failures? (Chapters 18, 19, and 20.) 4. Efficiency and Scalability: How does a DBMS store large datasets and answer questions against this data efficiently?(Chapters 7, 8, 9, 10, 11, 12, 13, and 14) Later chapters cover important and rapidly evolving topics such as parallel and dis- tributed database management, Internet databases, data warehousing and complex
4 Chapter 1 hoc approaches that do not carry over from one application to another; for example, to store the data in files and write application-specific code to manage it. The use of a DBMS has several important advantages, as we will see in Section 1.4. The area of database management systems is a microcosm of computer science in general. The issues addressed and the techniques used span a wide spectrum, including languages, object-orientation and other programming paradigms, compilation, operating systems, concurrent programming, data structures, algorithms, theory, parallel and distributed systems, user interfaces, expert systems and artificial intelligence, statistical techniques, and dynamic programming. We will not be able to go into all these aspects of database management in this book, but it should be clear that this is a rich and vibrant discipline. 1.1 OVERVIEW The goal of this book is to present an in-depth introduction to database management systems, with an emphasis on how to organize information in a DBMS and to maintain it and retrieve it efficiently, that is, how to design a database and use a DBMS effectively. Not surprisingly, many decisions about how to use a DBMS for a given application depend on what capabilities the DBMS supports efficiently. Thus, to use a DBMS well, it is necessary to also understand how a DBMS works. The approach taken in this book is to emphasize how to use a DBMS, while covering DBMS implementation and architecture in sufficient detail to understand how to design a database. Many kinds of database management systems are in use, but this book concentrates on relational systems, which are by far the dominant type of DBMS today. The following questions are addressed in the core chapters of this book: 1. Database Design: How can a user describe a real-world enterprise (e.g., a university) in terms of the data stored in a DBMS? What factors must be considered in deciding how to organize the stored data? (Chapters 2, 3, 15, 16, and 17.) 2. Data Analysis: How can a user answer questions about the enterprise by posing queries over the data in the DBMS? (Chapters 4, 5, 6, and 23.) 3. Concurrency and Robustness: How does a DBMS allow many users to access data concurrently, and how does it protect the data in the event of system failures? (Chapters 18, 19, and 20.) 4. Efficiency and Scalability: How does a DBMS store large datasets and answer questions against this data efficiently? (Chapters 7, 8, 9, 10, 11, 12, 13, and 14.) Later chapters cover important and rapidly evolving topics such as parallel and distributed database management, Internet databases, data warehousing and complex
Introduction to Database Systems queries for decision support, data mining, object databases, spatial data management and rule-oriented DbMS extensions In the rest of this chapter, we introduce the issues listed above. In Section 1.2, we begin ith a brief history of the field and a discussion of the role of database management in modern information systems. We then identify benefits of storing data in a DBMs instead of a file system in Section 1.3, and discuss the advantages of using a DBMS to manage data in Section 1.4. In Section 1. 5 we consider how information about an enterprise should be organized and stored in a DBMs. a user probably thinks about this information in high-level terms corresponding to the entities in the organization and their relationships, whereas the DBMS ultimately stores data in the form of (many many) bits. The gap between how users think of their data and how the data is ultimately stored is bridged through several levels of abstraction supported by the DBMS. Intuitively, a user can begin by describing the data in fairly high-level terms nd then refine this description by considering additional storage and representation details as needed In Section 1.6 we consider how users can retrieve data stored in a dbms and the need for techniques to efficiently compute answers to questions involving such data. In Section 1.7 we provide an overview of how a DBMs supports concurrent access data by several users, and how it protects the data in the event of system failures. We then briefly describe the internal structure of a DBMS in Section 1. 8, and mention various groups of people associated with the development and use of a dBMs in Section 1.2 A HISTORICAL PERSPECTIVE From the earliest days of computers, storing and manipulating data have been a major application focus. The first general-purpose DBMS was designed by Charles Bachman at General Electric in the early 1960s and was called the Integrated Data Store. It formed the basis for the network data model, which was standardized by the Conference on Data Systems Languages(CODASYL) and strongly influenced database systems through the 1960s. Bachman was the first recipient of ACM,s Turing Award(the computer science equivalent of a Nobel prize) for work in the database area; he received the award in 1973 In the late 1960s, IBM developed the Information Management System(IMS) DBMS used even today in many major installations. IMS formed the basis for an alternative data representation framework called the hierarchical data model. The SaBre system for making airline reservations was jointly developed by American Airlines and IBM around the same time, and it allowed several people to access the same data through
Introduction to Database Systems 5 queries for decision support, data mining, object databases, spatial data management, and rule-oriented DBMS extensions. In the rest of this chapter, we introduce the issues listed above. In Section 1.2, we begin with a brief history of the field and a discussion of the role of database management in modern information systems. We then identify benefits of storing data in a DBMS instead of a file system in Section 1.3, and discuss the advantages of using a DBMS to manage data in Section 1.4. In Section 1.5 we consider how information about an enterprise should be organized and stored in a DBMS. A user probably thinks about this information in high-level terms corresponding to the entities in the organization and their relationships, whereas the DBMS ultimately stores data in the form of (many, many) bits. The gap between how users think of their data and how the data is ultimately stored is bridged through several levels of abstraction supported by the DBMS. Intuitively, a user can begin by describing the data in fairly high-level terms, and then refine this description by considering additional storage and representation details as needed. In Section 1.6 we consider how users can retrieve data stored in a DBMS and the need for techniques to efficiently compute answers to questions involving such data. In Section 1.7 we provide an overview of how a DBMS supports concurrent access to data by several users, and how it protects the data in the event of system failures. We then briefly describe the internal structure of a DBMS in Section 1.8, and mention various groups of people associated with the development and use of a DBMS in Section 1.9. 1.2 A HISTORICAL PERSPECTIVE From the earliest days of computers, storing and manipulating data have been a major application focus. The first general-purpose DBMS was designed by Charles Bachman at General Electric in the early 1960s and was called the Integrated Data Store. It formed the basis for the network data model, which was standardized by the Conference on Data Systems Languages (CODASYL) and strongly influenced database systems through the 1960s. Bachman was the first recipient of ACM’s Turing Award (the computer science equivalent of a Nobel prize) for work in the database area; he received the award in 1973. In the late 1960s, IBM developed the Information Management System (IMS) DBMS, used even today in many major installations. IMS formed the basis for an alternative data representation framework called the hierarchical data model. The SABRE system for making airline reservations was jointly developed by American Airlines and IBM around the same time, and it allowed several people to access the same data through
CHAPTER 1 a computer network. Interestingly, today the same SABRE system is used to power popular Web-based travel services such as Travelocity In 1970, Edgar Codd, at IBMs San Jose Research Laboratory, proposed a new data representation framework called the relational data model. This proved to be a water shed in the development of database systems: it sparked rapid development of several DBMSS based on the relational model, along with a rich body of theoretical results that placed the field on a firm foundation. Codd won the 1981 Turing Award for his seminal work. Database systems matured as an academic discipline, and the popu larity of relational DBMSs changed the commercial landscape. Their benefits were widely recognized, and the use of DBMSs for managing corporate data became stan dard practice In the 1980s, the relational model consolidated its position as the dominant DBMs paradigm, and database systems continued to gain widespread use. The SQL query language for relational databases, developed as part of IBMs System R project, is now the standard query language SQL was standardized in the late 1980s, and the current standard, SQL-92, was adopted by the American National Standards Institute(ANSI) nd International Standards Organization(ISO). Arguably, the most widely used forn of concurrent programming is the concurrent execution of database programs(called transactions ). Users write programs as if they are to be run by themselves, and the responsibility for running them concurrently is given to the DBMS. James gray won the 1999 Turing award for his contributions to the field of transaction management in a dbMs In the late 1980s and the 1990s, advances have been made in many areas of database systems. Considerable research has been carried out into more powerful query lan- guages and richer data models, and there has been a big emphasis on supporting complex analysis of data from all parts of an enterprise. Several vendors (e. g, IBM's DB2, Oracle 8, Informix UDS) have extended their systems with the ability to store new data types such as images and text, and with the ability to ask more complex queries. Specialized systems have been developed by numerous vendors for creating data warehouses, consolidating data from several databases, and for carrying out spe- cialized analysis An interesting phenomenon is the emergence of several enterprise resource planning (ERP) and management resource planning(MRP) packages, which add a substantial layer of application-oriented features on top of a DBMs. Widely used packages include systems from Baan, Oracle, PeopleSoft, SAP, and Siebel. These packages identify a set of common tasks(e. g, inventory management, human resources planning, finan- cial analysis) encountered by a large number of organizations and provide a general application layer to carry out these tasks. The data is stored in a relational DBMs, nd the application layer can be customized to different companies, leading to lower
6 Chapter 1 a computer network. Interestingly, today the same SABRE system is used to power popular Web-based travel services such as Travelocity! In 1970, Edgar Codd, at IBM’s San Jose Research Laboratory, proposed a new data representation framework called the relational data model. This proved to be a watershed in the development of database systems: it sparked rapid development of several DBMSs based on the relational model, along with a rich body of theoretical results that placed the field on a firm foundation. Codd won the 1981 Turing Award for his seminal work. Database systems matured as an academic discipline, and the popularity of relational DBMSs changed the commercial landscape. Their benefits were widely recognized, and the use of DBMSs for managing corporate data became standard practice. In the 1980s, the relational model consolidated its position as the dominant DBMS paradigm, and database systems continued to gain widespread use. The SQL query language for relational databases, developed as part of IBM’s System R project, is now the standard query language. SQL was standardized in the late 1980s, and the current standard, SQL-92, was adopted by the American National Standards Institute (ANSI) and International Standards Organization (ISO). Arguably, the most widely used form of concurrent programming is the concurrent execution of database programs (called transactions). Users write programs as if they are to be run by themselves, and the responsibility for running them concurrently is given to the DBMS. James Gray won the 1999 Turing award for his contributions to the field of transaction management in a DBMS. In the late 1980s and the 1990s, advances have been made in many areas of database systems. Considerable research has been carried out into more powerful query languages and richer data models, and there has been a big emphasis on supporting complex analysis of data from all parts of an enterprise. Several vendors (e.g., IBM’s DB2, Oracle 8, Informix UDS) have extended their systems with the ability to store new data types such as images and text, and with the ability to ask more complex queries. Specialized systems have been developed by numerous vendors for creating data warehouses, consolidating data from several databases, and for carrying out specialized analysis. An interesting phenomenon is the emergence of several enterprise resource planning (ERP) and management resource planning (MRP) packages, which add a substantial layer of application-oriented features on top of a DBMS. Widely used packages include systems from Baan, Oracle, PeopleSoft, SAP, and Siebel. These packages identify a set of common tasks (e.g., inventory management, human resources planning, financial analysis) encountered by a large number of organizations and provide a general application layer to carry out these tasks. The data is stored in a relational DBMS, and the application layer can be customized to different companies, leading to lower