Introduction to Database Systems all costs for the companies, compared to the cost of building the application layer scratch Most significantly, perhaps, DBMSs have entered the Internet Age. While the first generation of Web sites stored their data exclusively in operating systems files, the se of a dbms to store data that is accessed through a Web browser is becoming widespread. Queries are generated through Web-accessible forms and answers are formatted using a markup language such as HTML, in order to be easily displayed in a browser. All the database vendors are adding features to their dbms aimed at making it more suitable for deployment over the Internet Database management continues to gain importance as more and more data is brought on-line, and made ever more accessible through computer networking. Today the field is being driven by exciting visions such as multimedia databases, interactive video, digital libraries, a host of scientific projects such as the human genome mapping effort and NASAs Earth Observation System project, and the desire of companies to consolidate their decision-making processes and mine their data repositories for useful information about their businesses. Commercially, database management systems repres nt one of the largest and most vigorous market segments. Thus the study of database systems could prove to be richly rewarding in more ways than one 1.3 FILE SYSTEMS VERSUS A DBMS To understand the need for a DBMS, let us consider a motivating scenario: A company has a large collection(say, 500 GB)of data on employees, departments, products, sales, and so on. This data is accessed concurrently by several employees. Questions about the data must be answered quickly, changes made to the data by different users must be applied consistently, and access to certain parts of the data(.g, salaries) must be restricted We can try to deal with this data management problem by storing the data in a collection of operating system files. This approach has many drawbacks, including the following: u We probably do not have 500 GB of main memory to ho the data. We must therefore store data in a storage device such as a disk or tape and bring relevant parts into main memory for processing as needed. I Even if we have 500 GB of main memory, on computer systems with 32-bit ad- dressing, we cannot refer directly to more than about 4 GB of data! We have to program some method of identifying all data items <r 1A kilobyte(KB)is 1024 bytes, a megabyte(MB) is 1024 KBs, a gigabyte(GB)is 1024 MBs,a rabyte(TB)is 1024 GBs, and a petabyte(PB)is 1024 terabytes
Introduction to Database Systems 7 overall costs for the companies, compared to the cost of building the application layer from scratch. Most significantly, perhaps, DBMSs have entered the Internet Age. While the first generation of Web sites stored their data exclusively in operating systems files, the use of a DBMS to store data that is accessed through a Web browser is becoming widespread. Queries are generated through Web-accessible forms and answers are formatted using a markup language such as HTML, in order to be easily displayed in a browser. All the database vendors are adding features to their DBMS aimed at making it more suitable for deployment over the Internet. Database management continues to gain importance as more and more data is brought on-line, and made ever more accessible through computer networking. Today the field is being driven by exciting visions such as multimedia databases, interactive video, digital libraries, a host of scientific projects such as the human genome mapping effort and NASA’s Earth Observation System project, and the desire of companies to consolidate their decision-making processes and mine their data repositories for useful information about their businesses. Commercially, database management systems represent one of the largest and most vigorous market segments. Thus the study of database systems could prove to be richly rewarding in more ways than one! 1.3 FILE SYSTEMS VERSUS A DBMS To understand the need for a DBMS, let us consider a motivating scenario: A company has a large collection (say, 500 GB1) of data on employees, departments, products, sales, and so on. This data is accessed concurrently by several employees. Questions about the data must be answered quickly, changes made to the data by different users must be applied consistently, and access to certain parts of the data (e.g., salaries) must be restricted. We can try to deal with this data management problem by storing the data in a collection of operating system files. This approach has many drawbacks, including the following: We probably do not have 500 GB of main memory to hold all the data. We must therefore store data in a storage device such as a disk or tape and bring relevant parts into main memory for processing as needed. Even if we have 500 GB of main memory, on computer systems with 32-bit addressing, we cannot refer directly to more than about 4 GB of data! We have to program some method of identifying all data items. 1A kilobyte (KB) is 1024 bytes, a megabyte (MB) is 1024 KBs, a gigabyte (GB) is 1024 MBs, a terabyte (TB) is 1024 GBs, and a petabyte (PB) is 1024 terabytes
CHAPTER 1 a We have to write special programs to answer question that users may to ask about the data. These programs are to be complex because f th large volume of data to be searched We must protect the data from inconsistent changes made by different users ac cessing the data concurrently. If programs that access the data are written with such concurrent access in mind, this adds greatly to their complexity I We must ensure that data is restored to a consistent state if the system crashes hile changes are being made. a Operating systems provide only a password mechanism for security. This is not sufficiently flexible to enforce security policies in which different users have per mission to access different subsets of the data A DBMS is a piece of software that is designed to make the preceding tasks easier By storing data in a DBMS, rather than as a collection of operating system files,we can use the DBMS's features to manage the data in a robust and efficient manner As the volume of data and the number of users grow-hundreds of gigabytes of data and thousands of users are common in current corporate databases-DBMS support becomes indispensable 1. 4 ADVANTAGES OF A DBMS Using a dBms to manage data has many advantage Data independence: Application programs should be as independent as pos ble from details of data representation and storage. The DBMs can provide an abstract view of the data to insulate application code from such details Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data stored on external storage I Data integrity and security: If data is always accessed through the dbms, the DBMS can enforce integrity constraints on the data. For example, before inserting salary information for an employee, the DBMs can check that the department budget is not exceeded. Also, the DBMS can enforce access controls that govern chat data is visible to different classes of users Data administration: When several users share the data centralizing the ad ministration of data can offer significant improvements. Experienced professionals ho understand the nature of the data being managed, and how different groups of users use it, can be responsible for organizing the data representation to min- imize redundancy and for fine-tuning the storage of the data to make retrieval efficient
8 Chapter 1 We have to write special programs to answer each question that users may want to ask about the data. These programs are likely to be complex because of the large volume of data to be searched. We must protect the data from inconsistent changes made by different users accessing the data concurrently. If programs that access the data are written with such concurrent access in mind, this adds greatly to their complexity. We must ensure that data is restored to a consistent state if the system crashes while changes are being made. Operating systems provide only a password mechanism for security. This is not sufficiently flexible to enforce security policies in which different users have permission to access different subsets of the data. A DBMS is a piece of software that is designed to make the preceding tasks easier. By storing data in a DBMS, rather than as a collection of operating system files, we can use the DBMS’s features to manage the data in a robust and efficient manner. As the volume of data and the number of users grow—hundreds of gigabytes of data and thousands of users are common in current corporate databases—DBMS support becomes indispensable. 1.4 ADVANTAGES OF A DBMS Using a DBMS to manage data has many advantages: Data independence: Application programs should be as independent as possible from details of data representation and storage. The DBMS can provide an abstract view of the data to insulate application code from such details. Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data is stored on external storage devices. Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the data. For example, before inserting salary information for an employee, the DBMS can check that the department budget is not exceeded. Also, the DBMS can enforce access controls that govern what data is visible to different classes of users. Data administration: When several users share the data, centralizing the administration of data can offer significant improvements. Experienced professionals who understand the nature of the data being managed, and how different groups of users use it, can be responsible for organizing the data representation to minimize redundancy and for fine-tuning the storage of the data to make retrieval efficient
Introduction to Database Systems Concurrent access and crash recovery: A DBMS schedules concurrent ac- cesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBms protects users from the effects of system failures Reduced application development time: Clearly, the DBMS supports many portant functions that are common to many applications accessing data stored in the DBMS. This, in conjunction with the high-level interface to the data, facil itates quick development of applications. Such applications are also likely to be more robust than applications developed from scratch because many important tasks are handled by the DBMS instead of being implemented by the application Given all these advantages. is there ever a reason not to use a DBMS? A DBMs is a complex piece of software, optimized for certain kinds of workloads(e. g, answering complex queries or handling many concurrent requests), and its performance may not be adequate for certain specialized applications. Examples include applications with tight real-time constraints or applications with just a few well-defined critical opera tions for which efficient custom code must be written. Another reason for not using a DBMS is that an application may need to manipulate the data in ways not supported by the query language. In such a situation, the abstract view of the data presented by the dbms does not match the applications needs, and actually gets in the way. As an example, relational databases do not support Flexible analysis of text data(although vendors are now extending their products in this direction). If specialized performance or data manipulation requirements are central to an application, the application may choose not to use a DBMS, especially if the added benefits of a DBMS(e. g, flexible querying, security, concurrent access, and crash recovery)are not required. In most situations calling for large-scale data management, however, DBMSs have become an ndis ble tool 1.5 DESCRIBING AND STORING DATA INA DBMS The user of a DBMS is ultimately concerned with some real-world enterprise, and the data to be stored describes various aspects of this enterprise. For example, there are students, faculty, and courses in a university, and the data in a university database describes these entities and their relationships. data model is a collection of high-level data description constructs that hide many low-level storage details. A DBMS allows a user to define the data to be stored in terms of a data model. Most database management systems today are based on the relational data model. which we will focus on in this book While the data model of the DBMs hides many details, it is nonetheless closer to how the dBMS stores data than to how a user thinks about the underlying enterprise. A semantic data model is a more abstract, high-level data model that makes it easier
Introduction to Database Systems 9 Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures. Reduced application development time: Clearly, the DBMS supports many important functions that are common to many applications accessing data stored in the DBMS. This, in conjunction with the high-level interface to the data, facilitates quick development of applications. Such applications are also likely to be more robust than applications developed from scratch because many important tasks are handled by the DBMS instead of being implemented by the application. Given all these advantages, is there ever a reason not to use a DBMS? A DBMS is a complex piece of software, optimized for certain kinds of workloads (e.g., answering complex queries or handling many concurrent requests), and its performance may not be adequate for certain specialized applications. Examples include applications with tight real-time constraints or applications with just a few well-defined critical operations for which efficient custom code must be written. Another reason for not using a DBMS is that an application may need to manipulate the data in ways not supported by the query language. In such a situation, the abstract view of the data presented by the DBMS does not match the application’s needs, and actually gets in the way. As an example, relational databases do not support flexible analysis of text data (although vendors are now extending their products in this direction). If specialized performance or data manipulation requirements are central to an application, the application may choose not to use a DBMS, especially if the added benefits of a DBMS (e.g., flexible querying, security, concurrent access, and crash recovery) are not required. In most situations calling for large-scale data management, however, DBMSs have become an indispensable tool. 1.5 DESCRIBING AND STORING DATA IN A DBMS The user of a DBMS is ultimately concerned with some real-world enterprise, and the data to be stored describes various aspects of this enterprise. For example, there are students, faculty, and courses in a university, and the data in a university database describes these entities and their relationships. A data model is a collection of high-level data description constructs that hide many low-level storage details. A DBMS allows a user to define the data to be stored in terms of a data model. Most database management systems today are based on the relational data model, which we will focus on in this book. While the data model of the DBMS hides many details, it is nonetheless closer to how the DBMS stores data than to how a user thinks about the underlying enterprise. A semantic data model is a more abstract, high-level data model that makes it easier
CHAPTER 1 or a user to come up with a good initial description of the data in an enterprise. These models contain a wide variety of constructs that help describe a real application scenario. A DBMS is not intended to support all these constructs directly; it is typically built around a data model with just a few basic constructs, such as the relational model a database design in terms of a semantic model serves as a useful starting point and is subsequently translated into a database design in terms of the data model the DBMs actu lly supports A widely used semantic data model called the entity-relationship(ER) model allows us to pictorially denote entities and the relationships among them. We cover the er nodel in Chapter 2 1.5.1 The relational model In this section we provide a brief introduction to the relational model. The central data description construct in this model is a relation, which can be thought of as a set of record A description of data in terms of a data model is called a schema. In the relational model, the schema for a relation specifies its name, the name of each field(or attribute or column), and the type of each field. As an example, student information in a university database may be stored in a relation with the following schema: Students(sid: string, name: string, login: string, age: integer, gpa: real) The preceding schema says that each record in the Students relation has five fields. with field names and types as indicated. An example instance of the Students relation appears in Figure 1.1 name 53666 Jones 53688Smith smith@ee 1832 53650 Smith smith@math 193.8 53831 Madayan madayan@music[11.8 53832 Guldu gulou @music 1220 Figure 1.1 An Instance of the Students Relation sToring date of birth is preferable to storing age, since it does not change over time, unlike age We' ve used age for simplicity in our discussion
10 Chapter 1 for a user to come up with a good initial description of the data in an enterprise. These models contain a wide variety of constructs that help describe a real application scenario. A DBMS is not intended to support all these constructs directly; it is typically built around a data model with just a few basic constructs, such as the relational model. A database design in terms of a semantic model serves as a useful starting point and is subsequently translated into a database design in terms of the data model the DBMS actually supports. A widely used semantic data model called the entity-relationship (ER) model allows us to pictorially denote entities and the relationships among them. We cover the ER model in Chapter 2. 1.5.1 The Relational Model In this section we provide a brief introduction to the relational model. The central data description construct in this model is a relation, which can be thought of as a set of records. A description of data in terms of a data model is called a schema. In the relational model, the schema for a relation specifies its name, the name of each field (or attribute or column), and the type of each field. As an example, student information in a university database may be stored in a relation with the following schema: Students(sid: string, name: string, login: string, age: integer, gpa: real) The preceding schema says that each record in the Students relation has five fields, with field names and types as indicated.2 An example instance of the Students relation appears in Figure 1.1. sid name login age gpa 53666 Jones jones@cs 18 3.4 53688 Smith smith@ee 18 3.2 53650 Smith smith@math 19 3.8 53831 Madayan madayan@music 11 1.8 53832 Guldu guldu@music 12 2.0 Figure 1.1 An Instance of the Students Relation 2Storing date of birth is preferable to storing age, since it does not change over time, unlike age. We’ve used age for simplicity in our discussion
Introduction to Database Systems Each row in the Students relation is a record that describes a student. The description is not complete--for example, the students height is not included-but is presumably adequate for the intended applications in the university database. Every row follows the schema of the Students relation. The schema can therefore be regarded as a template for describing a student We can make the description of a collection of students more precise by specifyin integrity constraints, which are conditions that the records in a relation must satisfy For example, we could specify that every student has a unique sid value. Observe that we cannot capture this information by simply adding another field to the Students schema. Thus, the ability to specify uniqueness of the values in a field increases the accuracy with which we can describe our data. The expressiveness of the constructs available for specifying integrity constraints is an important aspect of a data model Other data models In addition to the relational data model(which is used in numerous systems, including IBMs DB2, Informix, Oracle, Sybase, Microsofts Access, Fox Base, Paradox, Tandem nd Teradata), other important data models include the hierarchical model (e. g, used IBMs IMS DBMS), the network model(e.g, used in IDS and IDMS), the object oriented model (e. g, used in Objectstore and Versant), and the object-relational model (e. g, used in DBMS products from IBM, Informix, ObjectStore, Oracle, Versant, and others). While there are many databases that use the hierarchical and network models nd systems based on the object-oriented and object-relational models are gaining ceptance in the marketplace, the dominant model today is the relational model In this book, we will focus on the relational model because of its wide use and impor- tance. Indeed, the object-relational model, which is gaining in popularity, is an effort to combine the best features of the relational and object-oriented models, and a good grasp of the relational model is necessary to understand object-relational concepts (We discuss the object-oriented and object-relational models in Chapter 25. 1.5.2 Levels of Abstraction in a dbMs The data in a DBMS is described at three levels of abstraction, as illustrated in Figure 1. 2. The database description consists of a schema at each of these three levels of abstraction: the conceptual, physical, and external schemas. A data definition language(DDL) is used to define the external and conceptual schemas. We will discuss the ddl facilities of the most widely used database language SQL, in Chapter 3. All DBMS vendors also support SQL commands to describe aspects of the physical schema, but these commands are not part of the sQL-92 languag
Introduction to Database Systems 11 Each row in the Students relation is a record that describes a student. The description is not complete—for example, the student’s height is not included—but is presumably adequate for the intended applications in the university database. Every row follows the schema of the Students relation. The schema can therefore be regarded as a template for describing a student. We can make the description of a collection of students more precise by specifying integrity constraints, which are conditions that the records in a relation must satisfy. For example, we could specify that every student has a unique sid value. Observe that we cannot capture this information by simply adding another field to the Students schema. Thus, the ability to specify uniqueness of the values in a field increases the accuracy with which we can describe our data. The expressiveness of the constructs available for specifying integrity constraints is an important aspect of a data model. Other Data Models In addition to the relational data model (which is used in numerous systems, including IBM’s DB2, Informix, Oracle, Sybase, Microsoft’s Access, FoxBase, Paradox, Tandem, and Teradata), other important data models include the hierarchical model (e.g., used in IBM’s IMS DBMS), the network model (e.g., used in IDS and IDMS), the objectoriented model (e.g., used in Objectstore and Versant), and the object-relational model (e.g., used in DBMS products from IBM, Informix, ObjectStore, Oracle, Versant, and others). While there are many databases that use the hierarchical and network models, and systems based on the object-oriented and object-relational models are gaining acceptance in the marketplace, the dominant model today is the relational model. In this book, we will focus on the relational model because of its wide use and importance. Indeed, the object-relational model, which is gaining in popularity, is an effort to combine the best features of the relational and object-oriented models, and a good grasp of the relational model is necessary to understand object-relational concepts. (We discuss the object-oriented and object-relational models in Chapter 25.) 1.5.2 Levels of Abstraction in a DBMS The data in a DBMS is described at three levels of abstraction, as illustrated in Figure 1.2. The database description consists of a schema at each of these three levels of abstraction: the conceptual, physical, and external schemas. A data definition language (DDL) is used to define the external and conceptual schemas. We will discuss the DDL facilities of the most widely used database language, SQL, in Chapter 3. All DBMS vendors also support SQL commands to describe aspects of the physical schema, but these commands are not part of the SQL-92 language