CHAPTER 1 External Schema I External Schema 2 External Schema 3 Conceptual Schema hysical Schema Figure 1.2 Levels of Abstraction in a DBMS standard. Information about the conceptual, external, and physical schemas is stored in the system catalogs(Section 13.2). We discuss the three levels of abstraction in the rest of this section Conceptual Schema The conceptual schema(sometimes called the logical schema) describes the stored data in terms of the data model of the DBMs. In a relational DBMs, the conceptual schema describes all relations that are stored in the database. In our sample university database, these relations contain information about entities, such as students and faculty, and about relationships, such as students'enrollment in courses. All student entities can be described using records in a Students relation, as we saw earlier. In fact, each collection of entities and each collection of relationships can be described as a relation, leading to the following conceptual schema Students(sid: string, name: string, login: string, gpa: real Faculty(fid: string, fname: string, sal: real Courses(cid: string, cname: string, credits: integer) Rooms(mo: integer, address: string, capacity: integer) Enrolled (sid: string, cid: string, grade: string) Teaches(fid: string, cid: string) Meets_In(cid: string, rno: integer, time: string) The choice of relations, and the choice of fields for each relation, is not always obvi- ous, and the process of arriving at a good conceptual schema is called conceptual database design. We discuss conceptual database design in Chapters 2 and 15
12 Chapter 1 DISK External Schema 1 External Schema 2 External Schema 3 Conceptual Schema Physical Schema Figure 1.2 Levels of Abstraction in a DBMS standard. Information about the conceptual, external, and physical schemas is stored in the system catalogs (Section 13.2). We discuss the three levels of abstraction in the rest of this section. Conceptual Schema The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all relations that are stored in the database. In our sample university database, these relations contain information about entities, such as students and faculty, and about relationships, such as students’ enrollment in courses. All student entities can be described using records in a Students relation, as we saw earlier. In fact, each collection of entities and each collection of relationships can be described as a relation, leading to the following conceptual schema: Students(sid: string, name: string, login: string, age: integer, gpa: real) Faculty(fid: string, fname: string, sal: real) Courses(cid: string, cname: string, credits: integer) Rooms(rno: integer, address: string, capacity: integer) Enrolled(sid: string, cid: string, grade: string) Teaches(fid: string, cid: string) Meets In(cid: string, rno: integer, time: string) The choice of relations, and the choice of fields for each relation, is not always obvious, and the process of arriving at a good conceptual schema is called conceptual database design. We discuss conceptual database design in Chapters 2 and 15
Introduction to Database Systems SIc chema The physical schema specifies additional storage details. Essentially, the physical schema summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes We must decide what file organizations to use to store the relations, and create auxiliary data structures called indexes to speed up data retrieval operations. A sample physical chema for the university database follows: a Store all relations as unsorted files of records. (A file in a DBMS is either collection of records or a collection of pages, rather than a string of characters as in an operating system Create indexes on the first column of the Students, Faculty, and Courses relations, the sal column of Faculty, and the capacity column of Rooms. Decisions about the physical schema are based on an understanding of how the data is typically accessed. The process of arriving at a good physical schema is called physical database design. We discuss physical database design in Chapter 16 External schema External schemas, which usually are also in terms of the data model of the DBMs allow data access to be customized(and authorized) at the level of individual users or groups of users. Any given database has exactly one conceptual schema and one physical schema because it has just one set of stored relations, but it may hav external schemas, each tailored to a particular group of users. Each external consists of a collection of one or more views and relations from the conceptual A view is conceptually a relation, but the records in a view are not stored in the DBMs Rather, they are computed using a definition for the view, in terms of relations stored in the DBMs. We discuss views in more detail in Chapter 3 The external schema design is guided by end user requirements. For example, want to allow students to find out the names of faculty members teaching cou well as course enrollments. This can be done by defining the following view: Courseinfo(cid: string, fname: string, enrollment: integer) A user can treat a view just like a relation and ask questions about the records in view. Even though the records in the view are not stored explicitly, they are computed as needed. We did not include Courseinfo in the conceptual schema because we can compute Courseinfo from the relations in the conceptual schema, and to store it in addition would be redundant. Such redundancy, in addition to the wasted space, could
Introduction to Database Systems 13 Physical Schema The physical schema specifies additional storage details. Essentially, the physical schema summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes. We must decide what file organizations to use to store the relations, and create auxiliary data structures called indexes to speed up data retrieval operations. A sample physical schema for the university database follows: Store all relations as unsorted files of records. (A file in a DBMS is either a collection of records or a collection of pages, rather than a string of characters as in an operating system.) Create indexes on the first column of the Students, Faculty, and Courses relations, the sal column of Faculty, and the capacity column of Rooms. Decisions about the physical schema are based on an understanding of how the data is typically accessed. The process of arriving at a good physical schema is called physical database design. We discuss physical database design in Chapter 16. External Schema External schemas, which usually are also in terms of the data model of the DBMS, allow data access to be customized (and authorized) at the level of individual users or groups of users. Any given database has exactly one conceptual schema and one physical schema because it has just one set of stored relations, but it may have several external schemas, each tailored to a particular group of users. Each external schema consists of a collection of one or more views and relations from the conceptual schema. A view is conceptually a relation, but the records in a view are not stored in the DBMS. Rather, they are computed using a definition for the view, in terms of relations stored in the DBMS. We discuss views in more detail in Chapter 3. The external schema design is guided by end user requirements. For example, we might want to allow students to find out the names of faculty members teaching courses, as well as course enrollments. This can be done by defining the following view: Courseinfo(cid: string, fname: string, enrollment: integer) A user can treat a view just like a relation and ask questions about the records in the view. Even though the records in the view are not stored explicitly, they are computed as needed. We did not include Courseinfo in the conceptual schema because we can compute Courseinfo from the relations in the conceptual schema, and to store it in addition would be redundant. Such redundancy, in addition to the wasted space, could
CHAPTER 1 lead to inconsistencies. For example, a tuple may be inserted into the Enrolled relation, indicating that a particular student has enrolled in some course, without incrementing the value in the enrollment field of the corresponding record of Courseinfo(if the latter also is part of the conceptual schema and its tuples are stored in the DBMS) 1.5.3 Data Independence A very important advantage of using a DBMS is that it offers data independence That is, application programs are insulated from changes in the way the data is struc- tured and stored. Data independence is achieved through use of the three levels of data abstraction; in particular, the conceptual schema and the external schema pro- vide distinct benefits in this area Relations in the external schema(view relations) are in principle generated on demand from the relations corresponding to the conceptual schema. If the underlying data is reorganized, that is, the conceptual schema is changed, the definition of a view relation can be modified so that the same relation is computed as before. For example, suppose that the Faculty relation in our university database is replaced by the following two relations: Faculty-publicfid: string, fname: string, office: integer) Faculty-privatelfid: string, sal: real) Intuitively, some confidential information about faculty has been placed in a relation and information about offices has been added. The Courseinfo view can be redefined in terms of Faculty-public and Faculty -private, which togethe all the information in Faculty, so that a user who queries Courseinfo will get w Ie answers as before Thus users can be shielded from changes in the logical structure of the data, or changes in the choice of relations to be stored. This property is called logical data indepen- den In turn, the conceptual schema insulates users from changes in the physical storage of the data. This property is referred to as physical data independence. The conceptual schema hides details such as how the data is actually laid out on disk, the file structure, and the choice of indexes. As long as the conceptual schema remains the same, we can change these storage details without altering applications.(Of course performance might be affected by such changes. BIn practice, they could be precomputed and stored to speed up queries on view relations, but the computed view relations must be updated whenever the underlying relations are updated
14 Chapter 1 lead to inconsistencies. For example, a tuple may be inserted into the Enrolled relation, indicating that a particular student has enrolled in some course, without incrementing the value in the enrollment field of the corresponding record of Courseinfo (if the latter also is part of the conceptual schema and its tuples are stored in the DBMS). 1.5.3 Data Independence A very important advantage of using a DBMS is that it offers data independence. That is, application programs are insulated from changes in the way the data is structured and stored. Data independence is achieved through use of the three levels of data abstraction; in particular, the conceptual schema and the external schema provide distinct benefits in this area. Relations in the external schema (view relations) are in principle generated on demand from the relations corresponding to the conceptual schema.3 If the underlying data is reorganized, that is, the conceptual schema is changed, the definition of a view relation can be modified so that the same relation is computed as before. For example, suppose that the Faculty relation in our university database is replaced by the following two relations: Faculty public(fid: string, fname: string, office: integer) Faculty private(fid: string, sal: real) Intuitively, some confidential information about faculty has been placed in a separate relation and information about offices has been added. The Courseinfo view relation can be redefined in terms of Faculty public and Faculty private, which together contain all the information in Faculty, so that a user who queries Courseinfo will get the same answers as before. Thus users can be shielded from changes in the logical structure of the data, or changes in the choice of relations to be stored. This property is called logical data independence. In turn, the conceptual schema insulates users from changes in the physical storage of the data. This property is referred to as physical data independence. The conceptual schema hides details such as how the data is actually laid out on disk, the file structure, and the choice of indexes. As long as the conceptual schema remains the same, we can change these storage details without altering applications. (Of course, performance might be affected by such changes.) 3In practice, they could be precomputed and stored to speed up queries on view relations, but the computed view relations must be updated whenever the underlying relations are updated
Introduction to Database Systems 1.6 QUERIES IN A DBMS The ease with which information can be obtained from a database often determines its value to a user. In contrast to older database systems, relational database systems allow a rich class of questions to be posed easily, this feature has contributed greatly to their popularity. Consider the sample university database in Section 1.5.2. Here ar examples of questions that a user might ask: 1. What is the name of the student with student id 123456? 2. What is the average salary of professors who teach the course with cid CS564? 3. How many students are enrolled in course CS564? 4. What fraction of students in course CS564 received a grade better than B? 5. Is any student with a GPA less than 3.0 enrolled in course CS564? Such questions involving the data stored in a DBMS are called queries. A DBMS provides a specialized language, called the query language, in which queries can be posed. A very attractive feature of the relational model is that it supports powerful query languages. Relational calculus is a formal query language based on mathemat- ical logic, and queries in this language have an intuitive, precise meaning. Relational algebra is another formal query language, based on a collection of operators for manipulating relations, which is equivalent in power to the calculus. a DBMS takes great care to evaluate queries as efficiently as possible. We discuss query optimization and evaluation in Chapters 12 and 13. Of course, the efficiency of query evaluation is determined to a large extent by how the data is stored physically. Indexes can be used to speed up many queriesin fact, a good choice of indexes for the underlying relations can speed up each query in the preceding list. We discuss data storage and indexing in Chapters 7, 8, 9, and 10 A DBMS enables users to create, modify, and query data through a data manipula tion language(DML). Thus, the query language is only one part of the DML, which also provides constructs to insert, delete, and modify data. We will discuss the DmL features of SQL in Chapter 5. The DML and DDL are collectively referred to as the data sublanguage when embedded within a host language(e.g, C or COBOL) 1.7 TRANSACTION MANAGEMENT Consider a database that holds information about airline reservations. At any given instant, it is possible(and likely) that several travel agents are looking up information bout available seats on various fights and making new seat reservations. When several users access(and possibly modify) a database concurrently, the DBMS must order
Introduction to Database Systems 15 1.6 QUERIES IN A DBMS The ease with which information can be obtained from a database often determines its value to a user. In contrast to older database systems, relational database systems allow a rich class of questions to be posed easily; this feature has contributed greatly to their popularity. Consider the sample university database in Section 1.5.2. Here are examples of questions that a user might ask: 1. What is the name of the student with student id 123456? 2. What is the average salary of professors who teach the course with cid CS564? 3. How many students are enrolled in course CS564? 4. What fraction of students in course CS564 received a grade better than B? 5. Is any student with a GPA less than 3.0 enrolled in course CS564? Such questions involving the data stored in a DBMS are called queries. A DBMS provides a specialized language, called the query language, in which queries can be posed. A very attractive feature of the relational model is that it supports powerful query languages. Relational calculus is a formal query language based on mathematical logic, and queries in this language have an intuitive, precise meaning. Relational algebra is another formal query language, based on a collection of operators for manipulating relations, which is equivalent in power to the calculus. A DBMS takes great care to evaluate queries as efficiently as possible. We discuss query optimization and evaluation in Chapters 12 and 13. Of course, the efficiency of query evaluation is determined to a large extent by how the data is stored physically. Indexes can be used to speed up many queries—in fact, a good choice of indexes for the underlying relations can speed up each query in the preceding list. We discuss data storage and indexing in Chapters 7, 8, 9, and 10. A DBMS enables users to create, modify, and query data through a data manipulation language (DML). Thus, the query language is only one part of the DML, which also provides constructs to insert, delete, and modify data. We will discuss the DML features of SQL in Chapter 5. The DML and DDL are collectively referred to as the data sublanguage when embedded within a host language (e.g., C or COBOL). 1.7 TRANSACTION MANAGEMENT Consider a database that holds information about airline reservations. At any given instant, it is possible (and likely) that several travel agents are looking up information about available seats on various flights and making new seat reservations. When several users access (and possibly modify) a database concurrently, the DBMS must order
CHAPTER 1 ir requests carefully to avoid conflicts. For example, when one travel agent le Flight 100 on some given day and finds an empty seat, another travel agent simultaneously be making a reservation for that seat, thereby making the information seen by the first agent obsolete Another example of concurrent use is a bank's database. While one user's application program is computing the total deposits, another application may transfer money from count that the first application has just ' to an account that has not yet been seen, thereby causing the total to appear larger than it should be. Clearly, such anomalies should not be allowed to occur. However, disallowing concurrent access can degrade performance. Further, the DBMS must protect users from the effects of system failures by ensuring that all data(and the status of active applications) is restored to a consistent state when the system is restarted after a crash. For example, if a travel agent asks for a reservation to be made, and the dbms responds saying that the reservation has been made, the reservation should not be lost if the system crashes. On the other hand, if the dBMs has not yet responded to the request, but is in the process of making the necessary changes to the data while the crash occurs, the partial changes should be undone when the system comes back up A transaction is any one execution of a user program in a DBMS(Executing the same program several times will generate several transactions. This is the basic unit of change as seen by the DBMS: Partial transactions are not allowed, and the effect of a group of transactions is equivalent to some serial execution of all transactions. We briefly outline how these properties are guaranteed, deferring a detailed discussion to later chapters. 1.7.1 Concurrent Execution of transactions An important task of a DBMS is to schedule concurrent accesses to data so that each user can safely ignore the fact that others are accessing the data concurrently. The im portance of this task cannot be underestimated because a database is typically shared by a large number of users, who submit their requests to the dBMs independently, and simply cannot be expected to deal with arbitrary changes being made concurrently by other users. A DBMS allows users to think of their programs as if they were executing in isolation, one after the other in some order chosen by the DBMs. For example, if a program that deposits cash into an account is submitted to the dbms at the same time as another program that debits money from the same account, either of these programs could be run first by the dBMs, but their steps will not be interleaved in such a way that they interfere with each other
16 Chapter 1 their requests carefully to avoid conflicts. For example, when one travel agent looks up Flight 100 on some given day and finds an empty seat, another travel agent may simultaneously be making a reservation for that seat, thereby making the information seen by the first agent obsolete. Another example of concurrent use is a bank’s database. While one user’s application program is computing the total deposits, another application may transfer money from an account that the first application has just ‘seen’ to an account that has not yet been seen, thereby causing the total to appear larger than it should be. Clearly, such anomalies should not be allowed to occur. However, disallowing concurrent access can degrade performance. Further, the DBMS must protect users from the effects of system failures by ensuring that all data (and the status of active applications) is restored to a consistent state when the system is restarted after a crash. For example, if a travel agent asks for a reservation to be made, and the DBMS responds saying that the reservation has been made, the reservation should not be lost if the system crashes. On the other hand, if the DBMS has not yet responded to the request, but is in the process of making the necessary changes to the data while the crash occurs, the partial changes should be undone when the system comes back up. A transaction is any one execution of a user program in a DBMS. (Executing the same program several times will generate several transactions.) This is the basic unit of change as seen by the DBMS: Partial transactions are not allowed, and the effect of a group of transactions is equivalent to some serial execution of all transactions. We briefly outline how these properties are guaranteed, deferring a detailed discussion to later chapters. 1.7.1 Concurrent Execution of Transactions An important task of a DBMS is to schedule concurrent accesses to data so that each user can safely ignore the fact that others are accessing the data concurrently. The importance of this task cannot be underestimated because a database is typically shared by a large number of users, who submit their requests to the DBMS independently, and simply cannot be expected to deal with arbitrary changes being made concurrently by other users. A DBMS allows users to think of their programs as if they were executing in isolation, one after the other in some order chosen by the DBMS. For example, if a program that deposits cash into an account is submitted to the DBMS at the same time as another program that debits money from the same account, either of these programs could be run first by the DBMS, but their steps will not be interleaved in such a way that they interfere with each other