1.4 Database Languages There The mod re data m odels that we shall cover in the text into four different categories Relational Model.The relational model uses a collection of tables to repre- sent both data and the relationships among those data.Each table has mul- tiple columns.and each column has a unique name.Tables are also known as relations.The relational model is an example of a record-based model. Record-based models are so named because the database is structured in fixed-format records of several t yp es.Each table contains records of a rd type define fixed umb er of fields s,or attrib columns of the correspon to the att he record type.The relational dat del is the most widely used d data mode st major ity of current database systems are based on the relational model.Chapters 2 through 8 cover the relational model in detail. .Entity-Relationship Model.The entity-relationship(E-R)data model uses a collection of basic objects,called entities,and relationships among these objects. An entity is a“thing”or“object'”in the real world that is distinguishable from other objects.The entity-relationship model is widely used in database design,and Chapter 7 explores it in detail Object-Based Data Model.Object-oriented programming(especially in Java the 4 o d to the ent an object- odel that ca seen as extendi ing the w enc me (functions), and object identity.The object-relational dat model com features of the object-oriented data model and relational data model.Chap- ter 22 examines the object-relational data model. Semistructured Data Model.The semistructured data model permits the fiei ote ta mo metoned earlier.where ev ry data item of a particular type must have the same set of attributes.The Extensible Mark (XML)is widely used to represent semistructured data.Chapter 23c overs it Historically,the network data model and the hierarchical data model pre. ceded the relational data model.These models were tied closely to the underlying implementation,and complicated the task of modeling data.As a result they are used little now,except in old database code that is still in service in some places. They are outlined online in Appendices D and E for interested readers. 1.4 Database Languages A database system provides a data-definition language to specify the database schema and a data-manipulation language to express database queries and up-
1.4 Database Languages 9 There are a number of different data models that we shall cover in the text. The data models can be classified into four different categories: • Relational Model. The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations. The relational model is an example of a record-based model. Record-based models are so named because the database is structured in fixed-format records of several types. Each table contains records of a particular type. Each record type defines a fixed number of fields, or attributes. The columns of the table correspond to the attributes of the record type. The relational data model is the most widely used data model, and a vast majority of current database systems are based on the relational model. Chapters 2 through 8 cover the relational model in detail. • Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity-relationship model is widely used in database design, and Chapter 7 explores it in detail. • Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has become the dominant software-development methodology. This led to the development of an object-oriented data model that can be seen as extending the E-R model with notions of encapsulation, methods (functions), and object identity. The object-relational data model combines features of the object-oriented data model and relational data model. Chapter 22 examines the object-relational data model. • Semistructured Data Model. The semistructured data model permits the specification of data where individual data items of the same type may have different sets of attributes. This is in contrast to the data models mentioned earlier, where every data item of a particular type must have the same set of attributes. The Extensible Markup Language (XML) is widely used to represent semistructured data. Chapter 23 covers it. Historically, the network data model and the hierarchical data model preceded the relational data model. These models were tied closely to the underlying implementation, and complicated the task of modeling data. As a result they are used little now, except in old database code that is still in service in some places. They are outlined online in Appendices D and E for interested readers. 1.4 Database Languages A database system provides a data-definition language to specify the database schema and a data-manipulation language to express database queries and up-
10 Chapter 1 Introduction dates.In practice,the data-definition and data-manipulation languages are not two separate language es;instead they simply form parts of a single database lan- guage,such as the widely used sQL language 1.4.1 Data-Manipulation Language A data-manipulation language (DML)is a language that enablesu ers to access or manipulate data as organized by the appropriate data model.The types of access are: Retrieval of information stored in the database Insertion of new information into the database Deletion of information from the database Modification of information stored in the database There are basically two types .Procedural DMLs require a user to specify what data are needed and how to get those data Declarative DMLs(also referred to as nonprocedural DMLs)require a user to specify what data are needed without specifying how to get those data. Declarative DMLs are usually easier to learn and use than are procedural DMLs.However,since a user does not have to specify how to get the data,the database system has to figure out an efficient means of accessing data. A query is a statement requesting the retrieval of information.The portion of a DML that involves information retrieval is called a Although technically ect,it is common erms query language ar fatabascou qu anguages in use,either co mmercially or experime /the most videly use I query lang ge,SQL in Chap ters 3 4,and also study some oth r query languages iC apter 6 The levels of abstraction that we discussed in Section 1.3 apply not only to defining or structuring data,but also to manipulating data.At the physical level,we must define algorithms that allow efficient access to data.At higher levels of abstraction,we emphasize ease of use.The goal is to allow humans to interact efficiently with the system.The query processor component of the database system(which we study in Chapters 12 and 13)translates DML queries into sequer nces of actions at the physical level of the database system. 1.4.2 Data-Definition Language We specify ad器gecalledatt
10 Chapter 1 Introduction dates. In practice, the data-definition and data-manipulation languages are not two separate languages; instead they simply form parts of a single database language, such as the widely used SQL language. 1.4.1 Data-Manipulation Language A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. The types of access are: • Retrieval of information stored in the database • Insertion of new information into the database • Deletion of information from the database • Modification of information stored in the database There are basically two types: • Procedural DMLs require a user to specify what data are needed and how to get those data. • Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed without specifying how to get those data. Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a user does not have to specify how to get the data, the database system has to figure out an efficient means of accessing data. A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language. Although technically incorrect, it is common practice to use the terms query language and data-manipulation language synonymously. There are a number of database query languages in use, either commercially or experimentally. We study the most widely used query language, SQL, in Chapters 3, 4, and 5. We also study some other query languages in Chapter 6. The levels of abstraction that we discussed in Section 1.3 apply not only to defining or structuring data, but also to manipulating data. At the physical level, we must define algorithms that allow efficient access to data. At higher levels of abstraction, we emphasize ease of use. The goal is to allow humans to interact efficiently with the system. The query processor component of the database system (which we study in Chapters 12 and 13) translates DML queries into sequences of actions at the physical level of the database system. 1.4.2 Data-Definition Language We specify a database schema by a set of definitions expressed by a special language called a data-definition language (DDL). The DDL is also used to specify additional properties of the data
1.4 Database Languages 11 vtr byt ore storage structure and acces f st :。1 ment in a spe type anguage ese statem he database schemas,which are usuall hidden from the users The data values stored in the database must satisfy certain consistency con- straints.For example,suppose the university requires that the account balance of a department must never be negative.The DDL provides facilities to specify such constraints.The database system checks these constraints every time the database is updated.In general,a constraint can be an arbitrary predicate per- taining to the database.However,arbitrary predicates may be costly to test.Thus database tems implement integrity constraints that can be tested with minimal overhead: Domain Constraints.A domain of possible values must be asso ciated with every attribute(for example,integer types,character types,date/time types) Declaring an attribute to be of a particular domain acts as a constraint on the values that it can take.Domain constraints are the most elementary form of integrity constraint.They are tested easily by the system whenever a new data item is entered into the database. Referential Integrity.There are cases where we wish to ensure that a value pears in a cer For exa the deparme nt listed for must be that precis ar in t 0 recor difi straint is violated,the normal procedure is to reiect the action that caused the can cause v egrity. Whenmareferentiai violation. Assertions.An assertion is any condition that the database must always satisfy.Domain constraints and referential-integrity constraints are special forms of assertions.However,there are many constraints that we cannot express by using only these special forms.For example,"Every department offered every semeste must he ed a an assertion.When an assertion is created.the system tests it for validity.If the n is valid,the re n to the database is allowe only if it does not cause that assertion to be violated Authorization.We may want to differentiate among the users as far as the type ofaccess they are permitted on various data valu s in the database.These differentiations are expressed in terms of authorization,the most common being:read authorization,which allows reading,but not modification,of data;insert authorization,which allows insertion of new data,but not mod- ification of existing data;update authorization,which allows modification but not deletion,of data;and delete authorization,which allows deletion of data.We may assign the user all,none,or a combination of these types of authorization
1.4 Database Languages 11 We specify the storage structure and access methods used by the database system by a set of statements in a special type of DDL called a data storage and definition language. These statements define the implementation details of the database schemas, which are usually hidden from the users. The data values stored in the database must satisfy certain consistency constraints. For example, suppose the university requires that the account balance of a department must never be negative. The DDL provides facilities to specify such constraints. The database system checks these constraints every time the database is updated. In general, a constraint can be an arbitrary predicate pertaining to the database. However, arbitrary predicates may be costly to test. Thus, database systems implement integrity constraints that can be tested with minimal overhead: • Domain Constraints. A domain of possible values must be associated with every attribute (for example, integer types, character types, date/time types). Declaring an attribute to be of a particular domain acts as a constraint on the values that it can take. Domain constraints are the most elementary form of integrity constraint. They are tested easily by the system whenever a new data item is entered into the database. • Referential Integrity. There are cases where we wish to ensure that a value that appears in one relation for a given set of attributes also appears in a certain set of attributes in another relation (referential integrity). For example, the department listed for each course must be one that actually exists. More precisely, the dept name value in a course record must appear in the dept name attribute of some record of the department relation. Database modifications can cause violations of referential integrity. When a referential-integrity constraint is violated, the normal procedure is to reject the action that caused the violation. • Assertions. An assertion is any condition that the database must always satisfy. Domain constraints and referential-integrity constraints are special forms of assertions. However, there are many constraints that we cannot express by using only these special forms. For example, “Every department must have at least five courses offered every semester” must be expressed as an assertion. When an assertion is created, the system tests it for validity. If the assertion is valid, then any future modification to the database is allowed only if it does not cause that assertion to be violated. • Authorization. We may want to differentiate among the users as far as the type of access they are permitted on various data values in the database. These differentiations are expressed in terms of authorization, the most common being: read authorization, which allows reading, but not modification, of data; insert authorization, which allows insertion of new data, but not modification of existing data; update authorization, which allows modification, but not deletion, of data; and delete authorization, which allows deletion of data. We may assign the user all, none, or a combination of these types of authorization
12 Chapter 1 Introduction The DDL,just like any other programming language,gets as input some instructions(statements)and generates some output.The output of the DDL is placed in the data dictionary,which contains metadata-that is,data about data The data dictionary is considered to be a special type of table that can only be accessed and updated by the database m its eif(not a regular user).The database system consult sthe data dictionary b fore rea odifying actual data. ading or m 1.5 Relational Databases nase is basd on the relational model and uses a collection of oth data and the relationships among those data.It als cludes a DM L and DDL.In CH napter 2 we present a gentle introd uct ion to the fundamentals of the relational model.Most commercial relational database sys tems employ the SQL language,which we cover in great detail in Chapters 3,4, and 5.In Chapter 6 we discuss other influential languages. 1.5.1 Tables table has multiple colmacn column has a unique name.Figure presents a sample relational es:one shows details of university instructors and the other shows details of the various university departments. The first table,the instructor table,shows,for example,that an instructor named Einstein with ID 22222 is a member of the Physics department and has an annual salary of s95.000.The second table.departient.shows.for example.that the biolo ogy department is located in the Watson building and has a budget of $90.000.Ofc ourse a real-world university would have many more departments rs We nall tables in the xt to illust cepts.A larger a is available odel is an exampl ofa record-ba model.Record-based models are so named b cause the database is structured in fixed format records of several types.Each tab contains records of a particular type.Ea h record type defines a fixed number of fields,or attributes.The columns of the table correspond to the attributes of the record type. It is not hard to see how tables may be stored in files.For instance,a special character (such as a comma)may be used to delimit the different attributes of a record,and another special character(such as a new-line character)may be used to delimit records.The relational model hides such low-level implementation details from database developers and users We also note that it is s possible to create sche as in the relational model that have proble arily duplica ted info ation.For example, sup t as d.Then I a particula budge one for 1 sics depart ment)changes,that change must to be reflected in the records of all instructors
12 Chapter 1 Introduction The DDL, just like any other programming language, gets as input some instructions (statements) and generates some output. The output of the DDL is placed in the data dictionary, which contains metadata— that is, data about data. The data dictionary is considered to be a special type of table that can only be accessed and updated by the database system itself (not a regular user). The database system consults the data dictionary before reading or modifying actual data. 1.5 Relational Databases A relational database is based on the relational model and uses a collection of tables to represent both data and the relationships among those data. It also includes a DML and DDL. In Chapter 2 we present a gentle introduction to the fundamentals of the relational model. Most commercial relational database systems employ the SQL language, which we cover in great detail in Chapters 3, 4, and 5. In Chapter 6 we discuss other influential languages. 1.5.1 Tables Each table has multiple columns and each column has a unique name. Figure 1.2 presents a sample relational database comprising two tables: one shows details of university instructors and the other shows details of the various university departments. The first table, the instructor table, shows, for example, that an instructor named Einstein with ID 22222 is a member of the Physics department and has an annual salary of $95,000. The second table, department, shows, for example, that the Biology department is located in the Watson building and has a budget of $90,000. Of course, a real-world university would have many more departments and instructors. We use small tables in the text to illustrate concepts. A larger example for the same schema is available online. The relational model is an example of a record-based model. Record-based models are so named because the database is structured in fixed-format records of several types. Each table contains records of a particular type. Each record type defines a fixed number of fields, or attributes. The columns of the table correspond to the attributes of the record type. It is not hard to see how tables may be stored in files. For instance, a special character (such as a comma) may be used to delimit the different attributes of a record, and another special character (such as a new-line character) may be used to delimit records. The relational model hides such low-level implementation details from database developers and users. We also note that it is possible to create schemas in the relational model that have problems such as unnecessarily duplicated information. For example, suppose we store the department budget as an attribute of the instructor record. Then, whenever the value of a particular budget (say that one for the Physics department) changes, that change must to be reflected in the records of all instructors
1.5 Relational Databases 13 ID name dept name salary Einstein Physics 95000 1212 00 34 El Said 6000 4556 Comp.Sci 98345 Kim Elec.Eng. ǒUU 76760 Crick Biology 72000 10101 Srinivasan Comp.Sci. 65000 58582 Califieri 62000 R2871 Brandt Sci g2000 15161 Mozart 40000 33456 Gold Physic 87000 76543 Singh Finance 80000 (a)The instructor table dept name building budget mp.Sci. 100000 9000 85000 Music nance 2000 History ainter 5000 Physics Watson 70000 (b)The department table Figure 1.2 A sample relational database. distinguish goods nt.In Chapter we shall study how to ad schema designs 1.5.2 Data-Manipulation Language The SQL query language is nonprocedural.A query takes as input several tables (possibly only one)and always returns a single table.Here is an example of an SQL query that finds the names of all instructors in the History department: select instructor.name from instructor where instructor.dept name ='History'; The query specifies that those rows from the table instructor where the dept name is History must be retrieved,and the name attribute of these rows must be displayed. More specifically,the result of executing this query is a table with a single column
1.5 Relational Databases 13 ID name dept name salary 22222 Einstein Physics 95000 12121 Wu Finance 90000 32343 El Said History 60000 45565 Katz Comp. Sci. 75000 98345 Kim Elec. Eng. 80000 76766 Crick Biology 72000 10101 Srinivasan Comp. Sci. 65000 58583 Califieri History 62000 83821 Brandt Comp. Sci. 92000 15151 Mozart Music 40000 33456 Gold Physics 87000 76543 Singh Finance 80000 (a) The instructor table dept name building budget Comp. Sci. Taylor 100000 Biology Watson 90000 Elec. Eng. Taylor 85000 Music Packard 80000 Finance Painter 120000 History Painter 50000 Physics Watson 70000 (b) The department table Figure 1.2 A sample relational database. associated with the Physics department. In Chapter 8, we shall study how to distinguish good schema designs from bad schema designs. 1.5.2 Data-Manipulation Language The SQL query language is nonprocedural. A query takes as input several tables (possibly only one) and always returns a single table. Here is an example of an SQL query that finds the names of all instructors in the History department: select instructor.name from instructor where instructor.dept name = ’History’; The query specifies that those rows from the table instructor where the dept nameis History must be retrieved, and the name attribute of these rows must be displayed. More specifically, the result of executing this query is a table with a single column