14 Chapter 1 Introduction labeled name,and a set of rows,each of which contains the name of an instructor whose dept name.is History.If the querv is run on the table in figure 1.2.the result will consist of two rows,one with the name El Said and the other with the name Califieri. Queries may involve information from more than one table.For instance,the owing qu n with budget of greater. of all instructor select instructor.ID,department.dept-name from instructor.department where instructor.dept name=department.dept name and department.budget>95000; If the above query were run on the tables in Figure 1.2,the system would find that there are two departments with budget of greater than $95,000-Computer Science and Finance;there are five instructors in these departments.Thus,the result will consist of a table with two columns (ID,dept name)and five rows: (12121,Finance),(45565,Computer Science),(10101,Computer Science),(83821, Computer Science),and(76543,Finance). 1.5.3 Data-Definition Language SQL provides a rich DDL that allows one to define tables,integrity constraints, assertions,etc. For instance,the following SQL DDL statement defines the department table: create table department (dept char (15) buaget numeric (12,2)); Execution of the above dDL statement creates the table with three com:dep budget,each of which hasa specific datat ted w vith it.We dis ail in Ch apter 3.In the DDL ent dictionary,which cor ins meta Section 1.4.2).The schema of a table is an example of metadata 1.5.4 Database Access from Application Programs SOL is not as powerful as a universal Turing machine:that is.there are some o i sers ut to ays,or on over or .Suc ns mus C++,or Java bas pp 10 programs are programs that are used to interact with the database in this fashion
14 Chapter 1 Introduction labeled name, and a set of rows, each of which contains the name of an instructor whose dept name, is History. If the query is run on the table in Figure 1.2, the result will consist of two rows, one with the name El Said and the other with the name Califieri. Queries may involve information from more than one table. For instance, the following query finds the instructor ID and department name of all instructors associated with a department with budget of greater than $95,000. select instructor.ID, department.dept name from instructor, department where instructor.dept name= department.dept name and department.budget > 95000; If the above query were run on the tables in Figure 1.2, the system would find that there are two departments with budget of greater than $95,000—Computer Science and Finance; there are five instructors in these departments. Thus, the result will consist of a table with two columns (ID, dept name) and five rows: (12121, Finance), (45565, Computer Science), (10101, Computer Science), (83821, Computer Science), and (76543, Finance). 1.5.3 Data-Definition Language SQL provides a rich DDL that allows one to define tables, integrity constraints, assertions, etc. For instance, the following SQL DDL statement defines the department table: create table department (dept name char (20), building char (15), budget numeric (12,2)); Execution of the above DDL statement creates the department table with three columns: dept name, building, and budget, each of which has a specific data type associated with it. We discuss data types in more detail in Chapter 3. In addition, the DDL statement updates the data dictionary, which contains metadata (see Section 1.4.2). The schema of a table is an example of metadata. 1.5.4 Database Access from Application Programs SQL is not as powerful as a universal Turing machine; that is, there are some computations that are possible using a general-purpose programming language but are not possible using SQL. SQL also does not support actions such as input from users, output to displays, or communication over the network. Such computations and actions must be written in a host language, such as C, C++, or Java, with embedded SQL queries that access the data in the database. Application programs are programs that are used to interact with the database in this fashion
1.6 Database Design 15 cou 03 lent To acce from the hos language.There are two ways to do this By providing an application program interface(set of procedures)that can be used to send dML and dDL statements to the database and retrieve the results. The Open Database Connectivity (ODBC)standard for use with the C anguage s a commonly uscd application proidronineature Java Database Connectivity (IDBC)standard p to the Java language nguage synt to embed DML calls within the host ual a special prefa calls,and preprocessor,calle precompiler,converts the DML statements to normal procedure calls in the host language. 1.6 Database Design Database svstems are designed to manage large bodies of information.These large bodies of information do not exist in isolation.They are part of the operation of some enterprise whose end t product may be information from the database or may be some device or service for which the database plays only a supporting Database design mainly involves the design of the database schema.Th design of a complete dat e applic ation environment tha meet e needs the enterprise being modeled requires attention to a broader set of issues.In this text,we focus initially on the writing of database queries and the design of database schemas.Chapter 9 discusses the overall process of application design. 1.6.1 Design Process A high-level data model provides the database designer with a conceptual frame- work in which to st pecify the data requirements of the database use rs,and how the database will be structured to fulfill these requirements.The initial phase of database desi then,is to characterize fully the data database use 7etsand .The databa eds to 1 19 sively with do carry out thi sk.Theoutcome of this phase is a specification er requirements. Next,the designer chooses a data model,and by applying the concepts of the chosen data model,translates these requirements into a conceptual schema of the database.The schema developed at this conceptual-design phase provides a detailed overview of the enterprise.The designer reviews the schema to confirm that all data requirements are indeed satisfied and are not in conflict with one another.The designer can also examine the design to remove any redundant
1.6 Database Design 15 Examples in a university system are programs that allow students to register for courses, generate class rosters, calculate student GPA, generate payroll checks, etc. To access the database, DML statements need to be executed from the host language. There are two ways to do this: • By providing an application program interface (set of procedures) that can be used to send DML and DDL statements to the database and retrieve the results. The Open Database Connectivity (ODBC) standard for use with the C language is a commonly used application program interface standard. The Java Database Connectivity (JDBC) standard provides corresponding features to the Java language. • By extending the host language syntax to embed DML calls within the host language program. Usually, a special character prefaces DML calls, and a preprocessor, called the DML precompiler, converts the DML statements to normal procedure calls in the host language. 1.6 Database Design Database systems are designed to manage large bodies of information. These large bodies of information do not exist in isolation. They are part of the operation of some enterprise whose end product may be information from the database or may be some device or service for which the database plays only a supporting role. Database design mainly involves the design of the database schema. The design of a complete database application environment that meets the needs of the enterprise being modeled requires attention to a broader set of issues. In this text, we focus initially on the writing of database queries and the design of database schemas. Chapter 9 discusses the overall process of application design. 1.6.1 Design Process A high-level data model provides the database designer with a conceptual framework in which to specify the data requirements of the database users, and how the database will be structured to fulfill these requirements. The initial phase of database design, then, is to characterize fully the data needs of the prospective database users. The database designer needs to interact extensively with domain experts and users to carry out this task. The outcome of this phase is a specification of user requirements. Next, the designer chooses a data model, and by applying the concepts of the chosen data model, translates these requirements into a conceptual schema of the database. The schema developed at this conceptual-design phase provides a detailed overview of the enterprise. The designer reviews the schema to confirm that all data requirements are indeed satisfied and are not in conflict with one another. The designer can also examine the design to remove any redundant
16 Chapter 1 Introduction features.The focus at this point is on describing the data and their relationships rather than on specifying physical storage details. In terms of the relational model,the conceptual-design process involves de- cisions on what attributes we want to capture in the database and how to these attributes to form the various tables The"what decisio n,and we shallr cussit furtherin this textTh ck par y one is t use the ther is to employ a set of algorithms(collectively known as normalization)that takes as input the set of all attributes and generates a set of tables(Section 1.6.4). A fully developed conceptual schema indicates the functional requirements of the enterprise.Ina specification of functional requirements.users describe the kinds of operations(or transactions)that will be performed on the data.Example operations include modifying or updating data,searching for and retrieving specific data,and deleting data.At this stage of conceptual design,the designer can review the schema to ensure it meets functional requirements The moving from an abstract data model to the the databas e pr n two final de the falemenationo e th des igh che o the me ntation data e system that ill be used.The de uses the resulting system specific dat se sch subsequent ph ical-design phase which the physical features of th e databas e are specified.These features includ the form of file organization and the internal storage structures;they are discussed in Chapter 10. 1.6.2 Database Design for a University Organization To illustrate the design process.let us examine how a database for a university could be desis ed.The initial spe quirements may be based inte vith the databas lys The t aris s fr s the spe characte ts the ptual structure o lere are th e major e university The university is organized into departments.Each department is identified by a unique name (dept name),is located in a particular building,and has a budget. Each department has a list of courses it offers.Each course has associated with it a courseid,title,dept name,and credits,and may also have have associated prerequisites que ID.Each instructor has name,asso alary .Students are identified b y their unique ID.Each studenth e,an associ- ated major department(deptme),and totcred (total credit hours thestudent earned thus far)
16 Chapter 1 Introduction features. The focus at this point is on describing the data and their relationships, rather than on specifying physical storage details. In terms of the relational model, the conceptual-design process involves decisions on what attributes we want to capture in the database and how to group these attributes to form the various tables. The “what” part is basically a business decision, and we shall not discuss it further in this text. The “how” part is mainly a computer-science problem. There are principally two ways to tackle the problem. The first one is to use the entity-relationship model (Section 1.6.3); the other is to employ a set of algorithms (collectively known as normalization) that takes as input the set of all attributes and generates a set of tables (Section 1.6.4). A fully developed conceptual schema indicates the functional requirements of the enterprise. In a specification of functional requirements, users describe the kinds of operations (or transactions) that will be performed on the data. Example operations include modifying or updating data, searching for and retrieving specific data, and deleting data. At this stage of conceptual design, the designer can review the schema to ensure it meets functional requirements. The process of moving from an abstract data model to the implementation of the database proceeds in two final design phases. In the logical-design phase, the designer maps the high-level conceptual schema onto the implementation data model of the database system that will be used. The designer uses the resulting system-specific database schema in the subsequent physical-design phase, in which the physical features of the database are specified. These features include the form of file organization and the internal storage structures; they are discussed in Chapter 10. 1.6.2 Database Design for a University Organization To illustrate the design process, let us examine how a database for a university could be designed. The initial specification of user requirements may be based on interviews with the database users, and on the designer’s own analysis of the organization. The description that arises from this design phase serves as the basis for specifying the conceptual structure of the database. Here are the major characteristics of the university. • The university is organized into departments. Each department is identified by a unique name (dept name), is located in a particular building, and has a budget. • Each department has a list of courses it offers. Each course has associated with it a course id, title, dept name, and credits, and may also have have associated prerequisites. • Instructors are identified by their unique ID. Each instructor has name, associated department (dept name), and salary. • Students are identified by their unique ID. Each student has a name, an associated major department (dept name), and tot cred (total credit hours the student earned thus far)
1.6 Database Design 17 .The university maintains a list of classrooms,specifying the name of the building,roomnumber,and room capacity. .The university maintains a list of all classes(sections)taught.Each section is identified by a course id,secid,year,and semester,and has associated with it a semester,year,building,room-number,and time-slot id (the time slot when the class meets). The department has a list of teaching assignments specifying,for each in- structor,the sections the instructor is teaching. The university has a list of all student course registrations, becifving.for each student,the courses and the associated sections that the student has taken(registered for). A real university database would be much more complex than the preceding design.However we use this simplified model to help you understand conceptual ideas without getting lost in details of a complex design. 1.6.3 The Entity-Relationship Model The entity-relationship(E-R)data model uses a collection of basic objects,called entities. nd relationshi cts.An entity is a“thing”or"object in the eal world that is dis ects.E example,each perso an entity,and b IK acc nts can be cons d as entities Entities are described in a database by a set of attributes.For example,the attributes dept name,building,and budget may describe one particular department in a university,and they form attributes of the department entity set.Similarly, attributes ID,name,and salary may describe an instructor entity. The extra attribute ID is used to identify an instructor uniquely(since it may be possible to have two instructors with the same name and the same salary) A unique instructor identifier must be assigned to each instructor.In the United States,many organizations use the social-security number of a person(a unique number the U.S government assigns to every person in the United States uni tifier ionship is ana ociatio g seve ralentities.Fo ex relatio mple an in The se the same type and th all en eset of all relationships of the same type are termed an entity set and r lationship set,respectively The overall logical structure(schema)of a database can be expressed graph- ically by an entity-relationship (E-R)diagram.There are several ways in which to draw these diagrams.One of the most popular is to use the Unified Modeling Language (UML).In the notation we use,which is based on UML,an E-R diagram is represented as follows:
1.6 Database Design 17 • The university maintains a list of classrooms, specifying the name of the building, room number, and room capacity. • The university maintains a list of all classes (sections) taught. Each section is identified by a course id, sec id, year, and semester, and has associated with it a semester, year, building, room number, and time slot id (the time slot when the class meets). • The department has a list of teaching assignments specifying, for each instructor, the sections the instructor is teaching. • The university has a list of all student course registrations, specifying, for each student, the courses and the associated sections that the student has taken (registered for). A real university database would be much more complex than the preceding design. However we use this simplified model to help you understand conceptual ideas without getting lost in details of a complex design. 1.6.3 The Entity-Relationship Model The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other objects. For example, each person is an entity, and bank accounts can be considered as entities. Entities are described in a database by a set of attributes. For example, the attributes dept name, building, and budget may describe one particular department in a university, and they form attributes of the department entity set. Similarly, attributes ID, name, and salary may describe an instructor entity.2 The extra attribute ID is used to identify an instructor uniquely (since it may be possible to have two instructors with the same name and the same salary). A unique instructor identifier must be assigned to each instructor. In the United States, many organizations use the social-security number of a person (a unique number the U.S. government assigns to every person in the United States) as a unique identifier. A relationship is an association among several entities. For example, a member relationship associates an instructor with her department. The set of all entities of the same type and the set of all relationships of the same type are termed an entity set and relationship set, respectively. The overall logical structure (schema) of a database can be expressed graphically by an entity-relationship (E-R) diagram. There are several ways in which to draw these diagrams. One of the most popular is to use the Unified Modeling Language (UML). In the notation we use, which is based on UML, an E-R diagram is represented as follows: 2The astute reader will notice that we dropped the attribute dept name from the set of attributes describing the instructor entity set; this is not an error. In Chapter 7 we shall provide a detailed explanation of why this is the case
18 Chapter1 Introduction instructor department member dept name name buildig salary budget Figure 1.3 A sample E-R diagram. Entity sets are represented by a rectangular box with the entity set name in the header and the attributes listed below it. Relationship sets are represented by a diamond connecting a pair of related entity sets.The name of the relationship is placed inside the diamond. As an illustration,consider pao these consisting of instruc. tors and the c sociated Fi E-R di The E-R dia indicates gure 1.shows the that t nent,with attri earli are tw es The diagram als In addition to entities and relationships,the E-R model represents certain constraints to which the contents of a database must conform.One important constraint is mapping cardinalities,which express the number of entities to which another entity can be associated via a relationship set.For example,if each instructor must be associated with only a single department,the E-R model can express that constraint. The entity-relationship model is widely used in database design,and Chapter 7explores it in detail. 1.6.4 Normalization Another method for designing a relational database is to use a process commonly The goal is to generate a set of relation schemas tha allows us to store information without unnecessary redundancy,yet also allows us to retrieve information easily.The approach is to design schemas that are in an appropriate normal form.To determine whether a relation schema is in one of the desirable normal forms,we need additional information about the real-world enterprise that we are modeling with the database.The most common approach is to use functional dependencies,which we cover in Section 8.4. To understand the may have are: Repetition of information .Inability to represent certain information
18 Chapter 1 Introduction instructor ID name salary department dept_name building budget member Figure 1.3 A sample E-R diagram. • Entity sets are represented by a rectangular box with the entity set name in the header and the attributes listed below it. • Relationship sets are represented by a diamond connecting a pair of related entity sets. The name of the relationship is placed inside the diamond. As an illustration, consider part of a university database consisting of instructors and the departments with which they are associated. Figure 1.3 shows the corresponding E-R diagram. The E-R diagram indicates that there are two entity sets, instructor and department, with attributes as outlined earlier. The diagram also shows a relationship member between instructor and department. In addition to entities and relationships, the E-R model represents certain constraints to which the contents of a database must conform. One important constraint is mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set. For example, if each instructor must be associated with only a single department, the E-R model can express that constraint. The entity-relationship model is widely used in database design, and Chapter 7 explores it in detail. 1.6.4 Normalization Another method for designing a relational database is to use a process commonly known as normalization. The goal is to generate a set of relation schemas that allows us to store information without unnecessary redundancy, yet also allows us to retrieve information easily. The approach is to design schemas that are in an appropriate normal form. To determine whether a relation schema is in one of the desirable normal forms, we need additional information about the real-world enterprise that we are modeling with the database. The most common approach is to use functional dependencies, which we cover in Section 8.4. To understand the need for normalization, let us look at what can go wrong in a bad database design. Among the undesirable properties that a bad design may have are: • Repetition of information • Inability to represent certain information