The paradigm mismatch

In chapter 3, section 3.6, “Mapping class inheritance,” we discuss how object/relational mapping solutions such as Hibernate solve the problem of persisting a class hierarchy to a database table or tables. This problem is now quite well understood in the community, and most solutions support approximately the same functionality. But we aren’t quite finished with inheritance: as soon as we introduce inheritance into the object model, we have the possibility of polymorphism.

The User class has an association to the BillingDetails superclass. This is a polymorphic association. At runtime, a User object might be associated with an instance of any of the subclasses of BillingDetails. Similarly, we’d like to be able to write queries that refer to the BillingDetails class and have the query return instances of its subclasses. This feature is called polymorphic queries.

Since SQL databases don’t provide a notion of inheritance, it’s hardly surprising that they also lack an obvious way to represent a polymorphic association. A standard foreign key constraint refers to exactly one table; it isn’t straightforward to define a foreign key that refers to multiple tables. We might explain this by saying that Java (and other object-oriented languages) is less strictly typed than SQL. Fortunately, two of the inheritance mapping solutions we show in chapter 3 are designed to accommodate the representation of polymorphic associations and efficient execution of polymorphic queries.

So, the mismatch of subtypes is one in which the inheritance structure in your Java model must be persisted in an SQL database that doesn’t offer an inheritance strategy. The next aspect of the mismatch problem is the issue of object identity. You probably noticed that we defined USERNAME as the primary key of our USER table. Was that a good choice? Not really, as you’ll see next.
1.2.3 The problem of identity

Although the problem of object identity might not be obvious at first, we’ll encounter it often in our growing and expanding example e-commerce system. This problem can be seen when we consider two objects (for example, two Users) and check if they’re identical. There are three ways to tackle this problem, two in the Java world and one in our SQL database. As expected, they work together only with some help.

Java objects define two different notions of sameness:

■ Object identity (roughly equivalent to memory location, checked with a==b)
■ Equality as determined by the implementation of the equals() method (also called equality by value)

Licensed to Jose Carlos Romero Figueroa <jose.romero@galicia.seresco.es>
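The two Java notions of sameness, and the database notion we turn to next, can be compared side by side in a small sketch. This is only an illustration under our own assumptions: the IdentityDemo class, the id field, and the choice to base equals() on the username are hypothetical and aren't taken from the example application.

```java
import java.util.Objects;

public class IdentityDemo {

    // Hypothetical persistent class; field names are illustrative only.
    static final class User {
        private final Long id;         // assumed to hold the primary key value of the row
        private final String username;

        User(Long id, String username) {
            this.id = id;
            this.username = username;
        }

        Long getId() {
            return id;
        }

        // Equality by value. Comparing the username is an arbitrary
        // choice made for this sketch, not a recommendation.
        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof User)) return false;
            return Objects.equals(username, ((User) other).username);
        }

        @Override
        public int hashCode() {
            return Objects.hash(username);
        }
    }

    // Object identity: both references point to the same instance.
    static boolean sameObject(User a, User b) {
        return a == b;
    }

    // Equality by value, as defined by equals().
    static boolean equalByValue(User a, User b) {
        return a.equals(b);
    }

    // Database identity: both objects carry the same primary key value.
    static boolean sameRow(User a, User b) {
        return a.getId() != null && a.getId().equals(b.getId());
    }

    public static void main(String[] args) {
        // Two distinct in-memory objects that stand for the same database row.
        User a = new User(1L, "johndoe");
        User b = new User(1L, "johndoe");

        System.out.println(sameObject(a, b));    // false
        System.out.println(equalByValue(a, b));  // true
        System.out.println(sameRow(a, b));       // true
    }
}
```

The two instances are equal by value and carry the same key, yet a == b is false. Nothing in the language ties the three checks together, which is why they need help to agree.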
CHAPTER 1 Understanding object/relational persistence

On the other hand, the identity of a database row is expressed as the primary key value. As you’ll see in section 3.4, “Understanding object identity,” neither equals() nor == is naturally equivalent to the primary key value. It’s common for several (nonidentical) objects to simultaneously represent the same row of the database. Furthermore, some subtle difficulties are involved in implementing equals() correctly for a persistent class.

Let’s discuss another problem related to database identity with an example. In our table definition for USER, we’ve used USERNAME as a primary key. Unfortunately, this decision makes it difficult to change a username: We’d need to update not only the USERNAME column in USER, but also the foreign key column in BILLING_DETAILS. So, later in the book, we’ll recommend that you use surrogate keys wherever possible. A surrogate key is a primary key column with no meaning to the user. For example, we might change our table definitions to look like this:

    create table USER (
        USER_ID BIGINT NOT NULL PRIMARY KEY,
        USERNAME VARCHAR(15) NOT NULL UNIQUE,
        NAME VARCHAR(50) NOT NULL,
        ...
    )

    create table BILLING_DETAILS (
        BILLING_DETAILS_ID BIGINT NOT NULL PRIMARY KEY,
        ACCOUNT_NUMBER VARCHAR(10) NOT NULL UNIQUE,
        ACCOUNT_NAME VARCHAR(50) NOT NULL,
        ACCOUNT_TYPE VARCHAR(2) NOT NULL,
        USER_ID BIGINT FOREIGN KEY REFERENCES USER
    )

The USER_ID and BILLING_DETAILS_ID columns contain system-generated values. These columns were introduced purely for the benefit of the relational data model. How (if at all) should they be represented in the object model? We’ll discuss this question in section 3.4 and find a solution with object/relational mapping.

In the context of persistence, identity is closely related to how the system handles caching and transactions. Different persistence solutions have chosen various strategies, and this has been an area of confusion. We cover all these interesting topics, and show how they’re related, in chapter 5.

The skeleton e-commerce application we’ve designed and implemented has served our purpose well. We’ve identified the mismatch problems with mapping granularity, subtypes, and object identity. We’re almost ready to move on to other parts of the application. But first, we need to discuss the important concept of associations, that is, how the relationships between our classes are mapped and handled. Is the foreign key in the database all we need?
1.2.4 Problems relating to associations

In our object model, associations represent the relationships between entities. You remember that the User, Address, and BillingDetails classes are all associated. Unlike Address, BillingDetails stands on its own. BillingDetails objects are stored in their own table. Association mapping and the management of entity associations are central concepts of any object persistence solution.

Object-oriented languages represent associations using object references and collections of object references. In the relational world, an association is represented as a foreign key column, with copies of key values in several tables. There are subtle differences between the two representations.

Object references are inherently directional; the association is from one object to the other. If an association between objects should be navigable in both directions, you must define the association twice, once in each of the associated classes. You’ve already seen this in our object model classes:

    public class User {
        private Set billingDetails;
        ...
    }

    public class BillingDetails {
        private User user;
        ...
    }

On the other hand, foreign key associations aren’t by nature directional. In fact, navigation has no meaning for a relational data model, because you can create arbitrary data associations with table joins and projection.

Actually, it isn’t possible to determine the multiplicity of a unidirectional association by looking only at the Java classes. Java associations may have many-to-many multiplicity. For example, our object model might have looked like this:

    public class User {
        private Set billingDetails;
        ...
    }

    public class BillingDetails {
        private Set users;
        ...
    }

Table associations, on the other hand, are always one-to-many or one-to-one. You can see the multiplicity immediately by looking at the foreign key definition. The following is a one-to-many association (or, if read in that direction, a many-to-one):

    USER_ID BIGINT FOREIGN KEY REFERENCES USER

These are one-to-one associations:

    USER_ID BIGINT UNIQUE FOREIGN KEY REFERENCES USER
    BILLING_DETAILS_ID BIGINT PRIMARY KEY FOREIGN KEY REFERENCES USER

If you wish to represent a many-to-many association in a relational database, you must introduce a new table, called a link table. This table doesn’t appear anywhere in the object model. For our example, if we consider the relationship between a user and the user’s billing information to be many-to-many, the link table is defined as follows:

    CREATE TABLE USER_BILLING_DETAILS (
        USER_ID BIGINT FOREIGN KEY REFERENCES USER,
        BILLING_DETAILS_ID BIGINT FOREIGN KEY REFERENCES BILLING_DETAILS,
        PRIMARY KEY (USER_ID, BILLING_DETAILS_ID)
    )

We’ll discuss association mappings in great detail in chapters 3 and 6.

So far, the issues we’ve considered are mainly structural. We can see them by considering a purely static view of the system. Perhaps the most difficult problem in object persistence is a dynamic problem. It concerns associations, and we’ve already hinted at it when we drew a distinction between object graph navigation and table joins in section 1.1.4, “Persistence in object-oriented applications.” Let’s explore this significant mismatch problem in more depth.

1.2.5 The problem of object graph navigation

There is a fundamental difference in the way you access objects in Java and in a relational database. In Java, when you access the billing information of a user, you call aUser.getBillingDetails().getAccountNumber(). This is the most natural way to access object-oriented data and is often described as walking the object graph. You navigate from one object to another, following associations between instances. Unfortunately, this isn’t an efficient way to retrieve data from an SQL database.

The single most important thing to do to improve performance of data access code is to minimize the number of requests to the database. The most obvious way to do this is to minimize the number of SQL queries. (Other ways include using stored procedures or the JDBC batch API.)

Therefore, efficient access to relational data using SQL usually requires the use of joins between the tables of interest. The number of tables included in the join determines the depth of the object graph you can navigate. For example, if we need to retrieve a User and aren’t interested in the user’s BillingDetails, we use this simple query:

    select * from USER u where u.USER_ID = 123

On the other hand, if we need to retrieve the same User and then subsequently visit each of the associated BillingDetails instances, we use a different query:

    select *
    from USER u
    left outer join BILLING_DETAILS bd on bd.USER_ID = u.USER_ID
    where u.USER_ID = 123

As you can see, we need to know what portion of the object graph we plan to access when we retrieve the initial User, before we start navigating the object graph!

On the other hand, any object persistence solution provides functionality for fetching the data of associated objects only when the object is first accessed. However, this piecemeal style of data access is fundamentally inefficient in the context of a relational database, because it requires execution of one select statement for each node of the object graph. This is the dreaded n+1 selects problem.

This mismatch in the way we access objects in Java and in a relational database is perhaps the single most common source of performance problems in Java applications. Yet, although we’ve been blessed with innumerable books and magazine articles advising us to use StringBuffer for string concatenation, it seems impossible to find any advice about strategies for avoiding the n+1 selects problem. Fortunately, Hibernate provides sophisticated features for efficiently fetching graphs of objects from the database, transparently to the application accessing the graph. We discuss these features in chapters 4 and 7.

We now have a quite elaborate list of object/relational mismatch problems, and it will be costly to find solutions, as you might know from experience. This cost is often underestimated, and we think this is a major reason for many failed software projects.

1.2.6 The cost of the mismatch

The overall solution for the list of mismatch problems can require a significant outlay of time and effort. In our experience, the main purpose of up to 30 percent of the Java application code written is to handle the tedious SQL/JDBC and the manual bridging of the object/relational paradigm mismatch. Despite all this effort, the end result still doesn’t feel quite right. We’ve seen projects nearly sink due to the complexity and inflexibility of their database abstraction layers.

One of the major costs is in the area of modeling. The relational and object models must both encompass the same business entities. But an object-oriented purist will model these entities in a very different way than an experienced relational data