CHAPTER 1 Understanding object/relational persistence

Licensed to Jose Carlos Romero Figueroa <jose.romero@galicia.seresco.es>

aren’t persistent; a transient object has a limited lifetime that is bounded by the life of the process that instantiated it. Almost all Java applications contain a mix of persistent and transient objects; hence we need a subsystem that manages our persistent data.

Modern relational databases provide a structured representation of persistent data, enabling sorting, searching, and aggregation of data. Database management systems are responsible for managing concurrency and data integrity; they’re responsible for sharing data between multiple users and multiple applications. A database management system also provides data-level security. When we discuss persistence in this book, we’re thinking of all these things:

■ Storage, organization, and retrieval of structured data
■ Concurrency and data integrity
■ Data sharing

In particular, we’re thinking of these problems in the context of an object-oriented application that uses a domain model.

An application with a domain model doesn’t work directly with the tabular representation of the business entities; the application has its own, object-oriented model of the business entities. If the database has ITEM and BID tables, the Java application defines Item and Bid classes.

Then, instead of directly working with the rows and columns of an SQL result set, the business logic interacts with this object-oriented domain model and its runtime realization as a graph of interconnected objects. The business logic is never executed in the database (as an SQL stored procedure); it’s implemented in Java. This allows business logic to make use of sophisticated object-oriented concepts such as inheritance and polymorphism. For example, we could use well-known design patterns such as Strategy, Mediator, and Composite [GOF 1995], all of which depend on polymorphic method calls.

Now a caveat: Not all Java applications are designed this way, nor should they be. Simple applications might be much better off without a domain model. SQL and the JDBC API are perfectly serviceable for dealing with pure tabular data, and the new JDBC RowSet (Sun JCP, JSR 114) makes CRUD operations even easier. Working with a tabular representation of persistent data is straightforward and well understood.

However, in the case of applications with nontrivial business logic, the domain model helps to improve code reuse and maintainability significantly. We focus on applications with a domain model in this book, since Hibernate and ORM in general are most relevant to this kind of application.
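To make the idea of a domain model concrete, here is a minimal sketch of what Item and Bid classes could look like when the business logic lives in Java rather than in the database. Only the class names come from the text; the fields, the placeBid() and getHighestBid() methods, and the bidding rule itself are assumptions made up for illustration.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical domain class for a row of the BID table.
class Bid {
    private final Item item;
    private final BigDecimal amount;

    Bid(Item item, BigDecimal amount) {
        this.item = item;
        this.amount = amount;
    }

    Item getItem() { return item; }
    BigDecimal getAmount() { return amount; }
}

// Hypothetical domain class for a row of the ITEM table.
class Item {
    private final String name;
    private final List<Bid> bids = new ArrayList<>();

    Item(String name) { this.name = name; }

    String getName() { return name; }

    List<Bid> getBids() { return Collections.unmodifiableList(bids); }

    // Business rule (an assumption for this sketch): a new bid must be
    // strictly higher than the current highest bid.
    Bid placeBid(BigDecimal amount) {
        Bid highest = getHighestBid();
        if (highest != null && amount.compareTo(highest.getAmount()) <= 0) {
            throw new IllegalArgumentException("Bid must exceed " + highest.getAmount());
        }
        Bid bid = new Bid(this, amount);
        bids.add(bid);
        return bid;
    }

    // Returns the highest bid, or null if no bid has been placed yet.
    Bid getHighestBid() {
        Bid highest = null;
        for (Bid bid : bids) {
            if (highest == null || bid.getAmount().compareTo(highest.getAmount()) > 0) {
                highest = bid;
            }
        }
        return highest;
    }
}
```

The point of the sketch is that the rule is expressed against a graph of objects (an Item that knows its Bids, and a Bid that knows its Item), not against rows and columns of a result set.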
If we consider SQL and relational databases again, we finally observe the mismatch between the two paradigms.

SQL operations such as projection and join always result in a tabular representation of the resulting data. This is quite different from the graph of interconnected objects used to execute the business logic in a Java application! These are fundamentally different models, not just different ways of visualizing the same model.

With this realization, we can begin to see the problems (some well understood and some less well understood) that must be solved by an application that combines both data representations: an object-oriented domain model and a persistent relational model. Let’s take a closer look.

1.2 The paradigm mismatch

The paradigm mismatch can be broken down into several parts, which we’ll examine one at a time. Let’s start our exploration with a simple example that is problem free. Then, as we build on it, you’ll begin to see the mismatch appear.

Suppose you have to design and implement an online e-commerce application. In this application, you’d need a class to represent information about a user of the system, and another class to represent information about the user’s billing details, as shown in figure 1.1.

Figure 1.1 A simple UML class diagram of the user and billing details entities (User 1..* BillingDetails)

Looking at this diagram, you see that a User has many BillingDetails. You can navigate the relationship between the classes in both directions. To begin with, the classes representing these entities might be extremely simple:

    public class User {
        private String userName;
        private String name;
        private String address;
        private Set billingDetails;

        // accessor methods (get/set pairs), business methods, etc.
        ...
    }

    public class BillingDetails {
        private String accountNumber;
        private String accountName;
        private String accountType;
        private User user;
        // methods, get/set pairs...
        ...
    }

Note that we’re only interested in the state of the entities with regard to persistence, so we’ve omitted the implementation of property accessors and business methods (such as getUserName() or billAuction()). It’s quite easy to come up with a good SQL schema design for this case:

    create table USER (
        USERNAME VARCHAR(15) NOT NULL PRIMARY KEY,
        NAME VARCHAR(50) NOT NULL,
        ADDRESS VARCHAR(100)
    )

    create table BILLING_DETAILS (
        ACCOUNT_NUMBER VARCHAR(10) NOT NULL PRIMARY KEY,
        ACCOUNT_NAME VARCHAR(50) NOT NULL,
        ACCOUNT_TYPE VARCHAR(2) NOT NULL,
        USERNAME VARCHAR(15) FOREIGN KEY REFERENCES USER
    )

The relationship between the two entities is represented as the foreign key, USERNAME, in BILLING_DETAILS. For this simple object model, the object/relational mismatch is barely in evidence; it’s straightforward to write JDBC code to insert, update, and delete information about user and billing details.

Now, let’s see what happens when we consider something a little more realistic. The paradigm mismatch will be visible when we add more entities and entity relationships to our application.

The most glaringly obvious problem with our current implementation is that we’ve modeled an address as a simple String value. In most systems, it’s necessary to store street, city, state, country, and ZIP code information separately. Of course, we could add these properties directly to the User class, but since it’s highly likely that other classes in the system will also carry address information, it makes more sense to create a separate Address class. The updated object model is shown in figure 1.2.

Figure 1.2 The User has an Address. (Address, User 1..* BillingDetails)

Should we also add an ADDRESS table? Not necessarily. It’s common to keep address information in the USER table, in individual columns. This design is likely to perform better, since we don’t require a table join to retrieve the user and address in a single query. The nicest solution might even be to create a user-defined
SQL data type to represent addresses and to use a single column of that new type in the USER table instead of several new columns.

Basically, we have the choice of adding either several columns or a single column (of a new SQL data type). This is clearly a problem of granularity.

1.2.1 The problem of granularity

Granularity refers to the relative size of the objects you’re working with. When we’re talking about Java objects and database tables, the granularity problem means persisting objects that can have various kinds of granularity to tables and columns that are inherently limited in granularity.

Let’s return to our example. Adding a new data type to our database catalog, to store Address Java objects in a single column, sounds like the best approach. After all, a new Address type (class) in Java and a new ADDRESS SQL data type should guarantee interoperability. However, you’ll find various problems if you check the support for user-defined column types (UDT) in today’s SQL database management systems.

UDT support is one of a number of so-called object-relational extensions to traditional SQL. Unfortunately, UDT support is a somewhat obscure feature of most SQL database management systems and certainly isn’t portable between different systems. The SQL standard supports user-defined data types, but very poorly. For this reason, and for whatever other reasons, use of UDTs isn’t common practice in the industry at this time, and it’s unlikely that you’ll encounter a legacy schema that makes extensive use of UDTs. We therefore can’t store objects of our new Address class in a single new column of an equivalent user-defined SQL data type. Our solution for this problem has several columns, of vendor-defined SQL types (such as boolean, numeric, and string data types).
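On the Java side, the several-columns approach means that one coarse-grained User object and its finer-grained Address object both end up in a single row. The following sketch shows one way that flattening could be written by hand. The column names follow the USER table used in this section; the Address property names, the constructors, and the UserRowMapper helper are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Fine-grained value class; property names are assumptions for this sketch.
class Address {
    private final String street;
    private final String city;
    private final String state;
    private final String zipcode;
    private final String country;

    Address(String street, String city, String state, String zipcode, String country) {
        this.street = street;
        this.city = city;
        this.state = state;
        this.zipcode = zipcode;
        this.country = country;
    }

    String getStreet() { return street; }
    String getCity() { return city; }
    String getState() { return state; }
    String getZipcode() { return zipcode; }
    String getCountry() { return country; }
}

// Coarse-grained entity class that holds a reference to an Address.
class User {
    private final String userName;
    private final String name;
    private Address address;

    User(String userName, String name) {
        this.userName = userName;
        this.name = name;
    }

    String getUserName() { return userName; }
    String getName() { return name; }
    Address getAddress() { return address; }
    void setAddress(Address address) { this.address = address; }
}

// Hypothetical hand-written mapper: two Java classes, one USER row.
class UserRowMapper {
    static Map<String, Object> toRow(User user) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("USERNAME", user.getUserName());
        row.put("NAME", user.getName());
        Address a = user.getAddress();
        // A missing Address becomes NULL in every address column.
        row.put("ADDRESS_STREET", a == null ? null : a.getStreet());
        row.put("ADDRESS_CITY", a == null ? null : a.getCity());
        row.put("ADDRESS_STATE", a == null ? null : a.getState());
        row.put("ADDRESS_ZIPCODE", a == null ? null : a.getZipcode());
        row.put("ADDRESS_COUNTRY", a == null ? null : a.getCountry());
        return row;
    }
}
```

Writing this kind of glue code for every fine-grained class is tedious but not hard, and the object model keeps its Address class instead of collapsing into a flat list of properties on User.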
Considering the granularity of our tables again, the USER table is usually defined as follows:

    create table USER (
        USERNAME VARCHAR(15) NOT NULL PRIMARY KEY,
        NAME VARCHAR(50) NOT NULL,
        ADDRESS_STREET VARCHAR(50),
        ADDRESS_CITY VARCHAR(15),
        ADDRESS_STATE VARCHAR(15),
        ADDRESS_ZIPCODE VARCHAR(5),
        ADDRESS_COUNTRY VARCHAR(15)
    )

This leads to the following observation: Classes in our domain object model come in a range of different levels of granularity, from coarse-grained entity classes like
User, to finer-grained classes like Address, right down to simple String-valued properties such as zipcode.

In contrast, just two levels of granularity are visible at the level of the database: tables such as USER, along with scalar columns such as ADDRESS_ZIPCODE. This obviously isn’t as flexible as our Java type system. Many simple persistence mechanisms fail to recognize this mismatch and so end up forcing the less flexible representation upon the object model. We’ve seen countless User classes with properties named zipcode!

It turns out that the granularity problem isn’t especially difficult to solve. Indeed, we probably wouldn’t even list it, were it not for the fact that it’s visible in so many existing systems. We describe the solution to this problem in chapter 3, section 3.5, “Fine-grained object models.”

A much more difficult and interesting problem arises when we consider domain object models that use inheritance, a feature of object-oriented design we might use to bill the users of our e-commerce application in new and interesting ways.

1.2.2 The problem of subtypes

In Java, we implement inheritance using super- and subclasses. To illustrate why this can present a mismatch problem, let’s continue to build our example. Let’s add to our e-commerce application so that we now can accept not only bank account billing, but also credit and debit cards. We therefore have several methods to bill a user account. The most natural way to reflect this change in our object model is to use inheritance for the BillingDetails class.

We might have an abstract BillingDetails superclass along with several concrete subclasses: CreditCard, DirectDebit, Cheque, and so on. Each of these subclasses will define slightly different data (and completely different functionality that acts upon that data). The UML class diagram in figure 1.3 illustrates this object model.
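In Java, such a hierarchy might be sketched as follows. The class names BillingDetails and CreditCard come from the text, and BankAccount reflects the bank account billing it mentions; the fields, constructors, and the bill() method are assumptions chosen only to show slightly different data and different behavior in each subclass.

```java
import java.math.BigDecimal;

// Abstract superclass: state shared by every way of billing a user.
abstract class BillingDetails {
    private final String ownerName;

    BillingDetails(String ownerName) {
        this.ownerName = ownerName;
    }

    String getOwnerName() { return ownerName; }

    // Hypothetical business method; each subclass bills in its own way.
    abstract String bill(BigDecimal amount);
}

class CreditCard extends BillingDetails {
    private final String number;

    CreditCard(String ownerName, String number) {
        super(ownerName);
        this.number = number;
    }

    @Override
    String bill(BigDecimal amount) {
        // Show at most the last four digits of the card number.
        String lastDigits = number.substring(Math.max(0, number.length() - 4));
        return "Charged " + amount + " to card ending in " + lastDigits;
    }
}

class BankAccount extends BillingDetails {
    private final String accountNumber;
    private final String bankName;

    BankAccount(String ownerName, String accountNumber, String bankName) {
        super(ownerName);
        this.accountNumber = accountNumber;
        this.bankName = bankName;
    }

    @Override
    String bill(BigDecimal amount) {
        return "Debited " + amount + " from account " + accountNumber + " at " + bankName;
    }
}
```

Code that holds a BillingDetails reference can call bill() without knowing which concrete subclass it is dealing with; this polymorphic behavior is exactly what a plain table definition has no way to express.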
We notice immediately that SQL provides no direct support for inheritance. We can’t declare that a CREDIT_CARD_DETAILS table is a subtype of BILLING_DETAILS by writing, say, CREATE TABLE CREDIT_CARD_DETAILS EXTENDS BILLING_DETAILS (...).

Figure 1.3 Using inheritance for different billing strategies