31 Object persistence and databases

Executing an object-oriented application means creating and manipulating a certain number of objects. What happens to these objects when the current execution terminates? Transient objects will disappear with the current session; but many applications also need persistent objects, which will stay around from session to session. Persistent objects may need to be shared by several applications, raising the need for databases.

In this overview of persistence issues and solutions we will examine the three approaches that O-O developers have at their disposal for manipulating persistent objects. They can rely on persistence mechanisms from the programming language and development environment to get object structures to and from permanent storage. They can combine object technology with databases of the most commonly available kind (not O-O): relational databases. Or they can use one of the newer object-oriented database systems, which undertake to transpose to databases the basic ideas of object technology.

This chapter describes these techniques in turn, providing an overview of the technology of O-O databases with emphasis on two of the best-known products. It ends with a more futuristic discussion of the fate of database ideas in an O-O context.

31.1 PERSISTENCE FROM THE LANGUAGE

For many persistence needs it suffices to have, associated with the development environment, a set of mechanisms for storing objects in files and retrieving them from files. For simple objects such as integers and characters, we can use input-output facilities similar to those of traditional programming.

Storing and retrieving object structures

As soon as composite objects enter the picture, it is not sufficient to store and retrieve individual objects since they may contain references to other objects, and an object deprived of its dependents would be inconsistent.
This observation led us in an earlier chapter (see “Deep storage: a first view of persistence”, page 250) to the Persistence Closure principle, stating that any storage and retrieval mechanism must handle, together with an object, all its direct and indirect dependents. The following figure served to illustrate the issue:

[Figure: The need for persistence closure. Three PERSON1 objects, O1 (name "Almaviva"), O2 (name "Figaro") and O3 (name "Susanna"), each with fields name, landlord and loved_one, linked to one another through their landlord and loved_one references.]

The Persistence Closure principle stated that any mechanism that stores O1 must also store all the objects to which it refers, directly or indirectly; otherwise when you retrieve the structure you would get a meaningless value (“dangling reference”) in the loved_one field for O1.

We saw the mechanisms of class STORABLE which provide the corresponding facilities: store to store an object structure and retrieved to access it back. This is a precious mechanism, whose presence in an O-O environment is by itself a major advantage over traditional environments. The earlier discussion gave a typical example of use: implementing the SAVE facility of an editor. Here is another, from ISE’s own practice. Our compiler performs several passes on representations of the software text. The first pass creates an internal representation, known as an Abstract Syntax Tree (AST). Roughly speaking, the task of the subsequent passes is to add more and more semantic information to the AST (to “decorate the tree”) until there is enough to generate the compiler’s target code. Each pass finishes with a store; the next pass starts by retrieving the AST through retrieved.

The STORABLE mechanism works not only on files but also on network connections such as sockets; it indeed lies at the basis of the Net client-server library.

Storable format variants

Procedure store has several variants. One, basic_store, stores objects to be retrieved by the same system running on the same machine architecture, as part of the same execution or of a later one. These assumptions make it possible to use the most compact format possible for representing objects.

Another variant, independent_store, removes all these assumptions; the object representation is platform-independent and system-independent. It consequently takes a little more space, since it must use a portable data representation for floating-point and other numerical values, and must include some elementary information about the classes of the system. But it is precious for client-server systems, which must exchange
potentially large and complex collections of objects among machines of widely different architectures, running entirely different systems. For example a workstation server and a PC client can run two different applications and communicate through the Net library, with the server application performing the fundamental computations and the client application taking care of the user interface thanks to a graphical library such as Vision.

Note that the storing part is the only one to require several procedures: basic_store and independent_store. Even though the implementation of retrieval is different for each format, you will always use a single feature retrieved, whose implementation will detect the format actually used by the file or network data being retrieved, and will automatically apply the appropriate retrieval algorithm.

31.2 BEYOND PERSISTENCE CLOSURE

The Persistence Closure principle is, in theory, applicable to all forms of persistence. It makes it possible, as we saw, to preserve the consistency of objects stored and retrieved. In some practical cases, however, you may need to adapt the data structure before passing it to mechanisms such as STORABLE or the O-O database tools reviewed later in this chapter. Otherwise you may end up storing more than you want.

The problem arises in particular because of shared structures, as in this setup:

[Figure: Small structure with reference to big shared structure. A FAMILY structure whose address field refers into a much larger CITY structure.]

A relatively small data structure needs to be archived. Because it contains one or more references to a large shared structure, the Persistence Closure principle requires archiving that structure too. In some cases you may not want this. For example, as illustrated by the figure, you could be doing some genealogical research, or other processing on objects representing persons; a person object might, through an address field, reference a much bigger set of objects representing geographical information. A similar situation occurs in ISE’s ArchiText product, which enables users to manipulate structured documents, such as programs or specifications. Each document, like the FAMILY structure in the figure, contains a reference to a structure representing the underlying grammar, playing the role of the CITY structure; we may want to store a document but not the grammar, which already exists elsewhere and may be shared by many documents.
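Python's standard pickle module applies the same closure rule as STORABLE's store: it follows every reference from the object it is given. That makes it a convenient way to watch the problem happen. This is an analogy, not the Eiffel mechanism. The classes City and Person are hypothetical stand-ins for the CITY and FAMILY structures of the figure.

```python
import pickle
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class City:
    """Hypothetical stand-in for the big shared CITY structure."""
    name: str
    streets: List[str] = field(default_factory=list)


@dataclass(eq=False)
class Person:
    """Hypothetical stand-in for an element of the small FAMILY structure."""
    name: str
    address: Optional[City] = None
    loved_one: Optional["Person"] = None


# A big shared structure, and a small structure that refers to it.
seville = City("Seville", [f"street {i}" for i in range(10_000)])
figaro = Person("Figaro", address=seville)
susanna = Person("Susanna", address=seville, loved_one=figaro)
figaro.loved_one = susanna  # a cycle; pickle copes with it

family = [figaro, susanna]

# pickle.dumps follows every reference, like store: the whole city
# travels along with the two-person family.
family_bytes = pickle.dumps(family)
city_bytes = pickle.dumps(seville)
print(len(family_bytes) > len(city_bytes))  # True

# Retrieval rebuilds the structure with sharing and cycles intact.
restored = pickle.loads(family_bytes)
print(restored[0].address is restored[1].address)  # True
print(restored[1].loved_one is restored[0])        # True
```

Storing the two-person family costs more than storing the city alone, because closure drags in the entire shared structure. In exchange, retrieval restores sharing and cycles faithfully, so no dangling references can appear.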
In such cases you may want to “cut out” the references to the shared structure before storing the referring structure. This is, however, a delicate process. First, you must as always make sure that at retrieval time the objects will still be consistent, that is, satisfy their invariants. But there is also a practical problem: to avoid complication and errors, you do not really want to modify the original structure; only in the stored version should references be cut out.

Once again the techniques of object-oriented software construction provide an elegant solution, based on the ideas of behavior class reviewed in the discussion of inheritance. One of the versions of the storing procedure, custom_independent_store, has the same effect as independent_store by default. It also lets any descendant of a library class ACTIONABLE redefine a number of procedures which do nothing by default, such as pre_store, which will be executed just before an object is stored, and post_store, which will be executed after. So you can for example have pre_store perform preserve; address := Void, where preserve, also a feature of ACTIONABLE, copies the object safely somewhere. Then post_store will perform a call to restore, which restores the object from the preserved copy.

For this common case it is in fact possible to obtain the same effect through a call of the form store_ignore ("address"), where store_ignore takes a field name as argument. Since the implementation of store_ignore may simply skip the field, avoiding the two-way copy of preserve and restore, it will be more efficient in this case. The pre_store-post_store mechanism, however, is more general, allowing any actions before and after storage. Again, you must make sure that these actions will not adversely affect the objects.
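Python's pickle offers a hook comparable in spirit to store_ignore: a class may redefine __getstate__ to choose what gets stored. That lets the stored version lose its reference without ever touching the live object, so no preserve/restore round trip is needed. The sketch below is an analogy under that assumption and does not reproduce ACTIONABLE's interface. Person and big_shared_structure are hypothetical names; only the __getstate__ protocol belongs to Python.

```python
import pickle


class Person:
    """Hypothetical class whose address refers to a big shared structure."""

    def __init__(self, name, address=None):
        self.name = name
        self.address = address

    def __getstate__(self):
        # pickle calls this to obtain what to store, roughly the role of
        # pre_store. The reference is cut in a *copy* of the attribute
        # dictionary, so the live object is never modified and nothing
        # has to be restored afterwards.
        state = self.__dict__.copy()
        state["address"] = None  # counterpart of address := Void
        return state


big_shared_structure = {"streets": [f"street {i}" for i in range(10_000)]}
figaro = Person("Figaro", address=big_shared_structure)

stored = pickle.dumps(figaro)
retrieved_figaro = pickle.loads(stored)

print(figaro.address is big_shared_structure)  # True: original untouched
print(retrieved_figaro.address)                # None: cut out of the copy
print(len(stored) < len(pickle.dumps(big_shared_structure)))  # True
```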
You may in fact use a similar mechanism to remove an inconsistency problem arising at retrieval time; it suffices to redefine the procedure post_retrieve, which will be executed just before the retrieved object rejoins the community of approved objects. For example an application might redefine post_retrieve, in the appropriate class inheriting from ACTIONABLE, to execute something like address := my_city_structure.address_value (…), hence making the object presentable again before it has had the opportunity to violate its class invariant or any informal consistency constraint.

There are clearly some rules associated with the ACTIONABLE mechanism; in particular, pre_store must not perform any change of the data structure unless post_store corrects it immediately thereafter. You must also make sure that post_retrieve will perform the necessary actions (often the same as those of post_store) to correct any inconsistency introduced into the stored structure by pre_store. Used under these rules, the mechanism lets you remain faithful to the spirit of the Persistence Closure principle while making its application more flexible.

(On behavior classes, see “Deferred classes as partial implementations: the notion of behavior class”, page 504.)
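In Python's pickle, a class's __setstate__ runs during retrieval, before the caller receives the object. That makes it a reasonable analog of post_retrieve for reattaching a reference that was cut at storage time. This is a minimal sketch under that assumption. CITY_REGISTRY is a hypothetical stand-in for my_city_structure, and the small city_name key is an illustrative assumption about how the object finds its way back to the shared structure.

```python
import pickle

# Hypothetical stand-in for my_city_structure: the shared structure that
# already exists in the retrieving system, looked up by a small key.
CITY_REGISTRY = {"Seville": {"streets": [f"street {i}" for i in range(10_000)]}}


class Person:
    def __init__(self, name, city_name):
        self.name = name
        self.city_name = city_name               # small key: always stored
        self.address = CITY_REGISTRY[city_name]  # big structure: never stored
        self._check_invariant()

    def _check_invariant(self):
        assert self.address is CITY_REGISTRY[self.city_name]

    def __getstate__(self):
        # Store a copy of the state without the shared structure.
        state = self.__dict__.copy()
        state["address"] = None
        return state

    def __setstate__(self, state):
        # Counterpart of post_retrieve: pickle runs this before handing
        # the object to the caller, so the reference is reattached before
        # anyone can observe an inconsistent object.
        self.__dict__.update(state)
        self.address = CITY_REGISTRY[self.city_name]
        self._check_invariant()


figaro = pickle.loads(pickle.dumps(Person("Figaro", "Seville")))
print(figaro.address is CITY_REGISTRY["Seville"])  # True
```

The pairing follows the rule stated above: whatever the storing hook removes, the retrieval hook puts back.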
31.3 SCHEMA EVOLUTION

A general issue arises in all approaches to O-O persistence. Classes can change. What if you change a class of which instances exist somewhere in a persistent store? This is known as the schema evolution problem.

The word schema comes from the relational database world, where it describes the architecture of a database: its set of relations (as defined in the next section) with, for every relation, what we would call its type, that is, the number of fields and the type of each. In an O-O context the schema will also be the set of types, given here by the classes.

Although some development environments and database systems have provided interesting tools for O-O schema evolution, none has yet provided a fully satisfactory solution. Let us define the components of a comprehensive approach.

Some precise terminology will be useful. Schema evolution occurs if at least one class used by a system that attempts to retrieve some objects (the retrieving system) differs from its counterpart in the system that stored these objects (the storing system). Object retrieval mismatch, or just object mismatch for short, occurs when the retrieving system actually retrieves a particular object whose own generating class was different in the storing system. (An object’s generating class, or generator, is the class of which it is a direct instance; see “Basic form”, page 219.) Object mismatch is an individual consequence, for one particular object, of the general phenomenon of schema evolution for one or more classes.

Remember that in spite of the terms “storing system” and “retrieving system” this whole discussion is applicable not only to storage and retrieval using files or databases, but also to object transmission over a network, as with the Net library. In such a case the more accurate terms would be “sending system” and “receiving system”.

To keep the discussion simple, we will make the usual assumption that a software system does not change while it is being executed. (Exercise E31.1, page 1062, asks you to study the consequences of removing this assumption.) This means in particular that all the instances of a class stored by a particular system execution refer to the same version of the class; so at retrieval time either all of them will produce an object mismatch, or none of them will. This assumption is not too restrictive; note in particular that it does not rule out the case of a database that contains instances of many different versions of the same class, produced by different system executions.

Naïve approaches

We can rule out two extreme approaches to schema evolution:

• You might be tempted to forsake previously stored objects (schema revolution!). The developers of the new application will like the idea, which makes their life so much easier. But the users of the application will not be amused.
• You may offer a migration path from old format to new, requiring a one-time, en masse conversion of old objects. Although this solution may be acceptable in some cases, it will not do for a large persistent store or one that must be available continuously.

What we really need is a way to convert old objects on the fly as they are retrieved or updated. This is the most general solution, and the only one considered in the rest of this discussion.
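The following is a minimal sketch of on-the-fly conversion. It assumes, beyond what the text says, that each stored record carries the version number of its generating class. With that tag the retrieving system can detect object mismatch and apply converters one step at a time. Person, the version numbers and the converter functions are all hypothetical, and JSON is used only to keep the example self-contained.

```python
import json
from dataclasses import dataclass

CURRENT_VERSION = 3


@dataclass
class Person:
    """Version 3 of a hypothetical class; older versions exist only as
    stored data and are described by the converters below."""
    first_name: str
    last_name: str
    email: str


def v1_to_v2(fields: dict) -> dict:
    # Version 1 had a single "name" field; version 2 split it in two.
    first, _, last = fields["name"].partition(" ")
    rest = {k: v for k, v in fields.items() if k != "name"}
    return {**rest, "first_name": first, "last_name": last}


def v2_to_v3(fields: dict) -> dict:
    # Version 3 added "email"; old objects receive a default value.
    return {**fields, "email": ""}


CONVERTERS = {1: v1_to_v2, 2: v2_to_v3}


def retrieve(record: str) -> Person:
    """Rebuild a Person, converting on the fly in case of object mismatch."""
    data = json.loads(record)
    version, fields = data["version"], data["fields"]
    if version > CURRENT_VERSION:
        raise ValueError(f"stored by a newer system (version {version})")
    while version < CURRENT_VERSION:  # object mismatch: apply each step
        fields = CONVERTERS[version](fields)
        version += 1
    return Person(**fields)


# One store holding instances of several versions; no en masse conversion.
store = [
    json.dumps({"version": 1, "fields": {"name": "Susanna Figaro"}}),
    json.dumps({"version": 3, "fields": {"first_name": "Rosina",
                                         "last_name": "Almaviva",
                                         "email": "rosina@example.org"}}),
]
people = [retrieve(record) for record in store]
print(people[0])  # Person(first_name='Susanna', last_name='Figaro', email='')
```

Each record is converted only when it is retrieved, so the store can keep instances of many versions side by side. A fuller treatment would also write converted objects back in the new format when they are updated.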