1042 OBJECT PERSISTENCE AND DATABASES §31.3

If you happen to need en-masse conversion, an on-the-fly mechanism will trivially let you do it: simply write a small system that retrieves all the existing objects using the new classes, applying on-the-fly conversion as needed, and stores everything.

On-the-fly object conversion

The mechanics of on-the-fly conversion can be tricky; we must be particularly careful to get the details right, lest we end up with corrupted objects and corrupted databases.

First, an application that retrieves an object and has a different version of its generating class may not have the rights to update the stored objects, which may be just as well since other applications may still use the old version. This is not, however, a new problem. What counts is that the objects manipulated by the application be consistent with their own class descriptions; an on-the-fly conversion mechanism will ensure this property. Whether to write back the converted object to the database is a separate question: a classical question of access privilege, which arises as soon as several applications, or even several sessions of the same application, can access the same persistent data. Database systems, object-oriented or not, have proposed various solutions.

Regardless of write-back aspects, the newer and perhaps more challenging problem is how each application will deal with an obsolete object. Schema evolution involves three separate issues, namely detection, notification and correction:

• Detection is the task of catching object mismatches (cases in which a retrieved object is obsolete) at retrieval time.

• Notification is the task of making the retrieving system aware of the object mismatch, so that it will be able to react appropriately, rather than continuing with an inconsistent object (a likely cause of major trouble ahead!).

• Correction is the task, for the retrieving system, of bringing the mismatched object to a consistent state that will make it a correct instance of the new version of its class: a citizen, or at least a permanent resident, of its system of adoption.

All three problems are delicate. Fortunately, it is possible to address them separately.

Detection

We can define two general categories of detection policy: nominal and structural.

In both cases the problem is to detect a mismatch between two versions of an object’s generating class: the version used by the system that stored the object, and the version used by the system which retrieves it.

In the nominal approach, each class version is identified by a version name. This assumes some kind of registration mechanism, which may have two variants:

• If you are using a configuration management system, you can register each new version of the class and get a version name in return (or specify the version name yourself).
§31.3 SCHEMA EVOLUTION 1043

• More automatic schemes are possible, similar to the automatic identification facility of Microsoft’s OLE 2, or the techniques used to assign “dynamic IP addresses” to computers on the Internet (for example a laptop that you plug in temporarily into a new network). These techniques are based on random number assignments, with numbers so large as to make the likelihood of a clash infinitesimal.

Either solution requires some kind of central registry. If you want to avoid the resulting hassle, you will have to rely on the structural approach. The idea here is to associate with each class version a class descriptor deduced from the actual structure of the class, as defined by the class declaration, and to make sure that whenever a persistent mechanism stores objects it also stores the associated class descriptors. (Of course if you store many instances of a class you will only need to store one copy of the class descriptor.) Then the detection mechanism is simple: just compare the class descriptor of each retrieved object with the new class descriptor. If they are different, you have an object mismatch.

What goes into a class descriptor? There is some flexibility; the answer is a tradeoff between efficiency and reliability. For efficiency, you will not want to waste too much space for keeping class information in the stored structure, or too much time for comparing descriptors at retrieval time; but for reliability you will want to minimize the risk of missing an object mismatch, that is, of treating a retrieved object as up-to-date if it is in fact obsolete. Here are various possible strategies:

C1 • At one extreme, the class descriptor could just be the class name. This is generally insufficient: if the generator of an object in the storing system has the same name as a class in the retrieving system, we will accept the object even though the two classes may be totally incompatible. Trouble will inevitably follow.

C2 • At the other extreme, we might use as class descriptor the entire class text, perhaps not as a string but in an appropriate internal form (abstract syntax tree). This is clearly the worst solution for efficiency, both in space occupation and in descriptor comparison time. But it may not even be right for reliability, since some class changes are harmless. Assume for example the new class text has added a routine, but has not changed any attribute or invariant clause. Then nothing bad can happen if we consider a retrieved object up-to-date; but if we detect an object mismatch we may cause some unwarranted trouble (such as an exception) in the retrieving system.

C3 • A more realistic approach is to make the class descriptor include the class name and the list of its attributes, each characterized by its name and its type. As compared to the nominal approach, there is still the risk that two completely different classes might have both the same name and the same attributes, but (unlike in case C1) such chance clashes are extremely unlikely to happen in practice.

C4 • A variation on C3 would include not just the attribute list but also the whole class invariant. With the invariant you should be assured that the addition or removal of a routine, which will not yield a detected object mismatch, is harmless, since if it changed the semantics of the class it would affect the invariant.

C3 is the minimum reasonable policy, and in usual cases seems a good tradeoff, at least to start.
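To make policy C3 concrete, here is a minimal sketch in Python rather than in the notation of this book. It assumes that classes are Python dataclasses and that a descriptor is the class name plus the (name, type name) pairs of its attributes. The names class_descriptor and is_mismatch are hypothetical, and sorting the attributes (which makes the comparison insensitive to declaration order) is a choice of this sketch, not something prescribed above.

```python
from dataclasses import fields, make_dataclass


def class_descriptor(cls):
    """Policy C3: class name plus the (attribute name, type name) pairs.

    Attributes are sorted by name, so declaration order does not matter
    (an assumption of this sketch).
    """
    attributes = tuple(
        sorted((f.name, getattr(f.type, "__name__", repr(f.type)))
               for f in fields(cls))
    )
    return (cls.__name__, attributes)


def is_mismatch(stored_descriptor, retrieving_class):
    """Detection: compare the stored descriptor with the current one."""
    return stored_descriptor != class_descriptor(retrieving_class)


# Three hypothetical versions of a class that all carry the same name.
stored_version = make_dataclass("Account", [("owner", str), ("balance", int)])

# A new routine only: harmless under C3, so no mismatch should be reported.
with_new_routine = make_dataclass(
    "Account",
    [("owner", str), ("balance", int)],
    namespace={"deposit": lambda self, amount: None},
)

# A new attribute: the structure changed, so a mismatch should be reported.
with_new_attribute = make_dataclass(
    "Account", [("owner", str), ("balance", int), ("rate", float)]
)

descriptor_on_disk = class_descriptor(stored_version)
print(is_mismatch(descriptor_on_disk, with_new_routine))
print(is_mismatch(descriptor_on_disk, with_new_attribute))
```

Only one descriptor needs to be stored per class, however many instances are stored, so the cost at retrieval time is a single tuple comparison per class.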
Notification

What should happen when the detection mechanism, nominal or structural, has caught an object mismatch?

We want the retrieving system to know, so that it will be able to take the appropriate correction actions. A library mechanism will address the problem. Class GENERAL (ancestor of all classes) must include a procedure

    correct_mismatch is
        do
            …See full version below …
        end

with the rule that any detection of an object mismatch will cause a call to correct_mismatch on the temporarily retrieved version of the object. Any class can redefine the default version of correct_mismatch; like a creation procedure, and like any redefinition of the default exception handling procedure default_rescue, any redefinition of correct_mismatch must ensure the invariant of the class.

(Note: correct in this procedure name is not an adjective but a verb, as in “Correct this mismatch, fast!”. See “Grammatical categories”, page 881.)

What should the default version of correct_mismatch do? It may be tempting, in the name of unobtrusiveness, to give it an empty body. But this is not appropriate, since it would mean that by default object retrieval mismatches will be ignored, leading to all kinds of possible abnormal behavior. The better global default is to raise an exception:

    correct_mismatch is
            -- Handle object retrieval mismatch.
        do
            raise_mismatch_exception
        end

where the procedure called in the body does what its name suggests. It might cause some unexpected exceptions, but this is better than letting mismatches go through undetected. A project that wants to override this default behavior, for example to execute a null instruction rather than raise an exception, can always redefine correct_mismatch, at its own risk, in class ANY. (As you will remember, developer-defined classes inherit from GENERAL not directly but through ANY, which a project or installation can customize. See “The global inheritance structure”, page 580.)

For more flexibility, there is also a feature mismatch_information of type ANY, defined as a once function, and a procedure set_mismatch_information (info: ANY) which resets its value. This makes it possible to provide correct_mismatch with more information, for example about the various preceding versions of a class.

If you do expect object mismatches for a certain class, you will not want the default exception behavior for that class: instead you will redefine correct_mismatch so as to update the retrieved object. This is our last task: correction.
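The notification rule can be illustrated with a short Python transposition (a sketch only, not the library mechanism itself). The classes General and Any stand in for GENERAL and ANY; MismatchError, retrieve, Point and Account are hypothetical names introduced for the illustration, and the detection step is reduced to a boolean argument.

```python
class MismatchError(Exception):
    """Hypothetical exception, standing in for raise_mismatch_exception."""


class General:
    """Stands in for class GENERAL, ancestor of all classes."""

    def correct_mismatch(self):
        # Global default: never let a mismatch go through undetected.
        raise MismatchError(f"object mismatch for class {type(self).__name__}")


class Any(General):
    """Stands in for class ANY, which a project may customize."""


def retrieve(cls, stored_fields, mismatch_detected):
    """Hypothetical retrieval primitive.

    Builds a tentative object without running any creation procedure, then
    notifies it through correct_mismatch if detection reported a mismatch.
    """
    obj = cls.__new__(cls)
    obj.__dict__.update(stored_fields)
    if mismatch_detected:
        obj.correct_mismatch()
    return obj


class Point(Any):
    """Keeps the default behavior: a mismatch raises an exception."""


class Account(Any):
    """Expects mismatches, so it redefines correct_mismatch."""

    def correct_mismatch(self):
        # The redefinition must leave the object satisfying the invariant
        # (here, hypothetically: balance is present and non-negative).
        if getattr(self, "balance", None) is None:
            self.balance = 0


try:
    retrieve(Point, {"x": 1}, mismatch_detected=True)
    default_raised = False
except MismatchError:
    default_raised = True

account = retrieve(Account, {"owner": "Ann"}, mismatch_detected=True)
print(default_raised, account.owner, account.balance)
```

A class that keeps the default is protected against silently accepting obsolete objects; a class that expects mismatches opts in by redefining the procedure.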
Correction

How do we correct an object that has been found, upon retrieval, to cause a mismatch? The answer requires a careful analysis, and a more sophisticated approach than has usually been implemented by existing systems or proposed in the literature.

The precise situation is this: the retrieval mechanism (through feature retrieved of class STORABLE, a database operation, or any other available primitive) has created a new object in the retrieving system, deduced from a stored object with the same generating class; but it has also detected a mismatch. The new object is in a temporary state and may be inconsistent; it may for example have lost a field which was present in the stored object, or gained a field not present in the original. Think of it as a foreigner without a visa.

[Figure: Object mismatch. Annotations on the fields of the retrieved object: (1) The attribute for this field was not in the stored version; the field has been initialized to the default value for the attribute’s type. (2) The attributes for these two fields have not changed from the stored object’s generating class to the new version. (3) The stored object had a field here, but the new version of the class has removed the corresponding attribute; so the field has been lost. Value shown in the figure: 0.0.]

Such an object state is similar to the intermediate state of an object being created (outside of any persistence consideration) by a creation instruction !! x.make (…), just after the object’s memory cell has been allocated and initialized to default values, but just before make has been called. At that stage the object has all the required components but is not yet ready for acceptance by the community since it may have inconsistent values in some of its fields; it is, as we saw, the official purpose of a creation procedure make to override default initializations as may be needed to ensure the invariant. (See “The role of creation procedures”, page 372.)

Let us assume for simplicity that the detection technique is structural and based on attributes (that is to say, policy C3 as defined earlier), although the discussion will transpose to the other solutions, nominal or structural. The mismatch is a consequence of a change in the attribute properties of the class. We may reduce it to a combination of any number of attribute additions and attribute removals. (If a class change is the replacement of the type of an attribute, we can consider it as a removal followed by an addition.) The figure above shows one addition and one removal.

Attribute removal does not raise any apparent difficulty: if the new class does not include a certain attribute present in the old class, the corresponding object fields are not needed any more and we may simply discard them. In fact procedure correct_mismatch does not need to do anything for such fields, since the retrieval mechanism, when creating a tentative instance of the new class, will have discarded them; the figure shows this for the bottom field (or rather, non-field) of the illustrated object.
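The construction of the tentative object described above can be sketched as follows, in Python rather than in the notation of this book. The function tentative_fields and the DEFAULTS table are hypothetical. The sketch assumes that attributes are described as name-to-type dictionaries, that the default values are 0, 0.0 and False for basic types, and that None plays the role of a void reference for everything else.

```python
# Default values per attribute type (an assumption of this sketch); any type
# not listed here gets None, which plays the role of a void reference.
DEFAULTS = {int: 0, float: 0.0, bool: False}


def tentative_fields(new_attributes, stored_attributes, stored_fields):
    """Compute the fields of the tentative instance of the new class.

    new_attributes, stored_attributes: dict mapping attribute name to type.
    stored_fields: dict mapping attribute name to the stored value.
    """
    result = {}
    for name, attribute_type in new_attributes.items():
        if stored_attributes.get(name) is attribute_type:
            # Unchanged attribute: the stored value carries over.
            result[name] = stored_fields[name]
        else:
            # Added attribute, or type replacement treated as a removal
            # followed by an addition: start from the default value.
            result[name] = DEFAULTS.get(attribute_type)
    # Removed attributes are never copied, so their fields are discarded.
    return result


stored_attributes = {"owner": str, "balance": int, "branch": str}
new_attributes = {"owner": str, "balance": int, "rate": float}
stored_fields = {"owner": "Ann", "balance": 100, "branch": "North"}

print(tentative_fields(new_attributes, stored_attributes, stored_fields))
```

In this example branch is dropped without any help from correct_mismatch, while rate starts at its default value and is left for correct_mismatch to bring in line with the invariant if needed.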