Abstract data types This opened my mind,I started to grasp what it means to use the tool known as algebra.I'll be damned if anyone had ever told me before:over and again Mr.Dupuy [the mathematics teacher]was making pompous sentences on the subject,but not once would he say this simple word:it is a division of labor,which like any division of labor produces miracles, and allows the mind to concentrate all of its forces on just one side ofobjects,on just one of their qualities. What a difference it would have made for us if Mr.Dupuy had told us:This cheese is sof or it is hard,it is white,it is blue;it is old,it is young,it is yours,it is mine,it is light or it is heavy.Of so many qualities let us consider only the weight.Whatever that weight may be, let us call it A.Now,without thinking of the weight any more,let us apply to A everything that we know of quantities. Such a simple thing,yet no one was saying it to us in that faraway province... Stendhal,The Life of Henry Brulard,1836 For abstraction consists only in separating the perceptible qualities of bodies,either from other qualities,or from the bodies to which they apply.Errors arise when this separation is poorly done or wrongly applied:poorly done in philosophical questions,and wronghy applied in physical and mathematical questions.An almost sure way to err in philosophy is to fail to simplify enough the objects under study,and an infallible way to obtain defective results in physics and mathematics is to view the objects as less composite than they are. Denis Diderot,A Letter on the Blind for the Benefit of Those Who Can See,1749 ettingobets play the edo iusofare architectures requre that we describe them adequately.This chapter shows how. You are perhaps impatient to dive into the depths of object technology and explore the details of multiple inheritance,dynamic binding and other joys;then you may at first look at this chapter as an undue delay since it is mostly devoted to the study of some mathematical concepts(although all the mathematics involved is elementary). But in the same way that even the most gifted musician will benefit from learning a little music theory,knowing about abstract data types will help you understand and enjoy the practice of object-oriented analysis,design and programming,however attractive the concepts might already appear without the help of the theory.Since abstract data types
6 Abstract data types This opened my mind, I started to grasp what it means to use the tool known as algebra. I’ll be damned if anyone had ever told me before: over and again Mr. Dupuy [the mathematics teacher] was making pompous sentences on the subject, but not once would he say this simple word: it is a division of labor, which like any division of labor produces miracles, and allows the mind to concentrate all of its forces on just one side of objects, on just one of their qualities. What a difference it would have made for us if Mr. Dupuy had told us: This cheese is soft or it is hard; it is white, it is blue; it is old, it is young; it is yours, it is mine, it is light or it is heavy. Of so many qualities let us consider only the weight. Whatever that weight may be, let us call it A. Now, without thinking of the weight any more, let us apply to A everything that we know of quantities. Such a simple thing; yet no one was saying it to us in that faraway province… Stendhal, The Life of Henry Brulard, 1836. For abstraction consists only in separating the perceptible qualities of bodies, either from other qualities, or from the bodies to which they apply. Errors arise when this separation is poorly done or wrongly applied: poorly done in philosophical questions, and wrongly applied in physical and mathematical questions. An almost sure way to err in philosophy is to fail to simplify enough the objects under study; and an infallible way to obtain defective results in physics and mathematics is to view the objects as less composite than they are. Denis Diderot, A Letter on the Blind for the Benefit of Those Who Can See, 1749. Letting objects play the lead role in our software architectures requires that we describe them adequately. This chapter shows how. You are perhaps impatient to dive into the depths of object technology and explore the details of multiple inheritance, dynamic binding and other joys; then you may at first look at this chapter as an undue delay since it is mostly devoted to the study of some mathematical concepts (although all the mathematics involved is elementary). But in the same way that even the most gifted musician will benefit from learning a little music theory, knowing about abstract data types will help you understand and enjoy the practice of object-oriented analysis, design and programming, however attractive the concepts might already appear without the help of the theory. Since abstract data types
122 ABSTRACT DATA TYPES $6.1 establish the theoretical basis for the entire method,the consequences of the ideas introduced in this chapter will be felt throughout the rest of this book. There is more.As we will see at chapter end,these consequences actually extend beyond the study of software proper,yielding a few principles of intellectual investigation which one may perhaps apply to other disciplines. 6.1 CRITERIA To obtain proper descriptions of objects,we need a method satisfying three conditions: The descriptions should be precise and unambiguous. They should be complete-or at least as complete as we want them in each case (we may decide to leave some details out). They should not be overspecifying. The last point is what makes the answer non-trivial.It is after all easy to be precise, unambiguous and complete if we "spill the beans"by giving out all the details of the objects'representation.But this is usually too much information for the authors of software elements that need to access the objects. This observation is close to the comments that led to the notion of information "Information Hid- hiding.The concern there was that by providing a module's source code (or,more ing".page 51. generally,implementation-related elements)as the primary source of information for the authors of software elements that rely on that module,we may drown them in a flood of details,prevent them from concentrating on their own job,and hamper prospects of smooth evolution.Here the danger is the same if we let modules use a certain data structure on the basis of information that pertains to the structure's representation rather than to its essential properties. 6.2 IMPLEMENTATION VARIATIONS To understand better why the need for abstract data descriptions is so crucial,let us explore further the potential consequences of using physical representation as the basis for describing objects. A well-known and convenient example is the description of stack objects.A stack object serves to pile up and retrieve other objects in a last-in,first-out ("LIFO")manner, the latest inserted element being the first one to be retrieved.The stack is a ubiquitous structure in computing science and in many software systems;the typical compiler or interpreter,for example,is peppered with stacks of many kinds. Stacks,it must be said,are also ubiquitous in didactic presentations of abstract data types, so much so that Edsger Dijkstra is said to have once quipped that"abstract data types are a remarkable theory,whose purpose is to describe stacks".Fair enough.But the notion of abstract data type applies to so many more advanced cases in the rest of this book that I do not feel ashamed of starting with this staple example.It is the simplest I know which includes about every important idea about abstract data types
122 ABSTRACT DATA TYPES §6.1 establish the theoretical basis for the entire method, the consequences of the ideas introduced in this chapter will be felt throughout the rest of this book. There is more. As we will see at chapter end, these consequences actually extend beyond the study of software proper, yielding a few principles of intellectual investigation which one may perhaps apply to other disciplines. 6.1 CRITERIA To obtain proper descriptions of objects, we need a method satisfying three conditions: • The descriptions should be precise and unambiguous. • They should be complete — or at least as complete as we want them in each case (we may decide to leave some details out). • They should not be overspecifying. The last point is what makes the answer non-trivial. It is after all easy to be precise, unambiguous and complete if we “spill the beans” by giving out all the details of the objects’ representation. But this is usually too much information for the authors of software elements that need to access the objects. This observation is close to the comments that led to the notion of information hiding. The concern there was that by providing a module’s source code (or, more generally, implementation-related elements) as the primary source of information for the authors of software elements that rely on that module, we may drown them in a flood of details, prevent them from concentrating on their own job, and hamper prospects of smooth evolution. Here the danger is the same if we let modules use a certain data structure on the basis of information that pertains to the structure’s representation rather than to its essential properties. 6.2 IMPLEMENTATION VARIATIONS To understand better why the need for abstract data descriptions is so crucial, let us explore further the potential consequences of using physical representation as the basis for describing objects. A well-known and convenient example is the description of stack objects. A stack object serves to pile up and retrieve other objects in a last-in, first-out (“LIFO”) manner, the latest inserted element being the first one to be retrieved. The stack is a ubiquitous structure in computing science and in many software systems; the typical compiler or interpreter, for example, is peppered with stacks of many kinds. Stacks, it must be said, are also ubiquitous in didactic presentations of abstract data types, so much so that Edsger Dijkstra is said to have once quipped that “abstract data types are a remarkable theory, whose purpose is to describe stacks”. Fair enough. But the notion of abstract data type applies to so many more advanced cases in the rest of this book that I do not feel ashamed of starting with this staple example. It is the simplest I know which includes about every important idea about abstract data types. “Information Hiding”, page 51
$6.2 IMPLEMENTATION VARIATIONS 123 Stack representations Several possible physical representations exist for stacks: Three possible capacity “Push”operation: representations count :count for a stack count representation [count:=x (ARRAY_UP) representation capacity “Push”operation: representation [free]:=x (ARRAY_DOWN) free:=free-1 free representation “Push”operation: item previous new(n) previous item n.item=x n.previous:=last previous item last n (LINKED) previous ifem The figure illustrates three of the most common representations.Each has been given a name for ease of reference: ARRAY UP:represent a stack through an array representation and an integer count whose value ranges from 0(for an empty stack)to capacity,the size of the array representation;stack elements are stored in the array at indices 1 up to count ARRAY DOWN:like ARRAY UP,but with elements stored from the end of the array rather than from the beginning.Here the integer is called free (it is the index of the highest free array position,or 0 if all positions are occupied)and ranges from capacity for an empty stack down to 0.The stack elements are stored in the array at indices capacity down to free +I. LINKED:a linked representation which stores each stack element in a cell with two fields:item representing the element,and previous containing a pointer to the cell containing the previously pushed element.The representation also needs last,a pointer to the cell representing the top
§6.2 IMPLEMENTATION VARIATIONS 123 Stack representations Several possible physical representations exist for stacks: The figure illustrates three of the most common representations. Each has been given a name for ease of reference: • ARRAY_UP: represent a stack through an array representation and an integer count whose value ranges from 0 (for an empty stack) to capacity, the size of the array representation; stack elements are stored in the array at indices 1 up to count. • ARRAY_DOWN: like ARRAY_UP, but with elements stored from the end of the array rather than from the beginning. Here the integer is called free (it is the index of the highest free array position, or 0 if all positions are occupied) and ranges from capacity for an empty stack down to 0. The stack elements are stored in the array at indices capacity down to free + 1. • LINKED: a linked representation which stores each stack element in a cell with two fields: item representing the element, and previous containing a pointer to the cell containing the previously pushed element. The representation also needs last, a pointer to the cell representing the top. Three possible representations for a stack representation (ARRAY_UP) “Push” operation: count := count + 1 count representation [count] := x capacity 1 representation (ARRAY_DOWN) “Push” operation: representation [free] := x free := free – 1 free capacity 1 (LINKED) “Push” operation: new (n) n ● item := x n ● previous := last last := n item item item item previous previous previous previous last
124 ABSTRACT DATA TYPES $6.2 Next to each representation,the figure shows a program extract (in Pascal-like notation)giving the corresponding implementation for a basic stack operation:pushing an element x onto the top. For the array representations,ARRAY_UP and ARRAY_DOWN,the instructions increase or decrease the top indicator (count or free)and assign x to the corresponding array element.Since these representations support stacks of at most capacity elements, robust implementations should include guards of the respective forms if count capacity then .. if free >0 then... which the figure omits for simplicity. For LINKED,the linked representation,pushing an element requires four operations: create a new cell n(done here with Pascal's new procedure,which allocates space for a new object);assign x to the new cell's ifem field;chain the new cell to the earlier stack top by assigning to its previous field the current value of last;and update last so that it will now be attached to the newly created cell. Although these are the most frequently used stack representations,many others exist. For example if you need two stacks of elements of the same type,and have only limited space available,you may rely on a single array with two integer top markers,count as in ARRAY_UP and free as in ARRAY_DOWN;one of the stacks will grow up and the other will grow down.The representation is full if and only if count=free. capacity Head-to-head Stack 2 representation for two stacks free CO111 representation Stack 1 The advantage,of course,is to lessen the risk of running out of space:with two arrays of capacity n representing stacks under ARRAY UP or ARRAY DOWN,you exhaust the available space whenever either stack reaches n elements;with a single array of size 2n holding two head-to-head stacks,you run out when the combined size reaches 2n,a less likely occurrence if the two stacks grow independently.(For any variable values p and g, max (p+q)<max (p)+max (g).) Each of these and other possible representations is useful in some cases.Choosing one of them as"the"definition of stacks would be a typical case ofoverspecification.Why should we consider ARRAY UP,for example,more representative than LINKED?The most visible properties of ARRAY_UP-the array,the integer count,the upper bound-are irrelevant to an understanding of the underlying structure
124 ABSTRACT DATA TYPES §6.2 Next to each representation, the figure shows a program extract (in Pascal-like notation) giving the corresponding implementation for a basic stack operation: pushing an element x onto the top. For the array representations, ARRAY_UP and ARRAY_DOWN, the instructions increase or decrease the top indicator (count or free) and assign x to the corresponding array element. Since these representations support stacks of at most capacity elements, robust implementations should include guards of the respective forms if count < capacity then … if free > 0 then … which the figure omits for simplicity. For LINKED, the linked representation, pushing an element requires four operations: create a new cell n (done here with Pascal’s new procedure, which allocates space for a new object); assign x to the new cell’s item field; chain the new cell to the earlier stack top by assigning to its previous field the current value of last; and update last so that it will now be attached to the newly created cell. Although these are the most frequently used stack representations, many others exist. For example if you need two stacks of elements of the same type, and have only limited space available, you may rely on a single array with two integer top markers, count as in ARRAY_UP and free as in ARRAY_DOWN; one of the stacks will grow up and the other will grow down. The representation is full if and only if count = free. The advantage, of course, is to lessen the risk of running out of space: with two arrays of capacity n representing stacks under ARRAY_UP or ARRAY_DOWN, you exhaust the available space whenever either stack reaches n elements; with a single array of size 2n holding two head-to-head stacks, you run out when the combined size reaches 2n, a less likely occurrence if the two stacks grow independently. (For any variable values p and q, max (p+q) ≤ max (p) + max (q).) Each of these and other possible representations is useful in some cases. Choosing one of them as “the” definition of stacks would be a typical case of overspecification. Why should we consider ARRAY_UP, for example, more representative than LINKED? The most visible properties of ARRAY_UP — the array, the integer count, the upper bound — are irrelevant to an understanding of the underlying structure. Head-to-head representation for two stacks representation count 1 capacity free Stack 2 Stack 1
$6.2 IMPLEMENTATION VARIATIONS 125 The danger of overspecification Why is it so bad to use a particular representation as specification? “ABOUT SOFT. The results of the Lientz and Swanson maintenance study,which you may recall, WARE MAINTE- give a hint.More than 17%of software costs was found to come from the need to take into NANCE”,I.3.page account changes of data formats.As was noted in the discussion,too many programs are 17. closely tied to the physical structure of the data they manipulate.A method relying on the physical representation of data structures to guide analysis and design would not be likely to yield flexible software. So if we are to use objects or object types as the basis of our system architectures, we should find a better description criterion than the physical representation. How long is a middle initial? Lest stacks make us forget that,beyond the examples favored by computer scientists,data structures are ultimately connected with real-life objects,here is an amusing example, taken from a posting on the Risks forum(comp.risks Usenet newsgroup)of the dangers of a view of data that is too closely dependent on concrete properties: Risks forum,10.74, My dear mother blessed (or perhaps cursed)all of her children with two middle initials, 3Jan.1993.Post- in my case“D”and“E".This has cau.sed me a good deal of trouble. ing by Darrell D.E. Long:Dehuman- It seems that TRW sells certain parts of your credit information,such as your name and ization by old a demographic profile.I recently got a new credit card from Gottchalks and found to my Cobol programs". chagrin that my name had been truncated to "Darrell D.Long".I went to the credit Abbreviated. manager and was assured that things would be fixed.Well,two things happened:I got a new credit card,this time as"Darrell E.Long",and TRW now has an annotation in my See exercise E6.5. file to the effect"File variation:middle initial is E".Soon after this I start getting mail page 161. for "Darrell E.Long"(along with the usual "Darrell Long"and "Darrell D.Long"and the occasional "Darrell D.E.Long"). I called up the credit bureau and it seems that the programmer who coded up the TRW database decided that all good Americans are entitled to only one middle initial.As the woman on the phone patiently told me "They only allocated enough megabytes (sic)in the system for one middle initial,and it would probably be awfully hard to change". Aside from the typical example of technobabble justification ("megabytes"),the lesson here is the need to avoid tying software to the exact physical properties of data. TRW's system seems similar to those programs,mentioned in an earlier discussion,which "knew"that postal codes consist of exactly five digits. See page 18. The author of the message reproduced above was mainly concerned about junk mail, an unpleasant but not life-threatening event;the archives of the Risks forum are full of computer-originated name confusions with more serious consequences.The "millenium problem",mentioned in the discussion of software maintenance,is another example ofthe dangers of accessing data based on physical representation,this one with hundreds of millions of dollars'worth of consequences
§6.2 IMPLEMENTATION VARIATIONS 125 The danger of overspecification Why is it so bad to use a particular representation as specification? The results of the Lientz and Swanson maintenance study, which you may recall, give a hint. More than 17% of software costs was found to come from the need to take into account changes of data formats. As was noted in the discussion, too many programs are closely tied to the physical structure of the data they manipulate. A method relying on the physical representation of data structures to guide analysis and design would not be likely to yield flexible software. So if we are to use objects or object types as the basis of our system architectures, we should find a better description criterion than the physical representation. How long is a middle initial? Lest stacks make us forget that, beyond the examples favored by computer scientists, data structures are ultimately connected with real-life objects, here is an amusing example, taken from a posting on the Risks forum (comp.risks Usenet newsgroup) of the dangers of a view of data that is too closely dependent on concrete properties: My dear mother blessed (or perhaps cursed) all of her children with two middle initials, in my case “D” and “E”. This has caused me a good deal of trouble. It seems that TRW sells certain parts of your credit information, such as your name and a demographic profile. I recently got a new credit card from Gottchalks and found to my chagrin that my name had been truncated to “Darrell D. Long”. I went to the credit manager and was assured that things would be fixed. Well, two things happened: I got a new credit card, this time as “Darrell E. Long”, and TRW now has an annotation in my file to the effect “File variation: middle initial is E”. Soon after this I start getting mail for “Darrell E. Long” (along with the usual “Darrell Long” and “Darrell D. Long” and the occasional “Darrell D. E. Long”). I called up the credit bureau and it seems that the programmer who coded up the TRW database decided that all good Americans are entitled to only one middle initial. As the woman on the phone patiently told me “They only allocated enough megabytes (sic) in the system for one middle initial, and it would probably be awfully hard to change”. Aside from the typical example of technobabble justification (“megabytes”), the lesson here is the need to avoid tying software to the exact physical properties of data. TRW’s system seems similar to those programs, mentioned in an earlier discussion, which “knew” that postal codes consist of exactly five digits. The author of the message reproduced above was mainly concerned about junk mail, an unpleasant but not life-threatening event; the archives of the Risks forum are full of computer-originated name confusions with more serious consequences. The “millenium problem”, mentioned in the discussion of software maintenance, is another example of the dangers of accessing data based on physical representation, this one with hundreds of millions of dollars’ worth of consequences. “ABOUT SOFTWARE MAINTENANCE”, 1.3, page 17. Risks forum, 10.74, 3 Jan. 1993. Posting by Darrell D.E. Long: ``Dehumanization by old Cobol programs''. Abbreviated. See exercise E6.5, page 161. See page 18