126 ABSTRACT DATA TYPES $6.3 6.3 TOWARDS AN ABSTRACT VIEW OF OBJECTS How do we retain completeness,precision and non-ambiguity without paying the price of overspecification? Using the operations In the stack example,what unites the various representations in spite of all their differences is that they describe a"container"structure (a structure used to contain other objects),where certain operations are applicable and enjoy certain properties.By focusing not on a particular choice of representation but on these operations and properties,we may be able to obtain an abstract yet useful characterization of the notion of stack. The operations typically available on a stack are the following: A command to push an element on top of a stack.Let us call that operation put. A command to remove the stack's top element,if the stack is not empty.Let us call it remove. A query to find out what the topelement is,ifthe stack is not empty.Let us call it item. .A query to determine whether the stack is empty.(This will enable clients to determine beforehand if they can use remove and item. In addition we may need a creator operation giving us a stack,initially empty.Let us call it make. Two points may have caught your attention and will deserve more explanation later in this chapter.First,the operation names may seem surprising;for the moment,just think of put as meaning push,remove as meaning pop,and item as meaning top.Details shortly (on the facing page,actually).Second,the operations have been divided into three categories:creators,which yield objects;queries,which return information about objects;and commands,which can modify objects.This classification will also require some more comments. In a traditional view of data structures,we would consider that the notion of stack is given by some data declaration corresponding to one of the above representations,for example (representation ARRAY_UP,Pascal-like syntax): count:INTEGER representation:array [1..capacity]of STACK ELEMENT TYPE where capacity,a constant integer,is the maximum number ofelements on the stack.Then put,remove,item,empty and make would be routines (subprograms)that work on the object structures defined by these declarations. The key step towards data abstraction is to reverse the viewpoint:forget for the moment about the representation;take the operations themselves as defining the data structure.In other words,a stack is any structure to which clients may apply the operations listed above
126 ABSTRACT DATA TYPES §6.3 6.3 TOWARDS AN ABSTRACT VIEW OF OBJECTS How do we retain completeness, precision and non-ambiguity without paying the price of overspecification? Using the operations In the stack example, what unites the various representations in spite of all their differences is that they describe a “container” structure (a structure used to contain other objects), where certain operations are applicable and enjoy certain properties. By focusing not on a particular choice of representation but on these operations and properties, we may be able to obtain an abstract yet useful characterization of the notion of stack. The operations typically available on a stack are the following: • A command to push an element on top of a stack. Let us call that operation put. • A command to remove the stack’s top element, if the stack is not empty. Let us call it remove. • A query to find out what the top element is, if the stack is not empty. Let us call it item. • A query to determine whether the stack is empty. (This will enable clients to determine beforehand if they can use remove and item.) In addition we may need a creator operation giving us a stack, initially empty. Let us call it make. Two points may have caught your attention and will deserve more explanation later in this chapter. First, the operation names may seem surprising; for the moment, just think of put as meaning push, remove as meaning pop, and item as meaning top. Details shortly (on the facing page, actually). Second, the operations have been divided into three categories: creators, which yield objects; queries, which return information about objects; and commands, which can modify objects. This classification will also require some more comments. In a traditional view of data structures, we would consider that the notion of stack is given by some data declaration corresponding to one of the above representations, for example (representation ARRAY_UP, Pascal-like syntax): count: INTEGER representation: array [1 ● ● capacity] of STACK_ELEMENT_TYPE where capacity, a constant integer, is the maximum number of elements on the stack. Then put, remove, item, empty and make would be routines (subprograms) that work on the object structures defined by these declarations. The key step towards data abstraction is to reverse the viewpoint: forget for the moment about the representation; take the operations themselves as defining the data structure. In other words, a stack is any structure to which clients may apply the operations listed above
$6.3 TOWARDS AN ABSTRACT VIEW OF OBJECTS 127 A laissez-faire policy for the society of modules The method just outlined for describing data structures shows a rather selfish approach to the world of data structures:like an economist of the most passionate supply-side, invisible-hand,let-the-free-market-decide school,we are interested in individual agents not so much for what they are internally as for what they have to offer to each other.The world ofobjects (and hence of software architecture)will be a world of interacting agents, communicating on the basis of precisely defined protocols. The economic analogy will indeed accompany us throughout this presentation;the agents-the software modules-are called suppliers and clients;the protocols will be called contracts,and much of object-oriented design is indeed Design by Contract,the title of a later chapter. See "BEYOND As always with analogies,we should not get too carried away:this work is not a SOFTWARE”,6.6. textbook on economics,and contains no hint of its author's views in that field.It will page 147. suffice for the moment to note the remarkable analogies of the abstract data type approach to some theories of how human agents should work together.Later in this chapter we will again explore what abstract data types can tell us beyond their original area of application. Name consistency For the moment,let us get back to more immediate concerns,and make sure you are comfortable with the above example specification in all its details.If you have encountered stacks before,the operation names chosen for the discussion of stacks may have surprised or even shocked you.Any self-respecting computer scientist will know stack operations under other names: Common stack operation name Name used here push put pop remove top item new make Why use anything else than the traditional terminology?The reason is a desire to take a high-level view of data structures-especially "containers",those data structures used to keep objects. Stacks are just one brand of container;more precisely,they belong to a category of containers which we may call dispensers.A dispenser provides its clients with a mechanism for storing (pur),retrieving (item)and removing (remove)objects,but without giving them any control over the choice of object to be stored,retrieved or removed.For example,the LIFO policy of stacks implies that you may only retrieve or remove the element that was stored last.Another brand of dispenser is the queue,which has a first-in, first-out(FIFO)policy:you store at one end,retrieve and remove at the other;the element
§6.3 TOWARDS AN ABSTRACT VIEW OF OBJECTS 127 A laissez-faire policy for the society of modules The method just outlined for describing data structures shows a rather selfish approach to the world of data structures: like an economist of the most passionate supply-side, invisible-hand, let-the-free-market-decide school, we are interested in individual agents not so much for what they are internally as for what they have to offer to each other. The world of objects (and hence of software architecture) will be a world of interacting agents, communicating on the basis of precisely defined protocols. The economic analogy will indeed accompany us throughout this presentation; the agents — the software modules — are called suppliers and clients; the protocols will be called contracts, and much of object-oriented design is indeed Design by Contract, the title of a later chapter. As always with analogies, we should not get too carried away: this work is not a textbook on economics, and contains no hint of its author’s views in that field. It will suffice for the moment to note the remarkable analogies of the abstract data type approach to some theories of how human agents should work together. Later in this chapter we will again explore what abstract data types can tell us beyond their original area of application. Name consistency For the moment, let us get back to more immediate concerns, and make sure you are comfortable with the above example specification in all its details. If you have encountered stacks before, the operation names chosen for the discussion of stacks may have surprised or even shocked you. Any self-respecting computer scientist will know stack operations under other names: Why use anything else than the traditional terminology? The reason is a desire to take a high-level view of data structures — especially “containers”, those data structures used to keep objects. Stacks are just one brand of container; more precisely, they belong to a category of containers which we may call dispensers. A dispenser provides its clients with a mechanism for storing (put), retrieving (item) and removing (remove) objects, but without giving them any control over the choice of object to be stored, retrieved or removed. For example, the LIFO policy of stacks implies that you may only retrieve or remove the element that was stored last. Another brand of dispenser is the queue, which has a first-in, first-out (FIFO) policy: you store at one end, retrieve and remove at the other; the element Common stack operation name Name used here push put pop remove top item new make See “BEYOND SOFTWARE”, 6.6, page 147
128 ABSTRACT DATA TYPES $6.3 that you retrieve or remove is the oldest one stored but not yet removed.An example of a container which is not a dispenser is an array,where you choose,through integer indices, the positions where you store and retrieve objects. Because the similarities between various kinds of container(dispensers,arrays and others)are more important than the differences between their individual storage,retrieval and removal properties,this book constantly adheres to a standardized terminology which downplays the differences between data structure variants and instead emphasizes the commonality.So the basic operation to retrieve an element will always be called item,the basic operation to remove an element will always be called remove and so on. These naming issues may appear superficial at first-"cosmetic",as programmers sometimes say.But do not forget that one of our eventual aims is to provide the basis for powerful,professional libraries of reusable software components.Such libraries will contain tens of thousands of available operations.Without a systematic and clear nomenclature,both the developers and the users of these libraries would quickly be swamped in a flood of specific and incompatible names,providing a strong (and unjustifiable)obstacle to large-scale reuse. Naming,then,is not cosmetic.Good reusable software is software that provides the right functionality and provides it under the right names. The names used here for stack operations are part of a systematic set of naming Chapter 26,in par- conventions used throughout this book.A later chapter will introduce them in more detail. ticular“CHOOS- ING THE RIGHT How not to handle abstractions NAMES”,26.2. page 879. In software engineering as in other scientific and technical disciplines,a seminal idea may seem obvious once you have been exposed to it,even though it may have taken a long time to emerge.The bad ideas and the complicated ones(they are often the same)often appear first;it takes time for the simple and the elegant to take over. This observation is true of abstract data types.Although good software developers have always(as a result ofeducation or mere instinct)made good use of abstraction,many of the systems in existence today were designed without much consideration of this goal. I once did a little involuntary experiment which provided a good illustration of this state of affairs.While setting up the project part of a course which I was teaching,I decided to provide students with a sort of anonymous marketplace,where they could place mock"for sale"announcements of software modules,without saying who was the source of the advertisement.(The idea,which may or may not have been a good one,was to favor a selection process based only on a precise specification of the modules'advertized facilities.)The mail facility of a famous operating system commonly favored by universities seemed to provide the right base mechanism (why write a new mail system just for a course project?);but naturally that mail facility shows the sender's name when it delivers a message to its recipients.I had access to the source of the corresponding code -a huge C program-and decided,perhaps foolishly,to take that code,remove all references to the sender's name in delivered messages,and recompile
128 ABSTRACT DATA TYPES §6.3 that you retrieve or remove is the oldest one stored but not yet removed. An example of a container which is not a dispenser is an array, where you choose, through integer indices, the positions where you store and retrieve objects. Because the similarities between various kinds of container (dispensers, arrays and others) are more important than the differences between their individual storage, retrieval and removal properties, this book constantly adheres to a standardized terminology which downplays the differences between data structure variants and instead emphasizes the commonality. So the basic operation to retrieve an element will always be called item, the basic operation to remove an element will always be called remove and so on. These naming issues may appear superficial at first — “cosmetic”, as programmers sometimes say. But do not forget that one of our eventual aims is to provide the basis for powerful, professional libraries of reusable software components. Such libraries will contain tens of thousands of available operations. Without a systematic and clear nomenclature, both the developers and the users of these libraries would quickly be swamped in a flood of specific and incompatible names, providing a strong (and unjustifiable) obstacle to large-scale reuse. Naming, then, is not cosmetic. Good reusable software is software that provides the right functionality and provides it under the right names. The names used here for stack operations are part of a systematic set of naming conventions used throughout this book. A later chapter will introduce them in more detail. How not to handle abstractions In software engineering as in other scientific and technical disciplines, a seminal idea may seem obvious once you have been exposed to it, even though it may have taken a long time to emerge. The bad ideas and the complicated ones (they are often the same) often appear first; it takes time for the simple and the elegant to take over. This observation is true of abstract data types. Although good software developers have always (as a result of education or mere instinct) made good use of abstraction, many of the systems in existence today were designed without much consideration of this goal. I once did a little involuntary experiment which provided a good illustration of this state of affairs. While setting up the project part of a course which I was teaching, I decided to provide students with a sort of anonymous marketplace, where they could place mock “for sale” announcements of software modules, without saying who was the source of the advertisement. (The idea, which may or may not have been a good one, was to favor a selection process based only on a precise specification of the modules’ advertized facilities.) The mail facility of a famous operating system commonly favored by universities seemed to provide the right base mechanism (why write a new mail system just for a course project?); but naturally that mail facility shows the sender’s name when it delivers a message to its recipients. I had access to the source of the corresponding code — a huge C program — and decided, perhaps foolishly, to take that code, remove all references to the sender’s name in delivered messages, and recompile. Chapter 26, in particular “CHOOSING THE RIGHT NAMES”, 26.2, page 879
$6.4 FORMALIZING THE SPECIFICATION 129 Aided by a teaching assistant,I thus embarked on a task which seemed obvious enough although not commonly taught in software engineering courses:systematic program deconstruction.Sure enough,we quickly found the first place where the program accessed the sender's name,and we removed the corresponding code.This,we naively thought,would have done the job,so we recompiled and sent a test mail message;but the sender's name was still there!Thus began a long and surreal process:time and again, believing we had finally found the last reference to the sender's name,we would remove it,recompile,and mail a test message,only to find the name duly recorded once again in its habitual field.Like the Hydra in its famous fight,the mailer kept growing a new head every time we thought we had cut the last neck. Finally,repeating for the modern era the earlier feat of Hercules,we slew the beast for good;by then we had removed more than twenty code extracts which all accessed,in some way or other,information about the message sender. Writing MAIL Although the previous sections have only got us barely started on our road to abstract MESSAGE is the data types,it should be clear by now that any program written in accordance with even the topic of exercise E6.4,page161. most elementary concepts of data abstraction would treat MAIL MESSAGE as a carefully defined abstract notion,supporting a query operation,perhaps called sender,which returns information about the message sender.Any portion of the mail program that needs this information would obtain it solely through the sender query.Had the mail program been designed according to this seemingly obvious principle,it would have been sufficient,for the purpose of my little exercise,to modify the code of the sender query. Most likely,the software would also then have provided an associated command operation set sender to update sender information,making the job even easier. What is the real moral of that little story (besides lowering the reader's guard in preparation for the surprise mathematical offensive of the next section)?After all,the mail program in question is successful,at least judging by its widespread use.But it typifies the current quality standard in the industry.Until we move significantly beyond that standard, the phrase "software engineering"will remain a case of wishful thinking Oh yes,one more note.Some time after my brief encounter with the mail program, I read that certain network hackers had intruded into the computer systems of highly guarded government laboratories,using a security hole of that very mail program-a hole which was familiar,so the press reported,to all those in the know.I was not in the know; but,when I learned the news,I was not surprised. 6.4 FORMALIZING THE SPECIFICATION The glimpse of data abstraction presented so far is too informal to be of durable use. Consider again our staple example:a stack,as we now understand it,is defined in terms of the applicable operations;but then we need to define these operations! Informal descriptions as above(put pushes an element"on top of"the stack,remove pops the element"last pushed"and so on)do not suffice.We need to know precisely how these operations can be used by clients,and what they will do for them
§6.4 FORMALIZING THE SPECIFICATION 129 Aided by a teaching assistant, I thus embarked on a task which seemed obvious enough although not commonly taught in software engineering courses: systematic program deconstruction. Sure enough, we quickly found the first place where the program accessed the sender’s name, and we removed the corresponding code. This, we naïvely thought, would have done the job, so we recompiled and sent a test mail message; but the sender’s name was still there! Thus began a long and surreal process: time and again, believing we had finally found the last reference to the sender’s name, we would remove it, recompile, and mail a test message, only to find the name duly recorded once again in its habitual field. Like the Hydra in its famous fight, the mailer kept growing a new head every time we thought we had cut the last neck. Finally, repeating for the modern era the earlier feat of Hercules, we slew the beast for good; by then we had removed more than twenty code extracts which all accessed, in some way or other, information about the message sender. Although the previous sections have only got us barely started on our road to abstract data types, it should be clear by now that any program written in accordance with even the most elementary concepts of data abstraction would treat MAIL_MESSAGE as a carefully defined abstract notion, supporting a query operation, perhaps called sender, which returns information about the message sender. Any portion of the mail program that needs this information would obtain it solely through the sender query. Had the mail program been designed according to this seemingly obvious principle, it would have been sufficient, for the purpose of my little exercise, to modify the code of the sender query. Most likely, the software would also then have provided an associated command operation set_sender to update sender information, making the job even easier. What is the real moral of that little story (besides lowering the reader’s guard in preparation for the surprise mathematical offensive of the next section)? After all, the mail program in question is successful, at least judging by its widespread use. But it typifies the current quality standard in the industry. Until we move significantly beyond that standard, the phrase “software engineering” will remain a case of wishful thinking. Oh yes, one more note. Some time after my brief encounter with the mail program, I read that certain network hackers had intruded into the computer systems of highly guarded government laboratories, using a security hole of that very mail program — a hole which was familiar, so the press reported, to all those in the know. I was not in the know; but, when I learned the news, I was not surprised. 6.4 FORMALIZING THE SPECIFICATION The glimpse of data abstraction presented so far is too informal to be of durable use. Consider again our staple example: a stack, as we now understand it, is defined in terms of the applicable operations; but then we need to define these operations! Informal descriptions as above (put pushes an element “on top of” the stack, remove pops the element “last pushed” and so on) do not suffice. We need to know precisely how these operations can be used by clients, and what they will do for them. Writing MAIL_ MESSAGE is the topic of exercise E6.4, page 161
130 ABSTRACT DATA TYPES $6.4 An abstract data type specification will provide this information.It consists of four paragraphs,explained in the next sections: ·TYPES ·FUNCTIONS. ·AXIOMS ·PRECONDITIONS. These paragraphs will rely on a simple mathematical notation for specifying the properties of an abstract data type (ADT for short). The notation-a mathematical formalism,not to be confused with the software notation of the rest of this book even though for consistency it uses a similar syntactic style-has no name and is not a programming language;it could serve as the starting point for a formal specification language,but we shall not pursue this avenue here, being content enough to use self-explanatory conventions for the unambiguous specification of abstract data types. Specifying types The TYPES paragraph indicates the types being specified.In general,it may be convenient to specify several ADTs together,although our example has only one,STACK. By the way,what is a type?The answer to this question will combine all the ideas developed in the rest of this chapter;a type is a collection of objects characterized by functions,axioms and preconditions.If for the moment you just view a type as a set of objects,in the mathematical sense of the word "set"-type STACK as the set of all possible stacks,type INTEGER as the set of all possible integer values and so on-you are not guilty of any terrible misunderstanding.As you read this discussion you will be able to refine this view.In the meantime the discussion will not be too fussy about using “set”for"type”and conversely. On one point,however,you should make sure to avoid any confusion:an abstract data type such as STACK is not an object(one particular stack)but a collection of objects (the set of all stacks).Remember what our real goal is:finding a good basis for the modules of our software systems.As was noted in the previous chapter,basing a module on one particular object-one stack,one airplane,one bank account-would not make sense.O-O design will enable us to build modules covering the properties of all stacks,all airplanes,all bank accounts-or at least of some stacks,airplanes or accounts. An object belonging to the set of objects described by an ADT specification is called an instance of the ADT.For example,a specific stack which satisfies the properties of the STACK abstract data type will be an instance of STACK.The notion of instance will carry over to object-oriented design and programming,where it will play an important role in explaining the run-time behavior of programs
130 ABSTRACT DATA TYPES §6.4 An abstract data type specification will provide this information. It consists of four paragraphs, explained in the next sections: • TYPES. • FUNCTIONS. • AXIOMS. • PRECONDITIONS. These paragraphs will rely on a simple mathematical notation for specifying the properties of an abstract data type (ADT for short). The notation — a mathematical formalism, not to be confused with the software notation of the rest of this book even though for consistency it uses a similar syntactic style — has no name and is not a programming language; it could serve as the starting point for a formal specification language, but we shall not pursue this avenue here, being content enough to use self-explanatory conventions for the unambiguous specification of abstract data types. Specifying types The TYPES paragraph indicates the types being specified. In general, it may be convenient to specify several ADTs together, although our example has only one, STACK. By the way, what is a type? The answer to this question will combine all the ideas developed in the rest of this chapter; a type is a collection of objects characterized by functions, axioms and preconditions. If for the moment you just view a type as a set of objects, in the mathematical sense of the word “set” — type STACK as the set of all possible stacks, type INTEGER as the set of all possible integer values and so on — you are not guilty of any terrible misunderstanding. As you read this discussion you will be able to refine this view. In the meantime the discussion will not be too fussy about using “set” for “type” and conversely. On one point, however, you should make sure to avoid any confusion: an abstract data type such as STACK is not an object (one particular stack) but a collection of objects (the set of all stacks). Remember what our real goal is: finding a good basis for the modules of our software systems. As was noted in the previous chapter, basing a module on one particular object — one stack, one airplane, one bank account — would not make sense. O-O design will enable us to build modules covering the properties of all stacks, all airplanes, all bank accounts — or at least of some stacks, airplanes or accounts. An object belonging to the set of objects described by an ADT specification is called an instance of the ADT. For example, a specific stack which satisfies the properties of the STACK abstract data type will be an instance of STACK. The notion of instance will carry over to object-oriented design and programming, where it will play an important role in explaining the run-time behavior of programs