752 DESIGNING CLASS INTERFACES $23.1 representing true and false,will light up.If you press the button several times in a row, without touching the command buttons,you will get the same result each time.If,on the other hand,you push a command button and then a query button,the answer that you get will usually be different from what you would have obtained before the command. Commands as well as queries may take arguments;these are figuratively entered in the slot at the top left. The figure is based on the example of a list object with the kind of interface hinted at in earlier chapters and studied in more detail later in the present one.Commands include start(move the cursor to the first element),forth(advance the cursor one position),search (move the cursor to the next occurrence of the element entered into the top-left slot); queries include item(show in the display panel the value of the element at cursor position) and index(show the current cursor position).Note the difference between a notion such as "cursor",relative to the internal state and hence not directly visible,and item or index which provide more abstract,officially exported information about the state. Functions that create objects A technical point needs to be clarified before we examine further consequences of the Command-Query Separation principle:should we treat object creation as a side effect? The answer is yes,as we have seen,if the target of the creation is an attribute a:in this case,the instruction !a changes the value of an object's field.The answer is no if the target is a local entity of the routine.But what if the target is the result of the function itself, as in !Result or the more general form !Result.make (...) Such a creation instruction need not be considered a side effect.It does not change any existing object and so does not endanger referential transparency (at least if we assume that there is enough memory to allocate all the objects we need).From a mathematical perspective we may pretend that all of the objects of interest,for all times past,present and future,are already inscribed in the Great Book of Objects;a creation instruction is just a way to obtain one of them,but it does not by itself change anything in the environment.It is common,and legitimate,for a function to create,initialize and return such an object. These observations assume that in the second form the creation procedure make does not produce side effects on any object other than the one being created. A clean style for class interfaces From the Command-Query Separation principle followsa style of design that yields simple and readable software,and tremendously helps reliability,reusability and extendibility. As you may have realized,this style is very different from the dominant practices of today,as fostered in particular by the C programming language.The predilection of C for side effects-for ignoring the difference between an action and a value-is not just a feature of the common C style (it sometimes seems just psychologically impossible for a C programmer to resist the temptation,when accessing a value,also to modify it a little in
752 DESIGNING CLASS INTERFACES §23.1 representing true and false, will light up. If you press the button several times in a row, without touching the command buttons, you will get the same result each time. If, on the other hand, you push a command button and then a query button, the answer that you get will usually be different from what you would have obtained before the command. Commands as well as queries may take arguments; these are figuratively entered in the slot at the top left. The figure is based on the example of a list object with the kind of interface hinted at in earlier chapters and studied in more detail later in the present one. Commands include start (move the cursor to the first element), forth (advance the cursor one position), search (move the cursor to the next occurrence of the element entered into the top-left slot); queries include item (show in the display panel the value of the element at cursor position) and index (show the current cursor position). Note the difference between a notion such as “cursor”, relative to the internal state and hence not directly visible, and item or index which provide more abstract, officially exported information about the state. Functions that create objects A technical point needs to be clarified before we examine further consequences of the Command-Query Separation principle: should we treat object creation as a side effect? The answer is yes, as we have seen, if the target of the creation is an attribute a: in this case, the instruction !! a changes the value of an object’s field. The answer is no if the target is a local entity of the routine. But what if the target is the result of the function itself, as in !! Result or the more general form !! Result ● make (…)? Such a creation instruction need not be considered a side effect. It does not change any existing object and so does not endanger referential transparency (at least if we assume that there is enough memory to allocate all the objects we need). From a mathematical perspective we may pretend that all of the objects of interest, for all times past, present and future, are already inscribed in the Great Book of Objects; a creation instruction is just a way to obtain one of them, but it does not by itself change anything in the environment. It is common, and legitimate, for a function to create, initialize and return such an object. These observations assume that in the second form the creation procedure make does not produce side effects on any object other than the one being created. A clean style for class interfaces From the Command-Query Separation principle follows a style of design that yields simple and readable software, and tremendously helps reliability, reusability and extendibility. As you may have realized, this style is very different from the dominant practices of today, as fostered in particular by the C programming language. The predilection of C for side effects — for ignoring the difference between an action and a value — is not just a feature of the common C style (it sometimes seems just psychologically impossible for a C programmer to resist the temptation, when accessing a value, also to modify it a little in
$23.1 SIDE EFFECTS IN FUNCTIONS 753 passing);it is embedded deeply into the language,with such constructs asx++,meaning: return the value of x,then increase it by one-saving a few keystrokes iny=x++ compared to y=x;x:=x+1,and not to be confused with ++x which increments before computing the value.A whole civilization,in fact,is built on side effects It would be foolish to dismiss this side-effect-full style as thoughtless;its widespread use shows that many people have found it convenient,and it may even be part of the reason for the amazing success of C and its derivatives.But what was attractive in the nineteen-seventies and eighties -when the software development population was growing by an order of magnitude every few years,and the emphasis was on getting some kind of job done rather than on long-term quality-may not be appropriate for the software technology of the twenty-first century.There we want software that will grow with us,software that we can understand,explain,maintain,reuse and trust.The Command-Query Separation principle is one of the required conditions for these goals. Applying a strict separation between commands and queries by prohibiting abstract side effects in functions is particularly appropriate for the development of large systems, where the key to success is to exert full control on every inter-module interaction. If you have been used to the converse style,you may at first,like many people,find the new one too extreme.But after starting to practice it I think you will quickly realize its benefits. Quietly,the preceding chapters have already applied Command-Query Separation to its full extent.You may remember for example that the interface for all our stack classes included a procedure remove describing the operation of popping a stack(removing the top element),and a function or attribute item which yields the top element.The first is a command,the second a query.In other approaches you might have seen a routine pop which both removes the element and returns it-a side-effect-producing function.This example has,I hope,been studied in enough depth to show the gains of clarity and simplicity that we achieve by keeping the two aspects cleanly separated. Other consequences of the principles may seem more alarming at first.For reading input,many people are used to the style of using functions such as getint-the C name, but its equivalent exists in many other languages-whose effect is to read a new input element and return its value.This is a side-effect-producing function in all its splendor:a call to the function,written getint ()-with the empty parentheses so unmistakably characteristic of the C look-and-feel-does not just return a value but affects the context ("asking a question changes the answer"),as typical consequences,excluding the chance case in which the input has two identical consecutive values: If you call getint ()twice you will get different answers. getint ()+getint (and 2 getint()will not yield the same value.(If an overzealous "optimizing"compiler treats the first expression like the second,you will report a bug to the compiler vendor,and you will be right.) In other words,we lose the benefits of referential transparency-of reasoning about software functions as if they were mathematical functions,with a crystal-clear view of how we can build expressions from them and what values these expressions will denote
§23.1 SIDE EFFECTS IN FUNCTIONS 753 passing); it is embedded deeply into the language, with such constructs as x++, meaning: return the value of x, then increase it by one — saving a few keystrokes in y = x++ compared to y = x; x := x+1, and not to be confused with ++x which increments before computing the value. A whole civilization, in fact, is built on side effects. It would be foolish to dismiss this side-effect-full style as thoughtless; its widespread use shows that many people have found it convenient, and it may even be part of the reason for the amazing success of C and its derivatives. But what was attractive in the nineteen-seventies and eighties — when the software development population was growing by an order of magnitude every few years, and the emphasis was on getting some kind of job done rather than on long-term quality — may not be appropriate for the software technology of the twenty-first century. There we want software that will grow with us, software that we can understand, explain, maintain, reuse and trust. The Command-Query Separation principle is one of the required conditions for these goals. Applying a strict separation between commands and queries by prohibiting abstract side effects in functions is particularly appropriate for the development of large systems, where the key to success is to exert full control on every inter-module interaction. If you have been used to the converse style, you may at first, like many people, find the new one too extreme. But after starting to practice it I think you will quickly realize its benefits. Quietly, the preceding chapters have already applied Command-Query Separation to its full extent. You may remember for example that the interface for all our stack classes included a procedure remove describing the operation of popping a stack (removing the top element), and a function or attribute item which yields the top element. The first is a command, the second a query. In other approaches you might have seen a routine pop which both removes the element and returns it — a side-effect-producing function. This example has, I hope, been studied in enough depth to show the gains of clarity and simplicity that we achieve by keeping the two aspects cleanly separated. Other consequences of the principles may seem more alarming at first. For reading input, many people are used to the style of using functions such as getint — the C name, but its equivalent exists in many other languages — whose effect is to read a new input element and return its value. This is a side-effect-producing function in all its splendor: a call to the function, written getint () — with the empty parentheses so unmistakably characteristic of the C look-and-feel — does not just return a value but affects the context (“asking a question changes the answer”); as typical consequences, excluding the chance case in which the input has two identical consecutive values: • If you call getint () twice you will get different answers. • getint () + getint () and 2 ∗ getint () will not yield the same value. (If an overzealous “optimizing” compiler treats the first expression like the second, you will report a bug to the compiler vendor, and you will be right.) In other words, we lose the benefits of referential transparency — of reasoning about software functions as if they were mathematical functions, with a crystal-clear view of how we can build expressions from them and what values these expressions will denote
754 DESIGNING CLASS INTERFACES $23.1 The Command-Query Separation principle brings referential transparency back.Here this means that we will distinguish between the procedure that advances the input cursor to the next item and the function or attribute that yields the item last read.Assume input is of type FILE;the instructions to read the next integer from file input will be something like input.advance n:=input.last integer If you call last integer ten times in a row you will,unlike with getint,get ten times the same result.If you are new to this style,it may take some getting used to;but the resulting simplicity and clarity will soon remove any temptation to go back to side effects. In this example as in the x++case seen earlier,the traditional form beats the object- oriented one if the goal of the game is to minimize keystrokes.This illustrates a general observation:the productivity gains of object technology will not derive from trying to be as terse as possible on a microscopic scale(a game at which APL or modern"scripting languages"such as Perl will always win against a good O-O language).The achievements are on the global structure of a system:through reuse,through such mechanisms as genericity and garbage collection,through the use of assertions,you can decrease the size of your software by amounts far higher than anything you can achieve by skimping by a character here or a line there.Keystroke-wise is often system-foolish. Pseudo-random number generators:a design exercise An example sometimes quoted in favor of functions with side effects is that of pseudo- random number generators,which return successive values from a sequence enjoying adequate statistical properties.The sequence is initialized by a call of the form random seed (seed) where seed is a seed value provided by the client.A common way to get the successive pseudo-random values is by calling a function: xx:=next random ( But here too there is no reason to make an exception to the command/query dichotomy.Before looking at the solution let us just forget that we have seen the above and restart from scratch by asking the question:how should we handle random generation in an object-oriented context?This will provide the opportunity of a little design exercise, and will enable us,if the need arises,to explain the results to someone whose view has not been unduly influenced by pre-O-O approaches. As always in object technology,the relevant question-often the only one-is: What are the data abstractions? The relevant abstraction here is not "random number generation"or "random number generator",both of them quite functional in nature,focusing on what the system does rather than what it does it to. Probing further,we might think "random number",but that is not the right answer yet.Remember,a data abstraction is characterized by features-commands and queries; it is hard to think of features applicable to "random number
754 DESIGNING CLASS INTERFACES §23.1 The Command-Query Separation principle brings referential transparency back. Here this means that we will distinguish between the procedure that advances the input cursor to the next item and the function or attribute that yields the item last read. Assume input is of type FILE; the instructions to read the next integer from file input will be something like input ● advance n := input ● last_integer If you call last_integer ten times in a row you will, unlike with getint, get ten times the same result. If you are new to this style, it may take some getting used to; but the resulting simplicity and clarity will soon remove any temptation to go back to side effects. In this example as in the x++ case seen earlier, the traditional form beats the objectoriented one if the goal of the game is to minimize keystrokes. This illustrates a general observation: the productivity gains of object technology will not derive from trying to be as terse as possible on a microscopic scale (a game at which APL or modern “scripting languages” such as Perl will always win against a good O-O language). The achievements are on the global structure of a system: through reuse, through such mechanisms as genericity and garbage collection, through the use of assertions, you can decrease the size of your software by amounts far higher than anything you can achieve by skimping by a character here or a line there. Keystroke-wise is often system-foolish. Pseudo-random number generators: a design exercise An example sometimes quoted in favor of functions with side effects is that of pseudorandom number generators, which return successive values from a sequence enjoying adequate statistical properties. The sequence is initialized by a call of the form random_seed (seed) where seed is a seed value provided by the client. A common way to get the successive pseudo-random values is by calling a function: xx := next_random () But here too there is no reason to make an exception to the command/query dichotomy. Before looking at the solution let us just forget that we have seen the above and restart from scratch by asking the question: how should we handle random generation in an object-oriented context? This will provide the opportunity of a little design exercise, and will enable us, if the need arises, to explain the results to someone whose view has not been unduly influenced by pre-O-O approaches. As always in object technology, the relevant question — often the only one — is: What are the data abstractions? The relevant abstraction here is not “random number generation” or “random number generator”, both of them quite functional in nature, focusing on what the system does rather than what it does it to. Probing further, we might think “random number”, but that is not the right answer yet. Remember, a data abstraction is characterized by features — commands and queries; it is hard to think of features applicable to “random number
$23.1 SIDE EFFECTS IN FUNCTIONS 755 “Discovery and That "random number"leads to a dead end illustrates the Class Elicitation principle rejection",page 725. encountered when we studied the general rules for finding the classes:a key step may be to reject inappropriate candidates.And once again we see that not all promising nouns yield classes:were a"requirements document"written for this problem,the noun random number would certainly figure prominently in it. A random number does not mean much by itself;it must be understood in relation to its predecessors and successors in the sequence. Wait a minute-here we have it:sequence,more precisely pseudo-random number sequence.This is the abstraction we have been looking for,a perfectly legitimate data abstraction,similar to the cursor lists we have seen on a number of occasions,only infinite (do not look for an after boolean query!).Features will include: Commands:make-initialize with a certain seed;forth-advance to next element. Queries:item-return the element at cursor position. An infinite list item as a machine start To get a new random number sequence rand,clients will use !rand.make (seed),to advance to the next value,they will call rand.forth;and they will obtain the current value by xx:=rand.item. There is really nothing specific to random number sequences in the interface,except for the seed argument to the creation procedure.Adding a start procedure which brings the cursor to the first item (and which make may call for random number sequences),what we have is the framework for a deferred class COUNTABLE SEOUENCE describing arbitrary infinite sequences.Think for example of how to model prime numbers in an object-oriented way;the answer is the same:define a class PRIMES,an heir to COUNTABLE SEOUENCE,whose successive elements are the prime numbers.Other sequences-Fibonacci numbers and the like-will be modeled in the same way. These examples illustrate in passing that contrary to popular belief it is quite possible,and even trivial,to represent infinite structures on a computer.Abstract data types provide the key:a structure is entirely defined by the applicable operations,of which there is of course a finite number,three in this case-starl,forth,item-plus any auxil iary features we may want to add.The trick,of course,is that any execution will only try to evaluate a finite number of elements of an infinite structure. M1994a COUNTABLE SEOUENCE and its heirs such as PRIMES are part of the universal computing science hierarchy described in the companion guide to reusable components
§23.1 SIDE EFFECTS IN FUNCTIONS 755 That “random number” leads to a dead end illustrates the Class Elicitation principle encountered when we studied the general rules for finding the classes: a key step may be to reject inappropriate candidates. And once again we see that not all promising nouns yield classes: were a “requirements document” written for this problem, the noun random number would certainly figure prominently in it. A random number does not mean much by itself; it must be understood in relation to its predecessors and successors in the sequence. Wait a minute — here we have it: sequence, more precisely pseudo-random number sequence. This is the abstraction we have been looking for; a perfectly legitimate data abstraction, similar to the cursor lists we have seen on a number of occasions, only infinite (do not look for an after boolean query!). Features will include: • Commands: make — initialize with a certain seed; forth — advance to next element. • Queries: item — return the element at cursor position. To get a new random number sequence rand, clients will use !! rand ● make (seed); to advance to the next value, they will call rand ● forth; and they will obtain the current value by xx := rand ● item. There is really nothing specific to random number sequences in the interface, except for the seed argument to the creation procedure. Adding a start procedure which brings the cursor to the first item (and which make may call for random number sequences), what we have is the framework for a deferred class COUNTABLE_SEQUENCE describing arbitrary infinite sequences. Think for example of how to model prime numbers in an object-oriented way; the answer is the same: define a class PRIMES, an heir to COUNTABLE_SEQUENCE, whose successive elements are the prime numbers. Other sequences — Fibonacci numbers and the like — will be modeled in the same way. These examples illustrate in passing that contrary to popular belief it is quite possible, and even trivial, to represent infinite structures on a computer. Abstract data types provide the key: a structure is entirely defined by the applicable operations, of which there is of course a finite number, three in this case — start, forth, item — plus any auxiliary features we may want to add. The trick, of course, is that any execution will only try to evaluate a finite number of elements of an infinite structure. COUNTABLE_SEQUENCE and its heirs such as PRIMES are part of the universal computing science hierarchy described in the companion guide to reusable components. “Discovery and rejection”, page 725. An infinite list as a machine forth item start [M 1994a]
756 DESIGNING CLASS INTERFACES $23.1 Abstract state,concrete state From the discussion of referential transparency it would seem desirable to bar all concrete "If it is barogue.fix side effects from functions.Such a rule would have the advantage that-in line with one it".page 670. of our methodology precepts-we could build it into the language,since a compiler can easily detect concrete side effects(as we saw just after the definition of this notion). Unfortunately,this would be unacceptably restrictive,explaining why the Command-Query Separation principle only prohibits abstract side effects,a notion that will now be defined.The problem is that some concrete side effects are not only harmless but necessary.They are of two kinds. The first category includes functions which,in the course of their execution,modify the state,sometimes drastically,and affecting very visible features;but then they restore the original state.Consider for example a class describing integer lists with cursor,and the following function for computing the maximum of a list: max is --The highest value of items in the list require not empty local original index:INTEGER do original index index from start;Result item until is last loop forth.Result :Result.max (item) end go (original index) end To traverse the list,the algorithm needs to move the cursor over all elements.The function,calling such procedures as start,forth and go,is indeed full of concrete side effects on the cursor position;but it begins by noting the cursor position into original index and ends by returning the cursor to that position through a call to go.All is well that ends well:the function leaves the list in exactly the state in which it found it.But no compiler in the world is going to detect that the side effect is only apparent. Side effects of the second acceptable category may change the state of the object,but Pages 376 to 378. only affecting properties that are not visible to clients.To understand the concepts in depth,it will be useful to make sure that you are familiar with the discussion of “abstraction function”and“implementation invariants'”in the presentation of Design by Contract.(In particular,take a look at the accompanying figures to refresh your memory.)
756 DESIGNING CLASS INTERFACES §23.1 Abstract state, concrete state From the discussion of referential transparency it would seem desirable to bar all concrete side effects from functions. Such a rule would have the advantage that — in line with one of our methodology precepts — we could build it into the language, since a compiler can easily detect concrete side effects (as we saw just after the definition of this notion). Unfortunately, this would be unacceptably restrictive, explaining why the Command-Query Separation principle only prohibits abstract side effects, a notion that will now be defined. The problem is that some concrete side effects are not only harmless but necessary. They are of two kinds. The first category includes functions which, in the course of their execution, modify the state, sometimes drastically, and affecting very visible features; but then they restore the original state. Consider for example a class describing integer lists with cursor, and the following function for computing the maximum of a list: max is -- The highest value of items in the list require not empty local original_index: INTEGER do original_index := index from start; Result := item until is_last loop forth; Result := Result ● max (item) end go (original_index) end To traverse the list, the algorithm needs to move the cursor over all elements. The function, calling such procedures as start, forth and go, is indeed full of concrete side effects on the cursor position; but it begins by noting the cursor position into original_index and ends by returning the cursor to that position through a call to go. All is well that ends well: the function leaves the list in exactly the state in which it found it. But no compiler in the world is going to detect that the side effect is only apparent. Side effects of the second acceptable category may change the state of the object, but only affecting properties that are not visible to clients. To understand the concepts in depth, it will be useful to make sure that you are familiar with the discussion of “abstraction function” and “implementation invariants” in the presentation of Design by Contract. (In particular, take a look at the accompanying figures to refresh your memory.) “If it is baroque, fix it”, page 670. Pages 376 to 378