17 Typing

Effective use of object technology requires that we clearly specify, in the texts of our systems, the types of all objects that they will manipulate at run time. This rule, known as static typing (a notion defined precisely in the next sections), makes our software:

• More reliable, by enabling compilers and other tools to suppress discrepancies before they have had time to cause damage.
• More readable, by providing precious information to authors of client systems, future maintainers of our own software, and other readers.
• More efficient, since this information helps a good compiler generate better code.

Although the typing issue has been extensively discussed in non-O-O contexts, and static typing applied to many non-O-O languages, the concepts are particularly clear and relevant in object technology since the approach as a whole is largely based on the idea of type, merged with the idea of module to yield the basic O-O construct, the class.

The desire to provide static typing has been a major influence on the mechanisms discussed in earlier chapters. Here we need to take a comprehensive look at typing and devise solutions to the remaining difficulties raised by this concept.

17.1 THE TYPING PROBLEM

One nice thing can be said about the typing issue in object-oriented software construction: it may not be an easy problem, but it is a simple problem (simple, that is, to state).

The Basic Construct

The problem’s simplicity comes from the simplicity of the object-oriented model of computation. If we put aside some of the details, only one kind of event ever occurs during the execution of an object-oriented system: feature call, of the general form

    x.f (arg)

which executes on the object attached to x the operation f, using the argument arg, with the understanding that in some cases arg stands for several arguments, or no argument at all.

Smalltalk programmers would say “pass to the object x the message f with argument arg”, and use another syntax, but those are differences of style, not substance.

That everything relies on this Basic Construct accounts in part for the general feeling of beauty that object-oriented ideas arouse in many people.
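As a small illustration of the Basic Construct, here is a minimal sketch in Python rather than in the notation of this book. The class `Account` and its feature `deposit` are hypothetical names chosen for the example. It shows the same call written once in the x.f (arg) style and once in a message-passing style, to suggest that the two differ in form only.

```python
class Account:
    """Hypothetical class used only to illustrate a feature call."""

    def __init__(self) -> None:
        self.balance = 0

    def deposit(self, amount: int) -> None:
        """The feature f; amount plays the role of arg."""
        self.balance += amount


acc = Account()

# Feature call of the general form x.f (arg).
acc.deposit(100)

# Message-passing style: look up the feature named "deposit" on the
# object, then apply it to the argument. The effect is the same.
getattr(acc, "deposit")(50)
```

After both calls the balance reflects the two deposits: the operation executed is determined by the object attached to `acc` and the feature name, whichever syntax is used.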
From the Basic Construct follows the basic kind of abnormal event that might occur at execution time:

    Definition: type violation
    A run-time type violation (or just type violation for short) occurs in the execution of a call x.f (arg), where x is attached to an object OBJ, if either:
    V1. There is no feature corresponding to f and applicable to OBJ.
    V2. There is such a feature, but arg is not an acceptable argument for it.

The typing problem is the need to avoid such events:

    Object-oriented typing problem
    When do we know whether the execution of an object-oriented system may produce a type violation?

The key word is when. If the feature or arguments do not match, you will find out sooner or later: applying the feature “raise salary” to an instance of SUBMARINE or “fire the torpedoes” to an instance of EMPLOYEE will not work; somehow the execution will fail. But you may prefer to find out sooner rather than later.

Static and dynamic typing

Although intermediate variants are possible, two main approaches present themselves:

• Dynamic typing: wait until the last possible moment, the execution of each call.
• Static typing: rely on a set of rules that determine, from the text of a system, whether its executions may cause type violations. Only execute systems for which the rules guarantee that no violation will ever occur.

The names are easy to explain: with dynamic typing, type verification occurs at execution time (dynamically); with static typing, it is performed on the text of the software (statically, that is to say before any execution).

The terms “typed” and “untyped” are sometimes used for “statically typed” and “dynamically typed”. To avoid any confusion we will stick to the full names.

Static typing is only interesting if the rules can be checked automatically. Since software texts are usually processed by a compiler before being executed, it is convenient to have the compiler, rather than a separate tool, take care of these checks. The rest of the discussion will indeed assume for simplicity that the compiler and the type checker are the same tool. This assumption yields a simple definition:

    Definition: statically typed language
    An object-oriented language is statically typed if it is equipped with a set of consistency rules, enforceable by compilers, whose observance by a system text guarantees that no execution of the system can cause a type violation.
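The two cases of a type violation, and the "last possible moment" at which dynamic typing detects them, can be sketched in Python, a dynamically typed language. This is only an illustration under stated assumptions: the classes `Employee` and `Submarine`, their features, and the helper `classify_call` are hypothetical, and mapping Python's `AttributeError` and `TypeError` onto V1 and V2 is an approximation rather than an exact correspondence.

```python
class Employee:
    def __init__(self, salary: float) -> None:
        self.salary = salary

    def raise_salary(self, amount: float) -> None:
        self.salary += amount


class Submarine:
    def fire_torpedoes(self) -> str:
        return "torpedoes away"


def classify_call(obj, feature_name: str, *args) -> str:
    """Attempt the call obj.feature_name(*args) and report what happened.

    Returns "V1" if no such feature is applicable to obj, "V2" if the
    feature exists but rejects the arguments, and "ok" otherwise.
    """
    try:
        feature = getattr(obj, feature_name)
    except AttributeError:
        return "V1"  # no feature corresponding to f for this object
    try:
        feature(*args)
    except TypeError:
        return "V2"  # feature found, but argument not acceptable
    return "ok"


# "Raise salary" applied to a submarine: violation of kind V1.
v1_case = classify_call(Submarine(), "raise_salary", 100.0)

# A salary raise with a non-numeric argument: violation of kind V2.
v2_case = classify_call(Employee(1000.0), "raise_salary", "a lot")

# A well-formed call proceeds normally.
ok_case = classify_call(Employee(1000.0), "raise_salary", 100.0)
```

In each failing case nothing is reported until the call itself executes. A statically typed language in the sense of the definition above would instead reject both offending calls from the system text alone, before any execution.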
In the literature you will encounter the term “strong typing”. It corresponds to the all-or-nothing nature of this definition, which demands rules that guarantee the absence of type violations. Weak forms of static typing, whose rules eliminate certain type violations but not all, are also possible, and some O-O languages are indeed weakly-statically-typed in this sense. We shall strive, however, for the strongest possible form.

Some authors also talk about strong forms of dynamic typing. But this is a contradiction.

In a dynamically typed language (also known as an “untyped” language), there are no type declarations; entities simply become associated with whatever values the execution of the software attaches to them. No static type checking is possible.

Typing rules

Our object-oriented notation is statically typed. Its type rules have been introduced in earlier chapters; they boil down to three simple constraints:

• Every entity or function must be declared as being of a certain type, as in acc: ACCOUNT; every routine declares zero or more formal arguments, with a type for each, as in put (x: G; i: INTEGER).
• In any assignment x := y, and in any routine call using y as the actual argument for the formal argument x, the type of the source y must conform to the type of the target x. The definition of conformance is based on inheritance (B conforms to A if it is a descendant of A), complemented by rules for generic parameters. (Type Conformance rule, page 474.)
• In a call of the form x.f (arg), f must be a feature of the base class of x’s type, and must be available to the class in which the call appears. (Feature Call rule, page 473.)

Realism

Although the definition of “statically typed language” is precise, it also highlights the need for informal criteria in devising type rules. Consider the following two extreme cases:

• An all-valid language in which every syntactically correct system is also typewise-valid, with no need for type rules. Such languages are possible (imagine for example a small notation for Polish-style additions and subtractions with integers); unfortunately, as readers familiar with the theory of computation will know, no useful general-purpose language can meet that criterion.
• An all-invalid language, easy to devise: just take any existing language and add a type rule that makes any system invalid! This makes the language typed according to the definition: since no system passes the rules, no system that passes the rules can cause a type violation.

We may say that an all-valid language is usable, but not useful for general-purpose development; an all-invalid language may be useful, but it is not usable.

What we need in practice is a type system that makes the language both useful and usable: powerful enough to express the computations we need; convenient enough not to force us into undue complications to satisfy the type rules.
We will say that a language is realistic if it is both useful and usable. Unlike the definition of static typing, which always yields an indisputable answer to the question “Is language X statically typed?”, the definition of realism is partly subjective; reasonable people may disagree on whether a language, equipped with certain type rules, is still useful and usable.

In this chapter we will check that the typed notation defined in the preceding chapters is realistic.

Pessimism

In discussing approaches to O-O typing we should keep in mind another general property of static typing: it is always, by nature, a pessimistic policy. Trying to guarantee that no computation shall ever fail, you disallow some computations that might succeed.

To see this, consider a trivial non-O-O language, Pascal-like, with distinct types INTEGER and REAL. With the declaration n: INTEGER, the assignment n := r will be rejected as violating the type rules. So all the following will be considered type-invalid and rejected by the compiler:

    n := 0.0            [A]
    n := 1.0            [B]
    n := -3.67          [C]
    n := 3.67 - 3.67    [D]

Of these invalid operations, [A], if permitted to execute, would always work since any number system will provide an exact representation for the floating-point number 0.0, which can be transformed unambiguously to the integer 0. [B] would almost certainly work too. [C] is ambiguous (do we want the rounded version, the truncated version of the number?). But [D] would work. So would

    if n ^ 2 < 0 then n := 3.67 end    [E]

because the assignment will never be executed (n ^ 2 denotes the square of n). If we replace n ^ 2 by just n, where n is read from user input just before the test, some executions would work (those for which n is non-negative), others would not. Assigning to n a very large real number, not representable as an integer, would not work.

In a typed language, all these examples (those which would always work, those which would never work, and those which would work some of the time) are equally and mercilessly considered violations of the type rules, and any compiler will reject them.

The question then is not whether to be pessimistic but how pessimistic we can afford to be. We are back to the realism requirement: if the type rules are so pessimistic as to bar us from expressing in a simple way the computations that we need, we will reject them. But if they achieve type safety with little loss of expressive power, we will accept them and enjoy the benefits. For example making n := r invalid turns out to be good news if the environment provides functions such as round and truncate, enabling you to convert a real into an integer in exactly the way you want, without the ambiguity of an implicit conversion.
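The contrast between a pessimistic static rule and a value-by-value run-time check can be sketched in Python. This is an illustration only, not the Pascal-like language of the example: the helper `assign_exact` is a hypothetical name, and Python's built-in `round` and `math.trunc` stand in for the round and truncate functions mentioned above.

```python
import math


def assign_exact(r: float) -> int:
    """Run-time counterpart of n := r: accept r only if it denotes an
    integer exactly, otherwise refuse the assignment."""
    if not r.is_integer():
        raise ValueError(f"{r} has no exact integer value")
    return int(r)


# [A], [B] and [D] would work if allowed to execute.
a = assign_exact(0.0)
b = assign_exact(1.0)
d = assign_exact(3.67 - 3.67)

# [C] is refused at run time: the intended integer is ambiguous.
try:
    assign_exact(-3.67)
    c_accepted = True
except ValueError:
    c_accepted = False

# Explicit conversions remove the ambiguity of case [C].
c_rounded = round(-3.67)         # nearest integer
c_truncated = math.trunc(-3.67)  # fractional part discarded
```

Here the run-time check lets [A], [B] and [D] through and stops only [C], whereas the static rule rejects all four. With explicit conversions available, [C] yields -4 when rounded and -3 when truncated, which is why forbidding the implicit form costs little expressive power.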
17.2 STATIC TYPING: WHY AND HOW

Although the advantages of static typing seem obvious, it is necessary to review the terms of the debate.

The benefits

The reasons for using a statically typed form of object technology were listed at the very beginning of this chapter: reliability, readability and efficiency.

The reliability value comes from the use of static typing to detect errors that would otherwise manifest themselves only at run time, and only in certain runs. The rule that forces you to declare entities and functions (the first of our three type rules above) introduces redundancy into the software text; this enables the compiler, through the other two rules, to detect inconsistencies between the purpose and actual use of an entity, feature or expression.

Catching errors early is essential, as correction cost grows quickly with the detection delay. This property, intuitively clear to all software professionals, is confirmed quantitatively, for specification errors, by Boehm’s well-known studies, plotting the cost of correcting an error against the time at which it is found (base 1 if found at requirements time), for both a set of large industrial projects and a controlled small project experiment:

[Figure: Relative cost of correcting errors. After [Boehm 1981]. Reproduced with permission. Horizontal axis: time error found (Requirements, Design, Code, Development test, Acceptance test, Operation). Vertical axis: correction cost (scale marks 1, 20, 500, 1000). Curves: LARGE PROJECTS and SMALL PROJECT.]

The readability benefit is also appreciable. As the examples appearing throughout this book should show convincingly, declaring every entity and function with a certain type is a powerful way of conveying to the software reader some information about its intended uses. This is particularly precious for maintainers of the software.

If readability were not part of the goal we might be able to obtain some of the other benefits of typing without explicit declarations. It is possible indeed, under certain conditions, to use an implicit form of typing in which the compiler, instead of requiring software authors to declare entity types, attempts to determine the type of each entity automatically from its uses. This is known as type inference. But from a software engineering perspective explicit declarations are a help, not a penalty; types should be clear not just to the compiler but to the human reader.