1.2 Abstract Binding Trees 1.x =a x. 2.o(.ai;n.an)=ao(.ai…;a.an)if for every 1≤i≤n,p(a)=ap(a明for all fresh renamings pi:Zi and p:Z. The idea is that we rename i and consistently,avoiding confusion,and check that ai and af are a-equivalent.If a =ab,then a and b are a-variants of each other. Some care is required in the definition of substitution of an abt b of sort s for free occurrences of a variable x of sort s in some abt a of some sort,written b/xa.Substitution is partially defined by the following conditions: 1.[b/x]x =b,and [b/xly =yif x #y. 2.[b/xlo(.a1;...;n.an)=o(.af ;...;n.an),where,for each 1 si<n,we require thatib,and we set a=[b/x]ai ifxi,and a=ai otherwise. The definition of b/xa is quite delicate,and merits careful consideration. One trouble spot for substitution is to notice that if x is bound by an abstractor within a,then x does not occur free within the abstractor,and hence is unchanged by substitution.For example, b/xlet(a1;x.a2)=let(b/xa;x.a2),there being no free occurrences of x in x.a2.Another trouble spot is the capture of a free variable of b during substitution.For example,if y E b,and xy,then [b/x]let(a1;y.a2)is undefined,rather than being let([b/x]ar;y.[b/x]a2),as one might at first suspect.For example,provided that xy,y/xlet(num[0];y.plus(x;y))is undefined,not let(num[0];y.plus(y;y)),which confuses two different variables named y. Although capture avoidance is an essential characteristic of substitution,it is,in a sense,merely a technical nuisance.If the names of bound variables have no significance,then capture can always be avoided by first renaming the bound variables in a to avoid any free variables in b.In the forego- ing example if we rename the bound variable y to y'to obtain a'let(num[0];y'.plus(x;y)), then [b/xla'is defined,and is equal to let(num[0];y.plus(b;y)).The price for avoiding cap- ture in this way is that substitution is only determined up to a-equivalence,and so we may no longer think of substitution as a function,but only as a proper relation. To restore the functional character of substitution,it is sufficient to adopt the identification con- vention,which is stated as follows: Abstract binding trees are always identified up to a-equivalence. That is,a-equivalent abts are regarded as identical.Substitution can be extended to a-equivalence classes of abts to avoid capture by choosing representatives of the equivalence classes of b and a in such a way that substitution is defined,then forming the equivalence class of the result.Any two choices of representatives for which substitution is defined gives a-equivalent results,so that substitution becomes a well-defined total function.We will adopt the identification convention for abts throughout this book. It will often be necessary to consider languages whose abstract syntax cannot be specified by a fixed set of operators,but rather requires that the available operators be sensitive to the context in which they occur.For our purposes it will suffice to consider a set of symbolic parameters,or symbols,that index families of operators so that as the set of symbols varies,so does the set of
PREVIEW 1.2 Abstract Binding Trees 9 1. x =α x. 2. o(⃗x1 . a1 ; . . . ; ⃗xn . an ) =α o(⃗x ′ 1 . a ′ 1 ; . . . ; ⃗x ′ n . a ′ n ) if for every 1 ≤ i ≤ n, ρbi(ai) =α bρ ′ i (a ′ i ) for all fresh renamings ρi : ⃗xi ↔⃗zi and ρ ′ i : ⃗x ′ i ↔⃗zi . The idea is that we rename ⃗xi and ⃗x ′ i consistently, avoiding confusion, and check that ai and a ′ i are α-equivalent. If a =α b, then a and b are α-variants of each other. Some care is required in the definition of substitution of an abt b of sort s for free occurrences of a variable x of sort s in some abt a of some sort, written [b/x]a. Substitution is partially defined by the following conditions: 1. [b/x]x = b, and [b/x]y = y if x ̸= y. 2. [b/x]o(⃗x1 . a1 ; . . . ; ⃗xn . an ) = o(⃗x1 . a ′ 1 ; . . . ; ⃗xn . a ′ n ), where, for each 1 ≤ i ≤ n, we require that ⃗xi ∈/ b, and we set a ′ i = [b/x]ai if x ∈/ ⃗xi , and a ′ i = ai otherwise. The definition of [b/x]a is quite delicate, and merits careful consideration. One trouble spot for substitution is to notice that if x is bound by an abstractor within a, then x does not occur free within the abstractor, and hence is unchanged by substitution. For example, [b/x]let( a1 ; x . a2 ) = let( [b/x]a1 ; x . a2 ), there being no free occurrences of x in x . a2. Another trouble spot is the capture of a free variable of b during substitution. For example, if y ∈ b, and x ̸= y, then [b/x]let( a1 ; y . a2 ) is undefined, rather than being let( [b/x]a1 ; y . [b/x]a2 ), as one might at first suspect. For example, provided that x ̸= y, [y/x]let( num[ 0 ] ; y . plus( x ; y ) ) is undefined, not let( num[ 0 ] ; y . plus( y ; y ) ), which confuses two different variables named y. Although capture avoidance is an essential characteristic of substitution, it is, in a sense, merely a technical nuisance. If the names of bound variables have no significance, then capture can always be avoided by first renaming the bound variables in a to avoid any free variables in b. In the foregoing example if we rename the bound variable y to y ′ to obtain a ′ ≜ let( num[ 0 ] ; y ′ . plus( x ; y ′ ) ), then [b/x]a ′ is defined, and is equal to let( num[ 0 ] ; y ′ . plus( b ; y ′ ) ). The price for avoiding capture in this way is that substitution is only determined up to α-equivalence, and so we may no longer think of substitution as a function, but only as a proper relation. To restore the functional character of substitution, it is sufficient to adopt the identification convention, which is stated as follows: Abstract binding trees are always identified up to α-equivalence. That is, α-equivalent abts are regarded as identical. Substitution can be extended to α-equivalence classes of abts to avoid capture by choosing representatives of the equivalence classes of b and a in such a way that substitution is defined, then forming the equivalence class of the result. Any two choices of representatives for which substitution is defined gives α-equivalent results, so that substitution becomes a well-defined total function. We will adopt the identification convention for abts throughout this book. It will often be necessary to consider languages whose abstract syntax cannot be specified by a fixed set of operators, but rather requires that the available operators be sensitive to the context in which they occur. For our purposes it will suffice to consider a set of symbolic parameters, or symbols, that index families of operators so that as the set of symbols varies, so does the set of
10 1.3 Notes operators.An indexed operator o is a family of operators indexed by symbols u,so that ofu]is an operator when u is an available symbol.If u is a finite set of symbols,then B;]is the sort-indexed family of abts that are generated by operators and variables as before,admitting all indexed operator instances by symbols uEu.Whereas a variable is a place-holder that stands for an unknown abt of its sort,a symbol does not stand for anything,and is not,itself,an abt.The only significance of symbol is whether it is the same as or differs from another symbol;the operator instances ofu]and of u']are the same exactly when u is u',and are the same symbol. The set of symbols is extended by introducing a new,or fresh,symbol within a scope using the abstractor u.a,which binds the symbol u within the abt a.An abstracted symbol is "new" in the same sense as for an abstracted variable:the name of the bound symbol can be varied at will provided that no conflicts arise.This renaming property ensures that an abstracted symbol is distinct from all others in scope.The only difference between symbols and variables is that the only operation on symbols is renaming;there is no notion of substitution for a symbol. Finally,a word about notation:to help improve the readability we often "group"and "stage" the arguments to an operator,using round brackets and braces to show grouping,and generally regarding stages to progress from right to left.All arguments in a group are considered to occur at the same stage,though their order is significant,and successive groups are considered to occur in sequential stages.Staging and grouping is often a helpful mnemonic device,but has no fundamen- tal significance.For example,the abt ofa;a (a3;x.a4 is the same as the abt o(a;a2;a3;x.a), as would be any other order-preserving grouping or staging of its arguments. 1.3 Notes The concept of abstract syntax has its origins in the pioneering work of Church,Turing,and Godel, who first considered writing programs that act on representations of programs.Originally pro- grams were represented by natural numbers,using encodings,now called Godel-numberings,based on the prime factorization theorem.Any standard text on mathematical logic,such as Kleene (1952),has a thorough account of such representations.The Lisp language(McCarthy,1965;Allen, 1978)introduced a much more practical and direct representation of syntax as symbolic expressions. These ideas were developed further in the language ML(Gordon et al.,1979),which featured a type system capable of expressing abstract syntax trees.The AUTOMATH project(Nederpelt et al.,1994)introduced the idea of using Church's A notation(Church,1941)to account for the binding and scope of variables.These ideas were developed further in LF(Harper et al.,1993). The concept of abstract binding trees presented here was inspired by the system of notation developed in the NuPRL Project,which is described in Constable(1986)and from Martin-Lof's system of arities,which is described in Nordstrom et al.(1990).Their enrichment with symbol binders is influenced by Pitts and Stark(1993). Exercises 1.1.Prove by structural induction on abstract syntax trees that if C y,then A[]CA[y]
PREVIEW 10 1.3 Notes operators. An indexed operator o is a family of operators indexed by symbols u, so that o[ u ] is an operator when u is an available symbol. If U is a finite set of symbols, then B[U ; X ] is the sort-indexed family of abts that are generated by operators and variables as before, admitting all indexed operator instances by symbols u ∈ U. Whereas a variable is a place-holder that stands for an unknown abt of its sort, a symbol does not stand for anything, and is not, itself, an abt. The only significance of symbol is whether it is the same as or differs from another symbol; the operator instances o[ u ] and o[ u ′ ] are the same exactly when u is u ′ , and are the same symbol. The set of symbols is extended by introducing a new, or fresh, symbol within a scope using the abstractor u . a, which binds the symbol u within the abt a. An abstracted symbol is “new” in the same sense as for an abstracted variable: the name of the bound symbol can be varied at will provided that no conflicts arise. This renaming property ensures that an abstracted symbol is distinct from all others in scope. The only difference between symbols and variables is that the only operation on symbols is renaming; there is no notion of substitution for a symbol. Finally, a word about notation: to help improve the readability we often “group” and “stage” the arguments to an operator, using round brackets and braces to show grouping, and generally regarding stages to progress from right to left. All arguments in a group are considered to occur at the same stage, though their order is significant, and successive groups are considered to occur in sequential stages. Staging and grouping is often a helpful mnemonic device, but has no fundamental significance. For example, the abt o{a1 ; a2}( a3 ; x . a4 ) is the same as the abt o( a1 ; a2 ; a3 ; x . a4 ), as would be any other order-preserving grouping or staging of its arguments. 1.3 Notes The concept of abstract syntax has its origins in the pioneering work of Church, Turing, and Godel, ¨ who first considered writing programs that act on representations of programs. Originally programs were represented by natural numbers, using encodings, now called G¨odel-numberings, based on the prime factorization theorem. Any standard text on mathematical logic, such as Kleene (1952), has a thorough account of such representations. The Lisp language (McCarthy, 1965; Allen, 1978) introduced a much more practical and direct representation of syntax as symbolic expressions. These ideas were developed further in the language ML (Gordon et al., 1979), which featured a type system capable of expressing abstract syntax trees. The AUTOMATH project (Nederpelt et al., 1994) introduced the idea of using Church’s λ notation (Church, 1941) to account for the binding and scope of variables. These ideas were developed further in LF (Harper et al., 1993). The concept of abstract binding trees presented here was inspired by the system of notation developed in the NuPRL Project, which is described in Constable (1986) and from Martin-Lof’s ¨ system of arities, which is described in Nordstrom et al. (1990). Their enrichment with symbol binders is influenced by Pitts and Stark (1993). Exercises 1.1. Prove by structural induction on abstract syntax trees that if X ⊆ Y, then A[X ] ⊆ A[Y]
1.3 Notes 11 1.2.Prove by structural induction modulo renaming on abstract binding trees that if Cy, then B[]C B[y]. 1.3.Show that if a =a a'and b=ab'and both [b/x]a and [b'/x]a'are defined,then [b/x]a =a b'/x a'. 1.4.Bound variables can be seen as the formal analogs of pronouns in natural languages.The binding occurrence of a variable at an abstractor fixes a"fresh"pronoun for use within its body that refers unambiguously to that variable(in contrast to English,in which the referent of a pronoun can often be ambiguous).This observation suggests an alternative representa- tion of abts,called abstract binding graphs,or abg's for short,as directed graphs constructed as follows: (a)Free variables are atomic nodes with no outgoing edges. (b)Operators with n arguments are n-ary nodes,with one outgoing edge directed at each of their children. (c)Abstractors are nodes with one edge directed to the scope of the abstracted variable. (d)Bound variables are back edges directed at the abstractor that introduced it. Notice that asts,thought of as abts with no abstractors,are acyclic directed graphs (more precisely,variadic trees),whereas general abts can be cyclic.Draw a few examples of abg's corresponding to the example abts given in this chapter.Give a precise definition of the sort-indexed family g[]of abstract binding graphs.What representation would you use PRE for bound variables(back edges)?
PREVIEW 1.3 Notes 11 1.2. Prove by structural induction modulo renaming on abstract binding trees that if X ⊆ Y, then B[X ] ⊆ B[Y]. 1.3. Show that if a =α a ′ and b =α b ′ and both [b/x]a and [b ′/x]a ′ are defined, then [b/x]a =α [b ′/x]a ′ . 1.4. Bound variables can be seen as the formal analogs of pronouns in natural languages. The binding occurrence of a variable at an abstractor fixes a “fresh” pronoun for use within its body that refers unambiguously to that variable (in contrast to English, in which the referent of a pronoun can often be ambiguous). This observation suggests an alternative representation of abts, called abstract binding graphs, or abg’s for short, as directed graphs constructed as follows: (a) Free variables are atomic nodes with no outgoing edges. (b) Operators with n arguments are n-ary nodes, with one outgoing edge directed at each of their children. (c) Abstractors are nodes with one edge directed to the scope of the abstracted variable. (d) Bound variables are back edges directed at the abstractor that introduced it. Notice that asts, thought of as abts with no abstractors, are acyclic directed graphs (more precisely, variadic trees), whereas general abts can be cyclic. Draw a few examples of abg’s corresponding to the example abts given in this chapter. Give a precise definition of the sort-indexed family G[X ] of abstract binding graphs. What representation would you use for bound variables (back edges)?
12 1.3 Notes PREVIEW
PREVIEW 12 1.3 Notes
Chapter 2 Inductive Definitions Inductive definitions are an indispensable tool in the study of programming languages.In this chapter we will develop the basic framework of inductive definitions,and give some examples of their use.An inductive definition consists of a set of rules for deriving judgments,or assertions,of a variety of forms.Judgments are statements about one or more abstract binding trees of some sort. The rules specify necessary and sufficient conditions for the validity of a judgment,and hence fully determine its meaning. 2.1 Judgments We start with the notion of a judgment,or assertion,about an abstract binding tree.We shall make use of many forms of judgment,including examples such as these: n nat n is a natural number 11+12=n n is the sum of n and n2 t type t is a type e:T expression e has typet e↓u expression e has value v A judgment states that one or more abstract binding trees have a property or stand in some relation to one another.The property or relation itself is called a judgment form,and the judgment that an object or objects have that property or stand in that relation is said to be an instance of that judgment form.A judgment form is also called a predicate,and the objects constituting an instance are its subjects.We write a J or J a,for the judgment asserting that J holds of the abt a.Correspondingly,we sometimes notate the judgment form J by-J,or J-,using a dash to indicate the absence of an argument to J.When it is not important to stress the subject of the judgment,we write I to stand for an unspecified judgment,that is,an instance of some judgment form.For particular judgment forms,we freely use prefix,infix,or mix-fix notation,as illustrated by the above examples,in order to enhance readability
PREVIEW Chapter 2 Inductive Definitions Inductive definitions are an indispensable tool in the study of programming languages. In this chapter we will develop the basic framework of inductive definitions, and give some examples of their use. An inductive definition consists of a set of rules for deriving judgments, or assertions, of a variety of forms. Judgments are statements about one or more abstract binding trees of some sort. The rules specify necessary and sufficient conditions for the validity of a judgment, and hence fully determine its meaning. 2.1 Judgments We start with the notion of a judgment, or assertion, about an abstract binding tree. We shall make use of many forms of judgment, including examples such as these: n nat n is a natural number n1 + n2 = n n is the sum of n1 and n2 τ type τ is a type e : τ expression e has type τ e ⇓ v expression e has value v A judgment states that one or more abstract binding trees have a property or stand in some relation to one another. The property or relation itself is called a judgment form, and the judgment that an object or objects have that property or stand in that relation is said to be an instance of that judgment form. A judgment form is also called a predicate, and the objects constituting an instance are its subjects. We write a J or J a, for the judgment asserting that J holds of the abt a. Correspondingly, we sometimes notate the judgment form J by − J, or J −, using a dash to indicate the absence of an argument to J. When it is not important to stress the subject of the judgment, we write J to stand for an unspecified judgment, that is, an instance of some judgment form. For particular judgment forms, we freely use prefix, infix, or mix-fix notation, as illustrated by the above examples, in order to enhance readability