6 A pointer logic and certifying compiler number of arrows in the figure,and the arrows which point to the same object correspond to the pointers in the same equivalent set.Although I,N and D are sets,they are essentially logical expressions connected by conjunction "A", and can appear in the precondition or postcondition of a Hoare triple."is a short notation for II ANA D.For example,in Fig.1,the equivalent set u,v is actually a logical expression,and it means that u and v are equal but unequal to any of other pointers;all the pointer sets can be expressed as {u,v}A{s,u->next}Aft,s->next}NA{w)D,where the sets without subscript represent equivalent pointer sets,and the sets with subscript N and D repre- sent null pointer set and dangling pointer set respectively.The null pointer set [t,s->next}w can also be written as {t}Afs->next)N,and the same for the dangling pointer set D.If,at a program point,IT is an empty set,the assertion at this point contains no equivalent set;if {...IN or {..does not appear in the assertion,it means that null pointer set or dangling pointer set is empty. 2.2 Elementary operations Access paths are a special kind of strings that satisfies certain syntactical re- quirements,and in this paper,all strings represent the strings or substrings that form access paths.If an access path p is a prefix of q(but they are not the same), the value of the predicate prefix(p,q)is true,otherwise is false.The symbol ""represents the concatenation operation of two strings or a string set and a single string.If the operands are a string set S and a single string s,it makes every element in the set S concatenated by the string s: Ss≌S'where(sl‖s)∈Sifs∈S. If the values of two access paths ss2 and s1 are equal(note that neither of si and s2 are empty strings),then the access path s2 is called a cycle in sills2lls3 (s3 is also a non-empty string).The symbol "=is used to represent that two strings are syntactically identical. Next,we define some functions operated on access paths.The functions have or II as their parameters.But such parameters are all omitted for simplicity. 1)Function closure(p)calculates alias set for the access path p,and the aliases in this set must have the simplest form.This set is called the closure of p,and it contains and only contains all acyclic aliases of p,including p itself. closure(p)≌ if length(p)=1 then {p} else let (sill...llsn-1llsn)=p in compression(expansion(closure(sill...llsn-1))Isn), where function length(p)calculates the length of the access path p.The length represents the number of meaningful units in p,rather than the number of char- acters in p.For example,the length of the access path t->next->data is 3. Function expansion(S)expands the alias set S with the access paths whose values are equal to the elements in S.and its formal definition is:
6 A pointer logic and certifying compiler number of arrows in the figure, and the arrows which point to the same object correspond to the pointers in the same equivalent set. Although Π, N and D are sets, they are essentially logical expressions connected by conjunction “∧”, and can appear in the precondition or postcondition of a Hoare triple. “Ψ” is a short notation for Π ∧ N ∧ D. For example, in Fig. 1, the equivalent set {u, v} is actually a logical expression, and it means that u and v are equal but unequal to any of other pointers; all the pointer sets can be expressed as {u, v}∧{s,u->next}∧{t, s->next}N ∧{w}D, where the sets without subscript represent equivalent pointer sets, and the sets with subscript N and D represent null pointer set and dangling pointer set respectively. The null pointer set {t, s->next}N can also be written as {t}N ∧{s->next}N , and the same for the dangling pointer set D. If, at a program point, Π is an empty set, the assertion at this point contains no equivalent set; if {. . . }N or {. . . }D does not appear in the assertion, it means that null pointer set or dangling pointer set is empty. 2.2 Elementary operations Access paths are a special kind of strings that satisfies certain syntactical requirements, and in this paper, all strings represent the strings or substrings that form access paths. If an access path p is a prefix of q (but they are not the same), the value of the predicate prefix(p, q) is true, otherwise is false. The symbol “||” represents the concatenation operation of two strings or a string set and a single string. If the operands are a string set S and a single string s, it makes every element in the set S concatenated by the string s: S||s , S 0 where (s 0 ||s) ∈ S0 iff s 0 ∈ S. If the values of two access paths s1||s2 and s1 are equal (note that neither of s1 and s2 are empty strings), then the access path s2 is called a cycle in s1||s2||s3 (s3 is also a non-empty string). The symbol “≡” is used to represent that two strings are syntactically identical. Next, we define some functions operated on access paths. The functions have Ψ or Π as their parameters. But such parameters are all omitted for simplicity. 1) Function closure(p) calculates alias set for the access path p, and the aliases in this set must have the simplest form. This set is called the closure of p, and it contains and only contains all acyclic aliases of p, including p itself. closure(p) , if length(p) = 1 then {p} else let (s1|| . . . ||sn−1||sn) ≡ p in compression(expansion(closure(s1|| . . . ||sn−1))||sn), where function length(p) calculates the length of the access path p. The length represents the number of meaningful units in p, rather than the number of characters in p. For example, the length of the access path t->next->data is 3. Function expansion(S) expands the alias set S with the access paths whose values are equal to the elements in S, and its formal definition is:
Front.Comput.Sci.China(2007),Research Article 7 expansion(S) if 3s':(IIUNUD)).(Sns'!=0) then let [p1,...Pn}=S-S where S'∈(ⅡU{W}U{D})ASnS'I=0 in SUclosure(p)U...Uclosure(pn) else 0. Function compression(S)deletes all the cyclic access paths from the alias set S,and its formal definition is: compression(S)≌S-S' where (S'CS)A ((s1lls2lls3)ES' f(s1!=e)∧(s2!=e)∧(s多!=e)Λ(sls3)∈S)A(sls2=s1) For brevity,the above function closure is a definition,not a computing al- gorithm.For example,it does not consider the termination of the calculation concerning cyclic data structures,such as doubly-linked cyclic list.But,in the implementation of closure,it is easy to solve the problem of termination.And given the closure function,it is also simple to delete cycles from the access paths. So for convenience,it is assumed that all access paths in programs have the sim- plest form. 2)Function alias(p,q)gets an alias of p from the closure of access path p.The alias it gets must not use any alias of effective pointer q as a prefix.If there is no such an alias,it returns p. alias(p,q) let S=[p'closure(p)I Vq'closure(q).-prefix(q',p)} in if S==0 then p else p'where p'S 3)Function equals(p)gets the equivalent set of p.If one of the aliases of p appears in an equivalent set,the function returns the set,or else it returns empty set. equals(p)≌ if3S:Ⅱ.(Snclosure(p)!=0) then S where S∈ⅡA Sn closure(p)!=0else0 Next,we introduce the operations and predicates directly used in inference rules. These operations show how to get the in a postcondition based on the in the corresponding precondition. 4)Suppose S is an equivalent set in I7,and p is an effective pointer.For each pointer q in S which uses one of p's aliases as its prefix,S/p finds an alias of q using alias(q,p),substitutes the alias for q in S,and then deletes p's aliases and any other pointers which use p's aliases as their prefixes from S. S/p≌ let S'={q:S Vp'closure(p).-prefix(p',q) [3q:S.3p closure(p).(prefix(p',q)=alias(q,p))} in {q:S'(qE closure(p))AVp':closure(p).-prefix(p',q)}
Front. Comput. Sci. China (2007), Research Article 7 expansion(S) , if ∃S0 : (Π ∪ {N } ∪ {D}).(S ∩ S0!=∅) then let {p1 , . . . , pn} = S 0 − S where S 0 ∈ (Π ∪ {N } ∪ {D}) ∧ S ∩ S0!=∅ in S ∪ closure(p1 ) ∪ . . . ∪ closure(pn) else ∅. Function compression(S) deletes all the cyclic access paths from the alias set S, and its formal definition is: compression(S) , S − S0 where (S 0 ⊂ S) ∧ ((s1||s2||s3) ∈ S0 iff (s1!=²) ∧ (s2!=²) ∧ (s3!=²) ∧ ((s1||s3) ∈ S) ∧ (s1||s2 = s1)) For brevity, the above function closure is a definition, not a computing algorithm. For example, it does not consider the termination of the calculation concerning cyclic data structures, such as doubly-linked cyclic list. But, in the implementation of closure, it is easy to solve the problem of termination. And given the closure function, it is also simple to delete cycles from the access paths. So for convenience, it is assumed that all access paths in programs have the simplest form. 2) Function alias(p, q) gets an alias of p from the closure of access path p. The alias it gets must not use any alias of effective pointer q as a prefix. If there is no such an alias, it returns p. alias(p, q) , let S = {p 0 : closure(p) | ∀q 0 : closure(q).¬prefix(q 0 , p 0 )} in if S == ∅ then p else p 0 where p 0 ∈ S 3) Function equals(p) gets the equivalent set of p. If one of the aliases of p appears in an equivalent set, the function returns the set, or else it returns empty set. equals(p) , if ∃S : Π.(S ∩ closure(p)!=∅) then S where S ∈ Π ∧ S ∩ closure(p)!=∅ else ∅ Next, we introduce the operations and predicates directly used in inference rules. These operations show how to get the Ψ in a postcondition based on the Ψ in the corresponding precondition. 4) Suppose S is an equivalent set in Π, and p is an effective pointer. For each pointer q in S which uses one of p’s aliases as its prefix, S/p finds an alias of q using alias(q,p), substitutes the alias for q in S, and then deletes p’s aliases and any other pointers which use p’s aliases as their prefixes from S. S/p , let S 0 = {q : S | ∀p 0 : closure(p).¬prefix(p 0 , q)}∪ {q 0 | ∃q : S.∃p 0 : closure(p).(prefix(p 0 , q) ∧ q 0 ≡ alias(q, p))} in {q : S 0 | ¬(q ∈ closure(p)) ∧ ∀p 0 : closure(p).¬prefix(p 0 , q)}
8 A pointer logic and certifying compiler We use II/p to denote performing S/p for each S in II. When an effective pointer q is assigned a value which is not equal to q,I/q should be performed.For example,in Fig.2 where II ={fu,v->pre},[v,u->next}}, II/v={fu,u->next->pre},fu->next}}.(fu->next}means that u->next is an effective pointer and is not equal to any of other pointers. data data pre pre next ext Fig.2.Pointers in a doubly-linked cyclic list 5)M\p or D\p replaces the pointers in N or D which use p's aliases as their prefixes with their other aliases.A/p or D/p deletes the aliases of p from A or p Np q:N|Vp':closure(p).-prefix(p',q)}u q'q:N.p'closure(p).(prefix(p',q)'=alias(q,p))} NIp≌{q:WI(q∈closure(p)} N/{p1…pn}≌(NIp1).…/Pn) The formal definitions of D\p and D/p are similar. 6)We use the set operator U to represent the union of two pointer sets.Usually. this operator is used to add a pointer to a pointer set.And we also use set operators U and -and their combinations to represent the addition,deletion and substitution of equivalent sets. 7)Function no_leak(p)tests the result of S/p for S containing p(just testing,not really deleting).This testing is to avoid memory leaks caused by an assignment to p. no_leak(p)if equals(p)/p!=0 then true else false 2.3 Axioms and inference rules In this section,we define axioms and inference rules in our pointer logic.In the following rules,p and q are access paths;p.eq is p's equivalent set,and p.eq equals(p)
8 A pointer logic and certifying compiler We use Π/p to denote performing S/p for each S in Π. When an effective pointer q is assigned a value which is not equal to q, Π/q should be performed. For example, in Fig. 2 where Π = {{u, v->pre}, {v, u->next}}, Π/v = {{u, u->next->pre}, {u->next}}. ({u->next} means that u->next is an effective pointer and is not equal to any of other pointers.) Fig. 2. Pointers in a doubly-linked cyclic list 5) N \p or D\p replaces the pointers in N or D which use p’s aliases as their prefixes with their other aliases. N /p or D/p deletes the aliases of p from N or D. N \p , {q : N | ∀p 0 : closure(p).¬prefix(p 0 , q)}∪ {q 0 | ∃q : N .∃p 0 : closure(p).(prefix(p 0 , q) ∧ q 0 ≡ alias(q, p))} N /p , {q : N | ¬(q ∈ closure(p))} N /{p1 . . . pn} , ((N /p1 ). . . /pn) The formal definitions of D\p and D/p are similar. 6) We use the set operator ∪ to represent the union of two pointer sets. Usually, this operator is used to add a pointer to a pointer set. And we also use set operators ∪ and − and their combinations to represent the addition, deletion and substitution of equivalent sets. 7) Function no leak(p) tests the result of S/p for S containing p (just testing, not really deleting). This testing is to avoid memory leaks caused by an assignment to p. no leak(p) , if equals(p)/p!=∅ then true else false 2.3 Axioms and inference rules In this section, we define axioms and inference rules in our pointer logic. In the following rules, p and q are access paths; p.eq is p’s equivalent set, and p.eq = equals(p).
Front.Comput.Sci.China (2007),Research Article 9 In each of the following rules,the premises of the rule should be computed based on the y before the statement. 1)Pointer assignment:p=q. Here different inference rules are used in different cases. a)Both p and q are effective pointers or null pointers,and they are equal(in- cluding the case where p is a null pointer and g is constant NULL). (p.eq!=0A p==q)V(p==NULL A q==NULL) (RULE 1) [IIANAD}P=q(IIANAD) b)Both p and q are effective pointers,and they are not equal. (p.eq!=0∧q.eq!=0∧p!=q)∧no_leak(p) [IIANAD}p=q {((II-{q.eq})/pu{q.eq/pu(p)})ANpAD\p} (RULE2)】 c)p is a null pointer and q is an effective pointer. p==NULL∧q.eq!=0 [IAN A D}p=q{((II-{q.eq})ufq.equ(p)))AN/p D] (RULE3) d)p is a dangling pointer and q is an effective pointer. The rule for this case is similar to(RULE 3). e)p is an effective pointer and q equals NULL (including the case where q is constant NULL). p.eq!=0A q==NULL A no_leak(p) (RULE 4) (IIANAD}p=q {I/PA(NPU(p))AD\p} f)p is a dangling pointer and q equals NULL (including the case where q is constant NULL). p:DA q==NULL (RULE 5) IIANAD)p=q (II(NUp))AD/p} where p:D represents that an alias of p is in D 2)The assignment axiom for non-pointer-type data is as follows.It is an exten- sion of Hoare logic assignment axiom,and aliasing has been considered in this axiom.The axiom does not interfere with {(亚AQ)y1-.[yn←-xx←}x=e{亚AQ}(AXIOM6-1) where y1,...,yn represent all the members of closure(x)
Front. Comput. Sci. China (2007), Research Article 9 In each of the following rules, the premises of the rule should be computed based on the Ψ before the statement. 1) Pointer assignment: p=q. Here different inference rules are used in different cases. a) Both p and q are effective pointers or null pointers, and they are equal (including the case where p is a null pointer and q is constant NULL). (p.eq!=∅ ∧ p==q) ∨ (p==NULL ∧ q==NULL) {Π ∧ N ∧ D} p=q {Π ∧ N ∧ D} (rule 1) b) Both p and q are effective pointers, and they are not equal. (p.eq!=∅ ∧ q.eq!=∅ ∧ p!=q) ∧ no leak(p) {Π ∧ N ∧ D} p=q {((Π − {q.eq})/p ∪ {q.eq/p ∪ {p}}) ∧ N \p ∧ D\p} (rule 2) c) p is a null pointer and q is an effective pointer. p==NULL ∧ q.eq!=∅ {Π ∧ N ∧ D} p=q {((Π − {q.eq}) ∪ {q.eq ∪ {p}}) ∧ N /p ∧ D} (rule 3) d) p is a dangling pointer and q is an effective pointer. The rule for this case is similar to (RULE 3). e) p is an effective pointer and q equals NULL (including the case where q is constant NULL). p.eq!=∅ ∧ q==NULL ∧ no leak(p) {Π ∧ N ∧ D} p=q {Π/p ∧ (N \p ∪ {p}) ∧ D\p} (rule 4) f) p is a dangling pointer and q equals NULL (including the case where q is constant NULL). p : D ∧ q==NULL {Π ∧ N ∧ D} p=q {Π ∧ (N ∪ {p}) ∧ D/p} (rule 5) where p : D represents that an alias of p is in D 2) The assignment axiom for non-pointer-type data is as follows. It is an extension of Hoare logic assignment axiom, and aliasing has been considered in this axiom. The axiom does not interfere with Ψ. {(Ψ ∧ Q)[y1 ← x] . . . [yn ← x][x ← e]} x=e {Ψ ∧ Q} (axiom 6-1) where y1, . . . , yn represent all the members of closure(x)