Design of a Certifying Compiler Supporting Proof of Program Safety Yiyun Chen Lin Ge Baojian Hua Zhaopeng Li Cheng Liu Department of Computer Science and Technology University of Science and Technology of China Hefei,Anhui 230026,China yiyun @ustc.edu.cn gelin,huabj,zpli,liuc5 @mail.ustc.edu.cn Abstract safety policies is one of the hot topics in this research field. In the past decade,great progress has been made in the Safety is an important property of high-assurance soft- area of program verification.Many projects adopt type- ware,and one of the hot research topics on it is the veri- based or logic-based approaches to reason about the prop- fication method for software to meet its safety policies.In erties of low-level or high-level programs.The TAL (Typed our previous work,we designed a pointer logic system and Assembly Language)[15]project and the theory of type re- proposed a framework for developing and verifying safety- finements [13]are two typical projects using type-based ap- critical programs.And in this paper,we present the design proaches,while Proof-Carrying Code(PCC)[16],Founda- and implementation of a certifying compiler based on that tional PCC(FPCC)[1].Certified Assembly Programming framework.Here we will mainly explain verification con- (CAP)[19]and Stack-based CAP (SCAP)[6]are typical dition generation,generation of code and assertions,and projects on logic-based techniques.Type-based and logic- proof generation for basic blocks.Our certifying compiler based techniques are complementary to each other,and has the following novelties:1)it supports a programming some projects have tried to combine them.Hamid et al. language equipped with both a type system and a logic sys- adopted a syntactic approach to FPCC[7],and gave a trans- tem:2)and it can produce safety proofs for programs with lation from a typed assembly language into FPCC in the pointers. Flint project.The ATS(Applied Type System)[18]project proposed by Hongwei Xi et al.extends a type system with a notion of program states,so that invariants on states can 1.Introduction be captured in stateful programming.By encoding Hoare logic in its type system,ATS can support Hoare-logic-like Nowadays,high-assurance software is more and more reasoning via the type system. valued,and among its properties,safety and security are Based on the above work,we have proposed a new most important.Safety is the ability to guarantee that soft- framework for developing and verifying safety-critical soft- ware execution does no harm to the system.And security ware: is the ability to preserve the confidentiality,integrity,avail- 1.Safety policies of a program are specified formally by ability and authentication of data.There is a close relation- programmers.These specifications as well as the cor- ship between safety and security.For example,hackers usu- responding program are submitted to a compiler. ally use low-grade safety errors,such as buffer overflows, out-of-bound array accesses and dangling pointer derefer- 2.The compiler produces verification conditions (VCs) for the submitted program.These VCs are required to ences,to destroy a system or achieve unauthorized control of a system.Apparently,it is helpful to ensure software se- verify whether the program satisfies its specifications. curity by strengthening its safety,and it makes us focus on If all of them are proved,the program is believed to software safety in this paper. satisfy the specifications.Most VCs can be proved To achieve software safety,all program errors should automatically by the embedded theorem prover in the be discovered before the execution of the program or be compiler,and the rest are proved interactively by pro- gently captured during the execution.The research goal grammers using proof-assistant tools.All the proofs of software safety is to build a wholesome scientific and should be produced and preserved. technological infrastructure for the management of software 3.The compiler compiles the source code and specifica- safety.And the verification method for software to meet its tions into assembly level counterparts while translating
Design of a Certifying Compiler Supporting Proof of Program Safety Yiyun Chen Lin Ge Baojian Hua Zhaopeng Li Cheng Liu Department of Computer Science and Technology University of Science and Technology of China Hefei, Anhui 230026, China yiyun@ustc.edu.cn {gelin, huabj, zpli, liuc5}@mail.ustc.edu.cn Abstract Safety is an important property of high-assurance software, and one of the hot research topics on it is the veri- fication method for software to meet its safety policies. In our previous work, we designed a pointer logic system and proposed a framework for developing and verifying safetycritical programs. And in this paper, we present the design and implementation of a certifying compiler based on that framework. Here we will mainly explain verification condition generation, generation of code and assertions, and proof generation for basic blocks. Our certifying compiler has the following novelties: 1) it supports a programming language equipped with both a type system and a logic system; 2) and it can produce safety proofs for programs with pointers. 1. Introduction Nowadays, high-assurance software is more and more valued, and among its properties, safety and security are most important. Safety is the ability to guarantee that software execution does no harm to the system. And security is the ability to preserve the confidentiality, integrity, availability and authentication of data. There is a close relationship between safety and security. For example, hackers usually use low-grade safety errors, such as buffer overflows, out-of-bound array accesses and dangling pointer dereferences, to destroy a system or achieve unauthorized control of a system. Apparently, it is helpful to ensure software security by strengthening its safety, and it makes us focus on software safety in this paper. To achieve software safety, all program errors should be discovered before the execution of the program or be gently captured during the execution. The research goal of software safety is to build a wholesome scientific and technological infrastructure for the management of software safety. And the verification method for software to meet its safety policies is one of the hot topics in this research field. In the past decade, great progress has been made in the area of program verification. Many projects adopt typebased or logic-based approaches to reason about the properties of low-level or high-level programs. The TAL (Typed Assembly Language) [15] project and the theory of type re- finements [13] are two typical projects using type-based approaches, while Proof-Carrying Code (PCC) [16], Foundational PCC (FPCC) [1], Certified Assembly Programming (CAP) [19] and Stack-based CAP (SCAP) [6] are typical projects on logic-based techniques. Type-based and logicbased techniques are complementary to each other, and some projects have tried to combine them. Hamid et al. adopted a syntactic approach to FPCC [7], and gave a translation from a typed assembly language into FPCC in the Flint project. The ATS (Applied Type System) [18] project proposed by Hongwei Xi et al. extends a type system with a notion of program states, so that invariants on states can be captured in stateful programming. By encoding Hoare logic in its type system, ATS can support Hoare-logic-like reasoning via the type system. Based on the above work, we have proposed a new framework for developing and verifying safety-critical software: 1. Safety policies of a program are specified formally by programmers. These specifications as well as the corresponding program are submitted to a compiler. 2. The compiler produces verification conditions (VCs) for the submitted program. These VCs are required to verify whether the program satisfies its specifications. If all of them are proved, the program is believed to satisfy the specifications. Most VCs can be proved automatically by the embedded theorem prover in the compiler, and the rest are proved interactively by programmers using proof-assistant tools. All the proofs should be produced and preserved. 3. The compiler compiles the source code and specifications into assembly level counterparts while translating
the proofs obtained from step 2 into proofs at assem- pointer arithmetic are forbidden.To guarantee that type- bly level.The latter proofs are carried in the assembly checked programs are well-typed,union types and coercion code and they ensure that the code meets the corre- of types are also forbidden.malloc and free are used as pre- sponding specifications.These specifications must be defined functions which meet the primary safety policies. equivalent to the counterparts at source level.Such a For example,each invocation of malloc will allocate an area compiler is called a certifying compiler. of memory successfully,and the region of the memory will 4.At assembly level,a proof checker checks the proofs never overlap those already in use. Pointers are classified into effective pointers (those carried in the assembly code automatically to ensure point to objects),null pointers and dangling pointers,and that the code satisfies its specifications. the latter two are also called ineffective pointers.Ineffec- The advantage of this framework is that it provides tive pointer dereferences,memory leaks or using an ineffec- source-level approach for reasoning about program proper- tive pointer as the actual parameter of function free are all ties rather than the assembly-level one.Compared with the considered as violations of the primary safety policies. approaches for assembly code certification,this approach A type system is a tractable syntactic method for prov- can improve the efficiency of safety-critical software devel- ing the absence of certain program behaviors by classifying opment.Since the proofs are checked at assembly level, phrases according to the kinds of values they compute.It the compiler,including a VC generator and a prover,can be is mainly used to eliminate context-sensitive errors in pro- removed from the TCB(Trusted Computing Base).There- grams.A traditional type system is not sufficient when the fore,the TCB of the system will be reduced remarkably. legality of a phrase depends not only on the context but In this framework,we choose a C-like programming lan- also on some expression values in the phrase.Dependent guage PointerC [9]as our source language and implement types [18]is one possible solution to this problem.Another a certifying compiler prototype for it.Our main contri- solution is not to treat the constraints on values as parts of bution is the design of a certifying compiler for a source the type system,and the constraints don't appear in the typ- language with pointers and explicit memory management ing rules.The former leads to a complicated type system (malloc/free).And a prototype of the certifying compiler and the latter leads to an unsound one.We try to find a trade- has been implemented in our work.A distinct feature of our off between these two solutions in the design of PointerC. certifying compiler is that it supports languages equipped To achieve simplicity in the type system and guarantee the with both a type system and a logic system. safety of the language at the same time,side conditions are In this paper,we introduce the design and implemen- introduced into the typing rules to express the constraints on tation of our certifying compiler.The rest of the paper is values.For example,the following two typing rules show organized as follows.We introduce the source language- that the constraints on subscript expressions and pointers PointerC briefly in section 2.In section 3,we discuss how are put in the side conditions. the compiler generates VCs at source level from the asser- THe:int THa:array(10,bool) tions given by programmers and the side conditions in the 0≤e∧e≤9 TFa[e]bool typing rules.Section 4 describes the scheme for generat- ing corresponding assertions at some program points dur- TFp ptr(struct(...,x:int,...)) p∈effective.-ptrs ing intermediate code generation.This ensures that each THp->x:int basic block has proper pre-and postconditions.Section 5 These side conditions must be checked before the ex- explains how the compiler generates a proof for each ba- ecution of the program to ensure that the execution does sic block.Each proof ensures that the corresponding basic not violate the primary safety policies.And to check these block satisfies its pre-and postconditions.Section 6 com- side conditions statically,we have designed a pointer logic pares our work with related work and section 7 concludes. for PointerC.The pointer logic is an extension of Hoare logic and essentially is a pointer analysis tool.It collects 2.Source language-PointerC pointer information in a forward manner.This information includes whether a pointer is null,dangling or effective,and PointerC is a C-like programming language that we have the equality relation between effective pointers.The in- designed as the source language in our work.In order to formation can be used to prove that the program satisfies check more pointer programs statically,some pointer oper- the side conditions in the typing rules and then to support ations are restricted in PointerC [9].These restrictions are value-sensitive static checking.For example,according to based on the premise of not affecting the language function- the pointer logic,the following two Hoare triples ality of PointerC.Variables with pointer type can only be used in assignment,equality comparison,dereference or as [p,q)A [p->next)}p=malloc(...) the parameters of functions;the address-of operator(&)and [fp)A {q)A{q->next}A {p->next)D}
the proofs obtained from step 2 into proofs at assembly level. The latter proofs are carried in the assembly code and they ensure that the code meets the corresponding specifications. These specifications must be equivalent to the counterparts at source level. Such a compiler is called a certifying compiler. 4. At assembly level, a proof checker checks the proofs carried in the assembly code automatically to ensure that the code satisfies its specifications. The advantage of this framework is that it provides source-level approach for reasoning about program properties rather than the assembly-level one. Compared with the approaches for assembly code certification, this approach can improve the efficiency of safety-critical software development. Since the proofs are checked at assembly level, the compiler, including a VC generator and a prover, can be removed from the TCB (Trusted Computing Base). Therefore, the TCB of the system will be reduced remarkably. In this framework, we choose a C-like programming language PointerC [9] as our source language and implement a certifying compiler prototype for it. Our main contribution is the design of a certifying compiler for a source language with pointers and explicit memory management (malloc/free). And a prototype of the certifying compiler has been implemented in our work. A distinct feature of our certifying compiler is that it supports languages equipped with both a type system and a logic system. In this paper, we introduce the design and implementation of our certifying compiler. The rest of the paper is organized as follows. We introduce the source language— PointerC briefly in section 2. In section 3, we discuss how the compiler generates VCs at source level from the assertions given by programmers and the side conditions in the typing rules. Section 4 describes the scheme for generating corresponding assertions at some program points during intermediate code generation. This ensures that each basic block has proper pre- and postconditions. Section 5 explains how the compiler generates a proof for each basic block. Each proof ensures that the corresponding basic block satisfies its pre- and postconditions. Section 6 compares our work with related work and section 7 concludes. 2. Source language—PointerC PointerC is a C-like programming language that we have designed as the source language in our work. In order to check more pointer programs statically, some pointer operations are restricted in PointerC [9]. These restrictions are based on the premise of not affecting the language functionality of PointerC. Variables with pointer type can only be used in assignment, equality comparison, dereference or as the parameters of functions; the address-of operator (&) and pointer arithmetic are forbidden. To guarantee that typechecked programs are well-typed, union types and coercion of types are also forbidden. malloc and free are used as predefined functions which meet the primary safety policies. For example, each invocation of malloc will allocate an area of memory successfully, and the region of the memory will never overlap those already in use. Pointers are classified into effective pointers (those point to objects), null pointers and dangling pointers, and the latter two are also called ineffective pointers. Ineffective pointer dereferences, memory leaks or using an ineffective pointer as the actual parameter of function free are all considered as violations of the primary safety policies. A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute. It is mainly used to eliminate context-sensitive errors in programs. A traditional type system is not sufficient when the legality of a phrase depends not only on the context but also on some expression values in the phrase. Dependent types [18] is one possible solution to this problem. Another solution is not to treat the constraints on values as parts of the type system, and the constraints don’t appear in the typing rules. The former leads to a complicated type system and the latter leads to an unsound one. We try to find a tradeoff between these two solutions in the design of PointerC. To achieve simplicity in the type system and guarantee the safety of the language at the same time, side conditions are introduced into the typing rules to express the constraints on values. For example, the following two typing rules show that the constraints on subscript expressions and pointers are put in the side conditions. Γe : int Γa : array(10, bool) Γa[e] : bool 0 ≤ e ∧ e ≤ 9 Γp : ptr(struct(..., x:int,... )) Γp->x : int p ∈ effective ptrs These side conditions must be checked before the execution of the program to ensure that the execution does not violate the primary safety policies. And to check these side conditions statically, we have designed a pointer logic for PointerC. The pointer logic is an extension of Hoare logic and essentially is a pointer analysis tool. It collects pointer information in a forward manner. This information includes whether a pointer is null, dangling or effective, and the equality relation between effective pointers. The information can be used to prove that the program satisfies the side conditions in the typing rules and then to support value-sensitive static checking. For example, according to the pointer logic, the following two Hoare triples {{{p, q}∧{p->next}}} p=malloc(···) {{{p}∧{q}∧{q->next}∧{p->next}D}}
and 2.A verification condition generator(VCGen)is embed- ded into the front end of the compiler.It can convert {p,q}{p->nextN}free(p)[ip,q)D} the task of proving a program satisfying its specifica- tions into the task of proving a set of VCs.From the hold,where the set [p}represents an assertion,and its in- annotations mentioned in 1 and the side conditions in formal meaning is"p is an effective pointer and is not equal the typing rules,the VCGen generates a set of VCs us- to any other pointers".Similarly,the set {p,g}means"Ef- ing the pointer logic rules.And these VCs should be fective pointers p and q are equal(but not aliases),and they proved in order to guarantee program safety. are not equal to any other pointers";{p,g)p denotes pand q are dangling pointers;and [p->next)N means that p->next 3.A simple theorem prover,which produces correspond- is a null pointer.According to the pointer logic,there are ing proofs for pointer-related VCs,is embedded into no postconditions Q and Q2 which make the following two the compiler as well.Integer-related VCs are proved Hoare triples hold.That is,the preconditions of malloc and interactively in Coq [4]by the programmers.The VCs free here can not satisfy the side conditions in the corre- and their proofs show that the source program satisfies sponding typing rules. its specifications. {{p)A {p->next}p=malloc(...)Q} At first,we discuss how to generate VCs for functions with pointer-type data.Figure 1 shows the structure of a (The object previously pointed by p is lost.) function chosen from PointerC syntax,some irrelevant de- tails are omitted. fp}A {p->next)}p=free(p)Q2} (The object previously pointed by p->next is lost.) FunDcl → id(arg){Body) In history,pointer analyses were mostly conservative es- Body → VarDecList StmtList timates of the pointer status at runtime.And they mainly StmtList → Stmt StmtListe answered the question:what was the possible set of objects Stmt → Ival Exp that a pointer may point to at runtime?Such pointer analy- if(Bexp){StmtList else {StmtList ses can be used in many fields of static analysis and program while (Bexp)[StmtList} optimization,such as liveness analysis needed by register Ival alloc(Type) allocation and constant propagation etc.To meet the safety free(Exp) requirements of software,we have restricted some unde- Ival id(Exp) cidable pointer operations in PointerC,and thus obtained return res an accurate pointer analysis instead of an approximate one. This is the primary difference of our pointer analysis from Figure 1.Structure of function others.Although accurate pointer analysis does not exist for all programs,our practical experience on the applica- tions of some data structures,such as singly linked list,bi- The pointer logic is fit for collecting pointer information nary tree and circular doubly linked list etc.,shows that the in a forward manner,so that the VC generation is based pointer logic is feasible.Besides side conditions in the typ- on the strongest postcondition calculus.Figure 2 shows the ing rules,the pointer logic can also be used to reason about main rules for the strongest postcondition calculus (function other program properties to accommodate the requests of sp)in the pointer logic and the framework of VC generation user-defined safety policies. (for data with pointer types).These rules are recursively de- fined according to the syntactic structures in a function.In Figure 2,II,N and D denote the equivalent division of all 3.Verification condition generation effective pointers,the set of null pointers and the set of dan- gling pointers respectively [2]:is the short notation for In order to check program safety statically,side condi- IIANA D;the first parameter of function sp is a syntactic tions in the typing rules must be provable at the correspond- structure,and the second one is the precondition of the syn- ing program points.This is achieved through the following tactic structure.The VC generation described here has the steps: following important characteristics: 1.Programmers annotate each function with a pair of pre- 1.Since the precondition and the postcondition of and postconditions and each while loop with a loop in- a function are given,the VC sp(StmtList,)is variant.These annotations belong to the specifications generated at the exit of the function(note that StmtList of the source program. forms the statement list of the function)
and {{{p, q}∧{p->next}N }} free(p) {{{p, q}D}} hold, where the set {p} represents an assertion, and its informal meaning is “p is an effective pointer and is not equal to any other pointers”. Similarly, the set {p, q} means “Effective pointers p and q are equal (but not aliases), and they are not equal to any other pointers”; {p, q}D denotes p and q are dangling pointers; and {p->next}N means that p->next is a null pointer. According to the pointer logic, there are no postconditions Q1 and Q2 which make the following two Hoare triples hold. That is, the preconditions of malloc and free here can not satisfy the side conditions in the corresponding typing rules. {{{p}∧{p->next}N }} p=malloc(···) {{Q1}} (The object previously pointed by p is lost.) {{{p}∧{p->next}}} p=free(p) {{Q2}} (The object previously pointed by p->next is lost.) In history, pointer analyses were mostly conservative estimates of the pointer status at runtime. And they mainly answered the question: what was the possible set of objects that a pointer may point to at runtime? Such pointer analyses can be used in many fields of static analysis and program optimization, such as liveness analysis needed by register allocation and constant propagation etc. To meet the safety requirements of software, we have restricted some undecidable pointer operations in PointerC, and thus obtained an accurate pointer analysis instead of an approximate one. This is the primary difference of our pointer analysis from others. Although accurate pointer analysis does not exist for all programs, our practical experience on the applications of some data structures, such as singly linked list, binary tree and circular doubly linked list etc., shows that the pointer logic is feasible. Besides side conditions in the typing rules, the pointer logic can also be used to reason about other program properties to accommodate the requests of user-defined safety policies. 3. Verification condition generation In order to check program safety statically, side conditions in the typing rules must be provable at the corresponding program points. This is achieved through the following steps: 1. Programmers annotate each function with a pair of preand postconditions and each while loop with a loop invariant. These annotations belong to the specifications of the source program. 2. A verification condition generator (VCGen) is embedded into the front end of the compiler. It can convert the task of proving a program satisfying its specifications into the task of proving a set of VCs. From the annotations mentioned in 1 and the side conditions in the typing rules, the VCGen generates a set of VCs using the pointer logic rules. And these VCs should be proved in order to guarantee program safety. 3. A simple theorem prover, which produces corresponding proofs for pointer-related VCs, is embedded into the compiler as well. Integer-related VCs are proved interactively in Coq [4] by the programmers. The VCs and their proofs show that the source program satisfies its specifications. At first, we discuss how to generate VCs for functions with pointer-type data. Figure 1 shows the structure of a function chosen from PointerC syntax, some irrelevant details are omitted. FunDcl → id(arg){Body} Body → VarDecList StmtList StmtList → Stmt StmtList | Stmt → lval = Exp | if (Bexp) {StmtList} else {StmtList} | while (Bexp) {StmtList} | lval = alloc(Type) | free(Exp) | lval = id(Exp) | return res Figure 1. Structure of function The pointer logic is fit for collecting pointer information in a forward manner, so that the VC generation is based on the strongest postcondition calculus. Figure 2 shows the main rules for the strongest postcondition calculus (function sp) in the pointer logic and the framework of VC generation (for data with pointer types). These rules are recursively de- fined according to the syntactic structures in a function. In Figure 2, Π, N and D denote the equivalent division of all effective pointers, the set of null pointers and the set of dangling pointers respectively [2]; Ψ is the short notation for Π ∧N ∧D; the first parameter of function sp is a syntactic structure, and the second one is the precondition of the syntactic structure. The VC generation described here has the following important characteristics: 1. Since the precondition Ψ and the postcondition Ψ of a function are given, the VC sp(StmtList, Ψ) ⊃ Ψ is generated at the exit of the function (note that StmtList forms the statement list of the function)
1.Function definition [亚]id(arg){Body}[亚'],that is[亚]StmtList[亚'),where the StmtList is the StmtList in the Body production. 2.Statement List ·sp(Stmt StmtList,业)=sp(StmtList,sp(Stmt,业) ·sp(e,亚)=亚.If e forms the StmtList of a function,.then VC=亚p亚'(see 1 for亚will be generated.. 3.Statement ·assignment:sp(lval=Exp,亚)=亚',亚'can be calculated using亚according to the assignment rules in the pointer logic. ·condition:sp(if(Bexp){StmtList}else{StmtList2},平)=sp(StmtList1,BexpAΨ)Vsp(StmtList2,一BexpA平) loop:sp(while(Bexp){StmtList,)=-BexpA I,where Iis the loop invariant for pointer-type data. VC1=I and VC2 sp(StmtList,Bexp A DI should be generated at the entry point and the exit of StmtList respectively. 。allocation:sp(Ival=alloc(Type),亚)=亚',亚can be calculated using亚according to the allocation rules in the pointer logic. 。deallocation:sp(free(Exp),亚)=亚',亚'can be calculated using亚according to the deallocation rule in the pointer logic. .function call:If the pre-and postconditions of function id are farg)[lvalA and A Qrespectively,then sp(Ival id(Exp),)=(A Q)[arg-Exp][res-Ivall,where Q and Qarg are assertions that have nothing to do with pointers,arg is the formal parameter,and res is the return value (see Figure 1). VC=(farg}A [Ival)A Qn)[arg-Exp]should be generated at the entry point of this statement. ·return:sp(return res,.亚)=亚 Figure 2.The Strongest postcondition calculus and the VC generation of pointer-type data 2.One difficulty of strongest postcondition calculus is the assignment axiom.Instead,different assignment rules need to find a fixpoint for a recursive equation in a loop are used in different cases in our pointer logic.Since statement.The solutions to such equations are usually the pointer analysis is precise,it is easy to determine undecidable,and it is also the primary reason why the which rule to use in a certain case and it is clear how correctness of a program can not be proved automat- to compute 'using in the sp rule for assignment. ically.The loop invariant provided by programmers Please refer to [2]for some details skipped in Figure 2. is used to avoid the difficulty.However,in order to prove the validity of the loop invariant,two VCs must 5.At the entry point of a function,the pointer logic be generated at the entry point and the exit of the loop should check the initial values of the pointer-type for- respectively. mal parameters and local variables.And at the exit of the function,their effectiveness should also be checked 3.The pre-and postconditions of each function have also to avoid memory leaks.But for the space limit,the VC deeply simplified the computation of sp for function generation in Figure 2 does not reflect this. call statement.Briefly speaking,the before the call statement should imply the precondition of the callee, For integer-type data,we adopt a complemented ap- and the callee's postcondition should be used as the proach of Hoare logic-weakest precondition calculus strongest postcondition after the call statement.Cer- (wp)[5]to generate assertions or VCs at each program tainly,we also need to consider the substitution of ac- point.And we only introduce its distinctions from the ap- tual parameters for formal parameters as well as the proach for pointer-type data as follows. substitution of the variable Ival for the variable res(see function call in Figure 2). 1.Since the assertions in a function are calculated back- ward using the postcondition Q of the function,a VC 4.One remarkable distinction of our pointer logic from Pwp(StmtList,Q)is generated at the entry point of Hoare logic is that the pointer logic has no uniform the function,where P is the precondition of the func-
1. Function definition {{Ψ}} id(arg){Body} {{Ψ }}, that is {{Ψ}} StmtList {{Ψ }}, where the StmtList is the StmtList in the Body production. 2. Statement List • sp(Stmt StmtList, Ψ) = sp(StmtList, sp(Stmt, Ψ)) • sp(, Ψ) = Ψ. If forms the StmtList of a function, then VC = Ψ ⊃ Ψ (see 1 for Ψ ) will be generated. 3. Statement • assignment: sp(lval = Exp, Ψ) = Ψ , Ψ can be calculated using Ψ according to the assignment rules in the pointer logic. • condition: sp(if (Bexp) {StmtList1} else {StmtList2}, Ψ) = sp(StmtList1, Bexp∧Ψ)∨sp(StmtList2, ¬Bexp∧Ψ) • loop: sp(while (Bexp) {StmtList}, Ψ) = ¬Bexp ∧ I, where I is the loop invariant for pointer-type data. VC1 = Ψ ⊃ I and VC2 = sp(StmtList, Bexp ∧ I) ⊃ I should be generated at the entry point and the exit of StmtList respectively. • allocation: sp(lval = alloc(Type), Ψ) = Ψ , Ψ can be calculated using Ψ according to the allocation rules in the pointer logic. • deallocation: sp(free(Exp), Ψ) = Ψ , Ψ can be calculated using Ψ according to the deallocation rule in the pointer logic. • function call: If the pre- and postconditions of function id are {arg}∧{lval}N ∧Qarg and Ψ ∧Q respectively, then sp(lval = id(Exp), Ψ) = (Ψ ∧ Q)[arg ← Exp][res ← lval], where Q and Qarg are assertions that have nothing to do with pointers, arg is the formal parameter, and res is the return value (see Figure 1). VC = Ψ ⊃ ({arg}∧{lval}N ∧ Qarg)[arg ← Exp] should be generated at the entry point of this statement. • return: sp(return res, Ψ) = Ψ Figure 2. The Strongest postcondition calculus and the VC generation of pointer-type data 2. One difficulty of strongest postcondition calculus is the need to find a fixpoint for a recursive equation in a loop statement. The solutions to such equations are usually undecidable, and it is also the primary reason why the correctness of a program can not be proved automatically. The loop invariant provided by programmers is used to avoid the difficulty. However, in order to prove the validity of the loop invariant, two VCs must be generated at the entry point and the exit of the loop respectively. 3. The pre- and postconditions of each function have also deeply simplified the computation of sp for function call statement. Briefly speaking, the Ψ before the call statement should imply the precondition of the callee, and the callee’s postcondition should be used as the strongest postcondition after the call statement. Certainly, we also need to consider the substitution of actual parameters for formal parameters as well as the substitution of the variable lval for the variable res (see function call in Figure 2). 4. One remarkable distinction of our pointer logic from Hoare logic is that the pointer logic has no uniform assignment axiom. Instead, different assignment rules are used in different cases in our pointer logic. Since the pointer analysis is precise, it is easy to determine which rule to use in a certain case and it is clear how to compute Ψ using Ψ in the sp rule for assignment. Please refer to [2] for some details skipped in Figure 2. 5. At the entry point of a function, the pointer logic should check the initial values of the pointer-type formal parameters and local variables. And at the exit of the function, their effectiveness should also be checked to avoid memory leaks. But for the space limit, the VC generation in Figure 2 does not reflect this. For integer-type data, we adopt a complemented approach of Hoare logic—weakest precondition calculus (wp) [5] to generate assertions or VCs at each program point. And we only introduce its distinctions from the approach for pointer-type data as follows. 1. Since the assertions in a function are calculated backward using the postcondition Q of the function, a VC P ⊃ wp(StmtList, Q) is generated at the entry point of the function, where P is the precondition of the func-
tion and StmtList is the statement sequence of the func- 4.Code and assertion generation tion. 2.The side condition in a typing rule should be combined In our framework,specifications and the proof of code satisfying the specifications are carried in the assembly with the assertion which is at the entry point of the code.The assembly code is divided into basic blocks.Basic corresponding statement.For pointer-type data,such a problem does not exist.Because a pointer-related side block,which is a concept in code optimization and gener- ation,is a sequence of instruction;and in our design,the condition is consistent with the premise of the corre- instruction sequence ends with a control transfer instruction sponding inference rule in the pointer logic,and should have been checked before the rule is chosen. such as jmp or call.Each basic block B has its precon dition P,postcondition Q,and the proof or proof hint of 3.When we compute pointer-related assertions,we need [P}B {Q}.The proof can be checked by a proof checker. According to the principle that the postcondition of a basic to decide which assignment rule to use according to the block should imply the precondition of the succeeding ba- assertion before the assignment.However,there is no need to do that about integer-related assertions using sic block in the control flow.Q can be omitted since we can just take the precondition of the succeeding basic block as wp. In fact,the computations of assertions and VCs for In order to make sure that each basic block has perti- integer-type data and pointer-type data interact with each nent pre-and postconditions,assertions should be gener- other. ated at proper program points during code generation.In this section,we introduce a scheme for generating interme- 1.Both of them should be performed together with type diate code and the corresponding assertions.For the lack checking,since they both need side conditions in the of space,we only consider pointer-related assertions.Us- typing rules.But the backward calculus for integer- ing the calculi in section 3,we can get a proper assertion related assertions will reduce the efficiency of the per- at each program point.Figure 3 shows the scheme for the formance,because the calculus direction does not con- main syntactic structures.The first parameter of the recur- sist with the direction of type checking. sive function Code is a syntactic structure.and the second parameter is the precondition of the structure.The function 2.The pre-and postconditions of functions,loop invari- Code generates intermediate code and assertions at some ants and the Bexps of while and if statements may program points.The generated assertion is at the right side contain both integers and pointers.Such an assertion of the code.The assertion which follows"--"should be should be divided into two parts,each of which con- inserted before the line of the code,and which follows"++" cerns only one kind of data and participates one calcu- should be inserted after the line of the code;and the as- lus. sertion led by "**"is the VC to be proved.The function Code calls function code which uses syntactic structure as 3.The variable p->data makes sense only when p is an the only parameter.The function code only generates in- effective pointer,but in Hoare logic,there is no such a termediate code for the syntactic structure.The function consideration.Therefore.rules in Hoare logic can not Code does not generate assertions between the intermediate be used in the case where pointers are concerned.The code which will obviously be translated into the same ba- pointer logic collects pointer equality information,and sic block,such as the code for assignment statements and this information can help us check whether p->data Boolean expressions (Boolean expression calculation does makes sense.Moreover,when using assignment ax- not use short-circuit here).The description in Figure 3 actu- iom,we can perform alias substitution [2]according ally covers some assertion and VC generations in Figure 2, to the equality information of pointers. hence all of these can be done via a one-pass inspection of source programs during compilation. Using the pointer logic,it is easy to prove the correct- When considering the generation of integer-related as- ness of the VC generation,i.e.,if all of the VCs are proved. sertions and VCs,it is difficult to do all the work about the Hoare triple [P}id(arg){Body}[Q}holds.The VC pointers and integers in one pass,because they are based generator is removed from the TCB,since there is a proof on the calculi in different directions.The compiler can checker at assembly level in our framework.Specifications, do type checking,pointer-related assertion generation,VC VCs and the proofs of VCs at source level will be trans- generation,integer side-condition annotation and interme- lated into equivalent assembly-level counterparts.And the diate code generation in the first pass and generate integer- translated counterparts will be used in the verification at as- related assertions and VCs using the annotation of integer sembly level. side conditions in the second pass
tion and StmtList is the statement sequence of the function. 2. The side condition in a typing rule should be combined with the assertion which is at the entry point of the corresponding statement. For pointer-type data, such a problem does not exist. Because a pointer-related side condition is consistent with the premise of the corresponding inference rule in the pointer logic, and should have been checked before the rule is chosen. 3. When we compute pointer-related assertions, we need to decide which assignment rule to use according to the assertion before the assignment. However, there is no need to do that about integer-related assertions using wp. In fact, the computations of assertions and VCs for integer-type data and pointer-type data interact with each other. 1. Both of them should be performed together with type checking, since they both need side conditions in the typing rules. But the backward calculus for integerrelated assertions will reduce the efficiency of the performance, because the calculus direction does not consist with the direction of type checking. 2. The pre- and postconditions of functions, loop invariants and the Bexps of while and if statements may contain both integers and pointers. Such an assertion should be divided into two parts, each of which concerns only one kind of data and participates one calculus. 3. The variable p->data makes sense only when p is an effective pointer, but in Hoare logic, there is no such a consideration. Therefore, rules in Hoare logic can not be used in the case where pointers are concerned. The pointer logic collects pointer equality information, and this information can help us check whether p->data makes sense. Moreover, when using assignment axiom, we can perform alias substitution [2] according to the equality information of pointers. Using the pointer logic, it is easy to prove the correctness of the VC generation, i.e., if all of the VCs are proved, the Hoare triple {{P}} id(arg){Body} {{Q}} holds. The VC generator is removed from the TCB, since there is a proof checker at assembly level in our framework. Specifications, VCs and the proofs of VCs at source level will be translated into equivalent assembly-level counterparts. And the translated counterparts will be used in the verification at assembly level. 4. Code and assertion generation In our framework, specifications and the proof of code satisfying the specifications are carried in the assembly code. The assembly code is divided into basic blocks. Basic block, which is a concept in code optimization and generation, is a sequence of instruction; and in our design, the instruction sequence ends with a control transfer instruction such as jmp or call. Each basic block B has its precondition P, postcondition Q, and the proof or proof hint of {{P}} B {{Q}}. The proof can be checked by a proof checker. According to the principle that the postcondition of a basic block should imply the precondition of the succeeding basic block in the control flow, Q can be omitted since we can just take the precondition of the succeeding basic block as Q. In order to make sure that each basic block has pertinent pre- and postconditions, assertions should be generated at proper program points during code generation. In this section, we introduce a scheme for generating intermediate code and the corresponding assertions. For the lack of space, we only consider pointer-related assertions. Using the calculi in section 3, we can get a proper assertion at each program point. Figure 3 shows the scheme for the main syntactic structures. The first parameter of the recursive function Code is a syntactic structure, and the second parameter is the precondition of the structure. The function Code generates intermediate code and assertions at some program points. The generated assertion is at the right side of the code. The assertion which follows “--” should be inserted before the line of the code, and which follows “++” should be inserted after the line of the code; and the assertion led by “∗∗” is the VC to be proved. The function Code calls function code which uses syntactic structure as the only parameter. The function code only generates intermediate code for the syntactic structure. The function Code does not generate assertions between the intermediate code which will obviously be translated into the same basic block, such as the code for assignment statements and Boolean expressions (Boolean expression calculation does not use short-circuit here). The description in Figure 3 actually covers some assertion and VC generations in Figure 2, hence all of these can be done via a one-pass inspection of source programs during compilation. When considering the generation of integer-related assertions and VCs, it is difficult to do all the work about pointers and integers in one pass, because they are based on the calculi in different directions. The compiler can do type checking, pointer-related assertion generation, VC generation, integer side-condition annotation and intermediate code generation in the first pass and generate integerrelated assertions and VCs using the annotation of integer side conditions in the second pass