Design of a Certifying Compiler Supporting Proof of Program Safety.pdf

Design of a Certifying Compiler Supporting Proof of Program Safety Yiyun Chen Lin Ge Baojian Hua Zhaopeng Li Cheng Liu Department of Computer Science and Technology University of Science and Technology of China Hefei,Anhui 230026,China yiyun @ustc.edu.cn gelin,huabj,zpli,liuc5 @mail.ustc.edu.cn Abstract safety policies is one of the hot topics in this research field. In the past decade,great progress has been made in the Safety is an important property of high-assurance soft- area of program verification.Many projects adopt type- ware,and one of the hot research topics on it is the veri- based or logic-based approaches to reason about the prop- fication method for software to meet its safety policies.In erties of low-level or high-level programs.The TAL (Typed our previous work,we designed a pointer logic system and Assembly Language)[15]project and the theory of type re- proposed a framework for developing and verifying safety- finements [13]are two typical projects using type-based ap- critical programs.And in this paper,we present the design proaches,while Proof-Carrying Code(PCC)[16],Founda- and implementation of a certifying compiler based on that tional PCC(FPCC)[1].Certified Assembly Programming framework.Here we will mainly explain verification con- (CAP)[19]and Stack-based CAP (SCAP)[6]are typical dition generation,generation of code and assertions,and projects on logic-based techniques.Type-based and logic- proof generation for basic blocks.Our certifying compiler based techniques are complementary to each other,and has the following novelties:1)it supports a programming some projects have tried to combine them.Hamid et al. language equipped with both a type system and a logic sys- adopted a syntactic approach to FPCC[7],and gave a trans- tem:2)and it can produce safety proofs for programs with lation from a typed assembly language into FPCC in the pointers. Flint project.The ATS(Applied Type System)[18]project proposed by Hongwei Xi et al.extends a type system with a notion of program states,so that invariants on states can 1.Introduction be captured in stateful programming.By encoding Hoare logic in its type system,ATS can support Hoare-logic-like Nowadays,high-assurance software is more and more reasoning via the type system. valued,and among its properties,safety and security are Based on the above work,we have proposed a new most important.Safety is the ability to guarantee that soft- framework for developing and verifying safety-critical soft- ware execution does no harm to the system.And security ware: is the ability to preserve the confidentiality,integrity,avail- 1.Safety policies of a program are specified formally by ability and authentication of data.There is a close relation- programmers.These specifications as well as the cor- ship between safety and security.For example,hackers usu- responding program are submitted to a compiler. ally use low-grade safety errors,such as buffer overflows, out-of-bound array accesses and dangling pointer derefer- 2.The compiler produces verification conditions (VCs) for the submitted program.These VCs are required to ences,to destroy a system or achieve unauthorized control of a system.Apparently,it is helpful to ensure software se- verify whether the program satisfies its specifications. curity by strengthening its safety,and it makes us focus on If all of them are proved,the program is believed to software safety in this paper. satisfy the specifications.Most VCs can be proved To achieve software safety,all program errors should automatically by the embedded theorem prover in the be discovered before the execution of the program or be compiler,and the rest are proved interactively by pro- gently captured during the execution.The research goal grammers using proof-assistant tools.All the proofs of software safety is to build a wholesome scientific and should be produced and preserved. technological infrastructure for the management of software 3.The compiler compiles the source code and specifica- safety.And the verification method for software to meet its tions into assembly level counterparts while translating

Design of a Certifying Compiler Supporting Proof of Program Safety Yiyun Chen Lin Ge Baojian Hua Zhaopeng Li Cheng Liu Department of Computer Science and Technology University of Science and Technology of China Hefei, Anhui 230026, China yiyun@ustc.edu.cn {gelin, huabj, zpli, liuc5}@mail.ustc.edu.cn Abstract Safety is an important property of high-assurance software, and one of the hot research topics on it is the veri- fication method for software to meet its safety policies. In our previous work, we designed a pointer logic system and proposed a framework for developing and verifying safetycritical programs. And in this paper, we present the design and implementation of a certifying compiler based on that framework. Here we will mainly explain verification condition generation, generation of code and assertions, and proof generation for basic blocks. Our certifying compiler has the following novelties: 1) it supports a programming language equipped with both a type system and a logic system; 2) and it can produce safety proofs for programs with pointers. 1. Introduction Nowadays, high-assurance software is more and more valued, and among its properties, safety and security are most important. Safety is the ability to guarantee that software execution does no harm to the system. And security is the ability to preserve the confidentiality, integrity, availability and authentication of data. There is a close relationship between safety and security. For example, hackers usually use low-grade safety errors, such as buffer overflows, out-of-bound array accesses and dangling pointer dereferences, to destroy a system or achieve unauthorized control of a system. Apparently, it is helpful to ensure software security by strengthening its safety, and it makes us focus on software safety in this paper. To achieve software safety, all program errors should be discovered before the execution of the program or be gently captured during the execution. The research goal of software safety is to build a wholesome scientific and technological infrastructure for the management of software safety. And the verification method for software to meet its safety policies is one of the hot topics in this research field. In the past decade, great progress has been made in the area of program verification. Many projects adopt typebased or logic-based approaches to reason about the properties of low-level or high-level programs. The TAL (Typed Assembly Language) [15] project and the theory of type re- finements [13] are two typical projects using type-based approaches, while Proof-Carrying Code (PCC) [16], Foundational PCC (FPCC) [1], Certified Assembly Programming (CAP) [19] and Stack-based CAP (SCAP) [6] are typical projects on logic-based techniques. Type-based and logicbased techniques are complementary to each other, and some projects have tried to combine them. Hamid et al. adopted a syntactic approach to FPCC [7], and gave a translation from a typed assembly language into FPCC in the Flint project. The ATS (Applied Type System) [18] project proposed by Hongwei Xi et al. extends a type system with a notion of program states, so that invariants on states can be captured in stateful programming. By encoding Hoare logic in its type system, ATS can support Hoare-logic-like reasoning via the type system. Based on the above work, we have proposed a new framework for developing and verifying safety-critical software: 1. Safety policies of a program are specified formally by programmers. These specifications as well as the corresponding program are submitted to a compiler. 2. The compiler produces verification conditions (VCs) for the submitted program. These VCs are required to verify whether the program satisfies its specifications. If all of them are proved, the program is believed to satisfy the specifications. Most VCs can be proved automatically by the embedded theorem prover in the compiler, and the rest are proved interactively by programmers using proof-assistant tools. All the proofs should be produced and preserved. 3. The compiler compiles the source code and specifications into assembly level counterparts while translating

the proofs obtained from step 2 into proofs at assem- pointer arithmetic are forbidden.To guarantee that type- bly level.The latter proofs are carried in the assembly checked programs are well-typed,union types and coercion code and they ensure that the code meets the corre- of types are also forbidden.malloc and free are used as pre- sponding specifications.These specifications must be defined functions which meet the primary safety policies. equivalent to the counterparts at source level.Such a For example,each invocation of malloc will allocate an area compiler is called a certifying compiler. of memory successfully,and the region of the memory will 4.At assembly level,a proof checker checks the proofs never overlap those already in use. Pointers are classified into effective pointers (those carried in the assembly code automatically to ensure point to objects),null pointers and dangling pointers,and that the code satisfies its specifications. the latter two are also called ineffective pointers.Ineffec- The advantage of this framework is that it provides tive pointer dereferences,memory leaks or using an ineffec- source-level approach for reasoning about program proper- tive pointer as the actual parameter of function free are all ties rather than the assembly-level one.Compared with the considered as violations of the primary safety policies. approaches for assembly code certification,this approach A type system is a tractable syntactic method for prov- can improve the efficiency of safety-critical software devel- ing the absence of certain program behaviors by classifying opment.Since the proofs are checked at assembly level, phrases according to the kinds of values they compute.It the compiler,including a VC generator and a prover,can be is mainly used to eliminate context-sensitive errors in pro- removed from the TCB(Trusted Computing Base).There- grams.A traditional type system is not sufficient when the fore,the TCB of the system will be reduced remarkably. legality of a phrase depends not only on the context but In this framework,we choose a C-like programming lan- also on some expression values in the phrase.Dependent guage PointerC [9]as our source language and implement types [18]is one possible solution to this problem.Another a certifying compiler prototype for it.Our main contri- solution is not to treat the constraints on values as parts of bution is the design of a certifying compiler for a source the type system,and the constraints don't appear in the typ- language with pointers and explicit memory management ing rules.The former leads to a complicated type system (malloc/free).And a prototype of the certifying compiler and the latter leads to an unsound one.We try to find a trade- has been implemented in our work.A distinct feature of our off between these two solutions in the design of PointerC. certifying compiler is that it supports languages equipped To achieve simplicity in the type system and guarantee the with both a type system and a logic system. safety of the language at the same time,side conditions are In this paper,we introduce the design and implemen- introduced into the typing rules to express the constraints on tation of our certifying compiler.The rest of the paper is values.For example,the following two typing rules show organized as follows.We introduce the source language- that the constraints on subscript expressions and pointers PointerC briefly in section 2.In section 3,we discuss how are put in the side conditions. the compiler generates VCs at source level from the asser- THe:int THa:array(10,bool) tions given by programmers and the side conditions in the 0≤e∧e≤9 TFa[e]bool typing rules.Section 4 describes the scheme for generat- ing corresponding assertions at some program points dur- TFp ptr(struct(...,x:int,...)) p∈effective.-ptrs ing intermediate code generation.This ensures that each THp->x:int basic block has proper pre-and postconditions.Section 5 These side conditions must be checked before the ex- explains how the compiler generates a proof for each ba- ecution of the program to ensure that the execution does sic block.Each proof ensures that the corresponding basic not violate the primary safety policies.And to check these block satisfies its pre-and postconditions.Section 6 com- side conditions statically,we have designed a pointer logic pares our work with related work and section 7 concludes. for PointerC.The pointer logic is an extension of Hoare logic and essentially is a pointer analysis tool.It collects 2.Source language-PointerC pointer information in a forward manner.This information includes whether a pointer is null,dangling or effective,and PointerC is a C-like programming language that we have the equality relation between effective pointers.The in- designed as the source language in our work.In order to formation can be used to prove that the program satisfies check more pointer programs statically,some pointer oper- the side conditions in the typing rules and then to support ations are restricted in PointerC [9].These restrictions are value-sensitive static checking.For example,according to based on the premise of not affecting the language function- the pointer logic,the following two Hoare triples ality of PointerC.Variables with pointer type can only be used in assignment,equality comparison,dereference or as [p,q)A [p->next)}p=malloc(...) the parameters of functions;the address-of operator(&)and [fp)A {q)A{q->next}A {p->next)D}

1.Function definition [亚]id(arg){Body}[亚']，that is[亚]StmtList[亚')，where the StmtList is the StmtList in the Body production. 2.Statement List ·sp(Stmt StmtList,业)=sp(StmtList,sp(Stmt,业) ·sp(e,亚)=亚.If e forms the StmtList of a function,.then VC=亚p亚'(see 1 for亚will be generated.. 3.Statement ·assignment:sp(lval=Exp,亚)=亚'，亚'can be calculated using亚according to the assignment rules in the pointer logic. ·condition:sp(if(Bexp){StmtList}else{StmtList2},平)=sp(StmtList1,BexpAΨ)Vsp(StmtList2,一BexpA平) loop:sp(while(Bexp){StmtList,)=-BexpA I,where Iis the loop invariant for pointer-type data. VC1=I and VC2 sp(StmtList,Bexp A DI should be generated at the entry point and the exit of StmtList respectively. 。allocation:sp(Ival=alloc(Type),亚)=亚'，亚can be calculated using亚according to the allocation rules in the pointer logic. 。deallocation:sp(free(Exp),亚)=亚'，亚'can be calculated using亚according to the deallocation rule in the pointer logic. .function call:If the pre-and postconditions of function id are farg)[lvalA and A Qrespectively,then sp(Ival id(Exp),)=(A Q)[arg-Exp][res-Ivall,where Q and Qarg are assertions that have nothing to do with pointers,arg is the formal parameter,and res is the return value (see Figure 1). VC=(farg}A [Ival)A Qn)[arg-Exp]should be generated at the entry point of this statement. ·return:sp(return res,.亚)=亚 Figure 2.The Strongest postcondition calculus and the VC generation of pointer-type data 2.One difficulty of strongest postcondition calculus is the assignment axiom.Instead,different assignment rules need to find a fixpoint for a recursive equation in a loop are used in different cases in our pointer logic.Since statement.The solutions to such equations are usually the pointer analysis is precise,it is easy to determine undecidable,and it is also the primary reason why the which rule to use in a certain case and it is clear how correctness of a program can not be proved automat- to compute 'using in the sp rule for assignment. ically.The loop invariant provided by programmers Please refer to [2]for some details skipped in Figure 2. is used to avoid the difficulty.However,in order to prove the validity of the loop invariant,two VCs must 5.At the entry point of a function,the pointer logic be generated at the entry point and the exit of the loop should check the initial values of the pointer-type for- respectively. mal parameters and local variables.And at the exit of the function,their effectiveness should also be checked 3.The pre-and postconditions of each function have also to avoid memory leaks.But for the space limit,the VC deeply simplified the computation of sp for function generation in Figure 2 does not reflect this. call statement.Briefly speaking,the before the call statement should imply the precondition of the callee, For integer-type data,we adopt a complemented ap- and the callee's postcondition should be used as the proach of Hoare logic-weakest precondition calculus strongest postcondition after the call statement.Cer- (wp)[5]to generate assertions or VCs at each program tainly,we also need to consider the substitution of ac- point.And we only introduce its distinctions from the ap- tual parameters for formal parameters as well as the proach for pointer-type data as follows. substitution of the variable Ival for the variable res(see function call in Figure 2). 1.Since the assertions in a function are calculated back- ward using the postcondition Q of the function,a VC 4.One remarkable distinction of our pointer logic from Pwp(StmtList,Q)is generated at the entry point of Hoare logic is that the pointer logic has no uniform the function,where P is the precondition of the func-

tion and StmtList is the statement sequence of the func- 4.Code and assertion generation tion. 2.The side condition in a typing rule should be combined In our framework,specifications and the proof of code satisfying the specifications are carried in the assembly with the assertion which is at the entry point of the code.The assembly code is divided into basic blocks.Basic corresponding statement.For pointer-type data,such a problem does not exist.Because a pointer-related side block,which is a concept in code optimization and gener- ation,is a sequence of instruction;and in our design,the condition is consistent with the premise of the corre- instruction sequence ends with a control transfer instruction sponding inference rule in the pointer logic,and should have been checked before the rule is chosen. such as jmp or call.Each basic block B has its precon dition P,postcondition Q,and the proof or proof hint of 3.When we compute pointer-related assertions,we need [P}B {Q}.The proof can be checked by a proof checker. According to the principle that the postcondition of a basic to decide which assignment rule to use according to the block should imply the precondition of the succeeding ba- assertion before the assignment.However,there is no need to do that about integer-related assertions using sic block in the control flow.Q can be omitted since we can just take the precondition of the succeeding basic block as wp. In fact,the computations of assertions and VCs for In order to make sure that each basic block has perti- integer-type data and pointer-type data interact with each nent pre-and postconditions,assertions should be gener- other. ated at proper program points during code generation.In this section,we introduce a scheme for generating interme- 1.Both of them should be performed together with type diate code and the corresponding assertions.For the lack checking,since they both need side conditions in the of space,we only consider pointer-related assertions.Us- typing rules.But the backward calculus for integer- ing the calculi in section 3,we can get a proper assertion related assertions will reduce the efficiency of the per- at each program point.Figure 3 shows the scheme for the formance,because the calculus direction does not con- main syntactic structures.The first parameter of the recur- sist with the direction of type checking. sive function Code is a syntactic structure.and the second parameter is the precondition of the structure.The function 2.The pre-and postconditions of functions,loop invari- Code generates intermediate code and assertions at some ants and the Bexps of while and if statements may program points.The generated assertion is at the right side contain both integers and pointers.Such an assertion of the code.The assertion which follows"--"should be should be divided into two parts,each of which con- inserted before the line of the code,and which follows"++" cerns only one kind of data and participates one calcu- should be inserted after the line of the code;and the as- lus. sertion led by "**"is the VC to be proved.The function Code calls function code which uses syntactic structure as 3.The variable p->data makes sense only when p is an the only parameter.The function code only generates in- effective pointer,but in Hoare logic,there is no such a termediate code for the syntactic structure.The function consideration.Therefore.rules in Hoare logic can not Code does not generate assertions between the intermediate be used in the case where pointers are concerned.The code which will obviously be translated into the same ba- pointer logic collects pointer equality information,and sic block,such as the code for assignment statements and this information can help us check whether p->data Boolean expressions (Boolean expression calculation does makes sense.Moreover,when using assignment ax- not use short-circuit here).The description in Figure 3 actu- iom,we can perform alias substitution [2]according ally covers some assertion and VC generations in Figure 2, to the equality information of pointers. hence all of these can be done via a one-pass inspection of source programs during compilation. Using the pointer logic,it is easy to prove the correct- When considering the generation of integer-related as- ness of the VC generation,i.e.,if all of the VCs are proved. sertions and VCs,it is difficult to do all the work about the Hoare triple [P}id(arg){Body}[Q}holds.The VC pointers and integers in one pass,because they are based generator is removed from the TCB,since there is a proof on the calculi in different directions.The compiler can checker at assembly level in our framework.Specifications, do type checking,pointer-related assertion generation,VC VCs and the proofs of VCs at source level will be trans- generation,integer side-condition annotation and interme- lated into equivalent assembly-level counterparts.And the diate code generation in the first pass and generate integer- translated counterparts will be used in the verification at as- related assertions and VCs using the annotation of integer sembly level. side conditions in the second pass