A Certified Thread Library for Multithreaded User Programs Yu Guo Xinyu Jiang Yiyun Chen Chunxiao Lin Department of Computer Science and Technology University of Science and Technology of China Hefei,Anhui 230026,China Aguoyu,wewewe,cxlin3 @mail.ustc.edu.cn yiyun @ustc.edu.cn Abstract language level.Xu et al.[24]proposed a logic system to verify deadlock freedom and convergence by rely-guarantee Ensuring the safety of multithreaded software is a task method.Model checkers [9]are developed to verify con- both important and challenging.Currently,most ap- current programs with a fixed number of threads.Flanagan proaches focus on the safety of multithreaded programs et al.[10]used A-G method to check java multithreaded rather than the runtime based on which those concurrent programs.CCAP [26]applied the A-G method to the as- programs run.In order to fundamentally solve this problem, sembly code based on a concurrent abstract machine with a method of ensuring the safety of the runtime should be de- a built-in thread scheduler.CMAP [7]proposed by Feng veloped.Such a runtime could be organized as a thread and Shao supports thread-modular reasoning with dynamic library typically. thread creation and termination. This paper presents the development and certification of However,most of the previous work concentrates on a simple but realistic thread library.The thread library safe multithreaded programming,not the runtime (thread provides common multi-threading features such as dynamic library).Nonetheless,only when the safety of underlying thread creation,termination and joining as well.This li- thread library is guaranteed,can the programs verified by brary also carries machine-checkable proof which guaran- their methods be safe.The thread library often contains tees the library does not violate the safety policies.This lurking flaws which are subtle and hard to detect and fix paper also presents an approach to link the library to exist- due to its complexity,so a fully certified thread library is of ing certified multithreaded user programs to form an inte- urgent necessity.However,the certifying task is full of chal- grated foundational proof-carrying code (FPCC)package. lenges.A thread library generally involves sophisticated Comparing with the uncertified libraries,our work makes manipulations on memory and machine context.The invari- multithreaded applications much more reliable. ants are subtle and hard to specify.Furthermore,the control flow transfers between threads and the scheduler of thread library increase the certification burden. 1.Introduction In this paper,we propose a simple but realistic certified thread library,named CTL,which implements dynamic Multithreaded software is widely employed in realistic thread creation,termination and joining.CTL is written applications.For example,in a web browser,it is common in low-level code and certified thoroughly in the program that while one thread is displaying images or text,another logic SCAP [8].The certification convinces us that the se- thread is retrieving data from the Internet.However,the mantics of CTL conform to its formal specifications.The safety of multithreaded programs is hard to ensure,since the code and its specifications,as well as corresponding safety interference between the simultaneously executing threads proof are all encoded in a foundational mathematical logic must be taken into account. and packed together to form a foundational proof-carrying Many efforts have been devoted to the verification of code (FPCC)package [1,12],in which a neat and consis- concurrent programs.Jones [13]introduced the composi- tent logic system instead of the entire complicated thread tional rely-guarantee method (or A-G method)[26,7,5]to library has to be trusted.In this way,not only multithreaded describe the state changes performed by the environment programs but also CTL itself can be reasoned about and ver- and by the program respectively.Lamport proposed the ified on a solid and rigorous base. Temporal Logic of Action (TLA)[14]as a logic for speci- Even if we have the individual safety proof of them,it is fying and reasoning about concurrent programs at the high- still inadequate to ensure the safety of the interactions be-
A Certified Thread Library for Multithreaded User Programs Yu Guo Xinyu Jiang Yiyun Chen Chunxiao Lin Department of Computer Science and Technology University of Science and Technology of China Hefei, Anhui 230026, China {guoyu,wewewe,cxlin3}@mail.ustc.edu.cn yiyun@ustc.edu.cn Abstract Ensuring the safety of multithreaded software is a task both important and challenging. Currently, most approaches focus on the safety of multithreaded programs rather than the runtime based on which those concurrent programs run. In order to fundamentally solve this problem, a method of ensuring the safety of the runtime should be developed. Such a runtime could be organized as a thread library typically. This paper presents the development and certification of a simple but realistic thread library. The thread library provides common multi-threading features such as dynamic thread creation, termination and joining as well. This library also carries machine-checkable proof which guarantees the library does not violate the safety policies. This paper also presents an approach to link the library to existing certified multithreaded user programs to form an integrated foundational proof-carrying code (FPCC) package. Comparing with the uncertified libraries, our work makes multithreaded applications much more reliable. 1. Introduction Multithreaded software is widely employed in realistic applications. For example, in a web browser, it is common that while one thread is displaying images or text, another thread is retrieving data from the Internet. However, the safety of multithreaded programs is hard to ensure, since the interference between the simultaneously executing threads must be taken into account. Many efforts have been devoted to the verification of concurrent programs. Jones [13] introduced the compositional rely-guarantee method (or A-G method) [26, 7, 5] to describe the state changes performed by the environment and by the program respectively. Lamport proposed the Temporal Logic of Action (TLA) [14] as a logic for specifying and reasoning about concurrent programs at the highlanguage level. Xu et al. [24] proposed a logic system to verify deadlock freedom and convergence by rely-guarantee method. Model checkers [9] are developed to verify concurrent programs with a fixed number of threads. Flanagan et al. [10] used A-G method to check java multithreaded programs. CCAP [26] applied the A-G method to the assembly code based on a concurrent abstract machine with a built-in thread scheduler. CMAP [7] proposed by Feng and Shao supports thread-modular reasoning with dynamic thread creation and termination. However, most of the previous work concentrates on safe multithreaded programming, not the runtime (thread library). Nonetheless, only when the safety of underlying thread library is guaranteed, can the programs verified by their methods be safe. The thread library often contains lurking flaws which are subtle and hard to detect and fix due to its complexity, so a fully certified thread library is of urgent necessity. However, the certifying task is full of challenges. A thread library generally involves sophisticated manipulations on memory and machine context. The invariants are subtle and hard to specify. Furthermore, the control flow transfers between threads and the scheduler of thread library increase the certification burden. In this paper, we propose a simple but realistic certified thread library, named CTL, which implements dynamic thread creation, termination and joining. CTL is written in low-level code and certified thoroughly in the program logic SCAP [8]. The certification convinces us that the semantics of CTL conform to its formal specifications. The code and its specifications, as well as corresponding safety proof are all encoded in a foundational mathematical logic and packed together to form a foundational proof-carrying code (FPCC) package [1, 12], in which a neat and consistent logic system instead of the entire complicated thread library has to be trusted. In this way, not only multithreaded programs but also CTL itself can be reasoned about and verified on a solid and rigorous base. Even if we have the individual safety proof of them, it is still inadequate to ensure the safety of the interactions be-
Link 2.Basic settings CTL threaded proof program In order to certify CTL in the FPCC framework.we linked SCAP program CMAP formalize all the related concepts into a mechanized meta- logic.In other words,the machine model,the code of CTL OCAP its specifications,program logics and related safety proof Cic(Coq) are all based on a common formal logic,resulting in smaller trusted computing base for safety.In this section,we will Figure 1.The OCAP framework present these basic settings. tween CTL and multithreaded programs due to the absence The mechanized meta-Logic.We use the calculus of in- of safety proof for the code with respect to linkage.Al- ductive constructions (CiC)[21]as our meta-logic.CiC is though the construction of such safety proof is possible,it supported by the Cog proof assistant [3],which we use to is non-trivial because of the differences between specifica- implement the work presented in this paper. tion languages,as well as certification methods'variety. Recently,Feng et al.[6]proposed an open certification The target machine.A MIPS-style [15]target machine framework(OCAP)to support inter-operation of different (TM)is chosen as our machine model on which our thread certification systems.OCAP serves as a common layer in library runs.We omit some physical machine features which different program logics can be embedded. irrelevant to threading,such as address alignment,bits- Based on OCAP,we demonstrate in this paper the link- arithmetic etc..The target machine and its operational se- age between CTL certified in SCAP and user programs cer- mantics are formally defined in Figure 2.A machine pro- tified in CMAP.and construct the extra safety proof of in- gram P contains a code heap C and an updatable state S.A teraction,as shown in Figure 1. code heap C is a segment of memory mapping addresses to The main contributions of our work are as follows: machine instructions,but C is read-only and isolated from the mutable data heap H.The state S,which specifies the We describe,specify,and certify a simple but real- execution of the program,consists of a register file,muta- istic thread library CTL.It provides common multi- ble data heap H and the program counter pc.The target threading features including thread scheduling,dy- machine has 32 general-purpose registers.Following the namic thread creation,termination and joining. MIPS convention,the table below shows the register alias We define formal specifications to capture the seman- and usage. tics of CTL routines.This library carries machine- checkable proof which ensures that the library does not r0 IO always zero vo-v1 r2-r3 return values a0-a3 r4-r7 arguments to-t7 rs-ris temporary violate the safety policies. k0-k1 r26-r27 reserved ra r31 return address Meanwhile,we adapt and embed CMAP in a simpli- The basic TM instruction set covers the common MIPS fied OCAP framework.Thus.CTL can be linked to instructions for arithmetics,jump,conditional branch and the user programs certified in CMAP to construct an load/store.It is easy to add more instructions in our TM. integrated mechanized FPCC package.As far as we The relation PP indicates that the program state P know,CTL is the first thread library which can be steps to the program state P.The relation PP means safely linked to multithreaded user programs. that P reaches P in k steps,and*is the reflexive and We have formalized the work presented in this paper, transitive closure of the step relation.The auxiliary func- including CTL,OCAP.CMAP,and their soundness proof, tion Next(-)specifies the state transition according to the in the Coq proof assistant [3].Interested readers may find operational semantics instruction t.We use the notation S them on our web site [11]. to denote Next(S). The remainder of this paper is organized as follows:Sec- tion 2 presents the formalization of basic settings.In Sec- Safety.Safety of the program means that:(i)the execution tion 3 we describe and certify the CTL library.We dis- of the programs in CTL will not go stuck;(ii)the code of cuss the verification framework CMAP in which the mul- CTL satisfies certain safety specifications.The code heap tithreaded programs are certified in Section 4.Section 5 specification y is a map from code labels f to code specifi- shows the linkage of CTL and user programs in a simpli- cations 0,as defined below: fied OCAP framework.Section 6 is related work and the (CdSpec)6=… conclusion. (CHSpec)Ψ:={fo}
Figure 1. The OCAP framework tween CTL and multithreaded programs due to the absence of safety proof for the code with respect to linkage. Although the construction of such safety proof is possible, it is non-trivial because of the differences between specification languages, as well as certification methods’ variety. Recently, Feng et al. [6] proposed an open certification framework (OCAP) to support inter-operation of different certification systems. OCAP serves as a common layer in which different program logics can be embedded. Based on OCAP, we demonstrate in this paper the linkage between CTL certified in SCAP and user programs certified in CMAP, and construct the extra safety proof of interaction, as shown in Figure 1. The main contributions of our work are as follows: • We describe, specify, and certify a simple but realistic thread library CTL. It provides common multithreading features including thread scheduling, dynamic thread creation, termination and joining. • We define formal specifications to capture the semantics of CTL routines. This library carries machinecheckable proof which ensures that the library does not violate the safety policies. • Meanwhile, we adapt and embed CMAP in a simpli- fied OCAP framework. Thus, CTL can be linked to the user programs certified in CMAP to construct an integrated mechanized FPCC package. As far as we know, CTL is the first thread library which can be safely linked to multithreaded user programs. We have formalized the work presented in this paper, including CTL, OCAP, CMAP, and their soundness proof, in the Coq proof assistant [3]. Interested readers may find them on our web site [11]. The remainder of this paper is organized as follows: Section 2 presents the formalization of basic settings. In Section 3 we describe and certify the CTL library. We discuss the verification framework CMAP in which the multithreaded programs are certified in Section 4. Section 5 shows the linkage of CTL and user programs in a simpli- fied OCAP framework. Section 6 is related work and the conclusion. 2. Basic settings In order to certify CTL in the FPCC framework, we formalize all the related concepts into a mechanized metalogic. In other words, the machine model, the code of CTL, its specifications, program logics and related safety proof are all based on a common formal logic, resulting in smaller trusted computing base for safety. In this section, we will present these basic settings. The mechanized meta-Logic. We use the calculus of inductive constructions (CiC) [21] as our meta-logic. CiC is supported by the Coq proof assistant [3], which we use to implement the work presented in this paper. The target machine. A MIPS-style [15] target machine (TM) is chosen as our machine model on which our thread library runs. We omit some physical machine features irrelevant to threading, such as address alignment, bitsarithmetic etc.. The target machine and its operational semantics are formally defined in Figure 2. A machine program P contains a code heap C and an updatable state S. A code heap C is a segment of memory mapping addresses to machine instructions, but C is read-only and isolated from the mutable data heap H. The state S, which specifies the execution of the program, consists of a register file, mutable data heap H and the program counter pc. The target machine has 32 general-purpose registers. Following the MIPS convention, the table below shows the register alias and usage. r0 r0 always zero v0−v1 r2 −r3 return values a0−a3 r4 −r7 arguments t0−t7 r8 −r15 temporary k0−k1 r26 −r27 reserved ra r31 return address The basic TM instruction set covers the common MIPS instructions for arithmetics, jump, conditional branch and load/store. It is easy to add more instructions in our TM. The relation P 7−→ P ′ indicates that the program state P steps to the program state P ′ . The relation P 7−→k P ′ means that P reaches P ′ in k steps, and 7−→∗ is the reflexive and transitive closure of the step relation. The auxiliary function Nextι( ) specifies the state transition according to the operational semantics instruction ι. We use the notation Sˆ ι to denote Nextι(S). Safety. Safety of the program means that:(i) the execution of the programs in CTL will not go stuck; (ii) the code of CTL satisfies certain safety specifications. The code heap specification Ψ is a map from code labels f to code specifi- cations θ, as defined below: (CdSpec) θ ::= ··· (CHSpec) Ψ ::= {f ❀ θ} ∗
(Program) P =(C,S) &b1 (CodeHeap) 0 :={f+1}" (State) :=(R,H,pc) cth p日分T.T (Memory) H ={1*} hd (RegFile) = frw 34 (Register) ={r4}e031y (Labels)f,1,pc ::n (nat nums) (Word) :=i (integers) (nstr) 1 :=addu rdrsr addiurdrsw Figure 3.Core data structures of CTL subu rd rs r subiurd rsw move rrslirw 1wIt w(rs)SWI(Is) certifications of every routine of CTL.So we concentrate begrs r f bgtz rs f on certifying the yielding routine,which is an essential part jf jalf jrr, of the thread library. (C,但,R,pc)一(C,NextC(pc)(H,Rpc)= 3.1.Overview if C(pc)=1=then Next (H,R.pc)=S= addu rdrsrr (H,R{rdR(r,)+R(r:)}.pc+1) CTL is a lightweight implementation of thread library whose thread model is similar to GNU Pth [4]or FSU 1wr:v(rs) (HRH(R(r,)+w)}:pc+1) when R(r,)+w∈dom(H) pthreads [16],in which threads are implemented in the user- SWI:W(Is) (HR(r:)+R()).R.pc+1) space and the machine context switching is performed by an when R(r:)+w∈dom(Hl) application library without knowledge of the kernel.This beg rs r:f (但,R,pc+1)when R(r)≤R(s) model does not rely on the kernel threads and is adaptable to (H.R.f)when R(r,)>R(r,) various platforms.CTL supports non-preemptive schedul- jalf (H,R{r31+pc+1},f) ing and can directly run on the processors without interrupts jrrs (H.R.R(r,)) handling,for example,the SPIM simulator [20].CTL is fully certified at the assembly level,so the certification of the compiler correctness can be spared. Figure 2.The target machine TM 3.2.Thread model Note that e has different form in different program logic and is defined in Section 5.1. We use pseudo C code to illustrate the prototype of the core data structures and API of CTL. Separation logic.We define some notations common in The core data structures in the CTL space are shown in separation logic [22,19].These notations are defined as Figure 3.The main data structure is a queue,whose el- shorthand and are used in the specifications of CTL to spec- ements are called thread control block(TCB).Each TCB ify the data heap. identifies one dynamic thread and contains one thread id and corresponding executing context.The TCB is proto- HIkA叁AH typed by: AI*A2H.3H1,H2.H1=HAI AH2 IA2 Top H.True struct t_tcb word id; 1→gH.1≠NU儿LAH={1w} /米unique id*/ word pc; /米program counter*/ 1→.H.3w.(HIk1→ word state;/ready,dead or wait * 1一w1,,m兰1→1*1+1一2*..*1十(n-1)一Wm word wait_id;/id of thread to wait * word regfile[28] We use HA if the heap predicate A is valid with H. /array for saving registers * A1 A2 asserts the heapH in which H AI and H2 A2 hold respectively.Top is valid with any heap.1 Meanwhile,an isolated TCB identifies the current running asserts a heap with only one memory cell,at address 1 with thread.Note that the global pointers referring to these data content w. structures are stored in a global pointer array. CTL provides a threading API: 3.CTL:a certified thread library void ctl_yield(void); word ctl_spawn((*void)()); In this section,we give a detailed description of CTL. word ctl_exit(); Because of space limitations,we cannot fully discuss the word ctl_join(int thread_id);
(Program) P ::= (C,S) (CodeHeap) C ::= {f ❀ ι} ∗ (State) S ::= (R,H,pc) (Memory) H ::= {l ❀ w} ∗ (RegFile) R ::= {r ❀ w} ∗ (Register) r ::= {rk} k∈{0...31} (Labels) f,l,pc ::= n (nat nums) (Word) w ::= i (integers) (Instr) ι ::= addu rd rs rt | addiu rd rs w | subu rd rs rt | subiu rd rs w | move rd rs | li rd w | lw rt w(rs) | sw rt w(rs) | beq rs rt f | bgtz rs f | j f | jal f | jr rs (C,(H,R,pc)) 7−→ (C,NextC(pc) (H,R,pc)) = if C(pc) = ι = then Nextι(H,R,pc) = Sˆ ι = addu rd rs rt (H,R{rd ❀R(rs)+R(rt)},pc+1) lw rt w(rs) (H,R{rt ❀H(R(rs)+w)},pc+1) when R(rs)+w ∈ dom(H) sw rt w(rs) (H{R(rs)+w❀R(rt)},R,pc+1) when R(rs)+w ∈ dom(H) beq rs rt f (H,R,pc+1) when R(rs)≤R(rt) (H,R,f) when R(rs)>R(rt) jal f (H,R{r31❀pc+1},f) jr rs (H,R,R(rs)) Figure 2. The target machine TM Note that θ has different form in different program logic and is defined in Section 5.1. Separation logic. We define some notations common in separation logic [22, 19]. These notations are defined as shorthand and are used in the specifications of CTL to specify the data heap. H ° A , A H A1 ∗A2 , λH.∃H1,H2 .H1⊎H2 =H∧H1 ° A1 ∧H2 ° A2 Top , λH.True l 7→ w , λH.l6=NULL∧H={l❀w} l 7→ , λH.∃w.(H ° l 7→ w) l 7→ w1,...,wn , l 7→ w1 ∗ l+1 7→ w2 ∗ ... ∗ l+(n−1) 7→ wn We use H ° A if the heap predicate A is valid with H. A1 ∗ A2 asserts the heap H1 ⊎H2 in which H1 ° A1 and H2 ° A2 hold respectively. Top is valid with any heap. l 7→ w asserts a heap with only one memory cell, at address l with content w. 3. CTL: a certified thread library In this section, we give a detailed description of CTL. Because of space limitations, we cannot fully discuss the Figure 3. Core data structures of CTL certifications of every routine of CTL. So we concentrate on certifying the yielding routine, which is an essential part of the thread library. 3.1. Overview CTL is a lightweight implementation of thread library whose thread model is similar to GNU Pth [4] or FSU pthreads [16], in which threads are implemented in the userspace and the machine context switching is performed by an application library without knowledge of the kernel. This model does not rely on the kernel threads and is adaptable to various platforms. CTL supports non-preemptive scheduling and can directly run on the processors without interrupts handling, for example, the SPIM simulator [20]. CTL is fully certified at the assembly level, so the certification of the compiler correctness can be spared. 3.2. Thread model We use pseudo C code to illustrate the prototype of the core data structures and API of CTL. The core data structures in the CTL space are shown in Figure 3. The main data structure is a queue, whose elements are called thread control block(TCB). Each TCB identifies one dynamic thread and contains one thread id and corresponding executing context. The TCB is prototyped by: struct t_tcb { word id; /* unique id */ word pc; /* program counter */ word state; /* ready, dead or wait */ word wait_id; /* id of thread to wait */ word regfile[28]; /* array for saving registers */ }; Meanwhile, an isolated TCB identifies the current running thread. Note that the global pointers referring to these data structures are stored in a global pointer array. CTL provides a threading API: void ctl_yield(void); word ctl_spawn((* void)()); word ctl_exit(); word ctl_join(int thread_id);
(StPred) p =State→Pop (Thread-id): (GrPred) g=State→State→Pop (RegFileX):=Rfoko1ra} (CdSpec) 0sCAP ::(P,g) (TCB)Tcb:=(pc,成,ts,om) Hf∈dom(Ψ):平:C FsCAP{Ψ'(E)}f:C(E) (Th State)ts ::=ready dead wait (CDHP) (Th Queue)Q:={Tcb}UQ 平FsCAP C:Y t=ja1'(p,g)=ΨE)(p",g")=ΨE+1) S.pS一p'S Figure 5.Specification constructs of CTL S,S.S.pc-f→pS-gsS →p"SA(s".g"SS”→gSS") partial registers file R and a thread state ts.A partial regis- VS,S'.g'SS'-S.R(ra)=S'.R(ra) (CALL) ters file R is a registers file R excluding ro,ko,k1 and ra. Ψ:C HsCAP{p,g)}f:ja1f The registers ko and k1 are preserved for thread scheduling 1=jrra S.pS→gSS and then unusable in user programs.Because the register ro (RET) always holds the value of zero,it needn't to be saved.When :C HsCAP {(p.g))f:jr ra ra is used to call the routines of CTL,its value is saved in the Tcb.pc field.A thread queue Q is a dictionary mapping Figure 4.SCAP o to Tcb and constructed inductively. The routine ctl_yield()causes the current thread to Formal core data structures.We formally specify the yield its execution in favor of another thread with certain core data structures that are used in our thread library CTL scheduling policy.ctl_yield()stores the execution con- As shown in Figure 3,the memory space for CTL is divided text in the queue of the TCBs and picks up one TCB, into four parts,the global pointer array,the isolated TCB for loads it and switches the control to the new continuation. current thread,the thread queue and the free-block list. ct1_yield()adopts the round-robin scheduling algorithm. Thread creation is achieved by using ctl_spawn()with Core(o.Tcb.Q)3gbl.gbl cth,hd,tl.fl a parameter of start code pointer.This function first allo- *QNode(crh,6.Tcb,NULL)TQ(hd,tl,Q)GoodL(fl) cates a new thread control block (TCB).Then the current The global pointers(cth,hd,tl,fl)are saved in the mem- machine context is cloned and stored into the TCB with a ory starting from gbl.cth points to the current thread TCB. new thread id.Lastly,the new TCB is marked with ready hd and tl point to the thread queue.fl points to a memory flag and put into the queue.A thread will be running forever if it does not terminate.The role of ctl_exit()is stopping block list,which is used for dynamic heap allocation. the current thread and marking its TCB with dead flag.The QNode(p.c,(pc,R,ts,ow),q)≌(p-2一q,34) ct1_join()function suspends the calling thread until the *(p一o,pc,ts,ow) specified thread terminates.It takes a thread id as argument. *(p+4→(r1),,蓝(r25),(r28)),(r29),成(r30) 3.3.Certification of CTL Cth(p.c.Tcb)QNode(p.o.Tcb.NULL) SCAP.We use SCAP [8]to certify CTL.SCAP sup- We define the predicate TQ to model the queue for ports modular certification of assembly code with func- threads.TQ has three arguments,the formal queue Q which tion call/return abstraction,making CTL routines well- is isomorphism to the concrete memory data,a queue-head organized.The specification constructs of SCAP and some pointer hd and a queue-tail pointer tl.TOseg models a queue selected SCAP rules are shown in Figure 4.A code spec- segment in the middle of the queue,with a non-null pointer ification OscAP is a pair of two predicates p and g.p is the pointing to the next node.TOseg is defined inductively on precondition,while g is a predicate over the entry state and the structure of Q. the future return state after jr ra.g relates the entry state of a code to the return state of the corresponding procedure.g TOseg(hd,1l,g) can also be treated as a general postcondition parameterized (hd=11)(hd=q) by the entry state. TOseg(~Tcb).hd,tl,q) (hd=1l)A QNode(hd,G,Tcb.q) Specification constructs.Figure 5 describes the specifi- TOseg(Tcb)ua.hd,rl.q) cation constructs of CTL.We use the word value w to spec- 3'.QNode(hd,o,Tcb.q')*TOseg(Q.q,tl.q) ify the thread id o.The state ts of a thread may be ready, TO(Q.hd.tl) dead or wait.A thread control block Tcb consists of a pc,a TOseg(Q.hd,r/,NULL)
(StPred) p ::= State → Prop (GrPred) g ::= State → State → Prop (CdSpec) θSCAP ::= (p,g) ∀ f ∈ dom(Ψ′ ) : Ψ;C ⊢SCAP {Ψ′ (f)}f : C(f) Ψ ⊢SCAP C : Ψ′ (CDHP) ι = jal f′ (p ′ ,g ′ ) = Ψ(f ′ ) (p ′′ ,g ′′) = Ψ(f+1) ∀S. p S → p ′ Sˆ ι ∀S,S ′ . S.pc = f → p S → g ′ Sˆ ι S ′ → p ′′ S ′ ∧(∀S ′′ . g ′′ S ′ S ′′ → g S S′′) ∀S,S ′ .g ′ S S′ → S.R(ra) = S ′ .R(ra) Ψ;C ⊢SCAP {(p,g)}f : jal f′ (CALL) ι = jr ra ∀S.p S → g S Sˆ ι Ψ;C ⊢SCAP {(p,g)}f : jr ra (RET) Figure 4. SCAP The routine ctl_yield() causes the current thread to yield its execution in favor of another thread with certain scheduling policy. ctl_yield() stores the execution context in the queue of the TCBs and picks up one TCB, loads it and switches the control to the new continuation. ctl_yield() adopts the round-robin scheduling algorithm. Thread creation is achieved by using ctl_spawn() with a parameter of start code pointer. This function first allocates a new thread control block (TCB). Then the current machine context is cloned and stored into the TCB with a new thread id. Lastly, the new TCB is marked with ready flag and put into the queue. A thread will be running forever if it does not terminate. The role of ctl_exit() is stopping the current thread and marking its TCB with dead flag. The ctl_join() function suspends the calling thread until the specified thread terminates. It takes a thread id as argument. 3.3. Certification of CTL SCAP. We use SCAP [8] to certify CTL. SCAP supports modular certification of assembly code with function call/return abstraction, making CTL routines wellorganized. The specification constructs of SCAP and some selected SCAP rules are shown in Figure 4. A code specification θSCAP is a pair of two predicates p and g. p is the precondition, while g is a predicate over the entry state and the future return state after jr ra. g relates the entry state of a code to the return state of the corresponding procedure. g can also be treated as a general postcondition parameterized by the entry state. Specification constructs. Figure 5 describes the specifi- cation constructs of CTL. We use the word value w to specify the thread id σ. The state ts of a thread may be ready, dead or wait . A thread control block Tcb consists of a pc, a (Thread-id) σ ::= w (RegFileX) R˜ ::= R\ {r0 ❀ ,k0 ❀ ,k1 ❀ ,ra ❀ } (TCB) Tcb ::= (pc,R˜ ,ts,σw) (Th State) ts ::= ready | dead | wait (Th Queue) Q ::= {· } | {σ ❀ Tcb} ∪ Q Figure 5. Specification constructs of CTL partial registers file R˜ and a thread state ts. A partial registers file R˜ is a registers file R excluding r0, k0, k1 and ra. The registers k0 and k1 are preserved for thread scheduling and then unusable in user programs. Because the register r0 always holds the value of zero, it needn’t to be saved. When ra is used to call the routines of CTL, its value is saved in the Tcb.pc field. A thread queue Q is a dictionary mapping σ to Tcb and constructed inductively. Formal core data structures. We formally specify the core data structures that are used in our thread library CTL. As shown in Figure 3, the memory space for CTL is divided into four parts, the global pointer array, the isolated TCB for current thread, the thread queue and the free-block list. Core(σ,Tcb,Q) , ∃gbl .gbl 7→ cth,hd,tl, f l ∗QNode(cth,σ,Tcb,NULL) ∗ TQ(hd,tl,Q) ∗ GoodL(f l) The global pointers (cth,hd,tl, f l) are saved in the memory starting from gbl. cth points to the current thread TCB. hd and tl point to the thread queue. f l points to a memory block list, which is used for dynamic heap allocation. QNode(p,σ,(pc,R˜ ,ts,σw),q) , (p−2 7→ q,34) ∗(p 7→ σ,pc,ts,σw) ∗(p+4 7→ R˜(r1),...,R˜(r25),R˜(r28),R˜ (r29),R˜(r30)) Cth(p,σ,Tcb) , QNode(p,σ,Tcb,NULL) We define the predicate TQ to model the queue for threads. TQ has three arguments, the formal queue Q which is isomorphism to the concrete memory data, a queue-head pointer hd and a queue-tail pointer tl. TQseg models a queue segment in the middle of the queue, with a non-null pointer pointing to the next node. TQseg is defined inductively on the structure of Q. TQseg( {} ,hd,tl,q) , (hd = tl) ∧ (hd = q) TQseg( {σ ❀ Tcb} ,hd,tl,q) , (hd = tl) ∧ QNode(hd,σ,Tcb,q) TQseg( {σ ❀ Tcb} ∪ Q ,hd,tl,q) , ∃q ′ .QNode(hd,σ,Tcb,q ′ ) ∗ TQseg(Q,q ′ ,tl,q) TQ(Q,hd,tl) , TQseg(Q,hd,tl,NULL)
As defined above,a queue segment consists of several Q.(H Core(_,)Top) nodes modeled by QNode.The routines in the CTL may (R,H,pc) traverse,search,add a TCB node or delete one in the queue, (R',H',pc) .VA,Q,o,Tcb.3o whose structure should be preserved. (H Core(o,Tcb,Q)*A) When a thread is created,the scheduler will allocate a (H Core(Gx:(pc',R',ready,_),O)*A) heap block from the free block list.If a thread is joined by where another one.the scheduler will free its TCB.The definition (pc'R,ready,-)=(QU{(R(ra),R.ready,-)})(c) of free block-list GoodL follows the work by Yu et al.[25] a=(QU{c(R(ra),成,ready,-))八{ox(pc,',ready,-)} and Xiang et al.[23],and it can be ported to our framework directly. ctl_yield()mainly performs context switching.Its precondition py requires that the memory space of CTL should be well-formed,specified by the predicate Core. gy is a condition which specifies the actions performed Implementation of yielding.The yielding routine is used by the whole ctl_yield()routine.We assume that the to schedule threads and perform the context switching.The control flow is transferred to thread ox.whose TCB is code body of the ctl_yield()are presented in Figure 6. Tcb,=(pc,R,ready,-).Then gy specifies the postcondi- The first phase of yielding (from YIELD to APPENDTCB) tions which ctl_yield()must satisfies that:(i)at the exit- consists in loading address of the current TCB to ko,saving ing point of ctl_yield(),the old machine context is added the machine context to the current TCB and loading other into Q;(ii)a ready control block Tcb,which contains the global pointers.The code address of the next instruction of current register file Rx and the code pointer pc,has been the current thread has already been saved in ra.ra has to fetched out of Q;(iii)the memory space of Core is still be saved into TCB firstly,because ra will be used to call well-formed while the irrelevant heap A is unchanged.Ob- SAVECTX viously,the specification of ctl_yield()is independent of The second phase of yielding is appending the current CTL scheduling algorithm.Core is related to the implemen- TCB (by calling APPENDTCB)to the thread queue,search- tation,which should be hidden from user programs. ing for a ready TCB through the queue,and fetching it out By the SCAP rules presented in Figure 4,it can be (by calling FETCHTCB).Since there is at least one ready TCB proved that the yielding routine satisfies these specifica- (consider the one just appended)in the queue,the routine tions. FETCHTCB never fails to return.The algorithm of searching ready thread depends on the scheduling policy.Here the Certification of CTL.The certification of ctl_spawn(), naive FIFO method is used and then the code of searching ctl_exit()and ctl_join()follows this method of certi- is a loop over the thread queue. fying the ctl_yield()routine.We take CcrL as the code The role of the last switch phase(from SWITCH)of yield- heap of CTL and YerL as the code heap specification.By ing is to switch the machine context from the old context the CDHP rule,we have: to the new context fetched by FETCHTCB.Concretely,it sets ko with the address of the new TCB.then makes a tail call PCTL FSCAP CCT:平cT to LOADCTX.Inside LOADCTX,the registers file are restored including ra.The last step is tricky,and when program re- turns,the control flow is transferred to the new thread. 4.CMAP:a concurrent program logic CMAP is a program logic for certifying multithreaded Certification of yielding.Part of the formal specifica- assembly code with unbounded dynamic thread creation tions of ctl_yield()are also presented in Figure 6.Note and termination.It assumes that the register files are that the guarantees in the middle of code body are not listed thread local,i.e.,saving and loading registers during con- because they are unnecessary for the readers to understand text switching.CMAP is based on an abstract machine with the specification,although indispensable to the certification built-in thread operations,e.g.,yield,spawn and exit are process.For the same reason,the initial state (R,H,pc)and treated as atomic primitives.This approach simplify the cer- the final state (R',H',pc'),playing their roles as parameters tification of user programs.However,the operations are not of p and g,are omitted as well. directly supported by most existing hardwares. The specification of ctl_yield()(p,gy)is defined be- In Figure 7,we present a simplified CMAP system, low: which is adapted to our thread model.Notice that the whole
As defined above, a queue segment consists of several nodes modeled by QNode. The routines in the CTL may traverse, search, add a TCB node or delete one in the queue, whose structure should be preserved. When a thread is created, the scheduler will allocate a heap block from the free block list. If a thread is joined by another one, the scheduler will free its TCB. The definition of free block-list GoodL follows the work by Yu et al. [25] and Xiang et al. [23], and it can be ported to our framework directly. Implementation of yielding. The yielding routine is used to schedule threads and perform the context switching. The code body of the ctl_yield() are presented in Figure 6. The first phase of yielding (from YIELD to APPENDTCB) consists in loading address of the current TCB to k0, saving the machine context to the current TCB and loading other global pointers. The code address of the next instruction of the current thread has already been saved in ra. ra has to be saved into TCB firstly, because ra will be used to call SAVECTX. The second phase of yielding is appending the current TCB (by calling APPENDTCB) to the thread queue, searching for a ready TCB through the queue, and fetching it out (by calling FETCHTCB). Since there is at least one ready TCB (consider the one just appended) in the queue, the routine FETCHTCB never fails to return. The algorithm of searching ready thread depends on the scheduling policy. Here the naive FIFO method is used and then the code of searching is a loop over the thread queue. The role of the last switch phase (from SWITCH) of yielding is to switch the machine context from the old context to the new context fetched by FETCHTCB. Concretely, it sets k0 with the address of the new TCB, then makes a tail call to LOADCTX. Inside LOADCTX, the registers file are restored including ra. The last step is tricky, and when program returns, the control flow is transferred to the new thread. Certification of yielding. Part of the formal specifications of ctl_yield() are also presented in Figure 6. Note that the guarantees in the middle of code body are not listed because they are unnecessary for the readers to understand the specification, although indispensable to the certification process. For the same reason, the initial state (R,H,pc) and the final state (R ′ ,H′ ,pc′ ), playing their roles as parameters of p and g, are omitted as well. The specification of ctl_yield() (py ,gy ) is defined below: py , ∃Q.(H ° Core( , ,Q) ∗ Top) gy , λ · (R,H,pc) (R ′ ,H′ ,pc′ ) ¸ .∀A,Q,σ,Tcb.∃σx . · (H ° Core(σ,Tcb,Q) ∗ A) (H′ ° Core(σx,(pc′ ,R˜ ′ ,ready, ),Q ′ ) ∗ A) ¸ where (pc′ ,R˜ ′ ,ready, ) = (Q ∪ {σ ❀ (R(ra),R,ready, )})(σx) Q ′ = (Q ∪ {σ ❀ (R(ra),R˜ ,ready, )}) \ {σx ❀ (pc′ ,R˜ ′ ,ready, )} ctl_yield() mainly performs context switching. Its precondition py requires that the memory space of CTL should be well-formed, specified by the predicate Core. gy is a condition which specifies the actions performed by the whole ctl_yield() routine. We assume that the control flow is transferred to thread σx, whose TCB is Tcbx = (pcx ,R˜ x,ready, ). Then gy specifies the postconditions which ctl_yield() must satisfies that: (i) at the exiting point of ctl_yield(), the old machine context is added into Q; (ii) a ready control block Tcbx, which contains the current register file Rx and the code pointer pcx , has been fetched out of Q; (iii) the memory space of Core is still well-formed while the irrelevant heap A is unchanged. Obviously, the specification of ctl_yield() is independent of CTL scheduling algorithm. Core is related to the implementation, which should be hidden from user programs. By the SCAP rules presented in Figure 4, it can be proved that the yielding routine satisfies these specifications. Certification of CTL. The certification of ctl_spawn(), ctl_exit() and ctl_join() follows this method of certifying the ctl_yield() routine. We take CCTL as the code heap of CTL and ΨCTL as the code heap specification. By the CDHP rule, we have: ΨCTL ⊢SCAP CCTL : ΨCTL 4. CMAP: a concurrent program logic CMAP is a program logic for certifying multithreaded assembly code with unbounded dynamic thread creation and termination. It assumes that the register files are thread local, i.e., saving and loading registers during context switching. CMAP is based on an abstract machine with built-in thread operations, e.g., yield, spawn and exit are treated as atomic primitives. This approach simplify the certification of user programs. However, the operations are not directly supported by most existing hardwares. In Figure 7, we present a simplified CMAP system, which is adapted to our thread model. Notice that the whole