A:16 Hongjin Liang et al.

4.3. A Simple Example

Below we give a simple example to illustrate the use of RGSim and its parallel compositionality in verifying concurrent program transformations. The high-level program ℂ1 ∥ ℂ2 is transformed to C1 ∥ C2, using a lock l to synchronize the accesses of the shared variable x. We aim to prove C1 ∥ C2 ⊑T ℂ1 ∥ ℂ2. That is, although x := x+2 is implemented by two steps of incrementing x in C2, the parallel observer C1 will not print unexpected values. Here we view output events as externally observable behaviors.

    print(x);   ∥   x := x + 2;
                ⇓
    lock(l);            lock(l);
    print(x);     ∥     x := x + 1;
    unlock(l);          x := x + 1;
                        ⟨unlock(l); X := x;⟩

To facilitate the proof, we introduce an auxiliary shared variable X at the low level to record the value of x at the time when releasing the lock. It specifies the value of x outside every critical section, and thus should match the value of the high-level x after every corresponding action. Here ⟨C⟩ means C is executed atomically. Its semantics follows RGSep [Vafeiadis 2008] (or see Section 6.2). The auxiliary variable is read-only and would not affect the external behaviors of the program [Abadi and Lamport 1991]. Thus below we can focus on the instrumented target program with the auxiliary code.

By the soundness and compositionality of RGSim, we only need to prove simulations over individual threads, providing appropriate relies and guarantees. We first define the invariant α, which only cares about the value of x when the lock is free:

    α ≜ {(σ, Σ) | σ(X) = Σ(x) ∧ (σ(l) = 0 ⟹ σ(x) = σ(X))} .

We let the pre- and post-conditions be α as well. The high-level threads can be executed in arbitrary environments with arbitrary guarantees: ℛ = 𝒢 ≜ True.
The transformation uses the lock to protect every access of x, thus the low-level relies and guarantees are not arbitrary:

    R ≜ {(σ, σ′) | σ(l) = cid ⟹ σ(x) = σ′(x) ∧ σ(X) = σ′(X) ∧ σ(l) = σ′(l)} ;
    G ≜ {(σ, σ′) | σ′ = σ
                 ∨ σ(l) = 0 ∧ σ′ = σ{l ❀ cid}
                 ∨ σ(l) = cid ∧ σ′ = σ{x ❀ _}
                 ∨ σ(l) = cid ∧ σ′ = σ{l ❀ 0, X ❀ _}} .

Every low-level thread guarantees that it updates x only when the lock is acquired. Its environment cannot update x or l if the current thread holds the lock. Here cid is the identifier of the current thread. When acquired, the lock holds the identifier of the owner thread.

Following the definition, we can prove (C1, R, G) ≼α;α⋉α (ℂ1, ℛ, 𝒢) and (C2, R, G) ≼α;α⋉α (ℂ2, ℛ, 𝒢). By applying the PAR rule and from the soundness of RGSim (Corollary 4.4), we know C1 ∥ C2 ⊑T ℂ1 ∥ ℂ2 holds for any T that respects α.

Perhaps interestingly, if we omit the lock and unlock operations in C1, then C1 ∥ C2 would have more externally observable behaviors than ℂ1 ∥ ℂ2. This does not indicate the unsoundness of our PAR rule (which is sound!). The reason is that x might have different values on the two levels after the environments' α-related transitions, so that we cannot have (print(x), R, G) ≼α;α⋉α (print(x), ℛ, 𝒢) with the current definitions of α, R and G, even though the code of the two sides is syntactically identical.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Article A, Publication date: January YYYY
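As a quick sanity check of the refinement claim (independent of the RGSim proof itself), one can exhaustively enumerate the interleavings of the two low-level threads and compare the printable values with those of the high-level program. The encoding below — the step functions, the state dictionary, and the thread tuples — is our own illustration, not the paper's formalism, and it takes output events as the only observables.

```python
# Exhaustive interleaving check of the example (a sanity check, not the
# RGSim proof). All names and the step-function encoding are ours.

CID1, CID2 = 1, 2  # thread identifiers (cid in the text)

def lock(cid):
    def step(s):
        # blocks (returns None) while the lock is held by another thread
        return {**s, 'l': cid} if s['l'] == 0 else None
    return step

def unlock_snap(cid):
    # the atomic block <unlock(l); X := x;> of the target program
    def step(s):
        assert s['l'] == cid
        return {**s, 'l': 0, 'X': s['x']}
    return step

def inc(s):
    return {**s, 'x': s['x'] + 1}

def out(s):
    # print(x): record the externally observable output event
    return {**s, 'out': s['out'] + (s['x'],)}

def explore(threads, s):
    """Yield every state reachable by interleaving the threads' atomic steps."""
    yield s
    for i, t in enumerate(threads):
        if t:
            s2 = t[0](s)
            if s2 is not None:
                yield from explore(threads[:i] + (t[1:],) + threads[i+1:], s2)

init = {'x': 0, 'X': 0, 'l': 0, 'out': ()}
C1 = (lock(CID1), out, unlock_snap(CID1))        # low-level observer
C2 = (lock(CID2), inc, inc, unlock_snap(CID2))   # low-level x := x+2

low_outs = set()
for s in explore((C1, C2), init):
    if s['l'] == 0:                 # the invariant alpha: lock free ==> x = X
        assert s['x'] == s['X']
    low_outs |= set(s['out'])

# The high-level program print(x) || x := x+2 can only print 0 or 2.
assert low_outs <= {0, 2}

# Omitting lock/unlock in C1 (the closing remark above) adds the value 1.
nolock_outs = set()
for s in explore(((out,), C2), init):
    nolock_outs |= set(s['out'])
assert 1 in nolock_outs
print(sorted(low_outs), sorted(nolock_outs))  # [0, 2] [0, 1, 2]
```

The second exploration exhibits exactly the extra behavior discussed at the end of the section: without the lock, the observer can print the intermediate value 1, which the high-level program can never produce.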
Rely-Guarantee-Based Simulation A:17

The use of the auxiliary variable. The auxiliary variable X helps us define the invariant α and do the proof. It is difficult to prove the refinement without this auxiliary variable. One may wish to prove

    (C1, R′, G′) ≼α′;α′⋉α′ (ℂ1, ℛ, 𝒢),                                    (4.7)

where α′, R′ and G′ are defined as follows by eliminating X from α, R and G:

    α′ ≜ {(σ, Σ) | σ(l) = 0 ⟹ σ(x) = Σ(x)} ;
    R′ ≜ {(σ, σ′) | σ(l) = cid ⟹ σ(x) = σ′(x) ∧ σ(l) = σ′(l)} ;
    G′ ≜ {(σ, σ′) | σ′ = σ
                  ∨ σ(l) = 0 ∧ σ′ = σ{l ❀ cid}
                  ∨ σ(l) = cid ∧ σ′ = σ{x ❀ _}
                  ∨ σ(l) = cid ∧ σ′ = σ{l ❀ 0}} .

But (4.7) does not hold because ⟨R′, ℛ∗⟩α′ (which is used in Definition 4.2(5)) permits unexpected transitions. For instance, we allow ((σ, σ′), (Σ, Σ′)) ∈ ⟨R′, ℛ∗⟩α′ for the following σ, σ′, Σ and Σ′:

    σ = σ′ ≜ {x ❀ 0, l ❀ cid} ;   Σ ≜ {x ❀ 0} ;   Σ′ ≜ {x ❀ 1} .

The high-level environment is allowed to change x even if the thread holds the lock at the low level. Then the left thread may print out different values at the two levels, breaking the simulation (4.7).

It is possible to define the RGSim relation in another way that allows us to get rid of the auxiliary variable for this example. Instead of defining separate rely/guarantee relations at the two levels and using α to relate them, we can directly define "relational rely/guarantee" relations r, g ∈ P((LState × LState) × (HState × HState)). The new simulation is in the form of C ≼α;ζ⋉γ;r;g ℂ and is defined by substituting r and g for ⟨R, ℛ∗⟩α and ⟨G, 𝒢∗⟩α in Definition 4.2. It has all the nice properties of our current RGSim relation (including parallel compositionality) and we no longer need auxiliary variables to prove the simple example. We can prove the new simulations C1 ≼α′;α′⋉α′;r;g ℂ1 and C′2 ≼α′;α′⋉α′;r;g ℂ2.
Here C′2 results from removing X from C2, α′ is defined as above, and r and g are as follows:

    r ≜ {((σ, σ′), (Σ, Σ′)) | σ(l) = cid ⟹ σ(x) = σ′(x) ∧ σ(l) = σ′(l) ∧ Σ(x) = Σ′(x)} ;
    g ≜ {((σ, σ′), (Σ, Σ′)) | σ′ = σ ∧ Σ′ = Σ
                            ∨ σ(l) = 0 ∧ σ′ = σ{l ❀ cid} ∧ Σ′ = Σ
                            ∨ σ(l) = cid ∧ σ′ = σ{x ❀ _} ∧ Σ′ = Σ
                            ∨ σ(l) = cid ∧ σ′ = σ{l ❀ 0} ∧ Σ′ = Σ{x ❀ σ(x)}} .

We can see that if the thread holds the lock at the low level, neither the high-level nor the low-level environment can change x. This relational r does not permit the unexpected transitions discussed before. It is more expressive than ⟨R′, ℛ∗⟩α′, but is also much heavier. We choose to present the current RGSim relation because in practice it is usually easier to define separate rely/guarantee conditions at the two levels.

More discussions. RGSim ensures that the target program preserves safety properties (including partial correctness) of the source, but allows a terminating source program to be transformed to a target having infinite silent steps. In the above example, this allows the low-level programs to be blocked forever (e.g., when the lock is held but never released by some other thread). Proving the preservation of the termination behavior would require liveness proofs in a concurrent setting (e.g., proving the absence of deadlock), which we leave as future work.

In the next three sections, we show more serious examples to demonstrate the applicability of RGSim.
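The failure of (4.7) can be replayed concretely. The small executable check below follows our own reading of Definition 4.2(5) — that ⟨R′, ℛ∗⟩α′ relates a low-level transition in R′ to a high-level transition in ℛ∗ whose endpoint states are α′-related on both sides — and all function names are ours. It confirms that the counterexample transition is admitted by ⟨R′, ℛ∗⟩α′ but rejected by the relational rely r.

```python
# Replaying the counterexample to (4.7). States are dicts; the encoding of
# <R', HiRely*>_alpha' below is our reading of Definition 4.2(5), not a
# verbatim transcription of the paper.

cid = 1  # the current thread's identifier

def alpha_p(s, S):
    # alpha' = {(sigma, Sigma) | sigma(l) = 0 ==> sigma(x) = Sigma(x)}
    return s['l'] != 0 or s['x'] == S['x']

def R_p(s, s2):
    # low-level rely R': if we hold the lock, x and l are unchanged
    return s['l'] != cid or (s['x'] == s2['x'] and s['l'] == s2['l'])

def hi_rely_star(S, S2):
    # the high-level rely is True, hence so is its reflexive-transitive closure
    return True

def composed_rely(s, s2, S, S2):
    # <R', HiRely*>_alpha': componentwise-related transitions whose
    # endpoint states are alpha'-related on both sides
    return (R_p(s, s2) and hi_rely_star(S, S2)
            and alpha_p(s, S) and alpha_p(s2, S2))

def r(s, s2, S, S2):
    # the relational rely r: holding the lock also freezes the high-level x
    return s['l'] != cid or (s['x'] == s2['x'] and s['l'] == s2['l']
                             and S['x'] == S2['x'])

# The counterexample transition from the text:
# sigma = sigma' = {x ~> 0, l ~> cid}, Sigma = {x ~> 0}, Sigma' = {x ~> 1}.
s = s2 = {'x': 0, 'l': cid}
S, S2 = {'x': 0}, {'x': 1}

assert composed_rely(s, s2, S, S2)  # permitted: this is what breaks (4.7)
assert not r(s, s2, S, S2)          # excluded by the relational rely
```

The two assertions mirror the prose exactly: the high-level environment may change x while the low-level thread holds the lock under ⟨R′, ℛ∗⟩α′, whereas r forbids it.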
5. RELATIONAL REASONING ABOUT OPTIMIZATIONS

As a general correctness notion of concurrent program transformations, RGSim establishes a relational approach to justify compiler optimizations on concurrent programs. Below we adapt Benton's work [Benton 2004] on sequential optimizations to the concurrent setting.

5.1. Optimization Rules

Usually optimizations depend on particular contexts, e.g., the assignment x := E can be eliminated only in a context where the value of x is never used after the assignment. In a shared-state concurrent setting, we should also consider the parallel context of an optimization. RGSim enables us to specify various sophisticated requirements on the parallel contexts by rely/guarantee conditions. Based on RGSim, we provide a set of inference rules to characterize and justify common optimizations (e.g., dead code elimination) with information of both the sequential and the parallel contexts. Note that in this section the target and the source programs are in the same language.

Sequential Unit Laws

    (C1, R1, G1) ≼α;ζ⋉γ (C2, R2, G2)
    ─────────────────────────────────────────
    (skip; C1, R1, G1) ≼α;ζ⋉γ (C2, R2, G2)

    (C1, R1, G1) ≼α;ζ⋉γ (C2, R2, G2)
    ─────────────────────────────────────────
    (C1, R1, G1) ≼α;ζ⋉γ (skip; C2, R2, G2)

Plus the variants with skip after the code C1 or C2. That is, skips can be arbitrarily introduced and eliminated.

Common Branch

    ∀σ1, σ2. (σ1, σ2) ∈ ζ ⟹ ⟦B⟧σ2 ≠ ⊥
    (C, R, G) ≼α;ζ1⋉γ (C1, R′, G′)    ζ1 = (ζ ∩ (true ∧∧ B))
    (C, R, G) ≼α;ζ2⋉γ (C2, R′, G′)    ζ2 = (ζ ∩ (true ∧∧ ¬B))
    ──────────────────────────────────────────────────────────
    (C, R, G) ≼α;ζ⋉γ (if (B) C1 else C2, R′, G′)

This rule says that, when the if-condition can be evaluated and both branches can be optimized to the same code C, we can transform the whole if-statement to C without introducing new behaviors.
Known Branch

    (C, R, G) ≼α;ζ⋉γ (C1, R′, G′)    ζ = (ζ ∩ (true ∧∧ B))
    ────────────────────────────────────────────────────────
    (C, R, G) ≼α;ζ⋉γ (if (B) C1 else C2, R′, G′)

    (C, R, G) ≼α;ζ⋉γ (C2, R′, G′)    ζ = (ζ ∩ (true ∧∧ ¬B))
    ────────────────────────────────────────────────────────
    (C, R, G) ≼α;ζ⋉γ (if (B) C1 else C2, R′, G′)

Since the if-condition B is true (or false) initially, we can consider the then-branch (or the else-branch) only. These rules can be derived from the Common-Branch rule.

Dead While

    ζ = (ζ ∩ (true ∧∧ ¬B))    ζ ⊆ α    Sta(ζ, ⟨R1, R2∗⟩α)
    ──────────────────────────────────────────────────────
    (skip, R1, Id) ≼α;ζ⋉ζ (while (B){C}, R2, Id)

We can eliminate the loop if the loop condition is false (no matter how the environments update the states) at the loop entry point.

Loop Peeling

    (while (B){C}, R1, G1) ≼α;ζ⋉γ (while (B){C}, R2, G2)
    ───────────────────────────────────────────────────────────────────────
    (if (B) {C; while (B){C}} else skip, R1, G1) ≼α;ζ⋉γ (while (B){C}, R2, G2)
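To make the syntactic shape of these rules concrete, here is a toy AST rewriter sketching Known Branch, Dead While, and Loop Peeling. The semantic premises (the ζ side conditions, Sta(ζ, ⟨R1, R2∗⟩α), etc.) are passed in as booleans assumed to have been discharged by an RGSim proof; the AST and all names below are our illustration, not part of the paper.

```python
# A toy statement AST with three of the optimization rules as rewrites.
# The boolean arguments stand for side conditions proved separately.
from dataclasses import dataclass

@dataclass
class Skip:
    pass

@dataclass
class If:
    cond: str
    then: object
    els: object

@dataclass
class While:
    cond: str
    body: object

@dataclass
class Seq:
    first: object
    second: object

def known_branch(stmt, cond_known_true):
    """Known Branch: when the precondition entails B (resp. not B),
    replace 'if (B) C1 else C2' by the branch that must be taken."""
    if isinstance(stmt, If):
        return stmt.then if cond_known_true else stmt.els
    return stmt

def dead_while(stmt, cond_known_false):
    """Dead While: if B is false at the loop entry point (stably under
    the environments), the loop can be replaced by skip."""
    if isinstance(stmt, While) and cond_known_false:
        return Skip()
    return stmt

def loop_peel(stmt):
    """Loop Peeling: while (B){C} ~> if (B) {C; while (B){C}} else skip."""
    if isinstance(stmt, While):
        return If(stmt.cond, Seq(stmt.body, stmt), Skip())
    return stmt

loop = While("i < n", Skip())
peeled = loop_peel(loop)
assert peeled == If("i < n", Seq(Skip(), loop), Skip())
assert known_branch(peeled, cond_known_true=False) == Skip()
assert dead_while(loop, cond_known_false=True) == Skip()
```

The rewrites only manipulate syntax; in the framework above, each one is sound exactly when the corresponding rule's premises hold for the given relies, guarantees, and relations.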