A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:11 ZIPPER Object Flow Graph Precision Flow Graph Context-Insensitive OFG PFG Graph hich me的 Context-Sensitive (OFG) Reachability Pointer Analysis (PFG) Construction Construction Pointer Analysis on PFG Fig.6.Overview of ZIPPER. 4.1 Overview of ZIPPER The goal of ZIPPER is to efficiently recognize the precision-critical methods in a given program.The central part of ZIPPER is the notion of precision flow graphs(PFGs)that allow us to express all three precision loss patterns in a uniform way,in the sense that each kind of flow can be represented by a path in a PFG.Intuitively,a PFG is much like the right-hand side graphs of Figures 3-5,but replacing the field expressions by the abstract objects and their fields.Via the PFGs,we can convert the problem of identifying precision-critical methods to an abstract graph computation.All methods that are involved in one of the three kinds of flows can be efficiently extracted by solving a simple graph reachability problem on the PFGs. Constructing the PFGs requires information about how objects flow in the program.We leverage the concept of object flow graphs(OFGs)[Tonella and Potrich 2005]as explained in Section 4.2. The OFG for a program allows tracing the flow of objects through local assignments,calls and returns,and field load and store operations in the program.Therefore,it can naturally express the direct flow pattern,in a static analysis that approximates the dynamic flows of objects.However, the original OFG formulation does not represent wrapped and unwrapped flows,thus we cannot directly use it to identify precision-critical methods.For this reason,we build the PFGs on top of the OFG to uniformly express all three precision loss patterns. Figure 6 shows the overall structure of ZIPPER,which itself contains three components:the object flow graph construction,the precision flow graph construction,and the graph reachability computation.First,a fast but imprecise context-insensitive pointer analysis is performed as a pre-analysis for ZIPPER.To simplify the discussion,we assume that the pre-analysis abstracts objects by their allocation-sites [Chase et al.1990],but our technique also works for other object abstractions [Kanvar and Khedker 2016].This pre-analysis provides the information for the OFG construction,in the form of a map pt(v)that captures the points-to set for each variable v.Based on the OFG,a PFG is constructed for each class.Afterwards,ZIPPER computes graph reachability on each PFG to determine which methods are precision-critical.Finally,a selective context-sensitive pointer analysis is performed,guided by ZIPPER's results,so that the pointer analysis applies context sensitivity to only the precision-critical methods reported by ZIPPER. 4.2 Object Flow Graphs The object flow graph(OFG)of a program,as in its original form by Tonella and Potrich [2005],is a directed graph that expresses how objects flow in the program.The nodes in the OFG represent program pointers,which can point to objects,and the edges represent basic object flow among the pointers.More precisely,the OFG contains a node for each variable in the program and for each field of each abstract object.Objects are abstracted in the same way as in the pre-analysis, as described in Section 4.1:we here assume allocation-site abstraction is being used,which is the most common choice,but the technique also works for other choices.An edge a-b in the OFG means that the objects pointed by pointer a may flow to(and also be pointed to by)pointer b. As part of a two-phase analysis,the pre-analysis of ZiPPER needs to be fast;so currently we use a context-insensitive pointer analysis as pre-analysis.It would be interesting future work to further explore the effect of more precise pre-analysis. ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:11 Object Flow Graph (OFG) Construction Precision Flow Graph (PFG) Construction Graph Reachability on PFG OFG PFG ZIPPER Context-Insensitive Pointer Analysis Context-Sensitive Pointer Analysis which methods need contexts Fig. 6. Overview of Zipper. 4.1 Overview of Zipper The goal of Zipper is to efficiently recognize the precision-critical methods in a given program. The central part of Zipper is the notion of precision flow graphs (PFGs) that allow us to express all three precision loss patterns in a uniform way, in the sense that each kind of flow can be represented by a path in a PFG. Intuitively, a PFG is much like the right-hand side graphs of Figures 3–5, but replacing the field expressions by the abstract objects and their fields. Via the PFGs, we can convert the problem of identifying precision-critical methods to an abstract graph computation. All methods that are involved in one of the three kinds of flows can be efficiently extracted by solving a simple graph reachability problem on the PFGs. Constructing the PFGs requires information about how objects flow in the program. We leverage the concept of object flow graphs (OFGs) [Tonella and Potrich 2005] as explained in Section 4.2. The OFG for a program allows tracing the flow of objects through local assignments, calls and returns, and field load and store operations in the program. Therefore, it can naturally express the direct flow pattern, in a static analysis that approximates the dynamic flows of objects. However, the original OFG formulation does not represent wrapped and unwrapped flows, thus we cannot directly use it to identify precision-critical methods. For this reason, we build the PFGs on top of the OFG to uniformly express all three precision loss patterns. Figure 6 shows the overall structure of Zipper, which itself contains three components: the object flow graph construction, the precision flow graph construction, and the graph reachability computation. First, a fast but imprecise context-insensitive pointer analysis is performed as a pre-analysis for Zipper. 1 To simplify the discussion, we assume that the pre-analysis abstracts objects by their allocation-sites [Chase et al. 1990], but our technique also works for other object abstractions [Kanvar and Khedker 2016]. This pre-analysis provides the information for the OFG construction, in the form of a map pt(v) that captures the points-to set for each variable v. Based on the OFG, a PFG is constructed for each class. Afterwards, Zipper computes graph reachability on each PFG to determine which methods are precision-critical. Finally, a selective context-sensitive pointer analysis is performed, guided by Zipper’s results, so that the pointer analysis applies context sensitivity to only the precision-critical methods reported by Zipper. 4.2 Object Flow Graphs The object flow graph (OFG) of a program, as in its original form by Tonella and Potrich [2005], is a directed graph that expresses how objects flow in the program. The nodes in the OFG represent program pointers, which can point to objects, and the edges represent basic object flow among the pointers. More precisely, the OFG contains a node for each variable in the program and for each field of each abstract object. Objects are abstracted in the same way as in the pre-analysis, as described in Section 4.1: we here assume allocation-site abstraction is being used, which is the most common choice, but the technique also works for other choices. An edge a→b in the OFG means that the objects pointed by pointer a may flow to (and also be pointed to by) pointer b. 1As part of a two-phase analysis, the pre-analysis of Zipper needs to be fast; so currently we use a context-insensitive pointer analysis as pre-analysis. It would be interesting future work to further explore the effect of more precise pre-analysis. ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
1:12 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis (v Variable node o.f Object field node Program Object Flow Graph Local assignment b=a; a ( a b; ① (b) (a p) Assignments in b c.m(a) c.f a; ② ② method calls/returns m(p){return r:} (C→mhi d c; b ③ ⊙ c.f d; ④ Field load b=a.f; where o e pt(a) @.f -b e=d.f;⑤ Field store a.f b; 0.f assuming o e pt(c)and o ept(d) where o e pt(a) Fig.7.Object flow graph construction,with an example. Another way to view the OFG is that it is the subset constraint graph in an Andersen-style points-to analysis [Andersen 1994;Sridharan et al.2013]. Tonella and Potrich [2005]propose to build the OFG with more precision by cloning the variables of a method for each of its receiver objects(conceptually like object sensitivity [Milanova et al. 2002,2005]),so that the flow involved in different receiver objects of the same method can be distinguished.However,this is unnecessary for ZIPPER,since it builds the OFG based on the results of a context-insensitive analysis,and all flow queries are done at the class level instead of the object level,as explained in Section 3.Therefore,we perform no such cloning. Due to the close connection between OFGs and Andersen-style analysis,constructing the OFG is trivial,based on the points-to relation pt(v)provided by the context-insensitive pre-analysis Figure 7 illustrates this construction.The left-hand side of Figure 7 lists(from left to right)the four basic object flows,the related Java statements that induce the flows,and the corresponding graph edges in the OFG. Consider the code fragment and its corresponding OFG on the right-hand side of Figure 7.There are five statements labeled 1-G.and each statement causes an edge(with the same label)to be added to the OFG.With the OFG,the object flow information can be directly obtained simply by checking graph reachability without the need to explicitly track alias information among variables or field accesses.For example,variable e is reachable from b in the OFG,which means that the objects pointed to by b may flow to (and also be pointed to by)e. As a result,direct flows can be expressed naturally by the paths in the OFG,however,that is not the case for wrapped and unwrapped flows.In the next section,we describe how to augment the OFG to express all three kinds of flows. 4.3 Precision Flow Graphs and Graph Reachability We first explain how to construct precision flow graphs(PFGs)and then how to identify precision- critical methods by performing graph reachability on each PFG. Precision Flow Graph Construction.As explained in Section 4.2,one OFG is built for the entire program.Since the PFGs serve to express the three kinds of precision-loss patterns,which are all defined relative to a class,as explained in Section 3,we construct one PFG for each class in the program.As the OFG can already describe direct flow(Section 4.2),the task of building the PFG is to add edges that can express the other two kinds of flows:wrapped and unwrapped flows. Algorithm 1(PFGBUILDER)shows how to build PFGe for a given class c.For simplicity,we represent ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1:12 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis Local assignment Assignments in method calls/returns Field load Field store b = a; b = c.m(a); m(p){ return r;} … b = a.f; a.f = b; p a b r a b b b v Variable node o.f Object field node Program Object Flow Graph a = b; c.f = a; d = c; c.f = d; e = d.f; a 1 2 3 4 5 b d e c 1 2 3 4 5 c mthis where o ∈ pt(a) o.f o.f o.f where o ∈ pt(a) assuming o ∈ pt(c) and o ∈ pt(d) Fig. 7. Object flow graph construction, with an example. Another way to view the OFG is that it is the subset constraint graph in an Andersen-style points-to analysis [Andersen 1994; Sridharan et al. 2013]. Tonella and Potrich [2005] propose to build the OFG with more precision by cloning the variables of a method for each of its receiver objects (conceptually like object sensitivity [Milanova et al. 2002, 2005]), so that the flow involved in different receiver objects of the same method can be distinguished. However, this is unnecessary for Zipper, since it builds the OFG based on the results of a context-insensitive analysis, and all flow queries are done at the class level instead of the object level, as explained in Section 3. Therefore, we perform no such cloning. Due to the close connection between OFGs and Andersen-style analysis, constructing the OFG is trivial, based on the points-to relation pt(v) provided by the context-insensitive pre-analysis. Figure 7 illustrates this construction. The left-hand side of Figure 7 lists (from left to right) the four basic object flows, the related Java statements that induce the flows, and the corresponding graph edges in the OFG. Consider the code fragment and its corresponding OFG on the right-hand side of Figure 7. There are five statements labeled 1 – 5 , and each statement causes an edge (with the same label) to be added to the OFG. With the OFG, the object flow information can be directly obtained simply by checking graph reachability without the need to explicitly track alias information among variables or field accesses. For example, variable e is reachable from b in the OFG, which means that the objects pointed to by b may flow to (and also be pointed to by) e. As a result, direct flows can be expressed naturally by the paths in the OFG, however, that is not the case for wrapped and unwrapped flows. In the next section, we describe how to augment the OFG to express all three kinds of flows. 4.3 Precision Flow Graphs and Graph Reachability We first explain how to construct precision flow graphs (PFGs) and then how to identify precisioncritical methods by performing graph reachability on each PFG. Precision Flow Graph Construction. As explained in Section 4.2, one OFG is built for the entire program. Since the PFGs serve to express the three kinds of precision-loss patterns, which are all defined relative to a class, as explained in Section 3, we construct one PFG for each class in the program. As the OFG can already describe direct flow (Section 4.2), the task of building the PFG is to add edges that can express the other two kinds of flows: wrapped and unwrapped flows. Algorithm 1 (PfgBuilder) shows how to build PFG𝑐 for a given class𝑐. For simplicity, we represent ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:13 Algorithm 1:PFGBUILDER OFG (Object Flow Graph) Input c (Input class) (Set of statements in the program) Output:PFGe (Precision Flow Graph for class c) 1PFGc←-{h,VisitedNodes←-{},WUEdges←-{} 2 foreach m∈INe do foreach parameter p of m do LDFs(Np)where Np is the OFG node for p 5 return PFGc 6 Function DFs(N) if NE VisitedNodes then return 9 add N to VisitedNodes 10 if N is a variable node Na then foreach b a.f es do /Handling unwrapped flow Ladd Na一→to WUEdges 13 foreach b.f aeS do /Handling wrapped flow foreach o∈pt(b)do L add Na NIo]to WUEdges 16 foreach N→N'∈OFGU WUEdges do addN→N'to PFGc DEs(N) the PFG and the OFG by their sets of graph edges,and the graph nodes are implicitly those that appear in the edge sets. Three sets are initialized to empty sets in line 1:the PFG edges,the set of visited nodes,and WUEdges,which denotes a set of extra edges for wrapped and unwrapped flows.As all three kinds of flows begin from the parameters of an IN method(see Section 3),the algorithm starts by iterating through those methods(lines 2-3,where INc denotes the set of IN methods of the input class c). Function DEs(line 6)traverses the input OFG and adds the edges for wrapped and unwrapped flows.As a result,the returned PFGc(line 5)includes all the nodes that can be reached from each parameter of IN methods of c,through direct,wrapped,and unwrapped flows,or combinations of these.Specifically,unwrapped and wrapped flows are handled in lines 11-12 and lines 13-15, respectively,by adding the corresponding edges to WUEdges.Finally,the generated PFG includes direct flows(from the OFG)and wrapped/unwrapped flows(from WUEdges)via the statements in lines 16-17.Now let us see the details of handling wrapped and unwrapped flows. Recall that each OFG node represents either a variable or a field of an abstract object.If node N in line 10 is a variable node Na,then for every load operation (b=a.f in line 11)that may load the(unwrapped)objects(which are stored in a field of an object pointed to by a)to variable b, we add an edge from node Na to node No.This allows us to model unwrapped flow,as defined in Definition 3.5 and illustrated in Section 3.3. The most intricate part of the algorithm is lines 13-15,which handle wrapped flows.If node N in line 10 is a variable node Na,then for every store operation(b.f a in line 13)that can store the objects(pointed to by a)in wrapper objects o pointed to by b(line 14),we add an edge from ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:13 Algorithm 1: PfgBuilder Input : OFG (Object Flow Graph) 𝑐 (Input class) S (Set of statements in the program) Output : PFG𝑐 (Precision Flow Graph for class 𝑐) 1 PFG𝑐 ← { }, VisitedNodes ← { }, WUEdges ← { } 2 foreach m ∈ In𝑐 do 3 foreach parameter p of m do 4 Dfs(N𝑝 ) where N𝑝 is the OFG node for 𝑝 5 return PFG𝑐 6 Function Dfs(N) 7 if N ∈ VisitedNodes then 8 return 9 add N to VisitedNodes 10 if N is a variable node Na then 11 foreach b = a.f ∈ S do // Handling unwrapped flow 12 add Na → Nb to WUEdges 13 foreach b.f = a ∈ S do // Handling wrapped flow 14 foreach o ∈ pt(b) do 15 add Na → N [o] to WUEdges 16 foreach N → N ′ ∈ OFG ∪ WUEdges do 17 add N → N ′ to PFG𝑐 18 Dfs(N ′ ) the PFG and the OFG by their sets of graph edges, and the graph nodes are implicitly those that appear in the edge sets. Three sets are initialized to empty sets in line 1: the PFG edges, the set of visited nodes, and WUEdges, which denotes a set of extra edges for wrapped and unwrapped flows. As all three kinds of flows begin from the parameters of an In method (see Section 3), the algorithm starts by iterating through those methods (lines 2–3, where In𝑐 denotes the set of In methods of the input class 𝑐). Function Dfs (line 6) traverses the input OFG and adds the edges for wrapped and unwrapped flows. As a result, the returned PFG𝑐 (line 5) includes all the nodes that can be reached from each parameter of In methods of 𝑐, through direct, wrapped, and unwrapped flows, or combinations of these. Specifically, unwrapped and wrapped flows are handled in lines 11–12 and lines 13–15, respectively, by adding the corresponding edges to WUEdges. Finally, the generated PFG includes direct flows (from the OFG) and wrapped/unwrapped flows (from WUEdges) via the statements in lines 16–17. Now let us see the details of handling wrapped and unwrapped flows. Recall that each OFG node represents either a variable or a field of an abstract object. If node 𝑁 in line 10 is a variable node 𝑁a, then for every load operation (b = a.f in line 11) that may load the (unwrapped) objects (which are stored in a field of an object pointed to by a) to variable b, we add an edge from node 𝑁a to node 𝑁b. This allows us to model unwrapped flow, as defined in Definition 3.5 and illustrated in Section 3.3. The most intricate part of the algorithm is lines 13–15, which handle wrapped flows. If node 𝑁 in line 10 is a variable node 𝑁a, then for every store operation (b.f = a in line 13) that can store the objects (pointed to by a) in wrapper objects o pointed to by b (line 14), we add an edge from ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
1:14 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis OFG PFG C1.elem item Itr.next itr C2.elem added for wrapped flow Fig.8.A partial PFG for class Collection in Figure 4(wrapped flow).C1,C2,and Itr denote the objects of classes Collection and Iterator allocated in lines 24,30,and 8 in Figure 4,respectively. node Na to Nol.Here we use the notation [o]to denote the variable that the abstract object o was originally assigned to when created:for example,if o is created at a statement v=new ..then [o]is the variable v.These added edges enable tracking wrapped flow as defined in Definition 3.4 and illustrated in Section 3.2.As an example,for the object wrapping this.next obj of line 15 in Figure 4,pt(this)contains an abstract object created at itr new Iterator(e)in line 8,so we add an edge from obj to itr. Note that if(in line 15)instead of adding an edge from Na to Nol we had added an edge from Na to No(mirroring the handling of unwrapped flows),we would miss some flows.Conceptually, according to Definition 3.4,modeling wrapped flow requires tracking the wrapper objects(from where they are created)rather than the variable b in the store operation b.f a(line 13).For example,in the case of Figure 4,consider the store operation this.next obj(line 15)where this(line 15)and itr (line 8)both point to the Iterator object created in line 8.If we added an edge from node Nobj to node Nthis(rather than Nobj to Nitr),the flow tracking from Nthis would not lead to the return statement(line 9)in the OuT method,because the wrapped flow flows out through node Nitr in this case.However,it is safe to add an edge to node Nitr instead(as we do in line 15 in Algorithm 1)since the wrapper object is originally assigned to itr,so that the flow of the wrapper object is taken into account as required by Definition 3.4. Through Algorithm 1,we can see that wrapped and unwrapped flows can be naturally expressed in the PFG by handling the store/load operations(lines 10-15)recursively during the graph traversal. In addition,the newly added edges for wrapped and unwrapped flows build new connections with existing OFG edges that model direct flows.As a result,the generated PFG also naturally expresses combinations of all three kinds of flows. Figure 8 shows a partial PFG example for class Collection from Figure 4.The existing OFG is constructed following the rules in Figure 7.In Figure 8,in the three object field nodes(C1.elem, C2.elem,and Itr.next),the abstract objects respectively denoted by C1,C2,and Itr represent the objects of classes Collection and Iterator.Node obj corresponds to Na in line 15 in Algorithm 1; the edge from node obj to node Itr.next corresponds to the store operation this.next obj in line 15 in Figure 4,and also the store operation b.f a in line 13 in Algorithm 1.According to line 15 in Algorithm 1,an edge that enables tracking the wrapped flow is added in Figure 8 from node obj to node itr,since [Itr]is the variable itr. Graph Reachability on Precision Flow Graphs.We now explain how ZIPPER extracts the precision- critical methods based on the PFGs.Generally,ZIPPER first computes all the nodes that are involved in the three kinds of flows by solving a simple graph reachability problem on the PFG,and then collects the methods that contain the nodes as the precision-critical methods. Given a class c,each flow in the precision-loss patterns corresponds to a path from a parameter node of an IN method of c to a return variable node of an Our method of c in PFGe.Therefore, obtaining the statements that are involved in the flows is equivalent to computing which nodes are reachable from a parameter of an IN method and can also reach a return variable of an OUT ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1:14 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis item C1.elem C2.elem e obj Itr.next itr OFG PFG added for wrapped flow Fig. 8. A partial PFG for class Collection in Figure 4 (wrapped flow). C1, C2, and Itr denote the objects of classes Collection and Iterator allocated in lines 24, 30, and 8 in Figure 4, respectively. node 𝑁a to 𝑁[o] . Here we use the notation [o] to denote the variable that the abstract object o was originally assigned to when created: for example, if o is created at a statement v = new . . . then [o] is the variable v. These added edges enable tracking wrapped flow as defined in Definition 3.4 and illustrated in Section 3.2. As an example, for the object wrapping this.next = obj of line 15 in Figure 4, pt(this) contains an abstract object created at itr = new Iterator(e) in line 8, so we add an edge from obj to itr. Note that if (in line 15) instead of adding an edge from 𝑁a to 𝑁[o] we had added an edge from 𝑁a to 𝑁b (mirroring the handling of unwrapped flows), we would miss some flows. Conceptually, according to Definition 3.4, modeling wrapped flow requires tracking the wrapper objects (from where they are created) rather than the variable b in the store operation b.f = a (line 13). For example, in the case of Figure 4, consider the store operation this.next = obj (line 15) where this (line 15) and itr (line 8) both point to the Iterator object created in line 8. If we added an edge from node 𝑁obj to node 𝑁this (rather than 𝑁obj to 𝑁itr), the flow tracking from 𝑁this would not lead to the return statement (line 9) in the Out method, because the wrapped flow flows out through node 𝑁itr in this case. However, it is safe to add an edge to node 𝑁itr instead (as we do in line 15 in Algorithm 1) since the wrapper object is originally assigned to itr, so that the flow of the wrapper object is taken into account as required by Definition 3.4. Through Algorithm 1, we can see that wrapped and unwrapped flows can be naturally expressed in the PFG by handling the store/load operations (lines 10–15) recursively during the graph traversal. In addition, the newly added edges for wrapped and unwrapped flows build new connections with existing OFG edges that model direct flows. As a result, the generated PFG also naturally expresses combinations of all three kinds of flows. Figure 8 shows a partial PFG example for class Collection from Figure 4. The existing OFG is constructed following the rules in Figure 7. In Figure 8, in the three object field nodes (C1.elem, C2.elem, and Itr.next), the abstract objects respectively denoted by C1, C2, and Itr represent the objects of classes Collection and Iterator. Node obj corresponds to 𝑁a in line 15 in Algorithm 1; the edge from node obj to node Itr.next corresponds to the store operation this.next = obj in line 15 in Figure 4, and also the store operation b.f = a in line 13 in Algorithm 1. According to line 15 in Algorithm 1, an edge that enables tracking the wrapped flow is added in Figure 8 from node obj to node itr, since [Itr] is the variable itr. Graph Reachability on Precision Flow Graphs. We now explain how Zipper extracts the precisioncritical methods based on the PFGs. Generally, Zipper first computes all the nodes that are involved in the three kinds of flows by solving a simple graph reachability problem on the PFG, and then collects the methods that contain the nodes as the precision-critical methods. Given a class 𝑐, each flow in the precision-loss patterns corresponds to a path from a parameter node of an In method of 𝑐 to a return variable node of an Out method of 𝑐 in PFG𝑐 . Therefore, obtaining the statements that are involved in the flows is equivalent to computing which nodes are reachable from a parameter of an In method and can also reach a return variable of an Out ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020