cm acm sable Precision-Guided Context Sensitivity for Pointer Analysis YUE LI,Aarhus University,Denmark TIAN TAN,Aarhus University,Denmark ANDERS MOLLER,Aarhus University,Denmark YANNIS SMARAGDAKIS,University of Athens,Greece Context sensitivity is an essential technique for ensuring high precision in Java pointer analyses.It has been observed that applying context sensitivity partially,only on a select subset of the methods,can improve the balance between analysis precision and speed.However,existing techniques are based on heuristics that do not provide much insight into what characterizes this method subset.In this work,we present a more principled approach for identifying precision-critical methods,based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis.Accordingly,we provide an efficient algorithm to recognize these flow patterns in a given program and exploit them to yield good tradeoffs between analysis precision and speed. Our experimental results on standard benchmark and real-world programs show that a pointer analysis that applies context sensitivity partially,only on the identified precision-critical methods,preserves effectively all (98.8%)of the precision of a highly-precise conventional context-sensitive pointer analysis(2-object-sensitive with a context-sensitive heap),with a substantial speedup (on average 3.4X,and up to 9.2X). CCS Concepts:.Theory of computation-Program analysis; Additional Key Words and Phrases:static analysis,points-to analysis,Java ACM Reference Format: Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis.2018.Precision-Guided Context Sensitivity for Pointer Analysis.Proc.ACM Program.Lang.2,OOPSLA,Article 141(November 2018),29 pages.https://doi. org/10.1145/3276511 1 INTRODUCTION Pointer analysis is a fundamental family of static analyses that estimate the possible values of 14 pointer variables in a program.Such information is essential for reasoning about aliasing and inter-procedural control flow in object-oriented programs,and it is used in a wide range of software engineering tools,e.g.,for bug detection [Chandra et al.2009;Naik et al.2006,2009],security analysis [Arzt et al.2014;Grech and Smaragdakis 2017;Livshits and Lam 2005].program verifica- tion [Fink et al.2008;Pradel et al.2012],and program debugging and understanding [Li et al.2016; Sridharan et al.2007]. For decades,numerous analysis techniques have been developed to make pointer analysis more precise and more efficient,especially for object-oriented languages [Hind 2001;Smaragdakis and Balatsouras 2015;Sridharan et al.2013].One of the most successful ideas for producing high precision is context sensitivity [Milanova et al.2002,2005;Sharir and Pnueli 1981;Shivers 1991; Smaragdakis et al.2011],which allows each program method to be analyzed under different contexts, to separate the static abstractions of different dynamic instantiations of the method's variables and Authors'email addresses:yueli@cs.au.dk,tiantan@cs.au.dk,amoeller@cs.au dk,smaragd@diuoagr. This work is licensed under a Creative Commons Attribution 4.0 International License. 2018 Copyright held by the owner/author(s). 2475-1421/2018/11-ART141 https:/doi.org10.1145/3276511 Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
141 Precision-Guided Context Sensitivity for Pointer Analysis YUE LI, Aarhus University, Denmark TIAN TAN, Aarhus University, Denmark ANDERS MéLLER, Aarhus University, Denmark YANNIS SMARAGDAKIS, University of Athens, Greece Context sensitivity is an essential technique for ensuring high precision in Java pointer analyses. It has been observed that applying context sensitivity partially, only on a select subset of the methods, can improve the balance between analysis precision and speed. However, existing techniques are based on heuristics that do not provide much insight into what characterizes this method subset. In this work, we present a more principled approach for identifying precision-critical methods, based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis. Accordingly, we provide an efficient algorithm to recognize these flow patterns in a given program and exploit them to yield good tradeoffs between analysis precision and speed. Our experimental results on standard benchmark and real-world programs show that a pointer analysis that applies context sensitivity partially, only on the identified precision-critical methods, preserves effectively all (98.8%) of the precision of a highly-precise conventional context-sensitive pointer analysis (2-object-sensitive with a context-sensitive heap), with a substantial speedup (on average 3.4X, and up to 9.2X). CCS Concepts: • Theory of computation → Program analysis; Additional Key Words and Phrases: static analysis, points-to analysis, Java ACM Reference Format: Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis. 2018. Precision-Guided Context Sensitivity for Pointer Analysis. Proc. ACM Program. Lang. 2, OOPSLA, Article 141 (November 2018), 29 pages. https://doi. org/10.1145/3276511 1 INTRODUCTION Pointer analysis is a fundamental family of static analyses that estimate the possible values of pointer variables in a program. Such information is essential for reasoning about aliasing and inter-procedural control flow in object-oriented programs, and it is used in a wide range of software engineering tools, e.g., for bug detection [Chandra et al. 2009; Naik et al. 2006, 2009], security analysis [Arzt et al. 2014; Grech and Smaragdakis 2017; Livshits and Lam 2005], program verification [Fink et al. 2008; Pradel et al. 2012], and program debugging and understanding [Li et al. 2016; Sridharan et al. 2007]. For decades, numerous analysis techniques have been developed to make pointer analysis more precise and more efficient, especially for object-oriented languages [Hind 2001; Smaragdakis and Balatsouras 2015; Sridharan et al. 2013]. One of the most successful ideas for producing high precision is context sensitivity [Milanova et al. 2002, 2005; Sharir and Pnueli 1981; Shivers 1991; Smaragdakis et al. 2011], which allows each program method to be analyzed under different contexts, to separate the static abstractions of different dynamic instantiations of the method’s variables and Authors’ email addresses: yueli@cs.au.dk, tiantan@cs.au.dk, amoeller@cs.au.dk, smaragd@di.uoa.gr. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2018 Copyright held by the owner/author(s). 2475-1421/2018/11-ART141 https://doi.org/10.1145/3276511 Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018. This work is licensed under a Creative Commons Attribution 4.0 International License
141:2 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis thereby reduce spurious object flows.However,despite great precision benefits,context sensitivity comes with heavy efficiency costs [Kastrinis and Smaragdakis 2013;Lhotak and Hendren 2006; Oh et al.2014;Tan et al.2016,2017;Xu and Rountev 2008].One reason is that,with conventional context-sensitivity techniques,every method in a program is treated the same,meaning that many methods that do not benefit from context sensitivity are analyzed for multiple contexts redundantly. As a consequence,too much space and time is consumed [Smaragdakis et al.2014]. This naturally raises the question of whether it is possible to apply context sensitivity selectively, only for the methods where it is beneficial to the overall analysis precision.It is far from trivial to determine when a context-sensitive analysis will yield precision benefits(or conversely,to determine when omitting context sensitivity for a method would introduce imprecision).This challenge of effectively identifying the precision-critical methods has been the focus of past work [Hassanshahi et al.2017;Jeong et al.2017;Smaragdakis et al.2014;Wei and Ryder 2015].Those techniques are based on heuristics that seem to correlate with imprecision,but they do not provide a comprehensive understanding of how and where the imprecision is introduced in a context-insensitive pointer analysis.For example,introspective analysis [Smaragdakis et al.2014]requires tuning multiple parameters involving sizes of various kinds of points-to sets,and data-driven analysis [Jeong et al. 2017]is parameterized by a collection of syntactic features and relies on machine learning for selecting good heuristics. In this paper,we provide a more principled approach,named ZIPPER,to efficiently identify precision-critical methods,based on insights about how imprecision is introduced.The key observa- tion is that most cases in which imprecision arises in a context-insensitive pointer analysis fit into three general patterns of value flows,which we call direct,wrapped,and unwrapped flows.Moreover, we show that these three kinds of value flows can be recognized efficiently.Based on information obtained from a fast,context-insentive pointer analysis,ZIPPER constructs a precision flow graph (PFG)that concisely models the relevant value flow.The identification of precision-critical methods can then be formulated as a graph reachability problem on the PFG and solved in negligible time compared to the pointer analysis itself.By applying context sensitivity to the precision-critical methods identified by ZIPPER,a pointer analysis runs significantly faster than conventional tech- niques that apply context sensitivity indiscriminately to all methods,while retaining most of the precision. In summary,we make the following contributions. We describe three fundamental patterns of value flows that help in explaining how and where most of the imprecision is introduced in a context-insensitive pointer analysis(Section 2). We present the ZIPPER approach to effectively recognize the three value-flow patterns and thereby identify the precision-critical methods that benefit from context sensitivity(Section 3). ZIPPER can guide context-sensitive pointer analysis to run faster while keeping most of its precision.In contrast to other techniques that apply context sensitivity selectively,the ZIPER approach is based on a tangible understanding of imprecision and not on heuristics that require non-transparent machine learning or other tuning of analysis parameters. We provide an extensive experimental evaluation of our implementation of ZIPPER to evaluate its effectiveness(Section 4).On average,ZIPPER reports that only 38%of the methods are precision-critical,which preserves 98.8%of the precision(measured as average across a range of popular analysis clients)for a 2-object-sensitive pointer analysis with a context-sensitive heap,for a speedup of 3.4X and up to 9.2X.These results demonstrate that the three patterns of value flows indeed capture the vast majority of methods that benefit from context sensitivity. Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
141:2 Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis thereby reduce spurious object flows. However, despite great precision benefits, context sensitivity comes with heavy efficiency costs [Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Oh et al. 2014; Tan et al. 2016, 2017; Xu and Rountev 2008]. One reason is that, with conventional context-sensitivity techniques, every method in a program is treated the same, meaning that many methods that do not benefit from context sensitivity are analyzed for multiple contexts redundantly. As a consequence, too much space and time is consumed [Smaragdakis et al. 2014]. This naturally raises the question of whether it is possible to apply context sensitivity selectively, only for the methods where it is beneficial to the overall analysis precision. It is far from trivial to determine when a context-sensitive analysis will yield precision benefits (or conversely, to determine when omitting context sensitivity for a method would introduce imprecision). This challenge of effectively identifying the precision-critical methods has been the focus of past work [Hassanshahi et al. 2017; Jeong et al. 2017; Smaragdakis et al. 2014; Wei and Ryder 2015]. Those techniques are based on heuristics that seem to correlate with imprecision, but they do not provide a comprehensive understanding of how and where the imprecision is introduced in a context-insensitive pointer analysis. For example, introspective analysis [Smaragdakis et al. 2014] requires tuning multiple parameters involving sizes of various kinds of points-to sets, and data-driven analysis [Jeong et al. 2017] is parameterized by a collection of syntactic features and relies on machine learning for selecting good heuristics. In this paper, we provide a more principled approach, named Zipper, to efficiently identify precision-critical methods, based on insights about how imprecision is introduced. The key observation is that most cases in which imprecision arises in a context-insensitive pointer analysis fit into three general patterns of value flows, which we call direct, wrapped, and unwrapped flows. Moreover, we show that these three kinds of value flows can be recognized efficiently. Based on information obtained from a fast, context-insentive pointer analysis, Zipper constructs a precision flow graph (PFG) that concisely models the relevant value flow. The identification of precision-critical methods can then be formulated as a graph reachability problem on the PFG and solved in negligible time, compared to the pointer analysis itself. By applying context sensitivity to the precision-critical methods identified by Zipper, a pointer analysis runs significantly faster than conventional techniques that apply context sensitivity indiscriminately to all methods, while retaining most of the precision. In summary, we make the following contributions. • We describe three fundamental patterns of value flows that help in explaining how and where most of the imprecision is introduced in a context-insensitive pointer analysis (Section 2). • We present the Zipper approach to effectively recognize the three value-flow patterns and thereby identify the precision-critical methods that benefit from context sensitivity (Section 3). Zipper can guide context-sensitive pointer analysis to run faster while keeping most of its precision. In contrast to other techniques that apply context sensitivity selectively, the Zipper approach is based on a tangible understanding of imprecision and not on heuristics that require non-transparent machine learning or other tuning of analysis parameters. • We provide an extensive experimental evaluation of our implementation of Zipper to evaluate its effectiveness (Section 4). On average, Zipper reports that only 38% of the methods are precision-critical, which preserves 98.8% of the precision (measured as average across a range of popular analysis clients) for a 2-object-sensitive pointer analysis with a context-sensitive heap, for a speedup of 3.4X and up to 9.2X. These results demonstrate that the three patterns of value flows indeed capture the vast majority of methods that benefit from context sensitivity. Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:3 2 CAUSES OF IMPRECISION IN CONTEXT-INSENSITIVE POINTER ANALYSIS Our approach is based on the key insight that most of the precision loss in context-insensitive pointer analysis for Java can be expressed in terms of three basic patterns of value flows,or as combinations of these.We assume the reader is familiar with state-of-the-art context-sensitive pointer analysis techniques,e.g.,as covered in several surveys [Ryder 2003;Smaragdakis and Balatsouras 2015;Sridharan et al.2013],however,the precision loss patterns are independent of the chosen variant of context sensitivity,such as call-site sensitivity [Sharir and Pnueli 1981;Shivers 1991],object sensitivity [Milanova et al.2005],and type sensitivity [Smaragdakis et al.2011].In this section,we introduce the three precision loss patterns and then describe three corresponding concrete examples(Sections 2.1-2.3).This characterization of precision loss provides the conceptual foundation for ZIPPER to identify precision-critical methods as explained in Section 3. A context-insensitive analysis does not distinguish between different calls to a method but merges the incoming abstract values(or points-to sets,in the case of pointer object m(object o){ analysis)[Sharir and Pnueli 1981].Figure 1 shows a simple example. 2 return o; If method m is analyzed context-insensitively,then the two objects 3} are mixed together,so the analysis conservatively concludes that 4 x1 new A(): both x2 and y2 may point to both the A object and the B object. 5x2=m(x1): In contrast,a context-sensitive analysis would analyze m twice, 6 y1 new B(); 7y2=m(y1): corresponding to the two different call sites,and thereby conclude Fig.1.Example of precision loss that x2 can only point to an A object and y2 can only point to a B in context-insensitive analysis. object.The price of that extra precision is that the method needs to be analyzed multiple times,so context sensitivity should ideally only be applied when the precision gain outweighs the extra analysis time. To characterize the relevant value flows,we first introduce some terminology. Definition 2.1(IN and Our methods).Given a class C and a method M that is declared in C or inherited from C's super-classes,if M contains one or more parameters then M is an IN method of C,and if M's return type is non-void then M is an Our method of C.(In the example in Figure 1,m is both an IN and an Our method of the surrounding class.) Definition 2.2(Object wrapping and unwrapping).If an object O is stored in a field of an object W (or in an array entry of W,in case W is an array),then O is wrapped into W.Conversely,if an object O is loaded from a field of an object W(or from an array entry of W in case W is an array),then O is unwrapped from W.(The simple example in Figure 1 contains no wrapping or unwrapping.) With these definitions in place,we can describe the three precision-loss patterns as different kinds of value flows,depicted in Figure 2. Definition 2.3(Direct flow).If,in some execution of the program,an object O is passed as a parameter to an IN method M of class C,and then flows(via a series of assignments,field Direct Wrapped Unwrapped Flow Flow Flow IN Method Objects Value flow via assignments, field load/store operations, OUT Method or method calls/returns Fig.2.Three basic patterns of value flow that cause precision loss in context-insensitive analysis. Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:3 2 CAUSES OF IMPRECISION IN CONTEXT-INSENSITIVE POINTER ANALYSIS Our approach is based on the key insight that most of the precision loss in context-insensitive pointer analysis for Java can be expressed in terms of three basic patterns of value flows, or as combinations of these. We assume the reader is familiar with state-of-the-art context-sensitive pointer analysis techniques, e.g., as covered in several surveys [Ryder 2003; Smaragdakis and Balatsouras 2015; Sridharan et al. 2013], however, the precision loss patterns are independent of the chosen variant of context sensitivity, such as call-site sensitivity [Sharir and Pnueli 1981; Shivers 1991], object sensitivity [Milanova et al. 2005], and type sensitivity [Smaragdakis et al. 2011]. In this section, we introduce the three precision loss patterns and then describe three corresponding concrete examples (Sections 2.1ś2.3). This characterization of precision loss provides the conceptual foundation for Zipper to identify precision-critical methods as explained in Section 3. A context-insensitive analysis does not distinguish between different calls to a method but merges 1 2 3 4 5 6 7 Object m(Object o){ return o; } x1 = new A(); x2 = m(x1); y1 = new B(); y2 = m(y1); Fig. 1. Example of precision loss in context-insensitive analysis. the incoming abstract values (or points-to sets, in the case of pointer analysis) [Sharir and Pnueli 1981]. Figure 1 shows a simple example. If method m is analyzed context-insensitively, then the two objects are mixed together, so the analysis conservatively concludes that both x2 and y2 may point to both the A object and the B object. In contrast, a context-sensitive analysis would analyze m twice, corresponding to the two different call sites, and thereby conclude that x2 can only point to an A object and y2 can only point to a B object. The price of that extra precision is that the method needs to be analyzed multiple times, so context sensitivity should ideally only be applied when the precision gain outweighs the extra analysis time. To characterize the relevant value flows, we first introduce some terminology. Definition 2.1 (In and Out methods). Given a class C and a method M that is declared in C or inherited from C’s super-classes, if M contains one or more parameters then M is an In method of C, and if M’s return type is non-void then M is an Out method of C. (In the example in Figure 1, m is both an In and an Out method of the surrounding class.) Definition 2.2 (Object wrapping and unwrapping). If an object O is stored in a field of an object W (or in an array entry ofW , in caseW is an array), then O is wrapped into W . Conversely, if an object O is loaded from a field of an object W (or from an array entry of W in case W is an array), then O is unwrapped from W . (The simple example in Figure 1 contains no wrapping or unwrapping.) With these definitions in place, we can describe the three precision-loss patterns as different kinds of value flows, depicted in Figure 2. Definition 2.3 (Direct flow). If, in some execution of the program, an object O is passed as a parameter to an In method M1 of class C, and then flows (via a series of assignments, field Direct Flow Wrapped Flow Unwrapped Flow IN Method OUT Method Value flow via assignments, field load/store operations, or method calls/returns Objects Fig. 2. Three basic patterns of value flow that cause precision loss in context-insensitive analysis. Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
141:4 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis Direct Flow 1 class Person 15} O ② String name;String id; 16 /Usage Code (19)name1 name2 (24) void setName(String nm){ 17 void main(){ 4 this.name nm: 18 Person p1 new Person(); (3) updateID(): 19 String name1 new String("A"): this.name (4) 6 20 p1.setName(name1): 7 void updateID(){ 21 String id1 p1.getID(): newName (8) 8 String newName this.name: 22 9 this.id newName: 23 Person p2 new Person(); this.id (9) 101 24 String name2 new String("B"): 11 String getID(){ 25 p2.setName(name2): id (12) 12 String id this.id: 26 String id2 p2.getID(): 13 return id: 27 (21)1d1 1d2(26) 14 ①② ①② Fig.3.Example of direct flow.(The line number for each variable/field reference on the right-hand side is shown in parentheses.) load/store operations,method calls,or returns)to the return value of an Our method,M2,of the same class C,then we say the program has direct flow from Mi to M2.(The example in Figure 1 is a simple instance of this pattern.) Definition 2.4(Wrapped flow).If,in some execution of the program,an object O is passed as a parameter to an IN method M of class C and then flows to a store operation that wraps O into an object W,where W subsequently flows to the result of an Our method,M2,of the same class C,then we say the program has wrapped flow from M to M2.More generally,the wrapped flow pattern also covers value flow through multiple object wrapping steps,for example when W is itself wrapped into another object W',which flows to the return value of M2. Definition 2.5(Unwrapped flow).If,in some execution of the program,an object O is passed as a parameter to an IN method M of class C and then flows to a load operation that unwraps an object U from O,where U subsequently flows to the return value of an Our method,M2,of the same class C,then we say the program has unwrapped flow from M to M2.As in the previous definition, unwrapped flow also covers value flow through multiple object unwrapping steps. 2.1 Pattern 1:Direct Flow The setter and getter example shown in Figure 3 demonstrates how direct flow is an indication of precision loss for a context-insensitive analysis.The Person class provides methods setName and getID to modify a person's name and retrieve his or her ID.Whenever a person's name is modified, the ID is updated accordingly (line 5). After executing this code,id1 in line 21(resp.id2 in line 26)points to object 1in line 19(resp.2 in line 24)only.However,if the three methods of Person are analyzed using a context-insensitive pointer analysis,then id1 and id2 will both imprecisely point to objects 1 and 2.Let us examine how this imprecision is connected to the direct flow pattern. The right-hand side of Figure 3 illustrates how two objects1and 2,respectively pointed to by namel and name2,first flow from their creation sites in lines 19 and 24 to the parameter nm of the IN method setName in line 3,and then to id in line 12 through a series of store and load operations (line 4-line 8-line 9line 12),and finally out of the Our method getID to id1 and id2 in lines 21 and 26.Hence,by Definition 2.3,the red arrows in Figure 3 form a direct flow. Notice that with a context-insensitive analysis,objects 1and 2are merged in the same points-to set and further propagated according to this direct flow.In the analysis,the merged objects will Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
141:4 Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis Direct Flow name1 1 2 1 2 class Person { 1 String name; String id; void setName(String nm) { this.name = nm; updateID(); } void updateID() { String newName = this.name; this.id = newName; } String getID() { String id = this.id; return id; } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 } // Usage Code void main() { Person p1 = new Person(); String name1 = new String("A"); p1.setName(name1); String id1 = p1.getID(); Person p2 = new Person(); String name2 = new String("B"); p2.setName(name2); String id2 = p2.getID(); } 15 16 17 18 19 20 21 22 23 24 25 26 27 name2 nm this.name newName this.id id id1 id2 2 (24) (3) (4) (8) (9) (12) (21) (26) (19) Fig. 3. Example of direct flow. (The line number for each variable/field reference on the right-hand side is shown in parentheses.) load/store operations, method calls, or returns) to the return value of an Out method, M2, of the same class C, then we say the program has direct flow from M1 to M2. (The example in Figure 1 is a simple instance of this pattern.) Definition 2.4 (Wrapped flow). If, in some execution of the program, an object O is passed as a parameter to an In method M1 of class C and then flows to a store operation that wraps O into an object W , where W subsequently flows to the result of an Out method, M2, of the same class C, then we say the program has wrapped flow from M1 to M2. More generally, the wrapped flow pattern also covers value flow through multiple object wrapping steps, for example when W is itself wrapped into another object W ′ , which flows to the return value of M2. Definition 2.5 (Unwrapped flow). If, in some execution of the program, an object O is passed as a parameter to an In method M1 of class C and then flows to a load operation that unwraps an object U from O, where U subsequently flows to the return value of an Out method, M2, of the same class C, then we say the program has unwrapped flow from M1 to M2. As in the previous definition, unwrapped flow also covers value flow through multiple object unwrapping steps. 2.1 Pattern 1: Direct Flow The setter and getter example shown in Figure 3 demonstrates how direct flow is an indication of precision loss for a context-insensitive analysis. The Person class provides methods setName and getID to modify a person’s name and retrieve his or her ID. Whenever a person’s name is modified, the ID is updated accordingly (line 5). After executing this code, id1 in line 21 (resp. id2 in line 26) points to object 1 in line 19 (resp. 2 in line 24) only. However, if the three methods of Person are analyzed using a context-insensitive pointer analysis, then id1 and id2 will both imprecisely point to objects 1 and 2 . Let us examine how this imprecision is connected to the direct flow pattern. The right-hand side of Figure 3 illustrates how two objects 1 and 2 , respectively pointed to by name1 and name2, first flow from their creation sites in lines 19 and 24 to the parameter nm of the In method setName in line 3, and then to id in line 12 through a series of store and load operations (line 4 → line 8 → line 9 → line 12), and finally out of the Out method getID to id1 and id2 in lines 21 and 26. Hence, by Definition 2.3, the red arrows in Figure 3 form a direct flow. Notice that with a context-insensitive analysis, objects 1 and 2 are merged in the same points-to set and further propagated according to this direct flow. In the analysis, the merged objects will Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:5 Wrapped Flow 1 class Collection{ 18 Object next(){ ① (② 2 Object elem; 19 return this.next: (25)s1 s2(31) 3 void add(Object el){ 20 this.elem el: 21} (3) 22 //Usage Code Iterator iterator(){ 23 void main(){ this.elem (4) 7 Object e this.elem: 24 Collection c1 new Collection(): 8 Iterator itr 25 String s1 new String("A"): (7) new Iterator(e); 26 c1.add(s1): 9 return itr; 27 Iterator i1 c1.iterator(): (8)itr obi (14) 10 } 28 Object o1 i1.next(); 29 this next(③X②) 11} (15) 12 class Iterator 30 Collection c2 new Collection(): 13 Object next: 1 String s2 new String("B"): 17 (27)(33) 14 Iterator(Object obj) 32 c2.add(s2): ①回0回 1 this.next obj: Iterator 12 c2.iterator(); 16 34 Object o2 12.next(); 01 (28)(34) 17 35 ①@①② Fig.4.Example of wrapped flow. flow out of the Our method,causing id1 and id2 to point to spurious objects.Such imprecision will only get worse when some operations are further applied on idl and id2(not shown in this example),possibly polluting other parts of the program. One way to avoid the imprecision is to apply context sensitivity to the methods that participate in the direct flow.We consider these to be precision-critical methods,since analyzing just one of them context-insensitively will likely introduce imprecision.With a context-sensitive analysis(for most variants of context sensitivity),in Figure 3,all variables and field references along the direct flow will be analyzed separately.For example,object sensitivity will use the two allocation sites at lines 18 and 23 as contexts.Accordingly,the merged paths along this direct flow are separated by the two contexts,like unzipping a zipper-hence the name of our technique.A similar strategy of separating merged paths also applies to wrapped and unwrapped flows,as shown next. 2.2 Pattern 2:Wrapped Flow The collection and iterator example shown in Figure 4 demonstrates how the wrapped flow pattern yields precision loss for a context-insensitive analysis.To keep the example simple,the collection only stores one element,however the code pattern is directly analogous to realistic code,for arbitrarily-sized collections.Class Collection provides an add method to add an element to the collection and an iterator method to return an iterator that has a pointer,next,pointing to the collection element(as set in line 15).The element is passed as an argument to the newly created iterator(line 8),which establishes a connection between the collection and its iterator.Two objects 1(line 25)and (line 31)are stored in two different collections,c1(line 26)and c2 (line 32).The two objects are then accessed by the iterators of the collections(lines 28 and 34). After executing the code,o1 in line 28 (resp.o2 in line 34)points to object1(resp.only.How- ever,if any of the four methods of Collection and Iterator are analyzed context-insensitively,o1 and o2 will both imprecisely point to both objects 1and Let us examine how this imprecision is connected to the wrapped flow pattern. As shown on the right-hand side of Figure 4,similarly to the direct flow example in Figure 3, objects 1 and 2 flow into the IN method add of class Collection,and then further to lines 7, 8,and 14.Unlike a direct flow,the objects 1 and 2 do not directly flow out of the Our method Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:5 s1 s2 o1 o2 itr this.elem el Wrapped Flow 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 Object next() { return this.next; } } //Usage Code void main() { Collection c1 = new Collection(); String s1 = new String("A"); c1.add(s1); Iterator i1 = c1.iterator(); Object o1 = i1.next(); Collection c2 = new Collection(); String s2 = new String("B"); c2.add(s2); Iterator i2 = c2.iterator(); Object o2 = i2.next(); } class Collection { Object elem; void add(Object el) { this.elem = el; } Iterator iterator(){ Object e = this.elem; Iterator itr = new Iterator(e); return itr; } } class Iterator { Object next; Iterator(Object obj) { this.next = obj; } e obj this.next 1 2 i1 i2 1 2 1 2 1 2 1 2 1 2 (25) (31) (3) (4) (7) (14) (15) (33) (34) (8) (27) (28) Fig. 4. Example of wrapped flow. flow out of the Out method, causing id1 and id2 to point to spurious objects. Such imprecision will only get worse when some operations are further applied on id1 and id2 (not shown in this example), possibly polluting other parts of the program. One way to avoid the imprecision is to apply context sensitivity to the methods that participate in the direct flow. We consider these to be precision-critical methods, since analyzing just one of them context-insensitively will likely introduce imprecision. With a context-sensitive analysis (for most variants of context sensitivity), in Figure 3, all variables and field references along the direct flow will be analyzed separately. For example, object sensitivity will use the two allocation sites at lines 18 and 23 as contexts. Accordingly, the merged paths along this direct flow are separated by the two contexts, like unzipping a zipperÐhence the name of our technique. A similar strategy of separating merged paths also applies to wrapped and unwrapped flows, as shown next. 2.2 Pattern 2: Wrapped Flow The collection and iterator example shown in Figure 4 demonstrates how the wrapped flow pattern yields precision loss for a context-insensitive analysis. To keep the example simple, the collection only stores one element, however the code pattern is directly analogous to realistic code, for arbitrarily-sized collections. Class Collection provides an add method to add an element to the collection and an iterator method to return an iterator that has a pointer, next, pointing to the collection element (as set in line 15). The element is passed as an argument to the newly created iterator (line 8), which establishes a connection between the collection and its iterator. Two objects 1 (line 25) and 2 (line 31) are stored in two different collections, c1 (line 26) and c2 (line 32). The two objects are then accessed by the iterators of the collections (lines 28 and 34). After executing the code, o1 in line 28 (resp. o2 in line 34) points to object 1 (resp. 2 ) only. However, if any of the four methods of Collection and Iterator are analyzed context-insensitively, o1 and o2 will both imprecisely point to both objects 1 and 2 . Let us examine how this imprecision is connected to the wrapped flow pattern. As shown on the right-hand side of Figure 4, similarly to the direct flow example in Figure 3, objects 1 and 2 flow into the In method add of class Collection, and then further to lines 7, 8, and 14. Unlike a direct flow, the objects 1 and 2 do not directly flow out of the Out method Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018