A Principled Approach to Selective Context Sensitivity for Pointer Analysis YUE LI,Nanjing University,China and Aarhus University,Denmark TIAN TAN",Nanjing University,China and Aarhus University,Denmark ANDERS MOLLER,Aarhus University,Denmark YANNIS SMARAGDAKIS,University of Athens,Greece Context sensitivity is an essential technique for ensuring high precision in static analyses.It has been observed that applying context sensitivity partially,only on a select subset of the methods,can improve the balance between analysis precision and speed.However,existing techniques are based on heuristics that do not provide much insight into what characterizes this method subset.In this work,we present a more principled approach for identifying precision-critical methods,based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis.Using this theoretical foundation, we present an efficient algorithm,ZIPPER,to recognize these flow patterns in a given program and employ context sensitivity accordingly.We also present a variant,ZIPER,that additionally takes into account which methods are disproportionally costly to analyze with context sensitivity. Our experimental results on standard benchmark and real-world Java programs show that ZIPPER preserves effectively all of the precision(98.8%)of a highly-precise conventional context-sensitive pointer analysis (2-object-sensitive with a context-sensitive heap,2obj for short),with a substantial speedup(on average, 3.4x and up to 9.4x),and that ZIPPERe preserves 94.7%of the precision of 2obj,with an order-of-magnitude speedup (on average,25.5x and up to 88x).In addition,for 10 programs that cannot be analyzed by 2obj within a 3 hours time limit,on average ZIPPER can guide 2obj to finish analyzing them in less than 11 minutes with high precision compared to context-insensitive and introspective context-sensitive analyses. CCS Concepts:.Theory of computation-Program analysis. Additional Key Words and Phrases:static analysis,points-to analysis,Java ACM Reference Format: Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis.2020.A Principled Approach to Selective Context Sensitivity for Pointer Analysis.ACM Trans.Program.Lang.Syst.1,1,Article 1 (January 2020),40 pages. https:/doi.org/10.1145/3381915 1 INTRODUCTION Pointer analysis is a fundamental family of static analyses that compute abstractions of the possible values of pointer variables in a program.Such information is essential for reasoning about aliasing and inter-procedural control flow in object-oriented programs,and it is used in a wide range of software engineering tools,e.g.,for bug detection [Chandra et al.2009;Naik et al.2006,2009]. security analysis [Arzt et al.2014;Grech and Smaragdakis 2017;Livshits and Lam 2005],program "Corresponding author Authors'email addresses:yueli@nju.edu.cn,tiantan@nju.edu.cn,amoeller@cs.au.dk,smaragd@di.uoagr. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.To copy otherwise,or republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.Request permissions from permissions@acm.org. 2020 Association for Computing Machinery. 0164-0925/2020/1-ART1$15.00 https:/doi.org/10.1145/3381915 ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1 A Principled Approach to Selective Context Sensitivity for Pointer Analysis YUE LI, Nanjing University, China and Aarhus University, Denmark TIAN TAN∗ , Nanjing University, China and Aarhus University, Denmark ANDERS MØLLER, Aarhus University, Denmark YANNIS SMARAGDAKIS, University of Athens, Greece Context sensitivity is an essential technique for ensuring high precision in static analyses. It has been observed that applying context sensitivity partially, only on a select subset of the methods, can improve the balance between analysis precision and speed. However, existing techniques are based on heuristics that do not provide much insight into what characterizes this method subset. In this work, we present a more principled approach for identifying precision-critical methods, based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis. Using this theoretical foundation, we present an efficient algorithm, Zipper, to recognize these flow patterns in a given program and employ context sensitivity accordingly. We also present a variant, Zipper𝑒 , that additionally takes into account which methods are disproportionally costly to analyze with context sensitivity. Our experimental results on standard benchmark and real-world Java programs show that Zipper preserves effectively all of the precision (98.8%) of a highly-precise conventional context-sensitive pointer analysis (2-object-sensitive with a context-sensitive heap, 2obj for short), with a substantial speedup (on average, 3.4× and up to 9.4×), and that Zipper𝑒 preserves 94.7% of the precision of 2obj, with an order-of-magnitude speedup (on average, 25.5× and up to 88×). In addition, for 10 programs that cannot be analyzed by 2obj within a 3 hours time limit, on average Zipper𝑒 can guide 2obj to finish analyzing them in less than 11 minutes with high precision compared to context-insensitive and introspective context-sensitive analyses. CCS Concepts: • Theory of computation → Program analysis. Additional Key Words and Phrases: static analysis, points-to analysis, Java ACM Reference Format: Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2020. A Principled Approach to Selective Context Sensitivity for Pointer Analysis. ACM Trans. Program. Lang. Syst. 1, 1, Article 1 (January 2020), 40 pages. https://doi.org/10.1145/3381915 1 INTRODUCTION Pointer analysis is a fundamental family of static analyses that compute abstractions of the possible values of pointer variables in a program. Such information is essential for reasoning about aliasing and inter-procedural control flow in object-oriented programs, and it is used in a wide range of software engineering tools, e.g., for bug detection [Chandra et al. 2009; Naik et al. 2006, 2009], security analysis [Arzt et al. 2014; Grech and Smaragdakis 2017; Livshits and Lam 2005], program ∗Corresponding author Authors’ email addresses: yueli@nju.edu.cn, tiantan@nju.edu.cn, amoeller@cs.au.dk, smaragd@di.uoa.gr. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2020 Association for Computing Machinery. 0164-0925/2020/1-ART1 $15.00 https://doi.org/10.1145/3381915 ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
12 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis verification [Fink et al.2008;Pradel et al.2012],and program debugging and understanding [Li et al.2016;Sridharan et al.2007]. For decades,numerous analysis techniques have been developed to make pointer analysis more precise and more efficient,especially for object-oriented languages [Hind 2001;Smaragdakis and Balatsouras 2015;Sridharan et al.2013].One of the most successful ideas for producing high precision is context sensitivity [Milanova et al.2002,2005;Sharir and Pnueli 1981;Shivers 1991; Smaragdakis et al.2011],which allows each program method to be analyzed under different contexts, to separate the static abstractions of different dynamic instantiations of the method's variables and thereby reduce spurious object flows.However,despite great precision benefits,context sensitivity comes with heavy efficiency costs [Kastrinis and Smaragdakis 2013;Lhotak and Hendren 2006; Oh et al.2014;Tan et al.2016,2017;Xu and Rountev 2008].One reason is that,with conventional context-sensitivity techniques,every method in a program is treated the same,meaning that many methods that do not benefit from context sensitivity are analyzed for multiple contexts redundantly. As a consequence,too much space and time is consumed [Smaragdakis et al.2014]. This naturally raises the question of whether it is possible to apply context sensitivity selectively, only for the methods where it is beneficial to the overall analysis precision.It is far from trivial to determine when a context-sensitive analysis will yield precision benefits(or conversely,to determine when omitting context sensitivity for a method would introduce imprecision).This challenge of effectively identifying precision-critical methods has been the focus of past work [Hassanshahi et al. 2017;Jeon et al.2019;Smaragdakis et al.2014;Wei and Ryder 2015].Those techniques are based on heuristics that seem to correlate with imprecision,but they do not provide a comprehensive understanding of how and where the imprecision is introduced in a context-insensitive pointer analysis.For example,introspective analysis [Smaragdakis et al.2014]requires tuning multiple parameters involving sizes of various kinds of points-to sets,and data-driven analysis [Jeong et al. 2017]is parameterized by a collection of syntactic features and relies on machine learning for selecting good heuristics. In this article,we provide a new more principled approach,named ZIPPER,to efficiently identify precision-critical methods,based on insights about how imprecision is introduced.The key observa- tion is that most cases in which imprecision arises in a context-insensitive pointer analysis fit into three general patterns of value flows,which we call direct,wrapped,and unwrapped flows.Moreover, we show that these three kinds of value flows can be recognized efficiently.Based on information obtained from a fast,context-insentive pointer analysis,ZIPPER constructs a precision flow graph (PFG)that concisely models the relevant value flow.The identification of precision-critical methods can then be formulated as a graph reachability problem on the PFG and solved in negligible time, compared to the pointer analysis itself. Additionally,we provide a variant of ZIPPER,named ZIPPERe,which also identifies efficiency- critical methods,i.e.,methods that are costly to analyze with context sensitivity.With ZIPPER, only the methods that are precision-critical but not efficiency-critical will be analyzed context- sensitively.By applying context sensitivity only to the methods selected by ZIPPER or ZIPpERe,a pointer analysis runs significantly faster than conventional techniques that apply context sensitivity indiscriminately to all methods,while retaining most of the attainable precision. In summary,we make the following key contributions. We describe three general patterns of value flow that help in explaining how and where most of the imprecision is introduced in a context-insensitive pointer analysis(Section 3). We present the ZIPPER approach to effectively recognize the three value-flow patterns and thereby identify the precision-critical methods that benefit from context sensitivity(Section 4). ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1:2 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis verification [Fink et al. 2008; Pradel et al. 2012], and program debugging and understanding [Li et al. 2016; Sridharan et al. 2007]. For decades, numerous analysis techniques have been developed to make pointer analysis more precise and more efficient, especially for object-oriented languages [Hind 2001; Smaragdakis and Balatsouras 2015; Sridharan et al. 2013]. One of the most successful ideas for producing high precision is context sensitivity [Milanova et al. 2002, 2005; Sharir and Pnueli 1981; Shivers 1991; Smaragdakis et al. 2011], which allows each program method to be analyzed under different contexts, to separate the static abstractions of different dynamic instantiations of the method’s variables and thereby reduce spurious object flows. However, despite great precision benefits, context sensitivity comes with heavy efficiency costs [Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Oh et al. 2014; Tan et al. 2016, 2017; Xu and Rountev 2008]. One reason is that, with conventional context-sensitivity techniques, every method in a program is treated the same, meaning that many methods that do not benefit from context sensitivity are analyzed for multiple contexts redundantly. As a consequence, too much space and time is consumed [Smaragdakis et al. 2014]. This naturally raises the question of whether it is possible to apply context sensitivity selectively, only for the methods where it is beneficial to the overall analysis precision. It is far from trivial to determine when a context-sensitive analysis will yield precision benefits (or conversely, to determine when omitting context sensitivity for a method would introduce imprecision). This challenge of effectively identifying precision-critical methods has been the focus of past work [Hassanshahi et al. 2017; Jeon et al. 2019; Smaragdakis et al. 2014; Wei and Ryder 2015]. Those techniques are based on heuristics that seem to correlate with imprecision, but they do not provide a comprehensive understanding of how and where the imprecision is introduced in a context-insensitive pointer analysis. For example, introspective analysis [Smaragdakis et al. 2014] requires tuning multiple parameters involving sizes of various kinds of points-to sets, and data-driven analysis [Jeong et al. 2017] is parameterized by a collection of syntactic features and relies on machine learning for selecting good heuristics. In this article, we provide a new more principled approach, named Zipper, to efficiently identify precision-critical methods, based on insights about how imprecision is introduced. The key observation is that most cases in which imprecision arises in a context-insensitive pointer analysis fit into three general patterns of value flows, which we call direct, wrapped, and unwrapped flows. Moreover, we show that these three kinds of value flows can be recognized efficiently. Based on information obtained from a fast, context-insentive pointer analysis, Zipper constructs a precision flow graph (PFG) that concisely models the relevant value flow. The identification of precision-critical methods can then be formulated as a graph reachability problem on the PFG and solved in negligible time, compared to the pointer analysis itself. Additionally, we provide a variant of Zipper, named Zipper𝑒 , which also identifies efficiencycritical methods, i.e., methods that are costly to analyze with context sensitivity. With Zipper𝑒 , only the methods that are precision-critical but not efficiency-critical will be analyzed contextsensitively. By applying context sensitivity only to the methods selected by Zipper or Zipper𝑒 , a pointer analysis runs significantly faster than conventional techniques that apply context sensitivity indiscriminately to all methods, while retaining most of the attainable precision. In summary, we make the following key contributions. • We describe three general patterns of value flow that help in explaining how and where most of the imprecision is introduced in a context-insensitive pointer analysis (Section 3). • We present the Zipper approach to effectively recognize the three value-flow patterns and thereby identify the precision-critical methods that benefit from context sensitivity (Section 4). ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 13 ZIPPER can guide context-sensitive pointer analysis to run faster while keeping most of its precision. We propose the ZIPPER express(ZIPPER)approach,which is developed on the basis of ZIPPER but additionally considers analysis cost when selecting the methods to be analyzed with context sensitivity(Section 5).Generally,ZIPPERe can guide context-sensitive pointer analysis to run extremely fast while being reasonably precise. In contrast to other techniques that apply context sensitivity selectively,the ZIPPER and ZIPPERe approaches are based on a tangible understanding of imprecision and not on heuristics that require non-transparent machine learning or other tuning of multiple and complex analysis parameters. We provide an extensive experimental evaluation to evaluate the effectiveness of ZIPPER and ZIPPER(Section 6). On average,ZIPPER reports that only 38%of the methods are precision-critical,which preserves 98.8%of the precision(measured as average across a range of popular analysis clients)for a 2-object-sensitive pointer analysis with a context-sensitive heap(2obj),for a speedup of 3.4x and up to 9.2x.These results demonstrate that the three general patterns of value flows indeed capture the vast majority of methods that benefit from context sensitivity. On average,ZIPPERe reports that only 14%of the methods are precision-critical but not efficiency-critical,which preserves 94.7%of the precision for 2obj for a speedup of 25.5x and up to 88x.In addition,for 10 programs that 2obj is unable to analyze within 3 hours,on average,ZIPPER can guide 2obj to finish analyzing them under 11 minutes with still good precision.Moreover,for some programs,ZIPPERe-guided pointer analysis can even run faster while being more precise than context-insensitive analysis.These results establish ZIPPERe as a new sweet spot in state-of-the-art pointer analyses,for making highly practical trade-offs between efficiency and precision. The present article extends and supersedes the paper by Li et al.[2018a]presented at the ACM Conference on Object-Oriented Programming,Languages,Systems,and Applications 2018(OOPSLA 2018).In comparison,this article contains the following major extensions(and reflects an updated, more complete understanding throughout the rest of the text): We add a brief literature review of the history and current trends in context-sensitive pointer analysis for Java(Section 2). We present the new pointer analysis variant,ZIPPERe(ZIPPER express),which is extremely fast with also good precision(Section 5). We investigate whether the precision-loss patterns of ZIPPER,as its theoretical foundation, are general enough to effectively identify the precision-critical methods and thus preserve the precision for other mainstream context-sensitivity variants,including call-site sensitivity, type sensitivity and hybrid context sensitivity(Section 6.5). We conduct new experiments and extensively evaluate ZIPPERe in terms of efficiency and precision in practice(Section 6.6). 2 CONTEXT SENSITIVITY:A BRIEF REVIEW In this section,we describe how context-sensitivity has evolved for Java pointer analysis from purely improving precision to making good precision and efficiency trade-offs.In addition to giving a brief review to discuss this important and intricate research topic,we also provide some necessary background for our work.We focus on whole-program analysis,which is the main application setting of this research topic [Smaragdakis and Balatsouras 2015;Sridharan et al.2013]. ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:3 Zipper can guide context-sensitive pointer analysis to run faster while keeping most of its precision. • We propose the Zipper express (Zipper𝑒 ) approach, which is developed on the basis of Zipper but additionally considers analysis cost when selecting the methods to be analyzed with context sensitivity (Section 5). Generally, Zipper𝑒 can guide context-sensitive pointer analysis to run extremely fast while being reasonably precise. In contrast to other techniques that apply context sensitivity selectively, the Zipper and Zipper𝑒 approaches are based on a tangible understanding of imprecision and not on heuristics that require non-transparent machine learning or other tuning of multiple and complex analysis parameters. • We provide an extensive experimental evaluation to evaluate the effectiveness of Zipper and Zipper𝑒 (Section 6). – On average, Zipper reports that only 38% of the methods are precision-critical, which preserves 98.8% of the precision (measured as average across a range of popular analysis clients) for a 2-object-sensitive pointer analysis with a context-sensitive heap (2obj), for a speedup of 3.4× and up to 9.2×. These results demonstrate that the three general patterns of value flows indeed capture the vast majority of methods that benefit from context sensitivity. – On average, Zipper𝑒 reports that only 14% of the methods are precision-critical but not efficiency-critical, which preserves 94.7% of the precision for 2obj for a speedup of 25.5× and up to 88×. In addition, for 10 programs that 2obj is unable to analyze within 3 hours, on average, Zipper𝑒 can guide 2obj to finish analyzing them under 11 minutes with still good precision. Moreover, for some programs, Zipper𝑒 -guided pointer analysis can even run faster while being more precise than context-insensitive analysis. These results establish Zipper𝑒 as a new sweet spot in state-of-the-art pointer analyses, for making highly practical trade-offs between efficiency and precision. The present article extends and supersedes the paper by Li et al. [2018a] presented at the ACM Conference on Object-Oriented Programming, Languages, Systems, and Applications 2018 (OOPSLA 2018). In comparison, this article contains the following major extensions (and reflects an updated, more complete understanding throughout the rest of the text): • We add a brief literature review of the history and current trends in context-sensitive pointer analysis for Java (Section 2). • We present the new pointer analysis variant, Zipper𝑒 (Zipper express), which is extremely fast with also good precision (Section 5). • We investigate whether the precision-loss patterns of Zipper, as its theoretical foundation, are general enough to effectively identify the precision-critical methods and thus preserve the precision for other mainstream context-sensitivity variants, including call-site sensitivity, type sensitivity and hybrid context sensitivity (Section 6.5). • We conduct new experiments and extensively evaluate Zipper𝑒 in terms of efficiency and precision in practice (Section 6.6). 2 CONTEXT SENSITIVITY: A BRIEF REVIEW In this section, we describe how context-sensitivity has evolved for Java pointer analysis from purely improving precision to making good precision and efficiency trade-offs. In addition to giving a brief review to discuss this important and intricate research topic, we also provide some necessary background for our work. We focus on whole-program analysis, which is the main application setting of this research topic [Smaragdakis and Balatsouras 2015; Sridharan et al. 2013]. ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
1:4 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis There are two major factors that control the precision of pointer analysis:how to abstract the heap,and how to abstract the call stack.Compared with the few approaches on heap abstraction for Java pointer analysis [Kanvar and Khedker 2016;Lhotak and Hendren 2003;Tan et al.2017], there is ample literature on stack modeling.Among the schemes for modeling the stack,in the past years,context sensitivity has been considered as the most effective approach to improve the analysis precision for Java programs [Lhotak and Hendren 2006]and has thus attracted the most attention from researchers [Bravenboer and Smaragdakis 2009;Hassanshahi et al.2017;Jeon et al. 2019,2018;Jeong et al.2017;Kastrinis and Smaragdakis 2013;Lhotak and Hendren 2006;Li et al. 2018b;Milanova et al.2002,2005;Smaragdakis and Balatsouras 2015;Smaragdakis et al.2011,2014; Sridharan et al.2013;Tan et al.2016,2017;Thakur and Nandivada 2019;Thiessen and Lhotak 2017]. We summarize the research work on context-sensitive pointer analysis into three categories according to(1)the kinds of context elements,(2)the composition of context elements,and (3)the selection of which parts of a given program to analyze with context sensitivity.Interestingly,the techniques in different categories are orthogonal and can thus be combined for producing more sophisticated pointer analyses. Kinds of Context Elements.Many elements in a program,e.g.,call sites,allocation sites,and types, can be considered as context elements in context sensitivity.The earliest type of context elements for Java pointer analysis inherits from the foundational approaches used both for C and for functional languages [Sharir and Pnueli 1981;Shivers 1991]where context elements are call sites.We refer to this style of context sensitivity as call-site sensitivity or call-string sensitivity [Smaragdakis and Balatsouras 2015].Unlike C programs,which often predominantly manipulate values in the stack. Java programs have their inter-procedural data flow primarily through heap objects.Accordingly. Milanova et al.[2005]proposed object sensitivity that uses allocation sites of receiver objects(the program points where these objects are created)as context elements.Generally,object sensitivity is more precise and efficient than call-site sensitivity,and is considered the most effective context- sensitivity variant for producing good precision for Java.This conclusion has been extensively validated in various pointer analysis frameworks [Kastrinis and Smaragdakis 2013;Lhotak and Hendren 2006;Naik et al.2006;Sridharan et al.2013;Tan et al.2016,2017]. Despite being precise,object sensitivity is difficult to scale for large and complex Java programs, which motivated the concept of type sensitivity [Smaragdakis et al.2011].In type sensitivity,the context elements are the types of the classes containing allocation sites(as the latter would appear in object sensitivity).This more coarse-grained choice of context elements enables type sensitivity to run much faster by sacrificing some precision compared to object sensitivity.Today,call-site, object,and type sensitivity are still considered the three mainstream context-sensitivity variants for Java pointer analysis [Jeong et al.2017;Smaragdakis and Balatsouras 2015;Tan et al.2017]. Composition of Context Elements.With context sensitivity,each method can be analyzed in multiple contexts,where a context consists of a list of context elements that model the run-time call stack.For example,in call-string sensitivity [Sharir and Pnueli 1981;Shivers 1991],a context for a method m is composed by the context elements that are listed as [c1,c2,...],where c is m's call site,cz is the call site of the method that contains c,etc.To ensure termination of the analysis,a k-limiting approach is typically followed:only the most recent k context elements are selected,and, for efficiency,k is usually set to a very small value in practice.The call-string sensitivity approach has been adopted in virtually all context-sensitive pointer analyses [Smaragdakis and Balatsouras 2015;Sridharan et al.2013].However,Tan et al.[2016]found that this conventional approach leads to many context elements that are not useful for improving precision while occupying the limited slots of context elements determined by k.To address this problem,Tan et al.[2016]presented an approach to identify such redundant context elements.By skipping those elements,a resulting ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1:4 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis There are two major factors that control the precision of pointer analysis: how to abstract the heap, and how to abstract the call stack. Compared with the few approaches on heap abstraction for Java pointer analysis [Kanvar and Khedker 2016; Lhoták and Hendren 2003; Tan et al. 2017], there is ample literature on stack modeling. Among the schemes for modeling the stack, in the past years, context sensitivity has been considered as the most effective approach to improve the analysis precision for Java programs [Lhoták and Hendren 2006] and has thus attracted the most attention from researchers [Bravenboer and Smaragdakis 2009; Hassanshahi et al. 2017; Jeon et al. 2019, 2018; Jeong et al. 2017; Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Li et al. 2018b; Milanova et al. 2002, 2005; Smaragdakis and Balatsouras 2015; Smaragdakis et al. 2011, 2014; Sridharan et al. 2013; Tan et al. 2016, 2017; Thakur and Nandivada 2019; Thiessen and Lhoták 2017]. We summarize the research work on context-sensitive pointer analysis into three categories according to (1) the kinds of context elements, (2) the composition of context elements, and (3) the selection of which parts of a given program to analyze with context sensitivity. Interestingly, the techniques in different categories are orthogonal and can thus be combined for producing more sophisticated pointer analyses. Kinds of Context Elements. Many elements in a program, e.g., call sites, allocation sites, and types, can be considered as context elements in context sensitivity. The earliest type of context elements for Java pointer analysis inherits from the foundational approaches used both for C and for functional languages [Sharir and Pnueli 1981; Shivers 1991] where context elements are call sites. We refer to this style of context sensitivity as call-site sensitivity or call-string sensitivity [Smaragdakis and Balatsouras 2015]. Unlike C programs, which often predominantly manipulate values in the stack, Java programs have their inter-procedural data flow primarily through heap objects. Accordingly, Milanova et al. [2005] proposed object sensitivity that uses allocation sites of receiver objects (the program points where these objects are created) as context elements. Generally, object sensitivity is more precise and efficient than call-site sensitivity, and is considered the most effective contextsensitivity variant for producing good precision for Java. This conclusion has been extensively validated in various pointer analysis frameworks [Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Naik et al. 2006; Sridharan et al. 2013; Tan et al. 2016, 2017]. Despite being precise, object sensitivity is difficult to scale for large and complex Java programs, which motivated the concept of type sensitivity [Smaragdakis et al. 2011]. In type sensitivity, the context elements are the types of the classes containing allocation sites (as the latter would appear in object sensitivity). This more coarse-grained choice of context elements enables type sensitivity to run much faster by sacrificing some precision compared to object sensitivity. Today, call-site, object, and type sensitivity are still considered the three mainstream context-sensitivity variants for Java pointer analysis [Jeong et al. 2017; Smaragdakis and Balatsouras 2015; Tan et al. 2017]. Composition of Context Elements. With context sensitivity, each method can be analyzed in multiple contexts, where a context consists of a list of context elements that model the run-time call stack. For example, in call-string sensitivity [Sharir and Pnueli 1981; Shivers 1991], a context for a method 𝑚 is composed by the context elements that are listed as [𝑐1, 𝑐2, . . . ], where 𝑐1 is 𝑚’s call site, 𝑐2 is the call site of the method that contains 𝑐1, etc. To ensure termination of the analysis, a 𝑘-limiting approach is typically followed: only the most recent 𝑘 context elements are selected, and, for efficiency, 𝑘 is usually set to a very small value in practice. The call-string sensitivity approach has been adopted in virtually all context-sensitive pointer analyses [Smaragdakis and Balatsouras 2015; Sridharan et al. 2013]. However, Tan et al. [2016] found that this conventional approach leads to many context elements that are not useful for improving precision while occupying the limited slots of context elements determined by 𝑘. To address this problem, Tan et al. [2016] presented an approach to identify such redundant context elements. By skipping those elements, a resulting ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:5 1 object m(object o){ 2 return o: 3} 4 x1 new A() 5x2=m(x1): 6 y1 new B(); 7y2=m(y1): Fig.1.Example of precision loss in context-insensitive analysis context-sensitive pointer analysis is guaranteed to be at least as precise as(practically,more precise than)the conventional context sensitivity for the same choice of k.Similarly,Jeon et al.[2018] developed a machine-learning approach to select context elements,leading to a context-sensitive pointer analysis that is both reasonably efficient and precise. Selective Use of Context Sensitivity.The conventional approach to context sensitive analysis is to uniformly apply context sensitivity to every method in the given program.However,in recent years,researchers have become aware that for some methods,context sensitivity does not help improve the analysis precision but only introduces extra analysis cost.As a result,many selective context-sensitive pointer analyses have been proposed and contribute to a promising research direction for making practical trade-offs between precision and efficiency [Hassanshahi et al.2017;Jeon et al.2019;Jeong et al.2017;Li et al.2018b;Smaragdakis et al.2014].In these analyses,context sensitivity is applied only for the methods where it is deemed beneficial to the overall analysis precision.However,finding such precision-critical methods is challenging.The existing approaches to selective context-sensitive pointer analyses rely on heuristics [Hassanshahi et al.2017;Smaragdakis et al.2014]or machine learning [Jeon et al.2019;Jeong et al.2017].The heuristic approaches require manual tuning of multiple complicated parameters.The machine learning approaches are able to reveal some program features that may correlate with the analysis effectiveness;however,the weaknesses of machine learning approaches are also well known:they requires training and manual oversight during the tuning phase,they can behave unpredictably for new inputs,and they offer few insights on why they work. In this article,we present a more principled approach that explains when using context sensitivity for a method is beneficial for precision,or conversely,when omitting context sensitivity introduces imprecision. 3 CAUSES OF IMPRECISION IN CONTEXT-INSENSITIVE POINTER ANALYSIS To address the challenge of how to predict which methods are precision-critical in a given program, in this section,we introduce a general model to show that most of the precision loss in context- insensitive pointer analysis for Java can be expressed in terms of three patterns of value flows,or as combinations of these.Precision-loss patterns are independent of the chosen variant of context sensitivity,such as call-site sensitivity [Sharir and Pnueli 1981;Shivers 1991],object sensitivity [Mi- lanova et al.2005],and type sensitivity [Smaragdakis et al.2011].We first introduce the three precision loss patterns and then describe three corresponding concrete examples(Sections 3.1-3.3). This characterization of precision loss provides the conceptual foundation for ZIPPER and ZIPPER to identify the methods that will be analyzed with context sensitivity,as explained in Sections 4 and 5,respectively. A context-insensitive analysis does not distinguish between different calls to a method but merges the incoming abstract values(or points-to sets,in the case of pointer analysis)[Sharir and Pnueli 1981].Figure 1 shows a simple example.If method m is analyzed context-insensitively,then ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:5 1 2 3 4 5 6 7 Object m(Object o){ return o; } x1 = new A(); x2 = m(x1); y1 = new B(); y2 = m(y1); Fig. 1. Example of precision loss in context-insensitive analysis. context-sensitive pointer analysis is guaranteed to be at least as precise as (practically, more precise than) the conventional context sensitivity for the same choice of 𝑘. Similarly, Jeon et al. [2018] developed a machine-learning approach to select context elements, leading to a context-sensitive pointer analysis that is both reasonably efficient and precise. Selective Use of Context Sensitivity. The conventional approach to context sensitive analysis is to uniformly apply context sensitivity to every method in the given program. However, in recent years, researchers have become aware that for some methods, context sensitivity does not help improve the analysis precision but only introduces extra analysis cost. As a result, many selective context-sensitive pointer analyses have been proposed and contribute to a promising research direction for making practical trade-offs between precision and efficiency [Hassanshahi et al. 2017; Jeon et al. 2019; Jeong et al. 2017; Li et al. 2018b; Smaragdakis et al. 2014]. In these analyses, context sensitivity is applied only for the methods where it is deemed beneficial to the overall analysis precision. However, finding such precision-critical methods is challenging. The existing approaches to selective context-sensitive pointer analyses rely on heuristics [Hassanshahi et al. 2017; Smaragdakis et al. 2014] or machine learning [Jeon et al. 2019; Jeong et al. 2017]. The heuristic approaches require manual tuning of multiple complicated parameters. The machine learning approaches are able to reveal some program features that may correlate with the analysis effectiveness; however, the weaknesses of machine learning approaches are also well known: they requires training and manual oversight during the tuning phase, they can behave unpredictably for new inputs, and they offer few insights on why they work. In this article, we present a more principled approach that explains when using context sensitivity for a method is beneficial for precision, or conversely, when omitting context sensitivity introduces imprecision. 3 CAUSES OF IMPRECISION IN CONTEXT-INSENSITIVE POINTER ANALYSIS To address the challenge of how to predict which methods are precision-critical in a given program, in this section, we introduce a general model to show that most of the precision loss in contextinsensitive pointer analysis for Java can be expressed in terms of three patterns of value flows, or as combinations of these. Precision-loss patterns are independent of the chosen variant of context sensitivity, such as call-site sensitivity [Sharir and Pnueli 1981; Shivers 1991], object sensitivity [Milanova et al. 2005], and type sensitivity [Smaragdakis et al. 2011]. We first introduce the three precision loss patterns and then describe three corresponding concrete examples (Sections 3.1–3.3). This characterization of precision loss provides the conceptual foundation for Zipper and Zipper𝑒 to identify the methods that will be analyzed with context sensitivity, as explained in Sections 4 and 5, respectively. A context-insensitive analysis does not distinguish between different calls to a method but merges the incoming abstract values (or points-to sets, in the case of pointer analysis) [Sharir and Pnueli 1981]. Figure 1 shows a simple example. If method m is analyzed context-insensitively, then ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020