Chapter 1.Introduction 4 We introduce BEAN to improve the precision of points-to analysis with small overhead increases. -BEAN is a new object-sensitivity approach for points-to analysis.Object- sensitivity [53,55,75]is usually considered as the most precise context- sensitivity for points-to analysis for Java [34].BEAN further improves its precision by automatically identifying and eliminating the redundant context elements in distinguishing contexts.Unlike traditional object- sensitivity,which obtains better precision by increasing the limit of con- text length k with usually significantly worse efficiency,BEAN is able to achieve better precision with still the same k-limiting at only small increases in analysis cost. - We thoroughly evaluate the effectiveness of BEAN by comparing it with traditional object-sensitivity [53,75]and the state-of-art hybrid context- sensitivity [34]for points-to analysis.Our evaluation shows that BEAN can always achieve better precision with small increases in analysis cost for real-world Java programs in practice. We introduce MAHJONG to significantly improve the efficiency of points- to analysis while maintaining nearly the same precision for type-dependent clients such as call graph construction. MAHJONG is a new heap abstraction that models the heap by iden- tifying and merging type-consistent objects,which are distinguished by the mainstream allocation-site-based points-to analysis.However, the allocation-site-based heap abstraction often over-partitions the heap without improving the precision much for an important class of type- dependent clients such as call graph construction,devirtualization and
Chapter 1. Introduction 4 • We introduce Bean to improve the precision of points-to analysis with small overhead increases. – Bean is a new object-sensitivity approach for points-to analysis. Objectsensitivity [53, 55, 75] is usually considered as the most precise contextsensitivity for points-to analysis for Java [34]. Bean further improves its precision by automatically identifying and eliminating the redundant context elements in distinguishing contexts. Unlike traditional objectsensitivity, which obtains better precision by increasing the limit of context length k with usually significantly worse efficiency, Bean is able to achieve better precision with still the same k-limiting at only small increases in analysis cost. – We thoroughly evaluate the effectiveness of Bean by comparing it with traditional object-sensitivity [53, 75] and the state-of-art hybrid contextsensitivity [34] for points-to analysis. Our evaluation shows that Bean can always achieve better precision with small increases in analysis cost for real-world Java programs in practice. • We introduce Mahjong to significantly improve the efficiency of pointsto analysis while maintaining nearly the same precision for type-dependent clients such as call graph construction. – Mahjong is a new heap abstraction that models the heap by identifying and merging type-consistent objects, which are distinguished by the mainstream allocation-site-based points-to analysis. However, the allocation-site-based heap abstraction often over-partitions the heap without improving the precision much for an important class of typedependent clients such as call graph construction, devirtualization and
Chapter 1.Introduction 5 may-fail casting.By merging type-consistent objects,MAHJONG im- proves the efficiency of points-to analysis with nearly the same precision for type-dependent clients. -We extensively evaluate the effectiveness of MAHJONG by applying it on three kinds of mainstream context-sensitivity for points-to analysis, i.e.,call-site-sensitivity [71,99,34],object-sensitivity [53,55]and type- sensitivity [75.The evaluation demonstrates that MAHJONG meets its goal of design and is versatile to improve all these three kinds of context- sensitive points-to analysis for the type-dependent clients. We implement both BEAN and MAHJONG as standalone tools which are de- signed to work well with various points-to analysis frameworks for Java.We have released BEAN as an open-source tool at http://www.cse.unsw.edu.au/-corg/bean and we also have released MAHJONG as an open-source tool at http://www.cse.unsw.edu.au/-corg/mahjong. 1.3 Organization The remainder of this thesis is organized as follows. In Chapter 2,we introduce context-sensitive points-to analysis for object-oriented languages.We first give a generic formalism for context-sensitive points-to anal- ysis.Based on this basic formalism,we then explain and formally present three kinds of mainstream context-sensitivity for object-oriented languages,i.e.,call-site- sensitivity [71,99,34],object-sensitivity [53,55]and type-sensitivity [75]
Chapter 1. Introduction 5 may-fail casting. By merging type-consistent objects, Mahjong improves the efficiency of points-to analysis with nearly the same precision for type-dependent clients. – We extensively evaluate the effectiveness of Mahjong by applying it on three kinds of mainstream context-sensitivity for points-to analysis, i.e., call-site-sensitivity [71, 99, 34], object-sensitivity [53, 55] and typesensitivity [75]. The evaluation demonstrates that Mahjong meets its goal of design and is versatile to improve all these three kinds of contextsensitive points-to analysis for the type-dependent clients. • We implement both Bean and Mahjong as standalone tools which are designed to work well with various points-to analysis frameworks for Java. We have released Bean as an open-source tool at http://www.cse.unsw.edu.au/~corg/bean and we also have released Mahjong as an open-source tool at http://www.cse.unsw.edu.au/~corg/mahjong. 1.3 Organization The remainder of this thesis is organized as follows. In Chapter 2, we introduce context-sensitive points-to analysis for object-oriented languages. We first give a generic formalism for context-sensitive points-to analysis. Based on this basic formalism, we then explain and formally present three kinds of mainstream context-sensitivity for object-oriented languages, i.e., call-sitesensitivity [71, 99, 34], object-sensitivity [53, 55] and type-sensitivity [75]
Chapter 1.Introduction 6 In Chapters 3 and 4,we introduce BEAN and MAHJONG respectively.These two chapters share the same structures,i.e.,they both present the techniques following the order of overview,motivation,methodology,formalism,implementation and evaluation.At the end of both chapters,we discuss their related work. In Chapter 5,we give the conclusion and suggest some further work
Chapter 1. Introduction 6 In Chapters 3 and 4, we introduce Bean and Mahjong respectively. These two chapters share the same structures, i.e., they both present the techniques following the order of overview, motivation, methodology, formalism, implementation and evaluation. At the end of both chapters, we discuss their related work. In Chapter 5, we give the conclusion and suggest some further work
Chapter 2 Background:Context-Sensitive Points-to Analysis In this chapter,we provide the background information about context-sensitive points-to analysis that will be necessary to understand the remainder of this thesis. In Section 2.1,we present the preliminary knowledge about context-sensitive points- to analysis.In Section 2.2,we give a standard formulation of context-sensitive points-to analysis and three kinds of mainstream context-sensitivity that are used in the following chapters. 2.1 Preliminary This section gives introductory information about points-to analysis for Java and context-sensitivity. Points-to Analysis Points-to analysis is a fundamental static analysis that com- putes the over-approximation of the heap locations that each program pointer may point to during executions.For object-oriented languages such as Java,a pointer 7
Chapter 2 Background: Context-Sensitive Points-to Analysis In this chapter, we provide the background information about context-sensitive points-to analysis that will be necessary to understand the remainder of this thesis. In Section 2.1, we present the preliminary knowledge about context-sensitive pointsto analysis. In Section 2.2, we give a standard formulation of context-sensitive points-to analysis and three kinds of mainstream context-sensitivity that are used in the following chapters. 2.1 Preliminary This section gives introductory information about points-to analysis for Java and context-sensitivity. Points-to Analysis Points-to analysis is a fundamental static analysis that computes the over-approximation of the heap locations that each program pointer may point to during executions. For object-oriented languages such as Java, a pointer 7
Chapter 2.Background:Context-Sensitive Points-to Analysis 8 can be a local variable or an instance field,and heap locations usually correspond to heap objects.The results of points-to analysis for each pointer are a set of heap objects that are pointed to by the pointer,called points-to set. In Java programs,the heap objects can be created infinitely during execution in the presence of loop and recursion.For decidability and scalability,a points-to analysis must use a heap abstraction which partitions the infinitely-sized heap into a finite number of abstract objects.In points-to analysis for Java,allocation-site abstraction,which uses an allocation site to represent all the heap objects created at the site,is the most commonly used heap abstraction [38,53,55,75,34].We will see later that the heap can be further partitioned context-sensitively in an orthogonal manner (Section 2.2)or abstracted with other granularities to improve the effectiveness of points-to analysis (Chapter 4). Core Statements in Points-to Analysis Table 2.1 lists the five kinds of core statements that points-to analysis for Java needs to deal with,including object allocation (New),variable assignment (Assign),instance field read (Load),instance field write (Store)and virtual method invocation (Call).In this thesis,we only discuss these statements for brevity as in [38,53,55,75,76.Other kinds of statements (e.g.,static field accesses and static method invocations)are handled Statement Kind x new T() New x=y Assign x =y.f Load x.f =y Store x=y.g(arg1,·.,argn) Call Table 2.1:Five kinds of core statements in points-to analysis
Chapter 2. Background: Context-Sensitive Points-to Analysis 8 can be a local variable or an instance field, and heap locations usually correspond to heap objects. The results of points-to analysis for each pointer are a set of heap objects that are pointed to by the pointer, called points-to set. In Java programs, the heap objects can be created infinitely during execution in the presence of loop and recursion. For decidability and scalability, a points-to analysis must use a heap abstraction which partitions the infinitely-sized heap into a finite number of abstract objects. In points-to analysis for Java, allocation-site abstraction, which uses an allocation site to represent all the heap objects created at the site, is the most commonly used heap abstraction [38, 53, 55, 75, 34]. We will see later that the heap can be further partitioned context-sensitively in an orthogonal manner (Section 2.2) or abstracted with other granularities to improve the effectiveness of points-to analysis (Chapter 4). Core Statements in Points-to Analysis Table 2.1 lists the five kinds of core statements that points-to analysis for Java needs to deal with, including object allocation (New), variable assignment (Assign), instance field read (Load), instance field write (Store) and virtual method invocation (Call). In this thesis, we only discuss these statements for brevity as in [38, 53, 55, 75, 76]. Other kinds of statements (e.g., static field accesses and static method invocations) are handled Statement Kind x = new T() New x = y Assign x = y.f Load x.f = y Store x = y.g(arg1, ..., argn) Call Table 2.1: Five kinds of core statements in points-to analysis