Self-Inferencing Reflection Resolution for Java

Yue Li, Tian Tan, Yulei Sui, and Jingling Xue
School of Computer Science and Engineering, UNSW Australia
{yueli,tiantan,ysui,jingling}@cse.unsw.edu.au

Abstract. Reflection has always been an obstacle both for sound and for effective under-approximate pointer analysis for Java applications. In pointer analysis tools, reflection is either ignored or handled partially, so important behaviors are missed. In this paper, we present our findings on reflection usage in Java benchmarks and applications. Guided by these findings, we introduce a static reflection analysis, called Elf, that exploits a self-inferencing property inherent in many reflective calls. Given a reflective call, the basic idea behind Elf is to automatically infer its targets (methods or fields) based on the dynamic types of the arguments of its target calls and the downcasts (if any) on their returned values, if its targets cannot already be obtained from the Class, Method or Field objects on which the reflective call is made. We evaluate Elf against Doop’s state-of-the-art reflection analysis performed in the same context-sensitive Andersen’s pointer analysis using all 11 DaCapo benchmarks and two applications. Elf can make a disciplined tradeoff among soundness, precision and scalability while usually discovering more reflective targets. Elf is useful for any pointer analysis, particularly under-approximate techniques deployed for such clients as bug detection, program understanding and speculative compiler optimization.

1 Introduction

Pointer analysis is an important enabling technology since it can improve the precision and performance of many program analyses. However, reflection poses a major obstacle to pointer analysis.
Despite the large literature on whole-program [1, 6, 7, 11, 15, 21] and demand-driven [10, 13, 14, 17] pointer analysis for Java, almost all the analyses reported are unsound in the presence of reflection, since reflection is either ignored or handled partially. As a result, under-approximate or unsound techniques represent an attractive alternative in cases where sound analysis is not required [18] (e.g., for supporting bug detection, program understanding and speculative compiler optimization). Even so, ignoring reflection often leads to missed, important behaviors [18]. This explains why modern pointer analysis tools for Java [4, 19–21] provide some form of reflection handling.

As reflection is increasingly used in Java programs, the cost of imprecise reflection handling has increased dramatically. To improve the effectiveness of a pointer analysis tool for Java, we need automatic techniques for handling reflection that balance soundness, precision and scalability. Despite its importance, this problem has received little attention. Existing solutions include (1) dynamic analysis [2] for recording reflective (call) targets discovered during input-dependent program runs and passing these annotations to a subsequent pointer analysis, (2) online analysis [5] for discovering reflective targets at run time and performing a pointer analysis to support JIT optimizations, and (3) static analysis [4, 8, 20] for resolving reflective targets together with a pointer analysis.

 1  A a = new A();
 2  String cName, mName, fName = ...;
 3  Class clz = Class.forName(cName);
 4  Object obj = clz.newInstance();
 5  B b = (B)obj;
 6  Method mtd = clz.getDeclaredMethod(mName, new Class[]{A.class});
 7  Object l = mtd.invoke(b, new Object[]{a});
 8  Field fld = clz.getField(fName);
 9  X r = (X)fld.get(a);
10  fld.set(null, a);
Fig. 1. An example of reflection usage in Java.

In this paper, we present a new static reflection analysis, called Elf, which is integrated into Doop, a state-of-the-art Datalog-based pointer analysis tool [4] for analyzing Java programs. Elf draws its inspiration from the two earlier reflection analyses [4, 8] and benefits greatly from the open-source reflection analysis implemented in Doop [4]. Livshits et al. [8] suggested resolving reflective calls by tracking the flow of class/method/field names in the program. In the code from Figure 1, this involves tracking the flow of cName into clz in line 3, mName into mtd in line 6, and fName into fld in line 8, if cName, mName and fName are string constants. If cName is, say, read from a configuration file, they suggested optimistically narrowing the types of reflectively-created objects, e.g., obj in line 4, by using the downcast (B) available in line 5. Later, Doop [4] handled reflection analogously, but context-sensitively, to obtain the full benefit of the mutual increase in precision of the two component analyses.

However, Elf goes beyond [4, 8] by taking advantage of a self-inferencing property inherent in reflective code to strike a disciplined tradeoff among soundness, precision and scalability.
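As a rough illustration of this self-inferencing property, consider the field read r = (X)fld.get(a) in line 9 of Figure 1 when fName is statically unknown. The Java sketch below enumerates candidate target fields from only two facts: the dynamic type A of the argument a and the downcast type X applied to the result. It assumes the rule given in the next paragraph: the field is declared in A or a superclass of A, and its type is X, a supertype of X, or a subtype of X. This is a hypothetical run-time illustration that uses java.lang.Class objects as a stand-in for the class hierarchy. Elf itself performs the inference statically inside Doop's pointer analysis. The names FieldInference and candidateFields, and the toy classes, are made up for this example.

```java
import java.lang.reflect.Field;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of self-inferencing for r = (X) fld.get(a);
// run-time Class objects stand in for the class hierarchy a static analysis would use.
public class FieldInference {
    // Toy hierarchy (hypothetical): C is a superclass of A, and Y is a subtype of X.
    static class X {}
    static class Y extends X {}
    static class C { Object o; String s; }
    static class A extends C { X x; Y y; int n; }

    // Candidate targets f of r = (X) fld.get(a) when fld's name is unknown:
    // f is declared in A or a superclass of A, and its type T is X,
    // a supertype of X, or a subtype of X.
    static Set<String> candidateFields(Class<?> argType, Class<?> castType) {
        Set<String> result = new TreeSet<>(); // sorted: getDeclaredFields() has no fixed order
        for (Class<?> c = argType; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isSynthetic()) {
                    continue; // skip compiler- or tool-generated fields
                }
                Class<?> t = f.getType();
                boolean related = t.isAssignableFrom(castType) // T is X or a supertype of X
                        || castType.isAssignableFrom(t);        // T is a subtype of X
                if (related) {
                    result.add(c.getSimpleName() + "." + f.getName());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // a points to an A object and the result of fld.get(a) is downcast to X.
        System.out.println(candidateFields(A.class, X.class)); // prints [A.x, A.y, C.o]
    }
}
```

The Object-typed field C.o is kept because Object is a supertype of X. C.s (String) and A.n (int) are filtered out because their types are unrelated to X.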
Our key observation (made from a reflection-usage study described in Section 2) is that many reflective calls are self-inferenceable. Consider r = (X)fld.get(a) in Figure 1. If fld represents a statically unknown field named fName, the target fields accessed can often be approximated from the dynamic type (i.e., A) of argument a and the downcast (X) that post-dominates its return values. In this case, the reflective call is resolved to all possible field reads r = a.f. Here, f is a field of type T (where T is X or a supertype or subtype of X), declared in a class C (where C is A or a supertype of A). To the best of our knowledge, Elf is the first static reflection analysis that exploits such a self-inferencing property to resolve reflective calls.

Due to the intricacies and complexities of the Java reflection API, we postpone a detailed comparison between Elf and the two state-of-the-art reflection analyses [4, 8] to Section 3, after we have introduced Elf in full.

In summary, this paper makes the following main contributions:
– We report findings from a reflection-usage study of 14 representative Java benchmarks and applications (Section 2). We expect these findings to be useful in guiding the design and implementation of reflection analysis.
– We introduce a static reflection analysis, Elf, to improve the effectiveness of pointer analysis tools for Java (Section 3). Elf adopts a new self-inferencing mechanism for reflection resolution and handles a significant part of the Java reflection API that was previously ignored or handled partially.
– We formulate Elf in Datalog with 207 rules, covering the majority of reflection methods frequently used in Java programs (Section 4).
– We have evaluated Elf against a state-of-the-art reflection analysis in Doop (version r160113) under the same context-sensitive Andersen’s pointer analysis framework, using all 11 DaCapo benchmarks and two Java applications, Eclipse4 and Javac (Section 5). Our results show that Elf can make a disciplined tradeoff among soundness, precision and scalability while usually resolving more reflective call targets than Doop.

2 Understanding Reflection Usage

Section 2.1 provides a brief introduction to the Java reflection API. Section 2.2 reports our findings on reflection usage in Java benchmarks and applications.

2.1 Background

The Java reflection API provides metaobjects that allow programs to examine themselves and make changes to their structure and behavior at run time. In Figure 1, the metaobjects clz, mtd and fld are instances of the metaobject classes Class, Method and Field, respectively. Constructor can be seen as Method except that the method name “<init>” is implicit. Class provides accessor methods, such as getDeclaredMethod() in line 6 and getField() in line 8, that allow the other metaobjects (e.g., of Method and Field) related to a Class object to be introspected.
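To see these metaobjects in action, the sketch below is a small, self-contained program in the spirit of Figure 1. It obtains a Class object from a class-name string, creates an instance, introspects a Method and a Field, and then uses them for dynamic field access and invocation. The class names MetaobjectDemo and Greeter and their members are hypothetical. The sketch calls getDeclaredConstructor().newInstance() instead of the Class.newInstance() shown in line 4 of Figure 1, because the latter has been deprecated since Java 9.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical, runnable variant of the Figure 1 pattern.
public class MetaobjectDemo {
    // Hypothetical target class standing in for the class named by cName.
    public static class Greeter {
        public String prefix = "Hello, ";

        public String greet(String who) {
            return prefix + who;
        }
    }

    static String run() {
        try {
            String cName = Greeter.class.getName();                   // class name held as a string
            Class<?> clz = Class.forName(cName);                       // entry method (cf. line 3)
            Object obj = clz.getDeclaredConstructor().newInstance();   // side effect: new Greeter() (cf. line 4)
            Method mtd = clz.getDeclaredMethod("greet", String.class); // member introspection (cf. line 6)
            Field fld = clz.getField("prefix");                        // member introspection (cf. line 8)
            fld.set(obj, "Hi, ");                                      // side effect: obj.prefix = "Hi, "
            return (String) mtd.invoke(obj, "Elf");                    // side effect: obj.greet("Elf") (cf. line 7)
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints Hi, Elf
    }
}
```

From a pointer analysis viewpoint, none of the field write, the object creation or the call to greet is visible as an ordinary statement. Each one is hidden behind a metaobject.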
With dynamic invocation, a Method object can be commanded to invoke the method that it represents (line 7), and a Field object can be commanded to access the field that it represents (lines 9 and 10).

As far as pointer analysis is concerned, we can divide the pointer-affecting methods in the Java reflection API into three categories: (1) entry methods, e.g., forName() in line 3, for creating Class objects, (2) member-introspecting methods, e.g., getDeclaredMethod() in line 6 and getField() in line 8, for retrieving Method (Constructor) and Field objects from a Class object, and (3) side-effect methods, e.g., newInstance(), invoke(), get() and set() in lines 4, 7, 9 and 10, which affect the pointer information in the program reflectively.

Class provides a number of accessor methods for introspecting methods, constructors and fields in a target class. Unlike [4, 8], Elf handles all such accessor methods in reflection analysis. Let us recall the four that return Method objects. getDeclaredMethod(String, Class[]) returns a Method object that represents a declared method of the target Class object with the name (formal parameter types) specified by the first (second) parameter (line 6 in Figure 1). getMethod(String, Class[]) is similar except that the returned Method object is public (either declared or inherited). If the target Class does not have a matching method, its superclasses are searched recursively (bottom-up) first, before the interfaces it implements. getDeclaredMethods() returns an array of Method objects representing all the methods declared in the target Class object. getMethods() is similar except that it returns all the public methods (either declared or inherited) of the target Class object. Given a Method object mtd, its target method can be called as shown in line 7 in Figure 1.

2.2 Empirical Study

The Java reflection API is rich and intricate. We conduct an empirical study to understand reflection usage in practice in order to guide the design and implementation of a sophisticated reflection analysis.

We select 14 representative Java programs: nine DaCapo benchmarks (2006-10-MR2), the latest versions of three popular desktop applications (javac-1.7.0, jEdit-5.1.0 and Eclipse-4.2.2, denoted Eclipse4), and the latest versions of two popular server applications (Jetty-9.0.5 and Tomcat-7.0.42). Note that DaCapo consists of 11 benchmarks, including an older version of Eclipse (version 3.1.2). We exclude bloat since its application code is reflection-free. We consider lucene instead of luindex and lusearch separately, since these two benchmarks are derived from lucene with the same reflection usage.

We consider a total of 191 methods in the Java reflection API (version 1.5), including the ones in java.lang.reflect and java.lang.Class, loadClass() in java.lang.ClassLoader, and getClass() in java.lang.Object. We have also considered A.class, which represents the Class object of a class A.

We use Soot [19] to pinpoint the calls to reflection methods in the bytecode of a program.
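The differences among the four Method-returning accessors recalled in Section 2.1 can be checked with a small runnable sketch. The classes AccessorDemo, Base and Derived and their method names are hypothetical. The sketch only shows which methods each accessor can see: declared-only of any visibility for getDeclaredMethod()/getDeclaredMethods(), and public declared-or-inherited for getMethod()/getMethods().

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical classes showing which methods each Method-returning accessor sees.
public class AccessorDemo {
    public static class Base {
        public void inherited() {}
    }

    public static class Derived extends Base {
        public void own() {}
        private void hidden() {}
    }

    // getDeclaredMethod: methods declared in the class itself, any visibility.
    static boolean declaredHas(Class<?> c, String name) {
        try {
            c.getDeclaredMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // getMethod: public methods only, declared or inherited.
    static boolean publicHas(Class<?> c, String name) {
        try {
            c.getMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Sorted names of non-synthetic methods in an accessor's result array.
    static Set<String> names(Method[] methods) {
        Set<String> result = new TreeSet<>();
        for (Method m : methods) {
            if (!m.isSynthetic()) {
                result.add(m.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(declaredHas(Derived.class, "hidden"));      // true
        System.out.println(declaredHas(Derived.class, "inherited"));   // false
        System.out.println(publicHas(Derived.class, "inherited"));     // true
        System.out.println(publicHas(Derived.class, "hidden"));        // false
        System.out.println(names(Derived.class.getDeclaredMethods())); // [hidden, own]
        System.out.println(names(Derived.class.getMethods()).contains("toString")); // true (inherited from Object)
    }
}
```

A reflection analysis that models only getDeclaredMethod() would therefore miss targets reached through getMethod() on inherited public methods. It would also miss targets reached through the array-returning variants.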
To understand reflection usage, we consider only the reflective calls found in the application classes and their dependent libraries, excluding the standard Java libraries. To increase code coverage for the five applications considered, we include the jar files whose names contain the names of these applications (e.g., *jetty*.jar for Jetty) and make them available under the process-dir option supported by Soot. For Eclipse4, we use org.eclipse.core.runtime.adaptor.EclipseStarter to enable Soot to locate all the other jar files used. We manually inspect the reflection usage in a program in a demand-driven manner, starting from its side-effect methods and following their backward slices, assisted by Open Call Hierarchy in Eclipse. For the 609 side-effect callsites examined, we tracked and analyzed 510 callsites of entry methods and 304 callsites of member-introspecting methods.

Below we describe our five findings on reflection usage in our empirical study.

Side-Effect Methods
Table 1 lists the nine side-effect methods that can possibly modify or use (as their side effects) the pointer information in a program. Figure 2 depicts their percentage frequency distribution in the 14 programs studied. We can see that invoke() and Class::newInstance() are the two most frequently used (32.7% and 35.3%, respectively, on average), and both are handled by prior pointer analysis tools [4, 20, 21]. However, Array-related side-effect methods, which are also used in many programs, were previously ignored but are handled by Elf. Note that newProxyInstance() is used in jEdit only.

Table 1. Nine side-effect methods and their side effects, assuming that the target class of clz and ctor is A and the target method (field) of mtd (fld) is m (f).

Simplified Method         | Calling Scenario                    | Side Effect
Class::newInstance        | o = clz.newInstance()               | o = new A()
Constructor::newInstance  | o = ctor.newInstance({arg1, ...})   | o = new A(arg1, ...)
Method::invoke            | a = mtd.invoke(o, {arg1, ...})      | a = o.m(arg1, ...)
Field::get                | a = fld.get(o)                      | a = o.f
Field::set                | fld.set(o, a)                       | o.f = a
Array::newInstance        | o = Array.newInstance(clz, size)    | o = new A[size]
Array::get                | a = Array.get(o, i)                 | a = o[i]
Array::set                | Array.set(o, i, a)                  | o[i] = a
Proxy::newProxyInstance   | o = Proxy.newProxyInstance(...)     | o = new Proxy$*(...)

Fig. 2. Side-effect methods (percentage frequency distribution of Class::newInstance, Constructor::newInstance, Proxy::newProxyInstance, Array::newInstance, Method::invoke, Field::get, Field::set, Array::get and Array::set in antlr, chart, eclipse, fop, hsqldb, jython, lucene, pmd, xalan, eclipse4, javac, jedit, jetty, tomcat, and on average).

Fig. 3. Entry methods (percentage frequency distribution of Class::forName, Object::getClass, .class, ClassLoader::loadClass, Class::getComponentType, Proxy::getProxyClass, Others and Unknown in the same 14 programs, and on average).

Entry Reflection Methods
Figure 3 shows the percentage frequency distribution of the different types of entry methods used. The six shown are the only ones found in the first 12 programs. In the last two (Jetty and Tomcat), “Others” stands only for defineClass() in ClassLoader and getParameterTypes() in Method. “Unknown” is included because we failed to find the entry methods for some side-effect calls such as invoke(), even with Eclipse’s Open Call Hierarchy tool.
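For readers less familiar with the entry methods in Figure 3, the sketch below obtains the same Class object in the four ways most relevant here: Class::forName, Object::getClass, .class and ClassLoader::loadClass. EntryDemo and Target are hypothetical names. The point is only that a reflection analysis must treat all four as sources of the same Class metaobject.

```java
// Hypothetical sketch: four entry methods yielding one Class object.
public class EntryDemo {
    // Hypothetical class whose Class object is obtained in four ways.
    public static class Target {}

    static Class<?>[] viaEntryMethods() {
        try {
            String name = Target.class.getName();                  // binary name, e.g. EntryDemo$Target
            Class<?> byForName = Class.forName(name);               // Class::forName
            Class<?> byGetClass = new Target().getClass();          // Object::getClass
            Class<?> byLiteral = Target.class;                      // .class
            Class<?> byLoader = EntryDemo.class.getClassLoader().loadClass(name); // ClassLoader::loadClass
            return new Class<?>[] {byForName, byGetClass, byLiteral, byLoader};
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Class<?> c : viaEntryMethods()) {
            System.out.println(c == Target.class); // prints true, four times
        }
    }
}
```

Only forName() and loadClass() take a class name string, so they are the entry methods whose targets depend on how precisely string values are tracked.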
Finally, getComponentType() is usually used in the form of getClass().getComponentType() to create the Class object argument for Array.newInstance(). On average, Class.forName() and .class are the two most frequently used entry methods (48.1% and 18.0%, respectively).

String Constants and String Manipulation
As shown in Figure 4, string constants are commonly used when calling the two entry methods (34.7% on average) and the four member-introspecting methods (63.1% on average). In the presence of string manipulations, many class/method/field names cannot be determined exactly. This is mainly because resolving them statically requires precise handling of many different operations, e.g., substring() and append(). Thus, Elf does