□Unknown☐String Manipulation Unresolved☐String Manipulatio Resolved Const-String 言复是8量复复屋爱是盖道宜警 (a)Calls to entry methods (b)Calls to member-introspecting methods Fig.4.Classification of the String arguments of two entry methods,forName() and loadclass(),and four member-introspecting methods,getMethod() getDeclaredMethod(),getField()and getDeclaredField(). not handle string manipulations presently.As suggested in Section 5.3.2,how- ever,incomplete information about class/method/field names can be exploited in our self-inferencing framework,just like the cast and type information. We also found that many string arguments are Unknown (55.3%for calling entry methods and 25.1%for calling member-introspecting methods,on average). These are the strings that may be read from,say,configuration files or command lines.Finally,string constants are found to be more frequently used for calling the four member-introspecting methods than the two entry methods:146 calls to getDeclaredMethod()and getMethod(),27 calls to getDeclaredField() and getField()in contrast with 98 calls to forName()and loadclass().This suggests that the analyses [4,20 that ignore string constants flowing into some of these member-introspecting methods may be imprecise(Table 2). Self-Inferenceable Reflective Calls In real applications,many reflective calls are self-inferenceable,as illustrated in Figures 8-10.Therefore,we should try to find their targets by aggressively tracking the flow of constant class/method/field names in the program.However,there are also many input-dependent strings. For many input-dependent reflective calls,such as factoryField.get(null)in Figure 8,field.set(null,value)in Figure 9 and method.invoke(target, parameters)in Figure 10,we can approximate their targets reasonably accu- rately based on the dynamic types of the arguments of their target calls and the downcasts(if any)on their returned values.ELF will exploit such self-inferencing property inherent in reflective code during its reflection analysis. Retrieving an Array of Method/Field/Constructor Objects Class con- tains a number of accessor methods for returning an array of such metaobjects for the target Class object.In the two Eclipse programs,there are four invoke callsites called on an array of Method objects returned from getMethods and 15 fld.get()and fld.set()callsites called on an array of Field objects returned by getDeclaredFields().Ignoring such methods as in prior work [4,8,21]may lead to many missed methods in the call graph of a program
antlr chart eclipse fop hsqldb jython lucene pmd xalan eclipse4 javac jedit jetty tomcat average antlr chart eclipse fop hsqldb jython lucene pmd xalan eclipse4 javac jedit jetty tomcat average Unknown String Manipulation Unresolved String Manipulation Resolved Const-String (a) Calls to entry methods (b) Calls to member-introspecting methods Fig. 4. Classification of the String arguments of two entry methods, forName() and loadClass(), and four member-introspecting methods, getMethod(), getDeclaredMethod(), getField() and getDeclaredField(). not handle string manipulations presently. As suggested in Section 5.3.2, however, incomplete information about class/method/field names can be exploited in our self-inferencing framework, just like the cast and type information. We also found that many string arguments are Unknown (55.3% for calling entry methods and 25.1% for calling member-introspecting methods, on average). These are the strings that may be read from, say, configuration files or command lines. Finally, string constants are found to be more frequently used for calling the four member-introspecting methods than the two entry methods: 146 calls to getDeclaredMethod() and getMethod(), 27 calls to getDeclaredField() and getField() in contrast with 98 calls to forName() and loadClass(). This suggests that the analyses [4, 20] that ignore string constants flowing into some of these member-introspecting methods may be imprecise (Table 2). Self-Inferenceable Reflective Calls In real applications, many reflective calls are self-inferenceable, as illustrated in Figures 8 – 10. Therefore, we should try to find their targets by aggressively tracking the flow of constant class/method/field names in the program. However, there are also many input-dependent strings. For many input-dependent reflective calls, such as factoryField.get(null) in Figure 8, field.set(null, value) in Figure 9 and method.invoke(target, parameters) in Figure 10, we can approximate their targets reasonably accurately based on the dynamic types of the arguments of their target calls and the downcasts (if any) on their returned values. Elf will exploit such self-inferencing property inherent in reflective code during its reflection analysis. Retrieving an Array of Method/Field/Constructor Objects Class contains a number of accessor methods for returning an array of such metaobjects for the target Class object. In the two Eclipse programs, there are four invoke callsites called on an array of Method objects returned from getMethods and 15 fld.get() and fld.set() callsites called on an array of Field objects returned by getDeclaredFields(). Ignoring such methods as in prior work [4, 8, 21] may lead to many missed methods in the call graph of a program
r =(A)(o,{..))7 r=(A)get(o)4 ).set(o,a)s s Method M=C.getMethod(mName,{...)Field=CgetField(IName)3 Class.forName(cName)1: <Entry Methods> 年。tgw。g。w。,,。e。ee。gg。g <Member.Introspecting Methods? Target Class Type Propagation Side-Effect Methods> IName Propagation =(A)C ype ->Target Method/Field Descriptor Inference Fig.5.Self-inferencing reflection analysis in ELF. 3 Methodology We start with a set of assumptions made.We then describe our self-inferencing approach adopted by ELF.Finally,we compare ELF with the two prior reflection analyses [4,8 by summarizing their similarities and differences. 3.1 Assumptions We adopt all the assumptions from [8]:(1)Closed World:only the classes reach- able from the class path at analysis time can be used by the program at run time,(2)Well-behaved Class Loaders:the name of the class returned by a call to forName(cName)equals cName,and(3)Correct Casts:the downcasts operating on the result of a call to newInstance()are correct.Due to (1),we will not con- sider the side-effect method Proxy:newProxyInstance in Table 1 and the entry method loadClass in Figure 3 as both may use custom class loaders.Finally, we broaden Correct Casts by also including fld.get()and mtd.invoke(). 3.2 Self-Inferencing Reflection Resolution Figure 5 depicts a typical reflection scenario and illustrates how ELF works. In this scenario,a Class object C is first created for the target class named cName.Then a Method (Field)object M(F)representing the target method (field)named mName(fName)in the target class of C is created.Finally,at some reflective callsites,e.g.,invoke(),get()and set(),the target method (field) is invoked (accessed)on the target object o,with the arguments,{...or a.In the case of newInstance(),the default constructor "init()"called is implicit. ELF works as part of a pointer analysis,with each being both the producer and consumer of the other.It exploits a self-inferencing property inherent in reflective code,by employing the following two component analyses(Figure 5): Target Propagation (Marked by Solid Arrows)ELF resolves the targets (methods or fields)of reflective calls,such as invoke(),get()and set(),by propagating the names of the target classes and methods/fields (e.g.,those
<Entry'Methods> <Member1Introspecting'Methods> <Side1Effect'Methods> Target'Class'Type'Propagation Target'Method/Field'Name'Propagation Target'Class'Type'Inference Target'Method/Field'Descriptor'Inference r = (A) M .invoke(o, {...}) r = (A) F .get(o) F .set(o, a) Method M = C .getMethod(mName, {...}) Class C = Class.forName(cName) Field F = C .getField(fName) r = (A) C .newInstance() 1 2 3 4 5 6 7 Fig. 5. Self-inferencing reflection analysis in Elf. 3 Methodology We start with a set of assumptions made. We then describe our self-inferencing approach adopted by Elf. Finally, we compare Elf with the two prior reflection analyses [4, 8] by summarizing their similarities and differences. 3.1 Assumptions We adopt all the assumptions from [8]: (1) Closed World: only the classes reachable from the class path at analysis time can be used by the program at run time, (2) Well-behaved Class Loaders: the name of the class returned by a call to forName(cName) equals cName, and (3) Correct Casts: the downcasts operating on the result of a call to newInstance() are correct. Due to (1), we will not consider the side-effect method Proxy::newProxyInstance in Table 1 and the entry method loadClass in Figure 3 as both may use custom class loaders. Finally, we broaden Correct Casts by also including fld.get() and mtd.invoke(). 3.2 Self-Inferencing Reflection Resolution Figure 5 depicts a typical reflection scenario and illustrates how Elf works. In this scenario, a Class object C is first created for the target class named cName. Then a Method (Field) object M (F) representing the target method (field) named mName (fName) in the target class of C is created. Finally, at some reflective callsites, e.g., invoke(), get() and set(), the target method (field) is invoked (accessed) on the target object o, with the arguments, {...} or a. In the case of newInstance(), the default constructor “init()” called is implicit. Elf works as part of a pointer analysis, with each being both the producer and consumer of the other. It exploits a self-inferencing property inherent in reflective code, by employing the following two component analyses (Figure 5): Target Propagation (Marked by Solid Arrows) Elf resolves the targets (methods or fields) of reflective calls, such as invoke(), get() and set(), by propagating the names of the target classes and methods/fields (e.g., those
pointed by cName,mName and fName if statically known)along the solid lines into the points symbolized by circles.Note that the second argument of getMethod()is an array of type Class[].It may not be beneficial to analyze it to disambiguate overloaded methods.because (1)its size may be statically unknown,(2)its components are collapsed by the pointer analysis, and(3)its components may be Class objects with unknown class names. Target Inference (Marked by Dashed Arrows)By using Target Propaga- tion alone,a target method/field name (blue circle)or its target class type (red circle)at a reflective callsite may be missing.i.e..unknown,due to the presence of input-dependent strings(Figure 4).If the target class type(red circle)is missing,ELF will infer it from the dynamic type of the target ob- ject o (obtained by pointer analysis)at invoke(),get()or set()(when o !=null)or the downcast (if any),such as (A),that post-dominantly op- erates on the result of a call to newInstance().If the target method/field name (blue circle)is missing,ELF will infer it from (1)the dynamic types of the arguments of the target call,e.g.,{...of invoke()and a of set(), and/or(2)the downcast on the result of the call,such as (A)at invoke() and get().Just like getMethod,the second argument of invoke()is also an array,which is also similarly hard to analyze statically.To improve precision, we disambiguate overloaded target methods with a simple intraprocedural analysis only when the array argument can be analyzed exactly element-wise. To balance soundness,precision and scalability in a disciplined manner,ELF adopts the following inference principle:a target method or field is resolved at a reflective callsite if both its target class type (red circle)and its target method/- field name (blue circle)can be resolved (i.e.,statically discovered)during either Target Propagation or Target Inference.As a result,the number of spurious tar- gets introduced when analyzing a reflective call,invoke(),get()or set(),is minimized due to the existence of two simultaneous constraints (the red and blue circles).How to relax ELF in the presence of just one such a constraint will be investigated in future work.Note that the cast operations on newInstance() will still have to be handled heuristically as only one of the two constraints exists. As ELF is unsound,so is the underlying pointer analysis.Therefore,a reflective callsite is said to be resolved if at least one of its targets is resolved. Let us illustrate Target Inference by considering r=(A)F.get(o)in Fig- ure 5.If a target field name is known but its target class type (i.e.,red circle) is missing,we infer it by looking at the types of all pointed-to objects o'by o. If B is the type of o',then a potential target class of o is B or any of its super- types.If the target class type of F is B but a potential target field name (i.e., blue circle)is missing,we can deduce it from the downcast (A)to resolve the call to r =o.f,where f is a member field in B whose type is A or a supertype or subtype of A.A supertype is possible because a field of this supertype may initially point to an object of type,say,A and then downcast to A. In Figure 5,if getMethods()(getFields())is called at Label 6(Label 3) instead,then an array of Method(Field)objects will be returned so that Target
pointed by cName, mName and fName if statically known) along the solid lines into the points symbolized by circles. Note that the second argument of getMethod() is an array of type Class[]. It may not be beneficial to analyze it to disambiguate overloaded methods, because (1) its size may be statically unknown, (2) its components are collapsed by the pointer analysis, and (3) its components may be Class objects with unknown class names. Target Inference (Marked by Dashed Arrows) By using Target Propagation alone, a target method/field name (blue circle) or its target class type (red circle) at a reflective callsite may be missing, i.e., unknown, due to the presence of input-dependent strings (Figure 4). If the target class type (red circle) is missing, Elf will infer it from the dynamic type of the target object o (obtained by pointer analysis) at invoke(), get() or set() (when o != null) or the downcast (if any), such as (A), that post-dominantly operates on the result of a call to newInstance(). If the target method/field name (blue circle) is missing, Elf will infer it from (1) the dynamic types of the arguments of the target call, e.g., {...} of invoke() and a of set(), and/or (2) the downcast on the result of the call, such as (A) at invoke() and get(). Just like getMethod, the second argument of invoke() is also an array, which is also similarly hard to analyze statically. To improve precision, we disambiguate overloaded target methods with a simple intraprocedural analysis only when the array argument can be analyzed exactly element-wise. To balance soundness, precision and scalability in a disciplined manner, Elf adopts the following inference principle: a target method or field is resolved at a reflective callsite if both its target class type (red circle) and its target method/- field name (blue circle) can be resolved (i.e., statically discovered) during either Target Propagation or Target Inference. As a result, the number of spurious targets introduced when analyzing a reflective call, invoke(), get() or set(), is minimized due to the existence of two simultaneous constraints (the red and blue circles). How to relax Elf in the presence of just one such a constraint will be investigated in future work. Note that the cast operations on newInstance() will still have to be handled heuristically as only one of the two constraints exists. As Elf is unsound, so is the underlying pointer analysis. Therefore, a reflective callsite is said to be resolved if at least one of its targets is resolved. Let us illustrate Target Inference by considering r = (A) F.get(o) in Figure 5. If a target field name is known but its target class type (i.e., red circle) is missing, we infer it by looking at the types of all pointed-to objects o 0 by o. If B is the type of o 0 , then a potential target class of o is B or any of its supertypes. If the target class type of F is B but a potential target field name (i.e., blue circle) is missing, we can deduce it from the downcast (A) to resolve the call to r = o.f, where f is a member field in B whose type is A or a supertype or subtype of A. A supertype is possible because a field of this supertype may initially point to an object of type, say, A and then downcast to A. In Figure 5, if getMethods() (getFields()) is called at Label 6 (Label 3) instead, then an array of Method (Field) objects will be returned so that Target
Table 2.Comparing ELF with the two closely-related reflection analyses [4,8]. Member-Introspecting ⑧ DooP 4 ELF Side-Effect Methods Methods ◆一◆一 getMethod invoke getDeclaredMethod getMethods n/a n a v n avv getDeclaredMethods n/a n/a v n/a vv getField get getDeclaredField set getFields n/a n/av n/a vv getDeclaredFields n/a n/a v n/a vv Vn/a√/n/aVn/aWn/a Propagation from them is implicitly performed.All the other methods available in Class for introspecting methods/fields/constructors are handled similarly. 3.3 ELF vs.Livshits et al.'s Analysis and Doop Table 2 compares ELF with Livshits et al.'s and Doop's analyses [4,8 in terms of how four representative side-effect reflective calls are resolved. Target Propagation ELF resolves a target method/field at a reflective callsite by requiring both its target class type (red circle)and its target name(blue circle)to be known.However,this is not the case in the other two analyses. In the case of Livshits et al.'s analysis,the target class type is always ignored. Therefore,the target methods/fields with a given name in all the classes in the program are conservatively included.Doop suffers the opposite problem by ignoring the target method/field names.As a result,all methods/fields in the target class are included.Finally,of the three analyses,ELF is the only one that can handle all the member-introspecting methods listed. Target Inference Of the three analyses,ELF is the only one to adopt a self inferencing principle to find the target classes and methods/fields at a reflec- tive callsite.Livshits et al.'s analysis narrows the type of reflectively-created objects at newInstance()in Figure 5,but Doop does not do this.However, Doop is more sophisticated than Livshits et al.'s analysis in distinguishing virtual,static and special calls and considering the modifiers of fields for loads and stores.These are all handled by the ELF reflection analysis. 4 Reflection Resolution We specify the reflection resolution in ELF as a set of Datalog rules,i.e.,mono- tonic logical inferences(with no negation in a recursion cycle),following the style of [6].The main advantage is that the specification is close to the actual imple- mentation.Datalog has been the basis of several pointer analysis tools [4,6,8, 21].Our rules are declarative:the order of evaluation of rules or examination of their clauses do not affect the final results.Given a program to be analyzed,these rules are repeatedly applied to infer more facts until a fixed point is reached
Table 2. Comparing Elf with the two closely-related reflection analyses [4, 8]. Side-Effect Methods Member-Introspecting [8] Doop [4] Elf Methods invoke getMethod √ √ √ √ √ getDeclaredMethod √ √ √ √ √ √ getMethods n/a n/a √ n/a √ √ getDeclaredMethods n/a n/a √ n/a √ √ getField √ √ √ √ √ get getDeclaredField √ √ √ √ √ √ set getFields n/a n/a √ n/a √ √ getDeclaredFields n/a n/a √ n/a √ √ newInstance √ n/a √ √ n/a √ n/a √ n/a Propagation from them is implicitly performed. All the other methods available in Class for introspecting methods/fields/constructors are handled similarly. 3.3 Elf vs. Livshits et al.’s Analysis and Doop Table 2 compares Elf with Livshits et al.’s and Doop’s analyses [4, 8] in terms of how four representative side-effect reflective calls are resolved. Target Propagation Elf resolves a target method/field at a reflective callsite by requiring both its target class type (red circle) and its target name (blue circle) to be known. However, this is not the case in the other two analyses. In the case of Livshits et al.’s analysis, the target class type is always ignored. Therefore, the target methods/fields with a given name in all the classes in the program are conservatively included. Doop suffers the opposite problem by ignoring the target method/field names. As a result, all methods/fields in the target class are included. Finally, of the three analyses, Elf is the only one that can handle all the member-introspecting methods listed. Target Inference Of the three analyses, Elf is the only one to adopt a selfinferencing principle to find the target classes and methods/fields at a reflective callsite. Livshits et al.’s analysis narrows the type of reflectively-created objects at newInstance() in Figure 5, but Doop does not do this. However, Doop is more sophisticated than Livshits et al.’s analysis in distinguishing virtual, static and special calls and considering the modifiers of fields for loads and stores. These are all handled by the Elf reflection analysis. 4 Reflection Resolution We specify the reflection resolution in Elf as a set of Datalog rules, i.e., monotonic logical inferences (with no negation in a recursion cycle), following the style of [6]. The main advantage is that the specification is close to the actual implementation. Datalog has been the basis of several pointer analysis tools [4, 6, 8, 21]. Our rules are declarative: the order of evaluation of rules or examination of their clauses do not affect the final results. Given a program to be analyzed, these rules are repeatedly applied to infer more facts until a fixed point is reached