Understanding and Analyzing Java Reflection 7:11

object named by its first parameter; getDeclaredField(String) and getField(String) each return a Field object named by its single parameter.

Fig. 5. Classification of the String arguments of two class-retrieving methods, forName() and loadClass(), and four member-retrieving methods, getMethod(), getDeclaredMethod(), getField() and getDeclaredField(). Panels: (a) calls to class-retrieving methods; (b) calls to member-retrieving methods.

As shown in Figure 5, string constants are commonly used when calling the two class-retrieving methods (34.7% on average) and the four member-retrieving methods (63.1% on average). In the presence of string manipulations, many class/method/field names cannot be determined exactly. This is mainly because their static resolution requires precise handling of many different operations, e.g., substring() and append(). In fact, many cases are rather complex and thus cannot be handled well by simply modeling the java.lang.String-related API. Thus, Solar does not currently handle string manipulations. However, the incomplete information about class/method/field names (i.e., partial string information) can be exploited beneficially [22, 51]. We also found that many string arguments are Unknown (55.3% for calling class-retrieving methods and 25.1% for calling member-retrieving methods, on average). These are the strings that may be read from, say, configuration files, command lines, or even Internet URLs.
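To make the categories of string arguments concrete, the following sketch shows one class-retrieving call site for each of a string constant, a manipulated string, and an externally supplied (Unknown) string, plus one member-retrieving call with a constant name. The class StringArgKinds, its method names, and the system property reflect.target are hypothetical and chosen only for illustration; they are not taken from the programs studied.

```java
import java.lang.reflect.Method;

final class StringArgKinds {
    // String constant: the target class is evident from the call site alone.
    static Class<?> byConstant() throws ClassNotFoundException {
        return Class.forName("java.util.ArrayList");
    }

    // String manipulation: the name is assembled at run time, so a static
    // analysis would have to model append(), substring() and similar calls.
    static Class<?> byManipulation(String simpleName) throws ClassNotFoundException {
        String name = new StringBuilder("java.util.").append(simpleName).toString();
        return Class.forName(name);
    }

    // Unknown: the name comes from outside the program (here a system property,
    // an assumption of this sketch; a configuration file or command line is similar).
    static Class<?> byExternalInput() throws ClassNotFoundException {
        String name = System.getProperty("reflect.target", "java.util.HashMap");
        return Class.forName(name);
    }

    // Member-retrieving call whose method name is a string constant.
    static Method sizeMethod(Class<?> c) throws NoSuchMethodException {
        return c.getMethod("size");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(byConstant().getName());
        System.out.println(byManipulation("LinkedList").getName());
        System.out.println(byExternalInput().getName());
        System.out.println(sizeMethod(byConstant()).getName());
    }
}
```

Only the first and last call sites can be resolved from the string argument alone; the other two need either a model of string operations or some other source of information.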
Finally, string constants are found to be more frequently used for calling the four member-retrieving methods than the two class-retrieving methods: 146 calls to getDeclaredMethod() and getMethod() and 27 calls to getDeclaredField() and getField(), in contrast with 98 calls to forName() and loadClass(). This suggests that analyses that ignore string constants flowing into some member-retrieving methods may fail to exploit such valuable information and thus become imprecise.

Remark 1. Resolving reflective targets by string constants does not always work. On average, only 49% of reflective call sites (where string arguments are used to specify reflective targets) use string constants. In addition, fully resolving non-constant string arguments by string manipulation, although mentioned elsewhere [5, 38], may be hard to achieve in practice.

ACM Trans. Softw. Eng. Methodol., Vol. 28, No. 2, Article 7. Publication date: February 2019

Q2. Retrieving an Array of Member Objects. As introduced in Section 2.2.2, half of the member-retrieving methods (e.g., getMethods()) return an array of member metaobjects. Although not as frequently used as the ones returning a single member metaobject (e.g., getMethod()), they play an important role in introducing new program behaviours in some applications. For example, in the two Eclipse programs studied, there are four invoke() call sites called on an array of Method
objects returned from getMethods() and 15 fld.get() and fld.set() call sites called on an array of Field objects returned by getDeclaredFields(). Through these calls, dozens of methods are invoked and hundreds of fields are modified reflectively. Ignoring such methods, as in prior work [38] and tools (Bddbddb, Wala, Soot), may cause the analysis to miss a significant amount of program behaviour.

7:12 Yue Li, Tian Tan, and Jingling Xue

Fig. 6. The percentage frequency distribution of reflective-action call sites on instance and static members. Panels: (a) Method::invoke() call sites; (b) Field::get()/set() call sites.

Remark 2. Among member-retrieving methods, get(Declared)Methods/Fields/Constructors(), which return an array of member metaobjects, are usually ignored by most existing reflection analysis tools. However, they play an important role in certain applications for both method invocations and field manipulations.

Q3. Static or Instance Members. In the literature on reflection analysis [32, 38, 51, 68], reflective targets are mostly assumed to be instance members. Accordingly, calls to reflective-action methods such as invoke(), get() and set() are usually treated as virtual calls, instance field accesses, and instance field modifications, respectively (see Table 1 for details). However, in real programs, as shown in Figure 6, on average, 37% of the invoke() call sites are found to invoke static methods and 50% of the get()/set() call sites are found to access/modify static fields.
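The difference is visible at the call site itself: for a static member the receiver argument of invoke(), get() and set() is ignored and may be null, whereas for an instance member it selects the object acted upon. A minimal sketch, assuming a hypothetical Counter class, walks the array returned by getDeclaredFields() and uses Modifier.isStatic() to choose the receiver:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

final class StaticVsInstance {
    // Hypothetical target class with one static and one instance member of each kind.
    static final class Counter {
        public static int created = 7;
        public int value = 3;
        public static String describe() { return "Counter"; }
        public int next() { return value + 1; }
    }

    // Reads every field declared by Counter; static fields are read with a
    // null receiver, instance fields with the given object.
    static Map<String, Object> readFields(Counter receiver) throws IllegalAccessException {
        Map<String, Object> out = new TreeMap<>();
        for (Field f : Counter.class.getDeclaredFields()) {
            if (f.isSynthetic()) continue; // skip compiler- or tool-generated fields
            boolean isStatic = Modifier.isStatic(f.getModifiers());
            out.put(f.getName(), f.get(isStatic ? null : receiver));
        }
        return out;
    }

    // Invokes a public no-argument method of Counter by name, passing a
    // receiver only when the resolved method is an instance method.
    static Object call(String name, Counter receiver) throws ReflectiveOperationException {
        Method m = Counter.class.getMethod(name);
        Object target = Modifier.isStatic(m.getModifiers()) ? null : receiver;
        return m.invoke(target);
    }

    public static void main(String[] args) throws Exception {
        Counter c = new Counter();
        System.out.println(readFields(c));
        System.out.println(call("describe", null));
        System.out.println(call("next", c));
    }
}
```

An analysis that models every invoke() as a virtual call would have no receiver object to dispatch on for describe(), which is one reason the two cases need separate treatment.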
Thus, in practice, reflection analysis should distinguish the two cases and be aware of whether a reflective target is a static or instance member, since the approaches for resolving them usually differ.

Remark 3. Static methods/fields are invoked/accessed nearly as frequently as instance methods/fields in Java reflection, even though the latter have received more attention in the literature. In practice, reflection analysis should distinguish the two cases and adopt appropriate approaches for handling them.

Q4. Resolving newInstance() by Casts. In Figure 3, when cName is not a string constant, the (dynamic) type of obj created by newInstance() in line 4 is unknown. For this case, Livshits et al. [38] propose to infer the type of obj by leveraging the cast operation that intra-procedurally post-dominates the newInstance() call site. If the cast type is A, the type of obj must be A or
one of its subtypes, assuming that the cast operation does not throw any exceptions. This approach has been implemented in many analysis tools such as Wala, Bddbddb and Elf.

However, as shown in Figure 7, exploiting casts this way does not always work. On average, 28% of newInstance() call sites (obtained by manually inspecting all the related reflective code) have no such intra-procedural post-dominating casts. As newInstance() is the most widely used reflective-action method (see Q6), its unresolved call sites may significantly affect the soundness of the analysis, as discussed in Section 7.5.1. Hence, we need a better solution to handle newInstance().

Fig. 7. newInstance() resolution by leveraging intra-procedural post-dominating casts.

Remark 4. Resolving newInstance() calls by leveraging their intra-procedural post-dominating cast operations fails to work for 28% of the newInstance() call sites found. As newInstance() critically affects the soundness of reflection analysis (Remark 6), a more effective approach for its resolution is required.

Q5. Class-Retrieving Methods. Figure 8 shows the percentage frequency distribution of eight class-retrieving methods. "Unknown" is included since we failed to find the class-retrieving methods for some reflective-action calls (e.g., invoke()) even by using Eclipse's Open Call Hierarchy tool. For the first 12 programs, the six class-retrieving methods as shown (excluding "Unknown" and "Others") are the only ones leading to reflective-action calls. For the last two, Jetty and Tomcat, "Others" stands for defineClass() in ClassLoader and getParameterTypes() in Method. Finally, getComponentType() is usually used in the form of getClass().getComponentType() for creating a Class object argument for Array.newInstance().
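The getClass().getComponentType() idiom typically appears when a method must allocate an array whose element type is known only from an argument. A minimal sketch follows; the class ArrayCopy and its method are hypothetical, loosely modeled on how toArray(T[])-style methods allocate their result array.

```java
import java.lang.reflect.Array;

final class ArrayCopy {
    // Returns a new array with the same runtime component type as the
    // template array and the requested length.
    @SuppressWarnings("unchecked")
    static <T> T[] sameTypeArray(T[] template, int length) {
        // Class-retrieving step: the Class object for the element type.
        Class<?> elementType = template.getClass().getComponentType();
        // Reflective-action step: the array type depends on that Class object.
        return (T[]) Array.newInstance(elementType, length);
    }

    public static void main(String[] args) {
        String[] out = sameTypeArray(new String[0], 3);
        System.out.println(out.getClass().getSimpleName() + " of length " + out.length);
    }
}
```

The type of the array created by Array.newInstance() is determined entirely by the Class object flowing in from getComponentType(), so the two calls are usually best analyzed together.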
On average, Class.forName(), .class, getClass() and loadClass() are the four most frequently used (48.1%, 18.0%, 17.0% and 9.7%, respectively). A class loading strategy can be configured in forName() and loadClass(). In practice, forName() is often used by the system class loader and loadClass() is usually overridden in custom class loaders, especially in framework applications such as Tomcat and Jetty.

Remark 5. Reflection analysis should handle Class.forName(), getClass(), .class, and loadClass(), which are the four major class-retrieving methods for creating Class objects.
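For reference, the four class-retrieving forms named in Remark 5 look as follows at a call site. This is only an illustrative sketch: the class ClassRetrieval and its method names are hypothetical, and java.lang.StringBuilder is an arbitrary target.

```java
final class ClassRetrieval {
    // Class.forName(): lookup by fully qualified name.
    static Class<?> viaForName() throws ClassNotFoundException {
        return Class.forName("java.lang.StringBuilder");
    }

    // Class literal: the target is fixed syntactically.
    static Class<?> viaLiteral() {
        return StringBuilder.class;
    }

    // getClass(): the dynamic type of an existing object.
    static Class<?> viaGetClass(Object o) {
        return o.getClass();
    }

    // loadClass(): lookup by name through an explicit class loader
    // (here the system class loader, which delegates to its parents).
    static Class<?> viaLoadClass() throws ClassNotFoundException {
        return ClassLoader.getSystemClassLoader().loadClass("java.lang.StringBuilder");
    }

    public static void main(String[] args) throws Exception {
        Class<?> expected = viaLiteral();
        System.out.println(viaForName() == expected);
        System.out.println(viaGetClass(new StringBuilder()) == expected);
        System.out.println(viaLoadClass() == expected);
    }
}
```

For a JDK class such as this one, all four routes yield the same Class object. With custom class loaders that override loadClass(), as in the framework applications mentioned above, the result of the last form depends on the loader's own lookup strategy.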
In addition, getComponentType() should also be modeled if Array-related reflective-action methods are analyzed, as they are usually used together.

Fig. 8. Class-retrieving methods. (Categories shown: Unknown, Others, Class::getComponentType, Proxy::getProxyClass, ClassLoader::loadClass, Class::forName, Object::getClass, .class.)

Q6. Reflective-Action Methods. Figure 9 depicts the percentage frequency distribution of all nine reflective-action methods in all the programs studied. We can see that newInstance() and invoke() are the most frequently used (46.3% and 32.7%, respectively, on average). Both are handled by existing static analysis tools such as Doop, Soot, Wala and Bddbddb. However, Field- and Array-related reflective-action methods, which are also used in many programs, are ignored by most of these tools. Their handling is often necessary. For example, Eclipse (org.eclipse.osgi.util.NLS) uses Field.set() to initialize a large number of (non-primitive) fields of all given classes. Some JDK code (e.g., java.util.AbstractCollection) uses Array.newInstance() to reflectively create a new non-primitive array whose type depends on the given argument. As far as we know, Field- and Array-related reflective-action methods are handled only by Elf [32], Solar [33] and Doop [51].

Remark 6. Reflection analysis should at least handle newInstance() and invoke(), as they are the most frequently used reflective-action methods (79% on average) and will, in general, significantly affect a program's behavior; otherwise, much of the codebase may be invisible to the analysis. Effective reflection analysis should also consider Field- and Array-related reflective-action methods, as they are also commonly used.
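The Field.set() usage described for Eclipse can be pictured as a loop over the declared fields of a message class that assigns each non-final public static String field reflectively. The sketch below is a simplified illustration under that assumption; FieldInit, Messages, its fields, and the Map keyed by field name are hypothetical and do not reproduce the actual org.eclipse.osgi.util.NLS code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

final class FieldInit {
    // Hypothetical message holder: each public static String field is a slot to fill.
    static final class Messages {
        public static String greeting;
        public static String farewell;
    }

    // Assigns every non-final public static String field of holder whose name
    // is a key of values, and returns the number of fields assigned.
    static int initialize(Class<?> holder, Map<String, String> values)
            throws IllegalAccessException {
        int assigned = 0;
        for (Field f : holder.getDeclaredFields()) {
            int mods = f.getModifiers();
            boolean slot = Modifier.isPublic(mods) && Modifier.isStatic(mods)
                    && !Modifier.isFinal(mods) && f.getType() == String.class;
            if (slot && values.containsKey(f.getName())) {
                f.set(null, values.get(f.getName())); // static field: receiver is ignored
                assigned++;
            }
        }
        return assigned;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> values = new HashMap<>();
        values.put("greeting", "hello");
        values.put("farewell", "bye");
        System.out.println(initialize(Messages.class, values) + " fields set");
        System.out.println(Messages.greeting + ", " + Messages.farewell);
    }
}
```

No field name appears as a string constant at the Field.set() call site, so an analysis that skips Field-related reflective actions would see none of these assignments.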
Fig. 9. Reflective-action methods. (Categories shown: Array::set, Array::get, Field::set, Field::get, Method::invoke, Array::newInstance, Proxy::newProxyInstance, Constructor::newInstance, Class::newInstance.)

Q7. Self-Inferencing Property. As illustrated by the program given in Figure 3, the names of its reflective targets are specified by the string arguments (e.g., cName, mName and fName) at the class-retrieving and member-retrieving reflective calls. Therefore, string analysis has been a popular approach for static reflection analysis in the last decade. However, if the value of a string is unknown statically (e.g., read from external files or command lines), then the related reflective calls, including those to newInstance(), may have to be ignored, rendering the corresponding codebase or operations invisible to the analysis (note that conservatively assuming that those unresolved reflective calls may have any effect would allow them to invoke any method, making the analysis too imprecise to be scalable). To improve precision, in this case, the last resort is to exploit the existence of some intra-procedurally post-dominating cast operations on a call to newInstance() in order to deduce the types of objects reflectively created (Q4).

However, in our study, we find that there are many other rich hints about the behaviors of reflective calls at their usage sites. Such hints can be and should be exploited to make reflection analysis more effective, even when some string values are partially or fully unknown. In the following, we first look at three real example programs to examine what these hints are and expose a so-called self-inferencing property inherent in these hints.
Finally, we explain why the self-inferencing property is naturally inherent in most Java reflection code and discuss its potential in making reflection analysis more effective.

Example 2.1 (Reflective Method Invocation (Figure 10)). The method name (the first argument of getMethod() in line 174) is statically unknown, as part of it is read from the command line cmd. However, the target method (represented by method) can be deduced from the second argument (parameters) of the corresponding reflective-action call invoke() in line 175. Here, parameters is an array of objects, with only one element (line 155). By querying the pointer analysis and also leveraging the type information in the program, we know that the type of the object pointed to by this is FrameworkCommandInterpreter, which has no subtypes. As a result, we can infer that