2017 IEEE 28th International Symposium on Software Reliability Engineering Reflection Analysis for Java:Uncovering More Reflective Targets Precisely Jie Liu,Yue Li',Tian Tan',and Jingling Xuel.2 ISchool of Computer Science and Engineering,UNSW,Australia 2State Key Laboratory of Software Engineering,Computer School,Wuhan University,China Abstract-Reflection,which is widely used in practice and and records reflective targets accessed at reflective calls during abused by many security exploits,poses a significant obstacle program execution,can be both precise and efficient.As a to program analysis.Reflective calls can be analyzed statically result,bug detectors on finding,for example,data races [18], or dynamically.Static analysis is more sound but also more imprecise (by introducing many false reflective targets and thus deadlocks [19]and property violations [201),and security anal- affecting its scalability).Dynamic analysis can be precise but ysers on finding,for example,privacy leaks [21]and malicious often miss many true reflective targets due to low code coverage. functionalities [22],often resort to dynamic reflection analysis. We introduce MIRROR,the first automatic reflection analysis However,analyzing reflection dynamically often misses many for Java that increases significantly the code coverage of dynamic analysis while keeping false reflective targets low.In its static true reflective targets (due to low code coverage).This is analysis,a novel reflection-oriented slicing technique is applied to especially the case when GUI applications are analyzed.For identify a small number of small path-based slices for a reflective example,we observe that TAMIFLEX [17],the state-of-the-art call so that different reflective targets are likely exercised along dynamic reflection analysis,fails to find any new reflective these different paths.This preserves the soundness of pure static target after a long sequence of GUI operations has been reflection analysis as much as possible,improves its scalability, performed (on,for example,findbugs-1.2.1). and reduces substantially its false positive rate.In its dynamic analysis,these slices are executed with automatically generated In this paper,we introduce MIRROR,the first automatic test cases to report the reflective targets accessed.This signifi- reflection analysis for Java that combines program slicing cantly improves the code coverage of pure dynamic analysis.We (static analysis)and test case generation (dynamic analysis)to evaluate MIRROR against a state-of-the-art dynamic reflection uncover more reflective targets precisely.MIRROR is designed analysis tool,TAMIFLEX,by using 10 large real-world Java applications.MIRROR detects 12.5%-933.3%more reflective to assist dynamic reflection analysis (e.g.,TAMIFLEX)to targets efficiently (in 362.8 seconds on average)without producing resolve more reflective targets with low false positives.Thus, any false positives.These new targets enable 5-174949 call- MIRROR can discover effectively reflective targets that would graph edges to be reachable in the application code. otherwise be missed by TAMIFLEX in real-world applications and improve the code coverage of dynamic reflection analysis. I.INTRODUCTION In MIRROR,its static analysis applies a novel reflection- As one of the most widely adopted programming lan- oriented slicing technique to focus on the parts of the program guages [1].Java has been a popular attack target.Java suffers relevant to a reflective call.Unlike traditional slicing [23],[24]. still from serious security issues,with 87%of attack vectors which hardly scales to large object-oriented programs [25]. for web exploits in 2013 [2]and 91%in 2014 [3].A large [26],MIRROR identifies a small subgraph of the program's variety of exploited vulnerabilities are related to reflection,a call graph that likely affects the execution of a reflective call dynamic feature widely used in Java applications to enable and then computes a small number of small path-based slices their runtime behaviors to be examined or modified at run- in the subgraph so that potentially true yet different reflective time,which is abused by 45%of all exploits in the wild [4]. targets are likely exercised at the reflective call along these In practice,program analysis tools are invaluable for ensur- different paths.This preserves the soundness of pure static ing software quality and reliability.However,"you can't check reflection analysis as much as possible,improves its scalability, code you don't see"[5].Without analyzing reflection,bug and reduces substantially its false positive rate. detectors and security analysers may miss important program In MIRROR,its dynamic analysis executes each path-based behaviors.because these tools do not have a complete view slice with automatically generated test cases to exercise the of the code (as many reflectively induced call-graph edges are path and record the reflective targets accessed.This increases missing).Therefore,reflection poses a major obstacle to bug the code coverage of pure dynamic reflection analysis. detection and security analysis [6]-[9]. We have evaluated MIRROR against TAMIFLEX [17]by Reflective calls can be analyzed either statically or dy- using a set of 10 large real-world Java programs.MIRROR namically.Static analysis [6],[7].[10]-[15].which discovers detects 12.5%-933.3%more reflective targets efficiently reflective targets accessed at reflective calls via type inference, (in 362.8 seconds on average)with no false targets.These is often imprecise by reporting many false targets (and con- new reflective targets result in 5-174949 call-graph edges sequently,impairing scalability for some large applications). reachable in the application code of these programs. In contrast,dynamic analysis [16].[17].which instruments With MIRROR,more reflective targets can be found pre- IEEE 2332-6549/17$31.0002017IEEE 12 ④computer D0I10.1109/M1SSRE.2017.36 society
Reflection Analysis for Java: Uncovering More Reflective Targets Precisely Jie Liu1, Yue Li1, Tian Tan1, and Jingling Xue1,2 1School of Computer Science and Engineering, UNSW, Australia 2State Key Laboratory of Software Engineering, Computer School, Wuhan University, China Abstract—Reflection, which is widely used in practice and abused by many security exploits, poses a significant obstacle to program analysis. Reflective calls can be analyzed statically or dynamically. Static analysis is more sound but also more imprecise (by introducing many false reflective targets and thus affecting its scalability). Dynamic analysis can be precise but often miss many true reflective targets due to low code coverage. We introduce MIRROR, the first automatic reflection analysis for Java that increases significantly the code coverage of dynamic analysis while keeping false reflective targets low. In its static analysis, a novel reflection-oriented slicing technique is applied to identify a small number of small path-based slices for a reflective call so that different reflective targets are likely exercised along these different paths. This preserves the soundness of pure static reflection analysis as much as possible, improves its scalability, and reduces substantially its false positive rate. In its dynamic analysis, these slices are executed with automatically generated test cases to report the reflective targets accessed. This signifi- cantly improves the code coverage of pure dynamic analysis. We evaluate MIRROR against a state-of-the-art dynamic reflection analysis tool, TAMIFLEX, by using 10 large real-world Java applications. MIRROR detects 12.5% – 933.3% more reflective targets efficiently (in 362.8 seconds on average) without producing any false positives. These new targets enable 5 – 174949 callgraph edges to be reachable in the application code. I. INTRODUCTION As one of the most widely adopted programming languages [1], Java has been a popular attack target. Java suffers still from serious security issues, with 87% of attack vectors for web exploits in 2013 [2] and 91% in 2014 [3]. A large variety of exploited vulnerabilities are related to reflection, a dynamic feature widely used in Java applications to enable their runtime behaviors to be examined or modified at runtime, which is abused by 45% of all exploits in the wild [4]. In practice, program analysis tools are invaluable for ensuring software quality and reliability. However, “you can’t check code you don’t see” [5]. Without analyzing reflection, bug detectors and security analysers may miss important program behaviors, because these tools do not have a complete view of the code (as many reflectively induced call-graph edges are missing). Therefore, reflection poses a major obstacle to bug detection and security analysis [6]–[9]. Reflective calls can be analyzed either statically or dynamically. Static analysis [6], [7], [10]–[15], which discovers reflective targets accessed at reflective calls via type inference, is often imprecise by reporting many false targets (and consequently, impairing scalability for some large applications). In contrast, dynamic analysis [16], [17], which instruments and records reflective targets accessed at reflective calls during program execution, can be both precise and efficient. As a result, bug detectors on finding, for example, data races [18], deadlocks [19] and property violations [20]), and security analysers on finding, for example, privacy leaks [21] and malicious functionalities [22], often resort to dynamic reflection analysis. However, analyzing reflection dynamically often misses many true reflective targets (due to low code coverage). This is especially the case when GUI applications are analyzed. For example, we observe that TAMIFLEX [17], the state-of-the-art dynamic reflection analysis, fails to find any new reflective target after a long sequence of GUI operations has been performed (on, for example, findbugs-1.2.1). In this paper, we introduce MIRROR, the first automatic reflection analysis for Java that combines program slicing (static analysis) and test case generation (dynamic analysis) to uncover more reflective targets precisely. MIRROR is designed to assist dynamic reflection analysis (e.g., TAMIFLEX) to resolve more reflective targets with low false positives. Thus, MIRROR can discover effectively reflective targets that would otherwise be missed by TAMIFLEX in real-world applications and improve the code coverage of dynamic reflection analysis. In MIRROR, its static analysis applies a novel reflectionoriented slicing technique to focus on the parts of the program relevant to a reflective call. Unlike traditional slicing [23], [24], which hardly scales to large object-oriented programs [25], [26], MIRROR identifies a small subgraph of the program’s call graph that likely affects the execution of a reflective call and then computes a small number of small path-based slices in the subgraph so that potentially true yet different reflective targets are likely exercised at the reflective call along these different paths. This preserves the soundness of pure static reflection analysis as much as possible, improves its scalability, and reduces substantially its false positive rate. In MIRROR, its dynamic analysis executes each path-based slice with automatically generated test cases to exercise the path and record the reflective targets accessed. This increases the code coverage of pure dynamic reflection analysis. We have evaluated MIRROR against TAMIFLEX [17] by using a set of 10 large real-world Java programs. MIRROR detects 12.5% – 933.3% more reflective targets efficiently (in 362.8 seconds on average) with no false targets. These new reflective targets result in 5 – 174949 call-graph edges reachable in the application code of these programs. With MIRROR, more reflective targets can be found pre- 2017 IEEE 28th International Symposium on Software Reliability Engineering 2332-6549/17 $31.00 © 2017 IEEE DOI 10.1109/ISSRE.2017.36 12
cisely and quickly,making many previously missing call-graph 1 public class Server edges visible to a variety of analysis tools.This enables bug 2 Class clzObj; 3 boolean infoChanged; detectors and security analyzers,for example,to identify more 4 public static void main(String args[]){ bugs and vulnerabilities effectively. 5 String className "content.HTTPRequest"; In summary,this paper makes the following contributions: 6 if (args.length 0){ className args [0] We introduce MIRROR,the first automatic Java reflection 8 analysis framework that combines static and dynamic 9 if (args.length >5) analysis to resolve reflective calls in large codebases. 0 System.out.println("No config info"); We describe a reflection-oriented slicing technique that Server srv new Server(); 12 srv.changeInfo(className); improves the scalability of traditional slicing for object- 中 arv.readConfig(); oriented programs (w.r.t.a reflective call).This technique 14 8rW.1 nitServer《: is also potentially useful for supporting bug detection, 15 16 public void changeInfo(String cName){ program understanding,and verification. 17 this.clzObj Class.forName(cName); We describe a dynamic analysis technique for resolving 18 this.infoChanged false; reflective calls by combining automatic test case genera- 19 d 20 public void readConfig(){ tion and program execution on path-based slices 21 loadCommand(); We evaluate MIRROR (implemented in 10 KLOC of Java) 2 this.infoChanged true; by using 10 large real-world Java programs.MIRROR is 23 FileMonitor.getFileMonitor().addReloadable(this); 24 critical in enabling their reflective calls to be analyzed 25 public void initServer(){ efficiently and precisely with good soundness 26 created(); 27 II.A MOTIVATING EXAMPLE 28 public void created(){ 29 loadCommand(): Our motivating example is a Java program given in 30 Figure 1,which is abstracted from real-world applications 31 public void loadCommand(){ freecs-1.3 and pmd-4.2.5 (used in our evaluation in 32 String cmdStr null; 33 if (this.infoChanged ==false){ Section IV).We focus on resolving the reflective target meth- Method getM null; ods that may be called at getM.invoke(o,null)in line 35 Class clz this.clzObj; 42.While MIRROR works on the Jimple IR (Intermediate 36 if (pmd.cpd.Renderer.class.isAssignableFrom(clz)) 37 getM clz.getMethod("toString"); Representation)in SooT [27],we illustrate it by using the 38 else if (content.HTTPRequest.class.isAssignableFrom(clz)) high-level statements in Java in order to ease understanding. 39 getM clz.getMethod("getUrl"); In line 17,a class metaobject pointed to by this.clzobj 40 else throw new ErrorClassTypeException(...); 41 is created by calling class.forName (cName)to rep- Object o clz.nevInstance(); 42 cmdStr =(String)getM.invoke(o,null); resent the class named by cName.Here,cName is either 43 "content.HTTPRequest"(line 5)or a non-constant string read from the command line (line 7).In line 41,an object 45] o is created reflectively as an instance of this.clzobj. Fig.1:A Java program. Then a method object,pointed to by getM,is created by calling getMethod()in line 37 or 39 to repre- sent a method from this.clzobj with its name being “toString”or“getUr1”.In line42,this method is two potential targets in line 42.Finally,as o may also be called reflectively on the receiver object o with the ac- an instance of any unknown class (line 7),getUrl()or tual argument null.There are five potential reflective tar- tostring()in any class is also a possible target.Thus, gets:getUrl()in class content.HTTPRequest and a dilemma emerges.Including all these targets improves the the four tostring ()methods in the four subclasses, soundness of the analysis but introduces many false targets, SimpleRenderer.XMLRenderer.CSVRenderer and making the analysis imprecise or unscalable.Conversely.ig- VSRenderer,of interface pmd.cpd.Renderer. noring all these targets may cause some true targets to be missed.In MIRROR,we avoid this dilemma by reporting only A.Existing Approaches the reflective targets observed dynamically on the path-based Static analysis [6],[7],[11],[28][30]attempts to re- slices that are statically computed for a given reflective call. solve reflective targets of getM.invoke (o,null)in Dynamic analysis [16],[17]instruments and records the line 42 by conducting type inference.By keeping track reflective targets invoked by getM.invoke (o,null)in of string constants,we know that getM represents a line 42 during program execution.Unless all test inputs are method named tostring ((line 37)or geturl() exhausted,which is impractical for many applications,espe- (line 39).In addition,as o may be an instance of cially GUI applications,some reflective targets will be missed. class content.HTTPRequest (line 5),tostring (and For example,pmd-4.2.5,from which our example is partly getUrl()from class content.HTTPRequest are the abstracted,does not come with any test case for exercising 13
cisely and quickly, making many previously missing call-graph edges visible to a variety of analysis tools. This enables bug detectors and security analyzers, for example, to identify more bugs and vulnerabilities effectively. In summary, this paper makes the following contributions: • We introduce MIRROR, the first automatic Java reflection analysis framework that combines static and dynamic analysis to resolve reflective calls in large codebases. • We describe a reflection-oriented slicing technique that improves the scalability of traditional slicing for objectoriented programs (w.r.t. a reflective call). This technique is also potentially useful for supporting bug detection, program understanding, and verification. • We describe a dynamic analysis technique for resolving reflective calls by combining automatic test case generation and program execution on path-based slices. • We evaluate MIRROR (implemented in 10 KLOC of Java) by using 10 large real-world Java programs. MIRROR is critical in enabling their reflective calls to be analyzed efficiently and precisely with good soundness. II. A MOTIVATING EXAMPLE Our motivating example is a Java program given in Figure 1, which is abstracted from real-world applications freecs-1.3 and pmd-4.2.5 (used in our evaluation in Section IV). We focus on resolving the reflective target methods that may be called at getM.invoke(o, null) in line 42. While MIRROR works on the Jimple IR (Intermediate Representation) in SOOT [27], we illustrate it by using the high-level statements in Java in order to ease understanding. In line 17, a class metaobject pointed to by this.clzObj is created by calling Class.forName(cName) to represent the class named by cName. Here, cName is either “content.HTTPRequest” (line 5) or a non-constant string read from the command line (line 7). In line 41, an object o is created reflectively as an instance of this.clzObj. Then a method object, pointed to by getM, is created by calling getMethod() in line 37 or 39 to represent a method from this.clzObj with its name being “toString” or “getUrl”. In line 42, this method is called reflectively on the receiver object o with the actual argument null. There are five potential reflective targets: getUrl() in class content.HTTPRequest and the four toString() methods in the four subclasses, SimpleRenderer, XMLRenderer, CSVRenderer and VSRenderer, of interface pmd.cpd.Renderer. A. Existing Approaches Static analysis [6], [7], [11], [28]–[30] attempts to resolve reflective targets of getM.invoke(o, null) in line 42 by conducting type inference. By keeping track of string constants, we know that getM represents a method named toString() (line 37) or getUrl() (line 39). In addition, as o may be an instance of class content.HTTPRequest (line 5), toString() and getUrl() from class content.HTTPRequest are the 1 public class Server { 2 Class clzObj; 3 boolean infoChanged; 4 public static void main(String args[]) { 5 String className = "content.HTTPRequest"; 6 if (args.length > 0) { 7 className = args[0]; 8 } 9 if (args.length > 5) 10 System.out.println("No config info"); 11 Server srv = new Server(); 12 srv.changeInfo(className); 13 srv.readConfig(); 14 srv.initServer(); 15 } 16 public void changeInfo(String cName) { 17 this.clzObj = Class.forName(cName); 18 this.infoChanged = false; 19 } 20 public void readConfig() { 21 loadCommand(); 22 this.infoChanged = true; 23 FileMonitor.getFileMonitor().addReloadable(this); 24 } 25 public void initServer() { 26 created(); 27 } 28 public void created() { 29 loadCommand(); 30 } 31 public void loadCommand() { 32 String cmdStr = null; 33 if (this.infoChanged == false) { 34 Method getM = null; 35 Class clz = this.clzObj; 36 if (pmd.cpd.Renderer.class.isAssignableFrom(clz)) 37 getM = clz.getMethod("toString"); 38 else if (content.HTTPRequest.class.isAssignableFrom(clz)) 39 getM = clz.getMethod("getUrl"); 40 else throw new ErrorClassTypeException(...); 41 Object o = clz.newInstance(); 42 cmdStr = (String) getM.invoke(o, null); 43 } 44 } 45 } Fig. 1: A Java program. two potential targets in line 42. Finally, as o may also be an instance of any unknown class (line 7), getUrl() or toString() in any class is also a possible target. Thus, a dilemma emerges. Including all these targets improves the soundness of the analysis but introduces many false targets, making the analysis imprecise or unscalable. Conversely, ignoring all these targets may cause some true targets to be missed. In MIRROR, we avoid this dilemma by reporting only the reflective targets observed dynamically on the path-based slices that are statically computed for a given reflective call. Dynamic analysis [16], [17] instruments and records the reflective targets invoked by getM.invoke(o, null) in line 42 during program execution. Unless all test inputs are exhausted, which is impractical for many applications, especially GUI applications, some reflective targets will be missed. For example, pmd-4.2.5, from which our example is partly abstracted, does not come with any test case for exercising 13
MIRROR Reflection-Oriented Slicing(Static Analysis) Reflection Resolution (Dynamic Analysis) Java Program Reflective Call Graph Reduct based Slicing matic Test Ca匹 nted Execution Targets Fig.2:The MIRROR framework for resolving reflective calls by combining static and dynamic analysis. className (line 7).Therefore,a dynamic analysis tool may such that these data dependences form the du-chains for R fail to find the four tostring (target methods provided in In reflection-oriented slicing,we obtain first a subgraph of the the four aforementioned subclasses of pmd.cpd.Renderer. program's call graph that contains DR(by considering data In general,dynamic analysis cannot resolve reflective calls that dependences)and then a small number of path-based slices for are not encountered during program execution.In MIRROR, R on this subgraph (by considering also control dependences). however,we can still handle such reflective calls by combining a)Stage 1.Call Graph Reduction:Instead of operating automatic test case generation and program execution. on the entire call graph of the program as in traditional slicing [23],[24],MIRROR restricts itself to a small subgraph. B.The MIRROR Approach For the reflective call in line 42,we have: Figure 2 gives an overview.Given a Java program,MIRROR D42={4÷7,7→12,5÷12,12→17,17→35,35→37 analyzes its reflective calls individually.Below we describe 35→39,35→41,37→42,39→42,41→42} (1) our approach by focusing on getM.invoke (o,null)in line 42 of Figure 1.We will highlight the functionalities of where each statement is identified by its line number.In its four stages,with the first two forming the static analysis Figure 1,all these statements,which affect the execution phase ("Reflection-Oriented Slicing")and the last two forming of line 42 in a data-dependent manner,are underlined.The the dynamic analysis phase ("Reflection Resolution").We definition of getM null in line 34 is disregarded since it will also explain some challenges faced,our solutions,the cannot trigger any target at the reflective call in line 42. motivations behind,and the tradeoffs made. Given D.a subgraph,denoted GR,is built so that,for 1)Reflection-Oriented Slicing(Static Analysis):As dis- every du-chain in DR,GR contains a sequence of method cussed earlier,pure static reflection analysis may introduce calls along which the du-chain holds.In our example,its call many false reflective targets,making it unscalable for some graph is given in Figure 3(a)and the subgraph G42 is given programs.Given a reflective call to resolve,one straightfor- in Figure 3(b). ward remedy is to restrict the analysis to its backward slice el comprising the statements on which the reflective call data-or main() changeInfo() ain0☐eL-changeInfo) 色2 control-depends.Unfortunately,such traditional slicing [23]. e4↓ [24]does not scale to large object-oriented programs [28]. initserver() readconfig() readconfig() e5. e3 e3, [31]-[33].as it operates on the entire call graph of the created()loadcommand() program.For the reflective call in line 42,its backward slice Loadcommand() consists of all the methods in Figure 1,comprising all the (a)Entire call graph (b)Subgraph G42 statements except the three in lines 9,10 and 23. Ideally.we would like to find the smallest backward slice for Fig.3:Call graph reduction for Figure 1. a reflective call so that its different paths trigger its different reflective targets.Achieving such soundness and precision There can be many such subgraphs to choose from.For efficiently is too challenging to be practical.In this paper,we the subgraph in Figure 3(b),replacing its call-graph edges e2 introduce reflection-oriented slicing that strikes a good balance and e3 by e4,e5 and e6 in Figure 3(a)yields another larger among efficiency,soundness and precision by leveraging the subgraph.We do not aim to find the smallest GR.Instead,we following key observation about reflection usage. will build GR by performing BFS in the program's call graph in order to keep GR as small as possible,thereby improving Observation 1.Given a reflective call R in the program, the scalability of the subsequent path-based slicing stage.This the variables that appear in all the conditionals affecting the tradeoff may still make call-graph reduction unscalable for execution of R are usually defined in the same set of methods some reflective calls,affecting the soundness of MIRROR (due that contain the du-chains (i.e.,def-use chains)for R. to the inherent complexity of slicing,in general). Let D be the set of du-pairs (i.e.,def-use pairs)of the b)Stage 2.Path-based Slicing:For a reflective call R, form ss',where a variable defined at s is used at s', its different targets may be defined along different paths.We 14
Java Program Reflection-Oriented Slicing (Static Analysis) Call Graph Reduction Path-based Slicing Automatic Test Case Generation Path-besed Slices Reflection Resolution (Dynamic Analysis) Instrumented Execution Test Cases MIRROR Sub Call Graph Reflective Targets Fig. 2: The MIRROR framework for resolving reflective calls by combining static and dynamic analysis. className (line 7). Therefore, a dynamic analysis tool may fail to find the four toString() target methods provided in the four aforementioned subclasses of pmd.cpd.Renderer. In general, dynamic analysis cannot resolve reflective calls that are not encountered during program execution. In MIRROR, however, we can still handle such reflective calls by combining automatic test case generation and program execution. B. The MIRROR Approach Figure 2 gives an overview. Given a Java program, MIRROR analyzes its reflective calls individually. Below we describe our approach by focusing on getM.invoke(o, null) in line 42 of Figure 1. We will highlight the functionalities of its four stages, with the first two forming the static analysis phase (“Reflection-Oriented Slicing”) and the last two forming the dynamic analysis phase (“Reflection Resolution”). We will also explain some challenges faced, our solutions, the motivations behind, and the tradeoffs made. 1) Reflection-Oriented Slicing (Static Analysis): As discussed earlier, pure static reflection analysis may introduce many false reflective targets, making it unscalable for some programs. Given a reflective call to resolve, one straightforward remedy is to restrict the analysis to its backward slice comprising the statements on which the reflective call data- or control-depends. Unfortunately, such traditional slicing [23], [24] does not scale to large object-oriented programs [28], [31]–[33], as it operates on the entire call graph of the program. For the reflective call in line 42, its backward slice consists of all the methods in Figure 1, comprising all the statements except the three in lines 9, 10 and 23. Ideally, we would like to find the smallest backward slice for a reflective call so that its different paths trigger its different reflective targets. Achieving such soundness and precision efficiently is too challenging to be practical. In this paper, we introduce reflection-oriented slicing that strikes a good balance among efficiency, soundness and precision by leveraging the following key observation about reflection usage. Observation 1. Given a reflective call R in the program, the variables that appear in all the conditionals affecting the execution of R are usually defined in the same set of methods that contain the du-chains (i.e., def-use chains) for R. Let DR be the set of du-pairs (i.e., def-use pairs) of the form s ⇒ s , where a variable defined at s is used at s , such that these data dependences form the du-chains for R. In reflection-oriented slicing, we obtain first a subgraph of the program’s call graph that contains DR (by considering data dependences) and then a small number of path-based slices for R on this subgraph (by considering also control dependences). a) Stage 1. Call Graph Reduction: Instead of operating on the entire call graph of the program as in traditional slicing [23], [24], MIRROR restricts itself to a small subgraph. For the reflective call in line 42, we have: D42 = {4 ⇒ 7, 7 ⇒ 12, 5 ⇒ 12, 12 ⇒ 17, 17 ⇒ 35, 35 ⇒ 37, 35 ⇒ 39, 35 ⇒ 41, 37 ⇒ 42, 39 ⇒ 42, 41 ⇒ 42} (1) where each statement is identified by its line number. In Figure 1, all these statements, which affect the execution of line 42 in a data-dependent manner, are underlined. The definition of getM = null in line 34 is disregarded since it cannot trigger any target at the reflective call in line 42. Given DR, a subgraph, denoted GR, is built so that, for every du-chain in DR, GR contains a sequence of method calls along which the du-chain holds. In our example, its call graph is given in Figure 3(a) and the subgraph G42 is given in Figure 3(b). loadCommand() changeInfo() initServer() readConfig() created() main() e1 e4 e5 e6 e3 e2 loadCommand() changeInfo() readConfig() main() e1 e3 e2 (a) Entire call graph (b) Subgraph G42 Fig. 3: Call graph reduction for Figure 1. There can be many such subgraphs to choose from. For the subgraph in Figure 3(b), replacing its call-graph edges e2 and e3 by e4, e5 and e6 in Figure 3(a) yields another larger subgraph. We do not aim to find the smallest GR. Instead, we will build GR by performing BFS in the program’s call graph in order to keep GR as small as possible, thereby improving the scalability of the subsequent path-based slicing stage. This tradeoff may still make call-graph reduction unscalable for some reflective calls, affecting the soundness of MIRROR (due to the inherent complexity of slicing, in general). b) Stage 2. Path-based Slicing: For a reflective call R, its different targets may be defined along different paths. We 14
are therefore motivated to partition D into different du-chain Finally,for each representative path selected for a du-chain groups so that one group contains exactly one definition for group X partitioned from DR,we obtain a path-based slice every variable used at R.Clearly,all the paths that contain the by applying traditional slicing [23].[24]to R.However, same du-chains for R must trigger the same set of reflective we only consider the data and control dependences for the targets at R.Therefore,only one representative path needs to statements on the path and restrict the slice to GR. be selected for each du-chain group.As the partition obtained statically this way is not guaranteed to be the coarsest at run time,different du-chain groups may still trigger the same set A4,6-8,9,11,12,13,17,18,21,22,32,33,34,35-37,41,42 of reflective targets at R. B4,6-8,9,11,12,1317,18,21,22,32,33,34,35,36,38,39,41,42 ⊙4,5,6,9,11,12,13,17,18,21,22,32,33,34,35,36,38,39,41,42 7一12一173537 D4,5,6,9,11,12,13,17,18,21,22,32,33,34,35-37,41,42 净42 41 Fig.6:Path-based slices for the paths in Figure 5. 7→1217→3539入 净42 441 Figure 6 gives the slices found for the four representative 39 →12→1735< 净42 paths in Figure 5.For each path,the statements on the path that 41 are irrelevant to line 42 with respect to this path are crossed 12→1735< 37 out.In addition,line 22 is the only (new)statement added. 净42 41 2)Reflection Resolution (Dynamic Analysis):For each path-based slice obtained for R,we generate its test cases and Fig.4:Partitioning of D42 into du-chain groups. discover its reflective targets at R by instrumented execution. a)Stage 3.Automatic Test Case Generation:Each Figure 4 gives a partition of D42 in Equation (1)into four path-based slice has only one execution path.As in sym- du-chain groups,A,B.C and D.Note that 512 and bolic execution [34]-1381.a path condition is collected that 7=12 cannot appear in the same group since className is comprises all the constraints affecting the execution of the defined in lines 5 and 7 but used in line 12.Similarly,37=42 path.However,unlike the prior work,MIRROR models a and 3942 cannot appear in the same group since getM is variety of object-oriented features such as instanceof, defined in lines 37 and 39 but used in line 42. isAssignableFrom and type casts in order to generate Given a du-chain group Xm of DR,we will find a represen- test cases more comprehensively.In general,we will generate tative path that contains all the du-pairs in X such that for a set of different test inputs that satisfy the path condition so every s1→s2∈Xr,ifs→s2∈DR\XR,thens1 will not that different reflective targeted can be captured. appear on the path.This ensures that s is the only definition for s2 on this path.We will achieve this by performing BFS Args[]E"pmd.cpd SimpleRendererer",("pmd.cpd.XMLRendererer", on the ICFG (inter-procedural Control Flow Graph)of the "pmd.cpd.CSVRendererer","pmd.cpd.VSRendererer" program restricted to GR.If such a path does not exist,X B Args[]={"content.HTTPRequcst" is ignored.This can happen when X contains two du-pairs No input needed that appear in two mutually exclusive branches and thus cannot D hold simultaneously during program execution. Fig.7:Test inputs generated for the slices in Figure 6. A4,6-9,11,12,17,18,13,21,32-37,41,42 B4,6-9,11,12,17,18,13,21,32-36,38,39,41,42 Figure 7 give the test inputs generated.For slice A,four test C4-6,9,11,12,17,1813,21,32-36,38,39,41,42 cases for Args [0]at line 4 are generated due to pmd.cpd. D4-6,9,11,12,17,18,13.21,32-37,41,42 Renderer.class.isAssignableFrom(clz)in its path condition.Here,SimpleRenderer,XMLRenderer,C Fig.5:Representative paths for the du-chain groups in Fig- SVRenderer and VSRenderer are the four subclasses of ure 4. interface pmd.cpd.Renderer.For slice B,one test case is generated for Args[0]at line 4 due to content.HTTPRe Figure 5 gives the representative paths found for the four quest.class.isAssignableFrom(clz).For slice C. du-chain groups in Figure 4.In each case,there are two paths no input is needed as all variables are well initialized.For slice due to line 9.The shorter one that contains no statements in D,its path is infeasible due to two inconsistent constraints the if branch in lines 9-10 is selected.Note that for the className="content.HTTPRequest"and pmd.cpd du-chain groups C and D in Figure 4.with each containing Renderer.class.isAssignableFrom(clz). the definition of className in line 5,the other definition Let us now examine the implications of Observation I of className in line 7 is skipped.Thus,the else branch of on the precision of MIRROR.If a conditional,say,x = the if statement in lines 9-10 will be followed. 0 that affects the execution of R involves a definition of 15
are therefore motivated to partition DR into different du-chain groups so that one group contains exactly one definition for every variable used at R. Clearly, all the paths that contain the same du-chains for R must trigger the same set of reflective targets at R. Therefore, only one representative path needs to be selected for each du-chain group. As the partition obtained statically this way is not guaranteed to be the coarsest at run time, different du-chain groups may still trigger the same set of reflective targets at R. A 4 7 12 17 35 37 41 42 B 4 7 12 17 35 39 41 42 C 5 12 35 39 41 17 42 5 12 35 37 41 D 5 17 42 Fig. 4: Partitioning of D42 into du-chain groups. Figure 4 gives a partition of D42 in Equation (1) into four du-chain groups, A, B, C and D. Note that 5 ⇒ 12 and 7 ⇒ 12 cannot appear in the same group since className is defined in lines 5 and 7 but used in line 12. Similarly, 37 ⇒ 42 and 39 ⇒ 42 cannot appear in the same group since getM is defined in lines 37 and 39 but used in line 42. Given a du-chain group XR of DR, we will find a representative path that contains all the du-pairs in XR such that for every s1 ⇒ s2 ∈ XR, if s 1 ⇒ s2 ∈ DR \XR, then s 1 will not appear on the path. This ensures that s1 is the only definition for s2 on this path. We will achieve this by performing BFS on the ICFG (inter-procedural Control Flow Graph) of the program restricted to GR. If such a path does not exist, XR is ignored. This can happen when XR contains two du-pairs that appear in two mutually exclusive branches and thus cannot hold simultaneously during program execution. 4, 6-9, 11, 12, 17, 18, 13, 21, 32-37, 41, 42 4, 6-9, 11, 12, 17, 18, 13, 21, 32-36, 38, 39, 41, 42 4-6, 9, 11, 12, 17, 18, 13, 21, 32-36, 38, 39, 41, 42 4-6, 9, 11, 12, 17, 18, 13, 21, 32-37, 41, 42 A B C D Fig. 5: Representative paths for the du-chain groups in Figure 4. Figure 5 gives the representative paths found for the four du-chain groups in Figure 4. In each case, there are two paths due to line 9. The shorter one that contains no statements in the if branch in lines 9 – 10 is selected. Note that for the du-chain groups C and D in Figure 4, with each containing the definition of className in line 5, the other definition of className in line 7 is skipped. Thus, the else branch of the if statement in lines 9 – 10 will be followed. Finally, for each representative path selected for a du-chain group XR partitioned from DR, we obtain a path-based slice by applying traditional slicing [23], [24] to R. However, we only consider the data and control dependences for the statements on the path and restrict the slice to GR. B A C D 4, 6-8, 9, 11, 12, 13, 17, 18, 21, 22, 32, 33, 34, 35-37, 41, 42 4, 6-8, 9, 11, 12, 13, 17, 18, 21, 22, 32, 33, 34, 35, 36, 38, 39, 41, 42 4, 5, 6, 9, 11, 12, 13, 17, 18, 21, 22, 32, 33, 34, 35, 36, 38, 39, 41, 42 4, 5, 6, 9, 11, 12, 13, 17, 18, 21, 22, 32, 33, 34, 35-37, 41, 42 Fig. 6: Path-based slices for the paths in Figure 5. Figure 6 gives the slices found for the four representative paths in Figure 5. For each path, the statements on the path that are irrelevant to line 42 with respect to this path are crossed out. In addition, line 22 is the only (new) statement added. 2) Reflection Resolution (Dynamic Analysis): For each path-based slice obtained for R, we generate its test cases and discover its reflective targets at R by instrumented execution. a) Stage 3. Automatic Test Case Generation: Each path-based slice has only one execution path. As in symbolic execution [34]–[38], a path condition is collected that comprises all the constraints affecting the execution of the path. However, unlike the prior work, MIRROR models a variety of object-oriented features such as instanceof, isAssignableFrom and type casts in order to generate test cases more comprehensively. In general, we will generate a set of different test inputs that satisfy the path condition so that different reflective targeted can be captured. B A D Args[] ∈ {{“pmd.cpd.SimpleRendererer”}, {“pmd.cpd.XMLRendererer”}, Args[] = {“content.HTTPRequest”} D Infeasible path C No input needed {“pmd.cpd.CSVRendererer”}, {“pmd.cpd.VSRendererer”}} Fig. 7: Test inputs generated for the slices in Figure 6. Figure 7 give the test inputs generated. For slice A, four test cases for Args[0] at line 4 are generated due to pmd.cpd. Renderer.class.isAssignableFrom(clz) in its path condition. Here, SimpleRenderer, XMLRenderer, C SVRenderer and VSRenderer are the four subclasses of interface pmd.cpd.Renderer. For slice B, one test case is generated for Args[0] at line 4 due to content.HTTPRe quest.class.isAssignableFrom(clz). For slice C, no input is needed as all variables are well initialized. For slice D, its path is infeasible due to two inconsistent constraints, className="content.HTTPRequest" and pmd.cpd .Renderer.class.isAssignableFrom(clz). Let us now examine the implications of Observation 1 on the precision of MIRROR. If a conditional, say, x == 0 that affects the execution of R involves a definition of 15
x outside GR,then MIRROR will generate a value of x so du-pair s s2 is central to our algorithm.To this end,we that x =0 holds.If this conditional never evaluates to true make use of the following two functions: during program execution,then MIRROR may report some ·Connect(s1,s2)returns a sub-call graph so that s1→s2. false reflective targets along this path.This has never happened Common(s1,s2)returns a statement reaching s1 and s2. to a set of 10 large Java programs evaluated,indicating that Observation 1 usually holds in real-world applications There are three types of interprocedural du-pairs.We discuss how to build the two functions,as illustrated in Figure 8. b)Stage 4.Instrumented Execution:For each path- based slice obtained for R,we generate an executable program by adding some missing variable declarations.For example, a(){ b(){ every slice in Figure 6 misses the declaration for getM.We s1:b(y)i s2:x=a()i then execute each slice with its test cases generated earlier to resolve the reflective targets at R along its associated path. el. Let us consider the reflective call in line 42.For slice A. the four tostring()target methods in the four subclasses, b(z){ a(){ SimpleRenderer,XMLRenderer,CSVRenderer and s2:x=Z; sl:returny VSRenderer,of interface pmd.cpd.Renderer are found. } For slice B,content.HTTPRequest.getUrl()is dis- covered.For the slice C.this same target is found. (a)Parameter passing (b)Value returning III.THE MIRROR ALGORITHMS c() a(){ We describe the algorithms for realizing the four compo- S3: a(); el S4: b); sl:p.f=.... nents in MIRROR (Figure 2).MIRROR operates on the ICFG of the program expressed in a three-address IR,by making use of the program's call graph G and def-use chains available. e2 Given a program,MIRROR analyzes the reflective calls b(){ reachable from its main (individually.For a given reflective s2:..q.f; call R.D represents the set of du-chains for R.Our call- graph reduction algorithm will reduce G to a substantially smaller subgraph GR that contains a sequence of method calls (c)Field access to establish every du-chain in DR.Our path-based slicing algorithm will build a small number of small path-based slices Fig.8:Building Connect(s1,s2)and Common(s1,s2)for three on GR with its paths leading potentially to all the possible types of interprocedural du-pairs ss2. targets accessed at R.Our automatic test case generator generates the test cases for exercising a (feasible)path-based (1)Parameter Passing (Figure 8(a)).s=s2 denotes a slice.Finally,our instrumentator instruments and executes a parameter-passing dependence,where s1 is a call state- given slice to report the reflective targets accessed at R. ment in method a()that passes an argument used at s2 in method b().As a result,Connect(s1,s2)=fa()b()} A.Call Graph Reduction and Common(s1,s2)=s1. Given a reflective call R,we will reduce the program's call (2)Return value (Figure 8(b))s1=s2 denotes a value- graph G to a subgraph GR,which is substantially smaller returning dependence,where s2 is a call statement in but still allows every du-chain in D to hold.Thus,GR method 60)that receives a value returned from s in suffices to trigger all the possible reflective targets at R while method a().Thus,Connecr(s1,s2)=fa()b()}and keeping the number of false targets reported to a minimum Common(s1,s2)=s2. due to Observation 1.This also enables MIRROR to perform (3)Field Access (Figure 8(c))s1=s2 denotes a field- its subsequent reflection-oriented slicing on a small sub- related dependence,where s is a store p.f =..in call graph,thereby improving the scalability of traditional method a()and s2 is a load...=q.f in method b(). slicing (especially for large object-oriented programs),which Here,p.f and g.f are aliases as p and q may point to is performed on the entire call graph of the program instead. a common object.This also includes the special cases We will build GR so that for every du-chain in DR.GR when s1 appears directly in c()(as if the call to a()at contains a sequence of method calls for the du-chain to hold. s3 is inlined)and/or when s2 appears directly in c()(as It suffices to consider only the interprocedural du-pairs (i.e.. if the call to 6()at s4 is inlined). the du-pairs spanning two different methods)in DR,since the We compute Connect(s1,s2)by performing BFS on the resulting G will naturally include all the intraprocedural du- program's call graph G backwards,starting from the use pairs in DR.To construct GR.we grow it incrementally by s2.We will stop at the first method m such that processing all the interprocedural du-pairs in turn.Therefore, 1)m is a direct or indirect caller of b()or contains s2 (as how to find a sub-call graph to establish one interprocedural is the case when s4 is replaced by s2),and 16
x outside GR, then MIRROR will generate a value of x so that x == 0 holds. If this conditional never evaluates to true during program execution, then MIRROR may report some false reflective targets along this path. This has never happened to a set of 10 large Java programs evaluated, indicating that Observation 1 usually holds in real-world applications. b) Stage 4. Instrumented Execution: For each pathbased slice obtained for R, we generate an executable program by adding some missing variable declarations. For example, every slice in Figure 6 misses the declaration for getM. We then execute each slice with its test cases generated earlier to resolve the reflective targets at R along its associated path. Let us consider the reflective call in line 42. For slice A, the four toString() target methods in the four subclasses, SimpleRenderer, XMLRenderer, CSVRenderer and VSRenderer, of interface pmd.cpd.Renderer are found. For slice B, content.HTTPRequest.getUrl() is discovered. For the slice C, this same target is found. III. THE MIRROR ALGORITHMS We describe the algorithms for realizing the four components in MIRROR (Figure 2). MIRROR operates on the ICFG of the program expressed in a three-address IR, by making use of the program’s call graph G and def-use chains available. Given a program, MIRROR analyzes the reflective calls reachable from its main() individually. For a given reflective call R, DR represents the set of du-chains for R. Our callgraph reduction algorithm will reduce G to a substantially smaller subgraph GR that contains a sequence of method calls to establish every du-chain in DR. Our path-based slicing algorithm will build a small number of small path-based slices on GR with its paths leading potentially to all the possible targets accessed at R. Our automatic test case generator generates the test cases for exercising a (feasible) path-based slice. Finally, our instrumentator instruments and executes a given slice to report the reflective targets accessed at R. A. Call Graph Reduction Given a reflective call R, we will reduce the program’s call graph G to a subgraph GR, which is substantially smaller but still allows every du-chain in DR to hold. Thus, GR suffices to trigger all the possible reflective targets at R while keeping the number of false targets reported to a minimum due to Observation 1. This also enables MIRROR to perform its subsequent reflection-oriented slicing on a small subcall graph, thereby improving the scalability of traditional slicing (especially for large object-oriented programs), which is performed on the entire call graph of the program instead. We will build GR so that for every du-chain in DR, GR contains a sequence of method calls for the du-chain to hold. It suffices to consider only the interprocedural du-pairs (i.e., the du-pairs spanning two different methods) in DR, since the resulting GR will naturally include all the intraprocedural dupairs in DR. To construct GR, we grow it incrementally by processing all the interprocedural du-pairs in turn. Therefore, how to find a sub-call graph to establish one interprocedural du-pair s1 ⇒ s2 is central to our algorithm. To this end, we make use of the following two functions: • Connect(s1, s2) returns a sub-call graph so that s1 ⇒ s2. • Common(s1, s2) returns a statement reaching s1 and s2. There are three types of interprocedural du-pairs. We discuss how to build the two functions, as illustrated in Figure 8. a(){ s1: } b(y); e1 b(z){ s2: } x = z; b(){ s2: } x = a(); a(){ s1: } e1 return y; (a) Parameter passing (b) Value returning c(){ s3: s4: a(); b(); } b(){ s2: } … = e2 q.f; a(){ s1: } p.f=…; e1 (c) Field access Fig. 8: Building Connect(s1, s2) and Common(s1, s2) for three types of interprocedural du-pairs s1 ⇒ s2. (1) Parameter Passing (Figure 8(a)). s1 ⇒ s2 denotes a parameter-passing dependence, where s1 is a call statement in method a() that passes an argument used at s2 in method b(). As a result, Connect(s1, s2) = {a() → b()} and Common(s1, s2) = s1. (2) Return value (Figure 8(b)) s1 ⇒ s2 denotes a valuereturning dependence, where s2 is a call statement in method b() that receives a value returned from s1 in method a(). Thus, Connect(s1, s2) = {a() → b()} and Common(s1, s2) = s2. (3) Field Access (Figure 8(c)) s1 ⇒ s2 denotes a fieldrelated dependence, where s1 is a store p.f = ··· in method a() and s2 is a load ··· = q.f in method b(). Here, p.f and q.f are aliases as p and q may point to a common object. This also includes the special cases when s1 appears directly in c() (as if the call to a() at s3 is inlined) and/or when s2 appears directly in c() (as if the call to b() at s4 is inlined). We compute Connect(s1, s2) by performing BFS on the program’s call graph G backwards, starting from the use s2. We will stop at the first method m such that 1) m is a direct or indirect caller of b() or contains s2 (as is the case when s4 is replaced by s2), and 16