Program Tailoring:Slicing by Sequential Criteria rtifac Yue Li*,Tian Tan*,Yifei Zhang,and Jingling Xue School of Computer Science and Engineering,UNSW Australia -Abstract Protocol and typestate analyses often report some sequences of statements ending at a program point P that needs to be scrutinized,since P may be erroneous or imprecisely analyzed.Program slicing focuses only on the behavior at P by computing a slice of the program affecting the values at P.In this paper,we propose to restrict our attention to the subset of that behavior at P affected by one or several statement sequences,called a sequential criterion (SC).By leveraging the ordering information in a SC,e.g.,the temporal order in a few valid/invalid API method invocation sequences,we introduce a new technique,program tailoring,to compute a tailored program that comprises the statements in all possible execution paths passing through at least one sequence in SC in the given order.With a prototyping implementation,TAILOR,we show why tailoring is practically useful by conducting two case studies on seven large real-world Java applications.For program debugging and understanding,TAILOR can complement program slicing by removing SCirrelevant statements.For program analysis,TAILOR can enable a pointer analysis,which is unscalable to a program,to perform a more focused and therefore potentially scalable analysis to its SC-relevant parts containing hard language features such as reflection. 1998 ACM Subject Classification F.3.2 Semantics of Programming Languages-Program Ana- lysis,D.2.5 Testing and Debugging-Code inspections and Debugging aids Keywords and phrases Program Slicing,Program Analysis,API Protocol Analysis Digital Object Identifier XXX/LIPIcs.ECOOP.2016. 1 Introduction Program slicing,supported by industry-strength tools,such as WALA [52]and CodeSurfer [18], has found many diverse applications,such as program debugging,comprehension,analysis testing,verification and optimization 8,20,43,49.Given a slicing criterion consisting of a program point P and several variables used at P [53],program slicing computes a slice of the program that may affect their values at P in terms of data and control dependences.In the past three decades,several variations on this theme of program slicing have been proposed, including static vs.dynamic,backward vs.forward,and closure vs.executable 43,49. In practice,API protocol analysis [7,37]and typestate analysis [9,16,34]often report some statement sequences ending at a program point P that needs to be scrutinized,since P may be erroneous or imprecisely analyzed.Each sequence can represent a valid or invalid API usage call sequence.Such protocol specifications or violations can also be provided manually or mined automatically [1,3,17,36,42,60.As such analyses are either conservative or unsound,the temporal order specified in a sequence may or may not be feasible.However, program slicing focuses on P by ignoring this order,which is often essential for analyzing P. As illustrated in Figure 1,we introduce a new technique,program tailoring to reap additional benefits missed by program slicing at a point P(e.g.,file.write())by leveraging the temporal order specified in a new criterion,called a sequential criterion (SC)for P.A These authors contributed equally to this work Yue Li,Tian Tan,Vifei Zhang.and Jingling Xue licensed under Creative Commons License CC-BY 30th European Conference on Object Oriented Programming (ECOOP 2016). Leibniz International Proceedings in Informatics LIPICS Schloss Dagstuhl-Leibniz-Zentrum fuir Informatik,Dagstuhl Publishing,Germany
C onsistent * Complete * Well D ocumen et d t ysaE * o Reuse * * Eva ul det a * EC O O P * Artifact * AE C Program Tailoring: Slicing by Sequential Criteria Yue Li∗ , Tian Tan∗ , Yifei Zhang, and Jingling Xue School of Computer Science and Engineering, UNSW Australia Abstract Protocol and typestate analyses often report some sequences of statements ending at a program point P that needs to be scrutinized, since P may be erroneous or imprecisely analyzed. Program slicing focuses only on the behavior at P by computing a slice of the program affecting the values at P. In this paper, we propose to restrict our attention to the subset of that behavior at P affected by one or several statement sequences, called a sequential criterion (SC). By leveraging the ordering information in a SC, e.g., the temporal order in a few valid/invalid API method invocation sequences, we introduce a new technique, program tailoring, to compute a tailored program that comprises the statements in all possible execution paths passing through at least one sequence in SC in the given order. With a prototyping implementation, Tailor, we show why tailoring is practically useful by conducting two case studies on seven large real-world Java applications. For program debugging and understanding, Tailor can complement program slicing by removing SC-irrelevant statements. For program analysis, Tailor can enable a pointer analysis, which is unscalable to a program, to perform a more focused and therefore potentially scalable analysis to its SC-relevant parts containing hard language features such as reflection. 1998 ACM Subject Classification F.3.2 Semantics of Programming Languages - Program Analysis, D.2.5 Testing and Debugging - Code inspections and Debugging aids Keywords and phrases Program Slicing, Program Analysis, API Protocol Analysis Digital Object Identifier XXX/LIPIcs.ECOOP.2016. 1 Introduction Program slicing, supported by industry-strength tools, such as WALA [52] and CodeSurfer [18], has found many diverse applications, such as program debugging, comprehension, analysis, testing, verification and optimization [8, 20, 43, 49]. Given a slicing criterion consisting of a program point P and several variables used at P [53], program slicing computes a slice of the program that may affect their values at P in terms of data and control dependences. In the past three decades, several variations on this theme of program slicing have been proposed, including static vs. dynamic, backward vs. forward, and closure vs. executable [43, 49]. In practice, API protocol analysis [7, 37] and typestate analysis [9, 16, 34] often report some statement sequences ending at a program point P that needs to be scrutinized, since P may be erroneous or imprecisely analyzed. Each sequence can represent a valid or invalid API usage call sequence. Such protocol specifications or violations can also be provided manually or mined automatically [1, 3, 17, 36, 42, 60]. As such analyses are either conservative or unsound, the temporal order specified in a sequence may or may not be feasible. However, program slicing focuses on P by ignoring this order, which is often essential for analyzing P. As illustrated in Figure 1, we introduce a new technique, program tailoring to reap additional benefits missed by program slicing at a point P (e.g., file.write()) by leveraging the temporal order specified in a new criterion, called a sequential criterion (SC) for P. A ∗ These authors contributed equally to this work © Yue Li, Tian Tan, Yifei Zhang, and Jingling Xue; licensed under Creative Commons License CC-BY 30th European Conference on Object Oriented Programming (ECOOP 2016). Leibniz International Proceedings in Informatics Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
XX:2 Program Tailoring:Slicing by Sequential Criteria Sliced Program [write] SC-Irrelevant Statements Traditiona Slicing-Related Criterion lechnigues Analysis Program Debugging Outputs Tools Program Understanding Program aoen close Criterion(SC) Tailoring E.g.,API Protocol write (Error) Analysis 【open+close-+write】 Open→write Tailored Program Program Analysis Figure 1 Program tailoring with some of its potential applications highlighted SCp consists of one or several statement sequences ending at P,with each representing,e.g., a valid API call sequence like file.open()-file.write()or an invalid API call sequence like file.open()-file.close()->file.write().In what follows,we will drop P from SCp when the context is clear or when we are not interested in it.Given a SCp,tailoring aims to obtain a tailored program,denoted T(SCp),that comprises the statements in all the possible execution paths passing through at least one statement sequence in SCp in the given order.By construction,all data and control dependences needed for understanding the behavior at P affected by SCp are included.Any statement that is not in T(SCp)is irrelevant to SCp,i.e.,SCp-irrelevant.For illustration purposes,we write S(P)to represent the(backward)slice affecting P obtained by program slicing.Note that slicing all the points in SCp independently still fails to capture their ordering constraint(and is unscalable,too). Like slicing,tailoring enables software developers or client applications to inspect only the interesting parts of a program.Unlike slicing,which focuses on understanding the program behavior at P,tailoring restricts our attention to the subset of that behavior affected by SCp only.Due to incompatible criteria used,tailoring can be used either as a complementary technique to slicing or in cases where slicing is ineffective,as discussed below. 1.1 Goals and Motivations Given a SCp,we have developed a prototyping implementation of program tailoring,denoted TAILOR,for Java programs,with the following three goals in mind: Precision TAILOR is designed to sharpen the precision of many client applications,as highlighted in Figure 1,by exploiting the temporal order in a SCp.One significant class of client applications includes many slicing-related techniques,such as thin slicing [47], program chopping 22 and value slicing 26.For a given program,T(SCp)is shown as the blue circle and S(P)as the white circle.The statements in Z(SCp)=S(P)-S(P)n T(SCp)are SCp-irrelevant and can thus be pruned away to facilitate program debugging and understanding by a human.If Z(SCp)=,then TAILOR is ineffective at P but no harm is done.If I(SCp),then TAILOR can make slicing more precise,by leveraging the otherwise wasted SCp information that is widely available.In Section 6.1,we show that TAILOR can improve the precision of thin slicing [47],a state-of-the-art practical but unsound slicing technique,for Java programs.In Section 6.2,we show that TAILOR can enable a sophisticated pointer analysis for Java,S-2OBJ [24],which is unscalable for a program,to perform a focused analysis on its SC-relevant parts containing hard language features such as refection,where existing slicing techniques are ineffective. Scalability TAILOR is designed to work efficiently for large object-oriented programs,for which traditional slicing [21]is unscalable(even with industry-strength implementations [18,521), with the key bottleneck coming from handling of the heap [47].Like any slicing tool, TAILOR is not always scalable.However,TAILOR is designed to scale significantly better
XX:2 Program Tailoring: Slicing by Sequential Criteria Analysis E.g., API Protocol Analysis Outputs open close write (Error) Tailoring Sequential Program Criterion (SC) [open close write] Traditional Criterion [write] Techniques Slicing-Related Program Debugging Program Understanding … ... Program Analysis SC-Irrelevant Statements open write [open write] Sliced Program Tailored Program Tools Figure 1 Program tailoring with some of its potential applications highlighted. SC P consists of one or several statement sequences ending at P, with each representing, e.g., a valid API call sequence like file.open() → file.write() or an invalid API call sequence like file.open() → file.close() → file.write(). In what follows, we will drop P from SC P when the context is clear or when we are not interested in it. Given a SC P , tailoring aims to obtain a tailored program, denoted T (SC P ), that comprises the statements in all the possible execution paths passing through at least one statement sequence in SC P in the given order. By construction, all data and control dependences needed for understanding the behavior at P affected by SC P are included. Any statement that is not in T (SC P ) is irrelevant to SC P , i.e., SCP -irrelevant. For illustration purposes, we write S(P) to represent the (backward) slice affecting P obtained by program slicing. Note that slicing all the points in SC P independently still fails to capture their ordering constraint (and is unscalable, too). Like slicing, tailoring enables software developers or client applications to inspect only the interesting parts of a program. Unlike slicing, which focuses on understanding the program behavior at P, tailoring restricts our attention to the subset of that behavior affected by SC P only. Due to incompatible criteria used, tailoring can be used either as a complementary technique to slicing or in cases where slicing is ineffective, as discussed below. 1.1 Goals and Motivations Given a SC P , we have developed a prototyping implementation of program tailoring, denoted Tailor, for Java programs, with the following three goals in mind: Precision Tailor is designed to sharpen the precision of many client applications, as highlighted in Figure 1, by exploiting the temporal order in a SC P . One significant class of client applications includes many slicing-related techniques, such as thin slicing [47], program chopping [22] and value slicing [26]. For a given program, T (SC P ) is shown as the blue circle and S(P) as the white circle. The statements in I(SC P ) = S(P) − S(P) ∩ T (SC P ) are SC P -irrelevant and can thus be pruned away to facilitate program debugging and understanding by a human. If I(SC P ) = ∅, then Tailor is ineffective at P but no harm is done. If I(SC P ) 6= ∅, then Tailor can make slicing more precise, by leveraging the otherwise wasted SC P information that is widely available. In Section 6.1, we show that Tailor can improve the precision of thin slicing [47], a state-of-the-art practical but unsound slicing technique, for Java programs. In Section 6.2, we show that Tailor can enable a sophisticated pointer analysis for Java, S-2Obj [24], which is unscalable for a program, to perform a focused analysis on its SC-relevant parts containing hard language features such as reflection, where existing slicing techniques are ineffective. Scalability Tailor is designed to work efficiently for large object-oriented programs, for which traditional slicing [21] is unscalable (even with industry-strength implementations [18,52]), with the key bottleneck coming from handling of the heap [47]. Like any slicing tool, Tailor is not always scalable. However, Tailor is designed to scale significantly better
Y.Li,T.Tan,Y.Zhang,and J.Xue XX:3 than traditional slicing,as TAILOR avoids handling of the heap by reasoning about essentially the (un)reachability of a statement towards the statement sequences in a SCp. Soundiness TAILOR is designed to be a practical tool to facilitate programming debugging and understanding as well as program analysis (among others)for large object-oriented programs.In this setting,any sound static analysis would be either unscalable or imprecise due to the presence of many hard language features,such as native code,dynamic class loading,reflection and multi-threading [30].Therefore,TAILOR is designed to be sound whenever a sound graph representation is available for capturing all the control flows in a (single-or multi-threaded)program.In other words,TAILOR is always sound with respect to part of the program behavior modeled.According to [30],"a soundy analysis aims to be as sound as possible without excessively compromising precision and/or scalability." Therefore,TAILOR represents one such soundy analysis. 1.2 Challenges and Solutions We examine some challenges faced in achieving our three goals and describe our solutions: Precision/Scalability Tradeoffs How do we compute T(SC)efficiently and precisely for large object-oriented programs?Due to exponential blowup of program paths,both DFS and BFS are out of question.We formulate the problem of computing T(SC)by solving an IFDS(Interprocedural Finite Distributive Subset)data-flow analysis problem [38], efficiently on its interprocedural control-flow graph(ICFG)representation of the program. To avoid unrealizable paths with mismatched calls and returns,our analysis must be (fully)context-sensitive in order to achieve useful precision for Java programs.Otherwise, many unrealizable paths,which go through the constructor of java.lang.Object,cannot be filtered out,causing T(SC)to be severely over-approximated.However,distinguishing calling contexts fully with call strings,object-sensitivity [33],and method cloning 54]are known to be unscalable for large programs [12,44].We achieve (full)context sensitivity by solving a CFL-reachability problem over a balanced-parentheses language by matching call and return edges also in the IFDS framework as described in 38. How do we ensure that TAILOR works effectively for a SC of any given length,which is defined to be the number of statements in its longest statement sequence?In general, the longer a SC is,the more SC-irrelevant statements will be removed.We propose to lengthen any given SCby leveraging the concept of object-sensitivity [33,44]developed by the pointer analysis community for Java.As a result,some infeasible paths that would otherwise be introduced are avoided.We will also try to avoid making SC extensions that make our analysis run longer but contribute nothing to precision improvement. Soundiness How do we make TAILOR as soundly as possible?We decompose the problem of tailoring a program for a given SC into two sub-problems:(1)building an ICFG,GICFG, for the program and(2)computing T(SC)from GICFG.TAILOR is sound if GICFG is sound(representation of all control flows in the program).In fact,TAILOR is designed to be practically useful for analyzing the program behavior modeled by GICFG even if GICFG is unsound.This paper solves(2)while resorting to the state-of-the-art for(1). 1.3 Contributions We introduce program tailoring,a new technique for trimming a program based on SCs, which are widely available from other analyses but never exploited by program slicing. We describe how to extend a given SC for object-oriented programs in order to improve the precision of program tailoring (by making tailored programs smaller). EC00P2016
Y. Li, T. Tan, Y. Zhang, and J. Xue XX:3 than traditional slicing, as Tailor avoids handling of the heap by reasoning about essentially the (un)reachability of a statement towards the statement sequences in a SC P . Soundiness Tailor is designed to be a practical tool to facilitate programming debugging and understanding as well as program analysis (among others) for large object-oriented programs. In this setting, any sound static analysis would be either unscalable or imprecise due to the presence of many hard language features, such as native code, dynamic class loading, reflection and multi-threading [30]. Therefore, Tailor is designed to be sound whenever a sound graph representation is available for capturing all the control flows in a (single- or multi-threaded) program. In other words, Tailor is always sound with respect to part of the program behavior modeled. According to [30], “a soundy analysis aims to be as sound as possible without excessively compromising precision and/or scalability.” Therefore, Tailor represents one such soundy analysis. 1.2 Challenges and Solutions We examine some challenges faced in achieving our three goals and describe our solutions: Precision/Scalability Tradeoffs How do we compute T (SC) efficiently and precisely for large object-oriented programs? Due to exponential blowup of program paths, both DFS and BFS are out of question. We formulate the problem of computing T (SC) by solving an IFDS (Interprocedural Finite Distributive Subset) data-flow analysis problem [38], efficiently on its interprocedural control-flow graph (ICFG) representation of the program. To avoid unrealizable paths with mismatched calls and returns, our analysis must be (fully) context-sensitive in order to achieve useful precision for Java programs. Otherwise, many unrealizable paths, which go through the constructor of java.lang.Object, cannot be filtered out, causing T (SC) to be severely over-approximated. However, distinguishing calling contexts fully with call strings, object-sensitivity [33], and method cloning [54] are known to be unscalable for large programs [12, 44]. We achieve (full) context sensitivity by solving a CFL-reachability problem over a balanced-parentheses language by matching call and return edges also in the IFDS framework as described in [38]. How do we ensure that Tailor works effectively for a SC of any given length, which is defined to be the number of statements in its longest statement sequence? In general, the longer a SC is, the more SC-irrelevant statements will be removed. We propose to lengthen any given SC by leveraging the concept of object-sensitivity [33, 44] developed by the pointer analysis community for Java. As a result, some infeasible paths that would otherwise be introduced are avoided. We will also try to avoid making SC extensions that make our analysis run longer but contribute nothing to precision improvement. Soundiness How do we make Tailor as soundly as possible? We decompose the problem of tailoring a program for a given SC into two sub-problems: (1) building an ICFG, GICFG, for the program and (2) computing T (SC) from GICFG. Tailor is sound if GICFG is sound (representation of all control flows in the program). In fact, Tailor is designed to be practically useful for analyzing the program behavior modeled by GICFG even if GICFG is unsound. This paper solves (2) while resorting to the state-of-the-art for (1). 1.3 Contributions We introduce program tailoring, a new technique for trimming a program based on SCs, which are widely available from other analyses but never exploited by program slicing. We describe how to extend a given SC for object-oriented programs in order to improve the precision of program tailoring (by making tailored programs smaller). E COO P 2 0 1 6
XX:4 Program Tailoring:Slicing by Sequential Criteria We formulate the problem of computing a tailored program as one of solving a data-flow problem efficiently in the IFDS framework.TAILOR,which is implemented in Soor [51], is released as an open-source tool at http://www.cse.unsw.edu.au/~corg/tailor We describe two case studies to demonstrate why TAILOR is practically useful on a set of seven large real-world Java programs,by(1)assisting program slicing with program debugging and understanding tasks,and(2)enabling a focused pointer analysis on the parts of a program containing hard language features such as reflection. 2 A Motivating Example We use an example to describe how TAILOR can assist slicing tools to simplify program debugging and understanding tasks through exploiting the temporal ordering information in a given SCthat is otherwise ignored by program slicing.In Section 6.2,we provide additional motivations on why TAILOR can be useful in simplifying program analysis tasks. Large object-oriented programs are very difficult to debug and understand,due to the pervasive use of heap-allocated data,nested data structures,and large libraries with complex dependences and configurations.Tracing the flow of values via multiple levels of pointer indirection through the heap across many classes in both the application and libraries is unworkable.A practical tool is needed to pinpoint relevant statements for the task at hand. Our Java example is given in Figure 2.The Driver class is used to create and initialize a Driver object according to some user input(lines 33-36)or by default (lines 37-41).Then the corresponding initialization information stored in info is dumped to a log at line 42. 1 class Driver{ void main(String[]args){ Writer fw new FileWriter(..): 32 Driver d:String info; Driver (String s){...) if (args[o].equals(...)){ Driver0{】 3 d new Driver(args[o]): void config(String[]args){ d.config(args); Writer bw=new BufferedWriter(fw): info getConfiglnfo(args): > for(int i=1:i<args.length;i++) 3> else{ 8 bw.write(args间+"n: 3 d=new Driver(; bw.close(): File f getSystemFile(...): 10 } 4 info=getSystemlnfo(f): void log(String info){ fw.write(info); 2 d.log(info); 13 433} 4} :15 class FileWriter{ 23 class BufferedWriter{ 16 boolean isopen tre Writer out: 17 void close isOpen false; BufferedWriter(Writer w)fout w.} :18 void write(...){ 26 void close({ 19 if(isOpen) 27 out.close(): :20 throw new IOException(); 28 21 29} 221 30 Figure 2 An example showing how TAILOR removes SCline 12-irrelevant statements. This example has a typical error found in Java programs caused by ignoring the fact that closing a wrapper file handler will also close its internal file handler.The internal file handler,fw is passed as an argument at line 6 and assigned to out at line 25.Later,closing its wrapper file handler,bw,at line 9 will also close out (i.e.,fw)at line 27.Then any further access to a closed fw(e.g.,at line 12)will trigger a runtime exception at line 20. Now we use a static typestate analysis tool CLARA [9]to analyze this program.Some
XX:4 Program Tailoring: Slicing by Sequential Criteria We formulate the problem of computing a tailored program as one of solving a data-flow problem efficiently in the IFDS framework. Tailor, which is implemented in Soot [51], is released as an open-source tool at http://www.cse.unsw.edu.au/~corg/tailor. We describe two case studies to demonstrate why Tailor is practically useful on a set of seven large real-world Java programs, by (1) assisting program slicing with program debugging and understanding tasks, and (2) enabling a focused pointer analysis on the parts of a program containing hard language features such as reflection. 2 A Motivating Example We use an example to describe how Tailor can assist slicing tools to simplify program debugging and understanding tasks through exploiting the temporal ordering information in a given SC that is otherwise ignored by program slicing. In Section 6.2, we provide additional motivations on why Tailor can be useful in simplifying program analysis tasks. Large object-oriented programs are very difficult to debug and understand, due to the pervasive use of heap-allocated data, nested data structures, and large libraries with complex dependences and configurations. Tracing the flow of values via multiple levels of pointer indirection through the heap across many classes in both the application and libraries is unworkable. A practical tool is needed to pinpoint relevant statements for the task at hand. Our Java example is given in Figure 2. The Driver class is used to create and initialize a Driver object according to some user input (lines 33 – 36) or by default (lines 37 – 41). Then the corresponding initialization information stored in info is dumped to a log at line 42. class Driver { Writer fw = new FileWriter(...); Driver (String s) {…} Driver () {…} void config(String[] args) { Writer bw = new BufferedWriter(fw); for(int i = 1; i < args.length; i++) bw.write(args[i] + "\n"); bw.close(); } void log(String info) { fw.write(info); } } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 class FileWriter { boolean isOpen = true; void close() { isOpen = false; } if (!isOpen) throw new IOException(); } } void write(...) { 15 16 17 18 19 20 21 22 class BufferedWriter { Writer out; void close() { } } out.close(); BufferedWriter(Writer w) {out = w;} 23 24 25 26 27 28 29 30 void main(String[] args) { Driver d; String info; if (args[0].equals(…)) { d = new Driver(args[0]); } else { } d.config(args); info = getConfigInfo(args); d = new Driver(); File f = getSystemFile(...); info = getSystemInfo(f); d.log(info); } 40 41 42 43 31 32 33 34 35 36 37 38 39 Figure 2 An example showing how Tailor removes SCline 12-irrelevant statements. This example has a typical error found in Java programs caused by ignoring the fact that closing a wrapper file handler will also close its internal file handler. The internal file handler, fw is passed as an argument at line 6 and assigned to out at line 25. Later, closing its wrapper file handler, bw, at line 9 will also close out (i.e., fw) at line 27. Then any further access to a closed fw (e.g., at line 12) will trigger a runtime exception at line 20. Now we use a static typestate analysis tool Clara [9] to analyze this program. Some
Y.Li,T.Tan,Y.Zhang,and J.Xue XX:5 Potential Point of Failure typestate specifications regarding file operations used will in- Statement:fw.write(info)at line 12 clude "accessing a closed file leads to an error state".For our Related Program Points: Statement out.close()at line 27 example,CLARA produces an error report shown on the left Statement:new FileWriter(..)at line 2 marking line 12 as a"potential point of failure",together with a sequence of two method calls leading to line 12.We have one SCline 12:line 2-line 27-line 12.As static analyses like CLARA are either conservative or unsound,there may or may not be an error at line 12.Now our debugging task begins. The error at line 12 happens only when a Driver object is created in the if branch of main().Therefore,a tool that instructs the developer to examine this if branch only would enable its cause to be identified quickly.In contrast,marking some lines in its matching else branch as also being relevant can increase human analysis effort significantly(especially if the developer has to trace the flow of values across many nested heap structures). Traditional Slicing For large object-oriented programs,traditional(sound)slicing [21,53] is unscalable or yields slices that are too large for human consumption 47.Given a virtual call fw.write(info)at line 12,the set of variables of interest consists of(1)the receiver reference and the arguments of the call [52],and(2)some relevant variables selected manually or recognized automatically [2.13.In our example,these variables form the set {fw,info,fw.isOpen}.Then,a(backward)slice that affects their values comprises all the statements except lines 7-8 and 19-20.This slice,which includes everything in main(), contains too many statements that are not all directly relevant to the task at line 12. Thin Slicing Thin slicing [47]is introduced to facilitate program debugging and understand- ing for object-oriented programs by trading soundness for scalability and(direct)relevance. All control dependences and all base pointer data dependences are excluded.Given a point of interest,thin slicing includes only so-called producer statements that affect directly the values at the point.Statements that serve to explain why producer statements affect the point are ignored.For example,given x=p.f and q.f=y,where p and q may be aliased, q.f=y is a producer statement for r=p.f,because there may be a direct value flow from y to z.All statements that help explain or establish why p and q are aliases are ignored. If we adopt the same slicing criterion at line 12 as above,thin slicing will include only seven statements at lines 2,12,16.17,36,39 and 40.Compared with the traditional slice obtained,this smaller slice still retains line 17,a statement for explaining an immediate cause of the error at line 12.However,two SCline 12-irrelevant statements at lines 39-40(in the else branch of main())are also present,which can cost human analysis effort unnecessarily. Program Tailoring Given SCline 12:line 2-line 27-line 12,TAILOR produces a tailored program consisting of all the statements in all execution paths passing through lines 2,27 and 12 in that order.As line 27 is only reachable from line 9,which resides in the config method invoked at line 35,the tailored program includes all the lines in the example except lines 37-41 that appear in the else branch of main()and lines 19-20. Let us revisit the two slices obtained above,the traditional slice,Ptrad and the thin slice,Pthin.Let our tailored program be identified as Ptal.Despite Pthin<Ptall<Ptradl, tailoring brings several benefits,obtained from exploiting the temporal order of invocation sequences in SCline 12.First,Ptal is the only one that includes nothing from the else branch of main(),revealing more clearly to a human debugger that the potential error at line 12 is triggered by a Driver object created in the if branch of main()("according to some user input")rather than in its matching else branch ("according to some default configuration").Such contextual information enables the developer to locate the cause for the error at line 12 more quickly.In the case of Ptrad and Pthin,the developer may end up wasting a lot of analysis effort on navigating through a lot of SCline 12-irrelevant code, EC00P2016
Y. Li, T. Tan, Y. Zhang, and J. Xue XX:5 Potential Point of Failure : Statement: fw.write(info) at line 12 Related Program Points : Statement: out.close() at line 27 Statement: new FileWriter(...) at line 2 typestate specifications regarding file operations used will include “accessing a closed file leads to an error state”. For our example, Clara produces an error report shown on the left, marking line 12 as a “potential point of failure”, together with a sequence of two method calls leading to line 12. We have one SCline 12 : line 2 → line 27 → line 12. As static analyses like Clara are either conservative or unsound, there may or may not be an error at line 12. Now our debugging task begins. The error at line 12 happens only when a Driver object is created in the if branch of main(). Therefore, a tool that instructs the developer to examine this if branch only would enable its cause to be identified quickly. In contrast, marking some lines in its matching else branch as also being relevant can increase human analysis effort significantly (especially if the developer has to trace the flow of values across many nested heap structures). Traditional Slicing For large object-oriented programs, traditional (sound) slicing [21, 53] is unscalable or yields slices that are too large for human consumption [47]. Given a virtual call fw.write(info) at line 12, the set of variables of interest consists of (1) the receiver reference and the arguments of the call [52], and (2) some relevant variables selected manually or recognized automatically [2, 13]. In our example, these variables form the set {fw, info, fw.isOpen}. Then, a (backward) slice that affects their values comprises all the statements except lines 7 – 8 and 19 – 20. This slice, which includes everything in main(), contains too many statements that are not all directly relevant to the task at line 12. Thin Slicing Thin slicing [47] is introduced to facilitate program debugging and understanding for object-oriented programs by trading soundness for scalability and (direct) relevance. All control dependences and all base pointer data dependences are excluded. Given a point of interest, thin slicing includes only so-called producer statements that affect directly the values at the point. Statements that serve to explain why producer statements affect the point are ignored. For example, given x = p.f and q.f = y, where p and q may be aliased, q.f = y is a producer statement for x = p.f, because there may be a direct value flow from y to x. All statements that help explain or establish why p and q are aliases are ignored. If we adopt the same slicing criterion at line 12 as above, thin slicing will include only seven statements at lines 2, 12, 16, 17, 36, 39 and 40. Compared with the traditional slice obtained, this smaller slice still retains line 17, a statement for explaining an immediate cause of the error at line 12. However, two SCline 12-irrelevant statements at lines 39 – 40 (in the else branch of main()) are also present, which can cost human analysis effort unnecessarily. Program Tailoring Given SCline 12 : line 2→line 27→line 12, Tailor produces a tailored program consisting of all the statements in all execution paths passing through lines 2, 27 and 12 in that order. As line 27 is only reachable from line 9, which resides in the config method invoked at line 35, the tailored program includes all the lines in the example except lines 37 – 41 that appear in the else branch of main() and lines 19 – 20. Let us revisit the two slices obtained above, the traditional slice, Ptrad and the thin slice, Pthin. Let our tailored program be identified as Ptal. Despite |Pthin| < |Ptal| < |Ptrad|, tailoring brings several benefits, obtained from exploiting the temporal order of invocation sequences in SCline 12. First, Ptal is the only one that includes nothing from the else branch of main(), revealing more clearly to a human debugger that the potential error at line 12 is triggered by a Driver object created in the if branch of main() (“according to some user input”) rather than in its matching else branch (“according to some default configuration”). Such contextual information enables the developer to locate the cause for the error at line 12 more quickly. In the case of Ptrad and Pthin, the developer may end up wasting a lot of analysis effort on navigating through a lot of SCline 12-irrelevant code, E COO P 2 0 1 6