XX:6 Program Tailoring:Slicing by Sequential Criteria highlighted by getSystemInfo()and getSystemFile(),in the else branch.Second,any statement that is not included in Ptal is SCline 12-irrelevant for understanding the SC-specific behavior at line 12.Thus,TAILOR can make,e.g.,thin slicing more effective,by removing the SCline 12-irrelevant statements Pthin-Pthin nPtal,i.e.,lines 39 and 40,from Pthin.Then Pthin is down to only five statements at lines 2,12,16,17 and 36.Starting from line 17 (an immediate cause for the error at line 12),the developer can trace the flow of isOpen backwards to find the original cause.Finally,Ptal includes all data and control dependences reaching line 12 affected by SCline 12,enabling it to be analyzed further by other analyses, e.g.,a pointer analysis,as will be discussed in Section 6.However,Ptrad and Pthin will not be applicable since Ptrad is unobtainable scalably for large programs and Pthin is unsound. Note that Ptal still contains lines 7-8 that do not affect SCline 12.Removing all such irrelevant statements for large programs may be neither necessary (due to the first two points made earlier)nor practical,as a sound slicing tool would be unscalable.Thus,we have designed TAILOR based on the precision/scalability tradeoffs described in Section 1. Figure 3 recaps our insight behind tailoring.Given a SC:A-B-C.we focus on the behavior at C affected ntra-proco by a sequential execution of A,B and C.If one point in SC is reached from only one branch (e.g.,the one containing A) Figure 3 Leveraging the order- of a multi-way branching statement,then the statements in ing information SC:A-BC the other branches are SC-irrelevant and can be trimmed to trim irrelevant statements away.away (x).This is particularly suitable for object-oriented languages,since a virtual call site behaves as a multi-way branch switching to its target methods.For example,B can be regarded as residing in a target method invoked at the marked virtual call on a receiver object created only at the allocation site at A.Thus,the other target methods unreachable to C are trimmed away (x). 3 Methodology Figure 4 gives an overview of TAILOR,with all the components implemented in this paper highlighted in blue.Given a program,we rely on the state-of-the-art (shown as "ICFG Construction")to build an inter-procedural control flow graph(ICFG),denoted GICFG, to represent all the possible control flows in the program.A SC is simply a set of one or more statement sequences ending at the same statement,with all statements identified by their line numbers,i.e.,program points only.The length of a SC is the number of statements in its longest sequence.SCs can be deduced from the results returned by many analysis tools such as API protocol analysis [7,37]and typestate analysis [9,16,34].For example,a typestate analysis may report a potential error at line C,f.write(),together with two invocation sequences of related methods,A:f.open()B1:f1.close()and A:f.open()B2:f2.close(),leading to line C.Therefore,we may choose to tailor the program at C to investigate its behavior affected by SC={A→B1→C,A→B2→C}. Given a SC,a tailored program,T(SC),consists of the statements on all possible execution paths in GIcFc passing through at least one statement sequence in SC.By exploiting the temporal ordering information in SC,it is possible to scale tailoring significantly better than slicing for large object-oriented programs and makes it a practically useful technique. TAILOR is sound as it computes T(SC)over-approximately with respect to GICFG.In contrast,traditional(sound)slicing [21]is unscalable when GIcFG is large and thin slicing [47]
XX:6 Program Tailoring: Slicing by Sequential Criteria highlighted by getSystemInfo() and getSystemFile(), in the else branch. Second, any statement that is not included in Ptal is SCline 12-irrelevant for understanding the SC-specific behavior at line 12. Thus, Tailor can make, e.g., thin slicing more effective, by removing the SCline 12-irrelevant statements Pthin − Pthin ∩Ptal, i.e., lines 39 and 40, from Pthin. Then Pthin is down to only five statements at lines 2, 12, 16, 17 and 36. Starting from line 17 (an immediate cause for the error at line 12), the developer can trace the flow of isOpen backwards to find the original cause. Finally, Ptal includes all data and control dependences reaching line 12 affected by SCline 12, enabling it to be analyzed further by other analyses, e.g., a pointer analysis, as will be discussed in Section 6. However, Ptrad and Pthin will not be applicable since Ptrad is unobtainable scalably for large programs and Pthin is unsound. A … B C X X X Virtual Call Branch Intra-procedural Inter-procedural control flow control flow Figure 3 Leveraging the ordering information SC : A → B → C to trim irrelevant statements away. Note that Ptal still contains lines 7 – 8 that do not affect SCline 12. Removing all such irrelevant statements for large programs may be neither necessary (due to the first two points made earlier) nor practical, as a sound slicing tool would be unscalable. Thus, we have designed Tailor based on the precision/scalability tradeoffs described in Section 1. Figure 3 recaps our insight behind tailoring. Given a SC : A → B → C, we focus on the behavior at C affected by a sequential execution of A, B and C. If one point in SC is reached from only one branch (e.g., the one containing A) of a multi-way branching statement, then the statements in the other branches are SC-irrelevant and can be trimmed away ( ). This is particularly suitable for object-oriented languages, since a virtual call site behaves as a multi-way branch switching to its target methods. For example, B can be regarded as residing in a target method invoked at the marked virtual call on a receiver object created only at the allocation site at A. Thus, the other target methods unreachable to C are trimmed away ( ). 3 Methodology Figure 4 gives an overview of Tailor, with all the components implemented in this paper highlighted in blue. Given a program, we rely on the state-of-the-art (shown as “ICFG Construction”) to build an inter-procedural control flow graph (ICFG), denoted GICFG, to represent all the possible control flows in the program. A SC is simply a set of one or more statement sequences ending at the same statement, with all statements identified by their line numbers, i.e., program points only. The length of a SC is the number of statements in its longest sequence. SCs can be deduced from the results returned by many analysis tools such as API protocol analysis [7, 37] and typestate analysis [9, 16, 34]. For example, a typestate analysis may report a potential error at line C, f.write(), together with two invocation sequences of related methods, A : f.open() → B1 : f1.close() and A : f.open() → B2 : f2.close(), leading to line C. Therefore, we may choose to tailor the program at C to investigate its behavior affected by SC = {A → B1 → C, A → B2 → C}. Given a SC, a tailored program, T (SC), consists of the statements on all possible execution paths in GICFG passing through at least one statement sequence in SC. By exploiting the temporal ordering information in SC, it is possible to scale tailoring significantly better than slicing for large object-oriented programs and makes it a practically useful technique. Tailor is sound as it computes T (SC) over-approximately with respect to GICFG. In contrast, traditional (sound) slicing [21] is unscalable when GICFG is large and thin slicing [47]
Y.Li,T.Tan,Y.Zhang,and J.Xue XX:7 TAILOR Program rogram Analysis Tools SCEXT) SC SC-Based Data-Flow Analysis (SCDFA) Figure 4 Overview of TAILOR(with all the components implemented in this paper in blue). is unsound for GICFG (as it is designed for program debugging and understanding only). TAILOR computes T(SC)in two stages,SC Extension and SC-based Data-Flow Analysis. In the first stage(Section 3.2),we exploit"branch correlations"in object-oriented programs to lengthen a given SC in order to avoid some infeasible paths that would otherwise be introduced in the second stage.In the second stage(Section 3.1),we compute T(SC)by solving a data-flow problem in order to avoid unrealizable paths efficiently.This design allows TAILOR to achieve good precision and scale well to large object-oriented programs. 3.1 SC-Based Data-Flow Analysis We compute T(SC)by solving flow-and context-sensitively an IFDS(Interprocedural Finite Distributive Subset)data-flow problem [38],efficiently on GIcrG,via graph reachability. This formulation of our SC-based data-flow analysis,denoted SCDFA,is significant for three reasons.(1)With flow-sensitivity,SCDFA can filter out imprecisely ordered statement sequences in a SC,as many static analyses from which SCs are deduced are flow-insensitive in order to be scalable.Conversely,precisely ordered statement sequences in SC also enable SCDFA to filter out irrelevant control flows in tailoring a program.This mutually beneficial process improves the precision of both parts,therefore avoiding the unnecessary complexity faced for solving both as one problem.(2)With context-sensitivity,we avoid introducing unrealizable paths with mismatched calls and returns,which is critical for achieving precision in computing T(C).(3)With an IFDS formulation,SCDFA can scale well to reasonably large object-oriented programs.In particular,(full)context-sensitivity can be realized more efficiently by solving a CFL-reachability problem over a simplified balanced-parentheses language.As an IFDS problem,SCDFA has a time complexity of O(ED3),where E is the number of edges in GIcFG and D is the size of the set of SCrelated data-flow facts used(i.e., the set of suffixes of sequences in SC,as will be clear below and defined in Section 4). SCDFA starts with a bottom-up pass(BU)and finishes with a top-down pass(TD),with both performed(fully)context-sensitively.By using the example program from Figure 2, we first explain the functionalities of two passes and then examine briefy how context- sensitivity is realized efficiently in the IFDS framework.Suppose we are given a SC as line 2:fw new FileWriter()-line 27:out.close()-line 12:fw.write().For convenience,we write it as SC:n-cw.In its GIcFG,the allocation site for Filewriter at line 2 is replicated in the two constructors of Driver.We identify the one in the single-arg constructor as line 2(denoted by n)and the one in the non-arg constructor as line 2'(denoted byn.As a result,we finally have a two-sequence SCo={n→c→w,n'→c→w}. BU and TD Passes Both passes operate on GICFG,as shown in Figure 5,for our example. As in any data-flow analysis,all control flow edges in GICFG are treated non-deterministically executable.Let ENTRY be the entry node of GICFG and EXIT the node marking the point of interest w at line 12 in SC.Conceptually,if a sequence SSC,which is ncw or n'cw, lies on a path,then BU must see a suffix of S at every node n on the path backwards from EXIT and TD must see the corresponding prefix of S at n forwards from ENTRY. EC00P2016
Y. Li, T. Tan, Y. Zhang, and J. Xue XX:7 Extended SC Bottom-Up Pass Top-Down Pass TAILOR Tailored Program E.g., API Protocol Analysis Analysis Tools SC Extension Program ICFG Construction GICFG SC-Based Data-Flow Analysis (SCDFA) SC (SCEXT) Figure 4 Overview of Tailor (with all the components implemented in this paper in blue). is unsound for GICFG (as it is designed for program debugging and understanding only). Tailor computes T (SC) in two stages, SC Extension and SC-based Data-Flow Analysis. In the first stage (Section 3.2), we exploit “branch correlations” in object-oriented programs to lengthen a given SC in order to avoid some infeasible paths that would otherwise be introduced in the second stage. In the second stage (Section 3.1), we compute T (SC) by solving a data-flow problem in order to avoid unrealizable paths efficiently. This design allows Tailor to achieve good precision and scale well to large object-oriented programs. 3.1 SC-Based Data-Flow Analysis We compute T (SC) by solving flow- and context-sensitively an IFDS (Interprocedural Finite Distributive Subset) data-flow problem [38], efficiently on GICFG, via graph reachability. This formulation of our SC-based data-flow analysis, denoted SCDFA, is significant for three reasons. (1) With flow-sensitivity, SCDFA can filter out imprecisely ordered statement sequences in a SC, as many static analyses from which SCs are deduced are flow-insensitive in order to be scalable. Conversely, precisely ordered statement sequences in SC also enable SCDFA to filter out irrelevant control flows in tailoring a program. This mutually beneficial process improves the precision of both parts, therefore avoiding the unnecessary complexity faced for solving both as one problem. (2) With context-sensitivity, we avoid introducing unrealizable paths with mismatched calls and returns, which is critical for achieving precision in computing T (SC). (3) With an IFDS formulation, SCDFA can scale well to reasonably large object-oriented programs. In particular, (full) context-sensitivity can be realized more efficiently by solving a CFL-reachability problem over a simplified balanced-parentheses language. As an IFDS problem, SCDFA has a time complexity of O(ED3 ), where E is the number of edges in GICFG and D is the size of the set of SC-related data-flow facts used (i.e., the set of suffixes of sequences in SC, as will be clear below and defined in Section 4). SCDFA starts with a bottom-up pass (BU) and finishes with a top-down pass (TD), with both performed (fully) context-sensitively. By using the example program from Figure 2, we first explain the functionalities of two passes and then examine briefly how contextsensitivity is realized efficiently in the IFDS framework. Suppose we are given a SC as line 2 : fw = new FileWriter() → line 27 : out.close() → line 12 : fw.write(). For convenience, we write it as SC : n→c→w. In its GICFG, the allocation site for FileWriter at line 2 is replicated in the two constructors of Driver. We identify the one in the single-arg constructor as line 2 (denoted by n) and the one in the non-arg constructor as line 2 0 (denoted by n 0 ). As a result, we finally have a two-sequence SC w = {n→c→w, n0→c→w}. BU and TD Passes Both passes operate on GICFG, as shown in Figure 5, for our example. As in any data-flow analysis, all control flow edges in GICFG are treated non-deterministically executable. Let ENTRY be the entry node of GICFG and EXIT the node marking the point of interest w at line 12 in SC w. Conceptually, if a sequence S ∈ SC w, which is ncw or n 0 cw, lies on a path, then BU must see a suffix of S at every node n on the path backwards from EXIT and TD must see the corresponding prefix of S at n forwards from ENTRY. E COO P 2 0 1 6
XX:8 Program Tailoring:Slicing by Sequential Criteria TD BU BU Entry Node ......ncw.cw,w){ncw} 33) ncw .cwcw w.......... …{w}…{ncw】 ncw )--(ncw.cw.w) 34 new Driver new Driver 38 dw《ncw) n 2 new() new Filewiter 2 n {Gw…Gww} 35 dconfig0☐ getSystemFale 3w)(ncw) {cw…{cww} c 27 outclose() …w}《nCw {w}…{w}… 3 getConfiglngetSystemlnlo40 {w}w …《W}…{ncw} ao902- …《w}…{nCww) Non-SC Statame fw.wrRe(12 w -Inter-prooedural control fiow End Node☐ Figure 5 A simplified GIcFG of the program given in Figure 2 for illustrating the bottom-up (BU)and top-down(TD)passes of SCDFA with SC={n→c→w,n'→c→w} BU computes a global property,PANTI,for every node n in GICFc,backwards against the control flow,starting at EXIT.PANTIout(n)represents the set of suffixes of some sequences in SCw that are partially anticipable at the entry of n,i.e.,appear on some outgoing path of n ending at EXIT.In our example,PANTIout (ENTRY)={ncw,cw,w}.As ncw is partially anticipable(but n'cw is not)at ENTRY,GICFG contains some paths passing through ncw. TD computes a global property,PAVAlL,for every node n in GICFG,forwards along the control flow,starting from ENTRY.PAVAlLi(n)specifies the set of suffixes of some sequences in PANTIot(ENTRY)to represent implicitly (for efficiency reasons)the fact that their corresponding prefixes are partially available at the entry of n,i.e.,appear on some incoming paths of n starting from ENTRY.For example,PAVAlLi(12)={ncw,w},indicating that the prefixes e and nc of ncw are partially available at the entry of node 12. For this example,a node n is included in the tailored program if PAVAlLin(n)nPANTIout(n) since a suffix s E PAVAlLin(n)nPANTIout (n)is partially anticipable at the entry of n and some sequence in SC with s removed is partially available at the entry of n.In our example,the tailored program consists of all the lines except lines 38-40. Context Sensitivity Figure 6 illustrates as an example how the TD pass shown in Figure 5 is performed for FileWriter()context-sensitively by solving a CFL-reachability-based balanced-parentheses problem,efficiently call-o-start edge.( in the IFDS framework.For technical new Filewriter(2 details,we refer to 38.There are four 士 data-flow facts,ncw,cw,w,and 0 (for NTER FleWrer.cnnib the empty set).There are two call sites to FileWriter()at lines 2 and 2',which are identified as nodes n and n',respectively, to-return cda with their call-to-start and exit-to-return ●1 35 d.config0 edges labeled as(n:)n,(n and )n appro- ● priately.ENTER and EXIT are the start Figure 6 Context-sensitivity via CFL-reachability. and exit nodes of FileWriter(),whose CFG is irrelevant and thus elided.In SCDFA,the call-to-return edge always serves as a kill edge(to stop the data-flow facts from bypassing a callee).For each node n,PAVAlLin(n)is the set of facts,i.e.,suffixes of ncw shown in black dots.According to Figure 5,PAVAlLin(2)=PAVAlLin(2')={ncw).If FileWriter()
XX:8 Program Tailoring: Slicing by Sequential Criteria TD BU BU TD 36 getConfigInfo() new Driver() 38 33 if (...) getSystemFile() 39 getSystemInfo() 40 End Node 34 new Driver(...) Entry Node 35 d.config() fw.write() 12 2 new FileWriter() new FileWriter() 2' 27 out.close() d.log() 42 { } w { } w { } w { } w { } w { } w { } w { } w { } w { } w { } cw { } cw { } ncw { } ncw { } ncw { } ncw, cw, w { } ncw, cw, w { } ncw, cw, w Non-SC Statement Inter-procedural control flow SC Statement Intra-procedural control flow { } ncw { } ncw { } ncw { } ncw { } ncw { } ncw, w { } cw, w { } cw, w n n' c w Figure 5 A simplified GICFG of the program given in Figure 2 for illustrating the bottom-up (BU) and top-down (TD) passes of SCDFA with SC w = {n→c→w, n0→c→w}. BU computes a global property, PANTI, for every node n in GICFG, backwards against the control flow, starting at EXIT. PANTIout(n) represents the set of suffixes of some sequences in SC w that are partially anticipable at the entry of n, i.e., appear on some outgoing path of n ending at EXIT. In our example, PANTIout(ENTRY) = {ncw, cw, w}. As ncw is partially anticipable (but n 0 cw is not) at ENTRY, GICFG contains some paths passing through ncw. TD computes a global property, PAVAIL, for every node n in GICFG, forwards along the control flow, starting from ENTRY. PAVAILin(n) specifies the set of suffixes of some sequences in PANTIout(ENTRY) to represent implicitly (for efficiency reasons) the fact that their corresponding prefixes are partially available at the entry of n, i.e., appear on some incoming paths of n starting from ENTRY. For example, PAVAILin(12) = {ncw, w}, indicating that the prefixes and nc of ncw are partially available at the entry of node 12. For this example, a node n is included in the tailored program if PAVAILin(n)∩PANTIout(n) 6= ∅, since a suffix s ∈ PAVAILin(n) ∩ PANTIout(n) is partially anticipable at the entry of n and some sequence in SC w with s removed is partially available at the entry of n. In our example, the tailored program consists of all the lines except lines 38 – 40. Context Sensitivity Figure 6 illustrates as an example how the TD pass shown in Figure 5 is performed for FileWriter() context-sensitively by solving a CFL-reachability-based ( n ( n' ) n' ) n 35 d.config() getSystemFile() 39 2 new FileWriter() new FileWriter() 2' FileWriter.<init> ENTER 0 ncw cw w FileWriter.<init> EXIT 0 ncw cw w 0 ncw cw w 0 ncw cw w 0 ncw cw w 0 ncw cw w call-to-return edge call-to-start edge exit-to-return edge … … n n' Figure 6 Context-sensitivity via CFL-reachability. balanced-parentheses problem, efficiently in the IFDS framework. For technical details, we refer to [38]. There are four data-flow facts, ncw, cw, w, and 0 (for the empty set). There are two call sites to FileWriter() at lines 2 and 2 0 , which are identified as nodes n and n 0 , respectively, with their call-to-start and exit-to-return edges labeled as (n, )n, (n0 and )n0 appropriately. ENTER and EXIT are the start and exit nodes of FileWriter(), whose CFG is irrelevant and thus elided. In SCDFA, the call-to-return edge always serves as a kill edge (to stop the data-flow facts from bypassing a callee). For each node n, PAVAILin(n) is the set of facts, i.e., suffixes of ncw shown in black dots. According to Figure 5, PAVAILin(2) = PAVAILin(20 ) = {ncw}. If FileWriter()
Y.Li,T.Tan,Y.Zhang,and J.Xue XX:9 is entered from(n and exited from )n,then PAVAILin(ENTER)=PAVAILin(EXIT)={cw}. Hence,PAVAILiM(35)={cw}.However,if FileWriter()is entered from(n and exited from )n,then PAVAILiM(ENTER)=PAVAILi(EXIT)=(ncw}.Hence,PAVAILin(39)={ncw} 3.2 SC Extension TAILOR is designed to work effectively with any SC.While an API specification mining tool [17]may generate SCs longer than 10,an assertion verifier may produce only single-point SCs.In general,the longer a SCis,the more SC-irrelevant statements will be eliminated. To improve the precision of SCDFA,we perform first a SC extension pass,denoted SCEXT, as shown in Figure 4.Given a sequence SE SC,with F being its first point,we look for a set of n extension points E,..,En,that collectively dominate F,such that any program execution that passes through F must pass through one Ei.We do so effectively by leveraging the concept of object-sensitivity [33,44]developed by the pointer analysis community for Java. For F,its extension points are chosen as the object allocation sites used for representing the object-sensitive calling contexts for the method containing F.As a result,some infeasible paths are avoided by exploiting branch correlations in object-oriented programs(Figure 3). To make SC extensions,we use the points-to information obtained by,e.g.,the pointer analysis performed earlier for building GICFG.Consider an ICFG fragment discussed earlier in Figure 3.Suppose we are given SC:B-C such that B resides in a target method m for the virtual call site shown and m is only invokable on the objects created at A,which represents an object allocation site.Then we can prepend A to SC to obtain ESC:A-B-C. According to SCDFA,T(SC)will include both branches of "Branch"shown in Figure 3 since both reach SC but T(ESC)will include only the branch containing A as the other(infeasible) branch does not reach ESC.However,lengthening a SC increases the number of SCrelated data-flow facts used (Figure 5).So a precision/scalability tradeoff needs to be made. 】class PMD( void main(Stringll args){ 1ass CMLOpon Renderer createRenderer ( 3 Ext 3 CMLOptions opts=new CMLOptions(args): f(…) Renderer renderer opts.createRenderer(; eturn new EmacsRendererO; renderer..end0:于 17 else if(...) 18 return new HTMLRenderer(); 6 class AbstractRenderer...{ else if (... void end0【t 20 Ext 2 return new SummaryHTMLRenderer(): render(writer,.:十 122 9 class SummaryHTMLRenderer...{ 10 void render (Writer writer)f 23 dass HTMLRenderer 11 Ext 1 renderer=new HTMLRer voi render (Writer writer....){ 12 renderer..renderBody(writer,.-于十 5 riter.write():) Figure 7 A code snippet from PMD for illustrating SCEXT in a program understanding task. We use a real program understanding task to show why SCEXT is useful for real code.PMD is a static analyzer for analyzing Java programs and can print a range of source code flaws in different formats such as Emacs,CSV,and HTML.In this task,we want to understand how PMD renders its outputs in the commonly used HTML format.Figure 7 shows the simplified code.The only knowledge we have initially is that the HTMLRenderer class is responsible for writing to the file at line 25,but how it is done is unknown.At this stage,we have a single-point,SCline25:line 25,representing just the write statement at line 25. If we apply SCDFA to SCline 25,T(SCline 25)will include all the lines in the code given. Below we show how to extend this to ESCline 25 line 20line 11-line 25 object- sensitively [33,44].If we apply SCDFA to ESCline 25,T(ESCline25)will now be smaller, consisting of all the lines except lines 16,18 and 22 in method createRenderer().In other words,all the branches except the one enclosing line 20 in method createRenderer()are EC00P2016
Y. Li, T. Tan, Y. Zhang, and J. Xue XX:9 is entered from (n and exited from )n, then PAVAILin(ENTER) = PAVAILin(EXIT) = {cw}. Hence, PAVAILin(35) = {cw}. However, if FileWriter() is entered from (n0 and exited from )n0 , then PAVAILin(ENTER) = PAVAILin(EXIT) = {ncw}. Hence, PAVAILin(39) = {ncw}. 3.2 SC Extension Tailor is designed to work effectively with any SC. While an API specification mining tool [17] may generate SCs longer than 10, an assertion verifier may produce only single-point SCs. In general, the longer a SC is, the more SC-irrelevant statements will be eliminated. To improve the precision of SCDFA, we perform first a SC extension pass, denoted SCEXT, as shown in Figure 4. Given a sequence S ∈ SC, with F being its first point, we look for a set of n extension points E1, · · · , En, that collectively dominate F, such that any program execution that passes through F must pass through one Ei . We do so effectively by leveraging the concept of object-sensitivity [33,44] developed by the pointer analysis community for Java. For F, its extension points are chosen as the object allocation sites used for representing the object-sensitive calling contexts for the method containing F. As a result, some infeasible paths are avoided by exploiting branch correlations in object-oriented programs (Figure 3). To make SC extensions, we use the points-to information obtained by, e.g., the pointer analysis performed earlier for building GICFG. Consider an ICFG fragment discussed earlier in Figure 3. Suppose we are given SC : B → C such that B resides in a target method m for the virtual call site shown and m is only invokable on the objects created at A, which represents an object allocation site. Then we can prepend A to SC to obtain ESC : A → B → C. According to SCDFA, T (SC) will include both branches of “Branch” shown in Figure 3 since both reach SC but T (ESC) will include only the branch containing A as the other (infeasible) branch does not reach ESC. However, lengthening a SC increases the number of SC-related data-flow facts used (Figure 5). So a precision/scalability tradeoff needs to be made. class PMD { CMLOptions opts = new CMLOptions(args); void main (String[] args) { Renderer renderer = opts.createRenderer(); } } 1 2 3 4 5 9 23 24 25 13 14 … … renderer.end(); class CMLOptions { if ( … ) { Renderer createRenderer () { return new EmacsRenderer(); } else if ( … ) return new HTMLRenderer(); } else if ( … ) return new SummaryHTMLRenderer(); } else if ( … ) } class SummaryHTMLRenderer ... { renderer = new HTMLRenderer(...); void render (Writer writer, … ) { renderer.renderBody(writer, ...); class HTMLRenderer ... { writer.write(...); void renderBody (Writer writer, … ) { } } } } 15 16 17 18 19 20 21 22 } } class AbstractRenderer ... { render(writer, ... ); void end () { } } 6 7 8 10 11 12 Ext 1 Ext 2 Ext 3 Figure 7 A code snippet from PMD for illustrating SCEXT in a program understanding task. We use a real program understanding task to show why SCEXT is useful for real code. PMD is a static analyzer for analyzing Java programs and can print a range of source code flaws in different formats such as Emacs, CSV, and HTML. In this task, we want to understand how PMD renders its outputs in the commonly used HTML format. Figure 7 shows the simplified code. The only knowledge we have initially is that the HTMLRenderer class is responsible for writing to the file at line 25, but how it is done is unknown. At this stage, we have a single-point, SCline 25: line 25, representing just the write statement at line 25. If we apply SCDFA to SCline 25, T (SCline 25) will include all the lines in the code given. Below we show how to extend this to ESCline 25 : line 20 → line 11 → line 25 objectsensitively [33, 44]. If we apply SCDFA to ESCline 25, T (ESCline 25) will now be smaller, consisting of all the lines except lines 16, 18 and 22 in method createRenderer(). In other words, all the branches except the one enclosing line 20 in method createRenderer() are E COO P 2 0 1 6