Table 2. Continued

Columns: Study | Year | Topic | #Source projects (releases) | #Target projects (releases) | Languages in source and target projects | Key modeling components (challenges) covered: Privatize data / Homogenize features / Filter instances / Balance classes / Transform distributions / Select features / Target training data | Application scenario | Main performance indicators | Test data avail? / Against SSM? / Select for comp.?

Turhan et al. [105] | 2009 | Fine-grained prediction | 10 | 25 | C/C++/Java | Yes | Ranking | E2(70) | Not, No
Cruz et al. [10] | 2009 | Distribution transformation | 1 | 2 | Java | Yes, Yes | Classification | Z* | Not, No
Khoshgoftaar et al. [48] | 2009 | Multi-dataset classifier ensemble | 7 | 7 | C/C++/Java | Yes, Yes | Classification | NECM | Partial, Yes
Watanabe et al. [110] | 2008 | Applicability of cross languages | 2 | 2 | C++/Java | Yes | Classification | Recall, Precision | Not, No
Nagappan et al. [74] | 2006 | Utility of complexity metrics | 5 | 5 | C++/C# | Yes, Yes | Ranking | S | Not, No
Nagappan et al. [73] | 2006 | Utility of process/product metrics | 1 | 1 | C/C++ | Yes, Yes | Ranking | S | Not, No
Thongmak et al. [103] | 2003 | Utility of design metrics | 1 | 1 | N/A | Yes | Classification | Accuracy | Not, No
Briand et al. [9] | 2002 | Applicability of cross projects | 2 | 2 | Java | Yes | Both | Corr., Comp., CECM | Not, Yes, No

TL: transfer learning. CIL: class imbalance learning. SSM: simple size model. RCEC: the raw cost-effectiveness curve in a module-based Alberg diagram, in which the x-axis is the cumulative number of modules inspected and the y-axis is the cumulative number of defects found. CECM: the cost-effectiveness curve in a module-based Alberg diagram, in which the x-axis is the cumulative number of modules selected from the module ranking and the y-axis is the cumulative number of defects found in the selected modules.
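To make the cost-effectiveness curves described in the table notes concrete, the following is a minimal sketch of how the points of a module-based Alberg diagram can be computed, assuming the known defect count of each module is available; the function name alberg_curve and the example data are illustrative and are not taken from the surveyed studies. Modules are processed in the order given by a model's ranking, and the cumulative number of modules inspected is paired with the cumulative number of defects found.

```python
# Minimal sketch: cost-effectiveness curve for a module-based Alberg diagram.
# The x-axis is the cumulative number of modules inspected (in ranked order);
# the y-axis is the cumulative number of defects found in those modules.

from typing import List, Tuple


def alberg_curve(defects_in_ranked_order: List[int]) -> List[Tuple[int, int]]:
    """Return (cumulative modules inspected, cumulative defects found) points.

    `defects_in_ranked_order` holds the known defect count of each module,
    listed in the order in which the prediction model ranks the modules.
    """
    points = [(0, 0)]
    cumulative_defects = 0
    for inspected, defect_count in enumerate(defects_in_ranked_order, start=1):
        cumulative_defects += defect_count
        points.append((inspected, cumulative_defects))
    return points


# Example: five modules ranked by a hypothetical model.
print(alberg_curve([3, 0, 2, 1, 0]))
# [(0, 0), (1, 3), (2, 3), (3, 5), (4, 6), (5, 6)]
```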
CPDP models were more cost-effective than simple module size models. However, the statistical significance and effect sizes were not examined. Currently, for most of the existing CPDP models, it is unclear whether their prediction performance is superior to that of simple module size models.

These observations reveal that much effort has been devoted to developing supervised CPDP models. However, little effort has been devoted to examining whether they are superior to simple module size models. It is important for researchers and practitioners to know the answer to this question. If the magnitude of the difference were trivial, then simple module size models would be preferred in practice due to their low building and application cost. To answer this question, we next compare the prediction performance of the existing supervised CPDP models with that of simple module size models.

3 EXPERIMENTAL DESIGN

In this section, we first introduce the simple module size models under study. Then, we present the research questions relating simple module size models to the supervised CPDP models. After that, we describe the data analysis method used to investigate the research questions. Finally, we report the datasets used.

3.1 Simple Module Size Models

In this study, we leverage simple module size metrics such as SLOC in the target release to build simple module size models. As stated by Monden et al. [67], to adopt defect prediction models in industry, one needs to consider not only their prediction performance but also the significant cost required for metrics collection and for the modeling itself. A recent investigation from Google developers shows that a prerequisite for deploying a defect prediction model in a large company such as Google is that it must be able to scale to large source repositories [56]. Therefore, our study only considers simple models that have a low building cost, a low application cost, and good scalability. More specifically, we take into account the following two simple module size models: ManualDown and ManualUp. For simplicity of presentation, let m be a module in the target release, SizeMetric be a module size metric, and R(m) be the predicted risk value of the module m. Formally, the ManualDown model is R(m) = SizeMetric(m), while the ManualUp model is R(m) = 1/SizeMetric(m). For a given target release, ManualDown considers a larger module as more defect-prone, as many studies report that a larger module tends to have more defects [65]. In contrast, ManualUp considers a smaller module as more defect-prone, as recent studies argue that a smaller module is proportionally more defect-prone and hence should be inspected/tested first [49-51, 65].

Under ManualDown or ManualUp, it is possible that two modules have the same predicted risk value, i.e., they have a tied rank. In our study, if there is a tied rank according to the predicted risk values, the module with a lower defect count will be ranked higher; a small sketch of this scoring and tie-breaking scheme is given below. In this way, we obtain simple module size models that have the (theoretically) "worst" predictive performance. If the experimental results show that these "worst" simple module size models are competitive with the existing CPDP models, then we can safely conclude that, for practitioners, it would be better to apply simple module size models to predict defects in a target release. Note that Canfora et al. used TrivialInc and TrivialDec as the baseline models to investigate the performance of their proposed CPDP models under the ranking scenario [12]. Conceptually, TrivialInc is the same as ManualUp, while TrivialDec is the same as ManualDown. However, there are two important differences in the implementation. First, TrivialInc and TrivialDec are applied to the z-score normalized module size data, while ManualUp and ManualDown are applied to the raw, unprocessed module size data. Second, Canfora et al. did not report how tied ranks were handled in TrivialInc and TrivialDec.
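The following is a minimal sketch of ManualDown and ManualUp with the pessimistic tie-breaking described above; the Module fields, function names, and example data are illustrative assumptions rather than the implementation used in this study. Among modules with the same predicted risk value, the one with the fewer known defects is ranked (inspected) first, which yields the theoretically "worst" ranking for the simple size model.

```python
# Minimal sketch (not the authors' code): ManualDown/ManualUp risk scoring
# with worst-case tie-breaking on the known defect count.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Module:
    name: str
    sloc: int        # module size metric (e.g., SLOC)
    defects: int     # known defect count, used only for worst-case tie-breaking


def manual_down_risk(m: Module) -> float:
    # ManualDown: larger modules are considered more defect-prone.
    return float(m.sloc)


def manual_up_risk(m: Module) -> float:
    # ManualUp: smaller modules are considered (proportionally) more defect-prone.
    return 1.0 / m.sloc


def worst_case_ranking(modules: List[Module],
                       risk: Callable[[Module], float]) -> List[Module]:
    # Sort by descending risk; among tied risk values, put the module with the
    # lower defect count first, so the baseline's performance is pessimistic.
    return sorted(modules, key=lambda m: (-risk(m), m.defects))


modules = [Module("a.c", 1200, 2), Module("b.c", 300, 1),
           Module("c.c", 300, 0), Module("d.c", 50, 0)]
print([m.name for m in worst_case_ranking(modules, manual_down_risk)])
# ['a.c', 'c.c', 'b.c', 'd.c']  -- b.c and c.c tie on size; c.c (0 defects) first
print([m.name for m in worst_case_ranking(modules, manual_up_risk)])
# ['d.c', 'c.c', 'b.c', 'a.c']
```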