
The Economics of Fault Prediction

Submitted in partial fulfillment of the requirements for the degree of


Master of Technology
by
Deepak Banthia
(1010102)
under the guidance of
Dr. Atul Gupta
Computer Science & Engineering
INDIAN INSTITUTE OF INFORMATION TECHNOLOGY,
DESIGN AND MANUFACTURING JABALPUR, INDIA
2012
Approval Sheet
This thesis entitled The Economics of Fault Prediction submitted by
Deepak Banthia (1010102) is approved for partial fulfillment of the requirements for the degree of Master of Technology in Computer Science and
Engineering.
Examining Committee
................................................
................................................
................................................
Guide
................................................
................................................
................................................
Chairman
................................................
Date .......................... ................................................
Place ......................... ................................................
Certificate
This is to certify that the work contained in the thesis entitled, The Economics
of Fault Prediction, submitted by Deepak Banthia (Roll No. 1010102) in
partial fulfillment of the requirements for the degree of Master of Technology in
Computer Science and Engineering, has been carried out under my supervision and
that this work has not been submitted elsewhere.
(Atul Gupta) ............ , 2012
Associate Professor
Computer Science & Engineering Discipline
Indian Institute of Information Technology, Design and Manufacturing Jabalpur
Jabalpur, India.
Acknowledgments
This thesis would not have been possible without the sincere help and contributions of several people. I would like to use this opportunity to express my sincere gratitude to them.
Firstly, I would like to thank God, with whose blessings I could turn my idea into reality. I express my deep sense of gratitude towards my mentor and thesis supervisor, Dr. Atul Gupta, for his valuable guidance, moral support and constant encouragement throughout the thesis. His approach towards software engineering will always be a valuable learning experience for me. No words can express my feelings towards him for taking such a keen interest in my academics and personal welfare. His dedication, professionalism and hard work have been, and shall remain, a source of inspiration throughout my life.
The contributions of a mother to the success of her child can be neither measured nor directly repaid. To such a mother, who is but a manifestation of the divine virtues of the Earth, this report is one petite offering. Thank you, parents, for all the liberty, prosperity, confidence and discipline showered on me. This thesis would not have been completed without the motivation and blessings of my parents. My fiancée, Nisha, brought a light inside me and always filled me with enthusiasm and vigour to do my work with complete effort and dedication. Thanks to her for accompanying me all the way and for her unflinching help and support in all my endeavours. I would like to thank my uncles, Mr. Hem Kumar Banthia and Mr. Khagendra Kumar Banthia, for their encouragement throughout my studies. Along with them, I also received energy and motivation from my sisters for my career. I would also like to give my sincere thanks to Mr. Amaltas Khan, Mr. Arpit Gupta, Mr. Ravindra Singh, Mr. Santosh Singh Rathore and Mr. Saurabh Tiwari for their support and for being there always, no matter what.
I thank the CSE fraternity at IIITDM Jabalpur and my special thanks to my
batch mates.
Jabalpur Deepak Banthia
..........., 2012
Abstract
Fault-prediction techniques aim to predict fault-prone software modules in order to streamline the efforts to be applied in the later phases of software development. Normally, the effectiveness of a fault-prediction technique is demonstrated by training it over a part of some known fault data and measuring its performance against the other part of the fault data. There have been many efforts comparing the performance of various fault-prediction techniques on different project datasets. However, invariably most of these studies have also recorded high misclassification rates (normally 15 to 35%), besides not-so-high accuracy figures (normally 70 to 85%). This raises serious concerns about the viability of these techniques. In this thesis, we first present a brief summary of the results of some of the earlier studies undertaken in fault prediction and argue about their usefulness. As a follow-up, we then investigate two important and related research questions regarding the viability of fault prediction. First, for a given project, are the fault prediction results useful? In case of an affirmative answer, we then look for how to choose a fault-prediction technique for an overall improved performance in terms of cost-effectiveness. Here, we propose an adaptive cost evaluation framework that incorporates cost drivers for various fault removal phases and performs a cost-benefit analysis for the misclassification of faults. We then used this framework to investigate the usefulness of various fault-prediction techniques in two different settings. The first part of the investigation consisted of a performance evaluation of five major fault-prediction techniques on nineteen public datasets. Here, we found fault prediction useful for projects with a percentage of faulty modules less than a certain threshold, and there was no single technique that could provide the best results in all cases, i.e., for all nineteen project datasets. In the other part of the investigation, and as a practical use of the proposed framework, we demonstrate that the fault information of the previous versions of the software can be effectively used to predict fault proneness in the current version of the software. Here, we found fault prediction useful when the difference between inter-version fault rates was below a certain threshold. Also, the usefulness of fault prediction was found to reduce with an increase in the inter-version fault rate.
Contents
Approval I
Certificate II
Acknowledgments III
Abstract V
List of Figures IX
List of Tables X
List of Symbols XII
Abbreviations XIII
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Related Work 6
2.1 Fault Prediction Models . . . . . . . . . . . . . . . . . . . . . 6
2.2 Public Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Evaluation Measures . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.1 Numerical measures . . . . . . . . . . . . . . . . . . . . 10
2.3.2 Graphical evaluation measures . . . . . . . . . . . . . . 12
2.4 Fault Prediction Studies . . . . . . . . . . . . . . . . . . . . . 13
2.5 Estimating Cost of Fault Prediction . . . . . . . . . . . . . . . 16
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Fault Prediction Results: How Useful Are They? 20
3.1 Issues in Fault Prediction . . . . . . . . . . . . . . . . . . . . 20
3.2 A Proposed Model for Evaluating Fault Prediction Efficiency . 21
3.2.1 General arguments . . . . . . . . . . . . . . . . . . . . 23
3.2.2 Evaluation model . . . . . . . . . . . . . . . . . . . . . 23
3.3 Revisiting Fault Prediction Results . . . . . . . . . . . . . . . 24
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4 A Cost Evaluation Framework 29
4.1 The Evaluation Framework . . . . . . . . . . . . . . . . . . . . 30
4.2 Experimental Study . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 Experimental setup . . . . . . . . . . . . . . . . . . . . 33
4.2.2 Experiment execution . . . . . . . . . . . . . . . . . . 34
4.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.4 Experiment findings . . . . . . . . . . . . . . . . . . . . 43
4.2.5 Threats to validity . . . . . . . . . . . . . . . . . . . . 45
4.2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5 An Application of Cost Evaluation Framework for Multiple
Releases 50
5.1 The Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2 Experimental Study . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.1 Experimental setup . . . . . . . . . . . . . . . . . . . . 53
5.2.2 Experiment execution . . . . . . . . . . . . . . . . . . 54
5.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.2.4 Threats to validity . . . . . . . . . . . . . . . . . . . . 59
5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6 Conclusions and Future Work 61
References 63
Publications 70
Index 71
List of Figures
1.1 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3.1 Cost statistics for faulty modules . . . . . . . . . . . . . . . . . 22
3.2 Cost statistics for non-faulty modules . . . . . . . . . . . . . . . 22
4.1 Decision chart representation to evaluate the estimated Ecost . 36
4.2 Value of NEcost for category 1 when δu = 0.25 and δs = 0.5 . . . 38
4.3 Value of NEcost for category 2 when δu = 0.25 and δs = 0.5 . . . 41
4.4 Value of NEcost for category 3 when δu = 0.25 and δs = 0.5 . . . 43
4.5 Cost characteristics of used fault-prediction techniques when δu = 0.5 and δs = 0.65 . . . 44
4.6 Cost characteristics of used fault-prediction techniques when δu = 0.25 and δs = 0.5 . . . 45
4.7 Cost characteristics of used fault-prediction techniques when δu = 0.15 and δs = 0.25 . . . 46
5.1 Decision chart representation to evaluate the estimated Ecost . . . 52
5.2 Value of Ecost for Jedit versions when δu = 0.25 and δs = 0.5 . . . 59
List of Tables
2.1 Datasets used in the study . . . . . . . . . . . . . . . . . . . . 9
2.2 Confusion matrix . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Fault Prediction Studies . . . . . . . . . . . . . . . . . . . . . 13
3.1 NASA datasets . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Experiment results for dataset CM1 . . . . . . . . . . . . . . . 26
3.3 Experiment results for dataset kc1 . . . . . . . . . . . . . . . . 26
3.4 Experiment results for dataset kc2 . . . . . . . . . . . . . . . . 27
3.5 Experiment results for dataset pc1 . . . . . . . . . . . . . . . 27
4.1 Removal costs of test techniques (in staff-hours per defect) [52] . 30
4.2 Fault identification efficiencies of different test phases [26] . . . 31
4.3 Used projects from NASA [1] and PROMISE [2] data repositories . 34
4.4 Categorization of projects based on the fraction of faulty modules 34
4.5 Result of experiment for PC1 (1109) . . . . . . . . . . . . . . 37
4.6 Result of experiment for AR1 (121) . . . . . . . . . . . . . . . 37
4.7 Result of experiment for NW1 (403) . . . . . . . . . . . . . . . 37
4.8 Result of experiment for KC3 (458) . . . . . . . . . . . . . . . 38
4.9 Result of experiment for CM1 (498) . . . . . . . . . . . . . . . 38
4.10 Result of experiment for PC3 (1563) . . . . . . . . . . . . . . 39
4.11 Result of experiment for ARC (234) . . . . . . . . . . . . . . . 39
4.12 Result of experiment for PC4 (1458) . . . . . . . . . . . . . . 39
4.13 Result of experiment for KC1 (2109) . . . . . . . . . . . . . . 40
4.14 Result of experiment for AR4 (107) . . . . . . . . . . . . . . . 40
4.15 Result of experiment for JM1 (10885) . . . . . . . . . . . . . . 40
4.16 Result of experiment for KC2 (522) . . . . . . . . . . . . . . . 41
4.17 Result of experiment for Camel 1.6 (858) . . . . . . . . . . . . 41
4.18 Result of experiment for Ant 1.6 (351) . . . . . . . . . . . . . 42
4.19 Result of experiment for Ant 1.7 (493) . . . . . . . . . . . . . 42
4.20 Result of experiment for MC2 (161) . . . . . . . . . . . . . . . 42
4.21 Result of experiment for J-edit 3.2 (272) . . . . . . . . . . . . 42
4.22 Result of experiment for Lucene 2.0 (195) . . . . . . . . . . . 43
4.23 Result of experiment for J-edit 4.0 (274) . . . . . . . . . . . . 43
5.1 Used projects from PROMISE data repository [2] . . . . . . . 53
5.2 Prediction results for Ant 1.6 . . . . . . . . . . . . . . . . . . 55
5.3 Prediction results for Ant 1.7 when fault prediction model trained
using Ant 1.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4 Results of experiment to calculate the Ecost for Ant 1.7 using
information of Ant 1.6 . . . . . . . . . . . . . . . . . . . . . . 56
5.5 Prediction results for Jedit4.0 (3-fold cross-validation) . . . . . . 57
5.6 Results of experiment to calculate the Ecost for Jedit4.1 using
information of Jedit4.0 . . . . . . . . . . . . . . . . . . . . . . 57
5.7 Prediction results for Jedit4.0 and Jedit4.1 (3-fold cross-validation) 57
5.8 Results of experiment to calculate the Ecost for Jedit4.2 using
information of Jedit4.0 and 4.1 . . . . . . . . . . . . . . . . . 58
5.9 Prediction results for Jedit4.0, Jedit4.1 and Jedit4.2 (3-fold cross-validation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.10 Results of experiment to calculate the Ecost for Jedit4.3 using
information of Jedit4.0, 4.1 and 4.2 . . . . . . . . . . . . . . . 58
A1 Details of used metrics . . . . . . . . . . . . . . . . . . . . . . 71
A2 Metrics used in datasets . . . . . . . . . . . . . . . . . . . . . 72
List of Symbols
Cf    Normalized fault removal cost in the field
Ci    Initial setup cost of the used fault prediction approach
Cs    Normalized fault removal cost in system testing
Cu    Normalized fault removal cost in unit testing
Mp    Percentage of modules unit tested
δs    Fault identification efficiency of system testing
δu    Fault identification efficiency of unit testing
Abbreviations
Acc Accuracy
AUC Area Under the Curve
Ecost Estimated Fault Removal Cost of the software when we
use fault prediction
EFN Estimated number of False Negatives
EFP Estimated number of False Positives
ETP Estimated number of True Positives
FN False Negative
FNR False Negative Rate
FP False Positive
FPR False Positive Rate
NEcost Normalized Estimated fault removal cost of the software
when we use fault prediction
NPV Negative Predictive Value
PD Probability of Detection
PF Probability of False Alarm
PPV Positive Predictive Value
PR Precision
Tcost Estimated fault removal cost of the software without the
use of fault prediction
TN True Negative
TP True Positive
Chapter 1
Introduction
Software fault prediction has become an important area of research within the software development life cycle. It has the potential to aid in ensuring the desired software quality as well as to achieve an economical development process. The potential of fault prediction is backed by its ability to identify fault-prone software modules before the actual testing process begins. This helps in obtaining the desired software quality in optimum time, with optimized cost and effort.
Most major development organizations spend a lot of time and effort on research in the field of quality assurance activities. Yet the practical usage of fault prediction remains equivocal. This indicates a need for further research in this field that emphasizes how fault prediction is applicable in the quality assurance process.
1.1 Motivation
The software quality assurance process focuses on the quick identification and removal of faults from the artifacts that are generated and subsequently used in the development of software. Fault prediction can help in this by identifying the fault-prone modules in the early stages of the development life cycle, which can then lead to a more streamlined application of effort. The fault-proneness information not only points to the need for increased quality monitoring during development but also provides important guidance for undertaking suitable verification and validation activities, which eventually leads to improved effectiveness and efficiency of the fault finding process.
Fault prediction is a process of predicting the fault-prone modules of a software system without executing them. Conventionally, fault prediction is done by applying machine-learning techniques over project datasets. The effectiveness of a fault-prediction technique is demonstrated by training it over a part of some known fault data and measuring its performance against the other part of the fault data. Recently, several software project data repositories have become publicly available, such as the NASA Metrics Data Program [1] and the PROMISE Data Repository [2]. The availability of these public datasets has encouraged more investigations and their replications. A wide range of fault-prediction techniques has been applied to demonstrate their effectiveness on these datasets [19][8][28][49][38].
However, certain crucial issues need to be resolved before the results of such prediction can be incorporated in practice. An important concern relates to the lack of suitable performance evaluation measures that would assess the economics of fault prediction if adopted in the software development process [6]. Another concern is the typical prediction accuracy of a fault-prediction technique, which is found to be considerably low, ranging from 70 to 85 percent [32][19][20], compared to the high-accuracy results obtained in other fields such as image recognition and spam filtering. Yet another concern can be attributed to the unequal distribution of fault data, which may lead to biased learning. We know from experience that fault distributions typically follow the Pareto principle; hence, the accuracy figures obtained from fault prediction can be grossly misleading, as a fault-prediction technique can produce high-accuracy results simply by classifying most non-faulty modules as non-faulty.
The key functionality of fault prediction is to identify the highest possible number of faults with the least possible resources. However, the concerns mentioned above pose serious threats to using fault prediction results to streamline the quality assurance activities undertaken during software development. We need to investigate further what these results mean and whether they can be used economically in the software development process.
1.2 Objectives
The main objective of this thesis work is to propose a cost evaluation framework that helps to put the results of a fault-prediction technique in proper perspective. If the results of fault prediction are to be used in the development process, the framework can provide an estimate of the savings in the effort applied in subsequent phases of software development. Specifically, we aim to answer, for a given project dataset, whether fault prediction would help, and if yes, how to choose a fault-prediction technique that would yield the optimal results.
With this dissertation, we will investigate:
Q1: For a given project, would fault prediction economically help in software development?
Q2: If yes, how should a fault-prediction technique be selected for overall optimum performance?
1.3 Thesis Organization
The overall structure of this thesis can be illustrated as shown in Figure 1.1.
The content can broadly be divided into three major sections, namely Back-
ground Research, Research Contribution and Research Prospects.
Figure 1.1: Thesis structure
Chapter 2 summarizes the concepts relevant to this study. In particular, fault prediction models, details of the public datasets used in our experimental study, model evaluation techniques and a literature review of previous related studies are given in this chapter.
In Chapter 3, we present an insight into the economics of fault prediction. In particular, we first revisit the results of some of the previous fault prediction studies from the standpoint of the economics of faults. We then refine the evaluation criteria based on fault misclassification and again measure the performance of the above-said fault-prediction techniques on the basis of cost effectiveness. We used four NASA MDP datasets to perform this study. Here, our results suggested that simple techniques like IBK perform better over most of the datasets.
In Chapter 4, we propose a cost evaluation framework that can help to answer both of the questions using limited fault data. Essentially, the framework provides an estimate of the savings in the effort applied in subsequent phases of software development when the results of fault prediction are used. To construct the cost evaluation framework, we accounted for the typical fault removal cost of different testing phases [52], along with their fault identification efficiency [26]. The first question can be answered by comparing the fault removal cost in both cases, i.e., with and without the use of fault prediction.
Here, we investigated the usefulness of fault-prediction techniques based on the proposed framework using limited fault data. The investigation consisted of a performance evaluation of five major fault-prediction techniques on nineteen public datasets. We used five well-known fault-prediction techniques, namely Random Forest, J48 (C4.5 decision tree), Neural Network, K-means Clustering and IBK (K-nearest neighbours). These datasets provide a wide range of percentages of faulty modules (varying from 7 to 49 percent). We categorized these datasets into three categories based on the fault information, and used the WEKA machine learning tool to perform all listed experiments. The results of this study suggested that fault prediction can be useful for projects with a percentage of faulty modules less than a certain threshold (in our case, it varied from 21% to 42% over the specified range of testing phase efficiencies). Also, there was no single technique that could provide the best results in all cases.
In Chapter 5, we show the application of the proposed cost framework over multiple subsequent releases of a software system. We evaluated the fault removal cost of the current version of the software using the fault information available from its previous versions. This estimated fault removal cost then helps to decide whether fault prediction is useful for the current version. To answer both research questions, we investigated the usefulness of fault-prediction techniques based on the framework on successive versions of two different software systems, namely Ant and Jedit. Here, we found fault prediction useful when the difference between inter-version fault rates was below a certain threshold (in our case, 2%). Also, the usefulness of fault prediction was found to reduce with an increase in the inter-version fault rate. Here, the difference between inter-version fault rates denotes the difference between the percentages of faulty modules present in successive versions.
Finally, we conclude the contributions of our research in Chapter 6. The future prospects of our research are also discussed in the same chapter.
1.4 Summary
Fault-prediction techniques are used to identify faults in software code without executing it. They therefore have the potential to support the validation and verification process by accurately identifying faults, and may also help in an economical software development process. Yet most organizations still do not use fault-prediction techniques, even though their potential has been validated in a number of studies. This indicates a need for further research in this field that emphasizes how fault prediction can improve the quality assurance process. In this chapter, we highlighted the issues in the fault prediction arena and summarized our work, which tries to put fault prediction results in the correct perspective, i.e., cost effectiveness.
Chapter 2
Related Work
In this chapter, we summarize the concepts relevant to the study. In particular, fault prediction models, details of the public datasets used in the research study, model evaluation techniques and a literature review of previous related studies are given.
2.1 Fault Prediction Models
Fault prediction allows testers to deploy their resources more effectively and efficiently, which can potentially result in higher-quality products and lower costs. Fault prediction is typically employed by applying various machine-learning algorithms to known properties learned from project fault datasets. The typical way of predicting faults in software modules is to use software metrics and fault data (collected from previous releases or similar projects) to construct a fault-prediction model; this model is then used to predict the fault proneness of new modules. For example, a module under the scanner of a fault-prediction technique is classified as faulty if its feature (metric) values are similar to those of a faulty module that was used to train the technique.
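As an illustration of this workflow (the experiments in this thesis used the WEKA tool, not the code below), a minimal sketch in Python with scikit-learn is given here; the file names and the 'faulty' label column are hypothetical placeholders.

```python
# Minimal illustrative sketch (the thesis used WEKA, not this code): train a
# fault-prediction model on module metrics from an earlier release and use it
# to flag fault-prone modules of the current release. The file and column
# names ("previous_release.csv", "current_release.csv", "faulty") are
# hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Each row is a module: static code metrics plus a 0/1 label recording
# whether a fault was reported against that module.
train = pd.read_csv("previous_release.csv")
X_train = train.drop(columns=["faulty"])
y_train = train["faulty"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Modules of the current release share the same metric columns but have no
# fault labels yet; modules predicted as 1 are treated as fault prone.
current = pd.read_csv("current_release.csv")
fault_prone = model.predict(current)
```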
Many techniques have been proposed to estimate the fault proneness of a software module. Some of the proposed techniques are clustering, Decision Tree, Neural Networks, Dempster-Shafer Belief Networks, Random Forest and Quad Tree-based K-Means [19][20][8][27][28][9][49].
Different Approaches for Fault Prediction Models
A project manager needs to make sure a project meets its timetable and budget without loss of quality. To help project managers make such decisions, fault prediction models play an important role in allocating software quality assurance resources. Existing research on software fault-proneness models focuses on predicting faults from two perspectives:
The number of faults or fault density: These models predict the number of faults (or the fault density) in a module or component. They typically use data from historical versions (or pre-release parts) and predict the faults in the new version (or the newly developed parts). For example, the fault data from historical releases can be used to predict faults in updated releases [46][33][50][23].
Classification: Classification predicts which modules (components) contain faults and which do not. The goal of this kind of prediction is to distinguish fault-free subsystems from faulty subsystems, which allows project managers to focus resources on fixing the faulty subsystems.
There are two methods for constructing fault prediction models that separate fault-prone modules from fault-free modules: supervised learning and unsupervised learning. They are used in different situations. When a new system without any previous release is built, unsupervised learning needs to be adopted in order to predict fault-prone subsystems (modules, components, or classes) among the newly developed ones. After some subsystems are tested and put into operation, these pre-release subsystems can be used as training data to build fault prediction models for new subsystems; this is when supervised learning can be used. The difference between supervised and unsupervised learning lies in the status of the training data's class labels: if they are unknown, the learning is unsupervised; otherwise, it is supervised.
Supervised Learning: Learning is called supervised because the method operates under supervision, being provided with the actual outcome for each of the training examples. Supervised learning requires known fault measurement data (i.e., the number of faults, the fault density, or whether a module is fault prone or not) for the training data. Usually, fault measurement data from previous versions [46], pre-release data [44], or similar projects [29] can act as training data to predict new projects (subsystems).
Most research reported in fault prediction, including the experiments in this dissertation, uses supervised learning. The result of supervised learning is easier to judge than that of unsupervised learning, which probably helps to explain why there are abundant reports on supervised learning in the literature and few on unsupervised learning. As in most fault prediction research, a dataset with all known classes is divided into training data and testing data: the classes of the training data are provided to a machine-learning algorithm, while the testing data acts as the validation set and is used to judge the trained models. The success rate on the test data gives an objective measure of how well the machine-learning algorithm performs. When this process is repeated multiple times with randomized splits into training and testing sets, it follows standard data-mining practice, called cross-validation. As in other data-mining research, randomization, cross-validation, and bootstrapping are the standard statistical procedures for fault prediction in software engineering.
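A minimal sketch of this cross-validation practice is given below, assuming the labelled module data is already held in arrays X (metrics) and y (fault labels); the use of scikit-learn and of a decision tree as a J48-like learner is illustrative only.

```python
# Sketch of the cross-validation practice described above: the labelled data
# is split several times into training and testing parts, and the classifier
# is judged on the held-out part of each split.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def cross_validate(X, y, n_splits=3, seed=0):
    """Return per-fold accuracies of a decision tree (a C4.5/J48 analogue)."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X, y):
        clf = DecisionTreeClassifier(random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return np.array(scores)
```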
Unsupervised Learning: Sometimes we may not have fault data, or only very few modules may have previous fault data. For example, if a new project is being developed or previous fault data was not collected, supervised learning approaches do not work because we do not have labeled training data.
In such cases, unsupervised learning approaches such as clustering methods may be applied. However, research on this approach is seldom reported. As far as the author is aware, Zhong et al. [55][56] are the first group to investigate it in fault prediction. They used Neural-Gas and K-means clustering to group software modules into several clusters, with the help of human experts to label each cluster as fault prone or not fault prone. Their results indicate promising potential for this unsupervised learning method.
2.2 Public Datasets
Several software project data repositories have become publicly available, such as the NASA Metrics Data Program [1] and the PROMISE Data Repository [2]. NASA MDP is a software project metrics repository provided by NASA and is available to users through its website. NASA MDP stores and organizes software metrics data and the associated fault data at the module level. Currently, thirteen project datasets are available there. All NASA MDP datasets are also available in the public PROMISE repository, which contains ninety-four defect datasets. These datasets can therefore be used to validate the performance of various fault-prediction techniques. In the experiments of this thesis work, we used twenty-three public datasets from the NASA and PROMISE data repositories.
Table 2.1: Datasets used in the study
Project        Faulty (%)   Number of Modules   Language   Source
Jedit 4.3 2.23 492 Java PROMISE
pc1 6.94 1109 C NASA MDP
ar1 7.44 121 C PROMISE
nw1 7.69 403 C NASA MDP
kc3 9.34 458 Java NASA MDP
cm1 9.84 498 C NASA MDP
pc3 10.24 1563 C NASA MDP
Arc 11.54 234 C++ PROMISE
pc4 12.21 1458 C NASA MDP
kc1 15.46 2109 C++ NASA MDP
Jedit 4.2 13.07 367 Java PROMISE
ar4 18.69 107 C PROMISE
jm1 19.35 10885 C NASA MDP
kc2 20.5 522 C++ NASA MDP
camel1.6 21.91 858 Java PROMISE
ant1.6 26.21 351 Java PROMISE
Jedit4.0 24.5 306 Java PROMISE
Jedit4.1 25.32 312 Java PROMISE
ant1.7 27.79 493 Java PROMISE
mc2 32.3 161 C++ NASA MDP
jedit 3.2 33.09 272 Java PROMISE
lucene2.0 46.67 195 Java PROMISE
jedit 4.0 m 48.9 274 Java PROMISE
The details of these datasets are tabulated in Table 2.1. These datasets correspond to different programming languages and have different software metrics, varying in number from eight to forty. The description of the used datasets along with their metrics is given in the Appendix (Appendix Table A1 and Appendix Table A2).

Table 2.2: Confusion matrix
                          Defect Present: No     Defect Present: Yes
Defect Predicted: No      TN = True Negative     FN = False Negative
Defect Predicted: Yes     FP = False Positive    TP = True Positive
2.3 Evaluation Measures
In this section, we summarize the various evaluation measures used by researchers to evaluate the performance of a fault-prediction technique. These measures can be broadly classified into two major categories: numerical measures and graphical measures.
2.3.1 Numerical measures
All numerical measures can be derived from the confusion matrix. A confusion matrix contains information about the actual and predicted classifications made by a fault-prediction technique. Table 2.2 shows the confusion matrix for a two-class classification.
Accuracy:
The prediction accuracy of a fault-prediction technique is measured as
Accuracy = (TN + TP) / (TN + TP + FN + FP)    (2.1)
False positive rate (FPR):
It is measured as the ratio of modules incorrectly predicted as faulty to all non-faulty modules. False alarm rate and type-I error are equivalent to FPR.
FPR = FP / (TN + FP)    (2.2)
False negative rate (FNR):
It is measured as the ratio of modules incorrectly predicted as non-faulty to all faulty modules. Type-II error is equivalent to FNR.
FNR = FN / (TP + FN)    (2.3)
Precision:
It is measured as the ratio of modules correctly predicted as faulty to all modules predicted as faulty.
Precision = TP / (TP + FP)    (2.4)
Recall:
It is measured as the ratio of modules correctly predicted as faulty to all faulty modules. Probability of detection (PD) is another name for recall.
Recall = TP / (TP + FN)    (2.5)
F-measure:
It is measured as the harmonic mean of precision and recall [36].
F-measure = (2 × Precision × Recall) / (Precision + Recall)    (2.6)
G-mean:
The G-mean indices are geometric means, defined in expressions (2.7) and (2.8) [35]. G-mean1 is the square root of the product of the probability of detection (PD) and precision. G-mean2 is the square root of the product of PD and specificity.
G-mean1 = sqrt(PD × Precision)    (2.7)
G-mean2 = sqrt(PD × Specificity)    (2.8)
J-coefficient (J-coeff):
It describes the performance of prediction techniques more effectively [51].
J-coeff = PD − PF    (2.9)
When J-coeff is 0, the probability of detecting a faulty module is equal to the false alarm rate. When J-coeff is greater than 0, PD is greater than PF. J-coeff = 1 represents perfect classification, while J-coeff = −1 is the worst case, in which all modules are predicted inaccurately.
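All of the above measures follow directly from the four counts of the confusion matrix in Table 2.2. The helper below is only an illustration of equations (2.1) to (2.9), not code used in the thesis.

```python
from math import sqrt

def prediction_measures(tp, tn, fp, fn):
    """Numerical measures of Section 2.3.1 from the confusion-matrix counts."""
    accuracy    = (tn + tp) / (tn + tp + fn + fp)                 # Eq. (2.1)
    fpr         = fp / (tn + fp)                                  # Eq. (2.2), PF
    fnr         = fn / (tp + fn)                                  # Eq. (2.3)
    precision   = tp / (tp + fp)                                  # Eq. (2.4)
    recall      = tp / (tp + fn)                                  # Eq. (2.5), PD
    f_measure   = 2 * precision * recall / (precision + recall)   # Eq. (2.6)
    specificity = tn / (tn + fp)
    g_mean1     = sqrt(recall * precision)                        # Eq. (2.7)
    g_mean2     = sqrt(recall * specificity)                      # Eq. (2.8)
    j_coeff     = recall - fpr                                    # Eq. (2.9)
    return {"accuracy": accuracy, "fpr": fpr, "fnr": fnr,
            "precision": precision, "recall": recall,
            "f_measure": f_measure, "g_mean1": g_mean1,
            "g_mean2": g_mean2, "j_coeff": j_coeff}
```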
2.3.2 Graphical evaluation measures
Graphical measures depict the relationship between two or more numerical measures. Like the numerical measures, all the graphical measures can also be derived from the confusion matrix.
ROC curve [54]:
An ROC curve provides a visualization of the trade-off between the ability to correctly predict fault-prone modules (PD) and the rate of incorrectly predicted fault-free modules (PF). The area under the ROC curve (denoted AUC) is a numeric performance evaluation measure used to compare fault-prediction techniques. In an ROC curve, the best performance corresponds to high PD and low PF.
PR curve [14]:
A PR curve provides a visualization of the trade-off between precision and recall. In a PR curve, the x-axis represents recall and the y-axis precision; recall is another term for PD. In a PR curve, the best performance corresponds to high PD and high precision.
Cost curve [15]:
A cost curve provides a visualization of the cost of misclassification. It describes the performance of a fault-prediction technique on the basis of its misclassification cost. Its y-axis represents the normalized expected misclassification cost, which indicates the difference between the maximum and the minimum cost of misclassifying faulty modules. The x-axis represents the probability cost function.
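The curves above can be computed from any classifier that outputs a fault-proneness score. The sketch below, which is illustrative only and assumes scikit-learn, returns the points of the ROC and PR curves together with the AUC.

```python
# Sketch: ROC and precision-recall summaries for scored predictions.
# y_true holds the known 0/1 fault labels, y_score the classifier's
# predicted probability of fault proneness for each module.
from sklearn.metrics import roc_curve, precision_recall_curve, auc

def graphical_summaries(y_true, y_score):
    pf, pd_rate, _ = roc_curve(y_true, y_score)      # ROC: PF (x) vs PD (y)
    roc_auc = auc(pf, pd_rate)                       # area under the ROC curve
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    return roc_auc, (pf, pd_rate), (recall, precision)  # PR: recall (x) vs precision (y)
```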
2.4 Fault Prediction Studies
In this section, we present a brief summary of some fault prediction studies relevant to our work. In particular, we summarize some of the studies on fault-prediction techniques, some useful review articles, and research papers relevant to the cost effectiveness of fault prediction. The summarized studies are shown in Table 2.3.
These studies show that a lot of research has been done in the field of fault prediction. However, more specific studies are required that show the effect of fault prediction on software quality and its economics. In this thesis, we address one of the major and complex problems in software fault prediction studies, namely how to compare the performance of different fault-prediction techniques effectively. As a solution, we propose a cost evaluation framework that compares performance on the basis of the resulting fault removal cost.
Table 2.3: Fault Prediction Studies

1. Victor R. Basili, Lionel C. Briand, and Walcelio L. Melo (1996) [5]
   Techniques: logistic regression (univariate and multivariate)
   Evaluation metrics: regression coefficient, p-value
   Datasets: private datasets (8 datasets)
   Conclusions: (1) C and K metrics were found useful for predicting class fault-proneness during the early phases of the development life cycle; (2) on their dataset, C and K metrics were better predictors than traditional code metrics.

2. S. S. Gokhale and M. R. Lyu (1997) [17]
   Techniques: regression tree, density modeling techniques
   Evaluation metrics: Accuracy, Type I and Type II error
   Datasets: private dataset (Medical Imaging System)
   Conclusions: (1) the regression-tree-based technique has higher prediction accuracy than the density-based technique; (2) it also has a lower misclassification rate.

3. T. Khoshgoftaar and N. Seliya (2002) [32]
   Techniques: CART-LS, CART-LAD and S-PLUS
   Evaluation metrics: average absolute error (aae) and average relative error (are)
   Datasets: private dataset (from a large telecommunication system)
   Conclusions: (1) the performance of CART-LAD was found better than that of the other two techniques; (2) S-PLUS trees had poor predictive accuracy.

4. Lan Guo, Bojan Cukic and Harshinder Singh (2003) [19]
   Techniques: Dempster-Shafer (D-S) belief network, logistic regression and discriminant analysis
   Evaluation metrics: Specificity, Sensitivity, Overall Prediction Accuracy, Probability of False Alarm, Effort
   Datasets: KC2
   Conclusions: (1) the accuracy of D-S belief networks was found higher than that of logistic regression and discriminant analysis.

5. Lan Guo, Yan Ma, Bojan Cukic, and Harshinder Singh (2004) [20]
   Techniques: Logistic Regression, Discriminant Analysis, Decision Tree, Rule Set, Boosting, Logistic, Kernel Density, Naive Bayes, J48, IBK, IB1, Voted Perceptron, Hyper Pipes, ROCKY
   Evaluation metrics: Accuracy, Probability of Detection
   Datasets: CM1, JM1, KC1, KC2 and PC1
   Conclusions: (1) Random Forest generally achieves higher overall prediction accuracy and defect detection rate than the others; (2) compared different machine learning models.

6. T. Menzies, J. DiStefano, A. Orrego, R. Chapman (2004) [42]
   Techniques: Naive Bayes and J48
   Evaluation metrics: Accuracy, Precision, Probability of Detection and Probability of False Alarm
   Datasets: CM1, JM1, KC1, KC2 and PC1
   Conclusions: (1) the performance of Naive Bayes is better than that of the J48 algorithm; (2) accuracy is not a useful parameter for evaluation; (3) they suggested the use of fault prediction in addition to inspection for better quality assurance.

7. A. Koru and Hongfang Liu (2005) [34]
   Techniques: J48 and KStar
   Evaluation metrics: F-measure, Precision and Recall
   Datasets: CM1, JM1, KC1, KC2 and PC1
   Conclusions: (1) it is better to perform defect prediction on data that belong to large modules; (2) defect prediction using class-level metrics gives better performance than method-level metrics.

8. Venkata U.B. Challagulla, Farokh B. Bastani, I-Ling Yen (2005) [13]
   Techniques: Linear Regression, Pace Regression, Support Vector Regression, Neural Network for a continuous goal field, Support Vector Logistic Regression, Neural Network for a discrete goal field, Logistic Regression, Naive Bayes, Instance Based Learning, J48 Tree, and 1-Rule
   Evaluation metrics: Mean Absolute Error
   Datasets: CM1, JM1, KC1 and PC1
   Conclusions: (1) evaluated the performance of different prediction models; (2) showed that a combination of 1R and Instance-based Learning gives better prediction accuracy; (3) also showed that size and complexity metrics are not sufficient for efficient fault prediction.

9. Tibor Gyimothy, Rudolf Ferenc, and Istvan Siket (2005) [21]
   Techniques: logistic regression (univariate and multivariate), decision tree and neural network
   Evaluation metrics: Precision, Correctness and Completeness
   Datasets: Mozilla 1.0 to Mozilla 1.6
   Conclusions: (1) presented a toolset to calculate OO metrics from C++ software; (2) showed how fault-proneness changed over seven versions of Mozilla.

10. U.B. Challagulla, B. Bastani, I. Yen (2006) [12]
    Techniques: Memory Based Reasoning (MBR)
    Evaluation metrics: Accuracy, Probability of Detection (PD) and Probability of False alarm (PF)
    Datasets: CM1, JM1, KC1 and PC1
    Conclusions: (1) if accuracy is the only criterion, simple MBR with Euclidean distance performs better than the other used techniques; (2) proposed a framework that can be used to derive the optimal configuration giving the best performance for a given defect dataset.

11. Yan Ma, Lan Guo and Bojan Cukic (2006) [39]
    Techniques: Logistic Regression, Discriminant Analysis, Decision Tree, Rule Set, Boosting, Kernel Density, Naive Bayes, J48, IBK, IB1, Voted Perceptron, VF1, Hyper Pipes, ROCKY, Random Forest, Modified Random Forest
    Evaluation metrics: Probability of Detection, Accuracy, Precision, G-mean1, G-mean2, F-measure
    Datasets: CM1, JM1, KC1, KC2 and PC1
    Conclusions: (1) proposed a novel methodology based on variants of the random forest algorithm which is more robust than random forest; (2) compared different machine learning models.

12. T. Menzies, J. Greenwald and A. Frank (2007) [43]
    Techniques: Naive Bayes, J48 and log filtering techniques
    Evaluation metrics: Probability of Detection (PD) and Probability of False alarm (PF)
    Datasets: CM1, KC3, KC4, MW1, PC1, PC2, PC3 and PC4
    Conclusions: (1) data mining of static code attributes to learn defect predictors is useful; (2) the used predictors were found useful for prioritizing a resource-bound exploration of code that has to be inspected.

13. S. Kanmani, Rhymend Uthariaraj, Sankaranarayanan, P. Thambidurai (2007) [27]
    Techniques: Back Propagation Neural Network, Probabilistic Neural Network, discriminant analysis and logistic regression
    Evaluation metrics: Type I, Type II and overall misclassification rate
    Datasets: PC1, PC2, PC3, PC4, PC5 and PC6
    Conclusions: (1) Probabilistic Neural Networks outperform Back Propagation Neural Networks in predicting the fault proneness of object-oriented software.

14. Zhan Li, Marek Reformat (2007) [37]
    Techniques: Support Vector Machine, C4.5, Multilayer Perceptron and Naive Bayes classifier
    Evaluation metrics: Sensitivity, Specificity and Accuracy
    Datasets: JM1 and KC1
    Conclusions: (1) the performance of the proposed methodology, SimBoost, was found better than that of conventional techniques; (2) the authors proposed fuzzy labels for classification purposes.

15. Naeem Seliya, Taghi M. Khoshgoftaar (2007) [47]
    Techniques: Expectation Maximization, C4.5
    Evaluation metrics: Type I, Type II and overall error rate
    Datasets: KC1, KC2, KC3 and JM1
    Conclusions: (1) EM-based semi-supervised classification improves the performance of software quality models.

16. Yue Jiang, Bojan Cukic and Yan Ma (2008) [25]
    Techniques: Naive Bayes, Logistic, IB1, J48, Bagging
    Evaluation metrics: all available evaluation measures; in addition, introduced the cost curve
    Datasets: CM1, JM1, KC1, KC2, KC4, MC2, PC1 and PC5
    Conclusions: (1) selection of the best prediction model cannot be made without considering software cost characteristics.

17. Olivier Vandecruys, David Martens, Bart Baesens, Christophe Mues, Manu De Backer and Raf Haesen (2008) [50]
    Techniques: AntMiner+, C4.5, logistic regression and support vector machine
    Evaluation metrics: accuracy, specificity and sensitivity
    Datasets: KC1, PC1 and PC4
    Conclusions: (1) the intuitiveness and comprehensibility of the AntMiner+ model were found superior to the compared models.

18. B. Turhan and A. Bener (2009) [48]
    Techniques: Naive Bayes
    Evaluation metrics: Probability of Detection (PD) and Probability of False alarm (PF)
    Datasets: CM1, KC3, KC4, MW1, PC1, PC2, PC3 and PC4
    Conclusions: (1) the independence assumption of Naive Bayes was not harmful for defect prediction on datasets with PCA preprocessing; (2) assigning weights to static code attributes can significantly increase prediction performance.

19. Huihua Lu, Bojan Cukic, Mark Culp (2011) [38]
    Techniques: Random forest, FTF
    Evaluation metrics: Probability of Detection and the Area Under the Receiver Operating Characteristic Curve (AUC)
    Datasets: JM1, KC1, PC1, PC3 and PC4
    Conclusions: (1) the semi-supervised technique outperforms the corresponding supervised technique.

20. P.S. Bishnu and V. Bhattacherjee (2011) [9]
    Techniques: K-Means, Catal et al. two-stage approach (CT), single-stage approach (CS), Naive Bayes and Linear discriminant analysis
    Evaluation metrics: False Positive Rate, False Negative Rate and Error
    Datasets: AR3, AR4, AR5, SYD1 and SYD2
    Conclusions: (1) the overall error rate of the QDK algorithm was found comparable to that of the other compared techniques.
2.5 Estimating Cost of Fault Prediction
Software fault prediction attracts significant attention as it can offer guidance to software verification and validation activities. Over the past few years, many organizations have made their datasets containing software metrics and the respective fault information publicly available. The availability of these datasets encourages researchers to validate the performance of various machine-learning techniques in predicting the fault proneness of software modules. Many research studies have also been performed to evaluate the performance of these fault-prediction techniques, but it seems that they ignored the impact of fault misclassification on the economics of software development. Certifying a considerable number of faulty modules as non-faulty raises serious concerns, as it may increase the development cost owing to the higher fault removal cost of those modules in the later phases. Hence, a more viable evaluation measure is one that favors techniques which tend to reduce the fault removal cost.
Many studies have used different criteria to evaluate the performance of the fault-prediction techniques under investigation. Some of the used criteria are accuracy, precision, recall, and mean absolute error, but these criteria do not consider the cost parameters of software development. A few studies have presented cost measures to evaluate the cost effectiveness of fault prediction. In this section, we summarize the studies which measure the cost effectiveness of fault prediction and relate them to our work.
Jiang et al. [25] used various metrics to measure the performance of fault-prediction techniques. They then introduced the cost curve, a measure to estimate the cost effectiveness of a classification technique, to evaluate the performance of a fault-prediction technique. They drew the conclusion that cost characteristics must be considered to select the best prediction technique.
Jiang et al. [24] addressed a more general problem, in which they observed that the cost implications of false positives and false negatives are different. They analyzed the benefits of fault-prediction techniques which incorporate the misclassification cost in the development of the prediction model. They performed 11 experiments with different costs for false positives and false negatives on 13 datasets. They concluded that cost-sensitive modeling does not improve the overall performance of fault-prediction techniques. Nevertheless, explicit information about misclassification cost makes it easier for software managers to select the most appropriate technique.
Mende et al. [41] pointed out that traditional prediction techniques typically ignore the effort needed to fix the faults, i.e., they do not distinguish between a predicted fault in a small module and a predicted fault in a large module. They therefore introduced a performance measure (popt) that takes the size of the modules into account when measuring the performance of a fault-prediction technique. They performed their study on thirteen NASA datasets. They concluded that their results indicate the need for further research to improve existing prediction models, not only through more sophisticated classification algorithms but also by searching for better performance measures.
Mende et al. [40] proposed two strategies, namely AD (effort-aware binary prediction) and DD (effort-aware prediction based on defect density), to include the notion of effort awareness in fault-prediction techniques. The first strategy, AD, is applicable to any probabilistic classifier, while DD is applicable only to regression algorithms. They evaluated these strategies on fifteen publicly available datasets. They concluded that both strategies improve the cost effectiveness of fault-prediction techniques significantly, in both the statistical and the practical sense.
Arisholm et al. [3] presented a study performed in an industrial setting in which they tried to build fault prediction models to efficiently predict faults in a Java system with multiple versions. They also proposed a cost performance measure (CE), a variation of lift charts where the x-axis contains the ratio of lines of code instead of modules. They concluded that the popular confusion matrix criterion is not clearly related to cost-effectiveness.
Catal et al. [11] presented a literature review of fault-prediction studies from 1990 to 2009. They reviewed the results of previous studies as well as discussed the current trends. Bell et al. [6] presented a challenge paper and discussed some important issues regarding the impact of fault-prediction studies on testing and other efforts. They concluded that, until then, no study existed in the literature that investigated the impact of fault prediction on the software development process. They also highlighted that coming up with a method to assess the effectiveness of fault-prediction studies, if adopted in a software project, would be helpful for the software community.
Jiang et al. [25] used the cost curve to show the cost effectiveness of fault-prediction studies, but they assumed the same misclassification cost for each module, which might be unreasonable in practice. Mende et al. [41] introduced a new performance measure, popt, that accounts for module size when evaluating the performance of a fault-prediction technique; in our framework, by contrast, the fault removal cost of a particular phase is the same for all modules. Jiang et al. [24] experimented with the cost impact of fault misclassifications over eleven different (arbitrarily chosen) values for the cost of false positives and false negatives. These values were assumed to be the same for all phases of software development, which is not a practical assumption. In this thesis, we propose a new cost evaluation framework which overcomes this limitation by using organization-wide cost information and computing the estimated fault removal costs based on the phase in which faults are identified. Wagner et al. [52] summarized the fault removal cost for different testing stages, and Jones et al. [26] summarized the fault identification efficiency of different testing phases. We used these parameters to compute the estimated fault removal cost for a specific fault-prediction technique, which eventually helped us to decide its applicability in a more precise way.
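To make the idea concrete, the sketch below shows one way such an estimate can be assembled from the confusion-matrix counts, per-phase removal costs of the kind reported in [52] and phase efficiencies of the kind reported in [26]. The exact expressions of the proposed framework are developed in Chapter 4, so the formulas and parameter names here are only an illustrative approximation, not the framework itself.

```python
# Illustrative approximation only; the thesis' own framework is defined in
# Chapter 4. Cost constants (initial setup c_i, unit c_u, system c_s,
# field c_f) and phase efficiencies (delta_u, delta_s) are placeholders for
# organization-specific values of the kind reported in [52] and [26].

def cost_with_prediction(tp, fp, fn, c_i, c_u, c_s, c_f, delta_s):
    """Modules flagged by the predictor (TP + FP) are unit tested; faults
    missed by the predictor (FN) are caught later, in system testing with
    efficiency delta_s, or otherwise in the field."""
    return (c_i
            + c_u * (tp + fp)
            + delta_s * c_s * fn
            + (1 - delta_s) * c_f * fn)

def cost_without_prediction(n_faults, c_u, c_s, c_f, delta_u, delta_s):
    """Baseline: faults are removed at the earliest phase that catches them,
    according to the unit and system testing efficiencies."""
    unit = delta_u * n_faults
    system = delta_s * (n_faults - unit)
    field = n_faults - unit - system
    return c_u * unit + c_s * system + c_f * field

# Fault prediction is judged economical for a project when the estimated
# cost with prediction is lower than the baseline cost without it.
```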
2.6 Summary
In this chapter, we presented a brief summary of the concepts related to our study. In particular, we described the conventional way of performing fault prediction, the measures used to evaluate the performance of a fault-prediction technique, and the available public dataset repositories. We also summarized the studies related to this thesis work, which frame its background.
Chapter 3
Fault Prediction Results: How Useful Are They?
In this chapter, we give an insight into the cost economics of fault prediction. In particular, we revisit the results of some of the earlier fault prediction studies to account for fault misclassification. We first investigate how different authors measured the performance of their presented fault-prediction techniques. Then, we refine the performance evaluation criteria based on fault misclassification and revisit the outcomes of the above-said fault-prediction techniques.
In our study, we used fifteen research papers based on public datasets, along with their outcomes and measurement criteria (see Table 2.3). The remainder of this chapter is organized as follows. Section 3.1 discusses the issues in fault prediction. Section 3.2 presents a new model for evaluating the fault prediction performance of a technique based on cost economics. Section 3.3 revisits fault prediction results based on the presented evaluation model, and Section 3.4 summarizes our findings.
3.1 Issues in Fault Prediction
An economical software development process requires the identification and removal of faults in the early stages of development. Fault-prediction techniques are used to predict the fault-prone modules in the software. Predicting faults correctly may help in reducing the effort applied in the later stages of testing.
However, building an accurate prediction model is a challenging task because the dataset being used may have noisy content and may contain outliers [7]. It is hard to find suitable measures that can provide reliable estimates of the various characteristics of the software system [6]. This makes the study of fault prediction much more involved, as we are dealing with many alternative and imprecise measures of the same software characteristic.
It has been found that the faulty modules represent only a small fraction of the total number of modules in the software. This observation, in particular, is critically important for putting the results obtained by a fault-prediction technique in a correct perspective. With few faulty modules in the dataset, a high value of prediction accuracy may result simply from classifying the majority of non-faulty modules as non-faulty. However, our main concern is the identification of faulty modules rather than non-faulty ones. Simply considering accuracy can therefore be misleading.
Many efforts have been made to evaluate the performance of fault-prediction techniques. However, they tend to ignore the impact of fault misclassification on the economics of software development. For instance, if there is a high number of false positives, extra effort is required to unnecessarily scan modules which are non-faulty. On the other hand, if there is a large number of false negatives, too many faulty modules escape the scanner, and the technique does not seem to help either. This calls for choosing a technique that predicts a smaller number of false negatives even if it tends to be less accurate and/or produces a higher number of false positives. Therefore, we revisited the results of previous fault prediction studies on the basis of fault misclassification.
3.2 A Proposed Model for Evaluating Fault Prediction Efficiency
Here, we present a performance evaluation model that evaluates fault-prediction techniques in the context of economics.
Figures 3.1 and 3.2 show the cost statistics for faulty and non-faulty modules, respectively. If a faulty module is predicted as faulty, it requires unit-level testing effort; but if it is predicted as non-faulty, extra effort has to be paid in later development stages to remove the same fault (see Figure 3.1). Conversely, if a non-faulty module is incorrectly predicted as faulty, it requires extra effort at the time of unit testing (see Figure 3.2). We used both of the above observations to compare the performance of fault-prediction techniques in our evaluation model.
Figure 3.1: Cost statistics for faulty modules
Figure 3.2: Cost statistics for non-faulty modules
3.2.1 General arguments
Based on the above investigations and observations, we see a need to use prediction techniques which try to minimize false negatives, even at the cost of increasing false positives and compromising some accuracy. Accordingly, we present a model to evaluate the performance of fault-prediction techniques. The presented model prioritizes the performance of a fault-prediction technique based on three criteria, namely the false negative rate, the false positive rate and the prediction accuracy.
The general arguments for measuring the performance of a fault-prediction technique are:
1. False negatives are critically important for the overall reduction of the testing and maintenance cost of the system and hence are to be minimized.
2. False positives are to be reduced, but can be compromised if this helps to reduce false negatives.
3. Similarly, prediction accuracy can also be compromised if this helps to reduce false negatives.
3.2.2 Evaluation model
We quantify our arguments towards finding the best technique. Here we discuss
how we decide that a technique is the best one from the perspective of
economical software development. The defined model is given below:
1. Choose as the best technique the one with the least FNR value, provided
the difference in FPR stays within a threshold.
2. If two or more techniques have nearly the same FNR value, then choose as
the best technique the one with the least FPR value.
3. If two or more techniques have nearly equal FNR and FPR values, then
choose as the best technique the one with the maximum accuracy.
We define this three-step evaluation model to compare the performance of
fault-prediction techniques so that the selected technique requires minimum
effort for fault removal; a small sketch operationalizing these steps is given below.
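The following minimal Python sketch shows one possible operationalization of the three steps; it is illustrative only, and the tolerance values used to decide when FNR or FPR values are "nearly the same" are assumptions rather than thresholds prescribed by the model.

def select_best(techniques, fnr_tol=0.02, fpr_tol=0.02):
    """Apply the three-step rule to a list of dicts with keys
    'name', 'fnr', 'fpr' and 'acc'. The tolerances are illustrative."""
    # Step 1: keep techniques whose FNR is within fnr_tol of the least FNR.
    min_fnr = min(t['fnr'] for t in techniques)
    candidates = [t for t in techniques if t['fnr'] - min_fnr <= fnr_tol]
    # Step 2: among them, keep those whose FPR is within fpr_tol of the least FPR.
    min_fpr = min(t['fpr'] for t in candidates)
    candidates = [t for t in candidates if t['fpr'] - min_fpr <= fpr_tol]
    # Step 3: break remaining ties by the highest prediction accuracy.
    return max(candidates, key=lambda t: t['acc'])

# Example with the CM1 values of Table 3.2 for IBK, IB1 and Naive Bayes:
cm1 = [
    {'name': 'IBK',         'fnr': 0.69, 'fpr': 0.06, 'acc': 87.95},
    {'name': 'IB1',         'fnr': 0.69, 'fpr': 0.06, 'acc': 87.95},
    {'name': 'Naive Bayes', 'fnr': 0.69, 'fpr': 0.11, 'acc': 83.53},
]
print(select_best(cm1)['name'])   # IBK (IB1 is equally good, as noted in Section 3.3)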
Wagner et al. [52] presented the quality economics of defect-detection tech-
niques and the impact of uncovered faults on software cost as well as quality.
That study supports our presented model. Using this evaluation model helps
to determine the impact of fault prediction on software cost due to undetected
faults.
3.3 Revisiting Fault Prediction Results
Various studies have been carried out in the field of software fault prediction.
In our analysis, we used the studies performed on public datasets. Table 2.3
summarizes the studies of different authors, with the evaluation measures they
used and the conclusions they drew. We observed that authors used various
evaluation measures to compare the performance of different fault-prediction
techniques, which makes the comparison even more complicated. Moreover,
the performance of a technique varies with the dataset used. Therefore, we
revisited the results of earlier fault-prediction studies (Table 2.3) over four
NASA MDP [1] datasets (Table 3.1), incorporating the above-mentioned
performance measures, i.e. false negatives and false positives. All reported
experiments utilized technique implementations from the WEKA data-mining
tool [53]. All performance measurements were generated by threefold
cross-validation.
Table 3.1: NASA datasets
Project # modules % with defects Language
CM1 496 9.80% C
KC1 2,109 15.50% C++
KC2 520 20.40% C++
PC1 1,109 6.90% C
A high FNR indicates that many faults remain undetected by the fault-prediction
technique, which has a high impact on software quality as well as on the
testing and maintenance cost. At the same time, a high FPR requires more
effort for unit testing.
Overall, this suggests that for the development of economical, high-quality
software, we should choose a technique that predicts fewer false negatives even
if it tends to be less accurate and/or predicts more false positives. For our
analysis, we combined the results of the various authors (mentioned in Table
2.3) with the results of our presented model (FNR and FPR) in Tables 3.2 to
3.5. We then interpret the performance of these techniques according to our
model.
We evaluated the performance of these techniques over four NASA datasets,
viz. CM1, KC1, KC2 and PC1, using the WEKA [53] data-mining tool to run
all the experiments. The interpretation is as follows:
For dataset CM1 (Table 3.2), the techniques IBK, IB1 and Naïve Bayes have
similar false negative rate (FNR) values, but Naïve Bayes has a higher false
positive rate (FPR) than the other two. Since Step 2 of our model compares
the FPR values, and IBK and IB1 have similar FPR values, both are equally
good when compared to the other techniques.
For dataset KC1 (Table 3.3), the techniques IBK, IB1 and Classification via
Clustering have similar FNR values, but IBK has the least FPR value; hence
it outperforms all other techniques and can be considered the best for this
dataset.
For dataset KC2 (Table 3.4), the techniques Bayesian Logistic Regression and
Voted Perceptron have the least FNR values, but their FPR values are very
high, so they are not effective because almost all modules are predicted as
faulty. Hence, we consider Decision Stump the best technique.
For dataset PC1 (Table 3.5), the techniques IBK and IB1 have similar FNR
values, but IB1 has slightly more false positives, so IBK is considered the best
one for dataset PC1.
Generalizing over these four datasets, our results show that IBK is the best
technique among all those considered.
Table 3.2: Experiment results for dataset CM1
Technique name Acc TP TN FP FN FNR FPR Precision Recall F-measure
Neural Network 87.55 3 433 16 46 0.94 0.04 0.16 0.06 0.09
Simple Logistic 89.76 1 446 3 48 0.98 0.01 0.25 0.02 0.04
SMO 89.76 0 447 2 49 1 0 0 0 0
Voted Perceptron 89.96 0 448 1 49 1 0 0 0 0
IBK 87.95 15 423 26 34 0.69 0.06 0.37 0.31 0.33
IB1 87.95 15 423 26 34 0.69 0.06 0.37 0.31 0.33
Bagging 89.96 0 448 1 49 1 0 0 0 0
Classification via Regression 89.56 3 443 6 46 0.94 0.01 0.33 0.06 0.1
Dagging 89.96 0 448 1 49 1 0 0 0 0
Stacking 90.16 0 449 0 49 1 0 0 0 0
Hyper pipes 89.56 0 446 3 49 1 0.01 0 0 0
Decision Table 90.16 0 449 0 49 1 0 0 0 0
PART 89.96 1 447 2 48 0.98 0 0.33 0.02 0.04
Jrip (RIPPER) 89.56 1 445 4 48 0.98 0.01 0.2 0.02 0.04
J48 89.96 4 439 10 45 0.92 0.02 0.29 0.08 0.13
Random Forest 89.76 6 441 8 43 0.88 0.02 0.43 0.12 0.19
Decision Stump 90.16 0 449 0 49 1 0 0 0 0
BF tree 89.96 1 447 2 48 0.98 0 0.33 0.02 0.04
Naïve Bayes 83.53 15 401 48 34 0.69 0.11 0.24 0.31 0.27
Bayesian Logistic Regression 90.16 0 449 0 49 1 0 0 0 0
Logistic 88.15 8 431 18 41 0.84 0.04 0.31 0.16 0.21
Classification via Clustering 84.14 13 406 43 36 0.73 0.1 0.23 0.27 0.25
Grading 90.16 0 449 0 49 1 0 0 0 0
Zero r 90.16 0 449 0 49 1 0 0 0 0
Table 3.3: Experiment results for dataset KC1
Technique name Acc TP TN FP FN FNR FPR Precision Recall F-measure
Neural Network 85.78 69 1740 43 257 0.79 0.02 0.62 0.21 0.32
Simple Logistic 85.63 66 1740 43 260 0.8 0.02 0.61 0.2 0.3
SMO 84.64 9 1776 7 317 0.97 0 0.56 0.03 0.05
Voted Perceptron 81.79 117 1608 175 209 0.64 0.1 0.4 0.36 0.38
IBK 84.45 134 1647 136 192 0.59 0.08 0.5 0.41 0.45
IB1 83.36 134 1624 159 192 0.59 0.09 0.46 0.41 0.43
Bagging 85.92 78 1734 49 248 0.76 0.03 0.61 0.24 0.34
Classification via Regression 85.4 63 1738 45 263 0.81 0.03 0.58 0.19 0.29
Dagging 84.83 12 1777 6 314 0.96 0 0.67 0.04 0.07
Stacking 84.54 0 1783 0 326 1 0 0 0 0
Hyper pipes 85.07 13 1781 2 313 0.96 0 0.87 0.04 0.08
Decision Table 84.73 43 1744 39 283 0.87 0.02 0.52 0.13 0.21
PART 85.02 50 1743 40 276 0.85 0.02 0.56 0.15 0.24
Jrip (RIPPER) 84.68 84 1702 81 242 0.74 0.05 0.51 0.26 0.34
J48 85.21 96 1701 82 230 0.71 0.05 0.54 0.29 0.38
Random Forest 85.25 92 1706 77 234 0.72 0.04 0.54 0.28 0.37
Decision Stump 84.54 0 1783 0 326 1 0 0 0 0
BF tree 85.25 40 1758 25 286 0.88 0.01 0.62 0.12 0.2
Naïve Bayes 82.46 120 1619 164 206 0.63 0.09 0.42 0.37 0.39
Bayesian Logistic Regression 84.73 13 1774 9 313 0.96 0.01 0.59 0.04 0.07
Logistic 85.3 70 1729 54 256 0.79 0.03 0.56 0.21 0.31
Classification via Clustering 81.79 129 1596 187 197 0.6 0.1 0.41 0.4 0.4
Grading 84.54 0 1783 0 326 1 0 0 0 0
Zero r 84.54 0 1783 0 326 1 0 0 0 0
Table 3.4: Experiment results for dataset KC2
Technique name Acc TP TN FP FN FNR FPR Precision Recall F-measure
Neural Network 83.14 39 395 20 68 0.64 0.05 0.66 0.36 0.47
Simple Logistic 82.95 40 393 22 67 0.63 0.05 0.65 0.37 0.47
SMO 83.52 26 410 5 81 0.76 0.01 0.84 0.24 0.38
Voted Perceptron 24.52 106 22 393 1 0.01 0.95 0.21 0.99 0.35
IBK 79.12 50 363 52 57 0.53 0.13 0.49 0.47 0.48
IB1 76.25 51 347 68 56 0.52 0.16 0.43 0.48 0.45
Bagging 83.72 50 387 28 57 0.53 0.07 0.64 0.47 0.54
Classification via Regression 82.57 45 386 29 62 0.58 0.07 0.61 0.42 0.5
Dagging 81.8 17 410 5 90 0.84 0.01 0.77 0.16 0.26
Stacking 79.5 0 415 0 107 1 0 0 0 0
Hyper pipes 81.99 19 409 6 88 0.82 0.01 0.76 0.18 0.29
Decision Table 82.57 45 386 29 62 0.58 0.07 0.61 0.42 0.5
PART 80.84 32 390 25 75 0.7 0.06 0.56 0.3 0.39
Jrip (RIPPER) 83.52 58 378 37 49 0.46 0.09 0.61 0.54 0.57
J48 81.42 46 379 36 61 0.57 0.09 0.56 0.43 0.49
Random Forest 81.8 48 379 36 59 0.55 0.09 0.57 0.45 0.5
Decision Stump 78.93 80 332 83 27 0.25 0.2 0.49 0.75 0.59
BF tree 82.57 50 381 34 57 0.53 0.08 0.6 0.47 0.52
Naïve Bayes 83.52 45 391 24 62 0.58 0.06 0.65 0.42 0.51
Bayesian Logistic Regression 20.88 107 2 413 0 0 1 0.21 1 0.34
Logistic 82.38 47 383 32 60 0.56 0.08 0.59 0.44 0.51
Classification via Clustering 81.03 70 353 62 37 0.35 0.15 0.53 0.65 0.59
Grading 79.5 0 415 0 107 1 0 0 0 0
Zero r 79.5 0 415 0 107 1 0 0 0 0
Table 3.5: Experiment results for dataset PC1
Technique name Acc TP TN FP FN FNR FPR Precision Recall F-measure
Neural Network 93.6 18 1020 12 59 0.77 0.01 0.6 0.23 0.34
Simple Logistic 92.79 5 1024 8 72 0.94 0.01 0.38 0.06 0.11
SMO 93.15 1 1032 0 76 0.99 0 1 0.01 0.03
Voted Perceptron 91.61 0 1016 16 77 1 0.02 0 0 0
IBK 92.43 34 991 41 43 0.56 0.04 0.45 0.44 0.45
IB1 92.25 34 989 43 43 0.56 0.04 0.44 0.44 0.44
Bagging 92.88 6 1024 8 71 0.92 0.01 0.43 0.08 0.13
Classification via Regression 92.79 3 1026 6 74 0.96 0.01 0.33 0.04 0.07
Dagging 93.06 1 1031 1 76 0.99 0 0.5 0.01 0.03
Stacking 93.06 0 1032 0 77 1 0 0 0 0
Hyper pipes 92.52 2 1024 8 75 0.97 0.01 0.2 0.03 0.05
Decision Table 92.7 5 1023 9 72 0.94 0.01 0.36 0.06 0.11
PART 92.43 1 1024 8 76 0.99 0.01 0.11 0.01 0.02
Jrip (RIPPER) 92.88 7 1023 9 70 0.91 0.01 0.44 0.09 0.15
J48 92.7 11 1017 15 66 0.86 0.01 0.42 0.14 0.21
Random Forest 92.9666 20 1011 21 57 0.74 0.02 0.49 0.26 0.34
Decision Stump 92.88 2 1028 4 75 0.97 0 0.33 0.03 0.05
BF tree 92.7 4 1024 8 73 0.95 0.01 0.33 0.05 0.09
Naïve Bayes 89.36 24 967 65 53 0.69 0.06 0.27 0.31 0.29
Bayesian Logistic Regression 93.06 0 1032 0 77 1 0 0 0 0
Logistic 92.06 8 1013 19 69 0.9 0.02 0.3 0.1 0.15
Classification via Clustering 89.81 19 977 55 57 0.75 0.05 0.26 0.25 0.25
Grading 93.06 0 1032 0 77 1 0 0 0 0
Zero r 93.06 0 1032 0 77 1 0 0 0 0
3.4 Summary
Software fault prediction attracts significant attention as it can offer guidance
to software verification and validation activities. Over the past few years,
many organizations have publicly provided datasets describing module metrics
and their fault content. The availability of these datasets encourages
researchers to perform fault-prediction studies using several machine learning
techniques. In this chapter, we studied the outcomes of some of the earlier
studies undertaken in this area. We found that they used various criteria to
evaluate the performance of a given technique. In most cases, these studies
used prediction accuracy to show how good a technique is. However, they
seem to ignore the impact of the fault misclassification rate when judging the
overall performance of the various fault-prediction techniques. Certifying a
considerable number of faulty modules as non-faulty raises serious concerns,
given that faulty modules are small in number compared to non-faulty ones.
A more viable evaluation criterion is to favor techniques that tend to reduce
false negatives even if they compromise on false positives and/or prediction
accuracy.
We have re-analyzed the results of earlier studies and refined their outcomes
based on our presented model. Our contribution in this chapter is to refine
the way the best technique is selected. We also identify the need for an
evaluation measure that provides specific information about how cost-economical
fault-prediction techniques are and what their fundamental limitations are.
Chapter 4
A Cost Evaluation Framework
In the previous chapter, we investigated the impact of fault misclassification
on software economics and quality. In this chapter, we quantify the fault
removal cost in different stages of software development when fault prediction
is used, and we answer two related research questions.
Specifically, we propose a cost evaluation framework that can help to put
the results of fault prediction in a proper usability context. Essentially, the
framework provides an estimate of the effort saved by using the results of
fault prediction in subsequent phases of software development. To construct
the framework, we accounted for realistic fault removal costs of different
testing phases [52], along with their fault identification efficiencies [26]. We
have used this framework to investigate two important and related research
questions: for a given project dataset, would fault prediction help? And if
yes, how should a fault-prediction technique be chosen to yield the optimal
results? The first question can be answered by comparing the fault removal
cost in both cases, i.e. with and without the use of fault prediction.
The remainder of this chapter is organized as follows. In Section 4.1, we present
our proposed cost evaluation framework. Section 4.2 presents an experimental
study to investigate the usefulness of fault-prediction techniques using our
proposed framework, and also discusses the implications of using the framework.
A summary is given in Section 4.3.
4.1 The Evaluation Framework
In the previous chapter, we highlighted the need for a cost evaluation measure
that compares the performance of fault-prediction techniques on the basis of
their economics. Jones [30] states that 30-40 percent of the development cost is
spent on quality assurance and fault removal. Since fault-prediction techniques
predict fault-prone modules early in the development life cycle, they can help
in reducing the cost incurred on testing and maintenance.
Here, we construct a cost evaluation framework that accounts for the realistic
cost required to remove a fault and computes the estimated fault removal cost
for a specific fault-prediction technique. The constraints accounted for in our
framework include:
(1) Fault removal cost varies with the testing phase.
(2) It is not possible to identify 100% of the faults in a specific testing phase.
(3) It is practically not feasible to perform unit testing on all modules.
We have used the normalized fault removal costs suggested by Wagner et al. [52]
to formulate our cost evaluation framework, but these costs may vary from
one organization to another and also depend on the various characteristics
of the project. The normalized costs are summarized in Table 4.1. The fault
identification efficiencies for different testing phases are taken from the study
of Jones [26] and are summarized in Table 4.2. Wilde et al. [45] stated that
more than fifty percent of modules are very small in size, hence unit testing
of these modules is unfruitful. We have included this value (0.5) as the
threshold for unit testing in our framework.
Table 4.1: Removal costs of test techniques (in staff-hours per defect) [52]
Type Lowest Mean Median Highest
Unit 1.5 3.46 2.5 6
System 2.82 8.37 6.2 20
Field 3.9 27.24 27 66.6
Table 4.2: Fault identification efficiencies of different test phases [26]
Type Lowest Median Highest
Unit 0.1 0.25 0.5
System 0.25 0.5 0.65
Figure 3.1 and Figure 3.2 show the cost statistics for faulty and non-faulty
modules, respectively. Software modules that are predicted as faulty (true
positives and false positives) by the fault-prediction technique require some
verification and testing cost at the module level, i.e. a cost equal to the unit
testing cost (C_u in our study). Since 100% identification of faults in a
specific testing phase is not possible, some of the correctly predicted faulty
modules (true positives) remain undetected in unit testing. Faulty modules
that are predicted as non-faulty (false negatives), together with the correctly
predicted faulty modules that remain undetected in unit testing, are detected
in later stages and require a fault removal cost equal to that of system testing
or field testing (C_s and C_f respectively, in our case). The testing phases
used in our framework, along with their respective fault removal costs and
efficiencies, can vary from organization to organization. Equation 4.1 shows
the proposed cost evaluation framework to estimate the overall fault removal
cost. Equation 4.2 shows the minimum fault removal cost without the use of
fault prediction. The normalized fault removal cost and its interpretation are
shown in Equation 4.3.
Ecost = C_i + C_u · (FP + TP) + δ_s · C_s · (FN + (1 - δ_u) · TP)
        + (1 - δ_s) · C_f · (FN + (1 - δ_u) · TP)                      (4.1)

Tcost = M_p · C_u · TM + δ_s · C_s · (1 - δ_u) · FM
        + (1 - δ_s) · C_f · (1 - δ_u) · FM                             (4.2)

NEcost = Ecost / Tcost;  NEcost < 1: fault prediction is useful;
                         NEcost >= 1: use unit testing                 (4.3)

Where,
Ecost - Estimated fault removal cost of the software when we use fault prediction.
Tcost - Estimated fault removal cost of the software without the use of fault prediction.
NEcost - Normalized estimated fault removal cost of the software when we use fault prediction.
C_i - Initial setup cost of the used fault-prediction technique.
C_u - Normalized fault removal cost in unit testing.
C_s - Normalized fault removal cost in system testing.
C_f - Normalized fault removal cost in field testing.
M_p - Percentage of modules unit tested.
FP - Number of false positives.
FN - Number of false negatives.
TP - Number of true positives.
TM - Total number of modules.
FM - Total number of faulty modules.
δ_u - Fault identification efficiency of unit testing.
δ_s - Fault identification efficiency of system testing.
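As a minimal illustration, Equations 4.1 to 4.3 can be written as a few lines of Python. This sketch is our own and is not part of the thesis material; the default arguments simply encode the median values of Tables 4.1 and 4.2 together with M_p = 0.5, which an organization would replace with its own benchmarks.

def ecost(fp, fn, tp, Ci=0.0, Cu=2.5, Cs=6.2, Cf=27.0, du=0.25, ds=0.5):
    """Equation 4.1: estimated fault removal cost when fault prediction is used."""
    escaped = fn + (1 - du) * tp          # faults that slip past unit-level testing
    return Ci + Cu * (fp + tp) + ds * Cs * escaped + (1 - ds) * Cf * escaped

def tcost(tm, fm, Mp=0.5, Cu=2.5, Cs=6.2, Cf=27.0, du=0.25, ds=0.5):
    """Equation 4.2: estimated fault removal cost without fault prediction."""
    escaped = (1 - du) * fm               # faults not caught during unit testing
    return Mp * Cu * tm + ds * Cs * escaped + (1 - ds) * Cf * escaped

def necost(fp, fn, tp, tm, fm):
    """Equation 4.3: normalized cost; a value below 1 favors fault prediction."""
    return ecost(fp, fn, tp) / tcost(tm, fm)

A value of necost() below 1 corresponds to the case where fault prediction is recommended; at or above 1, unit testing of all modules is preferred.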
Our cost evaluation framework considers a more practical scenario in which
the undetected faults are traced in all the later testing phases and the
corresponding fault removal cost is evaluated based on organization-specific
statistics. This makes the proposed framework a more viable performance
measure than the other measures.
In our experiment, we used the values of C_u, C_s and C_f summarized in
Table 4.1. δ_u and δ_s denote the fault identification efficiencies of unit
testing and system testing, respectively; their values are taken from the survey
report Software Quality in 2010 by Capers Jones [26]. M_p denotes the fraction
of modules that are unit tested; its value is taken from the study of Wilde [45].
We have generalized the framework so that it can be applied to any
organization or software with its own specific values of C_u, C_s, C_f, M_p,
δ_u and δ_s. Our aim is to provide a benchmark to approximate the overall
fault removal cost. It is clear from the framework that if a technique has many
false negatives and/or false positives, it results in a higher fault removal cost.
When this approximated cost exceeds the unit testing cost, we suggest testing
all the modules at the unit level instead of using fault prediction (Equation 4.3).
4.2 Experimental Study
In this section, we present an experimental study to investigate the usefulness
of fault-prediction techniques using our cost evaluation framework. In this
study, we used five popular fault-prediction techniques [19][20][27][25][22] on
19 projects from the NASA MDP [1] and PROMISE [2] repositories. As these
nineteen projects cover a significant range of percentages of faulty modules
(varying from 7 to 49 percent), they are sufficient for our investigation. We
used the WEKA machine learning tool to perform all the listed experiments.
4.2.1 Experimental setup
We have used the NASA MDP [1] and PROMISE [2] datasets listed in Table 4.3
to evaluate the impact of a fault-prediction technique on the fault removal
cost using our proposed framework (Ecost). The metrics in these datasets
describe projects that vary in size as well as in complexity, and the number of
software metrics per dataset varies from eight to forty. We further classify
these datasets on the basis of the percentage of faulty modules present, as
shown in Table 4.4.
To illustrate the effectiveness of our framework, we have used five well-known
fault-prediction techniques. Our goal is to demonstrate the cost evaluation
framework and suggest when to use fault prediction, rather than to identify
the best fault-prediction technique. For this reason, the choice of fault-
prediction technique is orthogonal to the intended contribution. The fault-
prediction techniques selected for our study are Random Forest, J48 (C4.5
decision tree), Neural Network, K-means Clustering and IBK (K-nearest
neighbours). These algorithms represent a broad range of machine learning
techniques. All reported experiments utilized technique implementations from
the well-known software package WEKA [53]. All performance measurements
were generated by threefold cross-validation.
Table 4.3: Used projects from the NASA [1] and PROMISE [2] data repositories
Project Faulty (%) Number of Modules
pc1 6.94 1109
ar1 7.44 121
nw1 7.69 403
kc3 9.34 458
cm1 9.84 498
pc3 10.24 1563
Arc 11.54 234
pc4 12.21 1458
kc1 15.46 2109
ar4 18.69 107
jm1 19.35 10885
kc2 20.5 522
camel1.6 21.91 858
ant1.6 26.21 351
ant1.7 27.79 493
mc2 32.3 161
jedit 3.2 33.09 272
lucene2.0 46.67 195
jedit 4.0 m 48.9 274
Table 4.4: Categorization of projects based on the fraction of faulty modules
Category Faults (%) Number of projects
Category 1 Below 10 5
Category 2 10-20 6
Category 3 Above 20 8
4.2.2 Experiment execution
To demonstrate the usefulness of our cost evaluation framework, we performed
an empirical study using the projects listed in Table 4.3, collected from the
NASA MDP and PROMISE data repositories. We used the WEKA software
package to perform all the listed experiments. We assigned the initial setup
cost C_i = 0 (because we did not incur any cost in obtaining the datasets),
the normalized unit testing cost C_u = 2.5, the normalized system testing cost
C_s = 6.2, and the normalized field testing cost C_f = 27 (median values from
Table 4.1), with δ_u = 0.25 and δ_s = 0.5 (median values from Table 4.2).
The value of M_p is taken as 0.5 [45].
To obtain the results, we first performed threefold cross-validation on each of
the 19 projects for each fault-prediction technique. In these experiments, we
computed the values of false positives (FP), false negatives (FN) and true
positives (TP) for each dataset. We then used these values together with the
cost parameters (C_u, C_s, C_f, M_p, δ_u and δ_s) to calculate Ecost, the
estimated testing cost without prediction (Tcost), and NEcost. This allowed us
to compare the normalized estimated fault removal cost (NEcost) values for
selecting the best technique under each of the three categories. The method is
presented in Figure 4.1. The results are briefly discussed in the following
section.
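As a concrete check (using the helper functions sketched in Section 4.1, which are our own illustration rather than part of the original study), the confusion-matrix counts of Random Forest on PC1 from Table 3.5 reproduce the NEcost value reported for PC1 in Table 4.5:

# Random Forest on PC1 (Table 3.5): TP = 20, FP = 21, FN = 57.
# PC1 has 1109 modules, of which 77 are faulty (FN + TP in Table 3.5).
print(round(necost(fp=21, fn=57, tp=20, tm=1109, fm=77), 2))  # -> 0.55, as in Table 4.5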
4.2.3 Results
The results obtained through the experiments are tabulated in Tables 4.5 to 4.23
and shown in Figures 4.2 to 4.4. Each table corresponds to a particular dataset
and shows the values of Accuracy, FP, FN, TP and NEcost for each of the fault-
prediction techniques, namely Random Forest, J48, Neural Network, K-means
and IBK, whereas each figure corresponds to a particular category and shows
the NEcost values for each of the used fault-prediction techniques. Based on
the results, we answered Q1, i.e. whether fault-prediction techniques are to
be used, and in case of affirmation, we selected the technique with the minimum
value of NEcost to answer Q2.
To answer Q1 and come to a decision, we compared the value of NEcost with 1:
if it is less than 1, we recommend the use of fault prediction because it reduces
the overall software development cost; if it is greater than 1, we suggest not
using fault prediction because it causes extra testing effort and results in a
higher development cost.
[Decision chart: select a fault-prediction technique, train and test it (threefold) on the project dataset, compute the confusion matrix, set the parameter values, calculate Ecost, and compare it with the unit testing cost; if Ecost exceeds the unit testing cost, fault prediction is not useful, otherwise use fault prediction.]
Figure 4.1: Decision chart representation to evaluate the estimated Ecost
Category 1
In this category, we included the projects/datasets that have less than 10
percent faulty modules; there are five such datasets. We used the same five
fault-prediction techniques mentioned in Section 4.2.1 to predict faults in
these projects and, based on the output confusion matrix, we calculated the
normalized estimated fault removal cost (NEcost). The results for the datasets
belonging to this category are tabulated in Tables 4.5 to 4.9 and compared in
Figure 4.2. To answer Q1, we analyzed Figure 4.2. We found that, in category 1,
fault prediction is useful for these projects/datasets because even when a
fault-prediction technique has a high value of false negatives and/or false
positives, the value of NEcost still comes out to be less than 1. For example,
for the NASA dataset PC1, the minimum value of NEcost is 0.54, when we use
Random Forest as our fault-prediction technique.
Table 4.5: Result of experiment for PC1 (1109)
Technique Accuracy FP FN TP NEcost
RF 92.97 0.02 0.74 0.26 0.55
j48 92.7 0.01 0.86 0.14 0.55
NN 93.6 0.01 0.77 0.23 0.55
K-means 89.81 0.05 0.75 0.25 0.58
IBK 92.43 0.04 0.56 0.44 0.56
Table 4.6: Result of experiment for AR1 (121)
Technique Accuracy FP FN TP NEcost
RF 90.91 0.02 1 0 0.59
j48 86.78 0.07 0.89 0.11 0.64
NN 85.12 0.09 0.89 0.11 0.66
K-means 92.56 0 1 0 0.57
IBK 90.91 0.04 0.78 0.22 0.59
Table 4.7: Result of experiment for NW1 (403)
Technique Accuracy FP FN TP NEcost
RF 90.82 0.02 0.9 0.1 0.60
j48 90.82 0.03 0.77 0.23 0.60
NN 90.57 0.04 0.71 0.29 0.61
K-means 81.64 0.16 0.48 0.52 0.71
IBK 87.59 0.08 0.68 0.32 0.64
Table 4.8: Result of experiment for KC3 (458)
Technique Accuracy FP FN TP NEcost
RF 90.61 0.02 0.84 0.16 0.65
j48 89.3 0.04 0.77 0.23 0.67
NN 88.65 0.05 0.77 0.23 0.67
K-means 65.28 0.37 0.16 0.84 0.93
IBK 87.55 0.06 0.72 0.28 0.69
Table 4.9: Result of experiment for CM1 (498)
Technique Accuracy FP FN TP NEcost
RF 89.76 0.02 0.88 0.12 0.67
j48 88.96 0.02 0.92 0.08 0.67
NN 87.55 0.04 0.94 0.06 0.69
K-means 84.14 0.1 0.73 0.27 0.73
IBK 87.95 0.06 0.69 0.31 0.69
Figure 4.2: Value of NEcost for category 1 when δ_u = 0.25 and δ_s = 0.5
Category 2
In this category, we included the projects/datasets that have 10 to 20 percent
faulty modules; there are six such datasets. We used the same fault-prediction
techniques to calculate the value of NEcost. The results for the datasets
belonging to this category are tabulated in Tables 4.10 to 4.15 and presented
in Figure 4.3.
On analyzing the results, we found that as the percentage of faulty modules
increases, the fault-prediction techniques tend to have higher values of NEcost.
As the answer to Q1, we found that for all datasets in this category the value
of NEcost is lower than the normalized unit testing cost, i.e. 1. Still, the value
of NEcost has increased in this category compared to category 1; this higher
value is due to the higher number of false negatives. The maximum acceptable
number of false negatives depends on the percentage of faulty modules present.
In this category, we found that the performance of the fault-prediction
techniques is dataset specific.
Table 4.10: Result of experiment for PC3 (1563)
Technique Accuracy FP FN TP NEcost
RF 89.44 0.02 0.86 0.14 0.68
j48 88.42 0.03 0.83 0.17 0.69
NN 86.82 0.06 0.74 0.26 0.71
K-means 65.07 0.29 0.91 0.09 0.92
IBK 86.76 0.06 0.74 0.26 0.71
Table 4.11: Result of experiment for ARC (234)
Technique Accuracy FP FN TP NEcost
RF 87.75 0.04 0.81 0.19 0.74
j48 87.61 0.04 0.78 0.22 0.73
NN 86.32 0.06 0.74 0.26 0.74
K-means 58.55 0.33 0.78 0.22 0.95
IBK 80.34 0.13 0.7 0.3 0.80
Table 4.12: Result of experiment for PC4 (1458)
Technique Accuracy FP FN TP NEcost
RF 90.12 0.02 0.66 0.34 0.72
j48 89.51 0.05 0.53 0.47 0.73
NN 88 0.07 0.47 0.53 0.75
K-means 54.25 0.4 0.86 0.14 1.04
IBK 86.69 0.07 0.57 0.43 0.76
Table 4.13: Result of experiment for KC1 (2109)
Technique Accuracy FP FN TP NEcost
RF 85.25 0.04 0.72 0.28 0.81
j48 85.21 0.05 0.71 0.29 0.82
NN 85.78 0.02 0.79 0.21 0.81
K-means 81.79 0.1 0.6 0.4 0.85
IBK 84.45 0.08 0.59 0.41 0.83
Table 4.14: Result of experiment for AR4 (107)
Technique Accuracy FP FN TP NEcost
RF 85.05 0.03 0.65 0.35 0.86
j48 82.24 0.07 0.65 0.35 0.88
NN 81.31 0.09 0.6 0.4 0.88
K-means 83.18 0.03 0.75 0.25 0.86
IBK 77.57 0.15 0.55 0.45 0.91
Table 4.15: Result of experiment for JM1 (10885)
Technique Accuracy FP FN TP NEcost
RF 80.8 0.05 0.76 0.24 0.97
j48 79.81 0.06 0.8 0.2 0.97
NN 81.19 0.02 0.88 0.12 0.96
K-means 71.93 0.13 0.9 0.1 1.03
IBK 76.54 0.14 0.63 0.37 1.00
Category 3
In this category, we included the projects/datasets that have more than 20
percent faulty modules; there are eight such datasets. The results for the
datasets belonging to this category are tabulated in Tables 4.16 to 4.23 and
presented in Figure 4.4. To answer Q1, when we analyzed the results we found
that for the datasets KC2, camel1.6, ant1.6 and ant1.7 the value of NEcost is
less than 1, while for the remaining datasets it is greater than 1. Our framework
therefore recommends the use of fault prediction for the former datasets and
not for the latter ones. In this category, we found that the performance of
Neural Network and J48 was better than that of the other techniques used.
Figure 4.3: Value of NEcost for category 2 when δ_u = 0.25 and δ_s = 0.5
Table 4.16: Result of experiment for KC2 (522)
Technique Accuracy FP FN TP NEcost
RF 81.8 0.09 0.55 0.45 0.90
j48 81.42 0.09 0.57 0.43 0.90
NN 83.14 0.05 0.64 0.36 0.89
K-means 81.03 0.15 0.35 0.65 0.91
IBK 79.12 0.13 0.53 0.47 0.92
Table 4.17: Result of experiment for Camel 1.6 (858)
Technique Accuracy FP FN TP NEcost
RF 76.92 0.08 0.77 0.23 0.93
j48 72.84 0.16 0.66 0.34 0.96
NN 76.92 0.1 0.69 0.31 0.94
K-means 53.15 0.42 0.64 0.36 1.09
IBK 70.75 0.19 0.65 0.35 0.98
Table 4.18: Result of experiment for Ant 1.6 (351)
Technique Accuracy FP FN TP NEcost
RF 80.63 0.08 0.51 0.49 0.95
j48 79.49 0.12 0.45 0.55 0.96
NN 81.2 0.09 0.47 0.53 0.95
K-means 65.53 0.31 0.42 0.58 1.04
IBK 76.07 0.17 0.42 0.58 0.98
Table 4.19: Result of experiment for Ant 1.7 (493)
Technique Accuracy FP FN TP NEcost
RF 78.9 0.12 0.45 0.55 0.97
j48 82.96 0.09 0.37 0.63 0.95
NN 83.16 0.08 0.41 0.59 0.95
K-means 59.64 0.1 0.82 0.18 1.00
IBK 79.11 0.13 0.42 0.58 0.97
Table 4.20: Result of experiment for MC2 (161)
Technique Accuracy FP FN TP NEcost
RF 67.08 0.16 0.69 0.31 1.04
j48 62.11 0.28 0.58 0.42 1.07
NN 70.81 0.17 0.54 0.46 1.03
K-means 70.2 0.08 0.75 0.25 1.02
IBK 70.2 0.21 0.48 0.52 1.03
Table 4.21: Result of experiment for J-edit 3.2 (272)
Technique Accuracy FP FN TP NEcost
RF 73.53 0.17 0.46 0.54 1.02
j48 75.74 0.18 0.38 0.62 1.01
NN 73.53 0.21 0.38 0.62 1.02
K-means 68.38 0.37 0.21 0.79 1.06
IBK 73.9 0.2 0.39 0.61 1.02
Table 4.22: Result of experiment for Lucene 2.0 (195)
Technique Accuracy FP FN TP NEcost
RF 63.59 0.23 0.52 0.48 1.09
j48 63.59 0.4 0.32 0.68 1.10
NN 66.15 0.33 0.35 0.65 1.09
K-means 60 0.45 0.34 0.66 1.11
IBK 64.1 0.3 0.43 0.57 1.09
Table 4.23: Result of experiment for J-edit 4.0 (274)
Technique Accuracy FP FN TP NEcost
RF 71.17 0.32 0.25 0.75 1.08
j48 70.8 0.35 0.23 0.77 1.08
NN 71.9 0.35 0.21 0.79 1.08
K-means 58.03 0.24 0.6 0.4 1.10
IBK 64.6 0.31 0.4 0.6 1.09
Figure 4.4: Value of NEcost for category 3 when δ_u = 0.25 and δ_s = 0.5
4.2.4 Experiment findings
In this section, we give a brief summary of the overall findings of our experiment.
The experiment was performed with nineteen project datasets using five
fault-prediction techniques. The performance of the techniques was evaluated
using the parameterized cost evaluation framework with specific values of the
parameters. We conducted all nineteen experiments for three different
combinations of δ_u and δ_s taken from Table 4.2. The results are shown in
Figure 4.5 to Figure 4.7. Figure 4.5 illustrates the cost characteristics of the
fault-prediction techniques when we consider the maximum fault identification
efficiencies of unit and system testing, i.e. δ_u = 0.5 and δ_s = 0.65,
respectively. Here, our results suggest that fault prediction could be used when
the software has less than 21% faulty modules. Similarly, Figure 4.6 and
Figure 4.7 illustrate the cost characteristics when we consider the median and
lowest fault identification efficiencies; the thresholds are 31% and 42%,
respectively. Therefore, our findings suggest that fault prediction can be useful
for projects with a percentage of faulty modules below a certain threshold (in
our case, it varies from 21 to 42%).
Regarding the selection of a technique for a given project, we found that when
there are fewer faulty modules, Random Forest performed better over most of
the datasets; for instance, in category 1 the value of NEcost for Random Forest
is lower than for the other techniques. But if the percentage of faulty modules
is high, then Neural Network and J48 performed better than the other
techniques. Neural Network also performed better for large projects.
Figure 4.5: Cost characteristics of used fault-prediction techniques when δ_u = 0.5 and δ_s = 0.65
Figure 4.6: Cost characteristics of used fault-prediction techniques when δ_u = 0.25 and δ_s = 0.5
4.2.5 Threats to validity
Several issues affect the results of the experiment and limit our interpretations
and generalizations of the results. Here, we highlight the factors affecting the
validity of the proposed cost evaluation framework. The validity considerations
can be grouped into the following categories:
Construct validity: Construct validity questions the necessary conditions for
the experiment to be successful, including whether we are actually measuring
what we intend to measure. We are interested in measuring the effectiveness
of a fault-prediction technique based on economics. The effectiveness of a
technique is measured as Ecost, the estimated fault removal cost. The framework
is developed considering the costs incurred to rectify faults in the later phases
of software development if they are not predicted before testing. Although the
proposed cost framework is founded on well-understood cost drivers, one can
argue that it still needs to be validated empirically to establish its use in
practice; hence, our results should be interpreted accordingly. The fault removal
cost of a particular phase is assumed to be the same for all modules, i.e.
finding a fault in a 100 LOC module costs the same as finding a fault in a
1000 LOC module, which may have to be calibrated for a more accurate
estimate of the cost.
Figure 4.7: Cost characteristics of used fault-prediction techniques when δ_u = 0.15 and δ_s = 0.25
Internal validity: This questions the confounding factors present in the
experiment and tries to ensure that the data really follow from the experimental
concepts and not from unknown or uncontrolled factors. The cost parameters
(values of C_u, C_s and C_f) are taken from Wagner [52]. The fault
identification efficiencies (values of δ_u and δ_s) are taken from Jones [26],
and the value of M_p is taken from the study of Wilde [45]. However, all these
values may vary with organizational benchmarks. We used the WEKA [53] tool
to construct the confusion matrices.
Conclusion validity: This checks for appropriate data collection and analysis.
We gathered the confusion matrix for each project and fault-prediction
technique from the WEKA tool, and we used standard statistical data analysis,
including graphical methods. Our results suggest that the five fault-prediction
techniques used give different results on different projects as well as on the
faulty behavior of modules, and this needs to be understood before selecting a
technique for a new project.
External validity: This investigates the potential threats when we try to
generalize the observed causal relationships beyond what was studied. This is
the most crucial aspect of an experimental study, and it requires great care
and restraint to address the related threats. We have used five fault-prediction
techniques on nineteen public datasets in our study, so the results should be
interpreted within the context of these techniques and datasets. In the proposed
cost evaluation framework, the values of C_u, C_s, C_f, M_p, δ_u and δ_s are
specific to the organization as well as to the software domain. Therefore, the
parameters and testing phases used in the framework need to be adapted to the
organization for an effective evaluation.
4.2.6 Discussion
Fault-prediction techniques can be used to minimize the fault removal cost as
well as to assure the quality of the software. Many authors have presented
fault-prediction techniques, but the selection of the best technique is difficult
due to the limited applicability of the comparison criteria. They used various
evaluation measures to compare the capability of fault-prediction techniques,
but they seem to ignore the impact of fault prediction on the economics of
software development. In this chapter, we proposed a cost evaluation framework
that uses an organization-specific strategy to show the cost implications of
fault prediction and thereby informs the decision regarding its usefulness.
In the proposed framework, we have used the values of the cost parameters
from the study of Wagner [52], the values of δ_u and δ_s from the study of
Jones [26], and the value of M_p (the fraction of modules unit tested) from the
study of Wilde [45]. We used these values due to the unavailability of
organizational benchmarks. These values may not be fully realistic, but our
main contribution is to provide a cost evaluation measure that assesses the
cost effectiveness of fault-prediction techniques when they are used in the
development process. Changes in the framework parameters only change the
resulting threshold values.
Some of the existing evaluation metrics are correlated with our framework.
When the false negative rate increases, the testing cost of the project increases
because an undetected fault will require extra effort at a later stage. The
evaluation measure Recall provides the actual information about the faulty
modules that are detected; if Recall increases, the overall fault removal cost
decreases because fewer faults remain undetected. Recall and the false negative
rate directly affect the quality of the software because they reflect undetected
faulty modules. An increase in the false positive rate increases the overall
module-level testing cost.
Jiang et al. [25] used cost curves to show the cost effectiveness of fault-
prediction studies, but they assume the same misclassification cost for each
module, which might be unreasonable in practice. Our framework, in contrast,
accounts for company-specific cost parameters when calculating the fault
removal cost with fault prediction. Mende et al. [41] introduced a new
performance measure, namely p_opt, that accounts for module size when
evaluating the performance of a fault-prediction technique, whereas in our
framework the fault removal cost of a particular phase is the same for all
modules. Hall et al. [22] showed that simple methods like IBK perform better
over most of the datasets, but our results are based on the value computed by
our framework (Ecost). Jiang et al. [24] performed 11 experiments with
different costs for false positives and false negatives on 13 datasets to analyze
the benefits of fault-prediction techniques that incorporate misclassification
costs in the software development process, but they consider every false
negative to have the same cost, whereas this cost depends on the testing phase
in which the corresponding fault is detected. We overcome this limitation by
computing the estimated fault removal costs based on the place where a fault
is identified.
One can note that, for the results of our investigations to be applicable in
practice, the proposed cost framework needs to be empirically validated. We
argue that the proposed cost framework is based on well-founded theories of
software fault removal activities and accordingly includes cost drivers from
this theory. The empirical validation of such a framework, though highly
desirable, would span different phases of software development, from coding to
testing and then maintenance, over a considerable amount of time. Additionally,
it would require each identified fault to be mapped to the place where it would
potentially have been revealed. This validation may evolve further as the
software keeps changing over time.
Our study suggests that the selection of a fault-prediction technique cannot be
made without considering its impact on the economics of the software. However,
for business-critical applications, where ignoring faults can be crucial, we
suggest not using fault prediction if it has a high number of false negatives;
otherwise, it may result in a poor-quality outcome.
4.3 Summary
Software quality assurance focuses on streamlining the efforts applied to
develop quality software. To ensure the desired quality, various testing
techniques, reviews, models, etc. are used. Fault-prediction techniques have
the potential to predict fault-prone modules in early development stages, so as
to help streamline the efforts made for software quality assurance. However,
due to the lack of proper software measures and metrics, the results of any
fault-prediction technique have to be properly interpreted for their successful
application. In this chapter, we have proposed a cost evaluation framework to
check whether fault prediction results can be useful. We applied this framework
to the results of five fault-prediction techniques on nineteen public datasets.
Our findings suggest that fault prediction can be useful for projects with a
percentage of faulty modules below a certain threshold (in our case, it varied
from 21 to 42% with the variation in testing-phase efficiencies) and that there
was no single technique that could provide the best results in all cases. The
technique presented in this chapter can easily be adapted to different settings
by incorporating organizational benchmarks into the proposed evaluation
framework.
Chapter 5
An Application of Cost
Evaluation Framework for
Multiple Releases
In the previous chapter, we proposed a cost evaluation framework to assess
the effectiveness of a fault-prediction technique based on the fault removal cost.
In this chapter, we show the application of the proposed framework over
multiple subsequent releases of a software system. We evaluated the fault
removal cost (Ecost) of the current version of the software using the fault
information available from its previous versions. This estimated cost helps to
decide whether fault prediction is useful for the current version.
The remainder of this chapter is organized as follows. In Section 5.1, we present
the procedure to calculate the fault removal cost for the current version of the
software. Section 5.2 presents an experimental study, and a summary is given
in Section 5.3.
5.1 The Procedure
A module currently under development is more likely to be fault-prone if it
has the same or similar properties as a faulty module that has already been
developed or released earlier in the same environment [31][10]. Therefore,
historical information about software in the same domain can help to predict
the fault-prone modules in the current version of the software [40].
We use this observation to assess the effectiveness of fault prediction on an
unseen dataset (the current version) using the information available from
previous versions of the software. To evaluate the estimated value of Ecost, we
present a well-structured method to be followed, shown in the decision chart of
Figure 5.1. Figure 5.1 shows our approach to estimating the value of Ecost for
the current version of the software. It helps in deciding whether fault prediction
could be useful for the current version and, if yes, which technique is most
effective. The modified form of the evaluation framework is given in Equation 5.1.
Ecost = C_i + C_u · (EFP + ETP) + δ_s · C_s · (EFN + (1 - δ_u) · ETP)
        + (1 - δ_s) · C_f · (EFN + (1 - δ_u) · ETP)                    (5.1)

Where,
Ecost - Estimated fault removal cost of the software when we use fault prediction.
C_i - Initial setup cost of the used fault-prediction technique.
C_u - Normalized fault removal cost in unit testing.
C_s - Normalized fault removal cost in system testing.
C_f - Normalized fault removal cost in field testing.
M_p - Percentage of modules unit tested.
EFP - Estimated number of false positives.
EFN - Estimated number of false negatives.
ETP - Estimated number of true positives.
δ_u - Fault identification efficiency of unit testing.
δ_s - Fault identification efficiency of system testing.
In our experiment, we again used the values of C_u, C_s and C_f summarized
in Table 4.1. The values of δ_u and δ_s are taken from Table 4.2, and the value
of M_p is 0.5, taken from the study of Wilde [45]. Organizations can, however,
use their own specific values of C_u, C_s, C_f, M_p, δ_u and δ_s.
[Decision chart: select a fault-prediction technique, train and test it (threefold) on the previous versions and compute their confusion matrices, calculate PPV and NPV, train the prediction model on the previous versions and test it on the current version, estimate E-FP, E-FN, E-TP and E-TN using PPV and NPV, calculate Ecost for the current version, and compare it with the unit testing cost; if Ecost exceeds the unit testing cost, fault prediction is not useful, otherwise use fault prediction.]
Figure 5.1: Decision chart representation to evaluate the estimated Ecost
5.2 Experimental Study
In this section, we present an application of the proposed cost evaluation
framework to measure the cost effectiveness of a fault-prediction technique
when it is used on an unseen dataset (the current version). To illustrate the
method, we first used five fault-prediction techniques on ant 1.6 and ant 1.7 to
evaluate Ecost. The same procedure is then applied to multiple versions of the
Jedit software system, namely Jedit 4.0, Jedit 4.1, Jedit 4.2 and Jedit 4.3, to
further investigate the usefulness of fault-prediction techniques using the cost
evaluation framework. All experiments were performed with the WEKA data
mining tool.
5.2.1 Experimental setup
We have used the PROMISE [2] datasets listed in Table 5.1 to assess the effect
of a fault-prediction technique on the fault removal cost using our framework
(Ecost). The metrics in these datasets describe projects that vary in size as
well as in complexity.
Table 5.1: Used projects from PROMISE data repository [2]
Project Faulty (%) Number of Modules
ant1.6 26.21 351
ant1.7 27.79 493
Jedit4.0 24.5 306
Jedit4.1 25.32 312
Jedit 4.2 13.07 367
Jedit 4.3 2.23 492
To illustrate the effectiveness of the cost framework on unseen data, we have
again used the same five fault-prediction techniques as in Chapter 4. To remind
the reader, our goal is to demonstrate the application of the cost evaluation
framework and suggest when to use fault prediction, rather than to identify
the best fault-prediction technique. For this reason, the choice of fault-
prediction technique is orthogonal to the intended contribution. All performance
measurements were generated by threefold cross-validation.
5.2.2 Experiment execution
To demonstrate the usefulness of the framework on unseen data, we performed
an empirical study using the projects listed in Table 5.1, collected from the
PROMISE data repository. All reported experiments utilized technique
implementations from the well-known software package WEKA (Tables 5.2 to
5.10). We assigned the initial setup cost C_i = 0 (because we did not incur any
cost in obtaining the datasets), the normalized unit testing cost C_u = 2.5, the
normalized system testing cost C_s = 6.2, and the normalized field testing cost
C_f = 27 (median values from Table 4.1), with δ_u = 0.25 and δ_s = 0.5
(median values from Table 4.2). The value of M_p is taken as 0.5 [45].
To obtain the results, we performed the experiments as shown in Figure 5.1. In
these experiments, we computed the positive predictive value (PPV), the
negative predictive value (NPV), the estimated false positives (EFP), the
estimated false negatives (EFN) and the estimated true positives (ETP) for
each version of the software dataset. We then used these values with the stated
testing cost parameters (C_u, C_s, C_f, M_p, δ_u and δ_s) to calculate the
cost. This allowed us to compare the estimated fault removal cost (Ecost)
values for selecting the best technique for the current version of the software.
The results are briefly discussed in the next section.
5.2.3 Results
The results obtained through the experiments are tabulated in Tables 5.2 to
5.10. Each table corresponds to a particular version of a software dataset and
shows the values of PPV, NPV, EFP, EFN, ETP, ETN and cost for each of the
fault-prediction techniques, namely Random Forest, J48, Neural Network,
K-means and IBK. Based on the results, we decided whether fault-prediction
techniques are to be used and, in case of affirmation, we selected the technique
with the minimum value of cost.
To come to a decision, we compared the value of Ecost (NEcost) with the unit
testing cost of all the software modules. If the value of NEcost is less than 1,
we recommend the use of fault prediction because it reduces the overall fault
removal cost of the project. But if the value of NEcost is greater than 1, we
suggest not using fault prediction because it causes extra fault removal effort
and results in a higher cost.
To illustrate the method shown in Figure 5.1, we first used the ant 1.6 and
ant 1.7 datasets. First, we performed threefold cross-validation on the previous
version of the software, i.e. ant 1.6, for each fault-prediction technique. Then,
we calculated the confusion matrix, the positive predictive value (PPV) and the
negative predictive value (NPV) for ant 1.6. The results are shown in Table 5.2.
After that, we trained the fault prediction model with the previous version
(ant 1.6) and tested it on the current version (ant 1.7), so each module of
ant 1.7 falls into one of the two categories, either faulty or non-faulty. This
prediction result is tabulated in Table 5.3. Next, we used the values of PPV
and NPV to evaluate the estimated numbers of false positives (EFP), false
negatives (EFN), true positives (ETP) and true negatives (ETN). We then
evaluated the value of Ecost by substituting these values into our cost
evaluation framework for the various prediction techniques. The values of C_i,
C_u, C_s, C_f, M_p, δ_u and δ_s are taken as discussed in Section 5.2.2. The
calculated results are tabulated in Table 5.4. To select the fault-prediction
technique, we compared the value of NEcost with 1. As the answer to Q1 for
the specific case of ant 1.7, we found that the value of NEcost is less than the
unit testing cost, so the use of fault prediction is suggested in this case.
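The estimation step can be read as scaling the current version's predicted-faulty and predicted-non-faulty counts by the PPV and NPV measured on the previous version. The sketch below is our interpretation of Figure 5.1 rather than code from the thesis, but it reproduces the Random Forest rows of Tables 5.2 to 5.4; the necost() helper from the Chapter 4 sketch is reused for the normalized cost.

def estimate_confusion(ppv, npv, pred_faulty, pred_nonfaulty):
    """Estimate the confusion matrix of the current version from the PPV/NPV
    of the previous version and the current prediction counts (our reading
    of the procedure in Figure 5.1)."""
    etp = round(ppv * pred_faulty)            # estimated true positives
    efp = round((1 - ppv) * pred_faulty)      # estimated false positives
    etn = round(npv * pred_nonfaulty)         # estimated true negatives
    efn = round((1 - npv) * pred_nonfaulty)   # estimated false negatives
    return efp, efn, etp, etn

# Random Forest: PPV/NPV from Ant 1.6 (Table 5.2), predictions on Ant 1.7 (Table 5.3).
efp, efn, etp, etn = estimate_confusion(0.682, 0.835, pred_faulty=78, pred_nonfaulty=415)
print(efp, efn, etp, etn)        # -> 25 68 53 347, matching Table 5.4
# Ant 1.7 has 493 modules, about 137 of them faulty (27.79% of 493, Table 4.3):
print(round(necost(fp=efp, fn=efn, tp=etp, tm=493, fm=137), 2))  # -> 0.85, as in Table 5.4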
Table 5.2: Prediction results for Ant 1.6
Technique FP FN TP PPV NPV
Random Forest 21 47 45 0.682 0.835
j48 31 41 51 0.622 0.848
Neural Network 23 43 49 0.681 0.846
K-means 81 39 53 0.396 0.819
IBK 45 39 53 0.541 0.846
Table 5.3: Prediction results for Ant 1.7 when the fault prediction model is trained using Ant 1.6
Technique Predicted as non-faulty Predicted as faulty
Random Forest 415 78
j48 435 58
Neural Network 370 123
K-means 447 46
IBK 390 103
Table 5.4: Results of experiment to calculate the Ecost for Ant 1.7 using
information of Ant 1.6
Technique EFP EFN ETP ETN NEcost
RF 25 68 53 347 0.85
j48 22 66 36 369 0.73
NN 39 57 84 313 0.99
K-means 28 81 18 366 0.73
IBK 47 60 56 330 0.84
The same procedure is then applied to multiple versions of the Jedit software
system, namely Jedit 4.0, Jedit 4.1, Jedit 4.2 and Jedit 4.3, for further
investigation. The corresponding results for the Jedit versions are tabulated in
Tables 5.5 to 5.10 and presented in Figure 5.2. After analyzing the results, we
found that fault-prediction techniques could not be recommended for the Jedit
system over its later releases because the value of NEcost is higher than 1 when
δ_u = 0.25 and δ_s = 0.5. The reason behind the high value of NEcost is that
the difference between the inter-version fault rates is high for the Jedit
versions; here, the inter-version fault rate difference denotes the difference
between the percentages of faulty modules present in successive versions.
The results of this case study show how the fault information of previous
versions of the software can be used to decide the applicability of fault
prediction for the current version. The technique presented in this section can
easily be adapted to different settings by incorporating organizational
benchmarks into the evaluation framework.
Table 5.5: Prediction results for Jedit4.0 (3 cross-validation)
Technique FP FN TP PPV NPV
Random Forest 17 44 31 0.646 0.829
j48 32 35 40 0.556 0.85
Neural Network 29 36 39 0.574 0.849
K-means 92 26 49 0.348 0.842
IBK 32 33 42 0.568 0.858
Table 5.6: Results of experiment to calculate the Ecost for Jedit4.1 using
information of Jedit4.0
Technique EFP EFN ETP ETN NEcost
RF 22 43 39 208 0.99
j48 31 36 38 207 0.91
NN 23 39 31 219 0.85
K-means 71 32 38 171 0.93
IBK 29 35 39 209 0.89
Table 5.7: Prediction results for Jedit4.0 and Jedit4.1 (3 cross-validation)
Technique FP FN TP PPV NPV
Random Forest 38 76 78 0.672 0.849
j48 48 71 83 0.634 0.854
Neural Network 67 77 77 0.535 0.838
K-means 198 59 95 0.324 0.818
IBK 62 72 82 0.569 0.848
Table 5.8: Results of experiment to calculate the Ecost for Jedit4.2 using
information of Jedit4.0 and 4.1
Technique EFP EFN ETP ETN NEcost
RF 22 45 46 254 1.41
j48 31 41 54 241 1.48
NN 31 49 36 251 1.35
K-means 80 45 39 203 1.44
IBK 38 42 50 237 1.47
Table 5.9: Prediction results for Jedit4.0, Jedit4.1 and Jedit4.2. (3 cross-
validation)
Technique FP FN TP PPV NPV
Random Forest 62 107 95 0.605 0.871
j48 56 110 92 0.622 0.869
Neural Network 88 98 104 0.542 0.876
K-means 370 55 147 0.284 0.882
IBK 102 95 107 0.512 0.878
Table 5.10: Results of experiment to calculate the Ecost for Jedit4.3 using
information of Jedit4.0, 4.1 and 4.2
Technique EFP EFN ETP ETN NEcost
RF 23 56 36 377 2.02
j48 33 53 53 353 2.35
NN 41 50 49 352 2.20
K-means 190 27 76 199 2.72
IBK 40 50 42 360 2.08
Figure 5.2: Value of Ecost for Jedit versions when δ_u = 0.25 and δ_s = 0.5
This technique (Figure 5.1) can also be used in industry to estimate the testing
cost to be incurred on a current project with the help of the information
available from previous, similar projects.
5.2.4 Threats to validity
In this section, we identify and analyze the confounding factors and their
possible effects on the results of the current experiment, similarly to the
previous chapter. We highlight the factors affecting the viability of our
proposed cost evaluation framework when it is used to measure the performance
of fault-prediction techniques across multiple versions of a software system.
The validity considerations can be grouped into the following categories:
Construct validity: We are interested in measuring the effectiveness of a
fault-prediction technique for the current version, for which no fault
information is available. The effectiveness of a technique is measured as Ecost
(NEcost when considering the normalized value), which is the estimated fault
removal cost. As discussed in Section 4.2.5, the framework is developed
considering the costs incurred to rectify faults in the later phases of software
development if they are not predicted before testing. Although the proposed
cost framework is founded on well-understood cost drivers, one can argue that
it still needs to be validated empirically to establish its use in practice;
hence, our results should be interpreted accordingly.
Internal validity: The cost parameters (values of C_u, C_s and C_f) are taken
from Wagner [52]. The fault identification efficiencies (values of δ_u and δ_s)
are taken from Jones [26], and the value of M_p is taken from the study of Wilde
[45]. However, all these values may vary with organizational benchmarks. We
have used the WEKA [53] tool to construct the confusion matrices.
Conclusion validity: We have used the statistics of previous versions to calculate
the estimated false positives, false negatives and true positives. These estimated
values may differ from the actual values. Here, we compared the values of Ecost
with the unit testing cost to decide whether fault prediction is useful. Our
results are specific to the versions of the datasets included in the study.
External validity: In the presented cost evaluation framework, the values of
C_u, C_s, C_f, M_p, δ_u and δ_s are specific to the organization as well as the
software domain. Therefore, the parameters and testing phases used in the
framework need to be adapted to the organization for an effective evaluation.
Our results suggest that the five fault-prediction techniques used were not found
to be economical on the studied versions of the Jedit dataset, but this may vary
with different project characteristics.
5.3 Summary
Fault prediction has the potential to streamline the effort applied to remove
faults, which are injected in different development phases. In this chapter, we
presented an application of our cost evaluation framework, where we used the
framework to estimate the fault removal cost when fault-prediction techniques
are applied to the current version (unseen data) of the software. Specifically,
our results suggested that fault prediction is useful when the difference between
inter-version fault rates is below a certain threshold (in our case, it was
2%). Our aim is to provide a benchmark to estimate the fault removal cost
for a newer version when fault-prediction techniques are trained with historical
information. In the future, this work could be generalized further to assess
the effectiveness of fault-prediction techniques more globally.
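As a quick pre-check before applying the framework to a new release, the inter-version fault rate difference can be computed directly from the release statistics. The small sketch below encodes the heuristic observed in this study; the 2% threshold and the function names are illustrative, not part of the framework itself.

def fault_rate(faulty_modules, total_modules):
    """Percentage of faulty modules in a release."""
    return 100.0 * faulty_modules / total_modules

def prediction_worth_trying(prev_version, new_version, threshold=2.0):
    """Heuristic from this chapter: cross-version fault prediction tended to be
    economical only when the difference between inter-version fault rates
    stayed below about 2 percentage points. Each argument is a
    (faulty_modules, total_modules) pair."""
    diff = abs(fault_rate(*new_version) - fault_rate(*prev_version))
    return diff < threshold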
Chapter 6
Conclusions and Future Work
The aim of fault prediction is to find the highest possible number of defects
with the least possible resources, in order to streamline the efforts to be applied
in the later phases of software development. Fault-prediction techniques have
the potential to predict fault-prone modules in early development stages, and
thereby help streamline the efforts made towards economic software development.
The lack of suitable performance evaluation measures makes the interpretation of
fault prediction results a complex task. In this thesis, we have proposed a cost
evaluation framework that helps put the results of fault prediction in a proper
usability context.
The presented cost evaluation framework uses organization-specific statistics
to efficiently estimate the fault removal cost when fault-prediction techniques
are used. We have also included the fault identification efficiencies of the
various testing phases and the relevant cost drivers, to support more practical
decisions regarding the usefulness of a fault-prediction technique for a given
project. This makes the presented framework a more viable performance measure
than the other existing cost-effectiveness measures.
We performed a two-part investigation of the usefulness of fault-prediction
techniques based on the proposed framework. In the first part of the investigation
(chapter 4), we evaluated the economics of five fault-prediction techniques on
nineteen public datasets. Here, our results suggested that fault prediction can
be useful for projects with a percentage of faulty modules less than a certain
threshold (in our case, it varied from 21% to 42% over the specified range of
testing phase efficiencies). Also, there was no single technique that could
provide the best results in all cases. Specifically, we found that the Random
Forest technique performed better when the software had relatively fewer faulty
modules. In the other part of the investigation (chapter 5), we applied the
proposed framework to unseen datasets, in which fault information of the previous
versions of the software is used to predict the fault proneness of a newer
version. Here, we found that fault prediction can be useful when the difference
between inter-version fault rates is below a certain threshold (in our case, it
was 2%). Also, the usability of fault prediction reduced as the inter-version
fault rate difference increased.
One can note that for the results of our investigations to be applicable in
practice, the proposed cost framework needs to be empirically validated. We argue
that the proposed cost framework is based on well-founded theories of software
fault removal activities, and accordingly it includes cost drivers from this
theory. The empirical validation of such a framework, though highly desirable,
would span different phases of software development, from coding to testing and
then maintenance, over a considerable amount of time. Additionally, it would
require each identified fault to be mapped to the place where it could
potentially have been revealed. This validation may evolve further as the
software keeps changing over time.
The proposed cost evaluation framework can easily be adapted to other settings
by incorporating organizational benchmarks for different cost drivers. As future
work, validation studies of our proposed framework in industrial settings would
be highly desirable. We also intend to include the effect of fault severity and
fault density in the cost evaluation framework. This will further strengthen the
framework and enhance its applicability in real settings. We would also like to
conduct more studies to strengthen or update the arguments made in this thesis.
References
[1] NASA data repository.
[2] PROMISE data repository.
[3] Erik Arisholm, Lionel C. Briand, and Eivind B. Johannessen. A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. J. Syst. Softw., 83(1):2–17, January 2010.
[4] D. Azar, D. Precup, S. Bouktif, B. Kegl, and H. Sahraoui. Combining and adapting software quality predictive models by genetic algorithms. In Automated Software Engineering, 2002. Proceedings. ASE 2002. 17th IEEE International Conference on, pages 285–288, 2002.
[5] V. R. Basili, L. C. Briand, and W. L. Melo. A validation of object-oriented design metrics as quality indicators. Software Engineering, IEEE Transactions on, 22(10):751–761, Oct. 1996.
[6] R. M. Bell, E. J. Weyuker, and T. J. Ostrand. Assessing the impact of using fault prediction in industry. In Software Testing, Verification and Validation Workshops (ICSTW), 2011 IEEE Fourth International Conference on, pages 561–565, March 2011.
[7] Abraham Bernstein, Jayalath Ekanayake, and Martin Pinzger. Improving defect prediction using temporal features and non linear models. In Ninth International Workshop on Principles of Software Evolution (IWPSE '07), pages 11–18, New York, NY, USA, 2007. ACM.
[8] S. Bibi, G. Tsoumakas, I. Stamelos, and I. Vlahavas. Software defect prediction using regression via classification. In Computer Systems and
Applications, 2006. IEEE International Conference on, pages 330–336, 2006.
[9] Partha Sarathi Bishnu and Vandana Bhattacherjee. Software fault prediction using quad tree-based k-means clustering algorithm. IEEE Transactions on Knowledge and Data Engineering, 24:1146–1150, 2012.
[10] C. Catal, U. Sevim, and B. Diri. Clustering and metrics thresholds based software fault prediction of unlabeled program modules. In Information Technology: New Generations, 2009. ITNG '09. Sixth International Conference on, pages 199–204, April 2009.
[11] Cagatay Catal. Review: Software fault prediction: A literature review and current trends. Expert Syst. Appl., 38(4):4626–4636, April 2011.
[12] Venkata U. B. Challagulla, Farokh B. Bastani, and I-Ling Yen. A unified framework for defect data analysis using the MBR technique. In Tools with Artificial Intelligence, 2006. ICTAI '06. 18th IEEE International Conference on, pages 39–46, Nov. 2006.
[13] V. U. B. Challagulla, F. B. Bastani, I-Ling Yen, and R. A. Paul. Empirical assessment of machine learning based software defect prediction techniques. In Object-Oriented Real-Time Dependable Systems, 2005. WORDS 2005. 10th IEEE International Workshop on, pages 263–270, Feb. 2005.
[14] Jesse Davis and Mark Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pages 233–240, New York, NY, USA, 2006. ACM.
[15] Chris Drummond and Robert C. Holte. Cost curves: An improved method for visualizing classifier performance. Machine Learning, 65(1):95–130, October 2006.
[16] Norman Fenton and Martin Neil. Software metrics and risk, 1999.
[17] Swapna S. Gokhale and Michael R. Lyu. Regression tree modeling for the prediction of software quality. In Proc. of ISSAT '97, pages 31–36, 1997.
[18] Iker Gondra. Applying machine learning to software fault-proneness prediction. J. Syst. Softw., 81(2):186–195, February 2008.
[19] L. Guo, B. Cukic, and H. Singh. Predicting fault prone modules by the Dempster-Shafer belief networks. In Automated Software Engineering, 2003. Proceedings. 18th IEEE International Conference on, pages 249–252, Oct. 2003.
[20] L. Guo, Y. Ma, B. Cukic, and Harshinder Singh. Robust prediction of fault-proneness by random forests. In Software Reliability Engineering, 2004. ISSRE 2004. 15th International Symposium on, pages 417–428, Nov. 2004.
[21] T. Gyimothy, R. Ferenc, and I. Siket. Empirical validation of object-oriented metrics on open source software for fault prediction. Software Engineering, IEEE Transactions on, 31(10):897–910, Oct. 2005.
[22] T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell. A systematic review of fault prediction performance in software engineering. Software Engineering, IEEE Transactions on, PP(99):1, 2011.
[23] Timea Illes-Seifert and Barbara Paech. Exploring the relationship of history characteristics and defect count: an empirical study. In Proceedings of the 2008 Workshop on Defects in Large Software Systems, DEFECTS '08, pages 11–15, New York, NY, USA, 2008. ACM.
[24] Yue Jiang and Bojan Cukic. Misclassification cost-sensitive fault prediction models. In Proceedings of the 5th International Conference on Predictor Models in Software Engineering, PROMISE '09, pages 20:1–20:10, New York, NY, USA, 2009. ACM.
[25] Yue Jiang, Bojan Cukic, and Yan Ma. Techniques for evaluating fault prediction models. Empirical Software Engineering, 13:561–595, 2008. doi:10.1007/s10664-008-9079-3.
[26] Capers Jones. Software quality in 2010: A survey of the state of the art. Survey, December 2009.
[27] S. Kanmani, V. Rhymend Uthariaraj, V. Sankaranarayanan, and P. Thambidurai. Object-oriented software fault prediction using neural networks. Information and Software Technology, 49(5):483–492, 2007.
[28] D. Kaur, A. Kaur, S. Gulati, and M. Aggarwal. A clustering algorithm for software fault prediction. In Computer and Communication Technology (ICCCT), 2010 International Conference on, pages 603–607, Sept. 2010.
[29] Taghi M. Khoshgoftaar and Naeem Seliya. Analogy-based practical classification rules for software quality estimation. Empirical Softw. Engg., 8(4):325–350, December 2003.
[30] Taghi M. Khoshgoftaar, Xiaojing Yuan, Edward B. Allen, Wendell D. Jones, and John P. Hudepohl. Uncertain classification of fault-prone software modules. Empirical Software Engineering, 7:297–318, 2002. doi:10.1023/A:1020511004267.
[31] T. M. Khoshgoftaar, K. Ganesan, E. B. Allen, F. D. Ross, R. Munikoti, N. Goel, and A. Nandi. Predicting fault-prone modules with case-based reasoning. In Proceedings of the Eighth International Symposium on Software Reliability Engineering, pages 27–35, 1997.
[32] T. M. Khoshgoftaar and N. Seliya. Tree-based software quality estimation models for fault prediction. In Software Metrics, 2002. Proceedings. Eighth IEEE Symposium on, pages 203–214, 2002.
[33] Sunghun Kim, Thomas Zimmermann, E. James Whitehead Jr., and Andreas Zeller. Predicting faults from cached history. In Proceedings of the 29th International Conference on Software Engineering, ICSE '07, pages 489–498, Washington, DC, USA, 2007. IEEE Computer Society.
[34] A. Güneş Koru and Hongfang Liu. An investigation of the effect of module size on defect prediction using static measures. In Proceedings of the 2005 Workshop on Predictor Models in Software Engineering, PROMISE '05, pages 1–5, New York, NY, USA, 2005. ACM.
[35] Miroslav Kubat, Robert C. Holte, and Stan Matwin. Machine learning for the detection of oil spills in satellite radar images. Machine Learning, 30:195–215, 1998. doi:10.1023/A:1007452223027.
[36] David D. Lewis and William A. Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval,
SIGIR '94, pages 3–12, New York, NY, USA, 1994. Springer-Verlag New York, Inc.
[37] Zhan Li and M. Reformat. A practical method for the software fault-prediction. In Information Reuse and Integration, 2007. IRI 2007. IEEE International Conference on, pages 659–666, Aug. 2007.
[38] Huihua Lu, Bojan Cukic, and Mark Culp. An iterative semi-supervised approach to software fault prediction. In Proceedings of the 7th International Conference on Predictive Models in Software Engineering, PROMISE '11, pages 15:1–15:10, New York, NY, USA, 2011. ACM.
[39] Yan Ma, Lan Guo, and Bojan Cukic. Statistical framework for the prediction of fault-proneness. In Advances in Machine Learning Applications in Software Engineering. Idea Group, 2007.
[40] T. Mende and R. Koschke. Effort-aware defect prediction models. In Software Maintenance and Reengineering (CSMR), 2010 14th European Conference on, pages 107–116. IEEE Computer Society, March 2010.
[41] Thilo Mende and Rainer Koschke. Revisiting the evaluation of defect prediction models. In Proceedings of the 5th International Conference on Predictor Models in Software Engineering, PROMISE '09, pages 7:1–7:10, New York, NY, USA, 2009. ACM.
[42] T. Menzies, J. DiStefano, A. Orrego, and R. Chapman. Assessing predictors of software defects. In Proc. Workshop on Predictive Software Models, 2004.
[43] T. Menzies, J. Greenwald, and A. Frank. Data mining static code attributes to learn defect predictors. Software Engineering, IEEE Transactions on, 33(1):2–13, Jan. 2007.
[44] Nachiappan Nagappan and Thomas Ball. Static analysis tools as early indicators of pre-release defect density. In Proceedings of the 27th International Conference on Software Engineering, ICSE '05, pages 580–586, New York, NY, USA, 2005. ACM.
[45] Norman Wilde and Ross Huitt. Maintenance support for object-oriented programs. Software Engineering, IEEE Transactions on, 18(12):1038–1044, Dec. 1992.
[46] T. J. Ostrand, E. J. Weyuker, and R. M. Bell. Predicting the location and number of faults in large software systems. Software Engineering, IEEE Transactions on, 31(4):340–355, April 2005.
[47] Naeem Seliya and Taghi Khoshgoftaar. Software quality estimation with limited fault data: a semi-supervised learning perspective. Software Quality Journal, 15:327–344, 2007. doi:10.1007/s11219-007-9013-8.
[48] Burak Turhan and Ayse Bener. Analysis of naive Bayes assumptions on software fault data: An empirical study. Data Knowl. Eng., 68(2):278–290, February 2009.
[49] B. Twala. Software faults prediction using multiple classifiers. In Computer Research and Development (ICCRD), 2011 3rd International Conference on, volume 4, pages 504–510, March 2011.
[50] Olivier Vandecruys, David Martens, Bart Baesens, Christophe Mues, Manu De Backer, and Raf Haesen. Mining software repositories for comprehensible software fault prediction models. J. Syst. Softw., 81(5):823–839, May 2008.
[51] W. J. Youden. Index for rating diagnostic tests. Cancer, 3(1):32–35, 1950.
[52] Stefan Wagner. A literature survey of the quality economics of defect-detection techniques. In Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering, ISESE '06, pages 194–203, New York, NY, USA, 2006. ACM.
[53] I. H. Witten and E. Frank. Data mining: Practical machine learning tools and techniques with Java implementations.
[54] W. A. Yousef, R. F. Wagner, and M. H. Loew. Comparison of non-parametric methods for assessing classifier performance in terms of ROC parameters. In Information Theory, 2004. ISIT 2004. Proceedings. International Symposium on, pages 190–195, Oct. 2004.
[55] S. Zhong, T. M. Khoshgoftaar, and N. Seliya. Analyzing software measurement data with clustering techniques. Intelligent Systems, IEEE, 19(2):20–27, March–April 2004.
[56] Shi Zhong, T. M. Khoshgoftaar, and N. Seliya. Unsupervised learning for expert-based software quality estimation. In High Assurance Systems Engineering, 2004. Proceedings. Eighth IEEE International Symposium on, pages 149–155, March 2004.
Publications
Deepak Banthia, Atul Gupta. Investigating Fault Prediction Capabilities of Five Prediction Models for Quality Assurance. In Proceedings of the 27th Annual ACM Symposium on Applied Computing (SAC '12), ACM, New York, USA, pp. 1259–1261. http://doi.acm.org/10.1145/2245276.2231975
Deepak Banthia, Atul Gupta. Economics of Fault Prediction. In Proceedings of IEEE ICSM 2012, Italy (submitted).
Deepak Banthia, Atul Gupta. Economics of Fault Prediction. Empirical Software Engineering Journal, Springer, 2012 (submitted).
Appendix
Table A1: Details of used metrics
Metrics Name Description
amc average method size for each class
avg cc equal to average number of different paths in a method (function) plus one.
b Halstead error
blank loc number of blank lines.
branch count total number of branches in the method
branchCount total Branch
ca number of modules that depend upon the module
call pairs Number of calls to functions in a module
cam measures the extent of intersection of individual method parameter type lists
with the parameter type list of all methods in the class.
cbm number of new/redefined methods to which all the inherited methods are coupled
cbo number of classes to which the class is coupled
ce number of modules, the module depends upon
condition count Number of conditionals in a given module
cyclomatic
complexity
number of linearly independent paths through a program's module
cyclomatic
density
cyclomatic density of the code lines within the procedures of class
d Halstead difficulty
dam ratio of the total number of private attributes to the total number of attributes declared in a class.
decision
count
Number of decision points in a module
decision
density
Conditioncount / Decisioncount
design
complexity
The design complexity of a module
design
density
Design density is calculated as iv(g) / v(g)
dit maximum length from the class node to the root of the tree
e Halstead effort
edgecount total number of possible paths
ev(g) The essential complexity of a module
executable loc The number of lines of executable code for a module
formal
parameters
number of formal parameters used in the class
Getters number of getter methods
i Halstead intelligence
ic number of parent classes to which a given class is coupled
InDegrees number of messages received from other modules
iv(g) McCabe design complexity
l Halstead program length
lcom lack of cohesion in methods of the class.
lcom3 number of connected components in a graph of methods.
lOBlank Halstead's count of blank lines
loc method count number of lines of code of class.
loc McCabe's line count of code
loc code
andcomment
The number of lines which contain both code and comment in a module
loc total The total number of lines for a given module
lOCode Halstead's line count
lOCodeAnd
Comment
Halstead's count of lines of code and comments
lOComment The number of lines of comments for a module
maintenance
severity
Maintenance Severity is calculated as: ev(g) / v(g)
max cc equal to maximum number of different paths in a method (function) plus one.
mfa ratio of the number of methods inherited by a class to the total number of
methods accessible by the member methods of the class
moa number of data types declared
modied
condition
count
number of modified conditions; the effect of a condition on a decision outcome is
assessed by varying that condition only
multiple
condition
count
Number of multiple conditions within a module
n Halstead total operators + operands
noc number of immediate descendants of the class in the class hierarchy
node count number of child nodes
NoM number of methods in a class
normalized
cyclomatic
complexity
v(g) / number of lines
npm the number of public methods defined in a class
OutDegrees number of messages sent to other modules
parameter
count
Number of parameters declared in class
percent
comments
Percentage of the code that is comments
rfc sum of number of methods called within the class method bodies and the number
of class methods
Setters number of setter methods
t Halstead's time estimator
total Op total operators
total Opnd total operands
uniq Op unique operators
uniq Opnd unique operands
v Halstead volume
v(g) McCabe cyclomatic complexity
wmc sum of complexity of the methods
Table A2: Metrics used in datasets
Dataset Name Metrics used
Jedit4.3 wmc, dit, noc, cbo, rfc, lcom, ca, ce, npm, locm3, loc, dam, moa, mfa, cam, ic,
cbm, amc, max cc and avg cc
pc1 loc, v(g), ev(g), iv(g), n, v, l, d, I, e, b, t, locode, loccomment, loblank, locode-
AndComment, uniq op, uniq opnd, total op, total opnd and branchcount
ar1 total loc, blank loc, comment loc, code and comment loc, executable loc,
uniq op, uniq opnd, total op, total opnd, halstead vocabulary, n, v, l,
d, e, b, t, branch count, decision count, call pairs, condition count,
multiple condition count, cyclomatic complexity, cyclomatic density,
design complexity, design density, normalized cyclomatic complexity
nw1 blank loc, branch count, call pairs, comment loc, code and comment loc,
cyclomatic complexity, cyclomatic density, design complexity, design density,
essential complexity, essential density, executable loc, parameter count, n, v, l,
d, e, b, t, maintenance severity, modified condition count,
multiple condition count, node count, normalized cyclomatic complexity,
total op, total opnd, number unique operators, number unique operands,
number of lines, percent comment and loc total, edgecount
kc3 blank loc, branch count, call pairs, comment loc, code and comment loc,
cyclomatic complexity, cyclomatic density, design complexity, design density,
essential complexity, essential density, executable loc, parameter count, n, v, l,
d, e, b, t, maintenance severity, modified condition count,
multiple condition count, node count, normalized cyclomatic complexity,
total op, total opnd, number unique operators, number unique operands,
number of lines, percent comment and loc total, edgecount
cm1 loc, v(g), ev(g), iv(g), n, v, l, d, I, e, b, t, locode, loccomment, loblank, locode-
AndComment, uniq op, uniq opnd, total op, total opnd and branchcount
pc3 blank loc, branch count, call pairs, comment loc, code and comment loc,
cyclomatic complexity, cyclomatic density, design complexity, design density,
essential complexity, essential density, executable loc, parameter count, n, v, l,
d, e, b, t, maintenance severity, modified condition count,
multiple condition count, node count, normalized cyclomatic complexity,
total op, total opnd, number unique operators, number unique operands,
number of lines, percent comment and loc total, edgecount
Arc wmc, dit, noc, cbo, rfc, lcom, ca, ce, npm, locm3, loc, dam, moa, mfa, cam, ic,
cbm, amc, max cc and avg cc
pc4 blank loc, branch count, call pairs, comment loc, code and comment loc,
cyclomatic complexity, cyclomatic density, design complexity, design density,
essential complexity, essential density, executable loc, parameter count, n, v, l,
d, e, b, t, maintenance severity, modified condition count,
multiple condition count, node count, normalized cyclomatic complexity,
total op, total opnd, number unique operators, number unique operands,
number of lines, percent comment and loc total, edgecount
kc1 loc, v(g), ev(g), iv(g), n, v, l, d, I, e, b, t, locode, loccomment, loblank, locode-
AndComment, uniq op, uniq opnd, total op, total opnd and branchcount
Jedit 4.2 wmc, dit, noc, cbo, rfc, lcom, ca, ce, npm, locm3, loc, dam, moa, mfa, cam, ic,
cbm, amc, max cc and avg cc
ar4 total loc, blank loc, comment loc, code and comment loc, executable loc,
uniq op, uniq opnd, total op, total opnd, halstead vocabulary, n, v, l,
d, e, b, t, branch count, decision count, call pairs, condition count,
multiple condition count, cyclomatic complexity, cyclomatic density,
design complexity, design density, normalized cyclomatic complexity and
formal parameters
jm1 loc, v(g), ev(g), iv(g), n, v, l, d, I, e, b, t, locode, loccomment, loblank, locode-
AndComment, uniq op, uniq opnd, total op, total opnd and branchcount
kc2 loc, v(g), ev(g), iv(g), n, v, l, d, I, e, b, t, locode, loccomment, loblank, locode-
AndComment, uniq op, uniq opnd, total op, total opnd and branchcount
camel1.6 wmc, dit, noc, cbo, rfc, lcom, ca, ce, npm, locm3, loc, dam, moa, mfa, cam, ic,
cbm, amc, max cc and avg cc
ant1.6 getter, setter, nom, indegree, outdegree, clusteringcoefficient, loc
Jedit4.0 wmc, dit, noc, cbo, rfc, lcom, ca, ce, npm, locm3, loc, dam, moa, mfa, cam, ic,
cbm, amc, max cc and avg cc
Jedit4.1 wmc, dit, noc, cbo, rfc, lcom, ca, ce, npm, locm3, loc, dam, moa, mfa, cam, ic,
cbm, amc, max cc and avg cc
ant1.7 getter, setter, nom, indegree, outdegree, clusteringcoefficient, loc
mc2 blank loc, branch count, call pairs, comment loc, code and comment loc,
cyclomatic complexity, cyclomatic density, design complexity, design density,
essential complexity, essential density, executable loc, parameter count, n, v, l,
d, e, b, t, maintenance severity, modified condition count,
multiple condition count, node count, normalized cyclomatic complexity,
total op, total opnd, number unique operators, number unique operands,
number of lines, percent comment and loc total, edgecount
jedit 3.2 wmc, dit, noc, cbo, rfc, lcom, ca, ce, npm, locm3, loc, dam, moa, mfa, cam, ic,
cbm, amc, max cc and avg cc
lucene2.0 wmc, dit, noc, cbo, rfc, lcom, ca, ce, npm, locm3, loc, dam, moa, mfa, cam, ic,
cbm, amc, max cc and avg cc