
Cost-sensitive Classifiers

Daniel Eriksson (daner823@student.liu.se)
Sven Glansberg (svegl167@student.liu.se)
Johan Jrts (johjo103@student.liu.se)
December 4, 2009

1 Introduction

In data mining, when evaluating a classifier, there are several different measures
that can be used:
Accuracy
Sensitivity
Specificity
Precision
Recall
each one capturing a slightly different notion of how good the classifier is (the usual definitions in terms of the confusion-matrix counts are sketched below).
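A minimal sketch of these definitions, expressed in terms of the counts tp, fp, tn and fn of a binary confusion matrix (the function itself is purely illustrative):

def evaluation_measures(tp, fp, tn, fn):
    """Standard evaluation measures from the four counts of a binary
    confusion matrix: true/false positives and true/false negatives."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # proportion of actual positives that are found (= recall)
    specificity = tn / (tn + fp)   # proportion of actual negatives that are found
    precision   = tp / (tp + fp)   # proportion of predicted positives that are correct
    recall      = sensitivity
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "recall": recall}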
Another approach is specifying a cost for misclassifications (and perhaps also
a reward for correct classifications). This is defined as a cost matrix C(i, j)
where i is the actual class and j is the predicted class, for example:


C(i, j) = \begin{pmatrix} -1 & 100 \\ 1 & 0 \end{pmatrix}    (1)
Here, if we assume that the classes are + and -, the negative cost C(1, 1) means that classifying a positive as a positive (a true positive) is considered a reward. The cost for false negatives, C(1, 2) = 100, is severe; there is a moderate cost C(2, 1) = 1 for false positives and no cost C(2, 2) = 0 for true negatives. From the cost matrix and the confusion matrix, the classifier's total cost can be calculated as a measure of performance for the classifier.
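As an illustration of this calculation, here is a minimal sketch in Python, assuming the same orientation as above (rows are actual classes, columns are predicted classes) and using a made-up confusion matrix:

import numpy as np

def total_cost(confusion, cost):
    """Total cost: each cell of the confusion matrix counts how often an
    (actual, predicted) pair occurred; multiply by its cost and sum."""
    return float(np.sum(np.asarray(confusion) * np.asarray(cost)))

# Cost matrix from equation (1): rows = actual class (+, -), columns = predicted class.
C = np.array([[-1, 100],
              [ 1,   0]])

# A made-up confusion matrix with the same orientation, for illustration only.
confusion = np.array([[40, 10],
                      [ 5, 45]])

print(total_cost(confusion, C))   # 40*(-1) + 10*100 + 5*1 + 45*0 = 965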

An important aspect to keep in mind is that the cost matrix is always domain-dependent! Depending on the application, the defined costs can be very different.
This approach to dealing with cost only tweaks the performance measures of our classifier. However, the classifier itself still produces bad results from a cost perspective. What if the classifier itself were sensitive to cost? Compared to the cost evaluation discussed above, such an approach with cost-sensitive classifiers is a completely different way to deal with the notion of cost. One way to realize this is to let the cost matrix influence the creation of the classifier model.

2 MetaCost

MetaCost is a method for creating cost-sensitive classifiers, created by Pedro Domingos [3]. It is a wrapper algorithm in the sense that any classifier can be used; the algorithm introduces a bias, based on a cost matrix C(i, j), into the training data.
A good thing about MetaCost is that it is independent of the actual classification technique that is used, i.e., the classification technique can be treated as a black box. It also works with multi-class problems, not only with binary class attributes.
A negative aspect of MetaCost is that it takes longer to run, since it has to run the black-box algorithm m times (m being the number of resamples) and then perform an additional optimization step that also takes some computation time. However, the running time only increases by a fixed factor, so the time complexity is of the same order as that of the underlying classifier [3].

2.1 The MetaCost Algorithm

The MetaCost algorithm as described by Domingos [3]:


S is the training set.
L is the classification learning algorithm.
C is a cost matrix.
m is the number of resamples to generate.
n is the number of examples in each resample.
1. For i in the range 1 to m:
   (a) Create Si as a resample of S with n examples.
   (b) Create model Mi by applying L to Si.

2. For each example x in S:
   (a) For each class j, compute

       P(j|x) = \frac{1}{\sum_i 1} \sum_i P(j|x, M_i)

   (b) Change the class of x to the class k that minimizes

       \sum_j P(j|x) C(k, j)

3. Create the final model M by applying L to S.


If L does not produce class probabilities, MetaCost sets P(j|x, Mi) = 1 for the class that L predicts and P(j|x, Mi) = 0 for all other classes. When calculating P(j|x), we can also choose not to include those Mi where x belongs to the corresponding resample Si. The advantage of this is that the model M produced will have a lower statistical bias, but if you use all generated models Mi the variance will be lower. It is a tradeoff between the two.
The idea of the algorithm is to create m resamples Si of the training set S. There will be a different bias in the distribution of classes in each resample, creating different models Mi when L is applied. Based on this, you create a bias in the training set S by relabeling each x to the class that gives the lowest predicted total cost. P(j|x) can be seen as a prediction of the confusion matrix for the model. So what you do in step 2 (b) of the algorithm is essentially relabeling the class of x to the class that in the final model M gives the lowest total cost.
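To make the steps above concrete, here is a minimal sketch of the wrapper in Python. It assumes a scikit-learn-style base learner with fit and predict_proba, that X and y are numpy arrays, that every resample contains all classes, and it always uses all models Mi when estimating P(j|x); it is a simplification of, not a substitute for, Domingos' description [3].

import numpy as np

def metacost(X, y, base_learner, cost, m=10, n=None, rng=None):
    """Sketch of MetaCost: train m models on resamples, relabel each training
    example to the class with the lowest expected cost, then retrain once."""
    rng = np.random.default_rng(rng)
    n = len(X) if n is None else n
    classes = np.unique(y)

    # Step 1: one model per resample of size n.
    models = []
    for _ in range(m):
        idx = rng.choice(len(X), size=n, replace=True)
        models.append(base_learner().fit(X[idx], y[idx]))

    # Step 2(a): P(j|x) as the average of the models' class probabilities
    # (assumes every resample contains all classes, so the column orders agree).
    probs = np.mean([model.predict_proba(X) for model in models], axis=0)

    # Step 2(b): relabel x to the class k minimizing sum_j P(j|x) C(k, j).
    expected_cost = probs @ np.asarray(cost).T
    y_relabelled = classes[np.argmin(expected_cost, axis=1)]

    # Step 3: final model trained on the relabelled training set.
    return base_learner().fit(X, y_relabelled)

With, for example, scikit-learn's DecisionTreeClassifier as base_learner, this corresponds to using the classifier as a black box; handling learners without class probabilities, or excluding models whose resample contains x, would require small additions.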

3 Other techniques

3.1 Stratification

Another way to introduce a desired bias in the training set, in order to lower the total cost of the model, is to use stratification in a preprocessing step to adjust the relative occurrence of the classes. In tests performed by Domingos [3] the result of this technique is almost always worse than that of MetaCost. The result also depends on whether the stratification is done with over- or undersampling.
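A minimal sketch of the oversampling variant, assuming a binary problem where the examples of the costlier class are simply duplicated a whole number of times (choosing the factor, e.g. from the cost ratio, is an assumption made here for illustration):

import numpy as np

def oversample(X, y, costly_class, factor, rng=None):
    """Return a training set where examples of `costly_class` occur
    `factor` times as often, by resampling them with replacement."""
    rng = np.random.default_rng(rng)
    idx = np.flatnonzero(y == costly_class)
    extra = rng.choice(idx, size=(factor - 1) * len(idx), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    rng.shuffle(keep)
    return X[keep], y[keep]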

3.2 Decision Trees with Minimal Costs

Another alternative is to use decision trees with minimal costs, a technique developed by Ling et al. [4]. The basic idea is to factor in the cost while building the tree. Misclassification costs determine which class is assigned to a leaf node in the tree, and the tree is built according to splitting criteria that minimize the total cost instead of minimizing entropy. In this way, decision trees with minimal costs and MetaCost are similar, but there is a big difference. With decision trees with minimal costs, the cost-sensitive part is built directly into the classifier. With MetaCost, you can use any classifier algorithm, not only decision trees, since it only resamples the training data and wraps the classifier in a cost-sensitive step (hence the "meta" in the name; it is not really a classifier algorithm so much as a meta-algorithm). MetaCost is the more flexible model here and can take either class predictions or class probabilities as input, depending on what L produces.
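As an illustration of the leaf-labeling part of this idea, each leaf can be given the class that minimizes the total misclassification cost of the training examples reaching it. A minimal sketch, using the same C(actual, predicted) orientation as before; Ling et al. [4] also use the cost in the splitting criterion itself, which is not shown here:

import numpy as np

def leaf_label(y_leaf, classes, cost):
    """Assign the leaf the class k that minimizes the total cost
    sum_i count(actual class i in the leaf) * C(i, k)."""
    counts = np.array([np.sum(y_leaf == c) for c in classes])
    total_costs = counts @ np.asarray(cost)   # entry k = sum_i counts[i] * C(i, k)
    return classes[np.argmin(total_costs)]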

4 Practical Example

An example of an application where cost can be vital is in medicine. If you are diagnosing whether someone has a disease or not, false positives mean that you waste resources treating a patient who is not ill. Even more severe are false negatives, where someone is ill but classified as healthy. This case would have a high cost associated with it. The costs associated with different misclassifications are clearly not equal.
MetaCost is available in the data mining software Weka [2]. We used the heart data set from [1] and compared the results for the classifier C4.5 (J48 in Weka) and C4.5 with MetaCost using the cost matrix in equation 2.


C(i, j) = \begin{pmatrix} 0 & 1 \\ 4 & 0 \end{pmatrix}    (2)
This gave the confusion matrices in Figure 1 and a total cost of 187 for C4.5 and 145 for MetaCost. As can be seen in Table 1, this lower cost for MetaCost comes with a slightly lower percentage of correctly classified instances, due to the bias introduced.

Figure 1: Confusion matrices for MetaCost (left) and C4.5 (right) in the example with the heart data set from [1]. Note that the meanings of the classes + and - here are: + means low risk for disease, - means high risk for disease.

                   Correctly classified instances   Incorrectly classified instances
C4.5               77.9%                            22.1%
C4.5 + MetaCost    72.9%                            27.1%

Table 1: Weka results for the heart data from [1] with the cost matrix from equation 2.
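For comparison, the MetaCost sketch from Section 2.1 could be combined with the cost matrix from equation 2 roughly as follows; the data loading is omitted and scikit-learn's DecisionTreeClassifier is only a stand-in for C4.5/J48, so this is an assumption-laden outline rather than a reproduction of the Weka run:

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Cost matrix from equation (2): rows = actual class, columns = predicted class.
C = np.array([[0, 1],
              [4, 0]])

# X, y: the heart data from [1], assumed to be already loaded as numeric numpy arrays.
# model = metacost(X, y, DecisionTreeClassifier, C, m=10)       # sketch from Section 2.1
# print(total_cost(confusion_matrix(y, model.predict(X)), C))   # total_cost sketch from Section 1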

5 Conclusion

In cases where the cost of misclassifying plays a great role, just being aware of the cost of misclassifications is not enough. The use of cost-sensitive classifiers takes the data mining procedure much closer to what the application demands.
MetaCost is a flexible model to use in this kind of situation: you can use any classifier algorithm, and MetaCost wraps around it and makes it cost-sensitive. You do introduce an unwanted bias in the training set that reduces accuracy, but since low cost matters more than high accuracy in some KDD problems, i.e., when some misclassifications are costlier than others, this is a bias you prefer. You could say that cost-sensitive classifiers follow the principle: rather safe than sorry.

References
[1] Heart data set. http://staffwww.itn.liu.se/~aidvi/courses/06/dm/labs/heart-c.arff.
[2] Weka. http://www.cs.waikato.ac.nz/ml/weka/.
[3] Pedro Domingos. MetaCost: A general method for making classifiers cost-sensitive. In KDD, pages 155–164, 1999.
[4] Charles X. Ling, Qiang Yang, Jianning Wang, and Shichao Zhang. Decision trees with minimal costs. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning, page 69, New York, NY, USA, 2004. ACM.
