Daniel Eriksson
Sven Glansberg
Johan Jrts
December 4, 2009
Introduction
In data mining, when evaluating a classifier, there are several different measures
that can be used:
Accuracy
Sensitivity
Specificity
Precision
Recall
each capturing a slightly different notion of how good the classifier is.
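As a concrete illustration (the counts below are made up, not from any experiment in this report), all of these measures can be computed from a binary confusion matrix:

```python
# Hypothetical confusion-matrix counts for a binary classifier:
# rows = actual class, columns = predicted class.
tp, fn = 40, 10   # actual positives predicted +, predicted -
fp, tn = 5, 45    # actual negatives predicted +, predicted -

accuracy    = (tp + tn) / (tp + fn + fp + tn)   # fraction correct overall
sensitivity = tp / (tp + fn)                    # true positive rate
specificity = tn / (tn + fp)                    # true negative rate
precision   = tp / (tp + fp)                    # correctness of + predictions
recall      = sensitivity                       # recall is another name for sensitivity

print(accuracy, sensitivity, specificity, precision)
# accuracy = 0.85, sensitivity/recall = 0.8, specificity = 0.9, precision ≈ 0.889
```

Note that a classifier can score well on one measure and poorly on another, which is why the choice of measure matters.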
Another approach is specifying a cost for misclassifications (and perhaps also
a reward for correct classifications). This is defined as a cost matrix C(i, j)
where i is the actual class and j is the predicted class, for example:
C(i, j) = | -1  100 |
          |  1    0 |                    (1)
Here, if we assume that the classes are + and -, the fact that the cost C(1, 1)
is negative means that classifying a positive as a positive (a true positive) is
considered a reward. The cost for false negatives, C(1, 2) = 100, is severe; there
is a moderate cost C(2, 1) = 1 for false positives and no cost C(2, 2) = 0 for
true negatives. From the cost matrix and the confusion matrix, the classifier's
total cost can be calculated as a measure of its performance.
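A minimal sketch of this total-cost calculation, using the cost matrix from equation 1 and made-up confusion-matrix counts:

```python
# Total cost = sum over all cells of (confusion count) * (cost),
# where i is the actual class and j the predicted class.
cost = [[-1, 100],   # actual +: reward -1 for TP, cost 100 for FN
        [ 1,   0]]   # actual -: cost 1 for FP, no cost for TN

confusion = [[40, 10],   # hypothetical counts: 40 TP, 10 FN
             [ 5, 45]]   #                       5 FP, 45 TN

total_cost = sum(confusion[i][j] * cost[i][j]
                 for i in range(2) for j in range(2))
print(total_cost)  # 40*(-1) + 10*100 + 5*1 + 45*0 = 965
```

Even though this classifier is 85% accurate, the ten false negatives dominate the total cost, which is exactly the kind of situation cost-sensitive methods address.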
daner823@student.liu.se
svegl167@student.liu.se
johjo103@student.liu.se
MetaCost
MetaCost, developed by Domingos [3], makes an arbitrary classifier algorithm L
cost-sensitive by relabeling the training data. First, m models M_i are learned
by running L on resamples of the training set, and the class probabilities are
estimated by averaging over the models:

    P(j|x) = (1/m) Σ_i P(j|x, M_i)

Each training example x is then relabeled with the class k that minimizes the
expected cost:

    argmin_k Σ_j P(j|x) C(k, j)

where, following the notation in [3], C(k, j) is the cost of predicting class k
when the true class is j. Finally, L is trained on the relabeled training set,
and the resulting model is used as the cost-sensitive classifier.
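The relabeling step can be sketched as follows. This is a simplified illustration, not Domingos' full algorithm: `train_model` is a hypothetical helper standing in for the learner L, and the models are assumed to expose a `predict_proba(x)` method returning a list of class probabilities.

```python
import random

def metacost_relabel(X, y, cost, train_model, m=10):
    """Relabel training data with the least-expected-cost class:
    average P(j|x) over m bootstrap models, then pick the class k
    minimizing sum_j P(j|x) * C(k, j), where cost[k][j] is the cost
    of predicting k when the true class is j."""
    n, n_classes = len(X), len(cost)
    models = []
    for _ in range(m):
        idx = [random.randrange(n) for _ in range(n)]  # bootstrap resample
        models.append(train_model([X[i] for i in idx], [y[i] for i in idx]))
    new_y = []
    for x in X:
        # P(j|x): average the class probabilities over the m models
        p = [sum(model.predict_proba(x)[j] for model in models) / m
             for j in range(n_classes)]
        # expected cost of assigning each class k, then take the argmin
        exp_cost = [sum(p[j] * cost[k][j] for j in range(n_classes))
                    for k in range(n_classes)]
        new_y.append(min(range(n_classes), key=exp_cost.__getitem__))
    return new_y
```

The final step of MetaCost (not shown) retrains L on the relabeled data; the result is a single ordinary model of the kind L normally produces.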
Other techniques

Stratification
Another way to introduce a desired bias in the training set, lowering the total
cost of the model, is to use stratification in a preprocessing step to adjust the
relative occurrence of the classes. In tests performed by Domingos [3], the results
of this technique are almost always worse than those of MetaCost. The outcome
also depends on whether the stratification is done by over- or undersampling.
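A sketch of stratification by oversampling (the helper and data are made up for illustration): examples of the rare, expensive-to-miss class are duplicated so the learner sees them more often.

```python
def oversample(X, y, rare_class, factor):
    """Stratify a training set by duplicating each example of the rare
    (expensive-to-misclassify) class `factor` times in total, shifting
    the learner's bias toward that class."""
    X_out, y_out = list(X), list(y)
    for xi, yi in zip(X, y):
        if yi == rare_class:
            X_out.extend([xi] * (factor - 1))
            y_out.extend([yi] * (factor - 1))
    return X_out, y_out

# Hypothetical tiny data set: class 1 is rare and costly to miss.
X = [1, 2, 3, 4, 5]
y = [0, 0, 0, 0, 1]
X2, y2 = oversample(X, y, rare_class=1, factor=3)
print(y2.count(1))  # the single class-1 example now appears 3 times
```

Undersampling works the other way around, discarding examples of the common class; both distort the class distribution, which is the source of the bias mentioned above.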
Decision trees with minimal costs
Another alternative is using decision trees with minimal costs, a technique that
was developed by Ling et al. [4]. The basic idea is to factor in the cost while
building the tree: misclassification costs determine which class is assigned to a
leaf node. In short, the tree is built according to splitting criteria that minimize
the total cost instead of minimizing entropy. In this respect, decision trees
with minimal costs and MetaCost are similar, but there is a big difference. With
decision trees with minimal costs, the cost-sensitive part is built directly into
the classifier. With MetaCost, you can use any classifier algorithm, not only
decision trees, since it only resamples and relabels the training data, wrapping
the classifier in a cost-sensitive step (hence the "meta" in the name: it is not
really a classifier algorithm so much as a meta-algorithm). MetaCost is the more
flexible model here and can take either class predictions or class probabilities
as input, depending on what L produces.
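The leaf-labeling rule mentioned above can be illustrated with a small sketch (this is a simplification, not Ling et al.'s full algorithm): a leaf containing n_j training examples of each class j is assigned the class that minimizes the total misclassification cost over those examples.

```python
def leaf_label(counts, cost):
    """Assign a leaf the class k minimizing sum_j counts[j] * cost[j][k],
    where counts[j] is the number of class-j examples in the leaf and
    cost[actual][predicted] follows the convention of equation (1)."""
    n_classes = len(cost)
    return min(range(n_classes),
               key=lambda k: sum(counts[j] * cost[j][k]
                                 for j in range(n_classes)))

# A leaf with 3 positives (class 0) and 7 negatives (class 1),
# using the cost matrix from equation (1):
cost = [[-1, 100],  # actual +
        [ 1,   0]]  # actual -
print(leaf_label([3, 7], cost))  # → 0: labeling + costs -3+7=4, labeling - costs 300
```

Note that the leaf is labeled + even though negatives are the majority there, because missing a positive costs 100 while a false positive costs only 1; with symmetric costs the rule reduces to ordinary majority voting.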
Practical Example
Figure 1: Confusion matrices for MetaCost (left) and C4.5 (right) in the example
with the heart data set from [1]. Note the meanings of the classes + and - here:
+ means low risk for disease, - means high risk for disease.
Table 1: Weka results (for C4.5 and C4.5 + MetaCost) on the heart data from [1]
with the cost matrix from equation 2.
Conclusion
In cases where the cost of misclassification plays a major role, merely being
aware of that cost is not enough. The use of cost-sensitive classifiers brings
the data mining procedure much closer to what the application demands.
MetaCost is a flexible model for this kind of situation: you can use any
classifier algorithm, and MetaCost wraps around it and makes it cost-sensitive.
You do introduce a deliberate bias in the training set that reduces accuracy,
but since low cost matters more than accuracy in some KDD problems, i.e. when
some misclassifications are costlier than others, this is a bias you actually
want. You could say that cost-sensitive classifiers follow the principle:
better safe than sorry.
References
[1] Heart data set. http://staffwww.itn.liu.se/~aidvi/courses/06/dm/
labs/heart-c.arff.
[2] Weka. http://www.cs.waikato.ac.nz/ml/weka/.
[3] Pedro Domingos. MetaCost: A general method for making classifiers cost-sensitive. In KDD, pages 155-164, 1999.
[4] Charles X. Ling, Qiang Yang, Jianning Wang, and Shichao Zhang. Decision
trees with minimal costs. In ICML '04: Proceedings of the Twenty-First
International Conference on Machine Learning, page 69, New York, NY, USA,
2004. ACM.