
European Journal of Operational Research 126 (2000) 526–533

www.elsevier.com/locate/dsw

Theory and Methodology

Pattern classification with principal component analysis and fuzzy rule bases

V. Ravi a,*,1, P.J. Reddy b, H.-J. Zimmermann a

a Lehrstuhl für Unternehmensforschung, RWTH, Templergraben 64, D-52056 Aachen, Germany
b Computer Center, Indian Institute of Chemical Technology, Uppal Road, Hyderabad 500 007, Andhra Pradesh, India

Received 8 December 1998; accepted 13 April 1999

Abstract

For the first time, principal component analysis has been used to reduce the feature space dimension in fuzzy rule based pattern classifiers. A modified threshold accepting algorithm (MTA), proposed elsewhere by V. Ravi and H.-J. Zimmermann [European Journal of Operational Research 123 (1) (2000) 16–28], has been used to minimize the number of rules in the classifier while guaranteeing high classification power. The proposed methodology has been demonstrated for (1) the wine classification problem, which has 13 features, and (2) the Wisconsin breast cancer determination problem, which has 9 features. The influence of the type of aggregator used in the classification algorithm and of the number of partitions used for each of the feature spaces is also studied. The results are encouraging: there is no reduction in classification power in either problem, even though some of the principal components were deleted from the study before invoking the classifier. Moreover, the first five principal components yielded 100% classification power in some cases in both problems. The high classification power obtained for both problems while working with a reduced feature space dimension is the significant outcome of this study. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Fuzzy sets; Data analysis; Feature selection; Principal component analysis; Modified threshold accepting

1. Introduction

Ever since fuzzy set theory was propounded by Zadeh [13] as a new paradigm, its applications grew dramatically over the past two decades. Knowledge based systems are among the earliest applications, where fuzzy if–then rules play a paramount role. Most of these systems derive these fuzzy if–then rules from human experts [3]. In the literature, one comes across several methods

* Corresponding author. Tel.: +91-40-715-1310; fax: +91-40-717-3387. E-mail address: vravi@iict.ap.nic.in (V. Ravi).
1 On deputation from Computer Center, Indian Institute of Chemical Technology, Uppal Road, Hyderabad 500 007, Andhra Pradesh, India. Fax: +91-40-7173387 or +91-40-7173757.

0377-2217/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S0377-2217(99)00307-0
V. Ravi et al. / European Journal of Operational Research 126 (2000) 526–533

to generate these fuzzy if–then rules directly from numerical data. Kosko [6] employed neural networks to achieve this goal. Later, Ishibuchi et al. [3] proposed a sound methodology to generate such rules from numerical data, and then went ahead to apply a genetic algorithm to determine a compact rule set with a high classification power [4]. Then, a software by the name WINROSA [12], which automatically generates fuzzy if–then rules from numerical data using statistical methods, became available in the market. However, all the aforementioned studies differ from that of [3,4] in several aspects.

Throughout this paper, the partition of a pattern space means its granularity. To generate fuzzy if–then rules from numerical data one must (i) find the partition of a pattern space into fuzzy subspaces and (ii) determine the fuzzy if–then rule for each fuzzy partition [3,4]. Using those fuzzy if–then rules, either the training data or the test data are classified, which is essentially the classification phase. The performance of such a classifier depends very much on the choice of a fuzzy partition. If a fuzzy partition is too coarse, many patterns may be misclassified. On the other hand, if it is too fine, many fuzzy if–then rules cannot be generated due to the lack of training patterns in the corresponding fuzzy subspaces. In their earlier paper, Ishibuchi et al. [3] proposed distributed fuzzy rules, by considering the fuzzy rules corresponding to both coarse and fine partitions of a fuzzy subspace. For example, a two-dimensional pattern space gives rise to 90 = 2^2 + 3^2 + 4^2 + 5^2 + 6^2 fuzzy if–then rules, assuming that each feature dimension is divided into 6 partitions at the most. Thus, they considered all 5 rule tables corresponding to all the partitions simultaneously. By considering all the fuzzy partitions simultaneously, the above mentioned difficulty in choosing an appropriate partition is also obviated.

The main drawback of this approach, however, is that the number of fuzzy if–then rules increases exponentially for classification problems with high dimensional pattern spaces [5], such as the wine classification problem [3], where 13 feature variables are present. For example, if up to 5 partitions are used for each of the 13 feature variables, the total number of rules would be 2^13 + 3^13 + 4^13 + 5^13. To overcome this problem, Ishibuchi et al. [5] introduced a method where fuzzy if–then rules with a small number of antecedent conditions are generated as candidate rules. However, the authors are of the opinion that this is still not a complete remedy, because the method is not a general one and it is not difficult to find problems where rules with a small number of antecedent conditions are intractable. This motivated the development of other alternative methods which concentrate on feature selection or on reduction of the feature space dimension by transforming it. Thus it is meaningful to look for any unimportant variables (features) and remove them from the classification process. This results in reduced computational time and memory requirements and an easy-to-use classifier with a manageable number of features. Thus the point we would like to drive home is that feature selection is an essential component of any classifier, especially in dealing with problems having a large number of features. Ravi and Zimmermann [9] addressed this problem by resorting to a software plug-in to DataEngine, viz. FeatureSelector [11]. They used it as a pre-processor to select the most salient features from the original set of features and went on to derive a compact set of fuzzy if–then rules with high classification power.

In the present paper, however, the authors propose another way of reducing the feature space dimension, via principal component analysis (PCA). It is a traditional multivariate statistical technique frequently used for data compression [10]. However, the authors make it abundantly clear that a comparison of the present study with our earlier one [9] is simply not meaningful, because the principal components are only linear combinations of the original feature variables and hence do not reflect on the importance or otherwise of the original variables. The rest of the paper is structured as follows. Section 2 gives an overview of principal component analysis and an algorithm to compute the principal components. Section 3 briefly presents the fuzzy rule based classification method and the formulation of the multi-objective optimization problem. Results of the numerical simulations are discussed in Section 4, and Section 5 concludes the paper.
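The correlation-matrix route to PCA described in Section 2 can be sketched in a few lines. The sketch below is our own NumPy illustration, not the authors' ANSI C implementation; note that it computes the component scores from the standardized matrix Z, whereas Section 2.1 writes P = XE in terms of the original data matrix.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.84):
    """Reduce feature dimension via eigen analysis of the correlation matrix.

    Standardize the data, form Z^T Z / (n - 1), eigen-decompose it, sort the
    eigenvalues in descending order, and keep the leading components that
    together account for at least `var_threshold` of the total variance.
    """
    n, j = X.shape
    # Center and scale every feature (column) to make it dimensionless.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix and its eigen analysis.
    R = (Z.T @ Z) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, E = eigvals[order], eigvecs[:, order]
    frac = np.cumsum(eigvals) / eigvals.sum()  # cumulative variance proportions
    k = int(np.searchsorted(frac, var_threshold)) + 1
    P = Z @ E                                  # component scores (pairwise orthogonal)
    return P[:, :k], frac[:k]
```

With the 84% threshold used for problem 1 in Section 2.2, such a call would retain the first five principal components of the wine data as the new feature set.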
2. Principal component analysis

2.1. Algorithm to determine principal components [10]

1. Consider the data matrix X of order n × j, where n > j; n is the number of patterns or observations and j is the number of features. Center and scale all the features, i.e. for a given column (each column corresponds to a feature), its mean is subtracted from all the observations in that column and the result is divided by the standard deviation of the column. This is essential to make the features dimensionless.
2. Form the matrix of squares and products of the features, Z^T Z, where Z is the centered and scaled version of the matrix X.
3. To perform the principal component analysis on the matrix Z, either the singular value decomposition of Z or the eigen analysis of the correlation matrix of Z is carried out. In the present paper, the latter approach is followed. The correlation matrix is obtained by dividing each entry of Z^T Z by n − 1. Eigen analysis of the correlation matrix results in as many eigenvalues and eigenvectors as the number of features, and each eigenvector contains as many entries. Each entry is called a principal component loading. The principal components are calculated as P = XE, where X is the original data matrix of order n × j, E is the matrix of order j × j whose columns are the eigenvectors of Z^T Z, and P is the matrix of order n × j of principal components P_1, P_2, ..., P_j, where P_i is an n × 1 vector, i = 1, 2, ..., j. The principal components are pairwise orthogonal, unlike the original features.
4. The ratio of each eigenvalue to the total sum of all the eigenvalues indicates the proportion of variation accounted for by the corresponding principal component. Small eigenvalues correspond to those dimensions which cause multicollinearity. A salient feature of principal component analysis is that the eigenvalues are automatically sorted in descending order and their corresponding eigenvectors are rearranged accordingly. Then, the first principal component accounts for the maximum variance, the second principal component accounts for the second largest variance, and so on.

2.2. A method to reduce feature space dimension using PCA

To reduce the feature space dimension, those principal components which correspond to smaller eigenvalues should be deleted. To do this, the guiding rule is to prespecify the percentage of total variation we would like to account for using the principal components. Using all the principal components as new features in the subsequent analysis amounts to using the original features themselves, because each principal component is nothing but a linear combination of the original features. Thus the information contained in the original features is recast into a more refined form where the principal components are pairwise orthogonal. In view of the foregoing, in this paper a threshold limit of 84% (90%) is set up for problem 1 (problem 2), because it leads to the first 5 principal components as the most important ones. These five principal components, which account for 89% and 92% of the variance in problems 1 and 2, respectively, are treated as the new set of features.

3. A fuzzy classification method with fuzzy if–then rules

Following the notation of [3,4], let the pattern space be two-dimensional (i.e. there are two features in the feature space) and given by the unit square [0, 1] × [0, 1]. Suppose that X_p = (x_p1, x_p2), p = 1, 2, ..., m, are given as training patterns from M classes (where M ≪ m): Class 1 (C1), Class 2 (C2), ..., Class M (CM). The problem is to generate fuzzy if–then rules that divide the pattern space into M disjoint classes. For more details of the extension to the case of higher dimensions, the reader is referred to [4]. Let each axis of the pattern space be partitioned into K fuzzy subsets {A_1^K, A_2^K, ..., A_K^K}, where A_i^K
is the ith fuzzy subset and the superscript K indicates the number of fuzzy subsets on each axis. Thus, K denotes the grid size of a fuzzy partition and differentiates the rules belonging to different rule tables corresponding to 2, 3, ..., L partitions in the distributed representation of fuzzy rules. A symmetric triangular membership function on the unit interval [0, 1] is used for A_i^K, i = 1, ..., K [3,4,9]. Fig. 1 indicates the distributed representation of the fuzzy if–then rules for 2 features when each feature is divided into a maximum of 3 partitions. For problems involving M classes and 2 features, a fuzzy if–then rule corresponding to K^2 fuzzy subspaces has the following structure:

Rule R_ij^K: If x_p1 is A_i^K and x_p2 is A_j^K, then X_p belongs to Class C_ij^K, with CF = CF_ij^K,
i = 1, 2, ..., K and j = 1, 2, ..., K,    (1)

where R_ij^K is the label of the fuzzy if–then rule, C_ij^K is the consequent (i.e. one of the M classes) and CF_ij^K is the grade of certainty of the fuzzy if–then rule, as defined in [3,4,9]. C_ij^K and CF_ij^K of the fuzzy if–then rule in (1) are determined by procedures which can be found in detail in [3,4]. These procedures (i) generate the fuzzy if–then rules and (ii) classify the new patterns; they were slightly modified by the authors in [9]. Let S^K be the set of generated K^2 fuzzy if–then rules, given by S^K = {R_ij^K | i = 1, 2, ..., K; j = 1, 2, ..., K}. Let the set of all fuzzy if–then rules corresponding to K = 2, 3, ..., L partitions be S_ALL, which is:

S_ALL = S^2 ∪ S^3 ∪ ... ∪ S^L = {R_ij^K | i = 1, 2, ..., K; j = 1, 2, ..., K and K = 2, 3, ..., L},    (2)

where L is an integer to be specified depending on the classification problem. Let S be a subset of S_ALL. The algorithm for the classifier [9] for computer simulations is coded as follows. Starting from the rules corresponding to the 2-partitions rule table and going up to the L-partitions rule table, all the rules, no matter whether they are dummy or not, are arranged in the form of a vector. Thus S is nothing but a permutation of this vector and S_ALL is the set of all possible rules. The idea is to determine a compact rule set S which has very high classification power. To start with, S is initialized randomly using a uniform random number generator following a biased probability method [9]. As the iterations progress, S gets modified.

Fig. 1. Labels and indices of fuzzy if–then rules: (a) K = 2 and (b) K = 3 [4].
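To make the rule-set bookkeeping concrete, here is a small illustrative sketch of the two ingredients just described: the symmetric triangular membership functions A_i^K with evenly spaced peaks on [0, 1], in the parameterization used in [3,4], and the enumeration of S_ALL as one flat vector of rule labels (K, i, j) for K = 2, ..., L. This helper is our own sketch, not the authors' ANSI C code.

```python
def membership(x, i, K):
    """Symmetric triangular membership of x in fuzzy subset A_i^K on [0, 1].

    Peaks a_i = (i - 1)/(K - 1) are evenly spaced; the half-width is
    b = 1/(K - 1), as in the distributed representation of [3,4].
    """
    a = (i - 1) / (K - 1)
    b = 1.0 / (K - 1)
    return max(0.0, 1.0 - abs(x - a) / b)

def all_rules(L):
    """Enumerate the rule labels of S_ALL = S^2 U ... U S^L for two features.

    Each label (K, i, j) stands for rule R_ij^K; arranging all of them in one
    flat vector mirrors the coding of the classifier described above.
    """
    return [(K, i, j)
            for K in range(2, L + 1)
            for i in range(1, K + 1)
            for j in range(1, K + 1)]
```

For L = 6 the vector holds 2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 90 labels, the figure quoted in the introduction for a two-dimensional pattern space.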
3.1. The multi-objective combinatorial optimization problem

As in [4,9], the main objective is to find a compact rule set S with very high classification power by employing a combinatorial optimization algorithm. The two objectives in the present problem are: (i) maximize the number of correctly classified patterns and (ii) minimize the number of fuzzy if–then rules. Accordingly, we have

Maximize NCP_S and Minimize |S|
subject to S ⊆ S_ALL,

where NCP_S is the number of patterns correctly classified by S and |S| is the number of fuzzy if–then rules in S. This is further reformulated as a scalar optimization problem:

Maximize f(S) = W_NCP · NCP_S − W_S · |S|
subject to S ⊆ S_ALL.    (3)

Since the classificatory power of the system is more important than its compactness [3,4,9], the weights have been specified as 0 < W_S ≪ W_NCP and taken as W_NCP = 10.0 and W_S = 1.0, following [4]. For details regarding the coding of the rules used in the optimization module, the reader is referred to [9]. We employ a meta-heuristic, viz. the modified threshold accepting algorithm [9], to solve the problem just described. It should be kept in mind, however, that the algorithm used is a heuristic and that, depending on the initial feasible solution, the best solutions reported here may not be efficient.

4. Numerical simulations and results

The first illustrative example solved using the methodology presented here is the well-known wine classification problem, for which the data are freely available on the Internet [2]. It has 13 features (attributes) which classify 178 patterns into three types of wines. The second numerical example concerns the determination of breast cancer in humans, with data from the University of Wisconsin hospital in Madison, USA. This data set is also freely available on the Internet via anonymous ftp from ics.uci.edu in directory /pub/machine-learning-databases/wisconsin-breast-cancer. This problem has 9 features or attributes which determine whether a patient is benign or malign. There are 683 samples or patterns. This data has been used in the past by Mangasarian et al. [7,8] and Bennett et al. [1]. A software in ANSI C has been developed by the authors on a Pentium 100 MHz machine under the Windows 95 platform using the MS-Visual C++ 5.0 compiler to implement the model.

The methodology presented is tested in two ways: (i) using the training data itself as the test data and (ii) using the leave-one-out technique in the testing phase. The latter method is preferable, as there is the danger of over-fitting in the former method. In each of these methods, all the feature spaces have been divided into a maximum of 5 partitions for both examples. This is done in order to keep the computational complexity at a reasonable level, as we work with a new set of 5 features (the principal components) in both examples. Further, the study has been conducted for 5 cases, each corresponding to a different aggregator, viz. (1) the product operator, (2) the min operator, (3) the γ-operator (compensatory and) [15], (4) fuzzy and [14], and (5) a convex combination of the min and max operators [15].

Results of the wine classification problem (see Table 1) indicate that the product operator performed consistently well and gave the best solution, with 100% classification with just 11 rules, whereas the γ-operator with γ = 0.1 came closely behind, giving 100% classification with 13 rules for the cases of both 4 partitions and 5 partitions of the features. The min operator scored over the others when one looks at its high classification power of 100% with 18 and 20 rules, respectively, for 4 and 5 partitions. Fuzzy and and a convex combination of the min and max operators did not provide good solutions, though the former performed better of the two when 4 partitions were considered.

The same example, when studied with the leave-one-out technique (see Table 2), produced different results. Both the product operator and the γ-operator (with γ = 0.1) provided the best
Table 1
Results of example 1 (training data used as test data)^a

# Partitions   Product        Minimum        γ-Operator     Fuzzy and      min/max
               CP      |S|    CP      |S|    CP      |S|    CP      |S|    CP      |S|
5              100     11     100     20     100     13     84.27   69     58.43   53
4              100     11     100     18     100     13     98.31   13     97.19   15
3              86.52    7     74.16    6     84.83    7     58.98    8     62.36    4

^a min/max indicates a convex combination of the min and max operators.

Table 2
Results of example 1 (leave-one-out technique)^a

# Partitions   Product        Minimum        γ-Operator     Fuzzy and      min/max
               CP      |S|    CP      |S|    CP      |S|    CP      |S|    CP      |S|
5              100     3.04   91.01   2.74   100     3.04   82.02   62.56  91.57   87.9
4              100     3.07   93.26   2.87   100     3.07   99.44   3.38   98.88   3.36
3              60.11   1.8    60.11   1.8    60.11   1.8    60.11   1.9    60.11   1.8

^a min/max indicates a convex combination of the min and max operators.

Table 3
Results of example 2 (training data used as test data)^a

# Partitions   Product        Minimum        γ-Operator     Fuzzy and      min/max
               CP      |S|    CP      |S|    CP      |S|    CP      |S|    CP      |S|
5              98.54   25     97.51   29     98.24   27     95.6    35     95.02   29
4              98.39   13     97.36   16     97.95   13     96.78   17     95.31    9
3              97.95   12     96.33   13     97.8    10     95.46    5     87.99    7

^a min/max indicates a convex combination of the min and max operators.

Table 4
Results of example 2 (leave-one-out technique)^a

# Partitions   Product        Minimum        γ-Operator     Fuzzy and      min/max
               CP      |S|    CP      |S|    CP      |S|    CP      |S|    CP      |S|
5              100     9.3    99.85   7.3    100     7.3    96.34   22.23  98.68   22.82
4              92.39   2.78   92.68   2.79   92.38   2.78   95.75   2.87   100     3
3              98.09   3      82.28   2.47   97.66   2.93   98.83   2.96   71.88   2.15

^a min/max indicates a convex combination of the min and max operators.

solution, with 100% classification with 3.04 rules on average, when 5 partitions were considered. Fuzzy and and a convex combination of the min and max operators did not perform well, but they outperformed the min operator when 4 partitions were used.
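The leave-one-out protocol behind Tables 2 and 4 can be written generically. In the sketch below, `train` and `classify` are placeholder callables standing in for the rule-base generation and classification procedures; the helper itself is our illustration, not part of the authors' software.

```python
def leave_one_out(patterns, labels, train, classify):
    """Estimate classification power by the leave-one-out technique.

    For each pattern, a model is trained on all remaining patterns and the
    held-out pattern is classified; the percentage of correct classifications
    is returned, matching the CP columns of the tables.
    """
    correct = 0
    for h in range(len(patterns)):
        rest_X = patterns[:h] + patterns[h + 1:]
        rest_y = labels[:h] + labels[h + 1:]
        model = train(rest_X, rest_y)
        correct += classify(model, patterns[h]) == labels[h]
    return 100.0 * correct / len(patterns)
```

Because every held-out pattern is excluded from training, this estimate avoids the over-fitting risk of testing on the training data itself, at the cost of training the classifier once per pattern.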
Results of the Wisconsin breast cancer problem (see Table 3) show that, when 5 partitions were considered, the product operator yielded the best result of 98.54% classification power with 25 rules; the γ-operator (γ = 0.1) gave rise to the next best solution, whereas the min operator yielded a better solution than fuzzy and and a convex combination of the min and max operators. The min operator could produce a maximum of 97.51% classification power with 29 rules.

When the leave-one-out technique was applied to the same problem (see Table 4), the best solution of 100% classification power with just 3 rules on average was achieved by the convex combination of the min and max operators when 4 partitions were considered. The next best solution of 100% classification power with 7.3 rules on average was achieved by the γ-operator (γ = 0.1) when 5 partitions were considered. The product operator followed closely behind, with 100% classification power and 9.3 rules on average with the same number of partitions. When 5 partitions were considered, the min operator yielded 99.85% classification power with just 7.3 rules on average.

5. Conclusions

This paper addresses the paramount aspect of dimensionality reduction of the feature space in classification problems involving a large number of dimensions and demonstrates the utility of principal component analysis for this purpose. The chosen principal components are fed to a fuzzy rule based classifier as a new set of features. This yielded a very high classification power of 100% with the leave-one-out technique, with only a few rules, in the case of some aggregators. This is a particularly significant result because the chosen principal components accounted for only 89% and 92% of the total variance in examples 1 and 2, respectively. Hence, the authors conclude that principal component analysis can be used as a useful alternative to other methods of feature selection existing in the literature when solving classification problems of higher dimensions using fuzzy rule based classifiers.

Acknowledgements

The first author wishes to thank the Deutscher Akademischer Austauschdienst (DAAD) for providing him financial support for this work through a research fellowship leading to his Ph.D. degree. The authors also express their gratitude to Dr. W.H. Wolberg (physician), University of Wisconsin Hospitals, Madison, Wisconsin, USA, and Prof. Olvi Mangasarian (mangasarian@cs.wisc.edu), the principal donor of the breast cancer database to the Internet.

References

[1] K.P. Bennett, O.L. Mangasarian, Robust linear programming discrimination of two linearly inseparable sets, Optimization Methods and Software 1 (1992) 23–34.
[2] M. Forina et al., Wine Recognition Database, available via anonymous ftp from ics.uci.edu in directory /pub/machine-learning-databases/wine, 1992.
[3] H. Ishibuchi, K. Nozaki, H. Tanaka, Distributed representation of fuzzy rules and its application to pattern classification, Fuzzy Sets and Systems 52 (1992) 21–32.
[4] H. Ishibuchi, K. Nozaki, N. Yamamoto, H. Tanaka, Selecting fuzzy if–then rules for classification problems using genetic algorithms, IEEE Transactions on Fuzzy Systems 3 (1995) 260–270.
[5] H. Ishibuchi, T. Murata, Minimizing the fuzzy rule base and maximizing its performance by a multi-objective genetic algorithm, in: Sixth FUZZ-IEEE Conference, Barcelona, Spain, 1997, pp. 259–264.
[6] B. Kosko, Neural Networks and Fuzzy Systems, Prentice-Hall, Englewood Cliffs, NJ, 1992.
[7] O.L. Mangasarian, W.H. Wolberg, Cancer diagnosis via linear programming, SIAM News 23 (1990) 1–18.
[8] O.L. Mangasarian, R. Setiono, W.H. Wolberg, Pattern recognition via linear programming: Theory and application to medical diagnosis, in: T.F. Coleman, Y. Li (Eds.), Large-Scale Numerical Optimization, SIAM, Philadelphia, 1990, pp. 22–30.
[9] V. Ravi, H.-J. Zimmermann, Fuzzy rule based classification with FeatureSelector and modified threshold accepting, European Journal of Operational Research 123 (1) (2000) 16–28.
[10] J.O. Rawlings, Applied Regression Analysis: A Research Tool, Wadsworth and Brooks/Cole Statistics and Probability Series, Wadsworth, Belmont, CA, 1988.
[11] J. Strackeljan, D. Behr, F. Detro, FeatureSelector: A plug-in for feature selection with DataEngine, in: First
International Data Analysis Symposium, Aachen, Germany, 1997.
[12] WINROSA, Manual, MIT GmbH, Aachen, Germany, 1997.
[13] L. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338–353.
[14] H.-J. Zimmermann, P. Zysno, Latent connectives in human decision making, Fuzzy Sets and Systems 4 (1980) 37–51.
[15] H.-J. Zimmermann, Fuzzy Set Theory and Its Applications, 2nd ed., Kluwer Academic Publishers, Dordrecht, 1991.
