World Academy of Science, Engineering and Technology 49 2009
not a member of the cluster under consideration. Many crisp clustering techniques have difficulty handling extreme outliers, but fuzzy clustering algorithms tend to give them very small membership degrees in the surrounding clusters [14].

Fig. 1 Fuzzy membership in a cluster between zero and one

The non-zero membership values, with a maximum of one, show the degree to which the data point represents a cluster. As shown in Fig. 1, the points at the centre of the cluster have maximum membership values and the membership gradually decreases as we move away from the cluster centre. Thus fuzzy clustering provides a flexible and robust method for handling natural data with vagueness and uncertainty. In fuzzy clustering, each data point has an associated degree of membership for each cluster. The membership value is in the range zero to one and indicates the strength of the point's association with that cluster.

A. C-Means Fuzzy Clustering Algorithm

Fuzzy c-means clustering involves two processes: the calculation of cluster centers and the assignment of points to these centers using a form of Euclidean distance. This process is repeated until the cluster centers stabilize. The algorithm is similar to k-means clustering in many ways, but it incorporates the fuzzy-set concept of partial membership and forms overlapping clusters to support it. It assigns each data item a membership value in the range 0 to 1 for every cluster. The algorithm needs a fuzzification parameter $m$ in the range [1,n], which determines the degree of fuzziness in the clusters. As $m$ approaches 1 the algorithm works like a crisp partitioning algorithm, and for larger values of $m$ the clusters tend to overlap more. The algorithm calculates the membership value $\mu$ with the formula

$$\mu_j(x_i) = \frac{\left(\dfrac{1}{d_{ji}}\right)^{\frac{1}{m-1}}}{\sum_{k=1}^{p}\left(\dfrac{1}{d_{ki}}\right)^{\frac{1}{m-1}}} \qquad (2)$$

where
$m$ : is the fuzzification parameter
$p$ : is the number of specified clusters
$d_{ki}$ : is the distance of $x_i$ from the center of cluster $C_k$

The algorithm also imposes a restriction: the sum of the memberships of a data point over all the clusters must be equal to one. This constraint is represented by (3).

$$\sum_{j=1}^{p} \mu_j(x_i) = 1 \qquad (3)$$

The center of each cluster is then computed with (4).

$$C_j = \frac{\sum_{i} \left[\mu_j(x_i)\right]^{m} x_i}{\sum_{i} \left[\mu_j(x_i)\right]^{m}} \qquad (4)$$

where
$C_j$ : is the center of the jth cluster
$x_i$ : is the ith data point
$\mu_j$ : is the function which returns the membership
$m$ : is the fuzzification parameter

This is a special form of weighted average. The current membership of each $x_i$ is fuzzified by raising it to the power $m$ and then multiplied by $x_i$. The sum of these products is divided by the sum of the fuzzified memberships. The c-means fuzzy clustering algorithm is given in Table I.

TABLE I
FUZZY C-MEANS ALGORITHM

initialize p = number of clusters
initialize m = fuzzification parameter
initialize C_j (cluster centers)
Repeat
    For i = 1 to n: update µ_j(x_i) applying (2)
    For j = 1 to p: update C_j with (4), using the current µ_j(x_i)
Until the C_j estimates stabilize

The first loop of the algorithm calculates the membership values of the data points in the clusters, and the second loop recalculates the cluster centers using these membership values. When the cluster centers stabilize (when there is no change), the algorithm ends.

B. Limitations of the Algorithm

The fuzzy c-means approach to clustering suffers from several constraints that affect its performance [10]. The main drawback comes from the restriction that the sum of the membership values of a data point $x_i$ over all the clusters must be one, as in (3), and this tends to give high membership values to the outlier points. So the algorithm has difficulty in handling outlier points. Secondly, the membership of a data point in a cluster depends directly on its membership values in other cluster centers and this sometimes happens to produce
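The membership rule (2), the center update (4) and the loop of Table I can be sketched in plain Python as below. This is only an illustrative sketch under stated assumptions, not the implementation used for the experiments: the function names (`memberships`, `update_centers`, `fuzzy_c_means`), the default `m=2.0`, the tolerance `tol`, the iteration cap `max_iter` and the sample point are all hypothetical, and the distance is taken to be the ordinary Euclidean distance. Degenerate cases, such as a cluster whose memberships are all zero, are not guarded.

```python
import math


def memberships(point, centers, m=2.0):
    """Membership of one data point in every cluster, following (2)."""
    if m <= 1.0:
        raise ValueError("the exponent 1/(m-1) needs m > 1")
    # d_ki: here the plain Euclidean distance to each center (an assumption).
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center is given full membership there,
    # which avoids dividing by a zero distance.
    if 0.0 in dists:
        hit = dists.index(0.0)
        return [1.0 if k == hit else 0.0 for k in range(len(centers))]
    weights = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]  # normalised, so (3) holds


def update_centers(points, u, m=2.0):
    """Recompute each center as the weighted average in (4)."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        fuzzified = [row[j] ** m for row in u]  # [mu_j(x_i)]^m
        denom = sum(fuzzified)
        centers.append(tuple(
            sum(w * x[a] for w, x in zip(fuzzified, points)) / denom
            for a in range(dim)
        ))
    return centers


def fuzzy_c_means(points, centers, m=2.0, tol=1e-6, max_iter=100):
    """Loop of Table I: update memberships, then centers, until stable."""
    for _ in range(max_iter):
        u = [memberships(x, centers, m) for x in points]
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # "no change" is relaxed to a small tolerance
            break
    # Memberships are recomputed so that they match the returned centers.
    return centers, [memberships(x, centers, m) for x in points]


# An outlier equally far from both centers still gets membership 0.5 in each.
print(memberships((5.0, 50.0), [(0.0, 0.0), (10.0, 0.0)]))  # [0.5, 0.5]
```

The final line illustrates the first limitation noted above: because the memberships are normalised to sum to one, a point far from every cluster is still assigned a substantial membership in each of them.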
were converted into numeric values. For the analysis of the new method we took the attributes income and health index, as shown in Fig. 2. As the figure shows, the response on health is converted into a numeric index on a ten-point scale.

Fig. 2 The income (X axis) and health index (Y axis)

As we can see from the data set, in Bhutan the low-income group maintains better health than the high-income group, since its members are self-sufficient in many ways. Like any other natural data, this data set also contains many outlier points which do not belong to any of the groups. If we apply the c-means algorithm, these points tend to get higher membership values due to (3).

To start the data analysis, we first applied the k-means algorithm to find the initial three cluster centers. The algorithm ended with three cluster centers at C1(24243,6.7), C2(69794,5.1) and C3(11979.29,2.72). We used these as initial values in both the c-means algorithm and the new method to analyze the data, and the algorithms ended with the centroids given in Table II.

TABLE II
PERFORMANCE COMPARISON OF C-MEANS AND NEW METHOD

Centers    C-means                New Method
           X           Y          X           Y
C1         24243.11    6.53       23464.1     7.3485
C2         69749.6     5.08       68707       5.71
C3         115979.3    2.83       112894.1    1.905

X values represent the income in Ngultrum (Bhutan’s currency). C1, C2 and C3 are the three final cluster centers.

From Fig. 2 and Table II, it can be seen that the final centroids of the c-means method do not represent the actual centers of the clusters. This is due to the influence of outlier points. The new method identifies the cluster centers better by treating the outlier points differently.

VII. CONCLUSION

The fuzzy c-means algorithm, a well-known fuzzy clustering algorithm, has several limitations in handling natural data with uncertainty and vagueness. In this paper we presented a modified version of the fuzzy c-means algorithm. The new algorithm was applied to a natural data set and its performance was compared with that of the classical fuzzy c-means algorithm. We found that the new method gives better performance in defining cluster centers. A detailed study with more data sets is necessary to ascertain the usefulness of the new method. Also, a comparative analysis of the new algorithm with similar extensions of the fuzzy c-means algorithm is to be carried out.

ACKNOWLEDGMENT

We would like to thank Mr. Nidup Gyelesten, Director, Gross National Happiness regional chapter, Sherubtse College, Bhutan for providing the data used for the analysis.

REFERENCES

[1] Sankar K. Pal, P. Mitra, “Data Mining in Soft Computing Framework: A Survey”, IEEE Transactions on Neural Networks, vol. 13, no. 1, January 2002.
[2] R. Kruse, C. Borgelt, “Fuzzy Data Analysis Challenges and Perspective”. Available: http://citeseer.ist.psu.edu/kruse99fuzzy.html
[3] Lei Jiang and Wenhui Yang, “A Modified Fuzzy C-Means Algorithm for Segmentation of Magnetic Resonance Images”, Proc. VIIth Digital Image Computing: Techniques and Applications, pp. 225-231, 10-12 Dec. 2003, Sydney.
[4] Frank Klawonn and Annette Keller, “Fuzzy Clustering Based on Modified Distance Measures”. Available: http://citeseer.istpsu.edu/fuzzy_clustering_62
[5] W. H. Inmon, “The data warehouse and data mining”, Commun. ACM, vol. 39, pp. 49–50, 1996.
[6] U. Fayyad and R. Uthurusamy, “Data mining and knowledge discovery in databases”, Commun. ACM, vol. 39, pp. 24–27, 1996.
[7] Pavel Berkhin, “Survey of Clustering Data Mining Techniques”. Available: http://citeseer.ist.psu.edu/berkhin02survey.html
[8] Chau, M., Cheng, R., and Kao, B., “Uncertain Data Mining: A New Research Direction”. Available: www.business.hku.hk/~mchau/papers/UncertainDataMining_WSA.pdf
[9] Keith C.C, C. Wai-Ho Au, B. Choi, “Mining Fuzzy Rules in A Donor Database for Direct Marketing by A Charitable Organization”, Proc. of First IEEE International Conference on Cognitive Informatics, pp. 239-246, 2002.
[10] E. Cox, Fuzzy Modeling And Genetic Algorithms For Data Mining And Exploration, Elsevier, 2005.
[11] G. J. Klir, T. A. Folger, Fuzzy Sets, Uncertainty and Information, Prentice Hall, 1988.
[12] J. Han, M. Kamber, Data Mining Concepts and Techniques, Elsevier, 2003.
[13] J. C. Bezdek, Fuzzy Mathematics in Pattern Classification, Ph.D. thesis, Center for Applied Mathematics, Cornell University, Ithaca, N.Y., 1973.
[14] Carl G. Looney, “A Fuzzy Clustering and Fuzzy Merging Algorithm”. Available: http://citeseer.ist.psu.edu/399498.html
[15] G. Raju, A. Singh, Th. Shanta Kumar, Binu Thomas, “Integration of Fuzzy Logic in Data Mining: A Comparative Case Study”, Proc. of International Conf. on Mathematics and Computer Science, Loyola College, Chennai, pp. 128-136, 2008.
[16] Sullen Donnelly, “How Bhutan Can Develop and Measure GNH”. Available: www.bhutanstudies.org.bt/seminar/0402-gnh/GNH-papers-1st_18-20.pdf