a good example. This measure has been widely used because of its computational efficiency and its effectiveness for hypersphere-shaped clusters.
Calinski-Harabasz index
This index is computed by

CH(K) = \frac{trace(B) / (K - 1)}{trace(W) / (N - K)}

where B and W are the between-cluster and within-cluster scatter matrices, K is the number of clusters, and N is the number of objects. Larger values indicate better-defined partitions.
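As a quick illustration, the Calinski-Harabasz computation can be sketched with NumPy (a minimal version; the function name and the toy data below are ours, not from the paper):

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Ratio of between-cluster to within-cluster dispersion,
    each normalized by its degrees of freedom (K-1 and N-K)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    N, K = len(X), len(clusters)
    overall_mean = X.mean(axis=0)
    B = 0.0  # between-cluster dispersion
    W = 0.0  # within-cluster dispersion
    for c in clusters:
        members = X[labels == c]
        center = members.mean(axis=0)
        B += len(members) * np.sum((center - overall_mean) ** 2)
        W += np.sum((members - center) ** 2)
    return (B / (K - 1)) / (W / (N - K))

# Two well-separated pairs of points give a large CH value.
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(calinski_harabasz(X, [0, 0, 1, 1]))  # 200.0
```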
Dunn index
The Dunn index is defined as

D(C) = \min_{i} \min_{j \neq i} \left\{ \frac{d(c_i, c_j)}{\max_{k} diam(c_k)} \right\}

where d(c_i, c_j) is the distance between clusters c_i and c_j, and diam(c_k) is the diameter of cluster c_k. Large values indicate compact, well-separated clusters.

BIC index
The Bayesian Information Criterion is computed as

BIC = -2 \ln \hat{L} + m \ln N

where \hat{L} is the maximized likelihood of the clustering model, m is the number of free parameters, and N is the number of objects; the partition that minimizes the BIC is preferred.

NIVA index
We now describe a novel validity index called NIVA [26]. We first need to introduce the basic principles. Let C = {c_i | i = 1,.., N} be a partition into N groups with centers v_i (i = 1, 2,..., N), where N is the number of groups in C. NIVA is the ratio of a compactness term to a global separation term:

NIVA(C) = \frac{Compac(C)}{SepxG(C)}

The compactness is computed as

Compac(C) = \frac{1}{N} \sum_{i=1}^{N} ESp(c_i) * SepxS(c_i)

Where:

ESp(c_i) = \frac{1}{l_i} \sum_{k=1}^{l_i} \frac{1}{n_{ik}} \sum_{j=1}^{n_{ik} - 1} d(x_j, x_{j+1})

and

SepxS(c_i) = \frac{1}{l_i} \sum_{k=1}^{l_i} \max_{p, j} \{ d(sv_p, sv_j) \}

where l_i is the number of subclusters of c_i, n_{ik} is the number of objects in the k-th subcluster of c_i, and sv_p, sv_j denote subcluster representatives. The global separation is

SepxG(C) = \frac{1}{N} \sum_{i=1}^{N} \min_{j \in C, j \neq i} d(v_i, v_j)

A smaller value of NIVA(C) indicates that a more valid partition, among the different given partitions, was found.

B. External validity indexes
F-measure
Combines the precision and recall concepts from information retrieval. For each cluster j and each class i, we calculate the recall and precision of that cluster as:

recall(i, j) = \frac{n_{ij}}{n_i}

and

precision(i, j) = \frac{n_{ij}}{n_j}

where n_{ij} is the number of objects of class i in cluster j, n_i is the number of objects of class i, and n_j is the number of objects in cluster j. The F-measure of class i and cluster j is the harmonic mean of these two quantities.

NMI measure

Purity

Entropy
Entropy measures the purity of the clusters with respect to the class labels. Thus, if all clusters consist of objects with only a single class label, the entropy is 0. However, as the class labels of objects in a cluster become more varied, the entropy increases. To compute the entropy of a dataset, we need to calculate the class distribution of the objects in each cluster as follows:
p_{ij} = \frac{n_{ij}}{n_j}

and then compute the entropy of each cluster j as

e_j = - \sum_{i} p_{ij} \log (p_{ij})

Where the sum is taken over all classes. The total entropy for a set of clusters is calculated as the weighted sum of the entropies of all clusters, as shown in the next equation:

e = \sum_{j=1}^{K} \frac{n_j}{n} e_j

where n_j is the size of cluster j, n is the total number of objects, and K is the number of clusters.

V. COMPARATIVE STUDY
In this section, we present experimental tests using the K-means and Bisecting K-means algorithms. We used 12 synthetic data sets (see Tables 5 and 6) and Cmc. These data sets were used by Maria Halkidi [7] and Chien-Hsing Chou [8][10].
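The entropy index described above can be sketched in Python (a minimal version using a base-2 logarithm; the helper names and toy labels are our own choices):

```python
import math
from collections import Counter

def cluster_entropy(classes_in_cluster):
    """Entropy of one cluster's class labels: sum of -p_ij * log2(p_ij),
    where p_ij is the fraction of the cluster belonging to class i."""
    n_j = len(classes_in_cluster)
    counts = Counter(classes_in_cluster)
    return sum(-(c / n_j) * math.log2(c / n_j) for c in counts.values())

def total_entropy(clusters):
    """Weighted sum of per-cluster entropies, with weights n_j / n."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)

# A pure cluster has entropy 0; a 50/50 mixed cluster has entropy 1 (base 2).
pure = ["a", "a", "a"]
mixed = ["a", "b", "a", "b"]
print(cluster_entropy(pure))         # 0.0
print(cluster_entropy(mixed))        # 1.0
print(total_entropy([pure, mixed]))  # 4/7 * 1.0 ≈ 0.571
```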
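The Dunn index discussed among the internal indexes above can likewise be sketched (a naive O(n²) version using single-linkage inter-cluster distance and complete diameter, one common variant; the function name and toy data are ours):

```python
import numpy as np

def dunn_index(X, labels):
    """Minimum inter-cluster distance divided by maximum cluster diameter.
    Assumes at least two clusters and a nonzero maximum diameter."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = [X[labels == c] for c in np.unique(labels)]
    # Maximum diameter over all clusters (complete diameter).
    max_diam = max(
        max((np.linalg.norm(a - b) for a in c for b in c), default=0.0)
        for c in clusters
    )
    # Minimum distance between points of different clusters (single linkage).
    min_sep = min(
        np.linalg.norm(a - b)
        for i, ci in enumerate(clusters)
        for cj in clusters[i + 1:]
        for a in ci for b in cj
    )
    return min_sep / max_diam

X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(dunn_index(X, [0, 0, 1, 1]))  # 10.0
```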
[Figure: synthetic data sets 1-8, panels (a)-(h)]
[Figure: synthetic data sets 9-12, panels (a)-(d)]

REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[15]
[16]
[17]
[18]
[19]
[20]
[21]