
Efficient Computation of Normalized Maximum Likelihood Coding for Gaussian Mixtures with Its Applications to Optimal Clustering
So Hirai
Graduate School of Information Science and Technology,
The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo, JAPAN
Email: So Hirai@mist.i.u-tokyo.ac.jp

Kenji Yamanishi
Graduate School of Information Science and Technology,
The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo, JAPAN
Email: yamanishi@mist.i.u-tokyo.ac.jp
Abstract

This paper addresses the issue of estimating from a given data sequence the number of mixture components for a Gaussian mixture model. Our approach is to compute the normalized maximum likelihood (NML) code-length for the data sequence relative to a Gaussian mixture model, then to find the mixture size that attains the minimum of the NML. Here the minimization of the NML code-length is known as Rissanen's minimum description length (MDL) principle. For discrete domains, Kontkanen and Myllymäki proposed a method of efficient computation of the NML code-length for specific models; however, for continuous domains it has remained open how we compute the NML code-length efficiently. We propose a method for efficient computation of the NML code-length for Gaussian mixture models. We develop it using an approximation of the NML code-length under a restriction of the domain and using the technique of a generating function. We apply it to the issue of determining the optimal number of clusters in clustering using a Gaussian mixture model, where the mixture size is the number of clusters. We use artificial data sets and benchmark data sets to empirically demonstrate that our estimate of the mixture size converges to the true one significantly faster than those based on AIC and BIC.
I. INTRODUCTION
A. Motivation and significance of this paper
This paper addresses the issue of estimating from a given data sequence the number of mixture components for a Gaussian mixture model (GMM). A GMM is often used as a probabilistic model of clustering, where each mixture component corresponds to a cluster and each datum is assumed to be generated according to a linear combination of Gaussian distributions. The estimation of the mixture size leads to the choice of the optimal number of clusters, which has been considered one of the most important issues in clustering.

The best mixture size may be selected on the basis of information criteria such as AIC, BIC, MDL, etc. Note that these criteria cannot straightforwardly be applied to GMMs because they require that the central limit theorem hold for the maximum likelihood estimator of a parameter vector, but it does not hold due to the singularities of the coefficient parameters of mixture models. Hence we introduce latent variables indicating the cluster from which each datum comes. Then we are able to apply the criteria to the selection of the mixture size.

In this paper we specifically investigate the performance of Rissanen's minimum description length (MDL) criterion in the scenario of mixture size selection. It is known from the modern theory of MDL [10] that the normalized maximum likelihood (NML) code-length of a data sequence relative to a model class is optimal in the sense that it achieves Shtarkov's minimax criterion [12]. Hence we employ the NML code-length as a measure of goodness of a model class. Here are important issues to be addressed:

1) It may be computationally or analytically intractable to compute the NML code-lengths in general. Moreover, the NML code-lengths may diverge for Gaussian distributions. How can we efficiently and approximately compute the NML code-lengths for GMMs so that they do not diverge?

2) How well does the NML-based MDL criterion work in the scenario of the selection of the mixture size? Does it perform better than other criteria such as AIC and BIC or not?

This paper addresses these issues. As for Issue 1), we propose a new method of efficient computation of approximate NML code-lengths for GMMs. The key ideas are: A) we appropriately restrict the range of data sequences over which an integral is taken in the normalization term to approximately compute the NML code-length for Gaussian distributions; B) we employ the technique of a generating function, which Kontkanen and Myllymäki [5],[6] developed, to derive a recursion formula for the normalization term (Theorem 3.1), which makes the NML code-lengths computable in $O(n^2 K)$ time ($n$ is the sample size and $K$ is the number of clusters).

As for Issue 2), we use artificial data sets and benchmark data sets to demonstrate the validity of the NML in terms of the rate of convergence of the estimated number of clusters. We show that for all of the data sets used in the experiments, the number of clusters chosen by the NML-based criterion converges significantly faster than that chosen by other criteria such as AIC and BIC.
B. Previous works

Clustering has been considered as the task of identifying a finite mixture model (see, e.g., [13],[3],[8]). Information criteria such as AIC and BIC have been applied to the issue of selecting the optimal number of clusters (see, e.g., [8]). Kontkanen and Myllymäki [6] applied the NML-based MDL criterion to the issue of selecting the number of clusters and empirically demonstrated its validity. However, for continuous domains, there exists no work on the application of the NML-based MDL criterion to that issue.

For discrete domains, Kontkanen and Myllymäki [5],[6] proposed a method for efficient computation of the NML code-length for multinomial models and Naive Bayes models. However, for continuous domains, their method cannot straightforwardly be applied to the same issue. The NML code-length may diverge for, e.g., Gaussian distributions. Rissanen [9] considered how an asymptotic form of the NML code-length is calculated for continuous domains (see also Grünwald [4]). Recently, Luosto [7] proposed a calculation of the NML code-length for continuous uniform distributions. However, it has remained open how we approximately compute the NML code-length over a continuous domain so that it does not diverge, and specifically how we compute the NML code-length for GMMs efficiently and approximately.
II. APPROXIMATION OF NML FOR GAUSSIAN DISTRIBUTIONS
A. Normalized Maximum Likelihood Distribution
This section introduces the notion of the NML code-length into our discussion. We denote a data sequence of length $n$ as $x^n = (x_1, \dots, x_n)$. Each $x_j$ ranges over $\mathcal{X}$ and thus $x^n \in \mathcal{X}^n$.

If it is known that the data sequence $x^n$ was generated according to a probability distribution $P(X^n)$, there exists a prefix coding such that $x^n$ can be encoded with code-length $L(x^n) = -\log P(x^n)$. However, the probability distribution generating the data sequence is unknown in general. When we only know the class $\mathcal{M} = \{P(X^n \mid \theta) : \theta \in \Theta\}$ ($n = 1, 2, \dots$; $\Theta$ is a $k$-dimensional compact parameter space and $\theta \in \Theta$), we consider the minimax criterion proposed by Shtarkov [12]:

$$\min_{Q}\max_{x^n}\left\{-\log Q(x^n) - \min_{\theta}\left(-\log P(x^n \mid \theta)\right)\right\}.$$

Here the minimum is attained by the normalized maximum likelihood (NML) distribution defined by

$$P_{\mathrm{NML}}(x^n \mid \mathcal{M}) = \frac{P(x^n \mid \hat{\theta}(x^n, \mathcal{M}))}{C(\mathcal{M}, n)},$$

where $\hat{\theta}(x^n, \mathcal{M})$ is the maximum likelihood estimator (MLE) of $\theta$ from $x^n$, and $C(\mathcal{M}, n)$ is the normalization constant defined as follows:

$$C(\mathcal{M}, n) = \sum_{y^n \in \mathcal{X}^n} P(y^n \mid \hat{\theta}(y^n, \mathcal{M})), \qquad (1)$$

where the sum in (1) is taken over all possible data sequences of length $n$.

The stochastic complexity (SC) [9] of $x^n$ relative to $\mathcal{M}$ is defined as the code-length of $x^n$ using the NML distribution, which we call the NML code-length, as follows:

$$SC(x^n : \mathcal{M}) \stackrel{\mathrm{def}}{=} -\log P_{\mathrm{NML}}(x^n \mid \mathcal{M}) = -\log P(x^n \mid \hat{\theta}(x^n, \mathcal{M})) + \log C(\mathcal{M}, n). \qquad (2)$$

The minimum description length (MDL) principle [10] asserts that for given $x^n$, the best model is the one which attains the minimum of the SC of $x^n$ relative to $\mathcal{M}$. We employ the SC, equivalently the NML code-length, as a criterion for model selection. The problem is that in general the computational cost of the normalization term $C(\mathcal{M}, n)$ in Eq. (2) is exponential in the sample size $n$, or the term may diverge.
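As a concrete illustration of Eqs. (1)-(2) — not taken from the paper — the following Python sketch evaluates the NML code-length by brute force for the simplest case of a Bernoulli model class, where the sum over $y^n$ in (1) can be grouped by the number of ones; the function names are ours.

```python
import math

def bernoulli_nml_normalizer(n):
    """C(M, n) of Eq. (1) for the Bernoulli class: group the 2^n sequences by
    their number of ones k, since the maximized likelihood depends only on k."""
    C = 0.0
    for k in range(n + 1):
        p_hat = k / n                       # MLE from any sequence with k ones
        lik = 1.0 if k in (0, n) else (p_hat ** k) * ((1.0 - p_hat) ** (n - k))
        C += math.comb(n, k) * lik
    return C

def bernoulli_nml_code_length(k, n):
    """SC(x^n : M) of Eq. (2), in nats, for a binary sequence with k ones."""
    p_hat = k / n
    log_ml = (k * math.log(p_hat) if k else 0.0) + \
             ((n - k) * math.log(1.0 - p_hat) if n - k else 0.0)
    return -log_ml + math.log(bernoulli_nml_normalizer(n))

print(bernoulli_nml_code_length(7, 10))     # e.g. 7 ones out of 10 observations
```

For continuous models such as the Gaussian case treated next, the sum becomes an integral that may diverge, which is exactly the issue addressed in Section II-B.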
B. Approximation of NML for Gaussian Distributions

This section gives a method for approximating the NML for Gaussian distributions. When $\mathcal{X}$ is a countable set, Kontkanen and Myllymäki [5],[6] developed algorithms for efficiently computing the normalization term $C(\mathcal{M}, n)$. In the discussion to follow we consider the case where $\mathcal{X}$ is a continuous domain. In this case, the problem is that the normalization term may diverge [4],[9]. We propose a method for approximate computation of the normalization term for Gaussian distributions so that it does not diverge. The key idea is to appropriately restrict the range of $x^n$ over which the sum is taken in the normalization term.
Let an observed data sequence be $x^n = (x_1, \dots, x_n)$, where $x_i = (x_{i1}, \dots, x_{im})^T$ $(i = 1, \dots, n)$. We use a class of Gaussian distributions $\mathcal{N}(\mu, \Sigma)$, where $\mu \in \mathbb{R}^m$ is a mean vector, $\Sigma \in \mathbb{R}^{m \times m}$ is a covariance matrix, and $m$ is the dimension of $x_i$. The probability density function of $x^n$ for the Gaussian distribution is given by

$$f(x^n; \mu, \Sigma) = \frac{1}{(2\pi)^{\frac{mn}{2}} |\Sigma|^{\frac{n}{2}}} \exp\left(-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)\right), \qquad (3)$$

and the NML distribution based on the Gaussian distribution is defined as follows:

$$f_{\mathrm{NML}}(x^n) \stackrel{\mathrm{def}}{=} \frac{f(x^n; \hat{\mu}(x^n), \hat{\Sigma}(x^n))}{\int_{Y(\lambda_{\min}, R)} f(y^n; \hat{\mu}(y^n), \hat{\Sigma}(y^n))\, dy^n}, \qquad (4)$$

where $\hat{\mu}(x^n)$ and $\hat{\Sigma}(x^n)$ are the MLEs of $\mu$ and $\Sigma$, respectively:

$$\hat{\mu}(x^n) = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\Sigma}(x^n) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu}(x^n))(x_i - \hat{\mu}(x^n))^T.$$

For given constants $\lambda_{\min}\,(> 0)$ and $R\,(> 0)$, we set a restricted domain as follows:

$$Y(\lambda_{\min}, R) \stackrel{\mathrm{def}}{=} \left\{\, y^n \;\middle|\; \lambda_{\min} \le \lambda_j(\hat{\Sigma}(y^n))\ (j = 1, \dots, m),\ \|\hat{\mu}(y^n)\|^2 \le R,\ y^n \in \mathcal{X}^n \,\right\},$$

where $\lambda_j(\hat{\Sigma}(y^n))$ $(j = 1, \dots, m)$ are the eigenvalues of $\hat{\Sigma}(y^n)$. This restriction makes the calculation of the normalization term $C(\mathcal{M}, n)$ easier, as shown below.
First, by substituting the MLEs $\hat{\mu}(x^n)$, $\hat{\Sigma}(x^n)$ into formula (3), the numerator of Eq. (4) can be expressed as follows:

$$f(x^n; \hat{\mu}(x^n), \hat{\Sigma}(x^n)) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{\frac{m}{2}} |\hat{\Sigma}(x^n)|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x_i - \hat{\mu}(x^n))^T \hat{\Sigma}(x^n)^{-1} (x_i - \hat{\mu}(x^n))\right).$$
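As a small side note (ours, not the paper's), this maximized likelihood simplifies because the quadratic forms summed over $i$ equal $mn$ at the MLE, so only $\log|\hat{\Sigma}(x^n)|$ has to be computed from data. A minimal NumPy sketch:

```python
import numpy as np

def gaussian_max_log_likelihood(X):
    """log f(x^n; mu_hat, Sigma_hat) for an (n, m) data matrix X: at the MLE the
    exponent collapses to -mn/2, leaving only the log-determinant term."""
    n, m = X.shape
    Sigma_hat = np.cov(X.T, bias=True)     # MLE covariance (divides by n); assumes m >= 2, n > m
    _, logdet = np.linalg.slogdet(Sigma_hat)
    return -0.5 * n * (m * np.log(2.0 * np.pi) + logdet + m)
```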
Next, we calculate the denominator of formula (4). Using the fact that $\hat{\mu}(x^n)$ and $\hat{\Sigma}(x^n)$ are sufficient statistics, we can calculate the normalization term as an integral with respect to $\hat{\mu}(x^n)$ and $\hat{\Sigma}(x^n)$. As the MLEs are sufficient statistics, $f(x^n; \mu, \Sigma)$ is decomposed as follows:

$$f(x^n; \mu, \Sigma) = f(x^n \mid \hat{\mu}(x^n), \hat{\Sigma}(x^n)) \cdot g_1(\hat{\mu}(x^n); \mu, \Sigma) \cdot g_2(\hat{\Sigma}(x^n); \Sigma),$$

where

$$g_1(\hat{\mu}(x^n); \mu, \Sigma) \stackrel{\mathrm{def}}{=} \frac{1}{(2\pi/n)^{\frac{m}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2/n}(\hat{\mu} - \mu)^T \Sigma^{-1} (\hat{\mu} - \mu)\right),$$

$$g_2(\hat{\Sigma}(x^n); \Sigma) \stackrel{\mathrm{def}}{=} \frac{|\hat{\Sigma}|^{\frac{n-m-2}{2}}}{2^{\frac{m(n-1)}{2}} \left|\frac{1}{n}\Sigma\right|^{\frac{n-1}{2}} \Gamma_m\!\left(\frac{n-1}{2}\right)} \exp\left(-\frac{1}{2}\mathrm{Tr}\!\left(n\,\Sigma^{-1}\hat{\Sigma}\right)\right).$$
Here $f(x^n \mid \hat{\mu}(x^n), \hat{\Sigma}(x^n))$ denotes the conditional density of $x^n$ given $\hat{\mu}(x^n) = \hat{\mu}$ and $\hat{\Sigma}(x^n) = \hat{\Sigma}$. We fix the values $\hat{\mu}(x^n) = \hat{\mu}$, $\hat{\Sigma}(x^n) = \hat{\Sigma}$, and let

$$g(\hat{\mu}, \hat{\Sigma}) \stackrel{\mathrm{def}}{=} g_1(\hat{\mu}; \hat{\mu}, \hat{\Sigma}) \cdot g_2(\hat{\Sigma}; \hat{\Sigma}) = \frac{n^{\frac{mn}{2}}}{2^{\frac{mn}{2}}\, \pi^{\frac{m}{2}}\, e^{\frac{mn}{2}}\, \Gamma_m\!\left(\frac{n-1}{2}\right)}\, |\hat{\Sigma}|^{-\frac{m}{2}-1}.$$
Letting the normalization term of formula (4) be $C(\mathcal{M}, n)$, we can calculate it by integrating $g(\hat{\mu}, \hat{\Sigma})$ with respect to $\hat{\mu}$ and $\hat{\Sigma}$ over the restricted domain as follows:

$$C(\mathcal{M}, n) = \int_{Y(\lambda_{\min}, R)} g(\hat{\mu}, \hat{\Sigma})\, d\hat{\mu}\, d\hat{\Sigma} = \frac{2^{m+1} R^{\frac{m}{2}}}{\lambda_{\min}^{\frac{m^2}{2}}\, m^{m+1}\, \Gamma\!\left(\frac{m}{2}\right)} \left(\frac{n}{2e}\right)^{\frac{mn}{2}} \frac{1}{\Gamma_m\!\left(\frac{n-1}{2}\right)} = B(m, \lambda_{\min}, R) \left(\frac{n}{2e}\right)^{\frac{mn}{2}} \frac{1}{\Gamma_m\!\left(\frac{n-1}{2}\right)}, \qquad (5)$$

where we define $B(m, \lambda_{\min}, R)$ by

$$B(m, \lambda_{\min}, R) \stackrel{\mathrm{def}}{=} \frac{2^{m+1} R^{\frac{m}{2}}}{\lambda_{\min}^{\frac{m^2}{2}}\, m^{m+1}\, \Gamma\!\left(\frac{m}{2}\right)}.$$

$B(m, \lambda_{\min}, R)$ does not depend on the sample size $n$. Since (5) is finite, the normalization term $C(\mathcal{M}, n)$ does not diverge.
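The finite expression (5) is easy to evaluate numerically in the log domain, which is what one needs in practice since $C(\mathcal{M}, n)$ itself over- or underflows quickly in $n$. The sketch below is ours (not the authors' code) and relies on the constant $B$ as reconstructed above; `scipy.special.multigammaln` supplies $\log \Gamma_m(\cdot)$ and requires $n > m$. The function name and the example hyperparameters are illustrative only.

```python
import numpy as np
from scipy.special import gammaln, multigammaln

def log_C_gaussian(n, m, lam_min, R):
    """log C(M, n) of Eq. (5) for a single m-dimensional Gaussian (requires n > m)."""
    log_B = ((m + 1) * np.log(2.0) + 0.5 * m * np.log(R)
             - 0.5 * m * m * np.log(lam_min)
             - (m + 1) * np.log(m) - gammaln(0.5 * m))          # log B(m, lambda_min, R)
    return (log_B + 0.5 * m * n * np.log(n / (2.0 * np.e))
            - multigammaln(0.5 * (n - 1), m))                   # minus log Gamma_m((n-1)/2)

# Example with illustrative hyperparameters lambda_min = 0.01 and R = 100:
print(log_C_gaussian(n=100, m=3, lam_min=0.01, R=100.0))
```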
III. EFFICIENT COMPUTATION OF NML FOR GMMS

This section gives a method of efficient computation of the NML for GMMs. A Gaussian mixture model (GMM) has been used as a typical representation of clustering. It takes the form of a linear combination of $K$ Gaussian distributions with mean $\mu_k$ and covariance matrix $\Sigma_k$ $(k = 1, \dots, K)$:

$$P(X) = \sum_{k=1}^{K} \pi_k\, f(X \mid Z = k;\, \mu_k, \Sigma_k),$$

where $\pi_k = P(Z = k)$ and $Z$ is a hidden variable indicating the cluster to which $X$ belongs. We denote the class of Gaussian mixture distributions with $K$ clusters as $\mathcal{M}(K)$.

We consider the case where the cluster indexes $z^n$ are given in addition to $x^n$. Each $z_i$ corresponds to $x_i$, and we write this as $z^n = z_1 \cdots z_n$. For example, the cluster indexes $z^n$ can be obtained from a given $x^n$ using the EM algorithm [2].
The joint probability density function of $x^n$ and $z^n$ is given by

$$f(x^n, z^n; \mu, \Sigma) = \prod_{k=1}^{K}\left[\pi_k^{h_k}\, \frac{1}{(2\pi)^{\frac{m h_k}{2}} |\Sigma_k|^{\frac{h_k}{2}}} \prod_{i:\, z_i = k} \exp\left(-\frac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)\right)\right],$$

where we write $P(Z = k) = \pi_k$, $K$ is the number of clusters, and $h_k$ is the number of data which belong to cluster $k$. The NML distribution for $x^n$ relative to the class $\mathcal{M}(K)$ of GMMs is

$$f_{\mathrm{NML}}(x^n, z^n) = \frac{f(x^n, z^n; \hat{\mu}(x^n, z^n), \hat{\Sigma}(x^n, z^n))}{C(\mathcal{M}(K), n)}. \qquad (6)$$
The MLE of $\pi_k$ is $\hat{\pi}_k = h_k/n$. We use Eq. (5) to compute the normalization term of the NML distribution for the Gaussian mixture model $\mathcal{M}(K)$ as follows:

$$C(\mathcal{M}(K), n) = \sum_{z^n} \int_{Y(\lambda_{\min}, R)} f(y^n, z^n; \hat{\mu}(y^n, z^n), \hat{\Sigma}(y^n, z^n))\, dy^n = \sum_{h_1 + \cdots + h_K = n} \frac{n!}{h_1! \cdots h_K!} \prod_{k=1}^{K} \left(\frac{h_k}{n}\right)^{h_k} I(h_k), \qquad (7)$$

where

$$I(h_k) = B(m, \lambda_{\min}, R) \left(\frac{h_k}{2e}\right)^{\frac{m h_k}{2}} \frac{1}{\Gamma_m\!\left(\frac{h_k - 1}{2}\right)}.$$

A straightforward computation of $C(\mathcal{M}(K), n)$ takes $O(n^K)$ time. We give the following theorem showing that formula (7) can be recursively computed in $O(n^2 K)$ time.
Theorem 3.1: The normalization term in (7) satisfies the following recursion formula:

$$C(\mathcal{M}(K+1), n) = \sum_{r_1 + r_2 = n} \binom{n}{r_1} \left(\frac{r_1}{n}\right)^{r_1} \left(\frac{r_2}{n}\right)^{r_2} C(\mathcal{M}(K), r_1)\, I(r_2). \qquad (8)$$

This enables us to calculate formula (7) in $O(n^2 K)$ time.

Proof: In this proof, we use the technique of generating functions, which was proposed by Kontkanen and Myllymäki [5],[6], to derive the recurrence formula.

First, note that the normalization term is expressed as

$$C(\mathcal{M}(K), n) = \sum_{h_1 + \cdots + h_K = n} \frac{n!}{h_1! \cdots h_K!} \prod_{k=1}^{K} \left(\frac{h_k}{n}\right)^{h_k} I(h_k).$$
Algorithm 1 Calculation of Stochastic Complexity for GMM
STEP 1. Calculate $h_k$, $\hat{\mu}_k$, $\hat{\Sigma}_k$ $(k = 1, \dots, K)$.
STEP 2. Calculate the numerator of Eq. (6).
STEP 3. Define the normalization term as $C(\mathcal{M}(K), 0) = 1$.
STEP 4. Calculate $C(\mathcal{M}(1), j) = I(j)$ $(j = 1, \dots, n)$.
STEP 5. Calculate $C(\mathcal{M}(k), j)$ as follows:
for $k = 2$ to $K$ do
  for $j = 1$ to $n$ do
    $C(\mathcal{M}(k), j) = \sum_{r_1 + r_2 = j} \binom{j}{r_1} \left(\frac{r_1}{j}\right)^{r_1} \left(\frac{r_2}{j}\right)^{r_2} C(\mathcal{M}(k-1), r_1)\, I(r_2)$.
  end for
end for
STEP 6. Calculate the stochastic complexity (NML code-length).
We define a generating function $J(z)$ by

$$J(z) \stackrel{\mathrm{def}}{=} \sum_{n \ge 0} \frac{n^n}{n!} I(n)\, z^n.$$

The $K$th power of $J(z)$ is calculated as follows:

$$(J(z))^K = \sum_{n \ge 0} \left(\sum_{h_1 + \cdots + h_K = n} \prod_{k=1}^{K} \frac{h_k^{h_k}}{h_k!} I(h_k)\right) z^n = \sum_{n \ge 0} \frac{n^n}{n!} \left(\sum_{h_1 + \cdots + h_K = n} \frac{n!}{h_1! \cdots h_K!} \prod_{k=1}^{K} \left(\frac{h_k}{n}\right)^{h_k} I(h_k)\right) z^n = \sum_{n \ge 0} \frac{n^n}{n!} C(\mathcal{M}(K), n)\, z^n,$$

and the $(K+1)$th power of $J(z)$ is

$$(J(z))^{K+1} = J(z) \cdot (J(z))^K = \sum_{n \ge 0} \left(\sum_{r_1 + r_2 = n} \frac{r_1^{r_1}}{r_1!} C(\mathcal{M}(K), r_1)\, \frac{r_2^{r_2}}{r_2!} I(r_2)\right) z^n = \sum_{n \ge 0} \frac{n^n}{n!} \left(\sum_{r_1 + r_2 = n} \binom{n}{r_1} \left(\frac{r_1}{n}\right)^{r_1} \left(\frac{r_2}{n}\right)^{r_2} C(\mathcal{M}(K), r_1)\, I(r_2)\right) z^n = \sum_{n \ge 0} \frac{n^n}{n!} C(\mathcal{M}(K+1), n)\, z^n.$$

Thus a recursion formula for the normalization term $C(\mathcal{M}(K), n)$ is given by

$$C(\mathcal{M}(K+1), n) = \sum_{r_1 + r_2 = n} \binom{n}{r_1} \left(\frac{r_1}{n}\right)^{r_1} \left(\frac{r_2}{n}\right)^{r_2} C(\mathcal{M}(K), r_1)\, I(r_2).$$
We use this recursion formula to calculate the normalization term (7) as in Algorithm 1. The computational complexity is $O(n^2 K)$: we use formula (8) to compute $C(\mathcal{M}(K), n)$ from $C(\mathcal{M}(K-1), r_1)$ $(r_1 = 1, \dots, n)$ in $O(n)$ time, and since all $C(\mathcal{M}(k), j)$ $(k = 1, \dots, K,\ j = 1, \dots, n)$ are calculated at STEP 5 of Algorithm 1, we can compute $C(\mathcal{M}(K), n)$ in $O(n^2 K)$ time.
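A possible log-domain implementation of Algorithm 1 is sketched below; it is ours, not the authors' code. It assumes the constant $B$ as reconstructed in Section II-B, adopts the conventions $0^0 = 1$, $I(0) = 1$, $C(\cdot, 0) = 1$ implicit in STEP 3, and simply excludes cluster sizes $0 < h \le m$, for which $\Gamma_m((h-1)/2)$ is undefined; `logsumexp` keeps the convolution in (8) numerically stable.

```python
import numpy as np
from scipy.special import gammaln, multigammaln, logsumexp

def log_I(h, m, lam_min, R):
    """log I(h) below Eq. (7); I(0) = 1, and clusters with 0 < h <= m are excluded."""
    if h == 0:
        return 0.0
    if h <= m:
        return -np.inf                      # Gamma_m((h-1)/2) undefined: drop these splits
    log_B = ((m + 1) * np.log(2.0) + 0.5 * m * np.log(R)
             - 0.5 * m * m * np.log(lam_min) - (m + 1) * np.log(m) - gammaln(0.5 * m))
    return log_B + 0.5 * m * h * np.log(h / (2.0 * np.e)) - multigammaln(0.5 * (h - 1), m)

def log_C_gmm(K, n, m, lam_min, R):
    """log C(M(K), n) via the O(n^2 K) recursion (8) of Theorem 3.1 / Algorithm 1."""
    logI = np.array([log_I(h, m, lam_min, R) for h in range(n + 1)])
    logC = logI.copy()                      # STEP 4: C(M(1), j) = I(j); C(M(1), 0) = 1
    logfact = gammaln(np.arange(n + 1) + 1.0)   # log j!, for the binomial coefficients
    for _ in range(2, K + 1):               # STEP 5: k = 2, ..., K
        new = np.empty(n + 1)
        new[0] = 0.0                        # C(M(k), 0) = 1
        for j in range(1, n + 1):
            r1 = np.arange(j + 1)
            r2 = j - r1
            with np.errstate(divide="ignore", invalid="ignore"):
                terms = (logfact[j] - logfact[r1] - logfact[r2]
                         + np.where(r1 > 0, r1 * np.log(r1 / j), 0.0)
                         + np.where(r2 > 0, r2 * np.log(r2 / j), 0.0)
                         + logC[r1] + logI[r2])
                new[j] = logsumexp(terms)
        logC = new
    return logC[n]
```

Since identity (8) is purely combinatorial, the same recursion can be sanity-checked against a direct evaluation of (7) for small $n$ and $K$ with arbitrary nonnegative values of $I(\cdot)$.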
Then, the stochastic complexity of $x^n$ relative to $\mathcal{M}(K)$ is calculated as follows:

$$SC(x^n, z^n : \mathcal{M}(K)) = -\log f_{\mathrm{NML}}(x^n, z^n \mid \mathcal{M}(K)) = -\log f(x^n, z^n; \hat{\mu}(x^n, z^n, \mathcal{M}(K)), \hat{\Sigma}(x^n, z^n, \mathcal{M}(K))) + \log C(\mathcal{M}(K), n).$$

On the basis of the MDL principle, we select the number $K$ of clusters which minimizes $SC(x^n, z^n : \mathcal{M}(K))$ for given $x^n$.
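To make the selection rule concrete, here is a hypothetical end-to-end sketch of ours: given hard assignments $z^n$ from EM for each candidate $K$, evaluate the complete-data likelihood term of Eq. (6) in closed form, add $\log C(\mathcal{M}(K), n)$ from the previous sketch, and pick the minimizer. All function names are illustrative.

```python
import numpy as np

def neg_log_complete_likelihood(X, z, K):
    """-log f(x^n, z^n; MLE), the numerator of Eq. (6) under hard assignments z in {0, ..., K-1}.
    At the per-cluster MLEs the quadratic terms sum to m*h_k, as in Section II-B.
    Assumes m >= 2 and every cluster has more than m points."""
    n, m = X.shape
    total = 0.0
    for k in range(K):
        Xk = X[z == k]
        h = len(Xk)                                   # h_k
        Sigma_k = np.cov(Xk.T, bias=True)             # per-cluster MLE covariance
        _, logdet = np.linalg.slogdet(Sigma_k)
        total += h * np.log(h / n) - 0.5 * h * (m * np.log(2.0 * np.pi) + logdet + m)
    return -total

def select_num_clusters(X, labels_by_K, lam_min, R):
    """Return the K minimizing SC(x^n, z^n : M(K)); labels_by_K maps K to EM assignments."""
    n, m = X.shape
    sc = {K: neg_log_complete_likelihood(X, z, K) + log_C_gmm(K, n, m, lam_min, R)
          for K, z in labels_by_K.items()}            # log_C_gmm from the previous sketch
    return min(sc, key=sc.get)
```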
IV. EXPERIMENTAL RESULTS
A. Empirical Results Using Artificial Data Sets
This section gives empirical results showing the validity of the NML for GMMs. We generated a number of data sequences of size $n$ according to a true Gaussian mixture model $\mathcal{M}$ of mixture size $K$. Each mixture component is a Gaussian distribution with mean $\mu_k$ and variance-covariance matrix $\Sigma_k$ $(k = 1, \dots, K)$. For each data sequence $x^n$ generated according to the true model $\mathcal{M}$, we also generated the corresponding cluster indexes $z^n$ using the EM algorithm [2], where $z_i$ shows which cluster $x_i$ comes from $(i = 1, \dots, n)$. In our experiment, we repeated the cluster generation using the EM algorithm 100 times by changing the initial values of the algorithm. We compared three criteria for the choice of the number of clusters: NML, Akaike's Information Criterion (AIC) [1], and the Bayesian Information Criterion (BIC) [11]. NML is calculated according to the method proposed in the previous sections. AIC and BIC are calculated as follows:
$$\mathrm{AIC}(x^n, z^n \mid \mathcal{M}(K)) = -2\log f(x^n, z^n; \hat{\mu}(x^n, z^n, \mathcal{M}(K)), \hat{\Sigma}(x^n, z^n, \mathcal{M}(K))) + m(m+3)K + K,$$

$$\mathrm{BIC}(x^n, z^n \mid \mathcal{M}(K)) = -2\log f(x^n, z^n; \hat{\mu}(x^n, z^n, \mathcal{M}(K)), \hat{\Sigma}(x^n, z^n, \mathcal{M}(K))) + \frac{m(m+3)}{2}\sum_{k=1}^{K}\log h_k + K\log n.$$
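For completeness, the two baselines can be scored directly from the formulas as printed above (the penalty terms follow our reading of the garbled text); a small sketch of ours with hypothetical helper names:

```python
import numpy as np

def aic_score(neg2_loglik, m, K):
    """AIC as written above: -2 log f + m(m+3)K + K."""
    return neg2_loglik + m * (m + 3) * K + K

def bic_score(neg2_loglik, m, h):
    """BIC as written above: -2 log f + (m(m+3)/2) * sum_k log h_k + K log n,
    where h = (h_1, ..., h_K) are the cluster sizes and n = sum(h)."""
    h = np.asarray(h, dtype=float)
    return neg2_loglik + 0.5 * m * (m + 3) * np.log(h).sum() + len(h) * np.log(h.sum())
```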
We measured their performance in terms of the identification probability $P(K)$ and the benefit $B(K)$, defined as follows. Letting $K$ be the true number of clusters and $K^{*}$ be the one chosen using a given criterion,

$$P(K) = \mathrm{Prob}(K^{*} = K), \qquad B(K) = \max\left\{0,\ 1 - \frac{|K^{*} - K|}{T}\right\},$$

where $T$ is a given constant. The identification probability $P(K)$ is the probability that the true number of clusters is chosen by the algorithm. The benefit is a score assigned to $K^{*}$ so that it takes the maximum value 1 if $K^{*} = K$, and it decreases linearly to zero as $|K^{*} - K|$ increases to $T$. The benefit is calculated as the average of the benefits taken over all random generations. We compared NML, AIC, and BIC in terms of how fast the identification probability and the benefit converge as the sample size $n$ increases.
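The two evaluation measures are straightforward to compute from repeated trials; a short sketch of ours follows, where the tolerance $T$ is a placeholder since its value is not stated in the text.

```python
import numpy as np

def identification_probability(K_estimates, K_true):
    """P(K): fraction of trials in which the estimated number of clusters equals the true one."""
    return float(np.mean(np.asarray(K_estimates) == K_true))

def average_benefit(K_estimates, K_true, T=3.0):
    """B(K) averaged over trials: mean of max(0, 1 - |K* - K| / T); T = 3 is only a placeholder."""
    diff = np.abs(np.asarray(K_estimates, dtype=float) - K_true)
    return float(np.mean(np.maximum(0.0, 1.0 - diff / T)))
```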
Figure IV.1 shows the results in the case of $K = 3$ and $m = 3$, where $K$ is the true number of clusters and $m$ is the dimension of a datum. We observe from Figure IV.1 that the number of clusters chosen by NML converges significantly faster than those chosen by AIC and BIC. For example, the least sample size required for $P(K) \ge 0.9$ is approximately 200 for NML, while it is approximately 1500 for BIC. We further observe from Figure IV.2 that the number of clusters chosen by NML becomes very close to the true number of clusters with high probability.

Tables I and II show the results for a variety of $K$ and $m$. We observe that the probability of correct identification for NML is larger than that of BIC over the variety of $K$ and $m$. We see from these results that NML gives the best strategy for the choice of the number of clusters.
[Fig. IV.1. $P(K)$ of the artificial data which has $K = 3$, $m = 3$: identification probability versus sample size (0 to 2000) for NML, AIC, and BIC.]

[Fig. IV.2. $B(K)$ of the artificial data which has $K = 3$, $m = 3$: benefit versus sample size (0 to 2000) for NML, AIC, and BIC.]
TABLE I
THE SAMPLE SIZE AT WHICH P(K) IS OVER 80% (NML)

          m = 3   m = 5   m = 10
K = 3       140     240      700
K = 4       180     360     1000
K = 5       240     500     1800
K = 6       360     600     3000

TABLE II
THE SAMPLE SIZE AT WHICH P(K) IS OVER 80% (BIC)

          m = 3   m = 5   m = 10
K = 3       800     700      240
K = 4       900    1400      500
K = 5      1200    1200     1400
K = 6      1600    2000     3000
B. Empirical Results Using Benchmark Data Sets

We utilized two benchmark data sets, the Blood Transfusion Service Center Data Set and the Number Image Data Set, which we describe in detail below. Both were prepared as benchmark data sets for classification problems. For each data set, a label is assigned to each datum. Without knowing these labels in advance, we estimated the label for each datum through clustering.

Blood Transfusion Service Center Data Set [14]: This data set has 4 attributes and 2 clusters. It was extracted from the donor database of the Blood Transfusion Service Center in Hsin-Chu City in Taiwan.

The Number Image Data Set: This data set has 10 attributes and 3 clusters. It consists of images which represent the numbers from 0 to 2.

Table III and Table IV show the average of the numbers of clusters estimated by NML, AIC, and BIC for the first data set and the second one, respectively, where the average is taken over 100 trials of cluster generation using the EM algorithm for the first data set and over 10 trials for the second. In both cases, we see that only NML is able to successfully identify the true number of clusters. This implies that our method of NML computation works significantly better than the other criteria, AIC and BIC.
TABLE III
BLOOD TRANSFUSION SERVICE CENTER DATA SET

              NML    AIC    BIC
Average K    2.00   4.89   4.84

TABLE IV
THE NUMBER IMAGE DATA SET

              NML    AIC    BIC
Average K    3.00   7.00   7.00
V. CONCLUSION

We have proposed a new method of efficient computation of approximate NML code-lengths for Gaussian mixture models. We have derived an approximation of the NML for Gaussian distributions and a recursion formula for computing the NML for GMMs, which enables us to compute the NML for a GMM in $O(n^2 K)$ time. In the experiments using artificial data sets, the number of clusters chosen by NML converged significantly faster than those chosen by AIC and BIC. In the experiments using benchmark data sets, it turned out that NML is able to identify the true number of clusters with high probability. It remains for future study how to choose $R$ and $\lambda_{\min}$ depending on the sample size.
VI. ACKNOWLEDGEMENTS

This work was supported by the Microsoft CORE6 project, NTT Corporation, Hakuhodo Corporation, and a Grant-in-Aid for Scientific Research (A).
REFERENCES

[1] H. Akaike: Information theory and an extension of the maximum likelihood principle. Proceedings of the Second International Symposium on Information Theory, Budapest, Akademiai Kiado, pp. 267-281, 1973.
[2] A.P. Dempster, N.M. Laird, and D.B. Rubin: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, Vol. 39, pp. 1-38, 1977.
[3] C. Fraley and A.E. Raftery: How many clusters? Which clustering method? Answers via model-based cluster analysis. Computer Journal, 41(8):578-588, 1998.
[4] P.D. Grünwald: The Minimum Description Length Principle. MIT Press, Cambridge, 2007.
[5] P. Kontkanen and P. Myllymäki: A linear-time algorithm for computing the multinomial stochastic complexity. Information Processing Letters, 103:227-233, 2007.
[6] P. Kontkanen and P. Myllymäki: An empirical comparison of NML clustering algorithms. Proceedings of the 2008 International Conference on Information Theory and Statistical Learning (ITSL-08), M. Dehmer, M. Drmota, and F. Emmert-Streib, Eds., pp. 125-131, CSREA Press, 2008.
[7] P. Luosto: Code lengths for model classes with continuous uniform distributions. Proceedings of the Workshop on Information-Theoretic Methods for Science and Engineering (WITMSE 2010), 2010.
[8] G. McLachlan: Finite Mixture Models. John Wiley & Sons, 2000.
[9] J. Rissanen: Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42(1):40-47, January 1996.
[10] J. Rissanen: Information and Complexity in Statistical Modeling. Springer, 2007.
[11] G. Schwarz: Estimating the dimension of a model. Annals of Statistics, 6(2):461-464, 1978.
[12] Yu.M. Shtarkov: Universal sequential coding of single messages. Problems of Information Transmission, 23(3):3-17, July-September 1987.
[13] P. Smyth: Probabilistic model-based clustering of multivariate and sequential data. Proceedings of the Seventh International Conference on Artificial Intelligence and Statistics, pp. 299-304, Morgan Kaufmann, 1999.
[14] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html