Abstract
We review a recently developed method of performing k-means clustering in a high dimensional feature space and extend it to give the resultant mapping topology preserving properties. We show the results of the new algorithm on the standard data set, random numbers drawn uniformly from [0, 1)^2, and on the Olivetti database of faces. The new algorithm converges extremely quickly.
Introduction
Kohonen feature maps can take a long while to converge, a problem which the Kernel SOM solves.
Now the SOM algorithm is a k-means algorithm with an attempt to distribute the means in an organised manner, and so the first change to the above algorithm is to update the closest neuron's weights and those of its neighbours. Thus we find the winning neuron (the closest in feature space) as above, but now, instead of (3), we use

$$
M_{a,i} \leftarrow
\begin{cases}
\zeta\,\Lambda(a, a^{*}) & \text{if } i = t+1,\\
M_{a,i}\,\bigl(1 - \zeta\,\Lambda(a, a^{*})\bigr) & \text{otherwise,}
\end{cases}
\qquad (4)
$$

where the means are written as $m_a = \sum_i M_{a,i}\,\phi(x_i)$, $a^{*}$ is the winning neuron for the new sample $x_{t+1}$, and $\Lambda(a, a^{*})$ is the neighbourhood function, so that every mean is drawn towards $\phi(x_{t+1})$ in proportion to its neighbourhood weight.
In terms of the kernel function (noting that $k(x, x)$ is common to all calculations), the winning neuron $a^{*}$ is the one which minimises

$$
\sum_{i,j} M_{a,i}\,M_{a,j}\,k(x_i, x_j) \;-\; 2\sum_{i} M_{a,i}\,k(x_{t+1}, x_i).
$$
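As a concrete illustration, the following minimal NumPy sketch computes this winner. The Gaussian kernel, the coefficient array `M` and the function names are illustrative assumptions rather than anything specified in the text.

```python
import numpy as np

def gaussian_kernel(A, B, width=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * width^2)) for every pair of rows of A and B."""
    # A Gaussian kernel is assumed here; this excerpt does not fix the kernel choice.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * width ** 2))

def winning_neuron(x_new, M, X, width=1.0):
    """Index of the mean m_a = sum_i M[a, i] * phi(x_i) that is closest to phi(x_new).

    The k(x_new, x_new) term is identical for every mean, so it is dropped.
    """
    K = gaussian_kernel(X, X, width)                          # Gram matrix k(x_i, x_j)
    kx = gaussian_kernel(X, x_new[None, :], width)[:, 0]      # vector k(x_new, x_i)
    dist = np.einsum('ai,ij,aj->a', M, K, M) - 2.0 * M @ kx   # one value per mean
    return int(np.argmin(dist))
```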
The update of each mean has the form

$$
m_{t+1} = m_t + \zeta\,\bigl(\phi(x_{t+1}) - m_t\bigr),
$$

and this leads naturally to $\zeta \to 0$ over time. To obviate this problem, we initially select a number of centres, $k$, and train the centres with one pass through the data set in a random order. We then have a partially ordered set of centres. We now reset all values of $M_{a,i}$ to zero and perform a second pass through the data set, typically also decreasing the width of the neighbourhood function as with the normal Kohonen SOM.
The squared distance in feature space which determines the winner can itself be written entirely in terms of the kernel:

$$
\|\phi(x) - m_a\|^{2} = k(x, x) \;-\; 2\sum_{i} M_{a,i}\,k(x, x_i) \;+\; \sum_{i,j} M_{a,i}\,M_{a,j}\,k(x_i, x_j).
$$
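Continuing the sketch above, one possible form of the two-pass training loop is given below. The kernel width, the one-dimensional chain of neurons and the linearly annealed learning rate and neighbourhood width are all assumed values, and the reset of the coefficients between the two passes described in the text is omitted for simplicity, so this is an approximation of the procedure rather than the authors' implementation.

```python
def train_ksom(X, k=20, width=1.0, passes=2, zeta0=0.5, sigma0=None, seed=0):
    """Kernel SOM sketch: each mean is m_a = sum_i M[a, i] * phi(x_i)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sigma0 = k / 2.0 if sigma0 is None else sigma0            # assumed initial neighbourhood width
    K = gaussian_kernel(X, X, width)                          # Gram matrix, computed once
    M = np.zeros((k, n))
    M[np.arange(k), rng.choice(n, size=k, replace=False)] = 1.0   # start each mean at a data point
    chain = np.arange(k)                                      # one-dimensional arrangement of neurons
    total, step = passes * n, 0
    for _ in range(passes):
        for t in rng.permutation(n):                          # one pass through the data in random order
            zeta = zeta0 * (1.0 - step / total)               # learning rate decays towards zero
            sigma = sigma0 * (1.0 - 0.9 * step / total)       # neighbourhood width shrinks
            dist = np.einsum('ai,ij,aj->a', M, K, M) - 2.0 * M @ K[:, t]
            a_star = int(np.argmin(dist))                     # winning neuron for x_t
            lam = np.exp(-(chain - a_star) ** 2 / (2.0 * sigma ** 2))
            # m_a <- m_a + zeta * lam_a * (phi(x_t) - m_a), written on the coefficients
            M *= (1.0 - zeta * lam)[:, None]
            M[:, t] += zeta * lam
            step += 1
    return M
```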
Simulations
We report on three simulations. The first simulation is on the standard data set for topology preserving mappings - points are drawn iid
from the unit square in two dimensions. This
data set is chosen since it makes the results very easy to interpret. We randomly select 100 points from the unit square [0, 1)^2 and use the KSOM
algorithm for only 10 iterations. In Figure 1,
we show which centre would be the winning
centre for a grid of points evenly distributed
about this unit square. We see that a topology
preserving mapping has been found. One centre (number 4) does not appear on the grid because the region in which it is the closest centre does not contain any of the grid points. To select the centre (or mean) which is closest to a grid point, we again work in feature space and calculate the kernel form of the squared distance $\|\phi(x) - m_a\|^{2}$ given above for each centre, assigning the grid point to the centre for which this distance is smallest.
Notice that this grid was obtained with a one-dimensional neighbourhood function in feature space.
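For illustration, a sketch of how such a labelled grid could be produced from the coefficients returned by `train_ksom` above; the grid resolution, kernel width and the final usage lines are assumptions, not details taken from the paper.

```python
def label_grid(M, X, width=1.0, resolution=10):
    """Closest centre in feature space for each point of a regular grid over [0, 1)^2."""
    xs = np.linspace(0.0, 1.0, resolution, endpoint=False)
    grid = np.array([(gx, gy) for gy in xs for gx in xs])    # (resolution**2, 2) grid points
    K = gaussian_kernel(X, X, width)
    Kg = gaussian_kernel(grid, X, width)                     # k(grid point, x_i)
    quad = np.einsum('ai,ij,aj->a', M, K, M)                 # per-centre quadratic term
    dist = quad[None, :] - 2.0 * Kg @ M.T                    # (grid points, centres)
    return dist.argmin(axis=1).reshape(resolution, resolution)

# Example: 100 points drawn uniformly from the unit square, as in the first simulation
data = np.random.default_rng(1).random((100, 2))
labels = label_grid(train_ksom(data, k=20), data)
```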
The second simulation uses the same method
on artificial data drawn from two concentric
circles. Not only is the topology preservation maintained on this data set, but the two circles are also readily separated in feature space (Figure 2).
The first 9 nodes capture the inner circle, the
others the outer.
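One possible way to generate such a data set, continuing the earlier sketches (the radii, noise level and sample sizes are not stated in the text and are chosen here purely for illustration):

```python
def two_circles(n_per_circle=100, radii=(1.0, 3.0), noise=0.05, seed=0):
    """Points sampled from two noisy concentric circles centred at the origin."""
    rng = np.random.default_rng(seed)
    rings = []
    for r in radii:                                   # illustrative radii; not taken from the paper
        theta = rng.uniform(0.0, 2.0 * np.pi, n_per_circle)
        ring = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
        rings.append(ring + noise * rng.standard_normal(ring.shape))
    return np.concatenate(rings)
```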
The third data set is the Olivetti database of faces [3], which is composed of 6 individuals, each in 10 different poses against a dark background, and is used with no preprocessing.
Results are shown in Figure 3 in which the
numbers denote which individual was identified by the corresponding node on the graph.
Note that this time the grid shows the mapping in feature space, while the numbers displayed on the graph give the identity of a person in image space. We see that the faces have been very clearly grouped into clusters, each of which is specific to a particular individual.
References
[1] D. Charles, C. Fyfe, P. L. Lai, D. MacDonald, and R. Wipal. Unsupervised learning using Wid kernels. Submitted.
[2] Teuvo Kohonen. Self-Organizing Maps. Springer, 1995.
[3] F. Samaria and A. Harter. Parameterisation of a stochastic model for human face
identification. In 2nd IEEE Workshop on
Applications of Computer Vision, 1994.
[4] B. Scholkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Muller, G. Ratsch, and A. J. Smola. Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10:1000-1017, 1999.
[5] B. Scholkopf, A. Smola, and K.-R. Muller.
Nonlinear component analysis as a kernel
eigenvalue problem. Neural Computation,
10:1299-1319, 1998.
[6] A. J. Smola, O. L. Mangasarian, and B. Scholkopf. Sparse kernel feature analysis. Technical Report 99-04, University of Wisconsin-Madison, 1999.
Figure 3: Each individual is identified by an integer 1, ..., 6. The nodes are arranged in a two-dimensional grid as they were during training. We see that each individual person is identified by a specific region of feature space.