
Fourth International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies, 30 Aug - 1 Sept 2000, Brighton, UK

The Kernel Self Organising Map


Donald MacDonald and Colin Fyfe
Applied Computational Intelligence Research Unit,
The University of Paisley,
Scotland.
email: {macdo-ci0, fyfe-ci0}@paisley.ac.uk

Abstract

We review a recently developed method of performing k-means clustering in a high dimensional feature space and extend it to give the resultant mapping topology preserving properties. We show the results of the new algorithm on the standard data set, random numbers drawn uniformly from [0,1)^2, and on the Olivetti database of faces. The new algorithm converges extremely quickly.

Introduction

The use of kernels in unsupervised learning has become popular, particularly in the field of Kernel Principal Component Analysis (KPCA) [6, 5, 4]. The method has recently been extended to other unsupervised techniques, e.g. Kernel Principal Factor Analysis, Kernel Exploratory Projection Pursuit and Kernel Canonical Correlation Analysis [1]. In this paper, we extend the method and create a Kernel equivalent of the Self Organising Map of Kohonen [2].

The set of methods known under the generic title of Kernel Methods use a nonlinear mapping to map data into a high dimensional feature space in which linear operations are performed. This gives us the computational advantages of linear methods but also the representational advantages of nonlinear methods. The result is a very efficient method of performing nonlinear operations on a data set. In more detail, let \phi(x) be the nonlinear function which maps the data into the feature space, F. Then in F, we can define a matrix, K, in terms of a dot product in that space, i.e. K(i,j) = \phi(x_i) \cdot \phi(x_j). Typically we select the matrix K based on our knowledge of the properties of the matrix rather than any knowledge of the function \phi(). The kernel trick allows us to define every operation in feature space in terms of the kernel matrix rather than the nonlinear function \phi().
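As a concrete illustration of the kernel trick, the following sketch (our own illustrative example in Python/numpy, assuming a Gaussian/RBF kernel, which the paper does not prescribe) builds the kernel matrix K directly from the data; the nonlinear map \phi is never evaluated explicitly.

    import numpy as np

    def rbf_kernel_matrix(X, sigma=1.0):
        # K[i, j] = k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)).
        # Only dot products in feature space are ever needed, and the
        # kernel function supplies them without forming phi(x).
        sq_norms = np.sum(X ** 2, axis=1)
        sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
        return np.exp(-sq_dists / (2.0 * sigma ** 2))

    X = np.random.rand(100, 2)            # e.g. 100 points from the unit square
    K = rbf_kernel_matrix(X, sigma=0.5)   # 100 x 100 kernel matrix K(i, j)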

The Kohonen Feature Map

The interest in feature maps stems directly from their biological importance. A feature map uses the physical layout of the output neurons to model some feature of the input space. In particular, if two inputs x_1 and x_2 are close together with respect to some distance measure in the input space, and they cause output neurons y_a and y_b to fire respectively, then y_a and y_b must be close together in some layout of the output neurons. Further, we can state that the opposite should hold: if y_a and y_b are close together in the output layer, then those inputs which cause y_a and y_b to fire should be close together in the input space. When these two conditions hold, we have a feature map. Such maps are also called topology preserving maps. There are several ways of creating feature maps; the most popular is Kohonen's. Kohonen's algorithm is exceedingly simple: the network is a 2-layer network and competition takes place between the output neurons; however, now not only are the weights into the winning neuron updated but also the weights into its neighbours. Kohonen defined a neighbourhood function \Lambda(i, i^*) of the winning neuron i^*. The neighbourhood function is a function of the distance between i and i^*. A typical function is the Difference of Gaussians function; thus if unit i is at point r_i in the output layer then

\Lambda(i, i^*) = a \exp\left(-\frac{|r_i - r_{i^*}|^2}{2\sigma^2}\right) - b \exp\left(-\frac{|r_i - r_{i^*}|^2}{2\sigma_1^2}\right)   (1)

Kohonen feature maps can take a long while to converge, a problem which the Kernel SOM solves.
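For comparison with the kernel version developed below, here is a minimal sketch of the classical Kohonen update just described, with a difference-of-Gaussians neighbourhood in the spirit of equation (1); the function names and parameter values are illustrative choices, not taken from the paper.

    import numpy as np

    def dog_neighbourhood(r, r_winner, a=1.0, b=0.5, sigma=1.0, sigma1=2.0):
        # Difference-of-Gaussians neighbourhood Lambda(i, i*) as in equation (1);
        # a, b, sigma and sigma1 are illustrative values.
        d2 = np.sum((r - r_winner) ** 2, axis=-1)
        return a * np.exp(-d2 / (2 * sigma ** 2)) - b * np.exp(-d2 / (2 * sigma1 ** 2))

    def som_step(W, grid, x, eta=0.1):
        # One step of the classical SOM: find the winner in input space, then
        # move the winner and its neighbours towards the input x.
        winner = np.argmin(np.sum((W - x) ** 2, axis=1))
        h = dog_neighbourhood(grid, grid[winner])
        W += eta * h[:, None] * (x - W)
        return W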

Kernel K-means Clustering

We will follow the derivation of [5], who have shown that the k-means algorithm can be performed in kernel space.

The aim is to find k means, m_a, so that each point is close to one of the means. As with KPCA, each mean may be described as lying in the manifold spanned by the observations, \phi(x_i), i.e. m_a = \sum_i \gamma_{a,i} \phi(x_i). The k-means algorithm chooses the means, m_a, to minimise the Euclidean distance between the points and the closest mean:

\|\phi(x) - m_a\|^2 = k(x, x) - 2 \sum_i \gamma_{a,i} k(x, x_i) + \sum_{i,j} \gamma_{a,i} \gamma_{a,j} k(x_i, x_j)

i.e. the distance calculation can be accomplished in kernel space by means of the K matrix alone.

Let M_{i,a} be the cluster assignment variable, i.e. M_{i,a} = 1 if \phi(x_i) is in the a-th cluster and 0 otherwise. [5] initialise the means to the first k training patterns; each new training point, \phi(x_{t+1}), t+1 > k, is then assigned to the closest mean and its cluster assignment variable is calculated using

M_{t+1,a} = \begin{cases} 1 & \text{if } \|\phi(x_{t+1}) - m_a\| < \|\phi(x_{t+1}) - m_p\|, \; \forall p \neq a \\ 0 & \text{otherwise} \end{cases}   (2)

In terms of the kernel function (noting that k(x, x) is common to all calculations) we have

M_{t+1,a} = \begin{cases} 1 & \text{if } \sum_{i,j} \gamma_{a,i} \gamma_{a,j} k(x_i, x_j) - 2 \sum_i \gamma_{a,i} k(x_{t+1}, x_i) < \sum_{i,j} \gamma_{p,i} \gamma_{p,j} k(x_i, x_j) - 2 \sum_i \gamma_{p,i} k(x_{t+1}, x_i), \; \forall p \neq a \\ 0 & \text{otherwise} \end{cases}   (3)
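Equation (3) translates directly into code. The sketch below (an illustrative Python/numpy fragment; the array names are ours) computes, for every mean m_a = \sum_i \gamma_{a,i} \phi(x_i), the squared distance to a point \phi(x_t) up to the common k(x_t, x_t) term, using only the kernel matrix.

    import numpy as np

    def feature_space_distances(K, gamma, t):
        # Squared distances ||phi(x_t) - m_a||^2, omitting the k(x_t, x_t) term
        # that is common to all means.  gamma is a (k, N) array of expansion
        # coefficients and K is the (N, N) kernel matrix.
        quad = np.einsum('ai,ij,aj->a', gamma, K, gamma)  # sum_ij gamma_ai gamma_aj k(x_i, x_j)
        cross = gamma @ K[:, t]                           # sum_i  gamma_ai k(x_t, x_i)
        return quad - 2.0 * cross

    # Hard assignment of point t to its closest mean, as in equations (2)-(3):
    # winner = np.argmin(feature_space_distances(K, gamma, t))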

We must then update the mean, m_a, to take account of the (t+1)-th data point,

m_a^{t+1} = m_a^{t} + \xi \left( \phi(x_{t+1}) - m_a^{t} \right)   (4)

where we have used the term m_a^{t+1} to designate the updated mean which takes into account the new data point, and

\xi = \frac{M_{t+1,a}}{\sum_{i=1}^{t+1} M_{i,a}}   (5)

which leads to an update equation for the expansion coefficients of

\gamma_{a,i}^{t+1} = (1 - \xi) \, \gamma_{a,i}^{t} \text{ for } i \le t, \qquad \gamma_{a,t+1}^{t+1} = \xi   (6)
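In code, the mean update never touches \phi itself; only the coefficients \gamma_{a,i} change. The following sketch (illustrative, with our own array conventions, and assuming the points are presented in index order) applies equations (4)-(6) for a single new point.

    import numpy as np

    def update_mean(gamma, M, a, t):
        # Update the a-th mean for the new point phi(x_t).
        # gamma : (k, N) coefficients with m_a = sum_i gamma[a, i] phi(x_i)
        # M     : (N, k) cluster assignment variables, filled up to row t
        xi = M[t, a] / np.sum(M[:t + 1, a])   # learning rate xi, equation (5)
        gamma[a, :] *= (1.0 - xi)             # shrink the old coefficients, equation (6)
        gamma[a, t] += xi                     # weight placed on phi(x_t)
        return gamma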

The Kernel Self Organising Map

Now the SOM algorithm is a k-means algorithm with an attempt to distribute the means in an organised manner, and so the first change to the above algorithm is to update the closest neuron's weights and those of its neighbours. Thus we find the winning neuron (the closest in feature space) as above, but now instead of (3) we use

M_{t+1,p} = \Lambda(a, p)   (7)

where a is the identifier of the closest neuron. The rest of the algorithm can be performed as before. However there is one difficulty with this: the SOM requires a great number of iterations for convergence and, since \xi = M_{t+1,a} / \sum_{i=1}^{t+1} M_{i,a}, this leads naturally to \xi \to 0 over time. To obviate this problem, we initially select a number of centres, k, and train the centres with one pass through the data set in a random order. We then have a partially ordered set of centres. We now reset all values of M_{i,a} to zero and perform a second pass through the data set, typically also decreasing the width of the neighbourhood function as with the normal Kohonen SOM.
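Putting the pieces together, a single pass of the Kernel SOM might look as follows. This is a sketch under stated assumptions, not the authors' code: it uses a Gaussian neighbourhood where the paper only requires some neighbourhood function \Lambda, it assumes the coefficients gamma have already been initialised on the first k training points, and the variable names are ours. The paper's two-pass scheme corresponds to calling this twice, resetting the assignment sums and narrowing sigma between the passes.

    import numpy as np

    def kernel_som_pass(K, gamma, grid, sigma):
        # K     : (N, N) kernel matrix of the training data
        # gamma : (k, N) coefficients of the centres in feature space
        # grid  : (k, d) positions of the centres in the output layer
        n, k = K.shape[0], gamma.shape[0]
        cum = np.full(k, 1e-12)               # running sums of assignments per centre
        for t in np.random.permutation(n):
            # winning centre in feature space, equations (8)-(9)
            quad = np.einsum('ai,ij,aj->a', gamma, K, gamma)
            cross = gamma @ K[:, t]
            a = np.argmin(quad - 2.0 * cross)
            # neighbourhood assignment instead of a hard 0/1 value, equation (7)
            d2 = np.sum((grid - grid[a]) ** 2, axis=1)
            m_t = np.exp(-d2 / (2.0 * sigma ** 2))
            # move every centre towards phi(x_t), equations (4)-(6)
            cum += m_t
            xi = m_t / cum
            gamma *= (1.0 - xi)[:, None]
            gamma[:, t] += xi
        return gamma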

Simulations

We report on three simulations. The first simulation is on the standard data set for topology preserving mappings: points drawn iid from the unit square in two dimensions. This data set is chosen since it is very easy to interpret the results. We randomly select 100 points from the unit square [0,1)^2 and use the KSOM algorithm for only 10 iterations. In Figure 1, we show which centre would be the winning centre for a grid of points evenly distributed over this unit square. We see that a topology preserving mapping has been found. One centre (number 4) does not appear on the grid because its area of being closest does not include one of the grid points. To select the centre (or mean) which is closest to a grid point, we again work in feature space and calculate

a = \arg\min_p \|\phi(x) - m_p\|^2   (8)

which in terms of the kernel function may be written as

a = \arg\min_p \left[ \sum_{i,j} \gamma_{p,i} \gamma_{p,j} k(x_i, x_j) - 2 \sum_i \gamma_{p,i} k(x, x_i) \right]   (9)
Notice that this grid was obtained with a one
dimensional neighbourhood function in feature
space.
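In code, labelling the evaluation grid amounts to evaluating equation (9) for every test point; the fragment below is an illustrative sketch with our own naming (K_test is assumed to hold the kernel values between the training points and the grid points).

    import numpy as np

    def winning_centres(K, K_test, gamma):
        # K      : (N, N) training kernel matrix
        # K_test : (N, T) kernel values k(x_i, x'_t) for T test/grid points
        # gamma  : (k, N) coefficients of the centres
        quad = np.einsum('ai,ij,aj->a', gamma, K, gamma)   # per-centre constant term
        cross = gamma @ K_test                             # (k, T)
        return np.argmin(quad[:, None] - 2.0 * cross, axis=0)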
The second simulation uses the same method
on artificial data drawn from two concentric
circles. Not only is the topology preservation
maintained on the data set, the two circles are
readily separated in feature space (Figure 2).
The first 9 nodes capture the inner circle, the
others the outer.
The third data set is the Olivetti database of faces [3], which is composed of 6 individuals, each in 10 different poses, against a dark background with no preprocessing.
Results are shown in Figure 3 in which the
numbers denote which individual was identified by the corresponding node on the graph.
Note that this time the grid is a grid showing
the mapping in feature space and we are displaying on the graph the identity of a person
in image space. We see that the faces have
been very clearly grouped into clusters each of
which is specific to a particular individual.

References

[1] D. Charles, C. Fyfe, P. L. Lai, D. MacDonald, and R. Wipal. Unsupervised learning using radial kernels. (submitted).

[2] Teuvo Kohonen. Self-Organising Maps. Springer, 1995.

[3] F. Samaria and A. Harter. Parameterisation of a stochastic model for human face identification. In 2nd IEEE Workshop on Applications of Computer Vision, 1994.

[4] B. Scholkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Muller, G. Ratsch, and A. J. Smola. Input space vs feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10:1000-1017, 1999.

[5] B. Scholkopf, A. Smola, and K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299-1319, 1998.

[6] A. J. Smola, O. L. Mangasarian, and B. Scholkopf. Sparse kernel feature analysis. Technical Report 99-04, University of Wisconsin Madison, 1999.



Figure 1: The grid of points was not shown to the KSOM during training but is used to identify which node is closest to each point in feature space. The number at each point on the grid identifies the winning neuron in feature space. There is a clear topographic ordering of the data.


Figure 2: The KSOM identifies the two concentric data sets.


Figure 3: Each individual is identified by an integer 1, ..., 6. The nodes are arranged in a two dimensional grid as they were during training. We see that each individual person is identified by a specific region of feature space.
