Imed CHIHI
January 1998
National School for Computer Sciences
Abstract

Kohonen neural nets are a kind of competitive net. The most commonly known variants are the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ). The former model uses unsupervised learning; the latter is an efficient classifier.

This paper tries to give, in simple words, a clear idea of the basics of competitive neural nets and competitive learning, with emphasis on SOMs and some of their real-world applications. It should not be considered an exhaustive reference, but a simple introduction.
1 Introduction

Competition is one of the most common neurocomputing paradigms. It was first used in Rosenblatt's networks and studied later by Grossberg (1960). To make a long story short, competition consists in letting neurons "fight" till one wins, and then in "rewarding" the winner.

2.1 Architecture

Competitive nets are structured (in their simplest form) as a couple of layers. One is for input; all it does is reflect the input patterns to the whole network, just like the input layer in Multi-Layer Perceptrons (MLPs). The second layer is for competition: each neuron receives some excitation from the input and produces some response. The response of a neuron c to the excitation varies depending on the weights w_ci, where i covers all the input layer neurons.

2.2 Learning and Competitive Dynamics

Each cell on the competition layer receives the same set of inputs. If an input pattern x is presented to the input layer, each neuron calculates its activation as follows:

    s = Σ_i w_i x_i

One of the neurons, say k, will have the highest activation. The lateral connections are the means by which competition occurs. That is, node k will get more and more active every time the vector x is presented. If this process is iterated with the same vector x, node k will strengthen its activation while the others will see their activation fall off to zero. Note that the lateral connections learn nothing; they are there just to specify the concurrency topology, that is, which neurons should get active together.

Figure 2. With competitive dynamics, the situation evolves in such a way that the winning cell gets more and more specialized for a given input pattern.

The dynamics are governed by the differential equation:

    dw_ij/dt = -w_ij(t) + f(Y_i)(X_j - w_ij)

The factor (X_j - w_ij) makes w_ij move close to X_j, while f(Y_i) ensures that a non-active neuron remains non-active, so that only the winning nodes gain more activation.

In the performance phase of a competitive learning network, the index of the winning neuron is typically the output of the network.

3 The Biological Background

It often occurs that sensory inputs may be mapped in such a way that it makes sense to talk of one stimulus being 'close to' another according to some metric property of the stimulus [4]. In the visual area 1 of the mammalian brain, cells are sensitive to the orientation of the presented pattern. That is, if a set of alternating black and white lines is presented, a particular cell will respond most strongly when the set has a particular orientation. The response falls off quickly as the lines are moved away from that preferred orientation. This was practically established by the work of Hubel and Wiesel (1962). For this closeness to make sense, one should define some metric, or measure, on the stimuli.

Competitive nets simulate this behavior by making each cell learn to recognize some particular input patterns. An interesting feature in common with biological nets is that cells trained to recognize the same pattern tend to come close to one another. In the visual cortex, cells tuned to the same orientation are placed vertically below each other, perpendicularly to the surface of the cortex.

4 Self-Organizing Maps

The SOM defines a mapping from the input data space R^n onto a two-dimensional array of nodes. To every node i, a reference vector m_i ∈ R^n is assigned; in competitive networks jargon this is called a codebook. An input vector x ∈ R^n is compared with the m_i, and the best match is defined as the "response": the input is thus mapped onto this location [2]. The best-matched node is the closest to vector x according to some metric.

    m_i = (w_i1, w_i2, ..., w_in)^T

Figure 3. Weights vector associated with a node i from the competition layer. This expression of m_i assumes that the input data space is n-dimensional.

We can say that the SOM is a projection of the probability density function of the high-dimensional input data space onto the two-dimensional display (the map). Let x be an input data vector; it may be compared with all the m_i in any metric. In practice, the Euclidean distances ||x - m_i|| are compared to pick the best-matched node (the one that minimizes the distance). Suppose now that the node closest to x is denoted by the subscript c [2].

During learning, the nodes that are topographically close in the array, up to a certain distance, will activate each other to learn from the same input. Kohonen has shown that suitable values of m_i are obtained as the limit of the following learning process (at each presentation of a new input learning vector, the m_i are updated):

    m_i(t + 1) = m_i(t) + h_ci(t)[x(t) - m_i(t)]

where t is the discrete-time coordinate. h_ci is the neighborhood kernel: a function defined over the competition layer's neurons, usually of the form h_ci(t) = h(||r_c - r_i||; t), where r_c and r_i ∈ R^2 are the radius vectors of nodes c and i in the array, respectively; it is thus a measure of the topographic distance between i and c. As that distance increases, h_ci → 0. The most used neighborhood distributions are the bubble and the Gaussian forms.
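These two kernel forms can be sketched in a few lines of Python; the grid coordinates and the radius and sigma values below are illustrative assumptions, not taken from any particular SOM implementation:

```python
import numpy as np

def bubble_kernel(r_c, r_i, radius):
    """Bubble kernel: constant inside a neighborhood of the winner c, zero outside."""
    d = np.linalg.norm(np.asarray(r_c, float) - np.asarray(r_i, float))
    return 1.0 if d <= radius else 0.0

def gaussian_kernel(r_c, r_i, sigma):
    """Gaussian kernel: decays smoothly with the grid distance to the winner c."""
    d2 = np.sum((np.asarray(r_c, float) - np.asarray(r_i, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# h_ci for a winner at grid position (2, 2) and a node at (4, 2):
print(bubble_kernel((2, 2), (4, 2), radius=1))   # 0.0: the node lies outside the bubble
print(gaussian_kernel((2, 2), (4, 2), sigma=2))  # ~0.61
```

The bubble form gives every node inside the neighborhood the full update and excludes the rest, while the Gaussian form shades the update smoothly with distance.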
Figure 5. Neighborhood boundaries, (a) hexagonal and (b) square.
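Putting the pieces together, here is a minimal sketch of the SOM learning process in Python. The map size, the decay schedules for the learning rate and the neighborhood width, and the random training data are all assumptions made for illustration, not part of any SOM package:

```python
import numpy as np

rng = np.random.default_rng(0)

rows, cols, n = 10, 10, 3            # a 10x10 map over a 3-dimensional input space
m = rng.random((rows * cols, n))     # reference vectors m_i (the codebook)
# grid positions r_i of the nodes in the two-dimensional array
r = np.array([(i, j) for i in range(rows) for j in range(cols)], float)

def train_step(x, m, r, alpha, sigma):
    """One presentation of input x: find the best-matching node c, then pull
    every m_i towards x, weighted by the Gaussian neighborhood kernel h_ci."""
    c = np.argmin(np.linalg.norm(m - x, axis=1))          # winner: minimizes ||x - m_i||
    h = np.exp(-np.sum((r - r[c]) ** 2, axis=1) / (2.0 * sigma ** 2))
    return m + alpha * h[:, None] * (x - m)               # m_i(t+1) = m_i(t) + h_ci [x - m_i]

for t in range(1000):
    x = rng.random(n)                # a new training vector at each step
    alpha = 0.5 * (1 - t / 1000)     # learning rate shrinks over time
    sigma = 1 + 4 * (1 - t / 1000)   # neighborhood width shrinks over time
    m = train_step(x, m, r, alpha, sigma)
```

Shrinking the neighborhood width over time is what lets the map first unfold globally and then fine-tune locally.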
5.2 LVQ2

The LVQ2 algorithm attempts to better approximate the decision boundaries by making adjustments to pairs of codewords. For each training vector x with class c_x, LVQ2 uses a nearest-neighbor selection scheme to choose the closest (winning) codeword w_w, with class c_w, and the second closest (runner-up) codeword w_r, with class c_r. If the class of x is different from the winning class c_w but the same as the runner-up class c_r, then the codewords are modified according to the following equations.

If c_x ≠ c_w, c_x = c_r and x falls within the "window":

    w_w(t + 1) = w_w(t) - α(t)[x(t) - w_w(t)]

    w_r(t + 1) = w_r(t) + α(t)[x(t) - w_r(t)]

For the above equations to apply, the training vector x must fall within the boundaries of a "window" defined in terms of the relative distances d_w and d_r from w_w and w_r respectively. This "window" criterion is defined below:

    min(d_w/d_r, d_r/d_w) > s

5.3 LVQ3

LVQ3 adds a self-stabilizing case to the LVQ2 rule. If the two closest codewords w_i and w_j belong to different classes, w_j having the class of x, and x falls within the window:

    w_i(t + 1) = w_i(t) - α(t)[x(t) - w_i(t)]

    w_j(t + 1) = w_j(t) + α(t)[x(t) - w_j(t)]

For k ∈ {i, j}, and c_x = c_i = c_j:

    w_k(t + 1) = w_k(t) + ε α(t)[x(t) - w_k(t)]

The scalar learning constant ε depends upon the size of the window. Reasonable values for ε range between 0.1 and 0.5. With LVQ3, the placement of the codewords does not change with extensive learning. It is said to be self-stabilizing [6].

6 Applications

6.1 SOMs in High Performance Computing

Tomas Nordström, from the Department of Systems Engineering and Mathematics, Luleå University of Technology, Sweden, is leading a team working on the design of parallel computer architectures to support neural network algorithms [7]. They are building a parallel computer called REMAP.
MasPar: uses between 1024 and 16384 PEs; achieves a peak performance of 1500 MFlops (on the 16k-PE version).
... the mappings seemed to "approach" the fault region (Figure 7.d).

Figure 7. The best-matching cell trajectory along the training. (a) For all the 3840 training patterns. (b) For vectors representing correct performance ...
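The trajectory in Figure 7 is simply the sequence of best-matching cells visited as successive measurement vectors are presented to a trained map. A minimal sketch in Python, where the codebook and the device-state vectors are random placeholders standing in for the real trained map and measurements:

```python
import numpy as np

def bmu_trajectory(samples, codebook):
    """Map each measurement vector to the index of its best-matching cell,
    yielding the kind of trajectory drawn in Figure 7."""
    return [int(np.argmin(np.linalg.norm(codebook - x, axis=1))) for x in samples]

rng = np.random.default_rng(1)
codebook = rng.random((100, 5))   # stands in for a trained 10x10 map over 5 parameters
samples = rng.random((8, 5))      # stands in for 8 consecutive device-state vectors
print(bmu_trajectory(samples, codebook))
```

A trajectory drifting towards cells labeled faulty is what signals the approaching fault region.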
Figure 8. Distribution of the weights' vectors per plane of components; (e) component plane 5. Cells marked with 'F' denote a faulty performance state.

Looking at Figure 8.e, we can deduce that when the parameter with subscript 5 in the input data vectors is high, the device is likely to operate in the sane region.

8 Conclusion

Competitive nets are a kind of hybrid, where a feedforward structure contains at least one layer with intra-layer recurrence. Each node is connected to a large neighborhood of surrounding nodes by negative or inhibitory weighted inputs, and to itself (and possibly a small local neighborhood) via positive or excitatory inputs. These lateral connections are usually fixed in magnitude and do not partake in the learning process. Each node is also connected to the previous layer by a set of inputs which have trainable weights of either sign. The point of the lateral recurrent links is to enhance an initial pattern of activity over the layer, resulting in the node which was most active being turned fully "on" and the others being turned "off"; hence the label "competitive" or "winner-take-all" being applied to these nets.

The object here is to encode groups of patterns in a 1-out-of-n code, where each class is associated with an active node in the winner-take-all layer. This mechanism, or some minor variant, is a key component in Grossberg's Adaptive Resonance Theory (ART), Fukushima's neocognitron and Kohonen's nets that develop topological feature maps.

Acknowledgment

I would like to thank Miss Fatma Chaker for her valuable help. This work was supervised by Dr. Rafik Brahem as part of a Master Degree course on Neural Networks at the National School for Computer Sciences, Tunisia.

References

[1] Jean-François Jodouin. Les réseaux neuromimétiques (pp. 111-130). Hermès, Paris, 1994.

[2] T. Kohonen, J. Hynninen, J. Kangas, J. Laaksonen. SOM_PAK: The Self-Organizing Map Program Package. Helsinki University of Technology, 1995.

[3] T. Kohonen, J. Hynninen, J. Kangas, J. Laaksonen. LVQ_PAK: The Learning Vector Quantization Program Package. Helsinki University of Technology, 1995.

[4] Kevin Gurney. Neural Nets. Psychology Department, University of Sheffield, 1996.

[5] Warren Sarle. comp.ai.neural-nets. Neural nets newsgroup, 1997.

[6] Edward Riegelsberger. Context-Sensitive Vowel Recognition Using the FSCL-LVQ Classifier. Master Degree thesis, Ohio State University, 1994.

[7] Tomas Nordström. Designing Parallel Computers for Self-Organizing Maps. Luleå University of Technology, Sweden, 1992.