
Artificial Neural Network Learning

---
A Comparative Review
Costas Neocleous
Higher Technical Institute, Cyprus
costas@ucy.ac.cy
Christos Schizas
University of Cyprus, Cyprus
schizas@ucy.ac.cy
This is an attempt to present an organized
review of learning techniques as used in
neural networks, classified according to
basic characteristics such as functionality,
applicability, chronology, etc.
Outline
The main objectives are:
To identify and appraise the important rules and to establish precedence.
To identify the basic characteristics of learning as applied to neural networks and propose a taxonomy.
To identify what is a generic rule and what is a special case.
To critically compare various learning procedures.
To gain a global overview of the subject area, and hence explore the possibilities for novel and more effective rules, or for novel implementations of the existing rules by applying them in new network structures or strategies.
To attempt a systematic organization and generalization of the various neural network learning rules. These have been implemented with different approaches or tools such as basic mathematics, statistics, logical structures, neural structures, information theory, evolutionary systems, artificial life, and heuristics.
Introduction
An abundance of learning rules and procedures exists, both in the
general ARTIFICIAL INTELLIGENCE context and in the specific
subfields of machine learning and neural networks.
Many of the rules can be identified as special cases of more
generalized ones. Their variation is usually minor; typically, they
are given a different name or simply use different terminology and
symbolism.
Introduction
Some learning procedures that will be reviewed are:
Hebbian-like learning:
Grossberg, Sejnowski, Sutton, Bienenstock, Oja & Karhunen,
Sanger, Yuile et al., Hasselmo, Kosko, Cheung & Omidvar, …
Reinforcement learning
Min-max learning
Stochastic learning
Genetics-based learning
Artificial life-based learning
Webster's dictionary:
"To learn is to gain knowledge, or understanding of, or skill in,
by study, instruction or experience."
Learning
Definitions:
In the general AI context:
Learning is a dynamical process by which a system,
responding to an environmental influence, reorganises itself
in such a manner that it becomes better at functioning in the
environment.
Learning
Can be unsupervised or supervised.
Unsupervised:
[Diagram: the ENVIRONMENT exerts an INFLUENCE on the LEARNING SYSTEM, taking it from an input state to an output state; the OUTCOME acts back on the ENVIRONMENT.]
The system learns if the output state is such that the
functioning of the system (in the environment) will be
more advantageous than that of the previous input state.
Supervised:
[Diagram: as above, but a TEACHER compares the output state with a desired state, and the comparison guides the LEARNING SYSTEM.]
The system learns if the output state is such that the system
will be more advantageous than in the previous input state.
Learning denotes changes in a system that are adaptive in the
sense that they enable the system to do the same task(s) drawn
from the same population more effectively the next time
Simon H. The Sciences of the Artificial. MIT Press, Cambridge, MA (1981)
Learning
In machine learning:
"Learning involves changes to the content and organization of a
system's knowledge, enabling it to improve its performance on
a particular task or set of tasks"
or
"A computational system learns from experience with respect
to a class of tasks and some performance measure, if its
performance for some task(s), as evaluated by the
performance measure, improves with experience"
Learning in artificial neural systems may be thought of as a
special case of machine learning
Learning in neural networks
In most neural network paradigms a somewhat restrictive
approach to learning is adopted:
learning is carried out by systematically modifying a set of suitable
controllable parameters, the so-called synaptic weights.
A more general approach to neural learning is proposed by
Haykin:
"Learning is a process by which the free parameters of a
neural network are adapted through a continuing process of
stimulation by the environment in which the network is
embedded. The type of learning is determined by the manner
in which the parameter changes take place."
Learning in neural networks
An alternative, more general approach:
"Learning is achieved through any change, in any characteristic of a neural
network, so that improved meaningful results are achieved."
Through:
Synaptic weight modification
Network structure modifications (creating or deleting neurons or synaptic connections)
Use of suitable attractors or other suitable stable state points
Learning through forgetting
Appropriate choice of activation functions
Modifying controllable parameters in a look-up table defining an activation scaling
Combinations of such rules (e.g. competitive learning systems)
Learning as optimization
The majority of learning rules are such that a desired objective is
met by a procedure of minimizing a suitable associated criterion
(also known as a computational energy, Lyapunov function, or
Hamilton function), whenever such a criterion exists or may be
constructed, in a manner similar to optimization procedures.
Learning as optimization
Many methods have been proposed for the implementation of the
desired minimization, such as:
0th order
1st order gradient descent (Newton's, steepest descent)
Damped Newton (Levenberg-Marquardt)
Quasi-Newton (Broyden-Fletcher-Goldfarb-Shanno, Barnes-Rosen)
Conjugate gradient methods
Many of these rules are special cases of a generalized
unconstrained optimization procedure, briefly described:
Learning as optimization
For a neural network described by y = f(t, x, W), the optimization
procedure interpreted as learning may be defined as finding a W*
that minimizes the perturbed computational energy criterion given by

E(y, x, W) = E_cost + E_perturbation

where y is the network output, x the network input, E_cost a
suitable cost (error, objective, or computational energy) function,
and E_perturbation a shake-up component used to enable the system
to hopefully escape from local minima.
Learning as optimization
If E is continuous in the domain of interest, the minima of E with
respect to the adaptable parameters (weights) W are obtained when
the gradient of E is zero, that is, when

∇_W E = 0

An exact solution of the above is not easily obtained, and it is not
usually sought.
Different, non-analytical methods for finding the minima of E have
been proposed as neural learning rules. These are mainly
implemented as iterative procedures suitable for computer
simulations.
Learning as optimization
The general iterative approach is: starting from an initial W(0),
find E(0); then

W[i+1] = W[i] + p_i d_i

where p_i is the search step and d_i is the search direction.
If E[i+1] is less than E[i], keep the change, and repeat until a
minimum is reached.
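To make the scheme concrete, the following minimal sketch implements this accept-if-improved iteration with a randomly picked search direction. The quadratic `energy` function, the initial step size and the step-shrinking schedule are illustrative assumptions, not part of the original material.

```python
# Generic iterative minimization: W[i+1] = W[i] + p_i * d_i,
# keeping the step only if the criterion E decreases.
import numpy as np

rng = np.random.default_rng(0)

def energy(w):
    # Hypothetical computational energy: a simple quadratic bowl.
    return float(np.sum((w - 1.0) ** 2))

w = rng.normal(size=4)          # W(0): random initial parameters
E = energy(w)                   # E(0)
step = 0.5                      # p_i: search step

for i in range(1000):
    d = rng.normal(size=w.shape)              # d_i: randomly picked search direction
    w_new = w + step * d / np.linalg.norm(d)
    E_new = energy(w_new)
    if E_new < E:                             # keep the change only if E decreases
        w, E = w_new, E_new
    else:
        step *= 0.99                          # shrink the step when no progress is made

print(E)  # approaches 0, the minimum of the bowl
```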
Learning as optimization
The search direction d_i and the search step p_i may be randomly
picked, thus leading to a stochastic search approach.
Alternatively, d_i may be guided (through an intelligent drive/guess)
so that (hopefully) a speedier search may be implemented.
Typically, d_i is proportional to the gradient (1st order methods), as
for example in steepest descent, damped Newton (Levenberg-
Marquardt), quasi-Newton (Broyden-Fletcher-Goldfarb-Shanno,
Barnes-Rosen) and conjugate gradient methods, or it is proportional
to the Hessian (2nd order methods).
Learning as optimization
A popular approach used in artificial neural network learning, in
order for the network to reach these minima, is based on allowing
multi-dimensional dynamical systems to relax, driven by a scaled
gradient descent.
In such a case, the system is allowed to settle by following its
trajectories.
It will then, hopefully, reach the minima of the hypersurface defined
by E.
Learning as optimization
A general parameter adaptation approach, which is a generalization
of the previous, is as shown below:

f(W, dW/dt, d²W/dt², …) = −∇_W E

The function f is so specified that it drives the system to acceptable
minima.
It rarely needs to be of higher than second degree, and in most
cases a first-degree model is used.
Learning as optimization
Let a second-degree dynamical system be forced to seek the
desired minima, in which the input of the system is the negative of
the gradient of E (gradient descent):

α(t) d²W/dt² + β(t) T dW/dt = −∇_W E

where α(t) and β(t) are positive real-valued functions and T is a
suitable matrix.
The above equation may be considered as a generalized second-order
learning equation based on gradient descent; a discretized sketch
follows below.
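A minimal sketch of how such a second-order learning equation can be simulated, assuming a simple quadratic energy and constant α, β with T = I (constants being a special case of the positive real-valued functions above); the step size dt and all values are illustrative assumptions.

```python
# Explicit Euler integration of alpha*W'' + beta*T*W' = -grad(E)
# for a quadratic energy E(W) = 0.5 * ||W - W*||^2, so grad(E) = W - W*.
import numpy as np

w_star = np.array([1.0, -2.0, 0.5])     # hypothetical minimizer
def grad_E(w):
    return w - w_star

alpha, beta = 1.0, 2.0                  # positive coefficients (here constants)
T = np.eye(3)                           # suitable matrix (identity here)
dt = 0.01

w = np.zeros(3)                         # W(0)
v = np.zeros(3)                         # W'(0)
for _ in range(5000):
    a = (-grad_E(w) - beta * (T @ v)) / alpha   # W'' from the learning equation
    v += dt * a
    w += dt * v

print(w)  # relaxes toward w_star as the system settles
```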
Learning as optimization
Specific instances of this equation, as may be used in
optimization-learning, with E_perturbation = 0:

If α(t) ≠ 0 and β(t) = 0: second-degree optimization
If α(t) = 0, T positive definite and β(t) ≠ 0: first-degree optimization
If α(t) = 0, T = I and β(t) = p⁻¹: the steepest descent method
If α(t) = 0, T = ∇²E and β(t) = 1: Newton's method
If α(t) = 0, T = ∇²E + λ(t)I and β(t) = 1: the Levenberg-Marquardt method
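The sketch below contrasts three of these first-degree instances (steepest descent, Newton, Levenberg-Marquardt) on a toy quadratic energy, where the gradient and Hessian are known exactly; the matrix A, the step size p and the damping factor mu are illustrative assumptions.

```python
# Three first-degree instances on E(W) = 0.5 * W^T A W (minimum at the origin).
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])   # Hessian of the quadratic energy
def grad_E(w):
    return A @ w

w_sd = np.array([4.0, -3.0])   # steepest descent: T = I
w_nt = w_sd.copy()             # Newton: T = Hessian
w_lm = w_sd.copy()             # Levenberg-Marquardt: T = Hessian + mu*I
p, mu = 0.1, 0.5

for _ in range(50):
    w_sd -= p * grad_E(w_sd)                                     # scaled gradient step
    w_nt -= np.linalg.solve(A, grad_E(w_nt))                     # full Newton step
    w_lm -= np.linalg.solve(A + mu * np.eye(2), grad_E(w_lm))    # damped Newton step

print(w_sd, w_nt, w_lm)   # all converge toward the minimum at the origin
```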
Learning as optimization
E_perturbation ≠ 0
In this case different stochastic gradient techniques are obtained.
The perturbation is generally used as a "shake-up" that will hopefully force the
network to escape from local minima. As a minimum is approached, the perturbation
is gradually reduced to zero so that the system reaches a state near the global
minimum and settles there.
At the end of the procedure the network becomes deterministic.
A commonly used form for the perturbation is

E_perturbation = c(t) Σ_{i=1..n} ν_i(t) y_i

where c(t) is a suitable decaying function used to gradually reduce the effects of
noise and ν_i(t) is noise applied to each neuron i.
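A minimal sketch of such a stochastic variant: plain gradient descent plus an additive perturbation scaled by a decaying c(t). The double-well energy, the decay schedule and the noise amplitude are illustrative assumptions.

```python
# Gradient descent with a decaying "shake-up" term, so the search can escape
# a local minimum early on and becomes deterministic as c(t) -> 0.
import numpy as np

rng = np.random.default_rng(1)

def energy(w):
    return (w**2 - 1.0)**2 + 0.3 * w   # double well; the deeper minimum is near w = -1

def grad(w):
    return 4.0 * w * (w**2 - 1.0) + 0.3

w = 1.0            # start in the shallower (local) well
lr = 0.01
for t in range(5000):
    c = 1.0 / (1.0 + 0.005 * t)                  # c(t): decaying noise schedule
    w += -lr * grad(w) + 0.3 * c * rng.normal()  # gradient step plus shrinking noise

print(w, energy(w))   # often settles near w = -1, having escaped the well near +1
```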
Learning in neural networks
One may wonder how memory, mapping, classification,
categorization, recognition, knowledge discovery and creativity
are related to learning as defined above, and to what extent these
traits can be considered learning tasks.
Taxonomy of neural learning rules
A taxonomy may be constructed on the basis of different characteristics.
Such characteristics can be (among other possible features): the degree
of resemblance to biological learning, the extent of applicability, the
degree of external guidance/supervision, the type of adaptable
parameters, the degree of "rigidity" of the neural structure, the degree
of dynamical system evolution, the degree of stochasticity, and finally
whether learning is algorithmic or non-algorithmic.
Characteristic feature: the degree to which a neural learning
paradigm resembles learning in biological systems.
Comment: There is no universal agreement among researchers on
what constitutes biological learning and how it is implemented.
Rules that cannot be autonomous cannot be considered as belonging
to this class, unless one emphasizes a specific local interaction
(e.g. the Hebbian locality). All algorithmically defined rules (PAC,
EM, Boosting, …) cannot be included in this category. Typical rules
of the class are the basic Hebbian, as well as Hebbian-like rules
used in spiking neuron networks.
Taxonomy of neural learning rules
Characteristic feature: extent of applicability.
Comment: Learning rules may be classified according to their
depth of applicability: whether the rule applies to diverse
environments, or only to some special cases.
Taxonomy of neural learning rules
Characteristic feature: external guidance during learning.
Comment: The process of adaptation may be externally guided by
a teacher, in which case it is known as supervised training, or
internally driven, in which case it is known as unsupervised
training. It is debatable whether truly unsupervised learning exists.
Typical learning rules that may be used in an unsupervised manner
are those used in self-organized maps, learning vector quantizers,
principal component analysis (PCA) and independent component
analysis (ICA) procedures.
Taxonomy of neural learning rules
Characteristic feature: the type of adaptable parameters.
Comment: Learning rules may be classified depending on whether
the parameters that are adapted are the synaptic weights or any
others, such as some activation function characteristics (slope,
amplitude, offsets, …).
Taxonomy of neural learning rules
Characteristic feature: the degree of "rigidity" of the neural structure.
Comment:
Inflexible structures (hardwired systems): a random generation of
parameters is hoped to give some meaningful results (there is no
learning procedure as such).
Constructive learning (growing networks): groups of neurons
(layers, slabs, …) or individual neurons or connections are added
to the network during training (e.g. Cascade Correlation and its
variants such as upstart, tiling, etc., and the Boosting algorithm).
Destructive learning (shrinking networks): groups of neurons
(layers, slabs, …) or individual processing units (neurons) or
connections are removed from the network during training (pruning).
Taxonomy of neural learning rules
Characteristic feature: the degree of evolution as a dynamical system.
Comment: Classification on whether the learning rule/algorithm is
expressed in terms of differential equations in which some
time-dependent evolution is implemented. Learning with
non-dynamical equations does not involve time evolution, delays or
recurrences; instead, the various parameters are changed in a nearly
instantaneous manner.
Taxonomy of neural learning rules
Characteristic feature: the degree of stochasticity employed.
Comment: The neural learning rules may or may not include
stochastic elements (e.g. simulated annealing, Boltzmann
machines, …).
Taxonomy of neural learning rules
Characteristic feature: whether learning is algorithmic or non-algorithmic.
Comment: Rules may be algorithmic (genetic-algorithm-based,
artificial-life-based, growing and pruning algorithms, …), in the
sense that a sequence of procedures is needed to define the rule.
Non-algorithmic rules are those that can easily be expressed with a
mathematical equation, such that the system may grow autonomously.
This is a rather artificial distinction, and from a practical point of
view the end result is what counts most.
Taxonomy of neural learning rules
A proposed taxonomy could be:
Hebbian (and many of its special cases, as depicted later)
Reinforcement learning
Min-max
Stochastic
Stochastic search in combination with steepest descent
Genetics-based
Artificial-life-based
In this taxonomy the Error Correction and the Competitive
rules (specified as distinct by Haykin) are considered special
cases of the generalized Hebbian.
Library of neural network learning rules
HEBBIAN

SUB-GROUP: Simple Hebbian
RULE (discrete time): Δw_sr = p_r y_s y_r
COMMENTS: Known as the activity product rule, or passive decay
long-term memory.

SUB-GROUP: Error correction Hebbian
RULE (discrete time):
Delta rule (Widrow & Hoff, 1960): Δw_sr = p_r y_s e_r
Backpropagation and variants:
For single-layer nets of linear units: Δw_sr = p_r y_s e_r
For multi-layer feed-forward nets of non-linear units:
Δw_sr = p_r y_s δ_r, where a different δ_r is used for each neuron r
of each layer L, as specified by
δ_r[L] = f'(u_r[L]) Σ_s δ_s[L+1] w_rs[L+1]
(at the output layer, δ_r is the error e_r times f'(u_r)).
COMMENTS: It is a gradient descent, least-mean-square error
reduction procedure, closely related to statistical optimization
procedures. Some other backpropagation variants are: RPROP,
Quickprop, Minkowski-r, Recirculation, ELEANNE,
Delta-bar-delta, SuperSAB, …
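As a concrete illustration of the delta-rule row above, here is a minimal sketch for a single linear unit, Δw = p · y_s · (d − o); the toy regression data and the learning rate are illustrative assumptions.

```python
# Delta rule (Widrow-Hoff) for one linear unit: dw = p * y_s * (d_r - o_r).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # y_s: inputs (100 patterns, 3 features)
true_w = np.array([2.0, -1.0, 0.5])
d = X @ true_w                           # d_r: desired outputs
w = np.zeros(3)
p = 0.05                                 # learning rate

for epoch in range(50):
    for y_s, d_r in zip(X, d):
        o_r = y_s @ w                    # linear unit output
        w += p * y_s * (d_r - o_r)       # delta rule: LMS error reduction

print(w)   # converges toward true_w
```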
HEBBIAN (continued)

SUB-GROUP: Grossberg
RULE (discrete time):
(Grossberg, 1969): Δw_sr = y_s (y_r − α w_sr)
(Grossberg, 1970): Δw_sr = α y_r (y_r − w_sr)
(Grossberg, 1976): Δw_sr = p_r y_r (y_s − w_sr)
COMMENTS: α = α(t) is a decreasing function. Also known as the
simple competitive learning rule, or the simple
competitive-cooperative learning rule. A similar rule is used in
self-organization and ART.

SUB-GROUP: Bienenstock et al. (1982)
RULE (discrete time): Δw_sr = −α w_sr + y_s y_r
COMMENTS: This rule induces a temporal competition between
the input patterns.

SUB-GROUP: Normalized Hebbian (Oja, 1982)
RULE (discrete time): Δw_sr = α y_r (y_s − y_r w_sr)
COMMENTS: This is a gradient-descent, principal component rule.

SUB-GROUP: Symmetrical subspace learning rule (Oja & Karhunen, 1985)
RULE (discrete time): Δw_sr = α y_r (y_s − Σ_{h=1..p} w_sh y_h)
COMMENTS: It is a principal component learning rule closely
related to the Oja-Ogawa-Wangviwattana rule and the stochastic
gradient ascent rule.

SUB-GROUP: Sanger (1989)
RULE (discrete time): Δw_sr = p_r (y_r y_s − y_r Σ_{h=1..r} w_sh y_h)
COMMENTS: It is a principal component learning rule, also called
the generalized Hebbian.
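A minimal sketch of Oja's normalized Hebbian rule from the table, Δw = α · y · (x − y · w), which extracts the first principal component of zero-mean data; the synthetic 2-D data set and the learning rate are illustrative assumptions.

```python
# Oja's rule: Hebbian term plus a normalizing decay, converging to the
# first principal component of the input distribution.
import numpy as np

rng = np.random.default_rng(0)
# Zero-mean data stretched along an oblique direction.
X = rng.normal(size=(2000, 2)) @ np.array([[2.0, 1.5], [0.0, 0.5]])
X -= X.mean(axis=0)

w = rng.normal(size=2)
a = 0.01                          # learning rate
for x in X:
    y = x @ w                     # neuron output y_r
    w += a * y * (x - y * w)      # Hebbian term minus normalizing decay

print(w / np.linalg.norm(w))      # aligns (up to sign) with the leading eigenvector
```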
HEBBIAN (continued)

SUB-GROUP: Yuile et al. (1989)
RULE (discrete time): Δw_sr = p_r (y_r y_s − ||w||² w_sr)

SUB-GROUP: Hasselmo (1994)
RULE (discrete time): Δw_sr = p y_s (y_r − Σ_{i=1..n} w_ir y_i)

SUB-GROUP: Differential Hebbian, covariance correlation
RULE (discrete time): Δw_sr = p_r Covariance((y_s − ȳ_s), (y_r − ȳ_r))
COMMENTS: Also known as the activity covariance rule.

SUB-GROUP: Differential Hebbian, covariance correlation
RULE (discrete time): Δw_sr = p_r ẏ_s (y_r − ȳ_r)

SUB-GROUP: Differential Hebbian, drive reinforcement
RULE (discrete time): Δw_sr = Δy_r Σ_h u(t − h) ||Δw_sr(t − h)|| Δy_s(t − h),
where u(t − h) is a decreasing function of time.
COMMENTS: The basic form of a differential Hebbian is
Δw_sr = Δy_s[k] Δy_r[k], where Δy[k] = y[k] − y[k−1].
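A minimal sketch of the basic differential Hebbian form noted above, Δw = Δy_s · Δy_r, which correlates changes in activity rather than the activities themselves; the sinusoidal activity traces are illustrative assumptions.

```python
# Basic differential Hebbian: the weight grows when pre- and post-synaptic
# activities change in the same direction.
import numpy as np

t = np.linspace(0.0, 10.0, 500)
y_s = np.sin(t)                  # presynaptic activity trace
y_r = np.sin(t - 0.2)            # postsynaptic activity, slightly delayed

w = 0.0
rate = 0.1
for k in range(1, len(t)):
    dy_s = y_s[k] - y_s[k - 1]   # Dy_s[k] = y_s[k] - y_s[k-1]
    dy_r = y_r[k] - y_r[k - 1]   # Dy_r[k]
    w += rate * dy_s * dy_r      # correlate the *changes*, not the activities

print(w)   # positive: the two activities tend to change together
```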
REINFORCEMENT LEARNING

RULE (discrete time): Δw_sr = p_r (c − r_r) e_sr
where:
c = success/failure reinforcement signal
r_r = reinforcement threshold value
e_sr = canonical eligibility of the weight from neuron s to r:
e_sr = ∂ ln(g_s) / ∂ w_sr, with g_s = Pr(y_r = d_r | w_r, x)
COMMENTS: The rule is related to error correction learning. The
Adaptive Heuristic Critic and the Associative Reward-Penalty are
two similar paradigms of this rule.
Reinforcement learning was also considered by Minsky.
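A minimal sketch in the spirit of this rule for a single stochastic (Bernoulli) unit, where the eligibility is the canonical ∂ln g/∂w of the emitted output; the bandit-like reward scheme, the threshold and the learning rate are illustrative assumptions.

```python
# Reinforcement rule dw = p * (c - r) * e for one Bernoulli unit.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5])          # fixed input pattern y_s
w = np.zeros(2)
p_lr, r_thr = 0.2, 0.5            # learning rate, reinforcement threshold

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

for trial in range(2000):
    prob = sigmoid(x @ w)                 # Pr(y_r = 1 | w, x)
    y = 1 if rng.random() < prob else 0   # stochastic output
    c = 1.0 if y == 1 else 0.0            # environment rewards firing (success/failure)
    e = (y - prob) * x                    # canonical eligibility: d ln(g) / dw
    w += p_lr * (c - r_thr) * e           # reward above threshold reinforces

print(sigmoid(x @ w))   # probability of the rewarded action approaches 1
```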
MIN-MAX LEARNING

RULE (discrete time):
For the minimum vector: v_sr[k+1] = min(y_s, v_sr[k])
For the maximum vector: w_sr[k+1] = max(y_s, w_sr[k])
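A minimal sketch of these min-max updates, in which a unit's min vector v and max vector w expand to enclose the inputs seen so far (as in fuzzy min-max hyperbox learning); the 2-D points are illustrative.

```python
# Min-max updates: v[k+1] = min(y, v[k]); w[k+1] = max(y, w[k]).
import numpy as np

points = np.array([[0.2, 0.5], [0.7, 0.1], [0.4, 0.9], [0.3, 0.3]])

v = points[0].copy()       # minimum vector
w = points[0].copy()       # maximum vector
for y in points[1:]:
    v = np.minimum(y, v)   # shrink toward the smallest coordinates seen
    w = np.maximum(y, w)   # stretch toward the largest coordinates seen

print(v, w)   # hyperbox corners: [0.2, 0.1] and [0.7, 0.9]
```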
STOCHASTIC LEARNING

RULE (discrete time): Stochastic search by random modification of
the parameters (weights) until a suitable energy (or criterion)
function settles to a desired value.
COMMENTS: This search can be purely random, or guided as in
simulated annealing, the Boltzmann machine, the Cauchy machine
and mean-field annealing. It is similar to the quasi-Newton method.
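A minimal sketch of such a guided stochastic search with a simulated-annealing acceptance criterion: uphill moves are allowed early and suppressed as the temperature decays. The rugged energy function and the cooling schedule are illustrative assumptions.

```python
# Stochastic search over the weights with simulated-annealing acceptance.
import numpy as np

rng = np.random.default_rng(2)

def energy(w):
    return float(np.sum(w**2) + np.sin(5.0 * w).sum())   # rugged criterion

w = rng.normal(size=3)
E = energy(w)
T = 1.0
for step in range(5000):
    cand = w + 0.1 * rng.normal(size=3)       # random modification of the weights
    E_cand = energy(cand)
    if E_cand < E or rng.random() < np.exp((E - E_cand) / T):
        w, E = cand, E_cand                   # accept downhill, sometimes uphill
    T *= 0.999                                # cool the temperature

print(E)   # settles near a low value of the criterion
```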
STOCHASTIC HEBBIAN

Hebbian annealing: a stochastic local Hebbian rule in which the
weights are changed depending on their score in a suitable function.
GENETICS-BASED LEARNING

Evolutionary techniques (genetic algorithms) are used to find
weights and other parameters, or to prune or grow neural structures.
ARTIFICIAL-LIFE-BASED LEARNING

The particle swarm optimizer is one such learning procedure
(a sketch follows below).
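A minimal sketch of a particle swarm optimizer used as a weight-search procedure; the sphere-like objective and the standard inertia/attraction coefficients are illustrative assumptions.

```python
# Particle swarm optimization: each particle is a candidate weight vector,
# attracted to its own best position and to the swarm's best position.
import numpy as np

rng = np.random.default_rng(3)

def energy(w):
    return np.sum((w - 0.7) ** 2, axis=-1)     # hypothetical training criterion

n, dim = 20, 4
pos = rng.uniform(-2, 2, size=(n, dim))        # particle positions (weight vectors)
vel = np.zeros((n, dim))
pbest = pos.copy()                             # each particle's best position so far
pbest_E = energy(pos)
gbest = pbest[np.argmin(pbest_E)].copy()       # swarm's best position so far

for it in range(200):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    E = energy(pos)
    better = E < pbest_E
    pbest[better], pbest_E[better] = pos[better], E[better]
    gbest = pbest[np.argmin(pbest_E)].copy()

print(gbest)   # approaches the minimizer [0.7, 0.7, 0.7, 0.7]
```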
Concluding remarks
The problem of neural system learning is ultimately very important,
in the sense that evolvable intelligence can emerge when the learning
procedure is automatic and unsupervised.
The rules mostly used by researchers and application users are of the
gradient descent type.
They are closely related to optimization techniques developed by
mathematicians, statisticians and researchers working mainly in the
field of "operations research".
A systematic examination of the effectiveness of these rules is a matter of
extensive research being conducted at different research centers.
Conclusive comparative findings on the relative merits of each learning
rule are not presently available.
Concluding remarks
The term "unsupervised" is debatable, depending on the level of scrutiny
applied when evaluating a rule. It is customary to consider some
learning as unsupervised when there is no specific and well-defined
external teacher.
In the so-called self-organizing systems, the system organizes apparently
unrelated data into sets of more meaningful packets of information.
Ultimately though, how can intelligent organisms learn in total
isolation? Looking at supervisability in more liberal terms, one could
say that learning is neither a well-specified supervised nor an
unsupervised procedure. It is rather a complicated system of individual
processes that jointly help in manifesting an emergent behavior that
"learns" from experience.