
FUNDAMENTALS OF NEURAL

NETWORKS

Dr. (Mrs.) Lini Mathew
Associate Professor
Electrical Engineering Department

The Biological Neural Network

Characteristics of Human Brain


Ability to learn from experience
Ability to generalize the knowledge it possesses
Ability to perform abstraction
Ability to make errors

The Biological Neural Network

Objective
To emulate or simulate the human brain.

Organization of Human Brain


Over one hundred billion neurons.
Over one hundred trillion connections called synapses.
Neurons are responsible for thought, emotion, cognition, etc.
Consists of a dense network of blood vessels.

Organization of Human Brain


A highly effective filtration system called the blood-brain barrier.
A tight covering of glial cells around the neurons.
The glial cells provide structural scaffolding for the brain.
The brain is the most concentrated consumer of energy in the body.

The Neuron
Fundamental
building block of the
nervous system
Performs all the
computational and
communication
functions within the
brain
A many-inputs, one-output unit

The Neuron
Consists of three
sections
cell body
dendrites
axon

Cell body
manufactures a wide variety of complex molecules to keep itself renewed for a lifetime
manages the energy economy of the neuron
the outer membrane of the cell body generates nerve impulses
The cell body is 5 to 100 microns in diameter

Dendrites

bushy branching structure emanating from the cell body.


Receive signals from other cells at connection points called synapses.
Usually no physical or electrical connection is made at the synapse.

Dendrites
Neurotransmitters, which are specialized chemicals, are released by the axon into the synaptic cleft and diffuse across to the dendrite.
Some neurotransmitters are excitatory and tend to produce an output pulse.
Some are inhibitory and tend to suppress such a pulse.
More than thirty neurotransmitters are known.

The Axon
may be as short as 0.1 mm or as long as 1 m.
has multiple branches, each terminating in a synapse.
Axons are wrapped in Schwann cells, forming an insulating sheath known as myelin.
This myelin sheath is interrupted every millimeter or so, at narrow gaps called the nodes of Ranvier.

The Axon
Nerve impulses which pass down the axon jump from node to node, thus saving energy.

The Cellular Membrane


About five nanometers thick and consisting of two layers of lipid molecules.
Embedded in the membrane are various specific proteins.
Five classes of proteins:
i) Pumps
ii) Channels
iii) Receptors
iv) Enzymes
v) Structural Proteins

Five Classes of Proteins


Pumps actively move ions across the cell membrane to maintain concentration gradients.
Channels pass ions selectively and control their flow through the membrane.
Receptors are proteins that recognize and attach to many types of molecules in the cellular environment with great specificity.
Enzymes in or near the membrane speed up a variety of chemical reactions.
Structural Proteins interconnect cells and help to maintain the structure of the cell itself.

The sodium pump maintains the necessary ion concentration gradients.
The potassium channel passes potassium ions selectively and controls their flow through the membrane.
These two are responsible for the dynamic chemical equilibrium, in which the cell assumes a resting potential of about -70 mV.
For the cell to fire, the potential must rise from -70 mV to about -50 mV, after which it swings from -50 mV to about +50 mV.
This polarity reversal spreads rapidly through the cell, propagating down the axon to its synaptic connections.

Computers and Human Brain


Similarities
both operate on electrical signals
both are composed of a large number of simple elements
both perform functions that are computational
Differences
compared to the microsecond or nanosecond time scales of digital computation, nerve impulses are astoundingly slow
the brain's huge computation rate is achieved by a tremendous number of parallel computational units, far beyond any proposed for a computer system
a digital computer is inherently error free, but the brain often produces best guesses and approximations from partially incomplete or incorrect inputs, which may be wrong

Computers and Human Brain


Desirable Characteristics of Human Brain
Massive Parallelism
Distributed Representation and Computation
Learning Capability
Generalization Ability
Adaptivity
Inherent Contextual Information Processing
Fault Tolerance
Low Energy Consumption

Computers and Human Brain


Feature                 Modern Computer                      Biological Neural System
Processor               Complex; High Speed; One or Few      Simple; Low Speed; A Large Number
Memory                  Separate from Processor; Localized   Integrated into Processor; Distributed
Computing               Centralized; Sequential;             Distributed; Parallel;
                        Stored Programs                      Self-Learning
Reliability             Very Vulnerable                      Robust
Expertise               Numerical and Symbolic               Perceptual Problems
                        Manipulations
Operating Environment   Well-Defined; Well Constrained       Poorly Defined; Unconstrained

Simple Neural Network


X = I1W1 + I2W2 + ... + INWN
Activation Function
S = K(X)
K is a threshold function,
i.e. S = 1 if X > T
     S = 0 otherwise
where T is a constant threshold value.
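A minimal Python (NumPy) sketch of this weighted-sum-and-threshold neuron; the input values, weights and threshold below are illustrative only:

import numpy as np

def simple_neuron(inputs, weights, threshold):
    # X = I1*W1 + I2*W2 + ... + IN*WN, then S = K(X) with a hard threshold T
    x = np.dot(inputs, weights)
    return 1 if x > threshold else 0

I = np.array([0.5, 0.3, 0.9])   # illustrative inputs
W = np.array([0.4, 0.7, 0.2])   # illustrative weights
print(simple_neuron(I, W, threshold=0.5))   # X = 0.59 > 0.5, so S = 1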

Activation Functions
Threshold Function
S = 1 if X > 0
S = 0 if X <= 0
S = hardlim(X)
hard-limit transfer function (output switches between 0 and +1)

Signum Function
S = 1 if X > 0
S = -1 if X <= 0
S = hardlims(X)
symmetric hard-limit transfer function (output switches between -1 and +1)

Activation Functions
Squashing Function, Logistic Function or Sigmoidal Function
S = 1 / (1 + e^(-aX)), where a is the sigmoidal gain
S = logsig(X)
At X = 0, S = 0.5; S approaches 1 for large positive X and 0 for large negative X.
log-sigmoid transfer function

Activation Functions
Hyperbolic Tangent Function
S = tanh(X)
S = tansig(X)
At X = 0, S = 0; S approaches +1 for large positive X and -1 for large negative X.
tan-sigmoid transfer function

Activation Functions - MATLAB


Linear Transfer Function
S = purelin(X)
(output equals the input over the whole range)

Positive Linear Transfer Function
S = poslin(X)
(output equals the input for X > 0 and is 0 for X <= 0)

Activation Functions - MATLAB

Saturating Linear Transfer Function
S = satlin(X)
(output is 0 for X < 0, equals X for 0 <= X <= 1, and saturates at +1 for X > 1)

Symmetric Saturating Linear Transfer Function
S = satlins(X)
(output saturates at -1 for X < -1, equals X for -1 <= X <= 1, and saturates at +1 for X > 1)

Activation Functions - MATLAB


Radial Basis Function
S = radbas(X)

Triangular Basis Function


S = tribas(X)
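The MATLAB transfer-function names listed above have simple NumPy counterparts. A sketch following the definitions on these slides (the sigmoidal gain a is taken as 1, and all numeric choices are illustrative):

import numpy as np

def hardlim(x):   return np.where(x > 0, 1.0, 0.0)         # threshold function
def hardlims(x):  return np.where(x > 0, 1.0, -1.0)        # signum function
def logsig(x):    return 1.0 / (1.0 + np.exp(-x))          # S = 1/(1 + e^(-aX)), a = 1
def tansig(x):    return np.tanh(x)                        # hyperbolic tangent
def purelin(x):   return x                                 # linear
def poslin(x):    return np.maximum(0.0, x)                # positive linear
def satlin(x):    return np.clip(x, 0.0, 1.0)              # saturating linear
def satlins(x):   return np.clip(x, -1.0, 1.0)             # symmetric saturating linear
def radbas(x):    return np.exp(-x ** 2)                   # radial basis
def tribas(x):    return np.maximum(0.0, 1.0 - np.abs(x))  # triangular basis

x = np.linspace(-2, 2, 5)
print(logsig(x))   # 0.5 at x = 0, rising towards 1 for positive x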

McCulloch-Pitts Neuron Model


Formulated by Warren McCulloch and Walter
Pitts in 1943
The McCulloch-Pitts neuron allows binary 0 or 1 states only, i.e. it is binary activated.
The input neurons are connected by direct weighted paths, which are either excitatory or inhibitory.
Excitatory connections carry positive weights; inhibitory connections carry negative weights.
The neuron is associated with a threshold value.
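A small sketch of a McCulloch-Pitts unit realizing a logical AND; the weights of +1 and the threshold of 2 are one valid choice, not part of the original formulation:

def mp_neuron(inputs, weights, threshold):
    # Binary McCulloch-Pitts unit: fires (output 1) only if the weighted sum reaches the threshold
    net = sum(i * w for i, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

# Two excitatory inputs (weight +1) with threshold 2 implement AND
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mp_neuron([a, b], [1, 1], threshold=2))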

Single Layer Artificial Neural Networks

Multilayer Artificial Neural Networks

Training
Training is accomplished by sequentially applying input vectors while adjusting the network weights according to a predetermined procedure.
Supervised Training
requires the pairing of each input vector with a target vector
representing the desired output.
Unsupervised Training
requires no target vector for the output and no comparisons to
predetermined ideal responses. The training algorithm modifies
network weights to produce output vectors that are consistent. Also
called self-organizing networks.
Reinforcement Training
No target. The network is presented with an indication of whether the
output is right or wrong, with which it will improve its performance.

Classification of Learning Algorithms


Learning Algorithms
    Supervised Learning (Error Based)
        Error Correction (Gradient Descent)
            Least Mean Square
            Back Propagation
        Stochastic
    Unsupervised Learning
        Hebbian
        Competitive

Learning Rules
A neural network learns about its environment through an interactive process of adjustments applied to its synaptic weights and bias levels.
The set of well-defined rules for the solution of a learning problem is called a learning algorithm.
Hebbian Learning Rule - the oldest and most famous of all learning rules, proposed by Donald Hebb in 1949.
Represents purely feed-forward, unsupervised learning.
If the cross product of output and input is positive, the weights increase; otherwise the weights decrease.
The weights are adjusted as Wij(k+1) = Wij(k) + xi yj
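A minimal sketch of this Hebbian update for one input/output pattern pair. The learning-rate factor eta is a conventional addition (the slide's update uses the bare product xi*yj); the patterns below are illustrative:

import numpy as np

def hebbian_update(W, x, y, eta=0.1):
    # W_ij(k+1) = W_ij(k) + eta * x_i * y_j  (feed-forward, unsupervised)
    return W + eta * np.outer(x, y)

W = np.zeros((3, 2))               # 3 inputs, 2 outputs (illustrative sizes)
x = np.array([1.0, -1.0, 1.0])
y = np.array([1.0, -1.0])
W = hebbian_update(W, x, y)
print(W)   # weights grow where input and output agree in sign, shrink otherwise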

Characteristics of Neural Networks


Exhibit mapping capabilities - they can map input patterns to their associated output patterns.
Learn by examples - they can be trained with known examples of a problem and can therefore identify new objects on which they were not previously trained.
Possess the capability to generalize - they can predict new outcomes from past trends.
Are robust systems and are fault tolerant - they can recall full patterns from incomplete, partial or noisy patterns.
Can process information in parallel, at high speed and in a distributed manner.

Perceptron

Perceptron
Supervised learning algorithm - weights are adjusted to minimize the error whenever the computed output does not match the target output.
Netj = Σi xi Wij
yj = f(Netj)
i.e. yj = 1 if Netj > 0
        = 0 otherwise
Weight Adjustment:
(i) if the output is 1 but should have been 0, then Wij(k+1) = Wij(k) - xi
(ii) if the output is 0 but should have been 1, then Wij(k+1) = Wij(k) + xi
(no change when the output matches the target)
Successful only for linearly separable problems
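A sketch of this perceptron rule in Python; the update W += (target - output) * x reduces the weights when the output is 1 but should be 0, raises them in the opposite case, and leaves them unchanged when output and target agree. The AND data set and the number of epochs are illustrative:

import numpy as np

def train_perceptron(X, t, epochs=10):
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias input of 1
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if np.dot(x, W) > 0 else 0       # yj = f(Netj)
            W += (target - y) * x                  # perceptron weight adjustment
    return W

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # logical AND (linearly separable)
t = np.array([0, 0, 0, 1])
W = train_perceptron(X, t)
print([1 if np.dot(np.append(x, 1.0), W) > 0 else 0 for x in X])   # [0, 0, 0, 1]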

Linear Separability
Netj = Σi xi wi + b = x1 w1 + x2 w2 + b
The relation Σi xi wi + b = 0 gives the boundary region of the net input.
The equation denoting this decision boundary can represent a line or a plane.
On training, if the training input vectors with correct response +1 lie on one side of the boundary and those with correct response -1 lie on the other side, the problem is linearly separable.

x1 w1 + x2 w2 + b = 0, i.e. the boundary line x2 = -(w1/w2) x1 - b/w2, with slope -w1/w2 and intercept -b/w2 on the x2 axis.
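A tiny sketch that classifies a point by which side of this boundary it falls on; the weights and bias are illustrative:

def side_of_boundary(x1, x2, w1, w2, b):
    # Returns +1 or -1 depending on the sign of x1*w1 + x2*w2 + b
    return 1 if x1 * w1 + x2 * w2 + b > 0 else -1

# Boundary x1 + x2 - 1 = 0 (w1 = w2 = 1, b = -1), illustrative values
print(side_of_boundary(0.9, 0.8, 1, 1, -1))   # +1 (above the line)
print(side_of_boundary(0.1, 0.2, 1, 1, -1))   # -1 (below the line)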

Linear Separability
[Figure: two "Vectors to be Classified" plots in the P(1)-P(2) plane (axes roughly from -1 to 1.5), showing sets of input vectors to be separated by a decision boundary.]

ADALINE Network
Adaptive Linear Neural Element Network
Output values are bipolar (-1 or +1)
Inputs may be binary, bipolar or real valued
A bias input of +1 may be included
Learning algorithm (Delta Rule):
yj = 1 if Netj > 0
   = -1 otherwise
Weight Adjustment:
Wij(k+1) = Wij(k) + α (t - y) xi, where α is the learning rate
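A sketch of this delta rule, following the update as stated above (error taken on the thresholded bipolar output y; the classical Widrow-Hoff form uses the linear net output instead). The learning rate alpha, the bipolar AND data and the epoch count are illustrative:

import numpy as np

def train_adaline(X, t, alpha=0.1, epochs=50):
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # bias input fixed at +1
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if np.dot(x, W) > 0 else -1      # bipolar output
            W += alpha * (target - y) * x          # W(k+1) = W(k) + alpha*(t - y)*x
    return W

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)   # bipolar AND
t = np.array([-1, -1, -1, 1])
print(train_adaline(X, t))   # converges to weights that separate the two classes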

MADALINE Network
Developed by Bernard Widrow
Multiple ADALINE Network
Combining a number of ADALINE Networks
spread across multiple layers with adjustable
weights
The use of multiple ADALINEs helps counter the problem of non-linear separability.

Back Propagation Network


Developed by Rumelhart, Hinton and Williams
The back propagation learning rule is applicable to any feed-forward network architecture, including multilayer networks.
Back propagation is a systematic method of training, built on a strong mathematical foundation, and has very good application potential.
Its weaknesses are a slow rate of convergence and the local minima problem.

Back Propagation Network


[Figure: a three-layer back propagation network with input nodes (inputs Ii1-Ii3, outputs Oi1-Oi3), hidden nodes (Ih1-Ih3, Oh1-Oh3) and output nodes (Io1-Io3, Oo1-Oo3); V denotes the input-to-hidden weights and W the hidden-to-output weights.]

Back Propagation Network


Input Layer Computation
{O}i = {I}i
{I}h = [V]^T {O}i
Hidden Layer Computation
{O}h = 1 / (1 + e^(-λ({I}h - θh)))
where λ is the sigmoidal gain and θh is the threshold of the hidden layer
{I}o = [W]^T {O}h

Back Propagation Network


Output Layer Computation
{O}o = 1 / (1 + e^(-λ({I}o - θo)))
Calculation of error (Euclidean norm):
E = (1/2) Σ (To - Oo)^2

Back Propagation Network


Vector AB = (V(i+1) - V(i)) + (W(i+1) - W(i)) = ΔV + ΔW
[Figure: the error E plotted over the weight space (V, W). The weight-change vector AB has components ΔV and ΔW along the unit vectors i and j; for steepest descent these are chosen in proportion to -∂E/∂V and -∂E/∂W.]

Back Propagation Network


∂E/∂W = (∂E/∂Oo) (∂Oo/∂Io) (∂Io/∂W)
∂E/∂Oo = -(To - Oo)
∂Oo/∂Io = Oo (1 - Oo)
∂Io/∂W = Oh
Therefore ∂E/∂W = -(To - Oo) Oo (1 - Oo) Oh
and the weight change is ΔW = -η ∂E/∂W, where η is the learning rate.

Back Propagation Network


∂E/∂V = (∂E/∂Oo) (∂Oo/∂Io) (∂Io/∂Oh) (∂Oh/∂Ih) (∂Ih/∂V)
∂E/∂V = -(To - Oo) Oo (1 - Oo) W Oh (1 - Oh) Ii
ΔV = -η ∂E/∂V
With a momentum term α, the successive weight changes and updates are
ΔW(i+1) = -η ∂E/∂W + α ΔW(i)
ΔV(i+1) = -η ∂E/∂V + α ΔV(i)
W(i+1) = W(i) + ΔW(i+1)
V(i+1) = V(i) + ΔV(i+1)
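A compact NumPy sketch of these update equations for a small input-hidden-output network trained on XOR. The network size, data, learning rate eta and momentum alpha are illustrative; the sigmoidal gain and thresholds are taken as 1 and 0, and bias inputs are added for convenience:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def add_bias(A):
    return np.hstack([A, np.ones((A.shape[0], 1))])   # constant bias input of 1

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs (illustrative)
T = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

V = rng.normal(scale=0.5, size=(3, 4))    # input(+bias)-to-hidden weights
W = rng.normal(scale=0.5, size=(5, 1))    # hidden(+bias)-to-output weights
dV_prev, dW_prev = np.zeros_like(V), np.zeros_like(W)
eta, alpha = 0.5, 0.8                     # learning rate and momentum

for epoch in range(10000):
    Xi = add_bias(X)
    Oh = sigmoid(Xi @ V)                          # hidden layer outputs {O}h
    Ohb = add_bias(Oh)
    Oo = sigmoid(Ohb @ W)                         # output layer outputs {O}o
    delta_o = (T - Oo) * Oo * (1 - Oo)            # (T - Oo) Oo (1 - Oo)
    delta_h = (delta_o @ W[:-1].T) * Oh * (1 - Oh)
    dW = eta * Ohb.T @ delta_o + alpha * dW_prev  # dW = -eta*dE/dW + alpha*dW(previous)
    dV = eta * Xi.T @ delta_h + alpha * dV_prev
    W, V = W + dW, V + dV
    dW_prev, dV_prev = dW, dV

out = sigmoid(add_bias(sigmoid(add_bias(X) @ V)) @ W)
print(np.round(out.ravel(), 2))   # typically close to [0, 1, 1, 0]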

Counter Propagation Network


Developed by Robert Hecht-Nielsen
Counter propagation is a combination of two well-known algorithms: the self-organizing map of Kohonen and the Grossberg outstar network.
The weight adjustments between the layers follow Kohonen's unsupervised learning rule and Grossberg's supervised learning rule.
Counter propagation networks are trained in two stages:
(i) The input vectors are clustered on the basis of Euclidean distances or by the dot product method.
(ii) The desired response is obtained by adapting the weights from the cluster units to the output units.

Counter Propagation Network

Counter Propagation Network


During learning, pairs of the input vector X and
output vector Y are presented to the input and
interpolation layers, respectively.
These vectors propagate through the network in a
counter-flow manner to yield the competition weight
vectors and interpolation weight vectors.
Once these weight vectors become stable, the
learning process is completed.
The output vector Y' of the network corresponding to the input vector X is then computed.
The vector Y' is intended to be an approximation of the desired output vector Y, i.e. Y' ≈ Y = f(X).

Associative Memory
Developed by John Hopfield
A single-layer feed-forward or recurrent network which makes use of Hebbian learning or the gradient descent learning rule.
A storehouse of associated patterns.
If the associated pattern pairs (x, y) are different, it is a heteroassociative memory.
If x and y refer to the same pattern, it is an autoassociative memory.
Heterocorrelators and Autocorrelators.

Autocorrelators
Hopfield Associative Memory
Connection matrix, indicative of the association of the pattern with itself:
T = Σ (i = 1 to m) [Ai]^T [Ai]
Autocorrelator recall equation:
aj(new) = f( Σi ai tij , aj(old) )
Two-parameter bipolar threshold function:
f(α, β) = 1 if α > 0
        = β if α = 0
        = -1 if α < 0
Hamming distance of vector x from y:
HD(x, y) = Σ (i = 1 to n) | xi - yi |
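A small NumPy sketch of the autocorrelator: the connection matrix is the sum of outer products of the stored bipolar patterns, and recall repeatedly applies the two-parameter threshold. The stored patterns and the noisy probe are illustrative:

import numpy as np

def store(patterns):
    # T = sum_i [A_i]^T [A_i] for bipolar row-vector patterns A_i
    n = patterns.shape[1]
    T = np.zeros((n, n))
    for a in patterns:
        T += np.outer(a, a)
    return T

def recall(T, a, iterations=5):
    # a_j(new) = f(sum_i a_i t_ij, a_j(old)) with the bipolar threshold f
    a = a.astype(float).copy()
    for _ in range(iterations):
        net = a @ T
        a = np.where(net > 0, 1.0, np.where(net < 0, -1.0, a))
    return a

A = np.array([[1, 1, 1, -1, -1, -1],
              [1, -1, 1, -1, 1, -1]], dtype=float)       # stored patterns (illustrative)
T = store(A)
noisy = np.array([-1, 1, 1, -1, -1, -1], dtype=float)    # first pattern with one bit flipped
print(recall(T, noisy))                                  # recovers [1, 1, 1, -1, -1, -1]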

Heterocorrelators
Developed by Bart Kosko
Bidirectional Associative Memory - the ability to recall stored pattern pairs
Two-layer recurrent networks
There are N training pairs {(A1, B1), (A2, B2), ...}
Ai = (ai1, ai2, ai3, ..., ain)
Bi = (bi1, bi2, bi3, ..., bin)
Correlation Matrix M = Σ (i = 1 to N) [Ai]^T [Bi]

Heterocorrelators
Recall equations:
B' = φ(A M)
A' = φ(B M^T)
φ(F) = G = (g1, g2, g3, ..., gn), where F = (f1, f2, f3, ..., fn)
gi = 1 if fi > 0
   = previous gi if fi = 0
   = -1 if fi < 0
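A sketch of bidirectional recall with the correlation matrix M; the bipolar pattern pairs and the number of passes are illustrative:

import numpy as np

def bam_matrix(A, B):
    # M = sum_i [A_i]^T [B_i] for bipolar pattern pairs
    return sum(np.outer(a, b) for a, b in zip(A, B))

def threshold(net, prev):
    # phi: +1 if net > 0, previous value if net = 0, -1 if net < 0
    return np.where(net > 0, 1.0, np.where(net < 0, -1.0, prev))

A = np.array([[1, -1, 1, -1], [1, 1, -1, -1]], dtype=float)   # illustrative A patterns
B = np.array([[1, 1, -1], [-1, 1, 1]], dtype=float)           # illustrative B patterns
M = bam_matrix(A, B)

a, b = A[0].copy(), np.ones(3)       # start from stored A1 and an arbitrary B-layer state
for _ in range(5):                   # pass activity back and forth until it stabilizes
    b = threshold(a @ M, b)          # B side: phi(A M)
    a = threshold(b @ M.T, a)        # A side: phi(B M^T)
print(a, b)                          # settles on the stored pair (A1, B1)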

Character Recognition

Fabric Defect Identification

Adaptive Resonance Theory


ART was introduced by Stephen Grossberg.
In the ART paradigm, autonomous learning and pattern recognition proceed in a stable fashion in response to an arbitrary sequence of input patterns.
A self-regulatory control structure is embedded into the competitive learning mode.
ART is capable of developing stable clusterings of arbitrary sequences of input patterns by self-organization.
The ART structure is a neural network for cluster formation in an unsupervised learning domain.

Self-Organizing Maps (SOMs)


Self-Organizing Maps (SOMs) are a tool for clustering and visualizing multi-dimensional data.
Most common methods for visualizing data only allow two or three dimensions to be seen at a time, but SOMs allow dozens of dimensions to be viewed simultaneously.
SOMs were invented by Professor T. Kohonen.

Competitive Network

Self-Organizing Maps (SOMs)


Kohonen worked on the development of the theory of competition.
The most widely used form of competition among a group of neurons is Winner-Takes-All.
Here, only one neuron in the competing group will have a non-zero output signal when the competition is completed.
The self-organizing map, developed by Kohonen, groups the input data into clusters and is commonly used for unsupervised learning.

Self-Organizing Maps (SOMs)


Whenever an input is presented, the network finds the distance of the weight vector of each node from the input vector, and selects the node with the smallest distance.
In this way, the whole network selects the node whose weight vector is closest to the input vector, i.e. the winner.
The network learns by moving the winning weight vector towards the input vector while the other weight vectors remain unchanged.

Self-Organizing Maps (SOMs)


If the samples fall in clusters, then each time a sample is presented the winning weight vector moves towards that sample's cluster.
Eventually each of the weight vectors converges to the centroid of one cluster; at this point the training is complete.
After training, the weight vectors have become the centroids of the various clusters.
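A minimal winner-takes-all sketch of this training loop (a full SOM would also pull the winner's topological neighbours along). The two clusters, the number of competing nodes and the learning-rate schedule are illustrative:

import numpy as np

rng = np.random.default_rng(1)
data = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(50, 2)),   # two illustrative clusters
                  rng.normal([1.0, 1.0], 0.1, size=(50, 2))])
weights = rng.uniform(0.0, 1.0, size=(2, 2))                   # two competing nodes

eta = 0.5
for epoch in range(20):
    rng.shuffle(data)
    for x in data:
        d = np.linalg.norm(weights - x, axis=1)         # distance of each weight vector from the input
        winner = np.argmin(d)                           # the closest node wins
        weights[winner] += eta * (x - weights[winner])  # move only the winner towards the input
    eta *= 0.9                                          # gradually reduce the learning rate

print(weights)   # each row tends to settle near one cluster centroid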

Clustering Technique
Vector Quantization is a method of dynamic
allocation of cluster centers

THANK YOU
