
Elaborating the Theory of ANN

Chapter 1

1.3 Activation and Learning of a Neuron


A neural network has two phases: a training phase and
a working phase.

Training Phase:
• In the training phase a neuron is provided with a
sample pattern X = (x0, x1, ..., xn), say, and it is asked to
learn it.
• Mathematically, training means that the neuron is
asked to solve the following equation for the weights
W = (w0, w1, ..., wn), say:

W.X = T    (1)

where T is a given constant, known as the target input.
• As a result of the training, the neuron finds the
vector W. Actually, equation (1) has infinitely many
solutions; the neuron arrives at one of them. By this
achievement we mean that the neuron has learnt the
pattern. All the neurons of a network learn in the
same way.

Working Phase:
• In the second phase, the trained network is set to
work. An actual pattern, most probably unknown to
the network, is passed on its input lines. The pattern
passes simultaneously to all the neurons of the layer
next to the input layer.
• What the neuron does is reconsider equation (1): it
computes the scalar product W.X, compares the result
with T, and fires if it finds that the result belongs to a
certain predetermined neighbourhood of T. This
completes the basic concept of activation.
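The two-phase behaviour just described can be sketched as a minimal Python example. The weight vector, the sample pattern, the target T, and the tolerance defining the "neighbourhood of T" are all my own illustrative choices, not values from the text:

```python
def fires(W, X, T, tol=0.5):
    """Working phase: compute the scalar product W.X and fire
    (return True) if it lies within a predetermined neighbourhood
    of the target T."""
    net = sum(w * x for w, x in zip(W, X))
    return abs(net - T) <= tol

# Training found some W with W.X = T for the sample pattern,
# e.g. W = (0.5, 0.25) solves W.(2, 4) = 2 (one of infinitely many).
W = [0.5, 0.25]
print(fires(W, [2.0, 4.0], T=2.0))    # the learnt pattern: fires
print(fires(W, [10.0, 10.0], T=2.0))  # a distant pattern: does not fire
```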

Summary: (Design Topology and Training of
a Perceptron)

We can design and train a network which can tell apples
from lemons, where all the apples have feature values
greater than those of the lemons.

Topology of the Network (design):


This problem can be solved by designing a feed-forward
ANN having the number of input neurons equal to the
number of features of the apple/lemon class (plus one bias
input, optional), and one or two output neurons, as
shown in Figure 3.

[Figure 3: Variation in Topology and Thresholding of an ANN. Four layouts:
a) thresholding at 0 (with bias input x0), b) thresholding at θ,
c) thresholding at 0 with two output neurons (class A, class B),
d) thresholding at θ with two output neurons.]

Learning Method:
Learn from experience/mistakes. That is, apply some
iterative trial-and-error algorithm.

Supervision:
The network is supervised to produce output as follows:
output = 1, if we present a sample from class A (apples)
output = 0, if we present a sample from class B (lemons)

Training procedure:
For training purposes, we collect many (ideally all
possible) samples of each class to serve as exemplars.

We present all samples to the network one sample at a
time; first for one class, then for the other.

A sample consists of a set of features that characterise
an object.

Let each apple and lemon be characterised by its size
and weight, denoted by x1 and x2, respectively. The
objective of training is to find a straight line as drawn in
Figure 4.

[Figure 4: Status of the trained network. In the (x1, x2) = (size, weight)
plane, the class A samples (apples) lie above and the class B samples
(lemons) lie below the separating straight line.]

1.5 Geometry of a Straight Line and the Hebbian Rule

• From the geometry of a straight line, we know that

x2 + m·x1 + c is  > 0 if (x1, x2) lies above the line,
                  = 0 if (x1, x2) lies on the line,
                  < 0 if (x1, x2) lies below the line.

If w1x1 + w2x2 + θ > 0, then the line passes below the
sample; if w1x1 + w2x2 + θ < 0, then the line passes above
the sample.

Note that w1x1 + w2x2 + θ = 0 gives x2 = −(w1/w2)x1 − θ/w2,

or x2 + m·x1 + c = 0, where m = w1/w2 and c = θ/w2.


• Weights are initialised randomly. This is equivalent to
drawing a straight line having a random slope and
x2-intercept equal to −θ/w2.
• A sample (x1, x2) from class A (say) is presented to
the network.
At this stage, the network has five pieces of information:
the wi's, the xi's, θ, the supervision, and the target output.

Supervision:

Produce the correct output by updating the wi's (if
necessary) according to the rule:

if the output is correct:                      wi(t+1) := wi(t)
if the output is 0 but should be 1 (class A):  wi(t+1) := wi(t) + xi(t)
if the output is 1 but should be 0 (class B):  wi(t+1) := wi(t) − xi(t)
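The rule above can be written as a small update function (a sketch; the function and variable names are my own):

```python
def hebbian_update(w, x, output, target):
    """Hebbian-style weight update for one presented sample.

    w, x    : weight and input vectors (bias input included, if any)
    output  : the network's actual output (0 or 1)
    target  : the supervised output (1 for class A, 0 for class B)
    """
    if output == target:                 # correct: leave the weights alone
        return list(w)
    if target == 1:                      # output 0, should be 1: add the sample
        return [wi + xi for wi, xi in zip(w, x)]
    return [wi - xi for wi, xi in zip(w, x)]  # output 1, should be 0: subtract
```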

1.6 The Training Steps with Graphical Support

Step 1. Input the training sets A and B, where
A = {(x0, x1, x2)}, B = {(y0, y1, y2)}, and x0 = y0 = θ.

Step 2. Set w0 = −1. Randomly choose w1 and w2.

[Figure: the randomly chosen weight vector and its associated line,
with x2-intercept θ/w2, in the (x1, x2) plane.]

Class A presentation:

Step 3. Retrieve the first sample of class A and compute

net = w0x0 + w1x1 + w2x2.

Apply activation. That is,

if f(net) = 1, do nothing
[i.e. these weights are good for this sample];

else if f(net) = 0
[i.e. these weights are not good for this sample], then

wi(t+1) = wi(t) + xi

and go to step 3.

[Figures: the class A sample shown with the line before and after
the weight update.]

Step 4. Repeat step 3 for all samples of the class.

Class B presentation:

Step 3. Retrieve the first sample of class B and compute

net = w0x0 + w1x1 + w2x2.

Apply activation. That is,

if f(net) = 0, do nothing
[i.e. these weights are good for this sample];

else if f(net) = 1
[i.e. these weights are not good for this sample], then

wi(t+1) = wi(t) − xi

and go to step 3.

[Figures: the class B sample shown with the line before and after
the weight update.]

Step 4. Repeat step 3 for all samples of the class.

Step 5. Save the final weight vector (w1, w2) and delete
the training sets.

[Figure: status of the trained network — class A (apples) and class B
(lemons) separated by the learned line in the (x1, x2) = (size, weight)
plane.]
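Steps 1 to 5 can be combined into one training loop. This is a sketch under my own assumptions: the toy apple/lemon feature values and the choice θ = 1 are invented for illustration, and the bias weight is updated along with w1 and w2:

```python
import random

def train_perceptron(class_a, class_b, theta=0.0, epochs=100):
    """Steps 1-5 as one loop: thresholding is shifted to 0 via the
    bias input x0 = theta; class A samples should output 1, class B 0.
    Wrong answers are corrected by adding (class A) or subtracting
    (class B) the whole sample vector."""
    random.seed(0)                       # reproducible 'random' start
    w = [-1.0, random.uniform(-1, 1), random.uniform(-1, 1)]
    samples = ([([theta, x1, x2], 1) for x1, x2 in class_a]
               + [([theta, x1, x2], 0) for x1, x2 in class_b])
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            net = sum(wi * xi for wi, xi in zip(w, x))
            out = 1 if net >= 0 else 0
            if out != target:
                mistakes += 1
                sign = 1 if target == 1 else -1
                w = [wi + sign * xi for wi, xi in zip(w, x)]
        if mistakes == 0:                # Step 5: every sample is correct
            break
    return w

# Apples (class A) have larger size/weight features than lemons (class B).
apples = [(4.0, 5.0), (5.0, 4.0), (6.0, 6.0)]
lemons = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
w = train_perceptron(apples, lemons, theta=1.0)
```

Since the two toy classes are linearly separable, the loop terminates with a line that puts every apple on the firing side and every lemon on the other.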

1.7 Describing Linear Activation of Classifier ANNs

Option-1:

When thresholding is shifted to 0, the actual threshold θ
acts as the bias value. That is, x0 = θ and, in this case,
w0 = −1. If the actual pattern is denoted by (x1, x2), then

[Figure: inputs x0, x1, x2 with weights w0, w1, w2 feeding one output
neuron; output 1 or 0.]

for net = w0x0 + w1x1 + w2x2, the activation is given by

f(net) = 1, when net ≥ 0 : the object belongs to class A
       = 0, otherwise    : the object belongs to class B
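Option-1 can be written as a one-line computation. This is a sketch; the function names and example values are mine, and the second function anticipates the thresholding-at-θ variant for comparison:

```python
def activate_option1(x1, x2, w1, w2, theta):
    """Thresholding shifted to 0: the actual threshold theta becomes
    the bias input x0 = theta with fixed weight w0 = -1."""
    net = -1.0 * theta + w1 * x1 + w2 * x2
    return 1 if net >= 0 else 0          # 1: class A, 0: class B

def activate_option2(x1, x2, w1, w2, theta):
    """Thresholding kept at theta, no bias input."""
    net = w1 * x1 + w2 * x2
    return 1 if net >= theta else 0
```

Both variants classify identically, since w1·x1 + w2·x2 − θ ≥ 0 is the same condition as w1·x1 + w2·x2 ≥ θ.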

Option-2:

When thresholding is at θ: if the actual pattern is
denoted by (x1, x2), then

[Figure: inputs x1, x2 with weights w1, w2 feeding one output neuron.]

for net = w1x1 + w2x2, the activation is given by

f(net) = 1, when net ≥ θ : the object belongs to class A
       = 0, otherwise    : the object belongs to class B

Option-3:

When thresholding is at 0, the actual pattern is (x1, x2), and
there are two output neurons.

[Figure: thresholding at 0 — inputs x0, x1, x2 feed two output neurons;
neuron 1 gives out = 1 for class A, neuron 2 gives out = 1 for class B.]

For (x1, x2), the activations are given by

Neuron 1: if net ≥ 0, output = 1 (i.e. the object belongs
to class A).

Neuron 2: if net < 0, output = 1 (i.e. the object belongs
to class B).

Option-4:
When thresholding is at θ and there are two output
neurons: if the actual pattern is denoted by (x1, x2), then

[Figure: thresholding at θ — inputs x1, x2 feed two output neurons;
neuron 1 gives out = 1 for class A, neuron 2 gives out = 1 for class B.]

For net = w1x1 + w2x2, the activations are given by

Neuron 1: if net ≥ θ, output = 1 (i.e. the object belongs to
class A).

Neuron 2: if net < θ, output = 1 (i.e. the object belongs to
class B).

1.8 Modification of the Hebbian Learning Rule

• Suppose that the network given in Option-2 of
Section 1.7 has already been trained for class B (i.e.
lemons) and we want to train it for class A (apples).

• Let θ = 0, the random weight vector (w1, w2) equal
(−2, 1), and the first sample (x1, x2) equal (4, 3).

• Then obviously the straight line −2x1 + x2 = 0,
associated with the weight vector, is perpendicular to
the weight vector, as shown below:

[Figure 5: the line −2x1 + x2 = 0 together with the weight vector
(−2, 1) perpendicular to it; the sample (4, 3), marked X, lies below
the line, while the class B samples (*) lie near the origin.]

It is clear from Figure 5 that the sample (4, 3) lies below the
straight line. Obviously, the output is 0 in this case, but the
network is supervised to update the weights as

wi(t+1) := wi(t) + xi

repeatedly until the output becomes 1. So the second
iteration yields the following status.


[Figure: after one update the weight vector is (2, 4) = (−2, 1) + (4, 3);
the associated line 2x1 + 4x2 = 0 now passes below the sample (4, 3).]

Clearly, the line has come down below the point (4, 3), resulting
in output = 1. By attaining the weight vector (2, 4), the network
has learnt the first sample; but, unfortunately, it has destroyed
the learning for class B. This is due to the unnecessarily long
jump of the weights. It can be avoided by introducing a gain
term η as follows:

for class A:  wi(t+1) := wi(t) + η·xi
for class B:  wi(t+1) := wi(t) − η·xi

where η is a constant such that 0 < η ≤ 1.
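As a sketch, the moderated update applied to the numbers of this section ((w1, w2) = (−2, 1), sample (4, 3)); the value η = 0.1 is an arbitrary example, not from the text:

```python
def update_with_gain(w, x, target, eta=0.1):
    """Hebbian update moderated by a gain term 0 < eta <= 1:
    add eta*x for a class A sample, subtract it for class B.
    A small eta replaces one long jump by many small corrections."""
    assert 0 < eta <= 1
    sign = 1 if target == 1 else -1
    return [wi + sign * eta * xi for wi, xi in zip(w, x)]

w = update_with_gain([-2.0, 1.0], [4.0, 3.0], target=1, eta=0.1)
# w is now roughly (-1.6, 1.3): the line turns only slightly toward
# the sample, instead of jumping all the way to (2, 4).
```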

1.9 Widrow-Hoff Delta Rule

Another rule of a similar nature was suggested by Widrow
and Hoff. They realised that it would be best to change the
weights by a lot when the weighted sum is a long way from the
desired value, whilst altering them only slightly when the
weighted sum is close to that required to give the correct
solution.
They proposed a learning rule, known as the Widrow-Hoff
delta rule, which calculates the difference between the
weighted sum and the required output and calls that the error.
Weight adjustment is then carried out in proportion to that
error. The error term can be written as
δ = d(t) − y(t)
where d(t) is the desired response of the system and y(t) is the
actual response.
This takes care of the addition or the subtraction: if the
desired output is 1 and the actual output is 0, δ = +1 and so
the weights are increased. Conversely, if the desired output is 0
and the actual output is 1, δ = −1 and so the weights are
decreased. Note that the weights are unchanged if the net makes
the correct decision, since then d(t) − y(t) = 0.

The rule is given by

δ = d(t) − y(t)
wi(t+1) = wi(t) + η·δ·xi(t)

where d(t) = 1 if the input is from class A, and
d(t) = 0 if the input is from class B.

Flow Chart of the Widrow-Hoff Delta Rule [1]

Step 1. Start.
Step 2. Initialise the weights randomly and set the bias weight to 1.
Step 3. Present the training set and target outputs.
Step 4. Process all samples.
Step 5. If any output is incorrect, set δ = d(t) − y(t), update
wi(t+1) = wi(t) + η·δ·xi(t), and return to step 4.
Step 6. Otherwise save the wi's, delete the training set, and stop.
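The flow chart can be sketched as a loop. The toy (linearly separable) samples and the gain value η = 0.25 are my own illustrative choices:

```python
def widrow_hoff_train(samples, eta=0.25, max_epochs=100):
    """Train by the Widrow-Hoff delta rule: delta = d(t) - y(t),
    w_i(t+1) = w_i(t) + eta * delta * x_i(t).

    samples: list of (x, d); x includes the bias input as x[0],
    d is 1 for class A and 0 for class B."""
    w = [0.0] * len(samples[0][0])       # (initialised randomly in the text)
    for _ in range(max_epochs):
        all_correct = True
        for x, d in samples:
            net = sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if net >= 0 else 0
            delta = d - y                # 0 if correct, otherwise +1 or -1
            if delta != 0:
                all_correct = False
                w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        if all_correct:                  # save the weights and stop
            break
    return w

samples = [([1.0, 4.0, 5.0], 1), ([1.0, 5.0, 4.0], 1),
           ([1.0, 1.0, 1.0], 0), ([1.0, 2.0, 1.0], 0)]
w = widrow_hoff_train(samples)
```

Because δ is 0 for a correctly handled sample, only wrong answers move the weights, exactly as the flow chart's "Is output correct?" branch prescribes.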

1.10 All Neurons can be Trained in the Same Way

All the neurons, whether hidden or output, can be trained in the same way. We show
here that the output of a hidden neuron is nothing more than a source of an additional
input to the neuron of the next layer. This input is treated as if it were coming directly
from the input layer. For example, consider a three-layer network having a single
output neuron.

[Figure 6: A single output neuron receiving inputs from the input layer and from the
hidden layer.]

Let W and X denote the weight vector and the signal vector, respectively. We define the
activation of the output neuron as

f(net) = out,

where f is the activation function of the output neuron and net = W.X.

It should be noted that the weight vector W consists of Wi and Wh, the weight
vectors corresponding to the lines from the input and hidden layers, respectively.

That is, W = Wi ∪ Wh. Similarly, X = Xi ∪ Xh. So that

W.X = Wi.Xi + Wh.Xh

    = Σ_{i=0}^{n} wi·xi + Σ_{i=n+1}^{n+r} wi·xi,   r ≥ 0    (1)

    = Σ_{i=0}^{n} wi·xi + Σ_{k=1}^{r} wk·xk

    = net_i + net_h,    (2)

20
where and denote the net inputs corresponding to the input- and hidden
net i net h
layer, respectively. From relations (1) and (2), we have
Inputs from input layer (3)
r
.
net = w .x
h ∑ k k
k =1
Figure 7 A hidden neuron
where
m
x k =f h (net ikh )=f h ( ∑ w jk x jk ), k=1,2, .. . r
j =0

(4)

(output)

From relations (1) and (4). it is clear that the output of k th hidden neuron is nothing more
that an additional source of an input to the output neuron. Hence the proof.
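The decomposition net = net_i + net_h can be checked numerically. All the vectors below are arbitrary illustrative values of my own:

```python
def dot(w, x):
    """Scalar product W.X."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Lines coming directly from the input layer (W_i, X_i):
Wi, Xi = [0.5, -0.2, 0.1], [1.0, 2.0, 3.0]
# Lines coming from hidden neurons (W_h, X_h), where each x_k is
# the hidden neuron's own output f_h(net_k):
Wh, Xh = [0.8, -0.4], [0.6, 0.9]

# Treating the hidden outputs as ordinary extra inputs, the combined
# vectors W = Wi u Wh and X = Xi u Xh give the same net input ...
net_combined = dot(Wi + Wh, Xi + Xh)
# ... as computing net_i and net_h separately and adding them:
net_split = dot(Wi, Xi) + dot(Wh, Xh)
```

So, from the output neuron's point of view, a hidden neuron's output really is just one more input line.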
