
UNIT-1

ARCHITECTURE


What are Neural Networks?


• Simple computational elements forming a large
network
– Emphasis on learning (pattern recognition)
– Local computation (neurons)

• Definition of NNs is vague


– Often, but not always, inspired by the biological brain


Machine Learning
Machine learning involves adaptive mechanisms
that enable computers to learn from experience,
learn by example and learn by analogy. Learning
capabilities can improve the performance of an
intelligent system over time. The most popular
approaches to machine learning are artificial
neural networks and genetic algorithms. This
lecture is dedicated to neural networks.


Biological neural network


[Figure: two biological neurons connected by synapses; each neuron has a soma (cell body), dendrites, and an axon.]


The neuron as a simple computing element


Diagram of a neuron

[Figure: input signals x1, x2, ..., xn enter the neuron through weighted links w1, w2, ..., wn; the neuron produces an output signal Y that is sent along several outgoing branches.]


Architecture of a typical artificial neural network

[Figure: a typical artificial neural network; input signals enter an input layer, pass through a middle (hidden) layer, and leave the output layer as output signals.]


• A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.
• The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.


• Each neuron has a very simple structure, but an army of such elements constitutes tremendous processing power.
• A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.


• Our brain can be considered a highly complex, non-linear and parallel information-processing system.
• Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks, both data and its processing are global rather than local.
• Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.

• An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.
• The neurons are connected by weighted links passing signals from one neuron to another.


Network Structure
• The output signal is transmitted through the
neuron’s outgoing connection. The outgoing
connection splits into a number of branches
that transmit the same signal. The outgoing
branches terminate at the incoming
connections of other neurons in the network.


Analogy between biological and artificial neural networks

Biological Neural Network    Artificial Neural Network
Soma                         Neuron
Dendrite                     Input
Axon                         Output
Synapse                      Weight

[Figure: a biological neuron (soma, dendrites, axon, synapses) shown side by side with an artificial network of input, middle and output layers.]


Course Topics
Learning Tasks

Supervised
  Data: labeled examples (input, desired output)
  Tasks: classification, pattern recognition, regression
  NN models: perceptron, adaline, feed-forward NN, radial basis function, support vector machines

Unsupervised
  Data: unlabeled examples (different realizations of the input)
  Tasks: clustering, content-addressable memory
  NN models: self-organizing maps (SOM), Hopfield networks

Network architectures

• Three different classes of network architectures:
– single-layer feed-forward (neurons organized in acyclic layers)
– multi-layer feed-forward (neurons organized in acyclic layers)
– recurrent

• The architecture of a neural network is linked with the learning algorithm used to train it


Single Layer Feed-forward

[Figure: single-layer feed-forward network: an input layer of source nodes projecting onto an output layer of neurons.]


Multi layer feed-forward

[Figure: a 3-4-2 multi-layer feed-forward network: an input layer with 3 nodes, a hidden layer with 4 neurons, and an output layer with 2 neurons.]


Recurrent network
Recurrent network with hidden neurons: the unit-delay operator z^-1 is used to model a dynamic system.

[Figure: recurrent network in which outputs are fed back to the inputs and to hidden neurons through unit-delay elements z^-1.]


The Neuron
[Figure: neuron model: input values x1, x2, ..., xm with synaptic weights w1, w2, ..., wm feed a summing function; a bias b is added to give the local field v, which is passed through an activation function φ(·) to produce the output y.]


The Neuron
• The neuron is the basic information-processing unit of a NN. It consists of:
1. A set of links, describing the neuron inputs, with weights w1, w2, …, wm
2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers):
   u = Σ_{j=1..m} w_j x_j
3. An activation function φ (squashing function) for limiting the amplitude of the neuron output:
   y = φ(u + b)
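As a concrete illustration (a minimal sketch, not taken from these slides), the Python snippet below computes such a neuron output; the choice of a sigmoid as φ and the numeric values are assumptions.

```python
import math

def sigmoid(v):
    # squashing activation: maps the local field to (0, 1)
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, bias):
    # adder (linear combiner): u = sum_j w_j * x_j
    u = sum(w * x for w, x in zip(weights, inputs))
    # activation applied to the local field v = u + b
    return sigmoid(u + bias)

# example: three inputs with arbitrary weights and bias
print(neuron_output([1.0, 0.5, -1.0], [0.2, 0.4, 0.1], bias=-0.1))
```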

Bias of a Neuron
• The bias b has the effect of applying an affine
transformation to the weighted sum u
v=u+b
• v is called the induced field of the neuron

[Figure: the lines x1 - x2 = -1, x1 - x2 = 0 and x1 - x2 = 1 in the (x1, x2) plane, illustrating how the bias shifts the boundary defined by u + b = 0.]

Bias as extra input
• The bias is an external parameter of the neuron. It can be modeled by adding an extra input:
  x0 = +1, with weight w0 = b
  v = Σ_{j=0..m} w_j x_j

[Figure: neuron model with the bias as an extra input x0 = +1 with synaptic weight w0 = b; input signals x1, ..., xm with synaptic weights w1, ..., wm feed the summing function, whose output v passes through the activation function φ(·) to give the output y.]
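A tiny NumPy sketch of this trick (the arrays and values are my own illustrative assumptions): the bias is absorbed into the weight vector by prepending a constant input x0 = +1.

```python
import numpy as np

w = np.array([0.2, 0.4, 0.1])    # weights w1..wm
b = -0.1                         # bias
x = np.array([1.0, 0.5, -1.0])   # inputs x1..xm

# explicit bias: v = w.x + b
v_explicit = w @ x + b

# bias as an extra input: x0 = +1 with weight w0 = b
w_aug = np.concatenate(([b], w))
x_aug = np.concatenate(([1.0], x))
v_aug = w_aug @ x_aug

assert np.isclose(v_explicit, v_aug)   # both formulations give the same local field
```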

Activation Function φ
There are different activation functions used in different applications. The
most common ones are:

Hard-limiter:           φ(v) = 1 if v ≥ 0;  0 if v < 0
Piecewise linear:       φ(v) = 1 if v ≥ 1/2;  v if -1/2 < v < 1/2;  0 if v ≤ -1/2
Sigmoid:                φ(v) = 1 / (1 + exp(-a v))
Hyperbolic tangent:     φ(v) = tanh(v)
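Purely as an illustration (a sketch, not part of the original slides), these four functions in Python; the slope parameter a of the sigmoid is assumed:

```python
import math

def hard_limiter(v):
    return 1.0 if v >= 0 else 0.0

def piecewise_linear(v):
    # follows the slide: 1 for v >= 1/2, 0 for v <= -1/2, v in between
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v

def sigmoid(v, a=1.0):
    # logistic sigmoid with slope parameter a
    return 1.0 / (1.0 + math.exp(-a * v))

def hyperbolic_tangent(v):
    return math.tanh(v)
```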


Neuron Models
• The choice of φ determines the neuron model. Examples:
• Step function:
  φ(v) = a if v < c;  b if v ≥ c
• Ramp function:
  φ(v) = a if v < c;  b if v > d;  a + ((v - c)(b - a)/(d - c)) otherwise
• Sigmoid function (with z, x, y parameters):
  φ(v) = z + 1 / (1 + exp(-x v + y))
• Gaussian function:
  φ(v) = (1 / (√(2π) σ)) · exp( -(1/2) ((v - μ)/σ)² )
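A similarly hedged Python sketch of these parameterized models (default parameter values are arbitrary assumptions):

```python
import math

def step(v, a=0.0, b=1.0, c=0.0):
    # a below the threshold c, b at or above it
    return a if v < c else b

def ramp(v, a=0.0, b=1.0, c=-1.0, d=1.0):
    # saturates at a below c and at b above d, linear in between
    if v < c:
        return a
    if v > d:
        return b
    return a + (v - c) * (b - a) / (d - c)

def sigmoid(v, z=0.0, x=1.0, y=0.0):
    # shifted and scaled logistic function with parameters z, x, y
    return z + 1.0 / (1.0 + math.exp(-x * v + y))

def gaussian(v, mu=0.0, sigma=1.0):
    # bell-shaped activation centred at mu with width sigma
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)
```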


Learning Algorithms

Depend on the network architecture:


• Error correcting learning (perceptron)
• Delta rule (AdaLine, Backprop)
• Competitive Learning (Self Organizing Maps)


Applications
• Classification:
– Image recognition
– Speech recognition
– Diagnosis
– Fraud detection
– …
• Regression:
– Forecasting (prediction on the basis of past history)
– …
• Pattern association:
– Retrieve an image from a corrupted one
– …
• Clustering:
– client profiles
– disease subtypes
– …


Supervised Learning

• Training and test data sets


• Training set: input & target


Perceptron: architecture
• We consider the architecture: feed-forward NN
with one layer
• It is sufficient to study single layer perceptrons
with just one neuron:

[Figure: a single-layer perceptron with one neuron.]


Single layer perceptrons


• Generalization to single layer perceptrons with more
neurons is easy because:

• The output units are independent of each other
• Each weight only affects one of the outputs


Perceptron: Neuron Model


• The (McCulloch-Pitts) perceptron is a single-layer NN with a non-linear activation function φ, the sign function:
  φ(v) = +1 if v ≥ 0;  -1 if v < 0

[Figure: perceptron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn and bias b; the local field v is passed through φ(v) to give the output y.]


Perceptron for Classification


• The perceptron is used for binary
classification.
• Given training examples of classes C1 and C2, train the perceptron in such a way that it correctly classifies the training examples:
– If the output of the perceptron is +1 then the input is
assigned to class C1
– If the output is -1 then the input is assigned to C2


Perceptron Training
• How can we train a perceptron for a
classification task?
• We try to find suitable values for the
weights in such a way that the training
examples are correctly classified.
• Geometrically, we try to find a hyper-plane
that separates the examples of the two
classes.


Perceptron Geometric View


The equation below describes a (hyper-)plane in the input space
consisting of real valued 2D vectors. The plane splits the input
space into two regions, each of them describing one class.

  Σ_{i=1..2} w_i x_i + w_0 = 0,   i.e.  w1·x1 + w2·x2 + w0 = 0

[Figure: the decision boundary w1·x1 + w2·x2 + w0 = 0 in the (x1, x2) plane; the half-plane where w1·x1 + w2·x2 + w0 ≥ 0 is the decision region for class C1, the other side belongs to class C2.]

Example: AND
• Here is a representation of the AND function
• White means false, black means true for the output
• -1 means false, +1 means true for the input

-1 AND -1 = false
-1 AND +1 = false
+1 AND -1 = false
+1 AND +1 = true
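For illustration (a sketch, not from the original slides), the following Python code trains a single perceptron on this AND data with the perceptron learning rule; the learning rate and initial weights are assumed values.

```python
import numpy as np

# AND data with inputs in {-1, +1}; the target is +1 only for (+1, +1)
X = np.array([[-1, -1], [-1, +1], [+1, -1], [+1, +1]], dtype=float)
d = np.array([-1, -1, -1, +1], dtype=float)

w = np.zeros(2)   # weights w1, w2
b = 0.0           # bias
eta = 0.1         # learning rate (assumption)

def predict(x):
    return 1.0 if w @ x + b >= 0 else -1.0

# perceptron learning rule: update only on misclassified examples
for epoch in range(20):
    errors = 0
    for x, target in zip(X, d):
        y = predict(x)
        if y != target:
            w += eta * (target - y) * x
            b += eta * (target - y)
            errors += 1
    if errors == 0:          # all training examples classified correctly
        break

print(w, b, [predict(x) for x in X])   # reproduces the AND outputs above
```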


Example: AND continued


• A linear decision surface (i.e. a plane in 3D
space) intersecting the feature space (i.e.
the 2D plane where z=0) separates false
from true instances


Example: AND continued


• Watch a perceptron learn the AND function [animation in the original slides, not reproduced here]


Example: XOR

• Here’s the XOR function:


-1 XOR -1 = false
-1 XOR +1 = true
+1 XOR -1 = true
+1 XOR +1 = false

Perceptrons cannot learn such linearly inseparable functions



Example: XOR continued

• Watch a perceptron try to learn XOR [animation in the original slides, not reproduced here]


Perceptron: Limitations
• The perceptron can only model linearly separable
classes, like (those described by) the following Boolean
functions:
• AND
• OR
• COMPLEMENT
• It cannot model the XOR.

• You can experiment with these functions in the Matlab


practical lessons.

Gradient Descent Learning Rule
• Perceptron learning rule fails to converge if examples
are not linearly separable

• Gradient Descent: consider a linear unit without a threshold and with continuous output o (not just -1, 1)
• o(x) = w0 + w1 x1 + … + wn xn
• Update the wi's such that they minimize the squared error
  E[w1, …, wn] = ½ Σ_{(x,d)∈D} (d - o(x))²
  where D is the set of training examples
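A minimal sketch of this gradient descent update for a linear unit (my own illustration; the toy data, learning rate, and iteration count are assumptions):

```python
import numpy as np

# toy training set D: each row of X is (1, x1), so w[0] plays the role of w0
X = np.array([[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0]])
d = np.array([1.0, 3.0, 5.0, 7.0])     # here d = 1 + 2*x1

w = np.zeros(2)
eta = 0.05

for _ in range(500):
    o = X @ w                    # linear unit output o(x) = w0 + w1*x1 + ...
    grad = -(X.T @ (d - o))      # gradient of E = 1/2 * sum (d - o)^2
    w -= eta * grad              # move against the gradient

print(w)                         # approaches [1, 2]
```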


• Replace the step function in the perceptron with a continuous (differentiable) function f; the simplest choice is a linear function.
• With or without the threshold, the Adaline is trained on the output of the function f rather than on the final output.

[Figure: Adaline: the weighted sum of the inputs passes through f(x); a threshold (+/-) may then be applied to give the final output.]
Incremental Stochastic
Gradient Descent
• Batch mode: gradient descent over the entire data D
  w = w - η ∇E_D[w],   E_D[w] = ½ Σ_d (t_d - o_d)²
• Incremental mode: gradient descent over individual training examples d
  w = w - η ∇E_d[w],   E_d[w] = ½ (t_d - o_d)²

Incremental gradient descent can approximate batch gradient descent arbitrarily closely if η is small enough.

Weights Update Rule:
incremental mode
• Computation of the gradient of E:
  ∂E(w)/∂w = e · ∂e/∂w = -e x^T
• Delta rule for weight update:
  w(n+1) = w(n) + η e(n) x(n)
LMS learning algorithm

n = 1;
initialize w(n) randomly;
while (E_tot unsatisfactory and n < max_iterations)
    select an example (x(n), d(n))
    e(n) = d(n) - w(n)^T x(n)
    w(n+1) = w(n) + η e(n) x(n)
    n = n + 1;
end-while;

η = learning rate parameter (a real number)

A modification uses the normalized update
    w(n+1) = w(n) + η e(n) x(n) / ||x(n)||
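For concreteness (my own runnable sketch, not from the slides), a Python version of this LMS loop; the learning rate, stopping threshold, and cyclic example selection are assumptions:

```python
import numpy as np

def lms_train(X, d, eta=0.05, max_iterations=1000, tol=1e-6, normalized=False):
    """Train a linear unit with the LMS (Widrow-Hoff) rule."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])    # initialize w randomly
    for n in range(max_iterations):
        i = n % len(X)                            # select an example (x(n), d(n))
        x, target = X[i], d[i]
        e = target - w @ x                        # e(n) = d(n) - w(n)^T x(n)
        step = x / np.linalg.norm(x) if normalized else x
        w = w + eta * e * step                    # w(n+1) = w(n) + eta * e(n) * x(n)
        if np.mean((d - X @ w) ** 2) < tol:       # total error small enough
            break
    return w

X = np.array([[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0]])
d = np.array([1.0, 3.0, 5.0, 7.0])
print(lms_train(X, d))   # approaches [1, 2]
```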

Perceptron Learning Rule vs. Gradient Descent Rule

The perceptron learning rule is guaranteed to succeed if
• the training examples are linearly separable
• the learning rate η is sufficiently small
The linear unit training rule uses gradient descent
• guaranteed to converge to the hypothesis with minimum squared error
• given a sufficiently small learning rate η
• even when the training data contains noise
• even when the training data is not separable by H

Outline
• INTRODUCTION
• ADALINE
• MADALINE
• Least-Square Learning Rule
• The proof of Least-Square Learning Rule


Widrow and Hoff, 1960


Bernard Widrow and Ted Hoff introduced the Least-Mean-Square algorithm (a.k.a. the delta rule or Widrow-Hoff rule) and used it to train the Adaline (ADAptive LINear Element / Adaptive Linear Neuron).
– The Adaline was similar to the perceptron, except that it used a linear activation function instead of the threshold.
– The LMS algorithm is still heavily used in adaptive signal processing.

MADALINE: Many ADALINEs; a network of ADALINEs.



Perceptron vs. ADALINE

• Perceptron: LTU (Linear Threshold Unit), empirical Hebbian assumption
• ADALINE: LGU (Linear Graded Unit), gradient descent
• LTU: sign function, output +/- (positive/negative)
• LGU: continuous and differentiable activation function, including the linear function

[Figure: a neuron with inputs x0, x1, ..., xn and weights w0, w1, ..., wn feeding a summation and activation f; plots of f(s) = sgn(s), tanh(s), and linear(s) against s.]

MADALINE: Many ADALINEs; a network of ADALINEs

ADALINE
• ADALINE (Adaptive Linear Neuron) is a network model proposed by Bernard Widrow in 1959.

[Figure: inputs X1, X2, X3 feeding a single processing element (PE).]


Method
• Method : The value in each unit must +1 or –1
net =  X iWi
X 0  1  net  W0  W1 X 1  W2 X 2    Wn X n
1 net  0

Y  if
 1 net  0

This is different from perception's transfer function.


Method
  ΔW_i = η (T - Y) X_i,   where T is the expected output
  W_i ← W_i + ΔW_i

ADALINE can solve only linear problems (this is its limitation).
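A small numeric illustration of this update rule (my own example; the learning rate and input values are assumptions):

```python
eta = 0.2
X = [1, 1, -1]     # X0 = +1 (bias input), then X1, X2
W = [0.0, 0.0, 0.0]
T = -1             # expected output for this example

net = sum(x * w for x, w in zip(X, W))    # net = sum_i X_i * W_i = 0 here
Y = 1 if net >= 0 else -1                 # Y = +1, so the example is misclassified
# delta rule from the slide: dW_i = eta * (T - Y) * X_i
W = [w + eta * (T - Y) * x for w, x in zip(W, X)]
print(W)                                  # [-0.4, -0.4, 0.4]
```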


MADALINE
• MADALINE: it is composed of many ADALINEs (a multilayer Adaline).

[Figure: inputs Xi, ..., Xn connected through weights Wij to ADALINE units computing net_j and outputs Y_j; a second layer combines the Y_j.]

• If more than half of the net_j ≥ 0, the output is +1; otherwise the output is -1.
• After the second layer, the majority vote is used.
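A toy sketch of this majority-vote output stage (my own illustration; the weights and input are arbitrary):

```python
import numpy as np

def madaline_output(x, W):
    """W holds one weight row per ADALINE; the output is a majority vote."""
    nets = W @ x                        # net_j for each ADALINE
    votes = np.where(nets >= 0, 1, -1)  # each unit outputs +1 or -1
    # more than half of the net_j >= 0  ->  +1, otherwise -1
    return 1 if np.sum(votes == 1) > len(votes) / 2 else -1

x = np.array([1.0, 1.0, -1.0])                      # includes x0 = +1
W = np.array([[0.5, 1.0, 0.2],
              [-0.3, 0.4, 0.9],
              [0.1, -0.8, 0.3]])
print(madaline_output(x, W))                        # -> -1 for these values
```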


Least-Square Learning Rule (1/2)

• Least-Square Learning Rule: for each training pattern j, 1 ≤ j ≤ p,
  X_j = (X_0, X_1, …, X_n)^t   (a column vector of inputs)
  W   = (W_0, W_1, …, W_n)^t   (the weight vector)
  Net_j = W^t X_j = Σ_{i=0..n} W_i X_i = W_0 X_0 + W_1 X_1 + … + W_n X_n


Least-Square Learning Rule (2/2)

• By applying the least-square learning rule, the weight vector is
  W* = R^(-1) P,   or equivalently  R W* = P
  where R is the correlation matrix,
  R = R'/p,   R' = R'_1 + R'_2 + … + R'_p = Σ_{j=1..p} X_j X_j^t
  and
  P^t = (1/p) Σ_{j=1..p} T_j X_j^t

Exercise: Use Adaline (1/4)

Example:
  X1 = (1, 1, 0)^t,  X2 = (1, 0, 1)^t,  X3 = (1, 1, 1)^t
  T1 = 1,  T2 = 1,  T3 = -1

       x_0  x_1  x_2   T_j
  X1    1    1    0     1
  X2    1    0    1     1
  X3    1    1    1    -1


• Solution: first calculate R.

  R'_1 = X1 X1^t = [ 1 1 0 ; 1 1 0 ; 0 0 0 ]
  R'_2 = X2 X2^t = [ 1 0 1 ; 0 0 0 ; 1 0 1 ]
  R'_3 = X3 X3^t = [ 1 1 1 ; 1 1 1 ; 1 1 1 ]

  R = (R'_1 + R'_2 + R'_3) / 3
    = (1/3) [ 3 2 2 ; 2 2 1 ; 2 1 2 ]
    = [ 1 2/3 2/3 ; 2/3 2/3 1/3 ; 2/3 1/3 2/3 ]


  P_1 = T1 X1^t = 1 · (1, 1, 0) = (1, 1, 0)
  P_2 = T2 X2^t = 1 · (1, 0, 1) = (1, 0, 1)
  P_3 = T3 X3^t = -1 · (1, 1, 1) = (-1, -1, -1)

  P = (P_1 + P_2 + P_3) / 3 = (1/3, 0, 0)

  Solve R W* = P; multiplying both sides by 3:
    3 W1 + 2 W2 + 2 W3 = 1
    2 W1 + 2 W2 +   W3 = 0
    2 W1 +   W2 + 2 W3 = 0
  ⇒ W1 = 3,  W2 = -2,  W3 = -2

• Verify the net:
  Substituting (1, 1, 0): net = 3·X1 - 2·X2 - 2·X3 = 1,   Y = 1   ok
  Substituting (1, 0, 1): net = 3·X1 - 2·X2 - 2·X3 = 1,   Y = 1   ok
  Substituting (1, 1, 1): net = 3·X1 - 2·X2 - 2·X3 = -1,  Y = -1  ok

[Figure: the trained ADALINE with inputs X1, X2, X3, weights 3, -2, -2, and output Y.]
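The whole exercise can be reproduced with a few lines of NumPy (my own check, not part of the slides):

```python
import numpy as np

X = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)   # patterns X1, X2, X3 as rows
T = np.array([1, 1, -1], dtype=float)    # targets T1, T2, T3

R = (X.T @ X) / len(X)      # correlation matrix R = (1/p) * sum_j X_j X_j^t
P = (X.T @ T) / len(X)      # P = (1/p) * sum_j T_j X_j

W = np.linalg.solve(R, P)   # W* = R^-1 P
print(W)                    # -> [ 3. -2. -2.]
print(np.sign(X @ W))       # -> [ 1.  1. -1.], matching the targets
```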


Proof of Least-Square Learning Rule (1/3)

• We use the least mean-square error to ensure the minimum total error. As long as the total error approaches zero, the best solution is found. Therefore, we look for the minimum of ⟨ε_k²⟩.
• Proof:
  ⟨ε_k²⟩ = (1/L) Σ_{k=1..L} ε_k² = (1/L) Σ_k (T_k - Y_k)² = (1/L) Σ_k (T_k² - 2 T_k Y_k + Y_k²)
         = (1/L) Σ_k T_k² - (2/L) Σ_k T_k Y_k + (1/L) Σ_k Y_k²     (let ⟨T_k²⟩ be the mean of T_k²)
         = ⟨T_k²⟩ - (2/L) Σ_k T_k Y_k + (1/L) Σ_k W^t (X_k X_k^t) W

Proof of Least-Square Learning Rule (2/3)

• Note (from the previous page):
  Y_k² = (Σ_{i=1..n} w_i x_ik)² = (W^t X_k)² = (W^t X_k)(X_k^t W) = W^t (X_k X_k^t) W
• Hence
  ⟨ε_k²⟩ = ⟨T_k²⟩ - (2/L) Σ_k T_k Y_k + W^t [ (1/L) Σ_k X_k X_k^t ] W
         = ⟨T_k²⟩ - (2/L) Σ_k T_k (W^t X_k) + W^t ⟨X_k X_k^t⟩ W
         = ⟨T_k²⟩ - 2 [ (1/L) Σ_k T_k X_k^t ] W + W^t ⟨X_k X_k^t⟩ W
         = ⟨T_k²⟩ - 2 ⟨T_k X_k^t⟩ W + W^t ⟨X_k X_k^t⟩ W

Proof of Least-Square Learning Rule (3/3)

• Let R_k = X_k X_k^t; R_k is an n × n matrix, also called the correlation matrix.
• Let R' = R'_1 + R'_2 + … + R'_k + … + R'_L, and let R = R'/L, i.e. R = ⟨X_k X_k^t⟩. Then
  ⟨ε_k²⟩ = ⟨T_k²⟩ + W^t R W - 2 ⟨T_k X_k^t⟩ W
• We want to find W such that ⟨ε_k²⟩ is minimal:
  ∂⟨ε_k²⟩/∂W = ∂/∂W [ ⟨T_k²⟩ + W^t R W - 2 ⟨T_k X_k^t⟩ W ]
             = 2 R W - 2 ⟨T_k X_k^t⟩        (let P = ⟨T_k X_k^t⟩)
             = 2 R W - 2 P
• Setting ∂⟨ε_k²⟩/∂W = 2 R W* - 2 P = 0 gives
  W* = R^(-1) P,   or equivalently  R W* = P

Comparison of Perceptron and Adaline

                      Perceptron                  Adaline
Architecture          Single-layer                Single-layer
Neuron model          Non-linear                  Linear
Learning algorithm    Minimize the number of      Minimize the total error
                      misclassified examples
Application           Linear classification       Linear classification and regression