UNIT-1
ARCHITECTURE
www.Vidyarthiplus.com
Machine Learning
Machine learning involves adaptive mechanisms that enable computers to learn from experience, by example, and by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms. This lecture is dedicated to neural networks.
[Figure: two biological neurons connected at a synapse, each with soma, dendrites, and axon]
[Figure: artificial neuron: inputs x1, x2, …, xn with weights w1, w2, …, wn feeding a neuron with output Y]
[Figure: layered network: input layer, middle layer, output layer]
Network Structure
• The output signal is transmitted through the
neuron’s outgoing connection. The outgoing
connection splits into a number of branches
that transmit the same signal. The outgoing
branches terminate at the incoming
connections of other neurons in the network.
[Figure: biological neurons (soma, dendrites) compared with the network's middle layer]
Course Topics
Learning Tasks

Supervised
• Data: labeled examples (input, desired output)
• Tasks: classification, pattern recognition, regression
• NN models: perceptron, ADALINE, feed-forward NN, radial basis function, support vector machines

Unsupervised
• Data: unlabeled examples (different realizations of the input)
• Tasks: clustering, content-addressable memory
• NN models: self-organizing maps (SOM), Hopfield networks
Network architectures
3-4-2 Network
[Figure: feed-forward network with a 3-neuron input layer, a 4-neuron hidden layer, and a 2-neuron output layer]
Recurrent network
A recurrent network with hidden neurons: the unit-delay operator z^(-1) is used to model a dynamic system.
[Figure: recurrent network with input, hidden, and output units and z^(-1) unit-delay feedback connections]
The Neuron
[Figure: neuron model: input values x1, x2, …, xm with weights w1, w2, …, wm, a summing function producing the local field v, a bias b, an activation function φ(·), and output y]
The Neuron
• The neuron is the basic information processing unit of a NN. It consists of:
1. A set of links, describing the neuron inputs, with weights w1, w2, …, wm.
2. An adder function (linear combiner) computing the weighted sum of the inputs (real numbers):
   u = Σ_{j=1}^{m} w_j x_j
3. An activation function φ producing the output from the biased sum:
   y = φ(u + b)
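As an illustration, this computation can be sketched in Python; the sigmoid used for φ and the input values are illustrative choices, not fixed by the slide:

```python
import math

def neuron_output(x, w, b):
    """Compute y = phi(u + b), where u is the weighted sum of the inputs.

    x, w : lists of inputs and weights; b : bias.
    Here phi is a sigmoid, one common choice of activation function.
    """
    u = sum(wj * xj for wj, xj in zip(w, x))   # adder (linear combiner)
    v = u + b                                  # induced local field
    return 1.0 / (1.0 + math.exp(-v))          # activation phi(v)

# Example with illustrative values: u = 0.8 - 0.2 = 0.6, v = 0.7
y = neuron_output([1.0, 0.5], [0.8, -0.4], b=0.1)
```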
Bias of a Neuron
• The bias b has the effect of applying an affine transformation to the weighted sum u:
  v = u + b
• v is called the induced local field of the neuron.
[Figure: the lines x1 - x2 = -1, x1 - x2 = 0, and x1 - x2 = 1 in the (x1, x2) plane: the bias shifts the decision boundary]
Bias as extra input
• The bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b:
  v = Σ_{j=0}^{m} w_j x_j
[Figure: neuron model with extra input x0 = +1 and weight w0 = b, input signals x1, …, xm with synaptic weights w1, …, wm, summing function, local field v, activation function φ(·), output y]
Activation Function
There are different activation functions used in different applications. The most common ones are:
• Step (threshold) function:
  φ(v) = 1 if v ≥ 0; 0 if v < 0
• Piecewise-linear function:
  φ(v) = 1 if v ≥ 1/2; v if -1/2 < v < 1/2; 0 if v ≤ -1/2
• Sigmoid function:
  φ(v) = 1 / (1 + exp(-a v))
• Hyperbolic tangent:
  φ(v) = tanh(v)
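These four functions can be written directly in Python; the slope parameter a defaulting to 1 is an illustrative choice:

```python
import math

def step(v):
    """Threshold function: 1 if v >= 0, else 0."""
    return 1.0 if v >= 0 else 0.0

def piecewise_linear(v):
    """1 for v >= 1/2, 0 for v <= -1/2, identity in between."""
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v

def sigmoid(v, a=1.0):
    """Logistic sigmoid 1 / (1 + exp(-a*v)) with slope parameter a."""
    return 1.0 / (1.0 + math.exp(-a * v))

def hyperbolic_tangent(v):
    """Hyperbolic tangent tanh(v)."""
    return math.tanh(v)
```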
Neuron Models
• The choice of φ determines the neuron model. Examples:
• Step function:
  φ(v) = a if v < c; b if v ≥ c
• Ramp function:
  φ(v) = a if v < c; b if v > d; a + ((v - c)(b - a)/(d - c)) otherwise
• Sigmoid function (with x, y, z parameters):
  φ(v) = z + 1 / (1 + exp(-x v + y))
• Gaussian function:
  φ(v) = (1 / (√(2π) σ)) exp(-(1/2) ((v - μ)/σ)²)
Learning Algorithms
Applications
• Classification:
– Image recognition
– Speech recognition
– Diagnostics
– Fraud detection
– …
• Regression:
– Forecasting (prediction on base of past history)
– …
• Pattern association:
– Retrieve an image from a corrupted one
– …
• Clustering:
– client profiles
– disease subtypes
– …
Supervised Learning
Perceptron: architecture
• We consider the following architecture: a feed-forward NN with one layer.
• It is sufficient to study single-layer perceptrons with just one neuron:
Perceptron Training
• How can we train a perceptron for a
classification task?
• We try to find suitable values for the
weights in such a way that the training
examples are correctly classified.
• Geometrically, we try to find a hyperplane that separates the examples of the two classes.
[Figure: decision regions in the (x1, x2) plane]
• Decision region for class C1:
  w1 x1 + w2 x2 + w0 ≥ 0, i.e. Σ_{i=1}^{2} w_i x_i + w_0 ≥ 0
• Decision boundary (separating C1 and C2):
  w1 x1 + w2 x2 + w0 = 0
Example: AND
• Here is a representation of the AND function
• White means false, black means true for the output
• -1 means false, +1 means true for the input
-1 AND -1 = false
-1 AND +1 = false
+1 AND -1 = false
+1 AND +1 = true
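The perceptron learning rule from the previous slides can be sketched on this AND task; the learning rate, epoch count, and zero initialization are illustrative choices:

```python
def train_perceptron(samples, epochs=20, eta=0.1):
    """Perceptron learning rule: w <- w + eta * (t - y) * x.

    samples: list of ((x1, x2), target) pairs with +/-1 encoding.
    A constant input x0 = 1 models the bias as an extra weight w0.
    """
    w = [0.0, 0.0, 0.0]                      # [w0 (bias), w1, w2]
    for _ in range(epochs):
        for (x1, x2), t in samples:
            x = (1.0, x1, x2)                # prepend the bias input
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
            w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
    return w

# AND with -1 = false, +1 = true, as on the slide
AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_perceptron(AND)

def predict(w, x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 >= 0 else -1
```

Because AND is linearly separable, the rule converges and all four patterns end up correctly classified.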
Example: XOR
Perceptron: Limitations
• The perceptron can only model linearly separable classes, such as those described by the following Boolean functions:
• AND
• OR
• COMPLEMENT
• It cannot model the XOR function.
Gradient Descent Learning Rule
• The perceptron learning rule fails to converge if the examples are not linearly separable.
[Figure: linear unit f(x) with thresholded (+/-) output (ADALINE)]
Incremental Stochastic Gradient Descent
• Batch mode: gradient descent
  w = w - η ∇E_D[w] over the entire data D
  E_D[w] = 1/2 Σ_d (t_d - o_d)²
• Incremental mode: gradient descent
  w = w - η ∇E_d[w] over individual training examples d
  E_d[w] = 1/2 (t_d - o_d)²
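The two modes can be contrasted on a toy one-weight linear unit o = w * x; the data, learning rate, and iteration counts below are illustrative:

```python
def batch_step(w, data, eta):
    """One batch gradient-descent step: w <- w - eta * dE_D/dw,
    where E_D[w] = 1/2 * sum_d (t_d - o_d)^2 and o_d = w * x_d."""
    grad = sum(-(t - w * x) * x for x, t in data)   # dE_D/dw over all of D
    return w - eta * grad

def incremental_step(w, x, t, eta):
    """One incremental (stochastic) step on a single example d,
    using E_d[w] = 1/2 * (t_d - o_d)^2."""
    return w - eta * (-(t - w * x) * x)

# Illustrative noise-free data generated by t = 2*x:
data = [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0)]

w_b = 0.0
for _ in range(100):                 # batch mode: one step per pass over D
    w_b = batch_step(w_b, data, eta=0.05)

w_i = 0.0
for _ in range(100):                 # incremental mode: one step per example
    for x, t in data:
        w_i = incremental_step(w_i, x, t, eta=0.05)
```

On this noise-free data both modes converge to the same weight w = 2; in general the incremental mode only approximates the batch gradient but is cheaper per update.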
Weights Update Rule: incremental mode
• Computation of the gradient of E:
  ∂E(w)/∂w = e ∂e/∂w = -e x^T
• Delta rule for the weight update:
  w(n+1) = w(n) + η e(n) x(n) / ||x(n)||²
Outline
• INTRODUCTION
• ADALINE
• MADALINE
• Least-Square Learning Rule
• The proof of Least-Square Learning Rule
Perceptron vs. ADALINE
[Figure: unit with inputs x0, x1, …, xn, weights w0, w1, …, wn, activation f, output y]
• Perceptron: Linear Threshold Unit (LTU); empirical Hebbian assumption.
• ADALINE: Linear Graded Unit (LGU); gradient descent.
• LTU: sign function sgn(s); output is +/- (positive/negative).
• LGU: continuous and differentiable activation function f(s), including tanh(s) and the linear function linear(s).
ADALINE
• ADALINE (Adaptive Linear Neuron) is a network model proposed by Bernard Widrow in 1959.
[Figure: ADALINE processing element (PE) with inputs X1, …, X3]
Method
• The value in each input unit must be +1 or -1:
  net = Σ X_i W_i = W_0 X_0 + W_1 X_1 + W_2 X_2 + … + W_n X_n, with X_0 = 1
  Y = +1 if net ≥ 0; -1 if net < 0
• This is different from the perceptron's transfer function.
Method
• ΔW_i = η (T - Y) X_i, where T is the expected output
• W_i ← W_i + ΔW_i
MADALINE
• MADALINE is composed of many ADALINEs (Multilayer ADALINE).
[Figure: inputs X_i, …, X_n feeding a layer of ADALINE units through weights W_ij, producing net_j and outputs Y_j]
• If more than half of the net_j ≥ 0, then the output is +1; otherwise the output is -1.
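This majority-vote combination can be sketched as follows; the individual ADALINE weights below are illustrative, not trained:

```python
def adaline_out(w, x):
    """Single ADALINE: thresholded weighted sum, with X0 = 1 as bias input."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net >= 0 else -1

def madaline_out(weight_rows, x):
    """MADALINE over an odd number of units: output +1 if more than half
    of the ADALINEs output +1 (net_j >= 0), otherwise -1."""
    votes = sum(adaline_out(w, x) for w in weight_rows)
    return 1 if votes > 0 else -1

# Illustrative 3-unit MADALINE over 2 inputs (each row: [w0, w1, w2]):
W = [[0.0, 1.0, 0.0],    # unit 1 thresholds x1
     [0.0, 0.0, 1.0],    # unit 2 thresholds x2
     [-0.5, 1.0, 1.0]]   # unit 3 thresholds x1 + x2 - 0.5
y = madaline_out(W, (1, -1))
```

With these weights only unit 1 fires on (1, -1), so the majority vote yields -1.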
W = (W_0, W_1, …, W_n)^t, X = (X_0, X_1, …, X_n)^t
Net_j = W^t X_j, i.e., Net_j = Σ_{i=0}^{n} W_i X_i = W_0 X_0 + W_1 X_1 + … + W_n X_n
P = (1/p) Σ_{j=1}^{p} T_j X_j
R = (1/p) Σ_{j=1}^{p} X_j X_j^t
and the optimal weights W* satisfy R W* = P.
Training patterns (components X1, X2, X3 and target T_j):
X_1 = (1, 1, 0), T_1 = 1
X_2 = (1, 0, 1), T_2 = 1
X_3 = (1, 1, 1), T_3 = -1
R'_1 = X_1 X_1^t = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
R'_2 = X_2 X_2^t = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]
R'_3 = X_3 X_3^t = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
R = (1/3)(R'_1 + R'_2 + R'_3) = [[1, 2/3, 2/3], [2/3, 2/3, 1/3], [2/3, 1/3, 2/3]]
P_1 = 1 · (1, 1, 0)^t
P_2 = 1 · (1, 0, 1)^t
P_3 = -1 · (1, 1, 1)^t
P = (1/3)(P_1 + P_2 + P_3) = (1/3) (1, 0, 0)^t

Solving R W* = P:
W_1 + (2/3) W_2 + (2/3) W_3 = 1/3
(2/3) W_1 + (2/3) W_2 + (1/3) W_3 = 0
(2/3) W_1 + (1/3) W_2 + (2/3) W_3 = 0
⟹ W_1 = 3, W_2 = -2, W_3 = -2
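The worked example can be checked in Python with exact rational arithmetic; this only verifies the slide's numbers rather than re-deriving them:

```python
from fractions import Fraction as F

# Training patterns and targets from the example above.
X = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]
T = [1, 1, -1]
L = len(X)

# R = (1/L) * sum_k X_k X_k^t  and  P = (1/L) * sum_k T_k X_k
R = [[sum(F(X[k][i] * X[k][j], L) for k in range(L)) for j in range(3)]
     for i in range(3)]
P = [sum(F(T[k] * X[k][i], L) for k in range(L)) for i in range(3)]

# The solution stated on the slide:
W = [3, -2, -2]

# R W* should equal P exactly.
RW = [sum(R[i][j] * W[j] for j in range(3)) for i in range(3)]
```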
[Figure: resulting ADALINE with inputs X1, X2, X3, weights 3, -2, -2, and output Y]
The mean-square error over the L training patterns:
E = (1/L) Σ_{k=1}^{L} (T_k - Y_k)²
  = (1/L) Σ_{k=1}^{L} T_k² - (2/L) Σ_{k=1}^{L} T_k Y_k + (1/L) Σ_{k=1}^{L} Y_k²
  = <T_k²> - (2/L) Σ_{k=1}^{L} T_k Y_k + (1/L) Σ_{k=1}^{L} W^t (X_k X_k^t) W
where <T_k²> denotes the mean of T_k².
(Continued from the previous page.) Since Y_k = Σ_{i=1}^{n} w_i x_ik = W^t X_k:
Σ_{k=1}^{L} Y_k² = Σ_{k=1}^{L} (W^t X_k)(X_k^t W) = W^t [Σ_{k=1}^{L} X_k X_k^t] W
Therefore
E = <T_k²> - (2/L) Σ_{k=1}^{L} T_k (W^t X_k) + (1/L) W^t [Σ_{k=1}^{L} X_k X_k^t] W
  = <T_k²> - 2 [(1/L) Σ_{k=1}^{L} T_k X_k^t] W + W^t [(1/L) Σ_{k=1}^{L} X_k X_k^t] W
(Continued from the previous page.) Let
R' = R'_1 + R'_2 + … + R'_L = Σ_{k=1}^{L} X_k X_k^t, and R = R'/L = <X_k X_k^t>.
Then
E = <T_k²> + W^t R W - 2 <T_k X_k^t> W
∂E/∂W = 2 R W - 2 <T_k X_k>. Let P = <T_k X_k>.
Setting ∂E/∂W = 2 R W* - 2 P = 0 gives
W* = R⁻¹ P, or equivalently R W* = P.