• $L$ – number of layers
• $n_\ell$ – number of nodes in layer $\ell$, $\ell = 1, \dots, L$.
• $y_i^\ell$ – output of the $i$th node in layer $\ell$, $i = 1, \dots, n_\ell$, $\ell = 1, \dots, L$.
• $w_{ij}^\ell$ – weight of the connection from node $i$ in layer $\ell$ to node $j$ in layer $\ell + 1$.
• $\eta_i^\ell$ – net input of node $i$ in layer $\ell$.
• Our network represents a function from $\Re^{n_1}$ to $\Re^{n_L}$.
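• The output of node $j$ in layer $\ell$ is obtained by applying the activation function $f$ to its net input:
$$y_j^\ell = f(\eta_j^\ell)$$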
PR NPTEL course – p.36/130
• If we want to include bias input, then
$$\eta_j^\ell = \sum_{i=1}^{n_{\ell-1}} w_{ij}^{\ell-1} y_i^{\ell-1} + b_j^\ell$$
where, by notation, $w_{0j}^{\ell-1} = b_j^\ell$ and $y_0^\ell = +1$ for all $\ell$.
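The net-input formula with bias can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the name `forward`, the nested-list layout `weights[k][i][j]`, the choice of `tanh` for $f$, and the convention that the first layer simply passes the input through are all assumptions, not part of the slides. Python indices start at 0, whereas the slides count from 1.

```python
import math

def forward(weights, biases, x, f=math.tanh):
    """Propagate x through the network; return (etas, ys), one entry per layer.

    weights[k][i][j] plays the role of w_ij between two consecutive layers,
    biases[k][j] the role of the bias b_j of node j in the later layer.
    Assumption: the first layer outputs the input unchanged.
    """
    ys = [list(x)]   # outputs of the first layer
    etas = [None]    # the first layer has no net input in this sketch
    for W, b in zip(weights, biases):
        prev = ys[-1]
        # eta_j = sum_i w_ij * y_i + b_j
        eta = [sum(W[i][j] * prev[i] for i in range(len(prev))) + b[j]
               for j in range(len(b))]
        etas.append(eta)
        ys.append([f(e) for e in eta])   # y_j = f(eta_j)
    return etas, ys

# Hypothetical 2-2-1 network used only for illustration.
weights = [[[0.5, -0.5], [0.25, 0.75]], [[1.0], [-1.0]]]
biases = [[0.0, 0.1], [0.2]]
etas, ys = forward(weights, biases, [1.0, 2.0])
```

Here `ys[-1]` holds the outputs of the last layer, and `etas[1]` holds the net inputs of the second layer (1.0 and 1.1 for these made-up numbers, up to rounding).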
• This can be shown in the figure as below.
[Figure not reproduced; surviving label: $y_j^\ell = f(\eta_j^\ell)$]
• The $y_1^L, \dots, y_{n_L}^L$ form the final outputs of the network.
$$J_i(W) = \frac{1}{2} \sum_{j=1}^{m'} \left( y_j^L(W, X^i) - d_j^i \right)^2$$
• $J_i$ is half the squared error between the output of the network and the desired output for the training example $X^i$.
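As a small illustration, the per-example error can be computed as below. The function name `squared_error` and the sample numbers are hypothetical; `outputs` stands for the network outputs $y_j^L(W, X^i)$ and `desired` for the targets $d_j^i$.

```python
def squared_error(outputs, desired):
    """Return 1/2 * sum_j (y_j - d_j)^2 for one training example."""
    if len(outputs) != len(desired):
        raise ValueError("outputs and desired must have the same length")
    return 0.5 * sum((y - d) ** 2 for y, d in zip(outputs, desired))

# Made-up output and target vectors with two components.
J_i = squared_error([0.8, 0.1], [1.0, 0.0])   # approximately 0.025
```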
• One method of finding a minimizer of $J$ is to use gradient descent.
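A minimal sketch of gradient descent with a fixed step size follows. It repeatedly moves the parameters against the gradient. The function `gradient_descent`, the step size `step`, the iteration count, and the toy quadratic objective are all assumptions chosen for illustration; the toy objective is not the network error $J$.

```python
def gradient_descent(grad, w0, step=0.1, iterations=100):
    """Repeat w <- w - step * grad(w) a fixed number of times."""
    w = list(w0)
    for _ in range(iterations):
        g = grad(w)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w

def toy_grad(w):
    """Gradient of the toy objective (w[0] - 3)^2 + (w[1] + 1)^2."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w_min = gradient_descent(toy_grad, [0.0, 0.0])   # approaches [3, -1]
```

For a network, `grad` would return the partial derivatives of $J$ with respect to every weight, which is what the rest of the derivation supplies.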
$$\frac{\partial J}{\partial w_{ij}^\ell} = \frac{\partial J}{\partial \eta_j^{\ell+1}} \, \frac{\partial \eta_j^{\ell+1}}{\partial w_{ij}^\ell} = \delta_j^{\ell+1} \, y_i^\ell$$
• We can get all the needed partial derivatives if we calculate $\delta_j^\ell$ for all nodes.
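Once the deltas of a layer are known, the partial derivatives for the weights feeding that layer are products of a delta and an output. A short sketch, with the hypothetical helper `weight_gradient` and made-up numbers:

```python
def weight_gradient(y_layer, delta_next):
    """Return grad with grad[i][j] = delta_next[j] * y_layer[i].

    y_layer holds the outputs y_i of one layer, delta_next the deltas
    delta_j of the following layer.
    """
    return [[d * y for d in delta_next] for y in y_layer]

# Three nodes feeding two nodes; values are illustrative only.
grad = weight_gradient([1.0, 2.0, 3.0], [0.5, -1.0])
# grad is [[0.5, -1.0], [1.0, -2.0], [1.5, -3.0]]
```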
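• We can compute $\delta_j^\ell$ recursively:
$$\delta_j^\ell = \frac{\partial J}{\partial \eta_j^\ell} = \sum_{s=1}^{n_{\ell+1}} \frac{\partial J}{\partial \eta_s^{\ell+1}} \, \frac{\partial \eta_s^{\ell+1}}{\partial \eta_j^\ell}$$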
$$\delta_j^L = \frac{\partial J}{\partial \eta_j^L}$$
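The recursion can be sketched in code for a single training example. The sketch relies on two steps that follow from the definitions above but are not written out in this excerpt: $\partial \eta_s^{\ell+1} / \partial \eta_j^\ell = w_{js}^\ell f'(\eta_j^\ell)$, and, for the squared error of one example, $\delta_j^L = (y_j^L - d_j) f'(\eta_j^L)$. The function names, the `tanh` activation, the omission of bias terms, and the sample network are assumptions made for illustration. A central finite difference is used as a sanity check on one weight.

```python
import math

def f(x):
    return math.tanh(x)

def f_prime(x):
    return 1.0 - math.tanh(x) ** 2

def forward(weights, x):
    """Return (etas, ys); weights[k][i][j] links node i of a layer to node j of the next."""
    ys, etas = [list(x)], [None]
    for W in weights:
        prev = ys[-1]
        eta = [sum(W[i][j] * prev[i] for i in range(len(prev)))
               for j in range(len(W[0]))]
        etas.append(eta)
        ys.append([f(e) for e in eta])
    return etas, ys

def loss(weights, x, d):
    """Half the squared error for one example."""
    _, ys = forward(weights, x)
    return 0.5 * sum((y - t) ** 2 for y, t in zip(ys[-1], d))

def backward(weights, x, d):
    """Return grads with grads[k][i][j] = dJ/dw_ij for weights[k]."""
    etas, ys = forward(weights, x)
    # Base case at the last layer: delta_j = (y_j - d_j) * f'(eta_j)
    delta = [(y - t) * f_prime(e) for y, t, e in zip(ys[-1], d, etas[-1])]
    grads = [None] * len(weights)
    for k in range(len(weights) - 1, -1, -1):
        # dJ/dw_ij = delta_j (later layer) * y_i (earlier layer)
        grads[k] = [[dj * yi for dj in delta] for yi in ys[k]]
        if k > 0:
            # Recursion: delta_j = f'(eta_j) * sum_s w_js * delta_s
            delta = [f_prime(etas[k][j])
                     * sum(weights[k][j][s] * delta[s] for s in range(len(delta)))
                     for j in range(len(ys[k]))]
    return grads

# Hypothetical 2-2-1 network and training pair.
weights = [[[0.5, -0.5], [0.25, 0.75]], [[1.0], [-1.0]]]
x, d = [1.0, 2.0], [0.3]
grads = backward(weights, x, d)

# Central finite difference on one first-layer weight.
eps = 1e-6
weights[0][1][0] += eps
J_plus = loss(weights, x, d)
weights[0][1][0] -= 2 * eps
J_minus = loss(weights, x, d)
weights[0][1][0] += eps   # restore the original weight
numeric = (J_plus - J_minus) / (2 * eps)
```

If the recursion is implemented correctly, `numeric` and `grads[0][1][0]` should agree to several decimal places.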