The update rule is the same, namely:

w_i := w_i + Δw_i,  where  Δw_i = −η ∂E/∂w_i

For w_0:

∂E/∂w_0 = ∂/∂w_0 [ (1/2) Σ_{d∈D} (t_d − o_d)² ]
        = (1/2) Σ_{d∈D} ∂/∂w_0 (t_d − o_d)²
        = (1/2) Σ_{d∈D} 2 (t_d − o_d) ∂/∂w_0 (t_d − o_d)
        = Σ_{d∈D} (t_d − o_d)(−1)

Thus

Δw_0 = η Σ_{d∈D} (t_d − o_d)
For w_1, w_2, …, w_n:

∂E/∂w_i = ∂/∂w_i [ (1/2) Σ_{d∈D} (t_d − o_d)² ]
        = (1/2) Σ_{d∈D} 2 (t_d − o_d) ∂/∂w_i (t_d − o_d)
        = Σ_{d∈D} (t_d − o_d)(−(x_id + x_id²))

Thus

Δw_i = η Σ_{d∈D} (t_d − o_d)(x_id + x_id²)
4. Consider a two-layer feedforward ANN with two inputs a and b, one hidden unit c, and one output unit d. This network has five weights (w_ca, w_cb, w_c0, w_dc, w_d0), where w_x0 represents the threshold weight for unit x. Initialize these weights to the values (0.1, 0.1, 0.1, 0.1, 0.1), then give their values after each of the first two training iterations of the BACKPROPAGATION algorithm. Assume learning rate η = 0.3, momentum α = 0.9, incremental weight updates, and the following training examples:

a b d
1 0 1
0 1 0
Answer:
The network (inputs a and b feeding hidden unit c, which feeds output unit d) and the sigmoid activation function are as follows:

σ(y) = 1 / (1 + e^(−y))
Training example 1:
The outputs of the two neurons, noting that a = 1 and b = 0:

o_c = σ(0.1×1 + 0.1×0 + 0.1) = σ(0.2) = 0.5498
o_d = σ(0.1×0.5498 + 0.1) = σ(0.15498) = 0.53867

The error terms for the two neurons, noting that d = 1:

δ_d = o_d (1 − o_d)(t − o_d) = 0.53867 × (1 − 0.53867) × (1 − 0.53867) = 0.1146
δ_c = o_c (1 − o_c) w_dc δ_d = 0.5498 × (1 − 0.5498) × 0.1 × 0.1146 = 0.002836
Compute the correction terms as follows, noting that a = 1, b = 0 and η = 0.3:

Δw_d0 = 0.3 × 0.1146 × 1 = 0.0342
Δw_dc = 0.3 × 0.1146 × 0.5498 = 0.0189
Δw_c0 = 0.3 × 0.002836 × 1 = 0.000849
Δw_ca = 0.3 × 0.002836 × 1 = 0.000849
Δw_cb = 0.3 × 0.002836 × 0 = 0
and the new weights become:

w_d0 = 0.1 + 0.0342 = 0.1342
w_dc = 0.1 + 0.0189 = 0.1189
w_c0 = 0.1 + 0.000849 = 0.100849
w_ca = 0.1 + 0.000849 = 0.100849
w_cb = 0.1 + 0 = 0.1
Training example 2:
The outputs of the two neurons, noting that a = 0 and b = 1:

o_c = σ(0.100849×0 + 0.1×1 + 0.100849) = σ(0.200849) = 0.55
o_d = σ(0.1189×0.55 + 0.1342) = σ(0.1996) = 0.5497

The error terms for the two neurons, noting that d = 0:

δ_d = 0.5497 × (1 − 0.5497) × (0 − 0.5497) = −0.1361
δ_c = 0.55 × (1 − 0.55) × 0.1189 × (−0.1361) = −0.004
Compute the correction terms as follows, noting that a = 0, b = 1, η = 0.3 and α = 0.9:

Δw_d0 = 0.3 × (−0.1361) × 1 + 0.9 × 0.0342 = −0.01
Δw_dc = 0.3 × (−0.1361) × 0.55 + 0.9 × 0.0189 = −0.0055
Δw_c0 = 0.3 × (−0.004) × 1 + 0.9 × 0.000849 = −0.0004
Δw_ca = 0.3 × (−0.004) × 0 + 0.9 × 0.000849 = 0.00076
Δw_cb = 0.3 × (−0.004) × 1 + 0.9 × 0 = −0.0012
and the new weights become:

w_d0 = 0.1342 − 0.01 = 0.1242
w_dc = 0.1189 − 0.0055 = 0.1134
w_c0 = 0.100849 − 0.0004 = 0.100449
w_ca = 0.100849 + 0.00076 = 0.1016
w_cb = 0.1 − 0.0012 = 0.0988
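The two hand iterations can be reproduced with a short script (a sketch of incremental backpropagation with momentum; the tiny deviations from the hand values come only from intermediate rounding in the worked answer):

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

# Weights (w_ca, w_cb, w_c0, w_dc, w_d0), all initialised to 0.1.
w = {"ca": 0.1, "cb": 0.1, "c0": 0.1, "dc": 0.1, "d0": 0.1}
dw = {k: 0.0 for k in w}      # previous updates, for the momentum term
eta, alpha = 0.3, 0.9         # learning rate and momentum

def train_step(a, b, t):
    # Forward pass through hidden unit c and output unit d.
    o_c = sigmoid(w["ca"] * a + w["cb"] * b + w["c0"])
    o_d = sigmoid(w["dc"] * o_c + w["d0"])
    # Error terms.
    delta_d = o_d * (1 - o_d) * (t - o_d)
    delta_c = o_c * (1 - o_c) * w["dc"] * delta_d
    # Incremental updates with momentum: dw(n) = eta*delta*x + alpha*dw(n-1).
    grads = {"d0": delta_d, "dc": delta_d * o_c,
             "c0": delta_c, "ca": delta_c * a, "cb": delta_c * b}
    for k in w:
        dw[k] = eta * grads[k] + alpha * dw[k]
        w[k] += dw[k]

train_step(1, 0, 1)   # training example 1
train_step(0, 1, 0)   # training example 2
```

After these two calls the weights agree with the hand computation to about four decimal places.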
5. Revise the BACKPROPAGATION algorithm in Table 4.2 so that it operates on units using the squashing function tanh in place of the sigmoid function. That is, assume the output of a single unit is o = tanh(w⃗ · x⃗). Give the weight update rule for output layer weights and hidden layer weights. Hint: tanh′(x) = 1 − tanh²(x).
Answer:
Steps T4.3 and T4.4 in Table 4.2 will become as follows, respectively:
δ_k = (1 − o_k²)(t_k − o_k)

δ_h = (1 − o_h²) Σ_{k∈outputs} w_kh δ_k
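The modified error terms can be sanity-checked numerically (a sketch; the helper names are made up, and the finite-difference check below confirms the hint tanh′(y) = 1 − tanh²(y)):

```python
import math

def delta_output(o_k, t_k):
    # Revised step T4.3: delta_k = (1 - o_k^2) * (t_k - o_k)
    return (1 - o_k**2) * (t_k - o_k)

def delta_hidden(o_h, downstream):
    # Revised step T4.4: delta_h = (1 - o_h^2) * sum_k w_kh * delta_k,
    # where downstream is a list of (w_kh, delta_k) pairs.
    return (1 - o_h**2) * sum(w_kh * d_k for w_kh, d_k in downstream)
```

The only change from the sigmoid version is that the derivative factor o(1 − o) becomes (1 − o²).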
6. Consider the alternative error function described in Section 4.8.1:

E(w⃗) = (1/2) Σ_{d∈D} Σ_{k∈outputs} (t_kd − o_kd)² + γ Σ_{i,j} w_ji²

Derive the gradient descent update rule for this definition of E. Show that it can be implemented by multiplying each weight by some constant before performing the standard gradient descent update given in Table 4.2.
Answer:
w_ji := w_ji + Δw_ji,  where  Δw_ji = −η ∂E(w⃗)/∂w_ji

∂E(w⃗)/∂w_ji = ∂/∂w_ji [ (1/2) Σ_{d∈D} Σ_{k∈outputs} (t_kd − o_kd)² ] + ∂/∂w_ji [ γ Σ_{i,j} w_ji² ]
The first term on the R.H.S. of the above equation can be derived in the same manner as in equation (4.27), while we continue to work on the 2nd term. For output nodes, this leads to:
∂E(w⃗)/∂w_ji = −(t_j − o_j) o_j (1 − o_j) x_ji + 2γ w_ji

w_ji := w_ji − η [ −(t_j − o_j) o_j (1 − o_j) x_ji + 2γ w_ji ]
      = (1 − 2ηγ) w_ji + η δ_j x_ji
      = φ w_ji + η δ_j x_ji

where φ = 1 − 2ηγ and δ_j = (t_j − o_j) o_j (1 − o_j).
Similarly, for hidden units, we can derive:
w_ji := φ w_ji + η δ_j x_ji

where φ = 1 − 2ηγ and δ_j = o_j (1 − o_j) Σ_{k∈Downstream(j)} δ_k w_kj.
The above shows the update rule can be implemented by multiplying each weight
by some constant before performing the gradient descent update given in Table 4.2.
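The "scale first, then update" equivalence can be checked numerically for a single weight (a sketch; the values of η, γ, δ_j and x_ji are hypothetical):

```python
# One weight: gradient descent on E = E_standard + gamma * w^2 gives
#   w <- w - eta * (dE_standard/dw + 2*gamma*w)
#     = (1 - 2*eta*gamma) * w + eta * delta * x
# i.e. scale w by phi = 1 - 2*eta*gamma, then apply the standard Table 4.2 step.

eta, gamma = 0.3, 0.05            # hypothetical learning rate and decay constant
w, delta, x = 0.8, 0.12, 1.5      # hypothetical weight, error term, input

# dE_standard/dw for an output node is -(t - o)*o*(1 - o)*x = -delta * x.
direct = w - eta * (-delta * x + 2 * gamma * w)   # one-shot update on the new E
phi = 1 - 2 * eta * gamma                         # the weight-decay constant
two_step = phi * w + eta * delta * x              # scale, then standard step
```

Both expressions give the same new weight, which is exactly the claim in the exercise.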
7. Assume the following error function:

E(w) = (1/2) α w² + β w + (1/2) γ²

where α, β and γ are constants. The weight w is updated according to gradient descent with a positive learning rate η. Write down the update equation for w(k+1) given w(k). Find the optimum weight w that gives the minimal error E(w). What is the value of the minimal E(w)? (8 marks)
Answer:
Δw = −η ∂E/∂w = −η (α w + β)

w(k+1) = w(k) + Δw = w(k) − η (α w(k) + β)
When E(w) becomes the smallest, ∂E/∂w = 0, i.e. α w + β = 0.

Thus, w_optimal = −β/α.
Minimal error:

E(w_optimal) = (1/2) α (β²/α²) − β²/α + (1/2) γ² = (1/2) γ² − β²/(2α)
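Running gradient descent on this error function confirms the optimum numerically (a sketch with hypothetical constants; for a quadratic E any 0 < η < 2/α converges):

```python
alpha, beta, gamma = 2.0, -1.0, 0.5   # hypothetical constants
eta = 0.3                             # learning rate, satisfies eta < 2/alpha

def E(w):
    # E(w) = (1/2)*alpha*w^2 + beta*w + (1/2)*gamma^2
    return 0.5 * alpha * w**2 + beta * w + 0.5 * gamma**2

w = 0.0
for _ in range(100):
    # Update rule: w(k+1) = w(k) - eta * (alpha*w(k) + beta)
    w = w - eta * (alpha * w + beta)

w_optimal = -beta / alpha                       # closed-form optimum
E_min = 0.5 * gamma**2 - beta**2 / (2 * alpha)  # closed-form minimal error
```

Each iteration contracts the distance to w_optimal by a factor |1 − ηα|, so the iterate converges to −β/α.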
8. WEKA outputs the following confusion matrix after training a J48 decision tree classifier with the contact-lenses dataset. (a) Count the number of True Positives, True Negatives, False Positives and False Negatives for each of the three classes, i.e. soft, hard and none. (b) Calculate the TP rate (Recall), FP rate, Precision and F-measure for each class.

 a  b  c    <-- classified as
 4  0  1  |  a = soft
 0  1  3  |  b = hard
 1  2 12  |  c = none
Answer:
soft:
(a) TP = 4
TN = 18
FP = 1
FN = 1
(b) TP rate = Recall = TP / (TP + FN) = 4/5 = 0.8
FP rate = FP / (FP + TN) = 1/19 = 0.053
Precision = TP / (TP + FP) = 4/5 = 0.8
F-Measure = 2 × 0.8 × 0.8 / (0.8 + 0.8) = 0.8

hard:
(a) TP = 1
TN = 18
FP = 2
FN = 3
(b) TP rate = Recall = TP / (TP + FN) = 1/4 = 0.25
FP rate = FP / (FP + TN) = 2/20 = 0.1
Precision = TP / (TP + FP) = 1/3 = 0.333
F-Measure = 2 × 0.25 × 0.333 / (0.25 + 0.333) = 0.286

none:
(a) TP = 12
TN = 5
FP = 4
FN = 3
(b) TP rate = Recall = TP / (TP + FN) = 12/15 = 0.8
FP rate = FP / (FP + TN) = 4/9 = 0.444
Precision = TP / (TP + FP) = 12/16 = 0.75
F-Measure = 2 × 0.8 × 0.75 / (0.8 + 0.75) = 0.774
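All of these per-class counts and metrics follow mechanically from the matrix (rows are actual classes, columns are predicted), and can be computed with a short sketch:

```python
# Per-class TP/TN/FP/FN and metrics from the 3x3 confusion matrix above.
cm = [[4, 0, 1],    # actual soft
      [0, 1, 3],    # actual hard
      [1, 2, 12]]   # actual none
total = sum(sum(row) for row in cm)   # 24 contact-lenses instances

def metrics(i):
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                  # rest of the actual-class row
    fp = sum(row[i] for row in cm) - tp   # rest of the predicted-class column
    tn = total - tp - fn - fp
    recall = tp / (tp + fn)               # TP rate
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f = 2 * precision * recall / (precision + recall)
    return tp, tn, fp, fn, recall, fp_rate, precision, f
```

Calling `metrics(0)`, `metrics(1)` and `metrics(2)` reproduces the soft, hard and none rows of the answer above.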