
CS3244 Machine Learning Semester 1, 2012/13


Solution to Tutorial 4

1. What are the values of weights w_0, w_1, and w_2 for the perceptron whose decision
surface is illustrated in Figure 4.3? Assume the surface crosses the x_1 axis at -1,
and the x_2 axis at 2.

Answer:
The line for the decision surface corresponds to the equation x_2 = 2 x_1 + 2, and since
all points above the line should be classified as positive, we have x_2 - 2 x_1 - 2 > 0.
Hence w_0 = -2, w_1 = -2, and w_2 = 1.
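As a quick sanity check (an illustrative snippet of my own, not part of the original
solution), these weights can be tested on points on either side of the line x_2 = 2 x_1 + 2:

    # Question 1 check: w0 + w1*x1 + w2*x2 > 0 should hold exactly for points
    # above the line x2 = 2*x1 + 2.
    w0, w1, w2 = -2.0, -2.0, 1.0

    def perceptron(x1, x2):
        return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

    print(perceptron(0.0, 3.0))   # above the line -> 1
    print(perceptron(0.0, 1.0))   # below the line -> -1
    print(perceptron(-2.0, 0.0))  # above the line (2*(-2) + 2 = -2 < 0) -> 1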

2. Consider two perceptrons defined by the threshold expression w_0 + w_1 x_1 + w_2 x_2 > 0.
Perceptron A has weight values

    w_0 = 1, w_1 = 2, w_2 = 1

and Perceptron B has weight values

    w_0 = 0, w_1 = 2, w_2 = 1

True or false? Perceptron A is more-general-than perceptron B. (More-general-than
is defined in Chapter 2.)
Answer:







True. Perceptron A is more general than B: since A's threshold expression exceeds B's
by the constant 1, any instance satisfying 0 + 2 x_1 + x_2 > 0 also satisfies
1 + 2 x_1 + x_2 > 0, so any point classified positive by B is also classified positive
by A. That is, as per the definition of more-general-than in Chapter 2:

    (\forall x \in X)\,[(B(x) = 1) \rightarrow (A(x) = 1)]


[Sketch: the decision lines of A and B in the (x_1, x_2) plane, with the positive
region above each line; line A lies below line B, so B's positive region is contained
in A's.]
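This can also be checked empirically with a short script (illustrative only; the
helper name and random sampling are my own):

    # Question 2 check: whenever perceptron B fires, perceptron A must fire too.
    import random

    def fires(w0, w1, w2, x1, x2):
        return w0 + w1 * x1 + w2 * x2 > 0

    A = (1.0, 2.0, 1.0)   # weights w0, w1, w2 of perceptron A
    B = (0.0, 2.0, 1.0)   # weights w0, w1, w2 of perceptron B

    random.seed(0)
    for _ in range(100000):
        x1, x2 = random.uniform(-10, 10), random.uniform(-10, 10)
        if fires(*B, x1, x2):
            assert fires(*A, x1, x2)   # B positive implies A positive
    print("No counterexample found: A is more-general-than B.")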



3. Derive a gradient descent training rule for a single unit with output o, where

    o = w_0 + w_1 x_1 + w_1 x_1^2 + \cdots + w_n x_n + w_n x_n^2
Answer:
First, the error function is defined as:

    E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2

The update rule is the same, namely:

    w_i := w_i + \Delta w_i,  where  \Delta w_i = -\eta \frac{\partial E}{\partial w_i}

For w_0:

    \frac{\partial E}{\partial w_0}
      = \frac{\partial}{\partial w_0} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
      = \frac{1}{2} \sum_{d \in D} 2 (t_d - o_d) \frac{\partial}{\partial w_0} (t_d - o_d)
      = \sum_{d \in D} (t_d - o_d)(-1)

Thus

    \Delta w_0 = \eta \sum_{d \in D} (t_d - o_d)

For w_1, w_2, ..., w_n:

    \frac{\partial E}{\partial w_i}
      = \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
      = \frac{1}{2} \sum_{d \in D} 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d)
      = \sum_{d \in D} (t_d - o_d) \left( -(x_{id} + x_{id}^2) \right)

Thus

    \Delta w_i = \eta \sum_{d \in D} (t_d - o_d)(x_{id} + x_{id}^2)
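A small Python sketch of this training rule (illustrative; the batch-update style,
variable names, and toy data below are my own, not part of the tutorial):

    import numpy as np

    def train(X, t, eta=0.01, epochs=1000):
        """Gradient descent for the unit o = w0 + sum_i w_i * (x_i + x_i^2)."""
        n_samples, n_features = X.shape
        w0 = 0.0
        w = np.zeros(n_features)
        phi = X + X**2                      # each weight multiplies x_i + x_i^2
        for _ in range(epochs):
            o = w0 + phi @ w                # unit output for every training example
            err = t - o                     # (t_d - o_d) for every example
            w0 += eta * err.sum()           # delta_w0 = eta * sum_d (t_d - o_d)
            w += eta * phi.T @ err          # delta_wi = eta * sum_d (t_d - o_d)(x_id + x_id^2)
        return w0, w

    # Toy usage: the target is itself of the assumed form, so training should
    # recover roughly (0.5, [1.0, -2.0]).
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(50, 2))
    t = 0.5 + 1.0 * (X[:, 0] + X[:, 0]**2) - 2.0 * (X[:, 1] + X[:, 1]**2)
    print(train(X, t))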


4. Consider a two-layer feedforward ANN with two inputs a and b, one hidden unit c,
and one output unit d. This network has five weights (w_ca, w_cb, w_c0, w_dc, w_d0), where
w_x0 represents the threshold weight for unit x. Initialize these weights to the values
(0.1, 0.1, 0.1, 0.1, 0.1), then give their values after each of the first two training
iterations of the BACKPROPAGATION algorithm. Assume learning rate η = 0.3,
momentum α = 0.9, incremental weight updates, and the following training
examples:

    a   b   d
    1   0   1
    0   1   0
Answer:
The network and the sigmoid activation function are as follows:

    \sigma(y) = \frac{1}{1 + e^{-y}}

[Network diagram: inputs a and b feed the hidden unit c through w_ca and w_cb; unit c
feeds the output unit d through w_dc; w_c0 and w_d0 are the threshold weights of c and d.]

Training example 1:
The outputs of the two neurons, noting that a = 1 and b = 0:

    o_c = \sigma(0.1 \cdot 1 + 0.1 \cdot 0 + 0.1) = \sigma(0.2) = 0.5498
    o_d = \sigma(0.1 \cdot 0.5498 + 0.1) = \sigma(0.15498) = 0.53867

The error terms for the two neurons, noting that d = 1:

    \delta_d = o_d (1 - o_d)(t - o_d) = 0.53867 (1 - 0.53867)(1 - 0.53867) = 0.1146
    \delta_c = o_c (1 - o_c) w_dc \delta_d = 0.5498 (1 - 0.5498) \cdot 0.1 \cdot 0.1146 = 0.002836

Compute the correction terms as follows, noting that a = 1, b = 0 and η = 0.3:

    \Delta w_d0 = 0.3 \cdot 0.1146 \cdot 1 = 0.0342
    \Delta w_dc = 0.3 \cdot 0.1146 \cdot 0.5498 = 0.0189
    \Delta w_c0 = 0.3 \cdot 0.002836 \cdot 1 = 0.000849
    \Delta w_ca = 0.3 \cdot 0.002836 \cdot 1 = 0.000849
    \Delta w_cb = 0.3 \cdot 0.002836 \cdot 0 = 0

and the new weights become:

    w_d0 = 0.1 + 0.0342 = 0.1342
    w_dc = 0.1 + 0.0189 = 0.1189
    w_c0 = 0.1 + 0.000849 = 0.100849
    w_ca = 0.1 + 0.000849 = 0.100849
    w_cb = 0.1 + 0 = 0.1

Training example 2:
The outputs of the two neurons, noting that a = 0 and b = 1:

    o_c = \sigma(0.100849 \cdot 0 + 0.1 \cdot 1 + 0.100849) = \sigma(0.200849) = 0.55
    o_d = \sigma(0.1189 \cdot 0.55 + 0.1342) = \sigma(0.1996) = 0.5497

The error terms for the two neurons, noting that d = 0:

    \delta_d = o_d (1 - o_d)(t - o_d) = 0.5497 (1 - 0.5497)(0 - 0.5497) = -0.1361
    \delta_c = o_c (1 - o_c) w_dc \delta_d = 0.55 (1 - 0.55) \cdot 0.1189 \cdot (-0.1361) = -0.004

Compute the correction terms as follows, noting that a = 0, b = 1, η = 0.3 and α = 0.9
(momentum adds α times the previous iteration's update):

    \Delta w_d0 = 0.3 \cdot (-0.1361) \cdot 1 + 0.9 \cdot 0.0342 = -0.01
    \Delta w_dc = 0.3 \cdot (-0.1361) \cdot 0.55 + 0.9 \cdot 0.0189 = -0.0055
    \Delta w_c0 = 0.3 \cdot (-0.004) \cdot 1 + 0.9 \cdot 0.000849 = -0.0004
    \Delta w_ca = 0.3 \cdot (-0.004) \cdot 0 + 0.9 \cdot 0.000849 = 0.00076
    \Delta w_cb = 0.3 \cdot (-0.004) \cdot 1 + 0.9 \cdot 0 = -0.0012

and the new weights become:

    w_d0 = 0.1342 - 0.01 = 0.1242
    w_dc = 0.1189 - 0.0055 = 0.1134
    w_c0 = 0.100849 - 0.0004 = 0.100449
    w_ca = 0.100849 + 0.00076 = 0.1016
    w_cb = 0.1 - 0.0012 = 0.0988
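The arithmetic above can be reproduced with a short script (an illustrative sketch of
my own; it follows the same incremental updates with momentum and matches the
hand-computed weights up to rounding):

    import math

    def sigmoid(y):
        return 1.0 / (1.0 + math.exp(-y))

    # Weights w_ca, w_cb, w_c0, w_dc, w_d0, all initialized to 0.1.
    w = {"ca": 0.1, "cb": 0.1, "c0": 0.1, "dc": 0.1, "d0": 0.1}
    prev = {k: 0.0 for k in w}        # previous updates, for the momentum term
    eta, alpha = 0.3, 0.9

    for a, b, t in [(1, 0, 1), (0, 1, 0)]:
        o_c = sigmoid(w["ca"] * a + w["cb"] * b + w["c0"])
        o_d = sigmoid(w["dc"] * o_c + w["d0"])
        delta_d = o_d * (1 - o_d) * (t - o_d)
        delta_c = o_c * (1 - o_c) * w["dc"] * delta_d
        grads = {"d0": delta_d, "dc": delta_d * o_c,
                 "c0": delta_c, "ca": delta_c * a, "cb": delta_c * b}
        for k in w:                   # incremental weight update with momentum
            update = eta * grads[k] + alpha * prev[k]
            w[k] += update
            prev[k] = update
        print(w)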



5. Revise the BACKPROPAGATION algorithm in Table 4.2 so that it operates on units
using the squashing function tanh in place of the sigmoid function. That is, assume
the output of a single unit is o = \tanh(\vec{w} \cdot \vec{x}). Give the weight update rule for
output layer weights and hidden layer weights. Hint: \tanh'(x) = 1 - \tanh^2(x).
Answer:
Steps T4.3 and T4.4 in Table 4.2 will become as follows, respectively:

    \delta_k \leftarrow (1 - o_k^2)(t_k - o_k)                               (output units)
    \delta_h \leftarrow (1 - o_h^2) \sum_{k \in outputs} w_{kh} \delta_k     (hidden units)

The weight update rule itself is unchanged: \Delta w_{ji} = \eta \delta_j x_{ji}, now using
these error terms for the output layer and hidden layer weights.
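For completeness, a brief derivation sketch for the output-unit case (my own working,
following the same chain-rule argument used for the sigmoid in the chapter, with
net_k = \vec{w} \cdot \vec{x}):

    \delta_k = -\frac{\partial E}{\partial net_k}
             = (t_k - o_k) \frac{\partial o_k}{\partial net_k}
             = (t_k - o_k)\left(1 - \tanh^2(net_k)\right)
             = (t_k - o_k)(1 - o_k^2)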



6. Consider the alternative error function described in Section 4.8.1:

    E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 + \gamma \sum_{i,j} w_{ji}^2

Derive the gradient descent update rule for this definition of E. Show that it can be
implemented by multiplying each weight by some constant before performing the
standard gradient descent update given in Table 4.2.
Answer:
The update rule is

    w_{ji} \leftarrow w_{ji} + \Delta w_{ji},  where  \Delta w_{ji} = -\eta \frac{\partial E(\vec{w})}{\partial w_{ji}}

and

    \frac{\partial E(\vec{w})}{\partial w_{ji}}
      = \frac{\partial}{\partial w_{ji}} \left[ \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 \right]
        + \frac{\partial}{\partial w_{ji}} \left[ \gamma \sum_{i,j} w_{ji}^2 \right]

The first term on the R.H.S. of the above equation can be derived in the same manner
as in equation (4.27), while the second term gives 2 \gamma w_{ji}. For output units, this
leads to:

    w_{ji} \leftarrow w_{ji} + \eta (t_j - o_j) o_j (1 - o_j) x_{ji} - 2 \eta \gamma w_{ji}
           = \beta w_{ji} + \eta \delta_j x_{ji}

where \beta = 1 - 2 \gamma \eta and \delta_j = (t_j - o_j) o_j (1 - o_j).

Similarly, for hidden units we can derive:

    w_{ji} \leftarrow \beta w_{ji} + \eta \delta_j x_{ji}

where \beta = 1 - 2 \gamma \eta and \delta_j = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k w_{kj}.

The above shows that the update rule can be implemented by multiplying each weight
by the constant \beta = 1 - 2 \gamma \eta before performing the standard gradient descent
update given in Table 4.2.
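A one-step numerical illustration of the equivalence (a sketch with hypothetical
values of my own choosing, not from the tutorial):

    # Applying the penalized gradient directly vs. shrinking the weight first and
    # then doing the standard update: the two give identical results.
    eta, gamma = 0.3, 0.01
    w, delta, x = 0.5, 0.2, 1.5       # hypothetical weight, error term, and input

    w_direct = w - eta * (-(delta * x) + 2 * gamma * w)    # gradient of E + gamma*w^2
    w_decay = (1 - 2 * gamma * eta) * w + eta * delta * x  # shrink, then standard step

    print(w_direct, w_decay)          # both print the same value
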
7. Assume the following error function:

    E(w) = \frac{1}{2} \alpha w^2 - \beta w + \gamma

where α, β and γ are constants. The weight w is updated according to gradient
descent with a positive learning rate η. Write down the update equation for
w(k+1) given w(k). Find the optimum weight w that gives the minimal error E(w).
What is the value of the minimal E(w)? (8 marks)

Answer:

    \frac{\partial E}{\partial w} = \alpha w - \beta

    \Delta w = -\eta \frac{\partial E}{\partial w} = -\eta (\alpha w - \beta)

    w(k+1) = w(k) - \eta (\alpha w(k) - \beta) = (1 - \eta \alpha)\, w(k) + \eta \beta

When E(w) becomes the smallest, \frac{\partial E}{\partial w} = 0.

Thus, the optimal weight is

    w_{optimal} = \frac{\beta}{\alpha}

Minimal error:

    E(w_{optimal}) = \frac{1}{2} \alpha \left( \frac{\beta}{\alpha} \right)^2 - \beta \cdot \frac{\beta}{\alpha} + \gamma
                   = \gamma - \frac{\beta^2}{2 \alpha}
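A quick numerical check of both results (an illustrative sketch with constants of my
own choosing):

    # Gradient descent on E(w) = 0.5*alpha*w**2 - beta*w + gamma should converge
    # to w = beta/alpha, with minimal error gamma - beta**2/(2*alpha).
    alpha, beta, gamma = 2.0, 3.0, 1.0
    eta = 0.1
    w = 0.0
    for _ in range(200):
        w = w - eta * (alpha * w - beta)   # w(k+1) = w(k) - eta*(alpha*w(k) - beta)

    E = 0.5 * alpha * w**2 - beta * w + gamma
    print(w, E)                            # approx 1.5 and -1.25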

8. WEKA outputs the following confusion matrix after training a J48 decision tree
classifier with the contact-lenses dataset. (a) Count the number of True Positives,
True Negatives, False Positives and False Negatives for each of the three classes, i.e.
soft, hard and none. (b) Calculate the TP rate (Recall), FP rate, Precision and
F-measure for each class.

    a  b  c    <-- classified as
    4  0  1  |  a = soft
    0  1  3  |  b = hard
    1  2 12  |  c = none
Answer:

soft:
(a) TP = 4
TN = 18
FP = 1
FN = 1

(b) TP rate = Recall = TP / (TP + FN) = 4/5 = 0.8
FP rate = FP / (FP + TN) = 1/19 = 0.053
Precision = TP / (TP + FP) = 4/5 = 0.8
F-measure = 2 × 0.8 × 0.8 / (0.8 + 0.8) = 0.8


hard:
(a) TP = 1
TN = 18
FP = 2
FN = 3

(b) TP rate = Recall = TP / (TP + FN) = 1/4 = 0.25
FP rate = FP / (FP + TN) = 2/20 = 0.1
Precision = TP / (TP + FP) = 1/3 = 0.333
F-measure = 2 × 0.25 × 0.333 / (0.25 + 0.333) = 0.286


none:
(a) TP = 12
TN = 5
FP = 4
FN = 3

(b) TP rate = Recall = TP / (TP + FN) = 12/15 = 0.8
FP rate = FP / (FP + TN) = 4/9 = 0.444
Precision = TP / (TP + FP) = 12/16 = 0.75
F-measure = 2 × 0.8 × 0.75 / (0.8 + 0.75) = 0.774
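These per-class counts and rates can be recomputed from the confusion matrix with a
short script (an illustrative sketch, not WEKA output):

    # Rows of the matrix are actual classes, columns are predicted classes.
    import numpy as np

    cm = np.array([[4, 0, 1],     # actual soft
                   [0, 1, 3],     # actual hard
                   [1, 2, 12]])   # actual none
    classes = ["soft", "hard", "none"]
    total = cm.sum()

    for i, name in enumerate(classes):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp
        fp = cm[:, i].sum() - tp
        tn = total - tp - fn - fp
        recall = tp / (tp + fn)                 # TP rate
        fp_rate = fp / (fp + tn)
        precision = tp / (tp + fp)
        f_measure = 2 * precision * recall / (precision + recall)
        print(name, tp, tn, fp, fn,
              round(recall, 3), round(fp_rate, 3),
              round(precision, 3), round(f_measure, 3))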
