
Answer to Problem No. 7
a) Error rate of the data without partitioning:

   E_orig = 1 − max(50/100, 50/100) = 50/100

1. Splitting on attribute A, the gain in error rate:

         A=T   A=F
   +      25    25
   -       0    50

   E_{A=T} = 1 − max(25/25, 0/25) = 0/25 = 0
   E_{A=F} = 1 − max(25/75, 50/75) = 25/75
   Δ_A = E_orig − (25/100)·E_{A=T} − (75/100)·E_{A=F} = 50/100 − 0 − 25/100 = 25/100

2. Splitting on attribute B, the gain in error rate:

         B=T   B=F
   +      30    20
   -      20    30

   E_{B=T} = 20/50
   E_{B=F} = 20/50
   Δ_B = E_orig − (50/100)·E_{B=T} − (50/100)·E_{B=F} = 50/100 − 20/100 − 20/100 = 10/100

3. Splitting on attribute C, the gain in error rate:

         C=T   C=F
   +      25    25
   -      25    25

   E_{C=T} = 25/50
   E_{C=F} = 25/50
   Δ_C = E_orig − (50/100)·E_{C=T} − (50/100)·E_{C=F} = 50/100 − 25/100 − 25/100 = 0

Attribute A is therefore chosen for the split, because it has the highest gain.
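
As a sanity check, the gains above can be recomputed with a short Python sketch. The joint (A, B, C) counts below are the ones implied by the child-node tables in parts (b) and (d) of this answer; the helper names (error_rate, gain) are illustrative, not part of the original solution.

```python
from fractions import Fraction

# Joint counts (A, B, C) -> (number of '+', number of '-'),
# reconstructed from the child-node tables in parts (b) and (d).
counts = {
    ('T', 'T', 'T'): (5, 0),
    ('T', 'F', 'T'): (20, 0),
    ('F', 'T', 'T'): (0, 20),
    ('F', 'F', 'T'): (0, 5),
    ('F', 'T', 'F'): (25, 0),
    ('F', 'F', 'F'): (0, 25),
}

ATTR_INDEX = {'A': 0, 'B': 1, 'C': 2}

def error_rate(pos, neg):
    """Classification error 1 - max(pos, neg)/(pos + neg) of a node."""
    total = pos + neg
    return Fraction(0) if total == 0 else 1 - Fraction(max(pos, neg), total)

def gain(attr, rows=counts):
    """Drop in weighted classification error when splitting `rows` on `attr`."""
    idx = ATTR_INDEX[attr]
    pos = sum(p for p, _ in rows.values())
    neg = sum(n for _, n in rows.values())
    total = pos + neg
    weighted_after = Fraction(0)
    for value in ('T', 'F'):
        vpos = sum(p for key, (p, _) in rows.items() if key[idx] == value)
        vneg = sum(n for key, (_, n) in rows.items() if key[idx] == value)
        weighted_after += Fraction(vpos + vneg, total) * error_rate(vpos, vneg)
    return error_rate(pos, neg) - weighted_after

for attr in 'ABC':
    print(attr, gain(attr))   # A: 1/4 (= 25/100), B: 1/10, C: 0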

b) The A=T child node is pure, so it needs no further splitting. In the A=F child node, the distribution of training instances is:

                 Class label
   B      C      +        -
   T      T      0       20
   F      T      0        5
   T      F     25        0
   F      F      0       25

Classification error of the A=F node:

   E_orig = 25/75
1. Splitting on attribute B, the gain in error rate:

         B=T   B=F
   +      25     0
   -      20    30

   E_{B=T} = 20/45
   E_{B=F} = 0
   Δ_B = E_orig − (45/75)·E_{B=T} − (30/75)·E_{B=F} = 25/75 − 20/75 − 0 = 5/75

2. Splitting on attribute C, the gain in error rate:

         C=T   C=F
   +       0    25
   -      25    25

   E_{C=T} = 0/25 = 0
   E_{C=F} = 25/50
   Δ_C = E_orig − (25/75)·E_{C=T} − (50/75)·E_{C=F} = 25/75 − 0 − 25/75 = 0

The split will therefore be made on attribute B.
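
Restricting the sketch from part (a) to the A=F cells reproduces these gains (this continuation reuses counts and gain from that snippet):

```python
# Continuing the sketch from part (a): keep only the A = F cells and
# recompute the gains inside that child node.
a_false = {key: pn for key, pn in counts.items() if key[0] == 'F'}
print(gain('B', a_false))   # 1/15 (= 5/75)
print(gain('C', a_false))   # 0
```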

c) The A=T leaf and the (A=F, B=F) leaf are pure. The (A=F, B=T) leaf contains 25 positive and 20 negative instances, is labelled +, and therefore misclassifies its 20 negative instances. So 20 instances are misclassified and the error rate is 20/100.
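
This count can be verified by continuing the sketch from part (a): label every cell with the two-level tree built in parts (a) and (b) and add up the misclassified instances (greedy_tree_label is an illustrative name):

```python
# Continuing the sketch from part (a): training errors of the tree that
# splits on A at the root and on B inside the A = F child.
def greedy_tree_label(a, b, c):
    if a == 'T':
        return '+'                      # A = T leaf is pure '+'
    return '+' if b == 'T' else '-'     # A = F: B = T labelled '+', B = F labelled '-'

errors = sum(neg if greedy_tree_label(a, b, c) == '+' else pos
             for (a, b, c), (pos, neg) in counts.items())
print(errors)   # 20
```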

d) For the C=T child node, the error rate before splitting is

   E_orig = 25/50

1. Splitting on attribute A, the gain in error rate:

         A=T   A=F
   +      25     0
   -       0    25

   E_{A=T} = 0
   E_{A=F} = 0
   Δ_A = 25/50

2. Splitting on attribute B, the gain in error rate:

         B=T   B=F
   +       5    20
   -      20     5

   E_{B=T} = 5/25
   E_{B=F} = 5/25
   Δ_B = 15/50

Therefore, A is chosen as the splitting attribute. For the C=F child node, the error rate before splitting is

   E_orig = 25/50

1. Splitting on attribute A, the gain in error rate:

         A=T   A=F
   +       0    25
   -       0    25

   E_{A=T} = 0
   E_{A=F} = 25/50
   Δ_A = 0

2. Splitting on attribute B, the gain in error rate:

         B=T   B=F
   +      25     0
   -       0    25

   E_{B=T} = 0
   E_{B=F} = 0
   Δ_B = 25/50

B is therefore used as the splitting attribute. The overall training error rate of the induced tree is 0.
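
A similar continuation of the sketch from part (a) confirms that this tree makes no training errors:

```python
# Continuing the same sketch: the part (d) tree splits on C first,
# then on A inside C = T and on B inside C = F.
def c_first_tree_label(a, b, c):
    if c == 'T':
        return '+' if a == 'T' else '-'
    return '+' if b == 'T' else '-'

errors = sum(neg if c_first_tree_label(a, b, c) == '+' else pos
             for (a, b, c), (pos, neg) in counts.items())
print(errors)   # 0, versus 20 for the greedy tree in part (c)
```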

e) The greedy heuristic does not necessarily lead to the best tree: splitting on C looks worst at the root (Δ_C = 0), yet it ultimately yields a tree with zero training errors, whereas the greedy choice of A produces a tree that misclassifies 20 instances.
