
1) a) Define the Apriori principle and briefly discuss why it is useful in association rule mining.

The Apriori principle states that if an item set is frequent, then all of its subsets must also be frequent; equivalently, if an item set is infrequent, then all of its supersets must also be infrequent. This follows from the anti-monotone property of support: the support of an item set never exceeds the support of any of its subsets. In association rule mining, the principle reduces the number of candidate item sets by eliminating every candidate that has an infrequent subset, so no effort is wasted counting item sets that are already known to be infrequent. As a result, far fewer candidates remain for support checking, which reduces computation, I/O cost, and memory requirements. For these reasons the Apriori principle is very useful in association rule mining.
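The anti-monotone property can be checked directly on a small data set. The sketch below is only an illustration: it uses the five transactions from question 2, and the helper names support_count and check_anti_monotone are hypothetical, not part of any library.

```python
from itertools import combinations

# The five market basket transactions from question 2 (T1..T5).
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C", "D", "E"},
    {"A", "C", "D"},
    {"A", "C", "D", "E"},
    {"A", "B", "C", "D"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def check_anti_monotone(itemset, transactions):
    """True if no non-empty proper subset has lower support than `itemset`."""
    sup = support_count(itemset, transactions)
    for k in range(1, len(itemset)):
        for subset in combinations(sorted(itemset), k):
            if support_count(subset, transactions) < sup:
                return False
    return True


# {B,E} is infrequent (count 1 < 2), so its superset {A,B,E} cannot be frequent.
print(support_count({"B", "E"}, TRANSACTIONS))
print(support_count({"A", "B", "E"}, TRANSACTIONS))
print(check_anti_monotone({"A", "B", "C", "D"}, TRANSACTIONS))
```

Because {B,E} falls below the minimum count, any candidate containing it can be discarded without scanning the database again.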


b) Compare and contrast FP-Growth algorithm with Apriori algorithm.


The main difference between the Apriori algorithm and the FP-Growth algorithm is that Apriori generates candidate item sets and tests whether they are frequent, whereas FP-Growth discovers frequent item sets without candidate generation. Further differences between the two algorithms are listed below.

Apriori algorithm | FP-Growth algorithm
Uses the Apriori property with join and prune steps | Constructs conditional pattern bases and conditional frequent pattern trees from the database for items that satisfy minimum support
Requires a large amount of memory because a large number of candidates are generated | Requires less memory because of its compact structure and the absence of candidate generation
Scans the database multiple times to generate candidate sets | Scans the database only twice
Execution time is higher than FP-Growth because time is spent producing candidates at every pass | Execution time is lower than Apriori

2) Consider the market basket transactions given in the following table. Let min_sup = 40% and min_conf = 40%.

Transaction ID | Items Bought
T1 | A,B,C
T2 | A,B,C,D,E
T3 | A,C,D
T4 | A,C,D,E
T5 | A,B,C,D

a) Find all the frequent item sets using Apriori algorithm.

min_sup = 40% and min_conf = 40%. Minimum support count = 40% of 5 transactions = 2.

Transaction ID | Items Bought
T1 | A,B,C
T2 | A,B,C,D,E
T3 | A,C,D
T4 | A,C,D,E
T5 | A,B,C,D

Item | Number of Transactions
A | 5
B | 3
C | 5
D | 4
E | 2

Item Pairs | Number of Transactions
A,B | 3
A,C | 5
A,D | 4
A,E | 2
B,C | 3
B,D | 2
B,E | 1
C,D | 4
C,E | 2
D,E | 2

Item Pairs | Number of Transactions
A,B | 3
A,C | 5
A,D | 4
A,E | 2
B,C | 3
B,D | 2
C,D | 4
C,E | 2
D,E | 2

Joined Pairs | Resulting Item Set
{A,B} & {A,C} | A,B,C
{A,B} & {A,D} | A,B,D
{A,B} & {A,E} | A,B,E
{A,C} & {A,D} | A,C,D
{A,C} & {A,E} | A,C,E
{A,D} & {A,E} | A,D,E
{B,C} & {B,D} | B,C,D
{C,D} & {C,E} | C,D,E
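The join step above can be sketched in a few lines. This is a minimal illustration, assuming item sets are kept as sorted tuples; generate_candidates is a hypothetical helper name. Unlike the worked table, the sketch also applies the prune step before counting, so {A,B,E} is dropped immediately because its subset {B,E} is not frequent.

```python
from itertools import combinations


def generate_candidates(frequent_k, k):
    """Join frequent k-item sets that share their first k-1 items, then
    prune any candidate that has an infrequent k-item subset."""
    frequent = {tuple(sorted(s)) for s in frequent_k}
    ordered = sorted(frequent)
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:k - 1] != b[:k - 1]:
                continue  # join only item sets with a common prefix
            cand = a + (b[-1],)
            # Prune: every k-item subset of the candidate must be frequent.
            if all(sub in frequent for sub in combinations(cand, k)):
                candidates.append(cand)
    return candidates


# Frequent pairs from the table above ({B,E} is excluded, count 1).
frequent_pairs = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"),
    ("B", "D"), ("C", "D"), ("C", "E"), ("D", "E"),
]
print(generate_candidates(frequent_pairs, 2))
```

Pruning before counting does not change the final result here; it only saves one support count, since {A,B,E} turns out to be infrequent anyway.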

Item Sets | Number of Transactions
A,B,C | 3
A,B,D | 2
A,B,E | 1
A,C,D | 4
A,C,E | 2
A,D,E | 2
B,C,D | 2
C,D,E | 2

Item Sets | Number of Transactions
A,B,C | 3
A,B,D | 2
A,C,D | 4
A,C,E | 2
A,D,E | 2
B,C,D | 2
C,D,E | 2

Joined Item Sets | Resulting Item Set
{A,B,C} & {A,B,D} | A,B,C,D
{A,C,D} & {A,C,E} | A,C,D,E

Item Sets | Number of Transactions
A,B,C,D | 2
A,C,D,E | 2

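Both 4-item sets, {A,B,C,D} and {A,C,D,E}, meet the minimum support count of 2. They do not share their first three items, so they cannot be joined into a 5-item candidate and the algorithm stops here. The frequent item sets are all item sets in the tables above with a count of at least 2: 5 single items, 9 pairs, 7 triples, and these 2 four-item sets. {A,B,C,D} and {A,C,D,E} are the largest sets of items that are frequently bought together.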
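The level-by-level procedure worked through above can be reproduced with a short script. This is a sketch under stated assumptions, not a reference implementation: the function name apriori and the tuple representation of item sets are choices made for this example, and the minimum support is passed as an absolute count (2 of 5 transactions).

```python
from itertools import combinations

TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C", "D", "E"},
    {"A", "C", "D"},
    {"A", "C", "D", "E"},
    {"A", "B", "C", "D"},
]


def apriori(transactions, min_count):
    """Return {item set (sorted tuple): support count} for all frequent item sets."""
    items = sorted({i for t in transactions for i in t})
    current = [(i,) for i in items]  # candidate 1-item sets
    frequent = {}
    k = 1
    while current:
        # Count the support of every candidate at this level.
        counts = {c: sum(1 for t in transactions if set(c) <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-item sets with a common (k-1)-prefix, then prune.
        ordered = sorted(level)
        current = []
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                if a[:-1] == b[:-1]:
                    cand = a + (b[-1],)
                    if all(s in level for s in combinations(cand, k)):
                        current.append(cand)
        k += 1
    return frequent


result = apriori(TRANSACTIONS, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), s)):
    print(",".join(itemset), result[itemset])
```

The sketch prunes {A,B,E} before counting it, whereas the tables above count it first and then discard it; the set of frequent item sets is the same either way.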

b) Obtain significant decision rules.


Non-empty proper subsets of {A,B,C,D}: {A}, {B}, {C}, {D}, {A,B}, {A,C}, {A,D}, {B,C}, {B,D}, {C,D}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}

The confidence C of each rule is the support of the whole item set divided by the support of the left-hand side:

{A} -> {B,C,D}: C = SUP{A,B,C,D} / SUP{A} = 2/5 = 40%
{B} -> {A,C,D}: C = SUP{A,B,C,D} / SUP{B} = 2/3 = 66.7%
{C} -> {A,B,D}: C = SUP{A,B,C,D} / SUP{C} = 2/5 = 40%
{D} -> {A,B,C}: C = SUP{A,B,C,D} / SUP{D} = 2/4 = 50%
{A,B} -> {C,D}: C = SUP{A,B,C,D} / SUP{A,B} = 2/3 = 66.7%
{A,C} -> {B,D}: C = SUP{A,B,C,D} / SUP{A,C} = 2/5 = 40%
{A,D} -> {B,C}: C = SUP{A,B,C,D} / SUP{A,D} = 2/4 = 50%
{B,C} -> {A,D}: C = SUP{A,B,C,D} / SUP{B,C} = 2/3 = 66.7%
{B,D} -> {A,C}: C = SUP{A,B,C,D} / SUP{B,D} = 2/2 = 100%
{C,D} -> {A,B}: C = SUP{A,B,C,D} / SUP{C,D} = 2/4 = 50%
{A,B,C} -> {D}: C = SUP{A,B,C,D} / SUP{A,B,C} = 2/3 = 66.7%
{A,B,D} -> {C}: C = SUP{A,B,C,D} / SUP{A,B,D} = 2/2 = 100%
{A,C,D} -> {B}: C = SUP{A,B,C,D} / SUP{A,C,D} = 2/4 = 50%
{B,C,D} -> {A}: C = SUP{A,B,C,D} / SUP{B,C,D} = 2/2 = 100%

Rule | Confidence
{A} -> {B,C,D} | 40%
{B} -> {A,C,D} | 66.7%
{C} -> {A,B,D} | 40%
{D} -> {A,B,C} | 50%
{A,B} -> {C,D} | 66.7%
{A,C} -> {B,D} | 40%
{A,D} -> {B,C} | 50%
{B,C} -> {A,D} | 66.7%
{B,D} -> {A,C} | 100%
{C,D} -> {A,B} | 50%
{A,B,C} -> {D} | 66.7%
{A,B,D} -> {C} | 100%
{A,C,D} -> {B} | 50%
{B,C,D} -> {A} | 100%

Non-empty proper subsets of {A,C,D,E}: {A}, {C}, {D}, {E}, {A,C}, {A,D}, {A,E}, {C,D}, {C,E}, {D,E}, {A,C,D}, {A,C,E}, {A,D,E}, {C,D,E}

{A} -> {C,D,E}: C = SUP{A,C,D,E} / SUP{A} = 2/5 = 40%
{C} -> {A,D,E}: C = SUP{A,C,D,E} / SUP{C} = 2/5 = 40%
{D} -> {A,C,E}: C = SUP{A,C,D,E} / SUP{D} = 2/4 = 50%
{E} -> {A,C,D}: C = SUP{A,C,D,E} / SUP{E} = 2/2 = 100%
{A,C} -> {D,E}: C = SUP{A,C,D,E} / SUP{A,C} = 2/5 = 40%
{A,D} -> {C,E}: C = SUP{A,C,D,E} / SUP{A,D} = 2/4 = 50%
{A,E} -> {C,D}: C = SUP{A,C,D,E} / SUP{A,E} = 2/2 = 100%
{C,D} -> {A,E}: C = SUP{A,C,D,E} / SUP{C,D} = 2/4 = 50%
{C,E} -> {A,D}: C = SUP{A,C,D,E} / SUP{C,E} = 2/2 = 100%
{D,E} -> {A,C}: C = SUP{A,C,D,E} / SUP{D,E} = 2/2 = 100%
{A,C,D} -> {E}: C = SUP{A,C,D,E} / SUP{A,C,D} = 2/4 = 50%
{A,C,E} -> {D}: C = SUP{A,C,D,E} / SUP{A,C,E} = 2/2 = 100%
{A,D,E} -> {C}: C = SUP{A,C,D,E} / SUP{A,D,E} = 2/2 = 100%
{C,D,E} -> {A}: C = SUP{A,C,D,E} / SUP{C,D,E} = 2/2 = 100%

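Rule | Confidence
{A} -> {C,D,E} | 40%
{C} -> {A,D,E} | 40%
{D} -> {A,C,E} | 50%
{E} -> {A,C,D} | 100%
{A,C} -> {D,E} | 40%
{A,D} -> {C,E} | 50%
{A,E} -> {C,D} | 100%
{C,D} -> {A,E} | 50%
{C,E} -> {A,D} | 100%
{D,E} -> {A,C} | 100%
{A,C,D} -> {E} | 50%
{A,C,E} -> {D} | 100%
{A,D,E} -> {C} | 100%
{C,D,E} -> {A} | 100%

Every rule derived from {A,B,C,D} and {A,C,D,E} has a confidence of at least min_conf = 40%, so all of them are significant.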
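The confidence calculations above follow one pattern, which the following sketch automates. It is an illustration only: rules_from_itemset and support_count are hypothetical helper names, and confidence is returned as a fraction between 0 and 1 rather than a percentage.

```python
from itertools import combinations

TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C", "D", "E"},
    {"A", "C", "D"},
    {"A", "C", "D", "E"},
    {"A", "B", "C", "D"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def rules_from_itemset(itemset, transactions, min_conf):
    """List (antecedent, consequent, confidence) for every rule X -> Y where
    X is a non-empty proper subset of `itemset` and Y is the remainder."""
    items = tuple(sorted(itemset))
    whole = support_count(items, transactions)
    rules = []
    for k in range(1, len(items)):
        for antecedent in combinations(items, k):
            consequent = tuple(i for i in items if i not in antecedent)
            conf = whole / support_count(antecedent, transactions)
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
    return rules


for lhs, rhs, conf in rules_from_itemset({"A", "B", "C", "D"}, TRANSACTIONS, 0.4):
    print(f"{{{','.join(lhs)}}} -> {{{','.join(rhs)}}}  confidence = {conf:.1%}")
```

Calling the same function with {"A", "C", "D", "E"} yields the second group of rules; raising min_conf filters out the weaker ones.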

c) Derive the FP-Tree for the above transaction table.

Transaction ID | Items Bought
T1 | A,B,C
T2 | A,B,C,D,E
T3 | A,C,D
T4 | A,C,D,E
T5 | A,B,C,D

Support for each item:
A = 5/5 = 100%
B = 3/5 = 60%
C = 5/5 = 100%
D = 4/5 = 80%
E = 2/5 = 40%

Items in descending order of support: A, C, D, B, E

Re-arrange the items of each transaction in this order:

Transaction ID | Items Bought
T1 | A,C,B
T2 | A,C,D,B,E
T3 | A,C,D
T4 | A,C,D,E
T5 | A,C,D,B
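The re-ordering step can be sketched as follows. The helper name order_transactions is hypothetical, and ties in support (A and C both appear in 5 transactions) are broken alphabetically, which is an assumption that happens to match the order used above.

```python
from collections import Counter

TRANSACTIONS = [
    ["A", "B", "C"],
    ["A", "B", "C", "D", "E"],
    ["A", "C", "D"],
    ["A", "C", "D", "E"],
    ["A", "B", "C", "D"],
]


def order_transactions(transactions, min_count):
    """Drop infrequent items and sort each transaction by descending support."""
    counts = Counter(item for t in transactions for item in t)
    frequent = [i for i in counts if counts[i] >= min_count]
    # Rank items by descending count; ties are broken alphabetically.
    ranked = sorted(frequent, key=lambda i: (-counts[i], i))
    rank = {item: pos for pos, item in enumerate(ranked)}
    return [sorted((i for i in t if i in rank), key=rank.get) for t in transactions]


for tid, items in enumerate(order_transactions(TRANSACTIONS, min_count=2), start=1):
    print(f"T{tid}", ",".join(items))
```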

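FP-Tree

Each node is written as item:count, and indentation shows the parent-child relationship.

After TID T1:
Null
  A:1
    C:1
      B:1

After TID T2:
Null
  A:2
    C:2
      B:1
      D:1
        B:1
          E:1

After TID T3:
Null
  A:3
    C:3
      B:1
      D:2
        B:1
          E:1

After TID T4:
Null
  A:4
    C:4
      B:1
      D:3
        B:1
          E:1
        E:1

After TID T5:
Null
  A:5
    C:5
      B:1
      D:4
        B:2
          E:1
        E:1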
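The insertion procedure that produces these trees can be sketched as below. This is a minimal illustration assuming the transactions are already ordered by descending support; the class FPNode and the functions build_fp_tree and render are hypothetical names, and the header table with node links used by the full FP-Growth algorithm is left out.

```python
class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction along a shared prefix path."""
    root = FPNode(None)
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # No existing branch for this item: start a new one.
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def render(node, depth=0):
    """Return the tree as indented lines, one 'item:count' per node."""
    label = "Null" if node.item is None else f"{node.item}:{node.count}"
    lines = ["  " * depth + label]
    for child in node.children.values():
        lines.extend(render(child, depth + 1))
    return lines


# Transactions T1..T5 with items in the order A, C, D, B, E.
ORDERED = [
    ["A", "C", "B"],
    ["A", "C", "D", "B", "E"],
    ["A", "C", "D"],
    ["A", "C", "D", "E"],
    ["A", "C", "D", "B"],
]
tree = build_fp_tree(ORDERED)
print("\n".join(render(tree)))
```

Building the tree from only the first few transactions, for example build_fp_tree(ORDERED[:2]), gives the intermediate states.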