Anda di halaman 1dari 4

Apriori Itemset Generation

 A frequent itemset is an itemset whose support is greater than some user-specified


minimum support (denoted Lk, where k is the size of the itemset)
 A candidate itemset is a potentially frequent itemset (denoted Ck, where k is the size
of the itemset)

Apriori Algorithm

A Java applet which combines DIC, Apriori and Probability Based Objected Interestingness
Measures can be found here.

Apriori Algorithm: (by Agrawal et al at IBM Almaden Research Centre) can be used to
generate all frequent itemset

Pass 1
1. Generate the candidate itemsets in C1
2. Save the frequent itemsets in L1

Pass k
1. Generate the candidate itemsets in Ck from the frequent
itemsets in Lk-1
1. Join Lk-1 p with Lk-1q, as follows:
insert into Ck
select p.item1, p.item2, . . . , p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1q
where p.item1 = q.item1, . . . p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
2. Generate all (k-1)-subsets from the candidate itemsets in Ck
3. Prune all candidate itemsets from Ck where some (k-1)-subset of the candidate
itemset is not in the frequent itemset Lk-1
2. Scan the transaction database to determine the support for each candidate itemset in
Ck
3. Save the frequent itemsets in Lk

Implementation: A working Apriori Itemset Generation program can be found on the


Itemset Implementation page.

Example 1: Assume the user-specified minimum support is 50%

 Given: The transaction database shown below

TID A B C D E F
T1 1 0 1 1 0 0
T2 0 1 0 1 00
T3 1 1 1 0 10
T4 0 1 0 1 01
 The candidate itemsets in C2 are shown below

Itemset X supp(X)
{A,B} 25%
{A,C} 50%
{A,D} 25%
{B,C} 25%
{B,D} 50%
{C,D} 25%
 The frequent itemsets in L2 are shown below

Itemset X supp(X)
{A,C} 50%
{B,D} 50%

Example 2: Assume the user-specified minimum support is 40%, then generate all frequent
itemsets.

Given: The transaction database shown below

TID ABCDE
T1 1 1 1 0 0
T2 1 1 1 1 1
T3 1 0 1 1 0
T4 1 0 1 1 1
T5 1 1 1 1 0

Pass 1

C1 L1
Itemset X supp(X) Itemset X supp(X)
A ? A 100%
B ? B 60%
C ? C 100%
D ? D 80%
E ? E 40%

Pass 2

C2

Itemset X supp(X)
A,B ?
A,C ?
A,D ?
A,E ?
B,C ?
B,D ?
B,E ?
C,D ?
C,E ?
D,E ?
 Nothing pruned since all subsets of these itemsets are infrequent

L2
L2 after saving only the frequent itemsets
Itemset X supp(X)
Itemset X supp(X)
A,B 60%
A,B 60%
A,C 100%
A,C 100%
A,D 80%
A,D 80%
A,E 40%
A,E 40%
B,C 60%
B,C 60%
B,D 40%
B,D 40%
B,E 20%
C,D 80%
C,D 80%
C,E 40%
C,E 40%
D,E 40%
D,E 40%

Pass 3

 To create C3 only look at items that have the same first item (in pass k, the first k - 2 items
must match)

C3
C3 after pruning
Itemset X supp(X)
Itemset X supp(X)
join AB with AC A,B,C ?
A,B,C ?
join AB with AD A,B,D ?
A,B,D ?
join AB with AE A,B,E ?
A,C,D ?
join AC with AD A,C,D ?
A,C,E ?
join AC with AE A,C,E ?
A,D,E ?
join AD with AE A,D,E ?
B,C,D ?
join BC with BD B,C,D ?
C,D,E ?
join CD with CE C,D,E ?
 Pruning eliminates ABE since BE is not frequent
 Scan transactions in the database

L3

Itemset X supp(X)
A,B,C 60%
A,B,D 40%
A,C,D 80%
A,C,E 40%
A,D,E 40%
B,C,D 40%
C,D,E 40%

Pass 4

 First k - 2 = 2 items must match in pass k = 4

C4

Itemset X supp(X)
combine ABC with ABD A,B,C,D ?
combine ACD with ACE A,C,D,E ?
 Pruning:
o For ABCD we check whether ABC, ABD, ACD, BCD are frequent. They are
in all cases, so we do not prune ABCD.
o For ACDE we check whether ACD, ACE, ADE, CDE are frequent. Yes, in all
cases, so we do not prune ACDE

L4

Itemset X supp(X)
A,B,C,D 40%
A,C,D,E 40%
 Both are frequent

Pass 5: For pass 5 we can't form any candidates because there aren't two frequent 4-itemsets
beginning with the same 3 items.

http://www2.cs.uregina.ca/~dbd/cs831/notes/itemsets/ite
mset_eg.html

Anda mungkin juga menyukai