
ASSOCIATION RULE MINING

• The task of association rule mining is to find association relationships among a set of items in a dataset or database.
• A typical example of an association rule mined from what is often referred to as "market basket data" is: "80% of customers who purchase bread also purchase butter."
Suppose, as manager of an AllElectronics branch, you would like to
learn more about the buying habits of your customers. Specifically,
you wonder, “Which groups or sets of items are customers likely to
purchase on a given trip to the store?”
To answer your question, market basket analysis may be performed on
the retail data of customer transactions at your store. You can then use
the results to plan marketing or advertising strategies, or in the design
of a new catalog. For instance, market basket analysis may help you
design different store layouts. In one strategy, items that are
frequently purchased together can be placed in proximity to further
encourage the combined sale of such items.
If customers who purchase computers also tend to buy
antivirus software at the same time, then placing the
hardware display close to the software display may help
increase the sales of both items.
• The association relationships are described in association
rules.
• In association rule mining there are two measures: support and confidence.
• The confidence measure indicates the rule’s strength, while
support corresponds to the frequency of the pattern.
For example, the information that customers who purchase computers also tend to buy antivirus software at the same time is represented in the following association rule:

computer ⇒ antivirus_software [support = 2%, confidence = 60%]   (Rule 6.1)
A support of 2% for Rule (6.1) means that 2% of all the transactions under analysis show that computer and antivirus software are purchased together. A confidence of 60% means that 60% of the customers who purchased a computer also bought the software.
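Both measures are simple to compute directly from the transactions. Below is a minimal sketch in Python; the transactions and item names are invented for illustration:

# Minimal sketch: computing support and confidence of the rule
# {computer} => {antivirus} over a toy transaction list.
transactions = [
    {"computer", "antivirus", "printer"},
    {"computer", "antivirus"},
    {"computer", "mouse"},
    {"printer", "mouse"},
    {"computer", "antivirus", "mouse"},
]

antecedent = {"computer"}
consequent = {"antivirus"}
both = antecedent | consequent

n = len(transactions)
n_both = sum(1 for t in transactions if both <= t)            # A and B together
n_antecedent = sum(1 for t in transactions if antecedent <= t)

support = n_both / n                  # fraction of all transactions with A and B
confidence = n_both / n_antecedent    # fraction of A-transactions that also have B

print(f"support = {support:.0%}, confidence = {confidence:.0%}")
# -> support = 60%, confidence = 75%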
• Given a user-specified minimum support and minimum confidence, the problem of mining association rules is to find all association rules whose support and confidence are at least the given minimum support and minimum confidence.
• Thus, this approach can be broken into two sub-problems
as follows:
(1) Finding the frequent itemsets which have support above
the predetermined minimum support.
(2) Deriving all rules, based on each frequent itemset, which
have confidence more than the minimum confidence.
• There are many ways to find the frequent itemsets, but we will only discuss the Apriori algorithm.
Apriori Algorithm
• Step 1: Start from the transaction data in the database.
• Step 2: Calculate the support (frequency) of every single item.
• Step 3: Discard the items whose support is below the minimum support (here, 2).
• Step 4: Combine the surviving items into two-item candidates.
• Step 5: Calculate the support of each two-item candidate.
• Step 6: Discard the candidates whose support is below the minimum support.
• Step 7: Combine the surviving pairs into three-item candidates, calculate their support, and again discard those below the minimum support; repeat until no new candidates survive.
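The walkthrough above translates almost directly into code. A compact sketch in Python follows; the transaction data is invented, and the minimum support is an absolute count of 2 to match the steps:

from itertools import combinations

def apriori(transactions, min_support=2):
    """Return all frequent itemsets with their (absolute) support counts."""
    transactions = [frozenset(t) for t in transactions]

    # Steps 2-3 (and 5-7): count candidate support, prune below min_support.
    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {i for t in transactions for i in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)

    k = 2
    while frequent:
        # Steps 4 and 7: join frequent (k-1)-itemsets into k-item candidates.
        keys = list(frequent)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k}
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
for itemset, n in sorted(apriori(transactions).items(), key=lambda x: -x[1]):
    print(set(itemset), n)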
FP-Growth Algorithm (Frequent Pattern Growth)
The FP-Growth algorithm, proposed by J. Han, is an improvement over the Apriori algorithm. It finds frequent itemsets in a transaction database without candidate generation, representing the frequent items in a frequent-pattern tree (FP-tree).
Advantages of the FP-Growth algorithm:
1. Faster than the Apriori algorithm
2. No candidate generation
3. Only two passes over the dataset
Disadvantages of the FP-Growth algorithm:
1. The FP-tree may not fit in memory
2. The FP-tree is expensive to build
FP Tree Algorithm
• Input: A database DB, represented by an FP-tree constructed according to Algorithm 1, and a minimum support threshold ξ.
• Output: The complete set of frequent patterns.
• Method: call FP-growth(FP-tree, null).

Procedure FP-growth(Tree, α)
{
    if Tree contains a single prefix path then
    {   // Mining a single prefix-path FP-tree
        let P be the single prefix-path part of Tree;
        let Q be the multipath part with the top branching node replaced by a null root;
        for each combination (denoted as β) of the nodes in path P do
            generate pattern β ∪ α with support = minimum support of the nodes in β;
        let freq_pattern_set(P) be the set of patterns so generated;
    }
    else let Q be Tree;
    for each item a_i in Q do
    {   // Mining a multipath FP-tree
        generate pattern β = a_i ∪ α with support = a_i.support;
        construct β's conditional pattern base and then β's conditional FP-tree Tree_β;
        if Tree_β ≠ ∅ then
            call FP-growth(Tree_β, β);
        let freq_pattern_set(Q) be the set of patterns so generated;
    }
    return freq_pattern_set(P) ∪ freq_pattern_set(Q) ∪ (freq_pattern_set(P) × freq_pattern_set(Q))
}
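Coding an FP-tree from scratch is beyond the scope of these notes, but a library implementation makes the algorithm easy to try. The sketch below assumes the third-party mlxtend package is available (e.g. installed via pip install mlxtend); it mines the six-transaction example used in the next section:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

# The six transactions from the example that follows (min support 3/6 = 0.5).
transactions = [list("BEAD"), list("BEC"), list("BEAD"),
                list("BEAC"), list("BEACD"), list("BCD")]

# One-hot encode: one boolean column per item.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Frequent itemsets, mined without candidate generation.
frequent = fpgrowth(onehot, min_support=3/6, use_colnames=True)
print(frequent)

# Sub-problem (2): derive rules meeting a 60% confidence threshold.
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])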
Consider the transactions below, with a minimum support of 3:

Step 2 - Find the frequency of occurrence of each item:

Item  Frequency/Support
A     4
B     6
C     4
D     4
E     5

Step 3 - Prioritize the items in descending order of frequency:

{ B(6), E(5), A(4), C(4), D(4) }

The transactions, with items written in priority order:

TID  Items
1    B, E, A, D
2    B, E, C
3    B, E, A, D
4    B, E, A, C
5    B, E, A, C, D
6    B, C, D
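A short sketch of this preprocessing in Python (count item frequencies, prune at minimum support 3, and rewrite each transaction in priority order; this canonical ordering is what lets shared prefixes merge in the FP-tree):

from collections import Counter

transactions = ["BEAD", "BEC", "BEAD", "BEAC", "BEACD", "BCD"]
min_support = 3

# Step 2: frequency of each item across all transactions.
freq = Counter(item for t in transactions for item in t)

# Step 3: keep items meeting min support, in descending frequency
# (ties broken alphabetically here).
order = sorted((i for i in freq if freq[i] >= min_support),
               key=lambda i: (-freq[i], i))
print(order)  # ['B', 'E', 'A', 'C', 'D']

# Rewrite each transaction with its items in priority order.
ordered = [[i for i in order if i in t] for t in transactions]
print(ordered)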

[Figure: step-by-step FP-tree construction, inserting the ordered transactions one at a time: (a) Transaction 1: BEAD, (b) Transaction 2: BEC, (c) Transaction 3: BEAD, (d) Transaction 4: BEAC, (e) Transaction 5: BEACD, (f) Transaction 6: BCD]
Another example transaction database:

TID   Items
T100  I2, I1, I5
T200  I2, I4
T300  I2, I3
T400  I2, I1, I4
T500  I1, I3
T600  I2, I3
T700  I1, I3
T800  I2, I1, I3, I5
T900  I2, I1, I3
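As a usage example, the apriori() sketch given earlier can be run directly on this database (run that code block first; min_support here is again an absolute count):

transactions = [
    {"I2", "I1", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I2", "I1", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I2", "I1", "I3", "I5"}, {"I2", "I1", "I3"},
]
# With min_support = 2 this yields, among others, the three-item frequent
# itemsets {I1, I2, I3} (support 2) and {I1, I2, I5} (support 2).
print(apriori(transactions, min_support=2))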
