
Associative Classification Mining

Dr Fadi Fayez Thabtah


Why Do We Need Data Mining?
• Leverage the organization's data assets
– Only a small portion (typically 5%-10%) of the collected data is ever analysed
– Data that may never be analysed continues to be collected, at great expense, out of fear that something which may prove important in the future would otherwise be missed
– The growth rate of data rules out the traditional "manually intensive" approach
Why Do We Need Data Mining?
• As databases grow, supporting the decision-making process with traditional query languages becomes infeasible
– Many queries of interest are difficult to state in a query language (the query formulation problem)
– "find all cases of fraud"
– "find all individuals likely to buy a Ford Expedition"
– "find all documents that are similar to this customer's problem"
Motivations of Data Mining
• Scalability
• High Dimensionality
• Complex Data
• Non-traditional analysis
• Improve the quality of the interaction
Data mining
• Discovering new information in terms of patterns or rules from
vast amounts of data based on the following techniques
– Machine learning
– Statistics
– Neural networks
– Genetic algorithms
• Applications
– Retail/Marketing
• Consumer behaviour based on buying patterns
– Finance
• Creditworthiness of clients
• Performance analysis of finance investments
– Health care/Medicine
• Effectiveness / side effects of treatments
Goals of Data Mining
• Simplification and automation of the overall statistical
process, from data source(s) to model application
• Prediction of values in many real-world applications, e.g. retail supermarkets, insurance, banking, etc.
• Analysing the behaviour of attributes within data sets
• Visualisation of data results to decision makers
Classification : A Two-Step Process
1. Classifier building: a classification algorithm learns a description of a set of predetermined classes from the training data
2. Classifier usage:
• Calculate the error rate of the resulting classifier
• If the error rate is acceptable, apply the classifier to the test data

Training Data
RowId A1 A2 Class
1 x1 y1 c1
2 x1 y2 c2
3 x1 y1 c2
4 x1 y2 c1
5 x2 y1 c2
6 x2 y1 c1
7 x2 y3 c2
8 x1 y3 c1
9 x2 y4 c1
10 x3 y1 c1

Test Data (class to be predicted)
RowId A1 A2 Class
1 x1 y1 ?
2 x2 y4 ?
3 x1 y1 ?

[Diagram: Training Data → Classification Algorithm → Classification Rules, which are then applied to the Test Data]
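A minimal Python sketch of the two-step process on the toy tables above. The "classifier" here is deliberately trivial (the majority class seen for each attribute pair) and is only illustrative; in practice the error rate in step 2 would be estimated on a labelled hold-out set before the classifier is applied to unseen data.

from collections import Counter, defaultdict

training = [  # (A1, A2, class) rows from the training table above
    ("x1", "y1", "c1"), ("x1", "y2", "c2"), ("x1", "y1", "c2"),
    ("x1", "y2", "c1"), ("x2", "y1", "c2"), ("x2", "y1", "c1"),
    ("x2", "y3", "c2"), ("x1", "y3", "c1"), ("x2", "y4", "c1"),
    ("x3", "y1", "c1"),
]
test = [("x1", "y1"), ("x2", "y4"), ("x1", "y1")]   # class unknown

# Step 1: classifier building -- predict the majority class per (A1, A2) pair,
# falling back to the overall majority class for unseen pairs.
by_pair = defaultdict(Counter)
for a1, a2, cls in training:
    by_pair[(a1, a2)][cls] += 1
model = {pair: counts.most_common(1)[0][0] for pair, counts in by_pair.items()}
default = Counter(c for _, _, c in training).most_common(1)[0][0]

# Step 2: classifier usage -- apply the classifier to the test rows.
for a1, a2 in test:
    print((a1, a2), "->", model.get((a1, a2), default))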
Classification Algorithms
Typical algorithms:
• Decision trees
• Rule-based induction
• Neural networks
• Memory-based (case-based) reasoning
• Genetic algorithms
• Bayesian networks
Decision Tree: Example
Day Outlook Temperature Humidity Wind Play Tennis
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No

Resulting decision tree:
Outlook = Sunny → Humidity: High → No, Normal → Yes
Outlook = Overcast → Yes
Outlook = Rain → Wind: Strong → No, Weak → Yes
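A sketch of learning such a tree from the table above; scikit-learn and pandas are assumptions of this example (the slides do not prescribe any library), and the categorical attributes are one-hot encoded before training.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
df = pd.DataFrame(rows, columns=["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"])
X = pd.get_dummies(df.drop(columns="PlayTennis"))      # one-hot encode the categorical attributes
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, df["PlayTennis"])
print(export_text(clf, feature_names=list(X.columns))) # text rendering of the learned tree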
Association rule mining
• Proposed by Agrawal et al. in 1993.
• An important data mining model studied extensively by the database and data mining communities.
• Assumes all data are categorical; there is no good algorithm for numeric data, so numeric attributes must be discretised first.
• Initially used for market basket analysis to find how items purchased by customers are related.

Bread → Milk [sup = 5%, conf = 100%]


Association Rules Mining (cont.)
• A strong tool that aims to find relationships between variables in a database.
• Applied widely, especially in market basket analysis, to infer items from the presence of other items in the customer's shopping cart.
• Example: if a customer buys milk, what is the probability that he/she buys cereal as well?
• Unlike classification, the target class is not pre-specified in association rule mining.

Advantages:
• Item shelving
• Sales promotions
• Future planning

Transactional database:
Transaction Id  Items                        Time
12              bread, milk                  10:12
13              bread, juice, milk           12:13
14              milk, beer, bread, juice     13:22
15              bread, eggs, milk            13:26
16              beer, basket, bread, juice   15:11
Rule strength measures
• Support: the rule X → Y holds with support sup in T (the transaction data set) if sup% of transactions contain X ∪ Y.
– sup = Pr(X ∪ Y)
• Confidence: the rule holds in T with confidence conf if conf% of transactions that contain X also contain Y.
– conf = Pr(Y | X)
• An association rule is a pattern that states that when X occurs, Y occurs with a certain probability.
Support and Confidence
• Support count: the support count of an itemset X, denoted by X.count, in a data set T is the number of transactions in T that contain X. Assume T has n transactions. Then:

  support = (X ∪ Y).count / n

  confidence = (X ∪ Y).count / X.count
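A short Python sketch of the two measures, assuming each transaction is represented as a set of items.

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset` (Pr(itemset))."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """Fraction of X-containing transactions that also contain Y (Pr(Y | X))."""
    return sum((X | Y) <= t for t in transactions) / sum(X <= t for t in transactions)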
AR Mining Problem
• Given a set of transactions T,
• generate all association rules whose support and confidence are greater than the user-specified minimum support (minsup) and minimum confidence (minconf).
Mining Association Rules—an
Example

Min. support = 50%, Min. confidence = 50%

Transaction-id  Items bought
10              A, B, C
20              A, C
30              A, D
40              B, E, F

Frequent itemsets  Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

For the rule A ⇒ C:
  support = support({A} ∪ {C}) = 50%
  confidence = support({A} ∪ {C}) / support({A}) = 66.6%
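The same numbers can be reproduced directly in Python (again treating each transaction as a set of items).

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
n = len(transactions)
sup_AC = sum({"A", "C"} <= t for t in transactions) / n   # 0.5   -> 50%
sup_A = sum({"A"} <= t for t in transactions) / n         # 0.75  -> 75%
conf_A_C = sup_AC / sup_A                                  # 0.666 -> 66.6%
print(sup_AC, sup_A, conf_A_C)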
Classification Based on Association, or Associative Classification (AC)
• Aim
– A small set of rules to use as a classifier
– All rules generated according to minsup and minconf
• Syntax
– X → Y, where Y is restricted to the class attribute values
Why & How to Integrate
• Both classification rule mining and association rule mining are indispensable to practical applications.
• The integration is done by focusing on a special subset of association rules whose right-hand sides are restricted to the classification class attribute.
– CARs: class association rules
Associative Classification (AC) Problem
• Given a labelled training data set, the problem is to derive a set of class association rules (CARs) from the training data set which satisfy certain user constraints, i.e. support and confidence thresholds.

Common associative classification algorithms:
• CBA
• CPAR
• CMAR
• MCAR
AC Steps
[Diagram: the training data and user-specified thresholds are input to the associative classification algorithm, which finds frequent ruleitems (attribute values that pass the support threshold) and outputs the class association rules.]
Rule support and confidence for AC
Given a training data set T, for a rule R: P → c:
• The support of R, denoted sup(R), is the number of rows in T that match the condition of R and have the class label c.
• The confidence of R, denoted conf(R), is the number of rows that match the condition of R and have class label c, divided by the number of rows that match the condition of R.
• Any item whose support exceeds the user minimum support is called a frequent item.
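A small Python sketch of these definitions, assuming each training row is a (set of items, class label) pair.

def rule_support(P, c, rows):
    """sup(R) for R: P -> c, as a row count."""
    return sum(P <= items and cls == c for items, cls in rows)

def rule_confidence(P, c, rows):
    """conf(R): rows matching P with class c over all rows matching P."""
    covered = [cls for items, cls in rows if P <= items]
    return sum(cls == c for cls in covered) / len(covered)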
CBA, an Associative Algorithm: Three Steps
Presented by B. Liu, W. Hsu & Y. Ma, "Integrating classification and association rule mining", KDD'98
1. Discretise continuous attributes, if any
2. Generate all class association rules (CARs)
3. Build a classifier using the generated CARs
CBA Objectives
• To generate the complete set of CARs
that satisfy the user-specified minimum
support (minsup) and minimum
confidence (minconf) constraints.
• To build a classifier from the CARs.
Schedule
• CBA-RG: rule generator
• CBA-CB: classifier builder
– M1
• Evaluation
Rule Generator: Basic Concepts
• Ruleitem
– <condset, y>: condset is a set of items, y is a class label
– Each ruleitem represents a rule: condset → y
• condsupCount
– The number of cases in D that contain condset
• rulesupCount
– The number of cases in D that contain condset and are labelled with class y
• Support = (rulesupCount / |D|) * 100%
• Confidence = (rulesupCount / condsupCount) * 100%
RG: Basic Concepts (Cont.)
• Frequent ruleitems
– A ruleitem is frequent if its support is above
minsup
• Accurate rule
– A rule is accurate if its confidence is above
minconf
• Possible rule
– For all ruleitems that have the same condset, the
ruleitem with the highest confidence is the
possible rule of this set of ruleitems.
• The set of class association rules (CARs)
consists of all the possible rules (PRs) that
are both frequent and accurate.
RG: An Example
• A ruleitem: <{(A,1), (B,1)}, (class,1)>
– Assume that
• the support count of the condset (condsupCount) is 3,
• the support count of this ruleitem (rulesupCount) is 2, and
• |D| = 10
– Then for the rule (A,1), (B,1) → (class,1):
• support = (rulesupCount / |D|) * 100% = 20%
• confidence = (rulesupCount / condsupCount) * 100% = 66.7%
RG: The Algorithm
1  F1 = {large 1-ruleitems};
2  CAR1 = genRules(F1);
3  prCAR1 = pruneRules(CAR1);        // count the item and class occurrences to determine the frequent 1-ruleitems, then prune
4  for (k = 2; Fk-1 ≠ Ø; k++) do
5    Ck = candidateGen(Fk-1);        // generate the candidate ruleitems Ck using the frequent ruleitems Fk-1
6    for each data case d ∈ D do     // scan the database
7      Cd = ruleSubset(Ck, d);       // find all the ruleitems in Ck whose condsets are supported by d
8      for each candidate c ∈ Cd do
9        c.condsupCount++;
10       if d.class = c.class then c.rulesupCount++;   // update the support counts of the candidates in Ck
11     end
12   end
13   Fk = {c ∈ Ck | c.rulesupCount ≥ minsup};   // select the new frequent ruleitems to form Fk
14   CARk = genRules(Fk);                       // select the ruleitems that are both accurate and frequent
15   prCARk = pruneRules(CARk);
16 end
17 CARs = ∪k CARk;
18 prCARs = ∪k prCARk;
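A condensed Python sketch of the level-wise loop above (not the authors' code). For brevity it keeps every frequent ruleitem whose confidence reaches minconf, rather than only the highest-confidence "possible rule" per condset, and it omits pruneRules; minsup and minconf are assumed to be positive fractions.

from collections import Counter

def cba_rg(rows, minsup, minconf):
    """rows: list of (attribute->value dict, class label).
    Returns class association rules as (condset, class, support, confidence)."""
    n = len(rows)
    classes = {c for _, c in rows}
    conds = {frozenset([item]) for atts, _ in rows for item in atts.items()}  # 1-condsets
    cars = []
    while conds:
        cond_count, rule_count = Counter(), Counter()
        for atts, cls in rows:                           # one pass over the data per level
            row_items = set(atts.items())
            for cs in conds:
                if cs <= row_items:
                    cond_count[cs] += 1
                    rule_count[(cs, cls)] += 1
        # frequent ruleitems at this level
        frequent = [(cs, c) for cs in conds for c in classes
                    if rule_count[(cs, c)] / n >= minsup]
        # genRules: keep the accurate (confidence >= minconf) frequent ruleitems
        for cs, c in frequent:
            conf = rule_count[(cs, c)] / cond_count[cs]
            if conf >= minconf:
                cars.append((cs, c, rule_count[(cs, c)] / n, conf))
        # candidateGen: join frequent condsets that together add exactly one item
        freq_conds = {cs for cs, _ in frequent}
        conds = {a | b for a in freq_conds for b in freq_conds
                 if len(a | b) == len(a) + 1}
    return cars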
Classifier Builder M1: Basic Concepts
• Given two rules ri and rj, define ri ≻ rj (ri precedes rj) if:
– the confidence of ri is greater than that of rj, or
– their confidences are the same, but the support of ri is greater than that of rj, or
– both the confidences and supports are the same, but ri is generated earlier than rj.
• The classifier is of the following format:
– <r1, r2, ..., rn, default_class>,
– where ri ∈ R, and ra ≻ rb if a < b
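As a sketch, the precedence relation can be expressed as a Python sort key; the tuple representation (confidence, support, generation id, rule body) is an assumption of this example, not CBA's own data structure.

def sort_by_precedence(rules):
    # higher confidence first, then higher support, then earlier generation order
    return sorted(rules, key=lambda r: (-r[0], -r[1], r[2]))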
M1: Three Steps
The basic idea is to choose a set of high-precedence rules in R to cover D.
• Sort the set of generated rules R.
• Select rules for the classifier from R following the sorted sequence and put them in C.
– Each selected rule has to correctly classify at least one additional case.
– Also select a default class and compute the errors.
• Discard those rules in C that do not improve the accuracy of the classifier.
– Locate the rule with the lowest total error and discard the rules that follow it in the sequence.
M1: Algorithm
1  R = sort(R);                  // Step 1: sort R according to the precedence relation "≻"
2  for each rule r ∈ R in sequence do
3    temp = Ø;
4    for each case d ∈ D do      // go through D to find the cases covered by rule r
5      if d satisfies the conditions of r then
6        store d.id in temp and mark r if it correctly classifies d;
7    if r is marked then
8      insert r at the end of C;    // r is a potential rule because it correctly classifies at least one case d
9      delete all the cases with the ids in temp from D;
10     select a default class for the current C;   // the majority class in the remaining data
11     compute the total number of errors of C;
12   end
13 end                            // Step 2
14 Find the first rule p in C with the lowest total number of errors and drop all the rules after p in C;
15 Add the default class associated with p to the end of C, and return C (the classifier).   // Step 3
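A simplified Python sketch of M1 (again, not the original implementation). Rules are assumed to be (condset, class) pairs already sorted by precedence, and cases are (set of items, class) pairs; rule insertion, default-class selection and the final cut at the lowest total error follow the steps above.

from collections import Counter

def m1_build(sorted_rules, rows):
    """Returns (selected rules, default class)."""
    remaining = list(rows)
    rule_errors = 0                    # errors made so far by accepted rules
    C = []                             # (rule, default_class, total_errors)
    for condset, cls in sorted_rules:
        if not remaining:
            break
        covered = [(items, c) for items, c in remaining if condset <= items]
        if any(c == cls for _, c in covered):              # r correctly classifies >= 1 case
            remaining = [(items, c) for items, c in remaining if not condset <= items]
            rule_errors += sum(c != cls for _, c in covered)
            labels = [c for _, c in remaining]
            default = Counter(labels).most_common(1)[0][0] if labels else cls
            total_errors = rule_errors + sum(c != default for c in labels)
            C.append(((condset, cls), default, total_errors))
    if not C:
        return [], None
    best = min(range(len(C)), key=lambda i: C[i][2])       # cut at the lowest total error
    return [rule for rule, _, _ in C[:best + 1]], C[best][1]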
M1: Two conditions it satisfies
• Each training case is covered by the rule with the highest precedence among the rules that can cover the case.
• Every rule in C correctly classifies at least one remaining training case when it is chosen.
CBA Example
Table 1: Car sales training data (minsup = 2/7, minconf = 50%)

Age     Income  Has a car  Buy/class
senior  middle  n          yes
youth   low     y          no
junior  high    y          yes
youth   middle  y          yes
senior  high    n          yes
junior  low     n          no
senior  middle  n          no
CBA Example (cont.)
Table 2: Possible ruleitems from Table 1

Itemset      Class  Support  Confidence
{low}        no     2/7      2/2
{high}       yes    2/7      2/2
{senior, n}  yes    2/7      2/3
{middle}     yes    2/7      2/3
{senior}     yes    2/7      2/3
{y}          yes    2/7      2/3
{n}          yes    2/7      2/4
{n}          no     2/7      2/4
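A quick Python sketch that reproduces Table 2 by enumerating value itemsets from Table 1 and keeping those that meet minsup and minconf; attribute names are left implicit because the values in Table 1 are distinct across attributes.

from collections import Counter
from itertools import combinations

rows = [  # (Age, Income, has a car, class)
    ("senior", "middle", "n", "yes"), ("youth", "low", "y", "no"),
    ("junior", "high", "y", "yes"),   ("youth", "middle", "y", "yes"),
    ("senior", "high", "n", "yes"),   ("junior", "low", "n", "no"),
    ("senior", "middle", "n", "no"),
]
n, minsup, minconf = len(rows), 2 / 7, 0.5
cond_count, rule_count = Counter(), Counter()
for *atts, cls in rows:
    for k in (1, 2, 3):
        for cond in combinations(atts, k):       # itemsets of attribute values
            cond_count[cond] += 1
            rule_count[(cond, cls)] += 1
for (cond, cls), rc in rule_count.items():
    if rc / n >= minsup and rc / cond_count[cond] >= minconf:
        print(set(cond), "->", cls, f"sup={rc}/{n}", f"conf={rc}/{cond_count[cond]}")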
Other Developed AC Techniques
• MCAR (Thabtah et al., Proceedings of the 3rd IEEE International Conference on Computer Systems and Applications, pp. 1-7)
• MMAC (Thabtah et al., Journal of Knowledge and Information Systems (2006) 00:1-21)

MCAR characteristics:
• A combination of two general data mining approaches (association rule mining and classification)
• Suitable for traditional classification problems
• Employs a new method of finding the rules
• Employs a new rule ranking procedure

MMAC characteristics:
• Produces classifiers of the form v1 ∧ v2 ∧ ... ∧ vk ⇒ c1 ∨ c2 ∨ ... ∨ ci
• Suitable not only for traditional binary classification problems but also for multi-label problems such as medical diagnosis and text classification
• Presents three evaluation accuracy measures
Data and Experiments
Support threshold = 5%, confidence threshold = 40%
Number of data sets: 12-16 UCI data sets
Algorithms used:
• CBA (AC algorithm)
• MMAC (AC algorithm)
• Decision tree algorithm (C4.5)
• Covering algorithm (RIPPER)
• Hybrid classification algorithm (PART)
[Figure: Accuracy (%) for PART, RIPPER, CBA and MMAC on UCI data sets (bar chart; data sets include Contact-lenses, Breast-cancer, Weather, Heart, Lymph, Mushroom, Primary-tumor, Vote, CRX, Balance-scale, Sick, Autos, Breast-w, Hypothyroid and kr-vs-kp).]
Comparison between AC algorithms on 12 UCI data sets
[Figure: bar chart comparing CBA, CMAR, CPAR and MCAR (%) across the 12 UCI data sets.]
Potential Applications for Associative Classification
• Scheduling and optimisation using a hyperheuristic approach
• Text categorisation
Hyperheuristics
[Diagram: the hyperheuristic makes a heuristic choice among the low-level heuristics based on their observed performance; the chosen low-level heuristic perturbs the solution to the problem, and the resulting solution quality is fed back to the hyperheuristic.]
Benefits of Hyperheuristics
• Low level heuristics easy to implement
• Objective measures may be easy to
implement – they should be present to
raise decision quality
• Rapid prototyping – time to first solution
low
Concrete example
• Organising meetings at a sales summit
• Low level heuristics:
– Add meeting, delete meeting, swap meeting,
add delegate, remove delegate, etc.
• Objectives:
– Minimise delegates
– Maximise supplier meetings
Concrete Example
• Hyperheuristic based on the exponential
smoothing forecast of performance, compared to
simple restarting approaches
• Result: 99 delegates reduced to 72 delegates
with improved schedule quality for both
delegates and suppliers
• Compares favourably with bespoke
metaheuristic (Simulated Annealing) approach
• Fast to implement and easy to modify
Other applications
• Timetabling mobile trainers
• Nurse rostering
• Scheduling project meetings
• Examination timetabling
Other Hyperheuristics
• Genetic Algorithms
– Chromosomes represent sequences of low
level heuristics
– Evolutionary ability to cope with changing
environments useful
• Forecasting approaches
• Genetic Programming approaches
• Artificial Neural Network approaches
Role of Data Mining (AC) in Hyperheuristics
• Problem: assume there are 10 LLHs; at each choice point we would have to test all available LLHs in order to select just a single one to apply.
• Idea: apply a DM (AC) algorithm to a table of past choice points (columns LLH1, LLH2, LLH3, ..., class) to produce rules and a classifier (model), then use the classifier to predict the class, i.e. the LLH to apply, for a new choice point whose class is unknown (?).
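A Python sketch of this idea with made-up numbers and labels: a decision tree stands in for the AC-produced rule set, and the column values and class names are purely illustrative assumptions.

from sklearn.tree import DecisionTreeClassifier

X = [[10, 2, 2], [2, 20, 2], [48, 48, 70], [10, 2, 70]]   # hypothetical LLH measurements per choice point
y = ["LLH-A", "LLH-B", "LLH-C", "LLH-A"]                   # hypothetical LLH chosen at each point
model = DecisionTreeClassifier(random_state=0).fit(X, y)   # stand-in for the rule-based classifier
print(model.predict([[10, 2, 2]]))                          # predicted LLH for a new choice point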
Text categorisation task
• Document categorisation: determining an assignment of a value from {0, 1} to each entry aij of the decision matrix.
• C = {c1, ..., cm} is a set of pre-defined categories.
• D = {d1, ..., dn} is a set of documents to be categorised.
• aij = 1: a decision to file dj under ci.
• aij = 0: a decision not to file dj under ci.
Category- or document-pivoted categorisation
• An important distinction is whether we want to fill the matrix one row at a time (category-pivoted categorisation, CPC) or one column at a time (document-pivoted categorisation, DPC).
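A small Python sketch of the decision matrix and the two filling orders; assign() is a hypothetical placeholder for the real classifier decision.

categories = ["c1", "c2", "c3"]
documents = ["d1", "d2", "d3", "d4"]

def assign(c, d):                  # placeholder decision: 1 = file d under c
    return 1 if hash((c, d)) % 2 == 0 else 0

# Category-pivoted (CPC): fill one row (category) at a time
matrix_cpc = {c: {d: assign(c, d) for d in documents} for c in categories}

# Document-pivoted (DPC): fill one column (document) at a time
matrix_dpc = {c: {} for c in categories}
for d in documents:
    for c in categories:
        matrix_dpc[c][d] = assign(c, d)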
AC for Text Categorisation
• Two approaches:
– Learn a classifier for each document category (class) and then merge all the classifiers to produce a global one (modified CBA by Zaiane and Antonie, 2002).
– Learn rules for all available categories at once. However, this may produce multi-label rules, since a document might be associated with multiple categories (MMAC by Thabtah et al., 2004).
Some Future Research Directions
• Multi-label classification
• Noise in test data sets
Multi-label classification
• Is it eatable? Is it sweet? Is it a fruit? Is it a banana?
• Is it a banana? Is it an apple? Is it an orange? Is it a pineapple?
• Is it a banana? Is it yellow? Is it sweet? Is it round?
Different structures
• Nested/Hierarchical
• Exclusive/Multi-class
• General/Structured
Conclusions
• Associative classification is a promising approach in data mining
• Associative classifiers often produce more accurate classification models than traditional classification algorithms such as decision trees and rule induction approaches
• The CARs produced by associative algorithms are simple if-then rules which are easily understood by humans
• One challenge in associative classification is the exponential growth of the number of rules, so pruning becomes essential
