Donato Malerba
Dipartimento di Informatica
Università degli studi di Bari
malerba@di.uniba.it
http://www.di.uniba.it/~malerba/
Overview
• Single-table assumption
• (Multi-)relational data mining and ILP
• FO representations
• Upgrading propositional DM systems to FOL
• A case study: Mining Association rules
• Conclusions
Induced model
• Example:
eastbound([c(rectangle,short,none,2,l(circle,1)),
c(rectangle,long,none,3,l(hexagon,1)),
c(rectangle,short,peaked,2,l(triangle,1)),
c(rectangle,long,none,2,l(rectangle,3))]).
• Background theory: empty
• Hypothesis:
eastbound(T) :- member(C,T), arg(2,C,short),
not arg(3,C,none).
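To make the hypothesis concrete, here is a minimal Python sketch of the same test applied to the example train. The tuple layout (shape, length, roof, wheels, load) is an assumption that mirrors the argument order of the c/5 terms above; the function name simply reuses the predicate name.

```python
# Each car mirrors c(Shape, Length, Roof, Wheels, l(LoadShape, LoadCount)).
train = [
    ("rectangle", "short", "none", 2, ("circle", 1)),
    ("rectangle", "long", "none", 3, ("hexagon", 1)),
    ("rectangle", "short", "peaked", 2, ("triangle", 1)),
    ("rectangle", "long", "none", 2, ("rectangle", 3)),
]


def eastbound(cars):
    """True if some car is short (argument 2) and its roof (argument 3) is not 'none'."""
    return any(
        length == "short" and roof != "none"
        for (_shape, length, roof, _wheels, _load) in cars
    )


print(eastbound(train))      # the third car is short with a peaked roof
print(eastbound(train[:2]))  # neither of the first two cars qualifies
```

The existential quantification over C in the clause body becomes `any(...)`, and negation as failure on a ground argument reduces to a plain inequality test.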
eastbound::Train->Bool;
• Example:
eastbound([(Rectangle,Short,None,2,(Circle,1)),
(Rectangle,Long,None,3,(Hexagon,1)),
(Rectangle,Short,Peaked,2,(Triangle,1)),
(Rectangle,Long,None,2,(Rectangle,3))]) = True
• Hypothesis: eastbound(t) = (exists \c -> member(c,t) &&
proj2(c)==Short && proj3(c)!=None)
• Example language: Escher (functional logic programming)
CAR_TABLE
CAR  TRAIN  SHAPE      LENGTH  ROOF    WHEELS
c1   t1     rectangle  short   none    2
c2   t1     rectangle  long    none    3
c3   t1     rectangle  short   peaked  2
c4   t1     rectangle  long    none    2
…    …      …          …       …       …
Problem statement
Given:
• a set of transactions D
• a pair of thresholds, minsup and minconf
Find
all association rules that have support and
confidence greater than minsup and minconf
respectively.
Problem decomposition
• Find large (or frequent) itemsets
• Generate highly-confident association rules
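The two-step decomposition can be sketched in Python as follows. This is an illustrative level-wise search in the spirit of APRIORI, not a reproduction of any published implementation; the sample transactions, the function names, and the strict "greater than" comparisons (taken from the problem statement) are assumptions of the sketch.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Step 1: level-wise search for large itemsets. Returns {itemset: support}."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    level = [frozenset([i]) for i in sorted({i for t in tsets for i in t})]
    large = {}
    while level:
        # Support = fraction of transactions containing the candidate.
        current = {}
        for cand in level:
            sup = sum(1 for t in tsets if cand <= t) / n
            if sup > minsup:
                current[cand] = sup
        large.update(current)
        # Join two large k-itemsets into a (k+1)-candidate; prune it unless
        # every k-subset is itself large.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(s) in current for s in combinations(union, len(a))
            ):
                candidates.add(union)
        level = list(candidates)
    return large


def association_rules(large, minconf):
    """Step 2: split each large itemset into body -> head and keep confident rules."""
    rules = []
    for itemset, sup in large.items():
        for r in range(1, len(itemset)):
            for body in combinations(sorted(itemset), r):
                body = frozenset(body)
                conf = sup / large[body]  # every subset of a large itemset is large
                if conf > minconf:
                    rules.append((body, itemset - body, sup, conf))
    return rules


# Hypothetical transaction set, for illustration only.
D = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
large = frequent_itemsets(D, minsup=0.4)
rules = association_rules(large, minconf=0.7)
for body, head, sup, conf in rules:
    print(sorted(body), "->", sorted(head), "support", sup, "confidence", conf)
```

On this toy data the only rule that passes both thresholds is butter -> bread, since both transactions containing butter also contain bread.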
Representation issues
• The transaction set D may be a data file, a relational
table or the result of a relational expression
• Each transaction is a binary vector
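As a small illustration of the binary-vector view, the following Python sketch encodes a set of transactions as 0/1 rows over a fixed item order. The sample transactions and the helper names `to_binary_vectors` and `support` are hypothetical.

```python
def to_binary_vectors(transactions):
    """Return the item order and one 0/1 vector per transaction."""
    items = sorted({i for t in transactions for i in t})
    vectors = [[1 if i in t else 0 for i in items] for t in transactions]
    return items, vectors


def support(vectors, columns):
    """Fraction of vectors with a 1 in every selected column."""
    hits = sum(1 for v in vectors if all(v[c] == 1 for c in columns))
    return hits / len(vectors)


# Hypothetical transaction set, for illustration only.
D = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
items, vectors = to_binary_vectors(D)
print(items)
for v in vectors:
    print(v)
print(support(vectors, [items.index("bread"), items.index("milk")]))
```

With this encoding, checking whether a transaction contains an itemset amounts to testing that the corresponding columns are all 1.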
Relevant work
Agrawal & Srikant (1999). Fast Algorithms for Mining Association Rules,
in Readings in Database Systems, Morgan Kaufmann Publishers.
Han & Fu (1995). Discovery of Multiple-Level Association Rules from
Large Databases, in Proc. 21st VLDB Conference.
Problem statement
Given:
• a deductive relational database D
• a pair of thresholds, minsup and minconf
Find
all association rules that have support and
confidence greater than minsup and minconf
respectively.
Problem decomposition
• Find large (or frequent) atomsets
• Generate highly-confident association rules
Representation issues
A deductive relational database is a relational database
which may be represented in first-order logic as follows:
• Relation → Set of ground facts (EDB)
• View → Set of rules (IDB)
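The EDB/IDB distinction can be illustrated with a minimal Python sketch: the rows of CAR_TABLE become ground facts, and views become functions that derive new tuples from them. The two views shown, `has_short_car` and `closed_car`, are hypothetical examples and do not come from the slides.

```python
# EDB: one ground fact car(Car, Train, Shape, Length, Roof, Wheels) per row of CAR_TABLE.
car = {
    ("c1", "t1", "rectangle", "short", "none", 2),
    ("c2", "t1", "rectangle", "long", "none", 3),
    ("c3", "t1", "rectangle", "short", "peaked", 2),
    ("c4", "t1", "rectangle", "long", "none", 2),
}


# IDB (hypothetical view): has_short_car(T) :- car(_, T, _, short, _, _).
def has_short_car():
    return {t for (_c, t, _s, length, _r, _w) in car if length == "short"}


# IDB (hypothetical view): closed_car(C) :- car(C, _, _, _, Roof, _), Roof \= none.
def closed_car():
    return {c for (c, _t, _s, _l, roof, _w) in car if roof != "none"}


print(has_short_car())
print(closed_car())
```

The facts are stored explicitly, while the view extensions are computed on demand, which is the essential difference between the extensional and intensional parts of the database.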
WARMR                                  APRIORI
• Breadth-first search on the          • Breadth-first search on the
  atomset lattice                        itemset lattice
• Loading of an observation o          • Loading of a transaction t
  from D (query result)                  from D (tuple)
• Largeness of candidate atomsets      • Largeness of candidate itemsets
  computed by a coverage test            computed by a subset check
Pruning step
MRDM – Prof. D. Malerba 38
Mining association rules
The ILP approach
Candidate evaluation
Candidate atomset:
is_a(X, large_town), intersects(X,R), is_a(R, road), adjacent_to(X,W), is_a(W, water)
Query evaluated against D:
?- is_a(X, large_town),
intersects(X,R), is_a(R, road),
adjacent_to(X,W), is_a(W, water)
Answer substitutions:
<X=barletta,R=a14,W=adriatico>
<X=bari,R=ss16bis,W=adriatico>
...
Large? yes / no
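Candidate evaluation can be sketched in Python as a small conjunctive-query evaluator over ground facts. The fact base below is an assumption reconstructed from the two answer substitutions shown on the slide, plus one hypothetical town (`town_z`) that fails the query. Computing support as the fraction of bindings of the first (key) atom for which the whole query succeeds is also an assumption of this sketch.

```python
# Assumed ground facts, consistent with the two answers shown on the slide.
facts = {
    ("is_a", "barletta", "large_town"), ("is_a", "bari", "large_town"),
    ("is_a", "town_z", "large_town"),  # hypothetical: no road or water facts
    ("is_a", "a14", "road"), ("is_a", "ss16bis", "road"),
    ("is_a", "adriatico", "water"),
    ("intersects", "barletta", "a14"), ("intersects", "bari", "ss16bis"),
    ("adjacent_to", "barletta", "adriatico"), ("adjacent_to", "bari", "adriatico"),
}


def is_var(term):
    """Prolog convention: variables start with an upper-case letter."""
    return term[:1].isupper()


def answers(atoms, theta=None):
    """Yield every substitution under which all atoms match ground facts."""
    theta = dict(theta or {})
    if not atoms:
        yield theta
        return
    first = atoms[0]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new = dict(theta)
        ok = True
        for arg, value in zip(first[1:], fact[1:]):
            if is_var(arg):
                if new.setdefault(arg, value) != value:
                    ok = False
                    break
            elif arg != value:
                ok = False
                break
        if ok:
            yield from answers(atoms[1:], new)


def support(query, key="X"):
    """Fraction of key bindings (from the first atom) covered by the full query."""
    keys = {s[key] for s in answers(query[:1])}
    covered = {s[key] for s in answers(query)}
    return len(covered) / len(keys)


query = [
    ("is_a", "X", "large_town"),
    ("intersects", "X", "R"), ("is_a", "R", "road"),
    ("adjacent_to", "X", "W"), ("is_a", "W", "water"),
]
found = sorted(answers(query), key=lambda s: s["X"])
print(found)
print(support(query))  # compared against minsup to decide whether the atomset is large
```

Unlike the subset check on itemsets, the coverage test has to search for variable bindings, so its cost grows with the number of atoms and facts involved.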
Rule generation
High confidence? yes / no