
Towards Efficient Re-mining of Frequent Patterns upon Threshold Changes∗

Xiu-li Ma¹, Shi-wei Tang¹٬², Dong-qing Yang¹, and Xiao-ping Du²

¹ Department of Computer Science and Technology, Peking University, Beijing 100871, China
xlma@db.pku.edu.cn
² National Laboratory on Machine Perception, Center for Information Science, Peking University, Beijing 100871, China
{tsw,dqyang}@pku.edu.cn, xpdu@cis.pku.edu.cn

Abstract. Mining of frequent patterns has been studied extensively in the data mining area. However, very little work has been done on the problem of updating mined patterns upon threshold changes, in spite of its practical benefits. When users interactively mine frequent patterns, one difficulty is how to select an appropriate minimum support threshold, so they often have to tune the threshold repeatedly. A direct way is to re-execute the mining procedure many times with varied thresholds, which is nontrivial for large databases. In this paper, an efficient Extension and Re-mining algorithm is proposed for updating previously discovered frequent patterns upon threshold changes. The proposed algorithm has been implemented and its performance compared with re-running the FP-growth algorithm under different thresholds. The study shows that our algorithm is significantly faster, especially when mining long frequent patterns in large databases.

1 Introduction

Frequent pattern mining plays an essential role in mining associations [1], correlations [2], sequential patterns [3], and many other important data mining tasks. Since the introduction of association mining in [1], there have been many studies on efficient and scalable frequent pattern mining algorithms. A milestone is the development of the Apriori-based, candidate generation-and-test approach. However, based on the analysis in [4, 5], candidate set generation and test may still be costly, especially when encountering long patterns. To overcome this difficulty, a new approach, called frequent pattern growth, with a novel frequent pattern tree structure, has been developed in [4]. The approach adopts a divide-and-conquer methodology and mines frequent patterns without candidate generation.
However, very little work has been done on the problem of maintaining discovered association rules. Lee and Cheung have done some work [6, 7, 8], all of which focuses on how to update association rules when the database changes while the threshold remains the same.

∗ This work is supported by the National Grand Fundamental Research 973 Program of China under Grant No. G1999032705.

X. Meng, J. Su, and Y. Wang (Eds.): WAIM 2002, LNCS 2419, pp. 80-91, 2002.
© Springer-Verlag Berlin Heidelberg 2002

As stated in [11], in real-world applications, users are often unsure about their requirement on the minimum support at first. This can be due to a lack of knowledge about the application domain or about the outcomes of different threshold settings. As a result, they may be repeatedly unsatisfied with the association rules discovered, and hence need to re-execute the frequent pattern mining procedure many times with varied thresholds. When large databases are involved, this can be a time-consuming, trial-and-error process. To deal with this situation, it is both desirable and imperative to develop an efficient means for re-mining a database under different thresholds. Some related work [9, 10, 11] has been done, but it is based on Apriori, so the difficulty of the candidate generation-and-test method remains.
Is there any other way to reduce these costs in frequent pattern re-mining? A re-examination of this task in the framework of frequent pattern growth leads to the answer:
First, due to the nature of the FP-tree, the new FP-tree constructed for a database under a lower support is just an extension of the old FP-tree for the same database. The essential feature of extension is that the existing nodes and their arrangement do not change at all. This means we can reuse old trees to the utmost extent. Note that extending an existing tree is much faster than constructing a new tree from scratch, especially when the tree is big as a result of prolific or long patterns.
Second, based on its divide-and-conquer idea, FP-growth decomposes the mining task into a set of smaller tasks for mining confined patterns in conditional databases. In effect, the method partitions the data set to be mined, the resulting set of patterns, and the mining task itself. When the threshold is lowered, generating conditional pattern bases and FP-trees for newly-frequent items does not affect the originally frequent items.
Third, FP-growth adopts a pattern fragment growth method. In the recursive process, quite a few bases and trees are generated. In this paper, we prove that the re-mining procedure under a lower support is equivalent to a procedure of extending and mining all of those old FP-trees. Moreover, the most appealing feature of this procedure is its linearity: we can extend and mine the existing trees one by one, which removes the first level of recursion. This is a real improvement.
These features together form the core of our new algorithm, Extension and ReMining. A performance study has been conducted to compare our method with re-running FP-growth. The Extension and ReMining method is found to be on average 1.5 to 6.5 times faster than re-mining with FP-growth.
The remainder of the paper is organized as follows. Section 2 gives a more precise description of the problem. Section 3 briefly reviews the FP-tree structure and the FP-growth algorithm on which our new algorithm is based. Section 4 analyzes the structural properties of the FP-tree and develops the FP-tree extension algorithm. Section 5 develops the Re-mining algorithm. Section 6 presents our performance study and experimental results. Finally, Section 7 summarizes our study.

2 Problem Definitions

2.1 Mining of Frequent Patterns

Let I = {a1, a2, …, am} be a set of items, and let a transaction database DB = <T1, T2, …, Tn>, where Ti (i ∈ [1..n]) is a transaction containing a set of items in I. The support¹ (or occurrence frequency) of a pattern A, which is a set of items, is the number of transactions containing A in DB. A is a frequent pattern if A's support is no less than a predefined minimum support threshold (threshold for short), ξ.
Given a transaction database DB and a minimum support threshold ξ, the problem of finding the complete set of frequent patterns is called the frequent pattern mining problem.
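To make the definitions concrete, here is a tiny brute-force sketch in Python (this is not the mining algorithm of this paper; the function and variable names are ours) that computes the frequent patterns of a small database under ξ = 3, with support counted as the absolute occurrence frequency:

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(db, min_support):
    """Brute-force frequent-pattern mining: count every non-empty subset
    of every transaction, then keep the subsets whose absolute support
    is at least min_support. Fine for a five-transaction example,
    hopeless for real databases."""
    counts = Counter()
    for trans in db:
        for k in range(1, len(trans) + 1):
            for subset in combinations(sorted(trans), k):
                counts[frozenset(subset)] += 1
    return {p: s for p, s in counts.items() if s >= min_support}

# The five transactions of Table 1 (see Section 4)
db = [{'f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'},
      {'a', 'b', 'c', 'f', 'l', 'm', 'o'},
      {'b', 'f', 'h', 'j', 'o'},
      {'b', 'c', 'k', 's', 'p'},
      {'a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'}]
L = frequent_patterns(db, 3)
```

For this database, for example, {f} has support 4 and {f, c, a, m} has support 3, so both are frequent under ξ = 3.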

2.2 Re-mining of Frequent Patterns

Let L be the set of frequent patterns in DB given a minimum support threshold ξ. Assume that for each X ∈ L, its support, X.support, is available.
After having found some frequent patterns, users may be unsatisfied with the results and want to try out new results under a changed minimum support threshold, say from ξ to ξ'.
Thus the essence of the re-mining problem is to find the set L' of frequent patterns under the new threshold efficiently.
We borrow two terms from [6] but give them new meanings:
Losers: frequent itemsets that become infrequent after the threshold changes;
Winners: infrequent itemsets that become frequent after the threshold changes.
In particular, if such an itemset has only one item, we call it a loser item or a winner item.
When the minimum support threshold is changed, two cases may arise:
ξ' > ξ: some originally frequent itemsets become losers.
ξ' < ξ: some originally infrequent itemsets become winners.
In the first case, the update of the frequent itemsets is simple and intuitive: just select those itemsets in L with support no less than ξ' and put them into L'. The algorithm and its proof can be found in [11]. So let us concentrate on the second case. Note that all of the conditional pattern bases and FP-trees are available as by-products of a previous mining.
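For the first case above (ξ' > ξ), the update is a pure filter over L and needs no database access at all. A minimal Python sketch (the names are ours, not from [11]):

```python
def raise_threshold(L, new_threshold):
    """When the threshold is raised, L' is obtained by keeping only the
    patterns of L whose support meets the new threshold; the losers are
    dropped and the database is never touched (cf. [11])."""
    return {pattern: sup for pattern, sup in L.items()
            if sup >= new_threshold}

# L maps each frequent pattern to its absolute support under the old threshold 3
L = {frozenset({'f'}): 4, frozenset({'c'}): 4,
     frozenset({'f', 'c'}): 3, frozenset({'b'}): 3}
L_new = raise_threshold(L, 4)   # tune the threshold up from 3 to 4
```

Here {f, c} and {b} are losers under the new threshold 4, so L' contains only {f} and {c}.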

3 FP-Tree and FP-Growth

Let’s briefly go through FP-tree structure and FP-growth algorithm in [4], on which
we will base. Readers are referred to the cited paper for detailed information.
A FP-tree is a tree structure defined as in Definition 1 of [4]:
1 Notice that support is defined as in [4] as absolute occurrence frequency, not the relative one
as in some literature.

1. It consists of one root labeled as “null”, a set of item prefix sub-trees as the chil-
dren of the root, and a frequent-item header table.
2. Each node in the item prefix sub-tree consists of three fields: item-name, count,
and node-link, where item-name registers which item this node represents, count reg-
isters the number of transactions represented by the portion of the path reaching this
node, and node-link links to the next node in the FP-tree carrying the same item-
name, or null if there is none.
3. Each entry in the frequent-item header table consists of two fields, item-name
and head of node-link, which points to the first node in the FP-tree carrying the item-
name.
The FP-growth mining process scans the FP-tree of DB once and generates a conditional pattern base Bai for each frequent item ai. Frequent pattern mining is then recursively performed on each small pattern base, by constructing a conditional FP-tree for Bai. The recursion bottoms out when the FP-tree contains a single path.
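The node and header-table structure above might be modelled as in the following sketch (the field names follow [4]; everything else, including the class and function names, is our own):

```python
class FPNode:
    """One node of an item prefix sub-tree: item-name, count, node-link,
    plus child pointers and a parent pointer for prefix-path walks."""
    def __init__(self, item, parent=None):
        self.item = item          # item-name
        self.count = 0            # transactions routed through this node
        self.node_link = None     # next node carrying the same item-name
        self.parent = parent
        self.children = {}        # item-name -> FPNode

def insert_tree(items, node, header):
    """Insert one transaction's ordered frequent-item list into the tree
    rooted at `node`, threading new nodes onto the header-table links."""
    for item in items:
        if item not in node.children:
            child = FPNode(item, parent=node)
            node.children[item] = child
            # prepend to the node-link chain for this item-name
            child.node_link = header.get(item)
            header[item] = child
        node = node.children[item]
        node.count += 1
    return node   # the end position of this transaction in the tree

root = FPNode(None)   # root labeled "null"
header = {}
insert_tree(['f', 'c', 'a', 'm', 'p'], root, header)
insert_tree(['f', 'c', 'a', 'b', 'm'], root, header)
```

After the two insertions, the shared prefix f, c, a carries count 2 while the suffixes diverge, which is exactly the prefix-sharing that makes the FP-tree compact.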

4 Extending FP-Tree

Although the FP-tree is rather compact, its construction needs two scans of a transaction database, which may represent a nontrivial overhead. It can therefore be beneficial to extend an existing FP-tree. Materialization plays an important role and provides the foundation of our work, but we will not focus on it, as the performance data show that this phase takes only a very small portion of the whole re-mining process. In this section, we focus on how to extend an existing FP-tree. The example transaction database (borrowed from [4]) shown in Table 1 is used for illustration. Table 2 lists the symbols used in this paper.

Table 1. The example transaction database, DB

TID   Items Bought              (Ordered) Frequent Items   (Ordered) Delta Set of Frequent Items
100   f, a, c, d, g, i, m, p    f, c, a, m, p
200   a, b, c, f, l, m, o       f, c, a, b, m              l, o
300   b, f, h, j, o             f, b                       o
400   b, c, k, s, p             c, b, p
500   a, f, c, e, l, p, m, n    f, c, a, m, p              l
Table 2. Definitions of several symbols

DB A transaction database
ξ A minimum support threshold
ξ’ The new minimum support threshold, ξ’<ξ
T The FP-tree constructed for DB under threshold = ξ
T′ The FP-tree constructed for DB under threshold = ξ′

[Figures omitted. Fig. 1 shows the FP-tree with the paths (f:4, c:3, a:3, m:2, p:2), (f:4, c:3, a:3, b:1, m:1), (f:4, b:1), and (c:1, b:1, p:1); Fig. 2 extends it with the winner items.]

Fig. 1. The FP-tree under threshold 3          Fig. 2. The FP-tree under threshold 2
Several important properties of the FP-tree can be observed from its construction process.

Property 1. Given DB, ξ, ξ', T, and T′ as defined in Table 2, T′ is just an extension of T; that is, in T′, neither the arrangement nor the supports of the nodes that already appeared in T change.
Rationale. By the FP-tree construction process, each transaction in DB is mapped to one path in the tree; moreover, only the frequent items of the transaction are selected, and they are sorted in support-descending order. When re-mining with a lower support threshold, some originally infrequent items may become winners, but such an item can only be appended to the tail of the path corresponding to the transaction containing it. Since the appearance of winner items changes neither the order nor the supports of the original elements of each path, we have the property.

Example 1. Let us examine the FP-tree for the DB in Table 1 with the threshold changing from 3 to 2. The FP-trees under thresholds 3 and 2 are illustrated in Fig. 1 and Fig. 2, respectively. The joining of the winner items (shaded in the figure) does not affect the originally frequent items. (For lack of space, we omit the header tables.)
With these observations, we make some changes to the FP-tree construction and FP-growth algorithms of [4] for the first-time mining.
1. Since the first scan of DB obtains all items' supports, we store them for later use. This saves re-scanning the database to identify winner items. We put all items in the header table, sorted in support-descending order. After mining under a threshold ξ, a tail value is stored to denote the index of the entry with the smallest support no less than ξ. When re-mining with ξ' < ξ, we only need to start from tail to identify the index of the entry with the smallest support no less than ξ'. Sorting all items burdens the first-time mining, but benefits the whole procedure.
2. In order to extend the branches, an end-position table should be maintained to save the end position in the FP-tree of each transaction.
3. In order to extend an FP-tree, its corresponding threshold, its conditional base, its header table and tail value, the pattern fragment grown (we call it the seed), and the end-position table are all needed. When a mining run ends, for each FP-tree (some are generated recursively), we integrate the above items into a single object and materialize it. When extending later, we can load and re-use the materialized objects. For the whole tree, we save null as its base, because materializing the whole database would be unfair to other algorithms. According to the analysis in [4], a conditional base is usually much smaller than its original FP-tree.
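The integrated object of item 3 might be bundled and materialized as in the following sketch (the field and class names are ours, not from the paper, and Python's pickle stands in for whatever serialization the authors used):

```python
import pickle
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class IntegratedTree:
    """Everything needed to extend and re-mine one FP-tree later:
    threshold, conditional base, header table with its tail index,
    the seed fragment, and the end-position table."""
    threshold: int
    base: Optional[list]            # conditional pattern base; None for the whole tree
    header: List[Tuple[str, int]]   # (item, support), support-descending
    tail: int                       # index of last entry with support >= threshold
    seed: tuple                     # the pattern fragment this tree is conditioned on
    end_positions: dict = field(default_factory=dict)

obj = IntegratedTree(threshold=3, base=None,
                     header=[('f', 4), ('c', 4), ('a', 3), ('b', 3),
                             ('m', 3), ('p', 3), ('l', 2), ('o', 2)],
                     tail=5, seed=())
blob = pickle.dumps(obj)        # materialize at the end of a mining run
restored = pickle.loads(blob)   # load again when the threshold is lowered
```

With the header sorted once at first-time mining, lowering the threshold from 3 to 2 only moves the tail index from the entry for p forward past l and o; the entries themselves never move.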
Based on the above analysis, we have the following FP-tree extension algorithm.

Algorithm 1 (FP-tree extension)

Input: The integrated object of T for DB under ξ; ξ' < ξ
       (DB, ξ, ξ', T as in Table 2).
Output: The FP-tree T′ for DB under ξ'.
Method: T′ is obtained by extending T in the following steps:
1. Identify the new tail, which is the index of the entry in the
   header table with the smallest support no less than ξ′.
2. For each transaction Trans in DB do the following. Select those
   items in Trans satisfying ξ' ≤ item.support < ξ and sort them
   into a list [p|P] according to the order of the header table,
   where p is the first item and P is the remaining list. Let the
   end position of Trans in T be E; call insert_tree([p|P], E), as
   in [4]. Update the entry for Trans in the end-position table to
   the new end.
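The item-selection part of step 2 can be sketched in Python as follows (the supports and header-table order are taken from Table 1; the function name is ours):

```python
def delta_set(trans, item_support, old_thr, new_thr, order):
    """Select the winner items of one transaction -- those with
    new_thr <= support < old_thr -- and sort them into the global
    support-descending header-table order. Only these items are
    appended to the tree, starting from the saved end position."""
    winners = [a for a in trans
               if new_thr <= item_support[a] < old_thr]
    winners.sort(key=order.index)
    return winners

# Global supports for the DB of Table 1 (singleton items e, h, ... omitted)
item_support = {'f': 4, 'c': 4, 'a': 3, 'b': 3, 'm': 3, 'p': 3,
                'l': 2, 'o': 2}
order = ['f', 'c', 'a', 'b', 'm', 'p', 'l', 'o']  # header-table order
# Transaction 200 of Table 1, with the threshold tuned from 3 to 2:
delta = delta_set({'a', 'b', 'c', 'f', 'l', 'm', 'o'},
                  item_support, old_thr=3, new_thr=2, order=order)
```

The result, [l, o], matches the delta-set column of Table 1 for TID 200.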

We can see that, in contrast with re-constructing an FP-tree, extending an existing FP-tree saves the scan of DB that collects the set of frequent items.
The cost of inserting a transaction Trans into the FP-tree is O(|Trans|), where |Trans| is the number of frequent items in Trans. Since FP-tree extension appends only the winner items of each Trans to the tree, starting from the corresponding end position, this cost is also cut.
The completeness and compactness of the new FP-tree obtained by Algorithm 1 are guaranteed by [4] and Property 1.
Note that we trade space for the improvement in performance.

5 Re-mining of FP-Tree

Obtaining an FP-tree through extension ensures that the previous FP-tree can be reused. However, this alone does not guarantee an efficient mining if we simply execute FP-growth on the extended FP-tree, since one still needs to recursively construct conditional bases and FP-trees. In this section, we study how to re-use everything from the previous mining process, including all the conditional bases and FP-trees, and develop an efficient re-mining method for mining the complete set of frequent patterns.

Lemma 1. Given DB, ξ, ξ' as defined in Table 2, and an item ai with ai.support ≥ ξ, ai's conditional pattern base under ξ is the same as that under ξ'.
Proof. By its definition in [4], ai's conditional pattern base is formed by the set of transformed prefix paths of ai. By Property 1, none of the paths collected in the conditional pattern base of ai changes. Thus we have the lemma.

Example 2. Let us examine the re-mining process for the DB in Table 1 with the threshold changing from 3 to 2. The FP-tree under threshold 2 is illustrated in Fig. 2.
Since FP-growth is a partitioning-based method, we can split the set of frequent items under threshold 2 into two parts: (1) each item ai with ai.support ≥ 3, such as p, m, b, a, c, f, has already been considered under threshold 3, but not yet under 2; (2) each item aj with 2 ≤ aj.support < 3, such as l and o, has not been considered at all.
The items of the second type are easy to deal with: just construct their conditional pattern bases and FP-trees and execute FP-growth as in [4], under threshold 2.

Table 3. Summary of all frequent items for DB under threshold 2

frequent  conditional pattern base                                conditional FP-tree   conditional FP-tree
item                                                              under 3               under 2
p         {(f:2, c:2, a:2, m:2), (c:1, b:1)}                      {(c:3)}|p             {(c:3, f:2, a:2, m:2)}|p
m         {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}                 {(f:3, c:3, a:3)}|m   {(f:3, c:3, a:3)}|m
b         {(f:1, c:1, a:1), (f:1), (c:1)}                         φ                     {(f:2, c:1), (c:1)}|b
a         {(f:3, c:3)}                                            {(f:3, c:3)}|a        {(f:3, c:3)}|a
c         {(f:3)}                                                 {(f:3)}|c             {(f:3)}|c
f         φ                                                       φ                     φ
l         {(f:1, c:1, a:1, m:1, p:1), (f:1, c:1, a:1, b:1, m:1)}  —                     {(f:2, c:2, a:2, m:2)}|l
o         {(f:1, c:1, a:1, b:1, m:1, l:1), (f:1, b:1)}            —                     {(f:2, b:2)}|o
bc        {(f:1)}                                                 —                     φ
bf        φ                                                       —                     φ

Next let us concentrate on the first part. By Lemma 1, each item's conditional pattern base remains the same under different thresholds, whereas by Property 1 the conditional FP-tree T′i of an item ai under 2 is an extension of its conditional FP-tree under 3. For p, we just extend the conditional FP-tree {(c:3)}|p into {(c:3, f:2, a:2, m:2)}|p. Then we can decompose the task into two parts as above. Thus the re-mining problem for p under 2 becomes the same problem as re-mining the whole FP-tree.
For each frequent item in the global FP-tree of Fig. 2, we summarize the conditional pattern bases and the extensions of the conditional FP-trees in Table 3. Note that the entries for 'bc' and 'bf' result from the FP-tree extension of 'b'.
Since everything necessary for extending an FP-tree has been integrated into one object, an FP-tree can be extended and mined on its own, instead of nested within some other FP-growth invocation. Based on the above analysis, the re-mining problem is a procedure of extending and re-mining all the existing FP-trees. Thus we have the following re-mining algorithm.
Algorithm 2 ReMining (re-mining frequent patterns under a lower threshold)

Input: DB, ξ, ξ' as in Table 2; S, the list of integrated objects
  for all the FP-trees generated during the FP-growth process for
  DB under ξ; L, the set of frequent patterns under ξ; X.support
  for each X ∈ L, the support count of every itemset X in DB
  under ξ.
Output: L′, the complete set of frequent patterns of DB under ξ'.
Method: {
  for each element Treei in S do {
    oldtail = Treei.tail;
    call FP-tree-extension(Treei, ξ');
    call Extending-Growth(Treei, ξ', oldtail, S, L);
  }
}

Procedure Extending-Growth(Treei, ξ', oldtail, S, L)
{
  α = the seed of Treei;
  for each item aj between oldtail and the new tail in the header
  table of Treei do {
    generate pattern X = aj ∪ α with support = aj.support;
    L′ = L′ ∪ {X};
    construct X's conditional pattern base BX and then X's
    conditional FP-tree TreeX;
    integrate the object for TreeX and append it to S′;
    if TreeX ≠ null then call FP-growth(TreeX, X);
  }
  L′ = L ∪ L′;
}

Analysis. Note that we can optimize the re-mining when an FP-tree contains only one path; such a tree can be cut into two parts. For lack of space, we omit the details.
Note also that we can obtain just the complementary frequent patterns, those consisting only of winners, if desired. In real-world applications, users are often more interested in the differential pattern sets between different thresholds: not the final status, but what has changed, may attract more interest.

Lemma 2. Given DB, ξ, ξ' as defined in Table 2, re-mining DB using FP-growth under ξ' is equivalent to re-mining all the FP-trees generated under ξ by calling ReMining.
Proof. Let S denote the set of FP-trees obtained during re-mining, and let S′ denote the set of FP-trees obtained by re-executing FP-growth under ξ'. By Lemma 1 and the generation principle of the FP-trees in S, each element of S corresponds to exactly one FP-tree in S′ and equals it, and vice versa; hence S = S′. As in FP-growth, each result pattern comes from concatenating a frequent item of an FP-tree with the tree's seed fragment. In re-mining, we guarantee the right concatenation by integrating each FP-tree with its seed fragment.

6 Experimental Evaluation and Performance Study

Let us now examine the efficiency of the re-mining algorithm. We transform the problem from recursively mining the whole FP-tree for DB to extending each existing small conditional FP-tree and dealing with the winner items in each FP-tree. So the remaining concern is whether storing and loading all those integrated objects costs more time than constructing them again. Note that one path in an FP-tree represents frequent items of multiple transactions: materialization handles a node N once, whereas construction reaches N n times, where n = N.count. Thus materialization takes less time than construction.
Obviously, we spend more space than before. However, with the growing capacity of main memory, we prefer sacrificing space for efficiency.
In this section, we present a performance comparison of re-mining with re-executing FP-growth under different thresholds.
All experiments are performed on a 733-MHz Pentium PC with 256 MB of main memory. All programs are written in Microsoft Visual C++ 6.0. Notice that we do not directly compare our absolute runtimes with those in [4], because different implementations may differ in absolute runtime for the same algorithm. Instead, we implemented the algorithms of [4] in our environment.
The synthetic data sets used in our experiments were generated with the procedure described in [1]. We report experimental results on two data sets. The first, denoted D1, is T25I15D10 with 1K items. The second, denoted D2, is T25I20D100 with 10K items. Both data sets contain exponentially many frequent itemsets as the support threshold goes down, with rather long frequent itemsets as well as a large number of short ones.



 
[Figures: runtime (sec.) of FP-growth and Re-mining versus minimum support — plots omitted.]

Fig. 3. Scalability with support threshold for D1

Fig. 4. Scalability with support threshold for D2

[Figure: speedup ratio for D1 and D2 versus minimum support — plot omitted.]

Fig. 5. Speed Up Ratio vs. support threshold change

Considering the case ξ' < ξ, the runtimes of FP-growth and Re-mining are plotted in Fig. 3 for D1 and in Fig. 4 for D2. Fig. 5 plots the performance ratio of re-mining to FP-growth for D1 and D2.
We start the mining with threshold 3%, and then tune the threshold to 2%, 1.5%, 1%, 0.75%, 0.6%, and 0.5%. Clearly, for both data sets, re-mining beats FP-growth except at the first-time mining, because at the first-time mining our algorithm needs extra time to integrate and materialize the FP-trees for the purpose of later extension. We also see that, when the support threshold is not very low, the speedup ratio for D2 is higher than that for D1. As the support threshold becomes lower, the speedup ratio for D2 descends. The main reason is that, generally speaking, the FP-tree constructed from a larger database with long patterns is more prolific and taller than that from a smaller database with short patterns; hence, extending and re-mining a big FP-tree saves much more time than re-constructing it and re-executing the mining from scratch. But when the support is tuned much lower, the FP-tree becomes too big to fit in main memory and easily causes thrashing. Note that the FP-tree in [4] itself also encounters this difficulty. So, when main memory is sufficient, our new algorithm performs better on mining large databases with long patterns.
Let us analyze the advantages of the re-mining method: (1) FP-growth needs to construct all FP-trees recursively, whereas the ReMining algorithm only extends existing FP-trees separately; removing the first level of recursion is an important virtue of the latter, and accounts for the major part of the performance difference. (2) The materializing process manipulates each node only once. (3) FP-growth needs to re-construct a very large FP-tree for DB from scratch, whereas the extension makes only comparatively small changes to existing FP-trees. All of these contribute to the efficiency of re-mining. So we can first mine under a given threshold, then tune the threshold for re-mining until a satisfying result is obtained.

7 Conclusion

We have proposed an efficient method for re-mining frequent patterns upon support changes. Based on reusing materialized by-products of a previous mining, re-mining becomes a tuning procedure. Our algorithm is based on the recently reported FP-tree, as we find many properties of the FP-tree suitable for re-mining.
We have implemented the re-mining method in our DataMagic system and studied its performance in comparison with re-running FP-growth. Our performance study shows that the method mines efficiently and outperforms FP-growth on large databases with long patterns. The drawback of our new algorithm is its need for a large main memory; however, we believe that, with the growing capacity of main memory, many users will prefer to exchange space for efficiency.
There are many interesting research issues related to re-mining, including the study and implementation of re-mining sequential patterns and other interesting frequent patterns. We are also working towards space-preserving re-mining.

References

1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB'94, (1994) 487-499.
2. S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In SIGMOD'97, (1997) 265-276.
3. R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE'95, (1995) 3-14.
4. J. Han, J. Pei, Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD'00, (2000) 1-12.
5. J. Han and J. Pei. Mining frequent patterns by pattern-growth: methodology and implications. In SIGKDD'00, (2000) 14-20.
6. D. Cheung, J. Han, V. Ng, C. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. In ICDE'96, (1996).
7. D. Cheung, S. Lee, B. Kao. A general incremental technique for maintaining discovered association rules. In Proceedings of the 5th International Conference on Database Systems for Advanced Applications, (1997).
8. S. Lee and D. Cheung. Maintenance of discovered association rules: When to update? In DMKD'97, (1997).
9. Feng Yu-cai, Feng Jian-lin. Incremental updating algorithms for mining association rules. In Journal of Software, Vol. 9, No. 4, (1998) 301-306.
10. Ou-Yang Weimin, Cai Qing-sheng. An incremental updating technique for discovered generalized sequential patterns. In Journal of Software, Vol. 9, No. 10, (1998) 777-780.
11. J. Liu and J. Yin. Towards efficient data re-mining (DRM). In PAKDD'01, (2001) 406-412.
