A Rough Set Model for Sequential Pattern Mining with Constraints

(IJCNS) International Journal of Computer and Network Security, Vol. 1, No. 2, November 2009

Jigyasa Bisaria, Namita Srivastava and Kamal Raj Pardasani
Department of Mathematics,
Maulana Azad National Institute of Technology (A Deemed University), Bhopal M.P.
jigyasab@gmail.com, sri.namita@gmail.com, kamalrajp@hotmail.com
Abstract: Data mining and knowledge discovery methods serve many decision support and engineering application needs of various organisations. Most real-world data has an inherent time component. Sequential patterns are inter-event patterns, ordered in time, that are associated with the objects under study. The analysis and discovery of frequent sequential patterns under user-defined constraints are interesting data mining results. These patterns can serve a variety of enterprise applications concerning analytic and decision support needs. The imposition of various constraints further enhances the quality of the mining results and restricts them to only relevant patterns. In this paper, we propose a rough set perspective on the problem of constraint-driven mining of sequential patterns. We use the indiscernibility relation from the theory of rough sets to partition the search space of sequential patterns, and we propose a novel algorithm that allows pre-visualization of patterns and imposition of various types of constraints in the mining task. The algorithm C-Rough Set Partitioning (C-RSP) is at least ten times faster than the naïve algorithm SPRINT, which is based on various types of regular expression constraints.

Keywords: Rough sets, Sequential patterns, Constraints, Indiscernibility, Partitioning

1. Introduction
Sequential pattern mining is studied extensively in the data mining literature because of its applicability to a variety of applications. It is applied in many real-world decision support applications, such as finding the root causes of banking customer churn [8], analysis of web logs [10], fault diagnosis and prediction in telecom networks [9], and the study of adverse drug reactions as temporal association rules [11]. The enormous search space and the huge number of patterns are inherent challenges in the sequence mining task. Conventional studies of sequential pattern mining give various computational methodologies to enumerate the frequent sequence space [1]-[6]. These methods mine all sequential patterns in the support-confidence framework. The computational methodologies in [1]-[5] are bottom-up candidate-generate-and-test approaches. The method PrefixSpan [6] works by iteratively projecting the database on the basis of the prefix. This method does not generate any candidates and is strictly based on the events present in the database.
New-generation mining methods require the retrieval of patterns under user-defined constraints. The imposition of constraints not only condenses the mining results to the most useful ones but also reduces the search space and improves performance. A constraint $C$ can be regarded as a Boolean function on all sequences. The problem of constraint-based mining of sequential patterns is about finding all those patterns which satisfy $C$. Under the classical framework, constraints can be classified as monotonic, anti-monotonic and succinct [14]. A constraint $C$ is anti-monotonic if its satisfaction by any sequence $\alpha$ implies its satisfaction by all subsequences of $\alpha$. A constraint $C$ is monotonic if $\alpha$ satisfying $C$ implies that every super-sequence of $\alpha$ also satisfies $C$. Succinct constraints are pre-counting pushable constraints such that, for any sequence $\alpha$, satisfaction of the constraint implies its satisfaction by all the elements of $\alpha$. A succinct constraint is specified using a precise "formula"; according to this "formula", one can generate all the patterns satisfying the constraint, so there is no need to check the constraint iteratively during the mining process.
Early work on imposing constraints in the sequential pattern mining task is the algorithm GSP [3]. Its authors proposed the concepts of the time interval constraint and the maximum gap and minimum gap constraints, and built them into the Apriori algorithm framework. Another work in the framework of time interval constraints is given by Mannila et al. [2]. They defined "an episode as a collection of events that occur relatively close to each other in a given partial order." They considered the importance of the time frame of patterns and introduced the concepts of an event window and a sliding event window. They defined patterns as directed acyclic graphs in which each vertex is a single event and each edge means "event A occurs before event B". Their method of finding frequent episodes is a "bottom-up candidate-generate-and-test approach", which is similar to AprioriAll proposed by Agrawal and Srikant [1].
F. Masseglia et al. [15] have also proposed imposing time constraints in the mining of sequential patterns. They presented a graph-theoretic mining algorithm to reduce the search space of time-constrained sequential patterns.
Garofalakis et al. [16] have given a framework for imposing regular expression constraints in sequential pattern mining. A regular expression R is built with operators such as disjunction and Kleene closure [18]. R specifies a language of strings, that is, a regular family of sequential patterns that are of interest to the user. They confirmed that regular expression constraints have the same expressive power as deterministic finite automata [18]. The algorithm SPRINT is a multi-database-scan, candidate-generate-and-test strategy based on GSP [3]. Its candidate generation strategy works by imposing a relaxed constraint.
The method first generates candidates and checks the validity of patterns that satisfy the given regular expression constraint, and then finds the occurrence frequency of the length-1 sequences that cross the minimum support threshold. These become the seed set for the next iteration. Candidate length-2 sequences are formed by joining the elements of the seed set. The database is then scanned again to search for these candidates, and their counts are accumulated after checking the relaxed constraint. In subsequent iterations, candidate k-length sequences are formed by joining frequent (k-1)-length sequences that have the same contiguous subsequences. Given a sequence $s_\alpha = \langle e_1, e_2, \ldots, e_n \rangle$, another sequence $s_\beta$ is a contiguous subsequence of $s_\alpha$ if (i) $s_\beta$ is derived from $s_\alpha$ by dropping an item from either $e_1$ or $e_n$, (ii) $s_\beta$ is derived from $s_\alpha$ by dropping an item from an element $e_j$ that has at least 2 items, or (iii) $s_\beta$ is a contiguous subsequence of $s_\delta$ and $s_\delta$ is a contiguous subsequence of $s_\alpha$.
The process continues until all frequent sequences present in the database that satisfy the relaxed constraint are found.
Given an anti-monotonic constraint, the constraint is imposed first and the candidates which do not satisfy it are pruned. Like the support constraint, such a constraint is anti-monotonic: if a sub-pattern does not satisfy it, none of its super-patterns will satisfy it either.
If the constraint is monotone, an appropriate choice of relaxed constraint is used to generate valid results. The family of SPRINT methods suffers from the drawback of huge query overhead due to multiple scans, candidate generation based on weak constraint imposition, and frequent pattern discovery from amongst the candidate set.
Han et al. [17] have confirmed the imposition of various user-defined constraints for efficient mining of patterns. They proposed an architecture for mining multidimensional association rules in the framework of online analytical mining. They proposed constraint imposition at the level of the transaction database using the PL/SQL query language, whose result is further subject to multidimensional association pattern discovery.
Pei et al. [14] have studied the process of constraint imposition in the framework of PrefixSpan [6]. They presented constraint imposition in both the classical and the application-centric framework. Their work presents a detailed study of how conventional monotone, anti-monotone and succinct constraints can be treated as prefix constraints while recursively projecting the database. Their study confirmed that while PrefixSpan is efficient for sequential pattern mining, it is not suitable for constraint-driven mining. They presented a systematic study of the imposition of regular expression and aggregate constraints, and gave various application-oriented examples of tough but interesting constraints. They defined seven categories of constraints from the application perspective: item, super pattern, time interval, gap between subsequent transactions, regular expression, length of sequence, and various aggregate constraints. Though these are not the complete set of possible constraints, they are comprehensive enough to address most decision-centric constraint imposition tasks.
In this paper, we explain all seven types of constraints and their treatment in the rough set based framework. Here we restrict our discussion to length-1 sequences. These correspond to many real-world sequential patterns, for example web access patterns and faults in telecom landline networks. (i) We propose a user-friendly interface that generates a pre-visualization of a sample of emerging sequential patterns and allows flexible imposition of time, length and gap constraints prior to the mining task, and (ii) we present a novel algorithm based on the indiscernibility relation from the theory of rough sets to address the computational aspect of the expensive problem of mining frequent sequential patterns that satisfy item, super pattern and regular expression constraints. Experimental evaluations show that our algorithm is at least 10 times faster than the algorithm SPRINT [16].

2. Problem Formulation
From the theory of rough sets, an information system is given as $S = \{U, A_t, V, f\}$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a finite set of objects and $A_t$ is a finite set of attributes. $A_t$ is further classified into two disjoint subsets, the conditional attributes $C$ and the decision attributes $D$, with $A_t = C \cup D$;
$$V = \bigcup_{p \in A_t} V_p,$$
where $V_p$ is the domain of attribute $p$; and $f : U \times A_t \to V$ is a total function such that $f(x_i, q) \in V_q$ for every $q \in A_t$ and $x_i \in U$. Consider an example transaction database as in TABLE I.

Table 1: Example transaction database

Here $A_t = (T, I)$, where $T$ is the set of transaction times and $I$ is the set of itemsets associated with $x_i$. Examples of a transaction database are a database of customer purchase patterns in a retail store, web access details, etc. There are multiple instances of the same customer ($x_i$) in the
information system U. An alternate representation of the
transaction database is termed a sequence database; it is formed by grouping the transactions corresponding to the same object ($x_i$). The alternate information system is $S' = (U, E)$, where $U = \{x_1, x_2, \ldots, x_n\}$ and $E = \{e_1, e_2, e_3, \ldots, e_m\}$. A sequence, or serial episode, is defined as a set of events that occur within a predetermined event interval and are associated with an object under study. Given the set of itemsets $I = \{i_1, i_2, i_3, \ldots, i_n\}$, the set of sequences $E \subset A_t$ is formed by combining the itemsets associated with the same object, ordered by time: $\forall e_i \in E,\ e_i = \{i_1, i_2, \ldots, i_l\}$. The length of a sequence is the number of items it contains; a k-sequence contains $k = \sum_j |e_j|$ items. The absolute support of a sequence $e_i$ is defined as the number of transactions that contain it, and the relative support is defined as $\sup(e_i) = \text{absolute support} / \text{number of objects in the dataset}$. A pattern is frequent if its support crosses a user-specified frequency threshold called the minimum support threshold [15].
Given sequences , , represents a disjunction operator, which indicates the selection of either of the event patterns. Here, is the i-th element of the sequence, is a regular expression constraint, and represents the Kleene closure operator, which signifies zero or more occurrences of element .
The problem of constraint-driven mining of sequential patterns is concerned with the discovery of frequent patterns that also satisfy user-specified constraints. Commonly imposed constraints can be classified into the following categories.
Constraint type 1: (Item constraint) An item constraint specifies a subset of items that should or should not be present in the patterns. Considering the case of n-size length-1 sequential patterns, V also corresponds to the subsequence relation.
(1)
where V is the subset of items.
If then the item constraint is both anti-monotone and succinct under operation.
If then the item constraint is both monotone and succinct under operation.
Examples of type 1 constraints are the discovery of specific web usage patterns of customers characterized by one type of site, for example online gift stores. Another example, in the case of fault diagnosis in telecom landline networks, is a type 1 constraint characterized by all sequential patterns in which the fault signal "dead phone" is present or absent. If T is the set of gift stores on the web, then
(2)
Given the domain of all unique sequential patterns, all transactions that follow the type 1 constraint are members of the indiscernibility relation formed by the equivalence class of patterns indiscernible with respect to the concept of pattern existence.
(3)
Constraint type 2: (Super pattern constraint) A super pattern constraint finds those patterns which encapsulate a user-specified sequence.
(4)
For example, considering the web browsing patterns of customers, a pattern of type 2 can be a web access pattern which encapsulates the subsequence (online advertisement, product site). The super pattern constraint is monotone and succinct.
Constraint type 3: (Time interval constraint) A transaction database has time stamp information against event labels. The time interval, or duration, constraint defines a set of sequences with the property that the time interval between the first and last transaction is less than or greater than a specific value.
(5)
where and is a given integer. The length of the sequential pattern depends on the choice of the time interval under study. Let $t_s \in T \subset A_t$ be the start time and $t_e \in T$ the end time for the study of transaction patterns. Then the event/time interval for the study of patterns is given by $t_e - t_s$ for the given information system S. If we group the transaction information $I \subset A_t$ corresponding to the same $x_i$, we derive an alternate representation of the information system S. If we impose a time interval restriction, we derive the sequence database within the constrained time interval. The maximum length can be controlled by an appropriate choice of the time interval constraint. Consider the transaction database in TABLE I: if the time interval under consideration is 20 days, the sequence database is as given in TABLE II, and if the time interval under consideration is 25 days, the derived sequence database is given by TABLE III. Both length and time interval constraints are anti-monotone under operation, and they are monotone and succinct under the operation.
Constraint type 4: (Length constraint) In the case of length-1 sequences, this type of constraint restricts the size of the sequence under consideration. It can be a restriction on the maximal pattern length.
(6)
Consider the example in TABLES I, II and III: the maximum length of a sequential pattern in TABLE II is 5, while in TABLE III it is 3.
Constraint type 5: (Aggregate constraint) An aggregate constraint is a constraint on an aggregate of items in a pattern, where the aggregate function can be sum, avg, max, min, standard deviation, etc. For example, in the case of data for market basket analysis, a retail store customer might be interested in knowing those items for which the sum of the bill was more than Rs 2000. Some aggregate functions, such as sum and average over both positive and negative values, are neither monotone, anti-monotone nor succinct.
Constraint type 6: (Regular expression constraint) Regular expression constraints are specified as a regular expression over the set of items using regular expression operators such as disjunction or Kleene closure. A sequential pattern satisfies a regular expression constraint if and only if the pattern is accepted by the equivalent finite automaton. Like aggregate constraints, regular expression constraints are neither monotone, anti-monotone nor succinct.
Constraint type 7: (Gap constraint between adjacent transactions) In many applications, transaction events have to be equispaced in time, that is, the time gap between subsequent transactions has to be either greater or smaller than a prespecified gap.
(7)
where and is a given integer. The gap constraint has the anti-monotone property.
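Since a constraint can be regarded as a Boolean function on sequences, the seven constraint types can be illustrated as Python predicates. This is a minimal sketch assuming length-1 sequences stored as time-ordered (timestamp, item) pairs. The helper names, the choice of the "at most" direction for the time, length and gap bounds, the price table of the aggregate example and the ':'-joined string used for the regular expression check are illustrative assumptions, not the paper's equations (1) to (7).

```python
import re
from typing import Callable, Dict, List, Sequence, Set, Tuple

# A length-1 sequence: time-ordered (timestamp, item) pairs, one item per element.
Event = Tuple[int, str]
Seq = List[Event]
Constraint = Callable[[Seq], bool]


def items(seq: Seq) -> List[str]:
    """Drop the timestamps, keeping the item order."""
    return [item for _, item in seq]


def is_subsequence(pattern: Sequence[str], seq_items: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `seq_items` in the same order."""
    it = iter(seq_items)
    return all(p in it for p in pattern)  # `in` advances the shared iterator


def item_constraint(wanted: Set[str]) -> Constraint:
    # Type 1 (existential form): at least one item of `wanted` is present.
    return lambda s: any(i in wanted for i in items(s))


def super_pattern_constraint(pattern: Sequence[str]) -> Constraint:
    # Type 2: the sequence encapsulates the user-specified pattern.
    return lambda s: is_subsequence(pattern, items(s))


def duration_constraint(max_span: int) -> Constraint:
    # Type 3 (upper-bound form): last timestamp - first timestamp <= max_span.
    return lambda s: not s or s[-1][0] - s[0][0] <= max_span


def length_constraint(max_len: int) -> Constraint:
    # Type 4: at most max_len elements.
    return lambda s: len(s) <= max_len


def aggregate_constraint(price: Dict[str, float], min_total: float) -> Constraint:
    # Type 5 (sum form): the total price of the items exceeds min_total.
    return lambda s: sum(price[i] for i in items(s)) > min_total


def regex_constraint(pattern: str) -> Constraint:
    # Type 6: the ':'-joined item string is fully matched by `pattern`.
    compiled = re.compile(pattern)
    return lambda s: compiled.fullmatch(":".join(items(s))) is not None


def gap_constraint(max_gap: int) -> Constraint:
    # Type 7 (upper-bound form): every gap between adjacent events <= max_gap.
    return lambda s: all(b[0] - a[0] <= max_gap for a, b in zip(s, s[1:]))


def relative_support(db: List[Seq], pattern: Sequence[str]) -> float:
    """Fraction of objects (sequences) whose sequence contains `pattern`."""
    hits = sum(1 for s in db if is_subsequence(pattern, items(s)))
    return hits / len(db)
```

Because each check is a plain predicate, several constraints can be combined on a sequence `s` with `all(c(s) for c in checks)`.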
We categorize constraints into two categories. The first category, named CAT1, comprises the constraints which influence the sequence database, such as the length of a pattern, the time interval and the gap between subsequent transactions. The second category, named CAT2, comprises the constraints which mine specific patterns in the sequence database under study; examples are regular expression, item, super pattern and aggregate constraints.

3. Proposed Model and Method
The proposed algorithm C-RSP is a break-and-search strategy. C-RSP provides a complete mining system that allows the imposition of all types of constraints. The input to the problem of mining sequential patterns under user-defined constraints is the transaction database of the objects under study. A sample database is given in TABLE I. The resultant sequence database is governed by the user's choice of the time interval and maximum length constraints.
The algorithm first presents a user interface that allows flexible and adjustable imposition of CAT1 constraints.
Once the user derives the relevant sequence database under study by imposing CAT1 constraints, the sequence database becomes the input to the mining of patterns under CAT2 constraints. This is done by presenting a user interface that gives a view of the sequence database for an appropriate choice of time interval. Figure 1 gives the user interface that allows pre-visualization of sequences formed by transactions that are indiscernible with respect to the maximal time interval of patterns. Figure 2 gives the user interface for pre-visualization of the maximum length of patterns resulting from the user's choice of time interval.

Figure 2. Patterns with constraint

Figure 3 and Figure 4 give PL/SQL code snippets for deriving the appropriate sequence database under the above-mentioned constraints.

-- Π is the projection operator of relational algebra (SELECT DISTINCT);
-- k is the number of records the user wishes to visualize.
Π Top k LocationId FROM Table1
    WHERE transaction_date >= Tstart AND transaction_date <= Tend
FOR each LocationId (customer id) in rec_inner_test LOOP
    return_str := '';
    -- rec_inner_test: transactions of the current LocationId (assumed ordered by date)
    FOR i IN 1..rec_inner_test.COUNT LOOP
        -- accumulate the signals as a ':'-separated sequence
        return_str := return_str || rec_inner_test(i).signal || ':';
    END LOOP;
    UPDATE Sequence_table SET Sequence = return_str for the current LocationId;
END LOOP;
Figure 3. Algorithm pseudocode to derive sequences from the transaction database in a user-specified time interval

-- Π is the projection operator of relational algebra (SELECT DISTINCT);
-- k is the number of records the user wishes to visualize.
Π Top k LocationId FROM Table1
    WHERE Lengthofsequence <= n
FOR each LocationId (customer id) in rec_inner_test LOOP
    return_str := '';
    -- rec_inner_test: transactions of the current LocationId (assumed ordered by date)
    FOR i IN 1..rec_inner_test.COUNT LOOP
        -- accumulate the signals as a ':'-separated sequence
        return_str := return_str || rec_inner_test(i).signal || ':';
    END LOOP;
    UPDATE Sequence_table SET Sequence = return_str for the current LocationId;
END LOOP;
Figure 4. Algorithm pseudocode to derive sequences from the transaction database for a user-specified maximum length
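Figures 3 and 4 build one ':'-separated signal sequence per LocationId from the transaction table. The Python sketch below mirrors that logic in memory so the effect of the CAT1 choices can be tried on toy data. It assumes a hypothetical row layout (location_id, transaction_date, signal), stands in for the PL/SQL cursor and the Sequence_table update rather than reproducing them, and omits the trailing ':' that the PL/SQL concatenation leaves.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical row layout for a TABLE I-style transaction database.
Row = Tuple[str, date, str]  # (location_id, transaction_date, signal)


def derive_sequences(rows: Iterable[Row], t_start: date, t_end: date,
                     max_len: Optional[int] = None,
                     k: Optional[int] = None) -> Dict[str, str]:
    """Group rows per location into ':'-separated signal sequences.

    t_start/t_end play the role of Tstart/Tend (Figure 3), max_len the role
    of n (Figure 4), and k the number of sequences the user wants to preview.
    """
    by_location: Dict[str, List[Tuple[date, str]]] = defaultdict(list)
    for location_id, day, signal in rows:
        if t_start <= day <= t_end:  # time interval constraint
            by_location[location_id].append((day, signal))

    sequences: Dict[str, str] = {}
    for location_id in sorted(by_location):
        if k is not None and len(sequences) >= k:  # "Top k" preview limit
            break
        events = sorted(by_location[location_id])  # order by transaction date
        signals = [signal for _, signal in events]
        if max_len is not None and len(signals) > max_len:  # length constraint
            continue
        sequences[location_id] = ":".join(signals)
    return sequences
```

For example, widening the window from 20 January to 28 February in the test data turns the sequence of location L1 from `dead:noisy` into `dead:noisy:dead`, which is the kind of length change the pre-visualization interface is meant to expose.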
Figure 1. Patterns with constraint

Suppose the sequence database is as in TABLE II. The task now is to enumerate the frequent sequence space under the user-defined constraints of category CAT2. The method C-Rough Set Partitioning is a divide-and-conquer strategy.
We scan the database once and store all the data in the attribute set of events in two data structures. One is the domain of set E, containing all unique sequences and itemsets in S.
Step 1: To find frequent items, we query all unique itemsets and store them in a set Î. We partition the set V in such a way that all sequences and subsequences with the same prefix are stored in one equivalence class. Thus each element in Î has a corresponding equivalence class partition in V. Considering the sequence database in TABLE II, the partitions in V are given in Figure 5.
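Step 1, Lemma 1 and the support counting of Step 2 can be sketched together in Python. This is a minimal illustration under stated assumptions, not the authors' Java implementation of C-RSP: sequences are lists of single items, "prefix" is taken to mean the first item, V is taken to be every distinct order-preserving subsequence occurring in the database, and the names `partition_by_prefix`, `is_partition` and `count_support` are hypothetical. Enumerating all subsequences is exponential in sequence length, so the sketch only suits toy data.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def subsequences(seq: Sequence[str]) -> Set[Pattern]:
    """All distinct non-empty order-preserving subsequences (exponential: toy sizes only)."""
    return {tuple(seq[i] for i in idx)
            for r in range(1, len(seq) + 1)
            for idx in combinations(range(len(seq)), r)}


def contains(pattern: Pattern, beta: Pattern) -> bool:
    """True if beta occurs in order inside pattern (pattern is a super pattern of beta)."""
    it = iter(pattern)
    return all(b in it for b in beta)


def partition_by_prefix(db: Dict[str, List[str]]) -> Dict[str, Set[Pattern]]:
    """Step 1 sketch: V = every subsequence in the database, grouped by first item."""
    classes: Dict[str, Set[Pattern]] = defaultdict(set)
    for seq in db.values():
        for pat in subsequences(seq):
            classes[pat[0]].add(pat)
    return dict(classes)


def is_partition(classes: Dict[str, Set[Pattern]], universe: Set[Pattern]) -> bool:
    """Lemma 1 check: blocks are non-empty, pairwise disjoint and cover the universe."""
    blocks = list(classes.values())
    union: Set[Pattern] = set().union(*blocks)
    return (all(blocks)
            and sum(len(b) for b in blocks) == len(union)
            and union == universe)


def count_support(db: Dict[str, List[str]],
                  alternatives: Sequence[Pattern] = ((),),
                  min_support: int = 1) -> Dict[str, Dict[Pattern, int]]:
    """Step 2 sketch: per-class support of patterns containing any alternative.

    A single alternative beta gives the super pattern restriction of Case 1;
    several alternatives give a disjunction in the spirit of Case 2.
    """
    freq: Dict[str, Dict[Pattern, int]] = defaultdict(lambda: defaultdict(int))
    for seq in db.values():              # Step 2.1: every tuple of the sequence database
        for pat in subsequences(seq):    # Step 2.2: deduce its subsequences
            if any(contains(pat, beta) for beta in alternatives):
                freq[pat[0]][pat] += 1   # Step 2.3: one increment per tuple, in pat's class
    result: Dict[str, Dict[Pattern, int]] = {}
    for prefix, pats in freq.items():
        kept = {p: c for p, c in pats.items() if c >= min_support}
        if kept:
            result[prefix] = kept
    return result
```

For example, for the toy database {x1: A B C, x2: A C, x3: B C}, the class of prefix A is {A, AB, AC, ABC}, and counting with β = ⟨C⟩ and a minimum support of 2 keeps AC (2), BC (2) and C (3), each inside its own prefix class.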
Figure 5. Partitions in V on the basis of prefix indiscernibility

Lemma 1: All equivalence classes formed by patterns with the same prefix form a partition of the database under study.
Proof: An equivalence class [12] is formed by elements which can be treated as equivalent in some way. An equivalence relation on a set forms a natural partitioning into groups of like objects. From the theory of rough sets [13], given a knowledge base U, a concept is a relation which forms a partition of a certain universe into families: a family $C = \{y_1, y_2, \ldots, y_n\}$ of subsets $y_i \subseteq U$ is a partition of U if the following conditions are satisfied:
Condition 1: $y_i \neq \emptyset$ for $i = 1, 2, \ldots, n$;
Condition 2: $y_i \cap y_j = \emptyset$ for $i \neq j$, $i, j = 1, 2, \ldots, n$;
Condition 3: $\bigcup_{i=1}^{n} y_i = U$.
Given the domain $V = \bigcup_{s \in S} V_s$ of all sequences present in the database under study, V can be partitioned into equivalence classes $y_i$ such that each $y_i$ contains the patterns with the same prefix. Condition 1 is satisfied, since each class $y_i$ is generated by at least one element of V. Condition 2 is satisfied, since every element of V has exactly one prefix and therefore cannot belong to two different equivalence classes. Condition 3 is satisfied, since every member of V has some prefix and is therefore in some equivalence class, so the union of all equivalence classes is V:
$$\bigcup_{i=1}^{n} y_i = V.$$
The database is now in a good form for the imposition of the various CAT2 constraints: the item constraint, the super pattern constraint, the regular expression constraint and other complex constraints.
Case 1: Suppose the user wants to find all frequent sequences that contain the pattern β. The algorithm finds the patterns which are indiscernible on the basis of the existence of β.
(7)
Step 2: We maintain an array of frequencies of the size of the set V. The following steps explain the support counting process in the indiscernibility mapping:
Step 2.1: For all tuples in the sequence database S,
Step 2.2: deduce the subsequences and check whether each subsequence is a superset of the pattern β.
Step 2.3: Each subsequence found accounts for an increment in the element frequency at the appropriate index in the partition and one increment for its subset; the process continues until all elements of S are considered.
The process of mapping an item constraint is the same as that of the super pattern constraint.
Case 2: Suppose we wish to impose a regular expression constraint characterized by a disjunction operator.
(8)
This can be imposed by restricting the indiscernibility mapping to the patterns
(9)
Case 3: Consider the example of fault pattern mining in telecom landline access networks. Often the user wants to mine the support of patterns within a pre-specified time interval, with specific items of interest embedded in the sequence. Such "dirty" constraints cannot be handled by PrefixSpan-based methods, and even the class of SPRINT-based methods is inadequate for handling such combinations of constraints. With C-RSP, such constraints can easily be built into the sequence mining task. In the above example, the time constraint can be imposed at the level of the transaction database, while pattern existence and support counting are built on the sequence database.

4. Results and Discussion
We have compared the efficiency of C-RSP with the naïve SPRINT(N). C-RSP was found to be more than 10 times faster than SPRINT. Figures 6, 7 and 8 give runtime comparisons of C-RSP with SPRINT under imposition of the time interval and length constraints. Figure 6 gives the comparative efficiency on imposition of the time constraint on real data of network fault patterns in the telecom landline networks of Madhya Pradesh, India. The time period of the data was chosen by the knowledge worker to be three months. The algorithm C-RSP is implemented in JDK 1.3. The preprocessing step is a Java program which connects to the database as in TABLE I and invokes a PL/SQL cursor which creates TABLE II. The entire process is undertaken using the Java Database Connectivity (JDBC) interface: the program connects to the database in MS SQL Server 2005, as in TABLE II, and fetches the data into data structures using JDBC. The machine used is an HP ProLiant DL580 G5 with an Intel Xeon 1.6 GHz CPU and 8 GB RAM. The operating system is MS Windows Server 2003 R2. The data comprised 75833 records of voice-related gross faults collected over a time window of three months. There are 215 distinct elements in the sequences, and the maximum length of a sequence is 14.
The algorithm SPRINT is also programmed on the same machine using JDK 1.3. Time constraint imposition is done at the level of candidate generation: only those candidates which satisfy the specified time constraints are considered in the support counting process during the subsequent scan of the data.

Figure 6. Runtime evaluation on real data of network faults on imposition of the time constraint

Other experiments on efficiency are performed on data similar to the data generated by the synthetic data generation program at http://www.almaden.ibm.com/cs/quest. The parameters of the dataset are described below.
|D|  Size of the database (number of customers)
|C|  Average number of transactions per customer
|I|  Average size of itemsets in maximal potentially large sequences
|N|  Number of items
Here we have imposed the constraint on the maximum length of the pattern. The maximum length of the pattern is restricted to 14 in the dataset under consideration.

Figure 7. Runtime evaluation of synthetic data on imposition of the length constraint

It is clear from the above graphs that C-RSP outperforms the SPRINT family of methods by an order of magnitude. This is due to the partitioning of the search space, the imposition of constraints at the preprocessing level, and the avoidance of recursive validity checks. There is no candidate generation, since we are only fetching data into data structures and applying computation logic on them. The method C-RSP requires only one to two scans of the database, while SPRINT recursively scans the database and works on a candidate-generate-and-test strategy. The constraint imposition strategies allow the imposition of individual and composite constraints.

5. Conclusion
The benefits of the proposed model are as follows:
(i) Since support counting is usually the most costly step in sequential pattern mining, the proposed technique improves performance greatly by avoiding costly scanning. Also, the algorithm is strictly based on elements that exist in the database under study. The partitions, once constructed and stored, can be used to mine further data increments in the database.
(ii) The creation of equivalence classes by the indiscernibility relation greatly reduces the search space. Especially with the imposition of CAT2 constraints, the search space is restricted to a specific equivalence class.
(iii) The dynamic frequency accumulation scheme in each partition saves computation time.
(iv) While other methods search the whole search space, our method partitions the problem into subproblems.
(v) The categorization of constraints enables a flexible and adjustable constraint imposition scheme on various data representations.
(vi) Based on the experimental results obtained and depicted in the graphs, we conclude that C-RSP is at least 10 times faster than SPRINT.

References
[1] R. Agrawal and R. Srikant, "Mining Sequential Patterns," In Proceedings of the International Conference on Data Engineering, pp. 3-14, 1995.
[2] H. Mannila, H. Toivonen and A. I. Verkamo, "Discovering Frequent Episodes in Sequences," In Proceedings of the International Conference on Knowledge Discovery and Data Mining, IEEE Computer Society Press, pp. 210-215, 1995.
[3] R. Srikant and R. Agrawal, "Mining Sequential Patterns: Generalizations and Performance Improvements," In Proceedings of the 5th International Conference on Extending Database Technology (EDBT'96), Avignon, France, pp. 3-17, March 1996.
[4] J. Ayres, J. Gehrke, T. Yiu and J. Flannick, "Sequential Pattern Mining using a Bitmap Representation," In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 429-435, 2002.
[5] M. J. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent Sequences," Machine Learning, vol. 42, no. 1/2, pp. 31-60, 2001.
[6] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, Q. Chen, U. Dayal and M.-C. Hsu, "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11, pp. 1424-1440, 2004.
[7] Y.-L. Chen, M.-C. Chiang and M.-T. Ko, "Discovering Time-Interval Sequential Patterns in Sequence Databases," Expert Systems with Applications, vol. 25, pp. 343-354, 2003.
[8] D.-A. Chiang, Y.-F. Wang, S.-L. Lee and C.-J. Lin, "Goal-Oriented Sequential Pattern for Network Banking Churn Analysis," Expert Systems with Applications, vol. 25, pp. 293-302, 2003.
[9] R. Sasisekharan, V. Seshadri and S. Weiss, "Data Mining and Forecasting in Large-Scale Telecommunication Networks," IEEE Expert, vol. 11, no. 1, pp. 37-43, 1995.
[10] J. Pei, J. Han, B. Mortazavi-Asl and H. Zhu, "Mining Access Patterns Efficiently from Web Logs," In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'00), Kyoto, Japan, pp. 396-407, 2000.
[11] H. Jin, J. Chen, H. He, G. J. Williams, C. Kelman and C. M. O'Keefe, "Mining Unexpected Temporal Associations: Applications in Detecting Adverse Drug Reactions," IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 4, pp. 488-500, July 2008.
[12] J. Bisaria, N. Srivastava and K. R. Pardasani, "A Rough Sets Partitioning Model for Mining Sequential Patterns with Time Constraint," International Journal of Computer Science and Information Security, vol. 2, no. 1, pp. 178-189, June 2009.
[13] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Springer, 1991.
[14] J. Pei, J. Han and W. Wang, "Constraint-Based Sequential Pattern Mining: The Pattern-Growth Methods," Journal of Intelligent Information Systems, vol. 28, pp. 133-160, 2007.
[15] F. Masseglia, P. Poncelet and M. Teisseire, "Efficient Mining of Sequential Patterns with Time Constraints: Reducing the Combinations," Expert Systems with Applications, vol. 40, no. 3, pp. 2677-2690, 2008.
[16] M. N. Garofalakis, R. Rastogi and K. Shim, "SPRINT: Sequential Pattern Mining with Regular Expression Constraints," In Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.
[17] Lakshmanan, Han and Raymond T., "Constraint Based Multidimensional Mining," SIGKDD, 2006.
[18] H. R. Lewis and C. Papadimitriou, Elements of the Theory of Computation, Prentice Hall, Inc., 1981.

Authors Profile
Jigyasa Bisaria is a faculty member and research fellow with the Department of Mathematics, Maulana Azad National Institute of Technology, Bhopal, India. Her research interests are predictive data mining and its applications to real-world problems.
Dr. Namita Srivastava is working as Assistant Professor with the Department of Mathematics, Maulana Azad National Institute of Technology. She obtained her PhD in Mathematics in 1992 on crack problems. Her current research interests are data mining and its applications.
Dr. Kamal Raj Pardasani is working as Professor and Head of the Department of Mathematics and Dean of Research and Development, Maulana Azad National Institute of Technology, Bhopal. He obtained his PhD in Applied Mathematics in 1988. His current research interests are computational biology, data warehousing and mining, bio-computing and finite element modeling.
