
Sequential Pattern Mining

Sequential Pattern

Outline
1. Introduction
2. Sequence
3. Sequential Pattern Mining


Sequence
A sequence is an ordered list of elements (transactions):
s = < e_1 e_2 e_3 ... >
Each element is a collection of events (items):
e_i = {i_1, i_2, ..., i_k}
Each element is associated with a specific time or location.
The length of a sequence, |s|, is the number of elements in the sequence.
A k-sequence is a sequence that contains k events (items).
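The definitions above can be illustrated with a minimal Python sketch. It assumes a sequence is stored as a list of sets; the helper names `length` and `k_of` are hypothetical and chosen only for this illustration.

```python
# A sequence is an ordered list of elements; each element is a set of items.
s = [{2, 4}, {3, 5, 6}, {8}]


def length(seq):
    """|s|: the number of elements (transactions) in the sequence."""
    return len(seq)


def k_of(seq):
    """k of a k-sequence: the total number of events (items) over all elements."""
    return sum(len(element) for element in seq)


print(length(s))  # 3 elements
print(k_of(s))    # 6 items, so s is a 6-sequence
```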
Sequence

Sequence Database | Sequence | Element (Transaction) | Event (Item)
Customer | Sales transactions made by a particular customer | The items bought by the customer at time t | Books, diary products, CDs, etc.
Web data | Browsing activity of a particular web visitor | The set of files viewed by the visitor after a single mouse click | Home page, index page, contact info, etc.
Event data | Events generated by a particular sensor | Events triggered by the sensor at time t | Types of alarms generated by the sensor
Genome sequences | DNA sequence of a particular species | An element of the DNA sequence | Bases A, T, G, C


Sequence
Web sequence:
< {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation} {Return to Shopping} >
Sequence of initiating events that led to the nuclear accident at Three Mile Island:
(http://stellarone.com/nuclear/staff_reports/summary_SOE_the_initiating_event.htm)
< {clogged resin} {outlet valve closure} {loss of feedwater} {condenser polisher outlet valve shut} {booster pumps trip} {main waterpump trips} {main turbine trips} {reactor pressure increases} >

Mining Sequential Patterns (1)
A sequential pattern is an ordered list of itemsets.
Market basket examples:
Customers typically rent Star Wars, then The Empire Strikes Back, then Return of the Jedi.
Customers buy fitted sheets and pillow cases, then a comforter, then drapes and ruffles.

4/3/01

CS632 - Data Mining

Mining Sequential Patterns (2)
Looks at sequences of transactions rather than a single transaction.
Groups transactions by customer ID.
The transactions of one customer, taken in order, form a customer sequence.


Sequence
Definition of Subsequence
A sequence <a_1 a_2 ... a_n> is contained in another sequence <b_1 b_2 ... b_m> (m ≥ n) if there exist integers i_1 < i_2 < ... < i_n such that a_1 ⊆ b_{i_1}, a_2 ⊆ b_{i_2}, ..., a_n ⊆ b_{i_n}.

Data sequence | Subsequence | Contain?
< {2,4} {3,5,6} {8} > | < {2} {3,5} > | Yes
< {1,2} {3,4} > | < {1} {2} > | No
< {2,4} {2,4} {2,5} > | < {2} {4} > | Yes

The support of a subsequence w is the fraction of data sequences that contain w.
A sequential pattern is a frequent subsequence (that is, a subsequence whose support is ≥ minsup).
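The containment test and the support definition can be sketched in Python as follows. This is a minimal illustration, assuming sequences are lists of sets; the function names `is_contained` and `support` are hypothetical. It matches each element of the subsequence to the earliest possible element of the data sequence, which is enough to decide containment.

```python
def is_contained(sub, seq):
    """Return True if `sub` is a subsequence of `seq` (greedy left-to-right match)."""
    pos = 0
    for a in sub:
        # Advance until an element of seq is a superset of a.
        while pos < len(seq) and not set(a) <= set(seq[pos]):
            pos += 1
        if pos == len(seq):
            return False
        pos += 1  # the next element of sub must match a strictly later element
    return True


def support(w, database):
    """Fraction of data sequences in `database` that contain w."""
    return sum(is_contained(w, s) for s in database) / len(database)


# The three (data sequence, subsequence) pairs from the table above.
examples = [
    ([{2, 4}, {3, 5, 6}, {8}], [{2}, {3, 5}]),
    ([{1, 2}, {3, 4}], [{1}, {2}]),
    ([{2, 4}, {2, 4}, {2, 5}], [{2}, {4}]),
]
results = [is_contained(sub, seq) for seq, sub in examples]
print(results)  # [True, False, True]
```

Treating the three data sequences as a small database, `support([{2}], database)` would be 1.0, because every data sequence has an element containing item 2.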

Sequential Pattern Mining

Definition
Given:
A database of sequences
A user-specified minimum support threshold, minsup
Task:
Find all subsequences whose support is ≥ minsup

Sequential Pattern Mining

Sequential Pattern Mining Algorithm
1. Sort Phase
2. Large Itemset Phase
3. Transformation Phase
4. Sequence Phase
5. Maximal Phase


Sequential Pattern Mining

Example Case 1

customerID | itemBought
1 | 10, 50
1 | 20
1 | 30
1 | 40
2 | 10
2 | 30
2 | 40
2 | 30, 50
3 | 10
3 | 20
3 | 30
3 | 40
4 | 10
4 | 30
4 | 50
5 | 40
5 | 50


Sequential Pattern Mining

1. Sort Phase
Sort the database with customerID as the major key, so that each customer's transactions form one customer sequence.

customerID | itemBought
1 | 10, 50
1 | 20
1 | 30
1 | 40
2 | 10
2 | 30
2 | 40
2 | 30, 50
3 | 10
3 | 20
3 | 30
3 | 40
4 | 10
4 | 30
4 | 50
5 | 40
5 | 50

customerID | Customer Sequence
1 | <(10,50) (20) (30) (40)>
2 | <(10) (30) (40) (30,50)>
3 | <(10) (20) (30) (40)>
4 | <(10) (30) (50)>
5 | <(40) (50)>
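The sort phase can be sketched in Python as below. The example table gives no timestamps, so this sketch assumes a hypothetical order index `t` per transaction as the minor sort key; the function name `customer_sequences` is also an assumption made for illustration.

```python
from itertools import groupby

# (customerID, t, items). t is a hypothetical transaction-order index:
# the example lists each customer's transactions in order but has no time column.
transactions = [
    (2, 1, (10,)), (1, 2, (20,)), (1, 1, (10, 50)), (3, 1, (10,)),
    (1, 3, (30,)), (2, 2, (30,)), (5, 1, (40,)), (4, 1, (10,)),
    (1, 4, (40,)), (2, 3, (40,)), (3, 2, (20,)), (4, 2, (30,)),
    (2, 4, (30, 50)), (3, 3, (30,)), (3, 4, (40,)), (4, 3, (50,)),
    (5, 2, (50,)),
]


def customer_sequences(rows):
    """Sort by customerID (major key) and t (minor key), then group per customer."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    return {
        cid: [set(items) for _, _, items in group]
        for cid, group in groupby(ordered, key=lambda r: r[0])
    }


sequences = customer_sequences(transactions)
for cid, seq in sequences.items():
    print(cid, seq)
```

Each value in `sequences` is one customer sequence, for example `[{10, 50}, {20}, {30}, {40}]` for customer 1.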


Sequential Pattern Mining

2. Large Itemset Phase
Determine the large itemsets.
Map each large itemset to an integer.

min_sup = 40%
40% x 5 customer sequences = 2 customer sequences

customerID | Customer Sequence
1 | <(10,50) (20) (30) (40)>
2 | <(10) (30) (40) (30,50)>
3 | <(10) (20) (30) (40)>
4 | <(10) (30) (50)>
5 | <(40) (50)>

Support is the number of customer sequences that contain the itemset.

Itemset | Support
(10) | 4
(20) | 2
(30) | 4
(40) | 4
(50) | 4
(10,50) | 1
(30,50) | 1

Itemsets that meet the minimum support:

Large Itemset | Mapped to
(10) | 1
(20) | 2
(30) | 3
(40) | 4
(50) | 5

Note: if the itemset (10,50) also met the minimum support, it would be coded as 6, and so on.
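The large itemset phase for this example can be sketched as follows. This is a brute-force illustration that enumerates every non-empty subset of each transaction, which is only reasonable for tiny data; the function name `large_itemsets` and the absolute threshold `min_count` (2, that is 40% of 5 customers) are assumptions of the sketch.

```python
from collections import Counter
from itertools import combinations

sequences = {
    1: [{10, 50}, {20}, {30}, {40}],
    2: [{10}, {30}, {40}, {30, 50}],
    3: [{10}, {20}, {30}, {40}],
    4: [{10}, {30}, {50}],
    5: [{40}, {50}],
}


def large_itemsets(seqs, min_count):
    """Count each itemset once per customer and keep those with count >= min_count."""
    counts = Counter()
    for seq in seqs.values():
        seen = set()  # itemsets supported by this customer
        for element in seq:
            items = sorted(element)
            for r in range(1, len(items) + 1):
                seen.update(combinations(items, r))
        counts.update(seen)
    return {iset: c for iset, c in counts.items() if c >= min_count}


large = large_itemsets(sequences, min_count=2)
# Map the large itemsets to consecutive integers, in sorted order.
mapping = {iset: code for code, iset in enumerate(sorted(large), start=1)}
print(mapping)  # {(10,): 1, (20,): 2, (30,): 3, (40,): 4, (50,): 5}
```

The itemsets (10,50) and (30,50) each occur for a single customer, so they fall below the threshold and receive no code.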

Sequential Pattern Mining

3. Transformation Phase
Remove the non-large itemsets.
Map each large itemset to an integer.

Customer ID | Original Sequence | Transformed Customer Sequence | After Mapping
1 | <(10,50) (20) (30) (40)> | {(10) (50)} {(20)} {(30)} {(40)} | {1, 5} {2} {3} {4}
2 | <(10) (30) (40) (30,50)> | {(10)} {(30)} {(40)} {(30) (50)} | {1} {3} {4} {3, 5}
3 | <(10) (20) (30) (40)> | {(10)} {(20)} {(30)} {(40)} | {1} {2} {3} {4}
4 | <(10) (30) (50)> | {(10)} {(30)} {(50)} | {1} {3} {5}
5 | <(40) (50)> | {(40)} {(50)} | {4} {5}
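A minimal sketch of the transformation step, assuming the integer mapping of the large itemsets from the example; the function name `transform` is hypothetical. Each transaction is replaced by the set of codes of the large itemsets it contains, and a transaction that contains no large itemset is dropped.

```python
# Mapping of large itemsets to integers, taken from the example.
mapping = {(10,): 1, (20,): 2, (30,): 3, (40,): 4, (50,): 5}


def transform(seq, mapping):
    """Replace each element by the codes of the large itemsets it contains."""
    out = []
    for element in seq:
        codes = {code for iset, code in mapping.items() if set(iset) <= element}
        if codes:  # drop elements that contain no large itemset
            out.append(codes)
    return out


original = {
    1: [{10, 50}, {20}, {30}, {40}],
    2: [{10}, {30}, {40}, {30, 50}],
    3: [{10}, {20}, {30}, {40}],
    4: [{10}, {30}, {50}],
    5: [{40}, {50}],
}
transformed = {cid: transform(seq, mapping) for cid, seq in original.items()}
print(transformed[1])  # [{1, 5}, {2}, {3}, {4}]
print(transformed[2])  # [{1}, {3}, {4}, {3, 5}]
```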


Algorithm: Transformation (example)


Sequential Pattern Mining

4. Sequence Phase
Use the set of large itemsets to find the desired sequences.
Two families of algorithms:
Count-All
1. AprioriAll algorithm
Count-Some
1. AprioriSome algorithm
2. DynamicSome algorithm


Sequential Pattern Mining

Customer Sequence
{1, 5} {2} {3} {4}
{1} {3} {4} {3, 5}
{1} {2} {3} {4}
{1} {3} {5}
{4} {5}


Sequential Pattern Mining

Large 1-Sequences

Customer sequences:
{1, 5} {2} {3} {4}
{1} {3} {4} {3, 5}
{1} {2} {3} {4}
{1} {3} {5}
{4} {5}

Sequence | Support
<1> | 4
<2> | 2
<3> | 4
<4> | 4
<5> | 4

Sequential Pattern Mining

Large 2-Sequences

Customer sequences:
{1, 5} {2} {3} {4}
{1} {3} {4} {3, 5}
{1} {2} {3} {4}
{1} {3} {5}
{4} {5}

Sequence | Support
<1 2> | 2
<1 3> | 4
<1 4> | 3
<1 5> | 2
<2 3> | 2
<2 4> | 2
<3 4> | 3
<3 5> | 2
<4 5> | 2
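Counting the 2-sequences of the example can be sketched as below. The sketch assumes the transformed customer sequences as input, treats a pattern as a tuple of large-itemset codes, and simply tries every ordered pair of large 1-sequences; the names `contains` and `large_2_sequences` are hypothetical.

```python
from itertools import product

# Transformed customer sequences from the example.
db = [
    [{1, 5}, {2}, {3}, {4}],
    [{1}, {3}, {4}, {3, 5}],
    [{1}, {2}, {3}, {4}],
    [{1}, {3}, {5}],
    [{4}, {5}],
]


def contains(seq, pattern):
    """True if the codes in `pattern` occur in order, each in a later element."""
    pos = 0
    for code in pattern:
        while pos < len(seq) and code not in seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def large_2_sequences(db, large_1, min_count):
    """Count every ordered pair of large 1-sequences and keep the frequent ones."""
    result = {}
    for candidate in product(large_1, repeat=2):
        count = sum(contains(seq, candidate) for seq in db)
        if count >= min_count:
            result[candidate] = count
    return result


l2 = large_2_sequences(db, large_1=[1, 2, 3, 4, 5], min_count=2)
for pattern, count in l2.items():
    print(pattern, count)
```

Note that <1 5> is not supported by the first customer: codes 1 and 5 occur there in the same element, while a 2-sequence needs code 5 in a later element.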

Sequential Pattern Mining

Large 3-Sequences

Customer sequences:
{1, 5} {2} {3} {4}
{1} {3} {4} {3, 5}
{1} {2} {3} {4}
{1} {3} {5}
{4} {5}

Sequence | Support
<1 2 3> | 2
<1 2 4> | 2
<1 3 4> | 3
<1 3 5> | 2
<2 3 4> | 2

Sequential Pattern Mining

Large 4-Sequences

Customer sequences:
{1, 5} {2} {3} {4}
{1} {3} {4} {3, 5}
{1} {2} {3} {4}
{1} {3} {5}
{4} {5}

Sequence | Support
<1 2 3 4> | 2
<1 3 4 5> | 1 (below min_sup, not large)
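The step from large 3-sequences to 4-sequences can be sketched with an apriori-style join and prune, as commonly described for AprioriAll. This is an illustrative sketch, not a full implementation: patterns are assumed to be tuples of large-itemset codes, and the names `generate_candidates` and `contains` are hypothetical.

```python
from itertools import combinations

db = [
    [{1, 5}, {2}, {3}, {4}],
    [{1}, {3}, {4}, {3, 5}],
    [{1}, {2}, {3}, {4}],
    [{1}, {3}, {5}],
    [{4}, {5}],
]
large_3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]


def generate_candidates(large_prev):
    """Join sequences that share all but the last code, then prune."""
    prev = set(large_prev)
    k = len(next(iter(prev))) + 1
    joined = {p + (q[-1],) for p in prev for q in prev
              if p != q and p[:-1] == q[:-1]}
    # Keep a candidate only if every (k-1)-subsequence is itself large.
    return sorted(c for c in joined
                  if all(sub in prev for sub in combinations(c, k - 1)))


def contains(seq, pattern):
    """True if the codes in `pattern` occur in order, each in a later element."""
    pos = 0
    for code in pattern:
        while pos < len(seq) and code not in seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


candidates = generate_candidates(large_3)
large_4 = {c: n for c in candidates
           if (n := sum(contains(s, c) for s in db)) >= 2}
print(candidates)  # [(1, 2, 3, 4)]
print(large_4)     # {(1, 2, 3, 4): 2}
```

Under this scheme <1 3 4 5> is discarded before any counting, because its subsequence <1 4 5> is not among the large 3-sequences.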

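Sequential Pattern Mining

5. Maximal Phase
S is the set of all large sequences.
n is the length of the longest sequence.
For k = n down to 2, delete from S every subsequence of each large k-sequence; the sequences that remain are the maximal sequences.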

Sequential Pattern Mining

Large 1-Sequences
Sequence | Support
<1> | 4
<2> | 2
<3> | 4
<4> | 4
<5> | 4

Large 2-Sequences
Sequence | Support
<1 2> | 2
<1 3> | 4
<1 4> | 3
<1 5> | 2
<2 3> | 2
<2 4> | 2
<3 4> | 3
<3 5> | 2
<4 5> | 2

Large 3-Sequences
Sequence | Support
<1 2 3> | 2
<1 2 4> | 2
<1 3 4> | 3
<1 3 5> | 2
<2 3 4> | 2

Large 4-Sequences
Sequence | Support
<1 2 3 4> | 2
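The maximal phase can be sketched on the large sequences of this example as follows. The sketch assumes every large itemset is a single item, so a pattern is a plain tuple of codes and the subsequence test is an ordered match; with larger itemsets the test would also need itemset containment. The names `is_subsequence` and `maximal` are hypothetical.

```python
# All large sequences of the example (pattern -> support count).
large = {
    (1,): 4, (2,): 2, (3,): 4, (4,): 4, (5,): 4,
    (1, 2): 2, (1, 3): 4, (1, 4): 3, (1, 5): 2, (2, 3): 2,
    (2, 4): 2, (3, 4): 3, (3, 5): 2, (4, 5): 2,
    (1, 2, 3): 2, (1, 2, 4): 2, (1, 3, 4): 3, (1, 3, 5): 2, (2, 3, 4): 2,
    (1, 2, 3, 4): 2,
}


def is_subsequence(small, big):
    """True if the codes of `small` appear in `big` in the same order."""
    it = iter(big)
    return all(code in it for code in small)


def maximal(large_sequences):
    """For k = n down to 2, delete all shorter subsequences of each k-sequence."""
    remaining = set(large_sequences)
    n = max(len(s) for s in remaining)
    for k in range(n, 1, -1):
        for s_k in [s for s in remaining if len(s) == k]:
            remaining -= {t for t in remaining
                          if len(t) < k and is_subsequence(t, s_k)}
    return sorted(remaining, key=lambda s: (-len(s), s))


result = maximal(large)
print(result)  # [(1, 2, 3, 4), (1, 3, 5), (4, 5)]
```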

Thank you


Exercise
The following table lists customer purchase transactions. Find the sequential purchasing patterns of the customers.

Customer ID | Transaction Time | Items Bought
1 | June 25 93 | 30
1 | June 30 93 | 90
2 | June 10 93 | 10, 20
2 | June 15 93 | 30
2 | June 20 93 | 40, 60, 70
3 | June 25 93 | 30, 50, 70
4 | June 25 93 | 30
4 | June 30 93 | 40, 70
4 | July 25 93 | 90
5 | June 12 93 | 90


Minimum support: 2 ite
