
Database 12c Row Pattern Matching
Beating the Best Pre-12c Solutions
[CON3450]
Stew ASHTON
Oracle OpenWorld 2014

Photo Opportunity
Presentation available on
http://www.slideshare.net/stewashton/rowpatternmatching12coow14

For exact link:
See @StewAshton on Twitter
Or see http://stewashton.wordpress.com

Agenda
Who am I?
Pre-12c solutions compared to row pattern matching with MATCH_RECOGNIZE
For all sizes of data
Thinking in patterns

Watch out for catastrophic backtracking


Other things to keep in mind (time permitting)
OOW CON3450, Stew
Ashton 3

Who am I?
33 years in IT
Developer, Technical Sales Engineer, Technical Architect
Aeronautics, IBM, Finance
Mainframe, client-server, Web apps

25 years as an American in Paris


9 years using Oracle database
Performance analysis
Replace Java with SQL

2 years as internal Oracle Development Expert



1) Fixed Difference
Identify and group rows with consecutive values
My presentation: print slides to keep
Math: subtract known consecutives
If A-1 = B-2 then A = B-1
Else A <> B-1
Consecutive becomes equality, non-consecutive becomes inequality
Consecutive = fixed difference of 1

PAGE
1
2
3
5
6
7
10
11
12
42
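The subtraction trick can be tried outside the database. The following is a minimal Python sketch, not the presenter's code: it assumes the pages are distinct integers, and the function name `group_consecutive` is made up for illustration. It numbers the sorted pages from 1, as ROW_NUMBER() would, and groups on page minus row number.

```python
from itertools import groupby


def group_consecutive(pages):
    """Group distinct integers into runs of consecutive values.

    Returns one (first, last, count) tuple per run.
    """
    # Number the sorted pages from 1, like ROW_NUMBER() OVER (ORDER BY page).
    numbered = enumerate(sorted(pages), start=1)
    runs = []
    # page - row_number stays constant inside a run of consecutive pages,
    # so it can serve as the group identifier.
    for _, members in groupby(numbered, key=lambda pair: pair[1] - pair[0]):
        values = [page for _, page in members]
        runs.append((values[0], values[-1], len(values)))
    return runs


pages = [1, 2, 3, 5, 6, 7, 10, 11, 12, 42]
print(group_consecutive(pages))
# [(1, 3, 3), (5, 7, 3), (10, 12, 3), (42, 42, 1)]
```

The group keys computed for the slide's pages are 0, 0, 0, 1, 1, 1, 3, 3, 3 and 32.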

1) Pre-12c
select min(page) firstpage,
       max(page) lastpage,
       count(*) cnt
FROM (
  SELECT page,
         page - Row_Number() over(order by page) as grp_id
  FROM t
)
GROUP BY grp_id;

Inline view (row number shown for reference):
PAGE  [RN]  GRP_ID
1     1     0
2     2     0
3     3     0
5     4     1
6     5     1
7     6     1
10    7     3
11    8     3
12    9     3
42    10    32

Result:
FIRSTPAGE  LASTPAGE  CNT
1          3         3
5          7         3
10         12        3
42         42        1

Think "match a row pattern"

PATTERN
Uninterrupted series of input rows
Described as a list of conditions (regular expressions)
PATTERN (A B*)
"A": 1 row, "B": 0 or more rows, as many as possible

DEFINE each row condition
[A undefined = TRUE]
B AS page = PREV(page)+1

Each series that matches the pattern is a "match"
"A" and "B" identify the rows that meet their conditions

Input, Processing, Output


1. Define input
2. Order input
3. Process pattern
4. using defined conditions
5. Output: rows per match
6. Output: columns per row
7. Go where after match?

SELECT *
FROM t
MATCH_RECOGNIZE (
  ORDER BY page
  MEASURES
    A.page firstpage,
    LAST(page) lastpage,
    COUNT(*) cnt
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (A B*)
  DEFINE B AS page = PREV(page)+1
);
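To make the processing order concrete, here is a small Python emulation of this one query. It is only a sketch of the semantics described on the slide, not of how Oracle implements MATCH_RECOGNIZE, and the function name `match_a_b_star` is hypothetical.

```python
def match_a_b_star(pages):
    """Emulate PATTERN (A B*) with DEFINE B AS page = PREV(page) + 1."""
    rows = sorted(pages)              # ORDER BY page
    matches = []
    i = 0
    while i < len(rows):
        # "A" has no condition, so any row can start a match.
        j = i + 1
        # "B*" is greedy: take following rows while each is previous + 1.
        while j < len(rows) and rows[j] == rows[j - 1] + 1:
            j += 1
        # ONE ROW PER MATCH with the three MEASURES columns.
        matches.append({"firstpage": rows[i], "lastpage": rows[j - 1], "cnt": j - i})
        # AFTER MATCH SKIP PAST LAST ROW: resume after the last matched row.
        i = j
    return matches


for match in match_a_b_star([1, 2, 3, 5, 6, 7, 10, 11, 12, 42]):
    print(match)
```

Unlike the Pre-12c approach, nothing is subtracted here: each row is compared directly with the previous one, which is what PREV() expresses.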

1) Run_Stats comparison
For one million rows:

Stat                      Pre 12c  Match_R  Pct
Latches                   4090     4079     100%
Elapsed Time              5.51     5.56     101%
CPU used by this session  5.5      5.55     101%

Latches are serialization devices: fewer means more scalable

1) Execution Plans
Pre-12c:
Id  Operation          Name  Starts  E-Rows  A-Rows  A-Time       Buffers  OMem  1Mem   Used-Mem
0   SELECT STATEMENT         1               400K    00:00:01.83  1594
1   HASH GROUP BY            1       1000K   400K    00:00:01.83  1594     41M   5035K  40M (0)
2   VIEW                     1       1000K   1000K   00:00:12.69  1594
3   WINDOW SORT              1       1000K   1000K   00:00:03.46  1594     22M   1749K  20M (0)
4   TABLE ACCESS FULL  T     1       1000K   1000K   00:00:02.53  1594

MATCH_RECOGNIZE:
Id  Operation                                       Name  Starts  E-Rows  A-Rows  A-Time       Buffers  OMem  1Mem   Used-Mem
0   SELECT STATEMENT                                      1               400K    00:00:03.45  1594
1   VIEW                                                  1       1000K   400K    00:00:03.45  1594
2   MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO        1       1000K   400K    00:00:01.87  1594     22M   1749K  20M (0)
3   TABLE ACCESS FULL                               T     1       1000K   1000K   00:00:02.09  1594

2) Start of Group
Identify group boundaries, often using LAG()
3 steps instead of 2:
1. For each row: if start of group, assign 1
Else assign 0
2. Running total of 1s and 0s produces a group identifier
3. Group by the group identifier
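The three steps can be sketched in Python as follows. This is an illustration only: it handles a single group (no PARTITION BY), assumes ranges are (start, end) pairs whose values sort and compare correctly, and the name `merge_contiguous` is invented for the example.

```python
from itertools import accumulate, groupby


def merge_contiguous(ranges):
    """Merge (start, end) ranges whose start equals the previous end."""
    ordered = sorted(ranges)
    # Step 1: flag 1 when a row starts a new group, else 0.
    # The first row has no previous row (LAG is NULL), so it starts a group.
    flags = [
        1 if i == 0 or start != ordered[i - 1][1] else 0
        for i, (start, _end) in enumerate(ordered)
    ]
    # Step 2: the running total of the flags is the group identifier.
    group_ids = list(accumulate(flags))
    # Step 3: group by the identifier, keeping the first start and last end.
    merged = []
    for _, members in groupby(zip(group_ids, ordered), key=lambda pair: pair[0]):
        rows = [row for _, row in members]
        merged.append((rows[0][0], rows[-1][1]))
    return merged


ranges = [
    ("2014-01-01 00:00", "2014-02-01 00:00"),
    ("2014-03-01 00:00", "2014-04-01 00:00"),
    ("2014-04-01 00:00", "2014-05-01 00:00"),
    ("2014-06-01 00:00", "2014-06-01 01:00"),
    ("2014-06-01 01:00", "2014-06-01 02:00"),
    ("2014-06-01 02:00", "2014-06-01 03:00"),
]
for merged_range in merge_contiguous(ranges):
    print(merged_range)
```

With these six ranges the flags are 1, 1, 0, 1, 0, 0 and the running totals are 1, 2, 2, 3, 3, 3, so three merged ranges come out.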

2) Requirement
GROUP_NAME  EFF_DATE          TERM_DATE
X           2014-01-01 00:00  2014-02-01 00:00
X           2014-03-01 00:00  2014-04-01 00:00
X           2014-04-01 00:00  2014-05-01 00:00
X           2014-06-01 00:00  2014-06-01 01:00
X           2014-06-01 01:00  2014-06-01 02:00
X           2014-06-01 02:00  2014-06-01 03:00
Y           2014-06-01 03:00  2014-06-01 04:00
Y           2014-06-01 04:00  2014-06-01 05:00
Y           2014-07-03 08:00  2014-09-29 17:00

Merge contiguous date ranges in same group

with grp_starts as (
  select a.*,
    case when start_ts =
      lag(end_ts) over(
        partition by group_name
        order by start_ts
      )
    then 0 else 1 end grp_start
  from t a
), grps as (
  select b.*,
    sum(grp_start) over(
      partition by group_name
      order by start_ts
    ) grp_id
  from grp_starts b)
select group_name,
[rest of the query is cut off on the slide]

Intermediate rows with grp_start and grp_id:
X  01-01 00:00  02-01 00:00  1  1
X  03-01 00:00  04-01 00:00  1  2
X  04-01 00:00  05-01 00:00  0  2
X  06-01 00:00  06-01 01:00  1  3
X  06-01 01:00  06-01 02:00  0  3
X  06-01 02:00  06-01 03:00  0  3
Y  06-01 03:00  06-01 04:00  1  1
Y  06-01 04:00  06-01 05:00  0  1
Y  07-03 08:00  09-29 17:00  1  2

Result:
X  01-01 00:00  02-01 00:00
X  03-01 00:00  05-01 00:00
X  06-01 00:00  06-01 03:00
Y  06-01 03:00  06-01 05:00
Y  07-03 08:00  09-29 17:00

2) Match_Recognize
New this time:
Added PARTITION BY
MEASURES: added gap using row outside the match!
ONE ROW PER MATCH and SKIP PAST LAST ROW are omitted because they are the defaults

SELECT * FROM t
MATCH_RECOGNIZE(
  PARTITION BY group_name
  ORDER BY start_ts
  MEASURES
    A.start_ts start_ts,
    end_ts end_ts,
    next(start_ts) - end_ts gap
  PATTERN (A B*)
  DEFINE B AS start_ts = prev(end_ts)
);

One solution replaces two methods: simple!

Which row do we mean?


Expression          DEFINE                        MEASURES, ALL ROWS            MEASURES, ONE ROW
start_ts            current row                   current row                   last row of match
FIRST(start_ts)     first row of match            first row of match            first row of match
LAST(end_ts)        current row                   current row                   last row of match
FINAL LAST(end_ts)  ORA-62509                     last row of match             last row of match
B.start_ts          most recent B row             most recent B row             last B row
PREV(), NEXT()      physical offset from referenced row (all three columns)
COUNT(*)            from first to current row     from first to current row     all rows in match
COUNT(B.*)          B rows including current row  B rows including current row  all B rows

2) Run_Stats comparison
For 500,000 rows:

Stat                      Pre 12c  Match_R  Pct
Latches                   10165    8066     79%
Elapsed Time              32.16    20.58    64%
CPU used by this session  31.94    19.67    62%

2) Execution Plans
Pre-12c:
Operation          Used-Mem
SELECT STATEMENT
HASH GROUP BY      20M (0)
VIEW
WINDOW BUFFER      32M (0)
VIEW
WINDOW SORT        27M (0)
TABLE ACCESS FULL

MATCH_RECOGNIZE:
Operation                                       Used-Mem
SELECT STATEMENT
VIEW
MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO  27M (0)
TABLE ACCESS FULL

2) Predicate pushing
Select * from <view> where group_name = 'X'

Operation                                       Name
SELECT STATEMENT
VIEW
MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO
TABLE ACCESS BY INDEX ROWID BATCHED
INDEX RANGE SCAN                                TI

A-Rows / Buffers: 3 4 3 4

3) Bin fitting: fixed size


STUDY_SITE: 1001, 1002, 1004, 1008, 1011, 1012, 1014, 1015, 1017, 1018, 1020, 1022, 1023
CNT: 3407, 4323, 1623, 1991, 885, 1159, 7, 1989, 5282, 2841, 5183, 6176, 2784, 2586

STUDY_SITE  CNT
1026        137
1028        6005
1029        76
1031        4599
1032        1989
1034        3427
1036        879
1038        6485
1039        3
1040        1105
1041        6460
1042        968
1044        471

Requirement
Order by study_site
Put in "bins" with size = 65,000 max

FIRST_SITE  LAST_SITE  SUM_CNT
1001        1022       48081
1023        1044       62203
1045        1045       3360

SELECT s first_site, MAX(e) last_site, MAX(sm) sum_cnt FROM (
  SELECT s, e, cnt, sm FROM t
  MODEL
    DIMENSION BY (row_number() over(order by study_site) rn)
    MEASURES (study_site s, study_site e, cnt, cnt sm)
    RULES (
      sm[rn > 1] =
        CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000
               OR cnt[cv()] > 65000
          THEN cnt[cv()]
          ELSE sm[cv() - 1] + cnt[cv()]
        END,
      s[rn > 1] =
        CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000
               OR cnt[cv()] > 65000
          THEN s[cv()]
          ELSE s[cv() - 1]
        END
    )
)
GROUP BY s;

DIMENSION with row_number orders data and processing
rn can be used like a subscript
cv() means current row
cv()-1 means previous row

New this time:
PATTERN (A+) replaces (A B*)
+ means 1 or more rows
Why? In previous examples I used PREV(), which returns NULL on the first row.

SELECT * FROM t
MATCH_RECOGNIZE (
  ORDER BY study_site
  MEASURES
    FIRST(study_site) first_site,
    LAST(study_site) last_site,
    SUM(cnt) sum_cnt
  PATTERN (A+)
  DEFINE A AS SUM(cnt) <= 65000
);

One solution replaces 3 methods: simpler!
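The fixed-size bin requirement amounts to a running sum that restarts whenever the limit would be exceeded. Below is a rough Python sketch of that idea. The function name `fill_bins` and the sample rows are invented for illustration (they are not the slide's data), and a single row larger than the limit is given its own bin, which follows the MODEL rules shown earlier, not the MATCH_RECOGNIZE query.

```python
def fill_bins(rows, max_size=65000):
    """Assign ordered (site, cnt) rows to consecutive bins of limited size.

    Returns one (first_site, last_site, sum_cnt) tuple per bin.
    """
    bins = []
    first = last = None
    total = 0
    for site, cnt in rows:
        # Close the current bin if adding this row would exceed the limit.
        if first is not None and total + cnt > max_size:
            bins.append((first, last, total))
            first, total = None, 0
        if first is None:
            first = site
        last = site
        total += cnt
    if first is not None:
        bins.append((first, last, total))
    return bins


# Made-up sample data, already ordered by site.
sample = [(1001, 30000), (1002, 30000), (1003, 10000), (1004, 65000), (1005, 1)]
print(fill_bins(sample))
# [(1001, 1002, 60000), (1003, 1003, 10000), (1004, 1004, 65000), (1005, 1005, 1)]
```

The test `total + cnt > max_size` plays the role of the condition SUM(cnt) <= 65000 failing for the current row.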

3) Run_Stats comparison
For one million rows:

Stat                      Pre 12c  Match_R  Pct
Latches                   357448   4622     1%
Elapsed Time              32.85    2.9      9%
CPU used by this session  31.31    2.88     9%

3) Execution Plans
Pre-12c:
Id  Operation          Used-Mem
0   SELECT STATEMENT
1   HASH GROUP BY      7534K (0)
2   VIEW
3   SQL MODEL ORDERED  105M (0)
4   WINDOW SORT        27M (0)
5   TABLE ACCESS FULL

MATCH_RECOGNIZE:
Id  Operation                                       Used-Mem
0   SELECT STATEMENT
1   VIEW
2   MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO  27M (0)
3   TABLE ACCESS FULL

4) Bin fitting: fixed number


[Animated table: items (Name, Val) are placed one at a time into BIN1, BIN2 and BIN3; bin totals shown during the animation include 19, 18, 17]

Requirement
Distribute values in 3 bins as equally as possible

"Best fit decreasing"
Sort values in decreasing order
Put each value in least full bin
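"Best fit decreasing" as described above fits in a few lines of Python. This is a sketch under assumptions: the item values 1 through 10 are a guess at the slide's data, ties go to the lowest-numbered bin, and the name `best_fit_decreasing` is illustrative.

```python
def best_fit_decreasing(values, n_bins=3):
    """Distribute values over n_bins, always adding to the least full bin."""
    totals = [0] * n_bins
    contents = [[] for _ in range(n_bins)]
    # Sort values in decreasing order.
    for value in sorted(values, reverse=True):
        # Pick the least full bin; min() returns the first one on a tie.
        target = min(range(n_bins), key=lambda b: totals[b])
        totals[target] += value
        contents[target].append(value)
    return totals, contents


totals, contents = best_fit_decreasing(range(1, 11))
print(totals)    # [19, 18, 18]
print(contents)  # [[10, 5, 4], [9, 6, 3], [8, 7, 2, 1]]
```

The algorithm is a heuristic: it gives a good distribution quickly but does not guarantee the most equal one for every input.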

4) Brilliant pre 12c solution


SELECT bin, Max(bin_value) bin_value
FROM (
  SELECT * FROM items
  MODEL
    DIMENSION BY
      (Row_Number() OVER
        (ORDER BY item_value DESC) rn)
    MEASURES (
      item_name,
      item_value,
      Row_Number() OVER
        (ORDER BY item_value DESC) bin,
      item_value bin_value,
      Row_Number() OVER
        (ORDER BY item_value DESC) rn_m,
      0 min_bin,
      Count(*) OVER () - 3 - 1 n_iters

SELECT * from items
MATCH_RECOGNIZE (
  ORDER BY item_value desc
  MEASURES
    sum(bin1.item_value) bin1,
    sum(bin2.item_value) bin2,
    sum(bin3.item_value) bin3
  PATTERN ((bin1|bin2|bin3)+)
  DEFINE
    bin1 AS count(bin1.*) = 1
      OR sum(bin1.item_value) - bin1.item_value
         <= least(
              sum(bin2.item_value),
              sum(bin3.item_value)
            ),
    bin2 AS count(bin2.*) = 1
      OR sum(bin2.item_value) - bin2.item_value
         <= sum(bin3.item_value)
);

()+ = 1 or more of whatever is inside
'|' = alternatives, "preferred in the order specified"
Bin1 condition:
No rows here yet,
or this bin least full
Bin2 condition:
No rows here yet,
or this bin no fuller than bin3
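The ordered alternatives can be read as an if / else-if chain evaluated for every row. The Python sketch below emulates the DEFINE conditions of the query above, using None where SQL would have NULL (an empty bin). It is an approximation for illustration, with a hypothetical function name, and the input values 1 through 10 are an assumption.

```python
def assign_bins(values):
    """Emulate PATTERN ((bin1|bin2|bin3)+) with its two DEFINE conditions."""
    sums = [None, None, None]  # None stands for SQL NULL: no rows in the bin yet

    def add(index, value):
        sums[index] = value if sums[index] is None else sums[index] + value

    for value in sorted(values, reverse=True):   # ORDER BY item_value DESC
        s1, s2, s3 = sums
        # bin1: first row in bin1, or bin1 (before this row) is the least full.
        # LEAST() with a NULL argument is NULL, so both other bins must have rows.
        if s1 is None or (s2 is not None and s3 is not None and s1 <= min(s2, s3)):
            add(0, value)
        # bin2: first row in bin2, or bin2 (before this row) no fuller than bin3.
        elif s2 is None or (s3 is not None and s2 <= s3):
            add(1, value)
        # bin3 has no condition, so it takes whatever the others refuse.
        else:
            add(2, value)
    return sums


print(assign_bins(range(1, 11)))  # [19, 18, 18]
```

In the query, SUM(bin1.item_value) already includes the current row when the condition is tested, which is why the DEFINE clause subtracts bin1.item_value; the sketch instead compares the totals before adding the row.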

4) Run_Stats comparison
For 10,000 rows:

Stat                      Pre 12c  Match_R  Pct
Latches                   3124     47       2%
Elapsed Time              28       0.02     0%
CPU used by this session  26.39    0.03     0%

4) Execution Plans
Pre-12c:
Id  Operation          Used-Mem
0   SELECT STATEMENT
1   HASH GROUP BY      817K (0)
2   VIEW
3   SQL MODEL ORDERED  1846K (0)
4   WINDOW SORT        424K (0)
5   TABLE ACCESS FULL

MATCH_RECOGNIZE:
Id  Operation             Used-Mem
0   SELECT STATEMENT
1   VIEW
2   MATCH RECOGNIZE SORT  330K (0)
3   TABLE ACCESS FULL

Backtracking
What happens when there is no match???
"Greedy" quantifiers - * + {2,} - are not that greedy
Take all the rows they can, BUT
give rows back if necessary, one at a time
Regular expression engines will test all possible combinations to find a match

Repeating conditions
select 'match' from (
  select level n from dual
  connect by level <= 100
)
match_recognize(
  pattern(a b* c)
  define b as n > prev(n)
  , c as n = 0
);

Runs in 0.005 secs

select 'match' from (
  select level n from dual
  connect by level <= 100
)
match_recognize(
  pattern(a b* b* b* c)
  define b as n > prev(n)
  , c as n = 0
);

Runs in 5.4 secs
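The jump in run time can be reproduced with a toy backtracking matcher that counts how often a condition is evaluated. The Python sketch below is not Oracle's algorithm: it is a plain greedy matcher written for illustration, it supports only single variables and `*`, and it uses 30 rows instead of 100 so that it finishes quickly. All names in it are hypothetical.

```python
def count_evaluations(rows, pattern, define):
    """Return (matched, evaluations) for a toy greedy backtracking matcher.

    pattern is a list of (variable, is_star) pairs; define maps each
    variable to a function (rows, pos) -> bool.
    """
    evaluations = 0

    def test(name, pos):
        nonlocal evaluations
        evaluations += 1
        return define[name](rows, pos)

    def match(pos, idx):
        if idx == len(pattern):
            return True
        name, is_star = pattern[idx]
        if is_star:
            # Greedy: try to take one more row, then give it back if needed.
            if pos < len(rows) and test(name, pos) and match(pos + 1, idx):
                return True
            return match(pos, idx + 1)
        return pos < len(rows) and test(name, pos) and match(pos + 1, idx + 1)

    # With no match found, every row is tried in turn as a starting point.
    matched = any(match(start, 0) for start in range(len(rows)))
    return matched, evaluations


define = {
    "a": lambda rows, pos: True,                                   # undefined = TRUE
    "b": lambda rows, pos: pos > 0 and rows[pos] > rows[pos - 1],  # n > prev(n)
    "c": lambda rows, pos: rows[pos] == 0,                         # n = 0, never true
}
rows = list(range(1, 31))

simple = [("a", False), ("b", True), ("c", False)]
repeated = [("a", False), ("b", True), ("b", True), ("b", True), ("c", False)]

matched_simple, evals_simple = count_evaluations(rows, simple, define)
matched_repeated, evals_repeated = count_evaluations(rows, repeated, define)
print(matched_simple, evals_simple)
print(matched_repeated, evals_repeated)
```

Neither pattern ever matches, because no row has n = 0. With three `b*` in a row, the matcher tries every way of splitting the rows among them before giving up, so the count grows far faster than with a single `b*`.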

Imprecise Conditions
CREATE TABLE Ticker (
  SYMBOL VARCHAR2(10),
  tstamp DATE,
  price NUMBER
);
insert into ticker
select 'ACME',
       sysdate + level/24/60/60,
       10000-level
from dual
connect by level <= 5000;

SELECT * FROM Ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES FIRST(tstamp) AS start_tstamp,
           LAST(tstamp) AS end_tstamp
  AFTER MATCH SKIP TO LAST UP
  PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
  DEFINE DOWN AS price < PREV(price),
         UP AS price > PREV(price)
);

Runs in 24 seconds

Same query with a precise condition added for STRT:
  DEFINE DOWN AS price < PREV(price),
         UP AS price > PREV(price),
         STRT AS price >= nvl(PREV(PRICE),0)

Runs in 0.0213 seconds
INMEMORY: seconds
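One way to see why the extra STRT condition helps is to count how many rows can begin a match attempt. The Python sketch below is a rough model, not a measurement: it rebuilds the 5000 strictly decreasing prices from the insert statement and applies the STRT condition to each row. The function name `candidate_starts` is made up for the example.

```python
def candidate_starts(prices, strt_defined):
    """Return the positions where STRT can match, i.e. where an attempt begins."""
    starts = []
    for i, price in enumerate(prices):
        previous = prices[i - 1] if i > 0 else 0   # nvl(PREV(price), 0)
        # An undefined STRT is always true; the defined one needs price >= previous.
        if not strt_defined or price >= previous:
            starts.append(i)
    return starts


# Same shape as the test data: price = 10000 - level for level 1..5000.
prices = [10000 - level for level in range(1, 5001)]

print(len(candidate_starts(prices, strt_defined=False)))  # 5000
print(len(candidate_starts(prices, strt_defined=True)))   # 1
```

Prices only go down, so UP never matches and every attempt fails after DOWN+ has taken all remaining rows and given them back one at a time. Without a STRT condition that happens from all 5000 rows; with it, only from the first row.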

Keep in Mind
Backtracking
Precise conditions
Test data with no matches

To debug:
Measures classifier() cl, match_number() mn
All rows per match with unmatched rows

No DISTINCT, no LISTAGG
MEASURES columns must have aliases
"Reluctant quantifier" = ? = JDBC bind variable (the "?" is read as a bind placeholder)
"Pattern variables" are range variables, not bind variables

Output Row shape


Per Match  PARTITION BY  ORDER BY  MEASURES  Other input
ONE ROW    X             Omitted   X         omitted
ALL ROWS   X             X         X         X

ORA-00918, anyone?

Questions?

More details at:
stewashton.wordpress.com