Pattern Matching
Beating the Best Pre-12c Solutions
[CON3450]
Stew ASHTON
Oracle OpenWorld 2014
Photo Opportunity
Presentation available on
http://www.slideshare.net/stewashton/rowpatternmatching12coow14
Agenda
Who am I?
Pre-12c solutions compared to row pattern
matching with MATCH_RECOGNIZE
For all sizes of data
Thinking in patterns
Who am I?
33 years in IT
Developer, Technical Sales Engineer, Technical Architect
Aeronautics, IBM, Finance
Mainframe, client-server, Web apps
1) Fixed Difference
Identify and group rows with
consecutive values
My presentation: print slides to keep
Math: subtract known consecutives
If A-1 = B-2 then A = B-1
Else A <> B-1
Consecutive becomes equality,
non-consecutive becomes inequality
PAGE
1
2
3
5
6
7
10
11
12
42
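The subtraction trick can be tried outside the database. The following is a minimal sketch in Python (not the SQL used in the talk): it assumes the pages are distinct integers, and the function name consecutive_runs is made up for illustration.

```python
from itertools import groupby


def consecutive_runs(pages):
    """Group distinct integers into runs of consecutive values."""
    ordered = sorted(pages)
    # page - row_number stays constant inside a run of consecutive pages,
    # so "consecutive" becomes plain equality on the difference.
    keyed = [(page - rn, page) for rn, page in enumerate(ordered, start=1)]
    runs = []
    for _, members in groupby(keyed, key=lambda pair: pair[0]):
        run = [page for _, page in members]
        runs.append((run[0], run[-1], len(run)))  # firstpage, lastpage, cnt
    return runs


print(consecutive_runs([1, 2, 3, 5, 6, 7, 10, 11, 12, 42]))
# [(1, 3, 3), (5, 7, 3), (10, 12, 3), (42, 42, 1)]
```

The differences for the sample pages are 0, 0, 0, 1, 1, 1, 3, 3, 3, 32, which is why grouping on the difference isolates each run.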
1) Pre-12c
select min(page) firstpage,
  max(page) lastpage,
  count(*) cnt
FROM (
  SELECT page,
    page - Row_Number() over(order by page) as grp_id
  FROM t
)
GROUP BY grp_id;
PAGE  [RN]  GRP_ID
1     1     0
2     2     0
3     3     0
5     4     1
6     5     1
7     6     1
10    7     3
11    8     3
12    9     3
42    10    32

FIRSTPAGE  LASTPAGE  CNT
1          3         3
5          7         3
10         12        3
42         42        1
OOW CON3450, Stew
Ashton 6
1. Define input
2. Order input
3. Process pattern
4. using defined conditions
5. Output: rows per match
6. Output: columns per row
7. Go where after match?
SELECT *
FROM t
MATCH_RECOGNIZE (
  ORDER BY page
  MEASURES
    A.page firstpage,
    LAST(page) lastpage,
    COUNT(*) cnt
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (A B*)
  DEFINE B AS page = PREV(page)+1
);
1) Run_Stats comparison
For one million rows:
Stat                      Pre 12c  Match_R  Pct
Latches                   4090     4079     100%
Elapsed Time              5.51     5.56     101%
CPU used by this session  5.5      5.55     101%
Latches are serialization devices: fewer means more scalable
1) Execution Plans
Pre-12c:
Id  Operation          Name  Starts  E-Rows  A-Rows  A-Time       Buffers  OMem  1Mem   Used-Mem
0   SELECT STATEMENT         1               400K    00:00:01.83  1594
1   HASH GROUP BY            1       1000K   400K    00:00:01.83  1594     41M   5035K  40M (0)
2   VIEW                     1       1000K   1000K   00:00:12.69  1594
3   WINDOW SORT              1       1000K   1000K   00:00:03.46  1594     22M   1749K  20M (0)
4   TABLE ACCESS FULL  T     1       1000K   1000K   00:00:02.53  1594

MATCH_RECOGNIZE:
Id  Operation                                        Name  Starts  E-Rows  A-Rows  A-Time       Buffers  OMem  1Mem   Used-Mem
0   SELECT STATEMENT                                       1               400K    00:00:03.45  1594
1   VIEW                                                   1       1000K   400K    00:00:03.45  1594
2   MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO         1       1000K   400K    00:00:01.87  1594     22M   1749K  20M (0)
3   TABLE ACCESS FULL                                T     1       1000K   1000K   00:00:02.09  1594
2) Start of Group
Identify group boundaries, often using LAG()
3 steps instead of 2:
1. For each row: if start of group, assign 1
Else assign 0
2. Running total of 1s and 0s produces a group
identifier
3. Group by the group identifier
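The three steps can be sketched procedurally. This is an illustrative Python version, not the SQL from the talk; the function name merge_contiguous, the helper ts and the tuple layout (group_name, start_ts, end_ts) are assumptions made for the example. A row starts a new group unless its start equals the previous row's end within the same group_name.

```python
from datetime import datetime
from itertools import groupby


def ts(text):
    """Hypothetical helper: parse 'YYYY-MM-DD HH:MM' into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def merge_contiguous(rows):
    """rows: (group_name, start_ts, end_ts). Returns merged contiguous ranges."""
    flagged = []
    prev = None
    grp_id = 0
    for name, start, end in sorted(rows, key=lambda r: (r[0], r[1])):
        # Step 1: flag the row as start of group (1) or not (0)
        contiguous = prev is not None and prev[0] == name and prev[2] == start
        grp_start = 0 if contiguous else 1
        # Step 2: running total of the flags, restarted for each group_name
        if prev is None or prev[0] != name:
            grp_id = 0
        grp_id += grp_start
        flagged.append((name, grp_id, start, end))
        prev = (name, start, end)
    # Step 3: group by the group identifier
    merged = []
    for (name, _), members in groupby(flagged, key=lambda r: (r[0], r[1])):
        members = list(members)
        merged.append((name, members[0][2], max(m[3] for m in members)))
    return merged


rows = [
    ("X", ts("2014-01-01 00:00"), ts("2014-02-01 00:00")),
    ("X", ts("2014-03-01 00:00"), ts("2014-04-01 00:00")),
    ("X", ts("2014-04-01 00:00"), ts("2014-05-01 00:00")),
    ("X", ts("2014-06-01 00:00"), ts("2014-06-01 01:00")),
    ("X", ts("2014-06-01 01:00"), ts("2014-06-01 02:00")),
    ("X", ts("2014-06-01 02:00"), ts("2014-06-01 03:00")),
    ("Y", ts("2014-06-01 03:00"), ts("2014-06-01 04:00")),
    ("Y", ts("2014-06-01 04:00"), ts("2014-06-01 05:00")),
    ("Y", ts("2014-07-03 08:00"), ts("2014-09-29 17:00")),
]
merged = merge_contiguous(rows)
for name, start, end in merged:
    print(name, start, end)
```

With these nine rows the sketch yields five ranges: three for X and two for Y.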
2) Requirement
GROUP_NAME  EFF_DATE          TERM_DATE
X           2014-01-01 00:00  2014-02-01 00:00
X           2014-03-01 00:00  2014-04-01 00:00
X           2014-04-01 00:00  2014-05-01 00:00
X           2014-06-01 00:00  2014-06-01 01:00
X           2014-06-01 01:00  2014-06-01 02:00
X           2014-06-01 02:00  2014-06-01 03:00
Y           2014-06-01 03:00  2014-06-01 04:00
Y           2014-06-01 04:00  2014-06-01 05:00
Y           2014-07-03 08:00  2014-09-29 17:00
with grp_starts as (
  select a.*,
    case when start_ts =
      lag(end_ts) over(
        partition by group_name
        order by start_ts
      )
    then 0 else 1 end grp_start
  from t a
), grps as (
  select b.*,
    sum(grp_start) over(
      partition by group_name
      order by start_ts
    ) grp_id
  from grp_starts b)
select group_name,
  min(start_ts) start_ts,
  max(end_ts) end_ts
from grps
group by group_name, grp_id;
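GROUP_NAME  START_TS     END_TS       GRP_START  GRP_ID
X           01-01 00:00  02-01 00:00  1          1
X           03-01 00:00  04-01 00:00  1          2
X           04-01 00:00  05-01 00:00  0          2
X           06-01 00:00  06-01 01:00  1          3
X           06-01 01:00  06-01 02:00  0          3
X           06-01 02:00  06-01 03:00  0          3
Y           06-01 03:00  06-01 04:00  1          1
Y           06-01 04:00  06-01 05:00  0          1
Y           07-03 08:00  09-29 17:00  1          2

GROUP_NAME  START_TS     END_TS
X           01-01 00:00  02-01 00:00
X           03-01 00:00  05-01 00:00
X           06-01 00:00  06-01 03:00
Y           06-01 03:00  06-01 05:00
Y           07-03 08:00  09-29 17:00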
2) Match_Recognize
New this time:
Added PARTITION BY
MEASURES added gap using row outside the match!
ONE ROW PER MATCH and SKIP PAST LAST ROW are the defaults, so they are omitted
One solution, simple!
SELECT * FROM t
MATCH_RECOGNIZE(
  PARTITION BY group_name
  ORDER BY start_ts
  MEASURES
    A.start_ts start_ts,
    end_ts end_ts,
    next(start_ts) - end_ts gap
  PATTERN (A B*)
  DEFINE B AS start_ts = prev(end_ts)
);
MEASURES    ALL ROWS     ONE ROW
            current row  last row of match
First row of match
            current row  last row of match
DEFINE
ORA-62509
2) Run_Stats comparison
For 500,000
rows:
Stat
Latches
Elapsed Time
CPU used by this session
2) Execution Plans
Pre-12c:
Operation             Used-Mem
SELECT STATEMENT
HASH GROUP BY         20M (0)
VIEW
WINDOW BUFFER         32M (0)
VIEW
WINDOW SORT           27M (0)
TABLE ACCESS FULL

MATCH_RECOGNIZE:
Operation                                        Used-Mem
SELECT STATEMENT
VIEW
MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO   27M (0)
TABLE ACCESS FULL
2) Predicate pushing
Select * from <view> where group_name = 'X'
Operation
SELECT STATEMENT
VIEW
MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO
TABLE ACCESS BY INDEX ROWID BATCHED
INDEX RANGE SCAN
Name  A-Rows  Buffers
3
4
3
4
TI
CNT
3407
4323
1623
1991
885
1159
7
1989
5282
2841
5183
6176
2784
2586
STUDY_SITE  CNT
1026        137
1028        6005
1029        76
1031        4599
1032        1989
1034        3427
1036        879
1038        6485
1039        3
1040        1105
1041        6460
1042        968
1044        471
Requirement
Order by study_site
Put in bins with
size = 65,000 max
FIRST_SITE  LAST_SITE  SUM_CNT
1001        1022       48081
1023        1044       62203
1045        1045       3360
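The requirement amounts to a single ordered pass: keep adding rows to the current bin until the next row would push the sum over the limit. Here is a minimal Python sketch of that logic (not the SQL from the talk). The function name fill_bins and the small sample data are hypothetical, and a single row larger than the capacity is simply given a bin of its own, which is an assumption.

```python
def fill_bins(rows, capacity=65000):
    """rows: (study_site, cnt) pairs.

    Returns one (first_site, last_site, sum_cnt) tuple per bin.
    """
    bins = []
    for site, cnt in sorted(rows):  # order by study_site
        if bins and bins[-1][2] + cnt <= capacity:
            # The row still fits: extend the current bin
            first, _, total = bins[-1]
            bins[-1] = (first, site, total + cnt)
        else:
            # The row would overflow the current bin: start a new one
            bins.append((site, site, cnt))
    return bins


# Hypothetical small data set with capacity 10 to keep the arithmetic visible
sample = [(1, 4), (2, 5), (3, 3), (4, 7), (5, 2)]
print(fill_bins(sample, capacity=10))
# [(1, 2, 9), (3, 4, 10), (5, 5, 2)]
```

The running sum depends on where the previous bin ended, which is why a plain analytic SUM cannot express it and the pre-12c solution needs row-by-row processing.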
SELECT s first_site, MAX(e) last_site, MAX(sm) sum_cnt FROM (
  SELECT s, e, cnt, sm FROM t
  MODEL
    DIMENSION BY (row_number() over(order by study_site) rn)
    MEASURES (study_site s, study_site e, cnt, cnt sm)
    RULES (
      sm[rn > 1] =
        CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000
          OR cnt[cv()] > 65000
        THEN cnt[cv()]
        ELSE sm[cv() - 1] + cnt[cv()]
        END,
      s[rn > 1] =
        CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000
          OR cnt[cv()] > 65000
        THEN s[cv()]
        ELSE s[cv() - 1]
        END
    )
)
GROUP BY s;
DIMENSION with row_number orders data and processing
rn can be used like a subscript
cv() means current row
cv()-1 means previous row
simpler!
SELECT * FROM t
MATCH_RECOGNIZE (
  ORDER BY study_site
  MEASURES
    FIRST(study_site) first_site,
    LAST(study_site) last_site,
    SUM(cnt) sum_cnt
  PATTERN (A+)
  DEFINE A AS SUM(cnt) <= 65000
);
replaces 3 methods:
3) Run_Stats comparison
For one million rows:
Stat                      Pre 12c  Match_R  Pct
Latches                   357448   4622     1%
Elapsed Time              32.85    2.9      9%
CPU used by this session  31.31    2.88     9%
3) Execution Plans
Pre-12c:
Id  Operation          Used-Mem
0   SELECT STATEMENT
1   HASH GROUP BY      7534K (0)
2   VIEW
3   SQL MODEL ORDERED  105M (0)
4   WINDOW SORT        27M (0)
5   TABLE ACCESS FULL

MATCH_RECOGNIZE:
Id  Operation          Used-Mem
0   SELECT STATEMENT
1   VIEW
2
3
[Table: Val distributed across BIN1, BIN2, BIN3]
Requirement
Distribute values in 3
bins as equally as
possible
()+ = 1 or more of whatever is inside
'|' = alternatives, preferred in the order specified
Bin1 condition: No rows here yet, or this bin least full
Bin2 condition: No rows here yet, or
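The bin conditions describe a greedy rule: each value goes to an empty bin if there is one, otherwise to the bin that is currently least full, with lower-numbered bins preferred on ties. The Python sketch below illustrates that rule only; it is not the SQL from the talk. Processing the values in descending order, the function name distribute and the sample values 1 to 10 are all assumptions.

```python
def distribute(values, bin_count=3):
    """Assign each value to the least-full bin; ties go to the lowest bin number.

    Returns (totals, assignment), where assignment lists (value, bin_number).
    """
    totals = [0] * bin_count
    assignment = []
    for val in sorted(values, reverse=True):  # assumed order: largest first
        # index() returns the first minimum, so earlier bins are preferred,
        # like alternatives listed first in the pattern
        target = totals.index(min(totals))
        totals[target] += val
        assignment.append((val, target + 1))
    return totals, assignment


totals, assignment = distribute(range(1, 11))  # hypothetical values 1..10
print(totals)
# [19, 18, 18]
```

An empty bin always has the smallest total, so "no rows here yet" falls out of the "least full" test when every value is positive.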
4) Run_Stats comparison
For 10,000 rows:
Stat                      Pre 12c  Match_R  Pct
Latches                   3124     47       2%
Elapsed Time              28       0.02     0%
CPU used by this session  26.39    0.03     0%
4) Execution Plans
Pre-12c:
Id  Operation          Used-Mem
0   SELECT STATEMENT
1   HASH GROUP BY      817K (0)
2   VIEW
3   SQL MODEL ORDERED  1846K (0)
4   WINDOW SORT        424K (0)
5   TABLE ACCESS FULL

MATCH_RECOGNIZE:
Id  Operation             Used-Mem
0   SELECT STATEMENT
1   VIEW
2   MATCH RECOGNIZE SORT  330K (0)
3   TABLE ACCESS FULL
Backtracking
What happens when there is no match???
Greedy quantifiers - * + {2,}
are not that greedy
Take all the rows they can, BUT
give rows back if necessary one at a time
Repeating conditions
select 'match' from (
  select level n from dual
  connect by level <= 100
)
match_recognize(
  pattern(a b* c)
  define b as n > prev(n)
  , c as n = 0
);
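To see why this query is expensive when nothing matches, the cost can be approximated with a simplified backtracking model. The Python sketch below is an assumption about the general technique, not a description of Oracle's actual engine: it counts how many times the b and c conditions are tested for the pattern (a b* c), where a is undefined and so matches any row. The function name count_condition_tests is made up.

```python
def count_condition_tests(values):
    """Count b and c condition tests for pattern (a b* c) in a naive backtracking model."""
    tests = 0
    n = len(values)
    start = 0
    while start < n:  # a matches the row at position start
        # Greedy b*: take rows while n > prev(n)
        end = start + 1
        while end < n:
            tests += 1
            if values[end] > values[end - 1]:
                end += 1
            else:
                break
        # Try c right after the b rows, then give rows back one at a time
        next_start = start + 1
        for c_pos in range(min(end, n - 1), start, -1):
            tests += 1
            if values[c_pos] == 0:
                next_start = c_pos + 1  # match found: skip past its last row
                break
        start = next_start
    return tests


print(count_condition_tests(list(range(1, 101))))
# 9900
```

In this model, 100 ever-increasing rows with no 0 cost 100 * 99 tests, so the work grows with the square of the row count even though no row is ever returned.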
Imprecise Conditions
CREATE TABLE Ticker (
  SYMBOL VARCHAR2(10),
  tstamp DATE,
  price NUMBER
);
insert into ticker
select 'ACME',
  sysdate + level/24/60/60,
  10000-level
from dual
connect by level <= 5000;
SELECT * FROM Ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES FIRST(tstamp) AS start_tstamp,
    LAST(tstamp) AS end_tstamp
  AFTER MATCH SKIP TO LAST UP
  PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
  DEFINE DOWN AS price < PREV(price),
    UP AS price > PREV(price)
);
Runs in 24 seconds

SELECT * FROM Ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES FIRST(tstamp) AS start_tstamp,
    LAST(tstamp) AS end_tstamp
  AFTER MATCH SKIP TO LAST UP
  PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
  DEFINE DOWN AS price < PREV(price),
    UP AS price > PREV(price),
    STRT AS price >= nvl(PREV(PRICE),0)
);
Runs in 0.0213 seconds
INMEMORY: seconds
Keep in Mind
Backtracking
Precise conditions
Test data with no matches
To debug:
Measures classifier() cl, match_number() mn
All rows per match with unmatched rows
No DISTINCT, no LISTAGG
MEASURES columns must have aliases
Reluctant quantifier = ? = JDBC bind variable
Pattern variables are range variables, not bind variables
Output columns:
          PARTITION BY  ORDER BY  MEASURES  Other input
ONE ROW   X             Omitted   X         omitted
ALL ROWS  X             X         X         X
ORA-00918, anyone?
Questions?