Agenda
SELECT
FROM
WHERE NAME BETWEEN :HV1-LOW AND :HV1-HIGH
AND FIRSTNAME BETWEEN :HV2-LOW AND :HV2-HIGH
AND BIRTHDATE BETWEEN :HV3-LOW AND :HV3-HIGH
AND ZIPCODE BETWEEN :HV4-LOW AND :HV4-HIGH
AT BIND TIME: DB2 chooses IX3
AT RUN TIME: user only fills out a value for ZIP CODE
DB2 determines for each SQL statement the best way to resolve the query. The result
of this calculation is the access path. If it is a static SQL statement, this access path
is chosen at bind time. As a general rule, once DB2 has chosen an access path, it
won't change this access strategy at execution time, even if other access path
choices would have been better at run time. This simplifies the truth somewhat, but
it is accurate in most cases.
The example on the slide illustrates this with a very simple static query.
Our table has 4 indexes. Our select has a BETWEEN predicate on each of the 4 columns.
If the user fills in nothing for a column, its host variables hold low value and high
value. If the user provides a value, both host variables for that column hold the
provided value.
At bind time, DB2 chooses IX3 as the best possible access path, given the parameters
known at that time.
If at execution time our user doesn't fill in a value for COL3 but does provide a
value for COL1, DB2 doesn't change its access path to IX1. It still uses IX3, which
doesn't filter anything.
We'll explain more later; the purpose here is just to show that DB2 chooses one
access path and sticks to it. This access path can be cheap or more expensive, but
DB2 estimates that, within the parameters known at bind time, it is the cheapest.
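As a sketch of the run-time situation described above, the slide's query can be annotated as follows. The select list and the table name are left out on the slide; CLIENTNO and CLIENT are hypothetical names used only to make the statement complete.

```sql
-- Hypothetical select list and table name (not given on the slide)
SELECT CLIENTNO
FROM CLIENT
WHERE NAME      BETWEEN :HV1-LOW AND :HV1-HIGH  -- not filled in: low value to high value
  AND FIRSTNAME BETWEEN :HV2-LOW AND :HV2-HIGH  -- not filled in: low value to high value
  AND BIRTHDATE BETWEEN :HV3-LOW AND :HV3-HIGH  -- not filled in: low value to high value
  AND ZIPCODE   BETWEEN :HV4-LOW AND :HV4-HIGH  -- filled in: both hold the same zip code
```

Only the ZIPCODE predicate filters rows at run time, but the statement was bound with IX3, so DB2 still walks IX3 over its full range.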
Index, Stage1, Stage2
[Diagram: layers within DB2: RDS (Stage2), DM (Stage1), Index]
We'll get back to this slide later on. For now the purpose is to show you that DB2
has different ways to resolve a where clause (predicate). The idea is to resolve a
predicate as cheaply as possible. This may require some changes in your code. If we
can code a stage2 predicate in such a way that it becomes stage1 or, even better,
indexable, we should always do so. In the next couple of slides we'll go into
what can be resolved where.
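A minimal sketch of such a rewrite, assuming a hypothetical CLIENT table with a DATE column BIRTHDATE: the function on the column makes the first predicate stage2, while the range form leaves the column bare so that an index on BIRTHDATE can be used.

```sql
-- Stage2: a scalar function is applied to the column
SELECT CLIENTNO
FROM CLIENT
WHERE YEAR(BIRTHDATE) = 1970;

-- Indexable: same rows, but the column stands alone in a BETWEEN
SELECT CLIENTNO
FROM CLIENT
WHERE BIRTHDATE BETWEEN '1970-01-01' AND '1970-12-31';
```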
Indexable
Keep it positive and simple!
= : equal to
> : greater than
< : less than
>= : greater than or equal to
<= : less than or equal to
LIKE
IN
BETWEEN
To slightly simplify things, you can use the phrase "keep it positive and simple".
Indexes can verify very simple predicates: basically all simple and positive predicates
can be evaluated within an index, provided they are Boolean.
A short reminder of what Boolean means: if a row that does not match the predicate
can be rejected regardless of any other predicate, the predicate is considered a
Boolean predicate.
Matching Columns
Matching columns is an indication of how well an index is used,
- more matching columns better index use
- always start with first index column
- on = and on one IN you can continue
Example: Index on (Name, Clientno, Salarycode)

Predicate                          Matching
---------------------------------  --------
1. Name = 'Smith' AND
   Clientno = 20 AND
   Salarycode = 56
2. Name = 'Smith' OR
   Clientno > 20 AND
   Salarycode = 56
4. Name IN ('Smith', 'Doe')
   AND Salarycode > 0
6. Clientno = 56
   AND Salarycode = 0
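The slide leaves the Matching column empty. Applying the rules listed above gives the counts in the comments below; this is a reading of those rules, not output from DB2, and the CLIENT table name is hypothetical. MATCHCOLS in the PLAN_TABLE shows what DB2 actually does.

```sql
-- Index on (Name, Clientno, Salarycode)

-- 1. Equal on all three columns, starting with the first: 3 matching columns
SELECT * FROM CLIENT
WHERE NAME = 'Smith' AND CLIENTNO = 20 AND SALARYCODE = 56;

-- 2. The OR makes the predicates non-Boolean: presumably 0 matching columns
SELECT * FROM CLIENT
WHERE NAME = 'Smith' OR CLIENTNO > 20 AND SALARYCODE = 56;

-- 4. One IN on the first column, then Clientno is skipped: 1 matching column
SELECT * FROM CLIENT
WHERE NAME IN ('Smith', 'Doe') AND SALARYCODE > 0;

-- 6. The first index column is missing: 0 matching columns
SELECT * FROM CLIENT
WHERE CLIENTNO = 56 AND SALARYCODE = 0;
```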
Stage 1
Keep it positive and simple but no index!
= : equal to
> : greater than
< : less than
>= : greater than or equal to
<= : less than or equal to
LIKE
IN
BETWEEN
To slightly simplify things, you can use the phrase "keep it positive and simple but
no index".
If you have a very simple and positive where clause, but you don't have an index on
it, DB2 will resolve it on the data pages, in what we call the data manager, or
stage1.
If you have a predicate which is positive and has an index on it, but it is not
Boolean, then again DB2 will resolve it in stage1.
If you have a predicate which is in the list but negative, e.g. "not equal to", it is
resolved in stage1.
Stage 2
All the rest !!
All functions such as
SUBSTR
CONCAT
CHAR()
Decryption
Current date between col1 and col2
Sorting
All functions by definition require more processing power than the data
manager is capable of providing, and so they are resolved in stage 2.
These functions also include any arithmetic on a column, such as adding to or
subtracting from it.
Mismatching data types are a bit more complex. As a general rule of thumb, you
can say that when the data type of the host variable doesn't match the data type of
the column, the predicate is stage2. This is cutting it a bit short; it is more correct
to say that if the host variable is bigger than the column data type, the
predicate is stage 2. Many exceptions exist, but it is best to use the correct data type.
Host variable checking is done in stage2. It should NEVER be done in SQL and
should always be done in COBOL.
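A sketch of moving arithmetic off the column, assuming a hypothetical CLIENT table with a SALARY column and hypothetical host variables. The division is done once in the COBOL program before the statement runs, so the column stays bare in the predicate.

```sql
-- Stage2: arithmetic on the column
SELECT CLIENTNO
FROM CLIENT
WHERE SALARY * 12 > :HV-YEARLY;

-- Better: COBOL computes HV-MONTHLY = HV-YEARLY / 12 beforehand,
-- and the predicate compares the bare column with a host variable
SELECT CLIENTNO
FROM CLIENT
WHERE SALARY > :HV-MONTHLY;
```

The same reasoning applies to a check such as :HV = 5: it involves no column at all, so an IF in the program decides whether the statement needs to run.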
So BETTER a Stage2 predicate than NO predicate
[Diagram: filtering IN COBOL shown on top of the DB2 layers: RDS (Stage2), DM (Stage1), Index]
Saying that stage2 is expensive doesn't mean that you should never use stage2 predicates.
If the only way to write the predicate on a COLUMN is as a stage2 predicate,
you should write it as stage2 and not pass the row on to COBOL and check it
there; that is obviously even more expensive. If such a thing as stage3
existed, this would be it.
Index, Stage1, Stage2
[Diagram: layers within DB2: RDS (Stage2), DM (Stage1), Index]
This time around you should understand this slide, and know that there are more and
less expensive ways of writing a query, depending on where DB2 can resolve its
where predicates: how many rows are filtered as early as possible (in the index) and
how many are carried on to stage1 or even stage2.
SQL processing
DM (stage1)
1) matching index predicates (when the index is accessed)
2) other indexable stage 1 predicates (index screening)
3) non indexable stage 1 predicates on index pages
4) stage 1 predicates on the data
5) rows passed to RDS
RDS (stage2)
1) stage 2 predicates
2) sort
Selected rows passed to the user
Within all the non-index steps of the previous slide, the same logic is followed.
E.g. step 4, stage1 on data pages:
First DB2 will resolve all equal predicates
Secondly all range predicates
Thirdly all the rest (e.g. not equal to)
Within each sub step, the order in the SQL statement is followed. That means that if,
for example, we have two equal predicates to resolve on the data pages,
DB2 takes their physical sequence in the SQL statement to determine the order in
which to resolve them.
We'll explain with a little example on the next slide.
Example
                        Order  Resolved in
SELECT *
FROM MYTABLE
WHERE C1 > ?            1      index
AND :HV <> 5            6      stage2
AND C5 = ?              3      stage1
AND C4 = ?              4      stage1
AND DATE(C2) < ?        5      stage2
AND C3 = ?              2      index
ORDER BY C2             7      stage2
Sort impact
Sort YES:
at open cursor all rows are retrieved (e.g. 3000)
fetch 20 rows to build first screen,
user found his info and aborts
RESULT: 2980 rows needlessly retrieved
Sort NO:
at fetch time first row retrieved
fetch 20 rows to build first screen,
user found his info and aborts
RESULT: ONLY 20 rows retrieved from DB2, no sort cost
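One common way to end up in the "Sort NO" case is an index whose column order matches the ORDER BY. The sketch below uses hypothetical table, index and host variable names. Whether DB2 really avoids the sort is the optimizer's decision, so check the SORTC_ORDERBY column of the PLAN_TABLE.

```sql
-- Hypothetical index that delivers the rows in NAME sequence
CREATE INDEX CLIENT_IX_NAME ON CLIENT (NAME);

-- Embedded cursor: with the index above, rows can be read in ORDER BY
-- sequence at fetch time instead of being sorted at open cursor
EXEC SQL
  DECLARE C1 CURSOR FOR
  SELECT NAME, CLIENTNO
  FROM CLIENT
  WHERE NAME >= :HV-NAME
  ORDER BY NAME
END-EXEC
```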
Select *
Program maintenance
CPU cost per extra column
SORT file becomes bigger
Maybe not index only
Select *
Even for:
where exists (select *)
Better: where exists (select 1)
Select col5, ... where col5 = 'AB'
Better: Select 'AB', ... where col5 = 'AB'
Best: Select ... where col5 = 'AB'
Select col1, col2 ... order by col2
Better: Select col1 ... order by col2
if col2 is selected just for the order by
Substr(name,5,2) = 'MI'
denormalize
V9 index on expression
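A sketch of both options from this slide, with hypothetical table, column and index names. The first denormalizes the substring into its own column, which the application then has to maintain. The second uses the index on expression available since DB2 9, so the stage2 SUBSTR predicate can be resolved in the index.

```sql
-- Option 1: denormalize (the application must keep NAME_PART in sync with NAME)
ALTER TABLE CLIENT ADD NAME_PART CHAR(2);
CREATE INDEX CLIENT_IX_PART ON CLIENT (NAME_PART);
SELECT CLIENTNO FROM CLIENT WHERE NAME_PART = 'MI';

-- Option 2 (V9): index on expression, query unchanged
CREATE INDEX CLIENT_IX_SUB ON CLIENT (SUBSTR(NAME, 5, 2));
SELECT CLIENTNO FROM CLIENT WHERE SUBSTR(NAME, 5, 2) = 'MI';
```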
V9
Before version 9, although the two queries are logically alike, there was a clear
difference between them.
Using a distinct would always result in an extra sort, whereas the second query, with
adequate indexing, could avoid the sort.
For instance, an index on COL1, COL2 would have avoided a sort in the second query.
Since version 9, a query with the distinct clause can also avoid the extra sort.
Another important change is that since V9 an index on COL2, COL1 can also be used to
avoid an extra sort. That of course means that the sequence of your result set
could change, and an order by clause should be included if you want to
guarantee the V8 sequence.
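The two queries from the slide are not reproduced in this text. Going by the notes, they were presumably of the following shape; this is an assumption, and MYTABLE is a hypothetical name.

```sql
-- First query: before V9 always an extra sort
SELECT DISTINCT COL1, COL2
FROM MYTABLE;

-- Second query (assumed): could avoid the sort with an index on COL1, COL2
SELECT COL1, COL2
FROM MYTABLE
GROUP BY COL1, COL2;

-- To guarantee the sequence regardless of the index DB2 picks
SELECT DISTINCT COL1, COL2
FROM MYTABLE
ORDER BY COL1, COL2;
```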
Col1 in (A,B)
Col1 = :hv1 or
Col1 >:hv1 and Col2 = :hv2
Col1 = 5
:hv = 5
IN COBOL !!!
col1 < 10
union all
col1 > 50
Existence checking
select 1
from table
where col1 =:hv
fetch first 1 row only
If possible, Col1 in (the rest) will be cheaper, even when the list is bigger
Plan_table
See next slide
Might require some exercise
Not everything in it
DSN_statement_table
Contains the Cost columns
DB2 Plan_table
SELECT QBLOCKNO, PROGNAME, PLANNO, METHOD,
TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM PLAN_TABLE WHERE QUERYNO = 30303
ORDER BY QBLOCKNO, PLANNO ;
QBLOCKNO PROGNAME PLANNO METHOD TNAME   ACCESSTYPE MATCHCOLS ACCESSNAME INDEXONLY
       1 DSNESM68      1      0 AATEHA1 I                  2 AAX0EHA1   N
       1 DSNESM68      2      1 AATEHB1 I                  2 AAX0EHB1   N
       1 DSNESM68      3      3                            0            N
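Rows like the ones above get into the PLAN_TABLE through EXPLAIN, either at bind time with EXPLAIN(YES) or with an explicit EXPLAIN statement. The sketch below shows the explicit form. The explained statement is hypothetical: the original query is not on the slide, and only the two table names come from its output.

```sql
-- QUERYNO ties the PLAN_TABLE rows to this statement
EXPLAIN PLAN SET QUERYNO = 30303 FOR
SELECT A.COL1, B.COL2              -- hypothetical columns
FROM AATEHA1 A, AATEHB1 B
WHERE A.COL1 = B.COL1
ORDER BY B.COL2;
```

In the output above, METHOD 1 is a nested loop join to the second table and METHOD 3 is an additional sort, which is why the third row has no table or index name.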
DSN_Statement_Table
Amongst others :
COST_CATEGORY:
A: Indicates that DB2 had enough information to make a cost estimate without using
default values.
B: Indicates that some condition exists for which DB2 was forced to use default
values.
PROCMS:
The estimated processor cost, in milliseconds, for the SQL statement
PROCSU:
The estimated processor cost, in service units, for the SQL statement
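A minimal query to read these cost columns for one explained statement. The table is created under the name DSN_STATEMNT_TABLE; the QUERYNO value is taken from the PLAN_TABLE example and is only an illustration.

```sql
SELECT QUERYNO, PROGNAME, STMT_TYPE,
       COST_CATEGORY, PROCMS, PROCSU, REASON
FROM DSN_STATEMNT_TABLE
WHERE QUERYNO = 30303;
```

When COST_CATEGORY is B, the REASON column indicates why DB2 had to fall back on default values.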
DSN_PREDICAT_TABLE
Contains all predicates and how they are used.
Extremely useful for index design
Replaces the old spreadsheet technique
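A sketch of listing the predicates of one statement. The column names are the ones I would expect in the DB2 9 layout of this table; verify them against your own DSN_PREDICAT_TABLE before relying on this. The QUERYNO value is only an illustration.

```sql
SELECT QBLOCKNO, PREDNO, TYPE,
       BOOLEAN_TERM,     -- Y when the predicate is a Boolean term
       FILTER_FACTOR,    -- estimated fraction of rows that qualify
       TEXT              -- the predicate as DB2 sees it
FROM DSN_PREDICAT_TABLE
WHERE QUERYNO = 30303
ORDER BY QBLOCKNO, PREDNO;
```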
[Diagram: new binds are inserted into the general plan_tables; changes in the plan_tables are transferred over the LAN to Excel]
It is also best to set up an automated way of following up your access path changes
and notifying your DBA and the responsible developers.
instead of
EXEC SQL
DECLARE cursor-name CURSOR FOR
SELECT column1
,column2 FROM table-name;
END-EXEC
instead of
EXEC SQL
FETCH cursor-name
INTO
END-EXEC
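The rowset versions of these two statements are missing from this text. A minimal sketch of what they look like in embedded SQL follows; the rowset size of 100 and the host variable array names are assumptions. In COBOL the arrays would be declared with an OCCURS clause at least as large as the rowset.

```sql
EXEC SQL
  DECLARE cursor-name CURSOR
  WITH ROWSET POSITIONING FOR
  SELECT column1
        ,column2
  FROM table-name
END-EXEC

EXEC SQL
  FETCH NEXT ROWSET FROM cursor-name
  FOR 100 ROWS
  INTO :HVA-COLUMN1, :HVA-COLUMN2
END-EXEC
```

Each FETCH now returns up to 100 rows in one call to DB2, and the program loops over the arrays itself.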
Following data is based upon treatment of 1 million rows (in seconds CPU).

FETCH   Via rowset   Gain on DB2 in CPU seconds
16                   10 (-60%)
76      66           10 (-15%)
76      60           16 (-35%)

Gain: 10 seconds of CPU per one million rows when using rowset pointers
Sequences
Easy, fast and cheap way to generate
unique numbers if :
Holes are allowed
The order isn't important
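A minimal sketch of a sequence object and its use, with hypothetical sequence, table and host variable names. CACHE is what makes the sequence cheap and is also where the holes come from, since cached values that are not used are lost. NO ORDER means that numbers are not guaranteed to be handed out in request order.

```sql
CREATE SEQUENCE ORDER_SEQ
  AS INTEGER
  START WITH 1
  INCREMENT BY 1
  NO CYCLE
  CACHE 20
  NO ORDER;

-- Draw the next number directly in the insert
INSERT INTO ORDERS (ORDERNO, CLIENTNO)
VALUES (NEXT VALUE FOR ORDER_SEQ, :HV-CLIENTNO);
```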
Sequences
Effect of concurrency on elapsed time
[Charts: duration and cpu against the amount of jobs, comparing "own table" with "seq object"]
Questions ?
Kurt.struyf@cp.be
E03
Practical SQL performance tuning,
for developers and DBA
Kurt Struyf
Competence Partners
Kurt.struyf@cp.be