Session: E03

Practical SQL performance tuning, for developers and DBA
Kurt Struyf
Competence Partners
Oct 5th 2009 4pm
Platform: z/OS

Agenda

One SQL, one access path
Index, stage1, stage2
Sort impact
SQL examples of suboptimal coding and its improvements
Access path fields in the plan_table
Other CPU saving techniques

Static SQL: One SQL = One access path

Step 1: Our table has 4 indexes:
  IX1 on NAME
  IX2 on FIRSTNAME
  IX3 on BIRTHDATE
  IX4 on ZIPCODE

Step 2:
  SELECT ...
  FROM ...
  WHERE NAME BETWEEN :HV1-LOW AND :HV1-HIGH
  AND FIRSTNAME BETWEEN :HV2-LOW AND :HV2-HIGH
  AND BIRTHDATE BETWEEN :HV3-LOW AND :HV3-HIGH
  AND ZIPCODE BETWEEN :HV4-LOW AND :HV4-HIGH

Step 3: AT BIND TIME DB2 CHOOSES IX3

Step 4: AT RUN TIME the user only fills out a value for ZIPCODE

Step 5: AT RUN TIME DB2 USES IX3, which doesn't filter anything

ONE SQL = ONE access path

DB2 determines the best way to resolve each SQL statement. The result of this calculation is the access path. For a static SQL statement, the access path is chosen at bind time. As a general rule, once DB2 has chosen it, it won't change this access strategy at execution time, even if another access path would have been better at run time. This simplifies the truth somewhat, but it is accurate in most cases.
The slide illustrates this with a very simple static query. Our table has 4 indexes, and our SELECT has a BETWEEN on all 4 columns. If the user fills out nothing for a column, its host variables hold low-value and high-value. If the user provides a value, both host variables for that column hold the provided value.
At bind time, DB2 chooses IX3 as the best possible access path, given the parameters known at that time.
If at execution time our user doesn't fill out a value for BIRTHDATE but does provide a value for ZIPCODE, DB2 doesn't change its access path to IX4. It uses IX3, which doesn't filter anything.
We'll explain more later; the purpose here is just to show that DB2 chooses one access path and sticks to it. This access path can be cheap or more expensive, but DB2 estimates that, within the parameters known at bind time, it is the cheapest.
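
The effect can be sketched with a toy model in Python. This is not DB2: the table contents, the numeric stand-ins for low-value and high-value, and the `run` helper are all assumptions made for illustration. It only counts how many rows survive the index step when the bound index (IX3 on BIRTHDATE) is used versus the index that would have matched the user's input (IX4 on ZIPCODE).

```python
# Toy model, not DB2: count rows that pass the index step for a fixed access path.
LOW, HIGH = 0, 999  # hypothetical stand-ins for low-value / high-value

# Hypothetical table of 10 * 2 * 4 * 10 = 800 rows with numeric column values.
ROWS = [
    {"NAME": n, "FIRSTNAME": f, "BIRTHDATE": b, "ZIPCODE": z}
    for n in range(0, 1000, 100)
    for f in range(0, 1000, 500)
    for b in range(0, 1000, 250)
    for z in range(0, 1000, 100)
]


def run(index_column, ranges):
    """Return (rows read through the chosen index, rows finally returned)."""
    low, high = ranges[index_column]
    via_index = [r for r in ROWS if low <= r[index_column] <= high]
    result = [
        r for r in via_index
        if all(lo <= r[col] <= hi for col, (lo, hi) in ranges.items())
    ]
    return len(via_index), len(result)


# Run time: the user only supplies a ZIPCODE, every other range stays wide open.
ranges = {
    "NAME": (LOW, HIGH),
    "FIRSTNAME": (LOW, HIGH),
    "BIRTHDATE": (LOW, HIGH),
    "ZIPCODE": (300, 300),
}

read_ix3, returned_ix3 = run("BIRTHDATE", ranges)  # path fixed at bind time
read_ix4, returned_ix4 = run("ZIPCODE", ranges)    # what IX4 would have read

print("IX3:", read_ix3, "rows read,", returned_ix3, "returned")
print("IX4:", read_ix4, "rows read,", returned_ix4, "returned")
```

Both paths return the same rows. Only the number of rows that has to be inspected after the index step differs.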

Index, Stage1, Stage2

[Diagram: inside DB2, a predicate is resolved in the Index, in the DM (Stage1) or in the RDS (Stage2)]

We'll get back to this slide later on. For now, the purpose is to show that DB2 has different ways to resolve a WHERE clause (predicate). The idea is to resolve a predicate as cheaply as possible. This may require some changes in your code. If we can code a stage 2 predicate in such a way that it becomes stage 1 or, even better, indexable, we should always do so. In the next couple of slides we'll go into what can be resolved where.

Indexable
Keep it positive and simple!

=  : equal to
>  : greater than
<  : less than
>= : greater than or equal to
<= : less than or equal to
LIKE
IN
BETWEEN

MUST ALSO BE BOOLEAN

To simplify slightly, you can use the phrase "keep it positive and simple". Indexes can verify very simple predicates: basically all simple and positive predicates can be evaluated within an index, provided they are Boolean.
A short reminder of what Boolean means: if a row can be rejected because it doesn't match the predicate, regardless of any other predicate, the predicate is considered a Boolean predicate.

Matching Columns
Matching columns is an indication of how well an index is used:
- more matching columns = better index use
- always start with the first index column
- on = and on one IN you can continue

Example: Index on (Name, Clientno, Salarycode)

Predicate                                                       Matching
--------------------------------------------------------------  --------
1. Name = 'Smith' AND Clientno = 20 AND Salarycode = 56
2. Name = 'Smith' OR Clientno > 20 AND Salarycode = 56
3. Name IN ('Smith', 'Doe') AND Clientno > 20 AND Salarycode = 56
4. Name IN ('Smith', 'Doe') AND Salarycode > 0
5. Name <> 'Smith' AND Clientno = 56
6. Clientno = 56 AND Salarycode = 0
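
The three rules on this slide can be applied mechanically. The Python sketch below is one reading of them, not DB2's optimizer: the function name, the `boolean_term` flag and the treatment of a second IN (assumed not to match) are assumptions for illustration. The counts it produces for the six example predicates are derived from the rules and are not stated on the slide.

```python
RANGE_OPS = {">", "<", ">=", "<=", "BETWEEN", "LIKE"}


def matching_columns(index_columns, predicates, boolean_term=True):
    """Count matching index columns for ANDed predicates (column -> operator).

    Assumptions of this sketch: a non-Boolean predicate set matches nothing,
    a range predicate matches but ends the matching, and only one IN may be
    used to continue.
    """
    if not boolean_term:
        return 0
    matched = 0
    in_used = False
    for column in index_columns:  # always start with the first index column
        op = predicates.get(column)
        if op == "=":
            matched += 1
        elif op == "IN" and not in_used:
            matched += 1
            in_used = True
        elif op in RANGE_OPS:
            matched += 1
            break  # range predicate: counts, but no further columns match
        else:
            break  # no usable predicate on this column (missing or negative)
    return matched


index = ("Name", "Clientno", "Salarycode")
examples = {
    1: matching_columns(index, {"Name": "=", "Clientno": "=", "Salarycode": "="}),
    # Example 2 contains an OR, so it is not Boolean in the slide's sense.
    2: matching_columns(index, {"Name": "=", "Clientno": ">", "Salarycode": "="},
                        boolean_term=False),
    3: matching_columns(index, {"Name": "IN", "Clientno": ">", "Salarycode": "="}),
    4: matching_columns(index, {"Name": "IN", "Salarycode": ">"}),
    5: matching_columns(index, {"Name": "<>", "Clientno": "="}),
    6: matching_columns(index, {"Clientno": "=", "Salarycode": "="}),
}
print(examples)
```

Under these assumptions the six examples give 3, 0, 2, 1, 0 and 0 matching columns.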

Stage 1
Keep it positive and simple, but no index!

=  : equal to
>  : greater than
<  : less than
>= : greater than or equal to
<= : less than or equal to
LIKE
IN
BETWEEN

Keep it simple but not positive!
Keep it simple but not Boolean!

To simplify slightly, you can use the phrase "keep it positive and simple, but no index".
If you have a very simple and positive WHERE clause but no index on it, DB2 resolves it on the data pages, in what we call the data manager, or stage 1.
If you have a predicate that is positive and has an index on it but is not Boolean, DB2 again resolves it in stage 1.
If you have a predicate that is in the list but negative, e.g. "not equal to", it is resolved in stage 1.

Stage 2
All the rest !!

All functions, such as
  SUBSTR
  CONCAT
  CHAR()
Mismatching data types
  Colchar_6 = '1234567'
Host variable checking
  AND :HV1 = 5
Decryption
Current date between col1 and col2
Sorting

All functions by definition require more processing power than the data manager can provide, so they are resolved in stage 2. These functions also include any arithmetic on a column, such as adding or subtracting.
Mismatching data types are a bit more complex. As a rule of thumb, when the data type of the host variable doesn't match the data type of the column, the predicate is stage 2. This is cutting it a bit short; it is more correct to say that the predicate is stage 2 if the host variable is bigger than the column data type. Many exceptions exist, but it is best to use the correct data type.
Host variable checking is done in stage 2. It should NEVER be done in SQL and should always be done in COBOL.

In COBOL checking "stage 3"
NEVER (ab)use this!

All DB2 columns that CAN be checked in SQL SHOULD be checked in SQL

So BETTER a Stage2 predicate than NO predicate

Saying that stage 2 predicates are expensive doesn't mean that you should never use them. If the only way to write the predicate on a COLUMN is as a stage 2 predicate, you should write it as stage 2 rather than pass the row on to COBOL and check it there, which is obviously even more expensive. If such a thing as "stage 3" existed, this would be it.

Index, Stage1, Stage2

[Diagram: inside DB2, a predicate is resolved in the Index, in the DM (Stage1) or in the RDS (Stage2)]

This time around you should understand this slide, and know that there are more and less expensive ways of writing a query, depending on where DB2 can resolve its WHERE predicates: how many rows are filtered as early as possible (index), and how many are carried on to stage 1 or even stage 2.

SQL processing
DM (stage1)
1) matching index predicates (when the index is accessed)
2) other indexable stage 1 predicates (index screening)
3) non indexable stage 1 predicates on index pages
4) stage 1 predicates on the data
5) rows passed to RDS
RDS (stage2)
1) stage 2 predicates
2) sort
Selected rows passed to the user

DB2 always resolves its WHERE predicates in the same manner.
First, it resolves the matching index predicates, in the sequence of the index columns.
Second, it resolves all the screening predicates in the index.
Third, it resolves all non-indexable stage 1 predicates that can be resolved on the index pages.
Fourth, it resolves all stage 1 predicates on the data.
Then all stage 2 predicates are resolved, and lastly all returned rows are sorted.

Order of evaluating predicates

Within each of the above non-index steps:
1) all equal predicates
2) all range predicates and col IS NOT NULL
3) all other predicates

Within each of the above sub-steps:
the order in which they appear

Within all the non-index steps of the previous slide, the same logic is followed. E.g. step 4, stage 1 on the data pages:
First, DB2 resolves all equal predicates.
Second, all range predicates.
Third, all the rest (e.g. not equal to).
Within each sub-step, the order in the SQL statement is followed. That means that if, for example, we have two equal predicates to resolve on the data pages, DB2 uses their physical sequence in the SQL statement to determine the order in which to resolve them.
We'll illustrate this with a small example on the next slide.

Example

INDEX (C1, C3)

Statement              Order  Resolved in
---------------------  -----  -----------
SELECT *
FROM MYTABLE
WHERE C1 > ?           1      index
AND :HV <> 5           6      stage2
AND C5 = ?             3      stage1
AND C4 = ?             4      stage1
AND DATE(C2) < ?       5      stage2
AND C3 = ?             2      index
ORDER BY C2            7      stage2
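
The ordering in this example can be reproduced with a small sort. The sketch below is a Python model of the rules from the two previous slides, not of DB2 internals. The labels attached to each predicate are my reading of the slide: C1 > ? is taken as the matching predicate, C3 = ? as index screening, and DATE(C2) < ? as a range predicate within stage 2.

```python
# Step order from the "SQL processing" slide, then equal/range/other,
# then position in the statement.
STEP = {"matching": 0, "screening": 1, "stage1": 2, "stage2": 3, "sort": 4}
KIND = {"equal": 0, "range": 1, "other": 2}

# (text, step, kind) in the order the predicates appear in the statement.
statement = [
    ("C1 > ?", "matching", "range"),
    (":HV <> 5", "stage2", "other"),
    ("C5 = ?", "stage1", "equal"),
    ("C4 = ?", "stage1", "equal"),
    ("DATE(C2) < ?", "stage2", "range"),
    ("C3 = ?", "screening", "equal"),
    ("ORDER BY C2", "sort", "other"),
]


def evaluation_order(predicates):
    """Return predicate texts in the order this model resolves them."""
    ranked = sorted(
        enumerate(predicates),
        key=lambda item: (STEP[item[1][1]], KIND[item[1][2]], item[0]),
    )
    return [text for _, (text, _, _) in ranked]


order = evaluation_order(statement)
for number, text in enumerate(order, start=1):
    print(number, text)
```

Note that DATE(C2) < ? is resolved before :HV <> 5 although it appears later in the statement, because range predicates come before "all other predicates" within stage 2.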

Sort impact

Sort YES:
  at open cursor all rows are retrieved (e.g. 3000)
  fetch 20 rows to build the first screen,
  user finds his info and aborts
  RESULT: 2980 rows needlessly retrieved

Sort NO:
  at fetch time the first row is retrieved
  fetch 20 rows to build the first screen,
  user finds his info and aborts
  RESULT: ONLY 20 rows retrieved from DB2, no sort cost
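
The difference can be imitated in Python with a generator that counts how many rows are actually read. This is an analogy, not DB2 behaviour: `make_source` and both `first_screen_*` helpers are hypothetical names, and the no-sort case assumes the rows already arrive in the required order (for example through an index).

```python
from itertools import islice


def make_source(total, counter):
    """Yield rows one by one and count how many were actually read."""
    for key in range(1, total + 1):
        counter["read"] += 1
        yield key


def first_screen_with_sort(total, screen):
    counter = {"read": 0}
    rows = sorted(make_source(total, counter))  # the sort needs every row first
    return rows[:screen], counter["read"]


def first_screen_without_sort(total, screen):
    counter = {"read": 0}
    rows = list(islice(make_source(total, counter), screen))  # stop after one screen
    return rows, counter["read"]


_, read_sorted = first_screen_with_sort(3000, 20)
_, read_unsorted = first_screen_without_sort(3000, 20)
print("with sort:", read_sorted, "rows read")
print("without sort:", read_unsorted, "rows read")
```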

Select *

SELECT * is almost never to be used
SELECT ONLY the COLUMNS that are needed!
Reason:
  Program maintenance
  CPU cost per extra column
  SORT file becomes bigger
  Maybe not index only

Select *

Even for:
  where exists (select * ...)
  Better: where exists (select 1 ...)

  Select col5, ... where col5 = 'AB'
  Better: Select 'AB', ... where col5 = 'AB'
  Best: Select ... where col5 = 'AB'

  Select col1, col2 ... order by col2
  Better: Select col1 ... order by col2, if col2 is selected just for the order by

Other easy improvements

Instead of                      Use
------------------------------  ------------------------------------------
:hv between col1 and col2       col1 <= :hv and col2 >= :hv
Substr(name,1,2) = 'MI'         name like 'MI%' (but be careful with '%MI%')
Substr(name,5,2) = 'MI'         denormalize; V9: index on expression
COL_date <> '9999-12-31'        COL_date < '9999-12-31'
COL_int <> 0                    COL_int > 0
COL <> :hv                      COL in ( , , , , , )
COL not <= 5                    COL > 5
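
Two of these rewrites can be checked for logical equivalence with a few rows. The sketch below uses Python's built-in sqlite3 module purely as a stand-in SQL engine, so it says nothing about DB2 stages or cost. The table `t`, its rows and the `names` helper are made up for illustration, and the names are kept in upper case because SQLite's LIKE ignores case for ASCII letters.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT, col1 INTEGER, col2 INTEGER)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("MILLER", 1, 10),
        ("MIKE", 5, 5),
        ("SMITH", 6, 9),
        ("ADAMS", 0, 4),
        ("EMIL", 5, 20),
    ],
)


def names(sql, params=()):
    """Run a query and return the selected names in sorted order."""
    return sorted(row[0] for row in con.execute(sql, params))


hv = 5
between_original = names("SELECT name FROM t WHERE ? BETWEEN col1 AND col2", (hv,))
between_rewritten = names("SELECT name FROM t WHERE col1 <= ? AND col2 >= ?", (hv, hv))

substr_original = names("SELECT name FROM t WHERE SUBSTR(name, 1, 2) = 'MI'")
like_rewritten = names("SELECT name FROM t WHERE name LIKE 'MI%'")
like_anywhere = names("SELECT name FROM t WHERE name LIKE '%MI%'")

print(between_original, between_rewritten)
print(substr_original, like_rewritten, like_anywhere)
```

The last query shows why '%MI%' needs care: it also returns names that merely contain MI, so it is not a replacement for the SUBSTR predicate.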

Other easy improvements

SELECT DISTINCT COL1, COL2, COUNT(C1)
FROM TABLE
WHERE ...
Always results in an extra SORT (before V9)

SELECT COL1, COL2, COUNT(C1)
FROM TABLE
WHERE ...
GROUP BY COL1, COL2
Same results, SORT can be avoided

Before version 9, although the two queries are logically alike, there was a clear difference between them. Using DISTINCT would always result in an extra sort, whereas the second query could avoid the sort with adequate indexing. For instance, an index on (COL1, COL2) would have avoided a sort in the second query.
Since version 9, a DISTINCT can also avoid the extra sort.
Another important change is that since V9 an index on (COL2, COL1) can also be used to avoid an extra sort. That of course means that the sequence of your result set could be affected, and an ORDER BY clause should be included if you want to guarantee the V8 sequence.

More easy improvements

Instead of                          Use
----------------------------------  ----------------------------------
Col1 = 'A' or Col1 = 'B'            Col1 in ('A', 'B')
Col1 >= :hv1 and Col1 <= :hv2       Col1 between :hv1 and :hv2
Col1 = :hv1 or                      Col1 >= :hv1 and
Col1 > :hv1 and Col2 = :hv2         (Col1 = :hv1 or
                                     Col1 > :hv1 and Col2 = :hv2)
Col1 = :hva (always 5)              Col1 = 5
:hv = 5                             IN COBOL !!!
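
The third rewrite adds a redundant predicate (Col1 >= :hv1) in front of the original OR condition. That it does not change the result can be checked by brute force. The Python sketch below only tests the logic over a small integer domain; it does not model NULL values or anything DB2-specific, and the function names are illustrative.

```python
from itertools import product


def original(col1, col2, hv1, hv2):
    # Col1 = :hv1 OR Col1 > :hv1 AND Col2 = :hv2
    return col1 == hv1 or (col1 > hv1 and col2 == hv2)


def rewritten(col1, col2, hv1, hv2):
    # Col1 >= :hv1 AND (Col1 = :hv1 OR Col1 > :hv1 AND Col2 = :hv2)
    return col1 >= hv1 and (col1 == hv1 or (col1 > hv1 and col2 == hv2))


# Every combination of four values drawn from 0..3.
domain = range(4)
equivalent = all(
    original(*values) == rewritten(*values)
    for values in product(domain, repeat=4)
)
print("rewrite is equivalent on the test domain:", equivalent)
```

The added predicate is implied by both branches of the OR, which is why it is safe to add.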

Even more easy improvements

Instead of                      Use
------------------------------  ------------------------------------------
Col1 not between 10 and 50      ... col1 < 10
                                union all
                                ... col1 > 50
Existence checking              select 1
                                from table
                                where col1 = :hv
                                fetch first 1 row only
Col1 not in ('A', 'B', 'C')     if possible: Col1 in (the rest),
                                will be cheaper even when the list is bigger

Determine Access Path

Optimization Service Center
  Newest generation of Visual Explain

Plan_table
  See next slide
  Might require some exercise
  Not everything in it

DSN_statement_table
  Contains the cost columns

DB2 Plan_table

SELECT QBLOCKNO, PROGNAME, PLANNO, METHOD,
       TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM PLAN_TABLE
WHERE QUERYNO = 30303
ORDER BY QBLOCKNO, PLANNO ;

QBLOCKNO  PROGNAME  PLANNO  METHOD  TNAME    ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY
1         DSNESM68  1       0       AATEHA1  I           2          AAX0EHA1    N
1         DSNESM68  2       1       AATEHB1  I           2          AAX0EHB1    N
1         DSNESM68  3       3                            0                      N

Qblockno: indicates the number of query blocks necessary to resolve the query
  General rule: more blocks = less performing

Progname: represents the program/package name

Access path: planno, method

Planno: the number of steps AND the sequence in which a query is resolved
  General rule: more steps = less performing

Method: expresses what kind of access is done
  0 : First access
  1 : Nested Loop Join
  3 : Extra sort needed

Tname: name of the table to be accessed

Access type: how that data is accessed

DSN_Statement_Table
Amongst others :
COST_CATEGORY:
A: Indicates that DB2 had enough information to make a cost estimate without using
default values.
B: Indicates that some condition exists for which DB2 was forced to use default
values.

PROCMS:
The estimated processor cost, in milliseconds, for the SQL statement

PROCSU:
The estimated processor cost, in service units, for the SQL statement


DSN_PREDICAT_TABLE
Contains all predicates and how they are used.
Extremely useful for index design
Replaces the old spreadsheet technique

Access Path Follow Up

[Diagram labels: specific plan_tables (identify every query using QUERYNO); insert into general plan_tables; new binds plan_tables; changes plan_tables; e-mail; Excel transfer over the LAN]

It is also best to set up an automated way of following up your access path changes and notifying your DBA and the responsible developers.
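
One possible shape for such a follow-up is a comparison of two PLAN_TABLE snapshots keyed by QUERYNO, QBLOCKNO and PLANNO. The Python sketch below is only an outline under that assumption: the rows are hypothetical dictionaries (in practice they would be selected from the plan_tables), the watched columns are a choice made here, and how changes are reported is left open.

```python
KEY = ("QUERYNO", "QBLOCKNO", "PLANNO")
WATCHED = ("METHOD", "ACCESSTYPE", "MATCHCOLS", "ACCESSNAME", "INDEXONLY")


def access_path_changes(old_rows, new_rows):
    """Return [(key, {column: (old, new)})] for steps whose access path changed."""
    old = {tuple(row[k] for k in KEY): row for row in old_rows}
    changes = []
    for row in new_rows:
        key = tuple(row[k] for k in KEY)
        before = old.get(key)
        if before is None:
            continue  # step did not exist before; this sketch skips it
        diff = {
            col: (before[col], row[col])
            for col in WATCHED
            if before[col] != row[col]
        }
        if diff:
            changes.append((key, diff))
    return changes


# Hypothetical snapshots: the second step loses its index after a new bind.
previous_bind = [
    {"QUERYNO": 30303, "QBLOCKNO": 1, "PLANNO": 1, "METHOD": 0,
     "ACCESSTYPE": "I", "MATCHCOLS": 2, "ACCESSNAME": "AAX0EHA1", "INDEXONLY": "N"},
    {"QUERYNO": 30303, "QBLOCKNO": 1, "PLANNO": 2, "METHOD": 1,
     "ACCESSTYPE": "I", "MATCHCOLS": 2, "ACCESSNAME": "AAX0EHB1", "INDEXONLY": "N"},
]
new_bind = [
    previous_bind[0],
    {**previous_bind[1], "ACCESSTYPE": "R", "MATCHCOLS": 0, "ACCESSNAME": ""},
]

changes = access_path_changes(previous_bind, new_bind)
for key, diff in changes:
    print(key, diff)
```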

Multi Row fetch

Technique to save up to 60% of DB2 CPU
Easy to use
Fetches a rowset into an array
Program can control the size of the rowset
Multi Row Fetch

To be able to use this, the cursor should be DECLAREd for rowset positioning, for example:

EXEC SQL
  DECLARE cursor-name CURSOR WITH ROWSET POSITIONING FOR
  SELECT column1
        ,column2
  FROM table-name;
END-EXEC

instead of

EXEC SQL
  DECLARE cursor-name CURSOR FOR
  SELECT column1
        ,column2
  FROM table-name;
END-EXEC

Then you can FETCH multiple rows at a time from the cursor

Multi Row Fetch

On the FETCH statement the number of rows requested can be specified, for example:

EXEC SQL
  FETCH NEXT ROWSET FROM cursor-name
  FOR :rowset-size ROWS
  INTO ...
END-EXEC

instead of

EXEC SQL
  FETCH cursor-name
  INTO ...
END-EXEC

The rowset size can be defined as a constant or a variable, for example:

01 rowset-size PIC S9(09) COMP-5.

Multi Row fetch

Do not use single and multiple row fetch for the same cursor in one program
Be aware of compiler limits:
  elementary item: max. 16 MB
  complete working storage: max. 128 MB
The last FETCH on a rowset can be incomplete
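
The saving comes from making fewer calls, and the last rowset needs separate handling because it may hold fewer rows than requested. The Python sketch below models only that bookkeeping. `fetch_rowsets` is a hypothetical helper, not a DB2 or COBOL API, and the row counts are made up.

```python
from itertools import islice


def fetch_rowsets(rows, rowset_size):
    """Yield lists of at most rowset_size rows, one list per simulated fetch."""
    iterator = iter(rows)
    while True:
        rowset = list(islice(iterator, rowset_size))
        if not rowset:
            return  # nothing left to fetch
        yield rowset


total_rows = 1050
rowsets = list(fetch_rowsets(range(total_rows), 100))

calls_single_row = total_rows     # one fetch per row
calls_rowset = len(rowsets)       # one fetch per rowset that returned rows
last_rowset_size = len(rowsets[-1])

print(calls_single_row, "single-row fetches versus", calls_rowset, "rowset fetches")
print("rows in the last rowset:", last_rowset_size)
```

A program therefore has to loop over the number of rows actually returned by each fetch, not over the requested rowset size.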

Multi Row Fetch

Performance results may differ:
  < 5 rows      : poor performance (worse than before)
  10 - 100 rows : best performance
  > 100 rows    : no improvement anymore

The following data is based upon treatment of 1 million rows (in CPU seconds).

                            Via row  Via rowset  Gain on DB2 in CPU seconds
FETCH                       16                   10 (-60%)
FETCH + UPDATE via row      76       66          10 (-15%)
FETCH + UPDATE via rowset   76       60          16 (-35%)

Performance results may differ, depending on the number of columns and their data types, but mainly:
< 5 rows : poor performance (worse than before)
10 - 100 rows : best performance
> 100 rows : no further improvement (same as 10 - 100 rows)

The data is based upon treatment of 1 million rows (in CPU seconds): a gain of 10 CPU seconds per one million rows when using rowsets.

Sequences
Easy, fast and cheap way to generate unique numbers if:
  Holes are allowed
  The order isn't important

Use: NEXT VALUE FOR yy.xxxxxxxx

BASIC SYNTAX:
CREATE SEQUENCE yy.xxxxxxxx
  START WITH 1
  INCREMENT BY 1
  NO MINVALUE
  NO MAXVALUE
  NO CYCLE
  CACHE 200;
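
Why holes have to be allowed follows from the cache: values are handed out from a preallocated block, and values left unused in that block are not given out again. The Python class below is a simplified model of that idea, not DB2's implementation. `CachedSequence` and its `restart` method are hypothetical names, and a restart is assumed to discard the cached block.

```python
class CachedSequence:
    """Toy sequence that preallocates `cache` values at a time."""

    def __init__(self, start=1, increment=1, cache=200):
        self.next_uncached = start
        self.increment = increment
        self.cache = cache
        self.cached = []

    def next_value(self):
        if not self.cached:
            # Preallocate a new block of values.
            self.cached = [
                self.next_uncached + i * self.increment
                for i in range(self.cache)
            ]
            self.next_uncached += self.cache * self.increment
        return self.cached.pop(0)

    def restart(self):
        # Assumption of this model: unused cached values are simply lost.
        self.cached = []


seq = CachedSequence(start=1, increment=1, cache=200)
before = [seq.next_value() for _ in range(3)]
seq.restart()
after = seq.next_value()
print(before, after)
```

The values stay unique, but the numbers between the last value used and the start of the next block are never handed out.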

Sequences

Effect of concurrency on elapsed time: better response times
[Chart: duration against number of jobs, own table versus seq object]

Effect on CPU usage: less CPU needed
[Chart: CPU against number of jobs, own table versus seq object]

Questions ?

Kurt.struyf@cp.be
