Performance Tuning For Developers and DBA

Session: E03
Practical SQL performance tuning, for developers and DBA
Kurt Struyf
Competence Partners
Oct 5th 2009 4pm
Platform: z/OS
Agenda
One SQL, one access path
Index, stage1, stage2
Sort impact
SQL examples of suboptimal coding and its improvements
Access path fields in the plan_table
Other CPU saving techniques
Static SQL: One SQL = One access path
Step 1: Our table has 4 indexes :
IX1 on NAME
IX2 on FIRSTNAME
IX3 on BIRTHDATE
IX4 on ZIP CODE
Step 2: The query :
SELECT ...
FROM ...
WHERE NAME BETWEEN :HV1-LOW AND :HV1-HIGH
AND FIRSTNAME BETWEEN :HV2-LOW AND :HV2-HIGH
AND BIRTHDATE BETWEEN :HV3-LOW AND :HV3-HIGH
AND ZIPCODE BETWEEN :HV4-LOW AND :HV4-HIGH
Step 3: AT BIND TIME DB2 CHOOSES IX3
Step 4: AT RUN TIME the user only fills out a value for ZIP CODE
Step 5: AT RUN TIME DB2 USES IX3, which doesn't filter anything
ONE SQL = ONE access path
For each SQL statement, DB2 determines the best way to resolve the query. The result
of this calculation is the access path. For a static SQL statement, the access path
is chosen at bind time. As a general rule, once DB2 has chosen it, DB2 won't change
this access strategy at execution time, even if another access path would have been
better at run time. This simplifies the truth somewhat, but it is accurate in most cases.
The example on the slide illustrates this with a very simple static query.
Our table has 4 indexes. Our select has a BETWEEN on each of the 4 columns. If the
user fills out nothing for a column, its host variables hold low value and high value.
If the user provides a value, both host variables for that column hold the provided value.
At bind time, DB2 chooses IX3 as the best possible access path, given the parameters
known at that time.
If at execution time the user doesn't fill out a value for BIRTHDATE (the IX3 column)
but does provide a value for another column, for example NAME (or ZIP CODE, as on the
slide), DB2 doesn't switch its access path to IX1 (or IX4). It still uses IX3, which
doesn't filter anything.
We'll explain more later. The purpose here is just to show that DB2 chooses one
access path and sticks to it. This access path can be a cheap access path or a more
expensive one, but DB2 estimates that, within the parameters known at bind time, it is
the cheapest.
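The effect described above can be illustrated with a small model. This is only a sketch under loud assumptions: the sample rows, the `rows_touched` helper and the idea that an index "touches" exactly the rows inside its own column's range are all hypothetical simplifications, not DB2 internals.

```python
from datetime import date

# Hypothetical sample rows: (name, firstname, birthdate, zipcode).
ROWS = [
    ("ADAMS", "ANN", date(1970, 1, 15), "1000"),
    ("BAKER", "BOB", date(1982, 6, 30), "2000"),
    ("CLARK", "CEES", date(1990, 3, 2), "2000"),
    ("DAVIS", "DIRK", date(1965, 11, 9), "3000"),
]
# Position of the column each index is built on.
COLUMN_OF_INDEX = {"IX1": 0, "IX2": 1, "IX3": 2, "IX4": 3}


def rows_touched(index_name, ranges):
    """Count the rows that fall inside the range on the index's own column."""
    col = COLUMN_OF_INDEX[index_name]
    low, high = ranges[col]
    return sum(1 for row in ROWS if low <= row[col] <= high)


# Run time: only ZIP CODE is filled in, every other range is wide open
# (the equivalent of low value / high value in the host variables).
run_time_ranges = [
    ("", "\uffff"),          # NAME
    ("", "\uffff"),          # FIRSTNAME
    (date.min, date.max),    # BIRTHDATE
    ("2000", "2000"),        # ZIPCODE
]

touched_ix3 = rows_touched("IX3", run_time_ranges)  # path fixed at bind time
touched_ix4 = rows_touched("IX4", run_time_ranges)  # path that would have fitted
print(touched_ix3, touched_ix4)
```

In this model the access path fixed at bind time (IX3) touches every row, while the index on the column that was actually filled in would have touched only the matching rows.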
Index, Stage1, Stage2
[Diagram: within DB2, a predicate is resolved in the Index, in the DM (Stage1) or in the RDS (Stage2)]
We'll get back to this slide later on. For now the purpose is to show that DB2
has different ways to resolve a where clause (predicate). The idea is to resolve a
predicate as cheaply as possible. This may require some changes in your code. If we
can code a stage2 predicate in such a way that it becomes stage1 or, even better,
indexable, we should always do so. In the next couple of slides we'll go into
what can be resolved where.
Indexable
Keep it positive and simple!
= : equal to
> : larger than
< : smaller than
>= : larger than or equal to
<= : smaller than or equal to
LIKE
IN
BETWEEN
MUST ALSO BE BOOLEAN
To slightly simplify things, you can use the phrase "keep it positive and simple".
Indexes can verify very simple predicates: basically all simple and positive predicates
can be evaluated within an index, provided they are Boolean.
A short reminder of what Boolean means: if a row that does not match the predicate
can be rejected regardless of any other predicate, the predicate is considered a
Boolean predicate.
Matching Columns
Matching columns is an indication of how well an index is used:
- more matching columns = better index use
- always start with the first index column
- on = and on one IN you can continue
Example: Index on (Name, Clientno, Salarycode)
Predicates (number of matching columns to be worked out for each):
1. Name = 'Smith' AND Clientno = 20 AND Salarycode = 56
2. Name = 'Smith' OR Clientno > 20 AND Salarycode = 56
3. Name IN ('Smith', 'Doe') AND Clientno > 20 AND Salarycode = 56
4. Name IN ('Smith', 'Doe') AND Salarycode > 0
5. Name <> 'Smith' AND Clientno = 56
6. Clientno = 56 AND Salarycode = 0
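The matching rules on this slide can be tried out with a short sketch. It assumes a deliberately simplified model: all predicates are ANDed and Boolean, each column has at most one predicate, a range predicate matches but ends the matching, and only one IN may be passed. The function name `matching_columns` and the dictionary representation are hypothetical, and the real optimizer knows more cases than this.

```python
# Operators that can match an index column but stop further matching.
RANGE_OPERATORS = {">", "<", ">=", "<=", "BETWEEN", "LIKE"}


def matching_columns(index_columns, predicates):
    """Count matching columns for ANDed Boolean predicates (simplified rules).

    predicates maps a column name to the operator used on it.
    """
    matched = 0
    in_used = False
    for column in index_columns:
        operator = predicates.get(column)
        if operator == "=":
            matched += 1                # equal: match and continue
        elif operator == "IN" and not in_used:
            matched += 1                # one IN: match and continue
            in_used = True
        elif operator in RANGE_OPERATORS:
            matched += 1                # range: match, then stop
            break
        else:
            break                       # no (indexable) predicate: stop
    return matched


INDEX = ["NAME", "CLIENTNO", "SALARYCODE"]
print(matching_columns(INDEX, {"NAME": "=", "CLIENTNO": "=", "SALARYCODE": "="}))
print(matching_columns(INDEX, {"NAME": "IN", "CLIENTNO": ">", "SALARYCODE": "="}))
print(matching_columns(INDEX, {"NAME": "IN", "SALARYCODE": ">"}))
print(matching_columns(INDEX, {"NAME": "<>", "CLIENTNO": "="}))
print(matching_columns(INDEX, {"CLIENTNO": "=", "SALARYCODE": "="}))
```

Under these simplified rules, predicates 1, 3, 4, 5 and 6 come out as 3, 2, 1, 0 and 0 matching columns. Predicate 2 contains an OR, so it is not Boolean and falls outside this model.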
Stage 1
Keep it positive and simple but no index!
= : equal to
> : larger than
< : smaller than
>= : larger than or equal to
<= : smaller than or equal to
LIKE
IN
BETWEEN
Keep it simple but not positive !
Keep it simple but not Boolean !
To slightly simplify things, you can use the phrase "keep it positive and simple but
no index".
If you have a very simple and positive where clause, but you don't have an index on
it, DB2 will resolve it on the data pages, in what we call the data manager, or
stage1.
If you have a predicate which is positive and has an index on it, but it is not
Boolean, then again DB2 will resolve it in stage1.
If you have a predicate which is in the list but negative, e.g. "not equal to", it is
resolved in stage1.
Stage 2
All the rest !!
All functions such as SUBSTR, CONCAT, CHAR()
Mismatching data types: Colchar_6 = '1234567'
Host variable checking: AND :HV1 = 5
Decryption
Current date between col1 and col2
Sorting
All functions by definition require more processing power than the data
manager is capable of providing, so they are resolved in stage 2.
These functions also include any arithmetic on a column, such as adding and
subtracting.
Mismatching data types are a bit more complex. As a general rule of thumb, when the
data type of the host variable doesn't match the data type of the column, the
predicate is stage2. This is cutting it a bit short; it is more correct to say that
the predicate is stage 2 if the host variable is bigger than the column data type.
Many exceptions exist, but it is best to use the correct data type.
Host variable checking is done in stage2. It should NEVER be done in SQL and
should always be done in COBOL.
In COBOL checking "stage 3"
NEVER (ab)use this!
All DB2 columns that CAN be checked in SQL SHOULD be checked in SQL
So BETTER a Stage2 predicate than NO predicate
[Diagram: checking IN COBOL sits outside DB2, after RDS (Stage2), DM (Stage1) and Index]
That stage2 is expensive doesn't mean that you should never use stage2 predicates.
If the only way to write the predicate on a COLUMN is as a stage2 predicate,
you should write it as stage2 rather than pass the row on to COBOL and check it
there, which is obviously even more expensive. If such a thing as "stage3" existed,
this would be it.
Index, Stage1, Stage2
[Diagram repeated: within DB2, a predicate is resolved in the Index, in the DM (Stage1) or in the RDS (Stage2)]
This time around you should understand this slide, and know that there are more and
less expensive ways of writing a query, depending on where DB2 can resolve its
where predicates, how many rows are filtered out as early as possible (in the index),
and how many are carried on to stage1 or even stage2.
SQL processing
DM (stage1)
1) matching index predicates (when the index is accessed)
2) other indexable stage 1 predicates (index screening)
3) non indexable stage 1 predicates on index pages
4) stage 1 predicates on the data
5) rows passed to RDS
RDS (stage2)
1) stage 2 predicates
2) sort
Selected rows passed to the user
DB2 always resolves its where predicates in the same manner.
First it resolves the matching index predicates, in the sequence of the index
columns.
Secondly it resolves all the screening predicates in the index.
Thirdly it resolves all non indexable stage 1 predicates that can be resolved in
the index pages.
Fourth, it resolves all stage1 predicates on the data.
Then all stage2 predicates are resolved, and lastly all returning rows are sorted.
Order of evaluating predicates
Within each of the above non index steps :
1) all equal predicates
2) all range predicates and col IS NOT NULL
3) all other predicates
Within each of the above sub-steps : the order in which they appear
Within all the non index steps of the previous slide, the same logic is followed.
E.g. step 4, stage1 on data pages:
First DB2 will resolve all equal predicates.
Secondly all range predicates.
Thirdly all the rest (e.g. not equal to).
Within each sub-step, the order in the SQL statement is followed. That means that if
we for example have two equal predicates that have to be resolved in the data pages,
DB2 will take their physical sequence in the SQL statement to determine the order in
which to resolve them.
We'll explain with a little example on the next slide.
Example
INDEX (C1, C3)

SQL                 Order  Resolved in
SELECT *
FROM MYTABLE
WHERE C1 > ?        1      index
AND :HV <> 5        6      stage2
AND C5 = ?          3      stage1
AND C4 = ?          4      stage1
AND DATE(C2) < ?    5      stage2
AND C3 = ?          2      index
ORDER BY C2         7      stage2
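The ordering rules of the last two slides can be replayed on this example with a small sketch. The stage and kind of each predicate are entered by hand here, and treating `C3 = ?` as index screening (because the range on C1 ends the matching) is an interpretation, not something the slide states. The names `STAGE_RANK`, `KIND_RANK` and `evaluation_order` are hypothetical.

```python
# Stage ranks follow the "SQL processing" slide, kind ranks follow the
# "Order of evaluating predicates" slide.
STAGE_RANK = {"matching index": 0, "index screening": 1, "stage1": 2, "stage2": 3}
KIND_RANK = {"equal": 0, "range": 1, "other": 2}

# (predicate text, where it is resolved, kind), in the order written in the SQL.
PREDICATES = [
    ("C1 > ?", "matching index", "range"),
    (":HV <> 5", "stage2", "other"),
    ("C5 = ?", "stage1", "equal"),
    ("C4 = ?", "stage1", "equal"),
    ("DATE(C2) < ?", "stage2", "range"),
    ("C3 = ?", "index screening", "equal"),
]


def evaluation_order(predicates):
    """Sort by stage, then by kind; ties keep the order written in the SQL."""
    # sorted() is stable, so equal keys stay in their original sequence.
    ranked = sorted(predicates, key=lambda p: (STAGE_RANK[p[1]], KIND_RANK[p[2]]))
    return [text for text, _stage, _kind in ranked]


ORDER = evaluation_order(PREDICATES)
for position, text in enumerate(ORDER, start=1):
    print(position, text)
```

The sort itself (ORDER BY C2) is not a predicate and always comes last, which is why it is left out of the list.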
Sort impact
Sort YES :
- at open cursor all rows are retrieved (e.g. 3000)
- fetch 20 rows to build first screen
- user found his info and aborts
RESULT : 2980 rows needlessly retrieved
Sort NO :
- at fetch time first row retrieved
- fetch 20 rows to build first screen
- user found his info and aborts
RESULT : ONLY 20 rows retrieved from DB2, no sort cost
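The difference between the two cases can be mimicked with a lazy row source. This is a sketch only: the generator stands in for a cursor, `sorted` stands in for the DB2 sort, and all function names are hypothetical.

```python
from itertools import islice


def cursor_rows(keys, counter):
    """Yield rows one by one, counting each row actually retrieved."""
    for key in keys:
        counter["retrieved"] += 1
        yield key


def first_screen_with_sort(keys, screen_size):
    counter = {"retrieved": 0}
    # A sort cannot return its first row before it has seen every row.
    ordered = sorted(cursor_rows(keys, counter))
    return ordered[:screen_size], counter["retrieved"]


def first_screen_without_sort(keys, screen_size):
    counter = {"retrieved": 0}
    # Rows already arrive in the wanted order, so we stop after one screen.
    screen = list(islice(cursor_rows(keys, counter), screen_size))
    return screen, counter["retrieved"]


unordered_keys = range(3000, 0, -1)   # needs a sort
ordered_keys = range(1, 3001)         # e.g. delivered in index order
screen_a, retrieved_with_sort = first_screen_with_sort(unordered_keys, 20)
screen_b, retrieved_without_sort = first_screen_without_sort(ordered_keys, 20)
print(retrieved_with_sort, retrieved_without_sort)
```

Both calls build the same first screen, but the sorted variant has to retrieve all 3000 rows before the first one can be shown.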
Select *
SELECT * almost never to be used
SELECT ONLY COLUMNS that are needed !
Reason :
- Program maintenance
- CPU cost per extra column
- SORT file becomes bigger
- Maybe not index only
Select *
Even for :
where exists (select *)
Better: where exists (select 1)
Select col5, ... where col5 = 'AB'
Better: Select 'AB', ... where col5 = 'AB'
Best: Select ... where col5 = 'AB'
Select col1, col2 ... order by col2
Better: Select col1 ... order by col2 (if col2 was selected just for the order by)
Other easy improvements

Instead of                      Use
:hv between col1 and col2       col1 <= :hv and col2 >= :hv
Substr(name,1,2) = 'MI'         name like 'MI%' (but be careful for '%MI%')
Substr(name,5,2) = 'MI'         denormalize, or V9 index on expression
COL_date <> '9999-12-31'        COL_date < '9999-12-31'
COL_int <> 0                    COL_int > 0
COL <> :hv                      COL in ( , , , , , )
COL not <= 5                    COL > 5
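Two of these rewrites are easy to get wrong, so it can help to check them on sample data. The sketch below uses Python's built-in sqlite3 purely as a convenient SQL engine to compare result sets; it says nothing about DB2 access paths, and the table `t` and its rows are made up for the test. By definition `:hv BETWEEN col1 AND col2` means `col1 <= :hv AND col2 >= :hv`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 5, 5), (3, 6, 9), (4, 7, 3), (5, None, 8)],
)


def ids(where, params=()):
    """Return the ids selected by a where clause, in id order."""
    sql = "SELECT id FROM t WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


hv = 5
between_form = ids("? BETWEEN col1 AND col2", (hv,))
range_form = ids("col1 <= ? AND col2 >= ?", (hv, hv))
swapped_form = ids("col1 >= ? AND col2 <= ?", (hv, hv))  # comparisons swapped

not_le_form = ids("NOT col1 <= 5")
gt_form = ids("col1 > 5")
print(between_form, range_form, swapped_form, not_le_form, gt_form)
```

The first two forms select the same rows, the form with swapped comparisons does not. `NOT col1 <= 5` and `col1 > 5` agree as well, including for the row where col1 is NULL.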
Other easy improvements
SELECT DISTINCT COL1, COL2, COUNT(C1)
FROM TABLE
WHERE ...
Always results in extra SORT (before V9)
SELECT COL1, COL2, COUNT(C1)
FROM TABLE
WHERE ...
GROUP BY COL1, COL2
Same results, SORT can be avoided
Before version 9, although the two queries are logically alike, there was a clear
difference between them.
Using DISTINCT would always result in an extra sort, whereas the second query could
avoid the sort with adequate indexing.
For instance, an index on (COL1, COL2) would have avoided a sort in the second query.
Since version 9, a query with the DISTINCT clause can also avoid the extra sort.
Another important change is that since V9 an index on (COL2, COL1) can also be used to
avoid an extra sort. That of course means that the sequence of your result set could
change, and an ORDER BY clause should be included if you want to guarantee the V8
sequence.
More easy improvements

Instead of                                    Use
Col1 = 'A' or Col1 = 'B'                      Col1 in ('A', 'B')
Col1 >= :hv1 and Col1 <= :hv2                 Col1 between :hv1 AND :hv2
Col1 = :hv1 or Col1 > :hv1 and Col2 = :hv2    Col1 >= :hv1 AND (Col1 = :hv1 or Col1 > :hv1 and Col2 = :hv2)
Col1 = :hva (always 5)                        Col1 = 5
:hv = 5                                       IN COBOL !!!
Even More easy improvements

Instead of                      Use
Col1 not between 10 and 50      col1 < 10 union all col1 > 50
Col1 not in ('A', 'B', 'C')     if possible, Col1 in (the rest); will be cheaper even when the list is bigger

Existence checking :
select 1
from table
where col1 = :hv
fetch first 1 row only
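That these rewrites return the same rows can be checked on sample data. As a minimal sketch, the block below uses Python's built-in sqlite3 only to compare result sets; it does not show any DB2 cost difference, and the table `t` and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    list(enumerate([5, 10, 30, 50, 51, 99, None], start=1)),
)


def ids(sql):
    """Return the selected ids as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# OR on one column versus IN
or_form = ids("SELECT id FROM t WHERE col1 = 10 OR col1 = 50")
in_form = ids("SELECT id FROM t WHERE col1 IN (10, 50)")

# two range predicates versus BETWEEN
range_form = ids("SELECT id FROM t WHERE col1 >= 10 AND col1 <= 50")
between_form = ids("SELECT id FROM t WHERE col1 BETWEEN 10 AND 50")

# NOT BETWEEN versus two positive ranges glued with UNION ALL
not_between_form = ids("SELECT id FROM t WHERE col1 NOT BETWEEN 10 AND 50")
union_all_form = ids(
    "SELECT id FROM t WHERE col1 < 10 "
    "UNION ALL SELECT id FROM t WHERE col1 > 50"
)
print(or_form, range_form, not_between_form)
```

UNION ALL is safe here because the two ranges cannot overlap, so no row can be returned twice.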
Determine Access Path
Optimization Service Center
- Newest generation of Visual Explain
Plan_table
- See next slide
- Might require some exercise
- Not everything in it
DSN_statement_table
- Contains the Cost columns
DB2 Plan_table
SELECT QBLOCKNO, PROGNAME, PLANNO, METHOD,
       TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM PLAN_TABLE WHERE QUERYNO = 30303
ORDER BY QBLOCKNO, PLANNO ;

QBLOCKNO  PROGNAME  PLANNO  METHOD  TNAME    ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY
1         DSNESM68  1       0       AATEHA1  I           2          AAX0EHA1    N
1         DSNESM68  2       1       AATEHB1  I           2          AAX0EHB1    N
1         DSNESM68  3       3                            0                      N

Qblockno: indicates the number of query blocks necessary to resolve the query
General rule: more blocks = less performing
Progname: represents the program/package name
Access path: planno, method
Planno: the number of steps AND the sequence in which a query is resolved
General rule: more steps = less performing
Method: expresses what kind of access is done
0 : First access
1 : Nested Loop Join
3 : extra sort needed
Tname: table name to be accessed
Access type: how that data is accessed
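Reading a plan_table becomes easier with a small decoder. The sketch below hard-codes the three rows shown in the plan_table output for QUERYNO 30303 and only knows the three METHOD values listed above. Reading ACCESSTYPE `I` as index access is the usual interpretation, and the helper name `describe` is hypothetical.

```python
# Only the METHOD values explained on the slide.
METHOD_MEANING = {0: "first access", 1: "nested loop join", 3: "extra sort needed"}

# (planno, method, tname, accesstype, matchcols, accessname, indexonly)
PLAN_ROWS = [
    (1, 0, "AATEHA1", "I", 2, "AAX0EHA1", "N"),
    (2, 1, "AATEHB1", "I", 2, "AAX0EHB1", "N"),
    (3, 3, "", "", 0, "", "N"),
]


def describe(row):
    """Turn one plan_table row into a readable sentence."""
    planno, method, tname, accesstype, matchcols, accessname, _indexonly = row
    meaning = METHOD_MEANING.get(method, "method " + str(method))
    text = f"step {planno}: {meaning}"
    if accesstype == "I":  # assumed to mean access through an index
        text += f" on {tname} via index {accessname}, {matchcols} matching column(s)"
    return text


DESCRIPTIONS = [describe(row) for row in PLAN_ROWS]
for line in DESCRIPTIONS:
    print(line)
```

Read this way, the example is a nested loop join of two tables, each accessed through an index with two matching columns, followed by an extra sort.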
DSN_Statement_Table
Amongst others :
COST_CATEGORY:
A: Indicates that DB2 had enough information to make a cost estimate without using
default values.
B: Indicates that some condition exists for which DB2 was forced to use default
values.
PROCMS:
The estimated processor cost, in milliseconds, for the SQL statement
PROCSU:
The estimated processor cost, in service units, for the SQL statement
DSN_PREDICAT_TABLE
Contains all predicates and how they are used.
Extremely useful for index design
Replaces the old spreadsheet technique
Access Path Follow Up
[Diagram labels: specific plan_tables; identify every query using QUERYNO; general plan_tables; new binds: plan_tables insert; changes: plan_tables; EMAIL; EXCEL; transfer; LAN]
It is also best to set up an automated way of following up your access path changes
and notifying your DBA and the responsible developers.
Multi Row fetch
Technique to save up to 60% of DB2 CPU
Easy to use
Fetches a rowset into an array
Program can control size of rowset
!! due to compiler limits !!
- elementary item : max. 16 MB
- complete working storage : max. 128 MB
Multi Row Fetch
To be able to use this, the cursor should be DECLAREd for rowset positioning, for
example:
    EXEC SQL
      DECLARE cursor-name CURSOR WITH ROWSET POSITIONING FOR
      SELECT column1
            ,column2 FROM table-name;
    END-EXEC
instead of
    EXEC SQL
      DECLARE cursor-name CURSOR FOR
      SELECT column1
            ,column2 FROM table-name;
    END-EXEC
Then you can FETCH multiple rows at a time from the cursor
Multi Row Fetch
On the FETCH statement the number of rows requested can be specified, for example:
    EXEC SQL
      FETCH NEXT ROWSET FROM cursor-name
      FOR :rowset-size ROWS
      INTO ...
    END-EXEC
instead of
    EXEC SQL
      FETCH cursor-name
      INTO ...
    END-EXEC
The rowset size can be defined as a constant or a variable, for example:
    01 rowset-size PIC S9(09) COMP-5.
Multi Row fetch
Do not use single and multiple row fetch for the same cursor in one program
Be aware of compiler limits
Last FETCH on a rowset can be incomplete
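The point about the incomplete last rowset can be seen in any API that fetches in batches. The sketch below is an analogy only: it uses `fetchmany` from Python's built-in sqlite3 module rather than DB2 embedded SQL, and the table, its 25 rows and `ROWSET_SIZE` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i, f"row{i}") for i in range(25)],
)

ROWSET_SIZE = 10
cursor = conn.execute("SELECT column1, column2 FROM t ORDER BY column1")

rowset_sizes = []
total_rows = 0
while True:
    rowset = cursor.fetchmany(ROWSET_SIZE)  # one call returns up to 10 rows
    if not rowset:
        break                               # nothing left to fetch
    # Process len(rowset) rows, not ROWSET_SIZE: the last rowset may be short.
    rowset_sizes.append(len(rowset))
    total_rows += len(rowset)

print(rowset_sizes, total_rows)
```

A program that always processes a full rowset would handle stale array entries after the last fetch, so the number of rows actually returned has to be checked on every fetch.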
Multi Row Fetch
Performance results may differ:
< 5 rows : poor performance (worse than before)
10 - 100 rows : best performance
> 100 rows : no improvement anymore
Following data is based upon treatment of 1 million rows (in seconds CPU).

                            Via row   Via rowset   Gain on DB2 in CPU seconds
FETCH                       16                     10 (-60%)
FETCH + UPDATE via row      76        66           10 (-15%)
FETCH + UPDATE via rowset   76        60           16 (-35%)
Performance results may differ, depending on the number of columns and
their data types, but mainly:
< 5 rows : poor performance (worse than before)
10 - 100 rows : best performance
> 100 rows : no further improvement (same as 10 - 100 rows)
Gain of 10 seconds of CPU per one million rows when using rowset pointers.
The data on the slide is based upon treatment of 1 million rows (in seconds CPU).
Sequences
Easy, fast and cheap way to generate unique numbers if :
- Holes are allowed
- The order isn't important
Use : NEXT VALUE FOR yy.xxxxxxxx
BASIC SYNTAX :
CREATE SEQUENCE yy.xxxxxxxx
  START WITH 1
  INCREMENT BY 1
  NO MINVALUE
  NO MAXVALUE
  NO CYCLE
  CACHE 200;
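Why holes must be allowed follows from the CACHE option: values are preallocated in memory, and preallocated values that were not handed out are lost when the cache is discarded. The class below is a simplified, hypothetical model of that behaviour, not DB2's implementation; `CachedSequence` and `restart` are invented names.

```python
class CachedSequence:
    """Toy model of a sequence that preallocates `cache` values at a time."""

    def __init__(self, start=1, increment=1, cache=200):
        self.increment = increment
        self.cache = cache
        self.next_uncached = start  # first value not yet preallocated
        self.cached = []            # preallocated values held in memory

    def next_value(self):
        if not self.cached:
            # Preallocate a new block of values in one go.
            self.cached = [
                self.next_uncached + i * self.increment for i in range(self.cache)
            ]
            self.next_uncached += self.cache * self.increment
        return self.cached.pop(0)

    def restart(self):
        """Simulate losing the cache: unused preallocated values are gone."""
        self.cached = []


seq = CachedSequence(start=1, increment=1, cache=200)
before_restart = [seq.next_value() for _ in range(3)]
seq.restart()
after_restart = seq.next_value()
print(before_restart, after_restart)
```

In this model the values 4 to 200 are never handed out: every number stays unique, but the series has a hole.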
Sequences
Effect of concurrency on elapsed time: better response times
[Chart: duration against amount of jobs, own table versus seq object]
Effect on cpu usage: less cpu need
[Chart: cpu against amount of jobs, own table versus seq object]
Questions ?
Kurt.struyf@cp.be
