
DB2 SQL TUNING

© 2009 Wipro Ltd - Confidential


Topics

• General Tuning Recommendation


• Predicates Evaluation
• Using DB2 EXPLAIN
• Different Access Types
• Join Methods
• DB2 Data and Index page prefetch
• Sorting of Data and RIDs
• Special Techniques to influence access path



General Recommendation

Make sure that:
• Queries are simple
• Unused rows and columns are not fetched
• There is no unnecessary ORDER BY or GROUP BY clause
• Lock duration is minimized
• There are no redundant predicates



Are there subqueries in a query?
If efficient indexes are available on the tables in the subquery, then a correlated subquery is
likely to be the most efficient kind of subquery.
If no efficient indexes are available on the tables in the subquery, then a noncorrelated
subquery would be likely to perform better.
If multiple subqueries are in any parent query, make sure that the subqueries are ordered in
the most efficient manner.
Example: Assume that MAIN_TABLE has 1000 rows:
SELECT * FROM MAIN_TABLE
WHERE TYPE IN (subquery 1) AND
PARTS IN (subquery 2);



Are there subqueries in a query? (continued)

• Assuming that subquery 1 and subquery 2 are the same type of subquery
(either correlated or noncorrelated) and the subqueries are stage 2, DB2
evaluates the subquery predicates in the order they appear in the WHERE
clause. Subquery 1 rejects 10% of the total rows, and subquery 2 rejects
80% of the total rows.
• The predicate in subquery 1 (which is referred to as P1) is evaluated 1000
times, and the predicate in subquery 2 (which is referred to as P2) is
evaluated 900 times, for a total of 1900 predicate checks. However, if the
order of the subquery predicates is reversed, P2 is evaluated 1000 times,
but P1 is evaluated only 200 times, for a total of 1200 predicate checks.
• Coding P2 before P1 appears to be more efficient if P1 and P2 take an
equal amount of time to execute. However, if P1 is 100 times faster to
evaluate than P2, then coding subquery 1 first might be advisable: with P1
costing 1 unit and P2 costing 100 units, P1 first costs
1000 × 1 + 900 × 100 = 91,000 units, whereas P2 first costs
1000 × 100 + 200 × 1 = 100,200 units.
• Because subquery predicates can potentially be thousands of times more
processor- and I/O-intensive than all other predicates, the order of subquery
predicates is particularly important.
• Regardless of coding order, DB2 performs noncorrelated subquery
predicates before correlated subquery predicates, unless the subquery is
transformed into a join.
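
To make the two subquery kinds concrete, here is a minimal sketch. The tables EMP(EMPNO, WORKDEPT) and DEPT(DEPTNO, LOCATION) and the literal 'PUNE' are hypothetical and serve only as illustration; which form performs better depends on the available indexes and should be confirmed with EXPLAIN.

```sql
-- Hypothetical tables: EMP(EMPNO, WORKDEPT) and DEPT(DEPTNO, LOCATION).

-- Noncorrelated subquery: it does not reference the outer query.
SELECT EMPNO
  FROM EMP
 WHERE WORKDEPT IN (SELECT DEPTNO
                      FROM DEPT
                     WHERE LOCATION = 'PUNE');

-- Correlated subquery: it references E.WORKDEPT from the outer row,
-- so an efficient index on DEPT(DEPTNO) matters.
SELECT E.EMPNO
  FROM EMP E
 WHERE EXISTS (SELECT 1
                 FROM DEPT D
                WHERE D.DEPTNO = E.WORKDEPT
                  AND D.LOCATION = 'PUNE');
```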



Does a query involve aggregate functions?

If a query involves aggregate functions, make sure that they are coded as simply as
possible; this increases the chances that they will be evaluated when the data is
retrieved, rather than afterward. In general, an aggregate function performs best
when evaluated during data access and next best when evaluated during DB2 sort.
Least preferable is to have an aggregate function evaluated after the data has been
retrieved.



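Does a query involve aggregate functions? (continued)

For aggregate functions to be evaluated during data retrieval, all of the
following conditions must be met:
• No sort is needed for GROUP BY. Check this in the EXPLAIN output.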
• No stage 2 (residual) predicates exist.
• No distinct set functions exist, such as COUNT(DISTINCT C1).
• If the query is a join, all set functions must be on the last table joined.
• All aggregate functions must be on single columns with no arithmetic
expressions.
• The aggregate function is not one of the following aggregate functions:
– STDDEV
– STDDEV_SAMP
– VAR
– VAR_SAMP
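
The conditions above can be illustrated with a small sketch. The table EMP(WORKDEPT, SALARY, BONUS) and an index on (WORKDEPT, SALARY) are assumptions made for this example; whether a sort is actually avoided must still be checked in the EXPLAIN output.

```sql
-- Hypothetical table EMP(WORKDEPT, SALARY, BONUS),
-- assumed to have an index on (WORKDEPT, SALARY).

-- Simple form: aggregates on single columns, no arithmetic, no DISTINCT.
SELECT WORKDEPT, MAX(SALARY), MIN(SALARY)
  FROM EMP
 GROUP BY WORKDEPT;

-- Forms that break the listed conditions:
SELECT WORKDEPT, MAX(SALARY + BONUS)      -- arithmetic expression inside the aggregate
  FROM EMP
 GROUP BY WORKDEPT;

SELECT WORKDEPT, COUNT(DISTINCT SALARY)   -- distinct aggregate function
  FROM EMP
 GROUP BY WORKDEPT;
```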
Does a query have an input variable in the predicate?
When host variables or parameter markers are used in a query, the
actual values are not known when you bind the package or plan that
contains the query. DB2 therefore uses a default filter factor to
determine the best access path for an SQL statement.



Does a query have a problem with column correlation?
Two columns in a table are said to be correlated if the values in the columns do not
vary independently. DB2 might not determine the best access path when your
queries include correlated columns.
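
As a sketch of the problem, assume a hypothetical table CUSTOMER(CUSTNO, CITY, STATE) in which almost every row with a given CITY has the same STATE. If the two predicates are treated as independent, their filter factors are multiplied and the combined filtering is overestimated. The two counting queries are one possible way to check for correlation: if the product of the two distinct counts is much larger than the number of distinct pairs, the columns are likely correlated.

```sql
-- Hypothetical table CUSTOMER(CUSTNO, CITY, STATE).
-- CITY and STATE do not vary independently.
SELECT CUSTNO
  FROM CUSTOMER
 WHERE CITY = 'FRESNO'
   AND STATE = 'CA';

-- Compare CITYCOUNT * STATECOUNT with the number of distinct (CITY, STATE) pairs.
SELECT COUNT(DISTINCT CITY)  AS CITYCOUNT,
       COUNT(DISTINCT STATE) AS STATECOUNT
  FROM CUSTOMER;

SELECT COUNT(*) AS PAIRCOUNT
  FROM (SELECT DISTINCT CITY, STATE
          FROM CUSTOMER) AS V1;
```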
Can a query be written to use a noncolumn expression?
The following predicate combines a column, SALARY, with values that are not from
columns on one side of the operator:
WHERE SALARY + (:hv1 * SALARY) > 50000
If you rewrite the predicate in the following way, DB2 can evaluate it more efficiently:
WHERE SALARY > 50000/(1 + :hv1)
In the second form, the column is by itself on one side of the operator, and all the
other values are on the other side of the operator. The expression on the right is
called a noncolumn expression. DB2 can evaluate many predicates with noncolumn
expressions at an earlier stage of processing called stage 1, so the queries take
less time to run.
Can materialized query tables help your query performance?
Dynamic queries that operate on very large amounts of data and involve multiple joins might
take a long time to run. One way to improve the performance of these queries is to generate the
results of all or parts of the queries in advance, and store the results in materialized query
tables.
Materialized query tables are user-created tables. Depending on how the tables are defined,
they are user-maintained or system-maintained. If subsystem parameters or
special registers set by an application tell DB2 to use materialized query tables,
then when DB2 executes a dynamic query, it uses the contents of applicable
materialized query tables if it finds a performance advantage in doing so.
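
A minimal sketch of a system-maintained materialized query table follows. The base table SALES(REGION, AMOUNT) and the name SALES_SUMMARY are hypothetical, and the subsystem settings that allow DB2 to consider such a table are assumed to be in place.

```sql
-- Hypothetical base table SALES(REGION, AMOUNT).
CREATE TABLE SALES_SUMMARY (REGION, TOTAL_AMOUNT) AS
  (SELECT REGION, SUM(AMOUNT)
     FROM SALES
    GROUP BY REGION)
  DATA INITIALLY DEFERRED
  REFRESH DEFERRED
  MAINTAINED BY SYSTEM
  ENABLE QUERY OPTIMIZATION;

-- Populate the materialized query table.
REFRESH TABLE SALES_SUMMARY;

-- Special register that lets dynamic queries consider deferred-refresh tables.
SET CURRENT REFRESH AGE ANY;

-- A dynamic query such as this one might then be answered from SALES_SUMMARY.
SELECT REGION, SUM(AMOUNT)
  FROM SALES
 GROUP BY REGION;
```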



Does the query contain encrypted data?
Encryption and decryption can degrade the performance of some queries.
Encryption, by its nature, degrades the performance of most SQL statements.
Decryption requires extra processing, and encrypted data requires more space in DB2.
If a predicate requires decryption, the predicate is a stage 2 predicate, which can
degrade performance.
To minimize performance degradation, use encryption only in cases that require
encryption. Creating indexes on encrypted data can improve performance in some cases.
Exact matches and joins of encrypted data (if both tables use the same encryption key
to encrypt the same data) can use the indexes that you create. Because encrypted data
is binary data, range checking of encrypted data requires table space scans. Range
checking requires all the row values for a column to be decrypted. Therefore, range
checking should be avoided, or at least tuned appropriately.
CREATE TABLE EMP (EMPNO VARCHAR(48) FOR BIT DATA, NAME VARCHAR(48));
CREATE TABLE EMPPROJ(EMPNO VARCHAR(48) FOR BIT DATA, PROJECTNAME
VARCHAR(48));
CREATE INDEX IXEMPPRJ ON EMPPROJ(EMPNO);



Does the query contain encrypted data? (continued)

Poor performance:
SELECT A.NAME, DECRYPT_CHAR(A.EMPNO) FROM EMP A, EMPPROJ B
WHERE DECRYPT_CHAR(A.EMPNO) = DECRYPT_CHAR(B.EMPNO) AND
B.PROJECTNAME = 'UDDI Project';

SELECT PROJECTNAME FROM EMPPROJ WHERE DECRYPT_CHAR(EMPNO) = 'A7513';

Good performance:
SELECT A.NAME, DECRYPT_CHAR(A.EMPNO) FROM EMP A, EMPPROJ B
WHERE A.EMPNO = B.EMPNO AND B.PROJECTNAME = 'UDDI Project';

SELECT PROJECTNAME FROM EMPPROJ WHERE EMPNO = ENCRYPT('A7513');



General Recommendation

• Try to use indexable predicates wherever possible
• Use a correlated subquery only if efficient indexes are available on the
tables in the subquery
• If there are multiple subqueries, make sure that they are ordered in the most
efficient manner



Order of Predicate Evaluation

Predicates are evaluated in the following sequence:

1. Indexable matching predicates – Index page
2. Indexable non-matching predicates (Index screening) – Index page
3. Other stage 1 predicates – Data page
4. Finally stage 2 predicates – After data page access
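
The sequence above can be sketched on a single query. The table T1(C1, C2, C3, C4, C5) and the index IX1 on (C1, C2, C3) are assumptions made for illustration, and the stage noted for each predicate is the typical classification; the actual behavior depends on the DB2 version and the access path that is chosen.

```sql
-- Assumed for illustration: table T1(C1, C2, C3, C4, C5) with index IX1 on (C1, C2, C3).
SELECT *
  FROM T1
 WHERE C1 = 10           -- 1. matching predicate on the leading index column
   AND C3 > 5            -- 2. index screening: C3 is in IX1, but C2 is not restricted
   AND C4 = 'A'          -- 3. stage 1, applied on the data page (C4 is not in IX1)
   AND C5 * 2 > 100;     -- 4. column expression, typically a stage 2 predicate
```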



Definition: Predicates are found in the clauses WHERE,
HAVING or ON of SQL statements;
they describe attributes of data. They are usually based on the
columns of a table and either qualify rows (through an index)
or reject rows (returned by a scan) when the table is accessed.
The resulting qualified or rejected rows are independent of the
access path chosen for that table.
Example: The following query has three predicates: an equal predicate on C1, a BETWEEN
predicate on C2, and a NOT LIKE predicate on C3.
SELECT * FROM T1
WHERE C1 = 10 AND
C2 BETWEEN 10 AND 20 AND
C3 NOT LIKE 'A%'
Properties of predicates
Predicates in a HAVING clause are not used when selecting access paths. Hence, the term
'predicate' means a predicate after WHERE or ON.
A predicate influences the selection of an access path because of:
1. Its type
2. Whether it is indexable
3. Whether it is stage 1 or stage 2
4. Whether it contains a ROWID column



Properties of predicates (continued)

Simple or compound
A compound predicate is the result of two predicates, whether
simple or compound, connected together by
AND or OR Boolean operators. All others are simple.
Local or join
Local predicates reference only one table. They are local to the
table and restrict the number of rows
returned for that table. Join predicates involve more than one table
or correlated reference. They determine
the way rows are joined from two or more tables.
Boolean term
Any predicate that is not contained by a compound OR predicate
structure is a Boolean term. If a Boolean
term is evaluated false for a particular row, the whole WHERE
clause is evaluated false for that row.
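
As an illustration of these definitions, the sketch below labels each predicate in one query. The tables T1(C1, C2, C3) and T2(C1, C4) are hypothetical.

```sql
-- Hypothetical tables T1(C1, C2, C3) and T2(C1, C4).
SELECT T1.C2, T2.C4
  FROM T1, T2
 WHERE T1.C1 = T2.C1                   -- simple, join predicate, Boolean term
   AND T1.C2 > 100                     -- simple, local predicate on T1, Boolean term
   AND (T1.C3 = 'A' OR T2.C4 = 'B');   -- compound predicate; the two predicates
                                       -- inside the OR are not Boolean terms
```

If T1.C2 > 100 is false for a row, the whole WHERE clause is false for that row. The same is not true of T1.C3 = 'A' alone, because the row can still qualify through T2.C4 = 'B'.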



DB2 EXPLAIN AND TUNING

• EXPLAIN is a monitoring tool that produces information about a plan,
package, or SQL statement when it is bound. The output appears in a
user-supplied table called PLAN_TABLE.
It helps you do the following:
• Design databases, indexes, and application programs
• Determine when to rebind an application
• Determine the access path chosen for a query
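
A minimal sketch of using EXPLAIN on a single statement follows. It assumes that a PLAN_TABLE already exists under the current authorization ID; the QUERYNO value 100 and the table T1 are arbitrary choices for illustration.

```sql
-- QUERYNO 100 is an arbitrary tag used to find the rows afterward.
EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT *
    FROM T1
   WHERE C1 = 10
     AND C2 BETWEEN 10 AND 20;

-- Read back the access path rows written for that statement.
SELECT QUERYNO, QBLOCKNO, PLANNO, MIXOPSEQ, METHOD, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
  FROM PLAN_TABLE
 WHERE QUERYNO = 100
 ORDER BY QBLOCKNO, PLANNO, MIXOPSEQ;
```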



DB2 EXPLAIN OUTPUT

• EXPLAIN output is stored in PLAN_TABLE
• Each plan is identified by the APPLNAME column
• Filter factor – percentage of rows selected
• Indexes used
• Cluster ratio – percentage of indexed rows in sequence with data rows



Type of Access

• Tablespace Scan
• Index scan
– Index Only Access (INDEXONLY = Y)
– Multiple index Scan (ACCESSTYPE=M,MI,MU,MX)
– Matching index scan (MATCHCOLS > 0)
– Non-Matching index scan ( MATCHCOLS = 0)
– One fetch access (ACCESSTYPE= I1)
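
The access types listed above can be read from PLAN_TABLE with a query along these lines. This is a sketch only: the QUERYNO value 100 is hypothetical, and the labels simply restate the slide's own mapping of ACCESSTYPE, MATCHCOLS, and INDEXONLY.

```sql
SELECT QUERYNO, QBLOCKNO, PLANNO, MIXOPSEQ, TNAME, ACCESSNAME,
       ACCESSTYPE, MATCHCOLS, INDEXONLY,
       CASE
         WHEN ACCESSTYPE = 'R'  THEN 'Table space scan'
         WHEN ACCESSTYPE = 'I1' THEN 'One-fetch access'
         WHEN ACCESSTYPE IN ('M', 'MX', 'MI', 'MU') THEN 'Multiple index access'
         WHEN ACCESSTYPE = 'I' AND MATCHCOLS > 0 THEN 'Matching index scan'
         WHEN ACCESSTYPE = 'I' AND MATCHCOLS = 0 THEN 'Non-matching index scan'
         ELSE 'Other'
       END AS ACCESS_DESCRIPTION
  FROM PLAN_TABLE
 WHERE QUERYNO = 100     -- hypothetical QUERYNO
 ORDER BY QBLOCKNO, PLANNO, MIXOPSEQ;
```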



Tablespace scan (ACCESSTYPE=R)

• Chosen when
– A large number of rows is returned
– The available indexes have a low cluster ratio
– No index is available
• Sequential prefetch is used (PREFETCH=S)



Using Index

• Define indexes based on how you want to access the data
• A properly defined (highly clustered) index can avoid a sort
• Sometimes using an index makes the query more costly. In such cases,
discourage the use of that index



Matching Index Scan

• Provides the best filtering possible
• Predicates are specified on either the leading or all index key columns
• MATCHCOLS gives the number of matching columns
• If there is more than one index, DB2 chooses the one with the best filter
factor
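
A short sketch of how MATCHCOLS relates to the predicates, assuming a hypothetical index IX1 on T1 (C1, C2, C3). The MATCHCOLS values in the comments are what would be expected if DB2 chooses IX1; they should be confirmed in PLAN_TABLE.

```sql
-- Assumed: index IX1 on T1 (C1, C2, C3).

-- Equal predicates on the two leading columns: expect MATCHCOLS = 2.
SELECT * FROM T1
 WHERE C1 = 10 AND C2 = 20;

-- Matching stops after the range predicate on C2: expect MATCHCOLS = 2,
-- with the predicate on C3 applied by index screening.
SELECT * FROM T1
 WHERE C1 = 10 AND C2 > 20 AND C3 = 5;
```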



Non-Matching Index scan

• Used when there are no predicates on the leading columns of the index but at
least one predicate is on an indexed column
• Such predicates are applied to the index entries (index screening), which
filters rows before the data pages are accessed
• MATCHCOLS = 0 and ACCESSTYPE = I
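
For example, assuming a hypothetical index IX1 on T1 (C1, C2, C3), the following query has no predicate on the leading column C1. If DB2 chooses IX1, the plan would be expected to show ACCESSTYPE = I with MATCHCOLS = 0.

```sql
-- Assumed: index IX1 on T1 (C1, C2, C3); C1 is not restricted.
SELECT C1, C2, C3
  FROM T1
 WHERE C3 = 5;
```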



One Fetch Access

• Used when a query can return the needed row with a single fetch
• Only one table in the query
• MIN or MAX column function
• No GROUP BY
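
A sketch of queries that could qualify, assuming a hypothetical ascending index on T1 (C1). Whether DB2 reports ACCESSTYPE = I1 should be checked in PLAN_TABLE.

```sql
-- Assumed: ascending index on T1 (C1).
SELECT MIN(C1) FROM T1;

SELECT MIN(C1) FROM T1
 WHERE C1 > 5;
```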



Index Only access

• Used when the required data can be taken from the index pages, with no need
to access the data pages
• Much more efficient than access that also reads data pages
• ACCESSTYPE = I AND INDEXONLY = Y
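
As an illustration, assume a hypothetical index IX1 on T1 (C1, C2). The first query references only indexed columns, so index-only access is possible; the second also needs C4, which is assumed not to be in the index.

```sql
-- Assumed: index IX1 on T1 (C1, C2); C4 is not part of the index.

-- All referenced columns are in IX1: index-only access is possible.
SELECT C1, C2
  FROM T1
 WHERE C1 = 10;

-- C4 must be read from the data pages: index-only access is not possible.
SELECT C1, C2, C4
  FROM T1
 WHERE C1 = 10;
```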



JOIN

• Retrieves rows from more than one table and combines them
• In application SQL, joins are coded as inner join, left outer join, right
outer join, or full outer join
• Internally, DB2 uses three join methods: nested loop join, merge scan join,
and hybrid join



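Nested Loop Join (Method = 1)

[Diagram: each row of the outer table is matched against the inner table on the join column]
Outer table (value, join column): (X, B), (Y, A), (Z, C)
Inner table (join column, value): (A, PC), (B, OE), (C, BPR)
Result: (X, OE), (Y, PC), (Z, BPR)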



Nested Loop Join (Method = 1)

Nested loop join is efficient when:
• The outer table is small
• The number of data pages accessed in the inner table is also small
• A highly clustered index is available on the join columns of the inner table
• Filtering on both tables (outer and inner) is high



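Merge Scan Join (Method = 2)

[Diagram: both tables are pre-sorted on the join column and then merged]
Outer table, pre-sorted (value, join column): (Y, A), (X, B), (Z, C)
Inner table, pre-sorted (join column, value): (A, PC), (B, OE), (C, BPR)
Result: (X, OE), (Y, PC), (Z, BPR)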



Merge Scan Join (Method = 2)

Merge scan join is used when:
• The numbers of qualifying rows in the inner and outer tables are large and the
join predicates do not provide much filtering
• The tables are large and have no indexes with matching columns



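Hybrid Join (Method = 4)

[Diagram: the pre-sorted outer table is joined to the inner table's index to obtain RIDs]
Outer table, pre-sorted (value, join column): (Y, A), (X, B), (Z, C)
Inner table index (join column, RID): (A, 5), (B, 30), (C, 10)
Intermediate result (outer value, RID): (Y, 5), (X, 30), (Z, 10)
Result: (X, OE), (Y, PC), (Z, BPR)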



Hybrid Join (Method = 4)

• Hybrid join is often used when a non-clustered index is available on the join
column of the inner table and there are duplicate qualifying rows in the outer
table.
• Hybrid join handles duplicates in the outer table efficiently, because the
inner table is scanned only once for each set of duplicate values.
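
To see which join method DB2 chose for each step, the METHOD column of PLAN_TABLE can be decoded as in the sketch below. The QUERYNO value 100 is hypothetical; the meanings of METHOD 0 and 3 are taken from the PLAN_TABLE column descriptions rather than from the slides.

```sql
SELECT QUERYNO, QBLOCKNO, PLANNO, TNAME, METHOD,
       CASE METHOD
         WHEN 0 THEN 'First table accessed'
         WHEN 1 THEN 'Nested loop join'
         WHEN 2 THEN 'Merge scan join'
         WHEN 3 THEN 'Additional sort (ORDER BY, GROUP BY, DISTINCT, UNION)'
         WHEN 4 THEN 'Hybrid join'
         ELSE 'Other'
       END AS JOIN_METHOD
  FROM PLAN_TABLE
 WHERE QUERYNO = 100     -- hypothetical QUERYNO
 ORDER BY QBLOCKNO, PLANNO;
```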



Sequential Prefetch (Prefetch=S)

• Sequential prefetch reads a sequential set of pages.
• The maximum number of pages read by a request issued from an application
program is determined by the size of the buffer pool used.
• Sequential prefetch is generally used for a table space scan.
• For an index scan that accesses 8 or more consecutive data pages, DB2
requests sequential prefetch at bind time. The index must have a cluster
ratio of 80% or above.



List Prefetch (Prefetch=L)

• List prefetch reads a set of data pages determined by a list of
RIDs taken from an index.
• Usually used with a single index that has a cluster ratio lower than 80%.
• Sometimes used on indexes with a high cluster ratio, if the amount of data to be
accessed is too small to make sequential prefetch efficient, but large
enough to require more than one regular read.
• Always used to access data by multiple index access or from the inner table
during a hybrid join.



Sequential Detection

• If DB2 does not choose prefetch at bind time, it can sometimes use prefetch at
execution time. This method is called sequential detection.
• If a table is accessed repeatedly using the same statement (for example, SQL
in a do-while loop), the data or index leaf pages of the table can be accessed
sequentially.
• DB2 can use this technique if it did not choose sequential prefetch at bind
time because of an inaccurate estimate of the number of pages to be accessed.



Sorting of data

• A sort can be performed on a new table or on the composite table.
• A sort is required by an ORDER BY or GROUP BY clause
(SORTC_GROUPBY/SORTC_ORDERBY = Y).
• A sort is required to remove duplicates when DISTINCT or UNION is used
(SORTC_UNIQ = Y).
• During nested loop join and hybrid join, the composite table is sorted; during
merge scan join, both tables might be sorted to make the join efficient
(SORTN_JOIN/SORTC_JOIN = Y).



Sorting of data

• A sort is needed for subquery processing. The result of the subquery is sorted
and put into a work file for later reference by the parent query.
• DB2 sorts RIDs into ascending page-number order to perform list
prefetch. This sort is very fast and is done entirely in memory.
• If a sort is required during cursor processing, it is done during OPEN
CURSOR. If the cursor is closed and reopened, the sort is performed again.
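
The sort indicators mentioned on these slides can be checked with a query such as the following sketch, which lists only the plan steps that involve a sort. The QUERYNO value 100 is hypothetical.

```sql
SELECT QUERYNO, QBLOCKNO, PLANNO, TNAME, METHOD,
       SORTN_JOIN, SORTC_JOIN, SORTC_UNIQ, SORTC_ORDERBY, SORTC_GROUPBY
  FROM PLAN_TABLE
 WHERE QUERYNO = 100     -- hypothetical QUERYNO
   AND (SORTN_JOIN    = 'Y'
     OR SORTC_JOIN    = 'Y'
     OR SORTC_UNIQ    = 'Y'
     OR SORTC_ORDERBY = 'Y'
     OR SORTC_GROUPBY = 'Y')
 ORDER BY QBLOCKNO, PLANNO;
```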



Some Special Techniques

• OPTIMIZE FOR n ROWS
• Reducing the number of matching columns for index scan
• Adding extra local predicates
• Changing inner join to outer join
• Updating Catalog Statistics
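
As a sketch of the first technique, the clause below tells DB2 that the application intends to fetch only a few rows, which can steer the optimizer toward an access path that avoids a sort. The table EMP(EMPNO, SALARY) and the value 10 are hypothetical, and the effect should be verified with EXPLAIN before and after the change.

```sql
-- Hypothetical table EMP(EMPNO, SALARY).
SELECT EMPNO, SALARY
  FROM EMP
 ORDER BY SALARY DESC
 OPTIMIZE FOR 10 ROWS;
```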



Risks

• There is no golden rule for DB2 SQL tuning.
• Incorrect analysis of performance data and access method information may
lead to additional performance overhead.
• When tuning SQL in a test environment, keep in mind that the amount of data
and the DB2 subsystem setup are not the same as in production.
• A person with good knowledge of DB2 should be involved in the tuning activity.
