Mike Walker
UCI
Consulting, Inc.
Phone: 1-888-UCI FOR U 1-888-824-3678 Fax: 1-609-654-0957 e-mail: mwalker@uci-consulting.com
Overview:
Onconfig settings
Disk/Table Layouts Fragmentation, etc
Reduce I/O
Reduce I/O performed by the engine
Reduce I/O between the back-end and the front-end (reduce the number of database operations)
Identify Problem Queries
Easier to spot
Easier to trace
Simplify Queries
Test on a machine with minimal system activity
Use a database that reflects production data
Number of rows and similar distributions
Want the same query plan
Want similar timings
Turn Set Explain on
Change configuration parameters
Turn PDQ on
Bounce the engine to clear the LRUs
What is the object of the query?
What is the information required?
What is the order criteria?
Identify the data types and indexes on the columns being:
Some constraints are enforced with indexes
Primary and Unique constraints may help identify when to expect a single row to be returned
Check constraints may hint at data distributions
Consider the number of rows examined vs. the number of rows returned
Determine the distribution of filter columns
dbschema -hd <tablename> -d <database> (if statistics exist on that column)
Select count with group
one-to-one
one-to-many
many-to-many
Examine the Set Explain output
Modify the query and/or schema (use directives to test various paths)
Run the query again
The query plan is written to the file sqexplain.out
The file is created in the current directory (UNIX)
If SQLEditor is used, the file will be in the home directory of the user that the SQL was executed as
The file is appended to each time more SQL is executed in the same session
For NT, look for a file called username.out in %INFORMIXDIR%\sqexpln on the server
Sometimes sqexplain.out will not be written to, even though a SET EXPLAIN ON statement has been executed
SELECT
In IDS 9.4
onmode -Y <sid> [0|1]
Set or unset dynamic explain
Creates a file called sqexplain.out.sid
May have issues

select * from stock where unit_price>20 order by stock_num

Estimated Cost: 3
Estimated # of Rows Returned: 5

1) informix.stock: INDEX PATH
    Filters: informix.stock.unit_price > 20
    (1) Index Keys: stock_num manu_code
Any Questions?
onstat -u
address  flags  sessid user    tty wait     tout locks nreads nwrites
1b062064 Y--P-- 44948  cbsread -   1c5f3cc8 0    1     0      0
1b06662c ---PX- 44961  cbsdba  -   0        0    0     2022   118008
1b067520 Y--P-- 39611  cbsuser -   1ecf6f00 0    1     5308   61240
onstat -g ntt
Individual thread network information (times):
netscb   thread name sid   open     read     write    address
1d380f00 sqlexec     44961 16:46:29 16:46:29 16:46:29
>date
Wed Apr

Current statement name : slctcur
Current SQL statement :
  select * from tab1, tab2 where tab1.a = tab2.b order by tab2.c
Last parsed SQL statement :
  select * from tab1, tab2 where tab1.a = tab2.b order by tab2.c
Indexing Schemes
[Figure: index tree showing Level 1 and Level 0 index pages above the DATA pages]
Columns used in joining tables
Columns used as filters
Columns used in ORDER BYs and GROUP BYs

Avoid highly duplicate columns
Keep key size small
Limit indexes on highly volatile tables
Use the FILLFACTOR option

Cost
Maintenance of indexes on Inserts, Updates & Deletes
Extra Disk Space
Estimated Cost: 2
Estimated # of Rows Returned: 15

1) informix.stock: INDEX PATH
    (1) Index Keys: stock_num manu_code (Key-Only)
        Lower Index Filter: informix.stock.stock_num = 190
Data Pages
stock_num, manu_code, qty
You may not see much advantage with Key-First indexes, although they may help somewhat, especially for large, wide tables
You can gain some benefit from adding additional columns to the end of the index to reduce the jumps from the index pages to the data pages
Evaluate adding a new index, or changing the index to include the key-first column earlier in the index
Table Joins
Joining Tables
Consider the following query:
select * from stock, items
where stock.stock_num = items.stock_num
and items.quantity>10
Read from A then find matching rows in B
Read from B then find matching rows in A

A then B
1,000 reads from A
For each A row do an index scan into B (4 reads)
Total reads: 5,000 (1,000 for A + 1,000 x 4 for B)

B then A
50,000 reads from B
For each B row do an index scan into A (3 reads)
Total reads: 200,000 (50,000 for B + 50,000 x 3 for A)

A then B
1,000 reads from A
For each A row do an index scan into B (4 reads)
Total reads: 5,000 (1,000 for A + 1,000 x 4 for B)
Total Rows Returned: 10

B then A
Index scan of B (3 reads), then the data (10 reads) for a total of 13
For each B row do an index scan into A (3 reads)
Total reads: 43 (13 for B + 10 x 3 for A)
Total Rows Returned: 10
General Rule: The table which returns the fewest rows, either through a filter or the row count, should be first.
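The read counts above can be reproduced with a small model. This is only a sketch of the arithmetic on the slides, not of how the engine costs a plan; the function name nested_loop_reads and its parameters are made up for illustration.

```python
def nested_loop_reads(outer_reads, outer_rows, inner_reads_per_row):
    """Reads needed to fetch the outer rows, plus one inner index scan per outer row."""
    return outer_reads + outer_rows * inner_reads_per_row


# No filter: every row of the outer table drives a probe into the inner table.
a_then_b = nested_loop_reads(outer_reads=1_000, outer_rows=1_000, inner_reads_per_row=4)
b_then_a = nested_loop_reads(outer_reads=50_000, outer_rows=50_000, inner_reads_per_row=3)

# With a filter on B that leaves 10 rows: 3 index reads + 10 data reads = 13 reads on B.
a_then_b_filtered = nested_loop_reads(outer_reads=1_000, outer_rows=1_000, inner_reads_per_row=4)
b_then_a_filtered = nested_loop_reads(outer_reads=13, outer_rows=10, inner_reads_per_row=3)

print(a_then_b, b_then_a, a_then_b_filtered, b_then_a_filtered)
```

The filter flips the preferred order: without it A should drive the join, with it B should, which is the general rule stated above.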
Optimizer Directives
Change the generated query plan by removing paths from consideration
Similar to Oracle's HINTs
Better than HINTs
select --+ORDERED
* from A, B
where A.join_col = B.join_col
With the directive, ORDERED, the optimizer only considers paths that read from A then B. The lowest cost is then chosen from those paths.

A then B
Seq A, Seq B Cost:100
Seq A, Idx B Cost:50
Idx A, Idx B Cost:20 (with the directive, this path would be chosen)
etc.

B then A
Seq B, Seq A Cost:100
Seq B, Idx A Cost:50
Idx B, Idx A Cost:10 (normally, this path would be chosen)
etc.
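The effect of ORDERED on plan selection can be sketched as "filter the candidate paths, then take the cheapest". The path list below is copied from the slide; the choose_plan function is a hypothetical illustration, not the optimizer's actual algorithm.

```python
# (join order, access methods, estimated cost) as listed on the slide
paths = [
    ("A then B", "Seq A, Seq B", 100),
    ("A then B", "Seq A, Idx B", 50),
    ("A then B", "Idx A, Idx B", 20),
    ("B then A", "Seq B, Seq A", 100),
    ("B then A", "Seq B, Idx A", 50),
    ("B then A", "Idx B, Idx A", 10),
]


def choose_plan(candidates, join_order=None):
    """Pick the lowest-cost path, optionally restricted to one join order."""
    allowed = [p for p in candidates if join_order is None or p[0] == join_order]
    return min(allowed, key=lambda p: p[2])


normal_plan = choose_plan(paths)                         # no directive
ordered_plan = choose_plan(paths, join_order="A then B")  # --+ORDERED with "from A, B"

print(normal_plan)
print(ordered_plan)
```

The directive does not pick a plan directly; it removes the B-then-A paths, so the cheapest remaining path costs 20 rather than 10.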
SELECT --+ directive text
SELECT {+ directive text }
UPDATE --+ directive text
UPDATE {+ directive text }
DELETE --+ directive text
DELETE {+ directive text }
SELECT /*+directive*/
Types of Directives
Access Methods
Join Methods
Join Order
Optimization Goal
Query Plan Only (IDS 9.3)
Correlated Subquery Flattening (IDS 9.3)

index
avoid_index
full
avoid_full
ordered
first_rows (N) tells the optimizer to choose a plan optimized to return the first N rows of the result set
all_rows tells the optimizer to choose a plan optimized to return all of the results

OPT_GOAL environment variable (environment level)
SET OPTIMIZATION statement (session level)
FIRST_ROWS, ALL_ROWS
use_nl: forces nested loop join on specified tables
use_hash: forces hash join on specified tables
avoid_nl: avoids nested loop join on specified tables
avoid_hash: avoids hash join on specified tables
customer.lname, orders.order_num, items.total_price
from customer, orders, items
where customer.customer_num = orders.customer_num
and orders.order_num = items.order_num
and items.stock_num = 6
and items.manu_code = "SMT"

3) items: INDEX PATH
    Filters: items.order_num = orders.order_num
    (1) Index Keys: stock_num manu_code
        Lower Index Filter: (items.stock_num = 6 AND items.manu_code = 'SMT' )
NESTED LOOP JOIN

2) orders: SEQUENTIAL SCAN
DYNAMIC HASH JOIN (Build Outer)
    Dynamic Hash Filters: c.customer_num = o.customer_num

3) items: INDEX PATH
    Filters: i.order_num = o.order_num
    (1) Index Keys: stock_num manu_code
        Lower Index Filter: (i.stock_num = 6 AND i.manu_code = 'SMT' )
NESTED LOOP JOIN
EXPLAIN AVOID_EXECUTE
Generate the query plan (SQL Explain output), but don't run the SQL
Introduced in IDS 9.3
Especially useful for getting the query plans for Inserts, Updates and Deletes: you no longer have to rewrite them as Select statements, or surround them with BEGIN WORK ... ROLLBACK WORK commands
With AVOID_EXECUTE:
SET EXPLAIN ON;
DELETE /*+ EXPLAIN AVOID_EXECUTE */ FROM x WHERE y=10;
The Delete will NOT be performed, but the execution plan will be written

Pros:
Force the engine to execute the SQL the way that we want
Sometimes we know better!!
Great for testing different plans
Cons:
Force the engine to execute the SQL the way that we want
Sometimes the engine knows better!!
If new indexes are added, the number of rows changes drastically, or the data distributions change, then a better execution plan may be available
Optimization Techniques
Use Composite Indexes
Use Index Filters
Create indexes for Key-Only scans
Perform indexed reads for sorting
Use the CASE/DECODE statements to combine multiple selects
Drop and recreate indexes for large modifications
Use Non Logging Tables
Use OUTER JOINS
Prepare and Execute statements
Composite indexes are ones built on more than one column
The optimizer uses the leading portions of a composite index for filters, join conditions and sorts
A composite index on columns a, b and c will be used for selects involving:
  column a
  columns a and b
  columns a, b and c
It will not be used for selects involving only columns b and/or c, since those columns are not at the beginning of the index (i.e. the leading portion)
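The leading-portion rule can be expressed as a prefix check. This is a deliberately simplified sketch that only considers which columns appear in the filters; the function name usable_prefix is hypothetical, and a real optimizer also weighs the operator type, statistics, and cost.

```python
def usable_prefix(index_columns, filter_columns):
    """Return the leading index columns that the filters can use, in index order."""
    prefix = []
    for column in index_columns:
        if column not in filter_columns:
            break  # a gap ends the leading portion
        prefix.append(column)
    return prefix


index_abc = ["a", "b", "c"]

print(usable_prefix(index_abc, {"a"}))            # ['a']
print(usable_prefix(index_abc, {"a", "b"}))       # ['a', 'b']
print(usable_prefix(index_abc, {"a", "b", "c"}))  # ['a', 'b', 'c']
print(usable_prefix(index_abc, {"b", "c"}))       # [] : index not usable
```

An empty result corresponds to the case on the slide where only columns b and/or c are filtered.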
Indexed Read
select qty from stock where stock_num = 190 and manu_code = 10
Table stock: 100,000 rows
stock_num = 190: 10,000 rows
stock_num = 190 AND manu_code = 10: 100 rows
Index (stock_num)
Index Pages
stock_num
Data Pages
stock_num, manu_code, qty
Composite Key
select qty from stock where stock_num = 190 and manu_code = 10
Table stock: 100,000 rows
stock_num = 190: 10,000 rows
stock_num = 190 AND manu_code = 10: 100 rows
Data Pages
stock_num, manu_code, qty
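The row counts on these two slides suggest how much work each index saves. The sketch below assumes, for illustration only, that an index on stock_num alone fetches every data row with stock_num = 190 and filters manu_code afterwards, while a composite index on (stock_num, manu_code) fetches only the matching rows. The variable names are made up.

```python
table_rows = 100_000          # rows in stock
rows_stock_num_190 = 10_000   # rows where stock_num = 190
rows_both_filters = 100       # rows where stock_num = 190 AND manu_code = 10

# Index on (stock_num): every stock_num = 190 row is fetched, then filtered on manu_code.
examined_single_index = rows_stock_num_190
# Composite index on (stock_num, manu_code): only the qualifying rows are fetched.
examined_composite_index = rows_both_filters

discarded_rows = examined_single_index - rows_both_filters
reduction_factor = examined_single_index // examined_composite_index

print(discarded_rows, reduction_factor)
```

Under these assumptions the single-column index examines 100 times as many data rows as it returns.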
SELECT * FROM xyz
WHERE begin_idx >= 99 AND end_idx <= 150
The leading portion of the index, column begin_idx, will be used.
Data for the select list is read from the index key -- No read of the data page is needed
Useful for inner tables of nested-loop joins
Useful for creating a sub-table for very wide tables
Key-Only Scans
select unique tab2.c from tab1, tab2
where tab1.a = 1 and tab1.b = tab2.b

1) cbsdba.tab1: INDEX PATH
    (1) Index Keys: a
        Lower Index Filter: cbsdba.tab1.a = 1
2) cbsdba.tab2: INDEX PATH
    (1) Index Keys: b
        Lower Index Filter: cbsdba.tab2.b = cbsdba.tab1.b
NESTED LOOP JOIN
1,000 reads from tab1 Index Pages
1,000 jumps to tab1 Data Pages
1,000 reads from tab1 Data Pages
For each of these:
  1,000 reads from tab2 Index Pages
  1,000 jumps to tab2 Data Pages
  1,000 reads from tab2 Data Pages
Timing: 50 seconds
Key-Only Scans
create index tab1_idx on tab1(a);
create index tab2_idx on tab2(b);
Change Indexes
Key-Only Scans
select unique tab2.c from tab1, tab2
where tab1.a = 1 and tab1.b = tab2.b

1) cbsdba.tab1: INDEX PATH
    (1) Index Keys: a b (Key-Only)
        Lower Index Filter: cbsdba.tab1.a = 1
2) cbsdba.tab2: INDEX PATH
    (1) Index Keys: b c (Key-Only)
        Lower Index Filter: cbsdba.tab2.b = cbsdba.tab1.b
NESTED LOOP JOIN
1,000 reads from tab1 Index Pages
For each of these:
  1,000 reads from tab2 Index Pages
Timing: 35 seconds
Indexed reads cause rows to be read in the order of the indexed columns
Higher priority is given to indexes on columns used as filters

Columns in the sort criteria are not in the index
Columns in the sort criteria are in a different order than the index
Columns in the sort criteria are from different tables
Note: As of Informix Dynamic Server v7.31 this is done automatically by the optimizer
Useful for batch reporting
Avoid selecting a subset of data repetitively from a larger table
Create summary information that can be joined to other tables
Disadvantage
The data in the temporary table is a copy of the real data and therefore is not changed if the original data is modified.
ORs can cause the optimizer to not use indexes
Complex where conditions can cause the optimizer to use the wrong index
Note: Informix Dynamic Server v7.3 allows UNIONs in views

select sum(qty) from log where trans_id = 1 and sku = ? and date_time > ?
UNION
...
select sum(qty) from log where trans_id = 4 and sku = ? and date_time > ?

select sum(qty) from log where sku = ? and trans_id in ( 1, 2, 3, 4) and date_time > ?

Uses the composite index
Earlier versions of Informix still may not use the composite index
Sequential scans of large tables are resource intensive: use light scans if possible
Sequential scans of small tables are not harmful
Consider using permanent indexes to avoid sequential scans when possible
Create temporary indexes for batch reporting
Replace Auto Indexes with real indexes
On a loosely related topic: consider changing the order of columns for Key-First Scans

Very efficient way to sequentially scan a table
Go straight to disk, avoid the buffer pool
[Figure: with light scans, the database engine reads straight from disk; otherwise reads pass through the buffers (LRUs)]
Only used when sequentially scanning a table
The table is bigger than the buffer pool
PDQ must be on (SET PDQPRIORITY)
Dirty read isolation (SET ISOLATION TO DIRTY READ) or no logging
Monitor using onstat -g lsc

Good to use when joining a large number of rows from multiple tables
The typical join is NESTED LOOP, which is costly because the index scan is done over and over
Builds a hash table in memory for one table, then scans the second table and hashes into memory
PDQ must be turned on
DS_TOTAL_MEMORY should be set high
This will join 250,000 header records with 2,500,000 line records. With a nested loop join, the database will do an index read into the line table 250,000 times.
This will read the 10 million line records and put them in a hash table, then the header table will be read from and the hash table will be used to do the join.
A better option might be to put an ordered directive and change the order of the from clause so the 250,000 header records are put in the hash table. It depends on the memory available to PDQ.
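The build-then-probe idea behind a hash join can be sketched in a few lines. This is a generic in-memory illustration, not Informix's implementation; the hash_join function and the header/line sample rows are hypothetical. It builds the hash table on the smaller input (headers), as suggested above, and scans the larger input (lines) once.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Build a hash table on build_rows, then probe it once per row of probe_rows."""
    table = defaultdict(list)
    for row in build_rows:              # build phase: one pass over the smaller input
        table[row[key]].append(row)

    joined = []
    for row in probe_rows:              # probe phase: one pass over the larger input
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


headers = [{"order_id": 1, "customer": "acme"}, {"order_id": 2, "customer": "zenith"}]
lines = [
    {"order_id": 1, "sku": "X"},
    {"order_id": 1, "sku": "Y"},
    {"order_id": 2, "sku": "Z"},
    {"order_id": 3, "sku": "Q"},   # no matching header, so it is dropped
]

result = hash_join(headers, lines, "order_id")
print(len(result))
```

Each input is read once, in contrast to a nested loop join, which repeats an index scan of the inner table for every outer row. The memory needed grows with the build side, which is why the choice of build table depends on the memory available to PDQ.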
output to /dev/null
select unique tab2.c from tab1, tab2
where tab1.a = 1 and tab1.b = tab2.b
Hash Joins
select /*+FULL(tab1) FULL(tab2)*/ unique tab2.c from tab1, tab2 where tab1.a = 1
Timing: 6 seconds
Includes generating the explain plan!
Compare with 35 seconds for the Key-Only Scan
CASE Syntax:
CASE
  WHEN condition THEN expr
  WHEN condition THEN expr
  ELSE expr
END
DECODE Syntax:
DECODE( expr, when_expr, then_expr, ..., else_expr )

update customer set preferred = case when stat = 'A' then 'Y' else 'N' end
OR
DECODE( stat, 'A', 'Y', 'N' )
1 SQL Statement
OR
select SUM( case when stat = 'A' then 1 else 0 end ),
       SUM( case when stat = 'I' then 1 else 0 end ),
       SUM( case when stat = 'D' then 1 else 0 end )
from customer
OR
select SUM( DECODE( stat, 'A', 1, 0 ) ),
       SUM( DECODE( stat, 'I', 1, 0 ) ),
       SUM( DECODE( stat, 'D', 1, 0 ) )
from customer
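A quick way to convince yourself that one CASE query returns the same counts as three separate selects is to run both against a small table. The sketch below uses Python's built-in sqlite3 module as a stand-in for Informix (an assumption made purely so the example is runnable; DECODE is not available in SQLite, so only the CASE form is shown). The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, stat TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "I"), (4, "D"), (5, "A")],
)

# Three SQL statements: one pass over the table per status.
separate = tuple(
    conn.execute("SELECT COUNT(*) FROM customer WHERE stat = ?", (s,)).fetchone()[0]
    for s in ("A", "I", "D")
)

# One SQL statement: a single pass computes all three counts.
combined = conn.execute(
    """
    SELECT SUM(CASE WHEN stat = 'A' THEN 1 ELSE 0 END),
           SUM(CASE WHEN stat = 'I' THEN 1 ELSE 0 END),
           SUM(CASE WHEN stat = 'D' THEN 1 ELSE 0 END)
    FROM customer
    """
).fetchone()

print(separate, combined)
```

The saving comes from scanning the table once instead of three times, and from one round trip between front-end and back-end instead of three.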
Things to note:
If you get an error about creating an index on a variant function, you may be trying to use a built-in function, or you did not create the function with the NOT VARIANT clause.
SET EXPLAIN shows the index being used.
There is overhead with this type of index.
Index creation is not done in parallel if the function is not PARALLELIZABLE.
SPL is not PARALLELIZABLE; only external functions written in C or Java are.

Eliminates the overhead of maintaining indexes during modification
Indexes are recreated more efficiently
Indexes can deteriorate over time
Use PDQPRIORITY for faster creation

Disadvantage
Must have exclusive access to the table before doing this!
Locking the table may not be sufficient!
3-tier architecture can make this an even bigger pain!
In XPS, and introduced in IDS 7.31
Inserts, Updates and Deletes against rows in a table are logged
For large operations this could produce significant overhead
Create the table as RAW, or change it to RAW for the duration of the operation, and the operations will not be logged
Do Load
ALTER TABLE big_table TYPE (STANDARD);
Create indexes
Cannot have indexes on a RAW Table!!
Ouch!
Brilliant!
PREPARE p1 FROM "INSERT INTO some_table VALUES ( ?, 10 )"
FOR x = 1 to 1000
  EXECUTE p1 USING x
END FOR
Once Only!
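The same prepare-once, execute-many pattern can be tried outside 4GL. The sketch below uses Python's sqlite3 module as a stand-in for Informix (an assumption for runnability only). Python's sqlite3 keeps a per-connection cache of compiled statements keyed by SQL text, so reusing one parameterized string means the statement is parsed once and executed 1,000 times with different values. The table layout is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (x INTEGER, y INTEGER)")

# One SQL text with a placeholder: compiled on first use, then reused from the cache.
insert_sql = "INSERT INTO some_table VALUES (?, 10)"

with conn:  # commit once at the end of the batch
    for x in range(1, 1001):
        conn.execute(insert_sql, (x,))

row_count = conn.execute("SELECT COUNT(*) FROM some_table").fetchone()[0]
max_x = conn.execute("SELECT MAX(x) FROM some_table").fetchone()[0]
print(row_count, max_x)
```

Building a new SQL string with the value embedded on every iteration would defeat this, because each distinct text has to be parsed and optimized again.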
Really should be using PDQ for batch processes and reporting
Enable for index builds (also Light Scans & Hash Joins)
Set DS_TOTAL_MEMORY as high as you can spare
  set in the config file or with onmode -M
Use MAX_PDQPRIORITY to set the maximum priority that any single session is permitted
  set in the config file or with onmode -D
Use SET PDQPRIORITY n to set the PDQ priority for a session
  or set it in the environment (e.g. export PDQPRIORITY=80)
Xtree
Xwindows interface
Only works with an Xwindows terminal
Need the Xwindows libraries set up
Name
This part of the window is called the display window and shows the information about what is happening in the query.

Each of these boxes (nodes) designates an operation of the query: sort, group, filter, scan.

This number represents the number of rows that have been passed to the node above.

This part of the window displays the entire query tree. If the tree is too big for the display window (to the right), a black box will appear which can be dragged to scroll to different parts of the tree which are displayed in the display window.

This number represents the number of rows examined per second. The little speedometer is a graphical representation of this number. The number is occasionally negative, which could be because it is a 2-byte integer and when it gets too high (i.e., too fast) it displays as a negative.
Example
SET EXPLAIN ON; (need the explain plan to interpret the xtree display)
SELECT A.DSCNT_DUE_DT, A.SCHEDULED_PAY_DT, A.PYMNT_GROSS_AMT, B.GROSS_AMT_BSE, A.DSCNT_PAY_AMT
FROM PS_PYMNT_VCHR_XREF A, PS_VOUCHER B, PS_VENDOR C, PS_VENDOR_ADDR D, PS_VENDOR_PAY E
WHERE A.BUSINESS_UNIT = B.BUSINESS_UNIT
AND A.VOUCHER_ID = B.VOUCHER_ID
AND A.REMIT_SETID = C.SETID
AND A.REMIT_VENDOR = C.VENDOR_ID
AND A.REMIT_SETID = D.SETID
AND A.REMIT_VENDOR = D.VENDOR_ID
AND A.REMIT_ADDR_SEQ_NUM = D.ADDRESS_SEQ_NUM
AND D.EFF_STATUS = 'A'
AND . . .
Correlated Sub-Queries
These are examples of non-correlated sub-queries. The performance of these two should be the same.
select c.* from customers c, orders o
where c.custid = o.custid
and o.stat = 'OPEN'

select c.* from customers c
where custid in ( select custid from orders o where o.stat = 'OPEN' )

Outer query referenced in Inner query
Inner query must be repeated for each row returned by the Outer query
The sub-query, on orders, is executed for every row retrieved from customers.
If customers table had 100,000 rows, the sub-query would get executed 100,000 times.
If orders only had 20 rows with stat=OPEN the database would be doing a lot of extra work.
Correlated Sub-queries
update customers set stat = 'A'
where exists ( select 'X' from orders o
               where o.custid = customers.custid
               and o.cmpny = customers.cmpny
               and o.stat = 'OPEN' )

The original CSQ is left since it was joining on more than one column

update customers set stat = 'A'
where exists ( select 'X' from orders o
               where o.custid = customers.custid
               and o.cmpny = customers.cmpny
               and o.stat = 'OPEN' )
and custid in ( select custid from orders o where o.stat = 'OPEN' )
Add this condition to reduce the number of times the subquery is executed
If orders has only 20 rows meeting the filter, the second version of the update runs much faster, assuming that customers has an index on the column custid.
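The extra IN condition is only safe if it never changes which rows are updated. The sketch below checks that on a tiny data set, using Python's sqlite3 module as a stand-in for Informix (an assumption for runnability; it says nothing about Informix timings). The sample rows and the run_update helper are invented.

```python
import sqlite3

ORIGINAL = """
    UPDATE customers SET stat = 'A'
    WHERE EXISTS (SELECT 'X' FROM orders o
                  WHERE o.custid = customers.custid
                    AND o.cmpny = customers.cmpny
                    AND o.stat = 'OPEN')
"""
# Same statement plus the non-correlated pre-filter on custid.
REWRITTEN = ORIGINAL + " AND custid IN (SELECT custid FROM orders o WHERE o.stat = 'OPEN')"


def run_update(update_sql):
    """Load the sample data into a fresh database, apply the update, return customers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (custid INTEGER, cmpny TEXT, stat TEXT);
        CREATE TABLE orders (custid INTEGER, cmpny TEXT, stat TEXT);
        INSERT INTO customers VALUES (1,'X','N'), (1,'Y','N'), (2,'X','N'), (3,'X','N');
        INSERT INTO orders VALUES (1,'X','OPEN'), (2,'X','CLOSED'), (3,'Y','OPEN');
    """)
    conn.execute(update_sql)
    return conn.execute(
        "SELECT custid, cmpny, stat FROM customers ORDER BY custid, cmpny"
    ).fetchall()


original_result = run_update(ORIGINAL)
rewritten_result = run_update(REWRITTEN)
print(original_result == rewritten_result)
```

Any customer row that satisfies the EXISTS condition necessarily has a custid with an open order, so the IN condition can only discard rows that the correlated subquery would have rejected anyway.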
QUERY:
update orders set ship_charge = 0
where exists ( select "X" from customer c
               where c.customer_num = orders.customer_num
               and c.state = "MD" )

1) informix.orders: SEQUENTIAL SCAN
    Filters: EXISTS <subquery>

Here's the join between the inner and outer tables. Yuk!

1) informix.c: INDEX PATH
    Filters: informix.c.state = 'MD'
    (1) Index Keys: customer_num
        Lower Index Filter: c.customer_num = orders.customer_num

    (1) Index Keys: customer_num
        Lower Index Filter: orders.customer_num = ANY <subquery>
Subquery:
---------
1) informix.c: SEQUENTIAL SCAN
    Filters: informix.c.state = 'MD'
Look, no join!! Yippee!!

2) informix.orders: INDEX PATH
    (1) Index Keys: customer_num
        Lower Index Filter: orders.customer_num = c.customer_num
NESTED LOOP JOIN
Does this indicate that Subquery Flattening is not necessarily a good thing ????
business_unit = 'ABC'
Then why don't we just apply the same filter in the subquery?
becomes
Correlated Subquery
    (1) Index Keys: process_instance business_unit
        Lower Index Filter: (ps_jrnl_ln.business_unit = 'ABC' AND ps_jrnl_ln.process_instance = 5960 )

Subquery:
---------
(Filter condition of outer query has been applied here)
1) ps_bus_unit_tbl_gl: INDEX PATH
    (1) Index Keys: business_unit (Key-Only)
        Lower Index Filter: ps_bus_unit_tbl_gl.business_unit = 'ABC'
2) ps_bus_unit_tbl_fs: INDEX PATH
    (1) Index Keys: business_unit descr (Key-Only)
        Lower Index Filter: ps_bus_unit_tbl_fs.business_unit = ps_bus_unit_tbl_gl.business_unit
NESTED LOOP JOIN

    Filters: (informix.a.range_to_06 >= ps_jrnl_ln.account AND a.tree_effdt = <subquery> )
    (1) Index Keys: setid chartfield combination range_from_06 range_to_06
        Lower Index Filter: (a.setid = 'ABC' AND (a.combination = 'OVERHEAD' AND a.chartfield = 'ACCOUNT' ) )
        Upper Index Filter: a.range_from_06 <= ps_jrnl_ln.account
NESTED LOOP JOIN (Semi Join)
Indicates that the query can stop once this condition is satisfied
QUERY:
update orders set backlog = "Y"
where exists ( select "X" from items
               where orders.order_num = items.order_num
               and stock_num = 6 and manu_code = "SMT" )

1) informix.items: INDEX PATH (Skip Duplicate)
    Filters: (items.stock_num=6 AND items.manu_code='SMT' )
    (1) Index Keys: order_num
Will get unique values from the first table before joining to the second table, so preventing multiple updates with the same value
2) informix.orders: INDEX PATH
    (1) Index Keys: order_num
        Lower Index Filter: orders.order_num = items.order_num
NESTED LOOP JOIN