Explaining The Explain - Part 1 (Webcast)

Explaining the EXPLAIN Part 1
Joe Ramon
October 21, 2008
Agenda
What is the EXPLAIN facility?
Where does the EXPLAIN output come from?
What does the Optimizer need to build a plan?
What does the EXPLAIN terminology mean?
What can be learned by reading the EXPLAIN text?
What can be done to influence the Optimizer?
Summary
What is EXPLAIN?
The EXPLAIN facility provides an "English" translation of the plan the SQL Optimizer develops to service a request.
May be used on any SQL statement, except EXPLAIN itself.
Look for key words and phrases.
Execution time and row count estimates depend on:
> Are statistics collected?
Actual execution time depends on:
> Is the DBS processing other requests?
> Is the channel or network busy?
How is EXPLAIN Text Generated?

[Diagram: SQL request -> Syntaxer -> Resolver -> Security -> Optimizer -> Generator -> Apply -> Dispatcher -> AMP. Other labels: DD (Dbase, AccessRights, RoleGrants (V2R5), TVM, TVFields, Indexes), DD Cache, Statistics, EXPLAIN.]
Information Known to Optimizer

Number of nodes in system
Number and type of CPUs per node
Number of configured AMP Vprocs
Disk array configuration
Interconnect configuration
Amount and configuration of memory
All are taken into account when calculating query cost.
Additional Information Required by the Optimizer

Columns with indexes
Rows in the table
Rows per block
Values per column
Rows per value
Row length
Optimizer Random AMP Samples

Statistics collected by a random AMP sample apply in these cases:
> Row counts for the table are needed and statistics are not collected on the PI.
> Indexed columns are used in the query and statistics are not collected.
> With Teradata 12.0, statistics have been collected, but are considered stale.
By default, Teradata chooses an AMP for random AMP (or dynamic) data sampling.
Enhancement starting with Teradata 6.0: when statistics are not available, the Optimizer can obtain random samples from more than one AMP when generating row counts for a query plan.
Random AMP sampling is controlled via a DBS Control parameter.
Random AMP Sampling

For a table row count estimate, read one cylinder on 1 AMP and calculate the approximate number of rows in the table.
For NUSI estimates, read one cylinder from the NUSI subtable. This uses a similar technique, counting the number of NUSI values in the cylinder. The table row count is divided by the extrapolated NUSI row count to get a rows/NUSI value.
Any skewed component in the sample skews the demographics.
For non-indexed columns without statistics, the Optimizer uses fixed formulas to estimate the number of rows. For example:
> Assumes 10% for one column in an equality condition
> Assumes 7.5% for two columns, each in an equality condition, and ANDed together
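The extrapolation and the fixed formulas above come down to simple arithmetic. The following Python sketch is illustrative only: the function names, the cylinder-count inputs, and the way the single-cylinder sample is scaled up are assumptions made for this example, not the Optimizer's actual implementation. Only the 10% and 7.5% figures come from the slide.

```python
# Illustrative arithmetic only; names and the scaling model are assumptions.

# Fractions quoted on the slide for non-indexed columns without statistics,
# keyed by the number of ANDed equality conditions.
HEURISTIC_FRACTION = {1: 0.10, 2: 0.075}


def extrapolate_table_rows(rows_in_cylinder, cylinders_on_amp, num_amps):
    """Scale the row count of one sampled cylinder up to the whole table."""
    return rows_in_cylinder * cylinders_on_amp * num_amps


def rows_per_nusi_value(table_rows, extrapolated_nusi_rows):
    """Divide the table row count by the extrapolated NUSI row count."""
    return table_rows / extrapolated_nusi_rows


def heuristic_row_estimate(table_rows, equality_columns):
    """Apply the fixed percentage for one or two ANDed equality conditions."""
    return round(table_rows * HEURISTIC_FRACTION[equality_columns])


if __name__ == "__main__":
    table_rows = extrapolate_table_rows(250, 13, 8)
    print(table_rows)
    print(rows_per_nusi_value(table_rows, 650))
    print(heuristic_row_estimate(table_rows, 1))
    print(heuristic_row_estimate(table_rows, 2))
```

Because the sampled cylinder is multiplied straight through, an unrepresentative cylinder distorts every figure derived from it, which is the skew warning above.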
Optimizer Facts
Cost-based Optimizer - looks for lowest cost plan
Does not store plan - dynamically regenerates
As data demographics change, so may plan
Will only assign cost to steps for which there are choices
Assigns confidence factors on row estimates
Mature, large-table, decision-support optimization
EXPLAIN Example
EXPLAIN
SELECT Last_Name, First_Name, Dept_Name, Job_Desc
FROM Employee E
INNER JOIN Department D ON E.Dept_Number = D.Dept_Number
INNER JOIN Job J ON E.Job_code = J.Job_code
ORDER BY 3, 1, 2;
EXPLAIN Example (cont.)

1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.E.
2) Next, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.J.
3) We lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.D.
4) We lock TFACT.E for read, we lock TFACT.J for read, and we lock TFACT.D for read.
5) We execute the following steps in parallel.
   1) We do an all-AMPs RETRIEVE step from TFACT.D by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 19,642 rows (726,754 bytes). The estimated time for this step is 0.02 seconds.
   2) We do an all-AMPs RETRIEVE step from TFACT.J by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool 3 by the hash code of (TFACT.J.Job_Code). The size of Spool 3 is estimated with high confidence to be 12,166 rows (450,142 bytes). The estimated time for this step is 0.01 seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to TFACT.E by way of an all-rows scan with a condition of ("NOT (TFACT.E.Job_Code IS NULL)"). Spool 2 and TFACT.E are joined using a single partition hash_ join, with a join
:
:
8) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.14 seconds.
EXPLAIN Terminology - Pseudo Table Locks

Prevents two users from getting conflicting locks with all-AMP requests.
All-AMP lock requests are handled as follows:
> The PE determines the Table ID hash for an AMP to manage the all-AMP lock request.
> Put a pseudo lock on the table.
> Acquire the lock on all AMPs.
EXPLAIN Terminology (cont.) - Pseudo Table Locks

[Diagram: a first request and a second request each arrive at a PE. Each PE determines the Table ID hash, which routes both lock requests to the same one of four AMPs.]
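The pseudo table lock steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions: `zlib.crc32` stands in for Teradata's own hashing algorithm, and the `PseudoLockQueue` class and `gatekeeper_amp` function are hypothetical names invented for this example.

```python
import zlib


def gatekeeper_amp(table_id: int, num_amps: int) -> int:
    """Pick the AMP that manages all-AMP lock requests for a table.

    crc32 is only a stand-in hash; the real system uses its own algorithm.
    """
    return zlib.crc32(table_id.to_bytes(8, "big")) % num_amps


class PseudoLockQueue:
    """Toy model: one FIFO queue of requesters per table on its gatekeeper AMP."""

    def __init__(self, num_amps: int):
        self.num_amps = num_amps
        self.queues = {}

    def request(self, table_id: int, requester: str) -> bool:
        """Queue a pseudo lock request; return True if it is granted at once."""
        key = (gatekeeper_amp(table_id, self.num_amps), table_id)
        queue = self.queues.setdefault(key, [])
        queue.append(requester)
        # Only the head of the queue may go on to lock the table on all AMPs.
        return queue[0] == requester


if __name__ == "__main__":
    locks = PseudoLockQueue(num_amps=4)
    print(locks.request(1234, "first request"))
    print(locks.request(1234, "second request"))
```

Because every PE derives the same AMP from the same Table ID, two competing requests always meet on one queue, so neither can end up holding the table on some AMPs while waiting for the other on the rest.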
EXPLAIN Terminology (cont.)

Most EXPLAIN text is easy to understand. The following additional definitions may help:
... (Last Use)

A spool file is no longer needed and will be released when this step completes.
... with no residual conditions

All applicable conditions have been applied to the rows.
... END TRANSACTION

Transaction locks are released, and changes are committed.
... by way of the sort key in spool field1 (dbname.tablename.colname)

Field1 is created to allow a tag sort. Teradata 12.0 includes the column name used for the sort.

5) We execute the following steps in parallel.
   1) We do an all-AMPs RETRIEVE step from TFACT.D by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 19,642 rows (726,754 bytes). The estimated time for this step is 0.02 seconds.
   2) We do an all-AMPs RETRIEVE step from TFACT.J by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool 3 by the hash code of (TFACT.J.Job_Code). The size of Spool 3 is estimated with high confidence to be 12,166 rows (450,142 bytes). The estimated time for this step is 0.01 seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to TFACT.E by way of an all-rows scan with a condition of ("NOT (TFACT.E.Job_Code IS NULL)"). Spool 2 and TFACT.E are joined using a single partition hash_ join, with a join condition of ("TFACT.E.Dept_Number = Dept_Number"). The result goes into Spool 4 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 4 by the hash code of (TFACT.E.Job_Code). The size of Spool 4 is estimated with low confidence to be 26,000 rows (1,690,000 bytes). The estimated time for this step is 0.04 seconds.
7) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of a RowHash match scan, which is joined to Spool 4 (Last Use) by way of a RowHash match scan. Spool 3 and Spool 4 are joined using a merge join, with a join condition of ("Job_Code = Job_Code"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 (TFACT.D.Dept_Name, TFACT.E.Last_Name, TFACT.E.First_Name). The size of Spool 1 is estimated with low confidence to be 26,000 rows (3,822,000 bytes). The estimated time for this step is 0.08 seconds.
8) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

Most EXPLAIN text is easy to understand. The following additional definitions may help:
... we do an ABORT test

Caused by an ABORT or ROLLBACK statement.
... We execute these steps in parallel.

The following indented steps are executed in parallel.

3) We lock DBC.ArchiveLoggingObjsTbl for read on a RowHash, we lock DBC.TVM for write on a RowHash, we lock DBC.TVFields for write on a RowHash, we lock DBC.Indexes for write on a RowHash, we lock DBC.DBase for read on a RowHash, and we lock DBC.AccessRights for write on a RowHash.
4) We execute the following steps in parallel.
   1) We do a single-AMP ABORT test from DBC.ArchiveLoggingObjsTbl by way of the primary index.
   2) We do a single-AMP ABORT test from DBC.DBase by way of the unique primary index.
   3) We do a single-AMP ABORT test from DBC.TVM by way of the unique primary index.
   4) We do an INSERT into DBC.TVFields (no lock required).
   :
   :
   7) We do an INSERT into DBC.Indexes (no lock required).
   8) We do an INSERT into DBC.TVM (no lock required).
   9) We INSERT default rights to DBC.AccessRights for TFACT.Orders.

... which is redistributed by hash code to all AMPs (dbname.tablename.colname)
Redistributing data (in SPOOL) in preparation for a join. Teradata 12.0 includes the column name.
... which is duplicated on all AMPs

Duplicating data (in SPOOL) from the smaller table in preparation for a join.
... (one_amp) or (group_amps) or (all_amps)

Indicates one AMP, a subset of AMPs, or all of the AMPs will participate.
... ("NOT (table_name.column_name IS NULL)")

Feature where optimizer realizes that the column being joined to is NOT NULL or has referential integrity.
... eliminating duplicate rows ...

Duplicate rows only exist in spool files, not set tables.

4) We do an all-AMPs RETRIEVE step from TFACT.D by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 19,642 rows (726,754 bytes). The estimated time for this step is 0.02 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to TFACT.E by way of an all-rows scan. Spool 2 and TFACT.E are joined using a single partition hash_ join, with a join condition of ("TFACT.E.Dept_Number = Dept_Number"). The result goes into Spool 1 (group_amps), which is redistributed by the hash code of (TFACT.E.First_Name, TFACT.E.Last_Name, TFACT.E.Employee_Number, TFACT.D.Dept_Name) to all AMPs. The size of Spool 1 is estimated with low confidence to be 26,000 rows (3,614,000 bytes). The estimated time for this step is 0.09 seconds.
6) We do an all-AMPs RETRIEVE step from TFACT.D by way of an all-rows scan with a condition of ("NOT (TFACT.D.Dept_Mgr_Number IS NULL)") into Spool 3 (all_amps), which is redistributed by the hash code of (TFACT.D.Dept_Mgr_Number) to all AMPs. Then we do a SORT to order Spool 3 by row hash. The size of Spool 3 is estimated with high confidence to be 1,403 rows (51,911 bytes). The estimated time for this step is 0.01 seconds.
7) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of a RowHash match scan, which is joined to TFACT.E by way of a RowHash match scan with no residual conditions. Spool 3 and TFACT.E are joined using a merge join, with a join condition of ("TFACT.E.Employee_Number = Dept_Mgr_Number"). The result goes into Spool 1 (group_amps), which is redistributed by the hash code of (TFACT.E.First_Name, TFACT.E.Last_Name, TFACT.E.Employee_Number, TFACT.D.Dept_Name) to all AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated with low confidence to be 27,403 rows (3,809,017 bytes). The estimated time for this step is 0.06 seconds.

... we do a BMSMS (bit map set manipulation step)
Doing a NUSI Bit Map operation.

:
3) We do a BMSMS (bit map set manipulation) step that builds a bit map for TFACT.Employee by way of index # 4 "TFACT.E.Job_Code = 3500" which is placed in Spool 2. The estimated time for this step is 0.01 seconds.
4) We do an all-AMPs RETRIEVE step from TFACT.E by way of index # 8 "TFACT.E.Dept_Number = 1310" and the bit map in Spool 2 (Last Use) with a residual condition of ("TFACT.E.Job_Code = 3500") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 60 rows (4620 bytes). The estimated time for this step is 0.02 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

Note: Statistics were collected on the NUSIs Job_Code and Dept_Number.
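The two steps of the bit map plan above can be pictured with a small Python sketch. It is a conceptual model only: the row identifiers, the integer-as-bit-map representation, and the function names are assumptions made for illustration and say nothing about how the database lays out its bit maps.

```python
def build_bitmap(row_ids):
    """Step 3 analogue: set one bit per row that qualifies on the first NUSI."""
    bitmap = 0
    for row_id in row_ids:
        bitmap |= 1 << row_id
    return bitmap


def retrieve_with_bitmap(candidate_row_ids, bitmap):
    """Step 4 analogue: keep rows from the second NUSI whose bit is set."""
    return [row_id for row_id in candidate_row_ids if (bitmap >> row_id) & 1]


if __name__ == "__main__":
    # Hypothetical row ids qualifying on each index.
    job_code_rows = [2, 5, 9, 14]         # Job_Code = 3500
    dept_number_rows = [1, 5, 7, 14, 20]  # Dept_Number = 1310
    spool_2 = build_bitmap(job_code_rows)
    print(retrieve_with_bitmap(dept_number_rows, spool_2))
```

Only rows that qualify on both indexes are read from the base table, which is why combining two weakly selective NUSIs can still beat a full table scan.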
Synchronized Scanning
In the case of multiple users that access the same table at the same time, the system can do a synchronized scan (sync scan) on the table.
[Diagram: a table of employee rows being scanned. Query 1, Query 2, and Query 3 each begin at a different point in the table.]
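One way to picture a synchronized scan is as a circular read of the table's blocks. The Python sketch below assumes that a later query joins the scan at the block currently being read and wraps around to pick up the blocks it missed. That wrap-around behaviour and the function name are assumptions for illustration; the slides do not spell out the mechanism.

```python
def sync_scan_order(num_blocks, join_position):
    """Return the block order seen by a query that joins a scan in progress.

    Assumes the query starts at join_position and wraps to the beginning.
    """
    return [(join_position + i) % num_blocks for i in range(num_blocks)]


if __name__ == "__main__":
    # Query 1 starts the scan; Query 2 joins while block 3 is being read.
    print(sync_scan_order(5, 0))
    print(sync_scan_order(5, 3))
```

Each query still sees every block exactly once, but blocks read while the scans overlap are fetched from disk a single time.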
Synchronized Scanning (cont.)

EXPLAIN SELECT * FROM daily_sales ORDER BY 1;
:
3) We do an all-AMPs RETRIEVE step from TFACT.daily_sales by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 (TFACT.daily_sales.Item_id). The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 76,685 rows (2,530,605 bytes). The estimated time for this step is 0.09 seconds.
:
Understanding Row and Time Estimates

The EXPLAIN facility may express confidence for a retrieve from a table. Some of the phrases used are:
. . . with high confidence . . .
> Restricting conditions exist on index(es) or column(s) that have collected statistics.
. . . with low confidence . . .
> Restricting conditions exist on index(es) having no statistics, but estimates can be based upon a sampling of the index(es).
> Restricting conditions exist on index(es) or column(s) that have collected statistics but are AND-ed together with conditions on non-indexed columns.
> Restricting conditions exist on index(es) or column(s) that have collected statistics but are OR-ed together with other conditions.
. . . with no confidence . . .
> Conditions outside the above.

The following are confidence phrases for a join:
. . . with index join confidence . . .
> A join condition via a primary index.
. . . with high confidence . . .
> One input relation has high confidence and the other has high or index join confidence.
. . . with low confidence . . .
> One input relation has low confidence and the other has low, high, or index join confidence.
. . . with no confidence . . .
> One input relation has no confidence.
> Statistics do not exist for either join field.
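The join confidence phrases above behave like a "weakest input wins" rule. The Python sketch below encodes that reading. The function name, the `stats_on_join_fields` flag, and the handling of two index-join inputs (a case the slide does not cover) are assumptions for illustration, not the Optimizer's actual logic.

```python
# Confidence levels ordered from weakest to strongest.
RANK = {"no": 0, "low": 1, "high": 2, "index join": 3}


def join_confidence(left, right, stats_on_join_fields=True):
    """Combine the confidence of two input relations into a join confidence."""
    if not stats_on_join_fields:
        # No statistics for either join field: no confidence.
        return "no"
    weakest = min(left, right, key=RANK.__getitem__)
    # Two index-join inputs are not covered by the slide; treated as high here.
    return "high" if weakest == "index join" else weakest


if __name__ == "__main__":
    print(join_confidence("high", "index join"))
    print(join_confidence("low", "high"))
    print(join_confidence("no", "high"))
```

This matches the example plan, where a high-confidence spool joined to a low-confidence spool yields a low-confidence result.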
Understanding Row and Time Estimates (cont.)

5) We execute the following steps in parallel.
   1) We do an all-AMPs RETRIEVE step from TFACT.D by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 19,642 rows (726,754 bytes). The estimated time for this step is 0.02 seconds.
   2) We do an all-AMPs RETRIEVE step from TFACT.J by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool 3 by the hash code of (TFACT.J.Job_Code). The size of Spool 3 is estimated with high confidence to be 12,166 rows (450,142 bytes). The estimated time for this step is 0.01 seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to TFACT.E by way of an all-rows scan with a condition of ("NOT (TFACT.E.Job_Code IS NULL)"). Spool 2 and TFACT.E are joined using a single partition hash_ join, with a join condition of ("TFACT.E.Dept_Number = Dept_Number"). The result goes into Spool 4 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 4 by the hash code of (TFACT.E.Job_Code). The size of Spool 4 is estimated with low confidence to be 26,000 rows (1,690,000 bytes). The estimated time for this step is 0.04 seconds.
7) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of a RowHash match scan, which is joined to Spool 4 (Last Use) by way of a RowHash match scan. Spool 3 and Spool 4 are joined using a merge join, with a join condition of ("Job_Code = Job_Code"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 (TFACT.D.Dept_Name, TFACT.E.Last_Name, TFACT.E.First_Name). The size of Spool 1 is estimated with low confidence to be 26,000 rows (3,822,000 bytes). The estimated time for this step is 0.08 seconds.
8) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
Query Cost Estimates

Row estimates:
> May be estimated using random samples, statistics or indexes
> Are assigned a confidence level - high, low or none
> Affect timing estimates - more rows, more time needed
Timings:
> Used to determine the lowest cost plan
> Total cost generated if all processing steps have assigned cost
> Not intended to predict wall-clock time, useful for comparisons
Miscellaneous Notes:
> Estimates too large to display show 3 asterisks (***).
> The accuracy of the time estimate depends upon the accuracy of the row estimate.

Low and no confidence may indicate a need to collect statistics on indexes or columns involved in restricting conditions. You may otherwise consider a closer examination of the conditions in the query for possible changes that may improve the confidence. Collecting statistics or altering the conditions has no real impact unless it influences the optimizer to pick a better plan.
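When a plan is long, the spool estimates that deserve a second look can be pulled out mechanically. The Python sketch below is a minimal example that assumes the EXPLAIN text has been saved as a string and that estimates follow the "The size of Spool N is estimated with ... confidence to be ... rows" wording shown in these slides. The function names are hypothetical, and other step types (such as MERGE steps, which word the estimate differently) are not handled.

```python
import re

# Matches the spool size phrase used in the EXPLAIN examples in this deck.
ESTIMATE = re.compile(
    r"size of Spool (\d+) is estimated with "
    r"(high|low|no|index join) confidence to be ([\d,]+) rows?"
)


def spool_estimates(explain_text):
    """Return (spool number, confidence, row estimate) for each spool estimate."""
    flat = " ".join(explain_text.split())  # undo line wrapping in the output
    return [
        (int(spool), confidence, int(rows.replace(",", "")))
        for spool, confidence, rows in ESTIMATE.findall(flat)
    ]


def needs_attention(explain_text):
    """Keep only the estimates made with low or no confidence."""
    return [e for e in spool_estimates(explain_text) if e[1] in ("low", "no")]


if __name__ == "__main__":
    sample = (
        "The size of Spool 2 is estimated with high confidence to be "
        "19,642 rows (726,754 bytes). The size of Spool 4 is estimated "
        "with low confidence to be 26,000 rows (1,690,000 bytes)."
    )
    print(needs_attention(sample))
```

The flagged spools are candidates for collecting statistics or for revisiting the conditions, subject to the caveat above that neither matters unless it changes the plan.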
EXPLAIN of Create Table

EXPLAIN
CREATE TABLE Orders
  (order_id INTEGER NOT NULL
  ,order_date DATE FORMAT 'yyyy-mm-dd'
  ,cust_id INTEGER)
UNIQUE PRIMARY INDEX (order_id);

1) First, we lock TFACT.Orders for exclusive use.
:
4) We execute the following steps in parallel.
   1) We do a single-AMP ABORT test from DBC.ArchiveLoggingObjsTbl by way of the primary index.
   2) We do a single-AMP ABORT test from DBC.DBase by way of the unique primary index.
   3) We do a single-AMP ABORT test from DBC.TVM by way of the unique primary index.
   4) We do an INSERT into DBC.TVFields (no lock required).
   5) We do an INSERT into DBC.TVFields (no lock required).
   6) We do an INSERT into DBC.TVFields (no lock required).
   7) We do an INSERT into DBC.Indexes (no lock required).
   8) We do an INSERT into DBC.TVM (no lock required).
   9) We INSERT default rights to DBC.AccessRights for TFACT.Orders.
5) We create the table header.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
Unique Primary Index Request (UPI)

EXPLAIN SELECT * FROM Employee WHERE Employee_Number = 1104066;

1) First, we do a single-AMP RETRIEVE step from TFACT.Employee by way of the unique primary index "TFACT.Employee.Employee_Number = 1104066" with no residual conditions. The estimated time for this step is 0.00 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.00 seconds.

Simplest and most efficient type of access. Spool is not used.
UPI Request With Residual Condition

EXPLAIN
SELECT *
FROM Employee
WHERE Employee_Number = 1104066
AND Dept_Number = 1404;

1) First, we do a single-AMP RETRIEVE step from TFACT.Employee by way of the unique primary index "TFACT.Employee.Employee_Number = 1104066" with a residual condition of ("TFACT.Employee.Dept_Number = 1404"). The estimated time for this step is 0.00 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.00 seconds.

Residual condition does not help the query. No change to plan or time estimate.
Full Table Scan

EXPLAIN
SELECT *
FROM Employee
WHERE Emp_Mgr_Number = 104043
AND Job_Code = 3405;

1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.Employee.
2) Next, we lock TFACT.Employee for read.
3) We do an all-AMPs RETRIEVE step from TFACT.Employee by way of an all-rows scan with a condition of ("(TFACT.Employee.Emp_Mgr_Number = 104043) AND (TFACT.Employee.Job_Code = 3405)") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 8 rows (616 bytes). The estimated time for this step is 0.02 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
Aggregations
EXPLAIN SELECT dept_number, SUM(salary_amount) FROM Employee GROUP BY 1;
1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.employee.
2) Next, we lock TFACT.employee for read.
3) We do an all-AMPs SUM step to aggregate from TFACT.employee by way of an all-rows scan with no residual conditions, grouping by field1 (TFACT.employee.Dept_Number). Aggregate Intermediate Results are computed globally, then placed in Spool 3. The size of Spool 3 is estimated with high confidence to be 1,403 rows (51,911 bytes). The estimated time for this step is 0.06 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 1,403 rows (57,523 bytes). The estimated time for this step is 0.02 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.08 seconds.
Optimized INSERT/SELECT
INSERT/SELECT is the process of selecting data from one table and using it as input to be inserted into another table. Two different optimizations can occur:
1) If the PI of the source and destination tables are identical, an AMP-local operation is used.
2) If the target table is empty,
   a) Transient Journaling is reduced
   b) 127 KB block transfers are used
If both conditions are satisfied, both optimizations are used.
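The two conditions above are independent, so the applicable optimizations can be written as a simple decision function. The Python sketch below only restates the slide's rules. The function name and the string labels are made up for this example.

```python
def insert_select_optimizations(same_primary_index, target_empty):
    """List the INSERT/SELECT optimizations that apply, per the two rules."""
    optimizations = []
    if same_primary_index:
        # Rows already live on the right AMP, so nothing has to be relocated.
        optimizations.append("AMP-local operation")
    if target_empty:
        optimizations.append("reduced transient journaling")
        optimizations.append("127 KB block transfers")
    return optimizations


if __name__ == "__main__":
    print(insert_select_optimizations(same_primary_index=True, target_empty=True))
    print(insert_select_optimizations(same_primary_index=False, target_empty=False))
```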
Optimized INSERT/SELECT Example

EXPLAIN INSERT INTO Employee_copy SELECT * FROM Employee;
1) First, we lock a distinct TFACT."pseudo table" for write on a RowHash to prevent global deadlock for TFACT.Employee_copy.
2) Next, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.Employee.
3) We lock TFACT.Employee_copy for write, and we lock TFACT.Employee for read.
4) We do an all-AMPs MERGE into TFACT.Employee_copy from TFACT.Employee. The size is estimated with no confidence to be 25,382 rows. The estimated time for this step is 2.29 seconds.
5) We spoil the parser's dictionary cache for the table.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
INSERT/SELECT With Different PIs

If the target table has a different Primary Index, a standard INSERT/SELECT process must be used. A BYNET operation will be used to relocate the selected rows onto the target AMPs. This will require:
a) Single row inserts (vs. 127 KB blocks)
b) Transient journal entries for each row
Non-Optimized INSERT/SELECT Example
CREATE SET TABLE TFACT.Employee
  (Employee_Number INTEGER,
   Location_Number INTEGER,
   :
   Salary_Amount DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Employee_Number );

CREATE SET TABLE TFACT.Employee_CharPI
  (Employee_Number CHAR(10),
   Location_Number INTEGER,
   :
   Salary_Amount DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Employee_Number );
Non-Optimized INSERT/SELECT Example (cont.)

EXPLAIN INSERT INTO Employee_CharPI SELECT * FROM Employee;
:
4) We do an all-AMPs RETRIEVE step from TFACT.Employee by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is redistributed by the hash code of (TFACT.Employee.Employee_Number (CHAR(10), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT '-(10)9')(CHAR(10), CHARACTER SET LATIN, NOT CASESPECIFIC, NAMED Employee_Number, FORMAT 'X(10)', NULL)) to all AMPs. Then we do a SORT to order Spool 1 by row hash. The size of Spool 1 is estimated with high confidence to be 26,000 rows (1,950,000 bytes). The estimated time for this step is 0.06 seconds.
5) We do an all-AMPs MERGE into TFACT.Employee_CharPI from Spool 1 (Last Use). The size is estimated with high confidence to be 26,000 rows. The estimated time for this step is 1.38 seconds.
6) We spoil the parser's dictionary cache for the table.
7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
Unexpected Full Table Scan

EXPLAIN SELECT * FROM Employee_CharPI WHERE employee_number = 1104066 ;
1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.Employee_CharPI.
2) Next, we lock TFACT.Employee_CharPI for read.
3) We do an all-AMPs RETRIEVE step from TFACT.Employee_CharPI by way of an all-rows scan with a condition of ("(TFACT.Employee_CharPI.Employee_Number (FLOAT, FORMAT'-9.99999999999999E-999'))= 1.10406600000000E 006") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 2 rows (166 bytes). The estimated time for this step is 0.02 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
Correct Use of Primary Index

EXPLAIN SELECT * FROM Employee_CharPI WHERE employee_number = '1104066' ;
1) First, we do a single-AMP RETRIEVE step from TFACT.Employee_CharPI by way of the unique primary index "TFACT.Employee_CharPI.Employee_Number = '1104066 '" with no residual conditions. The estimated time for this step is 0.00 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.00 seconds.
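The contrast between the two requests above comes from the data type of the literal. With a numeric literal, the CHAR(10) column is converted to FLOAT for the comparison, as the condition in the full-table-scan plan shows, and many different character strings convert to the same number. The Python sketch below illustrates that point, using Python's `float()` as a stand-in for the database's character-to-FLOAT conversion. The stored values and the function name are hypothetical.

```python
def values_matching_number(stored_values, literal):
    """Return the stored character values that equal the numeric literal."""
    return [value for value in stored_values if float(value) == literal]


if __name__ == "__main__":
    # Four distinct CHAR(10) values; each would hash to its own row hash.
    stored = ["1104066   ", "01104066  ", "1104066.0 ", "1.104066E6"]
    matches = values_matching_number(stored, 1104066)
    print(len(set(stored)), len(matches))
```

Since no single character value, and therefore no single row hash, corresponds to the number, every row has to be converted and compared. Quoting the literal keeps the comparison in character form, so the primary index can be used.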
Explaining Macros
CREATE MACRO Dept_List (dept_no INTEGER) AS
  (SELECT * FROM Employee WHERE dept_number = :dept_no;);

EXPLAIN EXEC Dept_List (1404);

1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.Employee.
2) Next, we lock TFACT.Employee for read.
3) We do an all-AMPs RETRIEVE step from TFACT.Employee by way of an all-rows scan with a condition of ("TFACT.Employee.Dept_Number = 1404") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 40 rows (3,080 bytes). The estimated time for this step is 0.02 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
Explaining Macros (cont.)

EXPLAIN USING (dept_no INTEGER) EXECUTE Dept_List (:dept_no);
1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.Employee.
2) Next, we lock TFACT.Employee for read.
3) We do an all-AMPs RETRIEVE step from TFACT.Employee by way of an all-rows scan with a condition of ("TFACT.Employee.Dept_Number = :dept_no") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 19 rows (1,463 bytes). The estimated time for this step is 0.02 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
How to Influence the Optimizer

Collected statistics can help the Optimizer make better decisions using actual row counts and data distribution information.
Collect Statistics on:
> Non-unique indexes
> Non-index join columns
> Primary Index of small tables
Collect Statistics considerations:
> Requires a full table scan
> Must be kept current
> May be unnecessary for very large tables
Other Factors To Help Optimizer

Proper index choices at physical design time
Add secondary, join, or hash indexes where helpful
Use equality-based join conditions
Experiment using EXPLAIN
Summary
EXPLAIN is a tool to help you plan query resources
Teradata uses a cost-based optimizer
Adding Secondary, Join, or Hash Indexes gives the Optimizer more choices
Collecting Statistics allows better plan estimates
Most mature optimizer for mixed workload environments in the industry
Joe.Ramon@Teradata.com
