Joe Ramon
Agenda
What is the EXPLAIN facility?
Where does the EXPLAIN output come from?
What does the Optimizer need to build a plan?
What does the EXPLAIN terminology mean?
What can be learned by reading the EXPLAIN text?
What can be done to influence the Optimizer?
Summary
What is EXPLAIN?
The EXPLAIN facility provides an "English" translation of the plan the SQL Optimizer develops to service a request.
May be used on any SQL statement, except EXPLAIN itself.
Look for key words and phrases.
Execution time and row count estimates depend on:
> Are statistics collected?
Actual execution time depends on:
> Is DBS processing other requests?
> Is channel or network busy?
[Diagram: components a request passes through: SYNTAXER, DD Cache, RESOLVER, SECURITY, STATISTICS, OPTIMIZER, GENERATOR, APPLY, DISPATCHER, EXPLAIN]
By default, Teradata chooses an AMP for random AMP (or dynamic) data sampling.
Enhancement starting with Teradata 6.0: when statistics are not available, the Optimizer can obtain random samples from more than one AMP when generating row counts for a query plan.
Any skewed component in the sample skews the demographics.
For non-indexed columns without statistics, the Optimizer uses fixed formulas to estimate the number of rows. For example:
> Assumes 10% for one column in an equality condition
> Assumes 7.5% for two columns, each in an equality condition, and ANDed together
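As a rough illustration of the fixed formulas above, the comments below work through the arithmetic for a hypothetical 100,000-row table. The table name Some_Table, its columns col_a and col_b, and the row count are assumptions chosen only for round numbers; both columns are assumed to be non-indexed with no statistics collected. The actual estimate should always be read from the EXPLAIN text.

```sql
-- Hypothetical table: Some_Table, assumed to hold 100,000 rows,
-- with no index and no statistics on col_a or col_b.

-- One column in an equality condition:
-- 10% of 100,000 rows = 10,000 rows estimated.
EXPLAIN
SELECT *
FROM   Some_Table
WHERE  col_a = 1;

-- Two columns, each in an equality condition, ANDed together:
-- 7.5% of 100,000 rows = 7,500 rows estimated.
EXPLAIN
SELECT *
FROM   Some_Table
WHERE  col_a = 1
AND    col_b = 2;
```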
Optimizer Facts
Cost-based Optimizer - looks for lowest cost plan
Does not store plan - dynamically regenerates
As data demographics change, so may plan
Will only assign cost to steps for which there are choices
Assigns confidence factors on row estimates
Mature, large-table, decision-support optimization
EXPLAIN Example
EXPLAIN SELECT Last_Name, First_Name, Dept_Name, Job_Desc
FROM Employee E
: (remaining join clauses lost in extraction)
ON E.Job_code = J.Job_code
Redistributing data (in SPOOL) in preparation for a join. Teradata 12.0 includes the column name.
Synchronized Scanning
When multiple users access the same table at the same time, the system can do a synchronized scan (sync scan) on the table.
[Figure: rows of a table, with markers showing where Query 1 begins and where Query 2 begins its scan.]
. . . with no confidence . . .
Conditions outside the above.
1) First, we lock TFACT.Orders for exclusive use.
:
4) We execute the following steps in parallel.
     1) We do a single-AMP ABORT test from DBC.ArchiveLoggingObjsTbl by way of the primary index.
     2) We do a single-AMP ABORT test from DBC.DBase by way of the unique primary index.
     3) We do a single-AMP ABORT test from DBC.TVM by way of the unique primary index.
     4) We do an INSERT into DBC.TVFields (no lock required).
     5) We do an INSERT into DBC.TVFields (no lock required).
     6) We do an INSERT into DBC.TVFields (no lock required).
     7) We do an INSERT into DBC.Indexes (no lock required).
     8) We do an INSERT into DBC.TVM (no lock required).
     9) We INSERT default rights to DBC.AccessRights for TFACT.Orders.
5) We create the table header.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
1) First, we do a single-AMP RETRIEVE step from TFACT.Employee by way of the unique primary index "TFACT.Employee.Employee_Number = 1104066" with a residual condition of ("TFACT.Employee.Dept_Number = 1404"). The estimated time for this step is 0.00 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.00 seconds.
Residual condition does not help the query. No change to plan or time estimate.
1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.Employee.
2) Next, we lock TFACT.Employee for read.
3) We do an all-AMPs RETRIEVE step from TFACT.Employee by way of an all-rows scan with a condition of ("(TFACT.Employee.Emp_Mgr_Number = 104043) AND (TFACT.Employee.Job_Code = 3405)") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 8 rows (616 bytes). The estimated time for this step is 0.02 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
Aggregations
EXPLAIN SELECT dept_number, SUM(salary_amount) FROM Employee GROUP BY 1;
1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.employee.
2) Next, we lock TFACT.employee for read.
3) We do an all-AMPs SUM step to aggregate from TFACT.employee by way of an all-rows scan with no residual conditions, grouping by field1 (TFACT.employee.Dept_Number). Aggregate Intermediate Results are computed globally, then placed in Spool 3. The size of Spool 3 is estimated with high confidence to be 1,403 rows (51,911 bytes). The estimated time for this step is 0.06 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 1,403 rows (57,523 bytes). The estimated time for this step is 0.02 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.08 seconds.
Optimized INSERT/SELECT
INSERT/SELECT is the process of selecting data from one table and using it as input to be inserted into another table.
Two different optimizations can occur:
1) If the PIs of the source and destination tables are identical, an AMP-local operation is used.
2) If the target table is empty,
     a) Transient Journaling is reduced
     b) 127 KB block transfers are used
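A minimal sketch of a request that could qualify for both optimizations is shown here. It assumes the copy is created from the Employee definition, so that both tables share the same PI, and that the copy is still empty when the INSERT/SELECT runs. The CREATE TABLE ... AS ... WITH NO DATA step is an assumption about how Employee_copy was built; the source does not show it.

```sql
-- Assumption: build an empty copy of Employee with the same definition,
-- so source and target have identical primary indexes.
CREATE TABLE Employee_copy AS Employee WITH NO DATA;

-- Target is empty and the PIs match, so the Optimizer may choose
-- an AMP-local MERGE with reduced transient journaling.
EXPLAIN
INSERT INTO Employee_copy
SELECT * FROM Employee;
```

Running the same EXPLAIN again after the target already contains rows is one way to compare the two cases.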
1) First, we lock a distinct TFACT."pseudo table" for write on a RowHash to prevent global deadlock for TFACT.Employee_copy.
2) Next, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.Employee.
3) We lock TFACT.Employee_copy for write, and we lock TFACT.Employee for read.
4) We do an all-AMPs MERGE into TFACT.Employee_copy from TFACT.Employee. The size is estimated with no confidence to be 25,382 rows. The estimated time for this step is 2.29 seconds.
5) We spoil the parser's dictionary cache for the table.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
a) Single row inserts (vs. 127 KB blocks) b) Transient journal entries for each row
CREATE SET TABLE TFACT.Employee
  (Employee_Number   INTEGER,
   Location_Number   INTEGER,
   :
   Salary_Amount     DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Employee_Number );
CREATE SET TABLE TFACT.Employee_CharPI
  (Employee_Number   CHAR(10),
   Location_Number   INTEGER,
   :
   Salary_Amount     DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Employee_Number );
1) First, we do a single-AMP RETRIEVE step from TFACT.Employee_CharPI by way of the unique primary index "TFACT.Employee_CharPI.Employee_Number = '1104066 '" with no residual conditions. The estimated time for this step is 0.00 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.00 seconds.
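The plan above is consistent with a request like the following sketch, in which the CHAR(10) primary index column is compared with a character literal. The exact statement is not shown in the source, so this is an assumption. A further assumption, worth confirming with EXPLAIN on your own system, is that comparing the character column with a numeric literal instead may require a data type conversion and so may not allow direct primary index access.

```sql
-- Character literal matches the CHAR(10) PI column's data type,
-- so the value can be hashed for single-AMP access.
EXPLAIN
SELECT *
FROM   Employee_CharPI
WHERE  Employee_Number = '1104066';

-- Numeric literal against a CHAR column (assumption: this may force
-- a conversion; compare the resulting EXPLAIN text with the one above).
EXPLAIN
SELECT *
FROM   Employee_CharPI
WHERE  Employee_Number = 1104066;
```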
Explaining Macros
CREATE MACRO Dept_List (dept_no INTEGER) AS
  (SELECT * FROM Employee
   WHERE dept_number = :dept_no;);
EXPLAIN EXEC Dept_List (1404);
1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.Employee.
2) Next, we lock TFACT.Employee for read.
3) We do an all-AMPs RETRIEVE step from TFACT.Employee by way of an all-rows scan with a condition of ("TFACT.Employee.Dept_Number = 1404") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 40 rows (3,080 bytes). The estimated time for this step is 0.02 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
1) First, we lock a distinct TFACT."pseudo table" for read on a RowHash to prevent global deadlock for TFACT.Employee.
2) Next, we lock TFACT.Employee for read.
3) We do an all-AMPs RETRIEVE step from TFACT.Employee by way of an all-rows scan with a condition of ("TFACT.Employee.Dept_Number = :dept_no") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 19 rows (1,463 bytes). The estimated time for this step is 0.02 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
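The two plans above differ in whether the Optimizer sees a literal (1404) or only the parameter (:dept_no). A sketch of the two forms follows. The USING request modifier is a plausible way to obtain the parameterized plan, but the source does not show the statement that produced it, so treat the second request as an assumption.

```sql
-- Literal value: the condition appears as Dept_Number = 1404,
-- so the estimate can use the statistics for that specific value.
EXPLAIN EXEC Dept_List (1404);

-- Parameterized value (assumed form): the condition appears as
-- Dept_Number = :dept_no, so the estimate cannot be value-specific.
EXPLAIN USING (dept_no INTEGER)
EXEC Dept_List (:dept_no);
```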
Collect Statistics on:
> Non-unique indexes
> Non-index join columns
> Primary Index of small tables
Collect Statistics considerations:
> Requires a full table scan
> Must be kept current
> May be unnecessary for very large tables
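The guidelines above could be applied along these lines. This is a sketch only: it assumes Dept_Number and Job_Code are the columns of interest on Employee (whether they are indexed is not stated in the source), and it uses the older COLLECT STATISTICS ON ... COLUMN form.

```sql
-- Collect statistics on columns used in conditions and joins.
COLLECT STATISTICS ON Employee COLUMN (Dept_Number);
COLLECT STATISTICS ON Employee COLUMN (Job_Code);

-- Review which statistics exist and when they were last collected.
HELP STATISTICS Employee;

-- Refresh all statistics already defined on the table
-- (to keep them current as the data changes).
COLLECT STATISTICS ON Employee;
```

Re-running EXPLAIN before and after collection is a simple way to see whether the confidence levels and row estimates change.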
Summary
EXPLAIN is a tool to help you plan query resources
Teradata uses a cost-based optimizer
Adding Secondary, Join, or Hash Indexes gives optimizer more choices
Collecting Statistics allows better plan estimates
Most mature optimizer for mixed workload environments in the industry
Joe.Ramon@Teradata.com