0 Overview
Rich Charucki Director Product Management Richard.Charucki@teradata.com
Enterprise Fit
Ease of Use
ML-PPI - Concepts
Multi-level partitioning allows each partition at a given level to be sub-partitioned. Each partitioning level is defined independently using a RANGE_N or CASE_N expression. Internally, these partitioning expressions are combined into a single partitioning expression that defines how the data is partitioned on the AMP. If PARTITION BY is specified, the table is called a partitioned primary index (PPI) table.
If only one partitioning expression is specified, that PPI is called a single-level partitioned primary index (or single-level PPI). If more than one partitioning expression is specified, that PPI is called a multi-level partitioned primary index (or multi-level PPI). For PPI tables, the rows continue to be distributed across the AMPs in the same fashion, but on each AMP the rows are ordered first by partition number and then, within each partition, by hash. Teradata combines multiple WHERE predicates that result in partition elimination. In an ML-PPI table, any single partition, or any number or combination of partitions, may be referenced and used for partition elimination.
[Figure: "Input to Partition Function" - sample Sales rows (salesdate, storeid, productid, totalrevenue, totalsold, note) shown with their level-1/level-2/level-3 partition assignments, contrasting No Partitioning with Multi-Level Partitioning]
ML-PPI - Example
An insurance company often performs analysis for a specific state and within a date range that covers a small percentage of the many years of claims history in its data warehouse. Partition elimination, using multiple expressions for filtering based on WHERE clause predicates, benefits performance here. Whether the analysis covers Connecticut claims, claims in June 2005, or Connecticut claims in June 2005, a partitioning of the data that allows elimination of all but the desired claims offers a substantial performance advantage. Note that ML-PPI provides direct access to partitions regardless of the number of levels referenced in the query, assuring partition elimination and enhancing query performance.
11
CREATE TABLE claims
  (claim_id   INTEGER        NOT NULL,
   claim_date DATE           NOT NULL,
   state_id   BYTEINT        NOT NULL,
   claim_info VARCHAR(20000) NOT NULL)
PRIMARY INDEX (claim_id)
PARTITION BY (
  /* Level one partitioning expression. */
  RANGE_N(claim_date BETWEEN DATE '1999-01-01' AND DATE '2005-12-31'
          EACH INTERVAL '1' MONTH),
  /* Level two partitioning expression. */
  RANGE_N(state_id BETWEEN 1 AND 75 EACH 1));
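To make the "combined into a single partitioning expression" idea concrete, here is a hedged Python sketch of how the two RANGE_N levels of the claims table might collapse into one internal partition number. The numbering scheme (level 1 varies slowest, level 2 fastest) and the helper names are illustrative assumptions, not the actual Teradata implementation.

```python
from datetime import date

# Level 1: claim_date in monthly ranges, 1999-01 .. 2005-12 (84 partitions).
# Level 2: state_id BETWEEN 1 AND 75 EACH 1 (75 partitions).
LEVEL2_PARTITIONS = 75

def level1_partition(claim_date: date) -> int:
    """1-based month partition, counted from 1999-01."""
    return (claim_date.year - 1999) * 12 + claim_date.month

def level2_partition(state_id: int) -> int:
    """1-based state partition (state_id 1..75)."""
    return state_id

def combined_partition(claim_date: date, state_id: int) -> int:
    """Single partition number the two levels collapse into."""
    p1 = level1_partition(claim_date)
    p2 = level2_partition(state_id)
    return (p1 - 1) * LEVEL2_PARTITIONS + p2

# A June 2005 claim for state 7 lands in month partition 78,
# combined partition (78 - 1) * 75 + 7 = 5782.
print(combined_partition(date(2005, 6, 15), 7))  # 5782
```

Eliminating a month or a state then reduces to skipping contiguous (or regularly strided) runs of these combined partition numbers.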
12
Eliminating all but one month of the many years of claims history means scanning less than 5% of the claims history to satisfy the following query:
SELECT * FROM claims WHERE claim_date BETWEEN DATE '2005-06-01' AND DATE '2005-06-30';
13
Similarly, eliminating all but the Connecticut claims from the many states where the insurance company does business means scanning less than 5% of the claims history to satisfy the following query:
SELECT * FROM claims, states WHERE claims.state_id = states.state_id AND states.state = 'Connecticut';
14
Combining both predicates for partition elimination means scanning less than 0.5% of the claims history to satisfy the following query:
SELECT * FROM claims, states WHERE claims.state_id = states.state_id AND states.state = 'Connecticut' AND claim_date BETWEEN DATE '2005-06-01' AND DATE '2005-06-30';
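The quoted scan fractions follow from simple arithmetic. A back-of-the-envelope check, assuming the table's 7 years (84 monthly partitions) of history and assuming Connecticut holds roughly a 1-in-21 share of the claims (that share is an illustrative assumption, not a figure from the slides):

```python
# Fraction of the claims history scanned after partition elimination.
months = 7 * 12                      # 84 level-1 (monthly) partitions
month_fraction = 1 / months          # one month of history
ct_fraction = 1 / 21                 # assumed Connecticut share of claims
combined = month_fraction * ct_fraction

print(f"{month_fraction:.2%}")   # 1.19%  -> "less than 5%"
print(f"{ct_fraction:.2%}")      # 4.76%  -> "less than 5%"
print(f"{combined:.3%}")         # 0.057% -> "less than 0.5%"
```

The multiplicative effect of eliminating on both levels at once is what drives the combined query under 0.5%.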
15
ML-PPI - Rules
Existing limits and restrictions for partitioned primary indexes also apply to a multi-level partitioned primary index, with the following additions:
If more than one partitioning expression is specified in the PARTITION BY clause, each such partitioning expression must consist solely of either a RANGE_N or CASE_N function
If more than one partitioning expression is specified in the PARTITION BY clause, the product of the number of partitions defined by each partitioning expression must not exceed 65535, and each partitioning expression must define at least two partitions
The maximum number of partitioning expressions is 15
A partitioning expression must not contain the system-derived columns PARTITION#L1 through PARTITION#L15
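The rules above are mechanical enough to check up front. A minimal sketch of such a pre-flight validator (a hypothetical helper, not a Teradata API):

```python
def validate_mlppi(levels):
    """levels: list of partition counts, one per partitioning expression.

    Raises ValueError if the ML-PPI limits listed above are violated.
    """
    if not 1 <= len(levels) <= 15:
        raise ValueError("at most 15 partitioning expressions are allowed")
    if len(levels) > 1:
        if any(n < 2 for n in levels):
            raise ValueError("each level must define at least two partitions")
        product = 1
        for n in levels:
            product *= n
        if product > 65535:
            raise ValueError("combined partition count exceeds 65535")
    return True

# The claims example: 84 monthly partitions x 75 states = 6300 <= 65535.
print(validate_mlppi([84, 75]))  # True
```

Note how the product limit, not the per-level limit, is usually the binding constraint: two levels of 300 partitions each (90,000 combined) would already be rejected.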
16
In an ML-PPI scheme, defined partitions are hierarchical in nature
Query performance is still optimized through partition elimination even when only one level of an ML-PPI scheme is specified
Partitions can only be ADDed to the first level of an ML-PPI scheme
The first order of partitioning should be the level that potentially may change the most
17
OCES-3
Description
Implement the next level of enhancements for the Optimizer Cost Estimation Sub-System such as:
New costing methods
More accurate row and spool estimates
Expanded statistical information
The goal is to improve the accuracy of costing the various operations within a query plan. Major improvements to the accuracy of query planning will result in overall query performance improvement and reduced query rework effort. Accurate plans feed more accurate and more granular workload management via workload categorizations and filters. Potentially, some queries can have performance regressions; in most cases these will be considered defects when the performance impact is larger than the standard 5% margin of error for performance testing.
Benefit
Considerations
18
An expansion, re-interpretation and propagation of collected statistics
Improves the accuracy of estimates that result from applying selection criteria to single tables
Detection and adjustment of stale statistics by comparing against random AMP sampling
Join index statistics, check constraints, and referential integrity constraints (hard or soft) can all supplement base table statistics
Many other minor costing enhancements
Better skew detection during join planning
Editing cost of result set
Nested joins and bit-map index costing
19
Previously, base table statistics were re-used for all steps
New derived statistics allow for more accurate costing of multi-step plans
Information about skew can now be applied to spool files
Across the session:
Derived statistics can be propagated to global or volatile tables
Session-level derived statistics are held in memory across multiple requests
Similar information as in the statistics histogram
Used by standard insert/select operations
20
The Enhanced Query Rewrite feature (QRW) refers to the process of rewriting a query Q into a query Q' such that both queries are semantically equivalent (produce the same answer set) and Q' runs faster than the original query Q. Join elimination, view folding, transitive closure, predicate move-around and join index usage are examples of QRW techniques. The work includes an architecture reorganization and code cleanup: QRW becomes a separate subsystem called directly by the parser, as opposed to being driven by the Resolver. It also includes functional enhancements of the existing rewrite, mainly extending the logic of view folding to cover a more general class of views involving outer joins. A new rewrite is added that pushes projections into views and can in turn trigger other rewrites. Query Rewrite requires no user intervention and is done entirely by the Optimizer. Some queries will run faster with these optimizations, and query EXPLAIN plans may change because of the extra conditions added or joins that have been eliminated.
Benefit
Considerations
21
Eliminates columns in a view definition's SELECT list if the columns are not referenced by the query itself.
SELECT MAX(total) Max_Sale FROM Sales_By_Product;
CREATE VIEW Sales_By_Product AS SELECT Product_Key, Product_Name, SUM(Quantity * Amount) Total FROM Sales, Product WHERE Sales_Product_Key = Product_Key GROUP BY Product_key, Product_name;
SELECT MAX(Total) Max_Sale FROM (SELECT SUM(Quantity * Amount) Total FROM Sales, Product WHERE Sales_Product_Key = Product_Key GROUP BY Product_Key, Product_Name) Sales_By_Product;
The Projection Pushdown rewrite offers a performance gain and a reduction in spool consumption by spooling only the columns that are necessary to support the query. New: cases where the view or derived table must be spooled also have these optimizations applied.
22
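The column-pruning step at the heart of Projection Pushdown can be sketched in a few lines. This is a toy illustration using the column names from the Sales_By_Product example above, not the Optimizer's actual data structures:

```python
# The view's SELECT list versus what the outer query actually touches.
view_select_list = ["Product_Key", "Product_Name", "Total"]
outer_query_refs = {"Total"}          # SELECT MAX(Total) Max_Sale ...

# Only referenced columns survive in the spooled view; Product_Key and
# Product_Name remain available to the GROUP BY inside the view, but
# they are no longer carried into the spool.
pruned = [col for col in view_select_list if col in outer_query_refs]
print(pruned)  # ['Total']
```

Spooling one column instead of three is exactly where the spool-consumption savings come from.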
Provides the capability to rewrite certain queries so that WHERE predicates stated outside a view or derived table can be pushed inside the view or derived table and applied directly as part of query execution.
SELECT MAX(Total) Total FROM (SELECT Product_Key, Product_Name, SUM(Quantity * Amount) Total FROM Sales, Product WHERE Sales_Product_Key = Product_Key AND Product_Key IN (10, 20, 30) GROUP BY Product_Key, Product_Name) V;
SELECT MAX (Total) Total FROM (SELECT Product_Key, Product_Name, SUM(Quantity * Amount) Total FROM Sales, Product WHERE Sales_Product_Key = Product_Key GROUP BY Product_Key, Product_Name) V WHERE Product_Key IN (10, 20, 30);
QRW provides for diminished spool usage and a performance gain through the application of WHERE predicates directly inside a view or derived table. New: cases where the view or derived table must be spooled also have these optimizations applied.
23
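Why predicate pushdown shrinks spool is easy to see on toy data. A hedged sketch (row values are made up; only the filter-before-aggregate ordering mirrors the rewrite above):

```python
# (product_key, quantity, amount) rows of a toy Sales/Product join result.
sales = [
    (10, 2, 5.0),
    (20, 1, 7.0),
    (40, 3, 9.0),   # not in the IN-list; pushed-down filter drops it early
]
wanted = {10, 20, 30}  # Product_Key IN (10, 20, 30)

# Pushed-down form: filter first, then aggregate - fewer rows ever spooled.
filtered = [q * a for pk, q, a in sales if pk in wanted]
total = sum(filtered)
print(total)  # 17.0
```

In the un-pushed form, all three rows would be aggregated into the derived table's spool before the IN-list could remove anything.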
Enhanced Query Rewrite Capability Pushing Joins Into UNION ALL Views
A cost-based rewrite that allows certain foreign-key primary-key (FK-PK) joins to be applied before UNION ALL.
SELECT SUM(Quantity * Amount) Total FROM Jan_Feb_Sales, Product WHERE Sales_Product_Key = Product_Key AND Product_Name LIKE 'Gourmet%';
CREATE VIEW Jan_Feb_Sales AS SELECT * FROM Sales1 UNION ALL SELECT * FROM Sales2;
SELECT SUM(Quantity * Amount) Total FROM (SELECT Quantity, Amount FROM Sales1, Product WHERE Sales_Product_Key = Product_Key AND Product_Name LIKE 'Gourmet%' UNION ALL SELECT Quantity, Amount FROM Sales2, Product WHERE Sales_Product_Key = Product_Key AND Product_Name LIKE 'Gourmet%') Jan_Feb_Sales;
24
QRW provides for diminished spool usage and performance gain through the application of Joins and WHERE predicates at applicable points within the UNION ALL query.
25
26
PI Random Samples
Normalization heuristics
Join planner
27
e = h + (h - l) * v / (v - 1)
Added number of unique values = v (Estimated from Distinct value extrapolation)
Extrapolated boundary
28
07/17/07
07/23/07
Extrapolated # of distinct values = 20
Statistics Collected (Average values per day = 1 million for 200 days)
Current behavior: provides an estimate of approximately 3 million rows based on collected statistics
New behavior: provides an estimate of approximately 7 million rows based on current and extrapolated statistics
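The 3M-versus-7M gap follows from how many of the queried days fall inside the collected histogram. A sketch of that arithmetic, assuming (inferred from the 3-million figure, not stated on the slide) that the collected statistics end on 07/19/07 while the query spans 07/17/07 through 07/23/07:

```python
from datetime import date

ROWS_PER_DAY = 1_000_000             # "average values per day = 1 million"
stats_end = date(2007, 7, 19)        # assumed histogram boundary
query_start = date(2007, 7, 17)
query_end = date(2007, 7, 23)

def days_covered(start, end):
    """Inclusive day count between two dates."""
    return (end - start).days + 1

# Current behavior: only days inside the collected histogram count.
current = days_covered(query_start, min(query_end, stats_end)) * ROWS_PER_DAY
# New behavior: extrapolate the rows/day trend past the histogram boundary.
extrapolated = days_covered(query_start, query_end) * ROWS_PER_DAY

print(current, extrapolated)  # 3000000 7000000
```

The same days-times-density logic explains the later slides: queries that fall entirely past the histogram get the old 1-row default today, and a rows/day extrapolation under the new behavior.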
29
07/21/07
07/25/07
Statistics Collected
(Average values per day = 1 million)
Extrapolated boundary
(08/08/07)
Current behavior: provides 1 row as the estimate and assumes statistics are correct
New behavior: provides an estimate of approximately 5 million rows based on extrapolated statistics
30
08/06/07
08/11/07
Statistics Collected
(Average values per day = 1 million)
Extrapolated boundary
(08/08/07)
Current behavior: provides 1 row as the estimate and assumes statistics are correct
New behavior: provides an estimate of approximately 3 million rows based on extrapolated statistics
31
07/16/07
No End Date
Statistics Collected
(Average values per day = 1 million)
Extrapolated boundary
(08/08/07)
Current behavior: provides an estimate of approximately 4 million rows based on collected statistics
New behavior: provides an estimate of approximately 24 million rows based on extrapolated statistics
32
08/04/07
No End Date
Statistics Collected
(Average values per day = 1 million)
Extrapolated boundary
(08/08/07)
Current behavior: provides 1 row as the estimate and assumes statistics are correct
New behavior: provides an estimate of approximately 5 million rows based on extrapolated statistics
33
09/01/07
No End Date
Statistics Collected
(Average values per day = 1 million)
Extrapolated date
(08/08/07)
Current and new behavior remain the same: provides one row as the estimate (zero is rounded up to one)
34
05/30/07
No End Date
Statistics Collected
(Average rows per day = 1 million until 05/31/07 when rows become very sparse)
Extrapolated boundary
(06/06/07)
Current behavior: provides an estimate of approximately 2 million+ rows based on collected statistics
New behavior: provides an estimate of approximately 8 million+ rows based on extrapolated statistics
35
36
37
Online Archive
Description
Online archive allows the archival of a running database; that is, a database can be archived while update transactions for the tables in that database execute concurrently. Transactional consistency is maintained by tracking any changes to a table in a log, such that changes applied to the table during the archive can be rolled back to the transactional consistency point after the restore.
Benefit
Online archive removes the requirement of a window during which updates must be held up while backup procedures execute. Additionally, object locking will be eased and the full performance impact of permanent journals will be removed.
Considerations
Online archive will be integrated into the Open Teradata Backup (OTB) suite of products associated with this release.
38
39
40
If an optional name is not supplied, the error table name defaults to ET_<data table name>; if the data table name is longer than 27 characters, it is truncated at the end with no warning returned
If <error table> is not specified, or is specified without an explicit database name, the error table is created in the current default database for the session
An error table may be created for a data table with a maximum of 2,048 columns
In addition to the data table contents, the error table houses 18 additional error-related columns
COMMENTs on columns in the data table are not carried over to the error table; however, COMMENTs may be added to the error table columns if desired
Access rights required for CREATE ERROR TABLE statements are the same as those for CREATE TABLE statements
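The default-naming rule above can be captured in one small function. A sketch, assuming the 27-character cutoff exists to keep "ET_" plus the table name within the classic 30-character object-name limit:

```python
def default_error_table_name(data_table: str) -> str:
    """Default error table name: 'ET_' + data table name,
    with the data table name truncated at the end to 27 characters."""
    return "ET_" + data_table[:27]

print(default_error_table_name("claims"))  # ET_claims
# A 40-character table name is silently truncated to fit:
print(default_error_table_name("a_very_long_data_table_name_for_claims40"))
```

Because the truncation is silent, two long table names sharing their first 27 characters would collide on the same default error table name, which is one reason to supply an explicit name in that case.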
41
A LOGGING ERRORS option has been added to the existing SQL syntax for INSERT-SELECT and MERGE-INTO statements
This option permits users to specify the kinds of errors that can be logged
Errors are classified into two categories: local and non-local
Local errors are defined as errors that occur on the same AMP
The LOGGING ERRORS option is applicable to both ANSI and Teradata modes.
42
Local errors are comprised of the following:
Duplicate row errors (ANSI mode only; an INSERT-SELECT of a duplicate row into a SET table is silently ignored in Teradata mode)
Duplicate primary key errors
CHECK constraint violations
LOB non-pad data truncation errors
Data conversion errors that occur during data row inserts
Logging non-local errors: non-local errors are comprised of the following:
Referential integrity violations
Unique secondary index violations
43
44
Enhances the MERGE-INTO SQL capability to support full ANSI functionality. This feature allows the database to perform a true bulk UPSERT operation with a standard SQL statement. Additionally, this enhancement provides non-ANSI extensions to support additional error-handling capabilities. The new SQL MERGE functionality lifts the current restriction of supporting only single-row merges and allows multiple table rows to be processed in this fashion. Bulk UPSERT processing will no longer be limited to the MultiLoad utility, and the extended error-handling capabilities will allow native SQL to be used in load strategies while overcoming current utility restrictions regarding unique indexes, join indexes and triggers resident on target tables. Strong consideration should be given to re-evaluating current batch load/ETL processes to take advantage of the full ANSI MERGE-INTO SQL capability for load operations not previously considered due to existing limitations/restrictions.
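The bulk UPSERT semantics MERGE-INTO provides reduce to one rule per source row: update on a key match, insert otherwise. A toy sketch of that rule (the dict-based "table" and row values are illustrative only):

```python
def merge_into(target: dict, source: list) -> tuple:
    """target: {key: row}; source: [(key, row), ...].

    Applies WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
    per source row; returns (updated, inserted) counts.
    """
    updated = inserted = 0
    for key, row in source:
        if key in target:      # WHEN MATCHED THEN UPDATE
            updated += 1
        else:                  # WHEN NOT MATCHED THEN INSERT
            inserted += 1
        target[key] = row
    return updated, inserted

claims = {1: "open", 2: "open"}
print(merge_into(claims, [(2, "closed"), (3, "open")]))  # (1, 1)
```

The point of the enhancement is that this per-row decision is now made in bulk, in a single SQL statement, rather than one row at a time or via MultiLoad.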
Benefit
Considerations
45
46
47
Workload           MLoad Total Elapsed   FastLoad + MERGE     Improvement
100% INSERT        296 sec               295 sec (14 + 281)
50% INS, 50% UPD   361 sec               294 sec (14 + 280)   19%
100% UPD           342 sec               138 sec (14 + 124)   60%
(FastLoad portion = 14 sec in each case)
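The improvement column is just the relative reduction in total elapsed time; the figures in the table check out arithmetically:

```python
def improvement(mload_sec: float, fl_merge_sec: float) -> float:
    """Relative elapsed-time reduction of FastLoad + MERGE vs. MultiLoad."""
    return (mload_sec - fl_merge_sec) / mload_sec

# 50% INS, 50% UPD: 361 sec -> 294 sec
print(round(improvement(361, 294) * 100))  # 19
# 100% UPD: 342 sec -> 138 sec
print(round(improvement(342, 138) * 100))  # 60
```

The 100% INSERT case (296 sec vs. 295 sec) is essentially a wash, which is expected: with no updates to apply, MERGE adds little over a plain load.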
48
49
50
51
52
53
54
55
56
57
58
SET QUERY_BAND = 'Document=XY1234;Universe=East;' FOR TRANSACTION;
SET QUERY_BAND = NONE FOR SESSION;
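A query band is just a string of semicolon-separated name=value pairs, so consuming it on the application side is a one-liner. A sketch of such a parser (a hypothetical helper, not a Teradata-supplied function):

```python
def parse_query_band(band: str) -> dict:
    """Parse a 'Name=Value;Name=Value;' query band string into a dict."""
    pairs = (p for p in band.split(";") if p.strip())
    return dict(p.split("=", 1) for p in pairs)

print(parse_query_band("Document=XY1234;Universe=East;"))
# {'Document': 'XY1234', 'Universe': 'East'}
```

Workload management rules and DBQL logging can then key off individual pairs (for example, routing by Universe) rather than the raw string.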
Note:
Partner tools and Teradata applications are planning roadmaps to utilize and generate query banding. Customers need to add query banding to their own applications.
59
Extends active workload management to automatically detect, notify, and act on planned and unplanned system and enterprise events. TASM then automatically implements a new set of workload management rules, a working value set (WVS), specific to the detected events and the resulting system condition. This provides the ability to automatically adjust workload management rules when the system enters or exits a degraded mode, ensuring that critical system work continues to get priority for resources. It also allows application situational events to be considered beyond just date/time operating environments; for instance, workload management rules can change based on batch reports actually completing rather than on an approximate completion time. Involves creation and configuration of a TASM two-dimensional state matrix in TDWM, aligning operating environments with system conditions.
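The two-dimensional state matrix amounts to a lookup keyed on (operating environment, system condition). A minimal sketch; every environment, condition, and WVS name below is illustrative, not from TDWM:

```python
# Operating environment x system condition -> working value set (WVS).
STATE_MATRIX = {
    ("Daily", "Normal"):   "WVS-DailyNormal",
    ("Daily", "Degraded"): "WVS-DailyDegraded",
    ("Batch", "Normal"):   "WVS-BatchNormal",
    ("Batch", "Degraded"): "WVS-BatchDegraded",
}

def working_value_set(operating_env: str, system_condition: str) -> str:
    """Select the rule set TASM activates for the current matrix cell."""
    return STATE_MATRIX[(operating_env, system_condition)]

# A node failure flips the system condition, which selects new rules
# without changing the operating environment:
print(working_value_set("Daily", "Degraded"))  # WVS-DailyDegraded
```

An event such as the node-failure example later in this section would move the system from the ("Daily", "Normal") cell to ("Daily", "Degraded"), activating the stricter concurrency limits of that cell's WVS.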
Benefits
Considerations
60
Begin/end batch processing
Begin/end key workload (end-of-month processing)
Dual system offline/online
61
62
63
Node Failure
64
The workload automatically changes, and low-priority session work is reduced from 20 to 3 concurrent sessions
65
66
67
68
Parsing Engine CPU time
High and low AMP byte counts for spool
Normalized CPU data for co-existence systems
Cost estimates (CPU, I/O, network, heuristics)
Estimated processing time and row counts
Additional utility-related information
69
70
Current depth of the AMP work mailbox
Specifies whether an AMP is in flow control
Number of times during the log period that the system entered the flow-control state
Maximum number of AWTs in use at any one time
Current number of AWTs in use during the log period for each work type for the VprId vproc
Maximum number of AWTs in use at one time during the log period for each work type for the VprId vproc
71
AGId            Identifies the current Allocation Group for the Perf Group ID
RelWgt          Relative weight of the Allocation Group
CPUTime         Milliseconds of CPU time consumed by the associated task
IOBlks          Number of logical data blocks read and/or written by the PG
NumProcs        Number of processes assigned to the PG
NumSets         Allocation Group set division type
NumRequests     Number of requests for the AMP Worker Tasks
QWaitTime       Time that work requests waited on an input queue before being serviced
QWaitTimeMax    Maximum time that work requests waited
QLength         Number of work requests waiting on the input queue
QLengthMax      Maximum number of work requests waiting on the input queue
ServiceTime     Time that work requests required for service
ServiceTimeMax  Maximum time that work requests required for service
72
73
Provides the functionality that allows stored procedures to build and return answer sets as a result of their execution. Extending the stored procedure capability in this way greatly simplifies application development against the Teradata database and provides a long-awaited capability. Currently, without the stored procedure result set capability, temporary tables need to be created and used to store answer sets, and the stored procedure CALL must be followed with a SELECT statement. Strong consideration should be given to removing these intermediate steps from current applications.
Benefits
Considerations
74
Extends the current External Stored Procedure (XSP) capability to provide an interface that allows an XSP to invoke and use SQL in the current session. This feature will foster greater application development and enhance the ability of a client application to access and use the Teradata database directly. The initial primary development focus will be to use CLIv2 to allow an XSP to submit SQL to the Teradata database.
Benefits
Considerations
75
Performance Improvement
20% to 60% over MultiLoad end to end
10% to 30% on queries that QRW can optimize
20% time-savings improvement over non-online
Up to 50% on CheckTable level-two (secondary index) checking on large tables (> 20M rows)
Up to 30% on queries that can take advantage of partition elimination, e.g. multidimensional queries
Up to 30% on queries that the new algorithm determines to cache
5% improvement in plans generated by new OCES vs. old OCES (costing parameters)
76
Active Enable
Online Archive
Replication Scalability
Restartable Scandisk
Bulk SQL error logging tables
Full ANSI Merge-Into SQL capability
CheckTable utility performance enhancements
Table Function without Join Back
Increase statistics intervals
Extrapolate statistics outside range (e.g. DATE)
Collect stats for multi-column NULL values
Collect AMP-level statistics values
Projection Pushdown
Push Joins into UNION ALL Views
Cost, Quality, & Supportability
Parameterized statement caching improvements
Hash bucket expansion
Multi-level Partitioned Primary Index (PPI)
Windowed Aggregate Functions
Dispatcher Fault Isolation
Compression on Soft/Batch Referential Integrity Columns
Additional EXPLAIN plan details
Ease of Use
Enterprise Fit
TASM enhancements:
Query Banding
Traffic Cop Enhancements
Global/Multiple Exceptions
Provide for Open API SQL capability for TDWM
Dynamic load utility management
Data collection: DBQL, ResUsage
Index Wizard support for PPI
SQL invocation via External Stored Procedures
Stored Procedure result sets
Dynamic Result Row Specification on Table Functions
Normalized AMPUsage view for coexistence
Java SPs (with JDBC) (Linux and Windows)
Cursor positioning for multi-statement requests
UNICODE support for password control and encryption
Custom password dictionary support
New password encryption algorithm
Restore/Copy Dictionary Phase
Restore/Copy to Different Configuration Data Phase
UNIX/Kerberos Authentication for Windows Clients
77
Questions.....
Richard.Charucki@teradata.com
78