
Teradata Database 12.0 Overview
Rich Charucki, Director Product Management, Richard.Charucki@teradata.com

Teradata Strategic Dimensions


Performance

Active Enable

Cost, Quality and Supportability

Enterprise Fit

Ease of Use

Teradata Database 12.0 Features for


Performance

Active Enable

Cost, Quality and Supportability

Enterprise Fit

Ease of Use

Teradata Database 12.0 Features for... Performance


- Multi-Level Partitioned Primary Index
- OCES-3
- Enhanced Query Rewrite Capability
- Extrapolate Statistics Outside of Range
- Parameterized Statement Caching Improvements
- Increase Statistics Intervals
- Collect Statistics for Multi-Column NULL Values
- Collect AMP Level Statistics Values
- Windowed Aggregate Functions
- Hash Bucket Expansion

Multi-Level Partitioned Primary Index


Description

Extend the existing Partitioned Primary Index (PPI) capability to allow a table or non-compressed join index to be created with a Multi-Level Partitioned Primary Index (ML-PPI).

Benefit

ML-PPI on a table gives the Teradata Optimizer the opportunity to achieve partition elimination at a more granular level, which in turn yields better query performance.

Considerations

The Teradata Optimizer determines whether the index and partitioning are usable as part of its best-cost query planning process and automatically engages the index in the plan for a given query.

ML-PPI - Concepts
Multi-level partitioning allows each partition at a level to be sub-partitioned. Each partitioning level is independently defined using a RANGE_N or CASE_N expression. Internally, these partitioning expressions are combined into a single partitioning expression that defines how the data is partitioned on the AMP. If PARTITION BY is specified, the table is called a partitioned primary index (PPI) table.

ML-PPI - Concepts
If only one partitioning expression is specified, the PPI is called a single-level partitioned primary index (single-level PPI). If more than one partitioning expression is specified, the PPI is called a multi-level partitioned primary index (multi-level PPI). For PPI tables, rows continue to be distributed across the AMPs in the same fashion, but on each AMP the rows are ordered first by partition number and then, within each partition, by hash. Teradata combines multiple WHERE predicates that result in partition elimination. In an ML-PPI table, any single partition, or any number or combination of partitions, may be referenced and used for partition elimination.

ML-PPI AMP-Level Row Grouping


CREATE TABLE Sales
  (storeid INTEGER NOT NULL,
   productid INTEGER NOT NULL,
   salesdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
   totalrevenue DECIMAL(13,2),
   totalsold INTEGER,
   note VARCHAR(256))
UNIQUE PRIMARY INDEX (storeid, productid, salesdate)
PARTITION BY (
  RANGE_N(salesdate BETWEEN DATE '2003-01-01' AND DATE '2005-12-31' EACH INTERVAL '1' YEAR),
  RANGE_N(storeid   BETWEEN 1 AND 300 EACH 100),
  RANGE_N(productid BETWEEN 1 AND 400 EACH 100));

Input to partition function (Sales):

L1  L2  L3  salesdate   storeid  productid  totalrevenue  totalsold  note
1   1   1   2003-04-15  96       10         4158          42         Good day
1   1   2   2003-07-06  71       184        1972          68         Marginal
2   2   3   2004-11-09  175      241        3055          47         Slow day
1   1   4   2003-12-24  82       363        1261          13         Promotion
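As a hedged illustration (this query is not from the original deck), predicates on all three levels of the Sales table combine so that only one sub-partition per level needs to be scanned:

SELECT SUM(totalrevenue)
FROM Sales
WHERE salesdate BETWEEN DATE '2003-01-01' AND DATE '2003-12-31'  -- level 1: one year partition
  AND storeid BETWEEN 1 AND 100                                  -- level 2: one store range
  AND productid BETWEEN 1 AND 100;                               -- level 3: one product range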

ML-PPI: Partitioning Visual Graphic


[Figure: three-level partition hierarchy on an AMP. Level 1: salesdate (e.g., 2003-12-24, 2004-11-09); Level 2: storeid (e.g., 82, 175); Level 3: productid (e.g., 363, 241).]

ML-PPI Partition Scanning Visual Example


Data scanned, last month vs. the same month last year for one region out of four:

[Figure comparing the data scanned with no partitioning (full scan), single-level partitioning, and multi-level partitioning.]

ML-PPI - Example
An insurance company often performs analysis for a specific state and within a date range that is a small percentage of the many years of claims history in its data warehouse. Partition elimination using multiple expressions for filtering based on WHERE clause predicates benefits performance. If analysis is being performed for Connecticut claims, for claims in June 2005, or for Connecticut claims in June 2005, a partitioning of the data that allows elimination of all but the desired claims offers a substantial performance advantage. Note that ML-PPI provides direct access to partitions regardless of the number of levels specified in the query, assuring partition elimination and enhancing query performance.

ML-PPI - Example
CREATE TABLE claims
  (claim_id INTEGER NOT NULL,
   claim_date DATE NOT NULL,
   state_id BYTEINT NOT NULL,
   claim_info VARCHAR(20000) NOT NULL)
PRIMARY INDEX (claim_id)
PARTITION BY (
  /* Level one partitioning expression. */
  RANGE_N(claim_date BETWEEN DATE '1999-01-01' AND DATE '2005-12-31' EACH INTERVAL '1' MONTH),
  /* Level two partitioning expression. */
  RANGE_N(state_id BETWEEN 1 AND 75 EACH 1));

ML-PPI - Example
Eliminating all but one month out of the many years of claims history means scanning less than 5% of the claims history to satisfy the following query:

SELECT * FROM claims
WHERE claim_date BETWEEN DATE '2005-06-01' AND DATE '2005-06-30';

ML-PPI - Example
Similarly, eliminating all but the Connecticut claims out of the many states where the insurance company does business means scanning less than 5% of the claims history to satisfy the following query:

SELECT * FROM claims, states
WHERE claims.state_id = states.state_id
  AND states.state = 'Connecticut';

ML-PPI - Example
Combining both of these predicates for partition elimination means scanning less than 0.5% of the claims history to satisfy the following query:

SELECT * FROM claims, states
WHERE claims.state_id = states.state_id
  AND states.state = 'Connecticut'
  AND claim_date BETWEEN DATE '2005-06-01' AND DATE '2005-06-30';

ML-PPI - Rules
Existing limits and restrictions for partitioned primary indexes also apply to a multi-level partitioned primary index, with the following additions:

- If more than one partitioning expression is specified in the PARTITION BY clause, each such partitioning expression must consist solely of either a RANGE_N or a CASE_N function
- If more than one partitioning expression is specified in the PARTITION BY clause, the product of the number of partitions defined by each partitioning expression must not exceed 65,535, and each partitioning expression must define at least two partitions
- The maximum number of partitioning expressions is 15
- A partitioning expression must not contain the system-derived columns PARTITION#L1 through PARTITION#L15 (see the example below)
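As a hedged illustration (not from the original deck), the system-derived columns named in the last rule can still be referenced in queries, even though they may not appear in partitioning expressions, e.g. to inspect how rows spread across the partition levels:

SELECT PARTITION#L1, PARTITION#L2, COUNT(*)  -- level-1 and level-2 partition numbers
FROM claims
GROUP BY PARTITION#L1, PARTITION#L2
ORDER BY 1, 2;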

ML-PPI: Partition Ordering


Partition Ordering:

- In an ML-PPI scheme, defined partitions are hierarchical in nature
- Query performance is still optimized through partition elimination even when only one level of an ML-PPI scheme is specified
- Partitions can only be ADDed to the first level of an ML-PPI scheme (a hedged sketch follows this list)
- The first level of partitioning should therefore be the level that is most likely to change
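A hedged sketch of such a first-level change, using ALTER TABLE ... MODIFY PRIMARY INDEX to extend the date ranges of the Sales table defined earlier (exact options may vary by release):

ALTER TABLE Sales
MODIFY PRIMARY INDEX
ADD RANGE BETWEEN DATE '2006-01-01' AND DATE '2006-12-31' EACH INTERVAL '1' YEAR;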

OCES-3
Description

Implement the next level of enhancements for the Optimizer Cost Estimation Sub-System, such as:

- New costing methods
- More accurate row and spool estimates
- Expanded statistical information

The goal is to improve the accuracy of costing the various operations within a query plan.

Benefit

Major improvements to the accuracy of query planning result in overall query performance improvement and a reduction in query rework efforts. Accurate plans also feed more accurate and more granular workload management via workload categorizations and filters.

Considerations

Potentially, some queries can have performance regressions; in most cases these will be treated as defects when the performance impact is larger than the standard 5% margin of error for performance testing.

OCES-3: Main Categories of Enhancements


- Derived Statistics: an expansion, re-interpretation and propagation of collected statistics
- Improved Single-Table Estimates: improve the accuracy of estimates that result from applying selection criteria to single tables
- Handling of Stale Statistics: detection and adjustment of stale statistics by comparing them against random AMP sampling
- Consolidation of Multiple Sources of Information: join index statistics, check constraints, and referential integrity constraints (hard or soft) can all supplement base table statistics
- Many other minor costing enhancements: better skew detection during join planning, editing cost of result sets, and nested join and bitmap index costing

OCES-3: Derived Statistics are Central to Enabling Other OCES Enhancements


Within the query:

- Statistics are re-assessed and adjusted after each logical operation (selection, join, aggregation)
- Previously, base table statistics were re-used for all steps
- New derived statistics allow more accurate costing for multi-step plans
- Information about skew can now be applied to spool files

Across the session:

- Derived statistics can be propagated to global or volatile tables
- Session-level derived statistics are held in memory across multiple requests, carrying similar information as the statistics histogram
- Used by standard insert/select operations

Enhanced Query Rewrite Capability


Description

Enhanced Query Rewrite (QRW) refers to the process of rewriting a query Q into a query Q' such that both queries are semantically equivalent (produce the same answer set) and Q' runs faster than the original query Q. Join elimination, view folding, transitive closure, predicate move-around and join index usage are examples of QRW techniques. This release includes an architecture re-organization and code cleanup: QRW becomes a separate subsystem called directly by the parser, as opposed to being driven by the Resolver. It also includes functional enhancements of the existing rewrite, mainly extending the logic of view folding to a more general class of views involving outer joins, plus a new rewrite that pushes projections into views and can in turn trigger other rewrites.

Benefit

Query Rewrite requires no user intervention and is performed entirely by the Optimizer. Some queries will run faster with these optimizations.

Considerations

Query explain plans may change because of the extra conditions added or the joins that have been eliminated.

Enhanced Query Rewrite Capability Projection Pushdown


Projection Pushdown

Eliminates columns in a view definition's SELECT list if the columns are not referenced by the query itself.

View definition:

CREATE VIEW Sales_By_Product AS
SELECT Product_Key, Product_Name, SUM(Quantity * Amount) Total
FROM Sales, Product
WHERE Sales_Product_Key = Product_Key
GROUP BY Product_Key, Product_Name;

Original query:

SELECT MAX(Total) Max_Sale FROM Sales_By_Product;

Rewritten query:

SELECT MAX(Total) Max_Sale
FROM (SELECT SUM(Quantity * Amount) Total
      FROM Sales, Product
      WHERE Sales_Product_Key = Product_Key
      GROUP BY Product_Key, Product_Name) Sales_By_Product;

The Projection Pushdown rewrite offers a performance gain and a reduction in spool consumption by spooling only the columns necessary to support the query. New: these optimizations are also applied in cases where the view or derived table must be spooled.

Enhanced Query Rewrite Capability Predicate Pushdown


Predicate Pushdown

Provides the capability to rewrite certain queries so that WHERE predicates stated outside a view or derived table can be pushed inside the view or derived table and applied directly as part of query execution.

Original query:

SELECT MAX(Total) Total
FROM (SELECT Product_Key, Product_Name, SUM(Quantity * Amount) Total
      FROM Sales, Product
      WHERE Sales_Product_Key = Product_Key
      GROUP BY Product_Key, Product_Name) V
WHERE Product_Key IN (10, 20, 30);

Rewritten query:

SELECT MAX(Total) Total
FROM (SELECT Product_Key, Product_Name, SUM(Quantity * Amount) Total
      FROM Sales, Product
      WHERE Sales_Product_Key = Product_Key
        AND Product_Key IN (10, 20, 30)
      GROUP BY Product_Key, Product_Name) V;

QRW provides diminished spool usage and a performance gain through the application of WHERE predicates directly inside a view or derived table. New: these optimizations are also applied in cases where the view or derived table must be spooled.

Enhanced Query Rewrite Capability Pushing Joins Into UNION ALL Views
Pushing Joins Into UNION ALL Views

A cost-based rewrite that allows certain foreign-key/primary-key (FK-PK) joins to be applied before UNION ALL.

View definition:

CREATE VIEW Jan_Feb_Sales AS
SELECT * FROM Sales1
UNION ALL
SELECT * FROM Sales2;

Original query:

SELECT SUM(Quantity * Amount) Total
FROM Jan_Feb_Sales, Product
WHERE Sales_Product_Key = Product_Key
  AND Product_Name LIKE 'Gourmet%';

Rewritten query:

SELECT SUM(Quantity * Amount) Total
FROM (SELECT Quantity, Amount
      FROM Sales1, Product
      WHERE Sales_Product_Key = Product_Key
        AND Product_Name LIKE 'Gourmet%'
      UNION ALL
      SELECT Quantity, Amount
      FROM Sales2, Product
      WHERE Sales_Product_Key = Product_Key
        AND Product_Name LIKE 'Gourmet%') Jan_Feb_Sales;

QRW provides diminished spool usage and a performance gain through the application of joins and WHERE predicates at applicable points within the UNION ALL query.

Extrapolate Statistics Outside of Range


Description

Enhance the Teradata Optimizer with a new extrapolation technique specifically designed to provide a more accurate statistical estimate for date-range-based queries that specify a future date outside the bounds of the statistics collected for the column.

Benefit

The extrapolation technique for date-range-based queries that supply future dates results in better query plans because cardinality estimation is much more accurate. Because of the new extrapolation formula, it is also possible that statistics for the associated date columns will not have to be re-collected as often.

Considerations

Extrapolation for date-range-based queries does not change the procedure for dropping or collecting statistics, nor are the help-statistics features affected. However, the information displayed within a query Explain plan will change because of the new estimated row counts. Specific consideration should be given to collecting statistics less frequently on columns that will now be extrapolated.
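For reference, a minimal sketch of collecting the date-column statistics this feature extrapolates from (table and column names follow the later examples in this deck):

COLLECT STATISTICS ON ordertbl COLUMN O_ORDERDATE;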

Stale Statistics Detection


Currently, the table row count is estimated from random AMP sampling or from statistics on the table's primary index (PI). If statistics have been collected on the primary index, they are always trusted and the random AMP samples are ignored. In TD 12.0, instead of always trusting the primary index histogram row count, the row counts from random AMP sampling and from the histogram are compared, and a decision is made based on normalization heuristics. If the deviation between the histogram row count and the table row count exceeds a defined threshold, the histogram is deemed stale. Stale histograms are specially tagged in the Optimizer, and value-count/row extrapolations are applied when they are used for cardinality estimation. Stale Statistics Detection also applies to tables that have no statistics at all, allowing table row count extrapolation.

Stale Statistics Detection


[Diagram: prior to TD 12.0, the row count from the PI histogram fed the join planner directly, and PI random AMP samples were used only when no statistics existed on the PI. In TD 12.0, the PI histogram row count and the PI random AMP samples are both fed through normalization heuristics before reaching the join planner.]

Extrapolate Statistics Outside of Range


Assuming RPV (rows per value) is constant across the data table, let statistics be collected over the range [l, h] containing v unique values, and let dv be the number of added unique values (estimated from distinct-value extrapolation). The extrapolated boundary is then:

    e = h + (h - l) * (dv / v)

[Figure: data-table timeline showing the collected-statistics range from l to h and the extrapolated boundary e beyond h.]

Extrapolate Statistics Outside of Range Closed Range Query - Example I


Query:

SELECT * FROM ordertbl
WHERE O_ORDERDATE BETWEEN DATE '2007-07-17' AND DATE '2007-07-23';

[Diagram: statistics collected from 01/01/07 through 07/19/07, averaging 1 million rows per day over 200 days; the query range 07/17/07-07/23/07 straddles the collected range and the extrapolated region. Extrapolated number of distinct values = 20, giving an extrapolated boundary of 08/08/07, i.e. '2007-07-19' + ('2007-07-19' - '2007-01-01') * 20/200.]

Current behavior: provides an estimate of approximately 3 million rows based on collected statistics.
New behavior: provides an estimate of approximately 7 million rows based on current and extrapolated statistics.

Extrapolate Statistics Outside of Range Closed Range Query - Example II


Query:

SELECT * FROM ordertbl
WHERE O_ORDERDATE BETWEEN DATE '2007-07-21' AND DATE '2007-07-25';

[Diagram: statistics collected from 01/01/07 through 07/19/07, averaging 1 million rows per day; the query range 07/21/07-07/25/07 lies entirely beyond the collected range but within the extrapolated boundary of 08/08/07. Estimated number of rows = 5 million.]

Current behavior: provides 1 row as the estimate and assumes statistics are correct.
New behavior: provides an estimate of approximately 5 million rows based on extrapolated statistics.

Extrapolate Statistics Outside of Range Closed Range Query - Example III


Query:

SELECT * FROM ordertbl
WHERE O_ORDERDATE BETWEEN DATE '2007-08-06' AND DATE '2007-08-11';

[Diagram: statistics collected from 01/01/07 through 07/19/07, averaging 1 million rows per day; the query range 08/06/07-08/11/07 straddles the extrapolated boundary of 08/08/07. Estimated number of rows = 3 million.]

Current behavior: provides 1 row as the estimate and assumes statistics are correct.
New behavior: provides an estimate of approximately 3 million rows based on extrapolated statistics.

Extrapolate Statistics Outside of Range Open Range Query - Example I


Query:

SELECT * FROM ordertbl
WHERE O_ORDERDATE >= DATE '2007-07-16';

[Diagram: statistics collected from 01/01/07 through 07/19/07, averaging 1 million rows per day; the open query range starts at 07/16/07 with no end date, and the extrapolated boundary is 08/08/07. Estimated number of rows = 24 million.]

Current behavior: provides an estimate of approximately 4 million rows based on collected statistics.
New behavior: provides an estimate of approximately 24 million rows based on extrapolated statistics.

Extrapolate Statistics Outside of Range Open Range Query - Example II


Query:

SELECT * FROM ordertbl
WHERE O_ORDERDATE >= DATE '2007-08-04';

[Diagram: statistics collected from 01/01/07 through 07/19/07, averaging 1 million rows per day; the open query range starts at 08/04/07 with no end date, and the extrapolated boundary is 08/08/07. Estimated number of rows = 5 million.]

Current behavior: provides 1 row as the estimate and assumes statistics are correct.
New behavior: provides an estimate of approximately 5 million rows based on extrapolated statistics.

Extrapolate Statistics Outside of Range Open Range Query - Example III


Query:

SELECT * FROM ordertbl
WHERE O_ORDERDATE >= DATE '2007-09-01';

[Diagram: statistics collected from 01/01/07 through 07/19/07, averaging 1 million rows per day; the open query range starts at 09/01/07, beyond the extrapolated date of 08/08/07. Estimated number of rows = zero.]

Current and new behavior remain the same: provides one row as the estimate (zero is rounded up to one).

Extrapolate Statistics Outside of Range For Column With HIGH DATE


Query:

SELECT * FROM ordertbl
WHERE O_ORDERDATE >= DATE '2007-05-30';

[Diagram: statistics collected from 01/01/07 through 05/31/07, averaging 1 million rows per day, after which rows become very sparse out to a HIGH DATE value of 01/01/2147; the open query range starts at 05/30/07 with no end date, and the extrapolated boundary is 06/06/07. Estimated number of rows = 8 million.]

Current behavior: provides an estimate of approximately 2 million+ rows based on collected statistics.
New behavior: provides an estimate of approximately 8 million+ rows based on extrapolated statistics.

Teradata Database 12.0 Features for


Performance

Active Enable

Cost, Quality and Supportability

Enterprise Fit

Ease of Use


Teradata Database 12.0 Features for... Active Enable


- Online Archive
- Bulk SQL Error Logging Tables
- Full ANSI Merge-Into Capability
- Replication Scalability
- Restartable Scandisk
- CheckTable Utility Performance
- Table Functions Without Join-Back

Online Archive
Description

Online archive allows the archival of a running database; that is, a database can be archived while update transactions execute concurrently against the tables in the database. Transactional consistency is maintained by tracking any changes to a table in a log, such that changes applied to the table during the archive can be rolled back to the transactional consistency point after the restore.

Benefit

Online archive removes the requirement of a window during which updates must be held up while backup procedures execute. Additionally, object locking is eased and the performance impact of permanent journals is removed.

Considerations

Online archive will be integrated into the Open Teradata Backup (OTB) suite of products associated with this release.

Online Archive How it Works


Backup Method

- Initiate the archive with the new LOGGING statement
- ARC takes a checkpoint on the table(s)
- Rows written to the transient journal are logged
- After the table is backed up, the log is backed up automatically to complete the archive

Restore Method

- Restore the table(s)
- ARC restores the log rows as part of the restore process
- Log rows are automatically used to roll the restored table back to its state at the beginning of the archive

Locking

- A read lock is required during the checkpoint
- Transactions and utilities are drained to get a clean point at the beginning
- No locks are held during the archive
- DDL statements are aborted (not blocked)

Bulk SQL Error Logging Tables


Description

Provide support for complex error handling during bulk SQL INSERT, UPDATE and DELETE operations through the use of new SQL-based error tables.

Benefit

A complementary capability to the native Teradata utilities, this feature increases flexibility in developing load strategies by allowing SQL to be used for batch updates that contain errors. It provides error reporting similar to the current load utilities while overcoming their restrictions on unique indexes, join indexes and triggers resident on target tables. The feature allows a SQL DML statement to continue to completion after logging an error rather than aborting and rolling back.

Considerations

Strong consideration should be given to re-evaluating current batch load/ETL processes to take advantage of bulk SQL load operations that are not currently considered due to current limitations/restrictions.

Bulk SQL Error Logging Tables Error Table Creation Syntax


New Non-ANSI Syntax:
CREATE ERROR TABLE [<error table>] FOR <data table>;

- If an optional name is not supplied, the error table name defaults to ET_<data table name>; if the data table name is longer than 27 characters, it is truncated at the end with no warning returned
- If <error table> is not specified, or is specified without an explicit database name, the error table is created in the current default database for the session
- An error table may be created for a data table with a maximum of 2,048 columns
- In addition to the data table contents, the error table houses 18 additional error-related columns
- COMMENTs on columns in the data table are not carried over to the error table; however, COMMENTs may be added to the error table columns if desired
- Access rights required for CREATE ERROR TABLE statements are the same as those for CREATE TABLE statements
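A minimal sketch of the statement against the Sales table used earlier in this deck:

CREATE ERROR TABLE FOR Sales;            -- creates ET_Sales in the default database
CREATE ERROR TABLE Sales_Err FOR Sales;  -- or give the error table an explicit name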

Bulk SQL Error Logging Tables - Logging


Bulk SQL Logging Options

- A LOGGING ERRORS option has been added to the existing SQL syntax for INSERT-SELECT and MERGE-INTO statements
- This option permits users to specify the kinds of errors that can be logged
- Errors are classified into two categories, local and non-local:
  - Local errors are errors that occur on the same AMP that inserts the data row
  - Non-local errors are errors that occur on an AMP that does not own the data row
- The LOGGING ERRORS option is applicable to both ANSI and Teradata modes

Bulk SQL Error Logging Tables - Logging


Logging Local Errors

Local errors comprise the following:

- Duplicate row errors (ANSI mode only; in Teradata mode an INSERT-SELECT of a duplicate row into a SET table is silently ignored)
- Duplicate primary key errors
- CHECK constraint violations
- LOB non-pad data truncation errors
- Data conversion errors that occur during data row inserts

Logging Non-Local Errors

Non-local errors comprise the following:

- Referential integrity violations
- Unique secondary index violations

Bulk SQL Error Logging Tables Insert-Select Syntax Extension


INS[ERT] [INTO] tablename
  { [VALUES] (expr [..., expr])
  | (columnname [..., columnname]) VALUES (expr [..., expr])
  | [(columnname [..., columnname])] subquery [error_logging_option]
  | DEFAULT VALUES };

where error_logging_option is
  LOGGING [ ALL [except_option] | DATA ] ERRORS [error_limit_option]

where error_limit_option is
  WITH { NO LIMIT | LIMIT OF <number> }

and except_option is
  EXCEPT { REFERENCING | UNIQUE INDEX }
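A minimal sketch of the extension in use, assuming a hypothetical Sales_Stage staging table shaped like the Sales table from earlier:

CREATE ERROR TABLE FOR Sales;   -- error rows will land in ET_Sales

INSERT INTO Sales
SELECT * FROM Sales_Stage
LOGGING ALL ERRORS WITH LIMIT OF 100;  -- log at most 100 error rows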

Full ANSI Merge-Into SQL Capability


Description

Enhance the Merge-Into SQL capability to support full ANSI functionality. This feature allows the database to perform a true bulk UPSERT operation with a standard SQL statement. Additionally, this enhancement provides non-ANSI extensions to support additional error-handling capabilities. The new SQL Merge functionality lifts the current restriction of only supporting single-row merges and allows multiple table rows to be processed in this fashion.

Benefit

Bulk UPSERT processing capability will no longer be limited to the MultiLoad utility, and the extended error-handling capabilities allow native SQL to become usable in given load-strategy scenarios while overcoming current utility restrictions regarding unique indexes, join indexes and triggers resident on target tables.

Considerations

Strong consideration should be given to re-evaluating current batch load/ETL processes to take advantage of the full ANSI Merge-Into SQL capability for load operations that are not currently considered due to current limitations/restrictions.

Full ANSI Merge-Into SQL Capability - Syntax


MERGE [INTO] <target table> [[AS] <merge correlation name>]
  USING <table reference>
  ON <search condition>
  <merge operation specification>;

<merge correlation name> ::= <correlation name>
<merge operation specification> ::= <merge when clause>...
<merge when clause> ::= <merge when matched clause> | <merge when not matched clause>
<merge when matched clause> ::= WHEN MATCHED THEN <merge update specification>
<merge when not matched clause> ::= WHEN NOT MATCHED THEN <merge insert specification>
<merge update specification> ::= UPDATE SET <set clause list>
<merge insert specification> ::= INSERT [( <insert column list> )] [<override clause>] VALUES <merge insert value list>
<merge insert value list> ::= ( <merge insert value element> [{ , <merge insert value element> }...] )
<merge insert value element> ::= <value expression> | <contextually typed value specification>

Full ANSI Merge-Into SQL Capability Syntax Extension


MERGE [INTO] tablename [[AS] aname]
  USING { VALUES (expr [..., expr]) | (subquery) } [AS] source_tname (cname [..., cname])
  ON match-condition
  WHEN MATCHED THEN
    UPD[ATE] SET cname = expr [..., cname = expr]
  WHEN NOT MATCHED THEN
    INS[ERT] { [VALUES] (expr [..., expr]) | (cname [..., cname]) VALUES (expr [..., expr]) }
  [ LOGGING [ ALL [except_option] | DATA ] ERRORS [error_limit_option] ];

where error_limit_option is
  WITH { NO LIMIT | LIMIT OF <number> }

and except_option is
  EXCEPT { REFERENCING | UNIQUE INDEX }
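A minimal sketch of a bulk UPSERT with error logging against the Sales table from earlier (Sales_Stage is a hypothetical staging table; the ON clause matches the target's primary index columns):

MERGE INTO Sales AS tgt
USING (SELECT storeid, productid, salesdate, totalrevenue, totalsold
       FROM Sales_Stage) AS src (storeid, productid, salesdate, totalrevenue, totalsold)
ON tgt.storeid = src.storeid
   AND tgt.productid = src.productid
   AND tgt.salesdate = src.salesdate
WHEN MATCHED THEN UPDATE
  SET totalrevenue = src.totalrevenue,
      totalsold = src.totalsold
WHEN NOT MATCHED THEN INSERT
  (storeid, productid, salesdate, totalrevenue, totalsold)
  VALUES (src.storeid, src.productid, src.salesdate, src.totalrevenue, src.totalsold)
LOGGING ERRORS WITH LIMIT OF 100;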

Bulk SQL Batch Test Result Summary


End-to-end comparison: MLoad (acquisition + apply phases) vs. FastLoad + SQL bulk batch. The FastLoad portion is 14 seconds in each case.

Workload            MLoad Total   FastLoad + Merge     Elapsed Time Improvement
100% INSERT         296 sec       295 sec (14 + 281)   ~0%
50% INS, 50% UPD    361 sec       294 sec (14 + 280)   19%
100% UPD            342 sec       138 sec (14 + 124)   60%

Teradata Database 12.0 Features for


Performance

Active Enable

Cost, Quality and Supportability

Enterprise Fit

Ease of Use


Teradata Database 12.0 Features for... Enterprise Fit

- Java Stored Procedures
- Restore/Copy Dictionary Phase
- Restore/Copy to Different Configuration Data Phase Performance
- Cursor Positioning for MSR
- UNICODE Support for Password Control & Encryption
- Custom Password Dictionary Support
- New Password Encryption Algorithm

Java Stored Procedures


Description

Provide the database user with a means to define an external stored procedure (XSP) written in the Java language, which can use JDBC to dynamically execute SQL within the same session.

Benefit

Java applications are now able to access data in the Teradata database directly. This feature leverages the ever-present Java skills in our customer base.

Considerations

Java stored procedures operate only on Linux- or Windows-based Teradata platforms; MP-RAS support is not planned.

Teradata Database 12.0 Features for


Performance

Active Enable

Cost, Quality and Supportability

Enterprise Fit

Ease of Use


Teradata Database 12.0 Features for Cost, Quality & Supportability


- Compression on Soft/Batch RI Columns
- Dispatcher Fault Isolation

Allow Compression on Soft/Batch Referential Integrity Columns


Description

Allow compression on columns that are either a parent key or a foreign key in a soft/batch referential integrity scheme.

Benefit

Lifting this restriction allows two important features, Referential Integrity and Compression, to work together without limitation.

Considerations

Columns that are part of a Primary Index designation are still not compressible. As such, if Primary Index columns are used in a PK-FK referential integrity scheme, they remain non-compressible.
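A hedged sketch of what the lifted restriction permits (table name and compress values are illustrative; REFERENCES WITH NO CHECK OPTION declares soft RI):

CREATE TABLE claims_2007
  (claim_id INTEGER NOT NULL,
   state_id BYTEINT NOT NULL COMPRESS (6, 36, 48),  -- compression on a foreign-key column
   FOREIGN KEY (state_id) REFERENCES WITH NO CHECK OPTION states (state_id))
PRIMARY INDEX (claim_id);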

Dispatcher Fault Isolation


Description

Increase system availability by preventing DBS reset loops arising from re-submission of fault-causing requests, and by gracefully aborting a fault-causing request or transaction where possible. Additionally, a session is forcibly logged off when its number of fault instances exceeds a threshold value.

Benefits

Escalating a fault to a DBS reset is performed only as a last resort, naturally increasing system availability.

Considerations

With this feature, the Dispatcher joins the Parser and the AMPs in supporting fault isolation.

Teradata Database 12.0 Features for


Performance

Active Enable

Cost, Quality and Supportability

Enterprise Fit

Ease of Use


Teradata Database 12.0 Features for... Ease of Use


- TASM: Query Banding
- TASM: Traffic Cop
- TASM: Global/Multiple Exceptions
- TASM: Utility Management
- TASM: Open APIs
- Enhanced Data Collection: DBQL & ResUsage
- Enhanced Explain Plan Details
- Stored Procedure Result Sets
- SQL Invocation via External Stored Procedures
- Index Wizard Support for PPI
- Dynamic Result Row Specification on Table Functions
- Normalized AMPUsage View for Coexistence

TASM: Query Banding


Description

A method for tagging a query with Name/Value pairs so that the query's originating source and purpose can be readily identified.

Benefit

Increases the accuracy and granularity of a query's source and purpose, fosters better resource accounting, and makes the request-generating application an integral part of workload management. Additionally, TASM rules can be set up to act specifically on any existing Name/Value pair, enabling better and finer workload management.

Considerations

The feature should be used for all applications, but especially for applications that submit queries through session pooling under a common logon user-id. The capability can be set at both the session and transaction levels.

TASM: Query Banding Usage


Examples

SET QUERY_BAND = 'org=Finance;report=Fin123;' FOR SESSION;

SET QUERY_BAND = 'Document=XY1234;Universe=East;' FOR TRANSACTION;

SET QUERY_BAND = NONE FOR SESSION;

Note:

Partner tools and Teradata applications are planning roadmaps to utilize/generate query bands. Customers need to have Query Banding added to their own applications.
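A hedged sketch of inspecting the current band from within the session, assuming the GetQueryBand built-in function:

SET QUERY_BAND = 'org=Finance;report=Fin123;' FOR SESSION;
SELECT GetQueryBand();  -- returns the session/transaction query band string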

TASM: Traffic Cop


Description

Extends active workload management to automatically detect, notify, and act on planned and unplanned system and enterprise events. TASM then automatically implements a new set of workload management rules, a working value set (WVS), specific to the detected events and the resulting system condition.

Benefits

The ability to automatically adjust workload management rules when the system enters or exits a degraded mode ensures that critical system work continues to get priority for resources. This also allows application situational events to be considered beyond just date/time operating environments; for instance, workload management rules can change when batch reports actually complete rather than at an approximate completion time.

Considerations

Involves creation and configuration of a TASM two-dimensional state matrix in TDWM, aligning operating environments with system conditions.

TASM Traffic Cop Events


- AMP Fatal: number of AMPs reported as fatal at system startup.
- Gateway Fatal: number of gateways reported as fatal at system startup.
- PE Fatal: number of PEs reported as fatal at system startup.
- Node Down: maximum percent of nodes down in a clique at system startup.
- AWT Limit: number of AWTs used for MSGWORKNEW and MSGWORKONE work on an AMP. The levels of these two message work types are a barometer of the work level for an AMP. Configured as an AMP threshold with an AWT limit number and an associated qualification time (default of 180 seconds).
- Flow Control: number of AMPs in flow control. Configured as a limit number with an associated qualification time (default of 180 seconds).
- User Defined: notification via an API when the event occurs and completes, or can complete based on a time-out period set through the API at startup. Examples:
  - Begin/end batch processing
  - Begin/end key workload (e.g., end-of-month processing)
  - Dual system offline/online

TASM Traffic Cop State Matrix

State Matrix: detections can yield a change in system state and its associated WVS. The state matrix facilitates simplicity.

[Figure sequence: initially, all levels of priority workload and session concurrency are allowed. A system exception occurs (node failure). The workload rules automatically change, and low-priority session work is reduced from 20 to 3 concurrent sessions.]

TASM Utility Management


Description

Provide a Delay option for utility throttles, allowing jobs exceeding the threshold to be queued instead of the prior limitation of only being rejected. Extend utility management from load and export control to include backup and recovery jobs as well; an Archive/Restore option has been added to both Utility Throttles and WD Utility Classification Criteria.

Benefit

Additional TASM functionality for load, export, backup, and restore jobs is a key area of focus in extending Teradata workload management. The feature provides the ability to ensure that these utilities do not impact higher-priority system work, that they can be controlled during system state changes, and that they can be prioritized when deemed necessary (for instance, during a batch window).

Considerations

TASM utility rules apply to FastLoad, MultiLoad, FastExport, the TPT Load/Update/Export operators, JDBC FastLoad, and ARCMAIN. Additionally, note that utility throttles apply only to the type and number of utilities running on Teradata; as such, they cannot be associated with Teradata Database objects.

TASM: Open APIs


Description

Provide a set of Application Program Interfaces (APIs) that allow applications or third-party tools to interface directly with the Teradata Active System Management (TASM) software components.

Benefit

The feature provides the mechanism by which applications or third-party tools may influence or enhance the working of the TASM software to suit their particular needs and requirements.

Considerations

Use of the TASM Open APIs can have a profound effect on given workloads and how they are managed, and they should only be utilized if the required functionality is not available within the TASM framework.

Enhanced Data Collection: DBQL


Description

Provide additions to the data columns in the Database Query Logging (DBQL) facility.

Benefit

DBQL is an extremely valuable tool that facilitates query analysis, including the capture of executed SQL and associated resource consumption. Extending the DBQL content enables deeper query analysis, fosters better workload understanding, and provides the basis for query tracking and for optimizing performance.

Considerations

As the content of DBQL expands, now and in the future, database administrators should give greater consideration to its adoption.

Enhanced Data Collection: DBQL


Additional information in DBQL:

- Parsing Engine CPU time
- High and low AMP byte counts for spool
- Normalized CPU data for co-existence systems
- Cost estimates (CPU, I/O, network, heuristics)
- Estimated processing time and row counts
- Additional utility-related information
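A hedged sketch of pulling some of this data, assuming the DBC.DBQLogTbl logging table and TD 12.0 column names such as ParserCPUTime for the Parsing Engine CPU time:

SELECT QueryID, UserName, AMPCPUTime, ParserCPUTime
FROM DBC.DBQLogTbl
ORDER BY AMPCPUTime DESC;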

Enhanced Data Collection: ResUsage

Description

Provide new tables in the Teradata Resource Sub-System (RSS) with additional details on AMP Worker Tasks (AWTs), and enhance other RSS tables with provisional information on workload definitions.

Benefit

Enhancement of the ResUsage system tables provides additional insight into system consumption and fosters better workload management.

Considerations

As the content of the ResUsage system tables expands, now and in the future, database administrators should give greater consideration to leveraging their contents.

Enhanced Data Collection: ResUsage

ResUsageSAWT columns of interest:

MailBoxDepth    Current depth of the AMP work mailbox.
FlowControlled  Specifies whether an AMP is in flow control.
FlowCtlCnt      Number of times during the log period that the system entered the flow-control state.
InuseMax        Maximum number of AWTs in use at any one time.
WorkTypeInuse   Current number of AWTs in use during the log period for each work type for the VprId vproc.
WorkTypeMax     Maximum number of AWTs in use at one time during the log period for each work type for the VprId vproc.
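A hedged sketch of scanning for flow-control episodes, assuming the table lives in DBC and carries the standard ResUsage TheDate/TheTime/VprId columns:

SELECT TheDate, TheTime, VprId, FlowCtlCnt, InuseMax
FROM DBC.ResUsageSawt
WHERE FlowCtlCnt > 0
ORDER BY TheDate, TheTime;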

Enhanced Data Collection: ResUsage

ResUsageSPS columns of interest:

AGId            Identifies the current Allocation Group for the Perf Group ID.
RelWgt          Relative weight of the Allocation Group.
CPUTime         Milliseconds of CPU time consumed by the associated task.
IOBlks          Number of logical data blocks read and/or written by the PG.
NumProcs        Number of processes assigned to the PG.
NumSets         Allocation Group set division type.
NumRequests     Number of requests for the AMP Worker Tasks.
QWaitTime       Time that work requests waited on an input queue before being serviced.
QWaitTimeMax    Maximum time that work requests waited.
QLength         Number of work requests waiting on the input queue.
QLengthMax      Maximum number of work requests waiting on the input queue.
ServiceTime     Time that work requests required for service.
ServiceTimeMax  Maximum time that work requests required for service.

Enhanced Explain Plan Details


Description

Enrich the content of SQL explain plans by adding information to the explain output, including spool size estimates, view names, and actual column names for hashing, sorting or grouping columns.

Benefits

Enhanced explain plan details improve explain output readability and understanding, aid in debugging complex queries, and help identify intermediate result spool skewing.

Considerations

No special mechanism is needed to acquire the enhanced explain plan details; a simple EXPLAIN SQL statement generates all the aforementioned features.

Stored Procedure Result Sets


Description

Provide the functionality that allows stored procedures to build and return answer sets as a result of their execution.

Benefits

Extending the stored procedure capability greatly simplifies application development against the Teradata database and provides a long-awaited capability. Currently, without the result set capability, temporary tables need to be created to store answer sets, and the stored procedure CALL must be followed with a separate SELECT statement.

Considerations

Strong consideration should be given to removing these intermediate steps from current applications.
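A hedged sketch of a result-set-returning procedure, assuming the DYNAMIC RESULT SETS clause and WITH RETURN cursors (procedure and table names are illustrative):

CREATE PROCEDURE Sales_By_Store()
DYNAMIC RESULT SETS 1
BEGIN
  -- a cursor left open WITH RETURN is delivered to the caller as an answer set
  DECLARE result_cur CURSOR WITH RETURN ONLY FOR
    SELECT storeid, SUM(totalrevenue) AS revenue
    FROM Sales
    GROUP BY storeid;
  OPEN result_cur;
END;

CALL Sales_By_Store();  -- the answer set follows the CALL response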

SQL Invocation via External Stored Procedures


Description

Extend the current External Stored Procedure (XSP) capability to provide an interface that allows an XSP to invoke and use SQL in the current session.

Benefits

This feature fosters greater application development and enhances the ability of a client application to access and use the Teradata database directly.

Considerations

The initial primary development focus is to use CLIv2 to facilitate and allow an XSP to submit SQL to the Teradata database.

Individual New Feature Performance


12.0 compared to 6.2

- Bulk SQL Load: 20% to 60% over MultiLoad end to end.
- New query rewrite: 10% to 30% on queries that QRW can optimize.
- Online Archive: 20% time-savings improvement over non-online archive.
- CheckTable: up to 50% on level-two secondary index checking on large tables (> 20M rows).
- Multi-Level PPI: up to 30% on queries that can take advantage of partition elimination, e.g. multidimensional queries.
- Parameterized cache request enhancement: up to 30% on queries that the new algorithm determines to cache.
- Optimizer Cost Estimation Subsystem improvements: 5% improvement in plans generated by the new OCES vs. the old OCES (costing parameters).

Teradata Database 12.0 Features


Performance

- OCES (phase 3)
- Statistics enhancements:
  - Increase statistics intervals
  - Extrapolate statistics outside range (e.g. DATE)
  - Collect stats for multi-column NULL values
  - Collect AMP-level statistics values
- Enhanced query rewrite capability:
  - Projection Pushdown
  - Push Joins into UNION ALL Views
- Parameterized statement caching improvements
- Hash bucket expansion
- Multi-level Partitioned Primary Index (PPI)
- Windowed Aggregate Functions

Active Enable

- Online Archive
- Replication Scalability
- Restartable Scandisk
- Bulk SQL error logging tables
- Full ANSI Merge-Into SQL capability
- CheckTable utility performance enhancements
- Table Function without Join Back

Cost, Quality & Supportability

- Dispatcher Fault Isolation
- Compression on Soft/Batch Referential Integrity Columns
- Additional EXPLAIN plan details

Ease of Use

- TASM enhancements:
  - Query Banding
  - Traffic Cop Enhancements
  - Global/Multiple Exceptions
  - Open API SQL capability for TDWM
  - Dynamic load utility management
- Data collection: DBQL, ResUsage
- Index wizard support for PPI
- SQL invocation via External Stored Procedures
- Stored Procedure result sets
- Dynamic Result Row Specification on Table Functions
- Normalized AMPUsage View for coexistence

Enterprise Fit

- Java SPs (with JDBC) (Linux and Windows)
- Cursor positioning for multi-statement requests
- UNICODE support for password control and encryption
- Custom password dictionary support
- New password encryption algorithm
- Restore/Copy Dictionary Phase
- Restore/Copy to Different Configuration Data Phase
- UNIX/Kerberos Authentication for Windows Clients

Questions.....

Richard.Charucki@teradata.com
