
TERADATA SQL

Teradata RDBMS Architecture


The objective of this training is to understand Teradata SQL features. This module is structured as follows:
- Teradata Objects
- Teradata SQL
- Indexes
- ANSI vs. Teradata mode
- Data Types
- DDL
- DML
- SELECT statement
- HELP, SHOW, EXPLAIN
- Data Conversions
- Aggregation
- Subquery Processing
- Join Processing
- Date and Time Processing
- Character String Processing
- OLAP Functions
- SET Operators
- Data Manipulation
- Data Interrogation
- View Processing
- Macro Processing
- Reporting Totals and Subtotals
- Data Definition Language
- Temporary Tables
- Trigger Processing

Teradata Objects
There are five fundamental objects which may be found in a Teradata database:
- Tables - rows and columns of data
- Views - predefined subsets of existing tables
- Macros - predefined, stored SQL statements
- Triggers - SQL statements associated with a table
- Stored procedures - programs stored within Teradata

These objects are created, maintained and deleted using Structured Query Language (SQL). Object definitions are stored in the Data Dictionary / Directory (DD/D).

[Figure: a DATABASE or USER holds its own tables, views, macros, triggers and stored procedures (TABLE 1-3, VIEW 1-3, MACRO 1-3, TRIGGER 1-3, Stored Procedure 1-3); the DD/D holds the definitions of all database objects.]

Databases

A Teradata database is a defined logical repository for tables, views, macros, SPs. A database is empty until objects are created within it. Teradata has the concept of parent and child databases. A database has one and only one creator. The owner can be different from the creator if the database is given to another user.

Users

A Teradata user is a database with an assigned password. A user may log on to Teradata and access objects within itself and within other databases for which it has access rights. A user is an active repository while a database is a passive repository. A user is empty until objects are created within it.

The Data Dictionary Directory (DD/D)


The DD/D:
- is an integrated set of system tables
- contains definitions of and information about all objects in the system
- is entirely maintained by the RDBMS
- is "data about the data", or metadata
- is distributed across all AMPs like all tables
- may be queried by administrators or support staff
- is accessed via Teradata-supplied views

Examples of views:
- DBC.Tables - info about all tables
- DBC.Users - info about all users
- DBC.AllRights - info about access rights
- DBC.AllSpace - info about space utilization

Structured Query Language (SQL)


SQL is a query language for relational database systems.
- A fourth-generation language
- A set-oriented language
- A non-procedural language (e.g., it doesn't have IF, GO TO, DO, FOR NEXT, or PERFORM statements)

SQL consists of:
- Data Definition Language (DDL) - defines database structures (tables, users, views, macros, and triggers): CREATE, DROP, ALTER
- Data Manipulation Language (DML) - manipulates rows and data values: SELECT, INSERT, UPDATE, DELETE
- Data Control Language (DCL) - grants and revokes access rights: GRANT, REVOKE

Teradata SQL also includes Teradata extensions to SQL: HELP, SHOW, EXPLAIN, CREATE MACRO, REPLACE MACRO

Indexes
An index is a mechanism that can be used by the SQL query optimizer to make table access more performant.

Teradata provides four different index types.
- Primary index. All Teradata tables require a primary index because the system distributes table rows to the AMPs based on their primary index values. Primary index types are:
  - Unique primary index (UPI)
  - Nonunique primary index (NUPI)
  - Nonpartitioned primary index (NPPI)
  - Partitioned primary index (PPI)
- Secondary index
  - Unique secondary index (USI)
  - Nonunique secondary index (NUSI)
- Join index
  - Multitable join index
  - Single-table join index
- Hash index

Primary Index
A primary index:
- Controls data distribution and retrieval using the Teradata hashing algorithm.
- Is defined with the CREATE TABLE data definition statement. If no explicit primary index is defined, CREATE TABLE assigns one automatically.
- Can be unique or non-unique, and partitioned or non-partitioned. If the primary index is not defined explicitly as unique, the definition defaults to non-unique.
- Can be composed of as many as 64 columns.
- Can be generated automatically if defined on an identity column.
- Must be defined exactly once per table (a minimum of one and a maximum of one).
- Improves performance when used correctly in the WHERE clause of an SQL data manipulation statement to perform the following actions:
  - Single-AMP retrievals
  - Joins between tables with identical primary indexes, the optimal scenario
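The link between primary index values and row distribution can be sketched in a few lines of Python. This is only an illustration: the hash (MD5 from hashlib), the AMP count of 4, and the helper name amp_for are assumptions made for the example and do not reproduce Teradata's actual hashing algorithm.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size, chosen for the example


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number (stand-in hash, not Teradata's)."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


# Unique primary index: every employee_number is different, so rows spread out.
employee_numbers = range(1001, 1101)
upi_distribution = Counter(amp_for(n) for n in employee_numbers)

# Non-unique primary index: rows with the same dept_number land on the same AMP.
dept_numbers = [301, 301, 403, 401, 403, 401]
nupi_placement = [(d, amp_for(d)) for d in dept_numbers]

print(upi_distribution)
print(nupi_placement)
```

Because the AMP is derived only from the index value, a lookup on that value needs to visit a single AMP, and two tables hashed on the same value keep matching rows together.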

Secondary Index
Secondary indexes:
- Can enhance the speed of data retrieval.
- Can be unique (USI) or non-unique (NUSI). NUSIs can be hash-ordered or value-ordered.
- Do not affect base table data distribution.
- Are limited to a maximum of 32 secondary and join indexes defined per table.
- Can be composed of as many as 64 concatenated columns.
- Can be created or dropped dynamically as data usage changes or if they are found not to be useful for optimizing data retrieval performance.
- Require additional disk space to store subtables.
- Require additional I/Os on INSERTs, DELETEs, and possibly on UPDATEs.
- Should not be defined on columns whose values change frequently.

A composite secondary index is useful if it reduces the number of rows that must be accessed.

Join Index
Join indexes are file structures designed to permit queries (join queries in the case of multitable join indexes) to be resolved by accessing the index instead of having to access and join their underlying base tables.

A join index can be defined in the following general ways:
- It joins multiple tables (optionally with aggregation) in a prejoin table.
- It replicates all, or a vertical subset, of a single base table and partitions its rows using a different primary index than the base table, such as a foreign key column, to facilitate joins of very large tables by hashing them to the same AMP.
- It aggregates one or more columns of a single table as a summary table.

Join indexes are useful for:
- queries where the index table contains all the columns referenced by one or more joins, thereby allowing the Optimizer to cover all or part of the query by planning to access the index rather than its underlying base tables.
- queries that aggregate columns from tables with large cardinalities.

Hash Index

Hash indexes are file structures that share properties with both single-table join indexes and secondary indexes. Hash indexes are not indexes in the usual sense of the word. They are base tables that cannot be accessed directly by a query.

A hash index always has at least one of the following functions:
- Replicates all, or a vertical subset, of a single base table and partitions its rows with a user-specified partition key column set, such as a foreign key column, to facilitate joins of very large tables by hashing them to the same AMP.
- Provides an access path to base table rows to complete partial covers.

Hash indexes are useful for queries where the index table contains the columns referenced by a query, thereby allowing the Optimizer to cover it by planning to access the index rather than its underlying base table.

ANSI Vs TERADATA MODE

Teradata RDBMS has the ability to execute all SQL in either Teradata mode or in ANSI mode.

Teradata mode: Each SQL request is implicitly a complete transaction. Therefore, once a change is made, it is committed and becomes permanent. The commit is either implied at the end of the request or made explicitly with END TRANSACTION (ET).

ANSI mode: All SQL commands are considered to be part of the same logical transaction. A transaction is not complete until an explicit COMMIT is executed.
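The practical difference between the two modes can be sketched with Python's built-in sqlite3 module. SQLite is only a stand-in here, and the mapping of its two commit settings onto "Teradata mode" and "ANSI mode" is an assumption made for illustration; the helper name rows_after_rollback is hypothetical.

```python
import sqlite3


def rows_after_rollback(implicit_commit):
    """Insert one row, issue ROLLBACK, and report how many rows survive."""
    # isolation_level=None leaves every statement committed on its own;
    # "DEFERRED" opens a transaction on INSERT that needs an explicit COMMIT.
    level = None if implicit_commit else "DEFERRED"
    conn = sqlite3.connect(":memory:", isolation_level=level)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.rollback()  # nothing to undo if the insert was already committed
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count


print(rows_after_rollback(implicit_commit=True))   # behaves like Teradata mode
print(rows_after_rollback(implicit_commit=False))  # behaves like ANSI mode
```

With the implicit commit the row is already permanent when the rollback arrives; without it, the uncommitted row is discarded.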

Data Types

Two categories of data types are supported by Teradata.

Conforming to ANSI:
- INTEGER
- SMALLINT
- DECIMAL(X,Y), NUMERIC(X,Y)
- FLOAT, REAL, PRECISION, DOUBLE PRECISION
- CHARACTER(X), CHAR(X)
- VARCHAR(X), CHARACTER VARYING(X), CHAR VARYING(X)
- DATE
- TIME
- TIMESTAMP

Specific to Teradata:
- BYTEINT
- BYTE(X)
- VARBYTE(X)
- LONG VARCHAR
- GRAPHIC(X)
- VARGRAPHIC(X)

Data Definition Language (DDL)


CREATE TABLE employee
  ,FALLBACK
  ,NO BEFORE JOURNAL
  ,NO AFTER JOURNAL
  ,FREESPACE = 30
  ,DATABLOCKSIZE = 10000 BYTES
  (employee_number INTEGER NOT NULL
  ,dept_number     SMALLINT
  ,job_code        INTEGER COMPRESS
  ,last_name       CHAR(20) NOT NULL
  ,first_name      VARCHAR(20)
  ,street_address  VARCHAR(30) TITLE 'Address'
  ,city            CHAR(15) DEFAULT 'Atlanta' COMPRESS 'Atlanta'
  ,state           CHAR(2) WITH DEFAULT
  ,birthdate       DATE FORMAT 'mm/dd/yyyy'
  ,salary_amount   DECIMAL(10,2)
  ,sex             CHAR(1) UPPERCASE
  )
UNIQUE PRIMARY INDEX (employee_number)
,INDEX (dept_number);

-- Add a secondary index after the table exists
CREATE INDEX (job_code) ON employee;

-- Drop the secondary index again; the table must be named
DROP INDEX (job_code) ON employee;

DROP TABLE employee;

Data Manipulation Language (DML)


The SELECT statement is used to retrieve data from tables.

Who was hired on October 15, 1986?

EMPLOYEE (partial listing)

EMPLOYEE  MANAGER EMPLOYEE  DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER            NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
PK        FK                FK      FK
1006      1019              301     312101  Stein     John     861015  631015  3945000
1008      1019              301     312102  Kanieski  Carol    870201  680517  3925000
1005      0801              403     431100  Ryan      Loretta  861015  650910  4120000
1004      1003              401     412101  Johnson   Darlene  861015  560423  4630000
1007      1005              403     432101  Villegas  Arnando  870102  470131  5970000
1003      0801              401     411100  Trader    James    860731  570619  4785000

SELECT Last_Name ,First_Name FROM Employee WHERE Hire_Date = 861015 ;

Answer

LAST NAME  FIRST NAME
Stein      John
Ryan       Loretta
Johnson    Darlene

SELECT statement
Basic SELECT command:
SELECT * FROM Student_Table ;

Compound Comparisons:
SELECT * FROM Student_Table WHERE Grade_Pt = 3.0 OR Grade_Pt = 4.0 AND Class_Code = 'FR' ;
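In the compound comparison above, AND is evaluated before OR, so the Class_Code test applies only to the 4.0 students. The sketch below checks this with Python's sqlite3 module as a stand-in engine; the four sample students are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student_Table (Last_Name TEXT, Class_Code TEXT, Grade_Pt REAL)"
)
# Hypothetical rows, not taken from the course data.
conn.executemany(
    "INSERT INTO Student_Table VALUES (?, ?, ?)",
    [("Adams", "FR", 4.0), ("Baker", "SO", 4.0),
     ("Clark", "SO", 3.0), ("Davis", "FR", 2.0)],
)


def names(where):
    """Return the last names selected by the given WHERE condition."""
    sql = "SELECT Last_Name FROM Student_Table WHERE " + where + " ORDER BY Last_Name"
    return [row[0] for row in conn.execute(sql)]


as_written = names("Grade_Pt = 3.0 OR Grade_Pt = 4.0 AND Class_Code = 'FR'")
and_first = names("Grade_Pt = 3.0 OR (Grade_Pt = 4.0 AND Class_Code = 'FR')")
or_first = names("(Grade_Pt = 3.0 OR Grade_Pt = 4.0) AND Class_Code = 'FR'")

print(as_written, and_first, or_first)
```

The unparenthesized query matches the "AND first" grouping; parentheses are needed if the intent is "3.0 or 4.0 students who are freshmen".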

Using NOT in SQL comparisons:
SELECT Last_Name ,First_Name ,Class_Code ,Grade_Pt FROM Student_Table WHERE NOT ( Grade_Pt >= 3.0 AND Grade_Pt IS NOT NULL AND Class_Code <> 'SR' AND Class_Code IS NOT NULL ) ;

Multiple Value search (IN):


SELECT Last_Name ,Class_Code ,Grade_Pt FROM Student_Table WHERE Grade_Pt IN ( 2.0, 3.0, 4.0 ) ;

Using Quantifiers vs IN
SELECT Last_Name ,Class_Code ,Grade_Pt FROM Student_Table WHERE Grade_Pt = ANY ( 2.0, 3.0, 4.0 ) ;



Multiple Value Range Search (BETWEEN):
SELECT Grade_Pt FROM Student_Table WHERE Grade_Pt BETWEEN 2.0 AND 4.0 ;

Character String Search(LIKE):


SELECT * FROM Student_Table WHERE Last_Name LIKE ('_a%' ) ;

Derived Columns:
SELECT salary (format 'ZZZ,ZZ9.99') ,salary/12 (format 'Z,ZZ9.99') FROM Pay_Table ;

Order By:
SELECT * FROM Student_Table WHERE Grade_Pt > 3 ORDER BY Grade_Pt DESC;

DISTINCT Option:
SELECT DISTINCT Class_code FROM student_table ORDER BY class_code;



Creating a Column Alias Name: AS
SELECT salary AS annual_salary ,salary/12 AS Monthly_salary FROM Pay_Table ;

NAMED
SELECT salary (NAMED Annual_salary) ,salary/12 (NAMED Monthly_salary) FROM Pay_Table ;

Naming Conventions
When creating an alias only valid Teradata naming characters are allowed. The alias becomes the name of the column for the life of the SQL statement. The only difference is that it is not stored in the Data Dictionary.

Breaking Conventions:
When it is necessary or desirable to use non-standard characters in a name, double quotes (") are used around the name. This technique tells the PE that the word is not a reserved word and makes it a valid name. This is the only place that Teradata uses a double quote instead of a single quote (').
SELECT salary "Annual salary" ,salary/12 "Monthly_salary" FROM Pay_Table ORDER BY "Annual salary" ;

HELP Commands
Databases and Users:
HELP DATABASE customer_service ;
HELP USER Dave_Jones ;

Tables, Views, and Macros:
HELP TABLE employee ;
HELP VIEW emp ;
HELP MACRO payroll_3 ;
HELP COLUMN employee.* ;
HELP COLUMN employee.last_name ;
HELP COLUMN emp.* ;
HELP COLUMN emp.last ;
HELP INDEX employee ;
HELP STATISTICS employee ;
HELP CONSTRAINT employee.over_21 ;

Example of HELP DATABASE


HELP DATABASE customer_service;

*** Help information returned. 10 rows.
*** Total elapsed time was 1 second.

Table/View/Macro name  Kind  Comment
contact                T     ?
customer               T     ?
department             T     ?
employee               T     ?
employee_phone         T     ?
job                    T     ?
location               T     ?
location_employee      T     ?
location_phone         T     ?

SHOW Command
SHOW commands display how an object was created.

Command                  Returns
SHOW TABLE tablename;    CREATE TABLE statement
SHOW VIEW viewname;      CREATE VIEW statement
SHOW MACRO macroname;    CREATE MACRO statement

SHOW TABLE employee;

CREATE SET TABLE CUSTOMER_SERVICE.employee ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL
     (
      employee_number INTEGER,
      manager_employee_number INTEGER,
      department_number INTEGER,
      job_code INTEGER,
      last_name CHAR(20) NOT CASESPECIFIC NOT NULL,
      first_name VARCHAR(30) NOT CASESPECIFIC NOT NULL,
      hire_date DATE NOT NULL,
      birthdate DATE NOT NULL,
      salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );

The EXPLAIN Facility


The EXPLAIN modifier in front of any SQL statement generates an English translation of the Parser's plan. The request is fully parsed and optimized, but not actually executed.

EXPLAIN returns:
- Text showing how a statement will be processed (a plan)
- An estimate of how many rows will be involved
- A relative cost of the request (in units of time)

This information is useful for:
- predicting row counts
- predicting performance
- testing queries before production
- analyzing various approaches to a problem

EXPLAIN SELECT last_name, department_number FROM employee;

Explanation (partial):
3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.employee by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated to be 24 rows. The estimated time for this step is 0.15 seconds.

Data Conversion

CAST
Data can be converted from one type to another by using the CAST function.
SELECT CAST('ABCDE' AS CHAR(1)) AS Trunc
      ,CAST(128 AS CHAR(3)) AS OK
      ,CAST(127 AS INTEGER) AS Bigger
      ,CAST(121.53 AS SMALLINT) AS Whole
      ,CAST(121.53 AS DECIMAL(3,0)) AS Rounder ;

Trunc  OK   Bigger  Whole  Rounder
A      128  127     121    122

Implied CAST
Prior to CAST, conversion was requested by placing the "implied" data type conversion in parentheses after the column name.
SELECT 'ABCDE' (CHAR(1)) AS Shortened
      ,128 (CHAR(3)) AS OK
      ,-128 (CHAR(3)) AS N_OK
      ,128 (INTEGER) AS Bigger
      ,121.13 (SMALLINT) AS Whole ;
Shortened A OK_ N_OK_ 128 Bigger _ 121 Whole

Subquery Processing
Using IN
SELECT Order_number ,Order_total FROM Order_Table WHERE Customer_number IN ( SELECT Customer_number FROM Customer_table WHERE Customer_name LIKE 'Bill%');

Using NOT IN
SELECT Customer_name ,Phone_number FROM Customer_Table WHERE Customer_number NOT IN ( SELECT Customer_number FROM Order_table) ;

Using ANY
SELECT Customer_name ,Phone_number FROM Customer_Table WHERE customer_number = ANY (SELECT customer_number FROM Order_Table WHERE Order_total > ( SELECT AVG(Order_total) FROM Order_Table ) );

Using EXISTS
SELECT Customer_name FROM Customer_table AS CUST WHERE EXISTS ( SELECT * FROM Order_table AS OT WHERE CUST.Customer_number = OT.Customer_number ) ;

Join Processing
A join is the combination of two or more tables in the same FROM clause of a single SELECT statement.

The different types of joins provided are:
- Inner Join
- Outer Join
  - Left Outer Join
  - Right Outer Join
  - Full Outer Join
- Cross Join
- Self Join
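The difference between an inner and an outer join can be sketched with Python's sqlite3 module, used here only as a stand-in engine. The two customers and their orders are invented sample data; the table and column names follow the subquery examples earlier in this module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer_Table (Customer_number INTEGER, Customer_name TEXT);
    CREATE TABLE Order_Table (Order_number INTEGER, Customer_number INTEGER);
    INSERT INTO Customer_Table VALUES (1, 'Bill Co'), (2, 'Ace Inc');
    INSERT INTO Order_Table VALUES (100, 1), (101, 1);
""")

# Inner join: only customers that have at least one matching order.
inner = conn.execute("""
    SELECT c.Customer_name, o.Order_number
    FROM Customer_Table AS c
    INNER JOIN Order_Table AS o ON c.Customer_number = o.Customer_number
    ORDER BY o.Order_number
""").fetchall()

# Left outer join: every customer, with NULL where no order matches.
left_outer = conn.execute("""
    SELECT c.Customer_name, o.Order_number
    FROM Customer_Table AS c
    LEFT OUTER JOIN Order_Table AS o ON c.Customer_number = o.Customer_number
    ORDER BY c.Customer_number, o.Order_number
""").fetchall()

print(inner)
print(left_outer)
```

The customer without orders is absent from the inner join and appears once in the left outer join with a NULL (None in Python) order number.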

Date and Time Processing


DATE, TIME and TIMESTAMP are valid Teradata data types. The Teradata RDBMS stores a date on disk as an integer in YYYMMDD format, where YYY is the number of years since 1900. For January 1, 1999, Teradata stores 0990101 on the disk. The stored value is calculated as (year - 1900) * 10000 + month * 100 + day, so for January 1, 1999:

(1999 - 1900) * 10000 + 1 * 100 + 1 = 990101, written with a leading zero as 0990101

For display in the default INTEGERDATE format, the stored data for the date January 1, 1999 is converted to 99/01/01.
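The integer date calculation can be written out as a short Python sketch. It assumes the stored form is (year - 1900) * 10000 + month * 100 + day; the function names are hypothetical and chosen for the example.

```python
from datetime import date


def to_teradata_integer_date(d):
    """Encode a date as (year - 1900) * 10000 + month * 100 + day."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day


def from_teradata_integer_date(n):
    """Decode the integer form back into a datetime.date."""
    years_since_1900, rest = divmod(n, 10000)
    month, day = divmod(rest, 100)
    return date(1900 + years_since_1900, month, day)


stored = to_teradata_integer_date(date(1999, 1, 1))
print(f"{stored:07d}")                              # prints 0990101
print(to_teradata_integer_date(date(2000, 1, 1)))   # prints 1000101
```

Counting years from 1900 is what lets dates from 2000 onward sort after 1999 dates: 1000101 is greater than 990101.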


INTEGERDATE, in the form YY/MM/DD, is the default display format for most Teradata database client utilities. The output date format can be changed by using DATEFORM.

System Level Definition
MODIFY GENERAL 14 = 0 /* INTEGERDATE (YY/MM/DD) */
MODIFY GENERAL 14 = 1 /* ANSIDATE (YYYY-MM-DD) */

User Level Definition
CREATE USER username ....... DATEFORM={INTEGERDATE | ANSIDATE} ;

Session Level Declaration
SET SESSION DATEFORM = {ANSIDATE | INTEGERDATE} ;

Since Teradata stores the date as an INTEGER, simple and complex arithmetic can be used to calculate new dates from existing dates. Other functions provided include ADD_MONTHS, EXTRACT and OVERLAPS.

Character String Processing

Teradata provides character string processing functions such as:
- CHARACTERS - used to count the number of characters stored in a data column.
- TRIM - used to eliminate space characters from fixed length data values.
- SUBSTRING - used to retrieve a portion of the data stored in a column.
- SUBSTR - the original Teradata substring operation.
- POSITION - used to return a number that represents the starting location of a specified character string within character data.
- INDEX - used to return a number that represents the starting position of a specified character string within character data.

ANSI mode is case sensitive and Teradata mode is not. Therefore, the output from most of the string processing functions will differ accordingly.

OLAP Functions

Powerful OLAP (On-Line Analytical Processing) functions provide data mining capabilities to discover a wealth of knowledge from the data.

OLAP functions, combined with standard SQL within the data warehouse, provide the ability to analyze large amounts of historical business transactions from the past through the present.

Like traditional aggregates, OLAP functions operate on groups of rows and permit qualification and filtering of the group result.

Unlike aggregates, OLAP functions also return the individual row detail data and not just the final aggregated value.
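The contrast between an aggregate and an OLAP function can be sketched with Python's sqlite3 module as a stand-in engine (window functions need SQLite 3.25 or later). The Sales table and its rows are invented for the example, and the ANSI-style SUM ... OVER form is used rather than any Teradata-specific function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (Product_ID INTEGER, Sale_Day INTEGER, Amount INTEGER);
    INSERT INTO Sales VALUES (1, 1, 100), (1, 2, 150), (2, 1, 200);
""")

# Traditional aggregate: one row per group, the detail rows are gone.
aggregate = conn.execute(
    "SELECT Product_ID, SUM(Amount) FROM Sales "
    "GROUP BY Product_ID ORDER BY Product_ID"
).fetchall()

# OLAP form: every detail row is kept, with a running total beside it.
olap = conn.execute("""
    SELECT Product_ID, Sale_Day, Amount,
           SUM(Amount) OVER (PARTITION BY Product_ID ORDER BY Sale_Day
                             ROWS UNBOUNDED PRECEDING) AS Running_Total
    FROM Sales
    ORDER BY Product_ID, Sale_Day
""").fetchall()

print(aggregate)
print(olap)
```

The aggregate returns two rows, one per product, while the OLAP query returns all three detail rows with the cumulative amount for each product.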


OLAP Functions provided by Teradata are:

SET Operators

The Teradata database provides the following SET operators:

- INTERSECT - used to match or join the common domain values from two or more sets.
- UNION - used to merge the rows from two or more sets. The join performed for a UNION is more similar to an OUTER JOIN.
- EXCEPT - used to eliminate common domain values from the answer set by throwing away the matching values. This is the primary SET operator that provides a capability not available using either an INNER or OUTER JOIN.
- MINUS - exactly the same as EXCEPT. It was the original SET operator in Teradata before EXCEPT became the standard.
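The three distinct operators can be compared side by side with Python's sqlite3 module as a stand-in engine. The customer numbers are invented sample data, and MINUS is omitted because the stand-in only accepts the EXCEPT spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer_Table (Customer_number INTEGER);
    CREATE TABLE Order_Table (Customer_number INTEGER);
    INSERT INTO Customer_Table VALUES (1), (2), (3);
    INSERT INTO Order_Table VALUES (2), (3), (3), (4);
""")


def run(operator):
    """Combine the two customer number sets with the given SET operator."""
    sql = (
        "SELECT Customer_number FROM Customer_Table "
        + operator
        + " SELECT Customer_number FROM Order_Table ORDER BY 1"
    )
    return [row[0] for row in conn.execute(sql)]


print(run("INTERSECT"))  # values present in both sets
print(run("UNION"))      # all values, duplicates removed
print(run("EXCEPT"))     # customers with no matching order
```

EXCEPT is order-sensitive: swapping the two SELECTs would instead return the order rows whose customer number is missing from Customer_Table.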

Data Interrogation Functions


Data interrogation functions test the data values after a row passes the WHERE test and is read from the disk. These functions not only allow the data to be tested, but also allow additional logic to be incorporated into the SQL. Functions provided by Teradata are:
- NULLIFZERO - tests the data value in a column for a zero and, when found, converts the zero to a NULL value for the life of the SQL statement.
- NULLIF - like NULLIFZERO, but not limited to zero: it can convert any specified value to a NULL.
- ZEROIFNULL - tests the data value in a column and, when it contains a NULL, transforms it to a zero for the life of the SQL statement.
- COALESCE - searches a value list, ranging from one to many values, and returns the first non-NULL value it finds. It returns a NULL if all values in the list are NULL.
- CASE - provides an additional test that allows for multiple comparisons on multiple columns with multiple outcomes. It also incorporates logic to handle a situation in which none of the values compares equal.
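The behaviour of these functions can be checked with Python's sqlite3 module as a stand-in engine. NULLIFZERO and ZEROIFNULL are not available there, so the sketch expresses them through their standard equivalents NULLIF(x, 0) and COALESCE(x, 0); the literal values are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    SELECT NULLIF(0, 0),                 -- what NULLIFZERO does to a zero
           NULLIF(5, 0),                 -- non-zero values pass through
           COALESCE(NULL, 0),            -- what ZEROIFNULL does to a NULL
           COALESCE(NULL, NULL, 7, 9),   -- first non-NULL value in the list
           COALESCE(NULL, NULL),         -- all NULL in, NULL out
           CASE WHEN 3 > 5 THEN 'high'
                WHEN 3 > 2 THEN 'mid'
                ELSE 'low' END           -- ELSE covers "no comparison matched"
""").fetchone()

print(row)  # NULL values come back to Python as None
```

A common use of the first form is guarding a division: dividing by NULLIF(x, 0) yields NULL instead of a divide-by-zero error.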

View Processing
Views are pre-defined subsets of existing tables consisting of specified columns and/or rows from the table(s).

A single table view:
- is a window into an underlying table
- allows users to read and update a subset of the underlying table
- has no data of its own

EMPLOYEE (Table)

EMPLOYEE  MANAGER EMPLOYEE  DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER            NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
PK        FK                FK      FK
1006      1019              301     312101  Stein     John     861015  631015  3945000
1008      1019              301     312102  Kanieski  Carol    870201  680517  3925000
1005      0801              403     431100  Ryan      Loretta  861015  650910  4120000
1004      1003              401     412101  Johnson   Darlene  861015  560423  4630000
1007      1005              403     432101  Villegas  Arnando  870102  470131  5970000
1003      0801              401     411100  Trader    James    860731  570619  4785000

Emp_403 (View)

EMP NO  DEPT NO  LAST NAME  FIRST NAME  HIRE DATE
1005    403      Villegas   Arnando     870102
801     403      Ryan       Loretta     861015
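That a view has no data of its own can be sketched with Python's sqlite3 module as a stand-in engine. The reduced Employee table and the added employee 'Rogers' are assumptions made for the example; views in the stand-in are read-only, so only reading through the view is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Employee_Number INTEGER, Dept_Number INTEGER,
                           Last_Name TEXT, Salary_Amount INTEGER);
    INSERT INTO Employee VALUES (1006, 301, 'Stein', 3945000),
                                (1005, 403, 'Ryan', 4120000),
                                (1007, 403, 'Villegas', 5970000);
    -- The view stores only this definition, never any rows of its own.
    CREATE VIEW Emp_403 AS
        SELECT Employee_Number, Dept_Number, Last_Name
        FROM Employee
        WHERE Dept_Number = 403;
""")

query = "SELECT Last_Name FROM Emp_403 ORDER BY Last_Name"
before = [row[0] for row in conn.execute(query)]

# A new base-table row shows up through the view immediately.
conn.execute("INSERT INTO Employee VALUES (1010, 403, 'Rogers', 4000000)")
after = [row[0] for row in conn.execute(query)]

print(before)
print(after)
```

The view also hides Salary_Amount, which illustrates how a view restricts users to a subset of columns as well as rows.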

Multi-Table Views
A multi-table view allows users to access data from multiple tables as if it were in a single table. Multi-table views are also called join views. Join views are used for reading only, not updating.

EMPLOYEE (Table)

EMPLOYEE  MANAGER EMPLOYEE  DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER            NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
PK        FK                FK      FK
1006      1019              301     312101  Stein     John     861015  631015  3945000
1008      1019              301     312102  Kanieski  Carol    870201  680517  3925000
1005      0801              403     431100  Ryan      Loretta  861015  650910  4120000
1004      1003              401     412101  Johnson   Darlene  861015  560423  4630000
1007      1005              403     432101  Villegas  Arnando  870102  470131  5970000
1003      0801              401     411100  Trader    James    860731  570619  4785000

DEPARTMENT (Table)

DEPT    DEPARTMENT                BUDGET    MANAGER EMPLOYEE
NUMBER  NAME                      AMOUNT    NUMBER
PK                                          FK
501     marketing sales           80050000  1017
301     research and development  46560000  1019
302     product planning          22600000  1016
403     education                 93200000  1005
402     software support          30800000  1011
401     customer support          98230000  1003
201     technical operations      29380000  1025

EmpDept (View)

LAST NAME  DEPARTMENT NAME
Stein      research & development
Kanieski   research & development
Ryan       education
Johnson    customer support
Villegas   education
Trader     customer support

MACRO Processing
Macros are SQL statements stored as an object in the Data Dictionary (DD). Unlike a view, a macro can store one or multiple SQL statements. Additionally, the SQL is not restricted to SELECT operations: INSERT, UPDATE, and DELETE commands are valid within a macro. When using BTEQ, conditional logic and BTEQ commands may also be incorporated into the macro. You can only have one DDL statement within a macro. If a macro contains DDL, it must be the last statement in the macro.

Macro commands:
- CREATE MACRO - initially builds a new macro
- REPLACE MACRO - used to modify an existing macro
- EXECUTE MACRO - used to run a macro
- DROP MACRO - deletes a macro from the DD

Reporting TOTALS and SUBTOTALS


Teradata can generate totals and subtotals and, at the same time, display the detail data from the rows that go into creating the totals.

Totals(WITH)
SELECT Last_Name ,First_Name ,Dept_no ,Salary FROM Employee_table WITH SUM(Salary);

Subtotals (WITH..BY)
SELECT Last_Name ,First_Name ,Dept_no ,Salary FROM Employee_table WITH SUM(salary) (TITLE 'Departmental Salaries:') BY dept_no ;
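The shape of a WITH and WITH...BY report, detail rows interleaved with subtotal lines and followed by a grand total, can be sketched in plain Python. This emulates the report layout only; the sample rows, the 'Total:' label and the function name with_by_report are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (Last_Name, First_Name, Dept_no, Salary)
rows = [
    ("Stein", "John", 301, 39450),
    ("Ryan", "Loretta", 403, 41200),
    ("Kanieski", "Carol", 301, 39250),
]


def with_by_report(rows):
    """Return detail rows, a subtotal line per department, and a grand total."""
    lines = []
    ordered = sorted(rows, key=itemgetter(2))  # group rows on the BY column
    for dept, group in groupby(ordered, key=itemgetter(2)):
        group = list(group)
        lines.extend(group)  # the detail rows stay in the report
        lines.append(("Departmental Salaries:", dept, sum(r[3] for r in group)))
    lines.append(("Total:", sum(r[3] for r in rows)))  # like WITH SUM(Salary)
    return lines


report = with_by_report(rows)
for line in report:
    print(line)
```

A GROUP BY query would return only the subtotal lines; the point of WITH...BY is that the detail rows remain visible next to them.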

Temporary Tables
Why Temporary Tables?
- You can usually use simpler SQL statements.
- The system doesn't have to do aggregation.
- The system may access Accounts based on the Primary Index value, which results in a fast response.

Temporary Table types:
- DERIVED TABLES: Tables which are created in spool and dropped when the query is completed.
- VOLATILE TEMPORARY TABLES: Tables that do not survive a system restart.
- GLOBAL TEMPORARY TABLES: Require a base definition which is stored in the Data Dictionary (DD). A global temporary table remains materialized until it is dropped or the session terminates.


Trigger Processing
A trigger is an event-driven maintenance operation. The event is caused by a modification to one or more columns of a row in a table.

Triggering Statement
The user's initial SQL maintenance request that causes a row to change in a table and then causes a trigger to fire (execute).
- It can be: INSERT, UPDATE, DELETE, INSERT/SELECT
- It cannot be: SELECT

Triggered Statement
The SQL that is automatically executed as a result of a triggering statement.
- It can be: INSERT, UPDATE, DELETE, INSERT/SELECT, ABORT/ROLLBACK, EXEC
- It cannot be: BEGIN/END TRANSACTION, COMMIT, CHECKPOINT, SELECT
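The relationship between a triggering statement and a triggered statement can be sketched with Python's sqlite3 module. The trigger below uses SQLite's syntax as a stand-in, and Teradata's CREATE TRIGGER syntax differs; the Salary_Log table and the trigger name Log_Raise are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Employee_Number INTEGER, Salary_Amount INTEGER);
    CREATE TABLE Salary_Log (Employee_Number INTEGER,
                             Old_Salary INTEGER, New_Salary INTEGER);
    INSERT INTO Employee VALUES (1006, 39450);

    -- Triggered statement: the INSERT inside the trigger body.
    CREATE TRIGGER Log_Raise AFTER UPDATE OF Salary_Amount ON Employee
    BEGIN
        INSERT INTO Salary_Log
        VALUES (OLD.Employee_Number, OLD.Salary_Amount, NEW.Salary_Amount);
    END;
""")

# A SELECT is not a triggering statement, so the log stays empty.
conn.execute("SELECT * FROM Employee").fetchall()
log_after_select = conn.execute("SELECT * FROM Salary_Log").fetchall()

# Triggering statement: the user's UPDATE fires the trigger.
conn.execute("UPDATE Employee SET Salary_Amount = 41000 WHERE Employee_Number = 1006")
log_after_update = conn.execute("SELECT * FROM Salary_Log").fetchall()

print(log_after_select)
print(log_after_update)
```

The user issues only the UPDATE; the log row is written automatically as a side effect of that change.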

Stored Procedures
Teradata provides Stored Procedural Language (SPL) to create Stored Procedures. These procedures allow the combination of both SQL and SPL control statements to manage the delivery and execution of the SQL.

The processing flow of a procedure is more like a program: it is a procedural set of commands, whereas SQL is a non-procedural language.

DDL is not allowed within a procedure.

Stored Procedure Commands:


CREATE PROCEDURE, REPLACE PROCEDURE, DROP PROCEDURE, SHOW PROCEDURE, RENAME PROCEDURE.
