Paul Derouin
Learning Consultant
Teradata Learning
Using Union For Set Tagging
Show the name of manager 1019 and the names of his direct reports.
SELECT first_name
,last_name
, ' employee ' AS "Employee//Type"
FROM employee
WHERE manager_employee_number = 1019
UNION
SELECT first_name
,last_name
,' manager '
FROM employee
WHERE employee_number = 1019
ORDER BY 2;
                              Employee
first_name      last_name     Type
--------------  ------------  ----------
Carol           Kanieski      employee
Ron             Kubic         manager
John            Stein         employee
Using CASE For Set Tagging
Show the name of manager 1019 and the names of his direct reports.
SELECT first_name
,last_name
,CASE WHEN manager_employee_number = 1019 THEN 'employee'
WHEN employee_number = 1019 THEN 'manager'
ELSE NULL END
FROM employee
WHERE employee_number = 1019
OR manager_employee_number = 1019;
Reporting By Day of Week
Show the sales figures by day of week as seen below.
Day of
Week          Sales
------------  ----------
Sunday           2950.00
Monday           2200.00
Tuesday          2000.00
Wednesday        2100.00
Thursday         2000.00
Friday           2450.00
Saturday         3250.00
Creating a Day of Week Table
Using a Day Of Week Table
Show the sales figures by day of week.
SELECT dw.char_day "Day of// Week"
,SUM(ds.sales) AS Sales
FROM daily_sales ds
,sys_calendar.calendar sc
,day_of_week dw
WHERE sc.calendar_date = ds.salesdate
AND sc.day_of_week = dw.numeric_day
GROUP BY 1, dw.numeric_day
ORDER BY dw.numeric_day;
Day of
Week          Sales
------------  ----------
Sunday           2950.00
Monday           2200.00
Tuesday          2000.00
Wednesday        2100.00
Thursday         2000.00
Friday           2450.00
Saturday         3250.00
Using CASE Statement
SELECT CASE sc.day_of_week
WHEN 1 THEN 'Sunday'
WHEN 2 THEN 'Monday'
WHEN 3 THEN 'Tuesday'
WHEN 4 THEN 'Wednesday'
WHEN 5 THEN 'Thursday'
WHEN 6 THEN 'Friday'
WHEN 7 THEN 'Saturday'
ELSE 'Not Found' END
AS "Day of// Week"
,SUM(ds.sales) AS Sales
FROM daily_sales ds ,sys_calendar.calendar sc
WHERE sc.calendar_date = ds.salesdate
GROUP BY 1, sc.day_of_week
ORDER BY sc.day_of_week;
Same Result
Day of
Week          Sales
------------  ----------
Sunday           2950.00
Monday           2200.00
Tuesday          2000.00
Wednesday        2100.00
Thursday         2000.00
Friday           2450.00
Saturday         3250.00
Requires joining only two tables using one join condition
Total cost of this query is approx .35
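The lookup-table and CASE approaches can be compared side by side in a small, runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Teradata, which is an assumption of this example and not part of the course material. SQLite has no sys_calendar.calendar view, so the day number is derived with strftime('%w') plus 1 to match the 1 = Sunday numbering. The sample rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Teradata here. strftime('%w') yields 0-6
# with Sunday = 0, so adding 1 mimics sys_calendar.calendar.day_of_week.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales (salesdate TEXT, sales REAL);
    CREATE TABLE day_of_week (numeric_day INTEGER, char_day TEXT);
    INSERT INTO day_of_week VALUES
        (1,'Sunday'),(2,'Monday'),(3,'Tuesday'),(4,'Wednesday'),
        (5,'Thursday'),(6,'Friday'),(7,'Saturday');
    INSERT INTO daily_sales VALUES
        ('1998-08-02', 100.0),
        ('1998-08-03', 250.0),
        ('1998-08-09', 150.0);
""")
# The three hypothetical sales dates are a Sunday, a Monday and a Sunday.

DOW = "CAST(strftime('%w', salesdate) AS INTEGER) + 1"

# Approach 1: join to a day_of_week lookup table.
lookup_sql = f"""
    SELECT dw.char_day, SUM(ds.sales)
    FROM (SELECT {DOW} AS dow, sales FROM daily_sales) ds
    JOIN day_of_week dw ON dw.numeric_day = ds.dow
    GROUP BY dw.char_day, dw.numeric_day
    ORDER BY dw.numeric_day
"""

# Approach 2: translate the day number with a CASE expression, no lookup table.
case_sql = f"""
    SELECT CASE dow
             WHEN 1 THEN 'Sunday'    WHEN 2 THEN 'Monday'
             WHEN 3 THEN 'Tuesday'   WHEN 4 THEN 'Wednesday'
             WHEN 5 THEN 'Thursday'  WHEN 6 THEN 'Friday'
             WHEN 7 THEN 'Saturday'  ELSE 'Not Found' END,
           SUM(sales)
    FROM (SELECT {DOW} AS dow, sales FROM daily_sales)
    GROUP BY dow
    ORDER BY dow
"""

lookup_rows = conn.execute(lookup_sql).fetchall()
case_rows = conn.execute(case_sql).fetchall()
print(lookup_rows)
print(case_rows)
```

Both queries return the same rows. The CASE version simply avoids the extra join to the lookup table.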
RANDOM Function
The RANDOM function generates a random integer within a specified range.
RANDOM (lower limit, upper limit) returns a random integer between the lower and
upper limits, inclusive. Both limits must be specified.
department_number  Random(1,9)
-----------------  -----------
              501            2
              301            6
              201            3
              600            7
              100            3
              402            2
              403            1
              302            5
              401            1
Note that random numbers can repeat. The RANDOM function is evaluated for each row
processed, so duplicate random values are possible.
Duplicate RANDOM Values
Duplicate value likelihood may be reduced by increasing the size of the RANDOM
interval relative to the size of the table.
SELECT department_number
, RANDOM(1,100)
FROM department;
department_number  Random(1,100)
-----------------  -------------
              501             15
              301             19
              201             71
              600             75
              100             61
              402             41
              403             81
              302             31
              401             59
Duplicate RANDOM Values (cont'd)
The likelihood of duplicate random values increases as the RANDOM interval shrinks
relative to the size of the table.
With only three values to distribute over nine rows, duplicates are unavoidable.
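The effect of the interval size on duplicates can be estimated with a short calculation. The sketch below is in Python and is not Teradata code. It assumes each row receives an independent, uniformly distributed integer, which is an assumption of this example. The function name prob_duplicate is hypothetical.

```python
from math import prod

def prob_duplicate(rows: int, low: int, high: int) -> float:
    """Probability that at least two of `rows` independent uniform draws
    from the integers low..high (inclusive) are equal."""
    size = high - low + 1
    if rows > size:
        return 1.0  # more rows than distinct values: a repeat is certain
    p_all_distinct = prod((size - i) / size for i in range(rows))
    return 1.0 - p_all_distinct

# Nine rows, as in the department table, with three different intervals.
for low, high in [(1, 3), (1, 9), (1, 100)]:
    print(f"RANDOM({low},{high}) over 9 rows: {prob_duplicate(9, low, high):.3f}")
```

Under these assumptions a duplicate is certain for RANDOM(1,3), almost certain for RANDOM(1,9), and still reasonably likely for RANDOM(1,100), so a wider interval reduces duplicates but does not rule them out.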
RANDOM Sampling
Consider the following distribution of employee salaries.
Using The SAMPLE Function
A sample of a single group can also be generated, and with greater accuracy, using
the SAMPLE function.
Solution 2:
SELECT employee_number
, salary_amount
FROM employee
WHERE salary_amount < 30000
SAMPLE .67;
employee_number  salary_amount
---------------  -------------
           1006       29450.00
           1023       26500.00
           1008       29250.00
           1014       24500.00
SAMPLE Function For Multiple Samples
Permits use of percentage or row count specification.
Used rows are not reusable for subsequent sample sets.
Complex RANDOM Sampling
The RANDOM function can be used multiple times in the same SELECT statement. It
can be used to produce multiple samples, each using its own criteria.
Example: Create a sample consisting of approximately 67% from each of the under
$50,000 salary ranges.
SELECT employee_number, salary_amount
FROM employee
WHERE (salary_amount < 30000
AND RANDOM(1,3) < 3)
OR (salary_amount BETWEEN 30001
AND 40000 AND RANDOM(1,3) < 3)
OR (salary_amount BETWEEN 40001
AND 50000 AND RANDOM(1,3) < 3)
ORDER BY 2;
employee_number  salary_amount
---------------  -------------
           1014       24500.00
           1001       25525.00
           1023       26500.00
           1009       31000.00
           1005       31200.00
           1004       36300.00
           1003       37850.00
           1021       38750.00
           1020       39500.00
           1002       43100.00
           1024       43700.00
           1010       46000.00
           1007       49700.00
The result shows the following distribution:
Under $30,000 — 3 out of 6 (50%)
Complex RANDOM Sampling (cont'd)
Changing the size of the RANDOM range can affect the size of the returned sample.
Example: Perform the same query but change the size of the RANDOM range to 100.
Sample Sizing Issues
The larger the pool of rows to be drawn from, the closer one can get to achieving a
specific percentage of rows in the sample.
SEL COUNT(*) FROM agent_sales
WHERE (sales_amt BETWEEN 20000 and 39999);
Returns a count of exactly 100 rows
Each of the following examples attempts to return a 50% sample of the target rows.
The smaller the RANDOM range is defined relative to the size of the pool of rows,
the more accurately a specific percentage can be achieved.
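Why a larger pool gets closer to the target percentage can be shown with a little arithmetic. The Python sketch below assumes each row is kept independently with a fixed probability, as with a condition such as RANDOM(1,3) < 3. That independence is an assumption of this example, and the function names are hypothetical.

```python
from math import sqrt

def expected_fraction(range_size: int, threshold: int) -> float:
    """Chance that RANDOM(1, range_size) < threshold keeps a given row."""
    return (threshold - 1) / range_size

def fraction_std_dev(pool_rows: int, keep_prob: float) -> float:
    """Standard deviation of the achieved sample fraction when each of
    pool_rows rows is kept independently with probability keep_prob."""
    return sqrt(keep_prob * (1 - keep_prob) / pool_rows)

p = expected_fraction(3, 3)   # RANDOM(1,3) < 3 keeps the values 1 and 2
for pool in (6, 100, 10_000):
    print(pool, round(p, 3), round(fraction_std_dev(pool, p), 3))
```

The expected fraction is the same for every pool size. The spread around it shrinks as the pool grows, which is why a pool of 6 rows can easily return 50% where about 67% was intended.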
Limitations On Use Of RANDOM
V2R5 Sampling Features
Before V2R5:
Sampling without replacement
Proportional allocation
- each AMP provides same proportion of sample rows.
With V2R5:
Sampling with or without replacement (User choice)
Proportional allocation
- each AMP provides same proportion of sample rows.
Randomized allocation
- randomized across system - not AMP proportional.
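The difference between sampling with and without replacement can be illustrated outside the database. This Python sketch uses the standard random module and hypothetical employee numbers. It does not model how rows are allocated across AMPs.

```python
import random

rows = list(range(1001, 1011))      # hypothetical employee numbers
rng = random.Random(42)             # fixed seed so the run is repeatable

# Without replacement: each row can appear in the sample at most once.
without_replacement = rng.sample(rows, k=5)

# With replacement: a row goes back into the pool and may be drawn again.
with_replacement = rng.choices(rows, k=5)

print(without_replacement)
print(with_replacement)
```

A sample drawn without replacement never contains the same row twice, while a sample drawn with replacement may.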
Dynamic SQL and Static SQL
Dynamic SQL
- technique for generating and executing SQL commands dynamically from a
stored procedure at runtime.
Static SQL
- pre-constructed SQL compiled into the stored procedure.
- may be parameterized.
- still optimized prior to each execution
Static SQL Example
REPLACE PROCEDURE static_sql (IN sal DEC(9,2)
,IN emp_num INT)
BEGIN
UPDATE emp1
SET salary_amount = :sal
WHERE employee_number = :emp_num;
END;
Dynamic SQL (cont'd)
Dynamic SQL Example
CALL dyn_sql('salary_amount','50000','1018');
/* Updates employee 1018 salary_amount to $50,000 */
CALL dyn_sql('job_code','567890','1018');
/* Updates employee 1018 job_code to 567890 */
Dynamic SQL
- Constructed as a concatenated character string
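The idea of building a request as a concatenated character string can be sketched in Python. The body of dyn_sql is not shown in the material, so this is not its implementation. The function build_update, the ALLOWED_COLUMNS whitelist and the input checks are hypothetical additions for illustration, and the table name emp1 is borrowed from the static SQL example.

```python
ALLOWED_COLUMNS = {"salary_amount", "job_code"}   # hypothetical whitelist

def build_update(column: str, value: str, emp_num: str) -> str:
    """Concatenate the pieces of an UPDATE statement into one request string,
    the way a dynamic SQL routine assembles text before executing it."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    if not (value.isdigit() and emp_num.isdigit()):
        raise ValueError("value and employee number must be numeric")
    return ("UPDATE emp1 SET " + column + " = " + value
            + " WHERE employee_number = " + emp_num + ";")

print(build_update("salary_amount", "50000", "1018"))
print(build_update("job_code", "567890", "1018"))
```

Because the column name is part of the string, it cannot be supplied as an ordinary parameter the way :sal and :emp_num are in static SQL. That is also why the inputs are checked before concatenation in this sketch.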
Dynamic SQL (cont'd)
The following restrictions apply to the use of Dynamic SQL within stored procedures:
Restrictions
The creating user must also be the owner of the procedure in order to have
the right to use dynamic SQL.
The following statements may not be used as dynamic SQL:
CALL                 SELECT
CREATE PROCEDURE     SELECT INTO
DATABASE             SET SESSION ACCOUNT
EXPLAIN              SET SESSION COLLATION
HELP                 SET SESSION DATEFORM
REPLACE PROCEDURE    SET TIME ZONE
SHOW
Join Indexes
A Join Index is an optional index which may be created by the
user for one of the following three purposes:
− Pre-join multiple tables (Multi-table Join Index)
− Distribute the rows of a single table on the hash value of a
foreign key value (Single-table Join Index)
− Aggregate one or more columns of a single table or multiple
tables into a summary table (Aggregate Join Index)
If possible, the optimizer will use a Join Index rather than access
tables directly.
This typically results in much better performance.
Join Indexes are automatically updated as the table rows are
updated.
A Join Index may not be accessed directly.
It is an option which the optimizer may choose if the index ‘covers’
the query.
Customer and Order Tables
CREATE TABLE customer
( cust_id INTEGER NOT NULL,
cust_name CHAR(15),
cust_addr CHAR(25) )
UNIQUE PRIMARY INDEX ( cust_id );
[Diagram: CUSTOMERS and ORDERS sets, labeled 49, 1 and 1]
Single Table Query
How many orders have assigned customers?
Count(order_id)
----------------------
50
Will Join Index Help?
How many orders have assigned valid customers?
Count(order_id)
------------------------
49
Creating a Join Index
CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name),(order_id, order_status, order_date)
FROM customer c, orders o
WHERE c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);
With Join Index
How many orders have assigned valid customers?
Count(order_id)
------------------------
49
Join Index Coverage
How many valid customers have assigned orders in January 1999?
Join Index Comparison
Name the valid customers who have open orders in January 1999.
Aggregate Join Indexes
Aggregate Join Indexes are:
• Designed for queries which use counts, sums and averages
• Able to extract aggregated data, optionally based on months or years
• An alternative to summary tables
• Automatically updated as base tables change
• An option for the optimizer when the index covers the query
• Not compatible with MultiLoad or FastLoad
Traditional Aggregation
SELECT EXTRACT(YEAR FROM salesdate) AS Yr
, EXTRACT(MONTH FROM salesdate) AS Mon
, SUM(sales)
FROM daily_sales
WHERE itemid = 10 AND Yr IN ('1997', '1998')
GROUP BY 1,2
ORDER BY 1,2;
         Yr          Mon    Sum(sales)
-----------  -----------  ------------
       1997            1       2150.00
       1997            2       1950.00
       1997            8       1950.00
       1997            9       2100.00
       1998            1       1950.00
       1998            2       2100.00
       1998            8       2200.00
       1998            9       2550.00
Explanation
--------------------------------------------------------------------------
1) First, we do a SUM step to aggregate from PED1.daily_sales by
way of the primary index "PED1.daily_sales.itemid = 10" with a
residual condition of ("((EXTRACT(YEAR FROM
(PED1.daily_sales.salesdate )))= 1997) OR ((EXTRACT(YEAR
FROM (PED1.daily_sales.salesdate )))= 1998)"), and the grouping
identifier in field 1. Aggregate Intermediate Results are
computed locally, then placed in Spool 2. The size of Spool 2 is
estimated with high confidence to be 1 to 1 rows.
Creating An Aggregate Index
CREATE SET TABLE daily_sales ,NO FALLBACK ,
(
itemid INTEGER,
salesdate DATE FORMAT 'YY/MM/DD',
sales DECIMAL(9,2))
PRIMARY INDEX ( itemid );
Query Using Aggregate Index
SELECT EXTRACT(YEAR FROM salesdate) AS Yr
, EXTRACT(MONTH FROM salesdate) AS Mon
, SUM(sales)
FROM daily_sales
WHERE itemid = 10 AND Yr IN ('1997', '1998')
GROUP BY 1,2
ORDER BY 1,2;
         Yr          Mon    Sum(sales)
-----------  -----------  ------------
       1997            1       2150.00
       1997            2       1950.00
       1997            8       1950.00
       1997            9       2100.00
       1998            1       1950.00
       1998            2       2100.00
       1998            8       2200.00
       1998            9       2550.00
Explanation
-----------------------------------------------------------------------
1) First, we do a SUM step to aggregate from join index table
PED1.monthly_sales by way of the primary index
"PED1.monthly_sales.Item = 10", and the grouping identifier in
field 1. Aggregate Intermediate Results are computed locally,
then placed in Spool 2. The size of Spool 2 is estimated with low
confidence to be 4 to 4 rows.
Join Index Summary
A Join Index:
Is a denormalization tool
Pre-joins existing tables
Aggregates existing columns
Can improve performance for covered queries
Can join more than two tables
Can use inner, outer and cross joins
Costs additional disk space
Costs additional maintenance processing for updates
Cannot be accessed directly by SQL
Is a choice for the optimizer
ANSI Timestamp
Timestamp combines date and time into a single column.
tmstampb
---------------------------------------
2001-11-06 13:48:38.580000
Timestamp + Interval
Timestamp may be combined with any year-month or day-time interval to produce a new
timestamp.
TIMESTAMP + interval = TIMESTAMP, where the interval is one of:
YEAR
YEAR TO MONTH
MONTH
DAY
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR
HOUR TO MINUTE
MINUTE
MINUTE TO SECOND
SECOND
Subtract 2 years and 6 months from the designated timestamp:
SELECT TIMESTAMP '1999-10-01 09:30:22'
- INTERVAL '2-06' YEAR TO MONTH;
1997-04-01 09:30:22
Timestamp Subtraction
TIMESTAMP - TIMESTAMP = interval, where the interval is one of:
YEAR
YEAR TO MONTH
MONTH
DAY
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR
HOUR TO MINUTE
MINUTE
MINUTE TO SECOND
SECOND
Given the following two timestamps, calculate the difference
between them as directed:
In months?
SELECT (TIMESTAMP '1999-10-20 10:25:40' -
TIMESTAMP '1998-09-19 08:20:00') MONTH;
13
In years?
SELECT (TIMESTAMP '1999-10-20 10:25:40' -
TIMESTAMP '1998-09-19 08:20:00') YEAR;
1
In days?
SELECT (TIMESTAMP '1999-10-20 10:25:40' -
TIMESTAMP '1998-09-19 08:20:00') DAY(3);
396
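The timestamp arithmetic above can be cross-checked with Python's datetime module. This is a sketch, not Teradata code. The helper add_months is hypothetical, and counting only completed months is an assumption chosen to mirror the results shown for these inputs. It is not a claim about how Teradata handles every edge case.

```python
from datetime import datetime

def add_months(ts: datetime, months: int) -> datetime:
    """Shift ts by a whole number of months (negative to subtract).
    Assumes the day of month exists in the target month."""
    total = ts.year * 12 + (ts.month - 1) + months
    return ts.replace(year=total // 12, month=total % 12 + 1)

# TIMESTAMP - INTERVAL '2-06' YEAR TO MONTH is 30 months back.
shifted = add_months(datetime(1999, 10, 1, 9, 30, 22), -30)

later = datetime(1999, 10, 20, 10, 25, 40)
earlier = datetime(1998, 9, 19, 8, 20, 0)

diff = later - earlier                       # timedelta: days and seconds
whole_months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
if (later.day, later.time()) < (earlier.day, earlier.time()):
    whole_months -= 1                        # the last month is not complete
whole_years = whole_months // 12

print(shifted)        # 1997-04-01 09:30:22
print(whole_months)   # 13
print(whole_years)    # 1
print(diff.days)      # 396
```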
Using Timestamp In An Application
CREATE TABLE Repair_time
( serial_number INTEGER
,product_desc CHAR(8)
,start_time TIMESTAMP(0)
,end_time TIMESTAMP(0))
UNIQUE PRIMARY INDEX (serial_number);
Calculating Time Intervals
Produce a report showing each TV by serial number and how long, in days, hours and
minutes, it took to repair the TV.
serial_number  work_time
-------------  ---------
          100    2 02:50
          101    3 03:50
          102    1 00:40
          103    6 21:20
          104    2 23:50
          105    2 06:10
          106    5 02:50
          107    1 20:20
          108    2 05:10
What is the average amount of time it takes to repair a TV?
Show the answer in days, hours and minutes.
SELECT AVG( (end_time - start_time) DAY TO MINUTE)
AS avg_repair_time
FROM Repair_time;
avg_repair_time
---------------
        3 01:40
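The average can be verified by hand from the nine work_time values in the report. The Python sketch below uses timedelta as a stand-in for the DAY TO MINUTE interval type, which is an assumption of this example.

```python
from datetime import timedelta

# work_time values from the report, as (days, hours, minutes)
work_times = [(2, 2, 50), (3, 3, 50), (1, 0, 40), (6, 21, 20), (2, 23, 50),
              (2, 6, 10), (5, 2, 50), (1, 20, 20), (2, 5, 10)]

durations = [timedelta(days=d, hours=h, minutes=m) for d, h, m in work_times]

# Sum the intervals, then divide by the number of repairs.
average = sum(durations, timedelta()) / len(durations)
print(average)   # 3 days, 1:40:00
```

The total is 27 days 15 hours over 9 repairs, which gives 3 days, 1 hour and 40 minutes, matching the query result.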
Comparing Intervals
Show the serial number and the number of days required for
each TV that took longer than 2 days to repair.
SELECT serial_number,
(end_time - start_time) DAY TO MINUTE
AS #_DaysHrsMns
FROM Repair_time
WHERE #_DaysHrsMns >
INTERVAL '02 00:00' DAY TO MINUTE;
serial_number  #_DaysHrsMns
-------------  ------------
          106       5 02:50
          101       3 03:50
          108       2 05:10
          100       2 02:50
          104       2 23:50
          103       6 21:20
          105       2 06:10
Advanced Use of Timestamp - Example 1
Produce a list which pairs by serial number any two TVs that
were being repaired at the same time.
Advanced Use of Timestamp - Example 2
What percentage of all TVs took 2 or more days to repair?
Incorrect Answer (the DAY interval drops the hours and minutes before the comparison):
SELECT (100 * COUNT(serial) / cnt) (FORMAT '99%')
FROM (SELECT COUNT(*) FROM Repair_time) AS temp1(cnt),
(SELECT serial_number, (end_time - start_time) DAY AS Num_Days
FROM Repair_time
WHERE Num_Days > INTERVAL '02' DAY) AS temp2(serial, Number_days)
GROUP BY cnt;
((100*Count(serial))/cnt)
----------------------------------
33%
Correct Answer (the DAY TO MINUTE interval keeps the hours and minutes):
SELECT (100 * COUNT(serial) / cnt) (FORMAT '99%')
FROM (SELECT COUNT(*) FROM Repair_time) AS temp1(cnt),
(SELECT serial_number, (end_time - start_time) DAY TO MINUTE AS Num_Days
FROM Repair_time
WHERE Num_Days > INTERVAL '02 00:00' DAY TO MINUTE)
AS temp2(serial, Number_days)
GROUP BY cnt;
((100*Count(serial))/cnt)
----------------------------------
78%
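The gap between the two answers comes from interval precision, and it can be reproduced from the nine repair times listed in the earlier work_time report. The Python sketch below uses timedelta in place of Teradata intervals, which is an assumption of this example. Reading only the whole-day part stands in for the DAY interval.

```python
from datetime import timedelta

# Repair times from the work_time report, as (days, hours, minutes)
work_times = [(2, 2, 50), (3, 3, 50), (1, 0, 40), (6, 21, 20), (2, 23, 50),
              (2, 6, 10), (5, 2, 50), (1, 20, 20), (2, 5, 10)]
durations = [timedelta(days=d, hours=h, minutes=m) for d, h, m in work_times]
two_days = timedelta(days=2)

# DAY precision keeps only whole days, so 2 days 23:50 is treated as 2.
truncated = sum(1 for d in durations if d.days > 2)

# DAY TO MINUTE precision keeps the hours and minutes.
full = sum(1 for d in durations if d > two_days)

print(round(100 * truncated / len(durations)))   # 33
print(round(100 * full / len(durations)))        # 78
```

With whole days only, the four repairs that took between 2 and 3 days are not counted, so only 3 of 9 qualify instead of 7 of 9.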
Performance Reminders
Summary