
Introduction to the Teradata Database

Course 25964
Version 2.1

Course Objectives
After completing this course, you should be able to:
Describe the purpose and function of the Teradata Database.
Navigate relational tables using Primary Keys and Foreign Keys.
List the principal components of the Teradata Database and describe their functions.
Describe the Teradata Database features that provide fault tolerance.
Describe the Primary Index and the Secondary Index.
Explain data distribution and data access mechanics in the Teradata Database.

Course Audience

Who Should Attend


This course is designed for anyone who will be working with the Teradata Database, including programmers, administrators, designers, support personnel, and end users.

Class Format
This course consists of:
One day of classroom instruction
Review exercises following each module
A course handbook in facing-page format

Course Modules
This course consists of:
Module 1: Teradata Database Overview
Module 2: Relational Database Concepts
Module 3: Teradata and the Data Warehouse
Module 4: Components and Architecture
Module 5: Databases and Users
Module 6: Data Distribution and Access
Module 7: Secondary Indexes and Full-Table Scans
Module 8: Fault Tolerance and Data Protection
Module 9: Client Tools and Utilities

Course Appendices

This course contains the following appendices:


Appendix A: Review Questions/Solutions
Appendix B: Born to be Parallel
Appendix C: Third Normal Form

Teradata Database Overview


After completing this module, you should be able to:
Describe the purpose of the Teradata Database product.
Identify supported operating systems.
List activities that Teradata Database Administrators (DBA) never have to perform.
Describe the advantages of the Teradata Database.

What is the Teradata Database?


Relational Database Management System
Built on a Parallel Architecture
Runs on MP-RAS UNIX, Microsoft Windows 2000/2003 Server, and SuSE Linux

[Diagram: a Teradata Database Server connected to clients — UNIX systems and Windows 2000/2003/XP workstations — over a LAN, and to mainframes over a channel.]

Teradata Parallel Architecture


More warehouse data
Parallel-Aware Optimizer
Linear Scalability (10GB to 100+TB)
Single, Administrative View
Hashing provides for automatic data distribution
Ad hoc queries with ANSI-standard SQL


Teradata Database Advantages


Proven Linear Scalability - increased workload without decreased throughput
Most Concurrent Users - multiple complex queries
Unconditional Parallelism - sorts, aggregations, and full-table scans are performed in parallel
Mature Optimizer - robust and parallel-aware, handles complex queries, multiple joins per query, ad hoc processing
Low TCO - ease of setup and maintenance, robust parallel utilities, no re-orgs, automatic data distribution, low disk-to-data ratio, robust expansion utility
High Availability - no single point of failure, fault-tolerant architecture
Single View of the Business - single database server for multiple clients

Teradata Database Manageability


Things Teradata Database Administrators never have to do!
Reorganize data or index space
Pre-allocate table or index space
Physically format partitions or disk space
Pre-prepare data for loading (convert, sort, split, etc.)
Ensure that queries run in parallel
Unload/reload data spaces due to expansion

The Administrator knows that if the data is to be doubled, the system can be easily expanded to accommodate it. The amount of work required to create a table that will contain 100 rows is the same as that to create a table that will contain 1,000,000,000 rows.

Teradata Database Features


Designed to process large quantities of detail data
Ideal for data warehouse applications
Parallelism makes easy access to very large tables possible
Open architecture - uses industry standard components
Performance increase is linear as components are added
Runs as a database server to client applications
Runs on multiple hardware platforms (SMP) and Teradata hardware (MPP)

Module 1 Review Questions


1. Name three operating systems that the Teradata Database runs on.
2. Which of the following describes the scalability of the Teradata Database?
   a. Linear   b. Parallel   c. Exponential   d. Shared
3. Which feature allows the Teradata Database to process large amounts of data quickly?
   a. High availability software and hardware components
   b. Parallelism
   c. Proven scalability
   d. High performance servers from Intel
4. The Teradata Database is primarily a:
   a. Server   b. Client
5. Which two tasks do Teradata Database Administrators never have to do? (Choose two.)
   a. Reorganize data
   b. Select primary indexes
   c. Restart the system
   d. Pre-prepare data for loading

Relational Database Concepts


After completing this module, you should be able to:
Define the terms associated with relational theory.
Discuss the function of the Primary Key.
Discuss the function of Foreign Keys.
List the advantages of a relational database.

What is a Database?
A database is a collection of permanently stored data that is:
Logically related - the data relates to other data.
Shared - many users may access the data.
Protected - access to data is controlled.
Managed - the data has integrity and value.

Logical/Relational Modeling
The Logical Model
Should be designed without regard to usage
Accommodates a wide variety of front end tools
Allows the database to be created more quickly
Should be the same regardless of data volume
Data is organized according to what it represents (real world business
data in table (relational) form)
Includes all the data definitions within the scope of the application or
enterprise
Is generic: the logical model is the template for physical
implementation on any RDBMS platform

Normalization
Process of reducing a complex data structure into a simple, stable one
Involves removing redundant attributes, keys, and relationships from the
conceptual data model

Relational Databases
Relational Databases are founded on Set Theory and based on the Relational Model.
A Relational Database consists of a collection of logically related tables.
A table is a two-dimensional representation of data consisting of rows and columns.

EMPLOYEE
EMPLOYEE  MANAGER   DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE  NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
          NUMBER
1006      1019      301         312101  Stein     John     861015  631015  3945000
1008      1019      301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801      403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003      401         412101  Johnson   Darlene  861015  560423  4630000
1007      NULL      NULL        432101  Villegas  Arnando  870102  470131  5970000
1003      0801      401         411100  Trader    James    860731  570619  4785000

The employee table has:


Nine columns of data
Six rows of data - one per employee
Only one row format for the entire table
Missing data values represented by nulls
Column and row order are arbitrary

Primary Keys
Primary Key (PK) values uniquely identify each row in a table.
EMPLOYEE
EMPLOYEE  MANAGER   DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE  NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
(PK)      NUMBER
1006      1019      301         312101  Stein     John     861015  631015  3945000
1008      1019      301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801      403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003      401         412101  Johnson   Darlene  861015  560423  4630000
1007      NULL      NULL        432101  Villegas  Arnando  870102  470131  5970000
1003      0801      401         411100  Trader    James    860731  570619  4785000

Primary Key Rules


A Primary Key is required for every table.
Only one Primary Key is allowed in a table.
Primary Keys may consist of one or more columns.
Primary Keys cannot have duplicate values (ND).
Primary Keys cannot be null (NN).
Primary Keys are considered non-changing values (NC).
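The PK rules above can be declared when a table is defined. A minimal Teradata-style SQL sketch, using a hypothetical Employee table (the column names are illustrative, not taken from the course database):

```sql
-- Hypothetical table: NOT NULL enforces the "no nulls" (NN) rule and
-- the PRIMARY KEY constraint enforces "no duplicates" (ND).
CREATE TABLE Employee
  ( Employee_Number  INTEGER NOT NULL
  , Last_Name        CHAR(20)
  , First_Name       CHAR(20)
  , PRIMARY KEY (Employee_Number)
  );
```

The "non-changing" (NC) rule is a modeling convention rather than something the DDL enforces.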

Foreign Keys
EMPLOYEE (partial listing)
EMPLOYEE  MANAGER   DEPARTMENT  JOB     LAST
NUMBER    EMPLOYEE  NUMBER      CODE    NAME
(PK)      NUMBER    (FK)        (FK)
          (FK)
1006      1019      301         312101  Stein
1008      1019      301         312102  Kanieski
1005      0801      403         431100  Ryan
1004      1003      401         412101  Johnson
1007      NULL      NULL        NULL    Villegas
1003      0801      401         411100  Trader

Foreign Key (FK) values model relationships.

Foreign Keys (FK) are optional.
A table may have more than one FK.
A FK may consist of more than one column.
FK values may be duplicated.
FK values may be null.
FK values may be changed.
FK values must exist elsewhere as a PK (i.e. have referential integrity).

DEPARTMENT
DEPARTMENT  DEPARTMENT                BUDGET    MANAGER
NUMBER      NAME                      AMOUNT    EMPLOYEE NUMBER
(PK)                                            (FK)
501         marketing sales           80050000  1017
301         research and development  46560000  1019
302         product planning          22600000  1016
403         education                 93200000  1005
402         software support          30800000  1011
401         customer support          98230000  1003
201         technical operations      29380000  1025
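Referential integrity between a FK and the PK it references can be declared in the table definition. A hedged sketch in Teradata-style SQL, with illustrative column names:

```sql
-- Hypothetical sketch: Department_Number in Employee references the
-- DEPARTMENT table's PK, so every non-null FK value must exist there.
CREATE TABLE Department
  ( Department_Number INTEGER NOT NULL PRIMARY KEY
  , Department_Name   CHAR(30)
  );

CREATE TABLE Employee
  ( Employee_Number   INTEGER NOT NULL PRIMARY KEY
  , Department_Number INTEGER REFERENCES Department (Department_Number)
  );
```

Note that the FK column is nullable, matching the rule above that FK values may be null.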

Answering Questions with a Relational Database

[The EMPLOYEE and DEPARTMENT tables from the preceding pages are repeated here, with their PK and FK columns marked.]

1. Name the department in which James Trader works.
2. Who manages the Education Department?
3. Identify by name an employee who works for James Trader.
4. James Trader manages which department?
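Questions like these are answered by joining the tables on the FK-to-PK relationship. A sketch for question 1, assuming snake_case column names (the course shows only display headings):

```sql
-- Question 1: name the department in which James Trader works.
-- The Employee FK Department_Number joins to the Department PK.
SELECT d.Department_Name
FROM   Employee   e
JOIN   Department d
ON     e.Department_Number = d.Department_Number
WHERE  e.Last_Name  = 'Trader'
AND    e.First_Name = 'James';
```

The other three questions follow the same pattern, joining in the opposite direction for "who manages" questions (Department's Manager Employee Number FK back to Employee's PK).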

Relational Advantages
Advantages of a Relational Database compared to other database methodologies include:
More flexible than other types
Allows businesses to quickly respond to changing conditions
Data-driven vs. application-driven
Models the business, not the processes
Makes applications easier to build because the data does more of the work
Supports the trend toward end-user computing
Easy to understand
No need to know the access path
Solidly founded in set theory

Module 2 Review Questions


Match each term with its definition:
____ 1. Database
____ 2. Table
____ 3. Relational database
____ 4. Primary Key
____ 5. Null
____ 6. Foreign Key
____ 7. Row

a. A set of columns that uniquely identify a row.
b. A set of logically related tables.
c. One or more columns that exist as a PK value in another table in the database.
d. The absence of a value or an unknown value.
e. A two-dimensional array of rows and columns.
f. A collection of permanently stored data.
g. One instance of all columns in a table.

Teradata and the Data Warehouse

After completing this module, you should be able to:


Identify the different types of enterprise data processing.
Define a data warehouse and an active data warehouse.
Define the different types of data marts.
Explain the advantages of detail data over summary data.

Evolution of Data Processing


A transaction is a logical unit of work.

TRADITIONAL
OLTP
  Examples: Update a checking account to reflect a deposit. A debit transaction takes place against the current balance to reflect the amount of money withdrawn at an ATM.
  Number of rows accessed: Small. Response time: Seconds.

TODAY
DSS
  Examples: How many child size blue jeans were sold across all of our Eastern stores in the month of March? What were the monthly sales of shoes for retailer X?
  Number of rows accessed: Large. Response time: Seconds or minutes.
OLAP
  Examples: Show the top ten selling items across all stores for 1997. Show a comparison of sales from this week to last week.
  Number of rows accessed: Large number of detail rows, or a moderate number of summary rows. Response time: Seconds or minutes.
Data Mining
  Examples: Which customers are most likely to leave? Which customers are most likely to respond to this promotion?
  Number of rows accessed: Moderate to large number of detailed historical rows. Response time: Phase 1: minutes or hours; Phase 2: seconds or less.

The Advantage of Using Detail Data


[Table: STORE ITEM DAY — columns STORE NUMBER, ITEM NUMBER, DATE (together the PK, with FK markers on the identifying columns), and QUANTITY SOLD. The detail rows record the quantity of an item sold in each store (stores 1, 2, and 5 are shown) for each day from June 01 through June 14.]

QUESTION: How effective was the national advertisement for jeans that ran June 6 through June 8?

[Summary table: STORE NUMBER, ITEM NUMBER, WEEK ENDING, QUANTITY SOLD — the same sales aggregated to one row per store, item, and week. The weekly totals cannot show what happened on June 6 through June 8.]

DETAIL DATA vs. SUMMARY DATA
Detail data gives a more accurate picture.
Correct business decisions result.

Data Warehouse Usage Evolution


STAGE 1 - REPORTING: WHAT happened?
  Primarily batch; pre-defined reports.
STAGE 2 - ANALYZING: WHY did it happen?
  Increase in ad hoc queries.
STAGE 3 - PREDICTING: WHAT will happen?
  Analytical modeling grows.
STAGE 4 - OPERATIONALIZING: WHAT is happening?
  Continuous update and time-sensitive queries become important; short queries.
STAGE 5 - ACTIVE WAREHOUSING: MAKING it happen!
  Event-based triggering takes hold.

Active Data Warehousing


Performance - response time within seconds
Scalability
  large amounts of detailed data
  mixed workloads (both tactical and strategic queries) for mission-critical applications
  concurrent users
Availability and Reliability - 7 x 24
Data Freshness - accurate, up-to-the-minute data, including access to operational data store level information

The Data Warehouse


A central, enterprise-wide database that contains information extracted from operational systems.
Based on an enterprise-wide model
Can begin small but may grow large rapidly
Populated by extraction/loading of data from operational systems
Responds to end-user "what if" queries
Minimizes data movement/synchronization
Provides a single view of the business

[Diagram: operational systems (accounts receivable, inventory, POS) feed the Teradata Database data warehouse, which end users query through access tools such as Cognos, MicroStrategy, and BusinessObjects.]

Data Marts
A data mart is a special purpose subset of enterprise data for a particular function
or application. It may contain detail or summary data or both.
Data mart types:
Independent - created directly from operational systems to a separate physical
data store
Logical - exists as a subset of existing data warehouse via Views
Dependent - created from data warehouse to a separate physical data store

[Diagram: an independent data mart is fed directly from operational systems; a dependent data mart is fed from the data warehouse; a logical data mart exists as views over the data warehouse.]
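A logical data mart is just a set of views, so no data is copied. A hedged sketch, with hypothetical database and column names:

```sql
-- Hypothetical view carving a department-level subset out of a
-- warehouse table; the "mart" is logical because no rows move.
CREATE VIEW Sales_Mart.Employee_V AS
SELECT Employee_Number, Last_Name, First_Name
FROM   Warehouse.Employee
WHERE  Department_Number = 301;
```

An independent or dependent mart would instead extract and load these rows into a separate physical data store.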

Module 3 Review Questions


1. Name three types of enterprise data processing and give examples.
2. What is the difference between a data warehouse and a data mart?

3. Which type of data mart gets its data directly from the data warehouse?

4. Name the two types of queries that an Active Data Warehouse supports
for mission critical applications.
5. Match the data warehouse usage evolution stage to its description:
___ Stage 1
___ Stage 2
___ Stage 3
___ Stage 4
___ Stage 5

a. Continuous updates and time sensitive queries


b. Event-based triggering takes hold
c. Analytical modeling
d. Increase in ad hoc queries
e. Primarily batch

Components and Architecture


After completing this module, you should be able to:
Describe a node.
List the major components of the Teradata Database
architecture and their functions.
Describe the overall Teradata Database parallel
architecture.
Explain how the Teradata Database functions with
channel and network attached clients.

What is a Node?

[Diagram: a single node. Channel-attached systems connect through the Channel Driver; LAN-attached UNIX, Windows, and Linux clients connect through the Teradata Gateway. PE and AMP vprocs run under the PDE, each AMP owns a VDISK, and all vprocs communicate through the BYNET driver.]

Teradata software, gateway software, and channel-driver software run as processes.
Parsing Engines (PE) and Access Module Processors (AMP) are Virtual Processors (VPROC) which run under control of the Parallel Database Extensions (PDE).
Each AMP is associated with a Virtual Disk (VDISK).
A single node is called a Symmetric Multi-Processor (SMP).
All AMPs and PEs communicate via the BYNET.

Major Components of the Teradata Database

[Diagram: an SQL request enters the node at the Parsing Engine; the BYNET carries messages between the PE and the AMPs; each AMP manages its own VDISK; the answer set response returns to the client.]

The Parsing Engine (PE)

The Parsing Engine is responsible for:
Managing individual sessions (up to 120 sessions per PE)
Parsing and optimizing your SQL requests
Building query plans with the parallel-aware, cost-based, intelligent Optimizer
Dispatching the optimized plan to the AMPs
EBCDIC/ASCII input conversion (if necessary)
Sending the answer set response back to the requesting client

[Diagram: the PE's components — Session Control, Parsing, Optimizing, Dispatching — draw on system configuration and data demographics information, and pass work over the BYNET to the AMPs and their VDISKs.]

The BYNET

[Diagram: the BYNET sits between the Parsing Engine and the AMPs, carrying the SQL request down and the answer set response back.]

Dual redundant, fault-tolerant, bi-directional interconnect network that enables:
Automatic load balancing of message traffic
Automatic reconfiguration after fault detection
Scalable bandwidth as nodes are added
The BYNET connects and communicates with all the AMPs on the system:
Between nodes, the BYNET hardware carries broadcast and point-to-point communications
On a node, BYNET software and PDE together control which AMPs receive a multicast communication

The Access Module Processor (AMP)

The AMP is responsible for:
Storing rows to and retrieving rows from its VDISK
Lock management
Sorting rows and aggregating columns
Join processing
Output conversion and formatting (ASCII, EBCDIC)
Creating answer sets for clients
Disk space management and accounting
Special utility protocols
Recovery processing

AMPs perform all tasks in parallel.

The MPP System

The BYNET (both software and hardware) connects two or more SMP nodes to create a Massively Parallel Processing (MPP) system.
The Teradata Database is linearly expandable by adding nodes.

[Diagram: node cabinets, each containing several nodes and an SMC, connect through dual BYNETs; each node cabinet is paired with an array cabinet of disk arrays.]

Teradata Database Software

[Diagram: in a channel-attached system, the client application calls CLI, which talks to the TDP; requests cross the channel to the Channel Driver on the Teradata Database node and on to a Parsing Engine. In a network-attached system, the client application calls CLI, MTDP, and MOSI; requests cross the LAN to the Teradata Gateway and on to a Parsing Engine. On the node, the PEs and AMPs run under the PDE within the TPA, communicating over the BYNET, with each AMP managing its VDISK.]

Channel-Attached Client Software

Connection made via HCA (Host Channel Adapter), Bus & Tag or ESCON cables, Channel Driver, and PE.

CLI (Call-Level Interface)
Request and response control
Buffer allocation and initialization
Lowest level interface to the Teradata Database
Library of routines for blocking/unblocking requests and responses to/from the RDBMS
Performs logon and logoff functions

TDP (Teradata Director Program)
Manages session traffic between CLI and the Teradata Database
Session balancing across multiple PEs
Failure notification (application failure, Teradata Database restart)
Logging, verification, recovery, restart, security

Network-Attached Client Software

Connection made via Ethernet or LAN card, cables, Teradata Gateway, and PE. Two LAN connections are used for redundancy.

ODBC
Call-level interface
The Teradata Database ODBC driver is used to connect applications with the Teradata Database
MTDP (Micro Teradata Director Program)
Performs many TDP functions, including session management, but not session balancing across PEs
MOSI (Micro Operating System Interface)
Provides an operating system and network protocol independent interface

Module 4 Review Questions


1. Name the three major Teradata Database components and state their purpose.

2. Why are there two LANs in a Teradata system?


3. How many sessions can a PE support?
4. What is the communications layer in a Teradata system?

Databases and Users

After completing this module, you should be able to:


Define a Database and a User.
Define Perm Space and its purpose.
Define Spool Space and its purpose.
Define Temp Space and its purpose.
Describe the hierarchy of objects in the Teradata Database.

Databases and Users Defined


Databases and Users are the repositories for objects:
Tables - require Perm Space
Views - do not require Perm Space
Macros - do not require Perm Space
Triggers - do not require Perm Space
Stored Procedures - require Perm Space

Space limits are specified for each database and for each user:
Perm Space - maximum amount of space available for permanent tables
Spool Space - maximum amount of work space available for request processing
Temp Space - maximum amount of space available for global temporary tables

A database is created with the CREATE DATABASE command.
A user is created with the CREATE USER command.
The only difference between a database and a user is that a user has a password and may log on to the system.
A database or user with no perm space may not contain permanent tables but may contain views and macros.
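Both commands set the space limits at creation time. A hedged Teradata-style sketch; the names and sizes are illustrative only:

```sql
-- New objects are carved out of an existing owner (here DBC), and
-- their PERM limit is subtracted from that owner's Perm Space.
CREATE DATABASE Database_1 FROM DBC
  AS PERM = 10e9, SPOOL = 5e9;

-- A user is the same object plus a password, so it can log on.
CREATE USER User_A FROM Database_1
  AS PASSWORD = secret1, PERM = 2e9, SPOOL = 5e9;
```

A database created with PERM = 0 could still hold views and macros, as noted above.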

Teradata Database Space Management


[Diagram: a space hierarchy. The root user DBC owns the total disk space; system users and databases (TDPUSER, PUBLIC, All, Default, SYSADMIN, SYSTEMFE, SYSDBA, CRASHDUMPS) and user databases (Database 1, Database 2, Database 3, Users A through D) are each carved out of their owner's Perm Space. Each box shows Current Permanent Space against Maximum Permanent Space; an empty box means no Permanent Space. In the example, 70 GB minus three allocations of 10, 10, and 30 GB leaves the owner 20 GB.]

A new database or user must be created from an existing database or user.
All Perm Space limits are subtracted from the owner.
Perm Space is a zero-sum game: the total of all Perm Space limits must equal the total amount of disk space available.
Perm Space currently not being used is available for Spool Space or Temp Space.

Module 5 Review Questions


Indicate whether each statement is True or False.
____ 1. A database will always have tables.
____ 2. A user will always have a password.
____ 3. A user creating a subordinate user who needs tables must give up some
of its Perm Space.
____ 4. The sum of all user and database Perm Space will equal the total
space on the system.
____ 5. Deleting a view from a database reclaims Perm Space for the database.

Data Distribution and Access

After completing this module, you should be able to:


Explain the purpose of the Primary Index.
Distinguish between Primary Index and Primary Key.
State the reasons for selecting a Unique Primary Index
vs. a Non-Unique Primary Index.
Describe how the Teradata Database distributes the
rows in a table.

How Does the Teradata Database Distribute Rows?

The Teradata Database uses a hashing algorithm to randomly distribute table rows across the AMPs.
The Primary Index choice determines whether the rows of a table will be evenly or unevenly distributed across the AMPs.
Evenly distributed table rows result in evenly distributed workloads.
Each AMP is responsible for its subset of the rows of each table.
The rows are not placed in any particular order.
The benefits of unordered rows include:
No maintenance needed to preserve order.
The order is independent of any query being submitted.
The benefits of hashed distribution include:
The distribution is the same regardless of data volume.
The distribution is based on row content, not data demographics.

Primary Key (PK) vs. Primary Index (PI)

The PK is a relational modeling convention which uniquely identifies each row.
The PI is a Teradata convention which determines row distribution and access.
A well-designed database will have tables where the PI is the same as the PK, as well as tables where the PI is defined on columns different from the PK.
Join performance and known access paths might dictate a PI that is different from the PK.

Primary Key (PK)                       | Primary Index (PI)
Logical concept of data modeling       | Mechanism for row distribution and access
Teradata does not need the PK defined  | A table must have one Primary Index
No limit on the number of columns      | May be from 1 to 64 columns
Documented in the logical data model   | Defined in the CREATE TABLE statement
Value must be unique                   | Value may be unique or non-unique
Uniquely identifies each row           | Used to place a row on an AMP
Value should not change                | Value may be changed (Updated)
May not be NULL                        | May be NULL
Does not imply access path             | Defines the most efficient access path
Chosen for logical correctness         | Chosen for physical performance

Primary Indexes
The physical mechanism used to assign a row to an AMP
A table must have a Primary Index
The Primary Index cannot be changed

UPI
If the index choice of column(s) is unique, we call this a UPI (Unique Primary Index).
A UPI choice will result in even distribution of the rows of the table across all AMPs.
Reasons to choose a UPI: UPIs guarantee even data distribution, eliminate duplicate row checking, and are always a one-AMP operation.

NUPI
If the index choice of column(s) isn't unique, we call this a NUPI (Non-Unique Primary Index).
A NUPI choice will result in even distribution of the rows of the table proportional to the degree of uniqueness of the index.
NUPIs can cause skewed data.

Why would you choose an index that is different from the Primary Key?
Join performance
Known access paths

Defining the Primary Index

The Primary Index (PI) is defined at table creation.


Every table must have one Primary Index.
The Primary Index may consist of 1 to 64 columns.
The Primary Index of a table may not be changed.
The Primary Index is the mechanism used to assign a row to an AMP.
The Primary Index may be Unique (UPI) or Non-Unique (NUPI).
Unique Primary Indexes result in even row distribution and eliminate
duplicate row checking.
Non-Unique Primary Indexes result in even row distribution proportional
to the number of duplicate values. This may cause skewed distribution.
UPI Table

CREATE TABLE Table1
  ( Col1 INTEGER
  ,Col2 INTEGER
  ,Col3 INTEGER )
UNIQUE PRIMARY INDEX (Col1);

NUPI Table

CREATE TABLE Table2
  ( Col1 INTEGER
  ,Col2 INTEGER
  ,Col3 INTEGER )
PRIMARY INDEX (Col2);

Row Distribution via Hashing

[Diagram: a row's index value feeds the Hashing Algorithm, which produces a Row Hash; the Hash Bucket Number portion points into the Hash Map, which yields the AMP number.]

A row's Primary Index value is passed into the Hashing Algorithm.
The Hashing Algorithm is designed to ensure even distribution of unique values across all AMPs.
The Hashing Algorithm outputs a 32-bit Row-Hash value.
The first 16 bits (the Hash Bucket Number) are used as a pointer into the Hash Map.
Hash values are calculated using the hashing algorithm.
The Hash Map is uniquely configured for each system.
The Hash Map is an array which associates the DSW with a specific AMP.
Two systems with the same number of AMPs will have the same Hash Map.
Changing the number of AMPs in a system requires a change to the Hash Map.
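Teradata exposes this hashing chain through SQL functions, which makes it possible to inspect how a candidate PI would distribute rows. A sketch, assuming the Customer table from the following pages:

```sql
-- Rows per AMP for a candidate PI column: an even spread indicates a
-- good Primary Index choice, a skewed spread a poor one.
SELECT HASHAMP(HASHBUCKET(HASHROW(Cust))) AS amp_number
     , COUNT(*)                           AS row_count
FROM   Customer
GROUP  BY 1
ORDER  BY 1;
```

Substituting a highly non-unique column for Cust would show the skew described later in this module.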

Unique Primary Index (UPI) Access

CREATE TABLE Customer
  ( Cust INTEGER
  ,Name CHAR(10)
  ,Phone CHAR(8) )
UNIQUE PRIMARY INDEX (Cust);

SELECT *
FROM Customer
WHERE Cust = 45;

CUSTOMER table (Cust is the PK and the UPI):
Cust  Name    Phone
37    White   555-4444
98    Brown   333-9999
74    Smith   555-6666
95    Peters  555-7777
27    Jones   222-8888
56    Smith   555-7777
45    Adams   444-6666
84    Rice    666-5555
49    Smith   111-6666
51    Marsh   888-2222
31    Adams   111-2222
62    Black   444-5555
12    Young   777-4444
77    Jones   777-6666
72    Adams   666-7777
40    Smith   222-3333

The PE passes the UPI value (45) through the Hashing Algorithm, and the BYNET delivers the request to the one AMP that holds the row.

Row distribution across the AMPs:
AMP 1: 49 Smith 111-6666 | 45 Adams 444-6666 | 56 Smith 555-7777 | 51 Marsh 888-2222
AMP 2: 62 Black 444-5555 | 84 Rice 666-5555 | 95 Peters 555-7777 | 77 Jones 777-6666
AMP 3: 12 Young 777-4444 | 74 Smith 555-6666 | 98 Brown 333-9999 | 31 Adams 111-2222
AMP 4: 27 Jones 222-8888 | 72 Adams 666-7777 | 40 Smith 222-3333 | 37 White 555-4444

Single AMP access with 0 to 1 rows returned.

Non-Unique Primary Index (NUPI) Access

CREATE TABLE Customer
  ( Cust INTEGER
  ,Name CHAR(10)
  ,Phone CHAR(8) )
PRIMARY INDEX (Phone);

SELECT *
FROM Customer
WHERE Phone = '555-7777';

The CUSTOMER table holds the same sixteen rows as on the previous page, but Phone is now the NUPI, so rows with the same Phone value hash to the same AMP.

Row distribution across the AMPs:
AMP 1: 37 White 555-4444 | 84 Rice 666-5555 | 31 Adams 111-2222 | 40 Smith 222-3333
AMP 2: 45 Adams 444-6666 | 98 Brown 333-9999 | 72 Adams 666-7777 | 74 Smith 555-6666
AMP 3: 49 Smith 111-6666 | 12 Young 777-4444 | 27 Jones 222-8888 | 62 Black 444-5555
AMP 4: 77 Jones 777-6666 | 95 Peters 555-7777 | 56 Smith 555-7777 | 51 Marsh 888-2222

The PE hashes the NUPI value '555-7777' and the request goes to the single AMP holding those rows; here both matching rows (95 Peters and 56 Smith) are on AMP 4.

Single AMP access with 0 to n rows returned.

UPI Row Distribution

Order table (Order_Number is the PK and the UPI):
Order   Customer  Order  Order
Number  Number    Date   Status
7325    2         4/13   O
7324    3         4/13   O
7415    1         4/13   C
7103    1         4/10   O
7225    2         4/15   C
7384    1         4/12   C
7402    3         4/16   C
7188    1         4/13   C
7202    2         4/09   C

[Diagram: the nine rows hash on Order_Number and spread evenly, two or three rows per AMP, across AMPs 1 through 4.]

Order_Number values are unique (UPI).
The rows will distribute evenly across the AMPs.

NUPI Row Distribution

The Order table holds the same nine rows as above, but Customer_Number is now the Primary Index (NUPI).

[Diagram: the four rows for customer 1 (7415, 7103, 7384, 7188) hash to one AMP; the three rows for customer 2 (7325, 7225, 7202) to another; the two rows for customer 3 (7324, 7402) to a third.]

Customer_Number values are non-unique (NUPI).
Rows with the same PI value distribute to the same AMP, causing skewed row distribution.

Highly Non-Unique NUPI Row Distribution

The Order table holds the same nine rows as above, but Order_Status is now the Primary Index (NUPI).

Order_Status values are highly non-unique (NUPI).
Only two values exist. The rows will be distributed to two AMPs.

[Diagram: the six rows with status C land on one AMP and the three rows with status O on another; the remaining AMPs hold no rows.]

This table will not perform well in parallel operations.
Highly non-unique columns are poor PI choices.
The degree of uniqueness is critical to efficiency.

Partitioned Primary Index (PPI)

(Diagram: the Orders table defined with a Non-Partitioned Primary Index (NPPI) on Order_Number (O_#), contrasted with the same table defined with a Primary Index on Order_Number (O_#) partitioned by Order_Date (O_Date) (PPI).)

Partitioned Primary Indexes:
Improve performance on range constraint queries.
Use partition elimination to reduce the number of rows accessed.
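The partitioning is declared in the CREATE TABLE statement. A minimal sketch, assuming hypothetical column names and a one-year date range (neither is from the course):

```sql
-- Hypothetical Orders table; names and the date range are assumptions.
CREATE TABLE Orders
  ( O_Number  INTEGER NOT NULL
  , O_CustNum INTEGER
  , O_Date    DATE
  , O_Status  CHAR(1)
  )
PRIMARY INDEX (O_Number)
PARTITION BY RANGE_N (
  O_Date BETWEEN DATE '2005-01-01' AND DATE '2005-12-31'
         EACH INTERVAL '1' MONTH );
```

A range query such as WHERE O_Date BETWEEN DATE '2005-03-01' AND DATE '2005-03-31' can then read a single month's partition on each AMP instead of scanning the whole table.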

Module 6 Review Questions


1. Indicate whether the following apply to: UPI, NUPI, Either, or Neither
__________A. Specified in CREATE TABLE statement.
__________B. Provides even row distribution via the hashing algorithm.
__________C. May be up to 64 columns.
__________D. Always a one-AMP operation.
__________E. Access will return a single row.
__________F. Used to assign a row to a specific AMP.
__________G. Allows NULL.
__________H. Value cannot be changed.
__________I. Required on every table.

__________J. Permits duplicate values.


__________K. Can never be the Primary Key.
2. Why is the choice of Primary Index important? ____________________________
3. True/False: Tables should be assigned a PI at creation._____________________

Secondary Indexes and Full-Table Scans


After completing this module, you should be able to:
Define Secondary Indexes.
List the various types of secondary indexes.
Describe the operation of a full-table scan in a
parallel environment.

Secondary Indexes
A secondary index is an alternate path to the rows of a table.
A table may have from 0 to 32 secondary indexes.
A secondary index:
does not affect table row distribution.
is chosen to improve access performance.
may reference from 1 to 64 table columns.
may be defined at table creation.
may be defined after the table is created.
may be dropped at any time.
uses a sub-table which utilizes Perm Space.
may impact table maintenance performance (row inserts,
row updates and/or row deletes).

Defining a Secondary Index


Unique Secondary Index (USI)
A Unique Secondary Index requires unique column values in each row.
Access to a referenced value requires 2 AMPs (serial operation) and returns 0 or 1 rows.
SQL to create:

CREATE UNIQUE INDEX (social_security) on Employee;

Non-Unique Secondary Index (NUSI)


A Non-Unique Secondary Index (NUSI) allows duplicate column values in the rows.
Access to a referenced value requires all AMPs (parallel operation) and returns 0 to n
rows.
SQL to create:

CREATE INDEX (last_name) on Employee;


CREATE INDEX (last_name, first_name) on Employee;
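Index access itself needs no special syntax; the optimizer chooses the index path when the indexed columns appear in an equality predicate. A sketch against the Employee examples above (the literal values are assumptions):

```sql
-- USI access: a two-AMP operation that returns 0 or 1 rows.
SELECT * FROM Employee WHERE social_security = '123-45-6789';

-- NUSI access: an all-AMP operation that returns 0 to n rows.
SELECT * FROM Employee WHERE last_name = 'Smith';
```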

Other Types of Secondary Indexes


Join Index
Define a pre-join table on frequently joined columns (with optional aggregation) without
denormalizing the database.
Create a full or partial replication of a base table with a PI on a FK column to
facilitate joins of large tables by hashing their rows to the same AMP.
Define a summary table without denormalizing the database.
Can be defined on one or several tables.
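As a sketch of the first use, a two-table join index might be defined as follows (the Customer and Orders tables and their columns are assumed for illustration):

```sql
-- Pre-joins Customer and Orders so frequent joins of these tables
-- can be satisfied from the join index instead of the base tables.
CREATE JOIN INDEX Cust_Ord_JI AS
SELECT c.Cust_ID, c.Cust_Name, o.Order_Number, o.Order_Date
FROM   Customer c
INNER JOIN Orders o ON c.Cust_ID = o.Cust_ID;
```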

Sparse Index
Any join index, whether simple or aggregate, multi-table or single-table, can be sparse.
Uses a constant expression in the WHERE clause of its definition to narrowly filter its row
population.
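A sketch, reusing the hypothetical Orders table; the constant predicate in the WHERE clause is what makes the index sparse:

```sql
-- Only open orders are stored in the index sub-table.
CREATE JOIN INDEX Open_Orders_JI AS
SELECT Order_Number, Customer_Number, Order_Date
FROM   Orders
WHERE  Order_Status = 'O';
```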

Hash Index
Used for the same purposes as single-table join indexes.
Create a full or partial replication of a base table with a PI on a FK column to
facilitate joins of large tables by hashing them to the same AMP.
Can be defined on one table only.

Value-Ordered NUSI
Very efficient for range conditions and conditions with an inequality on the secondary index
column set.
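A sketch, again on the hypothetical Orders table; the ORDER BY VALUES clause is what distinguishes a value-ordered NUSI from the default hash-ordered one:

```sql
-- The index sub-table rows are sequenced by Order_Date rather than by
-- hash, so a date-range predicate scans only the qualifying portion.
CREATE INDEX (Order_Date) ORDER BY VALUES (Order_Date) ON Orders;
```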

Primary Index vs. Secondary Index

Index Feature                  Primary Index    Secondary Index
Required Index                 Yes              No
Number per table               1                0 to 32
Maximum number of columns      64               64
Unique or Non-Unique           Both             Both
Affects row distribution       Yes              No
Created/Dropped dynamically    No               Yes
Improves row access            Yes              Yes
Separate Sub-Table             No               Yes
Extra Processing Overhead      No               Yes

Full-Table Scans

CUSTOMER table: Cust_ID (USI), Cust_Name (NUSI), Cust_Phone (NUPI)

Every data block of the table is read once.
All AMPs scan their portion of the table in parallel.
The Primary Index choice will affect parallel scan performance
(UPI is even; NUPI is potentially skewed).
Full-table scans typically occur when:
the index columns are not used in the query
a non-equality or range test is specified for the index columns
SQL requests that result in a full-table scan:
SELECT * FROM Customer WHERE Cust_Phone LIKE '524-_ _ _ _';
SELECT * FROM Customer WHERE Cust_Name <> 'Davis';
SELECT * FROM Customer WHERE Cust_ID > 1000;

Module 7 Review Questions


For each type of access, fill each box with either Yes, No, or the appropriate number.

                             USI Access   NUSI Access   Full-Table Scan
Number of AMPs accessed?     ______       ______        ______
Number of rows returned?     ______       ______        ______
A parallel operation?        ______       ______        ______
Uses separate sub-table?     ______       ______        ______
Reads all data blocks?       ______       ______        ______

Indicate whether each statement is True or False.


1. A USI can be used to enforce uniqueness on a PK column.
2. You can create or drop USIs and NUSIs at any time.
3. A full-table scan is not efficient because it accesses rows multiple times.
4. A full-table scan can occur when there is a range of values specified
for columns in a primary index.

Fault Tolerance and Data Protection

After completing this module, you should be able to:


Explain how locks protect data integrity.
List the types and levels of locking provided by the Teradata
Database.
Explain the concept of Fallback tables.
Describe the purpose and function of the Down AMP
Recovery Journal and the Transient Journal.
List the utilities available for archive and recovery.

Locks

Types of Locks:
Exclusive - prevents any other type of concurrent access.
Write - prevents other Read, Write, and Exclusive locks.
Read - prevents Write and Exclusive locks.
Access - prevents Exclusive locks only.

Levels of Locks - locks may be applied at the following levels:
Database - applies to all tables/views in the database.
Table/View - applies to all rows in the table/view.
Row Hash - applies to all rows with the same row hash.

Lock requests are based on the SQL request:
SELECT - requests a Read lock.
UPDATE - requests a Write lock.
CREATE TABLE - requests an Exclusive lock.

Lock requests may be upgraded or downgraded:
LOCKING TABLE Table1 FOR ACCESS . . .
LOCKING TABLE Table1 FOR EXCLUSIVE . . .
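Sketches of each direction, reusing the Table1 example (the statements that follow the modifier are illustrative):

```sql
-- Downgrade: read through concurrent Write locks (a "dirty read"),
-- useful for long reports that tolerate in-flight changes.
LOCKING TABLE Table1 FOR ACCESS
SELECT COUNT(*) FROM Table1;

-- Upgrade: block all other access for the duration of the request.
LOCKING TABLE Table1 FOR EXCLUSIVE
DELETE FROM Table1;
```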

Transient Journal

The Transient Journal:
Maintains on each AMP a copy of the before image of each row changed by a transaction (TXN).
Provides rollback of changed rows in the event of TXN failure.
Its activities are automatic and transparent to the user.
Before images are reapplied to the table if the TXN fails.
Before images are discarded upon TXN completion.

Successful TXN:
BEGIN TRANSACTION
UPDATE Row A (add $100 to checking)        -> before image of Row A recorded
UPDATE Row B (subtract $100 from savings)  -> before image of Row B recorded
END TRANSACTION                            -> before images discarded

Failed TXN:
BEGIN TRANSACTION
UPDATE Row A        -> before image of Row A recorded
UPDATE Row B        -> before image of Row B recorded
(Failure occurs)
(Rollback occurs)   -> before images reapplied
(Terminate TXN)     -> before images discarded
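In Teradata session mode, the transaction above can be sketched as follows (the Checking and Savings tables and the account number are assumptions):

```sql
BT;  -- BEGIN TRANSACTION: before images of changed rows are journaled
UPDATE Checking SET Balance = Balance + 100 WHERE Acct_Num = 1001;
UPDATE Savings  SET Balance = Balance - 100 WHERE Acct_Num = 1001;
ET;  -- END TRANSACTION: before images discarded
```

If either UPDATE fails, the system rolls both changes back from the Transient Journal, so the two accounts never reflect a half-applied transfer.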

RAID Protection

RAID 1 (Mirroring):
Each physical disk in the array has an exact copy (mirror) in the same array.
The array controller can read from either disk and write to both.
When one disk of the pair fails, there is no change in performance.
Mirroring reduces available disk space by 50%.
The array controller reconstructs failed disks quickly.

RAID 5 (Parity):
Data and parity are striped across a rank of 4 disks.
If a disk fails, any missing block may be reconstructed using the other three disks.
Parity reduces available disk space by 25% in a 4-disk rank.
Reconstruction of failed disks takes longer than RAID 1.

(Diagram: RAID 1 shows a primary disk and its mirror; RAID 5 shows data blocks 0 through 8 and parity blocks striped across four disks.)

Summary:
RAID-1 - Good performance with disk failures; higher cost in terms of disk space.
RAID-5 - Reduced performance with disk failures; lower cost in terms of disk space.

Fallback

A Fallback table is fully available in the event of an unavailable AMP.

A Fallback row is a copy of a primary row stored on a different AMP in the same CLUSTER of AMPs.

(Diagram: PEs connect through the BYNET to AMPs 1 through 4; each AMP holds primary rows plus the fallback copies of rows whose primary copies reside on other AMPs in the cluster.)

Benefits of Fallback:
May be specified at the table or database level.
Permits access to table data during an AMP off-line period.
Adds a level of data protection beyond disk array RAID 1 & 5.
The highest level of data protection is RAID 1 plus Fallback.
Automatically restores data changed during the AMP off-line period.
Critical for high availability applications.

Costs of Fallback:
Twice the disk space for table storage is needed.
Twice the I/O for INSERTs, UPDATEs and DELETEs is needed.
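At the table level, FALLBACK is an option on the CREATE TABLE statement. A sketch with assumed column names:

```sql
-- FALLBACK as a table-level option; columns are illustrative only.
CREATE TABLE Orders, FALLBACK
  ( O_Number INTEGER NOT NULL
  , O_Date   DATE
  )
PRIMARY INDEX (O_Number);

-- Protection can also be added to or removed from an existing table:
ALTER TABLE Orders, NO FALLBACK;
```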

Recovery Journal for Down AMPs

The Recovery Journal is:
Automatically activated when an AMP is taken off-line.
Maintained by the other AMPs in the cluster.
Totally transparent to users of the system.

While the AMP is off-line:
The journal is active.
Table updates continue as normal.
The journal logs the Row-IDs of changed rows for the down AMP.

When the AMP is back on-line:
The journal restores rows on the recovered AMP to current status.
The journal is discarded when recovery is complete.

(Diagram: four AMPs in a cluster; while one AMP is down, each remaining AMP keeps a Recovery Journal (RJ) of the Row-IDs of the down AMP's rows that change.)

Cliques

(Diagram: three cliques, each a group of nodes sharing access to the same disk arrays.)

A clique is a defined set of nodes with failover capability.
All nodes in a clique are able to access the vdisks of all AMPs in the clique.
If a node fails, its vprocs will migrate to the remaining nodes in the clique.
Each node can support 128 vprocs.
Disk cabling groups nodes into cliques.

Archiving and Recovering Data

Archive Recovery Utility (ARC):
Runs on IBM, UNIX, Linux and Win2K.
Archives data from the RDBMS.
Restores data from archive media.
Permits data recovery to a specified checkpoint.

Other Archive Applications:
BakBone NetVault
Symantec NetBackup

(Diagram: NCR 6476 tape library - 6000 slots, 2 to 80 drives.)

Common uses of ARC:
Dump database objects for backup or disaster recovery.
Restore non-fallback tables after disk failure.
Restore tables after corruption from failed batch processes.
Recover accidentally dropped tables, views, or macros.
Recover from miscellaneous user errors.
Copy a table and restore it to another Teradata Database.
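A minimal ARC job might be sketched as below; the logon string, database, table, and file names are all assumptions, and production jobs are usually driven through a backup application rather than hand-written scripts:

```sql
LOGON tdpid/sysdba,password;
ARCHIVE DATA TABLES (Payroll.Employee),
  RELEASE LOCK,
  FILE = ARCHFILE;
LOGOFF;
```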

Module 8 Review Questions


Match each item with its definition:
____ 1. Database locks

a. Provides for TXN rollback in case of failure

____ 2. Table locks

b. Protects all rows of a table

____ 3. Row Hash locks

c. Logs changed rows for down AMP

____ 4. Fallback

d. Protects from node failure

____ 5. Cluster

e. Logical group of AMPs for fault-tolerance

____ 6. Recovery journal

f. Applies to all tables and views within the database

____ 7. Transient journal

g. Multi-platform archive utility

____ 8. ARC

h. Lowest level of protection granularity

____ 9. Clique

i. Protects tables from AMP failure

Client Tools and Utilities

After completing this module, you should be able to:


List the various load and unload utilities available for use with
the Teradata Database.
List the various support tools available to Teradata Database
Administrators.
List the various query tools available for use with the Teradata
Database.

Query Tools - BTEQ

Basic Teradata Query (BTEQ) is a SQL front-end that runs on all Teradata client platforms.
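A minimal interactive BTEQ session might look like this sketch (the logon string and table name are assumptions):

```sql
.LOGON tdpid/user,password
SELECT COUNT(*) FROM Customer;
.LOGOFF
.QUIT
```

BTEQ commands begin with a period; anything else is passed to the database as SQL.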

Query Tools - Teradata SQL Assistant

SQL front-end to the Teradata Database and other ODBC-compliant databases.

FastLoad Utility

Fast batch utility for loading a single empty table


Automatic checkpoint/restart capability
Errors reported and collected in error tables
Supports INMOD routines and Access Modules
Loads data in two phases
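A FastLoad job is driven by a script. The following is only a sketch - the logon string, table, input file, and field definitions are assumptions, and real scripts depend on the input record format:

```sql
SESSIONS 4;
LOGON tdpid/user,password;
BEGIN LOADING Sales.Orders
  ERRORFILES Sales.Orders_e1, Sales.Orders_e2;  -- error tables
SET RECORD VARTEXT ",";                          -- comma-delimited input
DEFINE O_Number (VARCHAR(10)),
       O_Date   (VARCHAR(10))
  FILE = orders.txt;
INSERT INTO Sales.Orders VALUES (:O_Number, :O_Date);
END LOADING;
LOGOFF;
```

The two load phases happen between BEGIN LOADING and END LOADING: rows are first distributed to the AMPs, then sorted into the target table.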

MultiLoad Utility
Loads/maintains up to five empty or populated tables
Performs block level operations against target tables
Affected data blocks are written once
Multiple operations with one pass of input files
Uses conditional logic to apply updates
Supports INSERT, UPDATE, DELETE and UPSERT operations
Supports INMOD routines and Access Modules
Errors reported and collected in error tables
Provides automatic checkpoint/restart capability

FastExport Utility

Exports large volumes of formatted data from one or more tables on the
Teradata Database to a host file or user-written application
Supports multiple sessions
Exports from multiple tables
Provides automatic checkpoint/restart capability

TPump Utility

Allows near real-time updates from transactional systems into the warehouse
Allows constant loading of data into a table
Performs INSERT, UPDATE, DELETE, and ATOMIC UPSERT operations, or a
combination, to more than 60 tables at a time
High-volume SQL-based continuous update of multiple tables
Allows target tables to:
Have secondary indexes, referential integrity, constraints and enabled triggers
Be MULTISET or SET
Be populated or empty
Allows conditional processing
Supports automatic restarts
No session limit - use as many sessions as necessary
No limit to the number of concurrent instances
Uses row-hash locks, allowing concurrent updates on the same table
Can be stopped at any time with work committed with no ill effect
Designed for highest possible throughput
Gives users the control over the rate per minute (throttle) at which statements are
sent to the database either dynamically or by script

Teradata Parallel Transporter

Parallel Extract, Transform and Load (end-to-end parallelism) eliminates sequential bottlenecks
Data Streams eliminate the overhead of persistent storage
Single SQL-like scripting language
Access to various data sources
Open API enables Third Party and user application integration

Teradata Parallel Transporter Operators

TPT Operator    Teradata Utility   Description

LOAD            FastLoad           A consumer-type operator that uses the Teradata
                                   FastLoad protocol. Supports error limits and
                                   checkpoint/restart. Both support Multi-Value
                                   Compression and PPI.

UPDATE          MultiLoad          Uses the Teradata MultiLoad protocol to enable
                                   job-based table updates. This allows highly
                                   scalable and parallel inserts and updates to a
                                   pre-existing table.

EXPORT          FastExport         A producer operator that emulates the FastExport
                                   utility.

STREAM          TPump              Uses multiple sessions to perform DML transactions
                                   in near real-time.

DataConnector   N/A                Emulates the Data Connector API: reads external
                                   data files, writes data to external data files,
                                   and can read an unspecified number of data files.

ODBC            N/A                Reads data from an ODBC provider.

Teradata Manager

Graphical system management tool that collects, analyzes, and displays:
Performance information
Users, Accounts, and Profiles

Teradata Dynamic Workload Manager

Query workload management tool (formerly Teradata Dynamic Query Manager) that:
Restricts queries (i.e., runs, suspends, schedules later, or rejects them) based on set thresholds:
Based on analysis control: too long, too many rows
Based on object control: User ID, Group ID, Table, Day/time
Based on environmental factors: CPU, disk utilization, network activity, number of users
Logs workload performance for analysis

Analyst Tools - Teradata Visual Explain


Provides the ability to capture and graphically represent the steps of a query
plan and perform comparisons of two or more plans
Stores query plans in a Query Capture Database (QCD)

Analyst Tools - Teradata System Emulation Tool

Emulates a target system by exporting and importing all information necessary to emulate it in a test environment.
Use with Target Level Emulation to generate query plans on a test system as if they were run on the target system.
Verifies queries and reproduces optimizer-related issues in a test environment.

Analyst Tools - Teradata Index Wizard


Recommends secondary indexes for tables, based on a particular workload

Analyst Tools - Teradata Statistics Wizard


Recommends and automates the Statistics Collection process
Recommends Statistics to be re-collected due to table growth

Module 9 Review Questions


Match each item with its definition:

___  1. TPump                      a. Graphical system management tool.
___  2. MultiLoad                  b. Query workload management tool.
___  3. FastLoad                   c. Utility that performs block level operations
___  4. FastExport                    against populated tables.
___  5. Teradata Manager           d. Utility that allows constant loading of data
___  6. Teradata Dynamic              (streaming) into a table.
        Query Manager              e. Utility that allows export of data from
___  7. BTEQ                          multiple tables.
___  8. Teradata SET               f. Utility that performs fast batch loads into
___  9. Teradata Index Wizard         unpopulated tables.
___ 10. Teradata Statistics        g. SQL query front-end that runs on all client
        Wizard                        platforms.
___ 11. Teradata Visual Explain    h. Recommends Secondary Indexes for a workload.
                                   i. Utility that uses a Query Capture Database to
                                      store query plans.
                                   j. Utility that recommends and automates the
                                      Statistics Collection process.
                                   k. Utility that verifies queries and reproduces
                                      optimizer-related (query plans) issues in a
                                      test environment.

More Information

For more information on topics discussed in this course, see the following
resources:
Documentation: http://www.info.Teradata.com
Practice tests for certification: http://www.Teradata.com/certification
Available courses: Teradata Education Network
http://www.TeradataEducationNetwork.com

Appendix A

Review Questions/Solutions

Module 1 Review Questions

1. Name three operating systems that the Teradata Database runs on:
   MP-RAS UNIX, MS Windows 2000/2003 Server, SuSE Linux
2. Which of the following describes the scalability of the Teradata Database?
   a. Linear   b. Parallel   c. Exponential   d. Shared
   Answer: a. Linear
3. Which feature allows the Teradata Database to process large amounts of data quickly?
   a. High availability software and hardware components
   b. Parallelism
   c. Proven scalability
   d. High performance servers from Intel
   Answer: b. Parallelism
4. The Teradata Database is primarily a:
   a. Server   b. Client
   Answer: a. Server
5. Which two tasks do Teradata Database Administrators never have to do? (Choose two.)
   a. Reorganize data
   b. Select primary indexes
   c. Restart the system
   d. Pre-prepare data for loading
   Answer: a and d

Module 2 Review Questions


Match each term with its definition:

F  1. Database              a. A set of columns that uniquely identify a row.
E  2. Table                 b. A set of logically related tables.
B  3. Relational database   c. One or more columns that exist as a PK value in
A  4. Primary Key              another table in the database.
D  5. Null                  d. The absence of a value or an unknown value.
C  6. Foreign Key           e. A two-dimensional array of rows and columns.
G  7. Row                   f. A collection of permanently stored data.
                            g. One instance of all columns in a table.

Module 3 Review Questions


1. Name three types of enterprise data processing and give examples.
   OLTP - Withdraw cash from ATM
   DSS - How many items sold for a given month?
   OLAP - What are the top 10 selling items for a given month?
2. What is the difference between a data warehouse (DW) and a data mart (DM)?
   DW - Central, enterprise-wide, detail data, historically unlimited
   DM - Subset of enterprise data, detail data, summary data, limited history
3. Which type of data mart gets its data directly from the data warehouse?
   Dependent
4. Name the two types of queries that an Active Data Warehouse supports
   for mission critical applications. Strategic and Tactical
5. Match the data warehouse usage evolution stage to its description:
   E  Stage 1    a. Continuous updates and time sensitive queries
   D  Stage 2    b. Event-based triggering takes hold
   C  Stage 3    c. Analytical modeling
   A  Stage 4    d. Increase in ad hoc queries
   B  Stage 5    e. Primarily batch

Module 4 Review Questions

1. Name the three major Teradata Database components and state their purpose.
   PE - Parses, optimizes, and dispatches queries
   AMP - Data storage and retrieval
   BYNET - Communication between PEs and AMPs
2. Why are there two LANs in a Teradata system? For redundancy
3. How many sessions can a PE support? 120
4. What is the communications layer in a Teradata system? BYNET driver

Module 5 Review Questions


Indicate whether each statement is True or False.

F  1. A database will always have tables.
T  2. A user will always have a password.
T  3. A user creating a subordinate user who needs tables must give up some
      of its Perm Space.
T  4. The sum of all user and database Perm Space will equal the total
      space on the system.
F  5. Deleting a view from a database reclaims Perm Space for the database.

Module 6 Review Questions


1. Indicate whether the following apply to: UPI, NUPI, Either, or Neither

   EITHER   A. Specified in CREATE TABLE statement.
   UPI      B. Provides even row distribution via the hashing algorithm.
   EITHER   C. May be up to 64 columns.
   EITHER   D. Always a one-AMP operation.
   UPI      E. Access will return a single row.
   EITHER   F. Used to assign a row to a specific AMP.
   EITHER   G. Allows NULL.
   NEITHER  H. Value cannot be changed.
   EITHER   I. Required on every table.
   NUPI     J. Permits duplicate values.
   NUPI     K. Can never be the Primary Key.

2. Why is the choice of Primary Index important?
   Data distribution and access performance.
3. True/False: Tables should be assigned a PI at creation.
   True

Module 7 Review Questions


For each type of access, fill each box with either Yes, No, or the appropriate number.

                             USI Access   NUSI Access   Full-Table Scan
Number of AMPs accessed?     2            All           All
Number of rows returned?     0 or 1       0 to n        0 to n
A parallel operation?        No           Yes           Yes
Uses separate sub-table?     Yes          Yes           No
Reads all data blocks?       No           No            Yes

Indicate whether each statement is True or False.

True   1. A USI can be used to enforce uniqueness on a PK column.
True   2. You can create or drop USIs and NUSIs at any time.
False  3. A full-table scan is not efficient because it accesses rows multiple times.
True   4. A full-table scan can occur when there is a range of values specified
          for columns in a primary index.

Module 8 Review Questions


Match each item with its definition:

F  1. Database locks       a. Provides for TXN rollback in case of failure
B  2. Table locks          b. Protects all rows of a table
H  3. Row Hash locks       c. Logs changed rows for down AMP
I  4. Fallback             d. Protects from node failure
E  5. Cluster              e. Logical group of AMPs for fault-tolerance
C  6. Recovery journal     f. Applies to all tables and views within the database
A  7. Transient journal    g. Multi-platform archive utility
G  8. ARC                  h. Lowest level of protection granularity
D  9. Clique               i. Protects tables from AMP failure

Module 9 Review Questions


Match each item with its definition:

D   1. TPump                     a. Graphical system management tool.
C   2. MultiLoad                 b. Query workload management tool.
F   3. FastLoad                  c. Utility that performs block level operations
E   4. FastExport                   against populated tables.
A   5. Teradata Manager          d. Utility that allows constant loading of data
B   6. Teradata Dynamic             (streaming) into a table.
       Query Manager             e. Utility that allows export of data from
G   7. BTEQ                         multiple tables.
K   8. Teradata SET              f. Utility that performs fast batch loads into
H   9. Teradata Index Wizard        unpopulated tables.
J  10. Teradata Statistics       g. SQL query front-end that runs on all client
       Wizard                       platforms.
I  11. Teradata Visual Explain   h. Recommends Secondary Indexes for a workload.
                                 i. Utility that uses a Query Capture Database to
                                    store query plans.
                                 j. Utility that recommends and automates the
                                    Statistics Collection process.
                                 k. Utility that verifies queries and reproduces
                                    optimizer-related (query plans) issues in a
                                    test environment.
