
TERADATA - DAY 1

Teradata Introduction
Teradata Architecture
Types of spaces
Teradata Data Protection Mechanisms

Prepared By

AnilKumar P

Teradata Introduction:
What Is Teradata?
-Teradata is a relational database management system (RDBMS) that drives a
company's data warehouse.
-The name Teradata comes from the prefix "tera-", derived from Greek, which
means "trillion".
-Teradata was the first commercial database system to scale to and support a
trillion bytes of data.

Teradata Advantages :
Single data store
Scalability
Unconditional parallelism (parallel architecture)
Ability to model the business
Mature, parallel-aware Optimizer

Single Data Store

Teradata acts as a single data store, with multiple client applications making
inquiries against it concurrently.

Scalability
Adding new components to the system increases performance linearly.
Adding components allows the system to accommodate an increased workload
without decreased throughput.

Complexity
Teradata is adept at complex data models that satisfy the information
needs throughout an enterprise.
It has the ability to perform large aggregations during query run time
and can perform up to 64 joins in a single query.

Concurrent Users
Teradata can handle from hundreds to thousands of users, who are often
running multiple complex queries on the system simultaneously.

Unconditional Parallelism
Teradata's ability to manage large amounts of data is accomplished using
the concept of parallelism, wherein many individual processors perform
smaller tasks concurrently to accomplish an operation against a huge
repository of data.
Teradata's parallelism does not depend on query tuning, limited data
quantity, column range constraints, or specialized data models.
Teradata has "unconditional parallelism."
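The idea of many processors each doing a smaller task can be illustrated with a minimal Python sketch. It is only an analogy, not Teradata code: the function name `parallel_sum`, the worker count, and the slicing scheme are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4):
    """Split `values` into `workers` slices, sum each slice concurrently,
    then combine the partial sums (hypothetical helper, not a Teradata API)."""
    # Each worker gets every `workers`-th element, so slices are similar in size.
    slices = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, slices))
    return sum(partial_sums), partial_sums


total, partials = parallel_sum(list(range(1, 101)))
print(total, partials)
```

Each worker touches only its own slice, and the final answer is assembled from the partial results, which is the general shape of a parallel aggregation.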

Ability to Model the Business
It supports all types of data models.

Mature, Parallel-Aware Optimizer
Teradata's Optimizer is the most robust in the industry, able to handle:
Multiple complex queries
64 joins per query
Unlimited ad-hoc processing
The Optimizer is parallel-aware, meaning that it has knowledge of system
components and determines the least expensive plan (in terms of time)
to process queries fast and in parallel.

Teradata System :
A Teradata system contains one or more nodes where the processing occurs
for the Teradata Database.
There are two types of Teradata systems:
Symmetric multiprocessing (SMP) - An SMP Teradata system has a
single node that contains multiple CPUs sharing a memory pool.
Massively parallel processing (MPP) - Multiple SMP nodes working
together comprise a larger, MPP implementation of Teradata. The
nodes are connected using the BYNET, which allows multiple
virtual processors on multiple nodes to communicate with each other.

BYNET
The BYNET is a high-speed interconnect that enables nodes in the system
to communicate. It has several unique features:

Scalable: Adding nodes to the system increases the system size
without a performance penalty, and sometimes even increases
performance.
High performance: An MPP system typically has two BYNET
networks (BYNET 0 and BYNET 1). Because both networks in a system
are active, the system benefits from having full use of the aggregate
bandwidth of both the networks.
Fault tolerant: Each network has multiple connection paths. If the
BYNET detects an unusable path in either network, it will automatically
reconfigure that network so all messages avoid the unusable path.
Load balanced: Traffic is automatically and dynamically distributed
between both BYNETs.

BYNET Hardware and Software


The BYNET hardware and software handle the communication
between the vprocs and the nodes.
Hardware: The nodes of an MPP system are connected with
the BYNET hardware, consisting of BYNET boards and cables.
Software: The BYNET software is installed on every node.
This BYNET driver is an interface between the PDE software
and the BYNET hardware.

Parallel Database Extensions (PDE)


The Parallel Database Extensions (PDE) software layer was added to
the operating system to support the parallel software environment.

Teradata Architecture

Channel Driver
Channel Driver software is the means of communication between an
application and the PEs assigned to channel-attached clients. There
is one Channel Driver per node.

Teradata Gateway
Teradata Gateway software is the means of communication between an
application and the PEs assigned to network-attached clients. There is
one Teradata Gateway per node.

Basic components of Teradata Architecture:
The Parsing Engine
Message Passing Layer
Access Module Processor

Parsing Engine : A Parsing Engine (PE) is a vproc that manages the dialogue
between a client application and the Teradata Database, once a valid session
has been established. Each PE can support a maximum of 120 sessions.
Components : 1. PARSER 2. OPTIMIZER 3. DISPATCHER
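The 120-session limit per PE gives a simple sizing rule, sketched below in Python. The helper name `min_parsing_engines` is hypothetical, and the calculation only covers the session limit stated above, not any other sizing factor.

```python
import math

SESSIONS_PER_PE = 120  # per-PE session limit stated in the text


def min_parsing_engines(concurrent_sessions):
    """Smallest number of PEs whose combined session capacity covers the load
    (hypothetical helper for illustration)."""
    return math.ceil(concurrent_sessions / SESSIONS_PER_PE)


for sessions in (120, 121, 1000):
    print(sessions, "sessions ->", min_parsing_engines(sessions), "PE(s)")
```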

Message Passing Layer :
Carries messages between the AMPs and PEs.
Supports Point-to-Point, Multi-Cast, and Broadcast communications.
Merges answer sets back to the PE.
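Merging answer sets can be pictured as combining several already-sorted per-AMP result lists into one sorted stream. The Python sketch below uses the standard library's `heapq.merge` as a stand-in; the function name `merge_answer_sets` and the sample data are assumptions, and the real merge is performed by the message passing layer, not by code like this.

```python
import heapq


def merge_answer_sets(per_amp_rows):
    """Merge sorted per-AMP row lists into one sorted answer set
    (illustrative stand-in, not the BYNET implementation)."""
    return list(heapq.merge(*per_amp_rows))


# Hypothetical sorted results returned by three AMPs.
amp_results = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]
merged = merge_answer_sets(amp_results)
print(merged)
```

Because each input is already sorted, the merge never needs to re-sort the full answer set; it only compares the next row from each AMP.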

Access Module Processor (AMP) :
The AMP is a virtual processor that controls its portion of the data on the
system. The AMPs work in parallel, each AMP managing the data rows stored
on its vdisk. AMPs are involved in data distribution and data access in
several ways:
Finding the rows requested
Lock management
Sorting rows
Aggregating columns
Join processing
Output conversion and formatting
Creating the answer set for the client
Disk space management
Recovery processing
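How rows end up spread across AMPs can be sketched with a hash-then-modulo scheme in Python. This is a simplified model under stated assumptions: Teradata uses its own hashing algorithm and hash map, whereas this sketch substitutes MD5 and a plain modulo, and the names `amp_for` and `rows_per_amp` are hypothetical.

```python
import hashlib
from collections import Counter


def amp_for(primary_index_value, amp_count):
    """Map a value to an AMP number via a hash (MD5 is a stand-in, not
    Teradata's actual hashing algorithm)."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")
    return row_hash % amp_count


def rows_per_amp(values, amp_count):
    """Count how many rows each AMP would own for the given values."""
    counts = Counter(amp_for(v, amp_count) for v in values)
    return [counts.get(amp, 0) for amp in range(amp_count)]


print(rows_per_amp(range(1000), 4))   # distinct values spread across AMPs
print(rows_per_amp(["X"] * 1000, 4))  # one repeated value lands on one AMP
```

The second call shows why a column with many repeated values distributes poorly: identical values always hash to the same AMP.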

TERADATA Database or Users :
A database or user must be created from DBC or from an existing database.
Perm space is taken from the immediate owner.
Perm space is used only by tables, join indexes, and stored procedures.
Unused perm space can be used as temp or spool space.

Teradata Database Spaces :
PERM - Permanent space
Tables, indexes, subtables, join indexes, and stored procedures use PERM space.
PERM space is deducted from the owner database.

CREATE DATABASE WMT_EDW FROM DBC
AS PERM = 2000000, SPOOL = 5000000,
NO FALLBACK,
NO AFTER JOURNAL, DUAL BEFORE JOURNAL,
DEFAULT JOURNAL TABLE = WMT.journals;
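The deduction of PERM space from the owner can be modelled with a small Python sketch. The `Database` class and the 10,000,000-byte starting figure for DBC are assumptions made for illustration; only the 2,000,000 PERM value for WMT_EDW comes from the statement above.

```python
class Database:
    """Toy model of perm-space ownership (hypothetical, not a Teradata API)."""

    def __init__(self, name, perm, owner=None):
        self.name = name
        self.perm = perm      # perm space still held by this database, in bytes
        self.owner = owner

    def create_child(self, name, perm):
        """Create a child database, taking its perm space from this owner."""
        if perm > self.perm:
            raise ValueError(f"{self.name} has only {self.perm} bytes of perm space")
        self.perm -= perm
        return Database(name, perm, owner=self)


dbc = Database("DBC", perm=10_000_000)          # assumed starting size
wmt_edw = dbc.create_child("WMT_EDW", perm=2_000_000)
print(dbc.perm, wmt_edw.perm)
```

After the call, the owner holds 2,000,000 bytes less, which mirrors the rule that perm space comes from the immediate owner.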
SPOOL - Working space
Temporary working space that stores intermediate query results and answer sets.
SELECT statements use spool space.
A large number of non-unique values, poor distribution of data, or joins on
such columns can result in an insufficient spool error.
Volatile and derived tables use SPOOL space.
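Why poor distribution leads to spool errors can be sketched in Python. The sketch assumes the spool limit is divided evenly across AMPs, so a single overloaded AMP can exhaust its share even when total usage is within the limit; the function name `amps_over_spool`, the 4-AMP system, and the 100-byte row size are all hypothetical.

```python
def amps_over_spool(rows_per_amp, row_bytes, spool_limit_bytes):
    """Return the AMPs whose spool usage exceeds their even share of the limit
    (assumes the limit is split equally per AMP)."""
    per_amp_limit = spool_limit_bytes // len(rows_per_amp)
    return [amp for amp, rows in enumerate(rows_per_amp)
            if rows * row_bytes > per_amp_limit]


SPOOL = 5_000_000  # spool limit from the CREATE DATABASE example

even = [10_000, 10_000, 10_000, 10_000]   # 40,000 rows spread evenly
skewed = [37_000, 1_000, 1_000, 1_000]    # same 40,000 rows, mostly on AMP 0

print(amps_over_spool(even, 100, SPOOL))
print(amps_over_spool(skewed, 100, SPOOL))
```

Both inputs hold the same number of rows; only the skewed one pushes an AMP past its share.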
TEMP - Working space
TEMP space is acquired by a GTT (Global Temporary Table) when it is materialized.

Data Protection:
-LOCKS
-RAID
-FALLBACK
-JOURNALS
-CLIQUE
Locks : There are 4 types of locks, applied at three levels.
Levels of Locks:
-Database level
-Table level
-Row hash level
Types of Locks:
-Exclusive locks
-Write locks
-Read locks
-Access locks
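How the four lock types interact can be sketched as a compatibility table in Python. The table reflects the commonly documented behaviour (Access is blocked only by Exclusive, Read coexists with Read and Access, Write coexists only with Access, Exclusive coexists with nothing); the names `COMPATIBLE` and `can_grant` are hypothetical, and the sketch ignores lock levels and queuing.

```python
# For each requested lock type, the held lock types it can coexist with.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held):
    """True if `requested` is compatible with every lock already held."""
    return all(lock in COMPATIBLE[requested] for lock in held)


print(can_grant("READ", ["READ", "ACCESS"]))   # readers share the object
print(can_grant("WRITE", ["READ"]))            # writer waits for readers
print(can_grant("ACCESS", ["WRITE"]))          # access lock reads through a write
```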

RAID: Redundant Array of Inexpensive Disks (RAID) is a storage technology that
provides data protection at the disk drive level.
RAID 1 : Disk mirroring technique.
RAID 5 : Parity checking method.
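The parity method behind RAID 5 can be shown with XOR in a short Python sketch. It covers a single stripe with three data blocks and one parity block; real arrays rotate parity across disks and work on much larger blocks, and the helper name `xor_blocks` and the sample bytes are assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AMP1", b"AMP2", b"AMP3"]   # hypothetical data blocks on three disks
parity = xor_blocks(data)            # parity block stored on a fourth disk

# The disk holding data[1] fails: rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)
```

RAID 1 needs no such calculation, since the mirror disk simply holds a second copy of every block.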
Fallback: Fallback is a Teradata feature that protects
data against AMP failure. Fallback uses groups of AMPs that provide data
availability and consistency if an AMP is unavailable.
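A minimal Python sketch of the fallback idea follows: every row gets a second copy on a different AMP in the same group, so losing one AMP leaves each row readable. The placement rule here (hash modulo group size) is an assumption for illustration, not Teradata's actual assignment logic, and `place_row` and `readable_from` are hypothetical names.

```python
def place_row(row_hash, amp_group):
    """Pick a primary AMP and a different fallback AMP within the group."""
    primary = amp_group[row_hash % len(amp_group)]
    others = [amp for amp in amp_group if amp != primary]
    fallback = others[row_hash % len(others)]
    return primary, fallback


def readable_from(row_hash, amp_group, down_amps):
    """Return an AMP that can still serve the row, or None if both copies are down."""
    for amp in place_row(row_hash, amp_group):
        if amp not in down_amps:
            return amp
    return None


group = [0, 1, 2, 3]  # hypothetical group of four AMPs
print(place_row(7, group))
print(readable_from(7, group, down_amps={3}))
```

With a single AMP down, every row is still served from its other copy; only the loss of both AMPs holding a row makes it unavailable.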
Clique :
A set of SMP nodes that share a common set of disk arrays.
Provides protection from node failure.
If a node fails, all its vprocs migrate to the remaining nodes in the clique (vproc migration).
A clique can support up to 128 vprocs.
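Vproc migration can be sketched in Python as moving the failed node's vprocs onto the surviving nodes of the clique. The round-robin assignment, the node and vproc names, and the function `migrate_vprocs` are assumptions for illustration; the actual placement is decided by the Teradata software.

```python
def migrate_vprocs(clique, failed_node):
    """Return a new node -> vprocs mapping with the failed node's vprocs
    spread round-robin over the surviving nodes."""
    survivors = {node: list(vprocs) for node, vprocs in clique.items()
                 if node != failed_node}
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    targets = sorted(survivors)
    for i, vproc in enumerate(clique[failed_node]):
        survivors[targets[i % len(targets)]].append(vproc)
    return survivors


clique = {
    "node1": ["amp0", "amp1", "pe0"],
    "node2": ["amp2", "amp3", "pe1"],
    "node3": ["amp4", "amp5", "pe2"],
}
after_failure = migrate_vprocs(clique, "node2")
print(after_failure)
```

The migration is possible because all nodes in the clique share the same disk arrays, so a surviving node can reach the data the moved AMPs manage.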

Journal:
Teradata journals are used for specific types of data recovery or process recovery.

1. Recovery Journals : Automatically activated when an AMP is taken offline.

2. Transaction Journal : A journal of transaction "before images" that enables
automatic rollback in the event of transaction failure.

3. Permanent Journal : A user-specified, system-maintained journal, used to
recover from unexpected software and hardware disasters.
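The before-image mechanism of the transaction journal can be sketched in Python. The class `TransactionJournal`, the dictionary standing in for a table, and the sample keys are assumptions; the sketch shows only the general technique of saving the old value before each change and restoring it in reverse order on failure.

```python
_MISSING = object()  # marks a key that did not exist before the change


class TransactionJournal:
    """Toy before-image journal over a dict (hypothetical, not a Teradata API)."""

    def __init__(self, table):
        self.table = table
        self.before_images = []

    def update(self, key, value):
        # Record the before image first, then apply the change.
        self.before_images.append((key, self.table.get(key, _MISSING)))
        self.table[key] = value

    def rollback(self):
        # Undo changes in reverse order using the saved before images.
        while self.before_images:
            key, old = self.before_images.pop()
            if old is _MISSING:
                self.table.pop(key, None)
            else:
                self.table[key] = old

    def commit(self):
        # Before images are no longer needed once the transaction succeeds.
        self.before_images.clear()


accounts = {"A": 100, "B": 50}
txn = TransactionJournal(accounts)
txn.update("A", 70)
txn.update("C", 30)
txn.rollback()  # simulate a transaction failure
print(accounts)
```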

Thank you
