What is a Data Warehouse?

Initially, the data warehouse was a historical database, enterprise-wide and centralized,
containing data derived from an operational database.
The data in the data warehouse was:

• Subject-oriented
• Integrated
• Usually identified by a timestamp
• Nonvolatile, that is, data was not changed or removed once it had been loaded

Rows in the tables supporting the operational database were loaded into the data warehouse (the
historical database) after they exceeded some well-defined date.
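
The age-based load described above can be sketched in a few lines of Python. This is only an illustration, not Teradata code: the row layout, the `updated_on` field, and the `archive_old_rows` helper are all assumptions made for the example.

```python
from datetime import date

def archive_old_rows(operational, warehouse, cutoff):
    """Copy rows dated before `cutoff` into the warehouse list.

    Returns the rows that stay in the operational store.
    """
    remaining = []
    for row in operational:
        if row["updated_on"] < cutoff:
            # Rows past the cutoff become history: appended once, never edited.
            warehouse.append(dict(row))
        else:
            remaining.append(row)
    return remaining

# Hypothetical operational rows.
operational = [
    {"order_id": 1, "updated_on": date(2001, 3, 14)},
    {"order_id": 2, "updated_on": date(2004, 7, 1)},
]
warehouse = []
operational = archive_old_rows(operational, warehouse, cutoff=date(2003, 1, 1))
```

After the call, the older order sits in `warehouse` and only the newer one remains in `operational`.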

Data could be queried, but the responses returned only reflected historical information. In this
sense, a data warehouse was initially static, and even if a historical data warehouse contained
data that was being updated, it would still not be an active data warehouse.

What is an Active Data Warehouse?

An active data warehouse:

• Provides a single up-to-date view of the enterprise on one platform.
• Represents a logically consistent store of detailed data available for strategic, tactical, and
event-driven business decision making.
• Relies on timely updates to the critical data, as close to real time as needed.
• Supports short, tactical queries that return in seconds, alongside traditional decision
support.

Strategic queries represent business questions that are intended to draw strategic advantage
from large stores of data. Strategic queries are often complex queries.

Tactical queries are short, highly tuned queries that facilitate action-taking or decision-making
in a time-sensitive environment.
Tactical queries are usually executed repeatedly and take advantage of techniques such as
request (query plan) caching and session pooling.
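
To show why request caching helps repetitive tactical queries, here is a minimal sketch of a plan cache in Python. It is not how Teradata implements its request cache; the `PlanCache` class, the `build_plan` stand-in, and the whitespace-only normalization are assumptions chosen to keep the example short. Session pooling is not covered.

```python
from collections import OrderedDict

class PlanCache:
    """Tiny least-recently-used cache mapping request text to a prepared plan."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._plans = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_plan(self, sql, build_plan):
        key = " ".join(sql.split())  # collapse whitespace so reformatted text still matches
        if key in self._plans:
            self.hits += 1
            self._plans.move_to_end(key)  # mark as most recently used
            return self._plans[key]
        self.misses += 1
        plan = build_plan(sql)
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict the least recently used plan
        return plan

def build_plan(sql):
    # Hypothetical stand-in for the expensive parse and optimize work.
    return ("PLAN", sql.strip())

cache = PlanCache(capacity=2)
for _ in range(3):
    cache.get_plan("SELECT balance FROM accounts WHERE id = ?", build_plan)
```

Only the first of the three identical requests pays the planning cost; the other two are served from the cache.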

Teradata Database
The Teradata Database is an information repository supported by tools and utilities that make it,
as part of the Teradata Warehouse, a complete and active relational database management
system.

The system can be attached through either of two methods:

• Channel: directly to an I/O channel of a mainframe computer.
• Network: to intelligent workstations and other computers and devices through a Local Area
Network (LAN).

Teradata Database Capabilities
Teradata has designed a system that allows users to view and manage large amounts of data as a
collection of related tables. Some of the capabilities of the Teradata Database are listed in the
following table.

Teradata Database provides… That…

Capacity: includes:

• Scaling from gigabytes to terabytes of detailed data stored in billions of rows.
• Scaling to thousands of millions of instructions per second (MIPS) to process data.

Parallel processing: makes Teradata Database faster than other relational systems.

Single data store:

• Can be accessed by network-attached and channel-attached systems.
• Supports the requirements of many diverse clients.
• Reduces data duplication and inaccuracies.

Fault tolerance: automatically detects and recovers from hardware failures.

Data integrity: ensures that transactions either complete or roll back to a stable state if a
fault occurs.

Scalable growth: allows expansion without sacrificing performance.


SQL: serves as a standard access language that permits users to control data.
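
The "complete or roll back" behavior listed under data integrity can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in, not the Teradata Database; the `accounts` table, the `transfer` helper, and the `CHECK` constraint are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False

ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The second transfer leaves both balances exactly as the first transfer left them, which is the stable state the capability describes.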

The Teradata architecture includes both:

• Single-node, Symmetric Multi-Processing (SMP) systems, and
• Multi-node, Massively Parallel Processing (MPP) systems, in which the distributed functions
communicate by means of a fast interconnect structure.

The interconnect structure in the current architecture is the BYNET for MPP systems and the
boardless BYNET for SMP systems.

Teradata Database Server Software
Database Window: a tool that you can use to control the operation of the Teradata Database.

Teradata Gateway: communications support.

The server-resident program provides a pathway for applications running on network-attached
clients to access the Teradata Database. The Teradata Gateway runs as a separate operating
system task.

The Gateway software validates messages from clients that generate sessions over the network,
and it controls encryption.

Parallel Database Extensions (PDE): a software interface layer on top of the operating system
that enables the database to operate in a parallel environment.

Teradata Database management software: consists of the following modules:

• Parsing Engine (PE), which includes:
   • Session controller
   • Parser
   • Optimizer
   • Step Generator
   • Dispatcher
• Access module processor (AMP)
• Teradata file system

BYNET
BYNET, an acronym for "BanYan NETwork," is a folded banyan switching network built upon the
capacity of the YNET. It acts as a distributed multi-fabric interconnect to link PEs, AMPs, and
nodes on a Massively Parallel Processing (MPP) system.

The BYNET is a high-speed interconnect that is responsible for sending messages, merging data,
and sorting answers.

The BYNET is the combination of hardware and software that enables high-speed
communication inside and between the nodes. It provides:

• Linear Scalability
• Fault Tolerance
• Load Balancing
• Enhanced Performance: By default, a Teradata MPP system is equipped with two
BYNET networks. Since both BYNET networks in a system are active, the system
performance can be enhanced by using the combined bandwidth of the two networks.

Messages:

Point-to-Point - A virtual processor can send a message to another virtual processor:
• In the same node: using BYNET software only, the message is reassigned in memory to the
target virtual processor.
• In another node: the message is sent using both BYNET hardware and software.

Multicast - A virtual processor can send a message to multiple virtual processors by sending a
broadcast message to all nodes. The BYNET software on each receiving node determines whether
a virtual processor on that node should receive or discard the message.

Broadcast - A virtual processor can broadcast a message to all the virtual processors in the
system.
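
The three message types above differ only in which virtual processors end up with the message. The following Python sketch models that difference. It is a toy model, not the BYNET protocol: the `Node` class, the mailbox dictionaries, and the names `amp0`, `pe0`, and so on are assumptions made for illustration.

```python
class Node:
    """A node holding one mailbox per virtual processor (vproc)."""

    def __init__(self, name, vprocs):
        self.name = name
        self.inbox = {v: [] for v in vprocs}

    def receive(self, message, targets):
        # Node-side filtering: keep the message only for local vprocs that are
        # addressed; `targets=None` means every vproc on the node.
        for vproc in self.inbox:
            if targets is None or vproc in targets:
                self.inbox[vproc].append(message)

def point_to_point(nodes, target, message):
    # Only the node that hosts the target vproc is involved.
    for node in nodes:
        if target in node.inbox:
            node.receive(message, {target})
            return

def multicast(nodes, targets, message):
    # Sent to all nodes; each node decides which local vprocs accept it.
    for node in nodes:
        node.receive(message, set(targets))

def broadcast(nodes, message):
    for node in nodes:
        node.receive(message, None)

nodes = [Node("node1", ["amp0", "amp1"]), Node("node2", ["amp2", "pe0"])]
point_to_point(nodes, "amp2", "p2p")
multicast(nodes, ["amp0", "pe0"], "multi")
broadcast(nodes, "all")
```

In this model the multicast reaches both nodes but is discarded for `amp1` and `amp2`, mirroring the receive-or-discard decision described above.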

Each system has two BYNETs for two reasons: performance and fault tolerance.

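How two active networks give both performance and fault tolerance can be sketched as follows. This is a simplified model in Python and says nothing about how the BYNET actually schedules traffic: the `DualNetwork` class, the round-robin policy, and the labels `BYNET 0` and `BYNET 1` are assumptions made for the example.

```python
from itertools import cycle

class DualNetwork:
    """Alternate messages across two networks; skip a network that is down."""

    def __init__(self, names=("BYNET 0", "BYNET 1")):
        self.up = {name: True for name in names}
        self._order = cycle(names)

    def send(self, message):
        """Return the name of the network chosen to carry `message`."""
        for _ in range(len(self.up)):
            name = next(self._order)
            if self.up[name]:
                return name
        raise RuntimeError("no network available")

net = DualNetwork()
both_up = [net.send(m) for m in ("a", "b", "c", "d")]  # traffic is shared
net.up["BYNET 1"] = False                              # simulate a failure
one_down = [net.send(m) for m in ("e", "f", "g")]      # traffic continues on one network
```

While both networks are up the load is split between them; after the simulated failure every message still gets through on the surviving network.
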
Clique
1. A clique is a group of nodes that share access to the same disk arrays. The nodes have a
daisy-chain connection to each disk array controller.
2. Cliques provide data accessibility if a node fails for any reason.
3. Virtual processors are distributed across all nodes in the system. Each multi-node system has
at least one clique.

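The way a clique keeps data accessible after a node failure can be sketched as follows. The example assumes that the virtual processors of a failed node are restarted on the surviving nodes of the same clique, which the list above implies but does not spell out; the `fail_node` helper and all node and vproc names are hypothetical.

```python
def fail_node(cliques, placement, failed_node):
    """Reassign vprocs from `failed_node` to surviving nodes in the same clique.

    cliques:   dict of clique name -> list of node names sharing disk arrays
    placement: dict of vproc name -> node name (updated in place)
    Returns the sorted list of vprocs that were moved.
    """
    clique = next(nodes for nodes in cliques.values() if failed_node in nodes)
    survivors = [n for n in clique if n != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique can reach the disk arrays")
    moved = sorted(v for v, n in placement.items() if n == failed_node)
    for i, vproc in enumerate(moved):
        # Spread the displaced vprocs evenly over the surviving nodes.
        placement[vproc] = survivors[i % len(survivors)]
    return moved

cliques = {"clique0": ["node1", "node2"], "clique1": ["node3", "node4"]}
placement = {"amp0": "node1", "amp1": "node1", "amp2": "node2", "amp3": "node3"}
moved = fail_node(cliques, placement, "node1")
```

Because only nodes in the same clique share the disk arrays, the displaced vprocs move to `node2` and never to `node3` or `node4`.
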
Software Components

1. UNIX operating system - The Teradata RDBMS runs on UNIX SVR4 with MP-RAS.
2. Parallel Database Extensions (PDE) - PDE was added to the UNIX kernel by NCR to support
the parallel software environment.
3. Trusted Parallel Application (TPA) - A TPA uses PDE to implement virtual processors. The
Teradata RDBMS is classified as a TPA.
4. Channel Driver - The Channel Driver software is the means of communication between the
application and the PEs assigned to channel-attached clients.
5. Teradata Gateway - The Gateway software is the means of communication between the
application and the PEs assigned to network-attached clients. There is one Gateway per node.

AMP
The AMP is a type of virtual processor that has software to manage data.

1. AMP Worker Task (AWT) - Functions in the AMP perform a number of operations, including:
   1. Locking Tables
   2. Executing Tables
   3. Joining Tables
   4. Executing end transaction steps
2. File system - The file system software accesses the data on the virtual disks. Each AMP uses
the file system software to read from and write to the virtual disks.
3. Console Utilities - The AMP software includes utilities to perform generally sophisticated,
low-level functions such as:
   1. Configure and reconfigure the system
   2. Rebuild tables
   3. Reveal details about locks and space status

Parsing Engine
The PE is a type of virtual processor that has software components to break SQL into steps and
send the steps to the AMPs.

1. Session Control - When you log on to the Teradata RDBMS through your application, the
session control software on the PE establishes that session. Session control also manages and
terminates sessions on the PE.
2. Parser/Optimizer - The parser interprets your Teradata SQL request and checks the syntax.
The parser decomposes the request into AMP steps, using the optimizer to determine the most
efficient way to access the data on the virtual disks. Then the parser sends the steps to the
dispatcher.
3. Dispatcher - The dispatcher is responsible for a number of tasks, depending on the operation it
is performing:
   1. Processing Requests
   2. Processing Responses
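
The flow through the PE components (parse the request, break it into AMP steps, dispatch the steps, collect the responses) can be sketched as a small pipeline. This is a deliberately simplified Python model, not the Teradata parser: it accepts only `SELECT * FROM <table>`, skips optimization entirely, and the function names and fake AMP data are assumptions made for illustration.

```python
def parse(sql):
    """Check the syntax of a request; only `SELECT * FROM <table>` is accepted here."""
    tokens = sql.strip().rstrip(";").split()
    head = [t.upper() for t in tokens[:3]]
    if len(tokens) != 4 or head != ["SELECT", "*", "FROM"]:
        raise SyntaxError(f"unsupported request: {sql!r}")
    return {"op": "select_all", "table": tokens[3]}

def generate_steps(tree, amps):
    # Decompose the request into one retrieval step per AMP.
    return [{"amp": amp, "op": "scan", "table": tree["table"]} for amp in amps]

def dispatch(steps, run_step):
    # Processing requests: hand each step to its AMP.
    # Processing responses: gather the rows the AMPs send back.
    rows = []
    for step in steps:
        rows.extend(run_step(step))
    return rows

def run_step(step):
    # Hypothetical AMPs, each owning a slice of the table on its virtual disk.
    fake_disks = {"amp0": [("alice", 1)], "amp1": [("bob", 2)]}
    return fake_disks[step["amp"]]

tree = parse("SELECT * FROM accounts;")
steps = generate_steps(tree, ["amp0", "amp1"])
result = dispatch(steps, run_step)
```

Each AMP contributes only the rows it owns, and the dispatcher assembles the combined answer for the session.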