
Data Warehouse: Ralph Kimball

A DWH is a database which is specifically designed for analyzing the business, not
for business transaction processing.
A warehouse is designed to support the decision-making process.

Data Warehouse: W. H. Inmon: A data warehouse is a

1. Time-variant
2. Non-volatile
3. Subject-oriented
4. Integrated
database.

Characteristic Features of a DWH:

Time Variant: A DWH is a time-variant database, which supports users in analyzing
the business and comparing the business across different time periods. Users can identify
trends in the business.
[Chart: sales plotted for 2008 Q1 and 2009 Q1, with the gap between the two bars marked as the variance.]

Non Volatile: A DWH is a non-volatile database. Once data has entered the DWH, it does not
reflect the changes which take place in the operational database.

Integrated: A DWH is an integrated database which collects data from multiple operational
sources and integrates it into a homogeneous format.

Subject Oriented: A DWH is a subject-oriented database, which supports the business needs
of individual departments in the enterprise, or of middle-management users.
E.g. Sales, HR, Accounts, Loans.

Difference between OLTP Database and DWH:

OLTP DB                                   DWH
Designed to support operational           Designed to support the decision-making
monitoring                                process
Data is volatile                          Data is non-volatile
Current data                              Historical data
Detailed data                             Summary data
Normalized                                De-normalized
Designed for running the business         Designed for analyzing the business
Designed for clerical access              Designed for managerial access
More joins                                Fewer joins
Fewer indexes                             More indexes

Naming Standards Given to a DWH:

• Decision Support System (DSS): Since a DWH is designed to support the decision-making
process, it is known as a DSS.
• Historical DB: Since a DWH contains historical data for analysis, it is called a
historical DB.
• Read-Only DB: Since the database is designed only for reading data, for analysis and
decision making, but not for transaction processing, it is called a read-only DB.

Data Acquisition: It is the process of extracting the relevant business data, transforming
the data into the required business format, and loading it into the target database.
Data acquisition is defined by the following processes:
1. Data Extraction
2. Data Transformation
3. Data Loading
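The three steps above can be sketched as a minimal pipeline. This is an illustrative sketch only: the file format, table name, and column names (cno, cname) are assumptions, not part of any specific tool.

```python
# A minimal end-to-end sketch of the three data-acquisition steps.
# File, table, and column names are illustrative assumptions.
import csv
import sqlite3

def extract(path):
    """Data extraction: read raw rows from a flat-file source."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Data transformation: convert rows into the required business format."""
    return [(int(r["cno"]), r["cname"].strip().title()) for r in rows]

def load(rows, conn):
    """Data loading: insert the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS t_customer (cno INTEGER, cname TEXT)")
    conn.executemany("INSERT INTO t_customer VALUES (?, ?)", rows)
    conn.commit()
```

In a real ETL tool these steps are configured graphically or in SQL/PL-SQL rather than hand-coded, but the flow is the same.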
There are 2 types of ETL tools used in implementing data acquisition.
• Code-based ETL (character-based ETL): An application developer builds data
acquisition using a programming language such as SQL or PL/SQL.
E.g. SAS ETL, Teradata ETL utilities (BTEQ (Batch Teradata Query), FastLoad, MultiLoad,
TPump, FastExport).
• GUI-based ETL: An application developer designs data acquisition applications
with simple GUI point-and-click techniques.
E.g. Informatica, DataStage, Data Junction, Data Integration, Ab Initio (meaning
"from the beginning"), DecisionStream (Data Manager), Oracle Warehouse Builder.

1. Data Extraction: It is the process of reading data from various types of source
systems such as databases, SAP, PeopleSoft, JD Edwards, flat files, XML files, COBOL
files, etc.

2. Data Transformation: It is the process of converting and cleansing the data
into the required business format. The following types of transformation
activities take place in the staging area:
a. Data Merging
b. Data Cleansing
c. Data Scrubbing
d. Data Aggregation

3. Data Loading: It is the process of inserting the data into the target system. There
are 2 types of data loads:
a. Initial Load / Full Load: the process of loading all the required data at the
first load.
b. Incremental Load / Delta Load: the process of loading only the new records
after the initial load.
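The two load types can be sketched as follows. This is a hedged illustration: the "last loaded id" bookmark and the in-memory lists stand in for real source and target tables, and the names are assumptions rather than any tool's API.

```python
# Sketch of initial (full) load vs. incremental (delta) load.
# "last_loaded_id" is an illustrative bookmark tracking how far
# the previous load got; lists stand in for source/target tables.
def full_load(source):
    """Initial load: copy every source record into the target."""
    return list(source)

def incremental_load(source, target, last_loaded_id):
    """Delta load: append only records newer than the bookmark."""
    new_rows = [r for r in source if r["id"] > last_loaded_id]
    target.extend(new_rows)
    return max((r["id"] for r in target), default=last_loaded_id)
```

The incremental variant avoids re-reading the whole source after the first load, which is why it is used for all loads after the initial one.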

Data Transformation:

Data Merging: It is the process of integrating data from multiple sources into a single
output pipeline.
There are 2 types of data-merging activities: Join and Union.
[Diagram: Emp and Dept tables from an Oracle OLTP source joined together, and Category, Sub Category, and Product tables from a SQL Server source joined together. A Join is used on 2 different types of data sources, or on sources having 2 different types of data; a Union is used on sources having the same data, e.g. two Employee tables combined into one pipeline.]
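The two merging styles can be sketched with plain Python structures. Plain dicts stand in for the Oracle and SQL Server sources; the column names (deptno, dname, ename) are illustrative assumptions.

```python
# Sketch of the two data-merging activities. Dicts stand in for the
# source tables; column names are illustrative, not a tool's API.
def merge_join(emp_rows, dept_rows):
    """Join: combine sources holding different types of data on a key."""
    dept_names = {d["deptno"]: d["dname"] for d in dept_rows}
    return [dict(e, dname=dept_names.get(e["deptno"])) for e in emp_rows]

def merge_union(rows_a, rows_b):
    """Union: stack sources holding the same data into one pipeline."""
    return rows_a + rows_b
```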

Data Cleansing: It is the process of removing unwanted data and of correcting
inconsistencies and inaccuracies.

[Example: Initcap(country) fixes inconsistent casing such as "india" → "India" and "austria" → "Austria"; Decode(...) expands city codes such as "Hyd" and "Hbad" to "Hyderabad"; Round() rounds sales amounts such as $4.780 → $4.8.]
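The same fixes can be sketched in code. The abbreviation map and field names are illustrative assumptions; in the original notes these fixes are done with SQL-style Initcap(), Decode(), and Round() functions.

```python
# Sketch of the cleansing fixes: Initcap-style casing, a Decode-style
# lookup for abbreviated city names, and rounding of noisy amounts.
# The abbreviation map is an illustrative assumption.
CITY_CODES = {"Hyd": "Hyderabad", "Hbad": "Hyderabad"}

def cleanse(row):
    return {
        "country": row["country"].strip().title(),         # Initcap(country)
        "city": CITY_CODES.get(row["city"], row["city"]),  # Decode(...)
        "sales": round(row["sales"], 1),                   # Round()
    }
```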

Data Scrubbing: It is the process of deriving new data definitions using the source data
definitions.
[Example: in the OLTP source, Sale holds sid, customer, pdt, qty, and price, with the customer name split into Cust_FirstName and Cust_LastName; in the DW, a derived Revenue column is computed as qty * price, and a single Cust_name is derived from the two name columns.]
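The derivation can be sketched as a single transformation. Column names follow the example above; the function itself is an illustrative sketch, not any tool's API.

```python
# Sketch of scrubbing: deriving revenue = qty * price and a single
# cust_name from the split first/last name columns.
def scrub(row):
    return {
        "sid": row["sid"],
        "cust_name": row["cust_firstname"] + " " + row["cust_lastname"],
        "pdt": row["pdt"],
        "qty": row["qty"],
        "revenue": row["qty"] * row["price"],
    }
```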

Data Aggregation: It is the process of calculating summaries using aggregate
functions, e.g. sum, avg, etc.
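A minimal sketch of aggregation, grouping detail rows per product and computing sum and avg with only the standard library; the keys (pdt, revenue) are illustrative assumptions.

```python
# Sketch of aggregation: summarizing detail rows per product
# with sum and avg, using only the standard library.
from collections import defaultdict

def aggregate(rows):
    groups = defaultdict(list)
    for r in rows:
        groups[r["pdt"]].append(r["revenue"])
    return {p: {"sum": sum(v), "avg": sum(v) / len(v)} for p, v in groups.items()}
```

This is the step that turns the OLTP system's detailed data into the DWH's summary data, as contrasted in the table earlier.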

Metadata: It is defined as data about data. Metadata describes field names, data types,
precision, scale, and keys.
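A small illustrative metadata record, capturing for each field the name, data type, precision, scale, and key as described above. The Customer fields shown are assumptions for illustration.

```python
# An illustrative metadata record: for each field it captures the
# name, data type, precision, scale, and key information.
customer_metadata = [
    {"field": "cno", "type": "NUMBER", "precision": 5, "scale": 0, "key": "PK"},
    {"field": "cfname", "type": "VARCHAR2", "precision": 5, "scale": None, "key": None},
    {"field": "gender", "type": "NUMBER", "precision": 1, "scale": 0, "key": None},
]
```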

To design a plan for data acquisition, the following inputs are required:
1. Source definition
2. Target definition
3. Transformation rules (business logic) (optional)

Repository: A repository is a central metadata storage location, which contains all the
metadata required to build the extraction, transformation, and loading to the target system.

[Diagram: the ETL tool's client GUI connects through ODBC to the OLTP source and to the DW target. It holds a source definition (Customer: cno number(5) pk, cfname varchar2(5), clname varchar2(6), gender number(1)), a target definition (T_Customer: cid number(5) pk, cname varchar2(12), gender varchar2(1)), and transformation rules (e.g. Connect(), Decode()). Together these form the ETL plan.]
Client: A client is a GUI component where we can draw the plan of the ETL process. Each
tool has its own name for the ETL plan:

ETL Tool              ETL Plan
Informatica           Mapping
DataStage             Job
Ab Initio             Graph
Data Integration      Batch Job

Server: The server is the main component, which executes the ETL plan to move the data
from source to target.
