DATA WAREHOUSING
BY QUONTRA SOLUTIONS
PHONE: (404)-900-9988
EMAIL: INFO@QUONTRASOLUTIONS.COM
WEBSITE: WWW.QUONTRASOLUTIONS.COM
DATA WAREHOUSE
A data warehouse is a subject-oriented, integrated, time-varying, non-volatile
collection of data that is used primarily in organizational decision making.
-- Bill Inmon, Building the Data Warehouse, 1996
SUBJECT ORIENTED
INTEGRATED
Figure: RDBMS, Legacy System, and Flat File sources pass through Data Processing and Data Transformation into the Data Warehouse.
NON-VOLATILE
Figure: load and access operations on the warehouse.
TIME VARIANT
Problem Statement:
ABC Pvt Ltd is a company with branches in the USA, UK, Canada, and India.
The Sales Manager wants a quarterly sales report across the branches.
Each branch has a separate operational system where sales transactions
are recorded.
Figure: separate branch systems (UK, CANADA, INDIA) and the Sales Manager.
Solution:
Extract sales information from each database.
Store the information in a common repository at a single
site.
Figure: branch databases (UK, CANADA, INDIA) feed the Data Warehouse, which the Sales Manager reaches through Query & Analysis tools.
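The solution above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `branch_sales` data, the function names, and the amounts are all hypothetical, and a plain list stands in for the common repository.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-branch extracts; in practice each list would come from
# that branch's own operational database.
branch_sales = {
    "USA": [(date(2024, 1, 15), 1200.0), (date(2024, 4, 2), 800.0)],
    "UK": [(date(2024, 2, 10), 500.0)],
    "CANADA": [(date(2024, 5, 20), 300.0)],
    "INDIA": [(date(2024, 3, 5), 700.0), (date(2024, 6, 30), 450.0)],
}


def extract_all(sources):
    """Copy every branch's rows into one common repository (a list here)."""
    repository = []
    for branch, rows in sources.items():
        for sale_date, amount in rows:
            repository.append({"branch": branch, "date": sale_date, "amount": amount})
    return repository


def quarterly_report(repository):
    """Total sales per (year, quarter, branch)."""
    totals = defaultdict(float)
    for row in repository:
        quarter = (row["date"].month - 1) // 3 + 1
        totals[(row["date"].year, quarter, row["branch"])] += row["amount"]
    return dict(totals)


warehouse = extract_all(branch_sales)
report = quarterly_report(warehouse)
for key in sorted(report):
    print(key, report[key])
```

Because all rows sit in one place, the quarterly report is a single pass over the repository rather than one query per branch system.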
CHARACTERISTICS OF DATA WAREHOUSE
Relational / Multidimensional database
growing
Geographic Information
Systems
Feature    Operational system    Data warehouse
Indexes    Few                   Many
Data       Normalized            Generally de-normalized
Joins      Many                  Some
Rare
Common
Figure: data warehouse architecture. Operational systems and flat files feed ETL (Extract, Transform and Load), which populates the Data Warehouse; the Sales, Inventory, and Generic data marts serve Analysis, Reporting, and Data Mining.
ETL
Country: IN or India
ETL FUNCTIONS
Extract
Parse data
Project
Encode data
Aggregate
Load
ETL STEPS
ETL GLOSSARY
Mapping:
Defining the relationship between source and target objects.
Cleansing:
The process of resolving inconsistencies in source data.
Transformation:
The process of manipulating data. Any manipulation
beyond copying is a transformation. Examples include
aggregating data and integrating data from multiple sources.
Staging Area:
A place where data is processed before entering the
warehouse.
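The glossary terms can be tied together in a minimal Python sketch. Everything named here is an assumption made for illustration: the column names in `COLUMN_MAPPING`, the alias table, the sample rows, and the use of a list as the staging area and a dict as the warehouse. The cleansing rule reuses the "IN or India" inconsistency mentioned earlier.

```python
# Mapping: source column name -> target column name (hypothetical names).
COLUMN_MAPPING = {"cust_country": "country", "amt": "amount"}

# Cleansing rule: resolve inconsistent spellings of the same country.
COUNTRY_ALIASES = {"IN": "India", "INDIA": "India", "UK": "United Kingdom"}


def extract(source_rows):
    """Extract: copy raw rows into a staging area without changing them."""
    return [dict(row) for row in source_rows]


def apply_mapping(row):
    """Mapping: rename source columns to their target names."""
    return {COLUMN_MAPPING[name]: value for name, value in row.items()}


def cleanse(row):
    """Cleansing: normalize country values and parse amounts to numbers."""
    country = row["country"].strip()
    row["country"] = COUNTRY_ALIASES.get(country.upper(), country)
    row["amount"] = float(row["amount"])
    return row


def aggregate(rows):
    """Transformation: aggregate amounts per country."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


def load(warehouse, totals):
    """Load: write the transformed result into the target store."""
    warehouse.update(totals)


source = [
    {"cust_country": "IN", "amt": "100.50"},
    {"cust_country": "India", "amt": "49.50"},
    {"cust_country": " uk ", "amt": "20"},
]
staging = [cleanse(apply_mapping(row)) for row in extract(source)]
warehouse = {}
load(warehouse, aggregate(staging))
print(warehouse)
```

Keeping each step as its own function mirrors the glossary: the staging list holds data that has been extracted and cleansed but not yet loaded.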
DIMENSION
TYPES OF DIMENSIONS
Type 1 - The Type 1 methodology overwrites old data with new data, and
therefore does not track historical data at all.
Type 2 - The Type 2 method tracks historical data by creating multiple
records for a given value in the dimension table with separate surrogate keys.
Type 3 - The Type 3 method tracks changes using separate columns.
Whereas Type 2 had unlimited history preservation, Type 3 has limited
history preservation, as it's limited to the number of columns we designate
for storing historical data.
Type 4 - The Type 4 method is usually referred to as using "history tables",
where one table keeps the current data, and an additional table is used to
keep a record of all changes.
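The difference between Type 1 and Type 2 is easiest to see side by side. The sketch below is illustrative only: the customer dimension, its column names, and the `valid_from` / `valid_to` / `is_current` bookkeeping columns are assumptions, not something the descriptions above prescribe.

```python
from datetime import date


def apply_type1(dimension, customer_id, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def apply_type2(dimension, customer_id, new_city, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["valid_to"] = change_date
    next_key = max((row["surrogate_key"] for row in dimension), default=0) + 1
    dimension.append({
        "surrogate_key": next_key,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })


def new_dimension():
    """A hypothetical customer dimension with a single current row."""
    return [{
        "surrogate_key": 1, "customer_id": "C100", "city": "Atlanta",
        "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True,
    }]


type1_dim = new_dimension()
apply_type1(type1_dim, "C100", "Boston")

type2_dim = new_dimension()
apply_type2(type2_dim, "C100", "Boston", date(2024, 6, 1))

print(len(type1_dim), type1_dim[0]["city"])
print(len(type2_dim), [row["city"] for row in type2_dim])
```

After the same change, the Type 1 table still has one row (the old city is gone), while the Type 2 table has two rows for the same customer, each with its own surrogate key.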
FACTS
DIMENSION TABLE
FACT TABLE
Contains Facts
Foreign keys to dimension tables
Primary Key: usually a composite key of all FKs
Star Schema
Snowflake Schema
Fact Constellation Schema
STAR SCHEMA
Multi-dimensional Data
Dimension and Fact Tables
A fact table with pointers to Dimension tables
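A star schema can be tried out with Python's built-in `sqlite3` module. The table names, columns, and sample rows below are hypothetical; the sketch only shows the shape described above: one fact table whose foreign keys point to the dimension tables and together form its primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_time    (time_key    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

-- Fact table: one measure plus a foreign key to each dimension;
-- the primary key is the composite of all foreign keys.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    region_key  INTEGER REFERENCES dim_region(region_key),
    time_key    INTEGER REFERENCES dim_time(time_key),
    amount      REAL,
    PRIMARY KEY (product_key, region_key, time_key)
);

INSERT INTO dim_product VALUES (1, 'Mixer', 'Kitchen Appliances'), (2, 'Fan', 'Home Appliances');
INSERT INTO dim_region  VALUES (1, 'USA'), (2, 'INDIA');
INSERT INTO dim_time    VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales  VALUES (1, 1, 1, 100.0), (1, 2, 1, 50.0), (2, 1, 2, 75.0), (2, 2, 2, 25.0);
""")

# A typical star-schema query: join the fact table to the dimensions
# it needs and aggregate the measure.
rows = conn.execute("""
    SELECT r.country, t.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_region r ON r.region_key = f.region_key
    JOIN dim_time   t ON t.time_key   = f.time_key
    GROUP BY r.country, t.quarter
    ORDER BY r.country, t.quarter
""").fetchall()
print(rows)
```

Each dimension is one join away from the fact table, which is what keeps star-schema queries simple.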
SNOWFLAKE SCHEMA
In this example, the dimension tables for time, item, and location are
shared between both the sales and shipping fact tables.
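Sharing dimension tables between two fact tables (the fact constellation arrangement) might look like the following `sqlite3` sketch. All table names, column names, and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension tables.
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT);

-- Two fact tables that both reference the same dimensions.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    units_sold INTEGER
);
CREATE TABLE fact_shipping (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    units_shipped INTEGER
);

INSERT INTO dim_time VALUES (1, 2024);
INSERT INTO dim_item VALUES (1, 'Mixer'), (2, 'Fan');
INSERT INTO dim_location VALUES (1, 'Atlanta');
INSERT INTO fact_sales VALUES (1, 1, 1, 10), (1, 2, 1, 4);
INSERT INTO fact_shipping VALUES (1, 1, 1, 8), (1, 2, 1, 4);
""")

# Compare the two facts per item through the shared item dimension.
# Each fact table is aggregated separately first to avoid double counting.
rows = conn.execute("""
    SELECT i.name, s.sold, h.shipped
    FROM dim_item i
    JOIN (SELECT item_key, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY item_key) s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS shipped
          FROM fact_shipping GROUP BY item_key) h ON h.item_key = i.item_key
    ORDER BY i.name
""").fetchall()
print(rows)
```

Because both fact tables use the same `item_key`, sales and shipping figures line up per item without any extra lookup table.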
OPERATIONS ON DATA WAREHOUSE
Drill Down
Roll up
Slice & Dice
Pivoting
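As a quick illustration of Slice & Dice, the sketch below stores a tiny cube as a Python dict keyed by (product, region, quarter). The data and the helper names `slice_cube` and `dice_cube` are hypothetical; the definitions used are the common ones (a slice fixes one dimension to a single value, a dice selects values on several dimensions).

```python
# A tiny cube stored as {(product, region, quarter): sales}; values are made up.
cube = {
    ("Mixer", "USA", "Q1"): 100, ("Mixer", "USA", "Q2"): 120,
    ("Mixer", "INDIA", "Q1"): 50, ("Mixer", "INDIA", "Q2"): 60,
    ("Fan", "USA", "Q1"): 70, ("Fan", "USA", "Q2"): 80,
    ("Fan", "INDIA", "Q1"): 30, ("Fan", "INDIA", "Q2"): 40,
}


def slice_cube(cube, quarter):
    """Slice: fix the time dimension to one value, leaving a 2-D table."""
    return {(p, r): v for (p, r, q), v in cube.items() if q == quarter}


def dice_cube(cube, products, regions, quarters):
    """Dice: keep a sub-cube by selecting values on several dimensions."""
    return {
        (p, r, q): v for (p, r, q), v in cube.items()
        if p in products and r in regions and q in quarters
    }


q1_slice = slice_cube(cube, "Q1")
sub_cube = dice_cube(cube, {"Mixer"}, {"USA", "INDIA"}, {"Q2"})
print(q1_slice)
print(sub_cube)
```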
DRILL DOWN
Product
Category, e.g. Home Appliances
Sub Category, e.g. Kitchen Appliances
Region
Time
ROLL UP
Year
Fiscal Year
Quarter
Fiscal Quarter
Month
Fiscal Month
Fiscal Week
Day
Figure: cube with axes Region, Product, and Time.
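Rolling up and drilling down along the time hierarchy can be sketched as re-aggregating the same day-level facts at a coarser or finer level. The sample data and names are hypothetical, and only the calendar levels (year, quarter, month, day) are shown; fiscal levels would need a fiscal calendar, which is not assumed here.

```python
from collections import defaultdict
from datetime import date

# Day-level facts (the lowest level of the hierarchy); values are made up.
daily_sales = [
    (date(2024, 1, 10), 100), (date(2024, 2, 20), 50),
    (date(2024, 4, 5), 70), (date(2024, 12, 31), 30),
    (date(2025, 1, 1), 20),
]

# Each level maps a day to its grouping key at that level.
LEVELS = {
    "year": lambda d: (d.year,),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}


def aggregate_at(level, facts):
    """Sum the measure at the requested level of the time hierarchy."""
    totals = defaultdict(int)
    for day, amount in facts:
        totals[LEVELS[level](day)] += amount
    return dict(totals)


by_month = aggregate_at("month", daily_sales)
by_quarter = aggregate_at("quarter", daily_sales)  # roll up: month -> quarter
by_year = aggregate_at("year", daily_sales)        # roll up: quarter -> year
print(by_year)
print(by_quarter)
```

Moving from `by_year` to `by_quarter` to `by_month` is the drill-down direction; the reverse order is the roll-up.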
PIVOTING
Figure: cube with axes Product, Time, and Region, shown before and after pivoting.
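Pivoting rotates the view so that a different dimension runs along the rows. A minimal sketch, assuming a small list of (product, region, sales) tuples and a hypothetical `pivot` helper:

```python
# Rows of (product, region, sales); values are made up.
facts = [
    ("Mixer", "USA", 100), ("Mixer", "INDIA", 50),
    ("Fan", "USA", 70), ("Fan", "INDIA", 30),
]


def pivot(rows, row_axis, col_axis):
    """Build a nested dict table[row_value][col_value] = total sales.

    row_axis and col_axis are positions in each tuple (0 = product, 1 = region).
    """
    table = {}
    for row in rows:
        cells = table.setdefault(row[row_axis], {})
        cells[row[col_axis]] = cells.get(row[col_axis], 0) + row[2]
    return table


by_product = pivot(facts, row_axis=0, col_axis=1)  # products as rows
by_region = pivot(facts, row_axis=1, col_axis=0)   # rotated: regions as rows
print(by_product)
print(by_region)
```

The underlying facts are unchanged; only the orientation of the table differs between the two views.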
ADVANTAGES OF DATA WAREHOUSE
One consistent data store for reporting, forecasting, and analysis
Easier and timely access to data
Scalability
Trend analysis and detection
Drill down analysis
DISADVANTAGES OF DATA WAREHOUSE
Preparation may be time consuming.
CHALLENGES
Figure labels: Product, Sales Fact, Region, Product Category.
THANK YOU