Pragim Technologies
Roadmap
OLTP Vs OLAP
Transaction Systems                       Data Warehouse
Support business transactional process    Support decision-making process
Designed to run the business              Designed to analyze the business
Detailed data                             Summarized data
No redundancy (normalized)                Allows redundancy (denormalized)
Data is normally updated                  Data is normally loaded
Current data                              Historical data
Few indexes                               More indexes
Supports E-R model                        Supports dimensional model
[Figure: data flow from Operational Data and External Data through ETL (Extract & Transform, then Load & Summarise as a batch load) into the Warehouse, followed by Query and Information Delivery. Detailed information is returned to operational systems.]
[Figure: data warehouse architecture. Data sources (flat files, RDBMS) are extracted by ETL (extraction, transformation, loading, scheduling, refreshing) into the data warehouse (summarized, de-normalized, historical, nonvolatile, subject oriented) and its data marts, which managers and power users report on, query, and analyze.]
Dimension :
Star Schema
Dimension Tables

Product_Dimension_Table
prod_grp_key  prod_key  prod_grp_desc   prod_desc
10            100       Fewer devices   Power supply
20            140       Circuit boards  Motherboard
30            220       Components      Co-processor

Region descriptions
Northeast
Northwest
Southeast
Southwest

Account_Dimension_Table (account descriptions)
ABC Electronics
Midway Electric
Victor Components
Washburn, Inc.
Zerox

Time_Dimension_Table
Time_key  month    month_name
1         01-1996  January
2         02-1996  February
3         03-1996  March

Vendor_Dimension_Table
vend_key  vendor_desc
100       PowerAge, Inc.
200       Advanced Micro Devices
300       Farad Incorporated

Fact Table
Time_key  prod_key  region_key  Account_key  vend_key  net_sales  gross_sales
1         100       10          100000       100       30,000     50,000
2         140       11          110000       200       23,000     42,000
3         220       12          100000       300       32,000     49,000
TIME (dimension)
time_key, day, day_of_the_week, month, quarter, year

PRODUCT (dimension)
product_key, product_name, category, brand, color, supplier_name

LOCATION (dimension)
location_key, store, street_address, city, state, country, region

SALES (fact)
time_key, product_key, location_key
measures: units_sold, amount
[Figure: the SALES star schema with its TIME, PRODUCT and LOCATION dimensions, annotated with a query that projects on category (PRODUCT) and selects region = Europe (LOCATION).]
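A query of this kind (restrict LOCATION to region = Europe, report by PRODUCT category) can be sketched with Python's built-in sqlite3 module. The table and column names follow the slide; the sample rows are hypothetical, and the TIME dimension is left out for brevity.

```python
import sqlite3

# In-memory database with a reduced star schema (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, store TEXT, region TEXT);
CREATE TABLE sales    (time_key INTEGER, product_key INTEGER, location_key INTEGER,
                       units_sold INTEGER, amount REAL);
INSERT INTO product  VALUES (1, 'Laptop', 'Computers'), (2, 'Phone', 'Mobile');
INSERT INTO location VALUES (1, 'Paris store', 'Europe'), (2, 'Boston store', 'America');
INSERT INTO sales    VALUES (1, 1, 1, 5, 5000.0), (1, 2, 1, 3, 1500.0),
                            (2, 1, 2, 7, 7000.0), (2, 1, 1, 2, 2000.0);
""")

# The fact table is joined to each dimension it needs, filtered on a
# dimension attribute, and aggregated by another dimension attribute.
rows = conn.execute("""
    SELECT p.category, SUM(s.units_sold), SUM(s.amount)
    FROM sales s
    JOIN product  p ON p.product_key  = s.product_key
    JOIN location l ON l.location_key = s.location_key
    WHERE l.region = 'Europe'
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

Each dimension is one join away from the fact table, which is what keeps star-schema queries simple.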
Advantages:
Storage space can be minimized
Disadvantages:
Can hamper query performance because more joins are needed
Galaxy Schema
- Fact Constellation
A schema in which two or more fact tables are joined through shared dimension tables
Types of Dimensions
CONFORMED DIMENSIONS
A dimension that can be shared by multiple fact tables (e.g. customer dimension)
DEGENERATE DIMENSIONS
A dimension key stored in the fact table that is not connected to any dimension table (e.g. transaction ID)
JUNK DIMENSIONS
A dimension holding text descriptions, flags, and Booleans that are not used in
describing the key performance indicators (e.g. gender description,
product description)
Types of Facts
ADDITIVE FACTS
Facts that can be summed across all of the dimensions in the fact table
(e.g. dollars sold)
SEMIADDITIVE FACTS
Facts that can be summed across some of the dimensions in the fact table but not others
(e.g. inventory levels cannot be added across time)
NONADDITIVE FACTS
Facts that cannot be summed across any of the dimensions in the
fact table (e.g. a true textual fact, which probably should not be in the
data warehouse to begin with)
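The inventory example can be made concrete with a small sketch. The snapshot rows below are hypothetical. Summing units across products within one month is meaningful, while across months the level is averaged rather than summed.

```python
# Hypothetical monthly inventory snapshots: (month, product, units_on_hand).
snapshots = [
    ("2024-01", "A", 10), ("2024-01", "B", 20),
    ("2024-02", "A", 12), ("2024-02", "B", 18),
]

def total_by_month(rows):
    """Sum units across the product dimension, one total per month."""
    totals = {}
    for month, _product, units in rows:
        totals[month] = totals.get(month, 0) + units
    return totals

by_month = total_by_month(snapshots)  # additive across products
# Across time the snapshots overlap, so the level is averaged instead of summed.
average_level = sum(by_month.values()) / len(by_month)
print(by_month, average_level)
```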
Dimensional Modeling
Dimensional modeling consists of the following phases to build the DW.
i. Conceptual Modeling
- Understand the business requirements
- Identify the entities (tables)
- Identify the attributes (columns) for each entity
- Identify the relationships between the entities (PK-FK)
SURROGATE KEYS
A system-generated key used to uniquely identify a record in a fact or
dimension table.
The key is generated by a sequence generator and is not derived
from the OLTP system.
A 4-byte integer is a good choice because it improves join performance.
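A minimal sketch of a sequence-based surrogate key generator follows. The class name, the method name, and the sample natural keys are hypothetical; real warehouses usually rely on a database sequence or identity column instead.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Hands out integer keys that are independent of the source system's keys."""

    def __init__(self, start=1):
        self._sequence = count(start)  # plays the role of a sequence generator
        self._keys = {}                # natural key -> surrogate key

    def key_for(self, natural_key):
        # A natural key seen for the first time gets the next number in sequence;
        # a known natural key keeps the surrogate key it was already given.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._sequence)
        return self._keys[natural_key]

customer_keys = SurrogateKeyGenerator()
first = customer_keys.key_for("CUST-0042")
second = customer_keys.key_for("CUST-0099")
again = customer_keys.key_for("CUST-0042")
print(first, second, again)
```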
INDEXING
An index is a pointer that locates the physical address of data.
Indexing can be used to increase the performance and scalability of the data
warehouse solution.
Using an index replaces a full-table scan with an index lookup, followed by a read of only those
disk blocks that contain the rows needed.
It improves performance when data is retrieved or manipulated using
the indexed column in the WHERE clause.
Types of Indexes:
i) B-Tree Index
ii) Bitmap Index
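What is Cardinality?
Cardinality is the number of distinct values in a column. A customer key has high cardinality; a gender flag has low cardinality.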
B-Tree Index
The most common type of indexing is the B-tree index. This type of
indexing is often used for high-cardinality columns such as product key or
customer key. B-tree indexes are designed to return few rows.
Bitmap Index
A bitmap index is used for low-cardinality columns. When a bitmap index is
created on a column, a bit stream is created for each distinct value in the
indexed column. A bit stream is composed of ones and zeros.
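The idea can be illustrated with a small sketch that builds one bit stream per distinct value. The function name and the sample column are hypothetical, and production databases store compressed bitmaps rather than Python lists.

```python
def build_bitmap_index(values):
    """Return {distinct value: list of 0/1 bits, one bit per row}."""
    index = {}
    for row, value in enumerate(values):
        bits = index.setdefault(value, [0] * len(values))
        bits[row] = 1  # this row holds this value
    return index

region = ["East", "West", "East", "North", "West"]
bitmaps = build_bitmap_index(region)

# A predicate such as region IN ('East', 'North') becomes a bitwise OR
# of two bit streams; the set bits are the matching row numbers.
combined = [e | n for e, n in zip(bitmaps["East"], bitmaps["North"])]
rows_east_or_north = [row for row, bit in enumerate(combined) if bit]
print(bitmaps["East"], rows_east_or_north)
```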
B-Tree Vs Bitmap
Partitioning
Partitioning enables you to divide tables into smaller units that are more
manageable.
This feature addresses the problem of supporting the large tables and indexes
that are inherent to data warehouses.
Partition Types
Data partitioning can be divided into two broad categories: Horizontal
Partitioning and Vertical Partitioning
Horizontal Partitioning
Horizontal partitioning is commonly used in data warehouse environments
because it enables data in a very large table to be stored in smaller tables. It gives
the DBA control over the rows that go into each table.
Vertical Partitioning
Vertical Partitioning divides tables on a column-by-column basis
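The difference between the two categories can be sketched on an in-memory table. The rows and column names below are hypothetical. Horizontal partitioning splits by rows (here by year), while vertical partitioning splits by columns and repeats the key in each piece so the pieces can be rejoined.

```python
# Hypothetical orders table held as a list of row dictionaries.
rows = [
    {"order_id": 1, "year": 2023, "customer": "Ann", "notes": "gift wrap"},
    {"order_id": 2, "year": 2024, "customer": "Bob", "notes": ""},
    {"order_id": 3, "year": 2024, "customer": "Cy", "notes": "express"},
]

# Horizontal partitioning: every partition has all columns but only some rows.
horizontal = {}
for row in rows:
    horizontal.setdefault(row["year"], []).append(row)

# Vertical partitioning: every partition has all rows but only some columns.
core = [{"order_id": r["order_id"], "year": r["year"], "customer": r["customer"]}
        for r in rows]
extra = [{"order_id": r["order_id"], "notes": r["notes"]} for r in rows]

print(sorted(horizontal), len(core), len(extra))
```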
Range Partitioning
Range partitioning allows users to specify the range of values for each
partition. The resulting partitions may not be evenly distributed.
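A minimal sketch of range partitioning, assuming hypothetical year boundaries and partition names chosen by the user. The sample data lands mostly in one partition, which shows the uneven distribution mentioned above.

```python
from bisect import bisect_right

# Hypothetical exclusive upper bounds; the last partition is open-ended.
BOUNDS = [2000, 2010, 2020]
NAMES = ["before_2000", "2000s", "2010s", "2020_onward"]

def range_partition(year):
    """Return the name of the partition whose range contains the year."""
    return NAMES[bisect_right(BOUNDS, year)]

years = [1995, 2003, 2011, 2012, 2015, 2019, 2021]
counts = {}
for y in years:
    name = range_partition(y)
    counts[name] = counts.get(name, 0) + 1
print(counts)
```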
Hash Partitioning
Hash partitioning gives control to the system,
which distributes rows evenly
across the partitions.
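A minimal sketch of hash partitioning, assuming CRC32 as the hash function and four partitions. Both choices are illustrative; a real database uses its own internal hash function.

```python
import zlib

def hash_partition(key, num_partitions):
    """Map a key to a partition number in the range 0..num_partitions-1."""
    # CRC32 is deterministic across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

# Count how many of 10,000 hypothetical customer IDs land in each partition.
sizes = [0] * 4
for customer_id in range(10_000):
    sizes[hash_partition(customer_id, 4)] += 1
print(sizes)
```

The system, not the user, decides where each row goes, so the partitions tend to end up roughly the same size.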
Composite Partitioning
Composite partitioning is the combination of
both range and hash partitioning.
Data Acquisition
It is the process of extracting the relevant business information,
transforming the data into the required business format, and loading
it into the data warehouse.
i) Data Extraction
ii) Data Transformation
iii) Data Loading
ETL Tools
Informatica
Data Integrator
Ardent Data Stage
MS SQL-Server DTS
Ab Initio
Data Junction
Reporting Tools
Business Objects
Cognos
Brio
Hyperion
Seagate
Eureka Strategy
Micro Strategy
ETL Steps
Extract
Extract source system data to populate interim stage
Transform
Apply the business logic
Process to find New Dimension records
Add Changed/New Dimension record to staging area with new key
Load
Load directly from staging area into DW
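The steps above can be sketched as three small functions. All names and sample rows are hypothetical, the business logic is reduced to cleaning a name, and only new dimension records are handled (changed records are left out to keep the sketch short).

```python
from itertools import count

# Hypothetical source rows and an existing customer dimension keyed by source ID.
source_rows = [
    {"customer_id": "C1", "name": " alice "},
    {"customer_id": "C2", "name": "BOB"},
    {"customer_id": "C3", "name": "carol"},
]
customer_dim = {"C1": {"customer_key": 1, "name": "Alice"}}
next_key = count(2)  # next unused surrogate key

def extract(rows):
    """Copy source rows into an interim staging list."""
    return [dict(r) for r in rows]

def transform(staged, dimension, key_sequence):
    """Apply business logic and return new dimension records with new keys."""
    new_records = []
    for row in staged:
        row["name"] = row["name"].strip().title()
        if row["customer_id"] not in dimension:
            new_records.append({**row, "customer_key": next(key_sequence)})
    return new_records

def load(new_records, dimension):
    """Load the staged records into the warehouse dimension."""
    for rec in new_records:
        dimension[rec["customer_id"]] = {
            "customer_key": rec["customer_key"],
            "name": rec["name"],
        }

staging = extract(source_rows)
new_dimension_rows = transform(staging, customer_dim, next_key)
load(new_dimension_rows, customer_dim)
print(customer_dim)
```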
[Figure: data extracted from several sources passes through ETL (Extract, Transform, Load) and is made available to QR&A.]
The End
Thank you!