Data Warehouse OLAP
| | OLTP | OLAP |
|---|---|---|
| users | clerk, IT professional | knowledge worker |
| function | day-to-day operations | decision support |
| DB design | application-oriented | subject-oriented |
| data | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated |
| usage | repetitive | ad hoc |
| access | read/write; index/hash on primary key | lots of scans |
| unit of work | short, simple transaction | complex query |
| # records accessed | tens | millions |
| # users | thousands | hundreds |
| DB size | 100 MB-GB | 100 GB-TB |
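A minimal sketch of the two workloads, using Python's built-in sqlite3 module and a hypothetical `orders` table (the table, its columns and its rows are made up for illustration). The OLTP-style step is a short read/write transaction on a single record located through the primary key; the OLAP-style step is an ad hoc, read-only query that scans the whole table and summarizes it.

```python
import sqlite3

# Hypothetical operational table; in-memory SQLite keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "north", 2023, 120.0), (2, "south", 2023, 80.0),
     (3, "north", 2024, 200.0), (4, "south", 2024, 50.0)],
)
conn.commit()

# OLTP style: short read/write transaction touching one record via the primary key.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE order_id = ?", (2,))
oltp_row = conn.execute("SELECT * FROM orders WHERE order_id = ?", (2,)).fetchone()

# OLAP style: ad hoc read-only query that scans every row and summarizes it.
olap_result = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()

print(oltp_row)
print(olap_result)
```

In practice the two workloads usually run on separate systems (an operational database and a warehouse); one in-memory database is used here only to keep the example runnable on its own.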
From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
– Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
– The fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables.
• An n-D base cube is called a base cuboid. The top-most 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
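Cube: A Lattice of Cuboids
• 0-D (apex) cuboid: all
• 1-D cuboids: (time); (item); (location); (supplier)
• 2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
• 3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
• 4-D (base) cuboid: (time, item, location, supplier)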
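Ignoring concept hierarchies, an n-D cube has one cuboid for each subset of its n dimensions, so the lattice holds 2^n cuboids. The sketch below uses a hypothetical helper, `cuboid_lattice`, to enumerate the lattice level by level with `itertools.combinations` for the four dimensions time, item, location and supplier.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Return the cuboids of an n-D cube, grouped by number of dimensions.

    Level 0 is the apex cuboid (all), level n is the base cuboid.
    Concept hierarchies are ignored, so there are exactly 2**n cuboids.
    """
    n = len(dimensions)
    return {k: list(combinations(dimensions, k)) for k in range(n + 1)}

lattice = cuboid_lattice(["time", "item", "location", "supplier"])
for level, cuboids in lattice.items():
    print(f"{level}-D:", cuboids)

total = sum(len(c) for c in lattice.values())
print("total cuboids:", total)  # 1 + 4 + 6 + 4 + 1 = 2**4
```

Once each dimension also has a concept hierarchy (for example day < month < quarter < year), the count grows to the product of (levels + 1) over all dimensions, which is why real systems materialize only a subset of cuboids.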
Conceptual Modeling of Data Warehouses
Example of a star schema:
• Sales Fact Table: time_key, item_key, branch_key, location_key
– Measures: units_sold, dollars_sold, avg_sales
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, province_or_state, country
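A minimal sketch of this star schema in SQLite, through Python's built-in sqlite3 module. The rows are made up, the province/state column of location and the avg_sales measure are left out for brevity, and column types are assumptions. A typical star query joins the fact table once to each dimension it needs and then groups by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                   month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   type TEXT, supplier_type TEXT);
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, country TEXT);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales (time_key INTEGER REFERENCES time, item_key INTEGER REFERENCES item,
                    branch_key INTEGER REFERENCES branch,
                    location_key INTEGER REFERENCES location,
                    units_sold INTEGER, dollars_sold REAL);
INSERT INTO time VALUES (1, 5, 'Fri', 1, 'Q1', 2024), (2, 6, 'Sat', 4, 'Q2', 2024);
INSERT INTO item VALUES (1, 'TV', 'A', 'video', 'wholesale'), (2, 'PC', 'B', 'computer', 'retail');
INSERT INTO branch VALUES (1, 'B1', 'mall');
INSERT INTO location VALUES (1, 'Main St', 'Vancouver', 'Canada'), (2, 'Elm St', 'Chicago', 'USA');
INSERT INTO sales VALUES (1, 1, 1, 1, 2, 800.0), (2, 1, 1, 2, 1, 400.0), (1, 2, 1, 2, 3, 2700.0);
""")

# Star query: one join per dimension used, then aggregate by dimension attributes.
rows = conn.execute("""
SELECT t.quarter, l.country, SUM(s.dollars_sold) AS dollars_sold
FROM sales AS s
JOIN time AS t ON s.time_key = t.time_key
JOIN location AS l ON s.location_key = l.location_key
GROUP BY t.quarter, l.country
ORDER BY t.quarter, l.country
""").fetchall()
print(rows)
```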
Example of a Snowflake Schema
• Sales Fact Table: time_key, item_key, branch_key, location_key
– Measures: units_sold, dollars_sold, avg_sales
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_key
• supplier: supplier_key, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city_key
• city: city_key, city, province_or_state, country
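In a snowflake schema some dimension tables are normalized into further tables, so a query may need an extra join to reach an attribute. The sketch below (Python's sqlite3, made-up rows) models only the snowflaked dimensions, location → city and item → supplier; time and branch, the province/state column and avg_sales are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked dimensions: location points to city, item points to supplier.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city);
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
                   supplier_key INTEGER REFERENCES supplier);
CREATE TABLE sales (item_key INTEGER REFERENCES item,
                    location_key INTEGER REFERENCES location,
                    units_sold INTEGER, dollars_sold REAL);
INSERT INTO city VALUES (1, 'Vancouver', 'Canada'), (2, 'Chicago', 'USA');
INSERT INTO location VALUES (1, 'Main St', 1), (2, 'Elm St', 2), (3, 'Oak St', 2);
INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item VALUES (1, 'TV', 'A', 'video', 1), (2, 'PC', 'B', 'computer', 2);
INSERT INTO sales VALUES (1, 1, 2, 800.0), (1, 2, 1, 400.0), (2, 3, 3, 2700.0);
""")

# Reaching country now takes two hops: sales -> location -> city.
by_country = conn.execute("""
SELECT c.country, SUM(s.dollars_sold)
FROM sales AS s
JOIN location AS l ON s.location_key = l.location_key
JOIN city AS c ON l.city_key = c.city_key
GROUP BY c.country ORDER BY c.country
""").fetchall()

# Likewise supplier_type is reached through item -> supplier.
by_supplier = conn.execute("""
SELECT sp.supplier_type, SUM(s.units_sold)
FROM sales AS s
JOIN item AS i ON s.item_key = i.item_key
JOIN supplier AS sp ON i.supplier_key = sp.supplier_key
GROUP BY sp.supplier_type ORDER BY sp.supplier_type
""").fetchall()
print(by_country, by_supplier)
```

The trade-off relative to a star schema: less redundancy in the dimension tables (each city's country is stored once) at the cost of more joins per query.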
Example of a Fact Constellation
• Sales Fact Table: time_key, item_key, branch_key
• Shipping Fact Table: time_key, item_key, shipper_key, from_location
• Shared dimension tables: time (time_key, day, day_of_the_week, month, quarter, year) and item (item_key, item_name, brand, type, supplier_type)
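Because the fact tables in a constellation share dimension tables, one query can combine measures from both (drill across). A minimal sketch with Python's sqlite3: the dimension tables are cut down to one attribute each, and the measures `units_sold` and `units_shipped` plus all rows are hypothetical. Each fact table is summarized separately at the same grain before the join, which avoids double-counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension tables (simplified).
CREATE TABLE time (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables that both reference time and item.
CREATE TABLE sales (time_key INTEGER, item_key INTEGER, branch_key INTEGER,
                    units_sold INTEGER);
CREATE TABLE shipping (time_key INTEGER, item_key INTEGER, shipper_key INTEGER,
                       from_location INTEGER, units_shipped INTEGER);
INSERT INTO time VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO item VALUES (1, 'TV'), (2, 'PC');
INSERT INTO sales VALUES (1, 1, 1, 5), (1, 1, 2, 3), (1, 2, 1, 4), (2, 1, 1, 6);
INSERT INTO shipping VALUES (1, 1, 7, 100, 10), (1, 2, 7, 100, 4), (2, 1, 8, 200, 5);
""")

# Drill across: summarize each fact table at the same grain, then join the
# summaries on the shared dimension keys (the inner join keeps only
# (time, item) combinations present in both fact tables).
rows = conn.execute("""
WITH s AS (SELECT time_key, item_key, SUM(units_sold) AS sold
           FROM sales GROUP BY time_key, item_key),
     sh AS (SELECT time_key, item_key, SUM(units_shipped) AS shipped
            FROM shipping GROUP BY time_key, item_key)
SELECT t.quarter, i.item_name, s.sold, sh.shipped
FROM s
JOIN sh ON s.time_key = sh.time_key AND s.item_key = sh.item_key
JOIN time AS t ON s.time_key = t.time_key
JOIN item AS i ON s.item_key = i.item_key
ORDER BY t.quarter, i.item_name
""").fetchall()
print(rows)
```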
Specification of hierarchies
• Schema hierarchy
– day < {month < quarter; week} < year
• Set_grouping hierarchy
– {1..10} < inexpensive
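Both kinds of hierarchy can be expressed as mapping functions from a value to the coarser level that contains it. A minimal sketch in Python: the helper names (`to_month`, `to_quarter`, `to_week`, `to_year`, `price_group`) are hypothetical, weeks are assumed to be ISO weeks, and only the {1..10} → inexpensive band comes from the hierarchy above; the other price bands are made up.

```python
from datetime import date

# Schema hierarchy on time: day < {month < quarter; week} < year.
# Each helper maps a day to the coarser level that contains it.
def to_month(d):
    return (d.year, d.month)

def to_quarter(d):
    return (d.year, (d.month - 1) // 3 + 1)

def to_week(d):
    # Weeks can straddle month boundaries, so week sits beside month, not under it.
    # With ISO weeks, a week near January 1 may belong to the neighbouring ISO year.
    iso = d.isocalendar()
    return (iso[0], iso[1])

def to_year(d):
    return d.year

# Set-grouping hierarchy on price: ranges of values are grouped under a label.
# Only {1..10} -> "inexpensive" is from the slide; the other bands are hypothetical.
PRICE_GROUPS = [(1, 10, "inexpensive"), (11, 100, "moderately_priced"), (101, 1000, "expensive")]

def price_group(price):
    for low, high, label in PRICE_GROUPS:
        if low <= price <= high:
            return label
    return "unclassified"

d = date(2024, 5, 17)
print(to_month(d), to_quarter(d), to_week(d), to_year(d))
print(price_group(7), price_group(250))
```

Rolling up along a hierarchy then amounts to grouping by the output of one of these functions instead of by the raw value.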
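Multidimensional Data
[Figure: hierarchical summarization paths of the cube's dimensions; surviving labels include Office, Day and Month.]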
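Example Data Cube
[Figure: a 3-D data cube with dimensions Date (quarters such as 3Qtr, 4Qtr), Product (TV, PC, VCR) and Country (U.S.A, Canada, Mexico), with sum cells aggregating along each dimension.]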
Cuboids Corresponding to the Cube
• 0-D (apex) cuboid: all
• 1-D cuboids: (product); (date); (country)
• 2-D cuboids: (product, date); (product, country); (date, country)
• 3-D (base) cuboid: (product, date, country)
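Computing the full cube means aggregating the measure for every cuboid in the lattice, similar to the CUBE grouping offered by some SQL dialects. A naive sketch in plain Python: the fact rows and the `sales` measure are made up, and `compute_cube` is a hypothetical helper that rescans the base data once per cuboid.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "date", "country")

# Hypothetical base facts: one row per (product, date, country) with a sales measure.
facts = [
    {"product": "TV", "date": "1Qtr", "country": "U.S.A", "sales": 10},
    {"product": "TV", "date": "2Qtr", "country": "Canada", "sales": 4},
    {"product": "PC", "date": "1Qtr", "country": "U.S.A", "sales": 7},
    {"product": "VCR", "date": "2Qtr", "country": "Mexico", "sales": 2},
]

def compute_cube(rows, dims, measure):
    """Aggregate the measure for every cuboid, i.e. every subset of dims."""
    cube = {}
    for k in range(len(dims) + 1):
        for cuboid in combinations(dims, k):
            cells = defaultdict(int)
            for r in rows:
                cells[tuple(r[d] for d in cuboid)] += r[measure]
            cube[cuboid] = dict(cells)
    return cube

cube = compute_cube(facts, DIMENSIONS, "sales")
print(cube[()])                   # apex cuboid: grand total
print(cube[("product",)])         # a 1-D cuboid
print(cube[("date", "country")])  # a 2-D cuboid
```

A production engine would instead derive each cuboid from an already computed, smaller parent cuboid (for SUM this is valid because partial sums can be added), rather than rescanning the base data every time.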
Browsing a Data Cube
• Visualization
• OLAP capabilities
• Interactive manipulation
OLAP Operations
• Roll up (drill-up): summarize data
– by climbing up hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up
– from higher level summary to lower level summary or detailed
data, or introducing new dimensions
• Slice and dice:
– project and select
• Pivot (rotate):
– reorient the cube, visualization, 3D to series of 2D planes.
• Other operations
– drill across: involving (across) more than one fact table
– drill through: through the bottom level of the cube to its back-end
relational tables (using SQL)
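A minimal sketch of roll-up, drill-down, slice, dice and pivot on a small in-memory cube, in plain Python. The cube contents, the city → country hierarchy and the helper names (`roll_up`, `slice_`, `dice`, `pivot`) are hypothetical; drill across and drill through are not shown because they need several fact tables or the back-end relational tables.

```python
from collections import defaultdict

# Hypothetical base cuboid: (city, quarter, item) -> dollars_sold.
DIMS = ("city", "quarter", "item")
base = {
    ("Vancouver", "Q1", "TV"): 600, ("Vancouver", "Q2", "PC"): 900,
    ("Toronto", "Q1", "PC"): 400, ("Chicago", "Q1", "TV"): 700,
    ("Chicago", "Q2", "TV"): 300,
}
# Hypothetical concept hierarchy for location: city < country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

def roll_up(cube, dims, keep=None, climb=None):
    """Summarize by dropping dimensions not in `keep` and/or climbing a hierarchy.

    `climb` maps a dimension name to a {child value: parent value} dictionary.
    """
    keep = dims if keep is None else keep
    climb = {} if climb is None else climb
    out = defaultdict(int)
    for key, value in cube.items():
        cell = dict(zip(dims, key))
        out[tuple(climb.get(d, {}).get(cell[d], cell[d]) for d in keep)] += value
    return dict(out)

def slice_(cube, dims, dim, value):
    """Slice: select one value of one dimension and project that dimension away."""
    i = dims.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def dice(cube, dims, conditions):
    """Dice: keep cells whose values fall in the allowed set for every listed dimension."""
    idx = {dims.index(d): allowed for d, allowed in conditions.items()}
    return {k: v for k, v in cube.items() if all(k[i] in ok for i, ok in idx.items())}

def pivot(view_2d):
    """Pivot: swap the row and column axes of a 2-D view."""
    return {(col, row): v for (row, col), v in view_2d.items()}

# Roll up by climbing city -> country and by dropping the item dimension.
by_country_quarter = roll_up(base, DIMS, keep=("city", "quarter"),
                             climb={"city": CITY_TO_COUNTRY})
# Drill down is the reverse: from a coarser summary back to a finer one.
by_quarter = roll_up(base, DIMS, keep=("quarter",))
by_quarter_item = roll_up(base, DIMS, keep=("quarter", "item"))
q1_slice = slice_(base, DIMS, "quarter", "Q1")
pc_in_canada = dice(base, DIMS, {"city": {"Vancouver", "Toronto"}, "item": {"PC"}})
quarter_by_country = pivot(by_country_quarter)
print(by_country_quarter, by_quarter, by_quarter_item,
      q1_slice, pc_in_canada, quarter_by_country, sep="\n")
```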
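Illustration
[Figure: data warehouse architecture. Operational DBs and other sources pass through Extract, Transform, Load and Refresh, alongside a Monitor & Integrator and Metadata, into the Data Warehouse and Data Marts; an OLAP Server then serves Analysis, Query, Reports and Data mining.]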
• Data extraction:
– get data from multiple, heterogeneous, and external sources
• Data cleaning:
– detect errors in the data and rectify them when possible
• Data transformation:
– convert data from legacy or host format to warehouse format
• Load:
– sort, summarize, consolidate, compute views, check integrity,
and build indices and partitions
• Refresh
– propagate the updates from the data sources to the warehouse
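A minimal end-to-end sketch of these steps in Python with the built-in sqlite3 module. Both sources, their formats and the helper names (`extract`, `transform`, `clean`, `load`, `refresh`) are hypothetical. Cleaning runs after format conversion here so that one set of rules applies to every source; that ordering is a choice of this sketch, not a requirement.

```python
import sqlite3

# Two heterogeneous, hypothetical sources: CSV-like lines and a legacy fixed-format feed.
source_a = ["2024-01-05,TV,800.00", "2024-01-06,pc,n/a", "2024-02-10,PC,1200.50"]
source_b = [{"DT": "20240301", "ITEM": "VCR ", "AMT_CENTS": 15000}]

def extract():
    """Extraction: gather raw records from every source, tagged by origin."""
    return [("a", line) for line in source_a] + [("b", rec) for rec in source_b]

def transform(raw):
    """Transformation: convert each source's format into the warehouse format."""
    out = []
    for src, rec in raw:
        if src == "a":
            day, item, amount = rec.split(",")
        else:
            d = rec["DT"]
            day, item, amount = f"{d[:4]}-{d[4:6]}-{d[6:]}", rec["ITEM"], rec["AMT_CENTS"] / 100
        out.append({"day": day, "item": item, "amount": amount})
    return out

def clean(rows):
    """Cleaning: rectify fixable errors (case, whitespace) and reject the rest."""
    good = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # unparseable amount cannot be rectified, so the record is dropped
        good.append((r["day"], r["item"].strip().upper(), amount))
    return good

def load(conn, rows):
    """Load: sort, insert, rebuild a monthly summary table, and build an index."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (day TEXT, item TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sorted(rows))
    conn.execute("DROP TABLE IF EXISTS monthly_sales")
    conn.execute("CREATE TABLE monthly_sales AS "
                 "SELECT substr(day, 1, 7) AS month, item, SUM(amount) AS amount "
                 "FROM sales GROUP BY substr(day, 1, 7), item")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_day ON sales(day)")
    conn.commit()

def refresh(conn, new_raw):
    """Refresh: push only the new source records through the same pipeline."""
    load(conn, clean(transform(new_raw)))

conn = sqlite3.connect(":memory:")
load(conn, clean(transform(extract())))
refresh(conn, [("a", "2024-03-15,vcr,50")])
monthly = conn.execute("SELECT * FROM monthly_sales ORDER BY month, item").fetchall()
print(monthly)
```

Rebuilding the whole summary table on every refresh keeps the sketch short; a larger warehouse would more likely update only the affected summary rows.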
Three Data Warehouse Models
• Enterprise warehouse
– collects all of the information about subjects spanning the entire
organization
• Data Mart
– a subset of corporate-wide data that is of value to a specific
group of users. Its scope is confined to specific, selected
groups, such as a marketing data mart
• Independent vs. dependent (sourced directly from the warehouse) data marts
• Virtual warehouse
– A set of views over operational databases
– Only some of the possible summary views may be materialized
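A minimal sketch of a virtual warehouse using Python's sqlite3: summary views are defined directly over a hypothetical operational table `orders`, and only one summary is materialized. SQLite has no materialized views, so a table snapshot (`CREATE TABLE ... AS SELECT`) stands in for one here; the names `sales_by_region` and `sales_by_region_mat` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                 [("east", 100.0), ("west", 50.0), ("east", 25.0)])

# Virtual warehouse: a summary view evaluated against the operational data on every query.
conn.execute("CREATE VIEW sales_by_region AS "
             "SELECT region, SUM(amount) AS total FROM orders GROUP BY region")

# Materialize this one summary as a stored snapshot.
conn.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")

# A new operational transaction arrives.
conn.execute("INSERT INTO orders (region, amount) VALUES ('west', 75.0)")
view_now = dict(conn.execute("SELECT region, total FROM sales_by_region").fetchall())
materialized = dict(conn.execute("SELECT region, total FROM sales_by_region_mat").fetchall())
print(view_now)      # reflects the new order immediately
print(materialized)  # stale until it is refreshed
```

The trade-off shown: a view is always current but recomputes over the operational data each time, while a materialized summary answers quickly but must be refreshed.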
Data Warehouse Development: A Recommended Approach
[Figure: development path involving Data Marts, Distributed Data Marts, a Multi-Tier Data Warehouse and an Enterprise Data Warehouse, with MDDB and Meta Data components.]