Basic principles
Warehouse Database
Refresh
Data incomplete
Data inconsistent (e.g. engineering vs accounts)
Wrong level of granularity
Data not clean
New system requires changes (e.g. new product codes)
Data utilities
ORACLE is king of data handling
Export: transfers data between DBs
Extracts both table structure and data content into a dump file
Import: the corresponding facility
SQL*Loader: automatic import from a variety of file formats into DB files
Needs a control file
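The idea of a control file, a description of the input format that drives the load, can be illustrated with a small Python analogue. This is not SQL*Loader syntax and does not call any Oracle utility: the CONTROL dictionary, the supplier table, the column names, and the sample rows are all hypothetical, and SQLite stands in for the target database.

```python
import csv
import io
import sqlite3

# Hypothetical "control" description: target table, field delimiter, column order.
CONTROL = {
    "table": "supplier",
    "delimiter": "|",
    "columns": ["sup_name", "sup_address", "phone"],
}

# Stand-in for a flat file on disk (made-up sample rows).
DATA_FILE = io.StringIO(
    "Acme Ltd|1 Main St|555-0100\n"
    "Bolt & Co|2 High St|555-0101\n"
)

def load_file(conn, data, control):
    """Load delimited rows into the table named by the control description."""
    cols = control["columns"]
    table = control["table"]
    column_defs = ", ".join(c + " TEXT" for c in cols)
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})")
    reader = csv.reader(data, delimiter=control["delimiter"])
    # Keep only rows with the expected number of fields.
    rows = [row for row in reader if len(row) == len(cols)]
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded_count = load_file(conn, DATA_FILE, CONTROL)
supplier_rows = conn.execute("SELECT sup_name, phone FROM supplier").fetchall()
print(loaded_count, supplier_rows)
```

Keeping the format description separate from the loading code means a new file layout only needs a new control description, not new code.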
NEW
Sup name, Sup address, Phone, Cat (1, 2 or 3, depending on total purchases last year)
OLD (Sales)
CustID*, name, address, city, discount rates, sales_to_date, rep_name
OLD (Shipping)
CustID**, name, address, city, preferred haulier
Operational system -> Extract -> Transform -> Transport (Load) -> Warehouse
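The extract, transform, and transport (load) steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operational rows are made up, the field names loosely follow the OLD (Sales) record layout, the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational rows (made-up data, OLD (Sales)-style fields).
OPERATIONAL_ROWS = [
    {"CustID": 1, "name": " Acme Ltd ", "city": "cork", "sales_to_date": "1200.50"},
    {"CustID": 2, "name": "Bolt & Co", "city": "DUBLIN", "sales_to_date": "300"},
    {"CustID": 3, "name": "", "city": "Galway", "sales_to_date": "75"},  # incomplete
]

def extract(rows):
    """Extract: copy rows out of the operational system."""
    return [dict(r) for r in rows]

def transform(rows):
    """Transform: drop incomplete rows, tidy text, convert types."""
    clean = []
    for r in rows:
        name = r["name"].strip()
        if not name:
            continue  # data incomplete: skip the row
        clean.append(
            (r["CustID"], name, r["city"].strip().title(), float(r["sales_to_date"]))
        )
    return clean

def load(conn, rows):
    """Transport (load): write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT, sales_to_date REAL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(OPERATIONAL_ROWS)))
loaded = warehouse.execute(
    "SELECT cust_id, name, city, sales_to_date FROM customer ORDER BY cust_id"
).fetchall()
print(loaded)
```

In a real system the transform step would run in the staging area, so that cleaning work does not load either the operational system or the warehouse.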
Data staging area in its own environment, avoiding negative impact on the warehouse environment
[Figure: Oper. envt. (Operational system) | Staging envt. (Data staging area) | Warehouse envt. (Warehouse)]
[Figure: Operational environment (Operational system, Data staging area) | WH envt. (Warehouse); flow: Extract -> Transform -> Transport (Load)]
Data Mart
A subset of a data warehouse that supports the requirements of a particular department or business function. Characteristics include:
Does not normally contain detailed operational data, unlike a data warehouse
May contain certain levels of aggregation
[Figure labels: Marketing, Sales, External Data, Sales or Marketing, Data Warehouse]
Relational database on a dedicated server
Denormalised data
Multidimensional Models
[Figure: SALES and FINANCE cubes; dimension labels: Customer, Market, Time, Product, P/L_Line]
MOLAP Server
The application layer stores data in a multidimensional structure
The presentation layer provides the multidimensional view
Efficient storage and processing
Complexity hidden from the user (but NOT from the developer)
Analysis using preaggregated summaries and precalculated measures
[Figure: DSS client -> MOLAP engine (application layer) -> Warehouse]
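A toy version of "preaggregated summaries" helps show why MOLAP queries are fast. The sketch below precomputes every roll-up of a tiny three-dimensional data set into a dictionary, so that answering a query is a single lookup. The fact rows, the ALL marker, and the build_cube helper are hypothetical; real MOLAP engines use far more compact storage.

```python
from collections import defaultdict
from itertools import product as cartesian

# Made-up atomic facts: (product, region, month, units).
FACTS = [
    ("Widget", "North", "Jan", 100),
    ("Widget", "South", "Jan", 50),
    ("Gadget", "North", "Feb", 70),
    ("Widget", "North", "Feb", 30),
]

ALL = "*"  # marker meaning "aggregated over this dimension"

def build_cube(facts):
    """Precompute the unit totals for every combination of dimension levels."""
    cube = defaultdict(int)
    for prod, region, month, units in facts:
        # Each fact contributes to 2 x 2 x 2 = 8 cells of the cube.
        for key in cartesian((prod, ALL), (region, ALL), (month, ALL)):
            cube[key] += units
    return dict(cube)

cube = build_cube(FACTS)

# Queries are now dictionary lookups; no scan of the facts is needed.
print(cube[("Widget", ALL, ALL)])   # all Widget units
print(cube[(ALL, "North", "Jan")])  # North region in Jan
print(cube[(ALL, ALL, ALL)])        # grand total
```

The cost is paid at load time and in storage: the cube has to be rebuilt or updated whenever the underlying facts change, which is why MOLAP stores are typically loaded periodically.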
ROLAP Server
The warehouse stores atomic data
The application layer generates SQL for the three-dimensional view
The presentation layer provides the multidimensional view
[Figure: DSS client -> ROLAP engine (application layer) -> multiple SQL -> Warehouse server]
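The ROLAP idea, an application layer that turns a requested view into SQL over atomic data, can be sketched as follows. The sales table, its columns, and the rollup_sql helper are hypothetical, and SQLite stands in for the warehouse server; a real ROLAP engine generates far more elaborate SQL.

```python
import sqlite3

# Dimensions the hypothetical application layer is allowed to group by.
ALLOWED_DIMENSIONS = ("product", "region", "month")

def rollup_sql(dimensions):
    """Generate a GROUP BY query for the requested dimensions."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    unknown = [d for d in dimensions if d not in ALLOWED_DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    cols = ", ".join(dimensions)
    return f"SELECT {cols}, SUM(units) FROM sales GROUP BY {cols} ORDER BY {cols}"

# Atomic data held in the warehouse (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Widget", "North", "Jan", 100),
        ("Widget", "South", "Jan", 50),
        ("Gadget", "North", "Feb", 70),
        ("Widget", "North", "Feb", 30),
    ],
)

by_product = conn.execute(rollup_sql(["product"])).fetchall()
by_product_month = conn.execute(rollup_sql(["product", "month"])).fetchall()
print(by_product)
print(by_product_month)
```

Nothing is precomputed here: every view costs a live query against the atomic data, which is the trade-off against the MOLAP approach.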
[Figure: MOLAP. User query -> MDDB, filled by periodic load from the warehouse server data]
[Figure: ROLAP. User query -> cache, with live fetch from the data]
[Figure labels: ROLAP, Simple, Complex Analysis]
Modeling
Warehouses differ from operational structures:
Analytical requirements
Subject orientation
Identify dimension tables
Link fact tables to the dimension tables
Create views for users
Dimension Tables
Dimension tables have the following characteristics:
Contain textual information that represents the attributes of the business
Contain relatively static data
Are joined to a fact table through a foreign key reference
[Figure: example dimension tables: Product, Channel]
Fact Tables
Fact tables have the following characteristics:
Contain numeric measures (metrics) of the business
May contain summarized (aggregated) data
May contain date-stamped data
Are typically additive
Have a key value that is typically a concatenated key composed of the primary keys of the dimensions
Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
Dimension tables, e.g. Item Table: Item_id, Item_desc, ...
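The relationship between the Sales fact table and the Item dimension table can be sketched with SQLite. The column names come from the tables above; the data types, the sample rows, and the choice of SQLite are assumptions made for illustration, and the other dimension tables are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: textual, relatively static attributes.
    CREATE TABLE item (
        Item_id   INTEGER PRIMARY KEY,
        Item_desc TEXT
    );
    -- Fact table: numeric measures, keyed by the concatenated dimension keys.
    CREATE TABLE sales_fact (
        Product_id    INTEGER,
        Store_id      INTEGER,
        Item_id       INTEGER REFERENCES item(Item_id),
        Day_id        INTEGER,
        Sales_dollars REAL,
        Sales_units   INTEGER,
        PRIMARY KEY (Product_id, Store_id, Item_id, Day_id)
    );
    """
)
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "Blue pen"), (2, "Notebook")])
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
    [
        (10, 100, 1, 1, 5.0, 2),
        (10, 101, 1, 1, 2.5, 1),
        (20, 100, 2, 2, 12.0, 3),
    ],
)

# Join the fact table to the dimension and add up the additive measures.
by_item = conn.execute(
    """
    SELECT i.Item_desc, SUM(f.Sales_dollars), SUM(f.Sales_units)
    FROM sales_fact AS f
    JOIN item AS i ON i.Item_id = f.Item_id
    GROUP BY i.Item_desc
    ORDER BY i.Item_desc
    """
).fetchall()
print(by_item)
```

The measures can be summed across any dimension because they are additive, while the descriptive text lives only in the dimension table.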
Provides fast access to precomputed data
Reduces use of I/O, CPU, and memory
Is distilled from source systems and precalculated summaries
Usually exists in summary fact tables
[Figure labels: Total, Percentage, Sales(), Store]
SALES BY MONTH
Month    Tot_Sales
Jan 99   51,000
Feb 99   40,000
Mar 99   17,000
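A summary fact table such as SALES BY MONTH can be built once from detailed rows and then queried cheaply. In the sketch below the detailed sales rows are invented so that they add up to the monthly totals shown above; the table and column names and the use of SQLite are likewise assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")

# Made-up detail rows chosen to sum to the monthly totals in the table above.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("1999-01-05", 30000),
        ("1999-01-20", 21000),
        ("1999-02-11", 40000),
        ("1999-03-02", 10000),
        ("1999-03-28", 7000),
    ],
)

# Precompute the summary once; later queries read this small table only.
conn.execute(
    """
    CREATE TABLE sales_by_month AS
    SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS tot_sales
    FROM sales
    GROUP BY substr(sale_date, 1, 7)
    """
)

summary = conn.execute(
    "SELECT month, tot_sales FROM sales_by_month ORDER BY month"
).fetchall()
print(summary)
```

Reports that only need monthly figures never touch the detail table, which is where the savings in I/O, CPU, and memory come from.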