
Data Warehouse: Methodology and Tools

Concepts, Architectures and Products

FORWISS - Bavarian Research Centre for Knowledge Based Systems


1999 FORWISS

Overview
- The Process of Planning and Building a Warehouse
- Data Warehouse Architecture (revisited)
- Classification of Tools
- Focus: OLAP Tools
  - Multidimensional Data Modeling
  - OLAP Architectures
  - OLAP query languages
- Tool Demonstration
- Summary

DW: Tools and Projects

2001 FORWISS,Carsten Sapia - sapia@forwiss.de

OLAP Design Cycle


Requirement Analysis -> Conceptual Design (implementation independent) -> Logical + Physical Design (e.g. product specific) -> Implementation -> Using the Data Warehouse -> Requirement Analysis

Data Warehouse Architecture


Figure: layered data warehouse architecture.
- Data Analysis: Reporting, OLAP, Data Mining
- Data Storage
- Data Migration
- Operational Data Sources
Further components shown: Repository; Middleware (population tools)

Classification of Tools
- Frontend Tools
- Data Storage Tools (Databases)
- ETL Tools
  - Extraction
  - Transformation
  - Loading
- Repository Systems
  - Metadata Storage

Repository Systems
- Manage Different Kinds of Metadata
  - Business Metadata
    - E.g. how is revenue computed?
  - Technical Metadata
    - When was data last loaded, and from which system?
    - Data model for OLTP and OLAP databases
- Functionality
  - Communication hub for different tools
  - Guides user exploration
  - Guides development process
  - Impact analysis
- E.g. Viasoft Rochade, Softlab Enabler, ...
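
Impact analysis can be pictured as a reachability question over technical metadata: if one source field changes, which downstream objects are affected? The following is a minimal sketch, assuming the repository records dependencies as simple (upstream, downstream) pairs. All object names are hypothetical and do not come from any of the products named above.

```python
from collections import deque

# Hypothetical lineage metadata: (upstream object, downstream object).
DEPENDENCIES = [
    ("oltp.orders.amount", "dw.fact_sales.revenue"),
    ("oltp.orders.cost", "dw.fact_sales.cost"),
    ("dw.fact_sales.revenue", "cube.sales.revenue"),
    ("dw.fact_sales.revenue", "cube.sales.profit"),
    ("dw.fact_sales.cost", "cube.sales.profit"),
    ("cube.sales.profit", "report.monthly_profit"),
]

def impacted_by(changed, dependencies):
    """Return every object reachable downstream of `changed` (breadth-first)."""
    downstream = {}
    for src, dst in dependencies:
        downstream.setdefault(src, []).append(dst)
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)

affected = impacted_by("oltp.orders.cost", DEPENDENCIES)
print(affected)
```

A change to the cost field touches the warehouse column, the derived profit measure and the report built on it, but not the revenue measure.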



ETL Tools
- Extraction: Range of Supported Data Sources
  - Mainframe legacy databases
  - COBOL files
  - Relational databases
  - File-based data storage (Excel, Word, XML, ...)
- Transformation
  - (Graphical) specification of transformation rules (expressive power)
- Loading
  - Ability to use database features (e.g. bulk loading)
- Process Management
  - Scheduling, monitoring, error handling
- E.g. Informatica PowerMart, Hummingbird Genio, Acta, ...
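
The extraction, transformation, loading and error-handling steps can be sketched in a few lines. This is an illustrative toy, not how any of the named tools works: the CSV content, the table name `sales` and the derived `revenue` column are assumptions, and SQLite's `executemany` stands in for a real bulk loader.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real tool would read a file or a legacy system.
SOURCE_CSV = """order_id,product,quantity,price
1,Tent,2,199.50
2,Stove,x,45.00
3,Lamp,5,12.00
"""

def extract(text):
    """Extraction: read raw rows from a file-based source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: convert types and derive revenue.
    Rows that fail conversion are set aside for error handling."""
    clean, rejected = [], []
    for row in rows:
        try:
            qty = int(row["quantity"])
            price = float(row["price"])
        except ValueError:
            rejected.append(row)
            continue
        clean.append((int(row["order_id"]), row["product"], qty, qty * price))
    return clean, rejected

def load(conn, rows):
    """Loading: one batched insert inside a single transaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER, product TEXT, quantity INTEGER, revenue REAL)")
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(SOURCE_CSV))
load(conn, clean)
loaded = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(loaded, "rows loaded,", len(rejected), "rejected")
```

Keeping rejected rows separate instead of aborting the run is one simple error-handling policy; scheduling and monitoring would sit around these three functions.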



Databases for DW
- Special Indexing Techniques
  - Multidimensional indexes
  - Bitmap indexes
  - Foreign column indexes
- Support for Materialized Views (Preaggregation)
- Special Analytical Capabilities (e.g. SQL Extensions)
  - Top N, ranking
- Bulk Loading Capabilities
  - Offline, no concurrency control
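
Preaggregation and a Top N query can be illustrated with SQLite. Note the assumption: SQLite has no materialized views, so the sketch emulates one with a summary table created by `CREATE TABLE ... AS SELECT`. A warehouse database with real materialized views would also keep the summary up to date. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Tent", "1999-01", 400.0), ("Tent", "1999-02", 600.0),
     ("Stove", "1999-01", 90.0), ("Lamp", "1999-01", 60.0),
     ("Lamp", "1999-02", 240.0)],
)

# Preaggregation: compute the summary once and store it (emulated materialized view).
conn.execute(
    "CREATE TABLE sales_by_product AS "
    "SELECT product, SUM(revenue) AS revenue FROM sales GROUP BY product"
)

# Top N: the two products with the highest revenue, answered from the summary table.
top2 = conn.execute(
    "SELECT product, revenue FROM sales_by_product "
    "ORDER BY revenue DESC LIMIT 2"
).fetchall()
print(top2)
```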


Frontend Tools

Figure: frontend tool categories (Reporting, Ad hoc Queries, Interactive OLAP, Data Mining) plotted along two axes, Number of Users and Additional Benefit, and annotated with the questions they answer: What happened? What happened, why and how? Why did it happen? What will happen?

The Users view (OLAP Tool)


Multidimensional OLAP (MOLAP)


Architecture: Frontend Tool -> Multidim. Database

- specialized database technology
- multidimensional storage structures
- E.g. Hyperion Essbase, Oracle Express, Cognos PowerPlay (Server)

Strengths: query performance; powerful MD model; write access
Weaknesses: database features (multiuser access, backup and recovery); sparsity handling -> DB explosion
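
The link between sparsity and database explosion is simple arithmetic: a dense multidimensional array reserves one cell per combination of dimension members, whether or not a fact exists for it. A small sketch with invented numbers (the cardinalities and the fact count are assumptions for illustration only):

```python
# Hypothetical cube: number of members in each dimension.
cardinalities = {"product": 1000, "customer": 5000, "day": 365}

# A dense multidimensional array reserves one cell per member combination.
dense_cells = 1
for size in cardinalities.values():
    dense_cells *= size

# Assumed number of combinations for which a fact actually exists.
stored_facts = 2_000_000

density = stored_facts / dense_cells
print(f"dense cells: {dense_cells:,}")
print(f"share of cells that hold data: {density:.4%}")
```

With these numbers far fewer than one percent of the cells hold data, which is why MOLAP servers need explicit sparsity handling.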



Relational OLAP (ROLAP)


Architecture: Frontend Tool -> MD Interface -> ROLAP Engine (with Meta Data) -> SQL -> Relational DB (query down, data back)

- idea: use relational data storage
- star (snowflake) schema
- E.g. Microstrategy, SAP BW

Strengths: advantages of RDBMS (scalability, reliability, security etc.); sparsity handling
Weaknesses: performance; model complexity; no write access
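
A star schema keeps one fact table in the middle and one table per dimension around it. A ROLAP engine translates multidimensional requests into SQL joins over these tables. The sketch below uses SQLite. The table names, columns and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT, prod_type TEXT);
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    revenue    REAL,
    order_qty  INTEGER
);
INSERT INTO dim_product VALUES (1, 'Tent', 'Camping'), (2, 'Lamp', 'Camping'), (3, 'Rope', 'Climbing');
INSERT INTO dim_time VALUES (1, '1999-01-15', '1999-01', 1999), (2, '1999-02-03', '1999-02', 1999);
INSERT INTO fact_sales VALUES (1, 1, 400.0, 2), (2, 1, 60.0, 5), (3, 2, 30.0, 1), (1, 2, 200.0, 1);
""")

# Revenue by product type and month: join the fact table to its dimensions.
rows = conn.execute("""
    SELECT p.prod_type, t.month, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_time t    ON t.time_id = f.time_id
    GROUP BY p.prod_type, t.month
    ORDER BY p.prod_type, t.month
""").fetchall()
print(rows)
```

Only combinations that actually occurred are stored as fact rows, so empty cells cost nothing. In a snowflake schema, `prod_type` would move into its own table referenced from `dim_product`.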

Client (Desktop) OLAP


- proprietary data structure on the client
- data stored as file
- mostly RAM based architectures
- E.g. Business Objects, Cognos PowerPlay

Strengths: mobile user; ease of installation and use
Weaknesses: data volume; no multiuser capabilities

DW Integration
Figure: the three OLAP architectures on top of the data warehouse database. MOLAP (Multidim. Database), ROLAP (ROLAP Engine) and Client OLAP each connect to the DW-DB (mostly relational).

Combining Architectures I
Drill through: Multidim. Database -> Relational Database

Multidim. Database
- highly aggregated data
- dense data
- 95% of the analysis requirements

Relational Database
- detailed data (sparse)
- 5% of the requirements

Combining Architectures II
Hybrid OLAP (HOLAP)
- equal treatment of MD and relational data
- storage type at the discretion of the administrator
- cube partitioning

Figure: HOLAP System with Meta Data, on top of Multidim. Storage and Relational Storage.

OLAP Standards
- Idea: define interface between client and server
- Benefit: component oriented architectures
- Proposal 1: OLAP Council
  - union of OLAP tool producers
  - not implemented so far (even by the council members)
- Proposal 2: Microsoft - OLE DB for OLAP (short: ODBO)
  - standardizes a data model and an MD query language (MDX)
  - specification contains lots of optional functionality
  - all major vendors committed themselves to the standard
  - will be the de facto standard

Practical Case Study


Building a Warehouse

artwork copyright Intersystems GmbH

Conceptual Design
Requirement Analysis -> Conceptual Design (implementation independent) -> Logical + Physical Design (tool specific) -> Implementation -> Using the Data Warehouse -> Requirement Analysis

The Modeling Process


- Which business process is being modeled?
- What is the subject of analysis (fact) and what is being measured?
- On what granularity level is active analysis being done?
- Which properties (dimensions) determine the measures?
- Which different levels of aggregation are meaningful?
- What additional information is needed for the different levels?
- What is the variability and the cardinality of the dimensions?

Facts
- Fact = subject of analysis (e.g. Sales)
- Measures = attributes describing facts (e.g. Quantity, Price)
- Derived measures (e.g. Profit)
- Additivity of measures
  - globally additive (e.g. Quantity)
  - additive for some dimensions (e.g. items in stock: additive with respect to plants, not additive with respect to time)
  - not additive at all (e.g. profit margin)
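
The three additivity classes can be checked with small numbers. The figures below are invented, and the margin formula (revenue minus cost, divided by revenue) is an assumption made for the sketch.

```python
# Hypothetical stock levels (items in stock) per plant and month.
stock = {
    ("Plant 1", "Jan"): 100, ("Plant 1", "Feb"): 80,
    ("Plant 2", "Jan"): 50,  ("Plant 2", "Feb"): 70,
}

# Additive with respect to plants: total stock across plants in January.
stock_jan = sum(v for (plant, month), v in stock.items() if month == "Jan")

# Not additive with respect to time: Jan + Feb counts the same items twice.
misleading_total = stock[("Plant 1", "Jan")] + stock[("Plant 1", "Feb")]
# One meaningful alternative over time is the latest value.
latest_plant1 = stock[("Plant 1", "Feb")]

# Profit margin is not additive at all: recompute it from additive measures.
sales = [{"revenue": 100.0, "cost": 50.0}, {"revenue": 300.0, "cost": 270.0}]
revenue = sum(s["revenue"] for s in sales)
cost = sum(s["cost"] for s in sales)
margin = (revenue - cost) / revenue
avg_of_margins = sum((s["revenue"] - s["cost"]) / s["revenue"] for s in sales) / len(sales)

print(stock_jan, misleading_total, latest_plant1)
print(round(margin, 3), round(avg_of_margins, 3))
```

The overall margin differs from both the sum and the plain average of the individual margins, so an OLAP tool has to derive it from aggregated revenue and cost rather than aggregate the margin itself.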


Dimensions

- Dimensions = static structure of business information
- Used for navigating the data space
- Choosing the necessary granularity
- Dimension members = instances of a dimension
  - e.g. 8.12.1997 and July 1997 are members of dimension time
- Structuring dimensions
  - using different dimension levels (hierarchies)
  - using descriptive attributes

Simple Hierarchies
Dimension levels: Month -> Quarter -> 1/2 Year Period -> Year

Month: January 99, February 99, March 99, April 99, May 99, June 99, July 99, August 99, Sept. 99, ...
Quarter: 1st Quarter 99, 2nd Quarter 99, 3rd Quarter 99, ...
1/2 Year Period: 1st Half-Year 99, 2nd Half-Year 99
Year: 1999
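
In a simple hierarchy every member has exactly one parent on the next level, so rolling up an additive measure is a chain of regroupings. A minimal sketch, assuming months are encoded as (year, month number) pairs and using invented revenue figures:

```python
from collections import defaultdict

def quarter_of(month):
    """Map (year, month 1..12) to (year, quarter 1..4)."""
    year, m = month
    return (year, (m - 1) // 3 + 1)

def half_year_of(quarter):
    """Map (year, quarter 1..4) to (year, half-year 1..2)."""
    year, q = quarter
    return (year, (q - 1) // 2 + 1)

def year_of(half_year):
    """Map (year, half-year) to the year."""
    return half_year[0]

def roll_up(values, parent_of):
    """Aggregate an additive measure to the next dimension level."""
    result = defaultdict(int)
    for member, value in values.items():
        result[parent_of(member)] += value
    return dict(result)

# Hypothetical revenue at month level.
revenue_by_month = {(1999, 1): 10, (1999, 2): 20, (1999, 4): 5, (1999, 7): 40}

by_quarter = roll_up(revenue_by_month, quarter_of)
by_half = roll_up(by_quarter, half_year_of)
by_year = roll_up(by_half, year_of)
print(by_quarter, by_half, by_year)
```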

Unbalanced Hierarchies
Dimension levels: Plant/Site -> Business Unit -> Business Division -> Enterprise

Plant/Site: Plant 1, ..., Plant 0815
Business Unit: Bu 1, Bu 2, ...
Business Division: Div A, Div B, ...
Enterprise: Great Outdoors

Alternative Hierarchies
Hierarchy 1: Customer -> Geogr. Region -> Country
Hierarchy 2: Customer -> Customer Group

Customer: Customer 01, Customer 02, Customer 03, Customer 04, Customer 05, Customer 06
Geogr. Region: Bavaria, Hessen, Hamburg
Country: Germany
Customer Group: Partner, Retailer, Consumer

Alternative Paths
Path 1: Location -> Geogr. Region -> Country
Path 2: Location -> Sales Region -> Country

Location: Munich 01, Munich 02, Munich 03, Würzburg 01, Würzburg 02, Frankfurt 01
Geogr. Region: Bavaria, Hessen, Hamburg
Sales Region: Germany (South), Germany (West), Germany (North)
Country: Germany

Criteria for a good MD Design


- dimensions should be independent
- dimensionality of a cube should be max. 7-8 dimensions
  - interpretation of results is difficult for a large number of dimensions
- hierarchies should have a fan-out of max. 30
  - long drill-down times
  - large drill-down results
  - insert additional levels for structuring purposes (e.g. insert state between city and country)
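
The fan-out rule is easy to check mechanically once a hierarchy is available as a child-to-parent mapping. A minimal sketch with a hypothetical city/country hierarchy; the limit of 30 is the value named above, everything else is invented:

```python
from collections import Counter

MAX_FAN_OUT = 30  # limit named in the design criteria above

def fan_out_violations(child_to_parent, limit=MAX_FAN_OUT):
    """Return each parent with more than `limit` children, with its child count."""
    counts = Counter(child_to_parent.values())
    return {parent: n for parent, n in counts.items() if n > limit}

# Hypothetical hierarchy: 45 cities directly below one country.
city_to_country = {f"City {i:02d}": "Germany" for i in range(45)}
before = fan_out_violations(city_to_country)

# Inserting a state level between city and country lowers the fan-out.
city_to_state = {city: f"State {i % 3}" for i, city in enumerate(city_to_country)}
state_to_country = {f"State {i}": "Germany" for i in range(3)}
after = (fan_out_violations(city_to_state), fan_out_violations(state_to_country))

print(before, after)
```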


Graphical Notation (ME/R)


Figure: elements of the ME/R notation.
- Fact-Name (Measure 1 ... Measure n): a fact and its measures
- ... is characterized by dimensions
- Level-Name (Attribute 1 ... Attribute n): a dimension level with attributes
- ... can be classified according to ...

Example Data Model


Figure: ME/R diagram of the example data model.
Measures: Revenue, Cost, Order Qty
Other labels shown: Year, Month, Day, Region, Country, Branch, Sales Rep, Name, Code, Sale, Line, Prod. Type, Product, Customer, Customer Type, Margin Range

Cognos PowerPlay - Architecture

Figure: Cognos PowerPlay architecture, with tiers labelled Client and Server.
Components shown: PowerPlay Client; PowerCube (proprietary, compressed); Transformer; PowerPlay Server; Impromptu; OLEDB for OLAP with an OLEDB provider (e.g. MS OLAP Services, SAP BW, ...)

Logical+Physical Design
Requirement Analysis -> Conceptual Design (implementation independent) -> Logical + Physical Design (tool specific) -> Implementation -> Using the Data Warehouse -> Requirement Analysis

Practical Demonstration


Summary and Conclusions


- Multidimensional modeling is performed on different levels
  - conceptual model (tool independent level) following requirement analysis
  - logical and physical design before implementation
- Distinction between two types of data
  - quantifying data: measures, cells of the cube, fact table
  - qualifying data: properties, dimensions, dimension tables
- Hierarchical structures of dimensions can be complex
- ME/R notation can be used to document conceptual models
- Several ways to map an MD model to a relational DB

Canonical Query (I)


Figure: cube with dimensions A and B, marking the parts of a canonical query: Restriction Element, Result Measures (m1, m2) and Result Granularity, which together yield the Query Result.

Canonical Query (II)


Canonical Query Definition
- Result Measures: m1, ..., mk
- Restriction Elements: r1, r2, ..., rn
- Result Granularity: g1, g2, ..., gn

    SELECT g1, ..., gn, aggr(m1), ..., aggr(mk)
    FROM FactName, Dim1, ..., Dimn
    WHERE Dim1.level(r1) = r1 AND ... AND Dimn.level(rn) = rn
      AND Dim1.d1 = FactName.d1 AND ... AND Dimn.dn = FactName.dn
    GROUP BY g1, ..., gn
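
The SQL template can be generated mechanically from the three parts of a canonical query. The sketch below is one possible reading, not the author's implementation. The helper `canonical_sql`, the schema and the data are hypothetical. As a simplification, a dimension may carry only a restriction or only a granularity level. Table and column names are interpolated into the SQL text and must therefore come from trusted metadata, while restriction values are bound as parameters.

```python
import sqlite3

def canonical_sql(fact, measures, restrictions, granularity, joins, aggr="SUM"):
    """Build the SELECT statement for a canonical query.

    measures:     measure columns of the fact table (m1..mk)
    restrictions: {qualified level column: restriction element}
    granularity:  qualified level columns to group by (g1..gn)
    joins:        {dimension table: key column shared with the fact table}
    """
    select = granularity + [f"{aggr}({fact}.{m})" for m in measures]
    where = [f"{col} = ?" for col in restrictions]
    where += [f"{dim}.{key} = {fact}.{key}" for dim, key in joins.items()]
    sql = (f"SELECT {', '.join(select)} "
           f"FROM {', '.join([fact] + list(joins))} "
           f"WHERE {' AND '.join(where)} "
           f"GROUP BY {', '.join(granularity)}")
    return sql, list(restrictions.values())

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_id INTEGER, month TEXT, year INTEGER);
CREATE TABLE product_dim (product_id INTEGER, product TEXT, prod_type TEXT);
CREATE TABLE sales (time_id INTEGER, product_id INTEGER, revenue REAL);
INSERT INTO time_dim VALUES (1, '1999-01', 1999), (2, '1999-02', 1999), (3, '2000-01', 2000);
INSERT INTO product_dim VALUES (1, 'Tent', 'Camping'), (2, 'Rope', 'Climbing');
INSERT INTO sales VALUES (1, 1, 100.0), (2, 1, 50.0), (2, 2, 30.0), (3, 1, 999.0);
""")

# Revenue per product type, restricted to the year 1999.
sql, params = canonical_sql(
    fact="sales",
    measures=["revenue"],
    restrictions={"time_dim.year": 1999},
    granularity=["product_dim.prod_type"],
    joins={"time_dim": "time_id", "product_dim": "product_id"},
)
result = sorted(conn.execute(sql, params).fetchall())
print(sql)
print(result)
```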
