
Data Warehouse Concepts & MicroStrategy Architecture

Objectives
What is Data Warehousing
The evolution of Data Warehousing
Need for Data Warehousing
OLTP Vs Warehouse Applications
Data marts Vs Data Warehouses
Operational Data Stores
Overview of Warehouse Architecture
MicroStrategy Architecture
MicroStrategy Components

What is a Data Warehouse?

A data warehouse is a subject-oriented, integrated, nonvolatile,
time-variant collection of data in support of management's decisions.
- WH Inmon

Subject-oriented: Can I see the credit report from Accounts, sales from
Marketing and the open order report from Order Entry for this customer?
Integrated: Data from multiple sources is integrated for a subject.
Nonvolatile: Identical queries will give the same results at different
times. Supports analysis requiring historical data.
Time-variant: Data is stored for a historical period. Data is populated
in the data warehouse on a daily/weekly basis, depending upon the
requirement.

WH Inmon - Regarded As Father Of Data Warehousing

Overview
The term data warehouse refers to a database that contains
very large stores of historical data.
The data is stored as a series of snapshots, in
which each record represents data at a specific
time. These snapshots allow a user to
reconstruct history and to make accurate
comparisons between different time periods.
Data retrieved from source systems is integrated
and transformed before it is loaded into the
warehouse.
A primary advantage of a data warehouse is that
it provides easy access to and analysis of vast
stores of information.
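
As a rough illustration of the snapshot idea, the Python sketch below keeps one row per account per snapshot date in an in-memory SQLite table and compares two periods. The table name `balance_snapshot`, its columns and the figures are invented for this example.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE balance_snapshot (
        snapshot_date TEXT NOT NULL,     -- the time element of the key
        account_id    INTEGER NOT NULL,
        balance       REAL NOT NULL,
        PRIMARY KEY (snapshot_date, account_id)
    )
    """
)

# Each load appends a new snapshot; earlier rows are never updated.
rows = [
    ("2023-12-31", 1, 100.0),
    ("2023-12-31", 2, 250.0),
    ("2024-12-31", 1, 180.0),
    ("2024-12-31", 2, 200.0),
]
conn.executemany("INSERT INTO balance_snapshot VALUES (?, ?, ?)", rows)


def total_balance(as_of: str) -> float:
    """Return the total balance recorded in the snapshot taken on `as_of`."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(balance), 0) FROM balance_snapshot "
        "WHERE snapshot_date = ?",
        (as_of,),
    )
    return cur.fetchone()[0]


# Compare two periods by reading two snapshots.
change = total_balance("2024-12-31") - total_balance("2023-12-31")
print(f"Change in total balance: {change:+.2f}")
```

Because the snapshot date is part of the key, history is rebuilt by selecting a date rather than by undoing updates.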

Definition

Subject Oriented: A DW is organized around subjects, as
opposed to legacy application systems, which are organized
around processes. Subjects in a warehouse are items
such as customers, employees and products.
Integrated: The data within the warehouse is integrated in
that the final product is a fusion of information from various
legacy systems into a cohesive set. It is critical
to use consistent naming conventions, variable
measurement standards, encoding structures and data
distribution characteristics.
Time Variant: Data in the DW is accurate as of some date and
time. An indication of time is generally included in each row
of the database to give the warehouse its time-variant characteristic.
Non Volatile: The warehouse data is nonvolatile in that data
which enters the database is rarely, if ever, changed (only
when accuracy problems occur).
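
The "Integrated" and "Time Variant" points can be sketched in a few lines of Python. Everything here is hypothetical: the two source systems (`system_a`, `system_b`), their field names and their gender and currency encodings are made up to show what consistent naming, units and codes might look like after integration.

```python
from datetime import date

# Hypothetical source encodings: system A uses "m"/"f", system B uses 1/0.
GENDER_CODES = {"m": "M", "f": "F", "1": "M", "0": "F"}


def integrate(record: dict, source: str, load_date: date) -> dict:
    """Map one legacy record to the warehouse's naming, units and codes."""
    if source == "system_a":
        customer_id = record["cust_no"]
        gender = record["sex"]
        balance_usd = record["balance_usd"]
    elif source == "system_b":
        customer_id = record["customer"]
        gender = record["gender_flag"]
        balance_usd = record["balance_cents"] / 100  # convert to one unit
    else:
        raise ValueError(f"unknown source: {source}")
    return {
        "customer_id": int(customer_id),
        "gender": GENDER_CODES[str(gender).lower()],
        "balance_usd": float(balance_usd),
        "load_date": load_date.isoformat(),  # time variant: every row is dated
    }


row_a = integrate(
    {"cust_no": "17", "sex": "f", "balance_usd": 120.5},
    "system_a",
    date(2024, 1, 31),
)
row_b = integrate(
    {"customer": 42, "gender_flag": 1, "balance_cents": 9900},
    "system_b",
    date(2024, 1, 31),
)
print(row_a)
print(row_b)
```

Both rows end up with the same column names, the same gender codes and the same currency unit, plus a load date that records when the values were accurate.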

Subject-Oriented - Characteristics of a Data Warehouse

[Figure: Operational systems compared with the Data Warehouse. Labels
shown: Leads, Prospects, Customers, Products, Quotes, Orders, Regions,
Time.]

Focus is on Subject Areas rather than Applications

[Figure: example warehouse architecture.
Stages: Extract, Transform, Load, Aggregate, Publish, Query.
Sources: Marketing, Sales, HR, Finance, General Ledger, Legacy (nightly
loads, history).
Source data: Customer Invoices, Rebate Accruals, Inventory Adjustments,
Manual Journal Entries, System-Generated Journal Entries, AP Payments.
Warehouse: Gross Margin Transactions and Gross Margin Metrics, with
Product, Location, Sales Person and Time.
Metadata Repository: directory of available data; definitions and
business rules.
Delivery through a security layer: Narrowcast Server (emailed Excel
reports) and MicroStrategy Web Reporting (web ad hoc).
Users: Senior Management, Finance, Supply, Marketing, Sales Directors,
others by approval; District Managers, Account Managers, Location
Managers, other users.]

Non-volatile - Characteristics of a Data Warehouse

Operational: insert, change, delete, replace
Data Warehouse: load, read-only access

Integrated View Is The Essence Of A Data Warehouse

Time Variant - Characteristics of a Data Warehouse

Operational
Current value data
Time horizon: 60-90 days
Key may not have an element of time

Data Warehouse
Snapshot data
Time horizon: 5-10 years
Key has an element of time
Data warehouse stores historical data

Data Warehouse Typically Spans Across Time

Alternate Definitions
A Data Warehouse is a repository of data
summarized or aggregated in
simplified form from operational
systems. End-user-oriented data
access and reporting tools let users
get at the data for decision support.
- Babcock

Evolution of Data Warehousing
1960 - 1985: MIS Era

Unfriendly
Slow
Dependent on IS programmers
Inflexible
Analysis limited to defined reports
Focus on Reporting


Evolution of Data Warehousing
1985 - 1990: Querying Era
Ad hoc, unstructured access to corporate data
SQL as interface not scalable
Cannot handle complex analysis

Focus on Online Querying


Evolution of Data Warehousing
1990 - 20xx: Analysis Era

Trend Analysis
What If ?
Moving Averages
Cross Dimensional Comparisons
Statistical profiles
Automated pattern and rule discovery
Focus on Online Analysis


Need for Data Warehousing


Better business intelligence for end-users
Reduction in time to locate, access, and analyze
information
Consolidation of disparate information sources
Strategic advantage over competitors
Faster time-to-market for products and services
Replacement of older, less-responsive decision
support systems
Reduction in demand on IS to generate reports


OLTP Vs Warehouse
Operational System | Data Warehouse
Transaction Processing | Query Processing
Time Sensitive | History Oriented
Operator View | Managerial View
Organized by transactions (Order, Input, Inventory) | Organized by subject (Customer, Product)
Relatively smaller database | Large database size
Many concurrent users | Relatively few concurrent users
Volatile Data | Non Volatile Data
Stores all data | Stores relevant data
Not Flexible | Flexible

Do we need a separate database?
OLTP and data warehousing require two very
differently configured systems
Isolation of Production System from Business
Intelligence System
Significant and highly variable resource
demands of the data warehouse
Cost of disk space no longer a concern
Production systems not designed for query
processing

Data Marts
Subject or Application Oriented
Business View of Warehouse
Quick Solution to a specific Business
Problem
Finance, Manufacturing, Sales etc.
Smaller amount of data used for
Analytic Processing

A Logical Subset of The Complete Data Warehouse


Data Warehouses or Data Marts

For companies interested in changing their corporate
cultures or integrating separate departments, an
enterprise-wide approach makes sense.
Companies that want a quick solution to a specific
business problem are better served by a standalone
data mart.
Some companies opt to build a warehouse
incrementally, data mart by data mart.


Data Warehouse and Data Mart

Aspect | Data Warehouse | Data Marts
Scope | Application neutral; centralized, shared; cross LOB/enterprise | Specific application requirement; LOB, department; business process oriented
Data Perspective | Historical detailed data; some summary | Detailed (some history); summarized
Subjects | Multiple subject areas | Single partial subject; multiple partial subjects
Data Sources | Many; operational/external data | Few; operational, external data
Implementation Time Frame | 9-18 months for first stage; multiple stage implementation | 4-12 months
Characteristics | Flexible, extensible; durable/strategic; data orientation | Restrictive, non-extensible; short life/tactical; project orientation

Warehouse or Mart First?

Data Warehouse First | Data Mart First
Expensive | Relatively cheap
Long development cycle | Delivered in < 6 months
Change management is difficult | Easy to manage change
Difficult to obtain continuous corporate support | Can lead to independent and incompatible marts
Technical challenges in building large databases | Cleansing, transformation, modeling techniques may be incompatible

OLTP Systems Vs Data Warehouse

Remember
Between OLTP and Data Warehouse systems:
users are different
data content is different
data structures are different
hardware is different

Understanding The Differences Is The Key


Operational Data Store Definition

[Figure: ODS and Data Warehouse, with labels A, B, C, Operational and
DSS.]


Operational Data Store - Definition

A subject oriented, integrated, volatile, current valued data store
containing only corporate detailed data.

Subject oriented: Can I see the credit report from Accounts, sales from
Marketing and the open order report from Order Entry for this customer?
Integrated: Data from multiple sources is integrated for a subject.
Volatile: Identical queries may give different results at different
times. Supports analysis requiring current data.
Current valued: Data is stored only for the current period. Old data is
either archived or moved to the Data Warehouse.


Different kinds of Information Needs

Current (OLTP): Is this medicine available in stock?
Recent (ODS): What are the tests this patient has completed so far?
Historical (Data Warehouse): Has the incidence of Tuberculosis increased
in the last 5 years in the Southern region?

Schema Types
In designing data models for data warehouses / data marts, the
most commonly used schema types are the Star Schema and the
Snowflake Schema.
Star Schema: In the star schema design, a single object (the fact
table) sits in the middle and is radially connected to the
surrounding objects (dimension lookup tables) like a star. A star
schema can be simple or complex. A simple star consists of one
fact table; a complex star can have more than one fact table.
Snowflake Schema: The snowflake schema is an extension of
the star schema, where each point of the star explodes into more
points. The main advantage of the snowflake schema is reduced
disk storage, because repeated attribute values move into smaller
lookup tables; queries that touch only those small lookup tables
can also run faster. The main disadvantage is the additional
maintenance effort needed due to the increased number of lookup
tables, along with the extra joins that queries against the fact
table require.
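
A minimal star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_location`, `dim_time`) and the sample rows are assumptions made for illustration, not taken from any particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension lookup tables (the points of the star).
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table in the middle, with one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id   INTEGER REFERENCES dim_product(product_id),
        location_id  INTEGER REFERENCES dim_location(location_id),
        time_id      INTEGER REFERENCES dim_time(time_id),
        sales_dollar REAL
    );

    INSERT INTO dim_product  VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
    INSERT INTO dim_location VALUES (1, 'Austin', 'TX'), (2, 'Boston', 'MA');
    INSERT INTO dim_time     VALUES (1, 2023, 12), (2, 2024, 1);
    INSERT INTO fact_sales   VALUES (1, 1, 1, 100.0), (1, 2, 2, 150.0), (2, 1, 2, 75.0);
    """
)

# Every dimension is a single join away from the fact table.
sales_by_state = conn.execute(
    """
    SELECT l.state, SUM(f.sales_dollar)
    FROM fact_sales AS f
    JOIN dim_location AS l ON l.location_id = f.location_id
    GROUP BY l.state
    ORDER BY l.state
    """
).fetchall()
print(sales_by_state)
```

In this layout the category sits directly on `dim_product`; a snowflake design would move it to its own lookup table.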


Example of Star Schema

In the example figure, the sales fact table is connected to the dimensions
location, product, time and organization. Data can be sliced across all
dimensions and can also be aggregated across multiple dimensions.
"Sales Dollar" in the sales fact table can be calculated across all
dimensions independently or in combination, for example:

Sales Dollar value for a particular product
Sales Dollar value for a product in a location
Sales Dollar value for a product in a year within a location
Sales Dollar value for a product in a year within a location, sold or
serviced by an employee
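
The four Sales Dollar combinations can be mimicked with a small Python sketch. The fact rows, the dimension values and the helper `sales_dollar` are hypothetical; a real warehouse would run the equivalent filters and sums in SQL against the fact and dimension tables.

```python
# Hypothetical fact rows, already resolved to readable dimension values.
FACT_SALES = [
    {"product": "Widget", "location": "Austin", "year": 2023, "employee": "Ana", "sales_dollar": 100.0},
    {"product": "Widget", "location": "Austin", "year": 2024, "employee": "Ana", "sales_dollar": 60.0},
    {"product": "Widget", "location": "Austin", "year": 2024, "employee": "Raj", "sales_dollar": 40.0},
    {"product": "Widget", "location": "Boston", "year": 2024, "employee": "Raj", "sales_dollar": 150.0},
    {"product": "Gadget", "location": "Austin", "year": 2024, "employee": "Ana", "sales_dollar": 75.0},
]


def sales_dollar(**filters) -> float:
    """Sum Sales Dollar over the fact rows that match every given dimension value."""
    return sum(
        row["sales_dollar"]
        for row in FACT_SALES
        if all(row[dim] == value for dim, value in filters.items())
    )


# One call per combination, each adding one more dimension.
by_product = sales_dollar(product="Widget")
by_product_location = sales_dollar(product="Widget", location="Austin")
by_product_location_year = sales_dollar(product="Widget", location="Austin", year=2024)
by_all = sales_dollar(product="Widget", location="Austin", year=2024, employee="Ana")

print(by_product, by_product_location, by_product_location_year, by_all)
```

Each added dimension narrows the set of fact rows that contribute to the total.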


Snowflake Schema
A snowflake schema is a star schema structure
normalized through the use of outrigger
tables, i.e. dimension table hierarchies are
broken into simpler tables. The star schema
example had 4 dimensions (location, product,
time, organization) and a fact table (sales).
In the snowflake schema, the example
diagram has 4 dimension tables, 4 lookup
tables and 1 fact table. The reason is that
the hierarchies (category, branch, state, and
month) are broken out of the dimension tables
(PRODUCT, ORGANIZATION, LOCATION,
and TIME) respectively and shown
separately. In OLAP, the snowflake
approach increases the number of joins and
degrades data retrieval performance. Some
organizations normalize the dimension
tables to save space. Because dimension
tables take up relatively little space, the
snowflake schema approach may be avoided.
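
The extra join that snowflaking introduces can be seen in a short `sqlite3` sketch. Only the PRODUCT dimension and its category hierarchy are modeled, and the names `lkp_category`, `dim_product` and `fact_sales`, as well as the sample rows, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup (outrigger) table split out of the PRODUCT dimension.
    CREATE TABLE lkp_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES lkp_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id   INTEGER REFERENCES dim_product(product_id),
        sales_dollar REAL
    );

    INSERT INTO lkp_category VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO dim_product  VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'App', 2);
    INSERT INTO fact_sales   VALUES (1, 100.0), (2, 75.0), (3, 40.0), (1, 25.0);
    """
)

# Reaching the category from the fact table takes two joins here;
# with the category stored on dim_product it would take one.
sales_by_category = conn.execute(
    """
    SELECT c.category_name, SUM(f.sales_dollar)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN lkp_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
print(sales_by_category)
```

The category name is stored once per category instead of once per product, which is the space saving being traded against the additional join.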

MicroStrategy Architecture


Three-tier Architectural Overview

[Figure: Desktop clients connect over TCP/IP to the Intelligence
Server, which connects over ODBC to the Metadata and the Warehouse.]

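Four-tier Architectural Overview

[Figure: Browsers connect over HTTP to the Web Server running
MicroStrategy Web, which connects over TCP/IP to the Intelligence
Server. Desktop clients also connect to the Intelligence Server.
The Intelligence Server connects over ODBC to the Metadata and the
Warehouse.]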

Components


MicroStrategy Intelligence Server


Major Functions:
Central point for all communication between the metadata, the
warehouse and the clients
Handles client requests for objects
Handles database connections
Applies security to all incoming requests
Object/Element/Report caching
Includes the SQL Engine
Contains an Analytical Engine with over 150 different mathematical
and statistical functions, so it can also handle some processing
itself
All other products in the MicroStrategy platform work in conjunction
with the Intelligence Server.
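
The caching role of a middle tier can be pictured with a generic Python sketch. This is not MicroStrategy code and does not use any MicroStrategy API: the class `ReportCache`, its method `run_report` and the `sales` table are hypothetical, and the sketch only shows the general idea of answering a repeated report request from memory instead of querying the warehouse again.

```python
import sqlite3
from typing import Dict, List, Tuple


class ReportCache:
    """Hypothetical middle tier: runs report SQL once, then serves repeats from memory."""

    def __init__(self, warehouse: sqlite3.Connection) -> None:
        self.warehouse = warehouse
        self.cache: Dict[str, List[Tuple]] = {}
        self.warehouse_hits = 0  # how many times the warehouse was actually queried

    def run_report(self, sql: str) -> List[Tuple]:
        """Return the report rows, querying the warehouse only on a cache miss."""
        if sql not in self.cache:
            self.warehouse_hits += 1
            self.cache[sql] = self.warehouse.execute(sql).fetchall()
        return self.cache[sql]


# Stand-in warehouse with a tiny sales table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 10.0), ("South", 5.0), ("North", 2.5)],
)

server = ReportCache(warehouse)
report_sql = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
first = server.run_report(report_sql)   # goes to the warehouse
second = server.run_report(report_sql)  # served from the cache
print(first, server.warehouse_hits)
```

A production cache would also need invalidation after each warehouse load and would have to respect per-user security, both of which this sketch leaves out.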

MicroStrategy Architect
Models applications using an intuitive graphical interface. It provides
an environment for creating and maintaining BI applications.
The Project Designer is responsible for the design, implementation, and
creation of projects.

MicroStrategy Web
Deploys reports and related objects to a large number of users via the
web.
It provides easy large-scale deployment to many users without
having to install a product on each user's machine.
Pure HTML thin web client that is easily customizable using the
SDK.
All the major tasks are handled by the Intelligence Server; the web
server handles HTTP requests from users and returns the requested data.

MicroStrategy Office
Users can run, edit and format any MicroStrategy report directly
from within Microsoft Office applications such as Excel, PowerPoint
and Word.
It is built on Microsoft .NET technology and accesses the
MicroStrategy business intelligence platform using XML and Web
services.

MicroStrategy Narrowcast Server


Proactively distributes personalized information to report customers
through a variety of devices, including mobile phones, PDAs, e-mail,
Web pages, and pagers.
Distribution of personalized messages is triggered according to
predefined schedules and exception criteria.
MicroStrategy Narrowcast Server also provides a self-subscription
portal.