Dimensional Modeling

Kimball: 6, 7
Make everything as simple as possible, but not simpler. (Albert Einstein)

Agenda
Topics covered:
Introduction to Dimensional Modeling
Designing a Dimensional Model
Designing a Physical Database
Planning for Performance
Introduction to Dimensional Modeling
Dimensional modeling divides
the world into measurements
and context.

Measurements are captured by the organization's business
processes and their supporting operational source systems.
Measurements are usually numeric values; we refer to
them as facts.
Facts are surrounded by largely textual context that is true
at the moment the fact is recorded. This context is
intuitively divided into independent logical clumps
called dimensions. Dimensions describe the "who, what,
when, where, why, and how" context of the
measurement.
An Example of Facts and Dimensions
Conformed dimensions and their benefits
Also referred to as master dimensions or common reference dimensions.
Two forms: (1) two conformed dimensions are identical, (2) one dimension is a perfect subset of another.
Benefits include:
Consistency
Integration
Reduced development time to market

Shrunken conformed dimension tables are created to describe fact tables that either naturally capture measurements at a coarser, less detailed level, or hold facts that have been aggregated to a less granular, rolled-up level for performance reasons (e.g., the product dimension is rolled up to a brand dimension).

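The product-to-brand rollup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`shrink_dimension`, `conforms`), the attribute names, and the sample rows are all hypothetical and do not come from the slides. The idea shown is that the shrunken dimension keeps a subset of the base dimension's attributes, removes duplicates, and gets its own surrogate key, so its values remain a perfect subset of the base dimension.

```python
def shrink_dimension(rows, attributes, key_name):
    """Roll a base dimension up to a coarser one.

    Keeps only `attributes`, removes duplicate combinations, and assigns
    a new surrogate key so the result can describe an aggregated fact table.
    """
    distinct = sorted({tuple(row[a] for a in attributes) for row in rows})
    return [
        {key_name: key, **dict(zip(attributes, values))}
        for key, values in enumerate(distinct, start=1)
    ]


def conforms(shrunken, base, attributes):
    """True if every attribute combination in `shrunken` also occurs in `base`."""
    base_values = {tuple(row[a] for a in attributes) for row in base}
    return all(tuple(row[a] for a in attributes) in base_values for row in shrunken)


# Hypothetical product dimension rows (invented for illustration).
product_dim = [
    {"product_key": 1, "description": "Cola 12oz", "brand": "FizzCo", "category": "Beverages"},
    {"product_key": 2, "description": "Cola 2L", "brand": "FizzCo", "category": "Beverages"},
    {"product_key": 3, "description": "Wheat Bread", "brand": "BakeWell", "category": "Bakery"},
]

brand_dim = shrink_dimension(product_dim, ("brand", "category"), "brand_key")
print(brand_dim)
print(conforms(brand_dim, product_dim, ("brand", "category")))
```

Because the brand rows are derived from the product rows rather than maintained separately, the two tables cannot drift apart in their labels, which is the consistency benefit of conformed dimensions.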
Modeling Process
Figure 6.4. Diagram of data warehouse bus with conformed dimension interfaces.

High level model bubble diagram
Four-Step Dimensional Design Process
Step 1: Choose the business process from the bus matrix
Step 2: Declare the grain, as atomic as possible
Step 3: Identify the dimensions
Step 4: Identify the facts

Step 1: Choose the biz process from the matrix
Figure 6.5. Bus matrix for manufacturing supply chain

The first dimensional model built should be the one with the most impact; it should answer the most pressing business questions and be

Avoid Common Matrix Mishaps
Matrix Row Mishap
Departmental or overly encompassing rows: distinction between departments (group of
Report-centric or too narrowly defined rows: kitchen sink syndrome
Matrix Column Mishap
Overly generalized column: check for overlaps
Separate column for each level of hierarchy

Step 2: Declare the grain, as atomic as possible
Example grain declarations include:
An individual line item on a customer's retail sales ticket as measured by a scanner device
A line item on a bill received from a doctor
An individual boarding pass to get on a flight
A daily snapshot of the inventory levels for each product in a warehouse
A monthly snapshot for each bank account

Preferably you should develop dimensional models for the most atomic information captured by a business process. Atomic data is the most detailed information collected; such data cannot be subdivided further. Don't bypass this step!

Step 3: Identify the dimensions
All dimensions in the bus matrix should be tested against the grain to see if they fit.
Scrutinize the dimensions to make sure they make sense. Consider the impacts on usability and performance of splitting a large dimension into several dimensions or combining dimensions.
A careful grain statement determines the primary dimensionality of the fact table. It is then often possible to add more dimensions to the basic grain of the fact table, where these additional dimensions naturally take on only one value under each combination of the primary dimensions. If the additional dimension violates the grain by causing additional fact rows to be generated, then the grain statement must be revised to accommodate this dimension.

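The "only one value under each combination of the primary dimensions" test can be checked mechanically on sample data. The sketch below is a minimal illustration, not a prescribed method: the function name `grain_violations`, the column names, and the sample rows are hypothetical. It groups candidate rows by the declared grain and reports any grain key for which the additional dimension takes more than one value.

```python
from collections import defaultdict


def grain_violations(rows, grain_keys, candidate):
    """Return the grain-key combinations where `candidate` has several values.

    An empty result means the candidate dimension fits the declared grain;
    a non-empty result means adding it would generate extra fact rows.
    """
    values_by_key = defaultdict(set)
    for row in rows:
        key = tuple(row[k] for k in grain_keys)
        values_by_key[key].add(row[candidate])
    return {key: vals for key, vals in values_by_key.items() if len(vals) > 1}


# Hypothetical source rows; the declared grain is one row per ticket line item.
rows = [
    {"ticket": "T100", "line": 1, "promotion": "None", "coupon": "C1"},
    {"ticket": "T100", "line": 1, "promotion": "None", "coupon": "C2"},
    {"ticket": "T100", "line": 2, "promotion": "Coupon", "coupon": "C1"},
]
grain = ("ticket", "line")

print(grain_violations(rows, grain, "promotion"))  # promotion fits the grain
print(grain_violations(rows, grain, "coupon"))     # coupon does not
```

In this invented data, `promotion` is single-valued per line item and can simply be added to the fact table, whereas `coupon` takes two values for line 1 of ticket T100, so the grain statement would have to be revised to accommodate it.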
Step 4: Identifying Facts
Make sure the facts are additive along all dimensions
The most useful facts are both numeric and additive. Rather than retrieve a single fact table row, queries typically select hundreds or thousands of fact rows.

A retail case
Imagine that we work in the headquarters of a large grocery chain. Our business has 100 grocery stores spread over a five-state area. Each of the stores has a full complement of departments, including grocery, frozen foods, dairy, meat, produce, bakery, floral, and health/beauty aids. Each store has roughly 60,000 individual products on its shelves. The individual products are called stock keeping units (SKUs). About 55,000 of the SKUs come from outside manufacturers and have bar codes imprinted on the product package. These bar codes are called universal product codes (UPCs). UPCs are at the same grain as individual SKUs. Each different package variation of a product has a separate UPC and hence is a separate SKU.

Step 1: Choose the biz process
In our retail case study, management wants to better understand customer purchases as captured by the POS system.
Thus the business process we're going to model is POS retail sales. This data will allow us to analyze what products are selling in which stores on what days under what promotional conditions.

Step 2: Declare the Grain
In our case study, the most granular data is an individual line item on a POS transaction. To ensure maximum dimensionality and flexibility, we will proceed with this grain.

A data warehouse almost always demands data expressed at the lowest possible grain of each dimension, not because queries want to see individual low-level rows, but because queries need to cut through the details in very precise ways.

Step 3: Choose Dimensions
In our case study we've decided on the following descriptive dimensions: date, product, store, and promotion. In addition, we'll include the POS transaction ticket number as a special dimension.

Step 4: Identify the Facts
The facts must be true to the grain: the individual line item on the POS transaction in this case.
The facts collected by the POS system include the sales quantity, per unit sales price, and the sales dollar amount. The sales dollar amount equals the sales quantity multiplied by the unit price. Cost dollar amount is also included.

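Putting the four steps of the retail case together, the resulting star schema might look like the sketch below. It uses Python's built-in `sqlite3` module so that it runs anywhere; all table names, column names, and sample rows are assumptions made for illustration and are not taken from the slides. The ticket number is stored directly on the fact row, which is one common way to handle a dimension that has no descriptive attributes of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim      (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, description TEXT, department TEXT);
CREATE TABLE store_dim     (store_key INTEGER PRIMARY KEY, store_name TEXT, state TEXT);
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);

-- Grain: one row per line item on a POS transaction.
CREATE TABLE pos_sales_fact (
    date_key            INTEGER REFERENCES date_dim(date_key),
    product_key         INTEGER REFERENCES product_dim(product_key),
    store_key           INTEGER REFERENCES store_dim(store_key),
    promotion_key       INTEGER REFERENCES promotion_dim(promotion_key),
    pos_ticket_number   TEXT,     -- the "special" dimension, kept on the fact row
    sales_quantity      INTEGER,
    unit_price          REAL,
    sales_dollar_amount REAL,
    cost_dollar_amount  REAL
);

-- Hypothetical sample data.
INSERT INTO date_dim VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO product_dim VALUES (1, 'Milk 1L', 'Dairy'), (2, 'Bread', 'Bakery');
INSERT INTO store_dim VALUES (1, 'Store A', 'CA'), (2, 'Store B', 'NV');
INSERT INTO promotion_dim VALUES (1, 'No promotion'), (2, 'Coupon');
INSERT INTO pos_sales_fact VALUES
    (1, 1, 1, 1, 'T100', 2, 1.50, 3.00, 2.00),
    (1, 2, 1, 1, 'T100', 1, 2.00, 2.00, 1.20),
    (2, 1, 2, 2, 'T200', 4, 1.25, 5.00, 4.00),
    (2, 1, 1, 1, 'T300', 1, 1.50, 1.50, 1.00);
""")

# "What products are selling in which stores": roll the atomic rows up
# by department and store.
rows = conn.execute("""
    SELECT p.department, s.store_name,
           SUM(f.sales_quantity), SUM(f.sales_dollar_amount)
    FROM pos_sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN store_dim   AS s ON s.store_key   = f.store_key
    GROUP BY p.department, s.store_name
    ORDER BY p.department, s.store_name
""").fetchall()

for row in rows:
    print(row)
```

Because the fact table is kept at the line-item grain, the same rows could equally be grouped by date, promotion, or ticket number without changing the schema.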
Step 4: Identify the Facts
Three of the facts, sales quantity, sales dollar amount, and cost dollar amount, are beautifully additive across all the dimensions.
We can compute the gross profit by subtracting the cost dollar amount from the sales dollar amount, or revenue. Although computed, this gross profit is also perfectly additive across all the dimensions.
The gross margin can be calculated by dividing the gross profit by the dollar revenue. Gross margin is a nonadditive fact because it can't be summarized along any dimension.
Unit price is also a nonadditive fact. Attempting to sum up unit price across any of the dimensions results in a meaningless, nonsensical number.

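A small worked example makes the difference between additive and nonadditive facts concrete. The line items below are invented for illustration, and the variable names are assumptions. The sketch shows that gross profit can be summed row by row, while gross margin and unit price have to be recomputed from the additive totals (a ratio of sums, not a sum of ratios).

```python
# Hypothetical POS line items (values invented for illustration):
# (sales_quantity, unit_price, sales_dollar_amount, cost_dollar_amount)
line_items = [
    (2, 1.50, 3.00, 2.00),
    (1, 2.00, 2.00, 1.00),
    (4, 1.25, 5.00, 4.50),
]

# Additive facts: plain sums are meaningful.
total_quantity = sum(qty for qty, _, _, _ in line_items)
total_sales = sum(sales for _, _, sales, _ in line_items)
total_cost = sum(cost for _, _, _, cost in line_items)

# Gross profit is computed per row, yet it still sums correctly.
total_gross_profit = sum(sales - cost for _, _, sales, cost in line_items)

# Gross margin: divide the summed profit by the summed revenue.
gross_margin = total_gross_profit / total_sales

# Summing the per-row margins instead gives a number with no business meaning.
sum_of_row_margins = sum((sales - cost) / sales for _, _, sales, cost in line_items)

# Unit price: the sum is meaningless; an average price comes from the totals.
sum_of_unit_prices = sum(price for _, price, _, _ in line_items)
average_unit_price = total_sales / total_quantity

print("gross profit:", total_gross_profit)
print("gross margin (ratio of sums):", gross_margin)
print("sum of row margins (not meaningful):", sum_of_row_margins)
print("sum of unit prices (not meaningful):", sum_of_unit_prices)
print("average unit price:", average_unit_price)
```

In a reporting tool this usually means storing only the additive components in the fact table and computing ratios such as gross margin after aggregation.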
Physical Design for Project 2

CREATE TABLE DEPARTMENT
(
  DEPT_ID   NUMBER       NOT NULL,
  DEPT_NAME VARCHAR2(40) NOT NULL
);

-- Primary key is added as a named constraint.
ALTER TABLE DEPARTMENT ADD CONSTRAINT DEPT_UID PRIMARY KEY (DEPT_ID);

CREATE TABLE EMPLOYEE
(
  EMP_ID         NUMBER       NOT NULL,
  DEPT_ID        NUMBER       NOT NULL,
  EMP_SSN        CHAR(9)      NOT NULL,
  EMP_FIRST_NAME VARCHAR2(20) NOT NULL,
  EMP_LAST_NAME  VARCHAR2(30) NOT NULL,
  EMP_BIRTH_DATE DATE         NOT NULL,
  EMP_GENDER     CHAR(1)      NOT NULL,
  EMP_HIRE_DATE  DATE         NOT NULL,
  EMP_STREET     VARCHAR2(80),
  EMP_CITY       VARCHAR2(80),
  EMP_STATE      CHAR(2),
  EMP_ZIP        CHAR(5),
  EMP_TYPE       VARCHAR2(1)
    CONSTRAINT ValidValuesEMP_TYPE CHECK (EMP_TYPE IN ('E','N'))
);

-- Primary key.
ALTER TABLE EMPLOYEE ADD CONSTRAINT EMP_UID1 PRIMARY KEY (EMP_ID, DEPT_ID);
-- Alternate (unique) keys.
ALTER TABLE EMPLOYEE ADD CONSTRAINT EMP_UID2 UNIQUE (EMP_SSN);
ALTER TABLE EMPLOYEE ADD CONSTRAINT EMP_UID3 UNIQUE
  (EMP_FIRST_NAME, EMP_LAST_NAME, EMP_BIRTH_DATE, EMP_GENDER);

Database Systems, 8th Edition
Triggers: Maintain PK unique across subtypes
Procedural SQL code automatically invoked by
RDBMS on data manipulation event
Trigger definition:
Triggering timing: BEFORE or AFTER
Triggering event: INSERT, UPDATE, DELETE
Triggering level:
Statement-level trigger
Row-level trigger
Triggering action
DROP TRIGGER trigger_name
The PL/SQL Code for Two Subtypes
CREATE OR REPLACE
TRIGGER non_exempt_employee_check
BEFORE INSERT OR UPDATE OF emp_id
ON non_exempt_employee
FOR EACH ROW
DECLARE
  dummy INTEGER := 0;
BEGIN
  -- :old.emp_id is NULL on INSERT, so the <> comparison alone is never
  -- true for new rows; INSERTING is therefore tested on its own.
  IF (INSERTING OR :new.emp_id <> :old.emp_id) THEN
    -- Reject the row if this emp_id already exists in the other subtype.
    SELECT COUNT(*)
      INTO dummy
      FROM exempt_employee
     WHERE emp_id = :new.emp_id;
    IF (dummy <> 0) THEN
      RAISE DUP_VAL_ON_INDEX;
    END IF;
  END IF;
END;
/

The PL/SQL Code for Three Subtypes
CREATE OR REPLACE
TRIGGER exempt_employee_check
BEFORE INSERT OR UPDATE OF emp_id
ON exempt_employee
FOR EACH ROW
DECLARE
  dummy INTEGER := 0;
BEGIN
  -- :old.emp_id is NULL on INSERT, so the <> comparison alone is never
  -- true for new rows; INSERTING is therefore tested on its own.
  IF (INSERTING OR :new.emp_id <> :old.emp_id) THEN
    -- Check the second subtype.
    SELECT COUNT(*)
      INTO dummy
      FROM non_exempt_employee
     WHERE emp_id = :new.emp_id;
    IF (dummy <> 0) THEN
      RAISE DUP_VAL_ON_INDEX;
    END IF;
    -- Check the third subtype as well; <third_subtype> is a placeholder
    -- for that subtype's table name.
    SELECT COUNT(*)
      INTO dummy
      FROM <third_subtype>
     WHERE emp_id = :new.emp_id;
    IF (dummy <> 0) THEN
      RAISE DUP_VAL_ON_INDEX;
    END IF;
  END IF;
END;
/

Project 3: Data Warehouse Design and Development
Choose a real-world case and design a mini data warehouse
Design the dimensional model; include at least 4 main dimensions
Warehouse software
Prepare the reports and queries based on the business requirements
Present to the class (15-20 minutes).
For reference: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition, by Ralph Kimball and Margy Ross, John Wiley & Sons, 2002.
This book contains many examples of dimensional modeling from different industries.