
Data Warehousing In Retail Sales


3/15/12

Agenda
Goals of Data Warehouse
Components of Data Warehouse
Dimensional Modeling
Case Study: Retail Business
Designing the Dimensional Model
Dimensional Table Attributes
o Date Dimension
o Product Dimension
o Store Dimension

Goals of Data Warehouse

The data warehouse must make an organization's information easily accessible

The data warehouse must present the organization's information consistently

The data warehouse must be adaptive and resilient to change

The data warehouse must be a secure bastion that protects our information assets

The data warehouse must serve as the foundation for improved decision making

Components of Data Warehouse


Dimensional Modeling
Fact Table

Daily Sales Fact Table
o Date Key (FK)
o Product Key (FK)
o Store Key (FK)
o Quantity Sold
o Dollar Sales Amount

Stores business performance measurements

Mostly numeric and additive

Expresses the many-to-many relationships between dimensions

Dimension Table

Product Dimension Table
o Product Key (PK)
o Product Description
o SKU Number (Natural Key)
o Brand Description
o Category Description
o Department Description
o Package Type Description
o Package Size
o ... and many more

Holds discretely valued descriptions that are more or less constant and participate in constraints

Implements the user interface to the data warehouse

Bringing together facts and dimensions


Star Join Schema

Simplicity and symmetry
Highly recognizable to business users
High performance benefits

Example: A simple report
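
A simple report against a star join schema can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names, and sample rows are hypothetical and only loosely follow the Daily Sales Fact Table and Product Dimension Table listed earlier.

```python
import sqlite3

# Hypothetical star schema: one fact table joined to three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT, day_of_week TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY,
                          product_description TEXT, brand_description TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT, store_district TEXT);
CREATE TABLE daily_sales_fact (
    date_key INTEGER REFERENCES date_dim(date_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key INTEGER REFERENCES store_dim(store_key),
    quantity_sold INTEGER,
    dollar_sales_amount REAL
);
-- Made-up sample rows, for illustration only.
INSERT INTO date_dim VALUES (1, '2012-03-14', 'Wednesday'), (2, '2012-03-15', 'Thursday');
INSERT INTO product_dim VALUES (1, 'Sliced Bread', 'Baked Well'), (2, 'Frozen Peas', 'Icy');
INSERT INTO store_dim VALUES (1, 'Store A', 'North'), (2, 'Store B', 'South');
INSERT INTO daily_sales_fact VALUES
    (1, 1, 1, 10, 25.0), (1, 2, 1, 4, 8.0),
    (2, 1, 1, 6, 15.0), (2, 1, 2, 3, 7.5), (2, 2, 2, 5, 10.0);
""")

# Constrain on one dimension (date), group by attributes of the other two,
# and sum the additive facts.
report = conn.execute("""
SELECT s.store_district, p.brand_description,
       SUM(f.dollar_sales_amount) AS dollars, SUM(f.quantity_sold) AS units
FROM daily_sales_fact f
JOIN date_dim d ON d.date_key = f.date_key
JOIN product_dim p ON p.product_key = f.product_key
JOIN store_dim s ON s.store_key = f.store_key
WHERE d.full_date = '2012-03-15'
GROUP BY s.store_district, p.brand_description
ORDER BY s.store_district, p.brand_description
""").fetchall()

for row in report:
    print(row)
```

Every dimension joins to the fact table in the same way, which is the symmetry the star join schema is credited with.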


Case Study: Retail Business


A large grocery chain


Retail Business
100 grocery stores spread over a five-state area

Each store has a full complement of departments, including grocery, frozen foods, dairy, meat and health/beauty aids

Each store has roughly 60,000 SKUs on its shelves

About 55,000 of the SKUs come from outside manufacturers and have Universal Product Codes (UPCs) imprinted on the product package

The remaining 5,000 SKUs come from

Retail Business
Data collection happens at
o Cash registers (POS systems)
o Back door where vendors make deliveries

Key inputs to the dimensional modeling

Designing the Dimensional Model


Step 1. Select the Business Process
o Aim: Management wants to better understand customer purchases as captured by the POS system
o The business process is POS retail sales

Step 2. Declare the Grain
o Granularity: atomic data
o Provides maximum flexibility
o Can support all possibilities of user requests
o The most granular data is an individual line item on a POS transaction

Step 3. Choose the Dimensions
o Date, product, and store dimensions

Designing the Dimensional Model


Measured facts in the retail sales schema


Dimensional Table Attributes


Date Dimension
Product Dimension
Store Dimension
Promotion Dimension
Promotion Coverage Factless Fact Table
Degenerate Transaction Number

Date Dimension
It is present in every data mart, as a data mart is a time series

Unlike other dimension tables, the date dimension can be built in advance

Attributes of date dimension:
o Day Number
o Month Number
o Holiday Indicator
o Weekday Indicator
o Selling Season

Date Dimension
o Date Key (PK)
o Date Description
o Full Date
o Day of Week
o Day Number in Epoch
o Week Number in Epoch
o Month Number in Epoch
o Day Number in Calendar Month
o Day Number in Calendar Year
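
Because every calendar day is known ahead of time, the rows of a date dimension can be generated before any sales data arrives. The sketch below is one possible way to do that in Python. The function name, the dictionary keys, the choice of the first loaded day as the start of the epoch, and the holiday set are all assumptions made for illustration.

```python
from datetime import date, timedelta

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

def build_date_dimension(start, end, holidays=frozenset()):
    """Return one row (a dict) per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": len(rows) + 1,  # sequential surrogate key
            "full_date": day.isoformat(),
            "day_of_week": DAY_NAMES[day.weekday()],
            # Assumption: the epoch starts on the first day that is loaded.
            "day_number_in_epoch": (day - start).days + 1,
            "day_number_in_calendar_month": day.day,
            "day_number_in_calendar_year": day.timetuple().tm_yday,
            "month_number": day.month,
            "holiday_indicator": "Holiday" if day in holidays else "Non-Holiday",
            "weekday_indicator": "Weekday" if day.weekday() < 5 else "Weekend",
        })
        day += timedelta(days=1)
    return rows

# Hypothetical usage: one year of rows with a single illustrative holiday.
rows = build_date_dimension(date(2012, 1, 1), date(2012, 12, 31),
                            holidays={date(2012, 1, 1)})
print(len(rows), rows[74])
```

Attributes such as selling season or fiscal period would come from the business calendar rather than from date arithmetic, so they are left out of this sketch.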

Date Dimension contd..


Why is an explicit date dimension table needed, when a SQL query can constrain directly on the fact table date key if that key is a date-type field?

o Usability: the business user is not versed in SQL date semantics, so he or she would be unable to directly leverage the inherent capabilities associated with a date data type
o SQL date functions do not support filtering by attributes such as weekdays versus weekends, holidays, fiscal periods, seasons, or major events

Presuming that the business needs to slice data by

Product Dimension
The product dimension describes every SKU in the grocery store

The product dimension is almost always sourced from the operational product master file

Most retailers administer their product master files at headquarters and download a subset of the file to each store's POS system at frequent intervals

The product master holds many descriptive attributes of each SKU

The merchandise hierarchy is an important

Product Dimension contd..


Some attributes in the product dimension table, such as package type (Bottle, Bag, Box), are not part of the merchandise hierarchy; constraints on them can be combined with a constraint on a merchandise hierarchy attribute

The product dimension is one of the two or three primary dimensions in nearly every data mart

A robust and complete set of dimension attributes translates into user capabilities for robust and complete analysis

Department Description   Brand Description   Sales Dollar Amount   Sales Quantity
Bakery                   Baked Well          $3,009                1,138
Bakery                   Fluffy              $3,024                1,476
Bakery                   Light               $6,298                2,474
Frozen Foods             QuickFreeze         $5,321                2,640
Frozen Foods             Freshlike           $10,476               5,234
Frozen Foods             Frigid              $7,328                3,092
Frozen Foods             Icy                 $2,184                1,437
Frozen Foods             QuickFreeze         $6,467                3,162
Frozen Foods             Freshlike           $10,476               5,234
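
A roll-up by department and brand, restricted by an attribute outside the merchandise hierarchy, might look like the sketch below. It uses Python's sqlite3 module with made-up rows; the table and column names are hypothetical, and the figures are not those of the report above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    department_description TEXT,    -- merchandise hierarchy attribute
    brand_description TEXT,         -- merchandise hierarchy attribute
    package_type_description TEXT   -- not part of the hierarchy
);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    quantity_sold INTEGER,
    dollar_sales_amount REAL
);
-- Made-up sample rows, for illustration only.
INSERT INTO product_dim VALUES
    (1, 'Bakery', 'Baked Well', 'Bag'),
    (2, 'Bakery', 'Fluffy', 'Box'),
    (3, 'Frozen Foods', 'Icy', 'Bag'),
    (4, 'Frozen Foods', 'Frigid', 'Box');
INSERT INTO sales_fact VALUES
    (1, 2, 5.0), (1, 1, 2.5), (2, 4, 12.0), (3, 3, 9.0), (4, 1, 4.0);
""")

# Constrain on package type while grouping by department and brand.
rollup = conn.execute("""
SELECT p.department_description, p.brand_description,
       SUM(f.dollar_sales_amount), SUM(f.quantity_sold)
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
WHERE p.package_type_description = 'Bag'
GROUP BY p.department_description, p.brand_description
ORDER BY p.department_description, p.brand_description
""").fetchall()

for row in rollup:
    print(row)
```

Adding a further condition such as a department filter to the WHERE clause would combine the package type constraint with a hierarchy constraint in the same query.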

Store Dimension
The store dimension describes every store in our grocery chain

The store dimension is the primary geographic dimension in our case study

Each store can be thought of as a location. As a result, we can roll stores up to any geographic attribute, such as ZIP code or county

Store Dimension
o Store Key (PK)
o Store Name
o Store Number (Natural Key)
o Store Street Address
o Store City
o Store County
o Store State
o Store Zip Code
o Store Manager
o Store District
o Store Region
o Floor Plan Type
o Photo Processing Type
o Financial Service Type
o Selling Square Footage
o Total Square Footage
o First Open Date
o Last Remodel Date
o ... and more

Promotion Dimension
It describes the promotion conditions under which a product was sold

Causal dimension: temporary price reductions, end-aisle displays, newspaper ads, and coupons

Factors on which promotions are judged:
o Lift: measured against the agreed baseline sales
o Whether the promotion transferred sales from regularly priced products to temporarily reduced-price products
o Cannibalization: a gain in sales of one product but a sales decrease in nearby products on the

The tradeoffs in favor of keeping the four dimensions together include the following:

o Since the four causal mechanisms are highly correlated, the combined single dimension is not much larger than any one of the separated dimensions would be
o The combined single dimension can be browsed efficiently, but it only shows the possible combinations. Browsing in the dimension table does not reveal which stores or products were affected by the promotion. This information is found in the fact table

Promotion Coverage Factless Fact Table


It is needed to find the products that were on promotion but did not sell

We'd load one row in the fact table for each product on promotion in a store each day, regardless of whether the product sold or not

It is a factless fact table as it has no measurement metrics; it merely captures the relationship between the involved keys

Determining which products were on promotion but didn't sell requires a two-step process
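
One way to picture the two-step process is sketched below with Python's sqlite3 module. The table names, column names, key values, and sample rows are hypothetical: step 1 reads the promotion coverage factless fact table, step 2 reads the sales fact table, and the answer is the set difference between the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Factless fact table: keys only, no measurement columns.
CREATE TABLE promotion_coverage_fact (
    date_key INTEGER, product_key INTEGER, store_key INTEGER, promotion_key INTEGER
);
CREATE TABLE sales_fact (
    date_key INTEGER, product_key INTEGER, store_key INTEGER,
    promotion_key INTEGER, quantity_sold INTEGER
);
-- Made-up sample rows, for illustration only.
INSERT INTO promotion_coverage_fact VALUES (1, 101, 1, 7), (1, 102, 1, 7), (1, 103, 1, 7);
INSERT INTO sales_fact VALUES (1, 101, 1, 7, 4), (1, 103, 1, 7, 1), (1, 104, 1, 0, 2);
""")

day, store = 1, 1

# Step 1: which products were on promotion in this store on this day?
on_promotion = {row[0] for row in conn.execute(
    "SELECT product_key FROM promotion_coverage_fact WHERE date_key = ? AND store_key = ?",
    (day, store))}

# Step 2: which products actually sold in this store on this day?
sold = {row[0] for row in conn.execute(
    "SELECT DISTINCT product_key FROM sales_fact WHERE date_key = ? AND store_key = ?",
    (day, store))}

# The set difference gives the products that were promoted but did not sell.
promoted_but_unsold = sorted(on_promotion - sold)
print(promoted_but_unsold)
```

The sales fact table alone cannot answer this question, because a product that did not sell has no row there.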

Degenerate Transaction Number


The POS transaction number is the key to the transaction header record, containing all the information valid for the transaction as a whole, such as the transaction date and store identifier

In the dimensional model this interesting header information is already extracted into other dimensions

The POS transaction number is still useful as it serves as the grouping key for pulling together all the products purchased in a single transaction
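
The grouping role of the transaction number can be sketched as follows, again with Python's sqlite3 module. The column name pos_transaction_number, the transaction identifiers, and the sample rows are assumptions for illustration. The number sits directly in the fact table with no dimension table behind it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    date_key INTEGER, product_key INTEGER, store_key INTEGER,
    pos_transaction_number TEXT,  -- degenerate dimension: no table to join to
    quantity_sold INTEGER, dollar_sales_amount REAL
);
-- Made-up sample rows, for illustration only.
INSERT INTO sales_fact VALUES
    (1, 101, 1, 'T-0001', 2, 5.0),
    (1, 102, 1, 'T-0001', 1, 3.5),
    (1, 101, 1, 'T-0002', 1, 2.5);
""")

# Group line items by transaction number to reassemble each shopping basket.
baskets = conn.execute("""
SELECT pos_transaction_number, COUNT(*) AS line_items, SUM(dollar_sales_amount) AS total
FROM sales_fact
GROUP BY pos_transaction_number
ORDER BY pos_transaction_number
""").fetchall()

for row in baskets:
    print(row)
```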

Retail Schema
A frequent shopper dimension table is created, and another foreign key is added to the fact table, to see the exact purchases of frequent shoppers on a weekly basis

Retail Schema
The original schema gracefully extends to accommodate these new dimensions largely because we chose to model the POS transaction data at its

Dimension Normalization
Perceived benefits of dimension normalization:
o This design saves space as we're only storing cryptic codes
o The normalized design for the dimension tables is easier to maintain

Snowflaking: redundant attributes are removed from the flat, denormalized dimension table and placed in normalized secondary dimension tables

Reasons for not adopting this modeling approach:
o The multitude of snowflaked tables makes for

Surrogate Key
Surrogate keys are integers that are assigned sequentially as needed to populate a dimension

It is encouraged to use surrogate keys in dimensional models rather than relying on operational production codes

Reasons to avoid natural keys based on the operational code:
o To avoid embedding intelligence in the data warehouse keys, because any assumptions that we make may eventually be invalidated
o Queries and data access applications should
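
Sequential assignment of surrogate keys can be sketched in a few lines of Python. The class name, method name, and SKU codes below are hypothetical; a real load process would normally persist the mapping in a lookup table rather than in memory.

```python
class SurrogateKeyGenerator:
    """Hand out sequential integer keys, one per distinct natural key."""

    def __init__(self):
        self._keys = {}  # natural key -> surrogate key

    def key_for(self, natural_key):
        # Assign the next integer the first time a natural key is seen;
        # return the same integer on every later lookup.
        if natural_key not in self._keys:
            self._keys[natural_key] = len(self._keys) + 1
        return self._keys[natural_key]


# Hypothetical usage with made-up SKU numbers.
product_keys = SurrogateKeyGenerator()
first = product_keys.key_for("SKU-9001")
second = product_keys.key_for("SKU-0042")
repeat = product_keys.key_for("SKU-9001")
print(first, second, repeat)
```

The surrogate key carries no meaning of its own, so nothing in the warehouse depends on the format of the operational code.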

Market Basket Analysis


Market Basket Analysis (MBA) is the notion of analyzing the combinations of products that sell together

It gives the retailer insights about how to merchandise various combinations of items

The retail sales fact table cannot be used easily to perform MBA, as SQL was never designed to constrain and group across line item fact rows

Data mining tools and some OLAP products can assist with market basket analysis. However, in the absence of these tools, a more direct approach is used:
o The market basket fact table is a periodic snapshot representing the pairs of products purchased together during a specified time period
o The basket count is a semiadditive fact
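
The rows of a market basket fact table can be derived from line items roughly as shown below. This is a sketch in plain Python with made-up transaction numbers and product keys; the variable names are assumptions, and a production load would typically run in the ETL layer over one time period at a time.

```python
from collections import Counter
from itertools import combinations

# Hypothetical line items: (POS transaction number, product key).
line_items = [
    ("T-0001", 101), ("T-0001", 102), ("T-0001", 103),
    ("T-0002", 101), ("T-0002", 102),
    ("T-0003", 103),
]

# Reassemble each basket from its line items.
baskets = {}
for transaction_number, product_key in line_items:
    baskets.setdefault(transaction_number, set()).add(product_key)

# Count, for every pair of products, the baskets that contain both.
pair_counts = Counter()
for products in baskets.values():
    for pair in combinations(sorted(products), 2):
        pair_counts[pair] += 1

for pair, basket_count in sorted(pair_counts.items()):
    print(pair, basket_count)
```

In this sample the pair counts add up to 4 although there are only 3 baskets, because a basket with three products contributes to three pairs. This is why basket counts should not be summed across product pairs.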


Thank You

References
The Data Warehouse Toolkit, Ralph Kimball & Margy Ross

Wikipedia.org
