
A tour of new features

Queries for DWH


The average number of sales of beans per store over the last month
The ten most popular programs over the past week
The top 20% of spending customers over the past quarter
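
As a rough illustration of the first query above, here is a minimal sketch using Python's built-in `sqlite3` module. The table `sales_fact`, its columns, and the sample rows are hypothetical and are not taken from the slides; a real warehouse would use its own schema and engine.

```python
import sqlite3

def avg_units_per_store(conn, product, start, end):
    """Average units of `product` sold per store between two ISO dates (inclusive).

    Stores with no sales of the product in the period are not counted.
    """
    row = conn.execute(
        """
        SELECT AVG(store_total) FROM (
            SELECT store_id, SUM(units_sold) AS store_total
            FROM sales_fact
            WHERE product = ? AND sale_date BETWEEN ? AND ?
            GROUP BY store_id
        )
        """,
        (product, start, end),
    ).fetchone()
    return row[0]

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact "
    "(store_id INTEGER, product TEXT, sale_date TEXT, units_sold INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        (1, "beans", "2024-03-02", 10),
        (1, "beans", "2024-03-15", 20),
        (2, "beans", "2024-03-09", 50),
        (2, "rice", "2024-03-09", 99),   # other product, ignored
        (2, "beans", "2024-02-27", 7),   # outside the month, ignored
    ],
)
conn.commit()

print(avg_units_per_store(conn, "beans", "2024-03-01", "2024-03-31"))
```

The inner query totals each store first and the outer query averages those totals, which is the usual shape of "per store" questions against a fact table.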

Identifying facts and dimensions


Look for the elemental transactions within the business process: these are the fact tables
Determine the key dimensions that apply to each fact
Check that a candidate fact is not actually a dimension with embedded facts
Check that a candidate dimension is not actually a fact table within the context of the decision support (DS) requirement

1. Look for elemental transactions within the business process


Always determine the factual transactions being used by the business processes
Don't assume that the reported transactions within an operational system are fact data
E.g. Telco:
call analysis: call event
customer analysis: customer events (disconnection, payment)
E.g. Banking:
customer profiling: customer events
customer profitability: account transactions

2. Determine the key dimensions that apply to each fact


E.g. Banking: account transaction identified as the fact table
Question: do we analyze account transactions by account, or by how customers use the service?
If the focus is analysis of customer usage, then the dimension is the customer entity

3. Check that a candidate fact is not a dimension


E.g. cable company, customer profiling: the address entity could be mistaken for a fact table
A more appropriate fact table: the number of operational events that occurred at the specific address

4. Check that a candidate dimension is not a fact


Customer

Fact: in a customer profiling or customer marketing database, it is probably a fact table
Dimension: in a retail sales analysis data warehouse, or any other variation where customer is used as a basis for analysis
If a dimension can be viewed by more than three entities, it is a fact

Designing fact tables


Understand the historical period for each supported function
Determine whether statistical samples will satisfy the requirement for detailed data
Select the appropriate columns of data to hold
Minimize the column sizes within the fact table
Determine the use of intelligent or non-intelligent keys
Design time into the fact table
Partition the fact table

1. Identify historical period for each function


What degree of detail is necessary for each business function? Establish the minimum retention period for each level of detail
Draw a retention period graph showing the detail necessary for each business function
E.g. Retail:
Sales analysis: 3 months of detail
Lifestyle profiling: 6 months, weekly

2. Determine whether samples will satisfy requirement for detailed data


If the business requirement does not need all detailed fact data, consider storing samples and aggregating the rest
E.g. retail sales analysis: samples can spot trends across all stores
Not suitable for determining product buying patterns in all stores located at the seaside
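
The "store samples and aggregate the rest" idea can be sketched as follows. This is only an illustration under assumed names: the row layout `(store, product, units)`, the function `sample_and_aggregate`, and the fixed random seed are all hypothetical.

```python
import random
from collections import defaultdict

def sample_and_aggregate(rows, sample_rate, seed=0):
    """Keep a random sample of detail rows plus exact totals over all rows.

    rows: iterable of (store, product, units) tuples (hypothetical layout).
    sample_rate: fraction of detail rows to keep, between 0.0 and 1.0.
    """
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    sample = []
    totals = defaultdict(int)
    for store, product, units in rows:
        totals[(store, product)] += units      # aggregates cover every row
        if rng.random() < sample_rate:         # detail kept only for the sample
            sample.append((store, product, units))
    return sample, dict(totals)

detail = [("S1", "beans", 3), ("S1", "beans", 4), ("S2", "beans", 5), ("S2", "rice", 1)]
sample, totals = sample_and_aggregate(detail, sample_rate=0.5)
print(len(sample), "detail rows kept;", totals)
```

The totals stay exact whatever the sample rate, while questions that need every detail row for a narrow subset (such as seaside stores only) cannot be answered reliably from the sample.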

3. Select the appropriate columns


Is this column telling me something about a factual event?
Is there any other place I can derive this data from?
Does the business care, or is it for control purposes only?
E.g. Sales analysis:
Product ref, store ref, date, number sold, revenue
Optional: customer identifier, time of transaction

4. Minimize the column sizes within fact table


E.g. a fact table for a telco call analysis DWH contains 3.65 billion rows:
2 million customers, with 2.5 transactions per day per customer, and a 2-year retention period
A saving of 10 bytes per row gives 10 × 3.65 billion bytes = 36.5 billion bytes, or about 33.99 GB (taking 1 GB as 2^30 bytes)
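
The arithmetic behind this example can be checked with a few lines of Python. The variable names are illustrative; the only assumptions are a 365-day year and a gigabyte of 2^30 bytes, which is what reproduces the 33.99 figure.

```python
customers = 2_000_000
transactions_per_day = 2.5
retention_days = 2 * 365                 # 2-year retention, assuming 365-day years

rows = customers * transactions_per_day * retention_days
bytes_saved = 10 * rows                  # 10 bytes saved on every row
gb_saved = bytes_saved / 2**30           # binary gigabytes (2^30 bytes)

print(f"{rows:,.0f} rows, {gb_saved:.2f} GB saved")
```

With decimal gigabytes (10^9 bytes) the same saving would be 36.5 GB, so the unit convention is worth stating when sizing a fact table.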

5. Determine the use of Intelligent or Non-Intelligent keys


Intelligent keys: when a query refers directly to the intelligent key, the query can be satisfied by the fact table alone
Disadvantage: if any of the identifiers changes, the fact table will have to be updated
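
A small sketch of this trade-off, under assumed data: the key format `<GROUP>-<number>`, the tiny fact lists, and all function names below are hypothetical and serve only to contrast the two kinds of key.

```python
# Intelligent keys: the key text itself carries meaning.
facts_intelligent = [("DAIRY-0042", 3), ("DAIRY-0043", 5), ("BAKERY-0007", 2)]

def units_for_group_intelligent(facts, group):
    """Answered from the fact rows alone, by reading the key."""
    return sum(units for key, units in facts if key.startswith(group + "-"))

def rekey_intelligent(facts, old_key, new_key):
    """An identifier change forces a rewrite of every matching fact row."""
    new_facts = [(new_key if key == old_key else key, units) for key, units in facts]
    touched = sum(1 for key, _ in facts if key == old_key)
    return new_facts, touched

# Non-intelligent keys: meaning lives in the dimension table only.
product_dim = {1: "DAIRY", 2: "DAIRY", 3: "BAKERY"}
facts_surrogate = [(1, 3), (2, 5), (3, 2)]

def units_for_group_surrogate(facts, dim, group):
    """Needs a lookup in the dimension, but facts never change on regrouping."""
    return sum(units for key, units in facts if dim[key] == group)

print(units_for_group_intelligent(facts_intelligent, "DAIRY"))
print(units_for_group_surrogate(facts_surrogate, product_dim, "DAIRY"))
```

In a table of billions of rows, the rewrite in `rekey_intelligent` is the cost to weigh against the saved dimension lookup.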

6. Design time into fact table


Storing the physical date: the cost is minimal compared to a reference to a time dimension table
Storing a date offset from the inherent start of the table (week, month, quarter)
Storing a date range

7. Partition the fact table
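
The slides give no detail for the partitioning step, so the following is only a sketch under stated assumptions: the fact table is split into monthly partitions, and each row stores a day offset from the start of its partition (the date-offset option from step 6). The row layout and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def partition_by_month(rows):
    """Split (sale_date, store, units) rows into monthly partitions.

    Each stored row keeps only a day offset (0..30) from the partition start.
    """
    partitions = defaultdict(list)
    for sale_date, store, units in rows:
        start = sale_date.replace(day=1)           # inherent start of the partition
        offset = (sale_date - start).days
        partitions[(start.year, start.month)].append((offset, store, units))
    return dict(partitions)

def physical_date(partition_key, offset):
    """Recover the full date from a partition key and a stored offset."""
    year, month = partition_key
    return date(year, month, 1) + timedelta(days=offset)

rows = [
    (date(2024, 3, 2), "S1", 10),
    (date(2024, 3, 31), "S2", 4),
    (date(2024, 4, 1), "S1", 7),
]
parts = partition_by_month(rows)
print(sorted(parts))
```

A query restricted to one month then touches one partition only, and the offset fits in a single byte instead of a full date column.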

Designing Dimension tables


Creating a star dimension
Denormalize all additional entities into a single table
May not be appropriate in situations where the additional data is not accessed very often

Hierarchies and networks
E.g. product group hierarchies:
Products sold by competitors
Products supplied by a single, major supplier
Products that are part of a single promotion

Determine the hierarchy likely to be used by the largest number of queries

Dimensions that vary over time


When the business changes, products get re-categorized
Queries may compare facts within a grouping that exists at present with a grouping that existed in the past
It is therefore necessary to store date ranges on the dimension table
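
One way to picture date ranges on a dimension table is the sketch below. The row layout `(product_id, group, valid_from, valid_to)`, the sample groups, and the function `group_on` are assumptions made for illustration; `valid_to = None` marks the current row.

```python
from datetime import date

# Hypothetical product dimension with one row per period of validity.
product_dim = [
    (42, "SNACKS", date(2023, 1, 1), date(2023, 12, 31)),
    (42, "HEALTH FOOD", date(2024, 1, 1), None),   # None = still current
]

def group_on(dim_rows, product_id, on_date):
    """Return the group a product belonged to on a given date, or None."""
    for pid, group, valid_from, valid_to in dim_rows:
        if pid != product_id:
            continue
        if valid_from <= on_date and (valid_to is None or on_date <= valid_to):
            return group
    return None

print(group_on(product_dim, 42, date(2023, 6, 15)))
print(group_on(product_dim, 42, date(2024, 6, 15)))
```

Joining each fact to the dimension row valid on the fact's date lets past sales be reported under the grouping that applied at the time, or under the present grouping, as the query requires.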

Managing large dimension tables


If the dimension table grows to a size similar to the fact table, partition the table horizontally
