
Assessment 1: Data Warehouse fundamentals, fact table design

Short answer questions

Business Intelligence (BCO6676)

by

01-May-2016
Questions & Answers: Data warehouse fundamentals and fact table design
Assignment 1: Data Warehouse Design

QUESTION 1.

a. Explain why decision support systems have become so important.


The rapid expansion of the Internet and of information system technologies has opened a new
era in which data can be observed from different perspectives and in real time. Nowadays,
companies can collect thousands of records in a very short time, but what should they do with
all this data? This is why decision support systems (DSS) have taken on an important role in
every business that wants to keep expanding and, most importantly, to anticipate market trends
and better understand customer needs. The importance of a DSS lies not in the huge amount of
information a company can hold, but in the quality of that information and the knowledge that
can be drawn from it (Sauter, 2002). For that reason, DSS allow executives, managers and other
types of users to support their decision-making processes by visualising information (sourced
from transactional systems) in “What if” analysis scenarios, quickly and in summarised form,
through dynamic reports, graphics and KPI measurements, under a data structure based on the
strategic objectives of the organisation. In this way, companies can embrace new business
opportunities, anticipate competitors and gain market share.

- Why are they different to transactional database systems?


Because DSS are subject oriented, while transactional database systems are process oriented.
DSS base their analysis on a data warehouse that is sourced from the transactional database
system, which is in charge of running the organisation's transactions by adding, modifying and
deleting current records. DSS are more complex: they rely on historical data, which they
summarise and convert into information about a particular subject, supporting decision-making
processes so that the company can create best practices and strategies.

b. One quoted advantage of implementing a Business Intelligence system is the concept of a
‘single version of the truth’. Explain what this term refers to.
Single version of the truth (SVOT) means that a company bases its whole knowledge on a
centralised database (Below, 2011). Organisations need to make clear to employees where the
data source is located, so that they can trust it and work from it. Aligning the full variety
of business processes in this way, and combining internal and external sources, makes the data
non-redundant and coherent, so that employees share the same understanding of the business.
For example, information about customers, services, products, vendors, assets and financial
records has just one version of the truth, and that information is unique, consistent and the
most reliable within the enterprise. For that reason, Business Intelligence (BI) bases its
analysis on the information stored in this data repository, which is then visualised through
presentation tools such as multidimensional reports, graphics, dashboards and KPIs to support
the decision-making process.

c. What are the properties that a data warehouse star schema must contain? Explain
each property in detail.
The data warehouse star schema is mainly designed to answer business questions and
anticipate market behaviour. This schema has the following elements: fact tables, dimension
tables and the joins that relate the dimension tables to the fact table to form the logical
structure of a data warehouse. Below I explain each of these elements in detail.
1. Fact table: This table is specifically designed to answer questions about individual
business measures, and only one is permitted per star schema. It contains a series of
foreign keys (FK) that reference the primary keys of the dimension tables (e.g. Customer ID,
Product No). It must include at least one time dimension (e.g. Order date) and some KPIs,
which measure the variables you want to analyse, aggregate and quantify. Each combination
of dimension values (e.g. Customer ID, Product No, Order date) represents a different record
within the fact table. The level of detail held in an individual record is called
granularity, which can be described with the “by” words, such as sales by customer, sales
by region, etc.
2. Dimension tables: A series of tables that surround the fact table and describe the
objects/entities linked to it (e.g. Customer, Product, Date). Each of these tables has one
primary key (PK), which identifies an individual record, and stores a number of attributes
from which you extract specific data (e.g. Customer ID (PK) with the attributes name,
suburb, phone, post code).
3. Joins: These link the dimension tables to their fact table. Their function is to maintain
the structure and integrity between the two tables by matching the FK value in the fact
table to the primary key in the dimension table.
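
As a rough illustration of how the three elements fit together, the sketch below models two dimension tables and one fact table as plain Python structures. The table contents, the column names (customer_id, product_no, sales) and the helper functions orphan_rows and sales_by are hypothetical and only serve the example; a real data warehouse would hold these in database tables.

```python
# Dimension tables: primary key -> attributes (all values are made up).
customer_dim = {
    "C1": {"name": "Acme Pty Ltd", "suburb": "Footscray"},
    "C2": {"name": "Bolt & Co", "suburb": "Carlton"},
}
product_dim = {
    "P1": {"description": "Widget"},
    "P2": {"description": "Gadget"},
}

# Fact table: one row per (customer, product, order date) combination,
# holding foreign keys to the dimensions plus the measure.
sales_fact = [
    {"customer_id": "C1", "product_no": "P1", "order_date": "2016-04-01", "sales": 300},
    {"customer_id": "C1", "product_no": "P2", "order_date": "2016-04-01", "sales": 200},
    {"customer_id": "C2", "product_no": "P1", "order_date": "2016-04-02", "sales": 150},
]


def orphan_rows(fact, dim, fk):
    """Return fact rows whose foreign key has no matching primary key."""
    return [row for row in fact if row[fk] not in dim]


def sales_by(fact, dim, fk, attribute):
    """Join fact rows to a dimension and total the measure per attribute value."""
    totals = {}
    for row in fact:
        label = dim[row[fk]][attribute]
        totals[label] = totals.get(label, 0) + row["sales"]
    return totals


by_suburb = sales_by(sales_fact, customer_dim, "customer_id", "suburb")
```

Here by_suburb holds the sales measure summed per customer suburb ("sales by suburb"), and orphan_rows returns an empty list when every FK in the fact table finds its PK in the dimension.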

QUESTION 2.

a. In terms of Analytics, what is meant by the terms drill down and slicing a cube?
A cube is built on a series of dimensions, each holding a specific type of data. These
dimensions can be analysed in detail with two techniques, drill down/up analysis and slicing
analysis; which one you select depends on what information you want to explore in depth.

Drill down: refers to the level of granularity for a particular dimension. In other words, this
technique takes the information from the most summarised (up) to the most detailed (down) data
in a dimension (Wikipedia, 2016). For example, in sales by customer, sales by region and sales
by time period, the “by” word summarises the level to which the report has been drilled down.
Slicing analysis: brings back the information related to a single value within a dimension.
This creates a new cube with fewer dimensions and allows the data for that particular value to
be analysed. For instance, sales by time can be sliced on the year 2014; the other years are
taken out of the cube, which reduces the number of dimensions.
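
The two techniques can be sketched on a tiny cube held as a Python dictionary. This is only an illustration under assumed data: the cell values, the (year, month, region) layout and the function names sales_by_year, sales_by_month and slice_year are invented for the example and do not come from any particular OLAP tool.

```python
from collections import defaultdict

# Each cell of the cube: (year, month, region) -> sales. Figures are invented.
cube = {
    (2013, 12, "N"): 90,
    (2014, 1, "N"): 100,
    (2014, 1, "S"): 40,
    (2014, 2, "N"): 60,
}


def sales_by_year(cells):
    """Most summarised ("up") level of the time dimension."""
    totals = defaultdict(int)
    for (year, _month, _region), sales in cells.items():
        totals[year] += sales
    return dict(totals)


def sales_by_month(cells):
    """One level further "down": the same totals split by month."""
    totals = defaultdict(int)
    for (year, month, _region), sales in cells.items():
        totals[(year, month)] += sales
    return dict(totals)


def slice_year(cells, year):
    """Keep only the cells for one year; the year drops out of the key."""
    return {(month, region): sales
            for (y, month, region), sales in cells.items() if y == year}


by_year = sales_by_year(cube)
by_month = sales_by_month(cube)
sliced_2014 = slice_year(cube, 2014)
```

Going from by_year to by_month is the drill down; sliced_2014 is the smaller cube that remains once the year is fixed to 2014, keyed only by month and region.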

b. What is the difference between a Star Schema fact table and SAP’s InfoCube?
The main difference lies in the way the fact table/SAP’s InfoCube and their dimension tables
are structured. The traditional star schema is based on a series of dimension tables that
surround the fact table. This fact table contains the keys of the dimensions, at least one
time dimension and the KPIs. Compared with SAP’s InfoCube, this fact table structure is much
bigger and is not limited to a maximum number of dimensions (Guru99.com, 2016). In contrast,
SAP’s InfoCube is limited to 16 dimension tables, of which three are predefined (Time, Unit,
Data Package) and the remaining 13 are user defined. This makes the InfoCube much smaller than
the traditional fact table, but the dimensions are much bigger because of the connection
through Surrogate IDs (SID) between the master data (divided into attributes, texts and
hierarchies) and the InfoCube. It is worth mentioning that in the traditional star schema the
master data is held inside the schema's dimension tables, whereas SAP’s InfoCube does not
contain master data: it only has SID tables that link to master data stored outside the cube.

c. In relation to a fact table, what does the term granularity refer to?
Granularity refers to the level of detail at which the key figures (KPIs) are described
within a fact table. In other words, it is how far you can drill down into your data: the most
summarised level (low granularity) or the most detailed level possible (high granularity).
E.g. sales by customer, sales by region, sales by product or just the total sales amount.

- What are the implications of implementing either high or low granularity?


Implementing high granularity within a fact table means that you will have a very large
amount of data to choose from. When the fact table has too many dimensions, the volume of
data can slow the system down, and much of the detail is of no use in answering business
questions. Answering a specific business question then takes longer, because more data has to
be analysed, and attention shifts away from the real value of the information gathered. On
the other hand, low granularity means that the information available is insufficient to drill
down through the data, so it brings little knowledge to the company. A good example is a time
dimension held only by year: this limits the analysis, because no conclusions can be drawn per
month to identify trends. The best option is a medium granularity that gives good performance
while still providing a detailed source of data. In this way, the decision-making process has
many more possibilities for comparing the data against other variables.

d. What are the perceived limitations of the traditional star schema model?
The limitations can be summarised in these four main issues:
1. It does not support multiple languages.
2. System performance is reduced because of the use of alphanumeric primary keys.
3. It does not support time-dependent changes.
4. It is common to find duplication of dimensional data.

- How did SAP’s extended star schema model address these issues?
SAP resolves most of the traditional star schema issues by using Surrogate ID (SID) tables.
These tables link to additional information that can describe the data in a variety of
languages, depending on the location or country the information is taken from. SID tables also
make the system faster and more efficient because their keys are numeric rather than
alphanumeric: by giving a numerical format to the different dimensions, the system can find
records easily. In the same way, duplication of dimensional data is almost eliminated because
the SID table gives a unique number to each dimension value. With regard to time-dependent
changes, SAP creates a new record to preserve historical changes. These new records are
incorporated into the master data with Date From and Date To fields
(Easy-learn-bw.blogspot.com.au, 2013). In this way, you can keep a time-dependent attribute and
find the records that applied before and after the effective date of the change.
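
The idea of replacing alphanumeric keys with numeric surrogate IDs can be sketched as follows. This is a simplified illustration and not SAP's actual implementation: the function build_sid_table, the key values and the amounts are all assumptions made for the example.

```python
def build_sid_table(keys):
    """Assign one integer surrogate ID (SID) to each distinct alphanumeric key."""
    sid_table = {}
    for key in keys:
        if key not in sid_table:
            sid_table[key] = len(sid_table) + 1
    return sid_table


# Alphanumeric customer numbers as they might arrive from a source system.
source_keys = ["CUST-A17", "CUST-B02", "CUST-A17", "CUST-C44"]
customer_sid = build_sid_table(source_keys)

# Rows are then stored against the integer SID instead of the alphanumeric key.
fact_rows = [
    (customer_sid[key], amount)
    for key, amount in [("CUST-A17", 300), ("CUST-B02", 200), ("CUST-A17", 150)]
]
```

Each distinct key receives exactly one number, so the repeated key CUST-A17 maps to the same SID both times, which is the property the answer above relies on to avoid duplicated dimensional data.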

QUESTION 3.

a. Describe the three methods that can be used to cater for slowly changing
dimensions.
There are three ways a change can be recorded in a system (1keydata.com, 2016):
1. Overwrite the existing record. With this type of change no history is kept in the
system, so all the old information is lost. For example, the location of a current
customer is changed when they move to another city.
2. Create a new record. This type of change preserves the old data and adds the new
information. Following the previous example, both locations are kept in the customer
dimension by adding a new row to the table. The only problem with this method is that it
reduces the performance of the ETL process as the table grows.
3. Add a new field to the record. With this method part of the history is preserved by
creating two new columns or fields: one with the current information, and the other with
the effective date of the change.
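
The three methods can be compared side by side in a small sketch. The customer row, the field names (city, current_city, effective_date) and the function names are hypothetical; the third function follows the description given above, where the original column keeps the old value and two new fields are added.

```python
import copy
from datetime import date

customers = [{"customer_no": "C1", "name": "Acme", "city": "Melbourne"}]


def scd_type1(dim, key, new_city):
    """Method 1: overwrite in place, so the old value is lost."""
    dim = copy.deepcopy(dim)
    for row in dim:
        if row["customer_no"] == key:
            row["city"] = new_city
    return dim


def scd_type2(dim, key, new_city):
    """Method 2: append a new row, so old and new values are both kept."""
    dim = copy.deepcopy(dim)
    template = next(row for row in dim if row["customer_no"] == key)
    dim.append({**template, "city": new_city})
    return dim


def scd_type3(dim, key, new_city, effective):
    """Method 3: add two fields to the row, the new value and its effective date."""
    dim = copy.deepcopy(dim)
    for row in dim:
        if row["customer_no"] == key:
            row["current_city"] = new_city
            row["effective_date"] = effective
    return dim


overwritten = scd_type1(customers, "C1", "Sydney")
with_history = scd_type2(customers, "C1", "Sydney")
with_new_fields = scd_type3(customers, "C1", "Sydney", date(2016, 5, 1))
```

After the calls, overwritten has one row showing only Sydney, with_history has two rows (Melbourne and Sydney), and with_new_fields has one row that still shows Melbourne in city alongside the new current_city and effective_date fields.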

b. Employees within an organisation could potentially be transferred to several
departments over their working history. How are strategies implemented to ensure
an employee is reported to be at the department they are assigned to at a specific
time?
Once an organisation is notified that an employee has been promoted, moved to another
department or changed location, the IT department is in charge of modifying the employee
dimension within the database. By creating a new record that includes two fields in this
dimension, Date From and Date To (the effective dates of the change), companies can handle and
preserve changes throughout an employee’s history.
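
A minimal sketch of such a lookup is shown below, assuming each assignment is stored as its own row with Date From and Date To values. The employee data, the open-ended end date of 31 December 9999 and the function name department_on are assumptions for the example.

```python
from datetime import date

# Employee dimension with one row per assignment; names and dates are invented.
employee_dim = [
    {"employee_no": "E1", "department": "Sales",
     "date_from": date(2014, 1, 1), "date_to": date(2015, 6, 30)},
    {"employee_no": "E1", "department": "Marketing",
     "date_from": date(2015, 7, 1), "date_to": date(9999, 12, 31)},
]


def department_on(dim, employee_no, day):
    """Return the department whose validity interval contains the given day."""
    for row in dim:
        if (row["employee_no"] == employee_no
                and row["date_from"] <= day <= row["date_to"]):
            return row["department"]
    return None
```

With this data, a report dated 1 March 2015 would place employee E1 in Sales, while a report dated 1 January 2016 would place the same employee in Marketing.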

- What are the consequences of ignoring this issue?


The consequences range from payroll problems (salary adjustments), redundant data and
duplicated information to problems measuring key performance indicators (KPIs) and tax
deduction discrepancies, among others.

QUESTION 4.

In your own words, define each of the following SAP terms:

(i) Characteristics: help to define the master data through specific information related
to a dimension table. Within a dimension you can find characteristics such as product
No, product description, product category and unit/size, which provide the context needed
to measure KPIs.
(ii) InfoAreas: are folders where all the information related to characteristics and KPIs
is stored.
(iii) InfoCatalog: is a subcategory of the InfoAreas in which characteristics and key figures
(KPIs) have independent catalogues, with information grouped by specific criteria.
(iv) Key Figures: refer to the KPIs used to measure a specific business question. These
indicators can be classified as amount (currency), quantity (unit), number and time/date.
(v) Master Data: is a dimension table with related data that describes a particular record
surrounding an InfoCube (fact table). Master data is made up of three components:
attributes, texts and hierarchies.
(vi) Surrogate keys: also known as Surrogate IDs (SID), held in SID tables, make the system
faster and more efficient because of their numeric structure. By giving a numerical format
to the different dimensions, the system can find records easily instead of searching
alphanumeric data.
(vii) Dimensions: although they can be related to master data, these dimensions are not
considered master data. Instead, dimension tables are based on SID tables that link to the
information held in the master data. These dimensions build the InfoCube in the SAP
extended star schema model.

QUESTION 5

The following table represents transactional data that needs to be stored in the fact tables
below.

Note: SalesRevenue represents a key performance measure and has been set to aggregate
data.

Date        SalesRep  Region  Product  SalesRevenue
21.2.2012   S1        N       P1       300
21.2.2012   S1        N       P2       200
22.2.2012   S1        N       P1       150
23.2.2012   S2        E       P1       300
24.2.2012   S1        W       P2       250
25.2.2012   S2        E       P1       100
26.2.2012   S3        W       P1       80
26.2.2012   S1        S       P2       150
27.2.2012   S2        E       P1       50
28.2.2012   S1        N       P1       60
29.2.2012   S1        E       P2       30
29.2.2012   S2        E       P2       60
29.2.2012   S2        S       P1       400

a. For each of the fact tables below, show the data that would be transferred and stored
in it after the transfer of the transactional data above.

Fact table 1: Sales Revenue measured by Year, Month, SalesRep, Region and Product

Year  Month  SalesRep  Region  Product  SalesRevenue
2012  2      S1        N       P1       510
2012  2      S1        N       P2       200
2012  2      S1        S       P2       150
2012  2      S1        W       P2       250
2012  2      S1        E       P2       30
2012  2      S2        S       P1       400
2012  2      S2        E       P1       450
2012  2      S2        E       P2       60
2012  2      S3        W       P1       80
Total Sales Revenue                     2130

Fact table 2: Sales Revenue measured by Year, Month, Region

Year  Month  Region  SalesRevenue
2012  2      N       710
2012  2      S       550
2012  2      E       540
2012  2      W       330
Total Sales Revenue  2130
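
The aggregation behind both fact tables can be checked with a short script. The transaction rows are copied from the question; the function names load_fact and year_month and the tuple layout are assumptions made for this sketch.

```python
from collections import defaultdict

# Transactional rows from the question: (date, sales_rep, region, product, revenue).
transactions = [
    ("21.2.2012", "S1", "N", "P1", 300),
    ("21.2.2012", "S1", "N", "P2", 200),
    ("22.2.2012", "S1", "N", "P1", 150),
    ("23.2.2012", "S2", "E", "P1", 300),
    ("24.2.2012", "S1", "W", "P2", 250),
    ("25.2.2012", "S2", "E", "P1", 100),
    ("26.2.2012", "S3", "W", "P1", 80),
    ("26.2.2012", "S1", "S", "P2", 150),
    ("27.2.2012", "S2", "E", "P1", 50),
    ("28.2.2012", "S1", "N", "P1", 60),
    ("29.2.2012", "S1", "E", "P2", 30),
    ("29.2.2012", "S2", "E", "P2", 60),
    ("29.2.2012", "S2", "S", "P1", 400),
]


def year_month(text):
    """Split a 'day.month.year' string into (year, month)."""
    _day, month, year = text.split(".")
    return int(year), int(month)


def load_fact(rows, key_of):
    """Sum SalesRevenue over all rows that share the same fact-table key."""
    fact = defaultdict(int)
    for row in rows:
        fact[key_of(row)] += row[4]
    return dict(fact)


# Fact table 1: Year, Month, SalesRep, Region, Product.
fact_table_1 = load_fact(transactions, lambda r: year_month(r[0]) + (r[1], r[2], r[3]))
# Fact table 2: Year, Month, Region.
fact_table_2 = load_fact(transactions, lambda r: year_month(r[0]) + (r[2],))
```

Because SalesRevenue is set to aggregate, rows that share the same key are summed, so the 13 transactions collapse into 9 rows in fact_table_1 and 4 rows in fact_table_2, each totalling 2130.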

b. Explain the term granularity. Which of the two fact tables above has the greater
granularity?

As explained in question 2.c, granularity refers to the level of detail at which the key
figures (KPIs) are described within a fact table. In other words, it is how far you can drill
down into your data: the most summarised level (low granularity) or the most detailed level
possible (high granularity). Of the fact tables above, Fact table 1 has the greater
granularity, because it also records SalesRep and Product, whereas Fact table 2 summarises
revenue by Region only.

QUESTION 6.

a. Students are required to design a standard star schema to meet the above
requirements.

Sales revenue = QtyPurchased x UnitSalesPrice


Fact table: Sales (Revenue)
    CustomerNO, PartNO, SalesPersonNO, OrdDate, QtyPurchased, UnitSalesPrice

Dimension: Part
    PartNO (pk), PartDescription, QtyOnHand, UnitPrice, CategoryNO (fk), CategoryName

Dimension: Customer
    CustomerNO (pk), Name, Street, Suburb, Postcode, IndustryNO (fk), IndustryName, Balance

Dimension: Order Date
    Year, Month, Day

Dimension: Sales Person
    SalesPersonNO (pk), SalesPersonName, DepartmentNO (fk), DepartmentName, RegionNO (fk),
    RegionName
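
To show how the revenue formula would be evaluated against such a design, the sketch below builds a cut-down version of the schema in an in-memory SQLite database. Only the Customer and Part dimensions are included, the sample rows are invented, and the lower-case table names are an assumption of this example rather than part of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (CustomerNO TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE part (PartNO TEXT PRIMARY KEY, PartDescription TEXT);
-- Fact table: foreign keys to the dimensions plus the two measures.
CREATE TABLE sales (
    CustomerNO TEXT REFERENCES customer(CustomerNO),
    PartNO TEXT REFERENCES part(PartNO),
    OrdDate TEXT,
    QtyPurchased INTEGER,
    UnitSalesPrice REAL
);
INSERT INTO customer VALUES ('C1', 'Acme'), ('C2', 'Bolt');
INSERT INTO part VALUES ('P1', 'Widget'), ('P2', 'Gadget');
INSERT INTO sales VALUES
    ('C1', 'P1', '2016-04-01', 2, 10.0),
    ('C1', 'P2', '2016-04-02', 1, 25.0),
    ('C2', 'P1', '2016-04-02', 3, 10.0);
""")

# Sales revenue = QtyPurchased x UnitSalesPrice, summed per customer.
revenue_by_customer = conn.execute("""
    SELECT c.Name, SUM(s.QtyPurchased * s.UnitSalesPrice)
    FROM sales AS s
    JOIN customer AS c ON c.CustomerNO = s.CustomerNO
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()
```

The query joins the fact table to the Customer dimension through CustomerNO and returns one revenue total per customer name; the same pattern applies to the Part, Order Date and Sales Person dimensions.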

b. Students are required to transform their design in part (a) to match SAP’s extended
star schema model.
Sales revenue = QtyPurchased x UnitSalesPrice

InfoCube: Sales (Revenue)
    DIM-ID Customer, DIM-ID Part, DIM-ID SalesPerson, DIM-ID Order Date, Qty Purchased,
    Unit Sales Price

Dimension tables
    Customer:      DIM-ID Customer, Customer_SID
    Part:          DIM-ID Part, Part_SID
    Sales Person:  DIM-ID Sales Person, SalesPerson_SID
    Order Date:    DIM-ID Order Date, OrderDate_SID

SID tables
    Customer_SID:     Customer_SID, CustomerNO
    Part_SID:         Part_SID, PartNO
    SalesPerson_SID:  SalesPerson_SID, SalesPersonNO
    OrderDate_SID:    OrderDate_SID, OrderDate

Master data (Attribute / Text / Hierarchies)
    Customer
        Attribute:    CustomerNO, IndustryNO (fk)
        Text:         Name, Street
        Hierarchies:  CustomerNO, IndustryNO
        Other fields: Suburb, Postcode, Balance
    Sales Person
        Attribute:    SalesPersonNO, DepartmentNO (fk), RegionNO (fk)
        Text:         SalesPersonName, DepartmentName, RegionName
        Hierarchies:  SalesPersonNO, SalesPersongroup
    Part
        Attribute:    PartNO, CategoryNO (fk)
        Text:         PartDescription, CategoryName
        Hierarchies:  PartNO, PartGroup
        Other fields: QtyOnHand, UnitPrice
    Order Date
        Year, Month, Day, TimePeriod


c. A sales person over time can move to different regions and the company would like
to record this fact. Indicate two ways this situation can be modelled in your design.
You may need to redesign your model

1. The first method is to add a new record to the Sales Person master data. The sales person
number changes as well as the region, and the previous record is preserved in the database.

Sales Person master data (example)
SalesPersonNO  SalesPersonName  RegionNO
S1             David Sanabria   R1
S2             David Sanabria   R2

The rest of the model (Customer, Part and Order Date master data, the SID tables, the
dimension tables and the InfoCube Sales (Revenue)) is the same as in part (b).

Master data
ID  Name        Region
S1  Bill Smith  R1
S2  Anne Jones  R2

SID table
ID  SID
S1  1
S2  2

Dimension table
DIMID  Sale SID R
1      1
2      2

2. The second method is to add two fields to the same record in the master data: the new
region and the effective date of the transfer.

Sales Person master data (example)
SalesPersonNO  SalesPersonName  OLD RegionNO  NEW RegionNO  Effective Date
S1             David Sanabria   R1            R2            1/05/2016

The rest of the model (Customer, Part and Order Date master data, the SID tables, the
dimension tables and the InfoCube Sales (Revenue)) is the same as in part (b).


QUESTION 7

Create a star schema diagram that will enable FIT-WORLD GYM INC. to analyse their revenue.
The fact table will include: for every instance of revenue taken, attribute(s) useful for
analysing revenue.

- The star schema will include all dimensions that can be useful for analysing revenue.
- The only data sources available are shown below.

Answer:

Fact table: Revenue
    MshpID, MrchID, PassID, CorpCustID, SalesTranDate, QtyMshpSold, MshpUnitPrice,
    QtyMrchPurchased, MrchUnitPrice, QtyCorpCust, CorpAmountCharge

Dimension: Membership
    MshpID (pk), MshpName, MshpPrice

Dimension: Merchandise
    MrchID (pk), MrchName, MrchPrice

Dimension: One Day Guest Pass
    PassID (pk), PassDate, PassCatID (fk), CatName, Price, MembID (fk)

Dimension: Sales Transaction
    STrID (pk), SalesTranDate, MembID (fk)

Dimension: Special Events
    CorpCustID (pk), CorpCustName/Location, Event Type Code, Event Date, Amount Charged

Note: The revenue is calculated from the following formula:

Total Revenue = (Qty Membership sold x MembershipUnitPrice)
              + (Qty Merchandise x MerchandiseUnitPrice)
              + (Qty CorporateCustomers x CorporateAmountCharge)


QUESTION 8

1. The first fact table analyses the number of transactions per day for each plant, sales
channel (Internet and Warehouse) and product.

Fact table: by Plant/Channel
    PlantNO, ChannelNO, ProductNO, SalesOrderNO, Day, QtySold, UnitSalesPrice

Dimension: Plant
    PlantNO (pk), PlantName, CountryNO (fk), RegionNO (fk)

Dimension: Sale Channel
    ChannelNO (pk), ChannelName

Dimension: Category
    CategoryNO (pk), CategoryName

Dimension: Product
    ProductNO (pk), ProdDescription, UnitPrice, CategoryNO (fk), QtyOnHand

Dimension: Order Date
    Day, Month, Year

Dimension: Sales Order
    SalesOrderNO (pk), CustomerNO (fk), ProductNO (fk), OrderDate, SalesQty, UnitSalesPrice

2. The second fact table answers the business question of customer sales by country and by
region, and of the products sold in those locations.

Fact table: by Country/Region
    CustomerNO, ProductNO, CountryNO, RegionNO, OrderDate, QtySold, UnitSalesPrice

Dimension: Customer
    CustomerNO (pk), CustomerName, Street, Postcode, RegionNO (fk), CountryNO (fk)

Dimension: Product
    ProductNO (pk), ProdDescription, UnitPrice, CategoryNO (fk), QtyOnHand

Dimension: Category
    CategoryNO (pk), CategoryName

Dimension: Country
    CountryNO (pk), CountryName

Dimension: Order Date
    Day, Month, Year

Dimension: Region
    RegionNO (pk), RegionName


REFERENCES

1. Below, P. (2011). The myth of the single version of the truth. [online] QSM. Available at:
http://www.qsm.com/Blog/The%20Myth%20of%20the%20Single%20Version%20of%20the%20Truth_BELOW012310.pdf
[Accessed 17 Apr. 2016].
2. Easy-learn-bw.blogspot.com.au. (2013). SAP BI, SAP BW: Time dependent attributes.
[online] Available at:
http://easy-learn-bw.blogspot.com.au/2013/06/time-dependent-attributes.html
[Accessed 25 Apr. 2016].
3. Guru99.com. (2016). All about classical extended star schema. [online] Available at:
http://www.guru99.com [Accessed 24 Apr. 2016].
4. 1keydata.com. (2016). Type 3 Slowly Changing Dimension. [online] Available at:
https://www.1keydata.com/datawarehousing/slowly-changing-dimensions-type-3.html
[Accessed 25 Apr. 2016].
5. Oracle. (2003). Understanding Star Schemas. [online] Gkmc.utah.edu. Available at:
http://gkmc.utah.edu/ebis_class/2003s/Oracle/DMB26/A73318/schemas.htm
[Accessed 17 Apr. 2016].
6. Sauter, V. (2002). Decision Support Systems (DSS). [online] Umsl.edu. Available at:
http://www.umsl.edu/~sauterv/analysis/488_f02_papers/dss.html [Accessed 16 Apr. 2016].
7. Wikipedia. (2016). OLAP cube. [online] Available at:
https://en.wikipedia.org/wiki/OLAP_cube [Accessed 24 Apr. 2016].
