
Oracle Warehouse Builder 10g Release 2

Implementing Advanced Data Warehouse Concepts at Elizabeth Arden Inc.

October 2006
Note:
This document is for informational purposes. It is not a commitment to deliver
any material, code, or functionality, and should not be relied upon in making
purchasing decisions. The development, release, and timing of any features or
functionality described in this document remains at the sole discretion of Oracle.
This document in any form, software or printed matter, contains proprietary
information that is the exclusive property of Oracle. This document and
information contained herein may not be disclosed, copied, reproduced, or
distributed to anyone outside Oracle without prior written consent of Oracle. This
document is not part of your license agreement nor can it be incorporated into
any contractual agreement with Oracle or its subsidiaries or affiliates.

Implementing advanced data warehouse concepts at Elizabeth Arden Inc. Page 2


Oracle Warehouse Builder 10g Release 2

Implementing Advanced Data Warehouse Concepts at Elizabeth Arden

INTRODUCTION
In many organizations, information requirements are becoming more advanced, and an adaptive architecture is needed to deliver on them in a timely manner. This forces organizations to put more emphasis on planning ahead and on implementing more advanced concepts in production environments.
The specific solutions we discuss in this paper are technically elegant and effective, but they must achieve real business goals to warrant their development time. For Elizabeth Arden Inc. the goals were concrete and tangible:
· Ensure business users own good as well as bad data, allowing for better
overall information quality and clear information ownership in the
business.
· Empower business users to create their own complex reporting solutions,
reducing the involvement of IT developers and reducing the cost to the
business.
· Allow for quick and easy conversion of external data, reducing the time
and effort required to integrate these new data sets.
Ultimately, if all of these goals are achieved, Elizabeth Arden gains better
information quality, faster turnaround cycles for adding new data, and better
decisions made directly by users. The company also offloads some
reporting-related effort from the IT staff, removing bottlenecks from the
reporting process for end users.
In this paper we discuss the following advanced concepts, which are useful in
any data warehouse situation:
· In-place rejection and realignment of data elements to allow for business
ownership of all data.



· Embedded special time constructs to enable business users to create
complex analysis reports.
· Hierarchy-based data conversions to easily incorporate new data, loading
data at a level different from the level displayed to end users.
We will elaborate on the theory behind the concepts and show the implementation
of the concepts using Warehouse Builder 10g Release 2 as a framework.



IN-PLACE REJECTION AND AUTOMATED REALIGNMENT OF DATA
ELEMENTS
The in-place rejection concept is based on the fact that business users
ultimately own the data they view in their reports. This simple fact means that
business users are also ultimately responsible for bad data. To allow business
users to see for themselves where bad data resides, it should be allowed into
the warehouse. In-place rejection and realignment is a way to achieve this in a
dimensional model.

Concepts
In a dimensional model, the fact table is linked to the dimensions based on key
values. To load a fact record into the fact table, each of its dimension keys
must match an existing dimension member; otherwise the constraints are enforced
and the record is not loaded. To load facts, therefore, the dimensions must
hold the exact data elements that the new fact records expect.
If a fact record contains bad data (for at least one of the dimensions it
references), the normal process in a relational database is to reject the
record. Rejecting the record, however, does not expose it to the business user.
In some cases it ends up in a pile of incorrect data that someone should go
through at some point in time. Experience shows that that task is hardly ever
completed in a timely manner.
Hence the concept of in-place rejection. By rejecting the records in place
within the fact table, business users see the misaligned data in every report
and are consciously aware that the bad data is there and how much of it is
lying around. They also see the business value (in dollars, units, or whatever
the measurement type is) of the misaligned data.

In-place rejection

To place the bad data in the actual reporting system, in-place rejection uses
generated dimension members to capture it. For each dimension, a member is
introduced at the linked level (typically the lowest level of the dimension)
that is then used as a garbage collector.
All data that does not match any of the existing members is assigned to this
garbage-collector value. The garbage-collector value can be either a static
value such as rejected, or a dynamic value such as REJ_originalvalue.
Alternatively, you can log the entire input row and the new key value in a
table for later processing of reloads.
In all cases the lowest level rolls up to a distinct member at the top of the
hierarchy, and that total receives the in-place rejected records, so the total
of a star always rolls up correctly.
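As a minimal sketch of this idea (the key values, table layout, and function names here are hypothetical illustrations, not Elizabeth Arden's actual schema), a lookup with a garbage-collector default might look like this:

```python
# In-place rejection sketch: fact rows whose dimension key has no match
# are loaded anyway, pointed at a generated "garbage collector" member
# instead of being dropped from the warehouse.

REJECT_KEY = -1  # surrogate key of the generated "rejected" dimension member

def load_facts(fact_rows, dimension):
    """dimension maps natural keys to surrogate keys; unmatched rows are
    kept, but assigned the garbage-collector key."""
    loaded = []
    for natural_key, amount in fact_rows:
        surrogate = dimension.get(natural_key, REJECT_KEY)
        loaded.append({"dim_id": surrogate, "amount": amount,
                       "orig_key": natural_key})
    return loaded

dimension = {"A": 1, "C": 2, "D": 3}
facts = [("A", 100.0), ("X", 25.0), ("D", 40.0)]  # "X" has no dimension match
rows = load_facts(facts, dimension)

# Every fact is loaded; the bad row stays visible under the reject member,
# and the star still totals correctly.
total = sum(r["amount"] for r in rows)
rejected = [r for r in rows if r["dim_id"] == REJECT_KEY]
print(total)                    # 165.0
print(rejected[0]["orig_key"])  # X
```

Because the rejected row carries its original key, reports can show both the misaligned value and its business impact.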



Realignment

In some cases in-place rejection is caused not by bad data, but by bad timing.
In that case the realignment happens without any intervention, directly upon
the next load of the facts and dimensions. When bad data is truly the cause of
misalignment, realignment can only happen after the data is fixed and reloaded.
Regardless of the fix, the system must be built in such a way that the
realignment is detected and the data that was labeled as incorrect is realigned
in the database. It is crucial that the system does not double-count due to a
previous error.

A look at Warehouse Builder


These concepts are automated in the ETL processes in Warehouse Builder 10g
Release 2. Here we describe some of the implementation details.
The first pass through the data loads the dimensions in a regular mapping.
For this paper that mapping is not interesting, as it simply loads all
dimension records currently present in the source.
The fact mapping is the first mapping that will encounter issues with the load
if a dimension record is not (yet) loaded. That mapping uses a plain Warehouse
Builder lookup, but returns a default link value if the dimension record is not
found.

Figure 1 Using no-match rows in key-lookup to always load a fact



This way, all facts are loaded with a default dimension ID, which, as shown in
Figure 1, is set to 4. Because of the lookup functionality that comes with
Warehouse Builder, this is a straightforward step in the ETL process.
The next step is less trivial, because the keys need to be realigned when a
dimension record is found. To match and recode, a source table is used that
contains the original references; from there, a lookup is done against the
final dimension to establish whether there is a new record that requires
recoding.

Figure 2 Lookup a dimension value for recoding

If there is no recoding (that is, the record is still not found in the
dimension), the default key is loaded into the recoding table again.



Figure 3 If no record is found, the ID remains set to the default value

Because the recoding table holds the identifying information for the entire
fact record (all of its logical key references) as well as the surrogate key
for the recoded dimension, facts now correctly roll up to their new dimension
link.
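A minimal sketch of the realignment pass (again with hypothetical names and keys, not the actual Warehouse Builder mapping) could look like this:

```python
# Realignment sketch: rows previously loaded with the default key are
# re-looked-up against the dimension on the next load; when the member
# has since arrived, the key is recoded in place.

REJECT_KEY = -1

def realign(fact_rows, dimension):
    """Recode rows still carrying the default key; returns how many
    rows were realigned."""
    recoded = 0
    for row in fact_rows:
        if row["dim_id"] == REJECT_KEY:
            new_id = dimension.get(row["orig_key"])
            if new_id is not None:   # the dimension record arrived late
                row["dim_id"] = new_id
                recoded += 1
            # else: still not found, the default key stays in place
    return recoded

facts = [{"dim_id": -1, "orig_key": "X", "amount": 25.0},
         {"dim_id": 3,  "orig_key": "D", "amount": 40.0}]
dimension = {"A": 1, "C": 2, "D": 3, "X": 4}  # "X" loaded on a later run
print(realign(facts, dimension))  # 1
print(facts[0]["dim_id"])         # 4
```

Only rows still pointing at the reject member are touched, so a previously realigned fact is never counted twice.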

Benefits
By applying this strategy Elizabeth Arden achieves the following benefits, in
both the infrastructure provided and the business environment:
· No special structure is needed to store error data, which simplifies the
data model.
· All data is visible in a single environment rather than in a special area for
erroneous data.
· Balancing data to the source is more efficient.
· Simple dependencies between ETL jobs make the loading process simpler
and less prone to delays.



EMBEDDED SPECIAL TIME CONSTRUCTS
While time is a simple concept in our everyday lives, it can actually
complicate reporting on a system. In most cases the data warehouse or reporting
database stores only a base time period. In other words, sales orders are
loaded when they are booked, and that is the time of the transaction.
Users on the reporting side, however, often have questions such as: how do this
month's sales compare to last month's, or to the same month last year? When the
reporting system is only concerned with the base time, the user has to create
the calculations to get the desired answers.
Embedded special time constructs move the calculation of these time constructs
down into the reporting system. Essentially, the data model embeds the special
time constructs for the user to choose from. By embedding them in the base data
layer of the reporting system, the heavy lifting of running the calculations is
done when the data is refreshed, not when the user needs an answer.

Concepts
The central concept is pre-calculation, where a reporting entity is pre-built
for the users. A data warehouse is itself a good example of this concept:
instead of having each user join the tables to get the sales for all customers
over the last month out of the ERP system, the result is pre-built.
Embedded special time goes one step further. It creates more reporting
constructs up front, giving easy access to hard-to-get data.
Many options exist for pre-calculation. On-Line Analytical Processing (OLAP) is
one storage technique that heavily uses pre-built information. In this section,
however, we cover a strictly relational implementation, using two different
techniques to pre-build the special time constructs:
· Analytic SQL
· A bitmap table linking fact to dimensions

Using Analytic SQL

As with OLAP data, if you create facts that exist only in certain cells for the
dimensions, sparsity is the result: for certain dimension combinations no fact
is stored. The problem with sparse data is that it increases storage and
typically decreases performance.
To avoid this sparsity in the query objects, an extra join is used to densify
the data set, and analytic SQL functions then present the data to users as
embedded special time constructs.



The other advantage (and part of why we do this) is that the heavy lifting is
done not by users in client calculations, but by the database. This increases
performance and alleviates the burden of creating the calculations.

Using a Bitmap Table

A bitmap table is an intermediate table that links the fact table to the time
dimension, allowing the facts to be viewed for different time constructs by
pre-joining the data set.
A bitmap table looks something like this (note that this is not a complete
table, but it should convey the idea):
from_period  to_period  CP  LY_CP  LU_R2  LY_R3  QTD_CP  LY_QTD_CP
200701       200701      1     0      0      0       1         0
200601       200701      0     1      1      1       0         0
200512       200701      0     0      1      1       0         0
200511       200701      0     0      0      1       0         0

Table 1 Bitmap Table Example

The columns in the bitmap table represent the special time constructs we want
to work with. The SQL joins on these rows, leveraging the bitmap values to
determine when there are values to be shown.
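As an illustration, the bitmap flags can be derived from simple period arithmetic. This sketch uses hypothetical construct definitions (current period, same period last year, rolling 3 months); the real matrix would encode whatever constructs the business needs:

```python
# Bitmap-row generation sketch: for a reporting period, each candidate
# period gets 1/0 flags for the special time constructs.

def months_between(p_from, p_to):
    """Periods are YYYYMM integers; returns whole months from p_from to p_to."""
    return (p_to // 100 - p_from // 100) * 12 + (p_to % 100 - p_from % 100)

def bitmap_row(from_period, to_period):
    diff = months_between(from_period, to_period)
    return {
        "from_period": from_period,
        "to_period": to_period,
        "cp":    1 if diff == 0 else 0,       # current period
        "ly_cp": 1 if diff == 12 else 0,      # same period last year
        "r3":    1 if 0 <= diff <= 2 else 0,  # rolling 3 months
    }

rows = [bitmap_row(p, 200701) for p in (200701, 200612, 200611, 200601)]
print([r["cp"] for r in rows])     # [1, 0, 0, 0]
print([r["ly_cp"] for r in rows])  # [0, 0, 0, 1]
print([r["r3"] for r in rows])     # [1, 1, 1, 0]
```

Generating the table this way in the ETL means the reporting query only ever joins and multiplies; no date arithmetic happens at query time.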

A look at Warehouse Builder


These concepts are added to Warehouse Builder in the schema design, and the
queries are created as views, which are again stored in the Warehouse Builder
repository.



Table 2 A densified time view in the data object editor

Focus on Analytic SQL

As we said, there are two important concepts here. The first is the analytic
SQL itself. The fragment below shows a snippet from the select clause of the
statement; note the analytic SQL constructs.
Select
  <snip>
, NVL ( SUM ( fact.quantity_on_hand )
          OVER ( PARTITION BY fact.company_id,
                              fact.item_sku_id,
                              v_time.fiscal_period_cd
                 ORDER BY v_time.fiscal_period_cd
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
      , 0 ) ptd_quantity_on_hand
  <snip>
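The same running-total construct can be reproduced outside Oracle. The sketch below uses SQLite (3.25 or later, which added window functions) through Python, with simplified, hypothetical partition keys rather than the full production column list:

```python
# Period-to-date running total via a window function: SUM(...) OVER a
# partition with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (item_sku_id TEXT, fiscal_period_cd INTEGER,
                   quantity_on_hand INTEGER);
INSERT INTO fact VALUES ('SKU1', 200701, 10), ('SKU1', 200702, 5),
                        ('SKU1', 200703, 7), ('SKU2', 200701, 3);
""")
rows = conn.execute("""
SELECT item_sku_id, fiscal_period_cd,
       SUM(quantity_on_hand)
         OVER (PARTITION BY item_sku_id
               ORDER BY fiscal_period_cd
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
         AS ptd_quantity_on_hand
FROM fact ORDER BY item_sku_id, fiscal_period_cd
""").fetchall()
print(rows)
# [('SKU1', 200701, 10), ('SKU1', 200702, 15), ('SKU1', 200703, 22),
#  ('SKU2', 200701, 3)]
```

The database computes the cumulative quantity per SKU in one pass; the user never writes the calculation.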

The second part is the actual join used to densify the data. It literally
squeezes the air out of the result set and gives us better performance in
retrieving the results. That join is shown in the snippet below.
Select
  <snip>
FROM Hybrid_Eaden.c_inventory_rpt fact
  PARTITION BY (fact.company_id, fact.item_sku_id)
RIGHT OUTER JOIN Hybrid_Eaden.d_ea_month v_time
  ON ( fact.ea_month_id = v_time.ea_month_id )
  <snip>

Focus on the Bitmap Table

The bitmap table is linked from the fact table to the dimension. The bitmap
value (either a 1 or a 0) is then used in the select clause of the query.
Looking at our bitmap table again, and then at its usage in the select clause,
we can see where data is shown (there is a 1) and where it is not shown in the
results (there is a 0), allowing us to create a clean reporting solution.
, (fact.sales_units*time.ly_qtd_cp)
, (fact.sales_units*time.ly_tot_qtr_cp)
, (fact.sales_units*time.ly_tot_hyr_cp)
, (fact.sales_units*time.ly_tot_yr_cp)
, (fact.sales_units*time.ly_r3)
, (fact.sales_units*time.ly_r6)
, (fact.sales_units*time.ly_r12)
FROM c_global_sales_affiliate fact
, d_ea_month_matrix time
WHERE fact.ea_month_id = time.from_ea_month_id
The bitmap is simply joined to the fact table, and the measures in the fact are
multiplied by the value in the bitmap.
This allows each measure either to hold data or to hold a zero value; in this
case zero means that the cell is not applicable for the measure. Rolling up the
totals then just adds a zero value and does not influence any totals.
Where the bitmap holds a one, the value is shown in the measure and is counted
in rollups and in reporting. Users do not have to figure any of this out; they
simply see either a value or a zero.
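A small runnable sketch of the bitmap join (SQLite through Python, with hypothetical, simplified table and column names) shows how the multiplication yields either the value or a harmless zero:

```python
# Bitmap-join sketch: each measure is multiplied by a 0/1 flag from the
# time matrix, so a construct column either carries the value or a zero
# that does not disturb rollups.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (ea_month_id INTEGER, sales_units INTEGER);
INSERT INTO fact VALUES (200701, 100), (200601, 80);
CREATE TABLE time_matrix (from_ea_month_id INTEGER, cp INTEGER, ly_cp INTEGER);
INSERT INTO time_matrix VALUES (200701, 1, 0), (200601, 0, 1);
""")
rows = conn.execute("""
SELECT f.sales_units * t.cp    AS cp_units,
       f.sales_units * t.ly_cp AS ly_cp_units
FROM fact f JOIN time_matrix t ON f.ea_month_id = t.from_ea_month_id
ORDER BY f.ea_month_id
""").fetchall()
print(rows)   # [(0, 80), (100, 0)]
```

The 200601 row lands only in the last-year column, the 200701 row only in the current-period column; summing either column gives a correct total with no extra logic.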

Advantages and disadvantages


Both methods have their advantages and disadvantages. At Elizabeth Arden the
following points came out of the woodwork.
Analytic SQL has the advantage that it outperforms the traditional approach of
writing complex sub-queries. It has the disadvantage that performance can
suffer because query predicates are not always pushed down to the underlying
view.
The main disadvantage of the bitmap method is that the query pulls in more data
than is strictly required, which of course impacts performance as well.
Both methods, however, achieve their main goal of creating a simpler reporting
environment for the end user, allowing the IT professionals to focus on
developing better systems.



DIMENSION-BASED DATA CONVERSION
In today's world a successful project faces some interesting challenges, most
notably the addition of new data from new subsidiaries, acquisitions, or
systems similar to the one the project started with. Adding data to an existing
system may sound simple: just load the new (or old historical) data into the
database and you are done. If it were that simple, this section would not be
written.
The problem, especially with adding similar but different systems, is that in
many cases data needs to be recoded or rolled up in different ways. When
loading the new data, it needs to conform to the current hierarchies and
strategies. Achieving this can be a daunting task.

Concepts
The dimension-based conversion rests on a simple but elegant concept: rather
than changing data structures or the original data loaded into a dimension, the
conversion is managed by the rollup in a hierarchy.
This concept has a number of very compelling benefits. One is that a change is
not permanent or destructive to the original data: the original record and
linkage are still in place; the record just rolls up differently. The second is
that these changes can typically be made quickly, with much less impact on the
system than a normal update.

Load level vs. Reporting levels

The two benefits mentioned earlier are achieved by adding a level to each
dimension (or, more precisely, to a hierarchy) that acts as the lowest level,
called the load level. In other words, fact data is always loaded at this level
in the hierarchy. The load level, however, is never exposed to users of the
reporting system; these users only see the so-called reporting levels.

Realigning vs. Overwriting

In many cases conversion methods are destructive. For example, fact data might
be stored at a higher grain, and the new data needs to be aggregated. Doing
this in a normal load converts the data once; if that conversion turns out to
be not completely accurate, there is no easy fix. If the source is lost by that
time, there is no fix at all. Realigning simply keeps the new records as-is, in
their original grain. Overwriting is not what we want to do to our data.

An example
If we look at a table structure and data, this concept becomes a lot clearer
and the benefits become visible instantly.
Let's say we have loaded three products into the dimension table (in this
example we show only a few dimension columns). The result, with the booked
level (which is what the load level is called in this example) and reporting
levels, looks like this:



booked_products  reported_products  source_key
A                A                  STANDARD
C                C                  STANDARD
D                D                  STANDARD

Table 3 The dimension table with booked and reported levels

Now let's say we need to merge data from a new vendor into this dimension, and
we have the following mapping table showing how a product from the new vendor
is recoded to the Elizabeth Arden product codes (and therefore hierarchies):
Vendor XYZ Product  Elizabeth Arden Product
123                 A
654                 A
234                 D
A                   C

Table 4 Recoding or matching table

With the levels as shown in Table 3 we can now quickly load the dimension and
see the following result after loading and recoding:

booked_products  reported_products  source_key
A                A                  STANDARD
C                C                  STANDARD
D                D                  STANDARD
123              A                  VENDORXYZ
654              A                  VENDORXYZ
234              D                  VENDORXYZ
A                C                  VENDORXYZ

Table 5 Adding new products is simple and retains history

Now the big benefits become apparent. Table 5 shows both the original data and
the Elizabeth Arden product presented to the user. The revenue for this
dimension is now directly available in the relevant business units. The other
advantage appears when a recoding is required: if, for example, product 234
should instead be linked to Elizabeth Arden product A, only the dimension needs
to be updated rather than all related facts.
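A minimal sketch of that update (SQLite through Python, with hypothetical table names mirroring Table 5) shows that recoding touches one dimension row while the facts stay untouched:

```python
# Load-level dimension sketch: facts join on the booked (load-level)
# value; recoding product 234 to a different Elizabeth Arden product
# updates one dimension row, never the facts.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim (booked TEXT PRIMARY KEY, reported TEXT, source_key TEXT);
INSERT INTO dim VALUES ('A', 'A', 'STANDARD'), ('234', 'D', 'VENDORXYZ');
CREATE TABLE fact (booked TEXT, revenue INTEGER);
INSERT INTO fact VALUES ('A', 500), ('234', 200);
""")
# Recode: vendor product 234 now rolls up to product A instead of D.
conn.execute("UPDATE dim SET reported = 'A' WHERE booked = '234'")
rows = conn.execute("""
SELECT d.reported, SUM(f.revenue)
FROM fact f JOIN dim d ON f.booked = d.booked
GROUP BY d.reported ORDER BY d.reported
""").fetchall()
print(rows)   # [('A', 700)]
```

One single-row update re-homes all related fact revenue, and the original booked value is preserved for auditing or a future re-recode.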



More to come
The concept of building a dimension with two separate levels playing two
separate roles, one for loading and one for reporting, is currently not in
production at Elizabeth Arden, but may be implemented in the future.



SUMMARY AND CONCLUSION
Inventive use of an ETL tool, forward-looking data modeling, and embedding
smarts in data structures allow Elizabeth Arden to achieve two goals for its
business.
First, the IT staff can focus on their core capabilities of building and
maintaining systems, not on creating reports for end users. Second, business
users get better information more quickly, allowing them to get back to their
job: running the business.
With Warehouse Builder Elizabeth Arden has found a tool that it can use to
implement these concepts to get the maximum benefits for the company.



Implementing Advanced Data Warehouse Concepts at Elizabeth Arden
October 2006
Authors: Jean-Pierre Dijcks (Oracle) and Tim Seyfried (Elizabeth Arden)

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
www.oracle.com

Copyright © 2006, Oracle. All rights reserved.


This document is provided for information purposes only
and the contents hereof are subject to change without notice.
This document is not warranted to be error-free, nor subject to
any other warranties or conditions, whether expressed orally
or implied in law, including implied warranties and conditions of
merchantability or fitness for a particular purpose. We specifically
disclaim any liability with respect to this document and no
contractual obligations are formed either directly or indirectly
by this document. This document may not be reproduced or
transmitted in any form or by any means, electronic or mechanical,
for any purpose, without our prior written permission.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.
