
Data Warehouse Schema Architecture

Schemas in Data Warehouses

A schema is a collection of database objects, including tables, views, indexes, and synonyms.

There is a variety of ways of arranging schema objects in the schema models designed for data
warehousing. One data warehouse schema model is a star schema.
The foundation of each data warehouse is a relational database built using a dimensional model.
A dimensional model consists of dimension and fact tables and is typically described as star or
snowflake schema.
A data warehouse environment usually transforms the relational data model into
specialized architectures. Many schema models have been designed for data
warehousing, but the most commonly used are:

- Star schema

- Snowflake schema

- Fact constellation schema

The choice of schema model for a data warehouse should be based on an analysis
of project requirements, available tools and project team preferences.

Star schema
What is a star schema? The star schema architecture is the simplest data warehouse schema. It is
called a star schema because the diagram resembles a star, with points radiating from a center.
The center of the star consists of the fact table, and the points of the star are the dimension tables.
Usually the fact tables in a star schema are in third normal form (3NF), whereas dimension
tables are denormalized. Although the star schema is the simplest architecture, it is the
most commonly used today and is recommended by Oracle.

Fact Tables

A fact table typically has two types of columns: foreign keys to dimension tables, and measures,
which contain numeric facts. A fact table can hold data at the detail level or at an aggregated level.

Dimension Tables

A dimension is a structure, usually composed of one or more hierarchies, that categorizes data. If a
dimension has no hierarchies or levels, it is called a flat dimension or list. The primary keys
of the dimension tables are part of the composite primary key of the fact table.
Dimensional attributes help to describe the dimensional value. They are normally descriptive,
textual values. Dimension tables are generally smaller than the fact table.
Typical fact tables store data about sales, while dimension tables store data about geographic
regions (markets, cities), clients, products, times and channels.
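
The relationship between fact and dimension tables described above can be sketched with Python's
built-in sqlite3 module. This is only an illustration under stated assumptions: the table names
(dim_product, dim_region, fact_sales), the columns and the sample rows are all hypothetical, and a
real warehouse would normally include a time dimension as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, city TEXT, market TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),  -- foreign key
    region_key  INTEGER NOT NULL REFERENCES dim_region(region_key),    -- foreign key
    quantity    INTEGER NOT NULL,                                      -- measure
    amount      REAL    NOT NULL,                                      -- measure
    PRIMARY KEY (product_key, region_key)  -- composite key built from the dimension keys
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO dim_region VALUES (?, ?, ?)",
                 [(10, "Lyon", "EU"), (20, "Austin", "US")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 10, 3, 30.0), (1, 20, 2, 20.0), (2, 10, 1, 45.0)])

# Aggregate a measure by a descriptive dimension attribute.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

# The composite primary key rejects a second row for the same dimension keys.
try:
    conn.execute("INSERT INTO fact_sales VALUES (1, 10, 9, 9.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

In this sketch the fact table holds only keys and numeric measures, while the descriptive text
lives in the dimension tables.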

The main characteristics of the star schema:

- Simple structure: the schema is easy to understand.
- High query efficiency: only a small number of tables need to be joined.
- Relatively long load times for dimension tables: denormalization introduces redundant data,
  so the tables can become large.
- The most commonly used schema in data warehouse implementations: it is widely supported by a
  large number of business intelligence tools.

Data Warehousing - Star and Snowflake Schemas

Concepts

A star schema resembles a star: one or more fact tables are surrounded by dimension tables.
Dimension tables aren't normalized; that is, even if you have repeating fields such as name
or category, no extra table is added to remove the redundancy. For example, in a car dealership
scenario you might have a product dimension that looks like this:

Product_key
Product_category
Product_subcategory
Product_brand
Product_make
Product_model
Product_year
In a relational system such a design would be clearly unacceptable because product category (car,
van, truck) can be repeated for multiple vehicles, and so can product brand (Toyota, Ford,
Nissan), product make (Camry, Corolla, Maxima) and model (LE, XLE, SE and so forth). So a
vehicle table in a relational system is likely to have foreign keys relating to vehicle category,
vehicle brand, vehicle make and vehicle model. In the dimensional star schema model, however,
you simply list out the names of each vehicle attribute.
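
A minimal sketch of such a denormalized dimension, in plain Python, may make the idea concrete.
The rows and sales figures are invented for illustration, and the subcategory and year columns
are omitted for brevity. Because every level of the hierarchy sits in the same row, sales can be
grouped by any level without an extra lookup table.

```python
from collections import Counter

# Hypothetical denormalized product dimension:
# (product_key, category, brand, make, model). Names repeat on purpose.
dim_product = [
    (1, "car", "Toyota", "Camry", "LE"),
    (2, "car", "Toyota", "Camry", "XLE"),
    (3, "car", "Toyota", "Corolla", "LE"),
    (4, "truck", "Ford", "F-150", "XL"),
]

# Hypothetical fact rows: (product_key, units_sold).
fact_sales = [(1, 5), (2, 3), (3, 4), (4, 2), (1, 1)]

# Column position of each hierarchy level inside a dimension row.
LEVELS = {"category": 1, "brand": 2, "make": 3, "model": 4}


def roll_up(level):
    """Total units sold, grouped by the given level of the product hierarchy."""
    col = LEVELS[level]
    value_by_key = {row[0]: row[col] for row in dim_product}
    totals = Counter()
    for product_key, units in fact_sales:
        totals[value_by_key[product_key]] += units
    return dict(totals)


by_make = roll_up("make")
by_brand = roll_up("brand")
by_category = roll_up("category")
```

Here the Camry LE and Camry XLE rows both contribute to the Camry total, which in turn is part of
the Toyota and car totals.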

Star schema also contains the entire dimension hierarchy within a single table. Dimension
hierarchy provides a way of aggregating data from the lowest to highest levels within a
dimension. For example, Camry LE and Camry XLE sales roll up to Camry make, Toyota brand
and cars category. Here is what a star schema diagram could look like:

Notice that each dimension table has a primary key. The fact table has foreign keys to each
dimension table. Although a data warehouse does not require creating primary and foreign keys,
doing so is highly recommended for two reasons:

1. Dimensional models that have primary and foreign keys provide superior performance,
especially for processing Analysis Services cubes.

2. Analysis Services requires creating either physical or logical relationships between fact
and dimension tables. Physical relationships are implemented through primary and
foreign keys. Therefore if the keys exist you save a step when building cubes.

A snowflake schema resembles a snowflake because dimension tables are further normalized or
have parent tables. For example, we could extend the product dimension in the dealership
warehouse with product_category and product_subcategory tables. Product categories could
include trucks, vans, sport utility vehicles, etc. Product subcategory tables could contain
subcategories such as leisure vehicles, recreational vehicles, luxury vehicles, industrial trucks
and so forth. Here is what the snowflake schema would look like with the extended product
dimension:

[Figure: snowflake schema with extended product dimension (ASDW3 139.gif)]

A snowflake schema generates more joins than a star schema during cube processing, which
translates into longer-running queries. Therefore it is normally recommended to choose the star
schema design over the snowflake schema for optimal performance. The snowflake schema does,
however, have the advantage of providing more flexibility. For example, if you were working for
an auto parts store chain you might wish to report on car parts (car doors, hoods, engines) as well
as subparts (door knobs, hood covers, timing belts and so forth). In such cases you could have
both part and subpart dimensions; however, some attributes of subparts might not apply to parts
and vice versa. For example, a tread size attribute would apply to a tire but not to the nuts and
bolts that go on the tire. If you wish to aggregate your sales by part you will need to know which
subparts roll up to each part, as in the following:

Dim_subpart
subpart_key
subpart_name
subpart_SKU
subpart_size
subpart_weight
subpart_color
part_key

Dim_part
part_key
part_name
part_SKU

With such a design you could create reports that show you a breakdown of your sales by each
type of engine, as well as each part that makes up the engine.
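
The roll-up from subparts to parts can be sketched with sqlite3 as follows. Only a subset of the
columns listed above is used, and the fact_sales table, its amount column and all sample rows
are assumptions made for the sake of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_part (
    part_key  INTEGER PRIMARY KEY,
    part_name TEXT,
    part_SKU  TEXT
);
CREATE TABLE Dim_subpart (
    subpart_key  INTEGER PRIMARY KEY,
    subpart_name TEXT,
    subpart_SKU  TEXT,
    part_key     INTEGER NOT NULL REFERENCES Dim_part(part_key)  -- parent part
);
-- Hypothetical fact table recorded at the subpart level.
CREATE TABLE fact_sales (
    subpart_key INTEGER NOT NULL REFERENCES Dim_subpart(subpart_key),
    amount      REAL NOT NULL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Dim_part VALUES (?, ?, ?)",
                 [(1, "engine", "P-ENG"), (2, "door", "P-DOOR")])
conn.executemany("INSERT INTO Dim_subpart VALUES (?, ?, ?, ?)",
                 [(11, "timing belt", "S-TB", 1),
                  (12, "piston", "S-PI", 1),
                  (21, "door knob", "S-DK", 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(11, 40.0), (12, 60.0), (21, 15.0), (11, 40.0)])

# Sales by part: the extra join through Dim_subpart is the cost of snowflaking.
sales_by_part = conn.execute("""
    SELECT p.part_name, SUM(f.amount)
    FROM fact_sales f
    JOIN Dim_subpart s ON s.subpart_key = f.subpart_key
    JOIN Dim_part p    ON p.part_key = s.part_key
    GROUP BY p.part_name
    ORDER BY p.part_name
""").fetchall()

# Breakdown of engine sales by the subparts that make up the engine.
engine_breakdown = conn.execute("""
    SELECT s.subpart_name, SUM(f.amount)
    FROM fact_sales f
    JOIN Dim_subpart s ON s.subpart_key = f.subpart_key
    JOIN Dim_part p    ON p.part_key = s.part_key
    WHERE p.part_name = 'engine'
    GROUP BY s.subpart_name
    ORDER BY s.subpart_name
""").fetchall()
```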

Schema Modeling Techniques


The following topics provide information about schemas in a data warehouse:

- Schemas in Data Warehouses
- Third Normal Form
- Star Schemas
- Optimizing Star Queries

Schemas in Data Warehouses

A schema is a collection of database objects, including tables, views, indexes, and synonyms.

There is a variety of ways of arranging schema objects in the schema models designed for data
warehousing. One data warehouse schema model is a star schema. The Sales History sample
schema (the basis for most of the examples in this book) uses a star schema. However, there are
other schema models that are commonly used for data warehouses. The most prevalent of these
schema models is the third normal form (3NF) schema. Additionally, some data warehouse
schemas are neither star schemas nor 3NF schemas, but instead share characteristics of both
schemas; these are referred to as hybrid schema models.

The Oracle9i database is designed to support all data warehouse schemas. Some features may be
specific to one schema model (such as the star transformation feature, described in "Using Star
Transformation", which is specific to star schemas). However, the vast majority of Oracle's data
warehousing features are equally applicable to star schemas, 3NF schemas, and hybrid schemas.
Key data warehousing capabilities such as partitioning (including the rolling window load
technique), parallelism, materialized views, and analytic SQL are implemented in all schema
models.

The determination of which schema model should be used for a data warehouse should be based
upon the requirements and preferences of the data warehouse project team. Comparing the merits
of the alternative schema models is outside of the scope of this book; instead, this chapter will
briefly introduce each schema model and suggest how Oracle can be optimized for those
environments.

Third Normal Form

Although this guide primarily uses star schemas in its examples, you can also use the third
normal form for your data warehouse implementation.

Third normal form modeling is a classical relational-database modeling technique that minimizes
data redundancy through normalization. When compared to a star schema, a 3NF schema
typically has a larger number of tables due to this normalization process. For example, in
Figure 17-1, the orders and order items tables contain information similar to that in the sales
table of the star schema in Figure 17-2.

3NF schemas are typically chosen for large data warehouses, especially environments with
significant data-loading requirements that are used to feed data marts and execute long-running
queries.

The main advantages of 3NF schemas are that they:

- Provide a neutral schema design, independent of any application or data-usage considerations
- May require less data transformation than less normalized schemas such as star schemas

Figure 17-1 presents a graphical representation of a third normal form schema.

Figure 17-1 Third Normal Form Schema

Optimizing Third Normal Form Queries

Queries on 3NF schemas are often very complex and involve a large number of tables. The
performance of joins between large tables is thus a primary consideration when using 3NF
schemas.

One particularly important feature for 3NF schemas is partition-wise joins. The largest tables in a
3NF schema should be partitioned to enable partition-wise joins. The most common partitioning
technique in these environments is composite range-hash partitioning for the largest tables, with
the most-common join key chosen as the hash-partitioning key.
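
The idea behind a partition-wise join can be illustrated with a small pure-Python sketch. It is a
conceptual model only, not Oracle's implementation: it covers just the hash part of range-hash
partitioning, and the function names and sample rows are hypothetical. Both tables are
hash-partitioned on the join key, so each pair of matching partitions can be joined on its own
(and, in a real system, in parallel).

```python
from collections import defaultdict


def hash_partition(rows, key_index, n_partitions):
    """Distribute rows into n_partitions buckets by hashing the join key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key_index]) % n_partitions].append(row)
    return partitions


def join_partition(left, right, left_key, right_key):
    """Hash join of one left partition with its matching right partition."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [l + r for l in left for r in index.get(l[left_key], [])]


def partition_wise_join(left_rows, right_rows, left_key, right_key, n_partitions=4):
    """Join two tables partition by partition.

    Rows with equal keys always land in the same partition number,
    so partition i of the left table only needs partition i of the right.
    """
    left_parts = hash_partition(left_rows, left_key, n_partitions)
    right_parts = hash_partition(right_rows, right_key, n_partitions)
    result = []
    for lp, rp in zip(left_parts, right_parts):
        result.extend(join_partition(lp, rp, left_key, right_key))
    return result


# Hypothetical 3NF tables: orders(order_id, order_date) and
# order_items(order_id, product, quantity).
orders = [(1, "2002-01-05"), (2, "2002-01-06"), (3, "2002-01-07")]
order_items = [(1, "widget", 2), (1, "gadget", 1), (3, "widget", 5)]

joined = sorted(partition_wise_join(orders, order_items, 0, 0))
```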

Parallelism is often heavily used in 3NF environments and should typically be enabled there.

Star Schemas

The star schema is perhaps the simplest data warehouse schema. It is called a star schema
because the entity-relationship diagram of this schema resembles a star, with points radiating
from a central table. The center of the star consists of a large fact table and the points of the star
are the dimension tables.

A star schema is characterized by one or more very large fact tables that contain the primary
information in the data warehouse, and a number of much smaller dimension tables (or lookup
tables), each of which contains information about the entries for a particular attribute in the fact
table.

A star query is a join between a fact table and a number of dimension tables. Each dimension
table is joined to the fact table using a primary key to foreign key join, but the dimension tables
are not joined to each other. The cost-based optimizer recognizes star queries and generates
efficient execution plans for them.

A typical fact table contains keys and measures. For example, in the sh sample schema, the fact
table, sales, contains the measures quantity_sold, amount, and cost, and the keys cust_id,
time_id, prod_id, channel_id, and promo_id. The dimension tables are customers, times,
products, channels, and promotions. The product dimension table, for example, contains
information about each product number that appears in the fact table.

A star join is a primary key to foreign key join of the dimension tables to a fact table.
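
A star query of this shape can be sketched with sqlite3. The table names and key columns follow
the sh example above, but only three of the five dimensions are included, and the descriptive
columns (prod_name, channel_desc, calendar_year) and all rows are assumptions made for
illustration. SQLite is used only to show the query shape; it does not perform Oracle's star
query optimization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (prod_id    INTEGER PRIMARY KEY, prod_name     TEXT);
CREATE TABLE channels (channel_id INTEGER PRIMARY KEY, channel_desc  TEXT);
CREATE TABLE times    (time_id    TEXT    PRIMARY KEY, calendar_year INTEGER);
CREATE TABLE sales (
    prod_id       INTEGER NOT NULL REFERENCES products(prod_id),
    channel_id    INTEGER NOT NULL REFERENCES channels(channel_id),
    time_id       TEXT    NOT NULL REFERENCES times(time_id),
    quantity_sold INTEGER NOT NULL,
    amount        REAL    NOT NULL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "Laptop"), (2, "Mouse")])
conn.executemany("INSERT INTO channels VALUES (?, ?)",
                 [(3, "Direct Sales"), (4, "Internet")])
conn.executemany("INSERT INTO times VALUES (?, ?)",
                 [("2001-01-15", 2001), ("2002-03-02", 2002)])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    (1, 3, "2001-01-15", 1, 900.0),
    (2, 4, "2001-01-15", 2, 40.0),
    (1, 4, "2002-03-02", 1, 950.0),
    (2, 4, "2002-03-02", 3, 60.0),
])

# Star query: every dimension joins to the fact table on its key,
# and no dimension table is joined to another dimension table.
internet_sales_2002 = conn.execute("""
    SELECT p.prod_name, SUM(s.amount)
    FROM sales s
    JOIN products p ON p.prod_id    = s.prod_id
    JOIN channels c ON c.channel_id = s.channel_id
    JOIN times    t ON t.time_id    = s.time_id
    WHERE c.channel_desc = 'Internet' AND t.calendar_year = 2002
    GROUP BY p.prod_name
    ORDER BY p.prod_name
""").fetchall()
```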

The main advantages of star schemas are that they:

- Provide a direct and intuitive mapping between the business entities being analyzed by end
  users and the schema design.
- Provide highly optimized performance for typical star queries.
- Are widely supported by a large number of business intelligence tools, which may anticipate or
  even require that the data-warehouse schema contain dimension tables.

Star schemas are used for both simple data marts and very large data warehouses.

Figure 17-2 presents a graphical representation of a star schema.

Figure 17-2 Star Schema

Snowflake Schemas

The snowflake schema is a more complex data warehouse model than a star schema, and is a
type of star schema. It is called a snowflake schema because the diagram of the schema
resembles a snowflake.

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data
has been grouped into multiple tables instead of one large table. For example, a product
dimension table in a star schema might be normalized into a products table, a
product_category table, and a product_manufacturer table in a snowflake schema. While
this saves space, it increases the number of dimension tables and requires more foreign key joins.
The result is more complex queries and reduced query performance. Figure 17-3 presents a
graphical representation of a snowflake schema.

Figure 17-3 Snowflake Schema
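
The normalization step described above can be sketched in plain Python. The flat rows, the
column layout and the helper names are hypothetical. The sketch splits a flat product dimension
into products, product_category and product_manufacturer tables, and shows that describing one
product then takes two extra lookups (the equivalent of two extra joins).

```python
# Hypothetical flat (star-style) product dimension:
# (prod_id, prod_name, category_name, manufacturer_name)
flat_products = [
    (1, "Camry LE", "car", "Toyota"),
    (2, "Corolla LE", "car", "Toyota"),
    (3, "F-150 XL", "truck", "Ford"),
    (4, "Tacoma", "truck", "Toyota"),
]


def snowflake(flat_rows):
    """Split a flat dimension into a products table plus two lookup tables."""
    product_category = {}      # category name -> surrogate id
    product_manufacturer = {}  # manufacturer name -> surrogate id
    products = []              # (prod_id, prod_name, category_id, manufacturer_id)
    for prod_id, name, category, manufacturer in flat_rows:
        cat_id = product_category.setdefault(category, len(product_category) + 1)
        man_id = product_manufacturer.setdefault(
            manufacturer, len(product_manufacturer) + 1)
        products.append((prod_id, name, cat_id, man_id))
    return products, product_category, product_manufacturer


def describe(prod_id, products, product_category, product_manufacturer):
    """Rebuild the flat view of one product; needs two extra lookups."""
    category_names = {v: k for k, v in product_category.items()}
    manufacturer_names = {v: k for k, v in product_manufacturer.items()}
    for pid, name, cat_id, man_id in products:
        if pid == prod_id:
            return name, category_names[cat_id], manufacturer_names[man_id]
    raise KeyError(prod_id)


products, product_category, product_manufacturer = snowflake(flat_products)
tacoma = describe(4, products, product_category, product_manufacturer)
```

Each category and manufacturer name is now stored once, which is where the space saving comes
from, at the price of the additional lookups.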

Note:
Oracle Corporation recommends you choose a star schema over a snowflake
schema unless you have a clear reason not to.

Optimizing Star Queries

You should consider the following when using star queries:

- Tuning Star Queries
- Using Star Transformation

Tuning Star Queries

To get the best possible performance for star queries, it is important to follow some basic
guidelines:

- A bitmap index should be built on each of the foreign key columns of the fact table or tables.
- The initialization parameter STAR_TRANSFORMATION_ENABLED should be set to true. This enables
  an important optimizer feature for star queries. It is set to false by default for backward
  compatibility.
- The cost-based optimizer should be used. This does not apply solely to star schemas: all data
  warehouses should always use the cost-based optimizer.

When a data warehouse satisfies these conditions, the majority of the star queries running in the
data warehouse will use a query execution strategy known as the star transformation. The star
transformation provides very efficient query performance for star queries.

Using Star Transformation

The star transformation is a powerful optimization technique that relies upon implicitly rewriting
(or transforming) the SQL of the original star query. The end user never needs to know any of the
details about the star transformation. Oracle's cost-based optimizer automatically chooses the star
transformation where appropriate.

The star transformation is a cost-based query transformation aimed at executing star queries
efficiently. Oracle processes a star query using two basic phases. The first phase retrieves exactly
the necessary rows from the fact table (the result set). Because this retrieval utilizes bitmap
indexes, it is very efficient. The second phase joins this result set to the dimension tables. An
example of an end user query is: "What were the sales and profits for the grocery department of
stores in the west and southwest sales districts over the last three quarters?" This is a simple star
query.

Note:
Bitmap indexes are available only if you have purchased the Oracle9i
Enterprise Edition. In Oracle9i Standard Edition, bitmap indexes and star
transformation are not available.
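
The two phases described above can be illustrated with a toy model in plain Python. This is a
conceptual sketch only and does not reproduce Oracle's actual algorithm or SQL rewrite: bitmaps
are modeled as Python integers with one bit per fact row, and the tables, keys and figures are
invented. Phase 1 combines bitmaps to find exactly the qualifying fact rows. Phase 2 joins that
small result set back to a dimension table.

```python
# Hypothetical fact rows: (store_key, dept_key, time_key, sales)
fact = [
    (1, 1, 1, 100.0),
    (2, 1, 1, 80.0),
    (3, 2, 1, 50.0),
    (1, 1, 2, 70.0),
    (3, 1, 2, 60.0),
]
# Hypothetical dimension tables.
stores = {1: "west", 2: "southwest", 3: "east"}  # store_key -> sales district
depts = {1: "grocery", 2: "hardware"}            # dept_key  -> department


def build_bitmap_index(rows, col):
    """Map each key value to an integer whose set bits mark matching row positions."""
    index = {}
    for pos, row in enumerate(rows):
        index[row[col]] = index.get(row[col], 0) | (1 << pos)
    return index


def bitmap_for(index, keys):
    """OR together the bitmaps of all requested key values."""
    bitmap = 0
    for key in keys:
        bitmap |= index.get(key, 0)
    return bitmap


store_index = build_bitmap_index(fact, 0)
dept_index = build_bitmap_index(fact, 1)

# Phase 1: filter the dimensions, then AND the fact-table bitmaps.
store_keys = [k for k, district in stores.items() if district in ("west", "southwest")]
dept_keys = [k for k, dept in depts.items() if dept == "grocery"]
hits = bitmap_for(store_index, store_keys) & bitmap_for(dept_index, dept_keys)
result_rows = [row for pos, row in enumerate(fact) if (hits >> pos) & 1]

# Phase 2: join the reduced result set back to the dimension to label the output.
report = {}
for store_key, _dept_key, _time_key, sales in result_rows:
    district = stores[store_key]
    report[district] = report.get(district, 0.0) + sales
```

Only three of the five fact rows survive phase 1 in this example, so the join in phase 2 touches
far fewer rows than a join against the whole fact table would.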

Star Transformation with a Bitmap Index