
Solution for a Staging Area in a Near Real-Time DWH: Efficient in Refresh and Easy to Operate

Technical White Paper
Mathias Zarick, Karol Hajdu, Senior Consultants
March 2011

When looking for a solution for a near real-time data warehouse (DWH), efficiency and operational stability (reliability) are probably among your primary technical concerns. This applies both to the data transformation tasks and to the extraction and loading of the staging area. This article covers the challenges around the staging area. It presents a technical solution which addresses both concerns: the refresh process of the staging area is efficient and easy to operate. The solution is based on Oracle Data Guard and transportable tablespaces.

Contents

1. The Role of a Staging Area in a Data Warehouse ............................... 3
1.1 The Challenge Called Short Latency .......................................... 3
1.2 Different Solutions Having Different Advantages ............................. 3
2. Solution with Data Guard: The Management Perspective ......................... 5
2.1 Benefits for Data Warehouses ................................................ 5
2.2 Which Types of Data Warehouses Will Benefit Most? ........................... 5
3. Solution with Oracle Data Guard: The Technical Insight ....................... 6
3.1 How It Works ................................................................ 6
3.2 The Key Advantages .......................................................... 9
3.3 Technical Prerequisites ..................................................... 10
4. A Tour of a Real-Life Example ................................................ 12
4.1 Real-Life Example ........................................................... 12
4.2 Setup and Configuration ..................................................... 13
4.3 Operation ................................................................... 14
5. Solution Extension: If Data Availability for Operational Reporting Matters .. 17

info@trivadis.com . www.trivadis.com . Info-Tel. 0800 87 482 347 . 10.05.2011 . Page 2 / 19

1. The Role of a Staging Area in a Data Warehouse

In data warehouse architectures, there are some common good practices concerning the staging area:

1. Create a staging area. After being extracted from the source systems, the data is loaded into the staging area. The staging area serves as the input for the transformation processes.
2. During the extraction and load into the staging area, only minimal data transformations are done: the tables in the staging area have the same structure as the corresponding tables in the source system. This makes the ETL architecture much more transparent.

Based on the staging area's content, the transformation and integration processes produce:
- snapshots of data, serving as input for the DWH's versioning
- sets of change events (transactions) to be loaded into the DWH

1.1 The Challenge Called Short Latency

In many enterprises, the Data Warehouse is the place where operative data originating from different systems comes together and is integrated with analytical or dispositive data. Step by step, many business users have discovered the value of integrated data. They use the data stored in the Data Warehouse to create reporting or analytical applications.

As the markets in many lines of business get more and more volatile, business users are no longer willing to wait several days or hours for the latest figures. They require a shortened latency of the Data Warehouse: the need for a near real-time data warehouse was born.

Integration tasks consume both hardware resources and time. Hence, Data Warehouse architects face a new challenge: finding a trade-off between more speed (shorter data latency) and providing integrated and cleansed data. Some of them decided to introduce additional redundancy (by creating an Operational Data Store with short latency but less integration). Some of them decided to provide short latency only for very narrow and well-specified content: they speak about real-time data warehouse content, rather than a real-time data warehouse. Regardless of which approach the Data Warehouse architect has chosen, he or she needs a Staging Area with short latency. This is the subject covered by this white paper.

1.2 Different Solutions Having Different Advantages

There are several technical approaches to implementing data extraction and the loading of a staging area. The implementations differ basically in the following characteristics:
- transferred data volumes required to refresh the staging area

- degree of completeness of the changes to be captured
- performance impact on the source system (additional resource consumption)
- impact on data and service availability of the source system
- total cost of ownership:
  o licensing costs and development efforts
  o operational complexity (efforts, reliability)
The following concepts are compared along these characteristics: data volumes to transfer, degree of completeness, performance impact on source, availability impact on source, and operational complexity (the per-cell ratings are rendered as graphical symbols in the original layout):
- Full Extraction
- Marker-Based Extraction
- Journal-Based Extraction
- Oracle Streams
- Oracle GoldenGate

Table 1: Simple overview of refresh solutions for the DWH's staging area

Depending on the database technology of the source system, some concepts can be excluded right from the start, because they have very specific prerequisites concerning the supported technologies. For more details about the refresh solutions for a Staging Area, please refer to the book "Data Warehousing mit Oracle Business Intelligence in der Praxis" [3], Chapter 3.4.

If the source system is Oracle, there is yet another technical solution to extract the data from the source system and load it into the staging area. This approach uses Oracle Data Guard, Flashback and transportable tablespaces. The solution has the same advantages common to other replication techniques like Oracle Streams or Oracle GoldenGate:
- small data volumes to be transferred
- low impact on the source system's performance and availability1
- all types of changes, on both data and structures, are captured and transferred

However, there is one important difference: this new solution has significantly lower operational complexity than Oracle Streams or Oracle GoldenGate!
The new solution using Data Guard and transportable tablespaces compares favorably along the same characteristics as in Table 1 (ratings are rendered as graphical symbols in the original layout).

Table 2: The new solution has significantly lower operational complexity than Streams or GoldenGate

This white paper explains the concepts and provides the most important implementation details, presented in the form of a real-life example.
1 This solution has even less impact on the source database than Oracle Streams or Oracle GoldenGate, because supplemental logging is not required here.

2. Solution with Data Guard: The Management Perspective


2.1 Benefits for Data Warehouses

Our experience shows that the solution described in this paper brings the following benefits for Data Warehousing:

Benefit: Short latency of data stored in the DWH or ODS (near real-time).
How this is achieved: The refresh process of the Staging Area and/or Operational Data Store (ODS) is very efficient. It consumes a small amount of hardware resources and terminates in a short elapsed time.

Benefit: Shorter time-to-market for new ETL functionality.
How this is achieved: The solution enables the Staging Area to contain the full set of data (not only the changed records). This makes the ETL application more transparent; introducing changes in ETL applications is then less complex.

Benefit: Easy and stable operation.
How this is achieved: While refreshing the tables in the Staging Area, the operational complexity is delegated to standard and reliable Oracle products and features, which are easy to operate.

2.2 Which Types of Data Warehouses Will Benefit Most?

Extraction and sourcing from dedicated online transaction applications which are used to manage complex relationships between customers, suppliers, accounts or delivery components (applications like CRM or SCM2) can be very hard. The underlying database schemas of these applications are based on complex data models3. Companies using dedicated CRM or SCM applications often have to manage the life cycle of several millions of individual subjects (like customers, suppliers, contracts, product components, stock keeping units, policies etc.).

A Staging Area, or even an Operational Data Store (ODS), using the solution described in this paper benefits most if:
- The source system has a huge data volume with complex relationships and a relatively small rate of data changes.
- There are reports with short latency requirements. The Staging Area needs to capture the changes made in the online applications with a very short latency.

2 Supply Chain Management
3 i.e. a lot of relationships between the tables



3. Solution with Oracle Data Guard: The Technical Insight


3.1 How It Works

The solution presented in this article is based on Oracle Data Guard. Data Guard maintains standby databases, which are copies of primary databases. A Data Guard standby database can also be used for refreshing the staging area of a data warehouse.

The main idea is based on Data Guard's ability to open a physical standby database temporarily read-write and to rewind it back to the point in time at which it was opened. This is achieved by using Oracle's guaranteed restore points and the flashback technology.

How can this be used for refreshing a staging area? Let's explain it using Figure 1. On the data warehouse machine (host DWH), a physical standby of the database OLTP is configured. The primary database of OLTP is on host OLTP. This setup leads to the following situation: using the Data Guard functionality, any change done on the primary database is performed on the standby database as well.

[Figure 1 shows the primary database OLTP on host OLTP (service OLTP_SITE1) shipping redo from its online redo logs via Redo Transport to the standby redo logs of the physical standby database OLTP on host DWH (service OLTP_SITE2). Archived redo logs exist on both sites. The tablespace CRM with datafile crm01OLTP.dbf belongs to the standby database; the database DWH with Staging Area and Core DWH resides on the same host.]

Figure 1: On the DWH machine, a physical standby database of OLTP is configured with Data Guard


Reading from the Staging Area:

As soon as an ETL process inside the DWH database needs to read the content of the staging area, the following actions are taken:
- The recovery process on the standby is paused.
- The physical standby database is converted to a snapshot standby database. This opens the standby database read-write.
- Using the transportable tablespaces feature, the tablespace CRM of the snapshot standby database is plugged into the database DWH:
  o The tablespace CRM in the snapshot standby database is set to read-only mode.
  o The metadata (definitions of tables, indexes, etc.) of this tablespace is transferred with Data Pump from the snapshot standby database to the DWH database4.
- The datafile crm01OLTP.dbf is now part of both databases (snapshot standby database OLTP and database DWH). In both databases the tablespace is in read-only mode.
- The ETL process can read the data out of the staging area.
[Figure 2 shows the same configuration as Figure 1, now with the standby converted to a snapshot standby: the tablespace CRM is plugged into the database DWH, and the datafile crm01OLTP.dbf is accessed read-only by both the snapshot standby database and the DWH database, while redo transport from the primary continues.]

Figure 2: On the DWH machine, datafile crm01OLTP.dbf is part of both databases (read-only)

4 For convenient handling of this transfer with Data Pump, a database link can be used.

Refreshing the Content of the Staging Area:

As soon as there is a need to read more current content, that is, to refresh the CRM part of the staging area, the following actions are taken:
- The plugged-in tablespace CRM is dropped from the DWH database.
- The snapshot standby database is converted back to a physical standby database.
- This resumes the recovery process of all its datafiles, including those of the tablespace CRM.
[Figure 3 shows the refresh: the plugged-in tablespace CRM has been dropped from the database DWH, the snapshot standby has been converted back to a physical standby, and recovery of the datafile crm01OLTP.dbf resumes using the queued archived redo logs.]

Figure 3: Tablespace CRM is dropped from the DWH database; the standby database is converted back to a physical standby


3.2 The Key Advantages

This solution has the following key advantages:
- The staging area contains the full set of data, yet there is no additional workload on the host OLTP. Datafiles with the full set of data are neither transferred nor copied:
  o The volume of data transferred between OLTP and DWH is determined merely by the volume of data changes (the size of the archived redo logs).
- The elapsed time of the refresh process of the staging area (this represents the refresh of the standby database) does not include the time to copy the archived redo logs from host OLTP to host DWH:
  o The standby site is able to receive logs from the primary database both in physical standby mode and in snapshot standby mode. In snapshot standby mode, the logs are queued but not applied.
  o Since log transport to the standby site is running all the time, when the recovery process resumes, the outstanding archived redo log files are already registered and available for the recovery5 of the physical standby database.
- The elapsed time of the refresh process does not depend on the tablespace size, but only on the volume of data changes since the last refresh.
- Once configured, both the operation of physical standby databases and the operation of transportable tablespaces are easy to handle and maintain.
- Neither remote queries nor distributed joins are used.
- On the DWH database, the access methods for the data residing in the transported tablespace(s) can be adjusted as follows:
  o estimation of additional statistics, like histograms
  o manipulation of statistics
  o creation of additional data structures, like indexes or materialized views
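The last point can be illustrated with a short sketch. Optimizer statistics live in the data dictionary and a new index segment can be placed in a writable tablespace of the DWH database, so both can be created even though the plugged-in CRM tablespace is read-only. The column ASSET_STATUS and the tablespace DWH_IDX are illustrative assumptions, not actual Siebel or DWH structures:

```sql
-- gather a histogram on a frequently filtered column of the plugged-in table
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'CRM',
    tabname    => 'S_ASSET',
    method_opt => 'FOR COLUMNS ASSET_STATUS SIZE 254',
    degree     => 4);
END;
/

-- an additional index for ETL access paths; its segment goes into a writable
-- tablespace of the DWH database, the CRM tablespace itself stays read-only
CREATE INDEX crm.s_asset_status_ix ON crm.s_asset (asset_status)
  TABLESPACE dwh_idx;
```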

Considering the overhead produced on the source system and the workload produced on the DWH machine, the solution presented in this article is the most efficient one:
- only the redo logs, and no additional structures, are used
- it works on the level of changes to data blocks, not on the level of SQL statements

5 Transported redo logs are applied in physical standby mode only.



In case the refresh of the staging area is the only purpose of the standby database on the DWH machine, the elapsed time of the refresh process can be minimized by narrowing the scope of the recovery process on the standby database to only those tablespaces of the OLTP database which need to be read by the ETL application. Usually, the ETL processes in a DWH require other index types than an OLTP application. If the indexes of an OLTP schema reside in a separate tablespace, excluding it can boost the recovery process. Exclusion of irrelevant tablespaces can easily be achieved by offlining and dropping their datafiles on the standby database6.

The standby database on the DWH machine can also be configured to serve two purposes at the same time: both the refresh of the staging area and disaster protection of the OLTP database. While considering this approach, be aware of the following impacts:
- A standby database with offline datafiles cannot be used for disaster protection.
- If MaxAvailability or MaxProtection is considered, then the availability of, or the workload on, the DWH machine can impact the availability or the performance of the OLTP database.
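Such an exclusion could be sketched as follows; the datafile path is an illustrative assumption:

```sql
-- stop redo apply on the standby first, e.g. via the broker:
--   DGMGRL> edit database 'OLTP_SITE2' set state='APPLY-OFF';

-- take the index datafile offline; recovery skips it from now on
ALTER DATABASE DATAFILE '/u01/oradata/oltp/crm_idx01.dbf' OFFLINE DROP;

-- then resume redo apply:
--   DGMGRL> edit database 'OLTP_SITE2' set state='APPLY-ON';
```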

3.3 Technical Prerequisites

There are some technical prerequisites which have to be fulfilled in order to use the described solution. These prerequisites can be grouped into the following categories:
- identical database character set
- self-contained tablespace sets
- required Oracle database releases
- required Oracle licenses

3.3.1 Identical Database Character Set

In order to use transportable tablespaces, the database OLTP and the database DWH must have an identical database character set and an identical national character set.

3.3.2 Self-Contained Tablespace Sets

In order to be able to transport a set of tablespaces, it needs to be self-contained. This means that you cannot transport a set of tablespaces containing objects with dependent objects, such as materialized views or table partitions, outside the set, unless you transfer all those objects together in one set7.
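Both prerequisites can be verified up front; a minimal sketch using the standard dictionary view and the DBMS_TTS package:

```sql
-- run on both OLTP and DWH: the values must match
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');

-- check whether the tablespace set is self-contained
BEGIN
  DBMS_TTS.TRANSPORT_SET_CHECK('CRM', TRUE);
END;
/
SELECT * FROM transport_set_violations;   -- must return no rows
```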

6 Tablespaces that are needed for opening the database, like SYSTEM, SYSAUX and UNDO, cannot be excluded.
7 Segmentless objects like sequences, views, or PL/SQL packages are not transferred with transportable tablespaces. Normally you don't need to transfer them into the Staging Area anyway.

3.3.3 Required Oracle Database Release

The OLTP database needs to be operated with Oracle Database release 10g or higher. Oracle 11g is recommended, as the snapshot standby database feature is available as of this release. With Oracle 10g, it is necessary to emulate this functionality manually by creating a guaranteed restore point on the standby database before opening it read-write. The following limitations also have to be considered when running with Oracle 10g:
- There is no out-of-the-box handling of this functionality with Data Guard. You will need to develop a piece of code, but this is quite straightforward.
- The redo transport between primary and standby is stopped during the period when the standby is open read-write8.

In order to use transportable tablespaces in this context, the DWH database needs to be at the same or a higher release than the OLTP database.

3.3.4 Required Oracle Licenses
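On 10g, the emulation of a snapshot standby could be sketched as follows; the restore point name and the archive destination number are assumptions:

```sql
-- on the standby: stop recovery and mark the rewind point
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
CREATE RESTORE POINT before_open GUARANTEE FLASHBACK DATABASE;

-- on the primary: defer redo shipping to this standby
ALTER SYSTEM SET log_archive_dest_state_2 = DEFER;

-- on the standby: activate and open read-write
ALTER DATABASE ACTIVATE STANDBY DATABASE;
ALTER DATABASE OPEN;

-- later: rewind and convert back to a physical standby
STARTUP MOUNT FORCE
FLASHBACK DATABASE TO RESTORE POINT before_open;
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
STARTUP MOUNT FORCE
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
DROP RESTORE POINT before_open;

-- on the primary: re-enable redo shipping
ALTER SYSTEM SET log_archive_dest_state_2 = ENABLE;
```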

This solution requires an Oracle Enterprise Edition license both for the OLTP host and for the DWH host. All required features, like Data Guard, transportable tablespaces and snapshot standby database, are included in the Enterprise Edition license. No additional option is required for this solution: neither the Active Data Guard9 option nor the Partitioning option.

8 As mentioned before, with a snapshot standby database as of 11g, the log transport stays active all the time.
9 Active Data Guard is an extra licensable option as of 11g which includes real-time query and fast incremental backup. Neither of these features is required by the described solution.

4. A Tour of a Real-Life Example


To demonstrate our approach on a representative sample, we use an excerpt from the database schema of the CRM application Siebel. We chose Siebel to improve the readability of this example: Siebel is a widely used CRM application owned by Oracle, and hence there is a higher chance that ETL developers are familiar with the data model behind it. It is important to understand that the described solution works with any other system or application, even a non-standard, in-house developed application10. We took the Siebel tables S_CONTACT, S_ORG_EXT and S_ASSET as representatives of a set of approximately 15 Siebel tables having complex relationships and high cardinality.

4.1 Real-Life Example

Consider the following common Data Warehouse situation: transformation processes have to read the content of the Siebel tables and transform it into a new entity; let's call it Customer Subscription (refer to Figure 3).

Figure 3: The transformation process reads the Siebel tables and transforms the data into the new entity Customer Subscription

The Data Warehouse has to store not only the latest status of the Customer Subscriptions, but also all historical values. The ETL has to compare the new snapshot of Customer Subscriptions with the latest one and, in case of changes, create new versions which keep track of the fragmented history. This concept is known as versioning; refer to Figure 4.

10 As long as the data is stored in an Oracle RDBMS.



[Figure 4 shows the flow inside the database DWH: the step "Derive Customer Subscription" joins S_CONTACT (0.5 mio rows), S_ORG_EXT (0.5 mio rows) and S_ASSET (2.5 mio rows) from the tablespace CRM in the Staging Area into the latest snapshot; the step "Historize Subscription" computes the delta (new, updated and deleted rows) against the highest version from the history and creates or closes versions in C_CUST_SUBSCRIPTION in the Core DWH.]

Figure 4: The ETL compares the new snapshot of Customer Subscriptions with the latest one and, in case of changes, creates new versions which keep track of the fragmented history

Consider the following design decisions of a Data Warehouse architect:
- Due to the many inner joins and filters inside the query, the Staging Area needs to hold the full set of data.
- Transferring millions of rows from the source system to the Staging Area every night is not an option.
- In the source system, no reliable row markers or journals exist or can be introduced.
- The architect therefore decided to use the solution described in this white paper.

Because of the high cardinality of the data set (several millions of rows), good scalability of the underlying database11 is assumed. In the next sections we present the most important steps to build and operate this solution.

4.2 Setup and Configuration

Oracle Data Guard has been set up as described in chapter 3. On both the OLTP database and the DWH database we used Oracle Database 11.2.0.2.0. We created a Data Guard Broker configuration, left the protection mode at Maximum Performance (default) and set the log transport to asynchronous.
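The broker configuration could be created roughly like this; the configuration name DG_OLTP and the connect identifiers are assumptions based on the service names shown in the figures:

```
DGMGRL> create configuration 'DG_OLTP' as
          primary database is 'OLTP_SITE1'
          connect identifier is OLTP_SITE1;
DGMGRL> add database 'OLTP_SITE2' as
          connect identifier is OLTP_SITE2
          maintained as physical;
DGMGRL> edit database 'OLTP_SITE2' set property LogXptMode='ASYNC';
DGMGRL> enable configuration;
```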

11 Including the physical data model of the Core DWH.



To enforce logging on database OLTP, we issued the following statement:


ALTER DATABASE FORCE LOGGING;

This causes every12 attempt at an unrecoverable NOLOGGING operation to be logged anyway.

4.2.1 Create a Role with a Common Name in Both Databases

In both the database DWH and the database OLTP, the role dwh_sa_crm_role was created:
CREATE ROLE dwh_sa_crm_role;

Grant the SELECT privilege on Siebel tables to this role in the OLTP database.
GRANT SELECT ON s_contact TO dwh_sa_crm_role;
GRANT SELECT ON s_org_ext TO dwh_sa_crm_role;
GRANT SELECT ON s_asset TO dwh_sa_crm_role;

You will also need to create the owner of the transported tables on the DWH database:
CREATE USER crm IDENTIFIED BY thisIsASecretPassword;

Neither a CREATE SESSION nor a CREATE TABLE privilege is necessary for this user.

4.3 Operation

Let's take a look at the operation of this solution. From the point of view of the CRM data in the staging area, there are two main operational states:
- a snapshot of the latest CRM data is available in the staging area (status A)
- a refresh of the CRM data in the staging area is in progress (status B)

[Figure 5 shows the timeline: on the OLTP side, CRM users are changing the operative data 24/7; on the DWH side, status A (snapshot of the latest CRM data in the Staging Area is available for read) alternates with status B (refresh of the CRM data in the Staging Area).]

Figure 5: Two main operational states for the CRM data in the Staging Area

Most of the time, the CRM data in the staging area is available for read (status A). From time to time you need to refresh the data in the staging area; during this period, the data is not available (status B).
12 As for any other change of database instance parameters: an impact analysis is required before making this change.

Transitions between these two operational states are usually triggered by one of the following two events:
- ETL processes need more current CRM data (A to B):
  o This event triggers the start of the refresh process.
  o The goal of the refresh process is to reach a given (defined) point in time for the snapshot.
- ETL processes need to read the CRM data again (B to A):
  o As soon as the CRM data in the physical standby is current enough, this event triggers the immediate termination of the refresh process and causes the transition to status A.

In the next paragraphs we describe:
- the actions related to the termination of the refresh process, and
- the actions related to the start of the refresh process.

4.3.1 Termination of the Refresh Process

As long as the refresh process is in progress, the CRM data in the staging area is not available. The datafiles of the tablespace CRM13 on the host DWH are then exclusively assigned to the physical standby database of OLTP, called OLTP_SITE2, for recovery.

In order to terminate the refresh process, the following sequence of actions is taken. First, the physical standby is converted to a snapshot standby database:
DGMGRL> connect sys@OLTP_SITE2
Password:
Connected.
DGMGRL> convert database 'OLTP_SITE2' to snapshot standby
Converting database "OLTP_SITE2" to a Snapshot Standby database, please wait...
Database "OLTP_SITE2" converted successfully

Second, the tablespace is set to read-only and plugged into the DWH database with Data Pump via a database link from DWH to the snapshot standby database:
SQL> alter tablespace crm read only;

# impdp system@DWH logfile=imp_crm.log network_link=OLTP_SNAP
    transport_tablespaces=CRM
    transport_datafiles=d:\oradata\oltp\crm01oltp.dbf14
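Alternatively, the metadata could be moved via a dump file instead of the database link; a sketch, assuming a directory object DATA_PUMP_DIR that both databases on host DWH can access and the connect identifier OLTP_SNAP for the snapshot standby:

```
# expdp system@OLTP_SNAP dumpfile=tts_crm.dmp directory=DATA_PUMP_DIR
    transport_tablespaces=CRM

# impdp system@DWH dumpfile=tts_crm.dmp directory=DATA_PUMP_DIR
    transport_datafiles=d:\oradata\oltp\crm01oltp.dbf
```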

13 Of course, this concept can be extended to transfer multiple tablespaces.
14 If running 11.2.0.2, due to Oracle bug 10185688 it is required that either XDB is loaded into the source database or the related patch is applied.

In order to transport the metadata, you may also use alternatives such as:
- exporting the metadata with Data Pump to a dump file and importing from that dump file instead of using a database link
- export/import with the classic exp/imp15
- initiating Data Pump directly from PL/SQL; see [2] for details

When transferring the metadata, you can decide whether to include or exclude certain tables. You can also choose whether to import indexes, object privileges, table triggers and table constraints.

As the last step, a deterministic function is created in the DWH database. The return value of this function reflects the timestamp of the CRM data. We used the following PL/SQL code:
declare
  sql_text    varchar2(1000);
  v_timestamp varchar2(20);
begin
  select to_char(timestamp, 'DD.MM.YYYY HH24:MI:SS')
    into v_timestamp
    from (select timestamp
            from gv$recovery_progress@OLTP
           where item = 'Last Applied Redo'
           order by start_time desc)
   where rownum < 2;
  dbms_output.put_line('timestamp is ' || v_timestamp);
  sql_text := 'create or replace function crm.SA_CRM_SNAPSHOT_TIMESTAMP
                 return date deterministic is
                   ts date;
                 begin
                   select to_date (''' || v_timestamp || ''', ''DD.MM.YYYY HH24:MI:SS'')
                     into ts from dual;
                   return ts;
                 end;';
  execute immediate sql_text;
  execute immediate 'GRANT EXECUTE ON crm.SA_CRM_SNAPSHOT_TIMESTAMP TO dwh_sa_crm_role';
end;
/

Listing 1: In the DWH database, this creates a function which returns the timestamp for the data in the CRM tablespace

This function is used by the ETL processes during the versioning operation (Figure 4): it builds the value for the VALID_FROM attribute of new versions and for the VALID_TO attribute of versions to be closed.
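A sketch of how the ETL could apply the function during versioning; the columns of C_CUST_SUBSCRIPTION and the delta set STG_DELTA are illustrative assumptions, not actual Siebel or Core DWH structures:

```sql
-- close the current versions of changed or deleted subscriptions
UPDATE c_cust_subscription t
   SET t.valid_to = crm.sa_crm_snapshot_timestamp
 WHERE t.valid_to IS NULL
   AND t.cust_id IN (SELECT d.cust_id FROM stg_delta d);

-- open a new version for every new or changed subscription
INSERT INTO c_cust_subscription (cust_id, subscr_status, valid_from, valid_to)
SELECT d.cust_id, d.subscr_status, crm.sa_crm_snapshot_timestamp, NULL
  FROM stg_delta d
 WHERE d.change_type IN ('NEW', 'UPDATED');
```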

15 Deprecated with 11g, but it worked in our case.



4.3.2 Start of the Refresh Process

In order to start the refresh process, the following sequence of actions has to be taken. First, the tablespace CRM has to be dropped from the DWH database. After this, any query on the data will fail, as the data is no longer available; dependent views, synonyms, stored PL/SQL procedures etc. become invalid16. Then the standby database is converted back to a physical standby:
SQL> drop tablespace crm including contents;

DGMGRL> convert database 'OLTP_SITE2' to physical standby

If you need to check whether the physical standby database with the CRM tablespace in OLTP_SITE2 is again current enough to be used for the next integration load cycle, you can easily query the Data Guard Broker:
DGMGRL> show database 'OLTP_SITE2';

Database - OLTP_SITE2
  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       11 minutes 23 seconds
  Real Time Query: OFF
  Instance(s):
    oltp

Database Status:
SUCCESS
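The same lag information can also be read directly from the standby instance; a small sketch:

```sql
SELECT name, value
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');
```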

As the refresh is a parallel media recovery, the process is very efficient. Media recovery works on changed blocks and is much faster and less resource-consuming than the mechanisms of GoldenGate and Streams, where SQL is extracted and processed row by row. The presented real-life example clearly demonstrates the high efficiency and the easy, stable operation of this solution.

5. Solution Extension: If Data Availability for Operational Reporting Matters


There is yet another challenge for today's DWH architects: where to place operational reporting?
- The OLTP database is becoming a less and less suitable place, due to the heavy workload caused by the complex query logic inside the operational reports.
- Many operational reports query not only the data residing in the OLTP system, but also additional analytical attributes which are typically stored in a Core DWH.

With the solution presented in this white paper, the DWH architect can consider using the data residing in the Staging Area for operational reporting17 as well. As this data resides in the data warehouse database, it can be joined with the analytical attributes in the Core DWH without performance impact (no distributed queries).

However, one point has to be taken into consideration: during the refresh, the data in the Staging Area is not available (refer to Figure 5). This unavailability needs to be eliminated. The snapshot functionality of the operating system and/or the storage facility can be used to overcome this.

The concept: after the standby database is converted to a snapshot standby and the tablespaces are set to read-only, snapshots of the datafiles are created. These snapshots, instead of the standby database's datafiles, are then plugged into the DWH database. This results in two advantages:
- The data in the Staging Area is available almost all18 the time.
- The recovery of the tablespaces can go on, as the standby database can be converted from snapshot standby back to physical standby right after the snapshot of the datafiles has been taken. This achieves an even shorter latency for the refresh cycles.

Note: the snapshots do not copy the data; the data is presented a second time. Later changes are tracked for both sets of data, origin and snapshot. This is known as the copy-on-write (COW) mechanism.

Examples of OS-side snapshotting: ZFS on Solaris offers copy-on-write snapshots. The same is possible with the Veritas file system, LVM snapshots on Linux and Microsoft Volume Shadow Copy on Windows. SAN and NAS systems also offer snapshotting features that work with the COW mechanism.

With the know-how of Trivadis, we believe it is possible to reduce the operating costs and the complexity of your data warehouse: proper design is what matters!

16 They automatically become valid again when they are used after the tablespace reappears in the next cycle.
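On Solaris with ZFS, the snapshot step could look roughly like this; the dataset and snapshot names are illustrative assumptions, and the commands require administrative privileges on the DWH host:

```shell
# point-in-time COW snapshot of the dataset holding the standby's datafiles
zfs snapshot dwh/oradata@sa_refresh

# writable clone whose datafile copies are plugged into the DWH database
zfs clone dwh/oradata@sa_refresh dwh/oradata_sa

# before the next refresh cycle: drop the plugged-in tablespace first,
# then remove the clone and the snapshot
zfs destroy dwh/oradata_sa
zfs destroy dwh/oradata@sa_refresh
```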

Contact

Karol Hajdu     karol.hajdu@trivadis.com
Mathias Zarick  mathias.zarick@trivadis.com

Trivadis Delphi GmbH
Millennium Tower, Handelskai 94-96
A-1200 Vienna
Tel.: +43 1 332 35 31 00
www.trivadis.com

Please contact us if you need more information or help with your setup.

17 At least for that part of the reporting for which the integrity level of the data in the Staging Area is sufficient.
18 A short downtime will still occur during the tablespace drop and re-plug-in.

Literature and Links

[1] Oracle Data Guard Concepts and Administration, http://download.oracle.com/docs/cd/E11882_01/server.112/e17022/toc.htm
[2] Oracle Database PL/SQL Packages and Types Reference, Chapter 46, http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_datpmp.htm
[3] Jordan et al.: Data Warehousing mit Oracle Business Intelligence in der Praxis, Chapter 3.4, Hanser, 2011.
