Hu Yan huy@cs.tut.fi
Outline
What is data warehousing
The benefit of data warehousing
Differences between OLTP and data warehousing
The architecture of data warehouse
The main components
Data flows
Tools and technologies
Integration
The importance of managing meta-data
Data marts
enterprise.. rather than the major application areas.. This is reflected in the need to store decision-support data rather than application-oriented data.

Integrated: the source data come together from different enterprise-wide application systems. The source data is often inconsistent using.. The integrated data source must be made consistent to present a unified view of the data to the users.

Time-variant: the source data in the warehouse is only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.

Non-volatile: data is not updated in real time but is refreshed from the operational systems on a regular basis. New data is always added as a supplement to the database, rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.
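The time-variant and non-volatile properties can be illustrated with a small in-memory store. This is a minimal sketch, not a real warehouse: the names `Snapshot`, `AppendOnlyWarehouse`, `refresh`, and `history` are hypothetical. Each periodic refresh appends rows stamped with a load date, and nothing already stored is ever overwritten, so the history of a value remains queryable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One warehouse row: a value that was valid as of a given load date (hypothetical layout)."""
    as_of: date
    customer_id: str
    balance: float


class AppendOnlyWarehouse:
    """Hypothetical non-volatile store: rows are appended, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[Snapshot] = []

    def refresh(self, as_of: date, source_rows: dict[str, float]) -> None:
        # A periodic refresh from an operational system adds a new snapshot
        # alongside the earlier ones instead of replacing them.
        for customer_id, balance in source_rows.items():
            self._rows.append(Snapshot(as_of, customer_id, balance))

    def history(self, customer_id: str) -> list[tuple[date, float]]:
        # Time-variant view: every value carries the time at which it was valid.
        return [(r.as_of, r.balance) for r in self._rows if r.customer_id == customer_id]


wh = AppendOnlyWarehouse()
wh.refresh(date(2024, 1, 31), {"c1": 100.0, "c2": 50.0})
wh.refresh(date(2024, 2, 29), {"c1": 120.0})
print(wh.history("c1"))  # [(datetime.date(2024, 1, 31), 100.0), (datetime.date(2024, 2, 29), 120.0)]
```

Keeping the older snapshot for `c1` instead of overwriting it is what distinguishes this from an operational (OLTP) table, which would normally hold only the current balance.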
Problems
Underestimation of resources for data loading
Hidden problems with source systems
Required data not captured
Increased end-user demands
Data homogenization
High demand for resources
Data ownership
High maintenance
Long-duration projects
Complexity of integration
The architecture
[Figure: typical architecture of a data warehouse, showing operational data sources, the DBMS, the warehouse manager, and detailed data]
detailed, lightly and highly summarized data
archive/backup data
meta-data
end-user access tools
End-user access tools can be categorized into five main groups:
data reporting and query tools
application development tools
executive information system (EIS) tools
online analytical processing (OLAP) tools
data mining tools
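As a rough illustration of the multidimensional analysis that OLAP tools support, the sketch below rolls sales facts up from city level to region level. The fact rows and the `roll_up` helper are hypothetical; real OLAP tools run such aggregations over cubes or summary tables rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, city, quarter, sales amount)
facts = [
    ("North", "Tampere", "Q1", 120.0),
    ("North", "Oulu", "Q1", 80.0),
    ("North", "Tampere", "Q2", 150.0),
    ("South", "Helsinki", "Q1", 300.0),
]


def roll_up(rows, dims):
    """Sum the sales amount grouped by the given dimension positions
    (0 = region, 1 = city, 2 = quarter)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[3]
    return dict(totals)


# City-level detail rolled up to region level
print(roll_up(facts, dims=[0]))  # {('North',): 350.0, ('South',): 300.0}
# Region by quarter
print(roll_up(facts, dims=[0, 2]))
```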
Data flows
Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
Outflow: the processes associated with making the data available to the end-users.
Meta-flow: the processes associated with the management of the meta-data.
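Inflow and upflow can be sketched as two functions: one that extracts rows from several sources, cleanses them into a single consistent schema, and loads them, and one that summarizes the loaded detail. This is a minimal sketch under stated assumptions: the two source layouts, the `COUNTRY_CODES` cleansing rule, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows extracted from two source systems that describe the
# same attributes with different field names and formats.
source_a = [{"cust": "c1", "country": "FI", "amount": "100.50"}]
source_b = [{"customer_id": "c2", "country": "finland", "amount": 75.0}]

COUNTRY_CODES = {"fi": "FI", "finland": "FI"}  # assumed cleansing rule


def cleanse(row: dict) -> dict:
    """Map one raw source row onto a single consistent warehouse schema."""
    return {
        "customer_id": row.get("cust") or row["customer_id"],
        "country": COUNTRY_CODES[row["country"].lower()],
        "amount": float(row["amount"]),
    }


def inflow(*sources) -> list:
    """Inflow: extract from every source, cleanse, and load into the warehouse."""
    return [cleanse(row) for source in sources for row in source]


def upflow(detailed: list) -> dict:
    """Upflow: add value by summarizing the detailed data (total amount per country)."""
    summary = defaultdict(float)
    for row in detailed:
        summary[row["country"]] += row["amount"]
    return dict(summary)


warehouse = inflow(source_a, source_b)
print(upflow(warehouse))  # {'FI': 175.5}
```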
[Figure: data flows in the warehouse (upflow, downflow) between the DBMS, the warehouse manager, archive/backup data, and end-user access tools such as data mining tools]
The major integration issue is how to synchronize the various types of meta-data used throughout the data warehouse. The challenge is to synchronize meta-data between different products from different vendors that use different meta-data stores. There are two major standards efforts for meta-data and modeling in the areas of data warehousing and component-based development, led by the MDC (Meta Data Coalition) and the OMG (Object Management Group).
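The synchronization problem can be pictured as translating each product's meta-data into one shared model and then comparing. The sketch below is hypothetical throughout: the two tool exports, the `TO_COMMON` mapping, and the helper functions are invented for illustration and do not reproduce the MDC or OMG standards.

```python
# Hypothetical meta-data exported by two vendor tools for the same warehouse column.
etl_tool_meta = {"col_name": "cust_id", "data_type": "VARCHAR(10)", "src": "crm.customers"}
olap_tool_meta = {"attribute": "cust_id", "type": "VARCHAR(12)"}

# Assumed mapping from each tool's property names onto one shared model.
TO_COMMON = {
    "etl": {"col_name": "name", "data_type": "type", "src": "source"},
    "olap": {"attribute": "name", "type": "type"},
}


def to_common(tool: str, meta: dict) -> dict:
    """Translate one tool's meta-data record into the shared model."""
    return {TO_COMMON[tool][key]: value for key, value in meta.items()}


def find_conflicts(a: dict, b: dict) -> dict:
    """Report properties that both stores define but on which they disagree."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


etl = to_common("etl", etl_tool_meta)
olap = to_common("olap", olap_tool_meta)
print(find_conflicts(etl, olap))  # {'type': ('VARCHAR(10)', 'VARCHAR(12)')}
```

A shared model means each product needs only one translation into it, rather than a separate translation for every other product it exchanges meta-data with.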
auditing data warehouse usage to provide user chargeback information
replicating, subsetting, and distributing data
maintaining efficient data storage management
purging data
archiving and backing-up data
implementing recovery following failure
security management
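Purging and archiving can be sketched as a retention rule applied to load dates. This is a minimal sketch that assumes a hypothetical row layout of `(load_date, payload)` and a hypothetical `archive_and_purge` helper. Rows older than the retention window are moved to an archive list and removed from the online data.

```python
from datetime import date, timedelta


def archive_and_purge(rows, today, retention_days):
    """Split rows into (online, archive) by load date.

    rows: list of (load_date, payload) tuples (hypothetical layout).
    Rows loaded before the retention cutoff go to the archive and are
    purged from the online set.
    """
    cutoff = today - timedelta(days=retention_days)
    archive = [r for r in rows if r[0] < cutoff]
    online = [r for r in rows if r[0] >= cutoff]
    return online, archive


rows = [(date(2020, 1, 1), "old"), (date(2024, 6, 1), "recent")]
online, archive = archive_and_purge(rows, today=date(2024, 12, 31), retention_days=365 * 3)
print(online)   # [(datetime.date(2024, 6, 1), 'recent')]
print(archive)  # [(datetime.date(2020, 1, 1), 'old')]
```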
Data mart
Data mart: a subset of a data warehouse that supports the requirements of a particular department or business function. The characteristics that differentiate data marts and data warehouses include:
a data mart focuses on only the requirements of users associated with one department or business function
data marts do not normally contain detailed operational data, unlike data warehouses
as data marts contain less data compared with data warehouses, data marts are more easily understood and navigated
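The characteristics above can be illustrated by deriving a mart from warehouse rows: keep only one department's data and store summaries instead of detailed rows. The row layout and the `build_data_mart` helper are hypothetical; this is a sketch of the idea, not a prescribed way to build a data mart.

```python
from collections import defaultdict

# Hypothetical detailed warehouse rows: (department, product, month, amount)
warehouse = [
    ("sales", "widget", "2024-01", 10.0),
    ("sales", "widget", "2024-01", 5.0),
    ("sales", "gadget", "2024-02", 7.0),
    ("finance", "audit", "2024-01", 99.0),
]


def build_data_mart(rows, department):
    """Derive a department-specific mart holding only monthly totals per product."""
    mart = defaultdict(float)
    for dept, product, month, amount in rows:
        if dept == department:                # focus on one department's requirements
            mart[(product, month)] += amount  # summarized, not detailed rows
    return dict(mart)


print(build_data_mart(warehouse, "sales"))
# {('widget', '2024-01'): 15.0, ('gadget', '2024-02'): 7.0}
```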
[Figure: three-tier architecture of a data warehouse with data marts. First tier: operational data sources and the operational data store (ODS), with the load manager. Second tier: the DBMS holding detailed, lightly and highly summarized data and archive/backup data, with the warehouse manager and query manager. Third tier: data marts holding summarized data (relational database), and end-user access tools (reporting, query, application development, and EIS (executive information system) tools, OLAP (online analytical processing) tools, and data mining tools)]
The cost of implementing data marts is normally less than that required to establish a data warehouse.
The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project.