Data Warehouse
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. The term was coined by Bill Inmon in 1990.

Characteristics
-- Subject-oriented
-- Integrated: a data warehouse is constructed by integrating data from multiple heterogeneous sources. Data preprocessing is applied to ensure consistency.
-- Time-variant: provides information from a historical perspective, e.g. the past 5-10 years.
-- Non-volatile: data once recorded cannot be updated. A data warehouse requires only two operations in data accessing: initial loading of data and access of data.
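The time-variant and non-volatile characteristics can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `Fact` record, the `WarehouseTable` class, and the sample values are hypothetical and do not come from any particular warehouse product. The table exposes only the two operations named above, loading and access, and every row carries the date it describes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable


@dataclass(frozen=True)  # frozen: a recorded fact cannot be modified in place
class Fact:
    snapshot: date   # time-variant: every row carries the date it describes
    customer: str
    balance: float


class WarehouseTable:
    """Hypothetical append-only table with two operations: load and access."""

    def __init__(self) -> None:
        self._rows: list[Fact] = []

    def load(self, rows: Iterable[Fact]) -> None:
        # Loading appends new snapshots; existing rows are never overwritten.
        self._rows.extend(rows)

    def access(self, predicate: Callable[[Fact], bool]) -> list[Fact]:
        # Read-only access: return the rows that match the predicate.
        return [r for r in self._rows if predicate(r)]


table = WarehouseTable()
table.load([Fact(date(2023, 1, 31), "alice", 100.0)])
table.load([Fact(date(2024, 1, 31), "alice", 250.0)])  # new snapshot, old one kept

history = table.access(lambda r: r.customer == "alice")
print([(f.snapshot.year, f.balance) for f in history])
```

Because a changed balance arrives as a new snapshot rather than an update, the earlier value stays available for historical analysis.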
-- Operational database layer: the source data for the data warehouse.
-- Data access layer: the interface between the operational and informational access layers. Tools to extract, transform, and load data into the warehouse fall into this layer.
-- Metadata layer: the data directory. This is usually more detailed than an operational system data directory.
-- Informational access layer: the data accessed for reporting and analyzing, and the tools for reporting and analyzing data.
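The flow through these layers can be sketched with Python's standard `sqlite3` module. This is a minimal sketch, not a production ETL pipeline: the two source record formats, the `sales` table, and the `metadata` dictionary are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical operational sources with inconsistent formats (assumed data).
crm_rows = [{"cust": "Alice", "amount_usd": "120.50"}]
billing_rows = [{"customer_name": "BOB", "cents": 9900}]


def extract():
    # Operational database layer: yield raw records tagged with their source.
    for row in crm_rows:
        yield "crm", row
    for row in billing_rows:
        yield "billing", row


def transform(source, row):
    # ETL step: bring both sources to one consistent shape (name, dollars).
    if source == "crm":
        return row["cust"].lower(), float(row["amount_usd"])
    return row["customer_name"].lower(), row["cents"] / 100


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL, source TEXT)")

# Metadata layer: a minimal data directory describing the warehouse table.
metadata = {"sales": {"customer": "lower-cased name", "amount": "US dollars"}}

# Load step: write the transformed records into the warehouse.
for source, row in extract():
    customer, amount = transform(source, row)
    warehouse.execute("INSERT INTO sales VALUES (?, ?, ?)", (customer, amount, source))

# Informational access layer: reporting queries run against the loaded data.
report = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(report)
```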
ETL Tools
Operational data | Data warehouse
Application oriented | Subject oriented
Detailed | Summarized, otherwise refined
Accurate as of the moment of access | Represents values over time, snapshots
Serves the clerical community | Serves the managerial community
Can be updated | Is not updated
Run repetitively and nonreflectively | Run heuristically
Requirements understood before initial development | Requirements not completely understood before development
Compatible with the software development life cycle | Completely different life cycle
Performance sensitive (immediate response required when entering a transaction) | Performance relaxed (immediacy not required)
Accessed a unit at a time (limited number of data elements for a single record) | Accessed a set at a time (many records of many data elements)
Transaction driven | Analysis driven
Control of update a major concern in terms of ownership | Control of update no issue
High availability | Relaxed availability
Managed in its entirety | Managed by subsets
Nonredundancy | Redundancy is a fact of life
Static structure; variable contents | Flexible structure
Small amount of data used in a process | Large amount of data used in a process
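The contrast between unit-at-a-time and set-at-a-time access can be shown with a short sketch. For brevity both styles run against the same in-memory SQLite table, which is a simplification; the `orders` table and its values are assumed for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
db.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0)],
)

# Operational style: transaction driven, touches one record and updates it.
db.execute("UPDATE orders SET amount = ? WHERE id = ?", (12.0, 1))
single = db.execute("SELECT amount FROM orders WHERE id = 1").fetchone()

# Warehouse style: analysis driven, reads many records at once and summarizes them.
summary = db.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(single)
print(summary)
```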
OLAP
Enables users to analyze different dimensions of multidimensional data. For example, it provides time-series and trend-analysis views.
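Analyzing data along different dimensions can be sketched in plain Python. The fact rows, the dimension names, and the `roll_up` helper below are hypothetical and chosen only to illustrate the idea; real OLAP tools offer far richer operations.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales). Values are made up.
facts = [
    (2022, "east", "tea", 10),
    (2022, "west", "tea", 20),
    (2023, "east", "tea", 15),
    (2023, "east", "coffee", 5),
]
DIMENSIONS = ("year", "region", "product")


def roll_up(rows, keep):
    """Sum the sales measure, grouping by the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


by_year = roll_up(facts, ["year"])  # a time-series view
by_region_product = roll_up(facts, ["region", "product"])
print(by_year)
print(by_region_product)
```

The same facts answer different questions depending on which dimensions are kept, which is the core of a multidimensional view.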
Types of OLAP
-- Multidimensional OLAP: stores data in a highly optimized multidimensional array.
-- Relational OLAP: stores the information as rows and columns in a particular sequence, serialized by address. Base tables are created in the underlying database, and new tables created by users are aggregated.
-- Hybrid OLAP: aggregates the properties of both to connect relational and multidimensional data in a meaningful way.
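The storage difference between the three OLAP types can be sketched as follows. This is a deliberately simplified illustration: the dimension members, the sample rows, and the helper functions are assumptions, and the hybrid variant is reduced to "totals from the array, detail from the rows".

```python
# Dimension members, assumed for illustration.
years = [2022, 2023]
regions = ["east", "west"]

# Relational style: facts kept as rows and columns, aggregated at query time.
rows = [(2022, "east", 10), (2022, "west", 20), (2023, "east", 15)]


def rolap_total(year):
    return sum(sales for y, _, sales in rows if y == year)


# Multidimensional style: the same facts in a dense array addressed by position.
cube = [[0] * len(regions) for _ in years]
for y, r, sales in rows:
    cube[years.index(y)][regions.index(r)] += sales


def molap_total(year):
    return sum(cube[years.index(year)])


# Hybrid style (simplified): totals come from the array,
# while detail rows remain available from the relational store.
def hybrid(year):
    return molap_total(year), [r for r in rows if r[0] == year]


print(rolap_total(2022), molap_total(2022))
print(hybrid(2023))
```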
Data Mining
A process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions.
Steps in data mining:
-- Describe the data (summarize its statistical attributes).
-- Build a predictive model based on patterns determined from known results.
-- Test that model on results outside the original sample.
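These steps can be walked through with a toy example. Everything here is assumed for illustration: the loan records are made up, and the "model" is a single threshold on one attribute, far simpler than real data mining techniques.

```python
from statistics import mean

# Hypothetical labelled records: (debt_to_income_ratio, defaulted). Made-up values.
known = [(0.1, False), (0.2, False), (0.3, False), (0.6, True), (0.7, True), (0.8, True)]
holdout = [(0.15, False), (0.75, True), (0.5, True), (0.4, False)]

# Describe the data: summarize a statistical attribute per outcome.
mean_default = mean(x for x, d in known if d)
mean_repaid = mean(x for x, d in known if not d)

# Build a predictive model from the pattern in the known results.
# Here the model is one threshold halfway between the two group means.
threshold = (mean_default + mean_repaid) / 2


def predict(ratio):
    return ratio > threshold


# Test the model on results outside the original sample.
correct = sum(predict(x) == d for x, d in holdout)
accuracy = correct / len(holdout)
print(round(threshold, 2), accuracy)
```

Keeping the holdout records separate from the known results is what makes the accuracy figure meaningful as a test of the model.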
Example
An analyst might want to determine the factors that lead to loan defaults.
-- Query and report tools describe what is in a database.
-- OLAP goes further; it is used to answer why certain things are true.
-- Data mining is different from OLAP: rather than verifying hypothetical patterns, it uses the data itself to uncover such patterns.
Business Intelligence
A broad category of technologies for gathering, storing, accessing, and analyzing data to help business users make better decisions by analyzing business performance through data-driven insight.
[Diagram labels: Track, Act, Analyze, Decide, Model]
BI Architecture
-- Information Architecture
-- Data Architecture
-- Technical Architecture
-- Product Architecture