
Data Warehouse Fundamentals

Chapter 1

Introduction to Data Warehouse

Paul K Chen

Introduction to Data Warehouse
Portions of the materials at this website on the subject of
Data Warehouse Fundamentals are drawn from the
textbooks below:

Data Warehouse Fundamentals


Author: Paulraj Ponniah
Publisher: John Wiley & Sons, Inc. 2001

Database Systems
Authors: Thomas Connolly and Carolyn Begg
Publisher: Addison Wesley Longman, Inc., Second Edition
Road Map for Learning By Subject
Chapter 1: DW Overview
Chapter 2: DW Architecture/Components/Building Blocks
Chapters 3, 4: DW Project Planning and Management Trends
Chapter 5: Analyzing DW Business Requirements
Chapters 6, 7: Relational & Dimensional Modeling - DW DB Design
Chapters 8, 9, 10: DW Information Delivery/Data Retrieval by OLAP and Data Mining via Web
Chapter 11: Physical Design Process and Data Quality
Chapter 1 - Objectives
 Understand the differences between data and
information and the information crisis
 Recognize the information crisis at every enterprise
 Understand the various ways of organizing and
managing information for decision-making use
 Review the history of decision support systems
 Learn briefly what a data warehouse is and see why data
warehousing is the viable solution
Data and Information
 We’re told we live in the “information age”.
 People often talk about data and information as if
they were the same. They are, in many regards,
opposites.
 A datum is just a fact—your name is a fact, your phone
number is a fact.
 Information is data that is presented in a meaningful,
understandable and beneficial format. Information is
data that has been organized, sequenced, correlated
and summarized, such as a phone book.
Data and Information
 A phone book is information. It not only contains
names and phone numbers, but it correctly associates
each person’s phone number with their name. It
presents this list of correlated names and phone
numbers in alphabetical sequence, so that we can find the
phone number from the name. In addition, it divides
the phone numbers into two types: personal and
business.
 It is the function of the computer to convert data to
information.
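
The phone book example can be sketched in a few lines of Python. This is only an illustration: the names and numbers in `raw_facts` and the helper functions `build_phone_book` and `look_up` are hypothetical and do not come from the textbooks. It shows the three steps the slide names: associating each name with its number, sorting alphabetically, and dividing entries into personal and business.

```python
# Raw data: isolated facts with no ordering or grouping (hypothetical entries).
raw_facts = [
    ("Lee, Ann", "555-0142", "personal"),
    ("Acme Hardware", "555-0199", "business"),
    ("Baker, Tom", "555-0117", "personal"),
    ("Zenith Travel", "555-0103", "business"),
]


def build_phone_book(facts):
    """Associate each name with its number, split by type, and sort by name."""
    book = {"personal": [], "business": []}
    for name, number, kind in facts:
        book[kind].append((name, number))  # correlate name with number
    for kind in book:
        book[kind].sort()  # alphabetical sequence by name
    return book


def look_up(book, kind, name):
    """Find a phone number from a name, as a reader of a phone book would."""
    return dict(book[kind]).get(name)


phone_book = build_phone_book(raw_facts)
print(phone_book["personal"])
print(look_up(phone_book, "business", "Zenith Travel"))
```

The raw tuples are data; the sorted, grouped structure that supports a lookup by name is information in the sense used above.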
Definitions
 Database: The database is a place where you put your
data; data that you wish to convert to information at
some future time.

 Database Management System: A DBMS is the
software that converts the data in your database to
information. It is the DBMS that provides you the
capability for cross-referencing, correlating, sorting,
summarizing, etc.
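
As a rough illustration of cross-referencing, sorting and summarizing, the sketch below uses Python's built-in `sqlite3` module as a stand-in DBMS. The `customer` and `orders` tables and their contents are assumptions made up for this example.

```python
import sqlite3

# An in-memory database holding two hypothetical tables of raw data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Lee'), (2, 'Baker');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 2, 15.0), (3, 1, 10.0);
""")

# Cross-reference orders with customers (JOIN), summarize per customer
# (COUNT, SUM with GROUP BY), and sort the result by name (ORDER BY).
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)
conn.close()
```

The stored rows are data; the joined, grouped and ordered result is the information the DBMS produces from them.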
Information as a Competitive
Weapon
Information technology and quality information are not
goals in themselves; they merely support organizations in
reaching the goals of:
 Superior products and services
 Greater productivity
 Eventual success
Data, Information, and Decision
 Data  Data Resource Management
(DRM)

 Information (Data + Process)  MIS (OLTP) & OOAD

 KM (Knowledge Mgt), KWS


 Knowledge (Knowledge Work Systems)

 DSS; ESS, EIS (Executive


 Decision (Information + Information Systems)
Knowledge)

 Data Warehousing/Data
 Data/Information/Decision Mart/Data Mining/OLAP
(Executive, Collaborative and
individual levels)
Data, Information, and Decision by
Subject
 Data + Processing: Data processing, System Analysis/Design
 Information: MIS, Database Systems
 Object (Data + Processing): Object-Oriented SD/DA
 Knowledge + Information: Artificial Intelligence, Expert system
 Decision (executive level): DSS, EIS
 Decision (all levels, sophisticated): Data warehousing, Data Mining
The Information Crisis
 Integrated: Must have a single, enterprise-wide view.
 Data Integrity: Information must be accurate and
must conform to business rules.
 Accessible: Easily accessible with intuitive access paths,
and responsive for analysis.
 Credible: Every business factor must have one and only one
value.
 Timely: Information must be available within the
stipulated time frame.
The Era of Information-Based
Management—Five Themes
 A Single Information Source (E-Business)
 Distributed Information Availability (XML)
 Information In A Business Context (Decision Support
Systems)
 Automated Information Delivery (e.g., triggers)
 Information Quality and Ownership (e.g., DRM)


Complete E-Business Suite

[Figure: modules arranged around One Database: Marketing, Sales,
Order Mgt, Supply Chain (SCM), Manufacturing, Customer
Relationship (CRM), Human Resources, Procurement, Financial,
Services, Projects; with the labels ERP and EAI]
What is EAI?
 EAI refers to Enterprise Application Integration.
EAI is the merging of applications and data from various new and
legacy systems within a business. Various means, including
middleware, are employed to accomplish EAI in order to unify IT
resources, maximize new ERP investments, diminish errors and
get everyone on the same page. EAI enables companies to link
their existing software applications with each other and with
portals, so that those applications can exchange critical data.
EAI is usually close to the top of any CIO's list of concerns.
There are different approaches to EAI. Some rely on linking
specific applications with tailored code, but most rely on
generic solutions, typically called middleware. XML, combined
with SOAP and UDDI, is a kind of middleware.
Data Warehouse & ERP
– ERP = Enterprise Resource Planning
– A software solution that addresses enterprise needs,
taking the process view of an organization to meet the
organization's goals.
– It integrates all the departments and functions across
a company into a single computer system that can
serve all those different departments’ particular
needs.
Information System Categories
DATA RESOURCE MANAGEMENT
(DRM)
 DEFINITION
DATA RESOURCE MANAGEMENT (DRM) IS THE
BUSINESS DISCIPLINE WHICH FOCUSES ON HOW
DATA CAN BE MANAGED TO MOST EFFICIENTLY
SUPPORT THE BUSINESS ENTERPRISE. DRM
ADDRESSES THE MANAGEMENT OF ALL
ENTERPRISE DATA. WHEN COMBINED WITH OTHER
ENTERPRISE PROCESSES, DRM PROVIDES
INFORMATION WHEN NEEDED, WHERE NEEDED, IN
THE FORM NEEDED, WITH DESIRED ACCURACY
AND AT MINIMUM COST FOR THE BUSINESS
ENTERPRISE.
DATA RESOURCE MANAGEMENT
(DRM)
DATA RESOURCE MANAGEMENT BECOMES
INCREASINGLY CRITICAL TO THE SUCCESS OF THE
CORPORATION IN THE MARKETPLACE DUE TO THESE
NEW REALITIES:

 THE COMPETITIVE, GLOBAL ENVIRONMENT THAT
BUSINESS IS FACING
 EXPLOSIVE GROWTH OF THE WEB OVER THE INTERNET
 INCREASING USE OF DATA WAREHOUSE SYSTEMS TO
MAKE BETTER DECISIONS
DATA RESOURCE MANAGEMENT
(DRM)
WHAT IT IS:

 PROVIDING A UNIFIED AND INTEGRATED APPROACH FOR
PLANNING, CONTROL AND INTEGRATION OF OUR DATA ASSETS
IN SUPPORT OF THE ENTERPRISE’S BUSINESS
 ENCOURAGING THE REDUCTION OF UNNECESSARY DATA
DUPLICATION
 ENCOURAGING THE REUSE AND SHARING OF HIGH-QUALITY
DATA
 DONE RIGHT, THE INVESTMENT CAN BE PAID BACK
MANY TIMES OVER.
DRM PRINCIPLES
THE FOLLOWING PRINCIPLES SERVE AS
GUIDELINES FOR MANAGING DATA AS AN
ENTERPRISE ASSET:

 STRATEGICALLY AND TECHNICALLY DRIVEN:
THE EXISTENCE OF EACH DATA ITEM MUST BE
JUSTIFIED BY A BUSINESS PROCESS REQUIRED BY
EITHER SHORT-TERM OR LONG-TERM GOALS.
DRM PRINCIPLES (Continued)
 DATA LIFE CYCLE ASSESSMENT
THE DATA LIFE CYCLE FROM ACQUISITION OR CREATION TO
PRODUCTION OR DELETION MUST BE PERIODICALLY
ASSESSED BASED ON BUSINESS NEEDS AND CLIMATES.
DRM PRINCIPLES (Continued)
 DATA DEFINED
DATA MUST BE UNIQUELY DEFINED AND ASSIGNED
PRECISE MEANING PER ORGANIZATION VOCABULARY.
DRM PRINCIPLES (Continued)
 INTEGRITY
DATA INTEGRITY RULES MUST BE MAINTAINED TO
ASSURE CONSISTENCY AND TO CONTROL REDUNDANCY.
DRM PRINCIPLES (Continued)
 SECURITY/CONFIDENTIALITY
DATA MUST BE PROTECTED FROM UNAUTHORIZED AND
INADVERTENT ACCESS, MODIFICATION, DESTRUCTION
AND DISCLOSURE.
DRM PRINCIPLES (Continued)
 ACCESSIBILITY
DATA MUST BE MADE AVAILABLE WHEN AND WHERE
NEEDED FOR SHARING AND REUSE.
DRM PRINCIPLES (Continued)
 DATA STEWARDSHIP
DATA SUBJECT AREAS WILL BE MANAGED BY A TEAM OF
PEOPLE KNOWN AS DATA OWNERS AND CUSTODIANS.
THE GROUP IS RESPONSIBLE FOR ASSURING THAT DATA
STRUCTURE REFLECTS BUSINESS POLICIES AND RULES.
DRM PRINCIPLES (Continued)
 COST/BENEFIT OPTIMIZATION
DATA MUST BE UTILIZED TO MAXIMIZE BUSINESS
BENEFITS AT A MINIMUM COST.
Knowledge Management (KM) – Side
Benefits of DRM
 It is a systematic process for capturing, integrating,
organizing, and communicating knowledge
accumulated by employees.

 It is a vehicle to share corporate knowledge so that
employees may be more effective and
productive in their work.
 A knowledge management system must store all such
knowledge in a knowledge repository.
What is AI?
 What is intelligence?
– The ways humans think...
– The ways humans behave...
– The ways rational/intelligent things think...
– The ways rational/intelligent things behave...
 AI is the science of understanding intelligence and the
art of making intelligent things
What does AI do?
 Automation of problem solving
– Learning
– Memory (Knowledge Representation)
– Reasoning
– Acting
 Study of mental faculties through computational models
 Making computers do what people do better now (or
did better at some point!)
History of Decision-Support
Systems
 Ad Hoc Reports
 Special Extract Programs
 Small Applications
 Information Centers
 Decision-Support Systems
 Executive Information Systems
Four Levels of Analytical Processing
 In a modern organization, at least four levels of
analytical processing should be supported by
information systems:
– First level: Consists of simple queries and reports
against current and historical data
– Second level: Goes deeper and requires the ability to
do “what if” processing across data store
dimensions
Four Levels of Analytical Processing
– Third level: Needs to step back and analyze what
has previously occurred to bring about the current
state of the data
– Fourth level: Analyzes what has happened in the
past and what needs to be done in the future in
order to bring about some specific change
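
The first two levels can be illustrated with a small Python sketch; the third and fourth levels are not attempted here. The `sales` rows, the `revenue_by_year` report and the `what_if_price_change` scenario are hypothetical examples chosen for illustration, not part of the source material.

```python
# Hypothetical sales facts: (year, region, units, unit_price).
sales = [
    (2000, "East", 100, 10.0),
    (2000, "West", 80, 10.0),
    (2001, "East", 120, 10.0),
    (2001, "West", 90, 10.0),
]


def revenue_by_year(rows):
    """Level 1: a simple report over current and historical data."""
    totals = {}
    for year, _region, units, price in rows:
        totals[year] = totals.get(year, 0.0) + units * price
    return totals


def what_if_price_change(rows, region, pct):
    """Level 2: 'what if' the price in one region changed by pct percent?"""
    adjusted = [
        (y, r, u, (p * (1 + pct / 100)) if r == region else p)
        for y, r, u, p in rows
    ]
    return revenue_by_year(adjusted)


baseline = revenue_by_year(sales)
scenario = what_if_price_change(sales, "East", 25)
print(baseline)
print(scenario)
```

The level-two function reuses the level-one report on adjusted data, which is one simple way to view "what if" processing along a dimension (here, region).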
The Evolution of Data Warehousing
 Since the 1970s, organizations have gained competitive
advantage through systems that automate business
processes to offer more efficient and cost-effective
services to the customer.
 This resulted in the accumulation of growing amounts of
data in operational databases.
The Evolution of Data Warehousing
 Organizations now focus on ways to use operational
data to support decision-making, as a means of gaining
competitive advantage.
 However, operational systems were never designed to
support such business activities.
 Businesses typically have numerous operational
systems with overlapping and sometimes contradictory
definitions.
The Evolution of Data Warehousing
 Organizations need to turn their archives of data into a
source of knowledge, so that a single integrated and
consolidated view of the organization’s data is
presented to the user.
 A data warehouse was deemed the solution to meet the
requirements of a system capable of supporting
decision-making, receiving data from multiple
operational data sources.
Objectives of Today’s Businesses
 Access and combine data from a variety of data stores
 Perform complex data analysis across these data stores
 Create multidimensional views of data and its
metadata
 Easily summarize and roll up the information across
subject areas and business dimensions
These objectives cannot be met easily
 Data is scattered in many types of incompatible
structures.
 Lack of documentation has prevented the integration of
older legacy systems with newer systems
 Internet software, such as search engines, needs to be
improved
 Accurate and accessible metadata across multiple
organizations is hard to get
A New Type of System Environment
 Data is designed for analytical tasks
 Data from multiple applications
 Easy to use and conducive to long interactive sessions by users
 Read-intensive data usage
 Direct interaction with the system by the users without IT
assistance
 Content updated periodically and stable
 Content to include current and historical data
 Ability for users to run queries and get results online
 Ability for users to initiate reports
What is a Data Warehouse?
A data warehouse is a decision support system. It has the
following characteristics:

1. A central database that is loaded from
multiple operational databases for the
purpose of end-user access and decision
support.
What is a Data Warehouse? -
Continued
2. A data warehouse differs from an
operational system in that the data it
contains is normally static and updated
in a scheduled manner through massive
loading procedures.
What is a Data Warehouse? -
Continued
3. A data warehouse is developed to
accommodate random, ad hoc queries
and to allow users to ‘drill down’ to
minute levels of detail.
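
The three characteristics can be sketched with Python's built-in `sqlite3` module: a central table loaded in one batch from several operational sources, then queried ad hoc with a drill-down to detail. The two source systems, the `sales_fact` table and the `scheduled_load` function are assumptions invented for this illustration.

```python
import sqlite3

# Two hypothetical operational sources, each with its own record layout.
billing_system = [("C1", "2001-03-01", 120.0), ("C2", "2001-03-01", 75.0)]
order_system = [{"cust": "C1", "day": "2001-03-02", "total": 60.0}]

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (source TEXT, customer TEXT, day TEXT, amount REAL)"
)


def scheduled_load(conn):
    """One batch load: copy rows from every source into the central table."""
    rows = [("billing", c, d, a) for c, d, a in billing_system]
    rows += [("orders", r["cust"], r["day"], r["total"]) for r in order_system]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


loaded = scheduled_load(warehouse)

# Ad hoc summary query, then a drill-down to the detail rows behind one total.
summary = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales_fact "
    "GROUP BY customer ORDER BY customer"
).fetchall()
detail = warehouse.execute(
    "SELECT source, day, amount FROM sales_fact WHERE customer = ? ORDER BY day",
    ("C1",),
).fetchall()
print(loaded, summary, detail)
```

Between loads the table is only read, which mirrors the "normally static" behavior described in the second characteristic.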
Definition
Bill Inmon defines a central data warehouse as a
database that is:

1. Subject Oriented
Data naturally congregates around major
categories within any corporation. These
categories are called subject areas. Examples of
subject areas are bill of material, customer,
product, and criminal profile. Each subject area
is designed to contain only the data
appropriate for decision support analysis.
Definition (Continued)
2. Integrated
Data integration shows in the consistency
of the measurement of variables, naming
conventions, and physical data definitions
across the data. There will be only one
definition, identifier, etc., for each subject
area.
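
A minimal sketch of integration, assuming two hypothetical source extracts that name and measure the same customer attributes differently. The field names, the `GENDER_CODES` mapping and the `integrate` function are illustrative assumptions only.

```python
# Hypothetical extracts: each source names and measures the same facts differently.
source_a = [{"cust_id": 7, "sex": "M", "height_in": 70.0}]
source_b = [{"customer_no": 9, "gender": 0, "height_cm": 165.0}]

# Assumed code mapping; real systems would document their own encodings.
GENDER_CODES = {"M": "male", "F": "female", 1: "male", 0: "female"}
CM_PER_INCH = 2.54


def integrate(a_rows, b_rows):
    """Map both layouts onto one definition: customer_id, gender, height_cm."""
    unified = []
    for r in a_rows:
        unified.append({
            "customer_id": r["cust_id"],
            "gender": GENDER_CODES[r["sex"]],
            "height_cm": round(r["height_in"] * CM_PER_INCH, 1),
        })
    for r in b_rows:
        unified.append({
            "customer_id": r["customer_no"],
            "gender": GENDER_CODES[r["gender"]],
            "height_cm": r["height_cm"],
        })
    return unified


customers = integrate(source_a, source_b)
print(customers)
```

After integration every row uses one identifier name, one gender encoding and one unit of measurement, whichever source it came from.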
Definition (Continued)
3. Time Variant

Data in the DW is historical and accurate as of
some point in time. Since DW data is extracted
from operational systems, it must have an
element of time as part of its key structure.
Definition (Continued)
4. Static
Since the data in the DW is a snapshot extracted
from operational systems, it must be static or
non-updateable.
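
The time-variant and static properties can be pictured together as an append-only store whose key includes a date. The `SnapshotStore` class below is a hypothetical sketch, not a description of any particular warehouse product.

```python
from datetime import date


class SnapshotStore:
    """Append-only store keyed by (business key, snapshot date)."""

    def __init__(self):
        self._rows = {}

    def load(self, key, as_of, value):
        """Add a snapshot; an existing (key, date) row is never overwritten."""
        if (key, as_of) in self._rows:
            raise ValueError(f"snapshot {key!r} @ {as_of} already loaded")
        self._rows[(key, as_of)] = value

    def as_of(self, key, when):
        """Return the latest snapshot of key taken on or before `when`."""
        dates = [d for (k, d) in self._rows if k == key and d <= when]
        return self._rows[(key, max(dates))] if dates else None


store = SnapshotStore()
store.load("acct-1", date(2001, 1, 31), 500.0)  # hypothetical balances
store.load("acct-1", date(2001, 2, 28), 650.0)
print(store.as_of("acct-1", date(2001, 2, 15)))
```

Because the date is part of the key, a new load adds a row instead of replacing the old one, so earlier points in time stay queryable.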
Definition (Continued)
5. Data Granularity

 Data in the warehouse is summarized at different
levels.
 Granularity levels are based on the data types and the
expected system performance for queries.
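
Granularity can be illustrated by rolling the same detail rows up to coarser levels. The `daily_sales` rows and the `summarize` helper are hypothetical; the sketch assumes dates are ISO strings so that truncating them yields the month or the year.

```python
from collections import defaultdict

# Hypothetical detail rows at the finest grain: (day, product, amount).
daily_sales = [
    ("2001-01-05", "widget", 10.0),
    ("2001-01-20", "widget", 15.0),
    ("2001-02-03", "widget", 7.0),
    ("2001-02-03", "gadget", 4.0),
]


def summarize(rows, grain):
    """Roll detail rows up to 'month' or 'year' by truncating the ISO date."""
    width = {"month": 7, "year": 4}[grain]
    totals = defaultdict(float)
    for day, product, amount in rows:
        totals[(day[:width], product)] += amount
    return dict(totals)


monthly = summarize(daily_sales, "month")
yearly = summarize(daily_sales, "year")
print(monthly)
print(yearly)
```

Coarser summaries hold fewer rows and answer high-level queries faster, while the detail rows remain available for finer questions.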
The Benefits of Data Warehouse
 Enable workers to make better and wiser decisions
A data warehouse is specifically developed to let
users explore data in an unlimited
number of ways, accommodating essentially any query
a manager could dream up and providing access to the
data sources that are behind the results. For example,
information gleaned from a data warehouse can
prompt a change in pricing.
The Benefits of Data Warehouse
 Identify hidden business opportunities
A data warehouse performs a second, and very
valuable, function by searching data for trends
and abnormalities which users may not know to
look for.
For example: Assisting companies in spotting
sales trends, and detecting erroneous or
fraudulent billings.
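
One very simple way to search for abnormalities is to flag values that lie far from the mean. The sketch below uses a z-score rule as a stand-in technique; the source does not say which method a data warehouse or data mining tool would use, and the `billings` values and the `threshold` default are made up for illustration.

```python
from statistics import mean, pstdev


def flag_abnormal(amounts, threshold=2.0):
    """Return indexes of values more than `threshold` std devs from the mean."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, x in enumerate(amounts) if abs(x - mu) / sigma > threshold]


billings = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 400.0]  # hypothetical
suspects = flag_abnormal(billings)
print(suspects)
```

A flagged index is only a candidate for review; deciding whether a billing is erroneous or fraudulent still requires business knowledge.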
The Benefits of Data Warehouse
 Bending with the customer
A data warehouse can help companies
understand who their customers really are and what
services they are using.
For example, by collecting and analyzing internet
portal clickstream data, companies are able to
build extensive user profiles to boost profits
through the sales channel.
The Benefits of Data Warehouse
 Precision Marketing
A data warehouse can aid in detecting segments
of the marketplace (geographically and
demographically) which remain untapped, and
help show the best way to reach out to these
potential customers (rapid response to market
and technology trends).
Assignment

 What is meant by a data warehouse?
 Why is a data warehouse needed in a
business environment?
 Explain the benefits of having a data warehouse.
 How will data warehouses develop in
the future?
 Give example cases of data warehouse
use.