Objective

In this presentation we will be focusing on:
• Data and information.
• The need for information.
• Databases.
• The data warehousing concept.
• The difference between a database and a data warehouse.
• The architecture of data warehousing.
• Applications of data warehousing.
Data and Information

• They may sound synonymous, but they are very different from each other.
• Data are plain facts. When data are processed, organized, structured or presented in a given context so as to make them useful, they are called information.
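
As a minimal illustration, assuming a few hypothetical sales records, the Python sketch below organizes raw facts (data) into a per-region revenue summary (information):

```python
from collections import defaultdict

# Raw data: plain facts, one row per sale (hypothetical values).
sales = [
    {"region": "North", "product": "Laptop", "amount": 1200},
    {"region": "South", "product": "Mouse", "amount": 25},
    {"region": "North", "product": "Monitor", "amount": 300},
    {"region": "South", "product": "Laptop", "amount": 1100},
]


def revenue_by_region(rows):
    """Organize raw sale records into total revenue per region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


summary = revenue_by_region(sales)
print(summary)  # {'North': 1500, 'South': 1125}
```
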
Database
• A database is a store in which data is kept as a base and managed so that it is available for fast and efficient access.
• In general, it is a repository of the operational data needed to run the business day to day.
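
As a minimal illustration, assuming a hypothetical `orders` table, the sketch below uses Python's built-in `sqlite3` module as a small operational database: individual transactions are recorded as they happen and fetched directly by their primary key.

```python
import sqlite3

# In-memory operational database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Day-to-day business: record individual transactions as they happen.
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (101, "Acme Ltd", 250.0))
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (102, "Globex", 99.5))
conn.commit()

# Fetch one order by its primary key (a direct keyed lookup, no full scan).
row = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (102,)
).fetchone()
print(row)  # ('Globex', 99.5)
```
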
Data Warehouse
• A single, complete and consistent store of data, obtained from a variety of different sources and made available to end users in a form they can understand and use in a business context.
Similarities and differences
• Both databases and data warehouses support application data storage, access and retrieval, but only a data warehouse is designed to support decision making and business intelligence.
• A data warehouse holds historical data as well as current data, so comparisons across time periods are easy to make for business analysis.
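
Because historical rows are kept next to current ones, a period-over-period comparison reduces to a single query. A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `revenue` table with one row per year and region:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (year INTEGER, region TEXT, dollars REAL)")
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?)",
    [
        (2022, "North", 1000.0),  # historical rows are kept, not overwritten
        (2023, "North", 1250.0),  # current year
        (2022, "South", 800.0),
        (2023, "South", 720.0),
    ],
)

# Compare the current year with the previous one, per region.
rows = conn.execute(
    """
    SELECT cur.region,
           cur.dollars - prev.dollars AS change,
           100.0 * (cur.dollars - prev.dollars) / prev.dollars AS pct_change
    FROM revenue AS cur
    JOIN revenue AS prev
      ON prev.region = cur.region AND prev.year = cur.year - 1
    WHERE cur.year = 2023
    ORDER BY cur.region
    """
).fetchall()
for region, change, pct in rows:
    print(region, change, round(pct, 1))
# North 250.0 25.0
# South -80.0 -10.0
```
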
Major Distinction
• A database is used to run the business.
• A data warehouse is used to learn how to run the business.
A situation!
• You are an analyst working for a computer firm.
• As an analyst, you want to know:
1- Who are your customers?
2- What kinds of products do they want to buy?
3- What is the most effective distribution channel?
4- Which product promotions have the biggest impact on revenue?
5- How do I decide on segmentation?
What do I do?
• Data is scattered across network-based databases.
• Data is available, but you cannot make sense of it.
• Data needs to be collected and summarized to give it meaning.
• The data, as collected, cannot be used directly for analysis.
• What you need is a data warehouse in your company.
Data warehousing
• A process of transforming data into information and making it available to users in a timely enough manner to make a difference.

Data Warehouses are Very Large Databases
[Figure: bar chart of the percentage of survey respondents (0-35%) by data warehouse size, in buckets from 5GB up to 500GB-1TB, comparing initial sizes with sizes projected for 2Q96. Source: META Group, Inc.]
Very Large Data Bases
• Terabytes (10^12 bytes): Walmart, 24 terabytes
• Petabytes (10^15 bytes): geographic information systems
• Exabytes (10^18 bytes): national medical records
• Zettabytes (10^21 bytes): weather images
• Yottabytes (10^24 bytes): intelligence agency videos
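
Each unit in the list is 10^3 times the previous one. A small helper (the name `human_size` is hypothetical) expresses a byte count in the largest of these decimal units it reaches, for example the 24 terabytes quoted for Walmart:

```python
# Decimal (SI) storage units from the slide, largest first: each step is 10**3.
UNITS = [
    ("yottabytes", 10**24),
    ("zettabytes", 10**21),
    ("exabytes", 10**18),
    ("petabytes", 10**15),
    ("terabytes", 10**12),
    ("gigabytes", 10**9),
]


def human_size(num_bytes):
    """Return the byte count expressed in the largest unit it reaches."""
    for name, size in UNITS:
        if num_bytes >= size:
            return f"{num_bytes / size:g} {name}"
    return f"{num_bytes} bytes"


print(human_size(24 * 10**12))  # 24 terabytes
print(human_size(3 * 10**15))   # 3 petabytes
```
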
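Data Warehouse Architecture
[Figure: source systems (relational databases, ERP systems, purchased data and legacy data) pass through extraction and cleansing and an optimized loader into the data warehouse engine, which serves analysis and query; the diagram also shows a metadata repository.]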
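
The flow in the architecture diagram (extract from several sources, cleanse, load, record metadata) can be sketched in a few Python functions. This is a minimal illustration under stated assumptions, not a production loader: the two source lists, their layouts, the table names and the `load_log` metadata table are all hypothetical, and an in-memory SQLite database stands in for the warehouse engine.

```python
import sqlite3
from datetime import date

# Hypothetical source systems with different layouts.
erp_rows = [{"cust": "C1", "sale_date": "2023-01-05", "amt": "100.50"}]
legacy_rows = [("c2 ", "05/01/2023", "75")]  # (customer, DD/MM/YYYY, amount)


def extract():
    """Pull raw records from each source, tagging where they came from."""
    for r in erp_rows:
        yield "erp", r["cust"], r["sale_date"], r["amt"]
    for cust, raw_date, amt in legacy_rows:
        yield "legacy", cust, raw_date, amt


def cleanse(records):
    """Bring every record to one format: trimmed upper-case id, ISO date, float amount."""
    for source, cust, raw_date, amt in records:
        if source == "legacy":  # legacy dates are DD/MM/YYYY
            day, month, year = raw_date.split("/")
            raw_date = f"{year}-{month}-{day}"
        yield source, cust.strip().upper(), date.fromisoformat(raw_date).isoformat(), float(amt)


def load(conn, records):
    """Bulk-insert cleansed rows and log the load in a metadata table."""
    rows = list(records)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.execute("INSERT INTO load_log VALUES (?, ?)", (date.today().isoformat(), len(rows)))
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (source TEXT, customer TEXT, sale_date TEXT, amount REAL)")
conn.execute("CREATE TABLE load_log (loaded_on TEXT, row_count INTEGER)")  # metadata
loaded = load(conn, cleanse(extract()))
print(loaded)  # 2
```
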
Components of the Warehouse
• Data Extraction and Loading
• The Warehouse
• Analyze and Query -- OLAP Tools
• Metadata
• Data Mining tools
Loading the Warehouse
• Cleaning the data before it is loaded
Why is LOADING required?
• Warehouse data comes from disparate sources of questionable reliability.
• Outside sources may have questionable quality-control procedures.
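
Because such sources cannot be trusted as-is, records are usually checked before loading. A minimal sketch, assuming three hypothetical rules (customer id present, amount numeric and non-negative, country code from a known list); records that fail are set aside with their reasons instead of being loaded:

```python
# Hypothetical reference list used by the validation rules.
KNOWN_COUNTRIES = {"IN", "US", "ID"}


def validate(record):
    """Return a list of problems found in one raw record (empty if clean)."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    try:
        if float(record.get("amount", "")) < 0:
            problems.append("negative amount")
    except (TypeError, ValueError):
        problems.append("non-numeric amount")
    if record.get("country", "").upper() not in KNOWN_COUNTRIES:
        problems.append("unknown country")
    return problems


def split_clean(records):
    """Separate loadable records from rejected ones (kept with their reasons)."""
    clean, rejected = [], []
    for rec in records:
        problems = validate(rec)
        if problems:
            rejected.append((rec, problems))
        else:
            clean.append(rec)
    return clean, rejected


raw = [
    {"customer_id": "C1", "amount": "120", "country": "in"},
    {"customer_id": "", "amount": "abc", "country": "US"},
    {"customer_id": "C3", "amount": "-5", "country": "XX"},
]
clean, rejected = split_clean(raw)
print(len(clean), len(rejected))  # 1 2
```
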
Data Integration Across Sources
• Savings: same data, different name
• Loans: different data, same name
• Trust: data found here, nowhere else
• Credit card: different keys, same data
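
A minimal sketch of how these situations might be reconciled, with hypothetical field names and keys: a per-source field map renames columns to one warehouse vocabulary (same data under different names) and gives a shared name like `bal` a distinct meaning per source (same name, different data), while a cross-reference table maps each source's own key to one warehouse customer key (different keys, same data).

```python
# Hypothetical per-source column mappings to the warehouse vocabulary.
FIELD_MAP = {
    "savings": {"cust_no": "source_key", "bal": "savings_balance"},
    "loans": {"loan_cust": "source_key", "bal": "loan_outstanding"},  # same name, different data
    "trust": {"trust_id": "source_key", "beneficiary": "beneficiary"},  # found only here
    "credit_card": {"card_holder": "source_key", "limit": "credit_limit"},
}

# Hypothetical cross-reference: (source, source key) -> warehouse customer key.
KEY_XREF = {
    ("savings", "S-17"): 1,
    ("loans", "L-903"): 1,
    ("credit_card", "4411"): 1,
    ("trust", "T-5"): 2,
}


def integrate(source, record):
    """Rename a source record's fields and attach the warehouse customer key."""
    mapping = FIELD_MAP[source]
    out = {mapping[field]: value for field, value in record.items() if field in mapping}
    out["customer_key"] = KEY_XREF[(source, out.pop("source_key"))]
    return out


merged = {}
for source, rec in [
    ("savings", {"cust_no": "S-17", "bal": 5000}),
    ("loans", {"loan_cust": "L-903", "bal": 12000}),
    ("credit_card", {"card_holder": "4411", "limit": 3000}),
]:
    row = integrate(source, rec)
    key = row.pop("customer_key")
    merged.setdefault(key, {}).update(row)  # one warehouse row per customer

print(merged)
# {1: {'savings_balance': 5000, 'loan_outstanding': 12000, 'credit_limit': 3000}}
```
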
Data Transformation Terms
• Extracting
• Conditioning
• Scrubbing
• Merging
• Householding
• Enrichment
• Scoring
• Loading
• Validating
• Delta Updating
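
Householding, one of the less self-explanatory terms above, groups separate customer records that belong to the same household. A minimal sketch, assuming that matching on a crudely normalized address plus postal code is good enough (real householding usually needs fuzzier matching); the customer records are hypothetical:

```python
import re
from collections import defaultdict


def normalize_address(address):
    """Crude normalization: lowercase, drop commas, expand 'st'/'st.' to 'street'."""
    text = address.lower().replace(",", " ")
    text = re.sub(r"\bst\b\.?", "street", text)
    return " ".join(text.split())


def household(customers):
    """Group customer ids by (normalized address, postal code)."""
    groups = defaultdict(list)
    for cust_id, address, postcode in customers:
        groups[(normalize_address(address), postcode)].append(cust_id)
    return dict(groups)


customers = [
    ("C1", "12 Main St.", "380001"),
    ("C2", "12 main street", "380001"),
    ("C3", "7 Park Road", "380015"),
]
print(household(customers))
# {('12 main street', '380001'): ['C1', 'C2'], ('7 park road', '380015'): ['C3']}
```
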
Refresh
• Propagate updates made to the source data to the warehouse
• Issues:
– when to refresh
– how to refresh -- refresh techniques
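
One common answer to "how to refresh" is a delta (incremental) update: compare the current source extract with the previous one and propagate only the differences. A minimal sketch, assuming both snapshots fit in memory as dictionaries keyed by a source primary key (all values hypothetical). For simplicity it mirrors the source; a real warehouse would often record changes as new historical rows rather than overwriting or deleting.

```python
def compute_delta(previous, current):
    """Return inserts, updates and deletes between two keyed snapshots."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    deletes = [k for k in previous if k not in current]
    return inserts, updates, deletes


def apply_delta(target, inserts, updates, deletes):
    """Propagate the delta to a mirrored copy (here just a dict)."""
    target.update(inserts)
    target.update(updates)
    for k in deletes:
        target.pop(k, None)
    return target


yesterday = {1: ("Acme", 500), 2: ("Globex", 300), 3: ("Initech", 50)}
today = {1: ("Acme", 650), 2: ("Globex", 300), 4: ("Umbrella", 90)}

inserts, updates, deletes = compute_delta(yesterday, today)
print(inserts, updates, deletes)
# {4: ('Umbrella', 90)} {1: ('Acme', 650)} [3]
warehouse = apply_delta(dict(yesterday), inserts, updates, deletes)
print(warehouse == today)  # True
```
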
De-normalization
• Normalization in a data warehouse may lead to lots of small tables.
• This can lead to excessive I/O, since many tables have to be accessed (joined) to answer a query.
• De-normalization is the answer, especially since updates are rare.
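
A minimal `sqlite3` sketch of the trade-off, assuming a hypothetical normalized `category` / `product` / `sales` layout: the joins are paid once, when the wide table is built at load time, so later analytical queries read a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized layout: several small tables linked by keys.
    CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_name TEXT);
    CREATE TABLE product (prod_id INTEGER PRIMARY KEY, prod_name TEXT, cat_id INTEGER);
    CREATE TABLE sales (prod_id INTEGER, dollars REAL);
    INSERT INTO category VALUES (1, 'Coffee'), (2, 'Tea');
    INSERT INTO product VALUES (10, 'Colombian', 1), (20, 'Darjeeling', 2);
    INSERT INTO sales VALUES (10, 40.0), (10, 60.0), (20, 25.0);

    -- De-normalized copy built once at load time: the joins happen here.
    CREATE TABLE sales_wide AS
    SELECT s.dollars, p.prod_name, c.cat_name
    FROM sales s
    JOIN product p ON p.prod_id = s.prod_id
    JOIN category c ON c.cat_id = p.cat_id;
    """
)

# Analysis now reads one table with no joins.
totals = conn.execute(
    "SELECT cat_name, SUM(dollars) FROM sales_wide GROUP BY cat_name ORDER BY cat_name"
).fetchall()
print(totals)  # [('Coffee', 100.0), ('Tea', 25.0)]
```
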
True Warehouse
[Figure: data sources feed the data warehouse, which in turn feeds data marts.]
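
A data mart is usually a subject- or department-specific subset of the warehouse. A minimal sketch, assuming a hypothetical `wh_sales` warehouse table; here the mart is just an SQLite view, whereas in practice a mart is often a separate store loaded from the warehouse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wh_sales (region TEXT, product TEXT, year INTEGER, dollars REAL)")
conn.executemany(
    "INSERT INTO wh_sales VALUES (?, ?, ?, ?)",
    [
        ("West", "Coffee", 2023, 100.0),
        ("West", "Tea", 2023, 40.0),
        ("East", "Coffee", 2023, 70.0),
        ("West", "Coffee", 2022, 90.0),
    ],
)

# Hypothetical mart for a West-region team: a filtered, pre-aggregated subset.
conn.execute(
    """
    CREATE VIEW west_mart AS
    SELECT product, year, SUM(dollars) AS dollars
    FROM wh_sales
    WHERE region = 'West'
    GROUP BY product, year
    """
)
mart_rows = conn.execute("SELECT * FROM west_mart ORDER BY product, year").fetchall()
print(mart_rows)
# [('Coffee', 2022, 90.0), ('Coffee', 2023, 100.0), ('Tea', 2023, 40.0)]
```
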
A Sample Query

    select month, dollars, cume(dollars) as run_dollars,
           weight, cume(weight) as run_weights
    from sales, market, product, period t
    where year = 1993
      and product like 'Columbian%'
      and city like 'San Fr%'
    order by t.perkey
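
`cume()` is not part of standard SQL; it appears to be a vendor extension. Standard SQL expresses the same running total with a window function, `SUM(...) OVER (ORDER BY ...)`. A minimal sketch with Python's `sqlite3` (window functions need SQLite 3.25 or later), assuming a small hypothetical `monthly` table in place of the slide's star schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (perkey INTEGER, month TEXT, dollars REAL, weight REAL)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?, ?, ?)",
    [(1, "Jan", 100.0, 10.0), (2, "Feb", 150.0, 12.0), (3, "Mar", 50.0, 5.0)],
)

# Window-function equivalent of cume(): a running sum ordered by period key.
rows = conn.execute(
    """
    SELECT month,
           dollars,
           SUM(dollars) OVER (ORDER BY perkey) AS run_dollars,
           weight,
           SUM(weight) OVER (ORDER BY perkey) AS run_weights
    FROM monthly
    ORDER BY perkey
    """
).fetchall()
for row in rows:
    print(row)
# ('Jan', 100.0, 100.0, 10.0, 10.0)
# ('Feb', 150.0, 250.0, 12.0, 22.0)
# ('Mar', 50.0, 300.0, 5.0, 27.0)
```
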
Automated processes in data warehouses

    select product, dollars as jun97_sales,
           (select sum(si.dollars)
            from market mi, product pi, period ti, sales si
            where pi.product = product.product
              and ti.year = period.year
              and mi.city = market.city) as total97_sales,
           100 * dollars /
           (select sum(si.dollars)
            from market mi, product pi, period ti, sales si
            where pi.product = product.product
              and ti.year = period.year
              and mi.city = market.city) as percent_of_yr
    from market, product, period, sales
    where year = 1997
      and month = 'June'
      and city like 'Ahmed%'
    order by product;
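
The query above divides each product's June 1997 sales by its 1997 total for the same city. A minimal sketch of the same calculation with Python's `sqlite3` (SQLite 3.25 or later), assuming a simplified single-table layout with hypothetical columns `product`, `city`, `year`, `month` and `dollars` instead of the slide's star schema; the yearly total comes from a window function partitioned by product and city rather than from correlated subqueries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, city TEXT, year INTEGER, month TEXT, dollars REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("Tea", "Ahmedabad", 1997, "June", 30.0),
        ("Tea", "Ahmedabad", 1997, "July", 70.0),
        ("Coffee", "Ahmedabad", 1997, "June", 50.0),
        ("Coffee", "Ahmedabad", 1997, "May", 150.0),
    ],
)

rows = conn.execute(
    """
    SELECT product, jun97_sales, total97_sales,
           100.0 * jun97_sales / total97_sales AS percent_of_yr
    FROM (
        SELECT product, month, city,
               dollars AS jun97_sales,
               -- 1997 total for the same product and city, computed before filtering
               SUM(dollars) OVER (PARTITION BY product, city) AS total97_sales
        FROM sales
        WHERE year = 1997
    ) AS y
    WHERE month = 'June' AND city LIKE 'Ahmed%'
    ORDER BY product
    """
).fetchall()
print(rows)
# [('Coffee', 50.0, 200.0, 25.0), ('Tea', 30.0, 100.0, 30.0)]
```
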
Applications

Industry                  Application
Finance                   Credit card analysis
Insurance                 Claims and fraud analysis
Telecommunication         Call record analysis
Transport                 Logistics management
Consumer goods            Promotion analysis
Data service providers    Value-added data
Utilities                 Power usage analysis

Anda mungkin juga menyukai