
Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005
Introductions

Presenter
Javier Loria
Solid Quality Learning
javier@solidqualitylearning.com
Agenda
Overview & BI Challenges
Introducing the UDM
The UDM in Detail
Data Mining Overview
Business Intelligence Platform
Integrate
  Data acquisition from source systems and integration
  Data transformation and synthesis
Analyze
  Data enrichment, with business logic, hierarchical views
  Data discovery via data mining
Report
  Data presentation and distribution
  Data access for the masses
Overview
Getting information from enterprise data
Using BI across the enterprise as an integral part of doing business
Capture and model all of your data
Integration with business processes
Relational reporting and OLAP converged through a single dimensional model
Business Intelligence Challenges
Multiple Data Models
Multiple Data Sources
Multiple APIs
Duplication of Data
What Is a Cube?
[Figure: a cube with three dimensions: Markets (Atlanta, Chicago, Denver, Dallas), Products (Grapes, Cherries, Melons, Apples), and Time (Q1, Q2, Q3, Q4)]
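To make the cube idea concrete, here is a minimal sketch that stores the cells of a Markets x Products x Time cube as a Python dictionary and shows two basic cube operations: rolling up (aggregating away dimensions) and slicing (fixing one dimension to a single member). The sales figures and the helper names `roll_up` and `slice_cube` are hypothetical, chosen for illustration only; a real OLAP server stores and aggregates cells very differently.

```python
from collections import defaultdict

# Hypothetical sales figures; each key is one cell of the cube:
# (market, product, quarter) -> sales
cells = {
    ("Atlanta", "Apples", "Q1"): 120,
    ("Atlanta", "Grapes", "Q1"): 80,
    ("Chicago", "Apples", "Q1"): 95,
    ("Chicago", "Apples", "Q2"): 110,
    ("Denver", "Melons", "Q3"): 60,
    ("Dallas", "Cherries", "Q4"): 45,
}

DIMENSIONS = ("Market", "Product", "Time")


def roll_up(cells, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, value in cells.items():
        totals[tuple(coords[p] for p in positions)] += value
    return dict(totals)


def slice_cube(cells, dimension, member):
    """Keep only the cells where `dimension` equals `member` (a slice)."""
    p = DIMENSIONS.index(dimension)
    return {coords: v for coords, v in cells.items() if coords[p] == member}


print(roll_up(cells, ["Market"]))  # total sales per market
print(roll_up(slice_cube(cells, "Product", "Apples"), ["Time"]))  # apple sales per quarter
```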
Enterprise BI Today
[Figure: data sources (a DW and two datamarts), separate data models (MOLAP stores), and separate tools (an OLAP browser and Reporting Tools (1), (2), and (3))]
Relational vs. OLAP Reports
Feature                             Relational  OLAP
Flexible schema                     Yes         No
Real-time data access               Yes         No
Single data store                   Yes         No
Simple management                   Yes         No
Detail reporting                    Yes         No
High performance                    No          Yes
End-user oriented                   No          Yes
Ease of navigation and exploration  No          Yes
Rich analytics                      No          Yes
Rich semantics                      No          Yes
Agenda
Overview & BI Challenges
Introducing the UDM
The UDM in Detail
Data Mining Overview
The Unified Dimensional Model
The Best of Relational and OLAP
Relational reporting
  Multiple fact tables
  Full richness of the dimension attributes
  Transaction-level access
  Star, snowflake, 3NF
  Complex relationships
  Recursive self joins
  Distributed sources
  Slowly changing dimensions
OLAP cubes
  Multidimensional navigation
  Hierarchical presentation
  Friendly entity names
  Powerful MDX calculations
  Central KPI framework
  Multiple perspectives
  Partitions
  Aggregations
UDM's Role
Allows the User Model to Be Enriched
Provides High-Performance Queries
Allows the Capture of Business Rules to Support Analysis
Supports Closing the Loop, Where the User Acts Upon the Data
Enterprise BI with UDM
[Figure: the DW and two datamarts, a single UDM (with MOLAP storage), and the tools it serves: an OLAP browser, a reporting tool, and BI applications]
Scalable, High-Performance UDM Server
[Figure: the DW and two datamarts feed a UDM hosted in Analysis Services (with MOLAP storage); the OLAP browser, reporting tool, and BI applications connect through XML/A or OLE DB for OLAP]
Analysis Server as UDM Server
Optimized SQL to all major RDBMS platforms
XML/A client API
SOAP-based Web service
API supported by all major BI vendors
Managed and native providers
ADOMD.NET
OLE DB for OLAP
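Because XML/A is a SOAP-based Web service, a client query is ultimately an XML message. The sketch below builds such a message, assuming the XML for Analysis `Execute` method with an MDX `Statement` command and `Catalog`/`Format` properties; the catalog, cube, and measure names are hypothetical, and actually sending the envelope to a server (for example with an HTTP POST) is left out. In practice the managed and native providers listed above (ADOMD.NET, OLE DB for OLAP) build these requests for you.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_execute_envelope(mdx: str, catalog: str) -> str:
    """Wrap an MDX statement in a SOAP envelope carrying an XML/A Execute request."""
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
        "<soap:Body>"
        f'<Execute xmlns="{XMLA_NS}">'
        f"<Command><Statement>{escape(mdx)}</Statement></Command>"
        "<Properties><PropertyList>"
        f"<Catalog>{escape(catalog)}</Catalog>"
        "<Format>Multidimensional</Format>"
        "</PropertyList></Properties>"
        "</Execute>"
        "</soap:Body>"
        "</soap:Envelope>"
    )


# Hypothetical catalog, cube, and measure names, for illustration only.
envelope = build_execute_envelope(
    "SELECT [Measures].[Sales Amount] ON COLUMNS FROM [Sales]", "SalesDW"
)
root = ET.fromstring(envelope)  # sanity check: the envelope is well-formed XML
statement = root.find(f".//{{{XMLA_NS}}}Statement")
print(statement.text)
```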
Streamlined BI Infrastructure
Unified logical model for both relational and OLAP, with superb performance and scalability
One data store to manage, ensuring data consistency and low TCO
Rich user experience with many Microsoft and 3rd-party tools
BI Development Studio
Complete, integrated tool for the development of BI applications
Enterprise software development environment
Integrated with Visual Studio
Team development, source control, versioning, developer isolation, resource-independent coding
Performance
Proactive caching
Automatic MOLAP cache creation and management
MOLAP becomes transparent
No requirement to manage an OLAP store
Relational reporting enjoys MOLAP-like performance
MOLAP, ROLAP, and HOLAP
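The proactive caching idea above (serve queries from a MOLAP-style copy, keep results current when the relational source changes) can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the class, its `silence_interval` parameter, and the fall-back-to-relational behavior are hypothetical simplifications, loosely inspired by the silence-interval setting of Analysis Services proactive caching, not its actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ProactiveCache:
    """Toy proactive cache (hypothetical): answers from a snapshot when it is
    current, from the relational source while it is stale, and rebuilds the
    snapshot once the source has been quiet for `silence_interval` seconds."""
    load_from_source: Callable[[], dict]
    silence_interval: float = 10.0
    snapshot: dict = field(default_factory=dict)
    stale: bool = True
    last_change: Optional[float] = None

    def notify_change(self, now: float) -> None:
        # The relational source reports a data change (a "notification").
        self.stale = True
        self.last_change = now

    def tick(self, now: float) -> None:
        # Rebuild only after the source has been quiet long enough, so a burst
        # of changes triggers one rebuild instead of many.
        quiet = self.last_change is None or now - self.last_change >= self.silence_interval
        if self.stale and quiet:
            self.snapshot = self.load_from_source()
            self.stale = False

    def query(self, key: str):
        # While stale, go to the relational source (ROLAP-like) for current data.
        if self.stale:
            return self.load_from_source().get(key), "relational"
        return self.snapshot.get(key), "cache"


source = {"sales": 100}
cache = ProactiveCache(load_from_source=lambda: dict(source))
cache.tick(now=0.0)            # initial build
print(cache.query("sales"))    # (100, 'cache')
source["sales"] = 120
cache.notify_change(now=1.0)
print(cache.query("sales"))    # (120, 'relational')
cache.tick(now=5.0)            # still inside the silence interval: no rebuild
cache.tick(now=11.0)           # quiet long enough: rebuild the snapshot
print(cache.query("sales"))    # (120, 'cache')
```

The trade-off the sketch exposes is the one proactive caching manages: a longer silence interval means fewer rebuilds, but more queries answered from the slower relational path.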
MOLAP Caching
[Figure: data sources (a DW and two datamarts), Analysis Services holding the UDM with MOLAP caches kept current through cache notifications, and tools (an OLAP browser, a reporting tool, and BI applications) connecting through XML/A or ODBO]
Agenda
Overview & BI Challenges
Introducing the UDM
The UDM in Detail
Data Mining Overview
UDM and The BI Studio
UDM Data Sources
Multiple Data Sources
OLTP
OLAP
XML
Data Source Views
Tables
Views
Stored Queries
Dimensions and Hierarchies
Dimensions: Attribute-Based
Consolidates all attributes of an entity
Hierarchies Organize Data
Custom hierarchies can be created from attributes
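Because a dimension is a flat collection of attributes, any ordered list of those attributes can serve as a hierarchy. The sketch below shows that idea with a hypothetical customer dimension and a hypothetical helper `build_hierarchy`; it illustrates the concept only and is not how Analysis Services stores hierarchies.

```python
from collections import defaultdict

# Hypothetical customer dimension rows: every column is an attribute.
customers = [
    {"Customer": "Ana", "City": "Seattle", "State": "WA", "Country": "USA"},
    {"Customer": "Bo", "City": "Spokane", "State": "WA", "Country": "USA"},
    {"Customer": "Cy", "City": "Portland", "State": "OR", "Country": "USA"},
    {"Customer": "Di", "City": "Vancouver", "State": "BC", "Country": "Canada"},
]


def build_hierarchy(rows, levels):
    """Nest rows by the given attribute levels (top level first);
    the lowest level is returned as a sorted list of members."""
    if len(levels) == 1:
        return sorted({row[levels[0]] for row in rows})
    groups = defaultdict(list)
    for row in rows:
        groups[row[levels[0]]].append(row)
    return {key: build_hierarchy(group, levels[1:]) for key, group in sorted(groups.items())}


# Two different hierarchies built from the same attributes.
geography = build_hierarchy(customers, ["Country", "State", "City"])
by_country = build_hierarchy(customers, ["Country", "Customer"])
print(geography)
print(by_country)
```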
Cubes
No More Limits
Limited only by addressable objects (2147483647)
Stored as XML
Logical Grouping of Measures and Dimensions
Perspectives
UDM Provides a Subject-Area-Centric View of the Data Warehouse
Perspectives Feature Allows a User/Group-Specific View of the Same Data
Categorization
Semantically Meaningful Categories
Measures
Dimensions
Attributes
Hierarchies
Time
UDM Has Built-In Knowledge of Time
Natural (Calendar)
Fiscal
Reporting
Manufacturing
ISO 8601
Translations
UDM provides for multiple languages
Metadata in BI Studio and Client Tool Displayed in Multiple Languages
Attribute Semantics
Names vs. Keys
Ordering
Discretization
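Discretization turns a continuous attribute into a small number of discrete buckets. One common approach is equal-frequency bucketing, where each bucket holds roughly the same number of values; Analysis Services offers a comparable option (EqualAreas), but the function below is a hypothetical sketch, not its implementation. Ties are broken by position, so equal values can land in different buckets, a simplification a production implementation would handle.

```python
def equal_frequency_buckets(values, bucket_count):
    """Return a bucket index (0 .. bucket_count - 1) for each value so that
    buckets hold roughly equal numbers of values."""
    ranked = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, index in enumerate(ranked):
        # Map rank 0 .. n-1 evenly onto bucket 0 .. bucket_count-1.
        buckets[index] = rank * bucket_count // len(values)
    return buckets


incomes = [12_000, 85_000, 40_000, 23_000, 150_000, 61_000]  # hypothetical values
print(equal_frequency_buckets(incomes, 3))  # [0, 2, 1, 0, 2, 1]
```

Each bucket index is a key; a display name such as "Low", "Medium", or "High" could then be attached to it, which is where the names vs. keys distinction above comes in.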
Key Performance Indicators
Actual Value
Goal Value
Status
Trend
Graphical Representation
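The five KPI elements can be sketched as plain functions. The sketch follows the common convention of expressing status and trend as numbers between -1 and 1; the specific rules (full credit at the goal, -1 at half the goal, traffic-light cut-offs at plus or minus 0.5) are hypothetical choices for illustration, and in Analysis Services these would be written as MDX expressions instead.

```python
def kpi_status(actual: float, goal: float) -> float:
    """Score in [-1, 1]: 1 at or above goal, -1 at or below 50% of goal,
    linear in between (a hypothetical rule)."""
    ratio = actual / goal
    return max(-1.0, min(1.0, (ratio - 0.75) / 0.25))


def kpi_trend(actual: float, previous: float) -> int:
    """1 if the value improved on the previous period, -1 if it worsened, 0 if flat."""
    return (actual > previous) - (actual < previous)


def indicator(status: float) -> str:
    """Graphical representation: a traffic-light color for the status."""
    if status >= 0.5:
        return "green"
    if status > -0.5:
        return "yellow"
    return "red"


# Hypothetical quarter: actual sales 90 against a goal of 100; last quarter was 80.
status = kpi_status(actual=90, goal=100)
print(round(status, 2), indicator(status), kpi_trend(90, 80))  # 0.6 green 1
```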
Closing the Loop
Integrated Data Mining
Writeback
The UDM is not read-only
Actions
ProClarity Business Intelligence Analytics
[Figure: ProClarity products over OLAP cubes: Live Client (Excel based), Live Server, Web Client Bundle (includes Dashboard Viewer), Dashboard Server, Selector and KPI Designer (all Professional clients), Web Standard (zero footprint), Business Logic Server, Web Professional (includes Business Reporter for Excel), Analytics Server, and Desktop Professional (includes Business Reporter for Excel)]
ProClarity Key Differentiators
Speed in decisions, real insight
One version of the truth
Analysis Platform
ProClarity + Microsoft: a total BI platform
Super end-user-friendly environment
All users own information
Several visualizations for quick understanding
Totally customizable platform
Low total cost of ownership and flexible to implement
Agenda
Overview & BI Challenges
Introducing the UDM
The UDM in Detail
Data Mining Overview
Data Mining Architecture
[Figure: data mining architecture. Historical datasets (from SQL, OLE DB, text file, and cube sources) pass through SSIS data transforms into the mining models; the models provide predictions for new datasets and new cubes through SSIS operations, and are consumed by LOB applications, model browsing, Web, .NET, and native clients, and reporting]
CRoss Industry Standard Process for Data Mining (CRISP-DM)
http://www.crisp-dm.org
Microsoft Mining Model Algorithms
Decision Trees (introduced in SQL Server 2000)
Clustering (introduced in SQL Server 2000)
Time Series
Sequence Clustering
Association
Naive Bayes
Neural Net
Microsoft Mining Models
When To Use What
Classification: assign cases to predefined classes
  Examples: credit risk analysis, churn analysis, customer retention
  Algorithms: Decision Trees, Naive Bayes, Neural Nets
Segmentation: taxonomy for grouping similar cases
  Examples: customer profile analysis, mailing campaign
  Algorithms: Clustering, Sequence Clustering
Association: advanced counting for correlations
  Examples: market basket analysis, advanced data exploration
  Algorithms: Decision Trees, Association
Time series forecasting: predict the future
  Examples: forecast sales, predict stock prices
  Algorithms: Time Series
Prediction: predict a value for a new case based on values for similar cases
  Examples: quote insurance rates, predict customer income
  Algorithms: All
Deviation analysis: discover how a case or segment differs from others
  Examples: credit card fraud detection, network intrusion analysis
  Algorithms: All
Thank You
Javier Loria
Business Intelligence, Solid Quality Learning
javier@solidqualitylearning.com
Decision Trees
Classify each case into one of a few discrete, broad categories of the selected attributes
The building process is recursive partitioning: splitting the data into partitions and then splitting each partition further
Initially, all cases are in one big box
Decision Trees (cont.)
The algorithm tries all possible breaks into classes, using all possible values of each input attribute; it then selects the split that partitions the data into the purest classes of the predicted variable
Several measures of purity exist
Then it repeats the splitting for each new class
Again testing all possible breaks
Branches that add no predictive value can be pre-pruned or post-pruned
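The split search described above can be sketched in a few lines. This sketch uses entropy as the purity measure (one common choice among several) and only considers two-way tests of the form "attribute == value" on discrete attributes; the churn data is hypothetical, and this is not the Microsoft Decision Trees implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Impurity of a list of class labels (0.0 means the list is pure)."""
    total = len(labels)
    return sum(n / total * log2(total / n) for n in Counter(labels).values())


def best_split(rows, target, inputs):
    """Try every test 'attribute == value' on every input attribute and return
    (weighted entropy, attribute, value) for the purest two-way split."""
    best = None
    for attr in inputs:
        for value in sorted({row[attr] for row in rows}):
            left = [row[target] for row in rows if row[attr] == value]
            right = [row[target] for row in rows if row[attr] != value]
            if not left or not right:
                continue  # every row on one side: not a real split
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best


# Hypothetical churn data.
rows = [
    {"Age": "young", "Gender": "F", "Churn": "yes"},
    {"Age": "young", "Gender": "M", "Churn": "yes"},
    {"Age": "old", "Gender": "F", "Churn": "no"},
    {"Age": "old", "Gender": "M", "Churn": "no"},
    {"Age": "young", "Gender": "F", "Churn": "yes"},
]
print(best_split(rows, "Churn", ["Age", "Gender"]))  # (0.0, 'Age', 'old')
```

Growing the full tree would mean calling `best_split` again on each of the two partitions, stopping when no split improves purity enough (a simple form of pre-pruning).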
Decision Trees (cont.)
Decision trees are used for classification and prediction
Typical questions:
Predict which customers will leave
Help in mailing and promotion campaigns
Explain reasons for a decision
Which movies are young female customers likely to buy?
Naive Bayes
Classification and Prediction Model
Calculates probabilities for each possible state of the input attribute given each state of the predictable attribute
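The probability table described above is built by counting how often each input state appears with each state of the predictable attribute; a prediction multiplies those conditional probabilities with the class prior. A minimal sketch follows, with hypothetical loan-application data. The add-one (Laplace) smoothing is an assumption added here so that unseen combinations do not zero out a class; it is not taken from the slides.

```python
from collections import Counter


def train(rows, target, inputs):
    """Count class frequencies and (input value, class) co-occurrences."""
    class_counts = Counter(row[target] for row in rows)
    cond = {attr: Counter((row[attr], row[target]) for row in rows) for attr in inputs}
    states = {attr: {row[attr] for row in rows} for attr in inputs}
    return class_counts, cond, states


def predict(model, case):
    """Return P(class | case) for every class, assuming independent inputs."""
    class_counts, cond, states = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior P(class)
        for attr, value in case.items():
            # P(value | class) with add-one smoothing (an assumption).
            score *= (cond[attr][(value, cls)] + 1) / (n + len(states[attr]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical loan applications.
rows = [
    {"Income": "high", "Employed": "yes", "Approved": "yes"},
    {"Income": "high", "Employed": "no", "Approved": "yes"},
    {"Income": "low", "Employed": "yes", "Approved": "yes"},
    {"Income": "low", "Employed": "no", "Approved": "no"},
    {"Income": "low", "Employed": "no", "Approved": "no"},
]
model = train(rows, "Approved", ["Income", "Employed"])
probs = predict(model, {"Income": "low", "Employed": "no"})
print(max(probs, key=probs.get))  # no
```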
Naive Bayes (cont.)
Used for classification
Assign new cases to predefined classes
Some typical questions:
Categorize bank loan applications
Determine which home telephone lines are used for Internet access
Assign customers to predefined segments
Quickly gain a basic understanding of the data
Cluster Analysis
Grouping data into clusters
Objects within a cluster have high similarity based on the attribute values
The class label of each object is not known
Several techniques
Partitioning methods
Hierarchical methods
Density-based methods
Model-based methods, and more
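As an example of a partitioning method, here is a minimal k-means sketch: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The data points (age, annual spend in thousands) are hypothetical; initializing with the first k points and not scaling the features are simplifying assumptions, and this is not the Microsoft Clustering implementation.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Partition `points` into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new == centroids:
            break  # assignments are stable
        centroids = new
    return centroids, clusters


# Hypothetical customers: (age, annual spend in thousands).
points = [(25, 2.0), (27, 2.5), (26, 1.8), (60, 9.0), (62, 8.5), (58, 9.5)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```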
Cluster Analysis (cont.)
Segments a heterogeneous population into a number of more homogeneous subgroups or clusters
Some typical questions:
Discover distinct groups of customers
Identify groups of houses in a city
In biology, derive animal and plant taxonomies
Sequence Clustering
Analyzes sequence-oriented data that contains discrete-valued series
The sequence attribute in the series holds a set of events in a specific order, which can be considered a model
Typically used for Web customer analysis
Can be used for any other sequential data
Sequence Clustering (cont.)
Click-Stream Analysis
User Sequence
1 frontpage news travel travel
2 news news news news news
3 frontpage news frontpage news frontpage
4 news news
5 frontpage news news travel travel travel
6 news weather weather weather weather
7 news health health business business business
8 frontpage sports sports sports weather
9 weather
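Microsoft Sequence Clustering is documented as combining clustering with Markov chain analysis. The sketch below shows only the Markov part: first-order transition probabilities ("after page A, how often does the user go to page B?") computed over all nine click-stream sequences in the table above. A real sequence clustering model would estimate such transitions per cluster; the helper name `transition_probabilities` is hypothetical.

```python
from collections import Counter, defaultdict

# The nine click-stream sequences from the table above.
sequences = [
    "frontpage news travel travel",
    "news news news news news",
    "frontpage news frontpage news frontpage",
    "news news",
    "frontpage news news travel travel travel",
    "news weather weather weather weather",
    "news health health business business business",
    "frontpage sports sports sports weather",
    "weather",
]


def transition_probabilities(sequences):
    """P(next page | current page), estimated from consecutive page pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        pages = seq.split()
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return {
        page: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for page, c in counts.items()
    }


probs = transition_probabilities(sequences)
print(probs["frontpage"])  # {'news': 0.8, 'sports': 0.2}
```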
Microsoft Mining Models
Association Rules
For market basket analysis
Identify cross-selling opportunities
Arrange attractive packages
Considers each attribute/value pair as an item
An item set is a combination of items in a single transaction
The algorithm scans through the dataset trying to find item sets that tend to appear in many transactions
Association Rules: Support
Support is the percentage of rows containing the item combination compared to the total number of rows:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels
The support for the rule "If a customer purchases Cola, then they will purchase Frozen Pizza" is 40% (transactions 1 and 3 out of 5)
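The support calculation above, written as a formula over the set of transactions T (only transactions 1 and 3 contain both cola and frozen pizza):

```latex
\operatorname{supp}(X \Rightarrow Y)
  = \frac{\lvert \{\, t \in T : X \cup Y \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
\operatorname{supp}(\text{Cola} \Rightarrow \text{Frozen pizza})
  = \frac{2}{5} = 40\%
```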
Association Rules: Confidence
What if 60% of customers buy milk and 20% of customers buy both milk and potato chips (as in the five transactions above)?
The confidence of an association rule is the support for the combination divided by the support for the condition
This gives the rule "If a customer purchases Milk, they will purchase Potato Chips" a confidence of 20% / 60% ≈ 33%
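Both measures can be checked against the five example transactions with a short sketch. The helper names `support` and `confidence` are hypothetical; the Microsoft Association algorithm computes these values internally while searching for frequent item sets.

```python
# The five example transactions, as sets of items.
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]


def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(condition, result, transactions):
    """Support of condition and result together, divided by support of the condition."""
    both = set(condition) | set(result)
    return support(both, transactions) / support(condition, transactions)


print(support({"cola", "frozen pizza"}, transactions))        # 0.4
print(confidence({"milk"}, {"potato chips"}, transactions))  # about 0.33
print(confidence({"cola"}, {"frozen pizza"}, transactions))  # about 0.67
```

Note that the rule direction matters: "Cola implies Frozen Pizza" has 40% support but about 67% confidence, because two of the three cola transactions also contain frozen pizza.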
Time Series
Predict continuous columns, such as product sales or stock performance, in a forecasting scenario
Builds a model in two stages
First stage creates a list of optimal candidate input columns
Second stage investigates each candidate input column and determines whether it improves the model
Neural Network
Data modeling tool that is able to capture and represent complex input/output relationships
Neural networks resemble the human brain in the following two ways:
A neural network acquires knowledge through learning
A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights
It can model complex relationships across all the inputs
Training can be slow
Back-Propagation
Training a neural network means finding the best weights on the inputs of each unit
The back-propagation process:
Get a training example and calculate the outputs
Calculate the error: the difference between the calculated and the expected (known) result
Adjust the weights to minimize the error
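The three back-propagation steps above can be sketched for a tiny network with two inputs, three sigmoid hidden units, and one sigmoid output, trained by plain online gradient descent on squared error. The training set (logical OR), layer sizes, learning rate, and epoch count are all hypothetical choices for illustration; the Microsoft Neural Network algorithm has its own architecture and training details.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


random.seed(0)  # reproducible initial weights
N_IN, N_HIDDEN, LEARNING_RATE = 2, 3, 0.5
# Each weight list ends with a bias weight whose input is fixed at 1.0.
w_hidden = [[random.uniform(-1, 1) for _ in range(N_IN + 1)] for _ in range(N_HIDDEN)]
w_out = [random.uniform(-1, 1) for _ in range(N_HIDDEN + 1)]


def forward(x):
    """Return the bias-extended inputs, bias-extended hidden outputs, and the output."""
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_hidden]
    hb = h + [1.0]
    y = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
    return xb, hb, y


def total_error(data):
    return sum((forward(x)[2] - target) ** 2 for x, target in data) / 2


# Hypothetical training set: logical OR.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
initial_error = total_error(data)
for _ in range(2000):
    for x, target in data:
        xb, hb, y = forward(x)  # 1. calculate the outputs
        # 2. error term at the output: (y - target) times the sigmoid slope
        delta_out = (y - target) * y * (1 - y)
        # Propagate the error back to each hidden unit (using pre-update weights).
        delta_hidden = [delta_out * w_out[j] * hb[j] * (1 - hb[j]) for j in range(N_HIDDEN)]
        # 3. adjust every weight a small step against its error gradient
        for j in range(N_HIDDEN + 1):
            w_out[j] -= LEARNING_RATE * delta_out * hb[j]
        for j in range(N_HIDDEN):
            for i in range(N_IN + 1):
                w_hidden[j][i] -= LEARNING_RATE * delta_hidden[j] * xb[i]

final_error = total_error(data)
print(f"error before: {initial_error:.4f}, after: {final_error:.4f}")
```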
