
Session 3

BUSINESS INTELLIGENCE ANALYTICS - 1
Agenda for Coming sessions
Analytics with Probabilistic Decision Making Model
Introduction to Logistic Regression
Decision Theory
Nonlinear Models
Business Analytics and Application
Sentiment Analysis and Opinion Mining
Online Business Channel and Web Analytics
Analytics in Marketing
Introduction to Markov Analysis
Markov Decision Process
Poisson Process Models
Continue
Product Development
Introduction to Life Cycle Cost
Total cost of ownership
Analytics In Operation
Introduction to Six Sigma for Problem Solving
Analytics in Finance
Brownian Process
Asset Performance Measure
Case Study
7i Technology is a medium-sized consulting firm in San Francisco that
specializes in developing various forecasts of product demand, sales,
consumption, or other information for its clients. To a lesser degree, it
has also developed ongoing models for internal use by client companies.
When contacted by a potential client, 7i Technology usually establishes a
basic agreement with the firm's top management that sets out the
general goals of the end product, the primary contact personnel in both
firms, and an outline of the overall project (including any necessary
time constraints for intermediate and final completion and a rough price
estimate for the contract). Following this step, a team of 7i personnel is
assembled to determine the most appropriate forecasting technique and
to develop a more detailed work program to be used as the basis for
final contract negotiations. This team, which varies in size according to the
scope of the project and the client's needs, will perform the tasks
established in the work program in conjunction with any personnel from
the client firm who are included in the team.
Continue
Recently, 7i has been contacted by a rapidly growing multinational firm that
manufactures and sells Android-based tablets for enterprise and retail use.
Honeycomb has been aggressive in global and regional markets and is in the
process of defining a new strategy to increase its present market share. The
problem the company presently faces is forecasting demand, so that it can
offer a competitive price to increase its market share.
As a Business Analyst at 7i, you must decide between two forecasting
techniques for weekly sales of tablets: a linear trend equation and the naive
approach. The linear trend equation is
Yi = 12 + 2x, and it was developed using data from periods 1 through 10. Based
on the data from periods 11 through 20, calculate the MPE (Mean Percentage
Error) and MAPE (Mean Absolute Percentage Error).
Based on the values of MPE and MAPE, comment on which of the two methods
has the greater overall accuracy. Compare the two methods in terms of
forecast bias.
Data
T     Units Sold (thousands)
11    25
12    28
13    34
14    40
15    44
16    39
17    48
18    50
19    47
20    54
Since the MAPE values for the two methods are
approximately equal, the overall accuracy of the two methods
is about the same. Both methods are predicting
approximately 11% away from actual.
Since the MPE is -7.91% for the linear trend equation, the
trend equation is overestimating sales by 7.91%.
On the other hand, the MPE is positive for the naive forecasting method.
It is underestimating sales by 7.66%.
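A minimal Python sketch of the MPE and MAPE calculation for this case. It assumes that x in the trend equation is the period number t, that each percentage error is (Actual - Forecast) / Actual, and that the naive forecast for a period is the previous period's actual value, so the naive method covers periods 12 through 20 only because period 10 is not given. The function names are illustrative. Under these assumptions the naive MPE comes out at about 7.66%, in line with the figure above; the trend-line values depend on the indexing and rounding choices and may not match the quoted figures exactly.

```python
# Actual weekly sales (thousands of units) for periods 11-20, from the Data slide.
actual = {11: 25, 12: 28, 13: 34, 14: 40, 15: 44,
          16: 39, 17: 48, 18: 50, 19: 47, 20: 54}


def trend_forecast(t):
    # Assumption: x in "Yi = 12 + 2x" is the period number t.
    return 12 + 2 * t


def naive_forecast(t):
    # Naive approach: the forecast for period t is the actual of period t - 1.
    return actual[t - 1]


def mpe_and_mape(periods, forecast):
    # Percentage error = 100 * (Actual - Forecast) / Actual for each period.
    pct = [100.0 * (actual[t] - forecast(t)) / actual[t] for t in periods]
    mpe = sum(pct) / len(pct)                    # signed: reveals bias
    mape = sum(abs(p) for p in pct) / len(pct)   # unsigned: overall accuracy
    return mpe, mape


naive_mpe, naive_mape = mpe_and_mape(range(12, 21), naive_forecast)
trend_mpe, trend_mape = mpe_and_mape(range(11, 21), trend_forecast)

print(f"naive: MPE = {naive_mpe:.2f}%  MAPE = {naive_mape:.2f}%")
print(f"trend: MPE = {trend_mpe:.2f}%  MAPE = {trend_mape:.2f}%")
```

A negative MPE means the forecasts are too high on average; a positive MPE means they are too low.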
Accuracy and Control
Forecast Errors
Forecast error is the difference between the value that occurs
and the value that was predicted for a given time period.
Error=Actual-Forecast
Positive errors result when the forecast is too low, and
negative errors when it is too high.
Reasons for Forecasting Errors
The model may be inadequate due to (a) the omission of an
important variable, (b) a change or shift in the variable that the
model cannot deal with (e.g., sudden appearance of a trend or
cycle), or (c) the appearance of a new variable (e.g., new
competitor)
Irregular variations due to severe weather or other natural
phenomena, temporary shortages or breakdowns, catastrophes, or
similar events may occur.
The forecasting technique may be used incorrectly or the results
may be misinterpreted
There are random variations in the data. Randomness is the
inherent variation that remains in the data after all causes of
variation have been accounted for
Measures of Forecast Accuracy
Mean Absolute Deviation (MAD)
Definition: measures the average forecast error over a
number of periods, without regard to the sign of the error;
for computation, all errors are treated as positive.
Mean Squared Error (MSE)
Definition: the average squared error experienced over a
number of periods.
Formula
$$\mathrm{MAD} = \frac{\sum |e_i|}{n} = \frac{\sum |A_i - F_i|}{n}$$
$$\mathrm{MSE} = \frac{\sum e_i^2}{n - 1} = \frac{\sum (A_i - F_i)^2}{n - 1}$$
Continue
The MSE is a variance, and n-1 is used in its denominator
instead of n for essentially the same reason that n-1 is used to
compute a sample standard deviation.
Difference Between the Two Measures
MSE squares each error, so it tends to emphasize large errors
more than the MAD measure.
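The contrast between the two measures can be sketched in Python. The data below are hypothetical and chosen so that both forecasts have the same MAD while one of them contains a single large error; the MSE uses the n-1 denominator described above.

```python
def mad(actual, forecast):
    # Mean Absolute Deviation: average of |Actual - Forecast|.
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(abs(e) for e in errors) / len(errors)


def mse(actual, forecast):
    # Mean Squared Error with n - 1 in the denominator.
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(e * e for e in errors) / (len(errors) - 1)


# Hypothetical data: same total absolute error, distributed differently.
actual = [100, 100, 100, 100]
steady = [98, 102, 98, 102]   # four errors of size 2
spiky = [100, 100, 100, 92]   # one error of size 8

print("MAD:", mad(actual, steady), mad(actual, spiky))
print("MSE:", mse(actual, steady), mse(actual, spiky))
```

Both forecasts have a MAD of 2, but the MSE of the forecast with the single large miss is four times higher, which is the emphasis on large errors noted above.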
Monitoring and Controlling Forecast
Is it time to reexamine the validity of the forecasting
technique being used?
There are two types of errors
Random errors, which are inherent and cannot be removed from the model
Non-random errors, which can be eliminated
How to eliminate such errors?
Modifying the technique
Improving data collection
Forecast Error
Mean Forecast Error (MFE)
$$\mathrm{MFE} = \frac{\sum (A_i - F_i)}{n}$$
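A short Python sketch of the Mean Forecast Error, using hypothetical data. Because the errors keep their sign, the MFE indicates bias: a consistently high forecast gives a negative value, while errors that alternate in sign cancel out.

```python
def mfe(actual, forecast):
    # Mean Forecast Error: average of (Actual - Forecast), sign retained.
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical data for illustration only.
actual = [50, 52, 54, 56]
too_high = [53, 55, 57, 59]   # every forecast is 3 units above actual
balanced = [53, 49, 57, 53]   # errors alternate between -3 and +3

print("MFE (too high):", mfe(actual, too_high))
print("MFE (balanced):", mfe(actual, balanced))
```

An MFE near zero therefore signals the absence of bias, not the absence of error, which is why it is read alongside MAD or MSE.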
The response variable, Y, is categorical
Analytics In Decision Making
Logistic Model
Introduction
Quiz
Difference Between Linear and
Nonlinear Regression Models
Linear Regression Model
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
Exponential Regression Model
$$Y_i = \beta_1 e^{\beta_2 X_i} + u_i$$
Business Problem in Marketing/ Retail
What is the probability of success of engaging Chetan Bhagat
to promote Huawei Technologies' products?
Which channel of delivery is more effective?
What is the impact of the price label on the buyer's decision?
Business Problem in Banking and
Finance
How to distinguish between good and bad credit risks?
How to identify the most profitable customers?
How will customers react in terms of their investment in mutual
funds during bad market situations?
Definition
Logistic regression, also known as logit analysis, is a statistical
model used to predict the probability of occurrence of an
event.
Logistic regression differs from multiple regression, however,
in being specifically designed to predict the probability of an
event occurring (i.e., the probability of an observation being
in the group). Although probability values are metric
measures, there are fundamental differences between the two.
What this model explains
Logistic regression models how the probability, P, of an event
may be affected by one or more explanatory variables.
Classification
Classifying customers by their buying habits into various
categories.
Classification by a telecom operator of its customers in
terms of usage.
Challenger launch temperature vs
damage data
Equation
$$P_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}}$$
$$P_i = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}}, \qquad Z_i = \beta_1 + \beta_2 X_i$$
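A minimal Python sketch of the logistic probability, written with Z = beta1 + beta2 * X. The coefficient values are hypothetical and chosen only so that the output is easy to read; the sketch also checks numerically that the two algebraic forms, 1 / (1 + e^(-Z)) and e^Z / (1 + e^Z), agree.

```python
import math


def logistic_probability(x, beta1, beta2):
    # P = 1 / (1 + e^(-Z)) with Z = beta1 + beta2 * x.
    z = beta1 + beta2 * x
    return 1.0 / (1.0 + math.exp(-z))


def logistic_probability_alt(x, beta1, beta2):
    # Equivalent form: P = e^Z / (1 + e^Z).
    z = beta1 + beta2 * x
    return math.exp(z) / (1.0 + math.exp(z))


# Hypothetical coefficients, for illustration only.
beta1, beta2 = -4.0, 0.8

for x in (0, 5, 10):
    p = logistic_probability(x, beta1, beta2)
    p_alt = logistic_probability_alt(x, beta1, beta2)
    print(f"x = {x:2d}  P = {p:.4f}  (alternative form: {p_alt:.4f})")
```

With these coefficients Z is zero at x = 5, so the probability there is exactly 0.5, and it rises toward 1 as x increases.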
Representation of Binary Dependent
Variable
Logistic regression represents the two groups of interest as a
binary variable with values of 0 and 1.
The assignment of values is not important, but the
interpretation of the coefficients is done in this format.
An example is whether to launch a marketing campaign for a
particular reason. The result would be success or failure.
Use of Logistic Curve: Sigmoid or S-shaped
[Figure: logistic curve. Vertical axis: Probability of Event (Dependent Variable). Horizontal axis: Level of the Independent Variable, from Low to High.]
Explanation
A binary dependent variable takes only the values 0 and 1, so the
predicted probability must fall between 0 and 1.
To define such a bounded relationship, logistic regression uses the
logistic curve between the independent and dependent variables.
Unique Nature of the Dependent
Variable
The binary nature of the dependent variable (0 or 1) has
properties that violate the basic assumptions of multiple regression.
The error term of a discrete variable does not follow the normal distribution.
Logit Function
The logit function is a logarithmic transformation of the
logistic function. It is defined as the natural logarithm of
odds.
The logit of a variable (with value between 0 and 1) is given
by
$$\mathrm{Logit}(\pi) = \ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 X$$
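The relationship between probability, odds, and the logit can be sketched in Python. The function names are illustrative; the round trip at the end shows that the logistic function undoes the logit.

```python
import math


def odds(p):
    # Odds of an event with probability p, for 0 < p < 1.
    return p / (1.0 - p)


def logit(p):
    # Natural logarithm of the odds.
    return math.log(odds(p))


def inverse_logit(z):
    # Logistic function: maps a logit value back to a probability.
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p = {p}  odds = {odds(p):.4f}  logit = {z:.4f}  "
          f"back to p = {inverse_logit(z):.4f}")
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0; probabilities below 0.5 give negative logits and those above give positive ones, so the logit is unbounded even though the probability is not.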
Logistic Transformation
The logistic regression model is given by
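$$\pi_i = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}}$$
$$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i$$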
More robust
Error terms need not be normal
No requirement for equal variance of the error terms
No requirement for a linear relationship between the dependent
and independent variables.
In standard regression, the error term is assumed to follow a
normal distribution, whereas in logistic regression this is
not the case.
In binary logistic regression, the error for a given
value of X (the explanatory variable) is either 1 - π(X) or -π(X).
Thus the error will not follow a normal distribution.
Binary Logistic Model
Binomial (or binary) logistic regression is a model in which
the dependent variable is dichotomous.
The independent variables may be of any type.
Estimation of parameters
No closed-form solution exists for estimating the regression
parameters of logistic regression.
Estimation of parameters in logistic regression is carried out
using the Maximum Likelihood Estimation (MLE) technique.
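Because there is no closed-form solution, the estimates are found iteratively. The Python sketch below uses plain gradient ascent on the log-likelihood for a single explanatory variable. This is one simple choice made for illustration, not necessarily the algorithm a statistics package would use, and the data, learning rate, and step count are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    # Sum over observations of y*ln(p) + (1 - y)*ln(1 - p).
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, rate=0.01, steps=20000):
    # Gradient ascent: move the parameters in the direction that
    # increases the log-likelihood until the gradient is close to zero.
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += rate * sum(residuals)
        b1 += rate * sum(r * x for r, x in zip(residuals, xs))
    return b0, b1


# Hypothetical data: the two groups overlap, so finite estimates exist.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"log-likelihood at start: {log_likelihood(0.0, 0.0, xs, ys):.4f}")
print(f"log-likelihood at fit:   {log_likelihood(b0, b1, xs, ys):.4f}")
```

The fitted parameters give a higher log-likelihood than the starting point, which is the sense in which MLE picks the values that make the observed data most likely.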
Maximum Likelihood Estimator
(MLE)
MLE is a statistical method for estimating the model parameters of
a function.
For a given data set, the MLE chooses the values of the
model parameters that make the data more likely than
any other parameter values.
Assume that X1, X2, X3, ..., Xn are sample observations
of a distribution f(x, θ), where θ is an unknown parameter.
The likelihood function is L(θ) = f(X1, X2, ..., Xn; θ), which is
the joint probability density function of the sample.
The value of θ, θ*, which maximizes L(θ) is called the
maximum likelihood estimator of θ.
E.g. Exponential Distribution
Let x1, x2, ..., xn be sample observations that follow an
exponential distribution with parameter λ.
That is:
f(x, λ) = λe^(-λx)
The likelihood function is given by (assuming independence):
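Under independence, the joint density is the product of the individual densities, so L(λ) = λ^n e^(-λ Σ xi) and the log-likelihood is n ln(λ) - λ Σ xi. Setting its derivative to zero gives the standard result λ* = n / Σ xi, the reciprocal of the sample mean. The Python sketch below checks this numerically on hypothetical observations.

```python
import math


def log_likelihood(lam, xs):
    # ln L(lambda) = n * ln(lambda) - lambda * sum(x_i)
    return len(xs) * math.log(lam) - lam * sum(xs)


def mle_rate(xs):
    # Closed-form maximizer: lambda* = n / sum(x_i) = 1 / sample mean.
    return len(xs) / sum(xs)


# Hypothetical sample observations, for illustration only.
xs = [0.5, 1.2, 0.3, 2.0, 1.0]

lam_hat = mle_rate(xs)
print(f"lambda* = {lam_hat:.4f}")
for lam in (0.8 * lam_hat, lam_hat, 1.2 * lam_hat):
    print(f"lambda = {lam:.4f}  log-likelihood = {log_likelihood(lam, xs):.4f}")
```

The log-likelihood is largest at λ* and falls off on either side, which is what it means for λ* to be the maximum likelihood estimator.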
