
Regression Analysis and Modeling for Decision Support

Regression Analysis: Estimating Models for Decision Support

Regression analysis is the study of the relationship between a set of independent variables and the dependent variable.
Independent variables are characteristics that can be measured directly (for example, the area of a house). These variables are also called predictor variables (used to predict the dependent variable) or explanatory variables (used to explain the behavior of the dependent variable).

Dependent variable is a characteristic whose value depends on the values of independent variables.

Y = B0 + B1*X1 + B2*X2 + ... + E

Y: Dependent Variable
B0: Constant term
B1, B2, ...: Coefficients
X1, X2, ...: Independent Variables
E: Random Error (may be positive or negative)

Purpose of Regression Analysis


Timeline: Past / Experience / Known -> Now -> Future / Unknown

Explanation: Use regression analysis to develop a mathematical model to explain the variance in the dependent variable based on values of independent variables.
Prediction: If the regression model adequately explains the dependent variable, use the model to predict values of the dependent variable.

Example: Explain the Selling Price of a house (dependent) based on its characteristics (independents). If the model is valid, use it for prediction.

Develop the regression model using known data (sample):
Selling Price = 40,000 + 100(Sq.ft) + 20,000(#Baths)

If the above model is reliable and valid, use it to predict the Selling Price of any house based on its area (Sq.ft.) and the number of bathrooms (#Baths).

The constant term (40,000) is the fixed part of the price. It does not depend on the values of the variables considered and can be interpreted as the price of the lot plus transaction costs.
The coefficient of Sq.ft. (100) is the change in Selling Price for an additional square foot. It can be interpreted as the price per square foot.
Likewise, the coefficient of #Baths (20,000) is the change in Selling Price for an additional bathroom.
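
As a minimal sketch, the house-price model above can be written as a small function. The function name and the example house (2,000 sq. ft., 2 bathrooms) are illustrative assumptions; only the three coefficients come from the slide.

```python
def predict_selling_price(sq_ft: float, baths: int) -> float:
    """Selling Price = 40,000 + 100(Sq.ft) + 20,000(#Baths)."""
    constant = 40_000        # fixed part: lot and transaction costs
    price_per_sq_ft = 100    # change in price per additional square foot
    price_per_bath = 20_000  # change in price per additional bathroom
    return constant + price_per_sq_ft * sq_ft + price_per_bath * baths


# Hypothetical house, not taken from the slides.
price = predict_selling_price(sq_ft=2_000, baths=2)
print(price)  # 40,000 + 200,000 + 40,000 = 280,000
```

Adding one square foot to the same house raises the prediction by exactly the Sq.ft. coefficient, which is what "price per square foot" means in this model.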

Procedure for Building Regression Models

Define Objectives

Define and clarify the purpose. Identify, graph, and describe the measurement of the dependent variable (the specific y).

Select Variables

Identify possible independent variables (predictors should make sense). Describe each X. Use scatter plots and correlations for y with each x.

Estimate Model

Estimate Regression Coefficients (using least squares method).

Test Model

Test to see if all coefficients are significant (reliability). Establish validity (are relationships as expected, do predictions match actuals?).

Implement and Use

Implement the model in Decision Support System. Incorporate error in predictions. Outline limitations/constraints of the model.

Monitor Performance

Compare predictions with actual values. Modify/Refine/Expand model if necessary.

Selecting Independent Variables: Scatter Plots


Scatter plots are used to visualize the relationship between any two variables. For regression analysis, we are looking for a strong linear relationship between the independent and dependent variable.

[Figure: Sales vs. Promotion (Ex. 11.1). Trendline: y = 25.12 + 0.7623x, R2 = 0.4529]
[Figure: Overhead (y) vs. Production Runs (x) (11.2). Trendline: y = 75606 + 655.07x, R2 = 0.271]
[Figure: Overhead vs. Machine Utilization (Machine Hours). Trendline: y = 48621 + 34.702x, R2 = 0.3993]

Y-Intercept (Constant): Value of the dependent variable when the independent variable(s) equal zero, that is, the part that does not depend on the independent variable(s).
X-Coefficient (Slope): Change in dependent variable per unit change in independent variable.
R-Squared: Proportion of variance in dependent variable explained by independent variable(s).

Model using both predictors: Overhead = 3996 + 43 M_Hrs + 883 Runs

Selecting Independent Variables: Correlation Analysis


Correlation coefficients are used to measure the linear relationship between any two variables. For regression analysis, we are looking for a strong linear relationship between each independent variable and the dependent variable, and low correlations among the independent variables.

            MachHrs     ProdRuns    Overhead
MachHrs     1
ProdRuns    -0.22909    1
Overhead    0.631885    0.520544    1

Correlation of MachHrs with ProdRuns: -0.22909 (should be low)
Correlation of MachHrs with Overhead: 0.631885 (should be high)
Correlation of ProdRuns with Overhead: 0.520544 (should be high)

Multicollinearity exists when two independent variables are highly correlated (redundancy).
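
The screening logic described here can be sketched in plain Python. This is an illustration, not the procedure used to produce the slide's matrix: the function names, the sample data, and the 0.7 cut-off for "highly correlated" are all assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_predictors(y, predictors, threshold=0.7):
    """Return each predictor's correlation with y, plus predictor pairs
    whose absolute correlation exceeds the (assumed) threshold."""
    names = list(predictors)
    with_y = {name: pearson_r(predictors[name], y) for name in names}
    collinear = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) > threshold:
                collinear.append((a, b, r))
    return with_y, collinear


# Made-up numbers for illustration only (not the data behind the slide).
data = {
    "MachHrs": [1500, 1300, 1700, 1400, 1600, 1800],
    "ProdRuns": [40, 35, 30, 45, 25, 50],
}
overhead = [100000, 92000, 104000, 99000, 97000, 118000]
with_y, collinear = screen_predictors(overhead, data)
print(with_y, collinear)
```

A predictor is attractive when its entry in `with_y` is large in absolute value; any pair returned in `collinear` signals redundancy between predictors.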

Simple Linear Regression

Linear regression function with one dependent and one independent variable.
Mathematical form: Y = B0 + B1X + E
B0 (y-intercept) and B1 (slope) are parameters (unknown constants for the population); their values are estimated from a known sample of X and corresponding Y.

Estimated Model: Y-pred = b0 + b1X

b0 and b1 are estimates (based on a sample) of B0 and B1, which are parameters (based on the population). Estimation of b0 and b1 (coefficients) is done by the Least Squares Method. This method selects the line that has the smallest sum of squared errors, where each error is the vertical distance between Y-actual and Y-pred.

[Figure: fitted line showing Y-actual, Y-pred, the slope (B1), and the y-intercept (B0)]
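
For one predictor, the least squares estimates have a closed form: b1 is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and b0 = mean(Y) - b1 * mean(X). The sketch below implements that formula; the function names and the tiny data set are illustrative assumptions.

```python
def fit_simple_linear_regression(x, y):
    """Least squares estimates (b0, b1) for Y-pred = b0 + b1*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # y-intercept
    return b0, b1


def r_squared(x, y, b0, b1):
    """Proportion of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Made-up points that lie exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1, r_squared(x, y, b0, b1))
```

Because the sample points fall exactly on a line, the fit recovers that line and the squared error is zero; real data will leave a positive residual.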

Example of Simple Linear Regression: Defining Objective(s)


Define Objectives

Pharmex is a chain of drugstores that operates around the country. To see how effective their advertising and other promotional activities are, the company has collected data from 50 randomly selected metropolitan regions. In each region it has compared its own promotional expenditures and sales to those of the leading competitor in the region over the past year.

So, Pharmex's objective is to model the relationship between Promotion expenditures and Sales.
Since Pharmex is interested in improving its sales relative to its largest competitor, the dependent (outcome, or predicted) variable for this situation is Sales: Pharmex's sales as a percentage of those of the leading competitor.

Example of SLR: Select Independent Variable


Variable Selection

The company expects a positive relationship between the relative measures of Sales and Promotion Expenditures, so that regions with relatively more expenditures have relatively more sales.
Promote: Pharmex's promotional expenditures as a percentage of those of the leading competitor. This is the independent variable (or predictor variable), one which can be controlled by Pharmex.

Selection Criteria:
Based on common sense and experience
Scatter plots and correlations

Description of Variables:
List each variable, how it is measured, and its expected relationship with the dependent variable. In this section, report the results of correlation analyses, scatter plots, etc.

[Figure: Sales vs. Promotion (Ex. 11.1). Trendline: y = 25.12 + 0.7623x, R2 = 0.4529]

Example of SLR: Collect and Organize Data

Data Collection

Columns of the dataset (one row per Region):

Region
Pharmex ($): Sales (Sp), Prom (Pp)
Competitor ($): Sales (Sc), Prom (Pc)
Indexes (regression data): Sales = Sp/Sc, Promote = Pp/Pc

Collect all relevant data and organize it in a dataset that can be analyzed by a solver (like Excel).

Example of SLR: Estimate Coefficients


Estimate Model

Regression Procedure in Excel

R-Square: 45% of the variance in Sales is explained by Promote (the model).
Estimated Coefficients: Y-intercept (b0) = 25.12, Slope (b1) = 0.762
Sales-predicted = 25.12 + 0.762 Promote
P-Value: The probability of seeing a coefficient at least this far from 0 if there were in fact no relationship (true coefficient = 0). Treating a coefficient as nonzero only when its p-value is below .05 keeps the chance of a Type I error at 5%. If this value is greater than .05, do not use the variable as a predictor.

Example of SLR: Testing the Model

Reliability and Validity:
Does the model make intuitive sense?
Is the model easy to understand and interpret?
Are all coefficients statistically significant? (p-values less than .05)
Are the signs associated with the coefficients as expected?
Does the model predict values that are reasonably close to the actual values?
Is the model sufficiently sound? (High R-square, low standard error, etc.)

Example of SLR: Implementing and Using the Model

Develop a Spreadsheet Model (Decision Support System)

Competitor's Promotion: 125 (Estimated)
Pharmex's Promotion: 150 (Decision Variable)
Promotion Index (Promote): 120
Predicted Sales Index (Sales): 116.602 (Forecast, from the regression formula)


What if Pharmex spent 160K on promotions? (Sensitivity analysis)
What will Pharmex have to do to achieve 20% more sales than its competitor? (Goal seeking)
What will happen to Pharmex's sales if its competitor's promotion can be any value between 130K and 140K? (Monte Carlo simulation)
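
The three questions above can be sketched in code using the fitted line. Several things here are assumptions: the function names, the rounded coefficients 25.12 and 0.7623 taken from the trendline (so the base forecast comes out near, not exactly at, the spreadsheet's 116.602), and the uniform distribution used for the competitor's promotion in the simulation.

```python
import random

B0, B1 = 25.12, 0.7623  # rounded trendline coefficients


def promote_index(pharmex_promo, competitor_promo):
    """Pharmex's promotion as a percentage of the competitor's."""
    return 100 * pharmex_promo / competitor_promo


def predicted_sales_index(promote):
    """Sales-predicted = b0 + b1 * Promote."""
    return B0 + B1 * promote


def promotion_needed(target_sales_index, competitor_promo):
    """Goal seeking: invert the regression line for a target Sales index."""
    promote = (target_sales_index - B0) / B1
    return promote * competitor_promo / 100


def simulate_sales(pharmex_promo, low, high, trials=10_000, seed=0):
    """Monte Carlo: competitor promotion drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    draws = [
        predicted_sales_index(promote_index(pharmex_promo, rng.uniform(low, high)))
        for _ in range(trials)
    ]
    return min(draws), sum(draws) / trials, max(draws)


base = predicted_sales_index(promote_index(150, 125))     # base case
what_if = predicted_sales_index(promote_index(160, 125))  # sensitivity analysis
needed = promotion_needed(120, 125)                       # goal seeking (Sales = 120)
print(base, what_if, needed, simulate_sales(150, 130, 140))
```

Goal seeking here has a closed form because the model is a straight line; a spreadsheet's goal-seek tool would reach the same answer numerically.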

Estimating Demand for a Product

Regression for Decision Support


Conceptual Structure of Demand Model

Firm Demand = Total Industry Demand * Market Share


Define:
Market Share = Firm Demand / Total Industry Demand
Market Share = Firm Demand / (Avg. Demand * Number of Firms)
Market Share = Relative Demand / Number of Firms

Firm Demand = Total Industry Demand * (Relative Demand / N)


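[Diagram: Exogenous Demand and Endogenous Demand feed Total Industry Demand (TID); TID combined with RD/N gives Firm Demand (FD)]

Exogenous Demand: Macro-economic Influences (Seasonal Patterns, Stage of Life Cycle)
Endogenous Demand: Industry Activity (Pricing, Promotion, Quality)
Relative Demand: Competitive Profile (Relative Pricing, Promotion, Quality, and Loyalty)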

Total Industry Demand (TID)

Exogenous Demand: Macro-Economic Influences (Seasonality, Stage of Life Cycle)
Estimate Trend and Seasonality using Time Series Analysis

Endogenous Demand: Industry Behavior
- Pricing (Avg. Price)
- Promotion (Avg. Advertising)
- Product Quality (Avg. R&D)
Estimate weights of these factors using Regression Analysis

Relative Demand (Measure of Market Share)

[Diagram: four drivers feed Relative Demand (RD)]

Pricing: Relative Price (PREL)
Promotion: Relative Advertising
Quality: Relative R&D
Loyalty: RD1 (t-1)

Because RD is firm specific and a measure of market share, the predictor variables should also be relative to industry averages. For example, the relative price of the firm is PREL = Firm's Price / Industry Avg. Price

Estimating Demand

Situation Overview


Objectives of Analysis


Objectives

Obtaining reliable forecasts for the Firm Demand

Build a more dependable model based on regression analysis

Make operations more efficient based on more reliable forecasts

Monitor patterns in the overall demand for the industry

Defining the Data


Descriptions of the Variables

FD - Firm Demand
Demand for our firm's product

TID - Total Industry Demand
Includes demand for all competitors in the market

MS - Market Share
Firm's percentage share of the market

N - Number of Competitors

AFD - Average Demand
Average demand for all competitors in the market
Avg. Demand = Total Industry Demand / N

RD - Relative Demand
The firm's demand relative to the average demand of the market
Relative Demand = Firm Demand / Avg. Demand

Descriptions of the Variables (continued)


Firm Demand = Total Industry Demand * Market Share
Market Share = Firm Demand / Total Industry Demand
Market Share = Firm Demand / (Avg. Demand * N)
Market Share = Relative Demand / N

Hence: Firm Demand = Total Industry Demand * (Relative Demand / N)
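
A quick numeric check of these identities, as a sketch. The function name and the input values are illustrative assumptions; the point is only that the definitions of Average Demand, Relative Demand, and Market Share agree with each other.

```python
def firm_demand(total_industry_demand, relative_demand, n_firms):
    """Firm Demand = Total Industry Demand * (Relative Demand / N)."""
    market_share = relative_demand / n_firms
    return total_industry_demand * market_share


# Illustrative values: an industry of 10 firms selling 25,000 units,
# of which our firm sells 3,000.
tid, n, fd = 25_000, 10, 3_000

avg_demand = tid / n               # Avg. Demand = TID / N
relative_demand = fd / avg_demand  # RD = FD / Avg. Demand
market_share = fd / tid            # MS = FD / TID

print(relative_demand, market_share, firm_demand(tid, relative_demand, n))
```

Starting from the firm's demand, computing RD, and then applying the final formula returns the original firm demand, which confirms that Market Share = RD / N.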

Predictor Variables for Total Industry Demand

Total Industry Demand is made up of individual company demand (Company 1 Demand, Company 2 Demand, ..., Company n Demand)

Demand for each company depends on macro-economic influences and overall industry trends

Average Price: Indicates industry trends in pricing, and is reasonable to use instead of individual competitor prices

Average Advertising: Indicates industry expenditures in promotions and marketing

In this case, also factors in R&D

Predictor Variables for Relative Demand

Relative Demand is a measure of the firm's market share

Hence, the predictor variables should also be relative to the industry average

Relative Pricing
PREL - Relative Price
PREL = Firm Price / Average Price

Relative Advertising
AREL - Relative Advertising
AREL = Firm Advertising / Average Advertising

Relative Loyalty
RD1 - Relative Loyalty
This is estimated for the analysis by using the Relative Demand from the previous quarter

Description of Variables


TID Summary Measures

Measure               Avg_Price    Avg_Adv         TID
Count                 19.000       19.000          19.000
Mean                  377.784      93325.789       20677.895
Median                378.900      93400.000       19580.000
Standard deviation    6.103        9828.045        5233.006
Minimum               365.000      76800.000       12020.000
Maximum               387.500      108570.000      32850.000
Range                 22.500       31770.000       20830.000
Variance              37.249       96590470.175    27384350.877
First quartile        374.850      86645.000       16520.000
Third quartile        382.450      100825.000      23995.000
Interquartile range   7.600        14180.000       7475.000
Skewness              -0.735       -0.338          0.577
5th percentile        366.980      76980.000       14540.000
95th percentile       384.440      105051.000      28836.000

TID Describing Avg. Price with Graphs

[Figure: time series plot of Avg_Price by Quarter (quarters 1 to 19), and box plot of Avg_Price]

Values seem to be trending down over time

Relatively stable - In the 375 range for the 19 quarters observed

Median is to the right of the Mean in the Box-Plot

Indicates negative skewness

TID Describing Avg. Advertising with Graphs


[Figure: time series plot of Avg_Adv by Quarter (quarters 1 to 19), and box plot of Avg_Adv]

Some indications of seasonality from the time-series plot

Values change significantly with time

Median is almost the same as the mean

Slight negative skewness exists

TID Describing Total Industry Demand with Graphs


[Figure: time series plot of TID by Quarter (quarters 1 to 19), and box plot of TID]

Time series plot indicates a steadily increasing demand for the product

Short term increases are steep

Mean is to the right of the Median

Indicates positive skewness

TID Correlations Matrix

Decent correlation between Quarter and TID: consider Quarter for regression analysis
High correlation between Avg Price and TID: good candidate for regression analysis
High correlation between Avg Advertising and TID: good candidate for regression analysis
Significant correlation among Avg Price, Avg Advertising, and Quarter: potential candidates for variable exclusion during regression analysis

RD Scatter Plot Arel vs RD


[Figure: scatter plot of RD (y-axis) vs. Arel (x-axis) with trendline; Correlation = 0.378]

Trendline indicates a positive relationship
A correlation of 0.378 indicates a weak linear relationship between the two variables
Arel is a potential candidate to be discarded

RD Scatter Plot Prel vs RD


[Figure: scatter plot of RD (y-axis) vs. Prel (x-axis) with trendline; Correlation = -0.670]

Trendline indicates a negative relationship
A correlation of -0.670 indicates a fair amount of (negative) correlation between the two variables

RD Scatter Plot RD1 vs RD


[Figure: scatter plot of RD (y-axis) vs. RD1 (x-axis) with trendline; Correlation = 0.711]

Trendline indicates a positive relationship
A correlation of 0.711 indicates a fair amount of correlation between the two variables

RD Correlations Matrix

RD has a reasonably high correlation with Prel and RD1
RD has a relatively lower correlation with Arel, so Arel could be discarded during regression analysis
The correlations among Prel, Arel, and RD1 are low enough to indicate that there will not be problems related to multicollinearity

Analysis and Modeling


TID Model Regression Analysis (1)

Dependent Variable: Total Industry Demand
Independent Variable: Quarter
Resulting Equation: TID = 14218.0702 + (645.9825 * Quarter)
The R2 for the resulting equation indicates that Quarter explains 48% of the variance in TID, leaving 52% unexplained
This makes for a poor model, even though the very low p-value implies that the probability of a Type I error is minimal

TID Model Regression Analysis (2)

Dependent Variable: Total Industry Demand
Independent Variables: Quarter, Avg. Price, Avg. Advertising
Resulting Equation: TID = 130249 + (132.228 * Quarter) + (-358.613 * Avg Price) + (0.263 * Avg. Advertising)
The R2 for the resulting equation indicates that the model explains 90.69% of the variance in TID, leaving about 9% unexplained; this generally signifies a good model
The p-value is within tolerance levels for all variables except Quarter, which has a p-value of 0.21 and a confidence interval that includes 0. Hence Quarter must be discarded

TID Model Regression Analysis (3)

Dependent Variable: Total Industry Demand
Independent Variables: Avg. Price, Avg. Advertising
Resulting Equation: TID = 164336.17 + (-445.168 * Avg Price) + (0.262 * Avg. Advertising)
The R2 for the resulting equation indicates that the model explains 89.67% of the variance in TID, leaving about 10.3% unexplained
The p-value is within tolerance levels for all variables, and none of the coefficient confidence intervals include 0
This model is good, and is the model that will be used for TID

RD Model Regression Analysis (3)

Dependent Variable: Relative Demand
Independent Variables: Relative Pricing (Prel), Relative Advertising (Arel), Loyalty (RD1)
Resulting Equation: RD = 16.13 + (-16.445 * Prel) + (0.779 * Arel) + (0.533 * RD1)
The R2 for the resulting equation indicates that the model explains 95.75% of the variance in RD, leaving about 4.25% unexplained
The p-value is within tolerance levels for all variables, and none of the coefficient confidence intervals include 0
This model is very good, and is the model that will be used for RD

Verifying the Model


Verification of TID Model


TID      TID From Model   % Error
12020    12007.57         0.10341098
14820    16117.2828       -8.7535951
18140    20085.4692       -10.724748
16610    13916.7612       16.2145623
18400    20310.882        -10.385228
19580    18629.3292       4.85531563
22270    21969.4348       1.34964167
21740    19989.666        8.05121435
16380    15650.9924       4.45059585
17390    19244.7508       -10.665617
25800    26154.594        -1.3743953
28390    29315.8204       -3.2610793
16430    19095.1556       -16.221276
15960    14403.594        9.75191729
23060    22142.378        3.97928014
24410    23690.334        2.94824252
23580    24150.4564       -2.4192383
25050    25619.302        -2.2726627
32850    29097.85         11.42207
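
The % Error values in the TID verification data appear consistent with 100 * (actual - model) / actual. The sketch below assumes that definition and checks it against the first three rows; the function names are illustrative, and the mean absolute percent error is an added summary statistic that is not part of the original slides.

```python
def percent_error(actual, predicted):
    """Signed percent error, relative to the actual value."""
    return 100 * (actual - predicted) / actual


def mean_absolute_percent_error(actuals, predictions):
    """Average size of the percent errors, ignoring sign (added summary)."""
    errors = [abs(percent_error(a, p)) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


# First three rows of the TID verification data.
tid_actual = [12020, 14820, 18140]
tid_model = [12007.57, 16117.2828, 20085.4692]

for a, p in zip(tid_actual, tid_model):
    print(a, p, round(percent_error(a, p), 4))

print(mean_absolute_percent_error(tid_actual, tid_model))
```

A positive error means the model under-predicted demand for that quarter; a negative error means it over-predicted.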

Verification of RD Model
RD          RD From Model   % Error
1.083375    1.067091847     1.503002
1.162486    1.165754702     -0.28118
0.559733    0.536542721     4.143097
0.716148    0.721174808     -0.70192
1.073539    1.089967206     -1.53028
1.313764    1.315070546     -0.09945
1.206246    1.111114124     7.886607
1.264029    1.210123358     4.264589
0.926046    0.93732012      -1.21745
0.953696    1.07437361      -12.6537
0.413328    0.437045784     -5.73825
1.2594      1.355604764     -7.63894
0.976126    0.981062762     -0.50575
0.90107     0.926000479     -2.76676
0.658285    0.682931661     -3.74407
1.174001    1.137304179     3.125791
0.981371    1.023306272     -4.27313
0.666402    0.72000557      -8.04373
1.449908    1.40887709      2.829897

Using the Model


DSS Forecasting Firm Demand


A Model For Forecasting Demand

Inputs
Quarter # "t": 23
Estimates for Industry (for Qtr "t")
  Estimated Average Price: 376
  Estimated Average Advertising: 100000
  Number of Firms in Industry: 10
Your Decisions (for Qtr "t")
  Price: 378
  Advertising: 90000
Historical Data for Qtr = t-1
  Your Demand: 2500
  Total Industry Demand: 25000

Calculations
  Relative Price (Qtr = t): 1.005319
  Relative Advertising (Qtr = t): 0.9
  Average Demand (Qtr = t-1): 2500
  Your Firm's Relative Demand (Qtr = t-1): 1

Outputs
  Total Industry Demand: 23225.68
  Relative Demand: 0.833162
  Market Share: 0.083316
  Your Firm's Estimated Demand: 1935.075
  Average Demand: 2322.568

TID Model's Coefficients
  Constant: 164336.17
  Avg_Price: -445.16
  Avg_Adv: 0.26

RD Model's Coefficients
  Intercept: 16.13
  Prel: -16.44
  Arel: 0.77
  RD1: 0.53

The coefficients from the TID model and the RD model are entered into this model.
Based on the estimated values for the input items, this model calculates the Firm Demand.
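
The forecasting sheet can be sketched as one function that chains the TID model, the RD model, and the identity Firm Demand = TID * (RD / N). The function and key names are assumptions. The coefficients are the rounded values shown on the regression slides, so the results land close to, but not exactly on, the sheet's outputs (which were presumably computed with unrounded coefficients).

```python
# Rounded coefficients as published on the regression slides.
TID_COEF = {"const": 164336.17, "avg_price": -445.168, "avg_adv": 0.262}
RD_COEF = {"const": 16.13, "prel": -16.445, "arel": 0.779, "rd1": 0.533}


def forecast_firm_demand(avg_price, avg_adv, n_firms, price, advertising,
                         prev_firm_demand, prev_tid):
    """Chain the TID and RD regression models into a firm demand forecast."""
    # Total Industry Demand from industry-level estimates.
    tid = (TID_COEF["const"]
           + TID_COEF["avg_price"] * avg_price
           + TID_COEF["avg_adv"] * avg_adv)

    # Relative predictors for the RD model.
    prel = price / avg_price
    arel = advertising / avg_adv
    rd1 = prev_firm_demand / (prev_tid / n_firms)  # last quarter's RD

    rd = (RD_COEF["const"] + RD_COEF["prel"] * prel
          + RD_COEF["arel"] * arel + RD_COEF["rd1"] * rd1)

    market_share = rd / n_firms
    return {
        "tid": tid, "prel": prel, "arel": arel, "rd1": rd1, "rd": rd,
        "market_share": market_share,
        "firm_demand": tid * market_share,
        "avg_demand": tid / n_firms,
    }


# Inputs taken from the forecasting sheet.
result = forecast_firm_demand(avg_price=376, avg_adv=100_000, n_firms=10,
                              price=378, advertising=90_000,
                              prev_firm_demand=2_500, prev_tid=25_000)
for name, value in result.items():
    print(name, round(value, 4))
```

Changing `price` or `advertising` and re-running the function is the code equivalent of a what-if analysis on the sheet's decision cells.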
