Regression Analysis: Estimating Models for Decision Support
Regression analysis is the study of the relationship between a set of independent variables and a dependent variable.
Independent variables are characteristics that can be measured directly (for example, the area of a house). These variables are also called predictor variables (used to predict the dependent variable) or explanatory variables (used to explain the behavior of the dependent variable).
The dependent variable is a characteristic whose value depends on the values of the independent variables.
Explain the Selling Price of a house (dependent) based on its characteristics (independents). If the model is valid, use it for prediction.
Develop a regression model using known data (sample):
Selling Price = 40,000 + 100(Sq.ft) + 20,000(#Baths)
Constant term: 40,000. Coefficients: 100 (Sq.ft) and 20,000 (#Baths).
If the above model is reliable and valid, use it to predict the Selling Price of any house based on its area (Sq.ft.) and its number of bathrooms (#Baths).
The constant term (40,000) is the fixed part of the price, which does not depend on the values of the variables considered. It can be interpreted as the price of the lot plus transaction costs.
The coefficient of Sq.ft. (100) is the change in Selling Price for one additional square foot, with #Baths held fixed. It can be interpreted as the price per square foot.
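As a minimal sketch, the house price model can be written as a small function. The function name and the example house (2,000 sq ft, 2 bathrooms) are hypothetical and only illustrate how the constant and the coefficients combine.

```python
def predict_selling_price(sq_ft, baths):
    """Selling Price = 40,000 + 100(Sq.ft) + 20,000(#Baths)."""
    constant = 40_000   # fixed part of the price (lot, transaction costs)
    per_sq_ft = 100     # change in price per additional square foot
    per_bath = 20_000   # change in price per additional bathroom
    return constant + per_sq_ft * sq_ft + per_bath * baths


# Hypothetical house: 2,000 sq ft with 2 bathrooms.
print(predict_selling_price(2000, 2))  # 280000
```

Adding one square foot while keeping the number of bathrooms fixed raises the prediction by exactly 100, which is the "price per square foot" reading of the coefficient.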
Define Objectives
Define/clarify the purpose. Identify, graph, and describe the measurement of the dependent variable (a specific y).
Select Variables
Identify possible independent variables (predictors should make sense). Describe each x. Use scatter plots and correlations for y with each x.
Estimate Model
Estimate the regression coefficients (using the least squares method).
Test Model
Test to see if all coefficients are significant (reliability). Establish validity (are relationships as expected, do predictions match actuals?).
Implement Model
Implement the model in a Decision Support System. Incorporate error in predictions. Outline limitations/constraints of the model.
Monitor Performance
[Figure: scatter plot of Overhead (y) against Production Runs (x), with fitted line y = 75606 + 655.07x and R2 = 0.271]
Y-Intercept (Constant): Predicted value of the dependent variable when every independent variable equals zero; it does not change with the value(s) of the independent variable(s).
X-Coefficient (Slope): Change in the dependent variable per unit change in the independent variable.
R-Squared: Proportion of the variance in the dependent variable explained by the independent variable(s).
[Figure: Overhead against Machine Hours]
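To make the R-Squared definition concrete, here is a minimal sketch that computes it as 1 minus the ratio of unexplained to total variation. The function name and the sample points are hypothetical; only the fitted line y = 75606 + 655.07x comes from the slide.

```python
def r_squared(actual, predicted):
    """Proportion of the variance in `actual` explained by `predicted`."""
    mean_y = sum(actual) / len(actual)
    # Unexplained variation: squared gaps between actual and predicted values.
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Total variation: squared gaps between actual values and their mean.
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1 - sse / sst


# Hypothetical observations (production runs, overhead); not the slide's data.
runs = [20, 35, 50, 65]
overhead = [92000, 88000, 115000, 121000]
fitted = [75606 + 655.07 * x for x in runs]
print(round(r_squared(overhead, fitted), 3))
```

A value of 1 means the predictions match every observation; a value of 0 means the model does no better than always predicting the mean.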
Multicollinearity exists when two independent variables are highly correlated (redundancy).
Linear regression function
One dependent and one independent variable
Mathematical form: Y = b0 + b1X + e
b0 and b1 are parameters (unknown constants); their values are estimated from a known sample of X and the corresponding Y. e is the error term.
[Figure: fitted regression line Y-pred through the data points (*), with b1 = slope and b0 = y-intercept]
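The least squares estimates of b0 and b1 have a closed form for one independent variable. The sketch below uses the standard formulas (slope = co-variation of X and Y divided by variation of X; intercept chosen so the line passes through the means). The function name and the toy sample are assumptions made for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors of Y = b0 + b1*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of X and Y, and variation of X, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # y-intercept
    return b0, b1


# Toy sample that lies exactly on Y = 1 + 2X.
b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```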
Pharmex is a chain of drugstores that operates around the country. To see how effective their advertising and other promotional activities are, the company has collected data from 50 randomly selected metropolitan regions. In each region it has compared its own promotional expenditures and sales to those of the leading competitor in the region over the past year.
So, Pharmex's objective is to model the relationship between promotional expenditures and sales.
Since Pharmex is interested in improving its sales relative to its largest competitor, the dependent (outcome, or predicted) variable for this situation is Sales: Pharmex's sales as a percentage of those of the leading competitor.
The company expects a positive relationship between the relative measures of sales and promotional expenditures, so that regions with relatively more expenditures have relatively more sales.
Promote: Pharmex's promotional expenditures as a percentage of those of the leading competitor. This is the independent (or predictor) variable, one which Pharmex can control.
Selection Criteria:
- Based on common sense and experience
- Scatter plots and correlations
Description of Variables:
List each variable, how it is measured, and its expected relationship with the dependent variable. In this section, report the results of correlation analyses, scatter plots, etc.
Data Collection
[Table: Pharmex ($) data by Region, with columns Sales (Sp) and Prom (Pp)]
Collect all relevant data and organize it in a dataset that can be analyzed by a solver (like Excel).
R-Square: 45% of the variance in Sales is explained by Promote (the model).
Estimated Coefficients: Y-intercept (b0) = 25.12, Slope (b1) = 0.762
Sales-predicted = 25.12 + 0.762 Promote
P-Value: The probability of seeing an estimated coefficient at least this far from 0 if the true coefficient were 0 (that is, if there were no relationship). Treating the coefficient as nonzero when it is really 0 is a Type I error. If the p-value is greater than .05, do not use the variable as a predictor.
Reliability and Validity:
- Does the model make intuitive sense?
- Is the model easy to understand and interpret?
- Are all coefficients statistically significant? (p-values less than .05)
- Are the signs associated with the coefficients as expected?
- Does the model predict values that are reasonably close to the actual values?
- Is the model sufficiently sound? (High R-square, low standard error, etc.)
Example of SLR: Implementing and Using the Model
Develop a Spreadsheet Model (Decision Support System)
Competitor's Promotion: 125
Pharmex's Promotion (Decision Variable): 150
Promote (Pharmex's promotion as % of Competitor's): 120
Estimated Sales: 116.602
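The spreadsheet logic can be sketched in a few lines: turn the two promotion figures into the relative measure Promote, then apply the estimated line Sales-predicted = 25.12 + 0.762 Promote. The function names are hypothetical. With the rounded coefficients the result is 116.56 rather than the 116.602 on the slide; the gap presumably comes from the spreadsheet using unrounded coefficients.

```python
B0 = 25.12   # estimated y-intercept
B1 = 0.762   # estimated slope


def relative_promotion(pharmex_promotion, competitor_promotion):
    """Pharmex's promotion as a percentage of the competitor's."""
    return 100 * pharmex_promotion / competitor_promotion


def predicted_sales(promote):
    """Predicted Pharmex sales as a percentage of the competitor's sales."""
    return B0 + B1 * promote


# Decision variable: Pharmex spends 150 against the competitor's 125.
promote = relative_promotion(150, 125)
print(promote, round(predicted_sales(promote), 2))  # 120.0 116.56
```

Changing the decision variable (Pharmex's promotion) and rerunning the calculation is the "what if" use of the model in a decision support setting.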
[Diagram: structure of the demand model, FD = TID x RD / N]
Total Industry Demand (TID)
- Exogenous Demand: Macro-Economic Influences - Seasonality - Stage of Life Cycle
- Endogenous Demand: Industry Behavior - Pricing (Avg Price) - Promotion (Avg. Advertising) - Product Quality (Avg. R&D)
Relative Demand (RD)
- Pricing
- Promotion: Relative Advertising
- Quality: Relative R&D
- Loyalty: RD1 (t-1)
Because RD is firm-specific and a measure of market share, the predictor variables should also be measured relative to industry averages. For example, the relative price of the firm is PREL = Firm's Price / Industry Avg. Price.
Estimating Demand
Situation Overview
Objectives of Analysis
Objectives
- Make operations more efficient based on more reliable forecasts
- Monitor patterns in the overall demand for the industry
FD: Firm Demand
MS: Market Share
RD: Relative Demand
Average demand for all competitors in the market: Avg. Demand = Total Industry Demand / N, where N is the number of firms in the market.
The firm's demand relative to the average demand of the market: Relative Demand = Firm Demand / Avg. Demand
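A short sketch of these definitions follows. The function names and the numbers (an industry demand of 20,000 shared by 8 firms, one firm selling 3,000) are hypothetical. Market share is taken here as Firm Demand / Total Industry Demand, which is an assumption about what MS means on the slide; under it, RD = N x MS.

```python
def average_demand(total_industry_demand, n_firms):
    """Avg. Demand = Total Industry Demand / N."""
    return total_industry_demand / n_firms


def relative_demand(firm_demand, avg_demand):
    """Relative Demand = Firm Demand / Avg. Demand."""
    return firm_demand / avg_demand


def market_share(firm_demand, total_industry_demand):
    """Assumed definition: Firm Demand / Total Industry Demand."""
    return firm_demand / total_industry_demand


# Hypothetical quarter: 8 firms, industry demand 20,000, our firm sells 3,000.
tid, n, fd = 20_000, 8, 3_000
avg = average_demand(tid, n)      # 2500.0
rd = relative_demand(fd, avg)     # about 1.2, i.e. 20% above the average firm
ms = market_share(fd, tid)        # 0.15
print(avg, round(rd, 3), ms)
```

Rearranging the two slide formulas gives FD = RD x TID / N, which is how a predicted RD and a predicted TID combine into a firm-level forecast.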
Demand for each company depends on macro-economic influences and overall industry trends.
[Diagram: company demand, shown as Company 1 Demand, Company 2 Demand, ..., Company n Demand]
[Diagram: Relative Demand as a function of Relative Pricing, Relative Advertising, and Relative Loyalty]
Relative Loyalty is estimated for the analysis by using the Relative Demand from the previous quarter.
Description of Variables
Statistic               Avg_Price         Avg_Adv             TID
Count                      19.000          19.000          19.000
Mean                      377.784       93325.789       20677.895
Median                    378.900       93400.000       19580.000
Standard deviation          6.103        9828.045        5233.006
Minimum                   365.000       76800.000       12020.000
Maximum                   387.500      108570.000       32850.000
Range                      22.500       31770.000       20830.000
Variance                   37.249    96590470.175    27384350.877
First quartile            374.850       86645.000       16520.000
Third quartile            382.450      100825.000       23995.000
Interquartile range         7.600       14180.000        7475.000
Skewness                   -0.735          -0.338           0.577
5th percentile            366.980       76980.000       14540.000
95th percentile           384.440      105051.000       28836.000
[Figures: time series plots by Quarter of Avg_Price, Avg_Adv, and TID]
Time series plot indicates a steadily increasing demand for the product
- Decent correlation between Quarter and TID: consider Quarter for regression analysis.
- High correlation between Avg Price and TID: good candidate for regression analysis.
- High correlation between Avg Advertising and TID: good candidate for regression analysis.
- Significant correlation between Avg Price, Avg Advertising and Quarter: potential candidates for variable exclusion during regression analysis.
[Figure: scatter plot of RD against Arel, with trendline]
- Trendline indicates a positive relationship.
- The correlation coefficient of 0.378 indicates a weak correlation between the two variables.
- Arel is a potential candidate to be discarded.
[Figure: scatter plot of RD against Prel, with trendline]
- Trendline indicates a negative relationship.
- The correlation coefficient of -0.67 indicates a fair amount of correlation between the two variables.
[Figure: scatter plot of RD against RD1, with trendline]
- Trendline indicates a positive relationship.
- The correlation coefficient of 0.711 indicates a fair amount of correlation between the two variables.
RD Correlations Matrix
- RD has a reasonably high correlation to Prel and RD1.
- RD has a relatively lower correlation to Arel. The variable could be discarded during regression analysis.
- The correlations between Prel, Arel, and RD1 are low enough to indicate that there will not be problems related to multicollinearity.
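One way to screen for multicollinearity is to compute the pairwise correlations among the predictors and flag any pair above a cutoff. The sketch below is illustrative only: the data values are made up, the function names are hypothetical, and the 0.8 cutoff is an assumed rule of thumb rather than a figure from the analysis.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def flag_multicollinearity(predictors, threshold=0.8):
    """Return predictor pairs whose |correlation| meets the (assumed) cutoff."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up values for illustration; not the data behind the slides.
predictors = {
    "Prel": [0.98, 1.00, 1.02, 0.99, 1.01, 1.03],
    "Arel": [1.20, 0.80, 1.10, 0.60, 1.40, 0.90],
    "RD1":  [0.90, 1.10, 0.70, 1.20, 0.50, 1.00],
}
print(flag_multicollinearity(predictors))
```

An empty result means no pair of predictors is redundant by this test; a flagged pair suggests dropping one of the two variables before estimating the model.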
Dependent Variable: Total Industry Demand
Independent Variable: Quarter
Resulting Equation: TID = 14218.0702 + (645.9825 * Quarter)
- The R2 for the resulting equation indicates that Quarter explains 48% of the variance in TID, so 52% is unexplained.
- This makes for a very poor model, even though the very low p-value implies that the probability of a Type I error is minimal.
Dependent Variable: Total Industry Demand
Independent Variables: Quarter, Avg. Price, Avg. Advertising
Resulting Equation: TID = 130249 + (132.228 * Quarter) + (-358.613 * Avg Price) + (0.263 * Avg. Advertising)
- The R2 for the resulting equation indicates that the model explains 90.69% of the variance in TID, so about 9% is unexplained; this generally signifies a good model.
- The p-value is within tolerance levels for all variables except Quarter, which has a p-value of 0.21 and whose coefficient range (confidence interval) includes 0. Hence Quarter must be discarded.
Dependent Variable: Total Industry Demand
Independent Variables: Avg. Price, Avg. Advertising
Resulting Equation: TID = 164336.17 + (-445.168 * Avg Price) + (0.262 * Avg. Advertising)
- The R2 for the resulting equation indicates that the model explains 89.67% of the variance in TID, so about 10.3% is unexplained.
- The p-value is within tolerance levels for all variables, and none of the coefficient ranges (confidence intervals) includes 0.
- This model is good, and is the model that will be used for TID.
Dependent Variable: Relative Demand
Independent Variables: Relative Pricing (Prel), Relative Advertising (Arel), Loyalty (RD1)
Resulting Equation: RD = 16.13 + (-16.445 * Prel) + (0.779 * Arel) + (0.533 * RD1)
- The R2 for the resulting equation indicates that the model explains 95.75% of the variance in RD, so about 4.25% is unexplained.
- The p-value is within tolerance levels for all variables, and none of the coefficient ranges (confidence intervals) includes 0.
- This model is very good, and is the model that will be used for RD.
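The two selected equations can be written as plain functions, which makes their behavior easy to probe. This is a sketch using the coefficients reported above; the function names and the example inputs are hypothetical.

```python
def predict_tid(avg_price, avg_adv):
    """TID = 164336.17 + (-445.168 * Avg Price) + (0.262 * Avg. Advertising)."""
    return 164336.17 - 445.168 * avg_price + 0.262 * avg_adv


def predict_rd(prel, arel, rd1):
    """RD = 16.13 + (-16.445 * Prel) + (0.779 * Arel) + (0.533 * RD1)."""
    return 16.13 - 16.445 * prel + 0.779 * arel + 0.533 * rd1


# A firm exactly at the industry averages (all relative measures equal to 1)
# is predicted to have a relative demand very close to 1.
print(round(predict_rd(1.0, 1.0, 1.0), 3))  # 0.997

# Hypothetical industry inputs: average price 380, average advertising 95,000.
print(round(predict_tid(380, 95_000), 2))
```

The signs behave as expected: a higher price lowers both predictions, while more advertising and stronger loyalty raise them.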
TID      TID From Model   % Error
28390    29315.8204       -3.2610793
16430    19095.1556       -16.221276
15960    14403.594        9.75191729
23060    22142.378        3.97928014
24410    23690.334        2.94824252
23580    24150.4564       -2.4192383
25050    25619.302        -2.2726627
32850    29097.85         11.42207
Verification of RD Model
RD         RD From Model   % Error
1.083375   1.067091847     1.503002
1.162486   1.165754702     -0.28118
0.559733   0.536542721     4.143097
0.716148   0.721174808     -0.70192
1.073539   1.089967206     -1.53028
1.313764   1.315070546     -0.09945
1.206246   1.111114124     7.886607
1.264029   1.210123358     4.264589
0.926046   0.93732012      -1.21745
0.953696   1.07437361      -12.6537
0.413328   0.437045784     -5.73825
1.2594     1.355604764     -7.63894
0.976126   0.981062762     -0.50575
0.90107    0.926000479     -2.76676
0.658285   0.682931661     -3.74407
1.174001   1.137304179     3.125791
0.981371   1.023306272     -4.27313
0.666402   0.72000557      -8.04373
1.449908   1.40887709      2.829897
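The % Error figures in the verification tables are consistent with (actual - predicted) / actual x 100. The sketch below reproduces two of them; the function names are hypothetical, and the mean absolute percent error is added here only as one common way to summarize such a column.

```python
def percent_error(actual, predicted):
    """Signed error of the prediction as a percentage of the actual value."""
    return (actual - predicted) / actual * 100


def mean_absolute_percent_error(actuals, predictions):
    """Average size of the percent errors, ignoring their sign."""
    errors = [abs(percent_error(a, p)) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


# One row from the RD verification table and one from the TID table.
print(round(percent_error(1.083375, 1.067091847), 4))  # 1.503
print(round(percent_error(28390, 29315.8204), 4))      # -3.2611
```

A positive error means the model under-predicted the actual value; a negative error means it over-predicted.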
TID Model's Coefficients
Variable     Coefficient
Constant     164336.17
Avg_Price    -445.16
Avg_Adv      0.26
RD Model's Coefficients
2500 25000
Calculations:
- Relative Price (Qtr = t)
- Relative Advertising (Qtr = t)
- Average Demand (Qtr = t-1)
- Your Firm's Relative Demand (Qtr = t-1)
The coefficients from the TID model and the RD model are entered into this model. Based on estimated values for the items listed above, the model calculates the Firm Demand.
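Putting the pieces together, a firm demand calculation might look like the sketch below. It assumes Firm Demand = RD x TID / N (from Relative Demand = Firm Demand / Avg. Demand and Avg. Demand = TID / N) and uses the coefficients of the two selected regression equations. The function name, the parameter names, and every input value are hypothetical.

```python
def firm_demand(firm_price, avg_price, firm_adv, avg_adv,
                prev_firm_demand, prev_avg_demand, n_firms):
    """Forecast firm demand for quarter t from the TID and RD models."""
    # Relative measures for quarter t.
    prel = firm_price / avg_price
    arel = firm_adv / avg_adv
    # Loyalty: the firm's relative demand in quarter t-1.
    rd1 = prev_firm_demand / prev_avg_demand

    # Coefficients taken from the selected TID and RD regression equations.
    tid = 164336.17 - 445.168 * avg_price + 0.262 * avg_adv
    rd = 16.13 - 16.445 * prel + 0.779 * arel + 0.533 * rd1

    # Firm Demand = Relative Demand x Average Demand, with Avg. Demand = TID / N.
    return rd * tid / n_firms


# Hypothetical scenario: the firm prices slightly below the industry average
# and advertises above it, in a market assumed to have 8 firms.
forecast = firm_demand(firm_price=375, avg_price=380,
                       firm_adv=105_000, avg_adv=95_000,
                       prev_firm_demand=2_600, prev_avg_demand=2_500,
                       n_firms=8)
print(round(forecast))
```

Price and advertising are the inputs the firm controls, so rerunning the function with different values for them shows how the forecast responds to each decision.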