Although Excel and MegaStat are emphasized in Business Statistics in Practice, Second Canadian Edition, some examples in the additional material on Connect can only be demonstrated using other programs, such as MINITAB, SPSS, and SAS. Please consult the user guides for these programs for instructions on their use.
The least squares point estimates on the output give the prediction equation
$$\hat{y} = -1{,}113.7879 + 3.6121x_1 + 0.0421x_2 + 0.1289x_3 + 256.9555x_4 + 324.5334x_5.$$
Using this prediction equation, we can calculate the predicted sales values and residuals given on the MegaStat output of Figure 12.50. For example, observation 10 on this output corresponds to a sales representative for whom $x_1 = 105.69$, $x_2 = 42{,}053.24$, $x_3 = 5{,}673.11$, $x_4 = 8.85$, and $x_5 = 0.31$. If we insert these values into the prediction equation, we obtain a predicted sales value of $\hat{y}_{10} = 4{,}143.597$. Since the actual sales for the sales representative are $y_{10} = 4{,}876.370$, the residual $e_{10}$ equals the difference between $y_{10} = 4{,}876.370$ and $\hat{y}_{10} = 4{,}143.597$, which is 732.773. The normal plot of the residuals in Figure 12.51(a) has a straight-line appearance. The plot of the residuals versus predicted sales in Figure 12.51(b) has a horizontal band appearance, as do the plots of the residuals versus the independent variables (the plot versus $x_3$, advertising, is shown in Figure 12.51(c)). We conclude that the regression assumptions approximately hold for the sales territory performance model (note that because the data are cross-sectional, a residual plot versus time is not appropriate).
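The calculation for observation 10 can be sketched in a few lines of Python. This is only an illustration: the function and variable names are hypothetical, and because the coefficients are the rounded values printed in the prediction equation, the result agrees with the MegaStat output only to within about one unit.

```python
# Rounded least squares point estimates taken from the prediction equation.
INTERCEPT = -1113.7879
COEFFS = [3.6121, 0.0421, 0.1289, 256.9555, 324.5334]  # b1, ..., b5


def predict_sales(x):
    """Return the predicted sales for one observation x = [x1, ..., x5]."""
    return INTERCEPT + sum(b * xi for b, xi in zip(COEFFS, x))


# Observation 10 as given in the text.
x_obs10 = [105.69, 42053.24, 5673.11, 8.85, 0.31]
y_obs10 = 4876.370

y_hat_obs10 = predict_sales(x_obs10)
residual_obs10 = y_obs10 - y_hat_obs10  # residual = actual minus predicted

print(f"predicted sales: {y_hat_obs10:.3f}")
print(f"residual:        {residual_obs10:.3f}")
```

The small gap between these values and the reported 4,143.597 and 732.773 arises because MegaStat carries more decimal places in the coefficients than the printed equation shows.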
FIGURE 12.50 [MegaStat output: predicted sales values and residuals for the sales territory performance model]

Observation   Predicted    Residual
 1            3,504.990     164.890
 2            3,901.180    -427.230
 3            2,774.866    -479.766
 4            4,911.872    -236.312
 5            5,415.196     710.764
 6            2,026.090     108.850
 7            5,126.127     -94.467
 8            3,106.925     260.525
 9            6,055.297     464.153
10            4,143.597     732.773
11            2,503.165     -34.895
12            1,827.065     706.245
13            2,478.083     -69.973
14            2,351.344     -13.964
15            4,797.688    -210.738
16            2,904.099    -174.859
17            3,362.660     -73.260
18            2,907.376    -106.596
19            3,625.026    -360.826
20            4,056.443    -602.823
21            1,409.835     331.615
22            2,494.101    -458.351
23            1,617.561     -39.561
24            4,574.903    -407.463
25            2,488.700     311.270

FIGURE 12.51 [Residual plots: (a) normal plot of the residuals; (b) residuals versus predicted sales; (c) residuals versus x3 (advertising)]
To conclude this section, we consider the Durbin-Watson test for first-order autocorrelation. This test is carried out for a multiple regression model exactly as it is for a simple linear regression model (see Section 11.10), except that we consider $k$, the number of independent variables used by the model, when looking up the critical values $d_{L,\alpha}$ and $d_{U,\alpha}$. For example, Figure 12.52 gives $n = 16$ weekly values of Folio Bookstore sales ($y$), Folio's advertising expenditure ($x_1$), and competitors' advertising expenditure ($x_2$). The Durbin-Watson statistic for the model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
is $d = 1.63$. If we set $\alpha$ equal to 0.05, then we use Table A.12, a portion of which is shown below. Because $n = 16$ and $k = 2$, the appropriate critical values for a test for first-order positive autocorrelation are $d_{L,0.05} = 0.98$ and $d_{U,0.05} = 1.54$. Because $d = 1.63$ is greater than $d_{U,0.05} = 1.54$, we conclude that there is no first-order positive autocorrelation. The Durbin-Watson test carried out in Figure 12.52 indicates that this autocorrelation does exist for the model relating $y$ to $x_1$. Therefore, adding $x_2$ to this model seems to have removed the autocorrelation.

Portion of Table A.12 (k = 2)
n     dL,0.05   dU,0.05
15    0.95      1.54
16    0.98      1.54
17    1.02      1.54
18    1.05      1.53
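The decision rule for the positive-autocorrelation test can be written as a short Python sketch. The function name and the dictionary of critical values are hypothetical conveniences; the critical values themselves are the $k = 2$, $\alpha = 0.05$ entries quoted from Table A.12, and the inconclusive region between the two bounds is the standard one for this test.

```python
# Critical values (d_L, d_U) for alpha = 0.05 and k = 2, keyed by n.
# Only the rows quoted from Table A.12 are included here.
CRITICAL_VALUES_K2 = {
    15: (0.95, 1.54),
    16: (0.98, 1.54),
    17: (1.02, 1.54),
    18: (1.05, 1.53),
}


def positive_autocorrelation_test(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic d against the bounds d_L and d_U."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > d_upper:
        return "no positive autocorrelation"
    return "inconclusive"


# Folio Bookstore model with x1 and x2: n = 16, k = 2, d = 1.63.
d_lower, d_upper = CRITICAL_VALUES_K2[16]
conclusion = positive_autocorrelation_test(1.63, d_lower, d_upper)
print(conclusion)
```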
FIGURE 12.52 Folio Bookstore Sales and Advertising Data, and Residual Analysis
(a) The data and the MegaStat output of the residuals from a simple linear regression relating Folio's sales to Folio's advertising expenditure

Observation   Adver   Compadv   Sales   Predicted   Residual
 1            18      10        22      18.7         3.3
 2            20      10        27      23.0         4.0
 3            20      15        23      23.0        -0.0
 4            25      15        31      33.9        -2.9
 5            28      15        45      40.4         4.6
 6            29      20        47      42.6         4.4
 7            29      20        45      42.6         2.4
 8            28      25        42      40.4         1.6
 9            30      35        37      44.7        -7.7
10            31      35        39      46.9        -7.9
11            34      35        45      53.4        -8.4
12            35      30        52      55.6        -3.6
13            36      30        57      57.8        -0.8
14            38      25        62      62.1        -0.1
15            41      20        73      68.6         4.4
16            45      20        84      77.3         6.7

Durbin-Watson = 0.65

[Residual plot: residuals versus observation]
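As a rough check on the reported statistic, the Durbin-Watson value can be recomputed from the residuals listed in Figure 12.52(a), using the usual definition: the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below uses the residuals as printed (rounded to one decimal place), so it reproduces the MegaStat value only approximately; the function name is hypothetical.

```python
# Residuals from the simple linear regression of Folio sales on advertising,
# as printed in Figure 12.52(a) (rounded to one decimal place).
residuals = [3.3, 4.0, 0.0, -2.9, 4.6, 4.4, 2.4, 1.6,
             -7.7, -7.9, -8.4, -3.6, -0.8, -0.1, 4.4, 6.7]


def durbin_watson(e):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(r ** 2 for r in e)
    return numerator / denominator


d_simple = durbin_watson(residuals)
print(f"Durbin-Watson statistic: {d_simple:.2f}")
```

A value this far below 2 reflects the long runs of same-signed residuals in the table (observations 9 through 14 are all negative), which is the pattern positive autocorrelation produces.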
assumptions for a multiple regression model.
12.64 Discuss how to carry out the Durbin-Watson test for a multiple regression model.
METHODS AND APPLICATIONS
12.65 THE HOSPITAL LABOUR NEEDS CASE
Consider the hospital labour needs data in Table 12.5 (page 424). Figure 12.53 gives residual plots that are obtained when we perform a regression analysis of these data by using the model [...] hours, BedDays ($x_2$), and Length ($x_3$). Note: The first two of these plots, as well as the plot versus Xray ($x_1$) (not shown), indicate that 3 hospitals are substantially larger than the other 13 hospitals. We will discuss the potential influence of these three large hospitals in Section 12.14.
12.66 THE FRESH DETERGENT CASE
Recall that Table 12.4 (page 424) gives values for $n = 30$ sales periods of demand for Fresh liquid laundry detergent ($y$), price difference ($x_4$), and advertising expenditure ($x_3$).
a. Figure 12.54(a) gives the residual plot versus $x_3$ that is obtained when the regression model relating $y$ to $x_4$ and $x_3$ is used to analyze the Fresh detergent data. Discuss why the residual plot indicates that we should add $x_3^2$ to the model.
b. Figure 12.54(b) gives the residual plot versus time and the Durbin-Watson statistic that are obtained when the regression model relating $y$ to $x_4$, $x_3$, and $x_3^2$ is used to analyze the Fresh detergent data. Test for positive autocorrelation by setting $\alpha$ equal to 0.05.
FIGURE 12.53 MegaStat and Excel Residual Analysis for the Hospital Labour Needs Model (for Exercise 12.65)
[Residual plots: normal plot (residuals versus normal score); residuals versus predicted values; residuals versus BedDays; residuals versus Length]
FIGURE 12.54 MegaStat Output for the Fresh Detergent Data (Exercise 12.66)
(a) [Residual plot: residuals versus X3]
(b) [Residual plot: residuals versus observation (gridlines = std. error)]
Consider the quadratic regression model describing the QHIC data. Figure 12.55 shows that the residual plot versus x for this model fans out, indicating that the error term tends to become larger as x increases. To remedy this violation of the constant-variance assumption, we divide all terms in the quadratic model by x. This gives the transformed model
$$\frac{y}{x} = \beta_0\left(\frac{1}{x}\right) + \beta_1 + \beta_2 x + \frac{\varepsilon}{x}.$$
Figures 12.56(a) and (b) give a regression output and a residual plot versus $x$ for this model.
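One way to see why dividing by $x$ can help is the short derivation below. It rests on an assumption that the text does not state explicitly: that the standard deviation of the error term is proportional to $x$, so that $\operatorname{Var}(\varepsilon) = \sigma^2 x^2$. Under that assumption the transformed error term has constant variance.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon,
  & \operatorname{Var}(\varepsilon) &= \sigma^2 x^2 \quad \text{(assumed)} \\
\frac{y}{x} &= \beta_0\left(\frac{1}{x}\right) + \beta_1 + \beta_2 x + \frac{\varepsilon}{x},
  & \operatorname{Var}\!\left(\frac{\varepsilon}{x}\right) &= \frac{\sigma^2 x^2}{x^2} = \sigma^2
\end{align*}
```

If the spread of the errors grows with $x$ in some other way, a different divisor would be needed, which is why the residual plot for the transformed model should still be checked.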
constant-variance assumption holds for the transformed model?
b. Consider a home worth $220,000. Let $\mu_0$ represent the mean yearly upkeep expenditure for all homes worth $220,000 and $y_0$ represent the yearly upkeep expenditure for an individual home worth $220,000. The bottom of the output in Figure 12.56(a) says that $\hat{y}/220 = 5.635$ is a point estimate of $\mu_0/220$ and a point prediction of $y_0/220$. Multiply this result by 220 to obtain $\hat{y}$. Multiply the ends of the confidence interval and prediction interval shown on the output by 220. This will give a 95 percent confidence interval for $\mu_0$ and a 95 percent prediction interval for $y_0$.
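The back-transformation in part b amounts to multiplying by 220. A minimal Python sketch follows; the helper names are hypothetical, and the interval endpoints are left as function arguments because they are not reproduced in this section.

```python
# x for a home worth $220,000. The division by 220 in the exercise implies
# that home value is measured in thousands of dollars (an inference here).
HOME_VALUE = 220


def back_transform(scaled_value, x=HOME_VALUE):
    """Convert an estimate of y/x back to the original y scale."""
    return scaled_value * x


def back_transform_interval(lower, upper, x=HOME_VALUE):
    """Apply the same scaling to both ends of an interval for y/x."""
    return back_transform(lower, x), back_transform(upper, x)


# Point estimate of mu_0 and point prediction of y_0 from y_hat / 220 = 5.635.
y_hat = back_transform(5.635)
print(f"point estimate / prediction: {y_hat:.1f}")
```

Because 220 is a positive constant, scaling both endpoints preserves the 95 percent confidence level of each interval.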
FIGURE 12.55 [Residual plot: residuals versus Value X for the quadratic QHIC model]
FIGURE 12.56 MegaStat Output of the Transformed QHIC Model for Exercise 12.67

Regression Statistics
Multiple R           0.7134
R Square             0.508939
Adjusted R Square    0.482395
Standard Error       0.793459
Observations         40

ANOVA
             df   SS         MS         F          Significance F
Regression    2   24.14244   12.07122   19.17353   1.93E-06
Residual     37   23.29437   0.629577
Total        39   47.43681
[Coefficient table: only the column header "t Stat" is recoverable]
[Residual plots: residuals versus 1/X and residuals versus Value X]
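The summary statistics on the Figure 12.56 output are tied together by standard identities, and they can be recomputed from the ANOVA sums of squares and degrees of freedom as a consistency check. The sketch below uses only values printed on the output; the variable names are hypothetical.

```python
# Values read from the ANOVA portion of the Figure 12.56 output.
ss_regression = 24.14244
ss_residual = 23.29437
df_regression = 2
df_residual = 37
n = 40  # number of observations

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / df_regression   # mean square for regression
ms_residual = ss_residual / df_residual         # mean square error
f_stat = ms_regression / ms_residual            # overall F statistic
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
standard_error = ms_residual ** 0.5
multiple_r = r_squared ** 0.5

print(f"SS total          {ss_total:.5f}")
print(f"F                 {f_stat:.5f}")
print(f"R Square          {r_squared:.6f}")
print(f"Adjusted R Square {adj_r_squared:.6f}")
print(f"Standard Error    {standard_error:.6f}")
print(f"Multiple R        {multiple_r:.4f}")
```

Each recomputed value should agree with the printed output up to rounding in the last displayed digit.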