
Instructor's Solutions Manual Chapter 14

Chapter 14 Solutions

Develop Your Skills 14.1

1. Scatter diagrams are shown below.

[Figure: Salary and Age. Scatter diagram of Salary ($000) versus Age.]
[Figure: Salary and Years of Postsecondary Education. Scatter diagram of Salary ($000) versus Years of Postsecondary Education.]
[Figure: Salary and Years of Experience. Scatter diagram of Salary ($000) versus Years of Experience.]

Copyright 2011 Pearson Canada Inc.


All three scatter diagrams show the expected positive relationships. Salary appears to be linearly positively related to age, years of postsecondary education, and years of experience. We note that the variability in salary increases for older ages and for greater years of experience. Salary seems more strongly related to age for ages under about 40. Salary also appears to be more strongly related to years of experience under about 15. Salary is more variable when plotted against years of postsecondary education than when plotted against the other explanatory variables, but the variability is also more constant. At this point, years of experience appears to be the strongest candidate as an explanatory variable.

2. An excerpt of the Regression output is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.928389811
R Square             0.861907642
Adjusted R Square    0.850399945
Standard Error       6.844780506
Observations         40

ANOVA
              df
Regression    3
Residual      36
Total         39

                                    Coefficients
Intercept                           27.70373012
Age                                 -0.3191034
Years of Postsecondary Education    2.846348768
Years of Experience                 1.568477845

The model is as follows:
Salary ($000) = 27.7 - 0.3 (Age) + 2.8 (Years of Postsecondary Education) + 1.6 (Years of Experience)
In other words, salary is $27,704 - $319 for each year of age + $2,846 for each year of postsecondary education + $1,568 for each year of experience. The negative coefficient for age does not seem appropriate, and points to problems with this model.


3. The scatter diagram for age and years of experience is shown below. Note that the age axis starts at 20, since there are no workers under 20 years of age.

[Figure: Age and Years of Experience. Scatter diagram of Years of Experience versus Age.]

The two variables are very closely related, as we would expect. It is not possible to acquire years of experience without also acquiring years of age. It does not make sense to include both explanatory variables in the model.


4. An excerpt of the Regression output for the salaries data set with years of postsecondary education and age as explanatory variables is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.902574999
R Square             0.814641629
Adjusted R Square    0.804622258
Standard Error       7.822241019
Observations         40

ANOVA
              df
Regression    2
Residual      37
Total         39

                                    Coefficients
Intercept                           -2.125320671
Age                                 1.121515567
Years of Postsecondary Education    2.155278577

The model is as follows:
Salary ($000) = -2.1 + 1.1 (Age) + 2.2 (Years of Postsecondary Education)
In other words, Salary = -$2,125 + $1,122 for each year of age + $2,155 for each year of postsecondary education.


5. An excerpt of the Regression output for the salaries data set with years of postsecondary education and years of experience is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.92720353
R Square             0.859706386
Adjusted R Square    0.852122948
Standard Error       6.805249337
Observations         40

ANOVA
              df
Regression    2
Residual      37
Total         39

                                    Coefficients
Intercept                           20.87563476
Years of Postsecondary Education    2.673930095
Years of Experience                 1.238703002

The model is as follows:
Salary ($000) = 20.9 + 2.7 (Years of Postsecondary Education) + 1.2 (Years of Experience)
In other words, Salary = $20,876 + $2,674 for every year of postsecondary education + $1,239 for every year of experience.
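The regression outputs in Exercises 2, 4 and 5 each report both R Square and Adjusted R Square. As a cross-check, the sketch below recomputes the adjusted values from R Square with the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1), where n = 40 observations and k is the number of explanatory variables. The function name and the model labels are illustrative, not part of the Excel output.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k explanatory variables and n observations."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# (label, R Square, number of explanatory variables), copied from the
# Regression outputs for Exercises 2, 4 and 5.
models = [
    ("Age, education, experience", 0.861907642, 3),
    ("Age, education", 0.814641629, 2),
    ("Education, experience", 0.859706386, 2),
]

for label, r2, k in models:
    value = adjusted_r_squared(r2, n=40, k=k)
    print(f"{label}: adjusted R^2 = {value:.6f}")
```

The recomputed values should agree with the Adjusted R Square entries in the three outputs to the precision shown there.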


Develop Your Skills 14.2

6. The residual plots are shown below.

[Figure: Age Residual Plot. Residuals versus Age.]
[Figure: Years of Postsecondary Education Residual Plot. Residuals versus Years of Postsecondary Education.]
[Figure: Years of Experience Residual Plot. Residuals versus Years of Experience.]


All of these residual plots appear to have the horizontal band appearance that is desired, and so these plots appear to be consistent with the required conditions. A plot of the residuals vs. predicted salaries for this model is shown below.

[Figure: Residuals vs. Predicted Salary (Age, Years of Postsecondary Education, Years of Experience). Residual versus Predicted Salary ($000).]

This residual plot also appears to have the desired horizontal band appearance, centred around zero. There is somewhat less variability in the residuals for predicted salaries under about $40,000, but it is not pronounced.

7. The residual plots are shown below.

[Figure: Age Residual Plot. Residuals versus Age.]
[Figure: Years of Postsecondary Education Residual Plot. Residuals versus Years of Postsecondary Education.]

In the age residual plot, we see a couple of points that are above and below the desired horizontal band. Also, the residuals appear to be centred somewhat above zero. The postsecondary education residual plot looks more like the desired horizontal band, although once again, the residuals appear to be centred above zero. A plot of the residuals vs. predicted salary is shown below.

[Figure: Residuals vs. Predicted Salary (Age, Years of Postsecondary Education). Residual versus Predicted Salary ($000).]

The plot appears to have the desired horizontal band appearance, with the residuals centred around zero. The two circled points correspond to observations with standardized residuals ≥ +2 or ≤ -2 (observations 29 and 40).


8. The residual plots are shown below.

[Figure: Years of Postsecondary Education Residual Plot. Residuals versus Years of Postsecondary Education.]
[Figure: Years of Experience Residual Plot. Residuals versus Years of Experience.]

Both appear to have the desired horizontal band appearance, centred on zero.


A plot of the residuals vs. predicted salaries is shown below.

[Figure: Residuals vs. Predicted Salary (Years of Postsecondary Education, Years of Experience). Residuals versus Predicted Salary.]

There appears to be somewhat less variability for lower predicted salaries. However, overall, the plot shows the desired horizontal band, centred around zero.

9. A histogram of the residuals for the model discussed in Exercise 6 is shown below.

[Figure: Residuals for Salary Model (Age, Years of Postsecondary Education, Years of Experience). Histogram of residuals; vertical axis: Frequency.]

The histogram is somewhat skewed to the right. It appears to be centred close to zero. A histogram of the residuals for the model discussed in Exercise 7 is shown below.

[Figure: Residuals for Salary Model (Age, Years of Postsecondary Education). Histogram of residuals; vertical axis: Frequency.]

As for the previous model, we see the histogram is somewhat skewed to the right, and centred approximately on zero. A histogram of the residuals for the model discussed in Exercise 8 is shown below.

[Figure: Residuals for Salary Model (Years of Postsecondary Education, Years of Experience). Histogram of residuals; vertical axis: Frequency.]

As with the others, this histogram appears skewed to the right, but the skewness appears more pronounced here.

10. For the model containing all explanatory variables: There is one set of observations that produces a standardized residual just slightly above 2. This is data point 26, where the observed salary is $67,400, age is 47, years of postsecondary education are 5, and years of experience are 17. The actual salary is above the predicted salary of $53,600. If we had access to the original records, we would double-check this data point.

For the model containing years of postsecondary education and age as explanatory variables: There are two observations with standardized residuals ≥ +2 or ≤ -2 (observations 29 and 40). As mentioned in the answer to Exercise 7, these two points are obvious in the plot of residuals vs. predicted salary for this model. If we had access to the original records, we would double-check these data points.

For the model containing years of postsecondary education and years of experience as explanatory variables: There are no observations with standardized residuals ≥ +2 or ≤ -2.
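The predicted salary quoted for data point 26 can be reproduced from the Exercise 2 coefficients. The sketch below does that and then divides the residual by the model's standard error to get a rough standardized residual. That last step is an approximation assumed here for illustration; Excel's own standardized residuals use a slightly different denominator, so only approximate agreement should be expected. The age coefficient is taken as negative, as in the written model.

```python
# Coefficients from the Exercise 2 Regression output (Salary in $000).
INTERCEPT = 27.70373012
B_AGE = -0.3191034            # negative, as stated in the written model
B_EDUCATION = 2.846348768
B_EXPERIENCE = 1.568477845
STANDARD_ERROR = 6.844780506  # standard error of the estimate


def predict_salary(age: float, education_years: float, experience_years: float) -> float:
    """Predicted salary in $000 from the three-variable model."""
    return (INTERCEPT
            + B_AGE * age
            + B_EDUCATION * education_years
            + B_EXPERIENCE * experience_years)


# Data point 26: observed salary $67,400, age 47, 5 years of education, 17 of experience.
observed = 67.4
predicted = predict_salary(47, 5, 17)
residual = observed - predicted

# Rough standardization (an assumption of this sketch, not Excel's exact formula).
approx_standardized = residual / STANDARD_ERROR

print(f"predicted = {predicted:.1f}, residual = {residual:.1f}, "
      f"approx. standardized residual = {approx_standardized:.2f}")
```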


Develop Your Skills 14.3 11. Adjusted

12. Test for the significance of the overall model (all explanatory variables):
H0: β1 = β2 = β3 = 0
H1: At least one of the βi's is not zero.
α = 0.05
From the Excel output, we see that F = 74.9, and the p-value is approximately zero. There is strong evidence that the overall model is significant.

Tests for the significance of the individual explanatory variables:

Age:
H0: β1 = 0
H1: β1 ≠ 0

(from Excel output). The p-value is 0.5, so we fail to reject H0. There is not enough evidence to conclude that age is a significant explanatory variable for salaries, when years of postsecondary education and years of experience are included in the model. This is not surprising, given how closely related age and years of experience appear to be.


Years of postsecondary education:
H0: β2 = 0
H1: β2 ≠ 0

(from Excel output). The p-value is approximately zero, so we reject H0. There is enough evidence to conclude that years of postsecondary education is a significant explanatory variable for salaries, when age and years of experience are included in the model.

Years of experience:
H0: β3 = 0
H1: β3 ≠ 0

(from Excel output). The p-value is approximately zero, so we reject H0. There is enough evidence to conclude that years of experience is a significant explanatory variable for salaries, when age and years of postsecondary education are included in the model.

13. Test for the significance of the overall model (age and years of postsecondary education):
H0: β1 = β2 = 0
H1: At least one of the βi's is not zero.
α = 0.05
From the Excel output, we see that F = 81.3, and the p-value is approximately zero. There is strong evidence that the overall model is significant.


Tests for the significance of the individual explanatory variables:

Age:
H0: β1 = 0
H1: β1 ≠ 0

(from Excel output). The p-value is approximately zero, so we reject H0. There is enough evidence to conclude that age is a significant explanatory variable for salaries, when years of postsecondary education are included in the model.

Years of postsecondary education:
H0: β2 = 0
H1: β2 ≠ 0

(from Excel output). The p-value is approximately zero, so we reject H0. There is enough evidence to conclude that years of postsecondary education is a significant explanatory variable for salaries, when age is included in the model.


14. Test for the significance of the overall model (years of experience and years of postsecondary education):
H0: β1 = β2 = 0
H1: At least one of the βi's is not zero.
α = 0.05
From the Excel output, we see that F = 113.4, and the p-value is approximately zero. There is strong evidence that the overall model is significant.

Tests for the significance of the individual explanatory variables:

Years of postsecondary education:
H0: β1 = 0
H1: β1 ≠ 0

(from Excel output). The p-value is approximately zero, so we reject H0. There is enough evidence to conclude that years of postsecondary education is a significant explanatory variable for salaries, when years of experience are included in the model.

Years of experience:
H0: β2 = 0
H1: β2 ≠ 0

(from Excel output). The p-value is approximately zero, so we reject H0. There is enough evidence to conclude that years of experience is a significant explanatory variable for salaries, when years of postsecondary education are included in the model.


15. The adjusted R2 values are shown below:

Model                                                       Adjusted R2
All Explanatory Variables                                   0.85
Years of Postsecondary Education and Age                    0.80
Years of Postsecondary Education and Years of Experience    0.85

At this point, the model that contains years of postsecondary education and age does not seem worth considering. The adjusted R2 value is lower than for the other models. As we have already seen, age and years of experience are highly correlated, and it appears that the model containing years of experience does a better job.

Develop Your Skills 14.4

16. The Excel output from the Multiple Regression Tools add-in is shown below (in two parts, to fit better on the page):

Confidence Interval and Prediction Intervals Calculations
Confidence Level (%) = 95

Point Number    Mortgage Rates    Housing Starts    Advertising Expenditure
1               6                 3500              3500

                       Lower limit    Upper limit
Prediction Interval    106933.6829    137638.2386
Confidence Interval    114521.388     130050.5335

With 95% confidence, the interval ($114,521.39, $130,050.53) contains average Woodbon sales when mortgage rates are 6%, housing starts are 3,500 and advertising expenditure is $3,500.

17. It would not be appropriate to use the Woodbon model to make a prediction for mortgage rates of 6%, housing starts of 2,500, and advertising expenditure of $4,000, because the highest advertising expenditure in the sample data is only $3,500. We should not rely on a model for predictions based on explanatory variable values that are outside the range of the sample data on which the model is based.


18. The Excel output from the Multiple Regression Tools add-in is shown below (in two parts, to fit better on the page):

Confidence Interval and Prediction Intervals Calculations
Confidence Level (%) = 95

Point Number    Age    Years of Postsecondary Education
1               35     5

                       Lower limit    Upper limit
Prediction Interval    31.68852578    64.1197083

With 95% confidence, the interval ($31,689, $64,120) contains the salary of an individual who is 35 years old, and who has 5 years of postsecondary education.

19. The Excel output from the Multiple Regression Tools add-in is shown below (in two parts, to fit better on the page):

Confidence Interval and Prediction Intervals Calculations
Confidence Level (%) = 95

Point Number    Age    Years of Postsecondary Education
1               35     5

                       Lower limit    Upper limit
Confidence Interval    44.47730973    51.330924

With 95% confidence, the interval ($44,477, $51,331) contains the average salary of all individuals who are 35 years old, and who have 5 years of postsecondary education. The confidence interval is narrower than the prediction interval from Exercise 18, because the variability in the average salary is less than the variability for an individual salary.
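To make the comparison between the two intervals concrete, the sketch below checks that both are centred on the same point prediction from the Exercise 4 model and compares their half-widths. It relies on the standard formulas (an assumption about how the add-in computes the limits): the confidence-interval half-width is t × se(mean), and the prediction-interval half-width is t × sqrt(s² + se(mean)²), where s is the model's standard error. Under that assumption the critical t value can be backed out from the two half-widths. Variable names are illustrative.

```python
import math

# Limits from Exercises 18 and 19 (Salary in $000), age 35, 5 years of education.
pi_lower, pi_upper = 31.68852578, 64.1197083   # prediction interval
ci_lower, ci_upper = 44.47730973, 51.330924    # confidence interval
standard_error = 7.822241019                   # from the Exercise 4 output

# Point prediction from the Exercise 4 coefficients.
point_from_model = -2.125320671 + 1.121515567 * 35 + 2.155278577 * 5

pi_centre = (pi_lower + pi_upper) / 2
ci_centre = (ci_lower + ci_upper) / 2
pi_half = (pi_upper - pi_lower) / 2
ci_half = (ci_upper - ci_lower) / 2

# pi_half^2 - ci_half^2 = t^2 * s^2 under the assumed formulas, so:
implied_t = math.sqrt(pi_half ** 2 - ci_half ** 2) / standard_error

print(f"point prediction: {point_from_model:.3f}")
print(f"interval centres: {pi_centre:.3f} (prediction), {ci_centre:.3f} (confidence)")
print(f"half-widths: {pi_half:.3f} (prediction), {ci_half:.3f} (confidence)")
print(f"implied critical t: {implied_t:.3f}")
```

The implied t value comes out at about 2.03, which is in line with a 95% two-sided critical value for 37 degrees of freedom, so the two sets of limits are mutually consistent.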


20. The Excel output from the Multiple Regression Tools add-in is shown below (in two parts, to fit better on the page):

Confidence Interval and Prediction Intervals Calculations
Confidence Level (%) = 95

Point Number    Years of Postsecondary Education    Years of Experience
1               5                                   10

                       Lower limit    Upper limit
Prediction Interval    32.50852474    60.7561058

With 95% confidence, the interval ($32,509, $60,756) contains the salary of an individual who has 5 years of postsecondary education, and 10 years of experience.

21. The text contains scatter diagrams of Woodbon Annual Sales plotted against mortgage rates and advertising expenditure (see Exhibit 14.2). Each relationship appears linear, with no pronounced curvature. A plot of the residuals versus the predicted y-values for this model is shown below.

[Figure: Woodbon Model, Residuals Versus Predicted Sales (Mortgage Rates and Advertising Expenditure as Explanatory Variables). Residual versus Predicted Sales Values.]


The plot shows the desired horizontal band appearance, although there appears to be reduced variability for higher predicted values. The other residual plots are shown below.

The mortgage rates residual plot shows the desired horizontal band appearance.

[Figure: Advertising Expenditure Residual Plot. Residuals versus Advertising Expenditure.]

The advertising expenditure residual plot shows decreased variability for higher advertising expenditures. This is a concern, because it appears to violate the required conditions. At this point, we will refrain from conducting the F-test, as the required conditions are not met.

22. The Excel Regression output for the model that includes per-capita income and population is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.720727015
R Square             0.51944743
Adjusted R Square    0.483850943
Standard Error       687.5975486
Observations         30

ANOVA
              df    SS             MS          F           Significance F
Regression    2     13798538.88    6899269     14.59266    5.05245E-05
Residual      27    12765340.5     472790.4
Total         29    26563879.38

                     Coefficients    Standard Error    t Stat      P-value     Lower 95%
Intercept            25502.98998     1443.513376       17.6673     2.3E-16     22541.14522
Population           0.062546304     0.011890908       5.260011    1.52E-05    0.038148175
Per Capita Income    0.024579975     0.034847903       0.70535     0.486634    -0.046922016

From this we can see that the model is significant.
H0: β1 = β2 = 0
H1: At least one of the βi's is not zero.
α = 0.05
From the Excel output, we see that F = 14.6, and the p-value is approximately zero. There is strong evidence that the overall model is significant. However, only one of the explanatory variables is significant in this model.
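The F statistic and R Square in the Exercise 22 output follow directly from the ANOVA sums of squares. The sketch below recomputes both from the SS and df entries; variable names are illustrative, and the p-value is not recomputed because that would need an F-distribution routine from an external library.

```python
# Sums of squares and degrees of freedom from the Exercise 22 ANOVA table.
ss_regression = 13798538.88
ss_residual = 12765340.5
df_regression = 2
df_residual = 27

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the mean squares; R Square is the share of total SS explained.
f_statistic = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"MS regression = {ms_regression:.1f}, MS residual = {ms_residual:.1f}")
print(f"F = {f_statistic:.2f}, R Square = {r_squared:.4f}")
```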


Population:
H0: β1 = 0
H1: β1 ≠ 0

(from Excel output). The p-value is approximately zero, so we reject H0. There is enough evidence to conclude that population is a significant explanatory variable for sales, when per-capita income is included in the model.

Per-capita income:
H0: β2 = 0
H1: β2 ≠ 0

(from Excel output). The p-value is 0.49, so we fail to reject H0. There is not enough evidence to conclude that per-capita income is a significant explanatory variable for sales, when population is included in the model.


We proceed with the analysis by creating all possible regressions. The output is shown below.

Multiple Regression Tools: All Possible Models Calculations

Model Number 1
Adjusted R^2 0.493121976    Standard Error 681.4291273    K 1    Significance F 9.17832E-06
Variable Labels      Coefficients    p-value
Intercept            26434.2424      7.3224E-28
Population           0.063380025     9.17832E-06

Model Number 2
Adjusted R^2 -0.007751492    Standard Error 960.828081    K 1    Significance F 0.385584016
Variable Labels      Coefficients    p-value
Intercept            27795.7368      1.64199E-14
Per Capita Income    0.042711388     0.385584016

Model Number 3
Adjusted R^2 0.483851936    Standard Error 687.6320544    K 2    Significance F 5.05232E-05
Variable Labels      Coefficients    p-value
Intercept            25503.11658     2.306E-16
Population           0.062550338     1.51515E-05
Per Capita Income    0.024571325     0.486807201

It is clear, from this output, that the model for sales that we would want to explore first is the one with population as the explanatory variable. This model has the highest adjusted R2, the lowest standard error, and it is significant. Of course, once we have focused on this model, we must ensure that it meets the required conditions.


The scatter diagram for sales and population shows some evidence of a positive linear relationship.

[Figure: Sales and Population. Scatter diagram of Sales versus Population.]

A plot of the residuals versus the predicted sales values for this model is shown below.

[Figure: Residuals Versus Predicted Sales (Model Based on Population). Residual versus Predicted Sales.]

This plot shows, more or less, the desired horizontal band appearance. However, there are two points that raise questions, as they are far from the other points (the points are circled on the plot).

The plot of residuals versus population is shown below.

[Figure: Population Residual Plot. Residuals versus Population.]

As we might have expected, the same two points stand out in the plot. There are no dates associated with these data points, so we cannot assess whether they are related over time. A histogram of the residuals is shown below.

[Figure: Residuals (Model Based on Population). Histogram of residuals; vertical axis: Frequency.]

The histogram suggests that the residuals are approximately normally distributed.



There are two observations that produce a standardized residual ≥ +2 or ≤ -2 (observations 2 and 10). These are the same data points that stood out on the residual plots. If we had access to the original data, we would double-check these data points. Because we cannot do that, we will leave them in the model. Our analysis suggests that the model that predicts sales on the basis of population is the best model for this data set. Because the model appears to meet the required conditions, it could be used as the basis for predictions of sales.

23. Here is the correlation matrix for the variables in the Salaries data set.

                                    Age            Years of Postsecondary Education    Years of Experience    Salary ($000)
Age                                 1
Years of Postsecondary Education    0.318722486    1
Years of Experience                 0.971756454    0.227538151                         1
Salary ($000)                       0.861913715    0.528597263                         0.862062768            1

From this we can see that years of experience and age are very highly correlated, and so we would not choose to include both in our model. Both age and years of experience are very highly correlated with salary, and so one or the other appears to be promising as an explanatory variable.


24. We will use the Excel add-in to provide summary data about all possible models (notice that the output is spread over two pages).

Model Number 1
Adjusted R^2 0.736129338    Standard Error 9.090529977    K 1    Significance F 9.16507E-13
Variable Labels                     Coefficients    p-value
Intercept                           0.207680673     0.967203503
Age                                 1.252388299     9.16507E-13

Model Number 2
Adjusted R^2 0.260452305    Standard Error 15.21867153    K 1    Significance F 0.000454415
Variable Labels                     Coefficients    p-value
Intercept                           36.37394928     5.56639E-10
Years of Postsecondary Education    4.03150375      0.000454415

Model Number 3
Adjusted R^2 0.736393063    Standard Error 9.085986075    K 1    Significance F 8.99113E-13
Variable Labels                     Coefficients    p-value
Intercept                           28.23279516     2.36687E-13
Years of Experience                 1.365020143     8.99113E-13

Model Number 4
Adjusted R^2 0.804622258    Standard Error 7.822241019    K 2    Significance F 2.87227E-14
Variable Labels                     Coefficients    p-value
Intercept                           -2.125320671    0.628936674
Age                                 1.121515567     1.8482E-12
Years of Postsecondary Education    2.155278577     0.000547032

Model Number 5
Adjusted R^2 0.740351949    Standard Error 9.017500671    K 2    Significance F 5.53555E-12
Variable Labels                     Coefficients    p-value
Intercept                           13.78392724     0.249310597
Age                                 0.631384586     0.216724787
Years of Experience                 0.69640475      0.211309492

Model Number 6
Adjusted R^2 0.852122948    Standard Error 6.805249337    K 2    Significance F 1.66036E-16
Variable Labels                     Coefficients    p-value
Intercept                           20.87563476     9.19447E-11
Years of Postsecondary Education    2.673930095     2.60097E-06
Years of Experience                 1.238703002     1.02872E-14

Model Number 7
Adjusted R^2 0.850399945    Standard Error 6.844780506    K 3    Significance F 1.51907E-15
Variable Labels                     Coefficients    p-value
Intercept                           27.70373012     0.005220235
Age                                 -0.3191034      0.453660758
Years of Postsecondary Education    2.846348768     5.77326E-06
Years of Experience                 1.568477845     0.001223366


25. There are many possible models here. However, the one that looks most promising is the one that includes years of experience and years of postsecondary education. This is a logical model. Overall, it is significant, and each of the explanatory variables is significant. The standard error is relatively low. As well, the model makes sense. It is reasonable to expect that both of these factors would have a positive impact on salary. We cannot decide to rely on this model without checking the required conditions. The residual plots are shown below.

[Figure: Residuals vs. Predicted Salary (Years of Postsecondary Education, Years of Experience). Residuals versus Predicted Salary.]
[Figure: Years of Postsecondary Education Residual Plot. Residuals versus Years of Postsecondary Education.]
[Figure: Years of Experience Residual Plot. Residuals versus Years of Experience.]

All the residual plots show the desired horizontal band appearance, centred on zero. A histogram of the residuals has some right-skewness.

[Figure: Residuals for Salary Model (Years of Postsecondary Education, Years of Experience). Histogram of residuals; vertical axis: Frequency.]

There are no obvious outliers or influential observations. We choose this model as the best available.


26. The Excel Regression output for the model based on income only is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.604253286
R Square             0.365122033
Adjusted R Square    0.345883307
Standard Error       422.1512823
Observations         35

ANOVA
              df    SS             MS          F           Significance F
Regression    1     3382189.615    3382190     18.97849    0.000121079
Residual      33    5880986.271    178211.7
Total         34    9263175.886

                 Coefficients    Standard Error    t Stat      P-value     Lower 95%
Intercept        75.02173946     396.7061107       0.189112    0.851164    -732.0829074
Income ($000)    23.17297158     5.319255679       4.356431    0.000121    12.35086458

Compare these results with those shown in Exhibit 14.26, where gender is included in the model. We see that the adjusted R2 is higher for the model that includes gender, and the standard error is lower. It appears that adding the gender variable improves the model.


27. We used indicator variables as shown in Exhibit 14.27 in the text. The Excel Regression output is as shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.404030563
R Square             0.163240695
Adjusted R Square    0.101258525
Standard Error       313.2587914
Observations         30

ANOVA
              df    SS             MS             F              Significance F
Regression    2     516890.0667    258445.0333    2.633671806    0.090179418
Residual      27    2649538.9      98131.07037
Total         29    3166428.967

             Coefficients    Standard Error    t Stat          P-value        Lower 95%
Intercept    1564.4          99.06112778       15.79226923     3.67865E-15    1361.143357
Onever       -218.9          140.0935904       -1.562526875    0.12981017     -506.3483007
Durible      -313.4          140.0935904       -2.237075937    0.03373543     -600.8483007

H0: β1 = β2 = 0
H1: At least one of the βi's is not zero.
α = 0.05
From the Excel output, we see that F = 2.63, and the p-value is about 9%. There is not enough evidence to infer that there is a significant relationship between battery life and brand. Note that this is the same conclusion we came to in Chapter 11.


28. First, we must set up the data set with indicator variables. We cannot run the regression with the 1, 2, and 3 codes for region that are in the data set. Two indicator variables (in combination) indicate region, as follows:

Region       Region Indicator 1    Region Indicator 2
Central      1                     0
North        0                     1
Southwest    0                     0
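As an illustration of this coding, the sketch below maps a region name to the pair of indicator values in the table. It works from region names rather than the 1, 2, 3 codes, because the source does not say which code stands for which region; the function name and the sample rows are hypothetical.

```python
def region_indicators(region: str) -> tuple[int, int]:
    """Return (Region Indicator 1, Region Indicator 2) for a region name.

    Southwest is the base case: both indicators are zero.
    """
    coding = {
        "Central": (1, 0),
        "North": (0, 1),
        "Southwest": (0, 0),
    }
    return coding[region]


# Hypothetical rows, for illustration only (not taken from the data set).
sample_regions = ["North", "Central", "Southwest", "Central"]

for region in sample_regions:
    indicator_1, indicator_2 = region_indicators(region)
    print(f"{region}: indicator 1 = {indicator_1}, indicator 2 = {indicator_2}")
```

Three regions need only two indicator columns; a third column would be fully determined by the other two and could not be estimated separately.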

Excel's Regression output is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.692380383
R Square             0.479390595
Adjusted R Square    0.429009039
Standard Error       12.12223863
Observations         35

ANOVA
              df    SS             MS          F         Significance F
Regression    3     4194.738105    1398.246    9.5152    0.000131176
Residual      31    4555.408752    146.9487
Total         34    8750.146857

                                      Coefficients     Standard Error    t Stat      P-value     Lower 95%
Intercept                             -14.78132092     11.7183791        -1.26138    0.216582    -38.68111257
Number of Sales Contacts (Monthly)    0.827088142      0.174906441       4.728746    4.67E-05    0.470364104
Region Indicator 1                    14.07075014      5.122616397       2.74679     0.009933    3.623105161
Region Indicator 2                    6.169658282      5.449944445       1.132059    0.266291    -4.945576653

We can see that the overall model is significant. As well, the number of sales contacts is significant. The first indicator variable is also significant, but the second one is not.


What does this mean? It appears that when the first region indicator variable is included in the model, the second one is not significant. If we think about what the region indicator variables tell us, it appears that region is significant, but only in the sense that it matters whether the region is central, or not (the distinction between north and southwest is not significant). In fact, if we re-run the model, keeping only the distinction between sales in the central region or not, the results are as follows:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.67665967
R Square             0.457868309
Adjusted R Square    0.423985079
Standard Error       12.17545162
Observations         35

ANOVA
              df    SS             MS          F           Significance F
Regression    2     4006.414947    2003.207    13.51312    5.56771E-05
Residual      32    4743.73191     148.2416
Total         34    8750.146857

                                      Coefficients     Standard Error    t Stat      P-value     Lower 95%
Intercept                             -11.10718667     11.30939798       -0.98212    0.333408    -34.1436764
Number of Sales Contacts (Monthly)    0.822594987      0.175628992       4.683708    4.97E-05    0.464850439
Region Indicator 1                    10.67039881      4.16780093        2.560199    0.015384    2.180866167

The model can be interpreted as follows:
Sales = -$11,107.19 + $822.59 × Number of Sales Contacts + $10,670.40 for the central region, and
Sales = -$11,107.19 + $822.59 × Number of Sales Contacts for the north or southwest regions.
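The two equations differ only by the coefficient on the central-region indicator, which the sketch below makes explicit. Coefficients come from the re-run output and are in thousands of dollars, following the interpretation above; the function name and the 60-contact example are hypothetical.

```python
# Coefficients from the re-run model (sales in $000).
INTERCEPT = -11.10718667
B_CONTACTS = 0.822594987
B_CENTRAL = 10.67039881


def predict_sales(contacts: float, central: bool) -> float:
    """Predicted sales in $000 for a number of monthly sales contacts."""
    indicator = 1 if central else 0
    return INTERCEPT + B_CONTACTS * contacts + B_CENTRAL * indicator


# Hypothetical example: 60 monthly sales contacts in each type of region.
contacts = 60
central = predict_sales(contacts, central=True)
other = predict_sales(contacts, central=False)

print(f"central: {central:.2f}, north or southwest: {other:.2f}")
print(f"difference: {central - other:.2f}")
```

For any number of contacts the gap between the two predictions equals the indicator coefficient, about $10,670.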


29. Excel's Regression output for the model including both number of employees and shift is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.40825172
R Square             0.166669467
Adjusted R Square    0.128790806
Standard Error       4165.200265
Observations         47

ANOVA
              df    SS             MS          F           Significance F
Regression    2     152673338.7    76336669    4.400089    0.018112587
Residual      44    763351302.8    17348893
Total         46    916024641.5

                            Coefficients    Standard Error    t Stat      P-value     Lower 95%
Intercept                   33118.64628     10531.14671       3.144828    0.002976    11894.51498
Number of Employees         246.1711185     83.00625235       2.965694    0.004865    78.88301139
Shift (0=Day, 1=Night)      158.6692673     1282.379223       0.12373     0.902092    -2425.796202

It appears that the overall model is significant; however, shift is not a significant explanatory variable when the number of employees is included in the model. As well, if the model is run with only shift included as an explanatory variable, it is not significant. Therefore, it appears that shift is not a useful explanatory variable for the number of units produced. As well, the model based on number of employees, while significant, is not a particularly useful model (the adjusted R2 is only 0.15).


30. Province is not a significant explanatory variable for wages and salaries, either as the sole explanatory variable (p-value for the F-test = 0.67), or when age is included in the model (p-value for the test of the province indicator variable coefficient = 0.76). While it appears that the model including age alone is significant, the required conditions are not met. See the residual plot shown below. The plot clearly shows a pattern of increasing variability for higher ages.

[Figure: Age Residual Plot, showing residuals plotted against age]

Chapter Review Exercises

1. The model can be interpreted as follows: Monthly Credit Card Balance = $38.36 + $0.99(Age of Head of Household) + $22.04(Income in thousands of dollars) + $0.38(Value of the home in thousands of dollars). Generally, monthly credit card balances are higher for older heads of household with higher incomes and more expensive homes.


2. H₀: β₁ = β₂ = β₃ = 0
H₁: At least one of the βᵢ's is not zero.
α = 0.05
From the Excel output, we see that F = 5.96. The F distribution will have (3, 31) degrees of freedom. We estimate that the p-value < 0.01. There is strong evidence that the overall model is significant.

3. Age of head of household:
H₀: β₁ = 0
H₁: β₁ ≠ 0
(t-statistic from the Excel output). The p-value is 0.93, so we fail to reject H₀. There is not enough evidence to conclude that age of head of household is a significant explanatory variable for credit card balances, when household income and value of the home are included in the model.

Income ($000):
H₀: β₂ = 0
H₁: β₂ ≠ 0
(t-statistic from the Excel output). The p-value is 0.02, so we reject H₀. There is enough evidence to conclude that household income is a significant explanatory variable for credit card balances, when age of head of household and value of the home are included in the model.


Value of home ($000):
H₀: β₃ = 0
H₁: β₃ ≠ 0
(t-statistic from the Excel output). The p-value is 0.92, so we fail to reject H₀. There is not enough evidence to conclude that value of the home is a significant explanatory variable for credit card balances, when age of head of household and household income are included in the model.

4. Age of head of household is fairly highly correlated with household income. As well, it appears that age is not a significant explanatory variable, when household income is included in the model. Collinearity between these two variables may be causing a problem in the model.

5. Because the tests are done in the same format as the final exam, it is expected that the tests will prove to be better predictors of the final exam mark. However, good knowledge of the material is likely to result in higher marks for all of the evaluations, so we must consider that any one of them could be a good predictor of the final exam mark.

6. None of the correlations between the explanatory variables is particularly high. There is a fairly high correlation between the mark on Test #2 and the final exam mark, which suggests that the mark on Test #2 might be a good explanatory variable for the final exam mark.

7. The model that predicts the final exam mark on the basis of the mark on Test #2 is clearly the best. The adjusted R² is higher, and the standard error is lower, than for all the other variations. As well, the model is significant (p-value for the F-test is approximately zero).


8. We have 95% confidence that the interval ($675.45, $2,486.74) contains the monthly credit card balance for a head of household aged 45, with annual income of $65,000 and a home valued at $175,000. The Excel output is shown below.

Confidence Interval and Prediction Intervals - Calculations
Confidence Level (%): 95

Point Number    Age of Head of Household    Income (000)    Value of Home (000)
1               45                          65              175

Prediction Interval
Lower limit    675.4457684
Upper limit    2486.743786
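A prediction interval is centred on the point prediction, so one way to sanity-check the interval is to compare its midpoint with the value given by the fitted equation. The sketch below assumes the rounded coefficients quoted in Exercise 1, so a small discrepancy from rounding is expected; the function name is our own.

```python
def predicted_balance(age, income_k, home_value_k):
    """Point prediction using the rounded coefficients from Exercise 1."""
    return 38.36 + 0.99 * age + 22.04 * income_k + 0.38 * home_value_k


point = predicted_balance(45, 65, 175)

# Prediction interval limits from the Excel output.
lower, upper = 675.4457684, 2486.743786
midpoint = (lower + upper) / 2

print(round(point, 2), round(midpoint, 2))
```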

9. All of the models that include Test #2 as an explanatory variable are better than those which do not. Test #2 was the basis for the best model when only one explanatory variable was included in the model, and so this is not surprising. The two-variable model with the highest adjusted R² contains Test #2 and Assignment #2. Adding Assignment #2 to Test #2 as an explanatory variable increases the adjusted R² value from 0.51 to 0.57, and the standard error decreases from 14.3 to 13.4. Prediction and confidence intervals made with the two-variable model would be narrower than for the model with only Test #2.

The two-variable model is better, but whether it is "best" depends on the way the model might be used. Suppose it is being used to predict the exam marks, and identify those who are in danger of failing the course, or not achieving a grade level necessary for external accreditation. Test #2 is a significant explanatory variable. If Assignment #2 comes much later in the course, it may be better to use the single-variable model, so that the student can be alerted to a potential problem earlier, with time for adjustments.


10. The residual plots for this model are shown below. All have the desired appearance.

[Figure: Assignment #2 Residual Plot, showing residuals plotted against Assignment #2 mark]
[Figure: Test #2 Residual Plot, showing residuals plotted against Test #2 mark]
[Figure: Residuals vs. Predicted Exam Mark (Test #2 and Assignment #2)]


There is one data point that produces a standardized residual that is greater than 2 (observation 69). However, there is no way to double-check this point. It appears this model meets the required conditions.

11. None of the three-variable models represents a real improvement on the model which includes Test #2 and Assignment #2. As we might expect, the best three-variable models include both Test #2 and Assignment #2. The best of these, in terms of higher adjusted R², also contains Test #1. However, Test #1 is not significant as an explanatory variable when Test #2 and Assignment #2 are included in the model. This is true for all the other models that include both Test #2 and Assignment #2: the third explanatory variable is not significant when Test #2 and Assignment #2 are included in the model.

12. The Excel Regression output for the model containing all possible explanatory variables is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.775514198
R Square             0.601422271
Adjusted R Square    0.579030264
Standard Error       13.22677489
Observations         95

ANOVA
              df    SS             MS          F           Significance F
Regression    5     23494.40275    4698.881    26.85879    1.85347E-16
Residual      89    15570.33409    174.9476
Total         94    39064.73684

                 Coefficients    Standard Error    t Stat      P-value     Lower 95%
Intercept        18.08370688     5.405029406       3.345719    0.001203    7.344028811
Assignment #1    0.100148977     0.078524761       1.275381    0.205494    -0.055878047
Test #1          0.149314405     0.085242328       1.751646    0.083279    -0.020060282
Assignment #2    0.134964923     0.056516334       2.388069    0.019051    0.022668174
Test #2          0.442801866     0.076755029       5.769027    1.14E-07    0.290291262
Quizzes          0.023581516     0.064451647       0.365879    0.715323    -0.104482531

Again we see that the explanatory variables other than Test #2 and Assignment #2 are not significant in this model. This is not the best model.
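Each t Stat in the output is simply the coefficient divided by its standard error, which makes it easy to see why some variables fall short of significance. The following sketch recomputes the t statistics from the reported coefficients and standard errors; the dictionary layout and variable names are our own.

```python
# (coefficient, standard error) pairs from the Excel output.
estimates = {
    "Intercept": (18.08370688, 5.405029406),
    "Assignment #1": (0.100148977, 0.078524761),
    "Test #1": (0.149314405, 0.085242328),
    "Assignment #2": (0.134964923, 0.056516334),
    "Test #2": (0.442801866, 0.076755029),
    "Quizzes": (0.023581516, 0.064451647),
}

# t statistic for H0: coefficient = 0 is coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```

The coefficients for Assignment #1, Test #1, and Quizzes are each less than two standard errors from zero, which agrees with their p-values being above 0.05.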


13. Using the model that includes Test #2 and Assignment #2, the Excel output is shown below (split for visibility):

Confidence Interval and Prediction Intervals - Calculations
Confidence Level (%): 95

Point Number    Assignment #2    Test #2
1               65               70

Prediction Interval
Lower limit    48.90969082
Upper limit    102.4567362

We have 95% confidence that the interval (48.9, 100) contains the final exam mark of a student who received a mark of 65 on Assignment #2 and 70 on Test #2. After all the analysis, it appears that the best model generates a prediction interval so wide that it is not really useful.
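The upper limit in the Excel output (102.46) is above the interval quoted in the answer, which stops at 100. A plausible reading is that the interval has been restricted to the range of marks a student can actually receive. The sketch below illustrates that step, assuming marks are bounded between 0 and 100 (an assumption on our part; the exercise does not state the bounds), with a function name of our own.

```python
def clamp_interval(lower, upper, minimum=0.0, maximum=100.0):
    """Restrict an interval to the range of values the response can take."""
    return max(lower, minimum), min(upper, maximum)


# Prediction interval limits from the Excel output.
print(clamp_interval(48.90969082, 102.4567362))
```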


14. The Excel output for all possible regressions calculations is shown below.

Multiple Regression Tools - All Possible Models Calculations

Model Number 1 (K = 1)
Adjusted R^2: 0.325238882    Standard Error: 2007.384641    Significance F: 0.001726878
Variable Labels    Coefficients     p-value
Intercept          -3294907.127     0.00180397
Year               1651.875001      0.001726878

Model Number 2 (K = 1)
Adjusted R^2: 0.420323605    Standard Error: 1860.580138    Significance F: 0.00027344
Variable Labels    Coefficients     p-value
Intercept          21217.96234      1.44594E-15
Kilometres         -0.056903836     0.00027344

Model Number 3 (K = 2)
Adjusted R^2: 0.519585698    Standard Error: 1693.805491    Significance F: 0.000120809
Variable Labels    Coefficients     p-value
Intercept          -2076840.42      0.026740574
Year               1045.988481      0.025384546
Kilometres         -0.042999778     0.00403536

All three models are significant. The model with the highest adjusted R² is the one that includes both year and kilometres as explanatory variables. Both year and kilometres are significant explanatory variables, when the other variable is included in the model.


However, this model presents some problems. Initial scatter diagrams for each of the explanatory variables are shown below.

[Figure: List Price, Used Cars, 2004-2008 Model Year. Honda Accord Prices on AutoTrader.ca (November 2008), list price plotted against model year]

In this scatter diagram, we see there is more variability for list prices for older cars. Note there is only one data point for a car from the 2007 model year.

[Figure: List Price, Used Cars, 2004-2008 Model Year. Honda Accord Prices on AutoTrader.ca (November 2008), list price plotted against kilometres]

Here, we see there is more variability in list prices for Honda Accords with higher kilometres.


With both these explanatory variables included in the model, the residual plots are as shown below.

[Figure: Year Residual Plot, showing residuals plotted against year]
[Figure: Kilometres Residual Plot, showing residuals plotted against kilometres]

Both of these plots show an unusual observation, which is circled. Both refer to the observation where a Honda Accord with 121,353 kilometres, 2005 model year, was listed for $19,888. There may be something unusual about this observation to explain why the list price is so unusually high for a relatively high-mileage (kilometrage!) car. Referring back to the original listing, if it were available, might tell us something to explain this. Since we do not have this information available, we cannot assess if this point is legitimate.


[Figure: Residuals vs. Predicted List Price (Year and Kilometres)]

The outlier that showed up on the other residual plots also shows up here. There are some concerns about the model. The standard error is fairly large, so, for example, if we predicted the list price of a 2005 Honda Accord with 85,000 kilometres, the prediction interval would be ($13,114.99, $20,308.02). Therefore, the model is not that useful for predicting the list price of a used Honda Accord.

15. Year of the car is not really a quantitative variable. There are four years (2004, 2005, 2006, and 2007) in the sample data set, so three indicator variables are required. They could be set up as follows:

Year    Indicator Variable 1    Indicator Variable 2    Indicator Variable 3
2004    1                       0                       0
2005    0                       1                       0
2006    0                       0                       1
2007    0                       0                       0

All possible regressions calculations provide many possible models. However, notice again that there is only one observation for the year 2007. The data set is not really large enough to support this analysis. We will proceed, out of curiosity.
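The indicator coding described for Exercise 15 can be written as a small function. This is a sketch only: the names are our own, and 2007 is treated as the base year because it is the year coded with all three indicators equal to zero.

```python
BASE_YEAR = 2007  # coded with all indicators equal to zero
INDICATOR_YEARS = (2004, 2005, 2006)  # one indicator variable per year


def encode_year(year):
    """Return the three indicator values for a model year in the sample."""
    if year != BASE_YEAR and year not in INDICATOR_YEARS:
        raise ValueError(f"year {year} is not in the sample data")
    return tuple(1 if year == y else 0 for y in INDICATOR_YEARS)


for year in (2004, 2005, 2006, 2007):
    print(year, encode_year(year))
```

With four categories, a fourth indicator would be redundant, since a car with all three indicators equal to zero must be from the base year.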


The model with the highest adjusted R² contains kilometres and only the indicator variable specifying whether the car is from the 2004 model year, or not.

Model Number 5 (K = 2)
Adjusted R^2: 0.590043715    Standard Error: 1564.675734    Significance F: 2.11076E-05
Variable Labels                    Coefficients     p-value
Intercept                          21641.08247      8.12791E-17
Kilometres                         -0.049699268     0.000244684
1=Year 2004, 0=Not Year 2004       -2071.672715     0.003726813

The model is as follows:
For the model year 2004: List price = $21,641.08 - 0.05(Kilometres) - $2,071.67
For model years 2005, 2006, and 2007: List price = $21,641.08 - 0.05(Kilometres)
Notice that this model is more intuitive than the model from Exercise 14, which was:
List price = -$2,076,840.42 + $1,045.99(Year) - 0.043(Kilometres)
Such a model does not really make sense, and this should have been your clue that treating the year of a car as a quantitative variable is not the correct approach.
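To compare the two approaches, both fitted equations can be evaluated for the same car. In the sketch below the function names are our own, and the kilometres coefficients are entered as negative: that sign is what reproduces the midpoint of the Exercise 14 prediction interval ($13,114.99, $20,308.02) for a 2005 car with 85,000 kilometres.

```python
def list_price_ex14(year, kilometres):
    """Exercise 14 model: year treated as a quantitative variable."""
    return -2076840.42 + 1045.988481 * year - 0.042999778 * kilometres


def list_price_ex15(year, kilometres):
    """Exercise 15 model: kilometres plus an indicator for model year 2004."""
    is_2004 = 1 if year == 2004 else 0
    return 21641.08247 - 0.049699268 * kilometres - 2071.672715 * is_2004


for year in (2004, 2005):
    print(year,
          round(list_price_ex14(year, 85000), 2),
          round(list_price_ex15(year, 85000), 2))
```

The Exercise 15 model separates a 2004 car from later model years by a fixed amount, whereas the Exercise 14 model relies on an intercept of about minus two million dollars that has no sensible interpretation.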


16. All seven possible regression models are significant, as the output for all possible regressions calculations shows. The output is split over two pages.

Multiple Regression Tools - All Possible Models Calculations

Model Number 1 (K = 1)
Adjusted R^2: 0.279439295    Standard Error: 1587.969587    Significance F: 0.001578561
Variable Labels                      Coefficients    p-value
Intercept                            26728.22947     4.10163E-28
Local Population                     0.014837063     0.001578561

Model Number 2 (K = 1)
Adjusted R^2: 0.422781893    Standard Error: 1421.270907    Significance F: 6.02729E-05
Variable Labels                      Coefficients    p-value
Intercept                            17430.60759     4.51747E-08
Median Income in Local Area          0.163329197     6.02729E-05

Model Number 3 (K = 1)
Adjusted R^2: 0.243535302    Standard Error: 1627.051224    Significance F: 0.003278203
Variable Labels                      Coefficients    p-value
Intercept                            23903.47264     5.63582E-16
Estimated Traffic Volume (Weekly)    0.188766345     0.003278203

Model Number 4 (K = 2)
Adjusted R^2: 0.461214393    Standard Error: 1373.140214    Significance F: 9.0189E-05
Variable Labels                      Coefficients    p-value
Intercept                            18991.09596     2.35328E-08
Local Population                     0.007473002     0.094817808
Median Income in Local Area          0.127329166     0.003228531


Model Number 5 (K = 2)
Adjusted R^2: 0.51721688    Standard Error: 1299.819151    Significance F: 2.0497E-05
Variable Labels                      Coefficients    p-value
Intercept                            22441.81416     6.68122E-17
Local Population                     0.014268244     0.000332706
Estimated Traffic Volume (Weekly)    0.180554925     0.000664511

Model Number 6 (K = 2)
Adjusted R^2: 0.485521563    Standard Error: 1341.808324    Significance F: 4.83609E-05
Variable Labels                      Coefficients    p-value
Intercept                            16748.85286     4.97853E-08
Median Income in Local Area          0.133898131     0.000822866
Estimated Traffic Volume (Weekly)    0.110679438     0.045105932

Model Number 7 (K = 3)
Adjusted R^2: 0.567509125    Standard Error: 1230.255648    Significance F: 1.49902E-05
Variable Labels                      Coefficients    p-value
Intercept                            18633.31979     5.65578E-09
Local Population                     0.009787852     0.020230685
Median Income in Local Area          0.079865392     0.052205498
Estimated Traffic Volume (Weekly)    0.13655739      0.010370099

None of the one-variable models seems useful, as the adjusted R² is quite low. Of the two-variable models, the most promising is Model Number 5, with local population and estimated weekly traffic volume as explanatory variables. The model is significant, and each of the explanatory variables is significant, when the other one is included in the model. The adjusted R² is 0.52, which is not high, but still better than the other two-variable models.

At first, it appears that Model Number 7, which includes all three explanatory variables, might be best, as it has the highest adjusted R² of all the models. However, note that median income in the local area is not a significant explanatory variable (with a 5% significance level), when the other two variables are included in the model. As well, note that the standard error for this model is almost the same as for Model Number 5, which relies on only two explanatory variables.
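The selection reasoning above can be expressed as a simple screen: keep only the models in which every explanatory variable is significant at the 5% level, then pick the one with the highest adjusted R². The sketch below applies that rule to the values in the all-possible-models output. The data layout and names are our own, and the rule is a simplification of the judgment described in the text, not a substitute for checking the required conditions.

```python
# (adjusted R^2, p-values of the explanatory variables) for each model,
# copied from the all-possible-models output (intercept p-values omitted).
models = {
    1: (0.279439295, [0.001578561]),
    2: (0.422781893, [6.02729e-05]),
    3: (0.243535302, [0.003278203]),
    4: (0.461214393, [0.094817808, 0.003228531]),
    5: (0.51721688, [0.000332706, 0.000664511]),
    6: (0.485521563, [0.000822866, 0.045105932]),
    7: (0.567509125, [0.020230685, 0.052205498, 0.010370099]),
}
ALPHA = 0.05

# Keep models in which every explanatory variable is significant.
candidates = {
    number: adj_r2
    for number, (adj_r2, p_values) in models.items()
    if all(p < ALPHA for p in p_values)
}
best = max(candidates, key=candidates.get)
print(sorted(candidates), best)
```

Models 4 and 7 drop out of the screen because each contains a variable with a p-value above 0.05, which leaves Model Number 5 with the highest adjusted R² among the remaining candidates.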


Therefore, we will investigate Model Number 5 to see if it conforms to the required conditions. First, such a model makes some sense. Initial scatter diagrams for monthly sales and each explanatory variable show some evidence of a positive linear relationship, although neither relationship looks particularly strong. Note that the scales on some of the axes in the graphs below do not start at zero.

[Figure: Monthly Sales and Local Population, showing monthly sales plotted against local population]
[Figure: Monthly Sales and Weekly Traffic, showing monthly sales plotted against estimated traffic volume (weekly)]


Residual plots show (more or less) the desired horizontal band appearance, as shown below.

[Figure: Local Population Residual Plot, showing residuals plotted against local population]
[Figure: Estimated Traffic Volume (Weekly) Residual Plot, showing residuals plotted against estimated traffic volume (weekly)]


[Figure: Residuals, Monthly Sales Model, Doughnut Shop (Population and Weekly Traffic as Explanatory Variables), showing residuals plotted against predicted monthly sales]

A histogram of the residuals is approximately normal. There do not appear to be any outliers or influential observations.

[Figure: Histogram of residuals, Monthly Sales Model, Doughnut Shop (Population and Weekly Traffic as Explanatory Variables)]

It appears the model meets the required conditions. This model could be the basis of a location decision. The form of the model is as follows:
Predicted Monthly Sales = $22,442 + 0.0143(Local Population) + 0.1806(Estimated Weekly Traffic Volume)


17. While it is tempting to add the new data and analyze the model which was best from the analysis we did for Exercise 16, the correct approach is to look at all possible models. We have to allow for the possibility that the new information ALONE will be the basis of the most important explanatory variable.

In fact, the output of all possible regressions calculations shows that inclusion of the indicator variable for the location being within a five-minute drive of a major highway does improve the model we chose as best for Exercise 16. However, the best of all of the models, in terms of adjusted R², is the model with all possible explanatory variables. The adjusted R² for this model is 0.656, compared with 0.517 for the preferred model in Exercise 16.

The data requirements for this model are more onerous, and this would have to be taken into consideration before the model was selected. While local population and median incomes could be obtained through Statistics Canada, information about estimated weekly traffic volume will probably have to be collected (and possibly over several weeks). However, the information about whether a location is within a five-minute drive of a major highway could be obtained by looking at road maps and estimating driving distance.

We will analyze the "all-in" model to see if it conforms to required conditions. Residual plots look acceptable. The histogram of residuals appears normally distributed. There are no obvious outliers or influential observations.

[Figure: Local Population Residual Plot, showing residuals plotted against local population]
[Figure: 1=Within Five-Minute Drive of Major Highway, 0=Otherwise Residual Plot]
[Figure: Median Income in Local Area Residual Plot, showing residuals plotted against median income in local area]
[Figure: Estimated Traffic Volume (Weekly) Residual Plot, showing residuals plotted against estimated traffic volume (weekly)]
[Figure: Doughnut Shop Sales Prediction Model (All Explanatory Variables), showing residuals plotted against predicted monthly sales]
[Figure: Histogram of residuals, Doughnut Shop Sales Prediction Model (All Explanatory Variables)]

It appears the model meets the required conditions. The model is as follows:
For locations within a five-minute drive of a major highway: Predicted Monthly Sales = $17,413 + 0.009(Local Population) + 0.089(Median Income in Local Area) + 0.145(Estimated Weekly Traffic Volume) + $1,137
For locations not within a five-minute drive of a major highway: Predicted Monthly Sales = $17,413 + 0.009(Local Population) + 0.089(Median Income in Local Area) + 0.145(Estimated Weekly Traffic Volume)

18. a. Since the Canadian economy is resource-based, it does not seem unusual to look to resource stocks as a stand-in for the entire stock index. One could argue that as the economy overall goes, so will go the financial sector and the stock index. The Rona stock seems less likely, ahead of time, to be a good predictor of the TSX. However, we will begin, as usual, by looking at all possible models.


However, when we do this, we do not find a useful model. Of the one-variable models, the best is the one that predicts the TSX on the basis of the price of Potash Corporation stock. However, although the model is significant, the adjusted R² is quite low, and the standard error is relatively high. When we examine the two-variable models, all of them have at least one variable that is not significant in the model when the other is present. All the three- and four-variable models show the same problem. This is not surprising, as all of the stocks and the TSX will be affected by overall economic conditions. Some multicollinearity is the likely result. Scatter diagrams for the TSX and each stock price also hint that there does not appear to be a linear relationship between these stock prices and the TSX.

[Figure: TSX and Royal Bank Stock Price, showing the TSX plotted against Royal Bank stock price]
[Figure: TSX and Rona Inc. Stock Price, showing the TSX plotted against Rona Inc. stock price]


[Figure: TSX and Petro-Canada Stock Price, showing the TSX plotted against Petro-Canada stock price]
[Figure: TSX and Potash Corp. Stock Price, showing the TSX plotted against Potash Corp. stock price]

The only relationship that looks somewhat linear is the last one, between the TSX and the Potash Corporation stock price, and there is clearly more variability in the TSX for prices in the lower part of the Potash Corporation stock price range for these data. This helps explain why the only model that appeared to have any predictive power was the one based on the stock price of the Potash Corporation.

b. There is definitely evidence of the stock market crisis at the end of 2008. For example, if we examine the residuals for the model based on the Potash Corporation stock price, they show a definite time-related pattern, as shown below. It is not a good idea to try to build a model using data for this time period. Whatever relationships may have held before the fall of 2008, it appeared that financial markets were increasingly unpredictable, and the new information that became available at the time of the crisis may change forever the way stock markets work.


[Figure: TSX Model, Residuals Over Time (Potash Corporation Stock Price as Explanatory Variable), showing residuals plotted by date, 2002 to 2008]

19. There are many possible models. However, for many of the models, when the overall model is significant, some of the individual explanatory variables are not significant, given the other explanatory variables in the model. This is not surprising, as the factors that lead to student success in one subject probably contribute to student success in other subjects. The best one-variable model is based on the mark in Intermediate Accounting 1. The best two-variable model includes the mark in Intermediate Accounting 1 and Cost Accounting 1. Model results are summarized below.

Model Number 1 (K = 1)
Adjusted R^2: 0.520480342    Standard Error: 12.93893991    Significance F: 2.05994E-09
Variable Labels              Coefficients    p-value
Intercept                    17.2097272      0.008500616
Intermediate Accounting 1    0.711183518     2.05994E-09

Model Number 6 (K = 2)
Adjusted R^2: 0.590366643    Standard Error: 11.95895263    Significance F: 2.92403E-10
Variable Labels              Coefficients    p-value
Intercept                    14.52779938     0.016864759
Intermediate Accounting 1    0.420200699     0.002427925
Cost Accounting 1            0.377202568     0.003952342

Of these two, Model Number 6 appears to be the better model, with a higher adjusted R², and somewhat lower standard error.
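To see how the two models differ in practice, each can be evaluated for the same student. The sketch below uses the coefficients from the summarized output; the function names are our own, and the marks of 65 are the values used in Exercise 21.

```python
def statistics_mark_model1(intermediate):
    """Model Number 1: Intermediate Accounting 1 only."""
    return 17.2097272 + 0.711183518 * intermediate


def statistics_mark_model6(intermediate, cost):
    """Model Number 6: Intermediate Accounting 1 and Cost Accounting 1."""
    return 14.52779938 + 0.420200699 * intermediate + 0.377202568 * cost


one_variable = statistics_mark_model1(65)
two_variable = statistics_mark_model6(65, 65)
print(round(one_variable, 1), round(two_variable, 1))
```

The two-variable prediction sits close to the centre of the 95% prediction interval (42, 91) reported in Exercise 21, as it should, since a prediction interval is centred on the point prediction.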


20. Scatter diagrams of the Statistics 1 mark and each of the marks in Intermediate Accounting 1 and Cost Accounting 1 are shown below. Both relationships appear linear.

[Figure: Student Marks in Intermediate Accounting 1 and Statistics 1, showing mark in Statistics 1 plotted against mark in Intermediate Accounting 1]
[Figure: Student Marks in Cost Accounting 1 and Statistics 1, showing mark in Statistics 1 plotted against mark in Cost Accounting 1]


Residual plots appear to be as desired.

[Figure: Intermediate Accounting 1 Residual Plot, showing residuals plotted against Intermediate Accounting 1 mark]
[Figure: Cost Accounting 1 Residual Plot, showing residuals plotted against Cost Accounting 1 mark]
[Figure: Residuals vs. Predicted Statistics 1 Mark (Cost Accounting 1 and Intermediate Accounting 1 as Explanatory Variables)]


The histogram of residuals is skewed to the left, as shown below. However, generally, the residuals appear to be normally-distributed.

[Figure: Histogram of residuals, Statistics 1 mark model (Cost Accounting 1 and Intermediate Accounting 1 as Explanatory Variables)]

There are some data points with standardized residuals either ≤ -2 or ≥ +2. However, we have no way to verify these data points, so for now, we have no choice but to leave them in the model. It appears that the model to predict the Statistics 1 mark on the basis of marks in Cost Accounting 1 and Intermediate Accounting 1 meets the required conditions.

21. We have 95% confidence that the interval (42, 91) contains the Statistics 1 mark of an individual student who achieved a mark of 65 in both Cost Accounting 1 and Intermediate Accounting 1.


22. Of all the one-variable models, the best is the one based on years of experience. Of all the other models, the one based on years of experience and the local advertising budget is best. This model has an adjusted R² of 0.95. The model seems sensible.
Sales = $12,260 + $1,185(Years of Experience) + $4(Local Advertising Budget)
It seems reasonable to expect that salespeople would increase their skill as they gain years of experience, and this could result in increased sales. It also seems likely that increases in the local advertising budget would lead to increases in sales.

Scatter diagrams for each explanatory variable and sales are shown below. Note that the vertical axis on each graph does not start at zero.

[Figure: Sales and Years of Experience, showing sales plotted against years of experience]
[Figure: Sales and Local Advertising Budget, showing sales plotted against local advertising budget]


There appears to be a strong linear relationship between sales and years of experience. The relationship between the local advertising budget and sales is less obvious. However, these variables in combination appear to provide the best model for sales. The residual plots appear to have the desired horizontal band appearance. There is one point that appears unusual in all three plots (it is indicated with a triangular marker). All three points correspond to the same observation, which is the 40th data point. If we had the ability to double-check the accuracy of this point, we would. This data point is the only one with a standardized residual either ≤ -2 or ≥ +2.
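The check for unusual observations can be sketched as follows. The residuals used here are hypothetical values invented for illustration, not the exercise data, and the standardization shown (dividing each residual by the sample standard deviation of the residuals) is one common convention that we assume here; other software may scale residuals differently.

```python
import statistics


def flag_unusual(residuals, threshold=2.0):
    """Return (observation number, standardized residual) pairs at or beyond the threshold."""
    spread = statistics.stdev(residuals)  # sample standard deviation
    flagged = []
    for number, residual in enumerate(residuals, start=1):
        standardized = residual / spread
        if abs(standardized) >= threshold:
            flagged.append((number, round(standardized, 2)))
    return flagged


# Hypothetical residuals, for illustration only (not the exercise data).
sample = [1200, -800, 450, -300, 900, -1100, 250, -5400, 700, -150]
print(flag_unusual(sample))
```

A flagged observation is a prompt to go back and verify the data point, not a reason to delete it automatically.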

[Figure: Local Advertising Budget Residual Plot, showing residuals plotted against local advertising budget]
[Figure: Years of Experience Residual Plot, showing residuals plotted against years of experience]


[Figure: Residuals for Marchapex Sales Model (Years of Experience and Local Advertising Budget as Explanatory Variables), showing residuals plotted against predicted sales]

The histogram of the residuals is somewhat bimodal, with some left-skewness. While not significantly non-normal, such a histogram suggests some caution when using the model.

[Figure: Histogram of residuals for Marchapex Sales Model (Years of Experience and Local Advertising Budget as Explanatory Variables)]

A 95% prediction interval for a salesperson with 15 years of experience, and a local advertising budget of $4,000 would be ($42,081, $50,035).
