HANDOUT #4
CORRELATION AND SIMPLE REGRESSION ANALYSIS
Instructor: Hernando Burgos-Soto
A) Formulas
1) The Pearson product-moment correlation coefficient:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$
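As a numerical check on the correlation formula, here is a minimal Python sketch that evaluates it directly from deviations about the means. The function name `pearson_r` and the toy data are illustrative only; they are not taken from the exercises in this handout.

```python
import math

def pearson_r(x, y):
    """Pearson r from deviations about the means (the definitional formula)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least two points")
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Numerator: sum of cross-products of deviations, sum (x - xbar)(y - ybar)
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    # Denominator: square root of the product of the two sums of squared deviations
    den = math.sqrt(sum((a - xbar) ** 2 for a in x) * sum((b - ybar) ** 2 for b in y))
    return num / den

# Hypothetical toy data: y rises exactly in step with x, so r should be +1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

On Python 3.10 or later, `statistics.correlation(x, y)` computes the same Pearson coefficient and can serve as a cross-check on hand calculations.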
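2) Equation of the simple regression line

$$ \hat{y} = b_0 + b_1 x $$

3) Slope of the regression line

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$

or

$$ SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n} $$

$$ SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} $$

and

$$ b_1 = \frac{SS_{xy}}{SS_{xx}} $$

4) y intercept of the regression line

$$ b_0 = \bar{y} - b_1 \bar{x} $$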
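The slope and intercept formulas translate almost line for line into code. The sketch below (hypothetical function name `least_squares_line`, made-up data lying exactly on y = 1 + 2x) uses the computational forms of SS_xy and SS_xx, which is how the sums are usually tabulated by hand.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for y-hat = b0 + b1*x, using b1 = SSxy / SSxx."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(x), sum(y)
    # Computational forms: SSxy = sum xy - (sum x)(sum y)/n, SSxx = sum x^2 - (sum x)^2/n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(a * a for a in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    # Intercept: b0 = ybar - b1 * xbar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Hypothetical toy data lying exactly on y = 1 + 2x
b0, b1 = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # prints: y-hat = 1.00 + 2.00x
```

Python 3.10 and later also provide `statistics.linear_regression(x, y)`, which returns the slope and intercept and is handy for checking hand-computed answers.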
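5) Sum of squares of errors

$$ SSE = \sum (y - \hat{y})^2 = \sum y^2 - b_0 \sum y - b_1 \sum xy $$

6) Standard error of the estimate

$$ s_e = \sqrt{\frac{SSE}{n - 2}} $$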
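The next sketch computes SSE both from its definition (squared residuals) and from the shortcut Σy² − b0Σy − b1Σxy, then s_e. The shortcut assumes b0 and b1 are the least-squares estimates. The function name is hypothetical, and the toy data were chosen so that the fitted line is exactly ŷ = 0 + 1.9x.

```python
import math

def sse_and_standard_error(x, y, b0, b1):
    """Return residuals, SSE (definition), SSE (shortcut) and s_e = sqrt(SSE / (n - 2))."""
    n = len(x)
    # Definition: residuals y - y-hat, then the sum of their squares
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    sse_def = sum(e * e for e in residuals)
    # Shortcut: sum y^2 - b0*sum y - b1*sum xy (valid only for least-squares b0, b1)
    sse_short = (sum(b * b for b in y) - b0 * sum(y)
                 - b1 * sum(a * b for a, b in zip(x, y)))
    s_e = math.sqrt(sse_def / (n - 2))
    return residuals, sse_def, sse_short, s_e

# Hypothetical toy data whose least-squares line is y-hat = 0 + 1.9x
res, sse_def, sse_short, s_e = sse_and_standard_error([1, 2, 3, 4], [2, 4, 5, 8], 0.0, 1.9)
print(res, sse_def, sse_short, s_e)
```

Because the shortcut subtracts large, nearly equal quantities, rounding b0 and b1 heavily can make it drift from the residual-based value, so carrying several decimals is advisable.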
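7) Coefficient of determination

$$ SS_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n} $$

$$ r^2 = 1 - \frac{SSE}{SS_{yy}} = \frac{b_1^2 \, SS_{xx}}{SS_{yy}} $$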
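To see that the two expressions for r² agree, this sketch fits the least-squares line and evaluates both 1 − SSE/SS_yy and b1²SS_xx/SS_yy on the same hypothetical data. The function name is illustrative.

```python
def r_squared_two_ways(x, y):
    """Return r^2 as 1 - SSE/SSyy and as b1^2 * SSxx / SSyy (the two should agree)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((a - xbar) ** 2 for a in x)
    ss_yy = sum((b - ybar) ** 2 for b in y)
    ss_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    # Least-squares slope and intercept
    b1 = ss_xy / ss_xx
    b0 = ybar - b1 * xbar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return 1 - sse / ss_yy, b1 ** 2 * ss_xx / ss_yy

# Hypothetical toy data
print(r_squared_two_ways([1, 2, 3, 4], [2, 4, 5, 8]))
```

The agreement holds because, for the least-squares line, SS_yy splits into SSE plus the regression sum of squares b1²SS_xx.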
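8) t test of the slope of the regression line

Two-tailed test: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$
One-tailed tests: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 > 0$, or $H_0: \beta_1 = 0$ versus $H_a: \beta_1 < 0$

$$ t = \frac{b_1 - \beta_1}{s_b}, \quad \text{where } s_b = \frac{s_e}{\sqrt{SS_{xx}}} \text{ and } df = n - 2 $$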
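A sketch of the slope test: compute b1, s_e, s_b = s_e/√SS_xx and t with n − 2 degrees of freedom. Only the test statistic is computed; the critical value (t with α/2 in each tail for a two-tailed test, or α in one tail) still comes from a t table, since Python's standard library has no t distribution. The function name and data are hypothetical.

```python
import math

def slope_t_test(x, y, beta1_null=0.0):
    """Return (b1, s_b, t, df) for H0: beta1 = beta1_null in simple regression."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((a - xbar) ** 2 for a in x)
    ss_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = ybar - b1 * xbar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b = s_e / math.sqrt(ss_xx)     # standard error of the slope
    t = (b1 - beta1_null) / s_b
    return b1, s_b, t, n - 2

# Hypothetical toy data; compare |t| with the tabled critical value for df = n - 2
print(slope_t_test([1, 2, 3, 4], [2, 4, 5, 8]))
```

For a two-tailed test, H0 is rejected when |t| exceeds the tabled critical value; for a one-tailed test, t must also have the sign stated in Ha.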
B) Exercises
1)
158 296 87
110 436
2) In an effort to determine whether any correlation exists between the share prices of airlines,
an analyst sampled six days of activity on the stock market. Using the following share prices
of Air Canada and WestJet, compute the coefficient of correlation. Share prices have been
rounded off to the nearest hundredth for ease of computation.
Air Canada:   0.75   0.76   0.84   0.85   0.86   0.86
WestJet:
3) Sketch a scatter plot from the following data, and determine the equation of the regression
line.
x    12    21    28     8    20
y    17    15    22    19    24
4) A corporation owns several companies. The strategic planner for the corporation believes
dollars spent on advertising can to some extent be a predictor of total sales dollars. As an
aid in long-term planning, she gathers the following sales and advertising information from
several of the companies for 2009 ($ millions).
Advertising:
Sales:         $148    55    338    994    541    89    126    379
Develop the equation of the simple regression line to predict sales from advertising
expenditures using these data.
5) Is it possible to predict the annual number of business bankruptcies by the number of firm
births (business starts)? The following table shows the number of business bankruptcies
(1,000s) and the number of firm births (10,000s) for a six-year period. Use these data to
develop the equation of the regression model to predict the number of business bankruptcies
by the number of firm births. Discuss the meaning of the slope.
Business Bankruptcies (1,000s)    Firm Births (10,000s)
34.3                              58.1
35.0                              55.4
38.5                              57.0
40.1                              58.5
35.5                              57.4
37.9                              58.0
6) Investment analysts generally believe the interest rate on bonds is inversely related to the
prime interest rate for loans; that is, bonds perform well when lending rates are down and
perform poorly when interest rates are up. Can the bond rate be predicted by the prime
interest rate? Use the following data to construct a least squares regression line to predict
bond rates by the prime interest rate.
Bond Rate    Prime Interest Rate
5%           16%
12           6
9            8
15           4
7            7
7) Solve for the predicted values of y and the residuals for the following data:
x    12    21    28     8    20
y    17    15    22    19    24
8) Suppose milk is produced in a certain area. Some people might argue that because of
transportation costs, the price of milk in stores increases with the distance of markets from
that area. Suppose the milk prices in eight cities are as follows.
Price of Milk (per 2 L)
1,245 425
2.37
Use the prices along with the distance of each city from the milk-producing area to develop
a regression line to predict the price of 2 L of milk by the number of kilometers the city is
from the milk-producing area. Use the data and the regression equation to compute residuals
for this model. Sketch a graph of the residuals in the order of the x values. Comment on the
shape of the residual graph.
9) In Problem 5, you were asked to develop the equation of a regression model to predict the
number of business bankruptcies by the number of firm births. Using this regression model
and the data given in Problem 5 (and provided here again), solve for the predicted values of
y and the residuals. Comment on the size of the residuals.
Business Bankruptcies (1,000s)    Firm Births (10,000s)
34.3                              58.1
35.0                              55.4
38.5                              57.0
40.1                              58.5
35.5                              57.4
37.9                              58.0
10) Determine the sum of squares of error (SSE) and the standard error of the estimate ($s_e$) for
Problem 8. Determine how many of the residuals computed are within one standard error
of the estimate. If the error terms are normally distributed, approximately how many of
these residuals should be within $1 s_e$?
11) Determine the SSE and $s_e$ for Problem 9. Use the residuals computed and determine how
many of them are within $1 s_e$ and $2 s_e$. How do these numbers compare with what the
empirical rule says should occur if the error terms are normally distributed?
12) Determine the sum of squares of error (SSE) and the standard error of the estimate ($s_e$) for
Problem 7. Determine how many of the residuals computed are within one standard error
of the estimate. If the error terms are normally distributed, approximately how many of
these residuals should be within $1 s_e$?
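Problems 10 to 12 compare residuals with the empirical rule (about 68% within one standard error and about 95% within two). A minimal sketch of that bookkeeping, assuming the residuals and s_e have already been computed; the function name and the numbers in the example are made up.

```python
def empirical_rule_check(residuals, s_e):
    """Count residuals within 1 and 2 standard errors, alongside the counts
    the empirical rule (about 68% and 95%) would lead us to expect."""
    n = len(residuals)
    within_1 = sum(1 for e in residuals if abs(e) <= s_e)
    within_2 = sum(1 for e in residuals if abs(e) <= 2 * s_e)
    return within_1, within_2, 0.68 * n, 0.95 * n

# Hypothetical residuals and s_e (not taken from any problem in this handout)
print(empirical_rule_check([0.1, 0.2, -0.7, 0.4], 0.59))
```

With only five to eight residuals, as in these problems, the observed counts can differ noticeably from the expected ones even when the errors really are normal.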
13) Compute $r^2$ for Problem 7. Discuss the value of $r^2$ obtained.
14) Compute $r^2$ for Problem 8. Discuss the value of $r^2$ obtained.
15) Compute $r^2$ for Problem 9. Discuss the value of $r^2$ obtained.
16) Test the slope of the regression line determined in Problem 6. Use $\alpha = 0.05$.
17) Test the slope of the regression line determined in Problem 7. Use $\alpha = 0.01$.
18) Test the slope of the regression line determined in Problem 8. Use $\alpha = 0.10$.
19) Test the slope of the regression line determined in Problem 9. Use a 5% level of
significance.
20) Study the following ANOVA table, which was generated from a simple regression analysis.
Discuss the F test of the overall model. Determine the value of t and test the slope of the
regression line.
Analysis of Variance
Source        DF        SS        MS       F        P
Regression     1    116.65    116.65    8.26    0.021
Error          8    112.95     14.12
Total          9    229.60
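In simple regression the ANOVA F statistic equals the square of the t statistic for the slope, so |t| = √F. The sketch below rebuilds the mean squares and F from the sums of squares and recovers |t|; the function name is hypothetical, and the error degrees of freedom (8) are inferred from MS = SS/df, since 112.95 / 14.12 ≈ 8.

```python
import math

def anova_to_slope_t(ss_reg, ss_err, df_err, df_reg=1):
    """From regression and error sums of squares, rebuild MS, F and |t| for the slope."""
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    f_stat = ms_reg / ms_err
    # Simple regression (df_reg = 1): F = t^2, so |t| = sqrt(F).
    # The sign of t is the sign of b1, which an ANOVA table does not report.
    t_abs = math.sqrt(f_stat)
    return ms_reg, ms_err, f_stat, t_abs

# SS values from the table; df_err = 8 inferred from 112.95 / 14.12
print(anova_to_slope_t(116.65, 112.95, 8))
```

Because F = t² here, the p-value reported for F is the same as the two-tailed p-value of the slope t test.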