Linear Regression with One Regressor
• Estimation:
– How should we draw a line through the data to estimate
the population slope?
• Answer: ordinary least squares (OLS).
– What are advantages and disadvantages of OLS?
• Hypothesis testing:
– How to test if the slope is zero?
• Confidence intervals:
– How to construct a confidence interval for the slope?
$$\beta_1 = \frac{\Delta\,\text{Test score}}{\Delta\,STR} = \;??$$
• That is,
$$\frac{\Delta\,\text{Test score}}{\Delta\,STR} = -2.28$$
• The intercept (taken literally) means that, according to this
estimated line, districts with zero students per teacher would
have a (predicted) test score of 698.9. But this
interpretation of the intercept makes no sense – it
extrapolates the line outside the range of the data – here,
the intercept is not economically meaningful.
One of the districts in the data set is Antelope, CA, for which
STR = 19.33 and Test Score = 657.8
predicted value: YˆAntelope = 698.9 – 2.28×19.33 = 654.8
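As a quick check of this arithmetic, here is a minimal Python sketch (not part of the original slides); the coefficients 698.9 and –2.28 and the Antelope figures are just the numbers quoted above:

```python
# Predicted test score for Antelope, CA, from the estimated OLS line
# TestScore-hat = 698.9 - 2.28 * STR (coefficients quoted above).
intercept, slope = 698.9, -2.28    # estimated OLS coefficients
str_antelope = 19.33               # student-teacher ratio for Antelope
actual_score = 657.8               # observed test score for Antelope

predicted = intercept + slope * str_antelope
residual = actual_score - predicted   # OLS residual u-hat for this district

print(f"predicted: {predicted:.1f}")  # about 654.8
print(f"residual:  {residual:.1f}")   # about 3.0
```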
Definition of R2:
$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n}\bigl(\hat{Y}_i - \overline{\hat{Y}}\bigr)^2}{\sum_{i=1}^{n}\bigl(Y_i - \bar{Y}\bigr)^2}$$
• R2 = 0 means ESS = 0
• R2 = 1 means ESS = TSS
• 0 ≤ R2 ≤ 1
• For regression with a single X, R2 = the square of the correlation coefficient between X and Y (see the sketch below)
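To make the definition concrete, here is a small Python sketch (not from the slides) that computes R2 as ESS/TSS on simulated data and checks that, with a single regressor, it equals the squared correlation between X and Y; the data-generating numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 420
x = rng.normal(20, 2, n)                     # e.g., student-teacher ratios
y = 698.9 - 2.28 * x + rng.normal(0, 9, n)   # simulated test scores

# OLS fit with a single regressor
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ess = np.sum((y_hat - y_hat.mean())**2)      # explained sum of squares
tss = np.sum((y - y.mean())**2)              # total sum of squares
r2 = ess / tss

# With a single X, R^2 equals the squared correlation of X and Y
print(r2, np.corrcoef(x, y)[0, 1]**2)        # the two numbers should match
```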
$$SER = s_{\hat{u}}, \quad \text{where } s_{\hat{u}}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\bigl(\hat{u}_i - \overline{\hat{u}}\bigr)^2 = \frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2$$
The second equality holds because $\overline{\hat{u}} = \frac{1}{n}\sum_{i=1}^{n}\hat{u}_i = 0$.
The SER:
– has the units of u, which are the units of Y
– measures the average “size” of the OLS residual (the average “mistake” made by the OLS regression line)
The root mean squared error (RMSE) is closely related to the
SER:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2}$$
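A minimal Python sketch (not from the slides) that computes both the SER and the RMSE from OLS residuals on simulated data; all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 420
x = rng.normal(20, 2, n)
y = 698.9 - 2.28 * x + rng.normal(0, 9, n)

# OLS fit and residuals
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)

ser = np.sqrt(np.sum(u_hat**2) / (n - 2))   # standard error of the regression
rmse = np.sqrt(np.mean(u_hat**2))           # root mean squared error

print(ser, rmse)   # nearly identical when n is large
```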
• The entities are selected at random, so the values of (X, Y) for different
entities are independently distributed.
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})\bigl[\beta_1(X_i - \bar{X}) + (u_i - \bar{u})\bigr]}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \beta_1 + \frac{\sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
so
$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
Now
$$\sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n}(X_i - \bar{X})\,u_i - \left[\sum_{i=1}^{n}X_i - n\bar{X}\right]\bar{u} = \sum_{i=1}^{n}(X_i - \bar{X})\,u_i$$
(the last equality holds because $\sum_{i=1}^{n}X_i - n\bar{X} = 0$).
Substitute $\sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n}(X_i - \bar{X})\,u_i$ into the expression for β̂1 – β1:
$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
so
$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})\,u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
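Because this expression involves the unobserved errors ui, it can only be checked numerically with simulated data. Here is a short sketch (not from the slides) in which the true coefficients are assumed, so both sides of the identity can be compared directly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
beta0, beta1 = 698.9, -2.28        # assumed "true" population coefficients
x = rng.normal(20, 2, n)
u = rng.normal(0, 9, n)            # errors are observable here only because
                                   # the data are simulated
y = beta0 + beta1 * x + u

# OLS slope estimate
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

# Right-hand side of the identity derived above
rhs = np.sum((x - x.mean()) * u) / np.sum((x - x.mean())**2)

print(b1_hat - beta1, rhs)         # identical up to floating-point error
```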
$$\mathrm{var}(\hat{\beta}_1 - \beta_1) \approx \frac{\mathrm{var}\!\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)\,u_i\right]}{(\sigma_X^2)^2} = \frac{1}{n}\cdot\frac{\mathrm{var}[(X_i - \mu_X)\,u_i]}{(\sigma_X^2)^2}$$
where the final equality uses assumption 2.
Summary so far
1. β̂1 is unbiased: E(β̂1) = β1 – just like Ȳ!
2. var(β̂1) is inversely proportional to n – just like Ȳ! (A simulation illustrating both points follows below.)
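A small Monte Carlo sketch (not part of the slides) illustrating both points: across repeated samples the average slope estimate is close to β1, and its variance shrinks roughly like 1/n. The coefficients and distributions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 698.9, -2.28        # assumed "true" coefficients

def ols_slope(n):
    """Draw one sample of size n and return the OLS slope estimate."""
    x = rng.normal(20, 2, n)
    y = beta0 + beta1 * x + rng.normal(0, 9, n)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

for n in (50, 200, 800):
    draws = np.array([ols_slope(n) for _ in range(5000)])
    # Mean of the estimates is close to beta1 (unbiasedness);
    # variance shrinks roughly in proportion to 1/n.
    print(n, draws.mean(), draws.var())
```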
$$\hat{\beta}_1 - \beta_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right) s_X^2} \approx \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\sigma_X^2}, \quad \text{where } v_i = (X_i - \bar{X})\,u_i$$
so, in large samples,
$$\hat{\beta}_1 \sim N\!\left(\beta_1,\ \frac{\sigma_v^2}{n(\sigma_X^2)^2}\right), \quad \text{where } v_i = (X_i - \mu_X)\,u_i$$
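A simulation sketch (not from the slides) of this approximation: if u is drawn independently of X (a homoskedastic special case assumed only for this example), then σ_v² = σ_X²·σ_u², and the Monte Carlo variance of β̂1 should be close to σ_v²/(n(σ_X²)²):

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1 = 698.9, -2.28            # assumed "true" coefficients
mu_x, sigma_x, sigma_u = 20.0, 2.0, 9.0
n = 400

# With u independent of X and E[u] = 0 (assumed for this example),
# sigma_v^2 = var[(X - mu_X) u] = sigma_X^2 * sigma_u^2, so the
# large-sample formula gives var(beta1-hat) = sigma_v^2 / (n * (sigma_X^2)^2).
sigma_v2 = sigma_x**2 * sigma_u**2
formula_var = sigma_v2 / (n * (sigma_x**2)**2)

def ols_slope():
    """One simulated sample of size n and its OLS slope estimate."""
    x = rng.normal(mu_x, sigma_x, n)
    y = beta0 + beta1 * x + rng.normal(0, sigma_u, n)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

draws = np.array([ols_slope() for _ in range(10_000)])
print(draws.var(), formula_var)        # the two variances should be close
```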
The math
$$\mathrm{var}(\hat{\beta}_1 - \beta_1) = \frac{1}{n}\cdot\frac{\mathrm{var}[(X_i - \mu_X)\,u_i]}{(\sigma_X^2)^2}$$
where σX² = var(Xi). The variance of X appears (squared) in the denominator – so increasing the spread of X decreases the variance of β̂1.
The intuition
If there is more variation in X, then there is more
information in the data that you can use to fit the
regression line. This is most easily seen in a figure…
The number of black and blue dots is the same. With which set of points would you get a more accurate regression line?
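Since the figure itself is not reproduced here, a short simulation sketch (an illustration, not part of the slides) makes the same point: doubling the spread of X cuts the variance of the slope estimator by roughly a factor of four. All parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1 = 698.9, -2.28            # assumed "true" coefficients
n, reps = 100, 5000

def slope_variance(sigma_x):
    """Monte Carlo variance of the OLS slope for a given spread of X."""
    draws = []
    for _ in range(reps):
        x = rng.normal(20, sigma_x, n)
        y = beta0 + beta1 * x + rng.normal(0, 9, n)
        draws.append(np.sum((x - x.mean()) * (y - y.mean()))
                     / np.sum((x - x.mean())**2))
    return np.var(draws)

# Larger spread of X -> more information -> smaller var(beta1-hat).
print(slope_variance(1.0), slope_variance(2.0))
```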