Overview
Now that we have the sampling distribution of the OLS estimator,
we are ready to perform hypothesis tests about β1 and to
construct confidence intervals for its estimate.
Also, we will cover some loose ends about regression:
Heteroskedasticity and homoskedasticity (this is new)
The efficiency of OLS (this is also new)
Use of the t-statistic in hypothesis testing (new but not surprising)
Review
The Model:
Yi = β0 + β1Xi + ui, i = 1, …, n
The Least Squares Assumptions (for i = 1, …, n):
1. E(ui | Xi = x) = 0.
2. (Xi, Yi) are i.i.d. across i.
3. Large outliers are rare (E(Yi⁴) < ∞, E(Xi⁴) < ∞).
The Sampling Distribution of β̂1:
Under the Least Squares Assumptions, for n large, β̂1 is approximately distributed
    β̂1 ~ N( β1 , σv² / (n·σX⁴) ),  where vi = (Xi − μX)ui
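As a quick sanity check (my own illustration, not from the slides), a small Monte Carlo with a homoskedastic data-generating process shows the slope estimator's sampling variance matching σv²/(n·σX⁴):

```python
import numpy as np

# Monte Carlo check of the large-sample distribution of the OLS slope.
# Assumed DGP for illustration: Y = 1 + 2*X + u, with X ~ N(0,1) and
# u ~ N(0,1) independent of X, so sigma_v^2 = sigma_X^2 * sigma_u^2 = 1
# and the predicted variance of beta1_hat is 1/(n * sigma_X^4) = 1/n.
rng = np.random.default_rng(0)
n, reps = 100, 5000
slopes = np.empty(reps)
for r in range(reps):
    X = rng.normal(0.0, 1.0, n)
    u = rng.normal(0.0, 1.0, n)
    Y = 1.0 + 2.0 * X + u
    Xd = X - X.mean()
    slopes[r] = (Xd * (Y - Y.mean())).sum() / (Xd ** 2).sum()

print(slopes.mean())  # close to the true beta1 = 2
print(slopes.var())   # close to 1/n = 0.01
```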
Hypothesis Testing and the Standard Error of β̂1
(SW Section 5.1)
The objective is to test a hypothesis, like β1 = 0, using data to
reach a tentative conclusion about whether the (null) hypothesis
can be rejected.
General setup
Null hypothesis and two-sided alternative:
    H0: β1 = β1,0  vs.  H1: β1 ≠ β1,0
where β1,0 is the hypothesized value under the null.
Null hypothesis and one-sided alternative:
    H0: β1 = β1,0  vs.  H1: β1 < β1,0
t-statistic for β1:
    t = (β̂1 − β1,0) / SE(β̂1)
where SE(β̂1) is the square root of an estimator of the variance of the sampling distribution of β̂1. Recall that
    var(β̂1) = var[(Xi − μX)ui] / (n·(σX²)²) = σv² / (n·σX⁴),  where vi = (Xi − μX)ui.
Heteroskedasticity-robust standard error of β̂1:
    SE(β̂1) = √{ (1/n) · [ (1/(n−2)) Σi v̂i² ] / [ (1/n) Σi (Xi − X̄)² ]² },  where v̂i = (Xi − X̄)ûi.
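The SE formula above translates directly into code; a minimal numpy sketch (the function name and simulated data are my own, not from the slides):

```python
import numpy as np

def ols_robust(X, Y):
    """OLS slope, heteroskedasticity-robust SE, and t-stat for H0: beta1 = 0."""
    n = len(X)
    Xd = X - X.mean()
    b1 = (Xd * (Y - Y.mean())).sum() / (Xd ** 2).sum()
    b0 = Y.mean() - b1 * X.mean()
    u_hat = Y - b0 - b1 * X            # OLS residuals
    v_hat = Xd * u_hat                 # v_hat_i = (X_i - Xbar) * u_hat_i
    # SE = sqrt{ (1/n) * [ (1/(n-2)) * sum(v_hat^2) ] / [ (1/n) * sum(Xd^2) ]^2 }
    var_b1 = ((v_hat ** 2).sum() / (n - 2)) / n / ((Xd ** 2).mean() ** 2)
    se = np.sqrt(var_b1)
    return b1, se, b1 / se

rng = np.random.default_rng(1)
X = rng.uniform(10.0, 30.0, 420)
u = rng.normal(0.0, 0.1 * X)          # heteroskedastic errors
Y = 700.0 - 2.28 * X + u
b1, se, t = ols_robust(X, Y)
print(round(b1, 2), round(se, 3), round(t, 1))
```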
To test H0: β1 = β1,0 against H1: β1 ≠ β1,0, construct the t-statistic
    t = (β̂1 − β1,0) / SE(β̂1)
Example (California test scores): β̂1 = −2.28, SE(β̂1) = 0.52, so the t-statistic testing β1,0 = 0 is
    t = (−2.28 − 0) / 0.52 = −4.38
Confidence Intervals for β1 (SW Section 5.2)
A 95% confidence interval for β1 is
    {β̂1 ± 1.96·SE(β̂1)}
Summary of statistical inference about β0 and β1:
Estimation:
OLS estimators β̂0 and β̂1
β̂0 and β̂1 have approximately normal sampling distributions in large samples
Testing:
H0: β1 = β1,0 vs. β1 ≠ β1,0 (β1,0 is the value of β1 under H0)
t = (β̂1 − β1,0) / SE(β̂1)
p-value = area under the standard normal outside |t^act| (large n)
Confidence Intervals:
95% confidence interval for β1 is {β̂1 ± 1.96·SE(β̂1)}
This is the set of values of β1 that are not rejected at the 5% level
The 95% CI contains the true β1 in 95% of all samples.
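Plugging the test-score estimates (β̂1 = −2.28, SE = 0.52) into this formula takes only a couple of lines:

```python
# 95% CI for beta1 using the estimates from the test score regression
b1_hat, se = -2.28, 0.52
ci = (b1_hat - 1.96 * se, b1_hat + 1.96 * se)
print(round(ci[0], 2), round(ci[1], 2))  # -3.3 -1.26
```

This matches the [95% Conf. Interval] column in the Stata output later in these notes.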
Now
Heteroskedasticity and homoskedasticity (this is new)
The efficiency of OLS (this is also new)
Comment on the t-statistic in hypothesis testing
Homoskedasticity in a picture: [figure]
Heteroskedasticity in a picture: [figure]
Heteroskedastic or homoskedastic? [three example figures]
If var(ui | Xi = x) = σu² (homoskedasticity), then
    var[(Xi − μX)ui] = E[(Xi − μX)²ui²] = σX²·σu²
so
    var(β̂1) = var[(Xi − μX)ui] / (n·(σX²)²) = σu² / (n·σX²)
Along with this homoskedasticity-only variance formula, we have homoskedasticity-only standard errors:
Homoskedasticity-only standard error formula:
    SE(β̂1) = √{ (1/n) · [ (1/(n−2)) Σi ûi² ] / [ (1/n) Σi (Xi − X̄)² ] }
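To see the two formulas disagree, here is a sketch on simulated heteroskedastic data (the data-generating process is my own assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X = rng.uniform(0.0, 10.0, n)
u = rng.normal(0.0, 0.5 * X + 0.01)   # error spread grows with X: heteroskedastic
Y = 2.0 + 1.0 * X + u

Xd = X - X.mean()
b1 = (Xd * (Y - Y.mean())).sum() / (Xd ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
u_hat = Y - b0 - b1 * X

# Homoskedasticity-only SE: sqrt{ s_u^2 / sum(Xd^2) }
s_u2 = (u_hat ** 2).sum() / (n - 2)
se_homo = np.sqrt(s_u2 / (Xd ** 2).sum())

# Heteroskedasticity-robust SE
v_hat = Xd * u_hat
se_robust = np.sqrt(((v_hat ** 2).sum() / (n - 2)) / n / ((Xd ** 2).mean() ** 2))

print(se_homo, se_robust)  # the two formulas give different answers here
```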
Practical implications
The homoskedasticity-only formula for the standard error of β̂1
and the heteroskedasticity-robust formula differ, so in general
you get different standard errors using the different formulas.
Heteroskedasticity-robust standard errors in Stata:
                                             Number of obs =     420
                                             F(  1,   418) =   19.26
                                             Prob > F      =  0.0000
                                             R-squared     =  0.0512
                                             Root MSE      =  18.581

------------------------------------------------------------------------
         |               Robust
 testscr |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------
     str |  -2.279808   .5194892    -4.39   0.000    -3.300945  -1.258671
   _cons |    698.933   10.36436    67.44   0.000     678.5602   719.3057
------------------------------------------------------------------------
(The homoskedasticity-only formula is the default in most regression software, so you must explicitly request robust standard errors to guard against heteroskedasticity.)
The two formulas coincide (when n is large) in the special
case of homoskedasticity.
So, you should always use heteroskedasticity-robust standard
errors.
β̂1 is a linear estimator: it can be written as a weighted average of Y1, …, Yn,
    β̂1 = Σi wiYi,  where  wi = (Xi − X̄) / Σj (Xj − X̄)²
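A quick numerical check (simulated data, my own) that this weighted-average form reproduces the usual OLS slope:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(5.0, 2.0, 50)
Y = 1.0 + 0.7 * X + rng.normal(0.0, 1.0, 50)

Xd = X - X.mean()
w = Xd / (Xd ** 2).sum()                 # the OLS weights w_i
b1_weighted = (w * Y).sum()              # beta1_hat as sum_i w_i * Y_i
b1_usual = (Xd * (Y - Y.mean())).sum() / (Xd ** 2).sum()
print(b1_weighted, b1_usual)             # agree up to floating point
```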
The foregoing results are impressive, but these results and the
OLS estimator have important limitations.
1. The Gauss-Markov theorem really isn't that compelling:
The condition of homoskedasticity often doesn't hold
(homoskedasticity is special)
2. OLS can be sensitive to outliers, and if there are big outliers
other estimators can be more efficient (have a smaller
variance). One such estimator is the least absolute deviations
(LAD) estimator:
    min(b0,b1) Σi |Yi − (b0 + b1Xi)|
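A rough illustration of LAD's robustness, using a crude pure-numpy grid search on a toy dataset of my own (real applications would use a quantile-regression routine):

```python
import numpy as np

# Ten points on the line Y = 1 + 2X, then one large outlier.
X = np.arange(10, dtype=float)
Y = 1.0 + 2.0 * X
Y[9] += 50.0

# OLS slope: pulled far from the true slope 2 by the outlier.
Xd = X - X.mean()
ols_b1 = (Xd * (Y - Y.mean())).sum() / (Xd ** 2).sum()

# LAD via a crude grid search over (b0, b1), minimizing sum |Y - (b0 + b1*X)|.
best_loss, lad_b0, lad_b1 = np.inf, 0.0, 0.0
for b0 in np.arange(-2.0, 4.01, 0.1):
    for b1 in np.arange(0.0, 4.01, 0.1):
        loss = np.abs(Y - (b0 + b1 * X)).sum()
        if loss < best_loss:
            best_loss, lad_b0, lad_b1 = loss, b0, b1

print(round(ols_b1, 2))                    # about 4.73
print(round(lad_b0, 1), round(lad_b1, 1))  # about 1.0 2.0: LAD shrugs off the outlier
```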
For n < 30, the t critical values can be a fair bit larger than the
N(0,1) critical values.
For n > 50 or so, the difference between the t(n−2) and N(0,1)
distributions is negligible. Recall the Student t table:
[Student t critical values for degrees of freedom 10, 20, 30, 60, 120, ∞]
Practical implications
If n < 50 then use the t(n−2) instead of the N(0,1) critical values
for hypothesis tests and confidence intervals.
If n > 50, we can rely on the large-n results presented earlier,
based on the CLT, to perform hypothesis tests and construct
confidence intervals using the large-n normal approximation.
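The shrinking gap between t(n−2) and N(0,1) critical values is easy to tabulate (a sketch assuming scipy is available):

```python
from scipy.stats import norm, t

# Two-sided 5% critical values: Student t vs. standard normal.
for df in (10, 20, 30, 60, 120):
    print(df, round(t.ppf(0.975, df), 3))
print("normal", round(norm.ppf(0.975), 3))
# df = 10 gives about 2.23; by df = 120 the value is within 0.02 of 1.96.
```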