matrix ; beduc=init(100,1,0)$
proc$
draw ; n=25 $
regress; quietly ; lhs=hhninc ; rhs = one,educ $
matrix ; beduc(i)=b(2) $
sample;all$
endproc$
execute ; i=1,100 $
histogram;rhs=beduc; boxplot $
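The commands above are LIMDEP/NLOGIT syntax: draw 100 random samples of n = 25, regress household income (hhninc) on a constant and education (educ) in each one, store the 100 slope estimates, and plot their histogram. Below is a minimal Python sketch of the same sampling experiment; the simulated "population", coefficient values, and noise level are assumptions chosen only for illustration.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Illustrative "population": income depends on education plus noise (assumed values).
N_POP, BETA0, BETA1, SIGMA = 10_000, 0.10, 0.02, 0.15
educ = rng.uniform(7, 18, N_POP)
hhninc = BETA0 + BETA1 * educ + rng.normal(0.0, SIGMA, N_POP)

R, n = 100, 25                                       # 100 replications, samples of size 25
beduc = np.empty(R)
for r in range(R):
    idx = rng.choice(N_POP, size=n, replace=False)   # draw ; n=25
    X = np.column_stack([np.ones(n), educ[idx]])     # rhs = one, educ
    b, *_ = np.linalg.lstsq(X, hhninc[idx], rcond=None)
    beduc[r] = b[1]                                  # matrix ; beduc(i)=b(2)

plt.hist(beduc, bins=15)                             # histogram ; rhs = beduc
plt.title("Sampling distribution of the education coefficient")
plt.show()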
$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$
b = the true parameter plus sampling error.
Also, $b = (X'X)^{-1}\sum_{i=1}^{n} x_i y_i$, and
$b - \beta = (X'X)^{-1}X'\varepsilon = (X'X)^{-1}\sum_{i=1}^{n} x_i \varepsilon_i = \sum_{i=1}^{n} (X'X)^{-1} x_i \varepsilon_i = \sum_{i=1}^{n} v_i \varepsilon_i$
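As a quick numeric check of this decomposition, the sketch below (simulated data, all parameter values assumed) verifies that the least squares b equals $\beta$ plus the weighted sum of the disturbances, $\sum_i v_i \varepsilon_i$.

import numpy as np

rng = np.random.default_rng(1)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 0.5, -2.0])              # assumed true coefficients
eps = rng.normal(0.0, 0.3, n)
y = X @ beta + eps

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                          # least squares
V = XtX_inv @ X.T                              # column i is v_i = (X'X)^{-1} x_i
print(np.allclose(b, beta + V @ eps))          # True: b = beta + sum_i v_i * eps_i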
$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$
$E[b\,|\,X] = \beta$  (We established this earlier.)
$\mathrm{Var}[b\,|\,X] = E[(b - \beta)(b - \beta)'\,|\,X]$
$= E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\,|\,X]$
$= (X'X)^{-1}X'E[\varepsilon\varepsilon'\,|\,X]X(X'X)^{-1}$
$= (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1}$
$= \sigma^2 (X'X)^{-1}X'IX(X'X)^{-1}$
$= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}$
$= \sigma^2 (X'X)^{-1}$
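The conditional variance result can also be checked numerically: hold X fixed, redraw $\varepsilon$ many times, and compare the empirical covariance matrix of b with $\sigma^2(X'X)^{-1}$. A sketch on simulated data; the design and $\sigma$ are assumptions.

import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100, 0.5                            # assumed sample size and sigma
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])                    # assumed true coefficients
XtX_inv = np.linalg.inv(X.T @ X)

R = 20_000
bs = np.empty((R, 2))
for r in range(R):
    y = X @ beta + rng.normal(0.0, sigma, n)   # X held fixed, new epsilon each time
    bs[r] = XtX_inv @ X.T @ y

print(np.cov(bs, rowvar=False))                # empirical Var[b | X]
print(sigma**2 * XtX_inv)                      # theoretical sigma^2 (X'X)^{-1}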
Unconditional Variance of the Least Squares Estimator
$b = (X'X)^{-1}X'y$
$E[b\,|\,X] = \beta$
$\mathrm{Var}[b\,|\,X] = \sigma^2 (X'X)^{-1}$
$\mathrm{Var}[b] = E\{\mathrm{Var}[b\,|\,X]\} + \mathrm{Var}\{E[b\,|\,X]\}$
$= \sigma^2 E[(X'X)^{-1}] + \mathrm{Var}\{\beta\}$
$= \sigma^2 E[(X'X)^{-1}] + 0$
We will ultimately need to estimate $E[(X'X)^{-1}]$.
We will use the only information we have, $X$ itself.
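When X itself is random, the decomposition says $\mathrm{Var}[b] = \sigma^2 E[(X'X)^{-1}]$. A sketch of that calculation, redrawing X in every replication (design and parameter values are assumptions):

import numpy as np

rng = np.random.default_rng(3)
n, sigma, R = 30, 1.0, 20_000                  # assumed design
beta = np.array([0.5, 1.5])

bs = np.empty((R, 2))
mean_XtX_inv = np.zeros((2, 2))
for r in range(R):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])   # X is redrawn each time
    XtX_inv = np.linalg.inv(X.T @ X)
    mean_XtX_inv += XtX_inv / R
    y = X @ beta + rng.normal(0.0, sigma, n)
    bs[r] = XtX_inv @ X.T @ y

print(np.cov(bs, rowvar=False))                # unconditional Var[b]
print(sigma**2 * mean_XtX_inv)                 # sigma^2 * E[(X'X)^{-1}] (sample average)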
Recall that b is unbiased: $E[b\,|\,X] = \beta$.
$v_i = (X'X)^{-1} x_i$, where $x_i$ is row i of X (written as a column vector).
We derived $E[b\,|\,X]$ and $\mathrm{Var}[b\,|\,X]$ earlier. The distribution
of $b\,|\,X$ is that of a linear combination of the disturbances $\varepsilon_i$.
If $\varepsilon_i$ has a normal distribution, denoted $\varepsilon_i \sim N[0, \sigma^2]$, then
$b\,|\,X = \beta + A\varepsilon$, where $\varepsilon \sim N[0, \sigma^2 I]$ and $A = (X'X)^{-1}X'$.
$b\,|\,X \sim N[\beta,\, A(\sigma^2 I)A'] = N[\beta,\, \sigma^2 (X'X)^{-1}]$.
Note how b inherits its stochastic properties from $\varepsilon$.
$e = y - Xb$
$= y - X(X'X)^{-1}X'y$
$= [I - X(X'X)^{-1}X']y$
$= My = M(X\beta + \varepsilon) = MX\beta + M\varepsilon = M\varepsilon$  (since $MX = 0$)
$e'e = (M\varepsilon)'(M\varepsilon) = \varepsilon'M'M\varepsilon = \varepsilon'MM\varepsilon = \varepsilon'M\varepsilon$
$E[e'e\,|\,X] = E[\varepsilon'M\varepsilon\,|\,X] = \sigma^2\,\mathrm{trace}(M)$
What is the trace of M? The trace of a square matrix is the sum of its diagonal elements.
(Result A-108) M is idempotent, so its trace equals its rank.
(Theorem A.4) Its rank equals the number of nonzero characteristic roots.
Characteristic roots: the signature of a matrix = spectral decomposition
= eigen ("own") value decomposition.
(Definition A.16) $A = C \Lambda C'$, where
C = a matrix of columns such that $CC' = C'C = I$,
$\Lambda$ = a diagonal matrix of the characteristic roots.
(Elements of $\Lambda$ may be zero.)
Since $\mathrm{trace}(M) = \mathrm{trace}(I_n) - \mathrm{trace}[X(X'X)^{-1}X'] = n - \mathrm{trace}[(X'X)^{-1}X'X] = n - K$,
$E[e'e\,|\,X] = \sigma^2 (n - K)$, so $s^2 = e'e/(n - K)$ is unbiased for $\sigma^2$ regardless of
whether $\varepsilon$ is normally distributed. The estimator of $\mathrm{Var}[b\,|\,X] = \sigma^2 (X'X)^{-1}$ is then $s^2 (X'X)^{-1}$.
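A small numeric sketch of these facts about M (simulated X and assumed dimensions): M is idempotent, its eigenvalues are zeros and ones, its trace is n - K, and $s^2 = e'e/(n-K)$ averages out to $\sigma^2$ across replications.

import numpy as np

rng = np.random.default_rng(4)
n, K, sigma = 40, 3, 0.7                         # assumed dimensions and sigma
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T # the residual maker

print(np.allclose(M @ M, M))                     # idempotent
print(np.trace(M), n - K)                        # trace(M) = n - K
print(int(np.sum(np.linalg.eigvalsh(M) > 0.5)))  # n - K eigenvalues equal to 1

beta = rng.normal(size=K)                        # true coefficients (value irrelevant here)
s2_draws = []
for _ in range(5_000):
    y = X @ beta + rng.normal(0.0, sigma, n)
    e = M @ y                                    # e = M*epsilon since MX = 0
    s2_draws.append(e @ e / (n - K))
print(np.mean(s2_draws), sigma**2)               # s^2 is unbiased for sigma^2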
Consider one of the coefficients, the unbiased estimator of a single $\beta_k$: $E[b_k] = \beta_k$.
Call the variable in question z, collect the remaining regressors in X, and let c denote the
least squares coefficient on z. Then
$\mathrm{Var}[c] = \dfrac{\sigma^2}{z^{*\prime}z^{*}}$, where $z^*$ = the vector of residuals from the regression of z on X.
The $R^2$ in that regression is $R^2_{z|X} = 1 - \dfrac{z^{*\prime}z^{*}}{\sum_{i=1}^{n}(z_i - \bar z)^2}$, so
$z^{*\prime}z^{*} = \left(1 - R^2_{z|X}\right)\sum_{i=1}^{n}(z_i - \bar z)^2$. Therefore,
$\mathrm{Var}[c] = \dfrac{\sigma^2}{z'M_X z} = \dfrac{\sigma^2}{\left(1 - R^2_{z|X}\right)\sum_{i=1}^{n}(z_i - \bar z)^2}$.
All else constant, the variance of the coefficient on z rises as the fit
in the regression of z on the other variables goes up. If the fit is
perfect, the variance becomes infinite.
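The equivalence between the full-matrix variance and the partialled-out formula can be seen directly; a sketch with a simulated z that is correlated with the other regressor (all data and parameter values are assumptions):

import numpy as np

rng = np.random.default_rng(5)
n, sigma = 200, 1.0                              # assumed design
x1 = rng.normal(size=n)
z = 0.9 * x1 + 0.3 * rng.normal(size=n)          # z is correlated with x1
X = np.column_stack([np.ones(n), x1, z])         # full regressor matrix

# Full-matrix formula: Var[c] is the (z, z) diagonal element of sigma^2 (X'X)^{-1}.
var_c_full = sigma**2 * np.linalg.inv(X.T @ X)[2, 2]

# Partialled-out formula: regress z on the other variables, keep the residuals z*.
W = X[:, :2]                                     # the other variables (constant, x1)
z_star = z - W @ np.linalg.lstsq(W, z, rcond=None)[0]
var_c_partial = sigma**2 / (z_star @ z_star)

r2_z = 1.0 - (z_star @ z_star) / np.sum((z - z.mean())**2)
print(var_c_full, var_c_partial)                 # identical
print(sigma**2 / ((1.0 - r2_z) * np.sum((z - z.mean())**2)))   # the same number again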
"Detecting" multicollinearity?
1
Variance inflation factor: VIF(z) = .
1 R z|X
2
Condition number
larger than 30 is
‘large.’
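Both diagnostics are easy to compute. A sketch on simulated, nearly collinear data (the formulas follow the definitions above; the data and everything else are assumptions):

import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)        # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

# VIF for x2: 1 / (1 - R^2) from regressing x2 on the other variables.
W = X[:, :2]
resid = x2 - W @ np.linalg.lstsq(W, x2, rcond=None)[0]
r2 = 1.0 - (resid @ resid) / np.sum((x2 - x2.mean())**2)
print("VIF(x2) =", 1.0 / (1.0 - r2))

# Condition number: sqrt(largest / smallest eigenvalue) of the column-scaled X'X.
Xs = X / np.sqrt(np.sum(X**2, axis=0))           # columns scaled to unit length
eig = np.linalg.eigvalsh(Xs.T @ Xs)
print("condition number =", np.sqrt(eig.max() / eig.min()))   # > 30 flags trouble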
In the Filippelli test, Stata found two coefficients so collinear that it dropped them from
the analysis. Most other statistical software packages have done the same thing, and
most authors have interpreted this result as acceptable for this test.
What is better: Include a variable that causes collinearity, or drop the variable
and suffer from a biased estimator?
Mean squared error would be the basis for comparison.
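A small simulation along those lines: compare the mean squared error of the coefficient on x1 when the nearly collinear x2 is kept (unbiased, large variance) versus dropped (biased, small variance). All parameter values are assumptions chosen to make the trade-off visible.

import numpy as np

rng = np.random.default_rng(7)
n, sigma, R = 50, 1.0, 10_000
beta1, beta2 = 1.0, 0.2                          # assumed: x2 matters only a little

mse_long, mse_short = 0.0, 0.0
for _ in range(R):
    x1 = rng.normal(size=n)
    x2 = 0.98 * x1 + 0.05 * rng.normal(size=n)   # severe collinearity
    y = beta1 * x1 + beta2 * x2 + rng.normal(0.0, sigma, n)

    # "Long" regression: keep the collinear x2; the x1 coefficient is unbiased but noisy.
    X = np.column_stack([x1, x2])
    b_long = np.linalg.lstsq(X, y, rcond=None)[0][0]

    # "Short" regression: drop x2; the x1 coefficient is biased but has smaller variance.
    b_short = (x1 @ y) / (x1 @ x1)

    mse_long += (b_long - beta1) ** 2 / R
    mse_short += (b_short - beta1) ** 2 / R

print(mse_long, mse_short)      # compare mean squared errors of the two estimators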
Some generalities. Assuming X has full rank, regardless of the condition of X'X:
b is still unbiased;
the Gauss-Markov theorem still holds.
Both variable coefficients are significant and must be included in the model (as per
specification).