
Microeconometrics

Lecture 2 - Instrumental variables


Alex Armand
aarmand@unav.es
Universidad de Navarra

January 2016

Alex Armand aarmand@unav.es

Microeconometrics

1/76

Lecture summary

Exogeneity: an example

Instrumental Variables Estimator in the Simple Model

IV Estimation of a General Equation

Two Stage Least Squares

Application: Endogeneity of Children in Labor Supply

Testing for Endogeneity


Exogeneity: an example
Malaria is transmitted by mosquitoes, at night, from person to person
Eritrea has been particularly affected by malaria
The economy depends on agriculture, which is possible mainly during the rainy season (the malaria season)

What are the available technologies to eradicate the risk of infection?

Insecticide-Treated mosquito bed-nets


Larval Habitat Management


Indoor Residual Spraying


IRS campaign
IRS was introduced on a voluntary basis: this can be effective in eradicating malaria in a low-transmission setting like Eritrea

Estimation

Estimate the effect of providing IRS on the percentage of individuals infected with malaria in village $j$
$Y_j$ is the percentage of individuals infected
$d_j$ is the percentage of individuals who received IRS
Estimate the impact of the policy by comparing treated and non-treated individuals
$$Y_j = \alpha + \beta d_j + X_j'\gamma + \epsilon_j$$
Is this correct?

Linear model and one explanatory variable

Simple linear model in the population
$$y = \beta_0 + \beta_1 x + u \tag{1}$$
where $u$ is thought to be correlated with $x$

If $\mathrm{Cov}(x, u) \neq 0$ ($x$ is endogenous), then ordinary least squares (OLS) will be inconsistent for $\beta_1$

If we have a rich set of controls, we might be able to break the link between $u$ and $x$, but that is an OLS solution.

Instrumental variable

An instrumental variable, $z$, for $x$ has two properties
$$\mathrm{Cov}(z, u) = 0 \quad \text{(exogeneity)} \tag{2}$$
$$\mathrm{Cov}(z, x) \neq 0 \quad \text{(relevance)} \tag{3}$$
A key difference is that while we must take (2) on faith, we can always test (3): given a sample of data, we test the null that $z$ and $x$ are uncorrelated (and hope to strongly reject it).

Identification of $\beta_1$

Apply the covariance operator with $z$ to $y = \beta_0 + \beta_1 x + u$:
$$\mathrm{Cov}(z, y) = \beta_1 \mathrm{Cov}(z, x) + \mathrm{Cov}(z, u)$$
Identify $\beta_1$ using the exogeneity assumption (2) and dividing by $\mathrm{Cov}(z, x)$, which is nonzero by (3):
$$\beta_1 = \frac{\mathrm{Cov}(z, y)}{\mathrm{Cov}(z, x)} \tag{4}$$
$\beta_1$ is identified because it is a function of population moments of variables we can observe.

IV Estimator

Replacing the population covariances with the sample covariances
$$\hat\beta_{1,IV} = \frac{\sum_{i=1}^N (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^N (z_i - \bar z)(x_i - \bar x)} = \beta_1 + \frac{N^{-1}\sum_{i=1}^N (z_i - \bar z)u_i}{N^{-1}\sum_{i=1}^N (z_i - \bar z)(x_i - \bar x)}$$
Consistency of $\hat\beta_{1,IV}$ follows by the law of large numbers and the algebra of plims
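A minimal simulation sketch of the OLS and IV slope estimators, in Python (standard library only). The data-generating process, the sample size, and the names `cov` and `simulate` are illustrative assumptions, not part of the lecture: x and u share a common shock, so OLS is inconsistent, while z is exogenous and relevant by construction.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=20000, beta0=0.5, beta1=1.0, seed=0):
    """Draw (y, x, z) where x is endogenous and z is a valid instrument."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)        # instrument, independent of u
        vi = rng.gauss(0, 1)        # common shock that makes x endogenous
        ui = vi + rng.gauss(0, 1)   # structural error, Cov(x, u) = 1
        xi = zi + vi                # relevance: Cov(z, x) = 1
        x.append(xi)
        z.append(zi)
        y.append(beta0 + beta1 * xi + ui)
    return y, x, z


y, x, z = simulate()
beta_ols = cov(x, y) / cov(x, x)    # plim is beta1 + Cov(x, u) / Var(x)
beta_iv = cov(z, y) / cov(z, x)     # ratio of sample covariances
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
```

Under this assumed design Cov(x, u)/Var(x) = 1/2, so the OLS slope should settle near 1.5, while the IV ratio should be close to the true slope of 1.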

Endogenous instrument

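Use a slightly endogenous instrument rather than OLS?

Allow $u$ to be correlated with both $x$ and $z$
Writing, for example, $\mathrm{Cov}(x, u) = \sigma_x \sigma_u \mathrm{Corr}(x, u)$,
$$\mathrm{plim}\, \hat\beta_{1,OLS} = \beta_1 + \frac{\sigma_u}{\sigma_x}\, \mathrm{Corr}(x, u) \tag{5}$$
and
$$\mathrm{plim}\, \hat\beta_{1,IV} = \beta_1 + \frac{\sigma_u}{\sigma_x}\, \frac{\mathrm{Corr}(z, u)}{\mathrm{Corr}(z, x)} \tag{6}$$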

Weak instrument I

z is a weak instrument if Corr (z, x) is small


A small correlation between z and u can produce a larger asymptotic
bias than OLS
In economics, very common to see IV estimates that are larger in
magnitude than OLS estimates
Weak instruments lead to large asymptotic standard errors


Weak instrument II
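Under a homoskedasticity assumption, with $\rho_{z,x} = \mathrm{Corr}(z, x)$,
$$\mathrm{Avar}\, \sqrt{N}(\hat\beta_{1,IV} - \beta_1) = \frac{\sigma_u^2}{\sigma_x^2 \rho_{z,x}^2} \tag{7}$$
Asymptotic standard deviation of $\sqrt{N}(\hat\beta_{1,IV} - \beta_1)$ is
$$\frac{\sigma_u}{\sigma_x} \cdot \frac{1}{|\rho_{z,x}|}$$
When $|\rho_{z,x}|$ is small, the asymptotic variance can be very large

The formula for the OLS estimator omits $|\rho_{z,x}|$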
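The weak-instrument formulas (5), (6) and (7) can be evaluated directly. The sketch below is a small Python calculator; the function names and the numerical inputs are hypothetical, chosen only to show how a slightly endogenous but weak instrument can do worse than OLS.

```python
def ols_asy_bias(sigma_u, sigma_x, corr_xu):
    """plim of the OLS slope minus the true slope, as in equation (5)."""
    return (sigma_u / sigma_x) * corr_xu


def iv_asy_bias(sigma_u, sigma_x, corr_zu, corr_zx):
    """plim of the IV slope minus the true slope, as in equation (6)."""
    return (sigma_u / sigma_x) * corr_zu / corr_zx


def iv_asy_sd(sigma_u, sigma_x, corr_zx):
    """Asymptotic standard deviation of sqrt(N) times the IV error, from (7)."""
    return (sigma_u / sigma_x) / abs(corr_zx)


# Hypothetical values: x is clearly endogenous, z only slightly so, but z is weak.
bias_ols = ols_asy_bias(1.0, 1.0, corr_xu=0.25)
bias_iv = iv_asy_bias(1.0, 1.0, corr_zu=0.05, corr_zx=0.125)
sd_iv = iv_asy_sd(1.0, 1.0, corr_zx=0.125)
print(bias_ols, bias_iv, sd_iv)
```

With these inputs the IV inconsistency (0.05/0.125) exceeds the OLS inconsistency (0.25) even though Corr(z, u) is five times smaller than Corr(x, u), and the asymptotic standard deviation is inflated by a factor of 1/0.125.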

Example

Estimating the return to schooling via simple regression:
$$\log(wage) = \beta_0 + \beta_1 educ + u \tag{8}$$
Suggestions for z:
mother's education
number of siblings
distance to the nearest college at age 16
z binary, z = 1 if born in first quarter of year
z is a randomly assigned education grant during high school


General IV approach

Population model
$$y = x\beta + u \tag{9}$$
where $x$ is $1 \times K$, $\beta$ is $K \times 1$, and in the vast majority of cases, $x_1 = 1$

No such thing as an OLS or IV model

OLS and IV are different estimation methods that can be applied to the same model.
They are consistent under different assumptions.

Exclusion restriction

Let $z = (z_1, z_2, ..., z_L)$ be a $1 \times L$ vector, where $z_1 = 1$ almost always.

$z$ contains all exogenous elements of $x$

If one or more elements of $x$ is correlated with $u$, $z$ must contain some outside variables

$z$ is exogenous:
$$E(z'u) = 0 \tag{10}$$

Identification

Suppose $L = K$.
For example, $x = (1, x_2, ..., x_{K-1}, x_K)$ and $z = (1, x_2, ..., x_{K-1}, z_1)$
$x_K$ is possibly endogenous and $z_1$ serves as an IV for $x_K$.

Using (9) and (10),
$$E(z'y) = E(z'x)\beta + E(z'u) \tag{11}$$
$$E(z'y) = E(z'x)\beta \quad \text{by (10)} \tag{12}$$

Rank condition

If we assume the rank condition
$$\mathrm{rank}\, E(z'x) = K \tag{13}$$
then
$$\beta = [E(z'x)]^{-1} E(z'y) \tag{14}$$
This extends the moment condition for OLS, which is the special case $z = x$

IV Estimator

Given a random sample,
$$\hat\beta_{IV} = \left(\sum_{i=1}^N z_i'x_i\right)^{-1} \left(\sum_{i=1}^N z_i'y_i\right) \tag{15}$$
Using the algebra of plims and the WLLN, $\mathrm{plim}(\hat\beta_{IV}) = \beta$ under (10) and the rank condition (13)

Matrix notation

In matrix notation we can write
$$\hat\beta_{IV} = (Z'X/N)^{-1}(Z'Y/N) = (Z'X)^{-1}Z'Y,$$
where
$$\underset{n \times k}{X} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad \underset{n \times 1}{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \underset{n \times k}{Z} = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}$$

Instrument exogeneity

The condition $E(z'u) = 0$ is called instrument exogeneity

The rank condition (13) is also called instrument relevance
Without extra information we cannot test $E(z'u) = 0$ because $u$ is unobserved
We might have proxies for $u$, but then those would likely be in $x$.

Rank Condition I

We can estimate $E(z'x)$ so we can test the rank condition

Difficult in general, but easy with a single endogenous explanatory variable, $x_K$
Write the reduced form of $x_K$ as
$$x_K = \delta_1 + \delta_2 x_2 + ... + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K \tag{16}$$
where, by definition
$$E(r_K) = 0, \quad \mathrm{Cov}(x_j, r_K) = 0, \; j = 2, ..., K-1, \quad \mathrm{Cov}(z_1, r_K) = 0 \tag{17}$$

Rank Condition II
In other words, the linear projection of $x_K$ on $(1, x_2, ..., x_{K-1}, z_1)$ is
$$L(x_K | 1, x_2, ..., x_{K-1}, z_1) = \delta_1 + \delta_2 x_2 + ... + \delta_{K-1} x_{K-1} + \theta_1 z_1 \tag{18}$$
The rank condition (13) holds if and only if
$$\theta_1 \neq 0 \tag{19}$$
OLS consistently estimates the parameters of a linear projection (though it is not necessarily unbiased).
Need to reject
$$H_0: \theta_1 = 0 \tag{20}$$
in favor of (19) convincingly. Heteroskedasticity-robust inference can be used.

Do not care about the $\delta_j$ in (18), but $x_2, ..., x_{K-1}$ must be partialled out. (Simple correlation between $z_1$ and $x_K$ is not enough; we require that $x_K$ is partially correlated with $z_1$.)
$x_K$ can be discrete, continuous, or a hybrid. Regardless of the nature of $x_K$, the linear projection is well-defined. The IV estimator is consistent under $E(z'u) = 0$ if
$$L(x_K | 1, x_2, ..., x_{K-1}, z_1) \neq L(x_K | 1, x_2, ..., x_{K-1}).$$
This is just another way to say that, in a linear sense, $z_1$ helps to predict $x_K$ controlling for the other exogenous variables.
Regressing $x_K$ on $1, x_2, ..., x_{K-1}, z_1$ using the data is often called the first-stage regression.

Two Stage Least Squares

In some cases, we have more instruments than we need.

For example, if we can use mother's education as an IV, why not father's education?
Linear model
$$y = x\beta + u \tag{21}$$
$$E(z'u) = 0 \tag{22}$$
where $L = \dim(z) \geq \dim(x) = K$.

Overidentification

When $L > K$, we have more than one IV estimator. We say the model (21) is (potentially) overidentified.
When $L = K$ and the rank condition holds, the model is just identified.
Suppose $z_1$ and $z_2$ are IVs for $x_K$
Under a homoskedasticity assumption, the best IV for $x_K$ is the linear combination of all exogenous variables defined by the linear projection.

In general, the best vector of IVs for $x$ is the vector of linear projections of each element of $x$ on $z$
Write the LPs in error form as
$$\underset{1 \times K}{x} = \underset{1 \times L}{z}\, \underset{L \times K}{\Pi} + \underset{1 \times K}{r}$$
where $\Pi$ is the $L \times K$ matrix
$$\Pi = \underset{L \times L}{[E(z'z)]^{-1}}\; \underset{L \times K}{[E(z'x)]}$$
and $E(z'r) = 0$.

For each $x_j$ we can write
$$x_j = z\pi_j + r_j \equiv x_j^* + r_j$$
where $\pi_j$ ($L \times 1$) is the $j$th column of $\Pi$.

Exogenous variables act as their own instruments

In the general case, use
$$x^* = z\Pi$$
as the $1 \times K$ vector of instruments for $x$.

Because $z$ is exogenous, so is $x^*$
$$E(x^{*\prime}u) = 0$$
The rank condition becomes
$$\mathrm{rank}\, E(x^{*\prime}x) = K$$
Substituting for $x^* = z\Pi$
$$E(x^{*\prime}x) = \Pi' E(z'x) = E(x'z)[E(z'z)]^{-1}E(z'x).$$

2SLS assumptions

Formally, here are the first two assumptions for 2SLS, stated in the population.
Assumption 2SLS.1 (Exogenous Instruments): $E(z'u) = 0$.
Assumption 2SLS.2 (Rank Condition):
(a) $\mathrm{rank}\, E(z'z) = L$: rules out perfect collinearity among the exogenous variables (which means we cannot use linear combinations of exogenous variables as additional instruments).
(b) $\mathrm{rank}\, E(z'x) = K$: requires $L \geq K$.

Deriving 2SLS: two-step estimation

With
$$\beta = [E(x^{*\prime}x)]^{-1} E(x^{*\prime}y),$$
we need to worry about the unknown $\Pi$ because $x_i^* = z_i\Pi$.

Two-step estimation:
(1) Run the regression $x_i$ on $z_i$, $i = 1, ..., N$, to obtain $\hat\Pi = (Z'Z)^{-1}Z'X$.
Obtain the vector of fitted values,
$$\hat x_i = z_i\hat\Pi, \quad i = 1, ..., N$$

(2) Use $\hat x_i$ as the vector of IVs for $x_i$:
$$\hat\beta_{IV} = \left(\sum_{i=1}^N \hat x_i'x_i\right)^{-1} \left(\sum_{i=1}^N \hat x_i'y_i\right)$$
We can write this differently. Because
$$x_i = \hat x_i + \hat r_i$$
$$\sum_{i=1}^N \hat x_i'\hat r_i = 0 \quad \text{(by OLS FOCs)}$$

2SLS Estimator

So
$$\sum_{i=1}^N \hat x_i'x_i = \sum_{i=1}^N \hat x_i'\hat x_i$$
and then the IV estimator can be written as a two stage least squares estimator:
$$\hat\beta_{2SLS} = \left(N^{-1}\sum_{i=1}^N \hat x_i'\hat x_i\right)^{-1} \left(N^{-1}\sum_{i=1}^N \hat x_i'y_i\right).$$

2SLS residuals and orthogonality


The 2SLS residuals are defined as
$$\hat u_i = y_i - x_i\hat\beta_{2SLS},$$
where it is $x_i$, not $\hat x_i$, multiplying $\hat\beta_{2SLS}$.
An algebraic fact is that the $\hat u_i$ are orthogonal to $\hat x_i$ in the sample. This is the condition that determines $\hat\beta_{2SLS}$:
$$\sum_{i=1}^N \hat x_i'\hat u_i = 0$$

Using full data matrices and some algebra, we can write
$$\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X'Y$$
$$= [(X'Z)(Z'Z)^{-1}(Z'X)]^{-1} (X'Z)(Z'Z)^{-1}(Z'Y)$$
$$= \beta + [(X'Z/N)(Z'Z/N)^{-1}(Z'X/N)]^{-1} (X'Z/N)(Z'Z/N)^{-1}(Z'U/N)$$
where the last expression can be used to show consistency by applying the WLLN to each term, along with the rank condition and $E(z'u) = 0$
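The two-step construction can be sketched in plain Python (standard library only). Everything here is an illustrative assumption rather than the lecture's code: the simulated design (one endogenous regressor, two excluded instruments plus a constant), the true coefficients, and the helper names `solve` and `ols`, which solve the normal equations by Gauss-Jordan elimination.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix (copy)
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


rng = random.Random(1)
n, beta0, beta1 = 5000, 1.0, 2.0
Z, X, y = [], [], []
for _ in range(n):
    z1, z2, v, e = (rng.gauss(0, 1) for _ in range(4))
    x = z1 + z2 + v                  # endogenous regressor
    u = v + e                        # Cov(x, u) = 1
    Z.append([1.0, z1, z2])          # L = 3 instruments, including the constant
    X.append([1.0, x])               # K = 2 regressors
    y.append(beta0 + beta1 * x + u)

# Step 1: regress x on z and form the fitted values x_hat.
pi_hat = ols(Z, [r[1] for r in X])
x_hat = [sum(p * zj for p, zj in zip(pi_hat, z)) for z in Z]

# Step 2: regress y on (1, x_hat).
b_2sls = ols([[1.0, xh] for xh in x_hat], y)
b_ols = ols(X, y)

# 2SLS residuals use x, not x_hat; they are orthogonal to x_hat in the sample.
u_hat = [yi - b_2sls[0] - b_2sls[1] * r[1] for r, yi in zip(X, y)]
moment = sum(xh * uh for xh, uh in zip(x_hat, u_hat)) / n
print("2SLS:", b_2sls, "OLS:", b_ols, "moment:", moment)
```

In this assumed design the OLS slope converges to 2 + 1/3 while the 2SLS slope should be close to 2, and `moment` should be zero up to rounding error.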

Key Result:
Under 2SLS.1 and 2SLS.2, $\hat\beta_{2SLS}$ on a random sample is consistent for $\beta$.
For inference, it is useful to show
$$\sqrt{N}(\hat\beta_{2SLS} - \beta) = \left(N^{-1}\sum_{i=1}^N x_i^{*\prime}x_i^*\right)^{-1} \left(N^{-1/2}\sum_{i=1}^N x_i^{*\prime}u_i\right) + o_p(1)$$
where the $x_i^* = z_i\Pi$ are the linear projections.
It follows that
$$\sqrt{N}(\hat\beta_{2SLS} - \beta) \xrightarrow{d} \mathrm{Normal}(0, A^{-1}BA^{-1})$$
$$A = E(x_i^{*\prime}x_i^*)$$
$$B = E(u_i^2 x_i^{*\prime}x_i^*)$$

Assumption 2SLS.3 (Homoskedasticity)

$u^2$ is uncorrelated with all elements of $z$ as well as $z_j^2$ and $z_j z_h$. That is,
$$E(u^2 z'z) = E(u^2)E(z'z) \equiv \sigma^2 E(z'z).$$
Under 2SLS.1, 2SLS.2, and 2SLS.3
$$\sqrt{N}(\hat\beta_{2SLS} - \beta) \xrightarrow{d} \mathrm{Normal}(0, \sigma^2 A^{-1})$$

Consistent estimators of $\sigma^2$ and $A$:
$$\hat\sigma^2 = (N - K)^{-1} \sum_{i=1}^N \hat u_i^2$$
$$\hat A = N^{-1} \sum_{i=1}^N \hat x_i'\hat x_i$$
Under 2SLS.1, 2SLS.2, and 2SLS.3, we can use
$$\widehat{\mathrm{Avar}}(\hat\beta_{2SLS}) = \hat\sigma^2 \hat A^{-1}/N = \hat\sigma^2 (\hat X'\hat X)^{-1}$$

Pitfalls with 2SLS

A little endogeneity of one or more instruments can lead to large inconsistency if the instruments are weak

The standard errors of 2SLS can be large.

Angrist and Evans (AER, 1998)


Application: Endogeneity of Children in Labor Supply
Data are a subset from Angrist and Evans (AER, 1998), LABSUP.DTA.

. use labsup.dta
. * Women are black or Hispanic (possibly both).
. des hours nonmomi kids educ age black hispan samesex

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
hours           byte   %8.0g                  hours of work per week, mom
nonmomi         float  %9.0g                  non-mom income, $1000s
kids            byte   %8.0g                  number of kids
educ            byte   %8.0g                  mom's years of education
age             byte   %8.0g                  age of mom
black           byte   %8.0g                  =1 of black
hispan          byte   %8.0g                  =1 if hispanic
samesex         byte   %8.0g                  first two kids are of same sex

. sum hours nonmomi kids educ age black hispan

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
       hours |     31857    21.22011    19.49892           0         99
     nonmomi |     31857     31.7618    20.41241   -39.93675    157.438
        kids |     31857    2.752237    .9771916           2         12
        educ |     31857    11.00534    3.305196           0         20
         age |     31857    29.74175    3.613745          21         35
-------------+---------------------------------------------------------
       black |     31857    .4129705    .4923753           0          1
      hispan |     31857     .593182    .4912481           0          1

. count if hours == 0
  13068
. count if hours == 40
  11245
. * hours has lots of discreteness. Classical linear model assumptions
. * are clearly violated.

. tab kids

  number of |
       kids |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |     16,215       50.90       50.90
          3 |     10,014       31.43       82.33
          4 |      3,736       11.73       94.06
          5 |      1,374        4.31       98.37
          6 |        323        1.01       99.39
          7 |        134        0.42       99.81
          8 |         47        0.15       99.96
          9 |          6        0.02       99.97
         10 |          4        0.01       99.99
         11 |          2        0.01       99.99
         12 |          2        0.01      100.00
------------+-----------------------------------
      Total |     31,857      100.00

. tab samesex

  first two |
kids are of |
   same sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,840       49.72       49.72
          1 |     16,017       50.28      100.00
------------+-----------------------------------
      Total |     31,857      100.00

OLS estimates
. * First use OLS with heteroskedasticity-robust standard errors:
. reg hours kids nonmomi educ age agesq black hispan, robust

Linear regression                               Number of obs   =     31857
                                                F(  7, 31849)   =    377.87
                                                Prob > F        =    0.0000
                                                R-squared       =    0.0727
                                                Root MSE        =    18.779

------------------------------------------------------------------------------
             |               Robust
       hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -2.325836   .1155164   -20.13   0.000    -2.552253   -2.099419
     nonmomi |  -.0578328   .0053515   -10.81   0.000     -.068322   -.0473436
        educ |   .5860083   .0374881    15.63   0.000     .5125302    .6594865
         age |   2.048793   .4483823     4.57   0.000     1.169946    2.927639
       agesq |  -.0277198   .0076957    -3.60   0.000    -.0428036    -.012636
       black |   1.058285    1.35088     0.78   0.433    -1.589492    3.706063
      hispan |  -5.114147    1.35152    -3.78   0.000    -7.763179   -2.465116
       _cons |  -10.44695   6.588891    -1.59   0.113    -23.36143    2.467528
------------------------------------------------------------------------------
. * Each child beyond the first two reduces estimated hours by about 2.3 hours,
. * other things fixed.

Endogeneity: samesex as instrument
. * But what if kids is endogenous?
. * Assume samesex is exogenous to the labor supply equation.
. * Is samesex partially correlated with kids?
. * Estimate the reduced form for kids (first-stage regression):
. reg kids samesex nonmomi educ age agesq black hispan, robust

Linear regression                               Number of obs   =     31857
                                                F(  7, 31849)   =    437.80
                                                Prob > F        =    0.0000
                                                R-squared       =    0.1191
                                                Root MSE        =    .91724

------------------------------------------------------------------------------
             |               Robust
        kids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     samesex |   .0703744   .0102783     6.85   0.000     .0502285    .0905202
     nonmomi |  -.0027871    .000257   -10.85   0.000    -.0032907   -.0022834
        educ |  -.0853676   .0020296   -42.06   0.000    -.0893457   -.0813895
         age |   .0589312   .0203278     2.90   0.004      .019088    .0987744
       agesq |   1.98e-06   .0003559     0.01   0.996    -.0006956    .0006995
       black |   .0128681   .0644422     0.20   0.842     -.113441    .1391772
      hispan |  -.0424722   .0644997    -0.66   0.510    -.1688941    .0839498
       _cons |   2.010258   .2930274     6.86   0.000     1.435913    2.584603
------------------------------------------------------------------------------
. * Yes: Having the first two children the same gender means the expected
. * number of children is estimated to be .07 higher.
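As a quick check of instrument relevance, the robust t statistic on samesex can be recomputed from the reported first-stage coefficient and standard error. This is a small Python sketch; the variable names are illustrative, and the only inputs are the two numbers printed in the first-stage output.

```python
coef_samesex = 0.0703744   # first-stage coefficient on samesex
se_samesex = 0.0102783     # heteroskedasticity-robust standard error

# t statistic for the null that the coefficient on the instrument is zero,
# the hypothesis labelled (20) in the rank-condition discussion.
t_stat = coef_samesex / se_samesex

# With a single excluded instrument, the F statistic for the same
# one-restriction null is the square of the t statistic.
f_stat = t_stat ** 2
print(f"t = {t_stat:.2f}, F = {f_stat:.1f}")
```

The recomputed t statistic matches the 6.85 in the table, so the null of no partial correlation between samesex and kids is strongly rejected even though the coefficient itself is small.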

2SLS estimates
. * Now compute the IV (2SLS) estimates:
. ivreg hours nonmomi educ age agesq black hispan (kids = samesex), robust

Instrumental variables (2SLS) regression        Number of obs   =     31857
                                                F(  7, 31849)   =    304.81
                                                Prob > F        =    0.0000
                                                R-squared       =    0.0583
                                                Root MSE        =    18.924

------------------------------------------------------------------------------
             |               Robust
       hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -4.878903   3.013547    -1.62   0.105    -10.78557    1.027766
     nonmomi |  -.0649179   .0099359    -6.53   0.000    -.0843926   -.0454432
        educ |    .368042   .2595992     1.42   0.156    -.1407823    .8768664
         age |   2.200964   .4845126     4.54   0.000       1.2513    3.150627
       agesq |  -.0277443    .007744    -3.58   0.000     -.042923   -.0125657
       black |   1.094986   1.376742     0.80   0.426    -1.603482    3.793454
      hispan |  -5.217758   1.381364    -3.78   0.000    -7.925284   -2.510232
       _cons |  -5.253976   9.037541    -0.58   0.561     -22.9679    12.45995
------------------------------------------------------------------------------
Instrumented:  kids
Instruments:   nonmomi educ age agesq black hispan samesex
------------------------------------------------------------------------------
. * Much bigger effect using IV, but only marginally statistically significant.

Weak instrument
. corr kids samesex
(obs=31857)

             |     kids  samesex
-------------+------------------
        kids |   1.0000
     samesex |   0.0358   1.0000

. * The partial correlation is even smaller. It's not surprising the IV estimate
. * is much less precise than OLS.
. * A much larger sample size, as in Angrist and Evans, and another instrument
. * -- indicating a multiple second birth -- help a lot with precision.

Testing for endogeneity

Consider the model
$$y = x\beta + u$$
$$E(z'u) = 0$$
so that the $1 \times L$ vector $z$ is assumed exogenous.
We want to test whether $x$ is endogenous
The null hypothesis is that $x$ is exogenous, $E(x'u) = 0$

Durbin-Wu-Hausman (DWH)

The Durbin-Wu-Hausman (DWH) test takes the null to be that
$$\hat\beta_{2SLS} - \hat\beta_{OLS} \xrightarrow{p} 0$$
If all elements of $x$ are exogenous then 2SLS and OLS should differ only due to sampling error.

To obtain a test statistic we need to find the limiting distribution of $\sqrt{N}(\hat\beta_{2SLS} - \hat\beta_{OLS})$

Regression-based Hausman test

A regression-based Hausman test uses the control function approach.

Write
$$y_1 = z_1\delta_1 + y_2\alpha_1 + u_1 \tag{23}$$
where $z_1$ is $1 \times L_1$, $y_2$ is $1 \times G_1$, and the entire vector of all instruments is $z = (z_1, z_2)$, where $z_2$ is $1 \times L_2$ with $L_2 \geq G_1$.
Write the reduced forms as
$$y_2 = z\Pi_2 + v_2$$
$$E(z'v_2) = 0$$

For a $G_1 \times 1$ vector $\rho_1$ write the linear projection of $u_1$ on $v_2$:
$$u_1 = v_2\rho_1 + e_1$$
$$E(v_2'e_1) = 0$$
We also know $E(z'e_1) = 0$.

Writing
$$y_1 = z_1\delta_1 + y_2\alpha_1 + v_2\rho_1 + e_1$$
leads to a simple two-step procedure.

Procedure

Regress $y_{i2}$ on $z_i$ to obtain the $1 \times G_1$ reduced form residuals, $\hat v_{i2}$ (one vector for each observation $i$). This can be done for each element of $y_{i2}$ separately.

Run the regression
$$y_{i1} \text{ on } z_{i1}, y_{i2}, \hat v_{i2} \tag{24}$$
and use a joint Wald test of $H_0: \rho_1 = 0$, where $\rho_1$ is the vector of coefficients on $\hat v_{i2}$.
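The two-step control function procedure can be sketched in plain Python (standard library only). The simulated design, the helper names `solve` and `ols`, and the use of a nonrobust t statistic in place of a robust Wald test are all assumptions made for illustration; with a single endogenous regressor the Wald test reduces to a t test on the first-stage residual.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Return OLS coefficients and nonrobust standard errors."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    resid = [yi - sum(bj * rj for bj, rj in zip(b, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    # The j-th diagonal element of (X'X)^{-1} is the j-th entry of the solution of (X'X) c = e_j.
    se = [math.sqrt(s2 * solve(xtx, [float(i == j) for i in range(k)])[j]) for j in range(k)]
    return b, se


rng = random.Random(2)
n = 4000
Z, x, y = [], [], []
for _ in range(n):
    z1, z2, v, e = (rng.gauss(0, 1) for _ in range(4))
    Z.append([1.0, z1, z2])
    x.append(z1 + z2 + v)                 # plays the role of y2
    y.append(1.0 + 2.0 * x[-1] + v + e)   # structural error is v + e, so x is endogenous

# Step 1: reduced-form residuals from regressing x on all instruments.
pi_hat, _ = ols(Z, x)
v_hat = [xi - sum(p * zj for p, zj in zip(pi_hat, z)) for xi, z in zip(x, Z)]

# Step 2: regress y on (1, x, v_hat); the t statistic on v_hat tests exogeneity of x.
b_cf, se_cf = ols([[1.0, xi, vh] for xi, vh in zip(x, v_hat)], y)
t_rho = b_cf[2] / se_cf[2]

# For comparison: the 2SLS slope from regressing y on (1, x_hat).
x_hat = [xi - vh for xi, vh in zip(x, v_hat)]
b_2sls, _ = ols([[1.0, xh] for xh in x_hat], y)
print(f"t on v_hat = {t_rho:.1f}, CF slope = {b_cf[1]:.4f}, 2SLS slope = {b_2sls[1]:.4f}")
```

Because x is endogenous by construction here, the t statistic on `v_hat` should be large. The slope on x in the augmented regression coincides with the 2SLS slope, which is an algebraic property of the control function regression.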

Extension of the procedure


Sometimes we may want to test the null hypothesis that a subset of explanatory variables is exogenous while allowing another set of variables to be endogenous. Write an expanded model as
$$y_1 = z_1\delta_1 + y_2\alpha_1 + y_3\gamma_1 + u_1 \tag{25}$$
where $\alpha_1$ is $G_1 \times 1$ and $\gamma_1$ is $J_1 \times 1$.

We allow $y_2$ to be endogenous and test $H_0: E(y_3'u_1) = 0$.

The relevant equation is now $y_1 = z_1\delta_1 + y_2\alpha_1 + y_3\gamma_1 + v_3\rho_1 + e_1$, or, when we operationalize it,
$$y_{i1} = z_{i1}\delta_1 + y_{i2}\alpha_1 + y_{i3}\gamma_1 + \hat v_{i3}\rho_1 + error_i \tag{26}$$
where $\hat v_{i3}$ are the reduced-form residuals from regressing $y_{i3}$ on $z_i$.

Because $y_2$ is allowed to be endogenous under $H_0$, we cannot estimate (26) by OLS in order to test $H_0: \rho_1 = 0$.
Apply 2SLS to (26) with instruments $(z_i, y_{i3}, \hat v_{i3})$; remember, $(y_3, v_3)$ are exogenous in the augmented equation. In effect, we still instrument for $y_{i2}$ but $y_{i3}$ and $\hat v_{i3}$ act as their own instruments.

The usual Wald statistic for 2SLS for testing $H_0: \rho_1 = 0$ is asymptotically valid under $H_0$.

Testing Overidentifying Restrictions


If we have more instruments than the number we need, we can test whether some of them are exogenous.
Write the equation as
$$y_1 = z_1\delta_1 + y_2\alpha_1 + u_1 \tag{27}$$
where $z_1$ is $1 \times L_1$ and $y_2$ is $1 \times G_1$.

The entire vector of instruments is $z = (z_1, z_2)$, where $z_2$ is $1 \times L_2$.

The equation is overidentified if $L_2 > G_1$.

The 2SLS estimator uses $L_1 + L_2$ moment conditions to estimate $L_1 + G_1$ parameters
$L_2 - G_1$ overidentifying restrictions can be tested.

Example

$y_2 = educ$ and $z_2 = (motheduc, fatheduc)$

The problem is that the test will have weak power if the two IV estimators are biased in a similar way
A failure to reject should not make us too confident.
A rejection indicates that one or both IVs fail the exogeneity requirement; we do not know which one or whether it is both.

Regression based test

Regression-based tests are convenient.

Under homoskedasticity (Assumption 2SLS.3), obtain $N R_u^2$ from the regression
$$\hat u_{i1} \text{ on } z_i \tag{28}$$
where $\hat u_{i1}$ are the 2SLS residuals and $z$ is the vector of all exogenous variables.

Under the null, we should have, in the sample
$$N^{-1}\sum_{i=1}^N z_i'\hat u_{i1} \approx 0 \tag{29}$$

But we also know $K_1 = L_1 + G_1$ exact moment conditions hold in the sample:
$$N^{-1}\sum_{i=1}^N \hat x_{i1}'\hat u_{i1} = N^{-1}\sum_{i=1}^N (z_i\hat\Pi_1)'\hat u_{i1} = 0 \tag{30}$$
where $\hat\Pi_1$ is the $L \times K_1$ matrix from regressing $x_{i1} = (z_{i1}, y_{i2})$ on $z_i$ and $\hat x_{i1} = z_i\hat\Pi_1$ are the fitted values.

Under the null hypothesis
$$E(z'u) = 0 \tag{31}$$
$$E(u^2 z'z) = \sigma^2 E(z'z) \tag{32}$$
it can be shown that
$$N R_u^2 \overset{a}{\sim} \chi^2_{L_2 - G_1} \tag{33}$$
Easy to compute, but not robust to heteroskedasticity.


2 Instruments
. * Now use samesex and multi2nd (an indicator for a multiple second birth) as IVs for kids.
. * Estimate the reduced form:
. reg kids samesex multi2nd nonmomi educ age agesq black hispan, robust

Linear regression                               Number of obs   =     31857
                                                F(  8, 31848)   =    410.77
                                                Prob > F        =    0.0000
                                                R-squared       =    0.1244
                                                Root MSE        =    .91452

------------------------------------------------------------------------------
             |               Robust
        kids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     samesex |     .07044   .0102481     6.87   0.000     .0503533    .0905267
    multi2nd |   .7632484   .0546856    13.96   0.000     .6560626    .8704342
     nonmomi |  -.0027879   .0002562   -10.88   0.000    -.0032901   -.0022858
        educ |  -.0853114   .0020267   -42.09   0.000    -.0892838   -.0813391
         age |   .0563395    .020282     2.78   0.005      .016586    .0960929
       agesq |   .0000436   .0003551     0.12   0.902    -.0006524    .0007396
       black |   .0105681   .0645589     0.16   0.870    -.1159698    .1371059
      hispan |  -.0420447   .0646128    -0.65   0.515    -.1686882    .0845988
       _cons |   2.043467   .2924263     6.99   0.000       1.4703    2.616634
------------------------------------------------------------------------------

. test samesex multi2nd

 ( 1)  samesex = 0
 ( 2)  multi2nd = 0

       F(  2, 31848) =  117.38
            Prob > F =    0.0000

. * Clearly the two IV candidates are partially correlated with kids,
. * both in the direction (positive) that we expect.
. * Get the reduced form residuals.
. predict v2h, resid

. * Test the null that kids is exogenous in the hours equation:
. reg hours kids nonmomi educ age agesq black hispan v2h, robust

Linear regression                               Number of obs   =     31857
                                                F(  8, 31848)   =    330.79
                                                Prob > F        =    0.0000
                                                R-squared       =    0.0727
                                                Root MSE        =    18.779

------------------------------------------------------------------------------
             |               Robust
       hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -2.986165   1.284302    -2.33   0.020    -5.503447   -.4688828
     nonmomi |  -.0596653   .0064263    -9.28   0.000     -.072261   -.0470696
        educ |   .5296332   .1154311     4.59   0.000     .3033839    .7558825
         age |    2.08815   .4545537     4.59   0.000     1.197208    2.979093
       agesq |  -.0277261   .0076958    -3.60   0.000    -.0428101   -.0126422
       black |   1.067778   1.350595     0.79   0.429     -1.57944    3.714995
      hispan |  -5.140945   1.352129    -3.80   0.000    -7.791169   -2.490721
         v2h |    .665256   1.290263     0.52   0.606     -1.86371    3.194222
       _cons |  -9.103833   7.093029    -1.28   0.199    -23.00644    4.798776
------------------------------------------------------------------------------
. * The test statistic is only about .52, so there is little evidence that kids
. * is endogenous.

. * Now compute the 2SLS estimates:
. ivreg hours nonmomi educ age agesq black hispan (kids = samesex multi2nd), robust

Instrumental variables (2SLS) regression        Number of obs   =     31857
                                                F(  7, 31849)   =    310.81
                                                Prob > F        =    0.0000
                                                R-squared       =    0.0717
                                                Root MSE        =    18.789

------------------------------------------------------------------------------
             |               Robust
       hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -2.986165    1.28219    -2.33   0.020    -5.499307    -.473022
     nonmomi |  -.0596653   .0064235    -9.29   0.000    -.0722555   -.0470751
        educ |   .5296332   .1152961     4.59   0.000     .3036484     .755618
         age |    2.08815   .4545798     4.59   0.000     1.197156    2.979144
       agesq |  -.0277261   .0076979    -3.60   0.000    -.0428143    -.012638
       black |   1.067778   1.355563     0.79   0.431    -1.589178    3.724733
      hispan |  -5.140945   1.357096    -3.79   0.000    -7.800906   -2.480985
       _cons |  -9.103834   7.092956    -1.28   0.199     -23.0063    4.798632
------------------------------------------------------------------------------
Instrumented:  kids
Instruments:   nonmomi educ age agesq black hispan samesex multi2nd
------------------------------------------------------------------------------
. * Note that these are the same as the CF estimates.

. predict u1h, resid

. * Test the single overidentifying restriction using nonrobust test:
. reg u1h samesex multi2nd nonmomi educ age agesq black hispan

      Source |       SS       df       MS              Number of obs =   31857
-------------+------------------------------           F(  8, 31848) =    0.06
       Model |  176.258976     8   22.032372           Prob > F      =  0.9999
    Residual |  11242898.1 31848  353.017398           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0002
       Total |  11243074.3 31856  352.934277           Root MSE      =  18.789

------------------------------------------------------------------------------
         u1h |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     samesex |  -.1331695   .2105507    -0.63   0.527    -.5458569    .2795179
    multi2nd |    .357619   1.136161     0.31   0.753    -1.869301    2.584539
     nonmomi |   .0000221   .0053906     0.00   0.997    -.0105436    .0105879
        educ |   .0000136   .0353226     0.00   1.000      -.06922    .0692472
         age |   .0000577   .4481451     0.00   1.000    -.8783239    .8784393
       agesq |  -2.46e-06   .0077015    -0.00   1.000    -.0150978    .0150929
       black |   .0017749     1.3505     0.00   0.999    -2.645257    2.648807
      hispan |   .0037765   1.352616     0.00   0.998    -2.647404    2.654957
       _cons |   .0605262     6.5755     0.01   0.993    -12.82771    12.94876
------------------------------------------------------------------------------

. * R-squared is zero to four decimal places, but N is large so compute
. * the statistic:
. di e(N)*e(r2)
.49942587

. di chi2tail(1,.499)
.47993984

. * So the p-value is about .48, showing little evidence against the
. * overidentifying restriction.
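
The arithmetic behind this nonrobust test can be checked outside Stata. The sketch below is written in Python, not in the Stata used in the lecture, and relies only on figures printed in the regression of u1h on all exogenous variables: N = 31857 and the model and total sums of squares. It assumes the statistic N*R-squared is compared with a chi-square distribution with one degree of freedom, as in the chi2tail(1,.499) call. The helper name chi2_sf_1df is hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square(1) variable at x.

    With one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Values copied from the Stata output of the auxiliary regression.
n_obs = 31857
ss_model = 176.258976
ss_total = 11243074.3

# R-squared of the auxiliary regression, then the N*R-squared statistic.
r_squared = ss_model / ss_total
lm_stat = n_obs * r_squared

# p-value under the chi-square(1) reference distribution.
p_value = chi2_sf_1df(lm_stat)

print(f"R-squared = {r_squared:.7f}")
print(f"N*R-squared = {lm_stat:.4f}, p-value = {p_value:.4f}")
```

The statistic should come out close to the .499 shown by Stata and the p-value close to .48, which is why the single overidentifying restriction is not rejected.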
restriction

. * Now compute the heteroskedasticity-robust test.

. qui reg kids samesex multi2nd nonmomi educ age agesq black hispan
. predict kidsh
(option xb assumed; fitted values)
. qui reg samesex kidsh nonmomi educ age agesq black hispan
. predict r21h, resid
. qui reg multi2nd kidsh nonmomi educ age agesq black hispan
. predict r22h, resid
. reg u1h r21h, nocons robust

Linear regression                                      Number of obs =   31857
                                                       F(  1, 31856) =    0.51
                                                       Prob > F      =  0.4767
                                                       R-squared     =  0.0000
                                                       Root MSE      =  18.786

------------------------------------------------------------------------------
             |               Robust
         u1h |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        r21h |   -.166174   .2335323    -0.71   0.477    -.6239062    .2915583
------------------------------------------------------------------------------

. reg u1h r22h, nocons robust

Linear regression                                      Number of obs =   31857
                                                       F(  1, 31856) =    0.51
                                                       Prob > F      =  0.4767
                                                       R-squared     =  0.0000
                                                       Root MSE      =  18.786

------------------------------------------------------------------------------
             |               Robust
         u1h |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        r22h |   1.800574   2.530425     0.71   0.477    -3.159156    6.760305
------------------------------------------------------------------------------
. * Get the same answer since only the absolute value of the t matters.
. * Equivalently, use the F statistic reported in the upper right-hand
. * corner.
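
To see why the two regressions give the same answer, the check below recomputes the robust t statistics from the coefficients and standard errors printed for r21h and r22h and squares them. It is a Python sketch, not part of the original Stata session. Using the chi-square(1) tail in place of the F(1, 31856) reference distribution is an assumption; with this many degrees of freedom the two are practically identical. The names regressions, squared_t, p_values and chi2_sf_1df are illustrative.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square(1) variable at x."""
    return math.erfc(math.sqrt(x / 2.0))


# (robust coefficient, robust standard error) copied from the two
# regressions of u1h on r21h and on r22h.
regressions = {
    "r21h": (-0.166174, 0.2335323),
    "r22h": (1.800574, 2.530425),
}

squared_t = {}
p_values = {}
for name, (coef, std_err) in regressions.items():
    t_stat = coef / std_err
    # With a single regressor, the F statistic equals the squared t statistic.
    squared_t[name] = t_stat ** 2
    p_values[name] = chi2_sf_1df(squared_t[name])
    print(f"{name}: t = {t_stat:.2f}, t^2 = {squared_t[name]:.2f}, "
          f"p = {p_values[name]:.4f}")
```

The t statistics differ only in sign, so their squares coincide with the reported F of 0.51 and both p-values are about .477.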

. * Now use the features of ivregress:
. ivregress 2sls hours nonmomi educ age agesq black hispan (kids = samesex multi2nd), robust first

First-stage regressions
-----------------------

                                                       Number of obs =   31857
                                                       F(  8, 31848) =  410.77
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1244
                                                       Adj R-squared =  0.1242
                                                       Root MSE      =  0.9145

------------------------------------------------------------------------------
             |               Robust
        kids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nonmomi |  -.0027879   .0002562   -10.88   0.000    -.0032901   -.0022858
        educ |  -.0853114   .0020267   -42.09   0.000    -.0892838   -.0813391
         age |   .0563395    .020282     2.78   0.005      .016586    .0960929
       agesq |   .0000436   .0003551     0.12   0.902    -.0006524    .0007396
       black |   .0105681   .0645589     0.16   0.870    -.1159698    .1371059
      hispan |  -.0420447   .0646128    -0.65   0.515    -.1686882    .0845988
     samesex |     .07044   .0102481     6.87   0.000     .0503533    .0905267
    multi2nd |   .7632484   .0546856    13.96   0.000     .6560626    .8704342
       _cons |   2.043467   .2924263     6.99   0.000       1.4703    2.616634
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression               Number of obs =   31857
                                                       Wald chi2(7)  = 2176.19
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.0717
                                                       Root MSE      =  18.786

------------------------------------------------------------------------------
             |               Robust
       hours |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -2.986165   1.282029    -2.33   0.020    -5.498896   -.4734331
     nonmomi |  -.0596653   .0064227    -9.29   0.000    -.0722535   -.0470771
        educ |   .5296332   .1152816     4.59   0.000     .3036854    .7555811
         age |    2.08815   .4545227     4.59   0.000     1.197302    2.978999
       agesq |  -.0277261   .0076969    -3.60   0.000    -.0428119   -.0126404
       black |   1.067778   1.355393     0.79   0.431    -1.588743    3.724299
      hispan |  -5.140945   1.356926    -3.79   0.000    -7.800471    -2.48142
       _cons |  -9.103834   7.092065    -1.28   0.199    -23.00403    4.796358
------------------------------------------------------------------------------
Instrumented:  kids
Instruments:   nonmomi educ age agesq black hispan samesex multi2nd

. estat endog

  Tests of endogeneity
  Ho: variables are exogenous

  Robust score chi2(1)            =  .266346  (p = 0.6058)
  Robust regression F(1,31848)    =   .26584  (p = 0.6061)

. estat overid

  Test of overidentifying restrictions:

  Score chi2(1)          =  .506748  (p = 0.4766)