# Microeconometrics

Alex Armand
aarmand@unav.es

January 2016

Lecture summary

Exogeneity: an example
Malaria is transmitted by mosquitoes, at night, from person to person
Eritrea has been particularly affected by malaria
The economy depends on agriculture, which is possible mainly during the rainy season (the malaria season)

What are the available technologies to eradicate the risk of infection?
IRS (indoor residual spraying) campaign
IRS was introduced on a voluntary basis: this can be effective in eradicating malaria in a low-transmission setting like Eritrea

Estimation

Estimate the effect of providing IRS on the percentage of individuals infected with malaria in village $j$
$Y_j$ is the percentage of individuals infected
$d_j$ is the percentage of individuals who received IRS
Estimate the impact of the policy by comparing treated and non-treated individuals

$$Y_j = \alpha + \beta d_j + X_j'\gamma + \varepsilon_j$$

Is this correct?

$$y = \beta_0 + \beta_1 x + u \qquad (1)$$

where $u$ is thought to be correlated with $x$
If $Cov(x, u) \neq 0$ ($x$ is endogenous), then OLS is inconsistent for $\beta_1$
If we have a rich set of controls, we might be able to break the link between $u$ and $x$, but that is an OLS solution.

Instrumental variable

An instrumental variable, $z$, for $x$ has two properties

$$Cov(z, u) = 0 \quad \text{(exogeneity)} \qquad (2)$$

$$Cov(z, x) \neq 0 \quad \text{(relevance)} \qquad (3)$$

A key difference is that while we must take (2) on faith, we can always test the null that $z$ and $x$ are uncorrelated given a sample of data (and hope to strongly reject it).

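Identification of $\beta_1$

Take the covariance of $z$ with both sides of the model:

$$Cov(z, y) = \beta_1 Cov(z, x) + Cov(z, u)$$

Under (2) and (3), this identifies

$$\beta_1 = \frac{Cov(z, y)}{Cov(z, x)} \qquad (4)$$

$\beta_1$ is identified because it is a function of population moments of variables we can observe.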

IV Estimator

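$$\hat\beta_{1,IV} = \frac{\sum_{i=1}^N (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^N (z_i - \bar z)(x_i - \bar x)} = \beta_1 + \frac{N^{-1}\sum_{i=1}^N (z_i - \bar z)u_i}{N^{-1}\sum_{i=1}^N (z_i - \bar z)(x_i - \bar x)}$$

Consistency follows from the law of large numbers and the algebra of plims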
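A small simulation can make the contrast between OLS and the simple IV estimator concrete. The sketch below is only an illustration under assumed values: the data-generating process, the coefficients (0.8, 0.5), the sample size, and the function names `cov` and `simulate` are all hypothetical and do not come from the lecture.

```python
import random

def cov(a, b):
    """Sample covariance with divisor N (the divisor cancels in the ratios below)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n

def simulate(n=50_000, beta0=1.0, beta1=2.0, seed=0):
    """Draw (z, x, y) where x is endogenous and z is a valid, relevant instrument."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                        # instrument, independent of u
        ui = rng.gauss(0, 1)                        # structural error
        xi = 0.8 * zi + 0.5 * ui + rng.gauss(0, 1)  # x loads on u, so Cov(x, u) != 0
        z.append(zi)
        x.append(xi)
        y.append(beta0 + beta1 * xi + ui)
    return z, x, y

z, x, y = simulate()
beta1_ols = cov(x, y) / cov(x, x)  # inconsistent here: plim is 2 + 0.5 / 1.89
beta1_iv = cov(z, y) / cov(z, x)   # sample analogue of Cov(z, y) / Cov(z, x)
print(f"OLS: {beta1_ols:.3f}  IV: {beta1_iv:.3f}  (true beta1 = 2)")
```

In this design $Var(x) = 0.64 + 0.25 + 1 = 1.89$ and $Cov(x, u) = 0.5$, so OLS should settle near $2 + 0.5/1.89 \approx 2.26$ while the IV ratio should settle near 2.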

Endogenous instrument

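Use a slightly endogenous instrument rather than OLS?
Allow $u$ to be correlated with both $x$ and $z$
For example, $Cov(x, u) = \sigma_x \sigma_u Corr(x, u)$

$$\operatorname{plim} \hat\beta_{1,OLS} = \beta_1 + \frac{\sigma_u}{\sigma_x} Corr(x, u) \qquad (5)$$

and

$$\operatorname{plim} \hat\beta_{1,IV} = \beta_1 + \frac{\sigma_u}{\sigma_x} \cdot \frac{Corr(z, u)}{Corr(z, x)} \qquad (6)$$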


Weak instrument I

$z$ is a weak instrument if $Corr(z, x)$ is small

A small correlation between z and u can produce a larger asymptotic
bias than OLS
In economics, very common to see IV estimates that are larger in
magnitude than OLS estimates
Weak instruments lead to large asymptotic standard errors
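To see how a weak instrument magnifies a small violation of exogeneity, one can plug numbers into the standard probability limits $\operatorname{plim}\hat\beta_{1,OLS} - \beta_1 = (\sigma_u/\sigma_x)Corr(x,u)$ and $\operatorname{plim}\hat\beta_{1,IV} - \beta_1 = (\sigma_u/\sigma_x)Corr(z,u)/Corr(z,x)$. The correlations used below are made-up illustrative values, and the function names are hypothetical.

```python
def plim_bias_ols(corr_xu, sd_u=1.0, sd_x=1.0):
    """Asymptotic bias of OLS: (sd_u / sd_x) * Corr(x, u)."""
    return (sd_u / sd_x) * corr_xu

def plim_bias_iv(corr_zu, corr_zx, sd_u=1.0, sd_x=1.0):
    """Asymptotic bias of IV: (sd_u / sd_x) * Corr(z, u) / Corr(z, x)."""
    return (sd_u / sd_x) * corr_zu / corr_zx

# Assumed values: x is mildly endogenous, z is only slightly endogenous but weak.
ols_bias = plim_bias_ols(corr_xu=0.10)
iv_bias = plim_bias_iv(corr_zu=0.02, corr_zx=0.10)
print(f"OLS bias: {ols_bias:.2f}, IV bias: {iv_bias:.2f}")
```

With these assumed numbers the instrument is five times less correlated with $u$ than $x$ is, yet the IV bias is twice the OLS bias because $Corr(z, x)$ is only 0.10.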

Weak instrument II
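Under a homoskedasticity assumption

$$\operatorname{Avar} \sqrt{N}(\hat\beta_{1,IV} - \beta_1) = \frac{\sigma_u^2}{\sigma_x^2 \rho_{z,x}^2} \qquad (7)$$

where $\rho_{z,x} = Corr(z, x)$
The asymptotic standard deviation of $\sqrt{N}(\hat\beta_{1,IV} - \beta_1)$ is

$$\frac{\sigma_u}{\sigma_x} \cdot \frac{1}{|\rho_{z,x}|}$$

When $|\rho_{z,x}|$ is small, the asymptotic variance can be very large
The formula for the OLS estimator omits the factor $1/|\rho_{z,x}|$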
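The factor $1/|\rho_{z,x}|$ can be evaluated directly. As a rough illustration, the sketch below uses the raw correlation of 0.0358 between kids and samesex reported in the application later in these notes; setting $\sigma_u = \sigma_x = 1$ is an arbitrary normalisation, and the function names are hypothetical.

```python
def asy_sd_ols(sd_u, sd_x):
    """Asymptotic standard deviation of sqrt(N) * (OLS - beta1) under homoskedasticity."""
    return sd_u / sd_x

def asy_sd_iv(sd_u, sd_x, corr_zx):
    """Asymptotic standard deviation of sqrt(N) * (IV - beta1) under homoskedasticity."""
    return sd_u / (sd_x * abs(corr_zx))

# sd_u = sd_x = 1 is a normalisation; only the ratio of the two quantities matters here.
ratio = asy_sd_iv(1.0, 1.0, corr_zx=0.0358) / asy_sd_ols(1.0, 1.0)
print(f"IV / OLS asymptotic standard deviation ratio: {ratio:.1f}")
```

The ratio is about 28. This is the same order of magnitude as the ratio of the robust standard errors on kids in the application (3.01 for 2SLS against 0.12 for OLS), although that comparison is only loose because the application includes controls.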

Example

$$\log(wage) = \beta_0 + \beta_1\, educ + u \qquad (8)$$

Suggestions for $z$:
mother's education
number of siblings
distance to the nearest college at age 16
$z$ binary, $z = 1$ if born in the first quarter of the year
$z$ is a randomly assigned education grant during high school

General IV approach

Population model
$$y = x\beta + u \qquad (9)$$

where $x$ is $1 \times K$ and $\beta$ is $K \times 1$
OLS and IV are different estimation methods that can be applied to the same model.
They are consistent under different assumptions.

Exclusion restriction

Let $z = (z_1, z_2, ..., z_L)$ be a $1 \times L$ vector, where $z_1 = 1$ almost always.
$z$ contains all exogenous elements of $x$
If one or more elements of $x$ is correlated with $u$, $z$ must contain some outside variables
$z$ is exogenous:

$$E(z'u) = 0 \qquad (10)$$

Identification

Suppose $L = K$.
For example, $x = (1, x_2, ..., x_{K-1}, x_K)$ and $z = (1, x_2, ..., x_{K-1}, z_1)$
$x_K$ is possibly endogenous and $z_1$ is an IV for $x_K$.
Using (9) and (10),

$$E(z'y) = E(z'x)\beta + E(z'u) = E(z'x)\beta$$

Rank condition

If we assume the rank condition

$$\operatorname{rank} E(z'x) = K \qquad (12)$$

then

$$\beta = [E(z'x)]^{-1} E(z'y) \qquad (13)$$

This extends the moment condition for OLS, which is the special case $z = x$
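To spell out the special case: if every element of $x$ is exogenous, we can take $z = x$, the exogeneity condition becomes $E(x'u) = 0$, the rank condition becomes $\operatorname{rank} E(x'x) = K$, and the identification result reduces to the familiar population OLS formula:

$$\beta = [E(x'x)]^{-1} E(x'y)$$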

IV Estimator

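$$\hat\beta_{IV} = \left(\sum_{i=1}^N z_i'x_i\right)^{-1}\left(\sum_{i=1}^N z_i'y_i\right)$$

which is consistent under instrument exogeneity and the rank condition (12)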

Matrix notation

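$$\hat\beta_{IV} = (Z'X/N)^{-1}(Z'Y/N) = (Z'X)^{-1}Z'Y,$$

where

$$\underset{n \times k}{X} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad \underset{n \times 1}{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \underset{n \times k}{Z} = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}$$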

Instrument exogeneity

The condition $E(z'u) = 0$ is called instrument exogeneity
Rank condition (12) is also called instrument relevance
Without extra information we cannot test $E(z'u) = 0$ because $u$ is unobserved
We might have proxies for $u$, but then those would likely be in $x$.

Rank Condition I

We can estimate $E(z'x)$, so we can test the rank condition
Difficult in general, but easy with a single endogenous explanatory variable, $x_K$
Write the reduced form of $x_K$ as

$$x_K = \delta_1 + \delta_2 x_2 + ... + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K \qquad (16)$$

where, by definition,

$$E(r_K) = 0, \quad Cov(x_j, r_K) = 0,\ j = 2, ..., K-1, \quad Cov(z_1, r_K) = 0 \qquad (17)$$

Rank Condition II
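In other words, the linear projection of $x_K$ on $(1, x_2, ..., x_{K-1}, z_1)$ is

$$L(x_K | 1, x_2, ..., x_{K-1}, z_1) = \delta_1 + \delta_2 x_2 + ... + \delta_{K-1} x_{K-1} + \theta_1 z_1 \qquad (18)$$

The rank condition holds if and only if

$$\theta_1 \neq 0 \qquad (19)$$

OLS consistently estimates the parameters of a linear projection (though it is not necessarily unbiased).
Need to reject

$$H_0: \theta_1 = 0 \qquad (20)$$

A $t$ test from the OLS regression of $x_K$ on $(1, x_2, ..., x_{K-1}, z_1)$ can be used.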

We do not care about the $\delta_j$ in (18), but $x_2, ..., x_{K-1}$ must be partialled out. ($z_1$ could be correlated with $x_K$, but we require that $x_K$ is partially correlated with $z_1$.)
$x_K$ can be discrete, continuous, or a hybrid. Regardless of the nature of $x_K$, the linear projection is well-defined. The IV estimator is consistent under $E(z'u) = 0$ if

$$L(x_K | 1, x_2, ..., x_{K-1}, z_1) \neq L(x_K | 1, x_2, ..., x_{K-1}).$$

This is just another way to say that, in a linear sense, $z_1$ helps to predict $x_K$ controlling for the other exogenous variables.
Regressing $x_K$ on $1, x_2, ..., x_{K-1}, z_1$ using the data is often called the first-stage regression.


In some cases, we have more instruments than we need.
For example, if we can use mother's education as an IV, why not father's education?
Linear model

$$y = x\beta + u \qquad (21)$$

$$E(z'u) = 0$$

where $L = \dim(z)$

Overidentification

When L > K , have more than one IV estimator. We say the model
(21) is (potentially) overidentified.
When L = K and the rank condition holds, the model is just
identified.
Suppose z1 and z2 are IVs for xK
Under a homoskedasticity assumption, the best IV for xK is the linear
combination of all exogenous variables defined by the linear projection.

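In general, the best vector of IVs for $x$ is the vector of linear projections of each element of $x$ on $z$
Write the LPs in error form as

$$\underset{1 \times K}{x} = \underset{1 \times L}{z}\,\underset{L \times K}{\Pi} + \underset{1 \times K}{r}, \qquad \Pi = [E(z'z)]^{-1} E(z'x), \qquad E(z'r) = 0$$

where $E(z'z)$ is $L \times L$.
For each $x_j$ we can write

$$x_j = z\pi_j + r_j \equiv x_j^* + r_j$$

where $\pi_j$ ($L \times 1$) is the $j$th column of $\Pi$.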

Exogenous variables act as their own instruments
In the general case, use

$$x^* = z\Pi$$

Because $z$ is exogenous, so is $x^*$:

$$E(x^{*\prime}u) = 0$$

The rank condition becomes

$$\operatorname{rank} E(x^{*\prime}x) = K$$

Substituting for $x^*$,

$$E(x^{*\prime}x) = \Pi' E(z'x) = E(x'z)[E(z'z)]^{-1}E(z'x)$$

2SLS assumptions

Formally, here are the first two assumptions for 2SLS, stated in the
population.
Assumption 2SLS.1 (Exogenous Instruments): $E(z'u) = 0$.
Assumption 2SLS.2 (Rank Condition):
(a) $\operatorname{rank} E(z'z) = L$: rules out perfect collinearity among the exogenous variables (so an exact linear combination of instruments cannot be added as a further instrument).
(b) $\operatorname{rank} E(z'x) = K$: requires $L \geq K$.

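With

$$\beta = [E(x^{*\prime}x)]^{-1} E(x^{*\prime}y),$$

we need to worry about the unknown $\Pi$ because $x_i^* = z_i\Pi$.
Two-step estimation:
(1) Run the regression of $x_i$ on $z_i$, $i = 1, ..., N$, to obtain $\hat\Pi = (Z'Z)^{-1}Z'X$ and the fitted values

$$\hat x_i = z_i\hat\Pi, \quad i = 1, ..., N$$

(2) Use $\hat x_i$ as the vector of IVs for $x_i$:

$$\hat\beta_{IV} = \left(\sum_{i=1}^N \hat x_i'x_i\right)^{-1}\left(\sum_{i=1}^N \hat x_i'y_i\right)$$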

We can write this differently. Because

$$x_i = \hat x_i + \hat r_i, \qquad \sum_{i=1}^N \hat x_i'\hat r_i = 0 \ \text{(by OLS FOCs)}$$

2SLS Estimator

So

$$\sum_{i=1}^N \hat x_i'x_i = \sum_{i=1}^N \hat x_i'\hat x_i$$

and then the IV estimator can be written as a two stage least squares estimator:

$$\hat\beta_{2SLS} = \left(N^{-1}\sum_{i=1}^N \hat x_i'\hat x_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N \hat x_i'y_i\right).$$
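The equivalence between using the first-stage fitted values as instruments and running OLS on those fitted values can be checked numerically. The following is a minimal sketch on simulated data using only the Python standard library; the data-generating process, the coefficient values, and the helper names `solve`, `cross`, and `cross_vec` are assumptions made for illustration, not part of the lecture.

```python
import random

def solve(a, b):
    """Solve the square linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def cross(a_rows, b_rows):
    """Return the matrix A'B for data stored as lists of observation rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a_rows, b_rows))
             for j in range(len(b_rows[0]))] for i in range(len(a_rows[0]))]

def cross_vec(a_rows, y):
    """Return the vector A'y."""
    return [sum(ra[i] * yi for ra, yi in zip(a_rows, y)) for i in range(len(a_rows[0]))]

rng = random.Random(1)
n = 20_000
Z, X, Y = [], [], []
for _ in range(n):
    z1, z2, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    x = 0.6 * z1 + 0.4 * z2 + 0.5 * u + rng.gauss(0, 1)  # endogenous regressor
    Z.append([1.0, z1, z2])   # L = 3 instruments, constant included
    X.append([1.0, x])        # K = 2 regressors
    Y.append(1.0 + 2.0 * x + u)

# Step 1: Pi_hat = (Z'Z)^{-1} Z'X, computed column by column, then fitted values.
ztz = cross(Z, Z)
pi_hat = [solve(ztz, cross_vec(Z, [row[j] for row in X])) for j in range(2)]
X_hat = [[sum(zk * pk for zk, pk in zip(zrow, pi_hat[j])) for j in range(2)] for zrow in Z]

# Step 2a: use x_hat as the instrument vector for x.
b_iv = solve(cross(X_hat, X), cross_vec(X_hat, Y))
# Step 2b: equivalently, run OLS of y on x_hat (the second stage).
b_2sls = solve(cross(X_hat, X_hat), cross_vec(X_hat, Y))
print("IV with x_hat as instruments:", b_iv)
print("OLS of y on x_hat:           ", b_2sls)
```

The two coefficient vectors agree up to rounding error, and the slope is close to the assumed true value of 2. Note that standard errors from a literal second-stage OLS would be wrong, because they would be based on $y_i - \hat x_i\hat\beta$ rather than the 2SLS residuals $y_i - x_i\hat\beta$.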

## 2SLS residuals and orthogonality

The 2SLS residuals are defined as

$$\hat u_i = y_i - x_i\hat\beta_{2SLS},$$

where it is $x_i$, not $\hat x_i$, multiplying $\hat\beta_{2SLS}$.
An algebraic fact is that the $\hat u_i$ are orthogonal to $\hat x_i$ in the sample.
This is the condition that determines $\hat\beta_{2SLS}$:

$$\sum_{i=1}^N \hat x_i'\hat u_i = 0$$

$$\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X'Y = [(X'Z)(Z'Z)^{-1}(Z'X)]^{-1}(X'Z)(Z'Z)^{-1}(Z'Y)$$

$$= \beta + [(X'Z/N)(Z'Z/N)^{-1}(Z'X/N)]^{-1}(X'Z/N)(Z'Z/N)^{-1}(Z'U/N)$$

where the last expression can be used to show consistency by applying the WLLN to each term, along with the rank condition and $E(z'u) = 0$

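Key Result:
Under 2SLS.1 and 2SLS.2, $\hat\beta_{2SLS}$ on a random sample is consistent for $\beta$.
For inference, it is useful to show

$$\sqrt{N}(\hat\beta_{2SLS} - \beta) = \left(N^{-1}\sum_{i=1}^N x_i^{*\prime}x_i^*\right)^{-1}\left(N^{-1/2}\sum_{i=1}^N x_i^{*\prime}u_i\right) + o_p(1)$$

where the $x_i^* = z_i\Pi$ are the linear projections.
It follows that

$$\sqrt{N}(\hat\beta_{2SLS} - \beta) \xrightarrow{d} \text{Normal}(0, A^{-1}BA^{-1}), \qquad A = E(x_i^{*\prime}x_i^*), \qquad B = E(u_i^2 x_i^{*\prime}x_i^*)$$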

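Assumption 2SLS.3 (Homoskedasticity): $u^2$ is uncorrelated with all elements of $z$ as well as $z_j^2$ and $z_jz_h$. That is,

$$E(u^2z'z) = E(u^2)E(z'z) \equiv \sigma^2E(z'z).$$

Under 2SLS.1, 2SLS.2, and 2SLS.3

$$\sqrt{N}(\hat\beta_{2SLS} - \beta) \xrightarrow{d} \text{Normal}(0, \sigma^2A^{-1})$$

Consistent estimators of $\sigma^2$ and $A$:

$$\hat\sigma^2 = (N - K)^{-1}\sum_{i=1}^N \hat u_i^2, \qquad \hat A = N^{-1}\sum_{i=1}^N \hat x_i'\hat x_i$$

Under 2SLS.1, 2SLS.2, and 2SLS.3, we can use

$$\widehat{\operatorname{Avar}}(\hat\beta_{2SLS}) = \hat\sigma^2\hat A^{-1}/N$$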

A little endogeneity of one or more instruments can lead to large inconsistency if the instruments are weak

4. Application: Endogeneity of Children in Labor Supply

Data: Labor Supply
Data are a subset from Angrist and Evans (AER, 1998), LABSUP.DTA.
. use labsup.dta
. * Women are black or Hispanic (possibly both).
. des hours nonmomi kids educ age black hispan samesex

| variable name | storage type | display format | variable label |
|---|---|---|---|
| hours | byte | %8.0g | hours of work per week, mom |
| nonmomi | float | %9.0g | non-mom income, \$1000s |
| kids | byte | %8.0g | number of kids |
| educ | byte | %8.0g | mom's years of education |
| age | byte | %8.0g | age of mom |
| black | byte | %8.0g | =1 of black |
| hispan | byte | %8.0g | =1 if hispanic |
| samesex | byte | %8.0g | first two kids are of same sex |

. sum hours nonmomi kids educ age black hispan

| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| hours | 31857 | 21.22011 | 19.49892 | 0 | 99 |
| nonmomi | 31857 | 31.7618 | 20.41241 | -39.93675 | 157.438 |
| kids | 31857 | 2.752237 | .9771916 | 2 | 12 |
| educ | 31857 | 11.00534 | 3.305196 | 0 | 20 |
| age | 31857 | 29.74175 | 3.613745 | 21 | 35 |
| black | 31857 | .4129705 | .4923753 | 0 | 1 |
| hispan | 31857 | .593182 | .4912481 | 0 | 1 |

. count if hours == 0
13068
. count if hours == 40
11245
. * hours has lots of discreteness. Classical linear model assumptions
. * are clearly violated.
. tab kids

| number of kids | Freq. | Percent | Cum. |
|---|---|---|---|
| 2 | 16,215 | 50.90 | 50.90 |
| 3 | 10,014 | 31.43 | 82.33 |
| 4 | 3,736 | 11.73 | 94.06 |
| 5 | 1,374 | 4.31 | 98.37 |
| 6 | 323 | 1.01 | 99.39 |
| 7 | 134 | 0.42 | 99.81 |
| 8 | 47 | 0.15 | 99.96 |
| 9 | 6 | 0.02 | 99.97 |
| 10 | 4 | 0.01 | 99.99 |
| 11 | 2 | 0.01 | 99.99 |
| 12 | 2 | 0.01 | 100.00 |
| Total | 31,857 | 100.00 | |

. tab samesex

| first two kids are of same sex | Freq. | Percent | Cum. |
|---|---|---|---|
| 0 | 15,840 | 49.72 | 49.72 |
| 1 | 16,017 | 50.28 | 100.00 |
| Total | 31,857 | 100.00 | |

OLS estimates
Each child beyond the first two reduces estimated hours by about 2.3 hours, other things fixed

. * First use OLS with heteroskedasticity-robust standard errors:
. reg hours kids nonmomi educ age agesq black hispan, robust

Linear regression: Number of obs = 31857, F(7, 31849) = 377.87, Prob > F = 0.0000, R-squared = 0.0727, Root MSE = 18.779

| hours | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| kids | -2.325836 | .1155164 | -20.13 | 0.000 | -2.552253 | -2.099419 |
| nonmomi | -.0578328 | .0053515 | -10.81 | 0.000 | -.068322 | -.0473436 |
| educ | .5860083 | .0374881 | 15.63 | 0.000 | .5125302 | .6594865 |
| age | 2.048793 | .4483823 | 4.57 | 0.000 | 1.169946 | 2.927639 |
| agesq | -.0277198 | .0076957 | -3.60 | 0.000 | -.0428036 | -.012636 |
| black | 1.058285 | 1.35088 | 0.78 | 0.433 | -1.589492 | 3.706063 |
| hispan | -5.114147 | 1.35152 | -3.78 | 0.000 | -7.763179 | -2.465116 |
| _cons | -10.44695 | 6.588891 | -1.59 | 0.113 | -23.36143 | 2.467528 |

But what if kids is endogenous?
Assume samesex is exogenous to the labor supply equation.
Is samesex partially correlated with kids?
Estimate the reduced form for kids (first-stage regression):

## Endogeneity: samesex as instrument

. reg kids samesex nonmomi educ age agesq black hispan, robust

Linear regression: Number of obs = 31857, F(7, 31849) = 437.80, Prob > F = 0.0000, R-squared = 0.1191, Root MSE = .91724

| kids | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| samesex | .0703744 | .0102783 | 6.85 | 0.000 | .0502285 | .0905202 |
| nonmomi | -.0027871 | .000257 | -10.85 | 0.000 | -.0032907 | -.0022834 |
| educ | -.0853676 | .0020296 | -42.06 | 0.000 | -.0893457 | -.0813895 |
| age | .0589312 | .0203278 | 2.90 | 0.004 | .019088 | .0987744 |
| agesq | 1.98e-06 | .0003559 | 0.01 | 0.996 | -.0006956 | .0006995 |
| black | .0128681 | .0644422 | 0.20 | 0.842 | -.113441 | .1391772 |
| hispan | -.0424722 | .0644997 | -0.66 | 0.510 | -.1688941 | .0839498 |
| _cons | 2.010258 | .2930274 | 6.86 | 0.000 | 1.435913 | 2.584603 |

. * Yes: Having the first two children the same gender means the expected
. * number of children is estimated to be .07 higher.
2SLS estimates
Much bigger effect using IV, but only marginally statistically significant

. * Now compute the IV (2SLS) estimates:
. ivreg hours nonmomi educ age agesq black hispan (kids = samesex), robust

Instrumental variables (2SLS) regression: Number of obs = 31857, F(7, 31849) = 304.81, Prob > F = 0.0000, R-squared = 0.0583, Root MSE = 18.924

| hours | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| kids | -4.878903 | 3.013547 | -1.62 | 0.105 | -10.78557 | 1.027766 |
| nonmomi | -.0649179 | .0099359 | -6.53 | 0.000 | -.0843926 | -.0454432 |
| educ | .368042 | .2595992 | 1.42 | 0.156 | -.1407823 | .8768664 |
| age | 2.200964 | .4845126 | 4.54 | 0.000 | 1.2513 | 3.150627 |
| agesq | -.0277443 | .007744 | -3.58 | 0.000 | -.042923 | -.0125657 |
| black | 1.094986 | 1.376742 | 0.80 | 0.426 | -1.603482 | 3.793454 |
| hispan | -5.217758 | 1.381364 | -3.78 | 0.000 | -7.925284 | -2.510232 |
| _cons | -5.253976 | 9.037541 | -0.58 | 0.561 | -22.9679 | 12.45995 |

Instrumented: kids
Instruments: nonmomi educ age agesq black hispan samesex
Weak instrument
The partial correlation is even smaller. It's not surprising the IV estimate is much less precise than OLS
. corr kids samesex
(obs=31857)

| | kids | samesex |
|---|---|---|
| kids | 1.0000 | |
| samesex | 0.0358 | 1.0000 |

Consider the model

$$y = x\beta + u, \qquad E(z'u) = 0$$

so that the $1 \times L$ vector $z$ is assumed exogenous.
We want to test whether $x$ is endogenous
The null hypothesis is $E(x'u) = 0$


Durbin-Wu-Hausman (DWH)

The Durbin-Wu-Hausman (DWH) test takes the null to be that

$$\hat\beta_{2SLS} - \hat\beta_{OLS} \xrightarrow{p} 0$$

If all elements of $x$ are exogenous then 2SLS and OLS should differ only due to sampling error.
To obtain a test statistic we need to find the limiting distribution of $\sqrt{N}(\hat\beta_{2SLS} - \hat\beta_{OLS})$


Write

$$y_1 = z_1\delta_1 + y_2\alpha_1 + u_1 \qquad (23)$$

where $z_1$ is $1 \times L_1$, $y_2$ is $1 \times G_1$, and the entire vector of all instruments is $z = (z_1, z_2)$, where $z_2$ is $1 \times L_2$ with $L_2 \geq G_1$.
Write the reduced forms as

$$y_2 = z\Pi_2 + v_2, \qquad E(z'v_2) = 0$$


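Write the linear projection of $u_1$ on $v_2$ as

$$u_1 = v_2\rho_1 + e_1, \qquad E(v_2'e_1) = 0$$

We also know $E(z'e_1) = 0$.
Writing

$$y_1 = z_1\delta_1 + y_2\alpha_1 + v_2\rho_1 + e_1$$

$y_2$ is exogenous if and only if $\rho_1 = 0$.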


Procedure

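(1) Regress $y_{i2}$ on $z_i$ to obtain the $1 \times G_1$ reduced form residuals, $\hat v_{i2}$ (one vector for each observation $i$). This can be done for each element of $y_{i2}$ separately.
(2) Run the regression

$$y_{i1} \text{ on } z_{i1}, y_{i2}, \hat v_{i2} \qquad (24)$$

and test the joint significance of the coefficients on $\hat v_{i2}$.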
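The two-step procedure can be tried on simulated data. The sketch below, written with the Python standard library only, handles the simplest case of one endogenous regressor and one instrument; the data-generating process (including the assumed value $\rho_1 = 0.5$) and the helper names `solve`, `ols`, and `cov` are hypothetical. It reports point estimates only and does not compute the robust $t$ or Wald statistic one would use for the actual test.

```python
import random

def solve(a, b):
    """Solve the square linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def cov(a, b):
    """Sample covariance with divisor N."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n

rng = random.Random(2)
n = 20_000
z, y2, y1 = [], [], []
for _ in range(n):
    zi, v2, e1 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    u1 = 0.5 * v2 + e1       # assumed rho_1 = 0.5, so y2 is endogenous
    y2i = 0.8 * zi + v2      # reduced form for y2
    z.append(zi)
    y2.append(y2i)
    y1.append(1.0 + 2.0 * y2i + u1)

# Step 1: reduced form residuals from regressing y2 on (1, z).
g0, g1 = ols([[1.0, zi] for zi in z], y2)
v2_hat = [y2i - g0 - g1 * zi for y2i, zi in zip(y2, z)]

# Step 2: regress y1 on (1, y2, v2_hat); the last coefficient estimates rho_1.
b_cf = ols([[1.0, a, b] for a, b in zip(y2, v2_hat)], y1)
rho_hat = b_cf[2]

# For comparison, the simple IV estimate of the coefficient on y2.
b_iv = cov(z, y1) / cov(z, y2)
print(f"coefficient on y2: {b_cf[1]:.4f}  IV: {b_iv:.4f}  rho_hat: {rho_hat:.3f}")
```

In this sketch the coefficient on $y_2$ from the augmented regression coincides with the IV estimate up to rounding, which mirrors the remark in the application that the control function and 2SLS estimates are the same.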


## Extension of the procedure

Sometimes we may want to test the null hypothesis that a subset of explanatory variables is exogenous while allowing another set of variables to be endogenous. Write an expanded model as

$$y_1 = z_1\delta_1 + y_2\alpha_1 + y_3\gamma_1 + u_1 \qquad (25)$$

where $\alpha_1$ is $G_1 \times 1$ and $\gamma_1$ is $J_1 \times 1$.
We allow $y_2$ to be endogenous and test $H_0: E(y_3'u_1) = 0$.
The relevant equation is now $y_1 = z_1\delta_1 + y_2\alpha_1 + y_3\gamma_1 + v_3\rho_1 + e_1$ or, when we operationalize it,

$$y_{i1} = z_{i1}\delta_1 + y_{i2}\alpha_1 + y_{i3}\gamma_1 + \hat v_{i3}\rho_1 + error_i$$

where $\hat v_{i3}$ are the reduced form residuals from regressing $y_{i3}$ on $z_i$.
Because $y_2$ is allowed to be endogenous under $H_0$, we cannot estimate the augmented equation by OLS in order to test $H_0: \rho_1 = 0$.
Apply 2SLS to the augmented equation with instruments $(z_i, y_{i3}, \hat v_{i3})$; remember, $(y_3, v_3)$ are exogenous in the augmented equation. In effect, we still instrument for $y_{i2}$ but $y_{i3}$ and $\hat v_{i3}$ act as their own instruments.
The usual Wald statistic for 2SLS for testing $H_0: \rho_1 = 0$ is asymptotically valid under $H_0$.


## Testing Overidentifying Restrictions

If we have more instruments than the number we need, we can test whether some of them are exogenous.
Write the equation as

$$y_1 = z_1\delta_1 + y_2\alpha_1 + u_1 \qquad (27)$$

where $z_1$ is $1 \times L_1$ and $y_2$ is $1 \times G_1$.
The entire vector of instruments is $z = (z_1, z_2)$, where $z_2$ is $1 \times L_2$.
The equation is overidentified if $L_2 > G_1$.
The 2SLS estimator uses $L_1 + L_2$ moment conditions to estimate $L_1 + G_1$ parameters, leaving $L_2 - G_1$ overidentifying restrictions.


Example

$y_2 = educ$ and $z_2 = (motheduc, fatheduc)$
The problem is that the test will have weak power if the two IV estimators are biased in a similar way
A failure to reject should not make us too confident.
A rejection indicates that one or both IVs fail the exogeneity requirement; we do not know which one or whether it is both.


Regression-based tests are convenient.
Under homoskedasticity (Assumption 2SLS.3), obtain $NR_u^2$ from the regression

$$\hat u_{i1} \text{ on } z_i \qquad (28)$$

where $\hat u_{i1}$ are the 2SLS residuals and $z$ is the vector of all exogenous variables.


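The 2SLS residuals are orthogonal to the first-stage fitted values in the sample:

$$N^{-1}\sum_{i=1}^N \hat x_{i1}'\hat u_{i1} = N^{-1}\sum_{i=1}^N (z_i\hat\Pi_1)'\hat u_{i1} = 0 \qquad (30)$$

where $\hat\Pi_1$ is the $L \times K_1$ matrix from regressing $x_{i1} = (z_{i1}, y_{i2})$ on $z_i$ and $\hat x_{i1} = z_i\hat\Pi_1$ are the fitted values.
Under

$$E(z'u) = 0 \qquad (31)$$

$$E(u^2z'z) = \sigma^2E(z'z) \qquad (32)$$

it can be shown that

$$NR_u^2 \overset{a}{\sim} \chi^2_{L_2 - G_1} \qquad (33)$$
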

2 Instruments
. * Now use samesex and multi2nd (1 if the second birth is twins) as IVs for kids.
. * Estimate the reduced form:
. reg kids samesex multi2nd nonmomi educ age agesq black hispan, robust

Linear regression: Number of obs = 31857, F(8, 31848) = 410.77, Prob > F = 0.0000, R-squared = 0.1244, Root MSE = .91452

| kids | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| samesex | .07044 | .0102481 | 6.87 | 0.000 | .0503533 | .0905267 |
| multi2nd | .7632484 | .0546856 | 13.96 | 0.000 | .6560626 | .8704342 |
| nonmomi | -.0027879 | .0002562 | -10.88 | 0.000 | -.0032901 | -.0022858 |
| educ | -.0853114 | .0020267 | -42.09 | 0.000 | -.0892838 | -.0813391 |
| age | .0563395 | .020282 | 2.78 | 0.005 | .016586 | .0960929 |
| agesq | .0000436 | .0003551 | 0.12 | 0.902 | -.0006524 | .0007396 |
| black | .0105681 | .0645589 | 0.16 | 0.870 | -.1159698 | .1371059 |
| hispan | -.0420447 | .0646128 | -0.65 | 0.515 | -.1686882 | .0845988 |
| _cons | 2.043467 | .2924263 | 6.99 | 0.000 | 1.4703 | 2.616634 |

Joint test of the two instruments:
( 1) samesex = 0
( 2) multi2nd = 0
F(2, 31848) = 117.38
Prob > F = 0.0000

. * Clearly the two IV candidates are partially correlated with kids, both in
. * the direction (positive) that we expect.
. * Get the reduced form residuals.
. predict v2h, resid
. * Test the null that kids is exogenous in the hours equation:
. reg hours kids nonmomi educ age agesq black hispan v2h, robust

Linear regression: Number of obs = 31857, F(8, 31848) = 330.79, Prob > F = 0.0000, R-squared = 0.0727, Root MSE = 18.779

| hours | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| kids | -2.986165 | 1.284302 | -2.33 | 0.020 | -5.503447 | -.4688828 |
| nonmomi | -.0596653 | .0064263 | -9.28 | 0.000 | -.072261 | -.0470696 |
| educ | .5296332 | .1154311 | 4.59 | 0.000 | .3033839 | .7558825 |
| age | 2.08815 | .4545537 | 4.59 | 0.000 | 1.197208 | 2.979093 |
| agesq | -.0277261 | .0076958 | -3.60 | 0.000 | -.0428101 | -.0126422 |
| black | 1.067778 | 1.350595 | 0.79 | 0.429 | -1.57944 | 3.714995 |
| hispan | -5.140945 | 1.352129 | -3.80 | 0.000 | -7.791169 | -2.490721 |
| v2h | .665256 | 1.290263 | 0.52 | 0.606 | -1.86371 | 3.194222 |
| _cons | -9.103833 | 7.093029 | -1.28 | 0.199 | -23.00644 | 4.798776 |

. * The test statistic is only about .52, so there is little evidence that kids
. * is endogenous.


. * Now compute the 2SLS estimates:
. ivreg hours nonmomi educ age agesq black hispan (kids = samesex multi2nd), robust

Instrumental variables (2SLS) regression: Number of obs = 31857, F(7, 31849) = 310.81, Prob > F = 0.0000, R-squared = 0.0717, Root MSE = 18.789

| hours | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| kids | -2.986165 | 1.28219 | -2.33 | 0.020 | -5.499307 | -.473022 |
| nonmomi | -.0596653 | .0064235 | -9.29 | 0.000 | -.0722555 | -.0470751 |
| educ | .5296332 | .1152961 | 4.59 | 0.000 | .3036484 | .755618 |
| age | 2.08815 | .4545798 | 4.59 | 0.000 | 1.197156 | 2.979144 |
| agesq | -.0277261 | .0076979 | -3.60 | 0.000 | -.0428143 | -.012638 |
| black | 1.067778 | 1.355563 | 0.79 | 0.431 | -1.589178 | 3.724733 |
| hispan | -5.140945 | 1.357096 | -3.79 | 0.000 | -7.800906 | -2.480985 |
| _cons | -9.103834 | 7.092956 | -1.28 | 0.199 | -23.0063 | 4.798632 |

Instrumented: kids
Instruments: nonmomi educ age agesq black hispan samesex multi2nd
. * Note that these are the same as the CF estimates.

. predict u1h, resid
. * Test the single overidentifying restriction using nonrobust test:
. reg u1h samesex multi2nd nonmomi educ age agesq black hispan

| Source | SS | df | MS |
|---|---|---|---|
| Model | 176.258976 | 8 | 22.032372 |
| Residual | 11242898.1 | 31848 | 353.017398 |
| Total | 11243074.3 | 31856 | 352.934277 |

Number of obs = 31857, F(8, 31848) = 0.06, Prob > F = 0.9999, R-squared = 0.0000, Adj R-squared = -0.0002, Root MSE = 18.789

| u1h | Coef. | Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| samesex | -.1331695 | .2105507 | -0.63 | 0.527 | -.5458569 | .2795179 |
| multi2nd | .357619 | 1.136161 | 0.31 | 0.753 | -1.869301 | 2.584539 |
| nonmomi | .0000221 | .0053906 | 0.00 | 0.997 | -.0105436 | .0105879 |
| educ | .0000136 | .0353226 | 0.00 | 1.000 | -.06922 | .0692472 |
| age | .0000577 | .4481451 | 0.00 | 1.000 | -.8783239 | .8784393 |
| agesq | -2.46e-06 | .0077015 | -0.00 | 1.000 | -.0150978 | .0150929 |
| black | .0017749 | 1.3505 | 0.00 | 0.999 | -2.645257 | 2.648807 |
| hispan | .0037765 | 1.352616 | 0.00 | 0.998 | -2.647404 | 2.654957 |
| _cons | .0605262 | 6.5755 | 0.01 | 0.993 | -12.82771 | 12.94876 |

. * R-squared is zero to four decimal places, but N is large so compute
. * the statistic:
. di e(N)*e(r2)
.49942587
. di chi2tail(1,.499)
.47993984
. * So the p-value is .48, showing little evidence against the
. * overidentifying restriction.
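The p-value in the log can be reproduced without Stata. With one degree of freedom, the upper tail of the chi-square distribution satisfies $P(\chi^2_1 > s) = \operatorname{erfc}(\sqrt{s/2})$, so the Python standard library is enough. The function name below is hypothetical; the statistic is the $N R_u^2$ value displayed above.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable: P(X > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

n_r_squared = 0.49942587                 # N * R-squared from the regression of u1h on all instruments
p_value = chi2_1df_pvalue(n_r_squared)   # one overidentifying restriction: L2 - G1 = 2 - 1
print(f"N*R^2 = {n_r_squared:.4f}, p-value = {p_value:.4f}")
```

Evaluating the function at the rounded value .499 used in the log gives the same number that `chi2tail(1,.499)` displays, up to rounding. This shortcut is specific to one degree of freedom; with more overidentifying restrictions a general chi-square tail function would be needed.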


. * Now compute the heteroskedasticity-robust test.

. qui reg kids samesex multi2nd nonmomi educ age agesq black hispan
. predict kidsh
(option xb assumed; fitted values)
. qui reg samesex kidsh nonmomi educ age agesq black hispan
. predict r21h, resid
. qui reg multi2nd kidsh nonmomi educ age agesq black hispan
. predict r22h, resid
. reg u1h r21h, nocons robust

Linear regression: Number of obs = 31857, F(1, 31856) = 0.51, Prob > F = 0.4767, R-squared = 0.0000, Root MSE = 18.786

| u1h | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| r21h | -.166174 | .2335323 | -0.71 | 0.477 | -.6239062 | .2915583 |

. reg u1h r22h, nocons robust

Linear regression: Number of obs = 31857, F(1, 31856) = 0.51, Prob > F = 0.4767, R-squared = 0.0000, Root MSE = 18.786

| u1h | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| r22h | 1.800574 | 2.530425 | 0.71 | 0.477 | -3.159156 | 6.760305 |

. * Get the same answer since only the absolute value of the t matters.
. * Equivalently, use the F statistic reported in the upper right-hand
. * corner.

. * Now use ivregress:
. ivregress 2sls hours nonmomi educ age agesq black hispan (kids = samesex multi2nd), robust first

First-stage regressions: Number of obs = 31857, F(8, 31848) = 410.77, Prob > F = 0.0000, R-squared = 0.1244, Adj R-squared = 0.1242, Root MSE = 0.9145

| kids | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| nonmomi | -.0027879 | .0002562 | -10.88 | 0.000 | -.0032901 | -.0022858 |
| educ | -.0853114 | .0020267 | -42.09 | 0.000 | -.0892838 | -.0813391 |
| age | .0563395 | .020282 | 2.78 | 0.005 | .016586 | .0960929 |
| agesq | .0000436 | .0003551 | 0.12 | 0.902 | -.0006524 | .0007396 |
| black | .0105681 | .0645589 | 0.16 | 0.870 | -.1159698 | .1371059 |
| hispan | -.0420447 | .0646128 | -0.65 | 0.515 | -.1686882 | .0845988 |
| samesex | .07044 | .0102481 | 6.87 | 0.000 | .0503533 | .0905267 |
| multi2nd | .7632484 | .0546856 | 13.96 | 0.000 | .6560626 | .8704342 |
| _cons | 2.043467 | .2924263 | 6.99 | 0.000 | 1.4703 | 2.616634 |

Instrumental variables (2SLS) regression: Number of obs = 31857, Wald chi2(7) = 2176.19, Prob > chi2 = 0.0000, R-squared = 0.0717, Root MSE = 18.786

| hours | Coef. | Robust Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| kids | -2.986165 | 1.282029 | -2.33 | 0.020 | -5.498896 | -.4734331 |
| nonmomi | -.0596653 | .0064227 | -9.29 | 0.000 | -.0722535 | -.0470771 |
| educ | .5296332 | .1152816 | 4.59 | 0.000 | .3036854 | .7555811 |
| age | 2.08815 | .4545227 | 4.59 | 0.000 | 1.197302 | 2.978999 |
| agesq | -.0277261 | .0076969 | -3.60 | 0.000 | -.0428119 | -.0126404 |
| black | 1.067778 | 1.355393 | 0.79 | 0.431 | -1.588743 | 3.724299 |
| hispan | -5.140945 | 1.356926 | -3.79 | 0.000 | -7.800471 | -2.48142 |
| _cons | -9.103834 | 7.092065 | -1.28 | 0.199 | -23.00403 | 4.796358 |

Instrumented: kids
Instruments: nonmomi educ age agesq black hispan samesex multi2nd
. estat endog
Tests of endogeneity
Ho: variables are exogenous
Robust score chi2(1) = .266346 (p = 0.6058)
Robust regression F(1,31848) = .26584 (p = 0.6061)

. estat overid
Test of overidentifying restrictions:
Score chi2(1)

