
PBAF 529

Tobit Models and Heckman Selection: Dealing with selection bias


Cheat Sheet

Tobit Models
Created by James Tobin, Tobit models resemble a linear regression with a normally
distributed error term, but the dependent variable is censored: values below a lower
limit or above an upper limit are recorded at that limit, and the model accounts for
those limits explicitly.

Tobit models address problems that arise when your measurement or dataset does not
capture all the information (e.g. ceiling effects or censored data).

Example A: Effect of GRE Scores on Graduate School admissions

GRE scores have a minimum and a maximum (200 and 800, respectively). Two people may have the same score
at the maximum or the minimum, but not have the same abilities.

Example B: Effect of schooling on wages

You are trying to measure the effect of schooling on people's wages and you only have data on the wages of
those who are working. You are missing the data for those who are not working.

How Tobit models work for Example B (censored from below only):

$y_i$ = observed outcome variable of interest (wages)


Because the distribution is censored from below, the outcome is only observed above a certain threshold (we only
know the wages of people who work). To get around this problem we assume that there is some latent outcome
variable $y_i^*$, which is related to the observed outcomes in the following way. (You can think of $y_i^*$ as a
variable that captures the outcome of interest for all observations in the sample, even for those where one wasn't
observed in reality, such as wages for people who did not work.)

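$$
y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}
$$

where

$$
y_i^* = \beta x_i + \varepsilon_i
$$

and

$x_i$ = explanatory variables

$\beta$ = parameters specifying the relationship between x and y

$\varepsilon_i$ = error term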
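
The censoring rule above can be illustrated with a small simulation. This is a minimal sketch on made-up data, not part of the original cheat sheet: the intercept of -2, the slope of 1, the uniform range for x and the function names are all assumptions chosen for illustration. It draws a latent outcome, records zero whenever the latent value is not positive, and compares the OLS slope on the latent data with the OLS slope on the censored data.

```python
import random


def slope(xs, ys):
    """Slope from a one-regressor OLS fit of ys on xs (with an intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=5000, seed=1):
    """Draw x, a latent outcome y*, and the censored outcome y = max(y*, 0)."""
    rng = random.Random(seed)
    xs, latent, observed = [], [], []
    for _ in range(n):
        x = rng.uniform(0, 4)                       # hypothetical explanatory variable
        y_star = -2.0 + 1.0 * x + rng.gauss(0, 1)   # assumed latent equation, true slope 1
        xs.append(x)
        latent.append(y_star)
        observed.append(max(y_star, 0.0))           # censor from below at zero
    return xs, latent, observed


xs, latent, observed = simulate()
latent_slope = slope(xs, latent)       # what we would estimate if y* were visible
censored_slope = slope(xs, observed)   # what plain OLS gives on the censored data
print(f"slope on latent y*:  {latent_slope:.3f}")
print(f"slope on censored y: {censored_slope:.3f}")
```

Under these assumptions the slope fitted to the censored outcome is pulled toward zero relative to the latent slope, which is the bias a Tobit model is meant to address.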

Alex Chew and Amelia Vader



What is the probability of being observed?

Censored data lead to biased estimates under regular OLS. You can try to correct for this error by introducing
an adjustment to your equation that takes into account the probability of being observed (in our Example B
this would mean the probability of working and therefore having a wage above zero). Conceptually, this implies
the following relationship between the outcome variable of interest, the explanatory variables, and the observed
outcomes in your sample.

E(y) = F(y) E(y*)

where

E(y) = Expected value of the outcome of interest

F(y) = Probability of being observed

E(y*) = Expected value of the outcome of interest conditional upon being observed
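
As a worked illustration of this decomposition, the sketch below assumes the latent outcome is normally distributed with mean `mu` and standard deviation `sigma` and is observed only when it is above zero. Those distributional choices, the numbers, and the function name are assumptions for the example, not something the cheat sheet specifies. Under them, the probability of being observed is Φ(mu/sigma) and the mean conditional on being observed is mu + sigma·φ(mu/sigma)/Φ(mu/sigma); their product is compared with a simulated mean of the censored outcome.

```python
import random
from statistics import NormalDist


def censored_mean_parts(mu, sigma):
    """Return (P(observed), E[outcome | observed], their product) for a
    normal latent outcome that is observed only when it exceeds zero."""
    std = NormalDist()                      # standard normal
    z = mu / sigma
    p_observed = std.cdf(z)                 # plays the role of F(y)
    mean_if_observed = mu + sigma * std.pdf(z) / std.cdf(z)  # role of E(y*)
    return p_observed, mean_if_observed, p_observed * mean_if_observed


# Hypothetical values chosen only for illustration.
p_obs, cond_mean, expected_y = censored_mean_parts(mu=1.0, sigma=2.0)

# Check against a simulation in which unobserved outcomes are recorded as zero.
rng = random.Random(42)
draws = [max(rng.gauss(1.0, 2.0), 0.0) for _ in range(200_000)]
simulated_mean = sum(draws) / len(draws)

print(f"P(observed)           = {p_obs:.4f}")
print(f"E[outcome | observed] = {cond_mean:.4f}")
print(f"product               = {expected_y:.4f}")
print(f"simulated mean        = {simulated_mean:.4f}")
```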

Maximum Likelihood Estimator

Implementing the adjustment to correct for selection bias requires maximum likelihood estimation.
This means using an equation to determine the probability of being observed in the sample. Although a
number of maximum likelihood estimators exist, one of the more common is the Heckman selection estimator.
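
For reference, the likelihood that is maximised in the simplest censored case (outcome equal to the latent value when that value is positive, and zero otherwise) can be written as below. The notation is an assumption introduced here: $\beta$ for the coefficients, $\sigma$ for the standard deviation of the normal error term, and $\phi$ and $\Phi$ for the standard normal density and distribution function.

$$
\ln L(\beta, \sigma) = \sum_{i:\, y_i > 0} \ln\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \beta x_i}{\sigma}\right)\right] + \sum_{i:\, y_i = 0} \ln \Phi\!\left(-\frac{\beta x_i}{\sigma}\right)
$$

The first sum covers the observations whose outcome is actually observed; the second sum adds, for each censored observation, the log probability that its latent outcome fell at or below zero.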

Heckman Selection Estimator


Heckman selection is a statistical model developed by James Heckman to correct for
selection bias. It is a means of correcting for not having a randomly selected sample (i.e.
your sample isn't representative of the group you want to study).

The Heckman selection model is a type of Tobit model (sometimes called the Type II Tobit).

How Heckman works

1. Selection Equation (Maximum Likelihood Estimator)

First, you estimate the selection equation, i.e. the probability that someone is working (their propensity to
be in the sample): fit a probit model for the determinants of being observed and record the predicted likelihood
of being observed for each observation.

2. Add exclusionary variables

This can make your selection equation better. Otherwise, the selection may be weak. For Example B, effect of
schooling on wages, pick exclusionary variables that would affect the likelihood of working but not the wage rate
(e.g. having younger children at home, student status).


3. OLS regression

Then, use a statistically adjusted value (the inverse Mills ratio) calculated from your selection equation as an
independent variable in the OLS regression for your outcome of interest. Heckman treats the selection bias as
an omitted variable bias: the inverse Mills ratio is a statistically adjusted version of the likelihood estimate
from the selection equation, and it enters the OLS regression as an extra explanatory variable.

The result is a better estimate or fit compared with running the regression with those without wage information
included, or running it uncorrected on the smaller sample of only those for whom you have wage information.
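
The three steps above can be sketched end to end on simulated data. This is a minimal illustration under stated assumptions, not the authors' procedure: the data-generating values (true schooling slope of 1, error correlation of 0.8), the exclusion variable `z`, and every function name are hypothetical, and the probit is fitted with a hand-written Newton-Raphson loop so that the example needs only the Python standard library. In practice you would use statistical software such as Stata's `heckman` command, which also corrects the second-step standard errors.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, d, iters=15):
    """Probit coefficients by Newton-Raphson on the log-likelihood."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, di in zip(rows, d):
            a = sum(b * v for b, v in zip(beta, r))
            q = 1.0 if di else -1.0
            g = q * STD.pdf(a) / STD.cdf(q * a)  # score weight
            h = g * (g + a)                      # weight in the negative Hessian
            for i in range(k):
                grad[i] += g * r[i]
                for j in range(k):
                    hess[i][j] += h * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def heckman_demo(n=5000, seed=0):
    rng = random.Random(seed)
    sel_rows, works, wages = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)                  # schooling (standardised, hypothetical)
        z = rng.gauss(0, 1)                  # exclusion variable, e.g. young children
        v = rng.gauss(0, 1)                  # selection-equation error
        e = 0.8 * v + 0.6 * rng.gauss(0, 1)  # wage error, correlated with v
        sel_rows.append([1.0, x, z])
        works.append(x - z + v > 0)          # wage is observed only if working
        wages.append(1.0 + 1.0 * x + e)      # assumed true schooling slope is 1
    # Step 1: probit selection equation on everyone.
    gamma = probit(sel_rows, works)
    obs = [(r, w) for r, w, d in zip(sel_rows, wages, works) if d]
    y = [w for _, w in obs]
    # Uncorrected OLS on workers only, for comparison.
    naive = ols([[1.0, r[1]] for r, _ in obs], y)
    # Steps 2 and 3: inverse Mills ratio from the probit index, added to OLS.
    mills = []
    for r, _ in obs:
        index = sum(g * v for g, v in zip(gamma, r))
        mills.append(STD.pdf(index) / STD.cdf(index))
    corrected = ols([[1.0, r[1], lam] for (r, _), lam in zip(obs, mills)], y)
    return naive[1], corrected[1], corrected[2]


naive_slope, heckman_slope, mills_coef = heckman_demo()
print(f"uncorrected OLS slope:          {naive_slope:.3f}")
print(f"Heckman two-step slope:         {heckman_slope:.3f}")
print(f"coefficient on inverse Mills:   {mills_coef:.3f}")
```

In this setup the uncorrected slope on schooling is biased downward, while adding the inverse Mills ratio brings the estimate back toward the assumed true value of 1. The variable `z` enters only the selection equation, which is what the exclusionary variable in step 2 is for.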

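Key Assumptions
Error terms for the selection equation and the OLS regression are jointly normal.
$V_i$ (the error term in the selection equation) is normally distributed and $E[\varepsilon_i \mid V_i]$ is linear in $V_i$, where $\varepsilon_i$ is the error term in the OLS regression.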

When to NOT use a Tobit:

If you have heteroskedasticity in the error term.
When you don't have an instrumental variable or exclusion restriction (without these you are going off
of assumptions about the distribution).
When you don't have a theory about the selection bias. Your model is only as good as the assumptions
you make about the bias.
If you have collinearity problems.
If the parameter estimates are very sensitive (for example, to small changes in the specification).

Resources
Tobit

How to read a Tobit Output: http://www.ats.ucla.edu/stat/stata/output/stata_tobit.htm

How to use a Tobit model in Stata: http://en.wikibooks.org/wiki/Stata/Tobit_and_Selection_Models

Information on the five variations of Tobit models: http://en.wikipedia.org/wiki/Tobit_model

Information on censoring problems: http://en.wikipedia.org/wiki/Censoring_%28statistics%29

Tobit model setup in Stata: Microeconometrics Using Stata, Cameron and Trivedi, p. 536

Heckman

Microeconometrics Using Stata, Cameron and Trivedi, p. 558

How to use Heckman in Stata: http://www.stata.com/features/heckman-probit/heckprob.pdf

Basic information: http://en.wikipedia.org/wiki/Heckman_correction

PowerPoint on Heckman: http://rtm.wustl.edu/GMMC/heckman.pdf
