Tobit Models
Developed by James Tobin, Tobit models resemble a linear regression with a normally distributed
error term, except that the dependent variable is censored: values beyond some limit are not
observed exactly and are instead recorded at that limit.
Tobit models address problems that arise when your measurement or dataset does not
capture all the information (e.g., ceiling effects or censored data).
Example A: GRE scores have a maximum and a minimum (800 and 200, respectively). Two people may both score
at the maximum or at the minimum, but not have the same abilities.
Example B: You are trying to measure the effect of schooling on people's wages and you only have data on the
wages of those who are working. You are missing the data for those who are not working.
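\[
y_i =
\begin{cases}
y_i^* & \text{if } y_i^* > 0 \\
0 & \text{if } y_i^* \le 0
\end{cases}
\]
where
\[
y_i^* = \beta x_i + \varepsilon_i
\]
and
$x_i$ = explanatory variables
$\varepsilon_i$ = error term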
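The censoring rule above (the observed value equals the latent value when it is positive, and zero otherwise) can be sketched in code together with the log-likelihood that a Tobit fit would maximize. This is a minimal illustration using only the Python standard library, assuming a single explanatory variable, censoring from below at zero, and a normal error with standard deviation `sigma`. The function names and parameter values are made up for the example and do not come from the source.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides pdf() and cdf()


def tobit_loglik(beta0, beta1, sigma, xs, ys):
    """Log-likelihood of a one-regressor Tobit model censored from below at 0."""
    total = 0.0
    for x, y in zip(xs, ys):
        mean = beta0 + beta1 * x
        if y > 0:
            # Uncensored observation: normal density of the observed value.
            z = (y - mean) / sigma
            total += math.log(STD_NORMAL.pdf(z) / sigma)
        else:
            # Censored observation: probability that the latent value is <= 0.
            total += math.log(STD_NORMAL.cdf(-mean / sigma))
    return total


def simulate_censored(n, beta0, beta1, sigma, seed=0):
    """Draw latent y* = beta0 + beta1 * x + e and record max(y*, 0)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [max(beta0 + beta1 * x + rng.gauss(0.0, sigma), 0.0) for x in xs]
    return xs, ys


# Illustrative parameter values only.
xs, ys = simulate_censored(n=2000, beta0=0.5, beta1=1.0, sigma=1.0)
ll_true = tobit_loglik(0.5, 1.0, 1.0, xs, ys)
ll_flat = tobit_loglik(0.5, 0.0, 1.0, xs, ys)  # slope wrongly set to zero
print(ll_true > ll_flat)
```

In practice the parameters would be found by handing the negative of this function to a numerical optimizer; the comparison at the end only checks that the data-generating parameters score higher than a deliberately wrong slope.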
Censored data lead to biased estimates under ordinary least squares (OLS). You can try to correct for this
by adding an adjustment to your equation that accounts for the probability of being observed (in example B,
this would mean the probability of working and therefore having a wage above zero). Conceptually, this implies
the following relationship between the outcome variable of interest, the explanatory variables, and the
observed outcomes in your sample.
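E(y) = F(y) E(y*)
where F(y) is the probability of being observed and E(y*) denotes, in this expression, the expected value of
the outcome among those who are observed.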
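The bias that censoring introduces into OLS can be seen in a small simulation. The sketch below uses only the Python standard library. It assumes a single normally distributed regressor, a true slope of 2, and censoring at zero; all of these values are chosen purely for illustration. It fits a simple regression once on the uncensored latent outcome and once on the censored outcome.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


rng = random.Random(42)
TRUE_SLOPE = 2.0  # illustrative value, not from the source

x_vals = [rng.gauss(0.0, 1.0) for _ in range(5000)]
latent = [1.0 + TRUE_SLOPE * x + rng.gauss(0.0, 1.0) for x in x_vals]
censored = [max(y_star, 0.0) for y_star in latent]  # values below 0 recorded as 0

slope_latent = ols_slope(x_vals, latent)
slope_censored = ols_slope(x_vals, censored)
print(f"latent: {slope_latent:.2f}, censored: {slope_censored:.2f}")
```

The slope estimated from the censored outcome is pulled toward zero relative to the slope estimated from the latent outcome, which is the bias the adjustment is meant to address.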
Implementing the adjustment to correct for selection bias requires maximum likelihood estimation: you use an
equation to model the probability of being observed in the sample. Several such estimators exist; one of the
more common is the Heckman selection estimator.
1. Selection equation
First, calculate the selection equation, that is, the probability that someone is working (their propensity to
be in the sample): fit a probit model for the determinants of being observed and record a likelihood estimate
for each observation.
2. Exclusionary variables
Including exclusionary variables in the selection equation can make it better; without them, the selection may
be weak. In example B (the effect of schooling on wages), pick exclusionary variables that would affect the
likelihood of working but not the wage rate (e.g., having younger children at home, student status).
3. OLS regression
Then, use a statistically adjusted value (the inverse Mills ratio), calculated from your selection equation, as
an independent variable in the OLS regression for your outcome of interest. Heckman treats selection bias as
an omitted-variable bias: the adjusted likelihood estimate from the selection equation enters the OLS
regression as an additional explanatory variable.
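As a rough illustration of the adjustment term, the sketch below computes the inverse Mills ratio (the standard normal density divided by the standard normal cumulative distribution, both evaluated at the probit index) for a few hypothetical workers. The probit coefficients and the records are invented for the example; in a real analysis they would come from the fitted selection equation. Only the Python standard library is used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides pdf() and cdf()


def inverse_mills_ratio(index):
    """Return pdf(index) / cdf(index) for the standard normal distribution."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


# Hypothetical probit coefficients for the selection equation (not estimated here).
GAMMA_CONST = -0.2
GAMMA_SCHOOLING = 0.1
GAMMA_YOUNG_CHILD = -0.8  # exclusionary variable: affects working, not the wage rate

# Hypothetical records for people observed working.
workers = [
    {"schooling": 12, "young_child": 0},
    {"schooling": 12, "young_child": 1},
    {"schooling": 16, "young_child": 0},
]

for person in workers:
    index = (GAMMA_CONST
             + GAMMA_SCHOOLING * person["schooling"]
             + GAMMA_YOUNG_CHILD * person["young_child"])
    person["prob_working"] = STD_NORMAL.cdf(index)  # predicted probability of being observed
    person["imr"] = inverse_mills_ratio(index)      # regressor added to the wage OLS
    print(person)
```

People who are less likely to be observed working get a larger inverse Mills ratio, so the added regressor carries more weight for exactly the observations where selection is strongest.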
The result is a better estimate or fit than you would get by running the regression with those who lack wage
information included, or by running it on the smaller sample of only those for whom you have wage information.
Key Assumptions
The error terms of the selection equation and the OLS regression are jointly normal.
Vi in the equation is normally distributed, and the conditional expectation of the outcome-equation error given Vi is linear in Vi.
Resources
Tobit
Tobit model setup in Stata: Microeconometrics Using Stata, Cameron and Trivedi. pg. 536
Heckman