HECKMAN
and captures the essential idea underlying causal models. This paper
develops the scientific model of causality developed in economics and
compares it to methods advocated in epidemiology, statistics, and in
many of the social sciences outside of economics that have been
influenced by statistics and epidemiology.
I make two main points that are firmly anchored in the econometric tradition. The first is that causality is a property of a model of
hypotheticals. A fully articulated model of the phenomena being
studied precisely defines hypothetical or counterfactual states.1 A
definition of causality drops out of a fully articulated model as an
automatic by-product. A model is a set of possible counterfactual
worlds constructed under some rules. The rules may be the laws of
physics, the consequences of utility maximization, or the rules governing social interactions, to take only three of many possible examples.
A model is in the mind. As a consequence, causality is in the mind.
In order to be precise, counterfactual statements must be made
within a precisely stated model. Ambiguity in model specification
implies ambiguity in the definition of counterfactuals and hence of
the notion of causality. The more complete the model of counterfactuals, the more precise the definition of causality. The ambiguity
and controversy surrounding discussions of causal models are consequences of analysts wanting something for nothing: a definition of
causality without a clearly articulated model of the phenomenon
being described (i.e., a model of counterfactuals). They want to
describe a phenomenon as being modeled causally without producing a clear model of how the phenomenon being described is generated or what mechanisms select the counterfactuals that are observed
in hypothetical or real samples. In the words of Holland (1986), they
want to model the effects of causes without modeling the causes of
effects. Science is all about constructing models of the causes of
effects. This paper develops the scientific model of causality and
shows its value in analyzing policy problems.
My second main point is that the existing literature on causal
inference in statistics confuses three distinct tasks that need to be
carefully distinguished:
1. I will use the term counterfactual as defined in philosophy. A counterfactual need not be contrary to certain facts. It is just a hypothetical. The term hypothetical would be better and I will use the two concepts interchangeably.
1. Definitions of counterfactuals.
2. Identification of causal models from population distributions (infinite samples without any sampling variation). The hypothetical populations producing these distributions may be subject to selection bias, attrition, and the like. However, issues of sampling variability of empirical distributions are irrelevant for the analysis of this problem.
3. Identification of causal models from actual data, where sampling variability is an issue. This analysis recognizes the difference between empirical distributions based on sampled data and the population distributions generating the data.
TABLE 1
Three Distinct Tasks Arising from Analysis of Causal Models

Task | Description | Requirements
1 | Defining the set of hypotheticals or counterfactuals | A scientific theory
2 | Identifying parameters (causal or otherwise) from hypothetical population data | Mathematical analysis of point or set identification
3 | Identifying parameters from real-world data | Estimation and testing theory
The exchange between Heckman and Tukey in Wainer (1986) anticipates many of the issues raised in this paper.
P1: Evaluating the impact of historical interventions on outcomes including their impact in terms of welfare.
By historical, I refer to interventions actually experienced. A variety of
outcomes and welfare criteria might be used to form these evaluations.
By impact, I mean constructing either individual-level or population-level counterfactuals and their valuations. By welfare, I mean the valuations of the outcomes obtained from the intervention by the agents being analyzed or some other party (e.g., the parents of the agent).
P1 is the problem of internal validity. It is the problem of
identifying a given treatment parameter or a set of treatment parameters in a given environment (see Campbell and Stanley 1963). This
is the policy question addressed in the epidemiological and statistical
literature on causality. A drug trial for a particular patient population
is the prototypical problem in that literature.
Most policy evaluation is designed with an eye toward the future
and toward decisions about new policies and application of old policies
to new environments. I distinguish a second task of policy analysis:
P2: Forecasting the impacts (constructing counterfactual
states) of interventions implemented in one environment in other environments, including their impacts in terms of welfare.
Included in these interventions are policies described by generic characteristics (e.g., tax or benefit rates, etc.) that are applied to different
groups of people or in different time periods from those studied in
previous implementations of these policies. This is the problem of
external validity: taking a treatment parameter or a set of parameters
estimated in one environment to another environment. The environment includes the characteristics of individuals and of their social
and economic setting.
Finally, the most ambitious problem is forecasting the effect of
a new policy, never previously experienced:
P3: Forecasting the impacts of interventions (constructing counterfactual states associated with interventions) never historically experienced to other environments, including their impacts in terms of welfare.
This problem requires that one use past history to forecast the
consequences of new policies. It is a fundamental problem in
(A-1) For a given assignment mechanism τ and for all policies p, p′ ∈ P with s ∈ S_p ∩ S_p′, the outcome Y_τ(s, ω) is the same under p and p′.

This assumption says that outcomes for person ω under treatment s with assignment mechanism τ are the same in two different policy regimes which both include s as a possible treatment. It rules out social interactions and general equilibrium effects. A second assumption rules out any effect of the assignment mechanism on potential outcomes.
(A-2) Irrespective of the assignment mechanisms τ, τ′ ∈ T and for all policies p ∈ P, Y_τ(s, ω) = Y_τ′(s, ω) for all s ∈ S_p and ω ∈ Ω, so the outcome is not affected by the assignment mechanism.
The policy treatment effect is the difference Y(s, ω) − Y(s′, ω), s ≠ s′, for s, s′ ∈ S_p.
This makes clear that the policy treatment effect is defined under a particular
policy regime and for a particular mechanism of selection within a policy regime.
One could define treatment effects for policy regimes or regime selection mechanisms by varying the arguments p or τ, respectively, holding the other arguments fixed.
8. There is a disagreement in the literature on whether the individual-level treatment effects are constants or random at the individual level. I develop both cases in this paper.
∫_Ω W(Y(s_p(ω), ω), ω) dω  versus  ∫_Ω W(Y(s_p′(ω), ω), ω) dω,
9. One might compare outcomes in different sets that are ordered. Thus, for a particular policy regime and assignment mechanism, if Y(s, ω) is scalar income and we compare outcomes for s ∈ S_A with outcomes for s′ ∈ S_B, where S_A ∩ S_B = ∅, then one might compare Y(s_A, ω) with Y(s_B, ω), where

s_A = arg max_{s ∈ S_A} Y(s, ω)

and

s_B = arg max_{s ∈ S_B} Y(s, ω).

This compares the best in one choice set with the best in the other. A particular case is the comparison of the best choice with the next best choice. To do so, define s′ = arg max_{s ∈ S} Y(s, ω), S_B = S \ {s′}, and define the treatment effect as Y(s′, ω) − Y(s_B, ω). This is the comparison of the highest outcome over S with the next best outcome. In principle, many different individual-level comparisons might be constructed, and they may be computed using personal preferences V_ω, using the preferences of the planner V_G, or using the preferences of the planner over the preferences of agents.
∫_Ω V(Y(s_p(ω), ω), ω) dω  versus  ∫_Ω V(Y(s_p′(ω), ω), ω) dω.
I now discuss a fundamental problem that arises in constructing these and other criteria from data. This takes me to the problem of
causal inference, the second task delineated in Table 1. Recall that
I am talking about inference in a population, not in a sample, so no
issues of sampling variability arise.
1.3. The Evaluation Problem
Operating purely in the domain of theory, I have assumed a world with a well-defined set of individuals ω ∈ Ω and a universe of counterfactuals or hypotheticals defined for each person, Y(s, ω), s ∈ S. Different policies p ∈ P select treatment for persons. Each policy can in principle assign treatment to persons by different mechanisms τ ∈ T. In the absence of a theory, there are no well-defined rules for constructing counterfactual or hypothetical states or constructing the assignment-to-treatment rules τ_p.11 Scientific theories provide algorithms for generating the universe of internally consistent, theory-consistent counterfactual states.
These hypothetical states are possible worlds. They are products of a purely mental activity. No empirical problem arises in
constructing these theoretically possible worlds. Indeed, in forecasting
new policies, or projecting the effects of old policies to new
10. These willingness-to-pay measures are standard in the economics literature (e.g., see Boadway and Bruce 1984).
11. Efforts like those of Lewis (1974) to define admissible counterfactual states without an articulated theory as closest possible worlds founder on the lack of any meaningful metric or topology to measure closeness among possible worlds. Statisticians often appeal to this theory, but it is not operational (e.g., see Gill and Robins 2001 for one such appeal).
environments, some of the Y(s, ω) may never have been observed for anyone. Different theories produce different outcomes Y(s, ω) and different assignment rules τ_p(ω).
The evaluation problem, in contrast to the model construction
problem, is an identification problem that arises in constructing the
counterfactual states and treatment assignment rules produced by
abstract models from population data. This is the second task presented in Table 1.
This problem is not precisely stated until the data available to the
analyst are precisely defined. Different subfields in science and social
science assume access to different types of data. They also make different
assumptions about the underlying models generating the counterfactuals
and mechanisms for selecting which counterfactuals are actually
observed.
At any point in time, we can observe person ω in one state but not in any of the other states. The states are mutually exclusive. Thus we do not observe Y(s′, ω) for person ω if we observe Y(s, ω), s ≠ s′. Let D(s, ω) = 1 if we observe person ω in state s. Then D(s′, ω) = 0 for s′ ≠ s. D(s, ω) is generated by τ_p(ω): D(s, ω) = 1 if τ_p(ω) = s. We observe Y(s, ω) if D(s, ω) = 1, but we do not observe Y(s′, ω), s ≠ s′. We can define observed Y(ω) as

Y(ω) = Σ_{s ∈ S} D(s, ω) Y(s, ω).
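The observation rule just described can be made concrete in a small simulation. The sketch below (in Python, with invented numbers; nothing here is estimated from data) implements the switching equation Y(ω) = Σ_{s∈S} D(s, ω)Y(s, ω) for a two-state world:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # a small universe of persons omega

# Potential outcomes Y(s, omega) for two states s in {0, 1}: a purely
# theoretical object -- both columns exist in the model for every person.
Y = np.column_stack([rng.normal(0.0, 1.0, n),    # Y(0, omega)
                     rng.normal(1.0, 1.0, n)])   # Y(1, omega)

# A policy's assignment mechanism tau_p selects one state per person.
tau_p = rng.integers(0, 2, n)          # tau_p(omega) in {0, 1}

# D(s, omega) = 1 iff tau_p(omega) = s; states are mutually exclusive.
D = np.zeros((n, 2), dtype=int)
D[np.arange(n), tau_p] = 1
assert (D.sum(axis=1) == 1).all()      # exactly one state observed per person

# Observed outcome: Y(omega) = sum_s D(s, omega) * Y(s, omega).
Y_obs = (D * Y).sum(axis=1)

# The evaluation problem: Y(1, omega) - Y(0, omega) is never observed
# for any omega, because only one column of Y enters Y_obs.
```

The missing column for each person is exactly the counterfactual that no data set, however large, directly supplies.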
The average treatment effect of j relative to k is

ATE(j, k) = E(Y(j, ω) − Y(k, ω)),   (3a)
ATE(j, k | X) = E(Y(j, ω) − Y(k, ω) | X).   (3b)

17. The effects of causes and not the causes of effects, in the language of Holland (1986).
The treatment-on-the-treated (TT) and treatment-on-the-untreated (TUT) parameters are

TT(j, k) = E(Y(j, ω) − Y(k, ω) | D(j, ω) = 1),   (4a)
TT(j, k | X) = E(Y(j, ω) − Y(k, ω) | D(j, ω) = 1, X),   (4b)
TUT(j, k) = E(Y(j, ω) − Y(k, ω) | D(j, ω) = 0),   (5a)
TUT(j, k | X) = E(Y(j, ω) − Y(k, ω) | D(j, ω) = 0, X).   (5b)
j and k, given that these are the best two choices available, is, with respect to personal preferences and choice-specific costs P(j, ω),

EOTM^{V_ω}(Y(j, ω) − Y(k, ω)) = E_ω( Y(j, ω) − Y(k, ω) | V(Y(j, ω), P(j, ω), ω) = V(Y(k, ω), P(k, ω), ω); V(Y(j, ω), P(j, ω), ω) ≥ V(Y(l, ω), P(l, ω), ω), l ≠ j, k ).   (6)
This is the mean gain to people indifferent between j and k, given that
these are the best two options available. In a parallel fashion, we can define EOTM^{V_G}(Y(j) − Y(k)) using the preferences of another person (e.g., the parent of a child or a paternalistic bureaucrat).19
A generalization of this parameter, called the marginal treatment effect, developed in Heckman and Vytlacil (1999, 2000, 2005, 2006b) and Heckman (2001), and estimated in Carneiro, Heckman, and Vytlacil (2005), plays a central role in organizing and interpreting a
wide variety of evaluation estimators. Many other mean treatment
parameters can be defined depending on the choice of the conditioning set. Analogous definitions can be given for median and other
quantile versions of these parameters (see Heckman, Smith, and
Clements 1997; Abadie, Angrist, and Imbens 2002). Although means
are conventional, distributions of treatment parameters are also of
considerable interest, and we consider them in the next section.
Mean treatment effects play a special role in the statistical approach to causality. They are the centerpiece of the Rubin (1986)–Holland (1986) model and of many other studies in statistics and
epidemiology. Social experiments with full compliance and no disruption can identify these means because of a special mathematical
property of means. If we can identify the mean of Y( j, !) and the
mean of Y(k, !) from an experiment where j is the treatment and k is
the baseline, we can form the average treatment effect for j compared
with k (3a). These can be formed over two different groups of people classified by their X values. By a similar argument, we can form the treatment-on-the-treated parameter (TT) (4a) or treatment-on-the-untreated parameter (TUT) (5a) by randomizing over particular subsets of the population (D = 1 or D = 0, respectively), assuming full compliance and no randomization (disruption) bias. Disruption bias arises when the experiment itself affects outcomes (Y(s, ω))_{ω ∈ Ω} and (A-2) is violated.20
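The distinction among the parameters (3a), (4a), and (5a) can be seen in a small simulation. The Roy-style selection rule and all parameter values below are invented for illustration; the point is only that self-selection drives the parameters apart:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Potential outcomes with heterogeneous gains correlated with choice:
# a stylized self-selection economy (all numbers invented).
y0 = rng.normal(0.0, 1.0, n)
gain = rng.normal(0.5, 1.0, n)         # Y(1) - Y(0) varies across persons
y1 = y0 + gain
d = (gain > 0).astype(int)             # agents self-select on their own gain

ate = (y1 - y0).mean()                 # E[Y(1) - Y(0)]            -- (3a)
tt  = (y1 - y0)[d == 1].mean()         # E[Y(1) - Y(0) | D = 1]    -- (4a)
tut = (y1 - y0)[d == 0].mean()         # E[Y(1) - Y(0) | D = 0]    -- (5a)

# Self-selection makes the parameters differ: TT > ATE > TUT here, so an
# experiment must randomize within the relevant subpopulation (D = 1 for
# TT, D = 0 for TUT) to recover each parameter.
assert tut < ate < tt
```

With these invented numbers, ATE is about 0.5 while TT is roughly twice as large and TUT is negative, which is why the three parameters answer different policy questions.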
The case for randomization is weaker if the analyst is interested
in other summary measures of the distribution, or the distribution
itself. Experiments do not solve the problem that we cannot form
Y (s, !) Y(s0 , !) for any person. Randomization is not an effective
procedure for identifying median gains, or the distribution of gains,
under general conditions. The elevation of population means to be the
central population-level causal parameters promotes randomization
as an ideal estimation method. By focusing exclusively on mean outcomes, the statistical literature converts a metaphor for outcome
selection, randomization, into an ideal.
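The point about medians can be demonstrated directly. In the sketch below (all numbers invented), an ideal experiment recovers the mean gain, yet the difference of the arm medians does not equal the median of the gain distribution, because the gain depends nonmonotonically on the base outcome:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

# Potential outcomes whose gains depend nonmonotonically on Y(0).
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + np.exp(-y0)              # gain = exp(-Y(0)); its median is 1
d = rng.integers(0, 2, n)          # ideal randomization, full compliance

# Means: randomization recovers E[Y(1)] - E[Y(0)] = E[Y(1) - Y(0)].
mean_gain_exp = y1[d == 1].mean() - y0[d == 0].mean()
mean_gain_true = (y1 - y0).mean()  # both approx E[exp(-Z)] = e^{1/2}

# Medians: the experiment identifies each arm's marginal median, but
# their difference is NOT the median of the gain Y(1) - Y(0).
med_diff_exp = np.median(y1[d == 1]) - np.median(y0[d == 0])
med_gain_true = np.median(y1 - y0)     # close to 1 by construction
```

Here `med_diff_exp` overshoots `med_gain_true` by a wide margin even in infinite samples, illustrating why quantiles of gains are not identified from marginals alone.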
1.5. Criteria of Interest Besides the Mean: Distributions of
Counterfactuals
Although means are traditional, the answer to many interesting policy
evaluation questions requires knowledge of features of the distribution of program gains other than some mean. It is also of interest to
know the following for scalar outcomes:

a. the proportion of participants who benefit from the program;
b. the proportion of the entire population that benefits;
c. gains at selected quantiles of the impact distribution;
d. the distribution of impacts for subgroups with particular outcomes in the nonparticipation state.
Each of these measures can be defined conditional on observed characteristics X. Measure (a) is of interest in determining how widely
program gains are distributed among participants. Voters in an electorate in a democratic society are unlikely to assign the same weight
to two programs with the same mean outcome, one of which produced large favorable outcomes for only a few persons while the other
distributed smaller gains more broadly. This issue is especially relevant if program benefits are not transferrable or if restrictions on
feasible social redistributions prevent distributional objectives from
being attained.
Measure (b) is the proportion of the entire population that
benefits from a program. In a study of the political economy of
interest groups, it is useful to know which groups benefit from a
program and how widely distributed the program benefits are.
Measure (c) reveals the gains at different percentiles of the impact
distribution. Criterion (d) focuses on the distribution of impacts for
subgroups of participants with particular outcomes in the nonparticipation state. Concerns about the impact of policies on the disadvantaged emphasize such criteria (Rawls 1971). All of these measures
require knowledge of features of the joint distribution of outcomes
for participants for their construction, not just the mean. Identifying
distributions is a more demanding task than identifying means.
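Why the joint distribution matters for measures like (b) and (c) can be shown with a hypothetical construction (all numbers invented): two couplings of (Y(0), Y(1)) share the same marginal distributions, so ideal experimental data cannot distinguish them, yet the proportion who benefit and the quantiles of the gains differ sharply:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
y0 = rng.normal(0.0, 1.0, n)

# Two joint distributions with IDENTICAL marginals Y(0) ~ N(0,1) and
# Y(1) ~ N(1,4); an experiment sees only one coordinate per person and
# so cannot tell the couplings apart.
y1_a = 1.0 + 2.0 * y0        # perfectly positively dependent outcomes
y1_b = 1.0 - 2.0 * y0        # perfectly negatively dependent outcomes

ate_a = (y1_a - y0).mean()   # both approx 1.0: the mean gain IS identified
ate_b = (y1_b - y0).mean()

# But the distribution of gains differs across couplings:
share_a = (y1_a - y0 > 0).mean()      # P(gain > 0) approx 0.84 under A
share_b = (y1_b - y0 > 0).mean()      # P(gain > 0) approx 0.63 under B
q10_a = np.quantile(y1_a - y0, 0.1)   # 10th percentile of gains, approx -0.28
q10_b = np.quantile(y1_b - y0, 0.1)   # approx -2.84
```

The two economies have the same mean impact but very different proportions of beneficiaries and lower-tail impacts, which is exactly the information measures (b) and (c) demand.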
Distributions of counterfactuals are also required in computing
the option values conferred by social programs.22 Heckman and
Smith (1998), Aakvik, Heckman, and Vytlacil (1999, 2005),
Carneiro, Hansen, and Heckman (2001, 2003), and Cunha,
Heckman, and Navarro (2005a) develop methods for identifying
distributions of counterfactuals.
1.6. Accounting for Private and Social Uncertainty
Persons do not know the outcomes associated with possible states not
yet experienced. If some potential outcomes are not known at the time
treatment decisions are made, the best that agents can do is to forecast
them with some rule. Even if, ex post, agents know their outcome in a
benchmark state, they may not know it ex ante, and they may always
y(s) = g(c_s, x, u_s),  s ∈ S.   (10)
The components c_s provide the basis for generating the counterfactuals across treatments from a base set of characteristics. This approach models different treatments as consisting of different bundles of characteristics. g maps (c_s, x, u_s) into y(s), where the domain of definition D of g may differ from its empirical support. Different treatments s are characterized by different bundles of the same characteristics that generate all outcomes. This framework provides the
basis for solving policy problem P3 since new policies (treatments) are
generated as different packages of common characteristics, and all
policies are put on a common basis. If a new policy is characterized by
known transformations of (c, x, us) that lie in the known empirical
support of g, policy forecasting problem P3 can be solved.30 This
point is discussed further in the Appendix.
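A schematic version of this argument is sketched below, with an invented g and invented characteristics (nothing here is estimated from data): once outcomes are generated by one function over bundles of common characteristics, a new policy is just a new bundle evaluated inside the identified support.

```python
import numpy as np

# A stylized outcome function over characteristics bundles.
def g(c, x, u):
    # c: treatment characteristics, x: person characteristics, u: unobservable
    return 1.0 + 0.8 * c + 0.5 * x + 0.3 * c * x + u

rng = np.random.default_rng(4)
n = 10_000
x = rng.uniform(0.0, 2.0, n)
u = rng.normal(0.0, 0.1, n)

# Old "policies" are bundles of the same characteristics: c = 0.2 and c = 0.9.
y_old_a = g(0.2, x, u)
y_old_b = g(0.9, x, u)

# A NEW policy is a new bundle, c = 0.5, inside the support over which g
# was identified, so its outcomes can be forecast (problem P3); the same
# c applied to new x values is problem P2.
y_new = g(0.5, x, u)
mean_forecast = y_new.mean()
```

The forecast is credible only because the new bundle lies in the domain over which g is identified, which is the support condition stated in the text.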
Part of the a priori specification of a causal model is the choice of
the arguments of the functions gs and g. Analysts may disagree about
appropriate arguments to include based on alternative theoretical frameworks. One benefit of the statistical approach that focuses on problem P1 is that it works solely with the outcomes rather than the inputs.
However, it is silent on how to solve problems P2 and P3 and provides
no basis for interpreting the population-level treatment effects.
Consider alternative models of schooling outcomes of pupils
where s indexes the schooling type (e.g., regular public, charter public,
private secular, and private parochial). The cs are the observed characteristics of schools of type s. The x are the observed characteristics of
the pupil. The us are the unobserved characteristics of both the schools
and the pupil. If we can characterize a proposed new type of school as a
new package of different levels of the same ingredients x, cs, and us and
we can identify (10) over the domain defined by the new package, we
can solve problem P3. If the same schooling input (same cs) is applied
to different students (those with different x) and we can identify (9) or
(10) over the new domain of definition, we solve problem P2. By
digging deeper into the causes of the effects we can do more than
just compare the effects of treatments in place with each other. In
addition, as we shall see, modeling the us and its relationship with the
corresponding unobservables in the treatment choice equation is informative on appropriate identification strategies.
Equations (9) and (10) describing ex post outcomes are sometimes called Marshallian causal functions (see Heckman 2000).
Assuming that the components of (x, us) or (cs, x, us) can be independently varied or are variation-free,31 a feature that may or may not be
However, agents may not act on these ex ante effects if they have decision criteria (utility functions) that are not linear in Y(s, ω), s = 1, . . . , S. I discuss ex ante valuations of outcomes in the next section.
The value of the scientific (or explicitly structural) approach to
the construction of counterfactuals is that it explicitly models the
unobservables and the sources of variability among observationally
33. Thus agents do not necessarily use rational expectations, so the distribution used by the agent to make decisions need not equal the distribution generating the data.
approach sharply distinguishes these two issues. One can in theory define
the effect even if one cannot identify it from population or sample data.
I next turn to an important distinction between fixing and
conditioning on factors that gets to the heart of the distinction
between causal models and correlational relationships. This point
is independent of any problem with the supports of the samples
compared to the domains of definition of the functions.
components of future outcomes, the uppercase letters become lowercase variables that are known constants. The I_ω are the causal factors for ω. In a utility-maximizing framework, choice ŝ is made if ŝ is maximal in the set of valuations of potential outcomes:

{E(V(Y(s, ω), P(s, ω), C_s(ω), ω) | I_ω) : s ∈ S}.
In this interpretation, the information set plays a key role in specifying
agent preferences. Actual realizations may not be known at the time
decisions are made. Accounting for uncertainty and subjective valuations of outcomes (e.g., pain and suffering for a medical treatment) is a
major contribution of the scientific approach. The factors that lead an
agent to participate in treatment s may be dependent on the factors
affecting outcomes. Modeling this dependence is a major source of
information used in the scientific approach to constructing counterfactuals from real data, as I demonstrate in Section 4. A parallel
analysis can be made if the decision maker is not the same as the
agent whose objective outcomes are being evaluated.
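The dependence between choice factors and outcome factors can be sketched in a toy two-sector economy (all parameters invented for illustration). Here the information set is complete, agents know their own outcomes, and the valuation is simply V = Y with no costs:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Hypothetical two-sector economy: ex post outcomes in each sector.
y = np.column_stack([rng.normal(1.0, 1.0, n),   # Y(0, omega)
                     rng.normal(1.2, 1.5, n)])  # Y(1, omega)

# Agents choose the sector with the maximal valuation; with V = Y and
# full knowledge, the choice rule is s_hat = argmax_s Y(s, omega).
s_hat = y.argmax(axis=1)
y_obs = y[np.arange(n), s_hat]

# Because choices depend on the same unobservables as outcomes, the
# observed sector means are selected: E[Y(1) | chose 1] > E[Y(1)].
assert y_obs[s_hat == 1].mean() > y[:, 1].mean()
```

Comparing sector means in the observed data therefore confounds the causal sector effect with selection on unobservables, which is the problem the choice model is built to address.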
2.4. The Scientific Model Versus the Neyman–Rubin Model

Many statisticians and social scientists invoke a model of counterfactuals and causality attributed to Donald Rubin by Paul Holland (1986) but which actually dates back to Neyman (1923).37 Neyman and Rubin postulate counterfactuals {Y(s, ω)}_{s ∈ S} without modeling the factors determining the Y(s, ω), as I have done in equations (9)-(12) using the scientific, structural approach. Rubin and Neyman offer no model of the choice of which outcome is selected. Thus there are no lowercase, all-causes models explicitly specified in this approach, nor is there any discussion of the science or theory producing the outcomes studied.
In my notation, Rubin assumes (A-1) and (A-2) as presented in
Section 1.38 Recall that (A-1) assumes no general equilibrium effects or
social interactions among agents. Thus the outcome for the person is the
generate Y(s, ω). Holland and Rubin focus on mean treatment effects as the interesting causal parameters.
The scientific (econometric) approach to causal inference supplements the model of counterfactuals with models of the choice of counterfactuals {D(s, ω)}_{s ∈ S} generated by the maps τ_p(ω) and the relationship between choice equations and the counterfactuals. The D(s, ω) are assumed to be generated by the collection of random variables (C_s(ω), Z(s, ω), η(s, ω), Y(s, ω) | I_ω), s ∈ S, where C_s(ω) is the characteristic of the treatment s for person ω, Z(s, ω) are observed determinants of costs, the η(s, ω) are unobserved (by the analyst) cost (or preference) factors, and Y(s, ω) are the outcomes; the conditioning bar "|" denotes that these variables are defined conditional on I_ω (the agent's information set).40 Along with the ex ante valuations that generate D(s, ω) are the ex post valuations discussed in Section 1.6.
Random utility models generating D(s, !) go back to
Thurstone (1930) and McFadden (1974, 1981).41 The full set of counterfactual outcomes for each agent is assumed to be unobserved by
the analyst. It is the dependence of unmeasured determinants of
treatment choices with unmeasured determinants of potential outcomes that gives rise to selection bias in empirically constructing
counterfactuals and treatment effects, even after conditioning on the
observables. Knowledge of the relationship between choices and
counterfactuals suggests appropriate methods for solving selection
problems. By analyzing the relationship of the unobservables in the
outcome equation, and the unobservables in the treatment choice
equation, the analyst can use a priori theory to devise appropriate
estimators to identify causal effects.
The scientific approach is more general than the Neyman–Rubin model because it emphasizes the welfare of the agents being studied (through V_G or V(Y(s, ω), ω)), that is, the subjective evaluations as well as the objective evaluations. The econometric approach also
40. If other agents make the treatment assignment decisions, then the determinants of D(s, ω) are modified according to what is in their information set.
41. Corresponding to these random variables are the deterministic all-causes counterparts d(s), c_s, z(s), η(s), {y(s)}, i, where ({z(s)}_{s ∈ S}, {c_s}_{s ∈ S}, {η(s)}_{s ∈ S}, {y(s)}_{s ∈ S}, i) generate the d(s): d(s) = 1 if ({z(s)}_{s ∈ S}, {c_s}_{s ∈ S}, {η(s)}_{s ∈ S}, {y(s)}_{s ∈ S}) lies in a given subset of the domain of the generators of d(s). Again, the domain of definition of d(s) is not necessarily the support of (z(s, ω), c_s(ω), η(s, ω), {Y(s, ω)}_{s ∈ S}, I_ω).
ΓY + BX = U,  E(U) = 0.   (15)
42. See Cunha, Heckman, and Navarro (2005a,b) for estimates of subjective evaluations and regret in schooling choices.
43. This approach merges tasks 1 and 2 in Table 1. I do this here because the familiarity of the simultaneous equations model as a statistical model makes the all-causes ex post version confusing to many readers familiar with this model.
44. For simplicity, I work with the linear model in the text, developing the nonlinear case in footnotes.
Y1 = a1 + γ12 Y2 + β11 X1 + β12 X2 + U1,   (16a)
Y2 = a2 + γ21 Y1 + β21 X1 + β22 X2 + U2.   (16b)
This model is sufficiently flexible to capture the notion that the consumption of person 1 (Y1) depends on the consumption of person 2 (if γ12 ≠ 0), as well as on person 1's value of X, X1 (assumed to be observed) (if β11 ≠ 0), person 2's value of X, X2 (if β12 ≠ 0), and unobservable factors that affect person 1 (U1). The determinants of person 2's consumption are defined symmetrically. I allow U1 and U2 to be freely correlated. I assume that U1 and U2 are mean independent of (X1, X2), so

E(U1 | X1, X2) = 0   (17a)

and

E(U2 | X1, X2) = 0.   (17b)
Completeness guarantees that (16a) and (16b) have a determinate solution for (Y1,Y2).
Applying Haavelmo's argument to (16a) and (16b), the causal effect of Y2 on Y1 is γ12. This is the effect on Y1 of fixing Y2 at different values, holding constant the other variables in the equation. Symmetrically, the causal effect of Y1 on Y2 is γ21. Conditioning, that is, using least squares, which is the method of matching, in general fails to identify these causal effects because U1 and U2 are correlated with Y1 and Y2. This is a traditional argument. It is based on the correlation between Y2 and U1. But even if U1 ≡ 0 and U2 ≡ 0, so that there are no
The reduced forms that solve (16a) and (16b) for the internal variables in terms of the external variables are

Y1 = π10 + π11 X1 + π12 X2 + R1,   (18a)
Y2 = π20 + π21 X1 + π22 X2 + R2,   (18b)

where

π11 = (β11 + γ12 β21)/(1 − γ12 γ21),   π12 = (β12 + γ12 β22)/(1 − γ12 γ21),
π21 = (β21 + γ21 β11)/(1 − γ12 γ21),   π22 = (β22 + γ21 β12)/(1 − γ12 γ21),

and

R1 = (U1 + γ12 U2)/(1 − γ12 γ21),   R2 = (γ21 U1 + U2)/(1 − γ12 γ21).   (19)
R1 ≡ 0 and R2 ≡ 0, and Y1 and Y2 are exact functions of X1 and X2. There is no mechanism yet specified within the model to independently vary the right-hand sides of equations (16a) and (16b).51 The X effects on Y1 and Y2, identified through the reduced forms, combine the direct effects (through the β_ij) and the indirect effects (as they operate through Y1 and Y2, respectively).
If we assume the exclusions (β12 = 0) or (β21 = 0) or both, we can identify the ceteris paribus causal effects of Y2 on Y1 and of Y1 on Y2, respectively. Thus if β12 = 0, from the reduced form,

γ12 = π12/π22.

If β21 = 0, we obtain

γ21 = π21/π11.
These exclusions say that the social interactions operate only through the Ys. Person 1's consumption depends on person 2's consumption but not on person 2's X2 or directly on person 2's U2. Person 2 is modeled symmetrically with respect to person 1. Observe that I have not ruled out correlation between U1 and U2. When the procedure for identifying causal effects is applied to samples, it is called indirect least squares. The method traces back to Haavelmo (1943, 1944).52
The intuition for these results is that if β12 = 0, we can vary Y2 in equation (16a) by varying X2. Since X2 does not appear in the
51. Some readers of an earlier draft of this paper suggested that the mere fact that we can write (16a) and (16b) means that we can imagine independent variation. By the same token, we can imagine a model Y = φ0 + φ1 X1 + φ2 X2, but if part of the model is (*) X1 = X2, the rules of the model constrain X1 to equal X2. No causal effect of X1 holding X2 constant is possible. If we break restriction (*) and permit independent variation in X1 and X2, we can define the causal effect of X1 holding X2 constant.
52. The analysis for social interactions in this section is of independent
interest. It can be generalized to the analysis of N person interactions if the
outcomes are continuous variables. For binary outcomes variables, the same
analysis goes through for the special case analyzed by Heckman and MaCurdy
(1985). However, in the general case, for discrete outcomes generated by latent
variables it is necessary to modify the system to obtain a coherent probability
model; see Heckman (1978).
equation, under exclusion, we can keep U1 and X1 fixed and vary Y2 using X2 in (18b) if β22 ≠ 0.53 Symmetrically, by excluding X1 from (16b), we can vary Y1, holding X2 and U2 constant. These results are more clearly seen when U1 ≡ 0 and U2 ≡ 0.
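The identification argument can be checked numerically. The sketch below simulates the two-person system (16a)-(16b) under the exclusion restrictions β12 = 0 and β21 = 0 (all parameter values invented), and contrasts least squares conditioning with indirect least squares:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000

# True structural parameters (invented):
#   Y1 = a1 + g12*Y2 + b11*X1 + U1,   Y2 = a2 + g21*Y1 + b22*X2 + U2
a1, a2 = 0.5, -0.3
g12, g21 = 0.4, 0.6
b11, b22 = 1.0, 0.8

x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
# U1, U2 freely correlated, mean independent of (X1, X2):
e = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], n)
u1, u2 = e[:, 0], e[:, 1]

# Solve the system (completeness: 1 - g12*g21 != 0) for the reduced form.
det = 1 - g12 * g21
y1 = (a1 + g12 * a2 + b11 * x1 + g12 * b22 * x2 + u1 + g12 * u2) / det
y2 = (a2 + g21 * a1 + g21 * b11 * x1 + b22 * x2 + u2 + g21 * u1) / det

# Conditioning (least squares of Y1 on Y2 and X1, i.e. matching) is
# biased because Y2 is correlated with U1.
Z = np.column_stack([np.ones(n), y2, x1])
g12_ols = np.linalg.lstsq(Z, y1, rcond=None)[0][1]

# Indirect least squares: estimate the reduced forms and take ratios
# gamma_12 = pi_12/pi_22 and gamma_21 = pi_21/pi_11.
X = np.column_stack([np.ones(n), x1, x2])
pi1 = np.linalg.lstsq(X, y1, rcond=None)[0]   # (pi_10, pi_11, pi_12)
pi2 = np.linalg.lstsq(X, y2, rcond=None)[0]   # (pi_20, pi_21, pi_22)
g12_ils = pi1[2] / pi2[2]
g21_ils = pi2[1] / pi1[1]
```

With these numbers, indirect least squares recovers γ12 and γ21 essentially exactly, while the conditioning estimate of γ12 is badly biased, echoing Haavelmo's point.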
Observe that in the model under consideration, where the
domain of definition and the supports of the variables coincide, the
causal effects of simultaneous interactions are defined if the parameters are identified in the traditional Cowles definition of identification (e.g., see Ruud 2000 for a modern discussion of these conditions).
A hypothetical thought experiment justifies these exclusions. If agents
do not know or act on the other agents X, these exclusions are
plausible.
An implicit assumption in using (16a) and (16b) for causal analysis is invariance of the parameters (Γ, B, Σ_U) to manipulations of the external variables. This invariance embodies the key idea in
assumption (A-2). Invariance of the coefficients of equations to classes
of manipulation of the variables is an essential part of the definition of
structural models that I develop more formally in the next section.
This definition of causal effects in an interdependent system
generalizes the recursive definitions of causality featured in the statistical treatment effect literature (Holland 1988; Pearl 2000). The key to
this definition is manipulation of external inputs and exclusion, not
randomization or matching. Indeed matching or, equivalently, OLS,
using the right-hand side variables of (16a) and (16b), does not
identify causal effects as Haavelmo (1943) established long ago. We
can use the population simultaneous equations model to define the
class of admissible variations and address problems of definitions
(task 1 in Table 1). If for a given model, the parameters of (16a) or
(16b) shift when external variables are manipulated, or if external
variables cannot be independently manipulated, causal effects of one
internal variable on another cannot be defined within that model. If
people were randomly assigned to pair with their neighbors, and the
parameters of (16a) were not affected by the randomization, then Y2
would be exogenous in equation (16b) and we could identify causal
X and/or components of the Y** are varied. Viewed in this way, the
reduced form and the whole class of quasi-structural models do not
define any ceteris paribus causal effect relative to all of the variables
(internal and external) in the system since they do not fix the levels of
the other Y or Y* in the case of the quasi-structural models.
Nonetheless, the reduced form may provide a good guide to forecasting the effects of certain interventions that affect the external variables. The quasi-structural models may also provide a useful guide for
predicting certain interventions, where Y** are fixed by policy. The
reduced form defines a net causal effect of variations in X as they
affect the internal variables. There are many quasi-structural models
and corresponding thought experiments.
This discussion demonstrates another reason why causal knowledge is provisional. Different analysts may choose different subsystems
of equations derived from equation (15) to work with and define
different causal effects within the different possible subsystems. Some
of these causal effects may not be identified, while others may be.
Systems smaller or larger than (15) can be imagined. The role of a
priori theory is to limit the class of models and the resulting class of
counterfactuals and to define which ones are interesting.
I now present a basic definition of structure in terms of invariance of equations to classes of interventions. Invariance is a central
idea in causal analysis and in policy analysis.
2.6. Structure as Invariance
A basic definition of a system of structural relationships is that it is a
system of equations invariant to a class of modifications or interventions. In the context of policy analysis, this means a class of policy
modifications. This is the definition that was proposed by Hurwicz
(1962). It is implicit in Marschak (1953) and it is explicitly utilized by
Sims (1977), Lucas and Sargent (1981), and Leamer (1985), among
others. This definition requires a precise definition of a policy, a class
of policy modifications, and specification of a mechanism through
which policy operates.
The mechanisms generating counterfactuals and the choices of
counterfactuals have already been characterized in Sections 2.1 and
2.3. Policies can act on preferences and the arguments of preferences (and hence choices), on outcomes Y(s, ω) and the determinants
In the simultaneous equations model analyzed in the last section, invariance requires stability of Γ, B, and U to interventions. Such models can be used to accurately forecast the effects of policies that can be cast as variations in the inputs to the model. Policy-invariant parameters are not necessarily causal parameters, as we noted in our analysis of reduced forms in the preceding section. Thus, in the simultaneous equations model, depending on the a priori information available, no causal effect of one internal variable on another may be defined, but if the reduced form coefficient matrix Π is invariant to modifications in X, the reduced form is policy invariant for those modifications. The class of policy-invariant parameters is thus distinct from the class of causal parameters, but invariance is an essential attribute of a causal model.
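The distinction between policy-invariant and causal parameters can be illustrated numerically. The sketch below is not from the text: the two-equation linear system and its coefficients are illustrative assumptions of mine. It shows that least squares of one internal variable on another does not recover the structural coefficient, while the reduced form Π = Γ⁻¹B is recovered by least squares and can be used to forecast interventions that shift X, provided Π is invariant to them.

```python
import numpy as np

# Illustrative two-equation simultaneous system (hypothetical coefficients):
#   Y1 = a12*Y2 + X1 + U1
#   Y2 = a21*Y1 + X2 + U2
# In matrix form Gamma @ Y = B @ X + U, with reduced form Y = Pi @ X + Gamma^{-1} U.
rng = np.random.default_rng(0)
a12, a21 = 0.5, 0.3
Gamma = np.array([[1.0, -a12], [-a21, 1.0]])
B = np.eye(2)
Pi = np.linalg.solve(Gamma, B)          # true reduced-form matrix

n = 200_000
X = rng.standard_normal((n, 2))
U = rng.standard_normal((n, 2))
Y = np.linalg.solve(Gamma, B @ X.T + U.T).T

# OLS of Y1 on (Y2, X1): biased for a12 because Y2 is correlated with U1.
R = np.column_stack([Y[:, 1], X[:, 0]])
a12_ols = np.linalg.lstsq(R, Y[:, 0], rcond=None)[0][0]

# OLS of Y on X recovers the reduced form Pi row by row.
Pi_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

print(f"structural a12 = {a12}, OLS estimate = {a12_ols:.3f}")  # noticeably biased
print(f"Pi row 1 true = {Pi[0]}, estimated = {Pi_hat[0]}")      # close
```

This is Haavelmo's point in miniature: the regression of internal variables on internal variables does not deliver the causal coefficient, yet the reduced form is a perfectly good forecasting tool for X-policies.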
For counterfactuals Y(s, ω), if assumption (A-3) is not postulated, all of the treatment effects defined in Section 1 would be affected by policy shifts. Rubin's assumption (A-2) makes Y(s, ω) invariant to policies that change f but not policies that change g or the support of Q. Within the treatment effects framework, a policy that adds a new treatment to S is not policy invariant for treatment parameters comparing the new treatment to any other treatment unless the analyst can model all policies in terms of a generating set of common characteristics specified at different levels. The lack of policy invariance makes it difficult to forecast the effects of new policies using treatment effect models within the framework of the Appendix.
Deep structural parameters generating the f and g are invariant to policy modifications that affect technology, constraints, and information sets, except when the policies extend the historical supports. Invariance can only be defined relative to a class of modifications and a postulated set of preferences, technology, constraints, and information sets. Thus causal parameters can be precisely identified only within a class of modifications.
2.7. Marschak's Maxim and the Relationship Between the Structural Literature and the Statistical Treatment Effect Literature
The absence of explicit models is a prominent feature of the statistical
treatment effect literature. Scientifically well-posed models make
explicit the assumptions used by analysts regarding preferences, technology, the information available to agents, the constraints under
which they operate, and the rules of interaction among agents in
market and social settings and the sources of variability among persons. These explicit features make these models, like all scientific
models, useful vehicles: (1) for interpreting empirical evidence using
theory; (2) for collating and synthesizing evidence using theory; (3) for
measuring the welfare effects of policies; and (4) for forecasting the
welfare and direct effects of previously implemented policies in new
environments and the effects of new policies.
These features are absent from the modern treatment effect
literature. At the same time, this literature makes fewer statistical
assumptions in terms of exogeneity, functional form, exclusion, and
distributional assumptions than the standard structural estimation
literature in econometrics. These are the attractive features of this
approach.
In reconciling these two literatures, I reach back to a neglected
but important paper by Jacob Marschak. Marschak (1953) noted that
for many specific questions of policy analysis, it is unnecessary to
identify full structural models, where by structural I mean parameters
invariant to classes of policy modifications as defined in the last
section. All that is required are combinations of subsets of the structural parameters, corresponding to the parameters required to forecast particular policy modifications, which are much easier to
identify (i.e., require fewer and weaker assumptions). Thus in the
simultaneous equations system examples, policies that only affect X
may be forecast using reduced forms, not knowing the full structure,
provided that the reduced forms are invariant to the modifications.57 Forecasting other policies may require only partial knowledge of the system. I call this principle Marschak's maxim in honor of this insight. I interpret the modern statistical treatment effect literature as implicitly implementing Marschak's maxim, where the policies analyzed are the treatments and the goal of policy analysis is restricted to evaluating policies in place (task 1; P1), not forecasting the effects of new policies or the effects of old policies in new environments.
Population mean treatment parameters are often identified under weaker conditions than are traditionally assumed in
econometric structural analysis. Thus to identify the average
57. Thus we require that the reduced form does not change when we change the X.
(e.g., the agent could be a parent choosing outcomes for the child). Let V be the agent's valuation of treatment. I write

V = μV(W, UV),  D = 1(V > 0),  (20)

with potential outcomes

Y1 = μ1(X, U1),  (21a)
Y0 = μ0(X, U0).  (21b)

In the additively separable case,

Y1 = μ1(X) + U1,  E(U1) = 0,  (22b)
Y0 = μ0(X) + U0,  E(U0) = 0.  (22c)
If conditioning on W makes (Y0, Y1) independent of D, selection on observables is said to characterize the selection process.64 This is the motivation for the method of matching. If, conditional on W, (Y0, Y1) are not independent of D, then we have selection on unobservables and alternative methods must be used.
For the Roy model, Heckman and Honoré (1990) show that it is possible to identify the joint distribution of the treatment outcomes (Y0, Y1) under the conditions they specify. Randomization can identify only the marginal distributions of Y0 and of Y1, not the joint distribution of (Y0, Y1) or the quantiles of Y1 − Y0. Thus, under its assumptions, the Roy model is more powerful than randomization in producing the distributional counterfactuals.65
The role of the choice equation is to motivate and justify the
choice of an evaluation estimator. This is a central feature of
the econometric approach that is missing from the statistical and
epidemiological literature on treatment effects. Heckman and
Smith (1998), Aakvik, Heckman, and Vytlacil (2005), Carneiro,
Hansen, and Heckman (2003), and Cunha, Heckman, and Navarro
(2005a,b) extend these results to estimate distributions of treatment
effects.
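The limitation of randomization noted above is easy to see numerically. In the hypothetical construction below (mine, not the paper's), two regimes share exactly the same marginal distributions of Y0 and Y1, so randomized data cannot distinguish them, yet the distribution of the gain Y1 − Y0 differs sharply across regimes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Regime A: perfect dependence; Regime B: independence.
# Both have Y0 ~ N(0, 1) and Y1 ~ N(1, 1), so a randomized
# experiment (which reveals only the marginals) sees the same data.
y0 = rng.standard_normal(n)
y1_A = 1.0 + y0                          # Y1 = 1 + Y0: the gain is constant at 1
y1_B = 1.0 + rng.standard_normal(n)      # Y1 drawn independently of Y0

ate_A = (y1_A - y0).mean()
ate_B = (y1_B - y0).mean()
sd_gain_A = (y1_A - y0).std()
sd_gain_B = (y1_B - y0).std()

print(f"ATE: {ate_A:.3f} vs {ate_B:.3f}")                     # both ~1: identified
print(f"sd of Y1 - Y0: {sd_gain_A:.3f} vs {sd_gain_B:.3f}")   # 0 vs ~1.41: not identified
```

The mean gain is identified by randomization in both regimes, but any parameter of the distribution of Y1 − Y0 beyond the mean (its variance, its quantiles, the fraction who gain) is not, which is the sense in which a Roy model that identifies the joint distribution is more powerful.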
For MTE,

Bias MTE = E(Y | X, Z, D = 1) − E(Y | X, Z, D = 0) − E(Y1 − Y0 | X, Z, V = 0)
         = [E(U1 | X, Z, D = 1) − E(U1 | X, Z, V = 0)] − [E(U0 | X, Z, D = 0) − E(U0 | X, Z, V = 0)],

for the case of additive separability in outcomes. The MTE is defined for a subset of persons indifferent between the two sectors and so is defined for X and Z. The bias is the difference between average U1 for participants and marginal U1, minus the difference between average U0 for nonparticipants and marginal U0. Each of these terms is a bias that can be called a selection bias. These biases can be defined conditional on X (or X and Z, or X, Z, and V in the case of the MTE) or unconditionally.
identify the three parameters using the data on mean outcomes. I also
briefly discuss the method of directed acyclic graphs or the g-computation method for one causal parameter. I discuss sources of unobservables, implicit assumptions about how unobservables are eliminated as
sources of selection problems, and the assumed relationship between
outcomes and choice equations. I start with the method of matching.
4.4.1. Matching
The method of matching as conventionally formulated makes
no distinction between X and Z. Define the conditioning set as
W (X, Z). The strong form of matching advocated by Rosenbaum
and Rubin (1983) and in numerous predecessor papers, assumes that
(Y0, Y1) ⊥⊥ D | W  (M-1)

and

0 < Pr(D = 1 | W) < 1.  (M-2)

Conditioning instead on the propensity score P(W) = Pr(D = 1 | W), the analogous conditional independence assumption is

(Y0, Y1) ⊥⊥ D | P(W).  (M-3)
The matching model assumes that, given W, some unspecified randomization device allocates people to treatment. The fact that the cases P(W) = 1 and P(W) = 0 must be eliminated suggests that methods for choosing which variables enter W based on the fit of the model to data on choices (D) are potentially problematic; see Heckman and Navarro (2004) and Heckman and Vytlacil (2005) for further discussion of this point.
What justifies (M-1) or (M-3)? Absent an explicit theoretical
model of treatment assignment and an explicit model of the sources of
randomness, analysts are unable to justify the assumption except by
appeal to convenience. Because there are no exclusion restrictions in the observables, the only possible source of variation in D given W is the unobservable elements generating D. These elements are assumed to act like an ideal randomization that assigns persons to treatment but is independent of (U0, U1), the unobservables generating (Y0, Y1), given W.
If agents partially anticipate the benefits of treatment and
make enrollment decisions based on these anticipations, (M-1) or
(M-3) is false. In the extreme case of the Roy model, where D = 1(Y1 > Y0), (M-1) or (M-3) is certainly false. Even if agents are only imperfectly prescient but can partially forecast (Y0, Y1) and use that information in deciding whether or not to participate, (M-1) or (M-3) is false.
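A short simulation makes this failure concrete. The sketch below is an illustrative construction of mine (no covariates, independent standard normal outcomes, zero true average effect): selection follows the extreme Roy rule D = 1(Y1 > Y0), so the matching counterfactual E(Y0 | D = 0) is a badly biased stand-in for E(Y0 | D = 1).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Extreme Roy model: Y1 = U1, Y0 = U0, independent N(0,1); D = 1(Y1 > Y0).
y1 = rng.standard_normal(n)
y0 = rng.standard_normal(n)
d = y1 > y0

# (M-1) fails: Y0 is not independent of D.
ey0_treated = y0[d].mean()       # about -0.56
ey0_control = y0[~d].mean()      # about +0.56

# Matching/naive estimate of treatment on the treated versus the truth
# (the truth is computable here because we simulated both counterfactuals).
tt_matching = y1[d].mean() - y0[~d].mean()   # about 0
tt_true = (y1[d] - y0[d]).mean()             # about 2/sqrt(pi), roughly 1.13

print(f"E(Y0|D=1) = {ey0_treated:.3f}, E(Y0|D=0) = {ey0_control:.3f}")
print(f"matching TT = {tt_matching:.3f}, true TT = {tt_true:.3f}")
```

Agents who select on their gains push the control group's Y0 up and the treated group's Y0 down by the same amount, so the matching comparison reports no effect even though the true effect of treatment on the treated is large.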
Without a model of interventions justifying these assumptions,
and without a model of the sources of unobservables, (M-1) or (M-3)
cannot be justified. The model cannot be tested without richer sources
of data.71 Judgments about whether agents are as ignorant about potential outcomes given W, as is assumed in (M-1) or (M-3), can only be settled by theory unless it is possible to randomize persons into treatment and randomization does not change the outcomes, that is, under assumption (A-2). The matching model makes strong implicit assumptions about the unobservables.
In the recent literature, the claim is sometimes made that matching is "for free" (e.g., see Gill and Robins 2001). The idea underlying this claim is that since E(Y0 | D = 1, W) is not observed, we might as well set it to E(Y0 | D = 0, W), an implication of (M-1). This argument
71. See Heckman, Ichimura, Smith, and Todd (1998) for a test of matching assumptions using data from randomized trials.
is correct so far as data description goes. Matching imposes just-identifying restrictions and in this sense, at a purely empirical level, is as good as any other just-identifying assumption in describing the data. However, the implied behavioral restrictions are not for free. Imposing that, conditional on X and Z (or conditional on P(W)), the marginal person entering a program is the same as the average person is a strong and restrictive implication of the conditional independence assumptions and is not a "for free" assumption in terms of its behavioral content.72 In the context of estimating the economic returns to schooling, it implies that, conditional on W, the economic return to schooling for persons who are just at the margin of going to school is the same as the return for persons with strong preferences for schooling.
Introducing a distinction between X and Z allows the analyst to overcome the problem arising from perfect prediction of treatment assignment for some values of (X, Z) if there are some variables Z not in X. If P is a nontrivial function of Z (so P(X, Z) varies with Z for all X) and Z can be varied independently of X for all points of support of X,73 and if outcomes are defined solely in terms of X, the problem of perfect classification can be solved. Treatment parameters can be defined for all support values of X since for any value (X, Z) that perfectly classifies D, there is another value (X, Z′), Z′ ≠ Z, that does not (see Heckman, Ichimura, and Todd 1997).
Offsetting the disadvantages of matching, the method of
matching with a known conditioning set that satisfies (M-1) does
not require separability of outcome or choice equations into observable and unobservable components, exogeneity of conditioning variables, exclusion restrictions, or adoption of specific functional forms
of outcome equations. Such assumptions are commonly used in conventional selection (control function) methods and conventional
applications of IV, although recent work in semiparametric estimation
72. As noted by Heckman, Ichimura, Smith, and Todd (1998), if one seeks to identify E(Y1 − Y0 | D = 1, W) one only needs to impose a weaker condition [E(Y0 | D = 1, W) = E(Y0 | D = 0, W)] or Y0 ⊥⊥ D | W rather than (M-1). This imposes the assumption of no selection on levels of Y0 (given W) and not the assumption of no selection on levels of Y1 or on Y1 − Y0, as (M-1) does. Marginal can be different from average in this case.
73. A precise sufficient condition is that Supp(Z | X) = Supp(Z). We can get by with a weaker condition that in any neighborhood of X, there is a Z* such that 0 < Pr(D = 1 | X, Z*) < 1, and that Z* is in the support of Z | X.
74. Examples of nonseparable selection models are found in Cameron and Heckman (1998).
75. Or mean independent in the case of mean parameters.
76. Heckman and Robb (1985, 1986) introduce this general formulation of control functions. The identifiability requires that the members of the pairs (μ1(X), E(U1 | X, Z, D = 1)) and (μ0(X), E(U0 | X, Z, D = 0)) be variation free so that they can be independently varied against each other; see Heckman and Vytlacil (2006a, b) for a precise statement of these conditions.
77. More precisely, assume that Supp(Z | X) = Supp(Z) and that limit sets of Z, Z0 and Z1, exist such that as Z → Z0, P(Z, X) → 0 and as Z → Z1, P(Z, X) → 1. This is also the support condition used in the generalization of matching by Heckman, Ichimura, and Todd (1997).
78. This condition is sometimes called "identification at infinity"; see Heckman (1990) or Andrews and Schafgans (1998).
79. Since E(U0) = 0,

E(U0 | D = 1, X, Z) P(X, Z) + E(U0 | D = 0, X, Z)(1 − P(X, Z)) = 0,

so that

E(U0 | D = 1, X, Z) = −E(U0 | D = 0, X, Z) [(1 − P(X, Z)) / P(X, Z)] = −K0(P(X, Z)) [(1 − P(X, Z)) / P(X, Z)].

See Heckman and Robb (1986). The expression E(Z | X, D = 1) integrates out Z for a given X, D = 1.
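The accounting identity in footnote 79 is a mechanical consequence of E(U0) = 0 and can be checked by simulation. In the sketch below (my construction; the correlation 0.6 and the threshold 0.3 are arbitrary illustrative values), (U0, V) are jointly normal, selection is on V, and the identity E(U0 | D = 1) = −[(1 − P)/P] E(U0 | D = 0) holds up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
rho = 0.6

# U0 and the selection index V are correlated standard normals.
v = rng.standard_normal(n)
u0 = rho * v + (1 - rho**2) ** 0.5 * rng.standard_normal(n)
d = v > 0.3                      # illustrative selection rule; P = Pr(D = 1)

p_hat = d.mean()
m1 = u0[d].mean()                # E(U0 | D = 1)
m0 = u0[~d].mean()               # E(U0 | D = 0), the control function K0 evaluated here

# Footnote 79: since E(U0) = 0, E(U0|D=1) = -[(1-P)/P] * E(U0|D=0).
print(f"E(U0|D=1) = {m1:.4f}")
print(f"-[(1-P)/P] E(U0|D=0) = {-(1 - p_hat) / p_hat * m0:.4f}")
```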
MTE(X, Z, V = 0) = μ1(X) − μ0(X) + E(U1 − U0 | UV = −μV(Z, X))
                 = μ1(X) − μ0(X) + ∂[E(U1 − U0 | X, Z, D = 1) P(X, Z)] / ∂P(X, Z).80
Unlike the method of matching, the method of control functions allows the marginal treatment effect to be different from the average treatment effect or from the effect of treatment on the treated (i.e., the second term on the right-hand side of the first equation for MTE(X, Z, V = 0) is, in general, nonzero). Although conventional practice is to derive the functional forms of K0(P) and K1(P) by making distributional assumptions (e.g., normality or other conventional distributional assumptions about (U0, U1, UV); see Heckman, Tobias, and Vytlacil 2001, 2003), this is not an intrinsic feature of the method and there are many non-normal and semiparametric versions of this method (see Powell 1994 or Heckman and Vytlacil 2006a,b for surveys).
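To illustrate the conventional normal-theory version, the sketch below uses a stylized design of my own (a selection equation with a known index and jointly normal errors, so the first-step probit can be skipped): OLS on the selected sample is biased, while a regression augmented by the inverse Mills ratio control function recovers both the outcome slope and the covariance term.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
nd = NormalDist()
n = 200_000

# Outcome Y = X + U is observed only if D = 1(Z > V); U and V are correlated,
# and Z = X + W so selection is related to X (illustrative design).
x = rng.standard_normal(n)
z = x + rng.standard_normal(n)
v = rng.standard_normal(n)
u = 0.8 * v + 0.6 * rng.standard_normal(n)
d = z > v
y = x + u

phi = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
Phi = np.vectorize(nd.cdf)(z)
mills = phi / Phi        # E(V | V < z) = -phi(z)/Phi(z), so E(U | D=1, Z) = -0.8 * mills

ones = np.ones(d.sum())
# Naive OLS on the selected sample: biased slope on X.
b_naive = np.linalg.lstsq(np.column_stack([ones, x[d]]), y[d], rcond=None)[0]
# Control function regression: adding the Mills ratio term recovers 1 and -0.8.
b_cf = np.linalg.lstsq(np.column_stack([ones, x[d], mills[d]]), y[d], rcond=None)[0]

print(f"naive slope on X: {b_naive[1]:.3f}")  # pushed away from 1 by selection
print(f"control-function slope on X: {b_cf[1]:.3f}, Mills coefficient: {b_cf[2]:.3f}")
```

The design choice here is that the control function K(P) is known exactly under normality; in applied work the index Zγ is estimated by probit in a first step, and semiparametric versions replace the Mills ratio with a flexible function of the propensity score.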
Without invoking parametric assumptions, the method of control functions requires an exclusion restriction (a Z not in X) to achieve nonparametric identification.81 Without any functional form assumptions, one cannot rule out a worst-case analysis where, for example, if X = Z, K1(P(X)) = a(X), where a is a scalar function. Then, there
80. As established in Heckman and Vytlacil (2000, 2005) and Heckman (2001), under assumption (C-1) and additional regularity conditions,

E(U1 − U0 | X, Z, D = 1) P(X, Z) = ∫_{P(X,Z)}^{1} ∫ (U1 − U0) f(U1 − U0 | U*V) d(U1 − U0) dU*V.
82. Symmetry of the errors can be used in place of the appeal to limit sets that put P(X, Z) = 0 or P(X, Z) = 1; see Chen (1999).
83. Relaxing it, however, requires that the analyst model the dependence of the unobservables on the observables and that certain variation-free conditions are satisfied; see Heckman and Robb (1985).
84. See Aakvik et al. (2005), Carneiro et al. (2003), and Cunha et al. (2005a, 2005b) for a generalization of matching that allows for selection on unobservables by imposing a factor structure on the errors and estimating the distribution of the unobserved factors.
85. This result also holds if (C-1) is not satisfied, but then the treatment effects include E(U1 | P(W)) − E(U0 | P(W)).
TABLE 2
Mean Bias for Treatment on the Treated

ρ0V      Average Bias (σ0 = 1)    Average Bias (σ0 = 2)
−1.00    −1.7920                  −3.5839
−0.75    −1.3440                  −2.6879
−0.50    −0.8960                  −1.7920
−0.25    −0.4480                  −0.8960
0.00      0.0000                   0.0000
0.25      0.4480                   0.8960
0.50      0.8960                   1.7920
0.75      1.3440                   2.6879
1.00      1.7920                   3.5839

BIAS TT = ρ0V · σ0 · M(p),

where

M(p(z)) = φ(Φ−1(1 − p(z))) / (p(z)(1 − p(z))),

φ(·) and Φ(·) are the probability density function (pdf) and cumulative distribution function (cdf) of a standard normal random variable, and p(z) is the propensity score evaluated at Z = z. I assume that μ1 = μ0 so that the true average treatment effect is zero.
I simulate the mean bias for TT (Table 2) and ATE (Table 3) for different values of the ρjV and σj. The results in the tables show that, as one lets the variances of the outcome equations grow, the value of the mean bias that one obtains can become substantial. With larger correlations come larger biases.
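The bias formula is easy to evaluate directly. The sketch below is a direct transcription of M(p) using the standard normal pdf and cdf from the Python standard library; the propensity score values are arbitrary.

```python
from statistics import NormalDist

nd = NormalDist()

def M(p: float) -> float:
    """M(p) = phi(Phi^{-1}(1 - p)) / (p * (1 - p))."""
    return nd.pdf(nd.inv_cdf(1.0 - p)) / (p * (1.0 - p))

def bias_tt(rho_0v: float, sigma_0: float, p: float) -> float:
    """Mean bias of the naive treatment-on-the-treated comparison."""
    return rho_0v * sigma_0 * M(p)

for p in (0.1, 0.3, 0.5):
    print(f"p = {p}: M(p) = {M(p):.4f}, bias (rho=0.5, sigma=2) = {bias_tt(0.5, 2.0, p):.4f}")
```

M(p) is symmetric in p and bounded away from zero, so the bias scales linearly in both the correlation ρ0V and the outcome standard deviation σ0, which is the pattern the tables display.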
TABLE 3
Mean Bias for Average Treatment Effect

[Table 3 reports the mean bias for the average treatment effect for values of ρ0V and ρ1V ranging from −1.00 to 1.00 and for σ0, σ1 ∈ {1, 2}; the entries are multiples of 0.2240, and the bias grows in magnitude with the correlations and the variances.]
Pr(D = 1 | X, Z) = P(X, Z)  (IV-1)

and

P(X = x, Z = z) ≠ P(X = x, Z = z′) for some z ≠ z′,  (IV-2)

which define the IV estimand ΔIV(x).
based in part on the gain in the outcome measure (Y1 − Y0) (e.g., the difference in earnings) and this is a nondegenerate random variable, then (IV-1) is violated and IV does not identify ATE. The validity of the estimator is conditional on an untestable behavioral assumption. Similar remarks apply to LATE as developed by Imbens and Angrist (1994) and popularized by Angrist, Imbens, and Rubin (1996); see Heckman and Vytlacil (1999, 2000, 2005) and Vytlacil (2002) for more discussion of the implicit behavioral assumptions underlying LATE.
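The following simulation is my own construction (a binary instrument, selection on the gain, and a zero true ATE) and illustrates the point for the Wald/IV estimand: when agents select on the gain, IV recovers an average over the instrument-shifted compliers, not ATE.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400_000

# Heterogeneous gains G = Y1 - Y0 with ATE = 0; agents partially act on G.
g = rng.standard_normal(n)
y0 = rng.standard_normal(n)
z = rng.integers(0, 2, n)                 # binary instrument
# Selection on the gain: the instrument lowers the participation threshold.
threshold = np.where(z == 1, 0.0, 1.0)
d = g > threshold
y = y0 + d * g

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
ate = g.mean()

print(f"true ATE = {ate:.3f}")       # ~0
print(f"IV (Wald) = {wald:.3f}")     # ~0.46 = E(G | 0 < G <= 1), the compliers' mean gain
```

The instrument is valid in the textbook sense (independent of the errors, relevant), yet the Wald estimand equals the mean gain of those moved into treatment by the instrument, which here is far from the population-average effect of zero.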
The more interesting case for many problems arises when U1 ≠ U0 and D is dependent on (U1 − U0), so agents participate in a program based at least in part on factors not measured by the economist. To identify ATE(X) using IV, it is required that

E(U0 + D(U1 − U0) | P(X, Z), X) = E(U0 + D(U1 − U0) | X).  (IV-3)
(U0, U1, UV) ⊥⊥ Z | X.  (LIV-2)
89. Write E(Y | X, P(Z)) as the sum of the participant and nonparticipant components; the nonparticipant component is ∫_{P(Z)}^{1} ∫ y0 f(y0, U*V | X) dU*V dy0. Differentiating with respect to P(Z),

∂E(Y | X, P(Z)) / ∂P(Z) = E(Y1 − Y0 | X, U*V = P(Z)),

the MTE.
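The LIV idea can be checked numerically. In the sketch below (my construction: U*V uniform, a declining MTE(u) = 1 − u, and an instrument that moves the propensity score across a grid), the slope of E(Y | P) between adjacent values of P recovers the MTE at the corresponding margin.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 450_000

# U*_V ~ Uniform(0,1); D = 1(U*_V <= P); gains decline in U*_V: MTE(u) = 1 - u.
p_grid = np.round(np.arange(0.1, 1.0, 0.1), 1)
p = rng.choice(p_grid, n)               # the instrument assigns the propensity score
u_star = rng.uniform(0.0, 1.0, n)
d = u_star <= p
y = d * (1.0 - u_star) + 0.5 * rng.standard_normal(n)   # Y0 normalized to pure noise

ey = np.array([y[p == pv].mean() for pv in p_grid])     # E(Y | P) on the grid

# Finite-difference LIV: slope between P = 0.4 and P = 0.6 estimates MTE(0.5) = 0.5.
liv_05 = (ey[p_grid == 0.6][0] - ey[p_grid == 0.4][0]) / 0.2
print(f"LIV estimate of MTE at u = 0.5: {liv_05:.3f}")
```

Here E(Y | P = p) = p − p²/2 analytically, so the local slope 1 − p traces out the declining marginal treatment effect, exactly the object the derivative formula above delivers.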
TABLE 4
Identifying Assumptions and Implicit Economic Assumptions Underlying the Four Methods Discussed in this Paper

Matching*: Exclusion required? No. Separability of observables and unobservables in outcome equations? No. Functional forms required? No. Marginal = average (given X, Z)? Yes, conditional on X and Z. Key identification condition for means (assuming separability): E(U1 | X, D = 1, Z) = E(U1 | X, Z); E(U0 | X, D = 0, Z) = E(U0 | X, Z).

Control function**: Exclusion required? Yes (for nonparametric identification). Separability? Conventional, but not required. Functional forms? Conventional, but not required. Marginal = average? No.

IV: Exclusion required? Yes. Separability? Yes. Functional forms? Yes (conventional). Marginal = average? No (Yes in standard case).

LIV: Exclusion required? Yes. Separability? No. Functional forms? No. Marginal = average? No.

*For propensity score matching, (X, Z) are replaced with P(X, Z) in defining parameters and conditioning sets.
**Conditions for writing the control function in terms of P(X, Z) are given in the text.
[Figure: a recursive directed acyclic graph with B → C → F, where A = UA (unobserved) also points into F, and UB, UC, and UF are the errors associated with B, C, and F.]

We know Pr(C = c | B = b),

Pr(F = f | C = c) = Σa Pr(F = f | A = a, C = c) Pr(A = a),

and

Pr(F = f | B = b) = Σc Pr(F = f | C = c) Pr(C = c | B = b).

Hence we can identify the desired causal object using the following calculation:

Pr(F = f | set B = b) = Σc Pr(F = f | C = c) Pr(C = c | B = b).
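The recursion above can be carried out mechanically on a small discrete example. In the sketch below all probability tables are made-up illustrative numbers; Pr(F = 1 | set B = b) is computed two ways, by the displayed formula and by direct enumeration of the intervened joint distribution, and the two agree.

```python
from itertools import product

# Illustrative discrete tables for the graph B -> C -> F <- A, with A unobserved.
pA = {0: 0.3, 1: 0.7}                                   # Pr(A = a)
pC_given_B = {(0, 0): 0.8, (1, 0): 0.2,                 # Pr(C = c | B = b), keyed (c, b)
              (0, 1): 0.4, (1, 1): 0.6}
pF1_given_AC = {(0, 0): 0.1, (0, 1): 0.5,               # Pr(F = 1 | A = a, C = c)
                (1, 0): 0.3, (1, 1): 0.9}

def pF1_given_C(c):
    # Pr(F = 1 | C = c) = sum_a Pr(F = 1 | A = a, C = c) Pr(A = a)
    return sum(pF1_given_AC[a, c] * pA[a] for a in (0, 1))

def pF1_set_B(b):
    # Pr(F = 1 | set B = b) = sum_c Pr(F = 1 | C = c) Pr(C = c | B = b)
    return sum(pF1_given_C(c) * pC_given_B[c, b] for c in (0, 1))

def pF1_set_B_enumeration(b):
    # Direct enumeration of the intervened joint: A ~ Pr(A), C ~ Pr(C | B = b).
    return sum(pA[a] * pC_given_B[c, b] * pF1_given_AC[a, c]
               for a, c in product((0, 1), (0, 1)))

print(pF1_set_B(0), pF1_set_B_enumeration(0))
```

Because the graph is recursive and C screens B off from F, setting B = b and conditioning on B = b coincide here, which is exactly what the identity delivers.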
86
HECKMAN
Social/market interactions
Does not project
Ignored
Treatment of interdependence
Nonparametric
Recursive
Mechanism of intervention
for defining counterfactuals
Parametric?
Hypothetical randomization
Implicit
Implicit
Sources of randomness
Econometric Models
Becoming nonparametric
Projects
Explicit
Explicit
TABLE 5
Econometric Versus Statistical Causal Models
93. McFadden's stochastic specification is different from Lancaster's specification. See Heckman and Snyder (1997) for a comparison of these two approaches. Lancaster assumes that the U(ω) are the same for each consumer in all choice settings. (They are preference parameters in his setting.) McFadden allows for U(ω) to be different for the same consumer across different choice settings but assumes that the U(ω) in each choice setting are draws from a common distribution that can be determined from the demand for old goods.

REFERENCES

Aakvik, A., J. J. Heckman, and E. J. Vytlacil. 1999. Training Effects on Employment When the Training Effects are Heterogeneous: An Application
Market Data, Vol. 10, edited by J. Heckman and B. Singer. New York: Cambridge University Press.
———. 1986. Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes. Pp. 63–107 in Drawing Inferences from Self-Selected Samples, edited by H. Wainer. New York: Springer-Verlag.
Heckman, J. J., and J. A. Smith. 1998. Evaluating the Welfare State. Pp. 241–318 in Econometrics and Economic Theory in the Twentieth Century: The Ragnar Frisch Centennial Symposium, edited by S. Strøm. New York: Cambridge University Press.
Heckman, J. J., J. Smith, and N. Clements. 1997. Making the Most Out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts. Review of Economic Studies 64(221):487–536.
Heckman, J. J., and J. M. Snyder Jr. 1997. Linear Probability Models of the Demand for Attributes with an Empirical Application to Estimating the Preferences of Legislators (Special issue). RAND Journal of Economics 28:S142.
Heckman, J. J., J. L. Tobias, and E. J. Vytlacil. 2001. Four Parameters of Interest in the Evaluation of Social Programs. Southern Economic Journal 68(2):210–23.
———. 2003. Simple Estimators for Treatment Parameters in a Latent Variable Framework. Review of Economics and Statistics 85(3):748–54.
Heckman, J. J., and E. J. Vytlacil. 1999. Local Instrumental Variables and Latent Variable Models for Identifying and Bounding Treatment Effects. Proceedings of the National Academy of Sciences 96:4730–34.
———. 2000. The Relationship Between Treatment Parameters Within a Latent Variable Framework. Economics Letters 66(1):33–39.
———. 2001. Local Instrumental Variables. Pp. 1–46 in Nonlinear Statistical Modeling: Proceedings of the Thirteenth International Symposium in Economic Theory and Econometrics: Essays in Honor of Takeshi Amemiya, edited by C. Hsiao, K. Morimune, and J. L. Powell. New York: Cambridge University Press.
———. 2005. Structural Equations, Treatment Effects and Econometric Policy Evaluation. Econometrica 73(3):669–738.
———. 2006a. Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation. In Handbook of Econometrics, Vol. 6, edited by J. Heckman and E. Leamer. Amsterdam: Elsevier, forthcoming.
———. 2006b. Econometric Evaluation of Social Programs, Part II: Using Economic Choice Theory and the Marginal Treatment Effect to Organize Alternative Econometric Estimators. In Handbook of Econometrics, Vol. 6, edited by J. Heckman and E. Leamer. Amsterdam: Elsevier, forthcoming.
Holland, P. W. 1986. Statistics and Causal Inference. Journal of the American Statistical Association 81(396):945–60.