American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal
of the American Statistical Association.
Problems involving causal inference have dogged at the heels of statistics since its earliest days. Correlation does not imply causation, and yet causal conclusions drawn from a carefully designed experiment are often valid. What can a statistical model say about causation? This question is addressed by using a particular model for causal inference (Holland and Rubin 1983; Rubin 1974) to critique the discussions of other writers on causation and causal inference. These include selected philosophers, medical researchers, statisticians, econometricians, and proponents of causal modeling.

KEY WORDS: Causal model; Philosophy; Association; Experiments; Mill's methods; Causal effect; Koch's postulates; Hill's nine factors; Granger causality; Path diagrams; Probabilistic causality.

1. INTRODUCTION

The reaction of many statisticians when confronted with the possibility that their profession might contribute to a discussion of causation is immediately to deny that there is any such possibility. "That correlation is not causation is perhaps the first thing that must be said" (Barnard 1982, p. 387). Possibly this evasive action is in response to all of those needling little headlines that pop up in the most unexpected places, for example, "If the statistics cannot relate cause and effect, they can certainly add to the rhetoric" (Smith 1980, p. 998).

One need only recall that a well-designed randomized experiment can be a powerful aid in investigating causal relations to question the need for such a defensive posture by statisticians. Randomized experiments have transformed many branches of science, and the early proponents of such studies were the same statisticians who founded the modern era of our field.

This article takes the view that statistics has a great deal to say about certain problems of causal inference and ought to play a more significant role in philosophical analyses of causation than it has heretofore. In addition, I will try to show why the statistical models used to draw causal inferences are distinctly different from those used to draw associational inferences.

The article is organized as follows. First, statistical models appropriate for associational and causal inferences will be discussed and compared. Then they will be applied to various ideas about causation that have been expressed by several writers on this subject. One difficulty that arises in talking about causation is the variety of questions that are subsumed under the heading. Some authors focus on the ultimate meaningfulness of the notion of causation. Others are concerned with deducing the causes of a given effect. Still others are interested in understanding the details of causal mechanisms. The emphasis here will be on measuring the effects of causes because this seems to be a place where statistics, which is concerned with measurement, has contributions to make. It is my opinion that an emphasis on the effects of causes rather than on the causes of effects is, in itself, an important consequence of bringing statistical reasoning to bear on the analysis of causation and directly opposes more traditional analyses of causation.

2. MODEL FOR ASSOCIATIONAL INFERENCE

The model appropriate for associational inference is simply the standard statistical model that relates two variables over a population. For clarity and for comparison with the model for causal inference described in the next section, however, I will briefly review association here. If I seem overly explicit in describing the model it is only because I wish to be absolutely clear on the fundamental elements of the theory presented here.

The model begins with a population or universe U of "units." A unit in U will be denoted by u. Units are the basic objects of study in an investigation. Examples of units are human subjects, laboratory equipment, households, and plots of land. A variable is simply a real-valued function that is defined on every unit in U. The value of a variable for a given unit u is the number assigned by some measurement process to u. A population of units and variables defined on these units are the basic elements of the models for both association and causation presented here. They correspond to the mathematical concepts of a set and real-valued functions defined on the elements of the set. They are the primitives of the theory and will not be further defined.

Suppose that for each unit u in U there is associated a value Y(u) of a variable Y. Suppose further that Y is a variable of scientific interest in the sense that one wishes to understand why the values of Y vary over the units in U. Y is the response variable because of its status as a "variable to be explained." In making associational inferences one is satisfied with discovering how the values of Y are associated with the values of other variables defined on the units of U. Let A be a second variable defined on U. Distinguish A from Y by calling A an attribute of the units in U. Logically, however, A and Y are on an equal footing, since they are both simply variables defined on U.

All probabilities, distributions, and expected values involving variables are computed over U. A probability will mean nothing more nor less than a proportion of units in U. The expected value of a variable is merely its average value over all of U. Conditional expected values are averages over subsets of units where the subsets are defined by conditioning on the values of variables. It is in this sense that the models described here are population models.
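
The population definitions above can be made concrete with a small sketch. The four-unit population, its values, and the helper names `probability` and `expected_value` are illustrative assumptions, not part of the model itself; the point is only that every probability is a proportion of units and every (conditional) expectation is an average over a set of units.

```python
from fractions import Fraction

# Hypothetical toy population U: each unit u carries the values of two
# variables, an attribute A and a response Y. Both are simply real-valued
# functions defined on every unit of U.
U = {
    "u1": {"A": 0, "Y": 3},
    "u2": {"A": 0, "Y": 5},
    "u3": {"A": 1, "Y": 4},
    "u4": {"A": 1, "Y": 8},
}

def probability(event):
    """Proportion of units in U for which event(u) holds."""
    hits = sum(1 for u in U if event(u))
    return Fraction(hits, len(U))

def expected_value(var, given=lambda u: True):
    """Average of `var` over the units of U that satisfy `given`.

    With the default `given` this is the average over all of U; otherwise
    it is a conditional expected value (an average over a subset of units).
    Assumes at least one unit satisfies `given`.
    """
    subset = [u for u in U if given(u)]
    return Fraction(sum(U[u][var] for u in subset), len(subset))

p_a1 = probability(lambda u: U[u]["A"] == 1)                   # Pr(A = 1)
e_y = expected_value("Y")                                      # E(Y)
e_y_a0 = expected_value("Y", given=lambda u: U[u]["A"] == 0)   # E(Y | A = 0)
e_y_a1 = expected_value("Y", given=lambda u: U[u]["A"] == 1)   # E(Y | A = 1)
```

Comparing `e_y_a0` with `e_y_a1` describes how Y is associated with A over U; nothing in the computation treats A differently from Y, which mirrors the remark that the two variables are logically on an equal footing.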
The role of time needs to be mentioned here. Popula-

* Paul W. Holland is Director, Research Statistics Group, Educational Testing Service, Princeton, NJ 08541. A preliminary draft of this article was the basis of an invited General Methodology Lecture for the American Statistical Association, August 1985. The comments by Glymour and Granger included here were given at that session in response to that draft of this article.

© 1986 American Statistical Association, Journal of the American Statistical Association, December 1986, Vol. 81, No. 396, Theory and Methods
this is a large topic, and I can only scratch its surface here. Path diagrams are used to represent visually causal relationships among a set of variables. For example, if X causes Y this is expressed by the diagram

    X -> Y.    (36)

From the point of view adopted in this article some diagrams like (36) are meaningful and some are not. For example, if A is an attribute of units and Y is a response variable, then

    A -> Y    (37)

is meaningless. On the other hand, if S indicates exposure to causes and Y_S is an observed response variable, then

    S -> Y_S    (38)

is a meaningful diagram.

What happens when we add a third variable to this system? There are several possibilities. If A is an attribute, then it is either a pre- or post-exposure variable. In the first case we might denote this as

    A    S -> Y_S    (39)

to indicate the time flow but without any arrow from A to S or Y_S. In the second case the value of A might be affected by exposure to the cause and we would need to indicate

Diagram (43) indicates that R changes the values of S and Y and that S changes the value of Y. R has, potentially, both a direct and an indirect (i.e., through S) effect on Y.

An example may help clarify the meaning of (43). Suppose that we wish to measure the effect of studying certain material on the performance on a particular test. We might be able to encourage or not encourage students to study the material; these are the R causes, t' and c'. We might then be able to ascertain whether the students did or did not study the material; these are the S causes, t and c. The response variable is the score Y on the test given subsequent to these events. Diagram (43) indicates that encouragement can affect studying and possibly the test scores and that studying can affect the scores. For example, one might hypothesize that encouragement really does not affect test scores directly. This would be expressed in the model by

    Y_{t's}(u) - Y_{c's}(u) = 0    (44)

for all u in U and s = t or c. For more on "encouragement designs" see Powers and Swinton (1984).

The essential point I wish to make about these diagrams is that they are easily interpreted in terms of Rubin's model when they are not causally meaningless. The causal model literature has not been careful in separating meaningful and meaningless causal statements and path diagrams, in
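
The no-direct-effect hypothesis (44) from the encouragement example can be written out as a short check. This is only a sketch under stated assumptions: the students, the score tables, and the function name `no_direct_effect_of_encouragement` are hypothetical, and the tables list all four potential scores for each student, which could never be observed together in practice.

```python
# Hypothetical potential test scores, keyed by (r, s):
#   r is the encouragement level, "t'" (encouraged) or "c'" (not encouraged);
#   s is the studying level, "t" (studied) or "c" (did not study).
# The numbers are made up purely for illustration.
scores_without_direct_effect = {
    "u1": {("t'", "t"): 80, ("c'", "t"): 80, ("t'", "c"): 65, ("c'", "c"): 65},
    "u2": {("t'", "t"): 72, ("c'", "t"): 72, ("t'", "c"): 70, ("c'", "c"): 70},
}

scores_with_direct_effect = {
    "u1": {("t'", "t"): 85, ("c'", "t"): 80, ("t'", "c"): 65, ("c'", "c"): 65},
    "u2": {("t'", "t"): 72, ("c'", "t"): 72, ("t'", "c"): 70, ("c'", "c"): 70},
}

def no_direct_effect_of_encouragement(potential_scores):
    """True when Y_{t's}(u) - Y_{c's}(u) = 0 for every unit u and s in {t, c}."""
    return all(
        y[("t'", s)] - y[("c'", s)] == 0
        for y in potential_scores.values()
        for s in ("t", "c")
    )

holds = no_direct_effect_of_encouragement(scores_without_direct_effect)
fails = no_direct_effect_of_encouragement(scores_with_direct_effect)
```

In the first table encouragement changes nothing once studying is held fixed, so (44) holds; in the second, student u1 scores higher when encouraged even at the same studying level, so (44) fails.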
Blalock, H. M., Jr. (ed.) (1971), Causal Models in the Social Sciences, Chicago: Aldine-Atherton.
Bunge, M. (1959), Causality and Modern Science (3rd ed.), New York: Dover Publications.
Cochran, W. G. (1983), Planning and Analysis of Observational Studies, New York: John Wiley.
Cook, R. D. (1980), "Smoking and Lung Cancer," in R. A. Fisher: An Appreciation, eds. S. Fienberg and D. Hinkley, New York: Springer-Verlag.
Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., and Wynder, E. L. (1959), "Smoking and Lung Cancer: Recent Evidence and a Discussion of Some Questions," Journal of the National Cancer Institute, 22, 173-203.
Cox, D. R. (1958), The Planning of Experiments, New York: John Wiley.
Doll, R., and Hill, B. (1950), "Smoking and Carcinoma of the Lung," British Medical Journal, 2, September 30, 739-748.
----- (1952), "A Study of the Aetiology of Carcinoma of the Lung," British Medical Journal, 2, December 13, 1272-1286.
----- (1956), "Lung Cancer and Other Causes of Death in Relation to Smoking," British Medical Journal, 2, November 10, 1071-1081.
Duncan, O. D. (1975), Introduction to Structural Equation Models, New York: Academic Press.
Evans, A. S. (1978), "Causation and Disease: A Chronological Journey," American Journal of Epidemiology, 108, 249-258.
Fisher, R. A. (1926), "The Arrangement of Field Experiments," Journal of Ministry of Agriculture, 33, 503-513.
----- (1957), "Letter to the Editor," British Medical Journal, 2, July 6, 43.
Florens, J. P., and Mouchart, M. (1985), "A Linear Theory for Noncausality," Econometrica, 53, 157-175.
Goldberger, A. S., and Duncan, O. D. (1973), Structural Equation Models in the Social Sciences, New York: Seminar Press.
Granger, C. W. J. (1969), "Investigating Causal Relations by Econometric Models and Cross-Spectral Methods," Econometrica, 37, 424-438.
----- (1980), "Testing for Causality: A Personal Viewpoint," Journal of Economic Dynamics and Control, 2, 329-352.
Hamilton, M. A. (1979), "Choosing a Parameter for 2 x 2 Table or 2 x 2 x 2 Table Analysis," American Journal of Epidemiology, 109, 362-375.
Hill, A. B. (1965), "The Environment and Disease: Association or Causation," Proceedings of the Royal Society of Medicine, 58, 295-300.
Holland, P. W., and Rubin, D. B. (1980), "Causal Inference in Prospective and Retrospective Studies," address given at the Jerome Cornfield Memorial Session of the American Statistical Association Annual Meeting, August.
----- (1983), "On Lord's Paradox," in Principals of Modern Psychological Measurement, eds. H. Wainer and S. Messick, Hillsdale, NJ: Lawrence Erlbaum.
Hume, D. (1740), A Treatise on Human Nature.
----- (1748), An Inquiry Concerning Human Understanding.
Kempthorne, O. (1952), The Design and Analysis of Experiments, New York: John Wiley.
----- (1978), "Logical, Epistemological and Statistical Aspects of Nature-Nurture Data Interpretation," Biometrics, 34, 1-24.
Locke, J. (1690), An Essay Concerning Human Understanding, Book II, Chapter XXVI.
McCurdy, R. (1957), "Letter to the Editor," British Medical Journal, 2, July 20.
Mill, J. S. (1843), A System of Logic.
Neyman, J. (with Iwaszkiewicz, K., and Kolodziejczyk, S.) (1935), "Statistical Problems in Agricultural Experimentation" (with discussion), Supplement of Journal of the Royal Statistical Society, 2, 107-180.
Powers, D. E., and Swinton, S. S. (1984), "Effects of Self-Study for Coachable Test Item Types," Journal of Educational Measurement, 76, 266-278.
Rosenbaum, P. R. (1984a), "From Association to Causation in Observational Studies: The Role of Tests of Strongly Ignorable Treatment Assignment," Journal of the American Statistical Association, 79, 41-48.
----- (1984b), "The Consequences of Adjustment for a Concomitant Variable That Has Been Affected by the Treatment," Journal of the Royal Statistical Society, Ser. A, 147, 656-666.
----- (1984c), "Conditional Permutation Tests and the Propensity Score in Observational Studies," Journal of the American Statistical Association, 79, 565-574.
Rosenbaum, P. R., and Rubin, D. B. (1983a), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41-55.
----- (1983b), "Assessing Sensitivity to an Unobserved Binary Covariate in an Observational Study With Binary Outcome," Journal of the Royal Statistical Society, Ser. B, 45, 212-218.
----- (1984a), Discussion of "On the Nature and Discovery of Structure," by J. W. Pratt and R. Schlaifer, Journal of the American Statistical Association, 79, 26-28.
----- (1984b), "Reducing Bias in Observational Studies Using Subclassification on the Propensity Score," Journal of the American Statistical Association, 79, 516-524.
----- (1985a), "Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score," The American Statistician, 39, 33-38.
----- (1985b), "The Bias Due to Incomplete Matching," Biometrics, 41, 103-116.
Rubin, D. B. (1974), "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies," Journal of Educational Psychology, 66, 688-701.
----- (1977), "Assignment of Treatment Group on the Basis of a Covariate," Journal of Educational Statistics, 2, 1-26.
----- (1978), "Bayesian Inference for Causal Effects: The Role of Randomization," The Annals of Statistics, 6, 34-58.
----- (1980), Discussion of "Randomization Analysis of Experimental Data: The Fisher Randomization Test," by D. Basu, Journal of the American Statistical Association, 75, 591-593.
Saris, W., and Stronkhorst, H. (1984), Causal Modelling in Nonexperimental Research, Amsterdam: Sociometric Research Foundation.
Smith, R. Jeffrey (1980), "Government Says Cancer Rate Is Increasing," Science, 227, 998-1002.
Suppes, P. C. (1970), A Probabilistic Theory of Causality, Amsterdam: North-Holland.
Yerushalmy, J., and Palmer, C. E. (1959), "On the Methodology of Investigations of Etiologic Factors in Chronic Diseases," Journal of Chronic Diseases, 10, 27-40.