
What Are Value-Added Models Estimating and What Does This Imply for Statistical Practice?

Author(s): Stephen W. Raudenbush


Source: Journal of Educational and Behavioral Statistics, Vol. 29, No. 1, Value-Added
Assessment Special Issue (Spring, 2004), pp. 121-129
Published by: American Educational Research Association and American Statistical Association
Stable URL: http://www.jstor.org/stable/3701310
Accessed: 06/12/2014 20:27

This content downloaded from 193.205.210.44 on Sat, 6 Dec 2014 20:27:57 PM


All use subject to JSTOR Terms and Conditions


What Are Value-Added Models Estimating and What Does This Imply for Statistical Practice?

Stephen W. Raudenbush
University of Michigan

The question of how to estimate school and teacher contributions to student learning is fundamental to educational policy and practice, and the three thoughtful articles in this issue represent a major advance. The current level of public confusion about these issues is so severe, and the consequences for schooling so great, that it is a big relief to see this journal highlight the key issues.
A common theme in these articles is that we should compare schools or teachers by comparing their "value added" to student learning rather than by comparing unadjusted mean levels of achievement or, as is currently common practice, the percent of students in a school or class who are classified as "proficient." As Ballou, Sanders, and Wright (BSW) note, it makes no sense to hold schools accountable for mean achievement levels when students enter those schools with large mean differences in achievement. Moreover, given the remarkable mobility of students across schools, particularly in large urban districts, changes in mean achievement at the school level may bear little relation to instructional effectiveness.

In contrast, the value-added philosophy is to hold schools and teachers accountable for the learning gains of students they serve. This seems simple enough, yet the technical questions raised in these articles are many: whether and how to adjust for covariates, whether teachers (or schools) should be treated as fixed or random, how to represent cumulative effects of teachers or schools, how to model covariation in student responses and teacher effects, whether and how to incorporate multiple cohorts, and how to formulate models that appropriately handle missing data.
A prior question is: "What are we trying to estimate with these models?" School and teacher effects are causal effects, yet the treatments students experience and the potential outcomes under alternative treatments (Rubin, 1978; Rosenbaum & Rubin, 1983; Holland, 1986) are not clearly defined in these discussions. As a result, we are not clear about the experiments we are trying to approximate with value-added analyses or, therefore, about the prospects of achieving reasonable approximations. In my view, defining possible treatments and potential outcomes eliminates some of the confusion by showing what kinds of effects can and cannot reasonably be estimated.
Two Kinds of Effects
Raudenbush and Willms (1995) (RW) defined two kinds of causal effects that might be estimated in a school accountability system. The first, or "Type A," effect
is of interest to a parent selecting a school for her children. The second, or "Type B," effect is of interest to district or state administrators who wish to hold school personnel accountable for their contributions to student outcomes. RW described plausible conditions for unbiased estimation of Type A effects. In contrast, they found the prospects for Type B effects unpromising given the kind of data available in accountability systems.¹

RW reasoned that the child's potential outcomes would be a function of pre-assignment student characteristics S, random error e, and two aspects of schools: school context, C, and school practice, P. C includes the social environment of the school (e.g., the neighborhood in which it is located) and the social composition of the school. Teachers and administrators have little or no control over C, though C might strongly contribute to school effectiveness through peer interactions, parent involvement, social norms, and the availability of role models (Coleman et al., 1966; Willms, 1986; Lee & Bryk, 1989). In contrast, school leaders and teachers do have substantial influence over P, though P is likely also associated with C.
Type A Effect
In terms of the Rubin causal model, the Type A effect (of interest to parents) is the difference between child i's potential outcome in school j, say Y_ij(S_i, C_j, P_j, e_ij), and that child's potential outcome in school j', that is, Y_ij'(S_i, C_j', P_j', e_ij'). RW reasoned that parents would be indifferent regarding the relative contributions of C and P to this effect. Therefore, an experiment that would reveal the Type A effect for parent i would be a study in which students having a common S = S_i were randomly assigned to either school j or school j'.² Treatment assignment would be ignorable (independent of S), and so the expected treatment effect estimate for comparing schools j and j' would depend only on C_j, P_j, C_j', P_j'. Without the benefit of randomization, one might obtain an unbiased estimate of the same causal effect by controlling for observed student-level covariates X under the assumption of strong ignorability, namely that the potential outcomes are not associated with school assignment after controlling for X. In particular, this assumption implies that X captures the association between S and school assignment, so that only C_j, P_j, C_j', P_j' contribute systematically to the estimated school effects. Type A effects are arguably estimable with tolerably small bias because the data available to school accountability analysts include some Xs that likely are extremely important in explaining the link between student background and school assignment. In particular, schools that collect historical data on student achievement along with ethnicity and poverty status provide Xs that are likely very informative about potential outcomes.
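As a rough illustration of this covariate-adjustment logic, the following simulated sketch (all data and parameter values invented) compares unadjusted school means to school effects estimated while controlling for a prior-achievement covariate X. Under strong ignorability, the adjusted estimates track the combined context-plus-practice (Type A) effects far more closely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented illustration: "school_effect" plays the combined C_j + P_j (Type A target).
n_schools, n_per = 20, 200
school_effect = rng.normal(0, 2.0, n_schools)
school = np.repeat(np.arange(n_schools), n_per)

# Non-random assignment: higher-achieving students sort into higher-effect schools.
prior = rng.normal(0.5 * school_effect[school], 3.0)
y = 10.0 + 0.8 * prior + school_effect[school] + rng.normal(0, 2.0, school.size)

# Unadjusted school means confound selection with effectiveness ...
unadjusted = np.array([y[school == j].mean() for j in range(n_schools)])

# ... whereas adjusting for the observed covariate X (prior achievement)
# approximates the Type A effect under strong ignorability.
D = np.zeros((school.size, n_schools))
D[np.arange(school.size), school] = 1.0            # school dummy indicators
beta = np.linalg.lstsq(np.column_stack([D, prior]), y, rcond=None)[0]
adjusted = beta[:n_schools] - beta[:n_schools].mean()

eff_c = school_effect - school_effect.mean()
rmse_adj = np.sqrt(np.mean((adjusted - eff_c) ** 2))
rmse_unadj = np.sqrt(np.mean((unadjusted - unadjusted.mean() - eff_c) ** 2))
assert rmse_adj < rmse_unadj   # adjustment removes most of the selection bias
```

The unadjusted means are inflated by the sorting of able students into effective schools; conditioning on the prior-achievement X removes the portion of that bias the covariate captures.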
Type B Effect
In contrast, the Type B effect (of interest to district or state officials) is the difference between child i's potential outcome in school j when school practice P_j is in operation, yielding Y_ij(S_i, C_j, P_j, e_ij), and that child's potential outcome in school j when an alternative school practice P*_j is in operation, yielding Y*_ij(S_i, C_j, P*_j, e*_ij). RW reasoned that district or state officials would not want to hold school personnel
accountable for C, over which those personnel have no control. Officials would, however, want to hold personnel accountable for their practice, P. Importantly, the accountability system, if effective, would lead to a change in P but not, at least in the short run, to a change in C.³ Therefore, an experiment that would reveal the Type B effect would be a study in which schools were assigned at random to practices, P.⁴ Treatment assignment would be ignorable (independent of S and C), and so the expected treatment effect estimate would depend only on P_j and P*_j. Without the benefit of randomization, one might obtain an unbiased estimate of the same causal effect by controlling for observed student-level covariates, X, and school-level covariates, W, under the assumption of strong ignorability, namely that the potential outcomes are not associated with the school-level treatment assignment after controlling for X and W. Thus, strong ignorability implies that X, W capture the association between S, C and the assignment of schools to P.

The problem with nonexperimental approximations to the school-based randomized trial is not that covariates X, W are unavailable. The difficulty is that school practice P is not defined, much less observed! Therefore we cannot assess which Xs and Ws are correlated with treatment assignment. A common practice in school accountability research is to regress the outcome on X and W and to assume that the school mean residual is a good estimate of P. But this practice cannot reveal the effect of P unless we assume that P is uncorrelated with X and W.⁵ Thus, the prospects for estimating Type B effects are dim at best.
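A small simulation (invented quantities throughout) illustrates the pitfall: when the unobserved practice P is correlated with observed context W, the school mean residual from a regression on W recovers only the component of P orthogonal to W, so the residual is a badly attenuated proxy for P's effect.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented illustration: practice P is unobserved and correlated with context W.
J, n_per = 200, 50
W = rng.normal(0, 1, J)                      # observed school context
P = 0.8 * W + rng.normal(0, 0.6, J)          # unobserved practice, correlated with W
school = np.repeat(np.arange(J), n_per)
y = 2.0 * W[school] + 1.0 * P[school] + rng.normal(0, 3.0, school.size)

# Common practice: regress the outcome on W and take school mean residuals.
Xd = np.column_stack([np.ones(school.size), W[school]])
resid = y - Xd @ np.linalg.lstsq(Xd, y, rcond=None)[0]
school_resid = np.array([resid[school == j].mean() for j in range(J)])

# The residual keeps only the part of P orthogonal to W; the part of P's effect
# absorbed into W's slope is lost, so the slope of the residual on P is far
# below the true coefficient of 1.
slope = np.polyfit(P, school_resid, 1)[0]
assert slope < 0.7
```

With these invented parameters the slope lands near 0.36, well under the true effect of 1.0: the residual cannot reveal the effect of P unless P is uncorrelated with W.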
Implications for Value-Added Models (VAM)
This reasoning concerning Type A and Type B effects has important implications for VAM. I believe it explains in part why BSW expressed discomfort interpreting VAM results when school-level poverty (as indexed by percent of students receiving free lunch) was controlled. It also helps explain why Tekwe et al. expressed uncertainty about their results controlling for covariates at both levels. And it explains in a conceptual way a vexing problem revealed in McCaffrey et al.'s technical analysis: namely, that estimation of teacher effects is most problematic when schools serve very different kinds of students. Clearly, the more variable C_j is in the RW model, the more problematic it is to assume that VAM estimates correspond to P_j, the implicit object of interest in these articles.

The problem of estimating Type B effects is even more pronounced when the aim is to estimate school and teacher effects simultaneously, as in the VAM proposed by McCaffrey et al. Classrooms as well as schools will be characterized by contextual conditions and practices that contribute to student learning independent of student background. A Type B analysis would aim to separate the effects of practice at the school level and at the teacher level. Since practice is unobserved at both levels in accountability systems, this separation appears inaccessible in accountability analyses. However, the indeterminacy of school versus teacher effects is a nonissue for Type A effects. In this case the parent might first select a school and then a teacher within a school. Alternatively, the parent can look across all schools and classrooms and pick the classroom that has the highest expected value for his child, regardless
of whether that value is attributable to school context or practice or classroom context or practice. If we view the Type A effect for a class to be the combined result of school and classroom context and practice, this effect can be estimated without bias conditional on the strong ignorability assumption that student-level covariates X account for the association between potential outcomes and classroom assignment.

As BSW point out, care must be taken in estimating and adjusting for X in estimating what I am calling Type A effects. They use a two-step procedure: estimate a regression using X as covariates with fixed effects of teachers. The coefficients for X are then estimates of the pooled, within-school coefficient, often denoted β_w. As RW point out, this estimation can easily be accomplished by centering X within teachers, obviating the need to enter teacher dummy variables. In the second step, an adjusted dependent variable Y − Xβ̂_w is used in the accountability analysis.
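The two-step procedure can be sketched as follows (simulated data; all names and values are invented). Step 1 estimates the pooled within-teacher coefficient β_w, either with teacher dummies or, equivalently, by centering X within teachers; Step 2 forms the adjusted outcome Y − Xβ̂_w.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: 40 teachers, 25 students each; X correlated with teacher assignment.
J, n_per = 40, 25
teacher = np.repeat(np.arange(J), n_per)
t_eff = rng.normal(0, 1.0, J)
x = rng.normal(t_eff[teacher], 1.0)          # compositional differences across teachers
beta_w = 0.7                                 # true pooled within-teacher slope
y = beta_w * x + t_eff[teacher] + rng.normal(0, 1.0, teacher.size)

# Step 1a: fixed-effects regression (teacher dummies plus X) estimates beta_w.
D = np.zeros((teacher.size, J))
D[np.arange(teacher.size), teacher] = 1.0
b_fe = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)[0][-1]

# Step 1b: equivalently, center X and Y within teachers -- no dummies needed.
xc = x - np.array([x[teacher == j].mean() for j in range(J)])[teacher]
yc = y - np.array([y[teacher == j].mean() for j in range(J)])[teacher]
b_centered = (xc @ yc) / (xc @ xc)

# Step 2: carry the adjusted outcome into the accountability analysis.
y_adj = y - x * b_centered

assert abs(b_fe - b_centered) < 1e-6   # the two estimators are algebraically identical
```

The equivalence is exact (a consequence of the Frisch-Waugh-Lovell theorem), which is why within-teacher centering obviates the teacher dummy variables.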
Modeling Teacher and School Effects on Student Growth
As the previous discussion shows, it does not appear possible to separate teacher and school effects using currently available accountability data. At one extreme, one might attribute all variation between classrooms to teachers. In that view, mean differences between schools are just differences in aggregate teacher effects. At the other extreme, all variation between schools is attributable to variation in the skill of school management and other school organizational features, including instructional coordination across grades, teacher collaboration, teacher control, and school-level resources. In this view, teachers can be held accountable only for the classroom variation within schools. A range of views is located on the continuum between these two extremes, but these views cannot be adjudicated without a theory of what makes schools and teachers effective and without a research agenda that explicitly assesses the causal effects at each level. In short, one needs good estimates of Type B effects at each level, but these are inaccessible at either level if the relevant school and classroom processes are not observed.

So VAM are best aimed at assessing the Type A effect defined as the combined effects of context and practice at the classroom and school levels. I believe it is useful to define the potential outcomes associated with this effect as a way of informing model specification, evaluation, and interpretation. A useful way to do so is to view each student as possessing a smooth trajectory that would describe that student's growth if that student encountered "average" teachers and schools. The Type A effect in any year is then defined as a deflection from this expected curve. Of course this assumes an equated metric over time, as these articles emphasize.
This idea is displayed in Figure 1. The dashed line describes a hypothetical student's expected trajectory given "average" schools and classrooms. This student encounters a "non-average" classroom (classroom j) at time t, yielding observed achievement Y^(j)_{t+1} at time t + 1. If this student had instead encountered an average classroom at time t, the outcome would have been the counterfactual Y^(0)_{t+1}. The causal effect associated with attendance in classroom j is then Y^(j)_{t+1} − Y^(0)_{t+1}. This seems straightforward enough. But what about the causal effect of teacher j', whom our student experiences at time t + 1? Presumably Y^(j,j')_{t+2} − Y^(0)_{t+2} is the combined
[Figure 1: outcome plotted against time at t, t + 1, t + 2.]

FIGURE 1. The dashed curve is the expected trajectory of a given student given "average" schools and teachers. For simplicity this student is "on trajectory" until time t. If assigned to teacher j, the student will exhibit outcome Y^(j)_{t+1}. The causal effect of teacher j is thus the deflection Y^(j)_{t+1} − Y^(0)_{t+1}.

causal effect of having experienced teachers j and j', but how should we decompose this combined effect into pieces attributable to the two teachers? McCaffrey et al. make an extremely useful contribution by parameterizing a "rate of decay" in teacher effects over time. This enables the data to drive the decomposition rather than assuming a priori that effects are cumulative and additive.
A Polynomial Growth Model
To represent the conception of Figure 1 in the VAM, it seems sensible to represent each student's counterfactual expected trajectory as a polynomial of appropriate degree. This implies a random coefficient model for student growth augmented by a "deflection model" for value added (Raudenbush & Bryk, 2002). In contrast, BSW use an unstructured covariance matrix to represent student contributions to the covariance structure, with added random effects of teachers. And McCaffrey et al. express a preference for the unstructured covariance structure as more general than the random coefficient model illustrated in Raudenbush and Bryk, or "RB." RB's illustrative example involved a polynomial of degree 1, or "straight-line," growth model. McCaffrey et al. criticize such a model for placing strong restrictions on the variance structure (the model implies increasing variance if the correlation between intercept and slope is positive). Yet RB never recommended a life-long commitment to the straight-line model! In reality, the polynomial approach allows a range of models varying from simple (e.g., the straight-line model) to complex. Indeed, if the number of time points is T, then a T − 2 degree polynomial with time-specific within-subject variances is a saturated model identical to the unstructured model. A good argument can be made for selecting the lowest-order polynomial that reasonably fits the data. One may anticipate that the simpler model, if justified, will supply more precision in estimating teacher effects. It also is more flexible than the unstructured covariance matrix in allowing the timing of testing to vary across students.
Consider a simple model for student growth and value added:

    Y_i = A_i π_i + Z_i b + e_i,    (1)

where Y_i is a T_i by 1 vector of outcomes for student i = 1, ..., n; π_i is a (p + 1) by 1 vector of random coefficients; A_i is a known T_i by (p + 1) design matrix with columns containing polynomial coefficients of degree p; and e_i is a within-subject error vector assumed for simplicity here to be distributed as N(0, σ²I_{T_i}). By design each student should have T observations, but in fact only T_i outcomes were observed. Now Z_i is a T_i by J matrix having entries of 0 or 1 indicating whether student i had ever encountered teacher j by time t = 1, ..., T_i, and b is a J by 1 vector of teacher effects associated with teachers j = 1, ..., J, assumed N(0, δ²I_J). For simplicity I omit covariates and assume π_i ~ iid N(γ, τ). Note that I have assumed additive and cumulative teacher effects. However, I do so for simplicity of exposition here and acknowledge McCaffrey et al.'s advice to check and, if necessary, revise this assumption.

Then, given knowledge of the variance components and γ, the posterior mean of the teacher effects is given by

    E(b|Y) = [ Σ_{i=1}^n Z_i^T (I − A_i C_i^{-1} A_i^T) Z_i + (σ²/δ²) I_J ]^{-1} × Σ_{i=1}^n Z_i^T (Y_i − A_i π̂_i).    (2)

Here π̂_i = γ + C_i^{-1} A_i^T (Y_i − A_i γ) is the posterior mean of π_i, and σ²C_i^{-1} = σ²(A_i^T A_i + σ²τ^{-1})^{-1} is the posterior variance of π_i in a model without teacher effects. Hence, Equation 2 represents a regression in which the outcome is Y_i − A_i π̂_i, the discrepancy between the observed Y_i and its predicted value using the standard "empirical Bayes" polynomial coefficients. This outcome corresponds conceptually to the causal effect described in Figure 1, where the dashed curve is the empirical Bayes estimated polynomial for student i. If the left-hand side of Equation 2
were simply [ Σ_{i=1}^n Z_i^T Z_i ]^{-1}, these residuals would simply be averaged over the students taught by each teacher. The left-hand side would involve the inverse of a J by J matrix Σ_{i=1}^n Z_i^T Z_i, which would likely be ill-conditioned. The addition of the term (σ²/δ²)I_J adds prior information, increasing precision through appropriate shrinkage and ensuring the invertibility of the matrix. Inclusion of the term I − A_i C_i^{-1} A_i^T weights down students whose counterfactuals are estimated with large posterior variance as a result of missing data.
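To make Equation 2 concrete, here is a simulated sketch (dimensions and variance values invented) that computes E(b|Y) two ways: via the student-by-student shrinkage form of Equation 2, and by brute-force generalized least squares on the marginal covariance A τ A^T + σ²I. The two agree, which also checks the Woodbury identity underlying Equation 2.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated sketch (all quantities invented): Y_i = A pi_i + Z_i b + e_i,
# pi_i ~ N(gamma, tau), b ~ N(0, delta2 I_J), e_i ~ N(0, sigma2 I_T).
n, T, J = 300, 4, 30
sigma2, delta2 = 1.0, 0.25
tau = np.diag([1.0, 0.1])                        # covariance of (intercept, slope)
gamma = np.array([10.0, 1.5])                    # mean growth parameters
A = np.column_stack([np.ones(T), np.arange(T)])  # straight-line polynomial design

b_true = rng.normal(0, np.sqrt(delta2), J)
Y = np.empty((n, T))
Z = np.zeros((n, T, J))
for i in range(n):
    pi_i = rng.multivariate_normal(gamma, tau)
    teachers = rng.choice(J, size=T, replace=False)   # one new teacher per year
    for t in range(T):
        Z[i, t, teachers[:t + 1]] = 1.0               # cumulative "ever encountered"
    Y[i] = A @ pi_i + Z[i] @ b_true + rng.normal(0, np.sqrt(sigma2), T)

# Equation 2: shrinkage form using the no-teacher-effects posterior mean of pi_i.
Cinv = np.linalg.inv(A.T @ A + sigma2 * np.linalg.inv(tau))
M = np.eye(T) - A @ Cinv @ A.T                        # weights down poorly estimated counterfactuals
lhs = (sigma2 / delta2) * np.eye(J)                   # ridge term: prior information on b
rhs = np.zeros(J)
for i in range(n):
    lhs += Z[i].T @ M @ Z[i]
    pi_hat = gamma + Cinv @ A.T @ (Y[i] - A @ gamma)  # empirical Bayes coefficients
    rhs += Z[i].T @ (Y[i] - A @ pi_hat)               # deflections from expected curve
b_hat = np.linalg.solve(lhs, rhs)

# Cross-check: posterior mean from the marginal model with V = A tau A' + sigma2 I.
Vinv = np.linalg.inv(A @ tau @ A.T + sigma2 * np.eye(T))
G = sum(Z[i].T @ Vinv @ Z[i] for i in range(n)) + np.eye(J) / delta2
h = sum(Z[i].T @ Vinv @ (Y[i] - A @ gamma) for i in range(n))
b_check = np.linalg.solve(G, h)
assert np.allclose(b_hat, b_check)
```

The agreement follows because V^{-1} = σ^{-2}(I − A C^{-1} A^T) with C = A^T A + σ²τ^{-1}, so the shrinkage form is the marginal GLS estimator in disguise.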
Here is an interesting trade-off between assumptions and robustness that deserves more research. Under Equation 1, the variation between students is Var(Y_i | b) = Σ_i = A_i τ A_i^T + σ²I. This is a stronger assumption than allowing an unstructured covariance matrix Σ_i. However, if justified, this stronger assumption may make better use of the observed information, reducing the fraction of missing information and thereby increasing robustness to non-ignorable missingness while also increasing precision.

BSW wisely exploit the availability of tests in multiple subjects to improve the precision of estimation of teacher effects on any specific subject, and McCaffrey et al. include this multivariate approach in their general model. This multivariate outcome approach not only reduces confounding of teacher assignment with student background, as BSW indicate. It should also increase robustness of results to non-ignorable missingness.

Multiple Cohorts

Raudenbush, Bryk, and Ponisciak (2003) analyzed data collected on five cohorts of students over five years in Washington, DC. Even with over 50,000 students, precision in estimating teacher effects was modest. Using multiple cohorts appears essential to obtain adequate precision. Moreover, school effects were somewhat unstable, implying a need to average school effects over multiple cohorts in order to obtain a stable average effect. Finally, trends in improvement (gains in value added) cannot be estimated without multiple cohorts.
Fixed vs. Random Effects

Tekwe et al. find that a simpler fixed effects model produces "value added" effects similar to those from a more complex random effects model. However, their interest is confined to estimating school effects with large samples of students and data with two time points. It is well known that the fixed effects and random effects estimates converge as cluster sizes grow large. Large cluster sizes do not apply, however, when teacher effects are of interest. And fixed effects models become unwieldy when multiple time points and multiple cohorts are available. Given that fixed effects estimates have good properties only in special circumstances, I would recommend random effects as a general approach.
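The convergence claim can be seen from the shrinkage factor in a balanced one-way layout, λ = τ²/(τ² + σ²/n), by which the random effects (empirical Bayes) estimate scales the fixed effects estimate (the cluster mean). A tiny sketch, with invented variance values:

```python
def shrinkage_factor(n_per, tau2=1.0, sigma2=4.0):
    # Balanced one-way layout: the empirical Bayes (random effects) estimate of a
    # cluster effect equals lambda times the fixed effects estimate (cluster mean),
    # with lambda = tau2 / (tau2 + sigma2 / n_per). Variance values are invented.
    return tau2 / (tau2 + sigma2 / n_per)

small = shrinkage_factor(5)     # small clusters (teachers): heavy shrinkage
large = shrinkage_factor(500)   # large clusters (schools): estimators nearly coincide
print(round(small, 3), round(large, 3))  # -> 0.556 0.992
assert small < large < 1.0
```

With 500 students per school the two estimators are nearly identical, but with 5 students per teacher the random effects estimate shrinks the cluster mean by almost half, which is precisely the setting in which the choice matters.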
Summary
In sum, the potential benefits of specifying low-order polynomial models can be combined with the benefits of multiple subject-area tests to yield a model with multiple growth curves per child. Covariates X included as indicated by BSW would add further information. Such an approach may be useful in reducing confounding and increasing robustness to nonignorable missingness and is worthy of further research. Moreover, multiple cohorts can increase precision and allow study of change in value added.

However, we must keep in mind that our estimates are, at best, Type A effects, of interest to parents selecting schools, not Type B effects, of interest to officials holding schools and teachers accountable for instructional practice. Certainly the estimates from VAM, when combined with other information, have potential to stimulate useful discussions about how to improve practice. But they should not be taken as direct evidence of the effects of instructional practice.
Notes

1. Goldstein and Spiegelhalter (1996) discussed this distinction, and it emerged in the comments of several of their discussants in an issue of the Journal of the Royal Statistical Society that highlighted themes common to those considered in the current issue of the Journal of Educational and Behavioral Statistics. Willms and Raudenbush (1989) consider the stability of these effects over time.

2. We must assume that the number of students so randomized per school is comparatively small lest the influx of new students modify the context, C, which is part of the treatment.

3. A change in P could lead to a change in C over the long run if, for example, more advantaged parents send their children to a school in order to reap the benefits of improved practice.

4. One might imagine an experiment in which students are assigned at random to schools that vary on P but have the same C. While such an experiment would reveal the impact of P, conducting it would require that C be completely observed. Assigning schools at random to P eliminates that strong requirement.

5. If we assume strong ignorability (that X and W adequately capture the selection of schools into values of P), and that the regression model assumptions hold, then the variance of the estimates of the effect of P_j based on regression is a lower bound on the variance of the Type B effects. The variance of the Type A effect is the upper bound. If these bounds are close together, one can claim to have "bracketed" the variance of the Type B effects. This doesn't help with estimating effects for particular schools, however, and such individual estimates are the object of an accountability system.
References

Coleman, J., Campbell, E., Hobson, C., McPartland, J., Mood, A., Weinfeld, F., et al. (1966). Equality of educational opportunity. Washington, DC: National Center for Educational Statistics.
Goldstein, H., & Spiegelhalter, D. J. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society, Series A, 159(3), 385-443.
Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.
Lee, V., & Bryk, A. (1989). A multilevel model of the social distribution of educational achievement. Sociology of Education, 62, 172-192.
Raudenbush, S., & Willms, J. D. (1995). The estimation of school effects. Journal of Educational and Behavioral Statistics, 20(4), 307-335.
Raudenbush, S., Bryk, A., & Ponisciak, S. (2003, April). School accountability. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Newbury Park, CA: Sage.
Rosenbaum, P., & Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, 6, 34-58.
Willms, J. (1986). Social class segregation and its relationship to students' examination results in Scotland. American Sociological Review, 51, 224-241.
Willms, J., & Raudenbush, S. (1989). A longitudinal hierarchical linear model for estimating school effects and their stability. Journal of Educational Measurement, 26(3), 209-232.

Author

STEPHEN W. RAUDENBUSH is Professor, School of Education; Professor, Survey Research Center; and Professor, Department of Statistics and Sociology, University of Michigan, 610 East University, 4109 SEB, Ann Arbor, MI 48109. His areas of specialization are analysis of multilevel data and experimental design.
