10.1177/0193841X05275649
Paccagnella / MULTILEVEL MODELS
EVALUATION REVIEW / FEBRUARY 2006
OMAR PACCAGNELLA
University of Padua, Italy
In multilevel regression, centering the model variables produces effects that are different and
sometimes unexpected compared with those in traditional regression analysis. In this article, the
main contributions in terms of meaning, assumptions, and effects underlying a multilevel center-
ing solution are reviewed, emphasizing advantages and critiques of this approach. In addition, in
the spirit of Manski, contextual and correlated effects in a multilevel framework are defined to
detect group effects. It is shown that the decision whether to center in a multilevel analysis depends on
the way the variables are centered, on whether the model has been specified with or without
cross-level terms and group means, and on the purposes of the specific analysis.
Keywords: multilevel model; group mean centering; contextual and correlated effects; col-
linearity; school effectiveness
AUTHOR'S NOTE: The author thanks Ita Kreft, Nick Longford, Enrico Rettore, Enrica Croda,
and an anonymous referee for their helpful comments. Financial support from the Italian Ministry
of Education and Scientific Research to the project "Dynamics and Inertia in the Italian
Labour Market and Policy Evaluation (Databases, Measurement Issues, Substantive Analyses)"
is gratefully acknowledged.
EVALUATION REVIEW, Vol. 30 No. 1, February 2006 66-85
DOI: 10.1177/0193841X05275649
© 2006 Sage Publications
Raudenbush (1989b). Kreft, de Leeuw, and Aiken (1995) provided the main
contribution in terms of model specification. A comprehensive review was
given by Kreft and de Leeuw (1998).
This article provides a careful overview of this topic, investigating the
meaning and the assumptions underlying a multilevel centering approach.
We emphasize the effects of various centering solutions, their advantages, and
the critiques raised against these choices. In addition, in the spirit of Manski
(1995), we define endogenous, contextual, and correlated effects in a multilevel
framework, stressing how closely these are related to the evaluation of
Type B effects in school effectiveness research. The decision whether to center in a
multilevel analysis is largely determined by the research or policy questions:
We study how a centering solution could help the researcher to detect
group effects.
We focus on descriptive models rather than heavy technicalities because
our purpose is to reach a wide audience and to introduce the multilevel cen-
tering issue to researchers who have had little experience with the multilevel
approach.
The work is organized as follows. We introduce the multilevel approach
and its main features. Then, we analyze the differences between the tradi-
tional and the multilevel regression interpretations of a centered variable.
Next, we examine the centering issue from a technical point of view, showing
the effects of introducing grand mean and group means in terms of model
specification. The subsequent two sections analyze the centering approach
from a theoretical point of view: We focus on the reasons and the advantages
of a centering solution to assess contextual and correlated effects, also discussing
some specific examples, and we consider the critiques of this choice. Finally, we
provide some conclusions and suggestions for practice.
$$a_j = m + gZ_j + u_j,$$
$$b_j = q + hZ_j + v_j,$$
$$V\begin{pmatrix} u_j \\ v_j \end{pmatrix} = \begin{pmatrix} w & l \\ l & t \end{pmatrix}.$$
The error terms $e_{ij}$ and $(u_j, v_j)$ are assumed to be uncorrelated. Usually, they are
assumed to be normally distributed. In a compact form, Equation 1 is equal to
$$\bar X = \frac{1}{\sum_{j=1}^{J} n_j}\sum_{j=1}^{J}\sum_{i=1}^{n_j} X_{ij}, \qquad \bar X_j = \frac{1}{n_j}\sum_{i=1}^{n_j} X_{ij}.$$
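As an illustration of the two means just defined, here is a minimal Python sketch, assuming the data are stored as (group, score) pairs; the function names are hypothetical. It also computes the deviations $X_{ij} - \bar X_j$ (the group-mean-centered scores), which sum to zero within every group. Because the grand mean weights each group by its size $n_j$, with unbalanced groups it differs from the simple average of the group means (here 5.4 versus 5.0).

```python
from collections import defaultdict

def grand_and_group_means(data):
    """Return the grand mean and a dict of group means for (group, x) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, x in data:
        sums[group] += x
        counts[group] += 1
    group_means = {g: sums[g] / counts[g] for g in sums}
    # Grand mean: sum over all observations divided by N = n_1 + ... + n_J
    grand_mean = sum(sums.values()) / sum(counts.values())
    return grand_mean, group_means

def group_mean_center(data):
    """Replace each score by its deviation from the mean of its own group."""
    _, group_means = grand_and_group_means(data)
    return [(group, x - group_means[group]) for group, x in data]

# Small made-up example with unbalanced groups (n_A = 2, n_B = 3)
data = [("A", 2.0), ("A", 4.0), ("B", 5.0), ("B", 7.0), ("B", 9.0)]
grand_mean, group_means = grand_and_group_means(data)
centered = group_mean_center(data)
print(grand_mean, group_means)  # 5.4 {'A': 3.0, 'B': 7.0}
print(centered)  # [('A', -1.0), ('A', 1.0), ('B', -2.0), ('B', 0.0), ('B', 2.0)]
```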
$$m^* = m + q\bar X, \qquad q^* = q,$$
$$g^* = g + h\bar X, \qquad h^* = h,$$
with
$$w^* = w + t\bar X^2 + 2l\bar X, \qquad l^* = l + t\bar X, \qquad t^* = t.$$
$$l > -\frac{1}{2\bar X}\left(w + t\bar X^2\right)$$
(for $\bar X > 0$) to ensure $w^* > 0$.
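The grand-mean reparametrization above can be checked numerically. The following Python sketch is only an illustration: the function names and parameter values are hypothetical, and it assumes the raw-score conditional moments have the forms $m + qX_{ij} + gZ_j + hZ_jX_{ij}$ and $w + 2lX_{ij} + tX_{ij}^2 + x$ (as Equation 8 implies), with the level-1 variance x passed as `level1_var`. It maps the raw parameters to the starred ones, verifies that the conditional mean and variance are unchanged when $X_{ij}$ is replaced by $X_{ij} - \bar X$, and evaluates the condition on l.

```python
def grand_mean_centered(p, xbar):
    """Starred parameters after centering X on the grand mean xbar."""
    return {
        "m": p["m"] + p["q"] * xbar,                          # m*
        "q": p["q"],                                          # q*
        "g": p["g"] + p["h"] * xbar,                          # g*
        "h": p["h"],                                          # h*
        "w": p["w"] + p["t"] * xbar**2 + 2 * p["l"] * xbar,   # w*
        "l": p["l"] + p["t"] * xbar,                          # l*
        "t": p["t"],                                          # t*
    }

def conditional_moments(p, x, z, level1_var):
    """Assumed forms: E(Y|I) = m + q x + g z + h z x, V(Y|I) = w + 2 l x + t x^2 + level1_var."""
    mean = p["m"] + p["q"] * x + p["g"] * z + p["h"] * z * x
    var = p["w"] + 2 * p["l"] * x + p["t"] * x**2 + level1_var
    return mean, var

# Hypothetical raw-score parameters; the matrix [[w, l], [l, t]] is positive definite
raw = {"m": 10.0, "q": 0.5, "g": 1.0, "h": 0.2, "w": 1.0, "l": -0.7, "t": 0.5}
xbar = 2.0
star = grand_mean_centered(raw, xbar)

# Same conditional moments whether X is used raw or centered on the grand mean
for x in (0.0, 1.5, 4.0):
    for z in (-1.0, 2.0):
        mean_raw, var_raw = conditional_moments(raw, x, z, level1_var=3.0)
        mean_c, var_c = conditional_moments(star, x - xbar, z, level1_var=3.0)
        assert abs(mean_raw - mean_c) < 1e-9 and abs(var_raw - var_c) < 1e-9

# Condition on l for w* > 0 (valid for xbar > 0)
threshold = -(raw["w"] + raw["t"] * xbar**2) / (2 * xbar)
print(threshold, raw["l"] > threshold, star["w"] > 0)  # -0.75 True True
```

Because every starred parameter is a fixed function of the raw ones and of a single constant $\bar X$, the map can be inverted, which is why grand mean centering only reparametrizes the raw score model.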
The group-mean-centered score $\tilde X_{ij}$ is defined as $\tilde X_{ij} = X_{ij} - \bar X_j$, and we
proceed as above. Replacing $X_{ij} = \tilde X_{ij} + \bar X_j$ in Equations 3 and 4 and
rearranging the terms, we get
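$$E(Y_{ij}\mid I) = (m + q\bar X_j) + q\tilde X_{ij} + (g + h\bar X_j)Z_j + hZ_j\tilde X_{ij} = m^*_j + q^*\tilde X_{ij} + g^*_jZ_j + h^*Z_j\tilde X_{ij} \tag{7}$$
$$V(Y_{ij}\mid I) = (w + t\bar X_j^2 + 2l\bar X_j) + 2(l + t\bar X_j)\tilde X_{ij} + t\tilde X_{ij}^2 + x = w^*_j + 2l^*_j\tilde X_{ij} + t^*\tilde X_{ij}^2 + x. \tag{8}$$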
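Equations 7 and 8 can be illustrated in the same way. In this hypothetical Python sketch (names and parameter values are ours, not the article's), the starred coefficients are computed for two groups with different mean scores: $m^*_j$, $g^*_j$, $w^*_j$, and $l^*_j$ change from group to group, so the centered model reproduces the raw model only with group-specific coefficients. $g^*_j$ becomes constant when h = 0, and $w^*_j$, $l^*_j$ become constant in a random intercept model (l = t = 0).

```python
def group_mean_centered(p, xbar_j):
    """Coefficients of Equations 7 and 8 for a group whose mean score is xbar_j."""
    return {
        "m_j": p["m"] + p["q"] * xbar_j,                            # m*_j
        "q": p["q"],                                                # q*
        "g_j": p["g"] + p["h"] * xbar_j,                            # g*_j
        "h": p["h"],                                                # h*
        "w_j": p["w"] + p["t"] * xbar_j**2 + 2 * p["l"] * xbar_j,   # w*_j
        "l_j": p["l"] + p["t"] * xbar_j,                            # l*_j
        "t": p["t"],                                                # t*
    }

def expected_y(m, q, g, h, x, z):
    """Mean structure m + q*x + g*z + h*z*x (raw scores, or Equation 7 with starred values)."""
    return m + q * x + g * z + h * z * x

# Hypothetical raw-score parameters and two groups with different mean scores
raw = {"m": 10.0, "q": 0.5, "g": 1.0, "h": 0.2, "w": 2.0, "l": 0.1, "t": 0.3}
group_means = {"A": 2.0, "B": 5.0}
starred = {j: group_mean_centered(raw, xb) for j, xb in group_means.items()}

# Within group B, the centered model reproduces the raw E(Y|I) with B's own coefficients
x, z = 6.0, 1.5
c = starred["B"]
e_raw = expected_y(raw["m"], raw["q"], raw["g"], raw["h"], x, z)
e_centered = expected_y(c["m_j"], c["q"], c["g_j"], c["h"], x - group_means["B"], z)
print(e_raw, e_centered)                          # equal up to rounding
print(starred["A"]["g_j"], starred["B"]["g_j"])   # differ because h != 0
print(starred["A"]["w_j"], starred["B"]["w_j"])   # differ because l and t are not zero
```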
$$a_j = m + gZ_j + d\bar X_j + u_j,$$
$$b_j = q + hZ_j + v_j, \tag{9}$$
The error terms $e_{ij}$ and $(u_j, v_j)$ are defined as above. The conditional
variance of $Y_{ij}$ does not change with respect to Equation 4, whereas the
conditional expected value becomes
Replacing $X_{ij} = \tilde X_{ij} + \bar X_j$ in Equation 11 and rearranging, we get
$$E(Y_{ij}\mid I) = m + q\tilde X_{ij} + (g + h\bar X_j)Z_j + (q + d)\bar X_j + hZ_j\tilde X_{ij} = m^* + q^*\tilde X_{ij} + g^*_jZ_j + d^*\bar X_j + h^*Z_j\tilde X_{ij}.$$
Again, the parameters in the group-mean-centered model are not simple
functions of the parameters in the raw score model: The $g^*_j$ coefficient
depends on $\bar X_j$, although inverse relationships can be given for all the
other parameters. However, if the slope parameter $b_j$ in Equation 9 does not
account for any group variables $Z_j$ (that is, h = 0), all relationships may be
inverted.
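For Model 9, the forward map and its inverse can be written down directly. The Python sketch below is a hypothetical illustration with made-up parameter values: `to_centered` applies the relationships above for a group with mean `xbar_j`, `to_raw` inverts them, and the inversion is refused when h ≠ 0, because $g^*_j$ then varies with $\bar X_j$.

```python
def to_centered(m, q, g, d, h, xbar_j):
    """Starred parameters (m*, q*, g*_j, d*, h*) of group j, given the raw-score ones."""
    return m, q, g + h * xbar_j, q + d, h

def to_raw(m_s, q_s, g_s, d_s, h_s):
    """Invert the map; only possible when h* = h = 0, so that g*_j = g for every group."""
    if h_s != 0:
        raise ValueError("g*_j depends on the group mean when h != 0")
    return m_s, q_s, g_s, d_s - q_s, h_s

def mean_raw(m, q, g, d, h, x, z, xbar_j):
    """E(Y|I) in raw scores: m + q x + g z + d xbar_j + h z x."""
    return m + q * x + g * z + d * xbar_j + h * z * x

def mean_centered(m_s, q_s, g_s, d_s, h_s, x_tilde, z, xbar_j):
    """E(Y|I) in group-mean-centered scores, with x_tilde = x - xbar_j."""
    return m_s + q_s * x_tilde + g_s * z + d_s * xbar_j + h_s * z * x_tilde

params = (1.0, 0.5, 0.3, 0.8, 0.0)          # hypothetical m, q, g, d, h with h = 0
starred = to_centered(*params, xbar_j=4.0)
print(starred)                               # d* = q + d
print(to_raw(*starred))                      # recovers m, q, g, d, h
print(mean_raw(*params, x=6.0, z=2.0, xbar_j=4.0),
      mean_centered(*starred, x_tilde=2.0, z=2.0, xbar_j=4.0))  # same value up to rounding
```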
The conditional variance of $Y_{ij}$ in Model 10 with the group-mean-centering
solution is equal to Equation 8. As above, the coefficients can be uniquely inverted
only in a random intercept specification, because $w^*_j$ and $l^*_j$ depend on
$\bar X_j$ unless l = t = 0.
Apart from the specification of a random intercept model without the
cross-level term, the group-mean-centering solution in Model 10 does not
yield model equivalence in the sense of Kreft, de Leeuw, and Aiken
(1995). However, random intercept models are frequently applied in multilevel
analyses.
Because deviation scores from the grand mean are not of interest here (grand
mean centering only reparametrizes the raw score model), for the rest of this
article, the term centering will be applied only to the difference from the
group mean (that is, $\tilde X_{ij} = X_{ij} - \bar X_j$).
generally more similar than are observations from other groups. Currently,
the debate among economists, sociologists, and psychologists is about the
possible explanations for such evidence. Manski (1995) discussed three
hypotheses to explain how groups (or societies, or environments) may affect
individuals: endogenous, contextual, and correlated effects.3 He proposed
the following example. In studying the high school achievement of students, we
may be interested in the relationship between an individual achievement y, an
individual observed characteristic x (e.g., socioeconomic status), and an
individual unobserved ability u. Each student is completely characterized by a
value for each of the variables (y, x, u). The variables (x, u) determine y
through a linear model:
Endogenous effects: The individual behavior tends to vary with the prevalence
of that behavior in the group. In the above example of high school achievement,
this effect arises when individual achievement y tends to vary with the average
achievement E(y) of all students in the group (class or school). In Equation 12,
an endogenous effect appears if b ≠ 0.
Contextual effects: The individual behavior tends to vary with the distribution of
background characteristics in the group. There is a contextual effect when the
propensity of an individual to behave in a certain way varies with the group mean
of exogenous variables. In the example of Model 12, these exogenous variables are
the characteristics x. Parameter g expresses the direct effect of x on y, whereas
parameter d (if different from zero) expresses the contextual effect on y.
Correlated effects: Individuals tend to behave similarly because they have similar
individual characteristics (e.g., ability, propensity to learn) or because they
belong to similar institutional environments (e.g., students in the same class,
with the same teachers). In Model 12, the correlated effect is expressed by the
variable u.
adopt other approaches, such as the Cronbach model (Cronbach and Webb
1975), the ANOVA model (Aitkin and Longford 1986), the contextual model
(Iversen 1991), and the aggregate regression (Kreft and de Leeuw 1998).
However, the multilevel solution is the most flexible available thus far: It
allows the total variation to be decomposed into an individual component and a
group component, as well as into a between-group component and a within-group
component, explicitly accounting for the group structure in the model.
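The between/within split mentioned above has a simple descriptive counterpart in the analysis-of-variance identity: the total sum of squares equals the within-group sum of squares plus the between-group sum of squares. The Python sketch below computes it on made-up data; it is only the classical ANOVA decomposition, not the model-based variance components that a fitted multilevel model would report.

```python
from collections import defaultdict

def sum_of_squares_split(data):
    """Return (total, within-group, between-group) sums of squares for (group, y) pairs."""
    by_group = defaultdict(list)
    for group, y in data:
        by_group[group].append(y)
    values = [y for _, y in data]
    grand_mean = sum(values) / len(values)
    total = sum((y - grand_mean) ** 2 for y in values)
    within = between = 0.0
    for ys in by_group.values():
        mean_j = sum(ys) / len(ys)
        within += sum((y - mean_j) ** 2 for y in ys)       # spread around the group mean
        between += len(ys) * (mean_j - grand_mean) ** 2    # spread of group means, weighted by n_j
    return total, within, between

data = [("A", 2.0), ("A", 4.0), ("B", 5.0), ("B", 7.0), ("B", 9.0)]
total, within, between = sum_of_squares_split(data)
print(total, within + between)   # equal up to rounding
print(between / total)           # share of the variation lying between groups
```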
We consider a random intercept multilevel model with group means as
Level 2 regressors but without any cross-level term, which guarantees model
equivalence, as shown in the previous section:
The parameter q expresses the direct effect of Xij on Yij. According to the
above definitions, we observe that such a model specification does not allow
for any endogenous effect. If d ≠ 0, Model 13 expresses contextual effects
because Yij varies with the mean of the exogenous variable Xij among individ-
uals in the jth group. The correlated effect is accounted for by the term (g Zj +
uj). The Zj variable describes the observed group characteristics, whereas uj
represents the unobserved group effects. Model 13 expresses correlated
effects through the variance of uj and the parameter g, if it is different from
zero.
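A small simulation can make the roles of q and d concrete. The sketch below assumes, for illustration only, the random intercept specification just described without the observed group variable $Z_j$ (that is, g = 0): $Y_{ij} = m + qX_{ij} + d\bar X_j + u_j + e_{ij}$, with made-up parameter values. It computes ordinary least squares slopes on the centered score $X_{ij} - \bar X_j$ and on $\bar X_j$; because these two regressors are orthogonal, the first slope estimates q and the second estimates q + d, so their difference estimates the contextual effect d. OLS ignores the random intercept, so it gives consistent point estimates here but not the efficient estimates or correct standard errors that a multilevel fit would provide.

```python
import random

def simulate(n_groups=200, n_per_group=20, m=1.0, q=0.5, d=0.8, seed=1):
    """Rows (group, x, group mean, y) from Y = m + q*X + d*Xbar_j + u_j + e_ij."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        center = rng.gauss(0.0, 1.0)                 # groups differ in their typical X
        xs = [center + rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        xbar_j = sum(xs) / len(xs)
        u_j = rng.gauss(0.0, 0.5)                    # unobserved group effect
        for x in xs:
            y = m + q * x + d * xbar_j + u_j + rng.gauss(0.0, 1.0)
            rows.append((j, x, xbar_j, y))
    return rows

def within_between_slopes(rows):
    """OLS slopes on (x - xbar_j) and on xbar_j; the two regressors are orthogonal."""
    n = len(rows)
    grand_x = sum(r[2] for r in rows) / n            # equals the grand mean of x
    grand_y = sum(r[3] for r in rows) / n
    b_within = (sum((x - xb) * y for _, x, xb, y in rows)
                / sum((x - xb) ** 2 for _, x, xb, _ in rows))
    b_between = (sum((xb - grand_x) * (y - grand_y) for _, _, xb, y in rows)
                 / sum((xb - grand_x) ** 2 for _, _, xb, _ in rows))
    return b_within, b_between

b_within, b_between = within_between_slopes(simulate())
d_hat = b_between - b_within
print(f"q ~ {b_within:.3f}, q + d ~ {b_between:.3f}, d ~ {d_hat:.3f}")
```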
The distinction between contextual and correlated effects is important
because the two effects are different and have different consequences for
the prediction of social interactions.
Burstein (1980) proposed the following example. The mean ability of a group
may be expected to affect instructional practices by leading teachers to adjust
their instruction to the level of the group. We may then expect individuals within
the group (or school) to learn more or less than they would have in other groups
(or schools). Consequently, individuals learn more when they have high ability
(individual effect) and when they learn in a group with generally high ability,
because teachers may offer a higher level of instruction. Burstein defined the
effect due to the behavior of teachers as contextual and therefore proposed the
group mean of ability as a proxy for it. Burstein estimated a contextual effect
because he specified the average ability of the students in the class, but his
real purpose was to assess a correlated effect, wherein all students are affected
by the same instructional practices.
An interesting application of contextual analysis and the centering
approach is the estimation of the frog pond effect, discussed by Davis (1966).
Individuals in every group could use some group characteristics or properties
as a point of comparison. Their success is judged by their relative standing in
the group because they can choose to be big frogs in little ponds or little
frogs in big ponds (p. 21). The frog pond effect measures a characteristic of the
individual relative to the group. It is typically estimated as the impact of the
within component $X_{ij} - \bar X_j$ (Burstein 1980). So, according to
personal beliefs and study purposes, researchers may specify a model such as
or
The differences are only in the magnitude of the parameters for the individual
and the contextual effects.
Firebaugh (1978) introduced the $\bar X$ rule for detecting cross-level bias.
Cross-level inferences arise when the researcher studies relations at one level
from analyses conducted at another level. This may be either downward (the
ecological fallacy; Robinson 1950), when relations estimated from aggregated
data are transferred to individuals, or upward (the individualistic fallacy;
Alker 1969), when the researcher uses individual-level data to find
relationships among aggregate-level variables. Here, cross-level bias refers to
the downward fallacy. The use of cross-level inference is justified by practical
rather than theoretical considerations: To draw inferences about individuals,
aggregated data may be the most readily accessible and easily analyzable, or
individual-level data may simply be unavailable (e.g., census or voting data).
In a simple model such as
$$Y_{ij} = a + bX_{ij} + g\bar X_j + e,$$
where e is a generic zero-mean error term with constant variance, Firebaugh
showed that cross-level bias is absent if and only if g = 0.
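The rule can be illustrated with an aggregate (ecological) regression. Averaging the model within groups gives $\bar Y_j = a + (b + g)\bar X_j + \bar e_j$, so a regression of the group means of Y on the group means of X estimates b + g rather than the individual-level b, and the two coincide only when g = 0. The Python sketch below uses made-up scores and hypothetical function names and, to keep the point visible, generates Y without the error term e.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def aggregate_slope(groups, a, b, g):
    """Slope of group-mean Y on group-mean X when Y = a + b*X + g*Xbar_j (no error term)."""
    xbars, ybars = [], []
    for xs in groups:
        xbar_j = sum(xs) / len(xs)
        ys = [a + b * x + g * xbar_j for x in xs]
        xbars.append(xbar_j)
        ybars.append(sum(ys) / len(ys))
    return slope(xbars, ybars)

groups = [[1.0, 2.0, 3.0], [4.0, 6.0], [7.0, 8.0, 9.0, 10.0]]
biased = aggregate_slope(groups, a=1.0, b=0.5, g=0.3)     # approximately b + g = 0.8
unbiased = aggregate_slope(groups, a=1.0, b=0.5, g=0.0)   # approximately b = 0.5
print(biased, unbiased)
```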
Hofmann and Gavin (1998) used four paradigms (incremental, mediational,
moderational, and separate) to describe the way in which multilevel research
has been conducted within the organizational sciences. They focused on the
different types of hierarchical models that have been investigated within the
organizational sciences, discussing how the centering decision relates to these
different models: They pointed out that "centering options should be chosen
carefully and thoughtfully, with a view less towards the statistical differences
and more towards the conceptual questions under investigation" (p. 639).
As also indicated by Aitkin and Longford (1986), this model often suffers
from high collinearity, so its estimates are imprecise. For this reason,
Raudenbush suggested specifying deviations from the group mean:
$$a_j = m_j, \qquad b = b_W, \qquad g = b_B - b_W.$$
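These relationships, together with the collinearity problem mentioned above, can be checked on a small made-up dataset with equal group sizes. The Python sketch below (hypothetical names; $c_0$ is our label for the intercept) fits the raw specification $Y_{ij} = c_0 + bX_{ij} + g\bar X_j$ by ordinary least squares, ignoring the random intercept, and compares b with the pooled within-group slope $b_W$ and g with $b_B - b_W$, where $b_B$ is the slope of the group means of Y on $\bar X_j$. It also computes the correlation of $\bar X_j$ with $X_{ij}$ and with $X_{ij} - \bar X_j$.

```python
def solve(a, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol

def ols(columns, y):
    """OLS coefficients via the normal equations; columns must include the intercept."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

def slope(u, v):
    """Simple-regression slope of v on u."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return (sum((a - mu) * (b - mv) for a, b in zip(u, v))
            / sum((a - mu) ** 2 for a in u))

def corr(u, v):
    """Pearson correlation of u and v."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

# Made-up balanced data: three groups, each with three (x, y) pairs
groups = [[(1.0, 2.0), (2.0, 2.5), (3.0, 4.0)],
          [(4.0, 3.0), (5.0, 5.5), (6.0, 5.0)],
          [(8.0, 7.0), (9.0, 6.5), (10.0, 9.0)]]
x, y, xbar, xtilde, group_x, group_y = [], [], [], [], [], []
for grp in groups:
    mx = sum(p[0] for p in grp) / len(grp)
    my = sum(p[1] for p in grp) / len(grp)
    group_x.append(mx)
    group_y.append(my)
    for xi, yi in grp:
        x.append(xi)
        y.append(yi)
        xbar.append(mx)
        xtilde.append(xi - mx)

b_w = sum(t * v for t, v in zip(xtilde, y)) / sum(t * t for t in xtilde)  # within slope b_W
b_b = slope(group_x, group_y)                                              # between slope b_B
c0, b, g = ols([[1.0] * len(x), x, xbar], y)                               # raw specification

print(b, b_w)                              # b equals b_W
print(g, b_b - b_w)                        # g equals b_B - b_W
print(corr(x, xbar), corr(xtilde, xbar))   # about 0.96 versus 0
```

With unequal group sizes the OLS coefficient on $\bar X_j$ corresponds to a size-weighted between-group slope, so the simple identity with an unweighted $b_B$ is only guaranteed in the balanced case used here.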
SOME EXAMPLES:
STUDYING SCHOOL EFFECTIVENESS
NELS test batteries have been linked by both the Educational Testing Service
and RAND, and the linking of these test batteries is sufficiently strong to
justify using the linked scores as a basis for judging changes in grading
standards. However, the two mathematics tests are not equivalent, so the
differences between them have to be taken into account when interpreting the
findings of any analysis.
For their study, Koretz and Berends (2001) applied a linear multilevel
model with 11 student-level variables and the school means of each of
the individual variables. However, none of the student-level variables were
centered on their school means: A model like Model 14 was specified, but such
a model may yield imprecise estimates because of high collinearity.
Although Koretz and Berends (2001) found no substantial grade inflation
between 1982 and 1992, at least in mathematics, comparing HSB and NELS
data gave the following interesting result: The multilevel model estimated on
NELS shows the same general patterns as the comparable model on HSB, but
the sizes of some estimates differ, and the authors noted that for some
variables, the cause of these changes is unclear. Besides the differences
between the HSB and NELS tests, another explanation for these results could be
the collinearity among some variables in the model.
Berends et al. (2002) assessed the effect of New American Schools (NAS)
designs implemented in a sample of schools in an American high-poverty
district during the 1997-1998 school year. NAS is a private, nonprofit organi-
zation that in 1991 launched an ambitious effort for whole-school reform to
address the common perception that American schools were failing students
(particularly in high-poverty settings) and that the piecemeal reform efforts
had produced so few meaningful improvements in the nation's educational
system. Berends et al. investigated the impact of NAS designs and classroom
practices through a comparison of conditions within NAS and non-NAS
classrooms, studying how differences in institutional conditions were related
to student achievement, net of other student, classroom, and school
factors.
They defined a linear multilevel model in which the dependent variables
were the fourth-grade Texas Assessment of Academic Skills reading and
mathematics scores. A wide set of student-level variables was available for
the analysis, and all of these were centered on their classroom-level means;
some teacher background and school characteristics as well as the classroom
means of the student-level variables were also included in the model. No
cross-level interaction was taken into account. They found that NAS designs had
no significant effects on student achievement, but this was expected because
schools and classrooms were at the early stages of implementing the
designs. Some contextual and correlated effects were statistically significant,
but a very interesting result was the effect of the class gender composition:
Classes with more boys than girls tended to have lower average reading
scores, whereas at the student level, no difference in performance between
boys and girls appeared.
Biggeri, Bini, and Grilli (2001) studied the individual factors (graduates'
characteristics) that determine the transition from university to work and
assessed the differences between course programs and universities with
respect to the labor market outcomes of their graduates in an Italian case;
courses and universities were compared through an analysis of the residuals
(Goldstein and Healy 1995). A three-level model was estimated, but the
individual variables were not centered on any means, and no group means or
university characteristics were introduced. Because the study focused on
individual effects (Type A effects), centered variables were not needed in
this model specification, and the effects of course programs and universities
were accounted for only by the second-level and third-level variance terms. The
analysis of the residuals did not require the estimation of any group variables.
whereas in a model with group mean deviation scores, the random part is
where
$$\bar Y_j = m_j + b_B\bar X_j + u_j.$$
Third, Plewis (1989) noted that the mean is not the only statistic available
for measuring contextual effects; others include, for example, the group
median, the within-group standard deviation, and the coefficient of variation.
However, Raudenbush (1989b, 9) replied that the effects of the median of $X_{ij}$
may be similar, though in most cases the median will be less stable than the
mean; also, the median, unlike $\bar X_j$, is not orthogonal to $X_{ij} - \bar X_j$.
The clustering approach could lead to an incorrect assessment of contextual
and correlated effects if the nature of the groups is not properly
accounted for. As explained by Robinson (1950) and Hannan and Burstein
(1974), researchers may work with natural groups (e.g., neighborhoods or the
whole population) or arbitrarily created regions (e.g., census tracts or
administrative units). Individuals in arbitrarily created groups tend to be
more homogeneous than are those in natural groups. In the former, grouping
is created by criteria that may be correlated with many social characteristics
(e.g., race, age, literacy, etc.). Therefore, membership does not tend to be
random with respect to those characteristics. In the latter, units tend to
interact with each other and share relevant life experience, so random group
effects are most likely.
CONCLUSIONS
NOTES
1. Kreft, de Leeuw, and Aiken (1995) investigated the centering issue by means of a random
slope model without any group variables. Kreft and de Leeuw (1998) discussed these
relationships in models with group variables, but their proofs are somewhat different from those of
Kreft, de Leeuw, and Aiken and from ours.
2. This may happen in repeated measures analysis.
3. Defining a single-group effect is probably restrictive. Groups are not one-dimensional
but may have several distinguishable characteristics and properties. Firebaugh (1978) talked
about a composite group effect because the aim is to study the overall impact of all group char-
acteristics on Yij.
REFERENCES
Aitkin, M., and N. T. Longford. 1986. Statistical modelling issues in school effectiveness studies
(with discussion). Journal of the Royal Statistical Society A 149 (1): 1-43.
Alker, H. R. 1969. A typology of ecological fallacies. In Quantitative ecological analysis in the
social sciences, ed. M. Dogan and S. Rokkan, chap. 4, 69-86. Cambridge, MA: MIT Press.
Berends, M., J. Chun, G. Schuyler, S. Stockly, and R. J. Briggs. 2002. Challenges of conflicting
school reforms: Effects of new American schools in a high-poverty district. MR-1483-EDU.
Santa Monica, CA: RAND.
Biggeri, A., M. Bini, and L. Grilli. 2001. The transition from university to work: A multilevel
approach to the analysis of the time to obtain the first job. Journal of the Royal Statistical
Society A 164 (2): 293-305.
Burstein, L. 1980. The analysis of multilevel data in educational research and evaluation. Review
of Research in Education 8:158-233.
Cronbach, L. J. 1976. Research in classrooms and schools: Formulation of questions, designs
and analysis. Occasional paper. Stanford, CA: Stanford Evaluation Consortium.
Cronbach, L. J., and N. Webb. 1975. Between-class and within-class effects in a reported apti-
tude treatment interaction: Reanalysis of a study by G. L. Anderson. Journal of Educa-
tional Psychology 67 (6): 717-24.
Davis, J. A. 1966. The campus as a frog pond: An application of the theory of relative deprivation
to career decisions of college men. American Journal of Sociology 72:17-31.
Firebaugh, G. 1978. A rule for inferring individual-level relationships from aggregate data.
American Sociological Review 43:557-72.
Goldstein, H. 1997. Methods in school effectiveness research. School Effectiveness and School
Improvement 8 (4): 369-95.
Goldstein, H. 2003. Multilevel statistical models. 3rd ed. London: Edward Arnold.
Goldstein, H., and M. J. R. Healy. 1995. The graphical presentation of a collection of means.
Journal of the Royal Statistical Society A 158 (1): 175-77.
Hannan, M. T., and L. Burstein. 1974. Estimation from grouped observations. American Socio-
logical Review 39:374-92.
Hofmann, D. A., and M. B. Gavin. 1998. Centering decisions in hierarchical linear models:
Implications for research in organizations. Journal of Management 24 (5): 623-41.
Iversen, G. 1991. Contextual analysis. Newbury Park, CA: Sage.
Koretz, D., and M. Berends. 2001. Changes in high school grading standards in mathematics,
1982-1992. MR-1445-CB. Santa Monica, CA: RAND.
Kreft, I. G. G., and J. de Leeuw. 1998. Introducing multilevel modeling. London: Sage.
Kreft, I. G. G., J. de Leeuw, and L. Aiken. 1995. The effect of different forms of centering in hier-
archical linear models. Multivariate Behavioral Research 30 (1): 1-21.
Longford, N. T. 1989a. Contextual effects and group means. Multilevel Modelling Newsletter 1
(3): 5, 11.
Longford, N. T. 1989b. To center or not to center. Multilevel Modelling Newsletter 1 (3): 7, 11.
Manski, C. F. 1995. Identification problems in the social sciences. Cambridge, MA: Harvard
University Press.
Plewis, I. 1989. Comment on centering predictors in multilevel analysis. Multilevel Modelling
Newsletter 1 (3): 6, 11.
Raudenbush, S. W. 1989a. Centering predictors in multilevel analysis: Choices and conse-
quences. Multilevel Modelling Newsletter 1 (2): 10-12.
Raudenbush, S. W. 1989b. A response to Longford and Plewis. Multilevel Modelling Newsletter 1 (3): 8-11.
Raudenbush, S. W., and J. Willms. 1995. The estimation of school effects. Journal of Educa-
tional and Behavioral Statistics 20 (4): 307-35.
Rettore, E., and A. Martini. 2001. Constructing league tables of service providers when the per-
formance of the provider is correlated to the characteristics of the clients. Proceedings of the
SIS Meeting: Processes and Statistical Methods of Evaluation, 159-62.
Robinson, W. S. 1950. Ecological correlations and the behavior of individuals. American Socio-
logical Review 15:351-57.
Snijders, T. A. B., and R. J. Bosker. 1999. Multilevel analysis: An introduction to basic and
advanced multilevel modeling. London: Sage.
van den Eeden, P., and H. J. M. Hüttner. 1982. Multi-level research. Current Sociology 30 (3): 1-181.