
EVALUATION REVIEW / FEBRUARY 2006
Paccagnella / MULTILEVEL MODELS
10.1177/0193841X05275649

CENTERING OR NOT CENTERING
IN MULTILEVEL MODELS?
The Role of the Group Mean
and the Assessment of Group Effects

OMAR PACCAGNELLA
University of Padua, Italy

In multilevel regression, centering the model variables produces effects that are different and
sometimes unexpected compared with those in traditional regression analysis. In this article, the
main contributions in terms of meaning, assumptions, and effects underlying a multilevel center-
ing solution are reviewed, emphasizing advantages and critiques of this approach. In addition, in
the spirit of Manski, contextual and correlated effects in a multilevel framework are defined to
detect group effects. It is shown that the decision of centering in a multilevel analysis depends on
the way the variables are centered, on whether the model has been specified with or without
cross-level terms and group means, and on the purposes of the specific analysis.

Keywords: multilevel model; group mean centering; contextual and correlated effects; col-
linearity; school effectiveness

In multilevel regression, centering the model variables produces effects
that are different and sometimes unexpected compared with those in traditional
(or one-level) regression analysis. These differences depend on the
way the variables are centered, the model that has been specified, and the pur-
poses of the analysis.
The issue of centering methodology in multilevel regression appeared in
the 1970s in educational studies (Cronbach 1976; Firebaugh 1978; Burstein
1980). It was then the subject of a discussion in the Multilevel Modelling
Newsletter that began with Raudenbush (1989a), was followed by two critiques
by Plewis (1989) and Longford (1989b), and ended with a response by

AUTHOR'S NOTE: The author thanks Ita Kreft, Nick Longford, Enrico Rettore, Enrica Croda,
and an anonymous referee for their helpful comments. Financial support from the Italian
Ministry of Education and Scientific Research to the project "Dynamics and Inertia in the
Italian Labour Market and Policy Evaluation (Databases, Measurement Issues, Substantive
Analyses)" is gratefully acknowledged.
EVALUATION REVIEW, Vol. 30 No. 1, February 2006 66-85
DOI: 10.1177/0193841X05275649
© 2006 Sage Publications

Raudenbush (1989b). Kreft, de Leeuw, and Aiken (1995) provided the main
contribution in terms of model specification. A comprehensive review was
given by Kreft and de Leeuw (1998).
This article provides an accurate overview of this topic, investigating the
meaning and the assumptions underlying a multilevel centering approach.
We emphasize the effects of various centering solutions, the advantages, and
the critiques with respect to these choices. In addition, in the spirit of Manski
(1995), we define endogenous, contextual, and correlated effects in a multi-
level framework, stressing how these are strongly related to the evaluation of
Type B effects in the school effectiveness issue. The decision of centering in a
multilevel analysis is significantly determined by research or policy ques-
tions: We study how a centering solution could help the researcher to detect
group effects.
We focus on descriptive models rather than heavy technicalities because
our purpose is to reach a wide audience and to introduce the multilevel cen-
tering issue to researchers who have had little experience with the multilevel
approach.
The work is organized as follows. We introduce the multilevel approach
and its main features. Then, we analyze the differences between the tradi-
tional and the multilevel regression interpretations of a centered variable.
Next, we examine the centering issue from a technical point of view, showing
the effects of introducing grand mean and group means in terms of model
specification. The subsequent two sections analyze the centering approach
from a theoretical point of view: We focus on the reasons and the advantages
of a centering solution to assess contextual and correlated effects, discussing
also some specific examples. We then take into account the critiques of this
choice. Finally, we provide some conclusions and suggestions for practice.

DEFINING MULTILEVEL MODELS

Multilevel models are developed for the analysis of hierarchically structured
data. A hierarchy consists of lower level observations nested within
higher level(s). Examples are students nested within schools, employees
nested within firms, patients nested within clinics, and repeated measure-
ments nested within individuals. The lowest level observations are said to be
at Level 1 or at the micro-level. Higher levels, with respect to the hierarchy
structure, are defined as the macro-levels, Level 2, Level 3, and so on. Macro-
levels are usually referred to as groups. In educational and sociological envi-
ronments, they are often called contexts.
The linear multilevel model is a type of regression model that is particularly
suitable for clustered data. It differs from the traditional regression
specification because the equation defining the model contains more than
one error term, one (or more) for each level. The dependent variable must be a
variable at Level 1 because the multilevel model explains what happens at the
lowest level.
Let j be the index for Level 2 units (j = 1, . . ., J) and let i be the index for
Level 1 units (i = 1, . . ., nj). The dependent variable Yij is a Level 1 variable
that is explained by the model:

\[
Y_{ij} = a_j + b_j X_{ij} + e_{ij} \tag{1}
\]
\[
a_j = m + g Z_j + u_j, \qquad b_j = q + h Z_j + v_j,
\]

where $X_{ij}$ is a nonstochastic individual variable, $Z_j$ is a nonstochastic group
variable, $(m, g, q, h)'$ is a vector of parameters that have to be estimated, $e_{ij}$ is
an independent and identically distributed Level 1 error term with zero mean
and variance $x$, and $(u_j, v_j)'$ is an i.i.d. vector of Level 2 error terms with zero
mean and covariance matrix

\[
V\begin{pmatrix} u_j \\ v_j \end{pmatrix} =
\begin{pmatrix} w & l \\ l & t \end{pmatrix}.
\]

The error terms $e_{ij}$ and $(u_j, v_j)$ are assumed uncorrelated. Usually, they are
assumed to be normally distributed. In a compact form, Equation 1 is equal to

\[
Y_{ij} = m + q X_{ij} + g Z_j + h Z_j X_{ij} + u_j + v_j X_{ij} + e_{ij}. \tag{2}
\]

Equation 2 specifies a random slope model, so called because both the
intercept and the slope vary randomly in the model. When only the intercept
is assumed to vary randomly (i.e., $v_j = 0$), a random intercept model follows.
The term $Z_j X_{ij}$ defines a cross-level interaction.
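Let $\bar{X}$ be the grand mean and let $\bar{X}_j$ (j = 1, . . ., J) be the group mean of the
model; that is,

\[
\bar{X} = \frac{1}{\sum_{j=1}^{J} n_j} \sum_{j=1}^{J} \sum_{i=1}^{n_j} X_{ij},
\qquad
\bar{X}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} X_{ij}.
\]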
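As a small illustration of these two definitions, the sketch below computes the grand mean and the group means for a toy data set. The group labels and scores are hypothetical and serve only to show that, with unequal group sizes, the grand mean is the size-weighted average of the group means rather than their simple average.

```python
from statistics import mean

# Hypothetical Level 1 scores X_ij, keyed by Level 2 unit j (unequal n_j).
scores_by_group = {
    "school_1": [4.0, 6.0, 8.0],
    "school_2": [10.0, 12.0],
    "school_3": [1.0, 2.0, 3.0, 6.0],
}

# Grand mean: every individual counts once, so larger groups weigh more.
all_scores = [x for xs in scores_by_group.values() for x in xs]
grand_mean = sum(all_scores) / len(all_scores)

# Group means: one value per Level 2 unit.
group_means = {j: mean(xs) for j, xs in scores_by_group.items()}

# The grand mean equals the n_j-weighted average of the group means,
# which differs from their simple average when group sizes differ.
weighted = sum(len(xs) * group_means[j] for j, xs in scores_by_group.items()) / len(all_scores)
unweighted = mean(list(group_means.values()))

print(grand_mean, group_means, weighted, unweighted)
```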

The conditional expectation and the conditional variance of Model 2 are
given by

\[
E(Y_{ij} \mid I) = m + q X_{ij} + g Z_j + h Z_j X_{ij}, \tag{3}
\]
\[
V(Y_{ij} \mid I) = w + 2 l X_{ij} + t X_{ij}^2 + x, \tag{4}
\]

where $I$ is the set of information $\{X_{ij}, Z_j, \bar{X}, \bar{X}_j\}$.
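To make Equations 3 and 4 concrete, the following sketch evaluates the conditional mean and variance for given parameter values. The function names and all numbers are hypothetical choices made for illustration; they are not taken from the article.

```python
def conditional_mean(x_ij, z_j, m, q, g, h):
    """Equation 3: E(Y_ij | I) = m + q*X_ij + g*Z_j + h*Z_j*X_ij."""
    return m + q * x_ij + g * z_j + h * z_j * x_ij


def conditional_variance(x_ij, w, l, t, level1_var):
    """Equation 4: V(Y_ij | I) = w + 2*l*X_ij + t*X_ij**2 + x.

    w and t are the variances of u_j and v_j, l is their covariance,
    and level1_var is the Level 1 variance (written x in the text).
    """
    return w + 2 * l * x_ij + t * x_ij ** 2 + level1_var


# Hypothetical parameter values, chosen only for illustration.
fixed = dict(m=1.0, q=0.5, g=2.0, h=0.25)
random_part = dict(w=1.0, l=0.2, t=0.5, level1_var=2.0)

mean_at_2 = conditional_mean(x_ij=2.0, z_j=1.0, **fixed)
var_at_0 = conditional_variance(x_ij=0.0, **random_part)
var_at_2 = conditional_variance(x_ij=2.0, **random_part)

print(mean_at_2, var_at_0, var_at_2)
```

Under these assumed values the variance changes with the value of the predictor. It is constant only when l = t = 0, which is the random intercept case.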


A multilevel specification without any centered variables is called a raw
score model.
This article focuses on univariate multilevel models, but all results can be
easily extended to any multivariate hypotheses (see Note 1). For further details about the
specification, the meaning, and the application of a multilevel solution, see
Goldstein (2003) and Snijders and Bosker (1999).

THE CENTERING ISSUE:
FORMS, MEANING, AND THE
TRADITIONAL REGRESSION ANALYSIS

The term centering has to be interpreted in a broad sense. Many different
ways of scaling the variables are available: the mean, the median, the
within-group standard deviation, and the coefficient of variation (Plewis 1989).
Each of these has different implications in terms of intercept interpretation,
mean and variance form of the dependent variable Yij, and statistical
properties. In every case, the alternative to centering is to use raw scores.
We focus our discussion on the mean centering approach for several rea-
sons. First, it is the simplest and most applied scaling factor in many empiri-
cal analyses. Second, the group mean is one of the most used summary statis-
tical measures of the social context in educational and sociological studies.
Third, the mean centering solution offers an important statistical advantage,
which will be particularly useful in estimating the model (we will return to
this in detail later in the article).
In this section, we investigate the meaning of the model parameters adopt-
ing a mean centering solution instead of the raw score form. To this aim, a
comparison between the traditional (or one-level) regression model and a
multilevel regression model is particularly interesting: In the former case,
there is only one mean (the overall or grand mean) that can be used for the
centering solution; in multilevel regression, two different means could be
involved.
In ordinary regression, centering does not change the slope coefficients
but only the intercept. The reason for doing it is to change the
interpretation of the intercept parameter. The intercept may be interpreted
as the expected value of the dependent variable when the explanatory
variables are equal to zero. In a raw score specification, however, this is
often not meaningful because a zero value for the explanatory variable
itself may not be meaningful. Centering, by contrast, makes the intercept
easier to understand: It is the expected value of the dependent variable
when each covariate is equal to its mean value.
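The one-level case can be checked numerically. The sketch below fits a simple regression by least squares on raw and on grand mean centered scores; the simulated data, the seed, and the helper name `ols_simple` are assumptions made for illustration.

```python
import random
from statistics import mean


def ols_simple(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


random.seed(0)
# Hypothetical raw scores that lie far from zero, so X = 0 is not meaningful.
x = [random.uniform(40, 60) for _ in range(200)]
y = [3.0 + 0.8 * xi + random.gauss(0, 1) for xi in x]

x_centered = [xi - mean(x) for xi in x]

intercept_raw, slope_raw = ols_simple(x, y)
intercept_cen, slope_cen = ols_simple(x_centered, y)

print("raw:     ", intercept_raw, slope_raw)
print("centered:", intercept_cen, slope_cen)
```

The slope is the same in both fits, while the centered intercept equals the sample mean of the outcome, that is, the fitted value at the mean of the predictor.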
In a multilevel approach, this situation is more complicated. Let us con-
sider as an example the school outcomes of students who belong to different
classrooms j.
Without centering (the raw metric scaling), the intercept can be inter-
preted as the expected outcome for a student in classroom j who has a zero
value on all explanatory variables. Centering with respect to the grand mean
allows us to interpret the intercept as the expected outcome for a student in
classroom j who is at the overall mean (that is, the mean across all students in
the sample) of all the explanatory variables. Grand mean centering has the
same effects and is done for the same reasons in traditional analysis as well as
in multilevel analysis: The intercept is the only change.
Centering around the grand mean is a simple transformation of the original
scores, and only constant terms are involved. On the other hand, centering
with respect to the group mean allows us to interpret the intercept as the
expected outcome for a student in classroom j whose covariate values are
equal to the means of classroom j (that is, the means across all students in
classroom j). This is done when the researcher is particularly interested in
separating the between-group and the within-group components of the
total variation to investigate how groups (contexts) affect student
performance, explicitly accounting for the group structure in the model
(Cronbach and Webb 1975). This will be further investigated later in the
article.
In this way, centering within context is not a simple transformation of the
original scores, so the statistical model is no longer the same. Indeed, we
introduce a new variable. This is what we are going to show in the next
section.

CHANGES IN MODEL SPECIFICATION
ADOPTING A CENTERING SOLUTION

Kreft, de Leeuw, and Aiken (1995) provided a precise definition of the
concept of equivalence of models with different centering methods. Two
models are defined as equivalent if they generate the same fit, that is, the
same predicted values. Equivalence does not mean that the models give the
same parameter estimates. Rather, considering the expected value and
the covariance matrix of the dependent variable, two different models are
equivalent if they produce the same expectation and variance.
The grand mean centering solution ($\ddot{X}_{ij} = X_{ij} - \bar{X}$) does not change the
model but only the values of some parameters. Grand mean centering leads
only to a reparameterization of Model 2. In the sense of Kreft, de Leeuw, and
Aiken (1995), the two models are equivalent. Indeed, the conditional
expected value is

\[
\begin{aligned}
E(Y_{ij} \mid I) &= (m + q\bar{X}) + q\ddot{X}_{ij} + (g + h\bar{X})Z_j + hZ_j\ddot{X}_{ij} \\
&= m^* + q^*\ddot{X}_{ij} + g^*Z_j + h^*Z_j\ddot{X}_{ij},
\end{aligned} \tag{5}
\]

and the relationships

\[
m^* = m + q\bar{X}, \qquad q^* = q, \qquad g^* = g + h\bar{X}, \qquad h^* = h
\]

follow. The conditional variance becomes

\[
\begin{aligned}
V(Y_{ij} \mid I) &= w + 2l(\ddot{X}_{ij} + \bar{X}) + t(\ddot{X}_{ij} + \bar{X})^2 + x \\
&= (w + t\bar{X}^2 + 2l\bar{X}) + 2(l + t\bar{X})\ddot{X}_{ij} + t\ddot{X}_{ij}^2 + x \\
&= w^* + 2l^*\ddot{X}_{ij} + t^*\ddot{X}_{ij}^2 + x,
\end{aligned} \tag{6}
\]

with

\[
w^* = w + t\bar{X}^2 + 2l\bar{X}, \qquad l^* = l + t\bar{X}, \qquad t^* = t.
\]

We point out the condition

\[
l > -\frac{1}{2\bar{X}}\,(w + t\bar{X}^2)
\]

to ensure $w^* > 0$ (the inequality is written for $\bar{X} > 0$).
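The reparameterization under grand mean centering can be verified numerically. The sketch below maps a set of raw-score parameters to the starred parameters and compares the conditional mean and variance of the two forms at a few points. The parameter values, the grand mean of 10, and the function names are hypothetical.

```python
def grand_mean_reparam(p, x_bar):
    """Raw-score parameters -> grand-mean-centered parameters (Equations 5 and 6)."""
    return {
        "m": p["m"] + p["q"] * x_bar,
        "q": p["q"],
        "g": p["g"] + p["h"] * x_bar,
        "h": p["h"],
        "w": p["w"] + p["t"] * x_bar ** 2 + 2 * p["l"] * x_bar,
        "l": p["l"] + p["t"] * x_bar,
        "t": p["t"],
        "x": p["x"],  # the Level 1 variance is unchanged
    }


def mean_and_variance(score, z_j, p):
    """Conditional mean and variance in the form of Equations 3 and 4."""
    cond_mean = p["m"] + p["q"] * score + p["g"] * z_j + p["h"] * z_j * score
    cond_var = p["w"] + 2 * p["l"] * score + p["t"] * score ** 2 + p["x"]
    return cond_mean, cond_var


# Hypothetical raw-score parameters and grand mean, for illustration only.
raw = {"m": 1.0, "q": 0.5, "g": 2.0, "h": 0.25, "w": 1.0, "l": 0.2, "t": 0.5, "x": 2.0}
x_bar = 10.0
star = grand_mean_reparam(raw, x_bar)

max_gap = 0.0
for x_ij, z_j in [(8.0, 0.0), (10.0, 1.0), (13.5, 2.0)]:
    raw_mean, raw_var = mean_and_variance(x_ij, z_j, raw)
    cen_mean, cen_var = mean_and_variance(x_ij - x_bar, z_j, star)
    max_gap = max(max_gap, abs(raw_mean - cen_mean), abs(raw_var - cen_var))

print(star)
print("largest discrepancy:", max_gap)
```

A single set of starred parameters reproduces the raw-score mean and variance at every point, which is what equivalence means here.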
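The group mean centered score $\tilde{X}_{ij}$ is defined as $\tilde{X}_{ij} = X_{ij} - \bar{X}_j$, and we
proceed as above. Replacing $X_{ij} = \tilde{X}_{ij} + \bar{X}_j$ in Equations 3 and 4 and
rearranging the terms, we get

\[
\begin{aligned}
E(Y_{ij} \mid I) &= (m + q\bar{X}_j) + q\tilde{X}_{ij} + (g + h\bar{X}_j)Z_j + hZ_j\tilde{X}_{ij} \\
&= m^*_j + q^*\tilde{X}_{ij} + g^*_jZ_j + h^*Z_j\tilde{X}_{ij},
\end{aligned} \tag{7}
\]
\[
\begin{aligned}
V(Y_{ij} \mid I) &= (w + t\bar{X}_j^2 + 2l\bar{X}_j) + 2(l + t\bar{X}_j)\tilde{X}_{ij} + t\tilde{X}_{ij}^2 + x \\
&= w^*_j + 2l^*_j\tilde{X}_{ij} + t^*\tilde{X}_{ij}^2 + x.
\end{aligned} \tag{8}
\]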

Unlike Equations 5 and 6, we cannot find relationships for inverting the
parameters. The grand mean is a unique value to add to the original
coefficients, whereas the group mean is not. The group mean varies over groups,
and some parameters of group mean-centered models depend on the group
mean itself. These findings imply that the parameters in a model that uses
group mean-centered scores are no longer simple functions of the
parameters in a model using raw score predictors: In the sense of Kreft, de Leeuw,
and Aiken (1995), the two models are not equivalent.
The model equivalence cannot be achieved adopting a random intercept
specification because the conditional expectation still depends on $\bar{X}_j$. The
only case that may generate two equivalent models is when group means are
all equal over groups (see Note 2); that is, $\bar{X}_j = \bar{X}$ for all j.
We may obtain different results introducing group means as Level 2
predictors. Model 1 becomes

\[
Y_{ij} = a_j + b_j X_{ij} + e_{ij}
\]
\[
a_j = m + g Z_j + d \bar{X}_j + u_j, \qquad b_j = q + h Z_j + v_j, \tag{9}
\]

and in a compact form

\[
Y_{ij} = m + q X_{ij} + g Z_j + d \bar{X}_j + h Z_j X_{ij} + u_j + v_j X_{ij} + e_{ij}. \tag{10}
\]



The error terms $e_{ij}$ and $(u_j, v_j)$ are defined as above. The conditional
variance of $Y_{ij}$ does not change with respect to Equation 4, whereas the
conditional expected value becomes

\[
E(Y_{ij} \mid I) = m + q X_{ij} + g Z_j + d \bar{X}_j + h Z_j X_{ij}. \tag{11}
\]

Replacing $X_{ij} = \tilde{X}_{ij} + \bar{X}_j$ in Equation 11 and rearranging, we get

\[
\begin{aligned}
E(Y_{ij} \mid I) &= m + q\tilde{X}_{ij} + (g + h\bar{X}_j)Z_j + (q + d)\bar{X}_j + hZ_j\tilde{X}_{ij} \\
&= m^* + q^*\tilde{X}_{ij} + g^*_jZ_j + d^*\bar{X}_j + h^*Z_j\tilde{X}_{ij}.
\end{aligned}
\]

Again, the parameters in the group mean-centered model are not simple
functions of the parameters in the raw score model: The $g^*_j$ coefficient
depends on $\bar{X}_j$, even if for all other parameters we are able to give inverse
relationships. However, if the slope parameter $b_j$ in Equation 9 does not
account for any group variables $Z_j$ (that is, $h = 0$), all relationships may be
inverted.
The conditional variance of $Y_{ij}$ in Model 10 with the group mean
centering solution is equal to Equation 8. As above, we can invert uniquely
the coefficients only in the case of a random intercept model specification.
Apart from the specification of a random intercept model without the
cross-level term, the group mean-centering solution in Model 10 does not
provide the model equivalence in the sense of Kreft, de Leeuw, and Aiken
(1995). However, random intercept models are frequently applied in
multilevel analyses.
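The equivalent case can be illustrated for the fixed part of the model. The sketch below takes a random intercept specification with the group mean as a Level 2 predictor and no cross-level term (h = 0), and checks that one set of centered-model parameters reproduces the raw-score expectation for groups with different means. All numerical values and function names are hypothetical.

```python
def expected_raw(x_ij, x_bar_j, z_j, m, q, g, d):
    """Equation 11 with h = 0 (no cross-level term), raw-score predictor."""
    return m + q * x_ij + g * z_j + d * x_bar_j


def expected_centered(x_ij, x_bar_j, z_j, m_s, q_s, g_s, d_s):
    """The same model written in the group-mean-centered score X_ij - mean_j."""
    return m_s + q_s * (x_ij - x_bar_j) + g_s * z_j + d_s * x_bar_j


# Hypothetical raw-score parameters.
m, q, g, d = 1.0, 0.5, 2.0, 0.75
# Centered-model parameters: only the group-mean coefficient changes (d* = q + d).
m_s, q_s, g_s, d_s = m, q, g, q + d

# Cases are (x_ij, group mean, z_j); the group means differ across cases.
cases = [(4.0, 5.0, 0.0), (12.0, 9.5, 1.0), (-3.0, 0.5, 2.0)]
gaps = [
    abs(expected_raw(x, xb, z, m, q, g, d) - expected_centered(x, xb, z, m_s, q_s, g_s, d_s))
    for x, xb, z in cases
]
print(gaps)

# Inverting the relationship recovers the raw-score coefficient: d = d* - q*.
d_recovered = d_s - q_s
print(d_recovered)
```

With h different from zero, the coefficient of Z_j in the centered form would need the group-specific term h times the group mean, so no single parameter set would do.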
Because deviation scores from the grand mean are not interesting, for the
rest of this article, the term centering will be applied only to the difference
from the group mean (that is, $\tilde{X}_{ij} = X_{ij} - \bar{X}_j$).

WHY A CENTERING SOLUTION?
THE ASSESSMENT OF GROUP EFFECTS

In a previous section, we argued that the total variation can be decomposed
into a between-group and a within-group component by means of a
group-centering transformation. Why should we be interested in separating
these group effects?
In social sciences, it is often the case that individuals belonging to the
same group tend to behave similarly. Observations in the same group are
generally more similar than are observations from other groups. Currently,
the debate among economists, sociologists, and psychologists is about the
possible explanations for such evidence. Manski (1995) discussed three
hypotheses to explain how groups (or societies, or environments) may affect
individuals: endogenous, contextual, and correlated effects (see Note 3). He proposed
the following example. Studying high school achievements of students, we
may be interested in the relationship between an individual achievement y, an
individual observed characteristic x (e.g., socioeconomic status), and an indi-
vidual unobserved ability u. The model is completely characterized for each
student by a value for the variables (y, x, u). The variables (x, u) determine y
through a linear model:

\[
y = a + b\,E(y) + g\,x + d\,E(x) + u, \tag{12}
\]

where (a, b, g, d) is a vector of parameters (at this stage, the identification
problem does not matter). Manski defined the following:

Endogenous effects: The individual behavior tends to vary with the prevalence
of that behavior in the group. In the above example of high school achievement
of students, this effect arises when individual achievement y tends to vary with
the average achievement of all students in the group (or class, or school) E(y). In
Equation 12, an endogenous effect appears if $b \neq 0$.

Contextual effects: The individual behavior tends to vary with the distribution of
background characteristics in the group. There is a contextual effect when the
propensity of an individual to behave in some ways varies with the mean of
exogenous variables. In the example of Model 12, these exogenous variables are
the characteristics x. Parameter g expresses the direct effect of x on y, whereas
parameter d (if different from zero) expresses the contextual effect on y.

Correlated effects: Individuals tend to behave similarly because they have
similar individual characteristics in the group (e.g., ability, propensity of learning) or
become part of similar institutional environments (e.g., students are in the same
class, with the same teachers, etc.). In Model 12, the correlated effect is expressed
by the variable u.

Manski referred to the so-called reflection problem that arises when a
researcher observes the distribution of behavior in a population and wishes to
infer whether the average behavior in some group influences the behavior of
the individuals that compose the group (p. 129).
Our aim is to adapt such classification to a multilevel (and centering
approach) framework. Clearly, the multilevel solution is not the sole statisti-
cal instrument available for studying how group membership affects individ-
ual behavior (Aitkin and Longford 1986; Kreft and de Leeuw 1998). We can
adopt other approaches, such as the Cronbach model (Cronbach and Webb
1975), the ANOVA model (Aitkin and Longford 1986), the contextual model
(Iversen 1991), and the aggregate regression (Kreft and de Leeuw 1998).
However, the multilevel solution is the most flexible available thus far: It
allows decomposition of the total variation in an individual component and a
group component, as well as in a between-group component and a within-
group component, explicitly accounting for the group structure in the model.
We consider a random intercept multilevel model with group means as
Level 2 regressors but without any cross-level term to guarantee the model
equivalence as shown in the previous section:

\[
Y_{ij} = m + q X_{ij} + d \bar{X}_j + g Z_j + u_j + e_{ij}. \tag{13}
\]

The parameter $q$ expresses the direct effect of $X_{ij}$ on $Y_{ij}$. According to the
above definitions, we observe that such a model specification does not allow
for any endogenous effect. If $d \neq 0$, Model 13 expresses contextual effects
because $Y_{ij}$ varies with the mean of the exogenous variable $X_{ij}$ among
individuals in the jth group. The correlated effect is accounted for by the term
$(g Z_j + u_j)$. The $Z_j$ variable describes the observed group characteristics,
whereas $u_j$ represents the unobserved group effects. Model 13 expresses correlated
effects through the variance of $u_j$ and the parameter $g$, if it is different from
zero.
Distinction between contextual and correlated effects is important
because each of these effects is different and has different consequences for
the prediction of social interactions.
Burstein (1980) proposed the following example. A group mean ability
may be expected to affect instructional practices by causing teachers in the
schools to adjust their instructions to the level of the group. We may expect
that individuals within the group (or school) learn more or less than they
would have in other groups (or schools). Consequently, individuals learn
more when they have a high ability (individual effect) and when they learn in
a group with a generally high ability because teachers may offer a higher
level of instruction. Burstein defined the effect due to the behavior of teachers
as contextual and therefore proposed the group mean ability as a proxy for it.
Burstein estimated a contextual effect because he specified the average
ability of the students in the class, but his real purpose was to assess a correlated
effect, wherein all students are affected by the same instructional practices.
An interesting application of contextual analysis and the centering
approach is the estimation of the frog pond effect, discussed by Davis (1966).
Individuals in every group could use some group characteristics or properties
as a point of comparison. Their success is judged by their relative standing in
the group because they can choose to be big frogs in little ponds or little
frogs in big ponds (p. 21). The frog pond effect measures a relative (as regards
the group) characteristic of the individual. It is typically measured by the
impact of the within component $X_{ij} - \bar{X}_j$ (Burstein 1980). So, according to
personal beliefs and study purposes, researchers may specify a model such as

\[
Y_{ij} = a_1 + b_1 X_{ij} + g_1 (X_{ij} - \bar{X}_j) + e_{ij},
\]

or

\[
Y_{ij} = a_2 + b_2 (X_{ij} - \bar{X}_j) + g_2 \bar{X}_j + e_{ij}.
\]

The differences are only in the magnitude of the parameters for the individual
and the contextual effects.
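The link between the two specifications can be made explicit with a short rearrangement. The lines below are a sketch that follows directly from the two model equations; no result beyond them is assumed.

```latex
\begin{aligned}
a_1 + b_1 X_{ij} + g_1 (X_{ij} - \bar{X}_j)
  &= a_1 + (b_1 + g_1)(X_{ij} - \bar{X}_j) + b_1 \bar{X}_j, \\
\text{so that}\quad a_2 &= a_1, \qquad b_2 = b_1 + g_1, \qquad g_2 = b_1 .
\end{aligned}
```

Read this way, the two models describe the same fit. What differs is which coefficient carries the frog pond component: $g_1$ in the first form, and the difference $b_2 - g_2$ in the second.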
Firebaugh (1978) introduced the $\bar{X}$ rule for detecting cross-level bias.
Cross-level inferences arise when the researcher studies relations at one level
from analyses conducted at another level. This may be either downward (the
ecological fallacy; Robinson 1950), when relations determined between
aggregated data are translated to individual data, or upward (the individual-
istic fallacy; Alker 1969), when the researcher uses individual-level data to
find relationships about aggregate-level variables. Here, cross-level bias is
meant as downward fallacy. The justifications of the use of cross-level infer-
ence involve practical rather than theoretical considerations. It may happen
that for inferring about individuals, there is prevalence of readily accessible
and easily analyzable aggregated data, or simply individual-level data are
unavailable (e.g., census or voting data). In a simple model such as

\[
Y_{ij} = a + b X_{ij} + g \bar{X}_j + e,
\]

where $e$ is a generic zero mean error term with constant variance, Firebaugh
showed that cross-level bias is absent when and only when $g = 0$.
Hofmann and Gavin (1998) used four paradigms (called incremental,
mediational, moderational, and separate paradigms) as a description
of the way in which multilevel research has been conducted within the orga-
nizational sciences. They focused on the different types of hierarchical mod-
els that have been investigated within the organizational sciences, discussing
how the centering decision relates to these different models: They pointed
out that centering options should be chosen carefully and thoughtfully, with
a view less towards the statistical differences and more towards the concep-
tual questions under investigation (p. 639).

STATISTICAL REASONS FOR A CENTERING SOLUTION

According to the previous discussions, centering the variables does not
seem necessary to detect contextual and correlated effects: A
model such as Model 13, that is, a raw score model, could already be used for
this purpose.
Raudenbush (1989a) gave statistical reasons for preferring a centering
solution. He considered the model

\[
Y_{ij} = a_j + b X_{ij} + g \bar{X}_j + u_j + e_{ij}. \tag{14}
\]

As indicated also by Aitkin and Longford (1986), this model often suffers
from high collinearity, so its estimates have poor precision. For this
reason, Raudenbush suggested specifying deviations from the group mean:

\[
Y_{ij} = m_j + b_W (X_{ij} - \bar{X}_j) + b_B \bar{X}_j + u_j + e_{ij}, \tag{15}
\]

where suffixes W and B mean within and between, respectively. However,
Model 15 is simply a reparameterization of Model 14 because it may be
written as

\[
Y_{ij} = m_j + b_W X_{ij} + (b_B - b_W) \bar{X}_j + u_j + e_{ij},
\]

and this leads to the inverse formulae

\[
a_j = m_j, \qquad b = b_W, \qquad g = b_B - b_W.
\]

The first advantage of Model 15 is that the individual variable $X_{ij} - \bar{X}_j$ and
the group mean $\bar{X}_j$ are uncorrelated. This implies that estimates are more
accurate than in Model 14. Moreover, testing the difference between $b_B$ and $b_W$
means testing the presence of contextual effects. In Model 14, this is equal to the
test of the significance of the $g$ parameter, but the correlation between $X_{ij}$ and
$\bar{X}_j$ may affect the result. If the estimated standard error of $\hat{g}$ is high, the test
could lead to the incorrect conclusion of accepting the null hypothesis.
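Both points can be illustrated on simulated data. The sketch below is a simplification: it uses ordinary least squares and ignores the random intercept when fitting, so it only illustrates the correlation structure of the regressors and the algebraic link between the two parameterizations, not multilevel estimation itself. The data-generating values, the seed, and the helper names are hypothetical.

```python
import random
from statistics import mean


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def corr(u, v):
    """Pearson correlation."""
    return cross(u, v) / (cross(u, u) * cross(v, v)) ** 0.5


def ols_two(x1, x2, y):
    """OLS slopes of y on x1 and x2 (intercept included) from the normal equations."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(1)
x, x_bar, y = [], [], []
for _ in range(50):                        # 50 hypothetical groups of 20 units
    centre = random.gauss(0, 2)            # groups differ in their level of X
    u_j = random.gauss(0, 1)               # unobserved group effect
    xs = [centre + random.gauss(0, 1) for _ in range(20)]
    group_mean = mean(xs)
    for x_ij in xs:
        x.append(x_ij)
        x_bar.append(group_mean)
        # Assumed within slope 1.0 and between slope 3.0.
        y.append(1.0 * (x_ij - group_mean) + 3.0 * group_mean + u_j + random.gauss(0, 1))

within = [x_ij - gm for x_ij, gm in zip(x, x_bar)]

b, g = ols_two(x, x_bar, y)            # Model 14 regressors: X_ij and group mean
b_w, b_b = ols_two(within, x_bar, y)   # Model 15 regressors: deviation and group mean

print("corr(X_ij, mean_j)          =", corr(x, x_bar))
print("corr(X_ij - mean_j, mean_j) =", corr(within, x_bar))
print("b, g          =", b, g)
print("b_W, b_B - b_W =", b_w, b_b - b_w)
```

The raw score and the group mean are strongly correlated in this setup, whereas the deviation score and the group mean are uncorrelated by construction. The two fits return the same within slope, and g equals the difference between the between and within slopes.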
Rettore and Martini (2001) studied the introduction of group mean $\bar{X}_j$ as a
Level 2 variable when there are reasons to believe that observed individual
characteristics ($X_{ij}$) and unobserved group characteristics ($u_j$) are correlated.
In a simulation study with a model such as Model 14 but without the intercept
coefficient, they found that the estimates of $b_W$ and the variance of $u_j$ are
unbiased, whereas $g$ is a bit underestimated (because $\bar{X}_j$ is only a sample
counterpart of the true but unknown expected value of the mean of the group).
Kreft and de Leeuw (1998) generalized Raudenbush's results. They
considered a model like Model 10, and they showed that the centering approach
is a good solution when there is multicollinearity in this model. They
explained that when the variables are expressed as a deviation from the group
mean, these are uncorrelated with all Level 2 variables. This is also true for
the relationship between the cross-level interactions of the deviation score
individual variable and any Level 2 variables. Indeed, they showed that in
some cases, a cross-level interaction is the first source of instability of the
model, hence the first cause of imprecise estimates.
Using data from the National Education Longitudinal Study of 1988
(NELS-88), Kreft and de Leeuw (1998) evaluated the merits of the private
school sector compared to the public school sector. They found that in the raw
score model, the effect of the public sector is moderately negative, whereas it
is strongly negative when a centering model without group means as
regressors is specified. Estimating a centering model with group means as
Level 2 variables, the effect of the public sector becomes positive. Centering
stabilizes the model because it removes correlations. The estimates are there-
fore more accurate.

SOME EXAMPLES:
STUDYING SCHOOL EFFECTIVENESS

The multilevel centering solution plays an important role in school
effectiveness analysis. Indeed, the term school effectiveness has come to be used
for describing educational studies concerned with exploring differences
within and between schools: Its principal aim is the study of the factors that
explain school differences, determining which practices are related to their
effectiveness and assessing the magnitude and stability of school contribu-
tions to student outcome (Aitkin and Longford 1986; Goldstein 1997).
A school effect is defined as the extent to which attending a particular
school modifies student performance (Raudenbush and Willms 1995). The
main distinction is between Type A and Type B effects. A Type A effect is the
difference between a student's actual performance and the performance that
he or she would have experienced attending a typical school. The notion of
typical school can be explained by this example: A block of J students of
identical backgrounds and aptitudes are assigned at random to the J schools
under evaluation. Only student characteristics are taken into account. Type A
effect cannot explain whether that school effectiveness derives from the prac-
tice of its staff, from its student composition, or from the influence of the
social and economic context of the community in which the school is located.
Type B effect is the difference between a student's actual performance in a
particular school and the performance that he or she would have experienced
attending a school with identical context. Let us give an example: J schools,
having identical contexts, are first assigned to treatment levels that vary in
terms of practice. Next, a block of J students of identical backgrounds and
aptitudes are assigned at random to these J schools. Type B effect is designed
to isolate the effects of a school practice (administrative leadership, curricu-
lar content, use of resources, classroom instructions, etc.) from the school
context (school factors that are exogenous to the practices of the school
administrators and teachers). Student characteristics and social and eco-
nomic context variables are taken into account.
Both contextual and correlated effects, as defined above, are Type B
effects. In such a way, a multilevel centering approach may be used when a
research or policy question asks to evaluate Type B effects. However, contex-
tual effects relate to the composition of the classrooms or schools (e.g., crite-
ria for admitting students), whereas correlated effects account for the prac-
tices of the schools (e.g., teacher background or school characteristics):
Finding a contextual rather than a correlated effect may give more detailed
information than a generic Type B effect and useful suggestions regarding
policy maker decisions.
An interesting example of the relationship between multilevel centering
approaches for detecting Type B effects and policy maker decisions is given
by Koretz and Berends (2001), who examined trends in American high
school grades between 1982 and 1992. Colleges rely heavily on test scores
and grades in selecting students for admission; however, many researchers
argue that grades in both secondary and postsecondary institutions have
become inflated in recent years. Some observers maintain that the shift
has been so substantial that grade point averages from some schools are no
longer useful to selective postsecondary schools attempting to identify able
students.
Comparing two nationally representative longitudinal databases, the High
School and Beyond (HSB) and the NELS-88, Koretz and Berends (2001)
analyzed changes in the grade distribution over time, looking at the concomi-
tant changes in the educational system and in the characteristics of the stu-
dent population that might have contributed to the trends. The HSB and
NELS test batteries have been linked by both the Educational Testing Service
and RAND, and the linking of these test batteries is sufficiently strong to jus-
tify using the linked scores as a basis for judging changes in grading stan-
dards. However, these two mathematical tests are not equivalent, so differ-
ences between them have to be taken into account in the findings of any
analysis.
For their study, Koretz and Berends (2001) applied a linear multilevel
model with 11 student-level variables and the school means of each of
the individual variables. However, none of the student-level variables were
centered on their school means: A model like Model 14 had been specified,
and such a model may yield imprecise estimates because of high collinearity.
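
The collinearity concern can be illustrated with a minimal sketch on simulated data. The data, the sample sizes, and all variable names below are assumptions made for illustration and are not taken from Koretz and Berends (2001). The sketch compares the correlation between a raw student-level score and its school mean with the correlation between the group-mean-centered score and the same school mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools, n_students = 100, 20

# Hypothetical data: a student-level predictor whose school means differ.
school_part = rng.normal(0.0, 1.0, size=(n_schools, 1))
x = school_part + rng.normal(0.0, 1.0, size=(n_schools, n_students))

x_bar = x.mean(axis=1, keepdims=True)          # one mean per school
x_bar_full = np.broadcast_to(x_bar, x.shape)   # school mean repeated per student
x_dev = x - x_bar                              # group-mean-centered scores

# Correlation of the school mean with the raw and with the centered predictor.
corr_raw = np.corrcoef(x.ravel(), x_bar_full.ravel())[0, 1]
corr_centered = np.corrcoef(x_dev.ravel(), x_bar_full.ravel())[0, 1]

print(f"corr(raw score, school mean)      = {corr_raw:.3f}")
print(f"corr(centered score, school mean) = {corr_centered:.3f}")
```

In this simulated setting, the raw score is clearly correlated with its school mean, whereas the centered score is uncorrelated with it by construction. How strong the raw correlation is in practice depends on the share of the variance that lies between schools.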
Although Koretz and Berends (2001) found no substantial grade inflation
between 1982 and 1992, at least in mathematics, comparing HSB and NELS
data gave the following interesting result: The multilevel model estimated in
NELS shows the same general patterns as the comparable model in HSB, but
the sizes of some estimates differ, and the authors explained that for some
variables, the cause of these changes is unclear. Besides differences between
the HSB and NELS tests, another explanation for these results could be
collinearity among some variables in the model.
Berends et al. (2002) assessed the effect of New American Schools (NAS)
designs implemented in a sample of schools in an American high-poverty
district during the 1997-1998 school year. NAS is a private, nonprofit organi-
zation that in 1991 launched an ambitious effort for whole-school reform to
address the common perception that American schools were failing students
(particularly in high-poverty settings) and that piecemeal reform efforts
had produced few meaningful improvements in the nation's educational
system. Berends et al. investigated the impact of NAS designs and classroom
practices by comparing conditions within NAS and non-NAS classrooms,
studying how differences in institutional conditions were related to student
achievement, net of other student, classroom, and school factors.
They defined a linear multilevel model in which the dependent variable
was the fourth-grade Texas Assessment of Academic Skills reading and
mathematics scores. A wide set of student-level variables were available for
the analysis, and all of these were centered on their classroom-level means;
some teacher background and school characteristics as well as the classroom
means of the student-level variables were also included in the model. No
cross-level interaction was taken into account. They found that NAS designs had
no significant effects on student achievement, but this was expected because
schools and classrooms were at the early stages in the implementation of the
designs. Some contextual and correlated effects were statistically significant,
but a very interesting result was the effect of the class gender composition:
Classes with more boys than girls tended to have lower average reading
scores, whereas at the student level, no difference in performance between
boys and girls appeared.
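
A result of this kind, a classroom composition effect without a corresponding student-level effect, can be reproduced in a small simulation. The sketch below is hypothetical throughout: the data-generating values, the sample sizes, and the variable names are assumptions, and plain ordinary least squares is used in place of the multilevel estimator applied by Berends et al. (2002). It regresses a score on the group-mean-centered gender indicator and on the classroom share of boys.

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, n_students = 400, 25

# Hypothetical classes whose expected share of boys varies between 0.3 and 0.7.
p_boy = rng.uniform(0.3, 0.7, size=(n_classes, 1))
boy = (rng.random((n_classes, n_students)) < p_boy).astype(float)

share_boys = boy.mean(axis=1, keepdims=True)   # classroom mean of the indicator
boy_dev = boy - share_boys                     # group-mean-centered indicator

# Assumed data-generating process: no student-level gender gap (0.0),
# a negative effect of the classroom share of boys (-2.0),
# plus a classroom random effect and a student-level error.
beta_within, beta_between = 0.0, -2.0
u = rng.normal(0.0, 0.3, size=(n_classes, 1))
e = rng.normal(0.0, 1.0, size=(n_classes, n_students))
score = 50.0 + beta_within * boy_dev + beta_between * share_boys + u + e

# OLS on an intercept, the centered indicator, and the classroom mean.
design = np.column_stack([
    np.ones(boy.size),
    boy_dev.ravel(),
    np.broadcast_to(share_boys, boy.shape).ravel(),
])
coef, *_ = np.linalg.lstsq(design, score.ravel(), rcond=None)
intercept_hat, within_hat, between_hat = coef

print(f"student-level (within) estimate   : {within_hat:.2f}")
print(f"classroom-level (between) estimate: {between_hat:.2f}")
```

Because the centered indicator and the classroom mean are orthogonal, the two coefficients separate the student-level comparison from the classroom-level one. The point estimates are informative here, but OLS standard errors would ignore the clustering and should not be used for inference.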
Biggeri, Bini, and Grilli (2001) studied the individual factors (graduates'
characteristics) that determine the transition from university to work and
assessed the differences between course programs and universities with
respect to the labor market outcomes of their graduates in an Italian case;
courses and universities were compared through an analysis of the residuals
(Goldstein and Healy 1995). A three-level model was estimated, but the indi-
vidual variables were not centered on any means, and no group means or any
university characteristics were introduced. Because the study focused on
individual effects (Type A effects), centered variables were not needed in
this model specification, and the effects of course programs and universities
were accounted for only by the Level 2 and Level 3 variance terms. The
analysis of the residuals did not require the estimation of any group variables.

CRITIQUES OF THE CENTERING APPROACH

The most important critiques of the centering approach are given by
Longford and Plewis. These appeared in Raudenbush's discussion about
centering methodology in multilevel regression in a special issue of the
Multilevel Modelling Newsletter.
From a methodological point of view, Longford (1989a) maintained that
in some analyses, we are not able to assess the quality of the group mean
$\bar{X}_j$ as a proxy for the context, especially when the size of the
sample at Level 1 is small.
From a statistical point of view, Longford (1989b) focused on the form of
the variance in the random slope model. For the raw score model, as
in Equation 2 or 10, the random part is equal to

$$u_j + \nu_j X_{ij} + e_{ij}, \tag{16}$$

whereas in a model with group mean deviation scores, the random part is

$$u_j + \nu_j (X_{ij} - \bar{X}_j) + e_{ij}. \tag{17}$$


These two random parts may correspond to substantially different models.
Whereas in Equation 16 the variance of $Y_{ij}$ is a quadratic function of
$X_{ij}$, in Equation 17 the variance is a quadratic function of the deviation
from the group mean, $X_{ij} - \bar{X}_j$ (see also Equations 4 and 8). With
this specification, the context plays an important role in the variance
heterogeneity. If there are doubts about the quality of $\bar{X}_j$ for
representing contextual effects (again, for instance, with small sample
sizes), this could become a problem.
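
The two variance functions can be written out explicitly. The following is a sketch under assumed notation that may differ from the symbols used in Equations 4 and 8: the Level 2 random terms have variances $\tau_{00}$ and $\tau_{11}$ and covariance $\tau_{01}$, and the Level 1 error has variance $\sigma^2$ and is independent of the Level 2 terms.

```latex
% Assumed notation (not necessarily that of Equations 4 and 8):
%   Var(u_j) = \tau_{00},  Var(\nu_j) = \tau_{11},
%   Cov(u_j, \nu_j) = \tau_{01},  Var(e_{ij}) = \sigma^2.
\begin{align*}
\operatorname{Var}\bigl(u_j + \nu_j X_{ij} + e_{ij}\bigr)
  &= \tau_{00} + 2\tau_{01} X_{ij} + \tau_{11} X_{ij}^{2} + \sigma^{2}, \\
\operatorname{Var}\bigl(u_j + \nu_j (X_{ij} - \bar{X}_j) + e_{ij}\bigr)
  &= \tau_{00} + 2\tau_{01} (X_{ij} - \bar{X}_j)
     + \tau_{11} (X_{ij} - \bar{X}_j)^{2} + \sigma^{2}.
\end{align*}
```

In the second expression, two units with the same raw score $X_{ij}$ but different group means have different variances, which is why the quality of $\bar{X}_j$ as a measure of the context matters for this specification.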
In his critique of Raudenbush's results, Plewis (1989) warned researchers
against routinely adopting the centering specification in multilevel models,
and he listed three important reasons. First, he disputed the claim that the
correlation between $X_{ij}$ and $\bar{X}_j$ is often high: It can happen, but
not as frequently as Raudenbush claimed. For the precision of the estimates,
the problem of a small sample size is probably more serious. Second, in
Model 15, an implicit assumption is that $X_{ij} - \bar{X}_j$ is the best
predictor for $Y_{ij}$. Plewis argued that $X_{ij}$ should be the best
predictor for $Y_{ij}$, whereas $X_{ij} - \bar{X}_j$ should be the best
predictor for $Y_{ij} - \bar{Y}_j$. Researchers risk inserting spurious
variation at Level 2 for $Y_{ij}$ simply because the Level 1 model is not
properly specified. However, van den Eeden and Hüttner (1982) suggested that
Model 15 may be interpreted as

$$(Y_{ij} - \bar{Y}_j) = \beta_W (X_{ij} - \bar{X}_j) + e_{ij},$$

where

$$\bar{Y}_j = \mu_j + \beta_B \bar{X}_j + u_j.$$
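
The two equations above can be combined into a single expression for $Y_{ij}$ by adding the group-level equation to the within-group equation. The sketch below only performs this substitution; the reading of the symbols ($\beta_W$ as the within-group slope, $\beta_B$ as the between-group slope, and $\mu_j$ as printed in the group-level equation) is an assumption based on the equations as they appear here.

```latex
% Substituting the group-level equation for \bar{Y}_j into the
% within-group equation (symbols read as assumed in the lead-in).
\begin{align*}
Y_{ij} &= \bar{Y}_j + \beta_W (X_{ij} - \bar{X}_j) + e_{ij} \\
       &= \mu_j + \beta_B \bar{X}_j + \beta_W (X_{ij} - \bar{X}_j)
          + u_j + e_{ij}.
\end{align*}
```

Written this way, the deviation $X_{ij} - \bar{X}_j$ carries the within-group relation and $\bar{X}_j$ carries the between-group relation, so the deviation is not asked to predict the between-group part of $Y_{ij}$.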

Third, Plewis (1989) noted that the mean is not the only variable available
for measuring contextual effects: There are also, for example, the group
median, the within-group standard deviation, and the coefficient of variation.
However, Raudenbush (1989b, 9) replied that the effects of the median of
$X_{ij}$ may be similar, though in most cases the median will be less stable
than the mean; also the median, unlike $\bar{X}_j$, is not orthogonal to
$X_{ij} - \bar{X}_j$.
The clustering approach could lead to an incorrect assessment of contextual
and correlated effects if the nature of the groups is not properly
accounted for. As explained by Robinson (1950) and Hannan and Burstein
(1974), researchers may work with natural groups (e.g., neighborhoods or the
whole population) or arbitrarily created regions (e.g., census tracts or
administrative units). Individuals in arbitrarily created groups tend to be
more homogeneous than are those in natural groups. In the former, grouping
is created by criteria that may be correlated with many social characteristics
(e.g., race, age, literacy, etc.). Therefore, membership does not tend to be ran-
dom with respect to those characteristics. In the latter, units tend to interact
with each other and share relevant life experience, so random group effects
are most likely.

CONCLUSIONS

The centering approach is an important tool in multilevel analysis.
Unfortunately, there is no rule of thumb for deciding whether to introduce
deviation scores from the mean or the group mean in a multilevel model. When
choosing a model specification, it is important to take into account the aims
of the analysis: What are the research or policy questions of the study?
When the research interests focus on individual effects, a model without
individual-centered variables and without any group mean variables may be
specified; group effects are then accounted for only by the variance term.
Conversely, if the purpose of the analysis is to distinguish group-level
effects (contextual or correlated) from individual-level characteristics
(e.g., in studying school effectiveness, are differences in instructional
conditions related to student achievement?), a model with centered variables
and group means as predictors is preferable.
Models without group means as Level 2 regressors are more parsimonious
in terms of number of parameters to estimate. However, the models in center-
ing form yield faster rates of convergence than do those not in a centering
form.
If the multilevel collinearity is high, the estimation procedure may yield
imprecise estimates in a model not in centering form. A centering approach
removes collinearity, stabilizes the model, and avoids this problem.
However, the group mean centering solution may change the raw score
model of the analysis (the lack of model equivalence), particularly when
cross-level interactions are specified and group means are not introduced as
Level 2 regressors. By adopting a centering solution, the researcher risks
estimating a model different from the one intended for the analysis.
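
The role of the group means in model equivalence can be checked numerically in the simplest case. The sketch below uses simulated data and ordinary least squares on the fixed part only, so it is an illustration under stated assumptions, not a multilevel fit: random slopes and cross-level interactions, where equivalence is more delicate, are not covered. The helper `ols` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_groups, n_units = 60, 10

# Hypothetical data with different within-group (0.5) and between-group (1.5) slopes.
group_part = rng.normal(0.0, 1.0, size=(n_groups, 1))
x = group_part + rng.normal(0.0, 1.0, size=(n_groups, n_units))
x_bar = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape)
y = 1.0 + 0.5 * (x - x_bar) + 1.5 * x_bar + rng.normal(0.0, 1.0, size=x.shape)


def ols(columns, response):
    """Return OLS coefficients and fitted values for an intercept plus the given regressors."""
    design = np.column_stack([np.ones(response.size)] + [c.ravel() for c in columns])
    coef, *_ = np.linalg.lstsq(design, response.ravel(), rcond=None)
    return coef, design @ coef


# With group means as regressors: raw score form versus centered form.
coef_raw, fit_raw = ols([x, x_bar], y)
coef_cen, fit_cen = ols([x - x_bar, x_bar], y)

# Without group means: raw score form versus centered form.
_, fit_raw_only = ols([x], y)
_, fit_cen_only = ols([x - x_bar], y)

same_with_means = np.allclose(fit_raw, fit_cen)
same_without_means = np.allclose(fit_raw_only, fit_cen_only)

print("equivalent with group means   :", same_with_means)
print("equivalent without group means:", same_without_means)
print("group mean coefficient, centered form:", round(coef_cen[2], 3))
print("sum of both coefficients, raw form   :", round(coef_raw[1] + coef_raw[2], 3))
```

When the group mean is included, the two forms span the same regressors and give identical fitted values; the group mean coefficient in the centered form equals the sum of the two slopes in the raw score form. When the group mean is omitted, the raw and centered forms are different models.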

NOTES

1. Kreft, de Leeuw, and Aiken (1995) investigated the centering issue by means of a random
slope model without any group variables. Kreft and de Leeuw (1998) discussed these
relationships in models with group variables, but their proofs are a bit different from those of
Kreft, de Leeuw, and Aiken and from ours.
2. This may happen in repeated measures analysis.
3. Defining a single-group effect is probably restrictive. Groups are not one-dimensional
but may have several distinguishable characteristics and properties. Firebaugh (1978) talked
about a composite group effect because the aim is to study the overall impact of all group
characteristics on $Y_{ij}$.

REFERENCES

Aitkin, M., and N. T. Longford. 1986. Statistical modelling issues in school effectiveness studies
(with discussion). Journal of the Royal Statistical Society A 149 (1): 1-43.
Alker, H. R. 1969. A typology of ecological fallacy. In Quantitative ecological analysis in the
social sciences, ed. M. Dogan and S. Rokkan, chap. 4, 69-86. Cambridge, MA: MIT Press.
Berends, M., J. Chun, G. Schuyler, S. Stockly, and R. J. Briggs. 2002. Challenges of conflicting
school reforms: Effects of new American schools in a high-poverty district. MR-1483-EDU.
Santa Monica, CA: RAND.
Biggeri, A., M. Bini, and L. Grilli. 2001. The transition from university to work: A multilevel
approach to the analysis of the time to obtain the first job. Journal of the Royal Statistical
Society A 164 (2): 293-305.
Burstein, L. 1980. The analysis of multilevel data in educational research and evaluation. Review
of Research in Education 8:158-233.
Cronbach, L. J. 1976. Research in classrooms and schools: Formulation of questions, designs
and analysis. Occasional paper. Stanford, CA: Stanford Evaluation Consortium.
Cronbach, L. J., and N. Webb. 1975. Between-class and within-class effects in a reported
aptitude × treatment interaction: Reanalysis of a study by G. L. Anderson. Journal of
Educational Psychology 67 (6): 717-24.
Davis, J. A. 1966. The campus as a frog pond: An application of the theory of relative deprivation
to career decisions of college men. American Journal of Sociology 72:17-31.
Firebaugh, G. 1978. A rule for inferring individual-level relationships from aggregate data.
American Sociological Review 43:557-72.
Goldstein, H. 1997. Methods in school effectiveness research. School Effectiveness and School
Improvement 8 (4): 369-95.
Goldstein, H. 2003. Multilevel statistical models. 3rd ed. London: Edward Arnold.
Goldstein, H., and M. J. R. Healy. 1995. The graphical presentation of a collection of means.
Journal of the Royal Statistical Society A 158 (1): 175-77.
Hannan, M. T., and L. Burstein. 1974. Estimation from grouped observations. American Socio-
logical Review 39:374-92.
Hofmann, D. A., and M. B. Gavin. 1998. Centering decisions in hierarchical linear models:
Implications for research in organizations. Journal of Management 24 (5): 623-41.
Iversen, G. 1991. Contextual analysis. Newbury Park, CA: Sage.
Koretz, D., and M. Berends. 2001. Changes in high school grading standards in mathematics,
1982-1992. MR-1445-CB. Santa Monica, CA: RAND.
Kreft, I. G. G., and J. de Leeuw. 1998. Introducing multilevel modeling. London: Sage.
Kreft, I. G. G., J. de Leeuw, and L. Aiken. 1995. The effect of different forms of centering in hier-
archical linear models. Multivariate Behavioral Research 30 (1): 1-21.
Longford, N. T. 1989a. Contextual effects and group means. Multilevel Modelling Newsletter 1
(3): 5, 11.
Longford, N. T. 1989b. To center or not to center. Multilevel Modelling Newsletter 1 (3): 7, 11.
Manski, C. F. 1995. Identification problems in the social sciences. Cambridge, MA: Harvard
University Press.
Plewis, I. 1989. Comment on centering predictors in multilevel analysis. Multilevel Modelling
Newsletter 1 (3): 6, 11.
Raudenbush, S. W. 1989a. Centering predictors in multilevel analysis: Choices and conse-
quences. Multilevel Modelling Newsletter 1 (2): 10-12.
Raudenbush, S. W. 1989b. A response to Longford and Plewis. Multilevel Modelling Newsletter 1 (3): 8-11.
Raudenbush, S. W., and J. Willms. 1995. The estimation of school effects. Journal of Educa-
tional and Behavioral Statistics 20 (4): 307-35.
Rettore, E., and A. Martini. 2001. Constructing league tables of service providers when the per-
formance of the provider is correlated to the characteristics of the clients. Proceedings of the
SIS Meeting: Processes and Statistical Methods of Evaluation, 159-62.
Robinson, W. S. 1950. Ecological correlations and the behavior of individuals. American Socio-
logical Review 15:351-57.
Snijders, T. A. B., and R. J. Bosker. 1999. Multilevel analysis: An introduction to basic and
advanced multilevel modeling. London: Sage.
van den Eeden, P., and H. J. M. Hüttner. 1982. Multi-level research. Current Sociology 30 (3):
1-181.

Omar Paccagnella, Ph.D., is a research officer in the Department of Economics, University of
Padua, Italy. His research activities focus on policy evaluations, survey designs (SHARE
project), and multilevel models.
