
Advanced quantitative research methods

Multilevel Analysis

Prof. Getu Degu


Bahir Dar University
February 2017
Multilevel Analysis

"Once you know that hierarchies exist, you see them everywhere."

Kreft and De Leeuw (1998)
Multilevel Analysis
The term multilevel refers to a hierarchical or nested data structure,
usually subjects within organizational groups, but the nesting may also
consist of repeated measures within subjects, or respondents within
clusters, as in cluster sampling.
The multilevel regression model has become known in the research
literature under a variety of names, such as random coefficient model,
random-effects model, variance component model, hierarchical linear
model, and nested model. Statistically oriented publications tend to refer to
the model as a mixed-effects or mixed model.

The expression multilevel model is used as a generic term for all models for
nested data.
Multilevel analysis is used to examine relations between variables
measured at different levels of the multilevel data structure.
The units of analysis are usually individuals (at a lower level) who are
nested within contextual/aggregate units (at a higher level).
Multilevel Analysis
In multilevel research, the data structure in the
population is hierarchical, and the sample data are a
sample from this hierarchical population.

Thus, in educational research, the population
consists of schools and pupils within these schools,
and the sampling procedure often proceeds in two
stages: first, we take a sample of schools, and next
we take a sample of pupils within each school.

One should keep firmly in mind that the central
statistical model in multilevel analysis is one of
successive sampling from each level of a
hierarchical population.
Types of models
Before conducting a multilevel model analysis, a
researcher must decide on several aspects, including
which predictors are to be included in the analysis.

The researcher must decide whether parameter
values (i.e., the elements that will be estimated) will
be fixed or random.

The researcher must decide whether to employ ML
or REML estimation.

The researcher must also decide how to center the
predictors (raw metric scaling, grand mean centering,
or group mean centering).
Random intercepts model
A random intercepts model is a model in which
intercepts are allowed to vary, and therefore, the
scores on the dependent variable for each
individual observation are predicted by the
intercept that varies across groups.

This model assumes that slopes are fixed (the
same across different contexts).
Random slopes model
A random slopes model is a model in which
slopes are allowed to vary, and therefore, the
slopes are different across groups.

This model assumes that intercepts are fixed
(the same across different contexts).
Random intercepts and slopes model
A model that includes both random intercepts and
random slopes is likely the most realistic type of
model, although it is also the most complex.
In this model, both intercepts and slopes are allowed
to vary across groups, meaning that they are different
in different contexts.
This model assumes that each group has a different
regression model, with its own intercept and slope.
Because groups are sampled, the model assumes that
the intercepts and slopes are also randomly sampled
from a population of group intercepts and slopes.
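The sampling of group-specific intercepts and slopes described above can be sketched in a short simulation (a hypothetical illustration: the means and variances below are invented for the example, not taken from these slides):

```python
import numpy as np

rng = np.random.default_rng(42)

n_groups, n_per_group = 100, 20

# Population of group intercepts and slopes: each group draws its own
# coefficients from a common distribution (invented parameter values).
intercepts = rng.normal(loc=5.0, scale=0.8, size=n_groups)   # beta_0j
slopes     = rng.normal(loc=0.5, scale=0.2, size=n_groups)   # beta_1j

x = rng.normal(size=(n_groups, n_per_group))                 # pupil-level predictor
e = rng.normal(scale=1.0, size=(n_groups, n_per_group))      # level-1 residual

# Each group has its own regression line: y_ij = beta_0j + beta_1j * x_ij + e_ij
y = intercepts[:, None] + slopes[:, None] * x + e

print(y.shape)   # 100 groups of 20 observations each
```

Refitting an ordinary regression separately within each group would recover a different intercept and slope per group, which is exactly what the random intercepts and slopes model formalizes.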
AGGREGATION AND DISAGGREGATION
In multilevel research, variables can be defined at any level
of the hierarchy. Some of these variables may be measured
directly at their own natural level; for example, at the
school level we may measure school size, and at the pupil
level we may measure intelligence.

In addition, we may move variables from one level to
another by aggregation or disaggregation.

Aggregation means that variables at a lower level are
moved to a higher level, for instance by assigning to the
schools the school mean of the pupils' intelligence scores.

Disaggregation means moving variables to a lower level,
for instance by assigning to all pupils in the schools a
variable that indicates the denomination of the school they
belong to.
AGGREGATION AND DISAGGREGATION
The lowest level (level 1) is usually defined by the
individuals. However, this is not always the case.

At each level in the hierarchy, we may have
several types of variables. The distinctions made
in the following are based on the typology offered
by Lazarsfeld and Menzel (1961).

In this typology (classification), we distinguish
between global, structural, and contextual
variables (Hox, 2010).
AGGREGATION AND DISAGGREGATION
Global variables refer only to the level at which they are
defined, without reference to other units or levels. A pupil's
intelligence or gender would be a global variable at the pupil level;
school size would be a global variable at the school level. A global
variable is measured at the level at which that variable actually
exists.

Structural variables are operationalized by referring to the sub-units
at a lower level. They are constructed from variables at a lower level,
for example by defining the school variable "mean intelligence" as
the mean of the intelligence scores of the pupils in that school.

Using the mean of a lower-level variable as an explanatory variable
at a higher level is a common procedure in multilevel analysis.

It is clear that constructing a structural variable from the lower-level
data involves aggregation.
AGGREGATION AND DISAGGREGATION
Contextual variables, on the other hand, refer to the super-
units; all units at the lower level receive the value of a
variable for the super-unit to which they belong at the
higher level.

For instance, we can assign to all pupils in a school the
school size, or the mean intelligence, as a pupil-level
variable.

This is called disaggregation; data on higher-level units are
disaggregated into data on a larger number of lower-level
units.

The resulting variable is called a contextual variable,
because it refers to the higher-level context of the units we
are investigating.
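Aggregation and disaggregation are easy to see in code. A minimal sketch with pandas (the school labels and intelligence scores are invented for illustration):

```python
import pandas as pd

pupils = pd.DataFrame({
    "school": ["A", "A", "B", "B", "B"],
    "iq":     [100, 110, 95, 105, 100],
})

# Aggregation: move a pupil-level variable up to the school level
# by taking the school mean of the pupils' intelligence scores.
school_mean = pupils.groupby("school")["iq"].mean()
print(school_mean)   # A: 105.0, B: 100.0

# Disaggregation: assign a school-level value (here, the school mean)
# back to every pupil in that school, as a contextual variable.
pupils["school_mean_iq"] = pupils.groupby("school")["iq"].transform("mean")
print(pupils)
```

Note that `school_mean` has one row per school (fewer, higher-level units), while `school_mean_iq` repeats each school's value for every pupil (many lower-level units carrying the same number).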
Historically, multilevel problems have led to
analysis approaches that moved all variables by
aggregation or disaggregation to one single level
of interest, followed by an ordinary multiple
regression, analysis of variance, or some other
standard analysis method.

However, analyzing variables from different
levels at one single common level is inadequate,
and leads to two distinct types of problems.
The first problem is statistical. If data are aggregated, the result is
that different data values from many sub-units are combined into
fewer values for fewer higher-level units. As a result, much
information is lost, and the statistical analysis loses power.

On the other hand, if data are disaggregated, the result is that a few
data values from a small number of super-units are blown up into
many more values for a much larger number of sub-units. Ordinary
statistical tests treat all these disaggregated data values as
independent information from the much larger sample of sub-units.
The proper sample size for these variables is of course the number of
higher-level units. Using the larger number of disaggregated cases for
the sample size leads to significance tests that reject the null
hypothesis far more often than the nominal alpha level suggests.

In other words, investigators come up with many significant results
that are totally spurious.
The second problem is conceptual. If the analyst is not very careful in the
interpretation of the results, s/he may commit the fallacy of the wrong level,
which consists of analyzing the data at one level and formulating
conclusions at another level.

Probably the best-known fallacy is the ecological fallacy, which is
interpreting aggregated data at the individual level. It is also known as the
Robinson effect, after Robinson (1950).

Robinson presents aggregated data describing the relationship between the
percentage of blacks and the illiteracy level in nine geographic regions in
1930. The ecological correlation, that is, the correlation between the
aggregated variables at the region level, is 0.95. In contrast, the
individual-level correlation between these global variables is 0.20.

Robinson concludes that in practice an ecological correlation is almost
certainly not equal to its corresponding individual-level correlation.

Formulating inferences at a higher level based on analyses performed at a
lower level is just as misleading. This fallacy is known as the atomistic
fallacy.
WHY DO WE NEED SPECIAL MULTILEVEL ANALYSIS TECHNIQUES?
A multilevel problem concerns a population with a hierarchical structure. A
sample from such a population can be described as a multistage sample:
First, we take a sample of units from the higher level (e.g., schools), and
next we sample the sub-units from the available units (e.g., we sample
pupils from the schools).
In such samples, the individual observations are in general not completely
independent. For instance, pupils in the same school tend to be similar to
each other, because of selection processes and because of the common
history the pupils share by going to the same school.
As a result, the average correlation between variables measured on pupils
from the same school will be higher than the average correlation between
variables measured on pupils from different schools.
Standard statistical tests lean heavily on the assumption of independence of
the observations.

If this assumption is violated (and in multilevel data this is almost always
the case), the estimates of the standard errors of conventional statistical tests
are much too small, and this results in many spuriously significant results.
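This inflation of spuriously significant results can be demonstrated with a small simulation (a sketch with invented parameters: 10 "schools" of 20 "pupils", equal within- and between-school variance, and a naive one-sample t-test that treats all 200 pupils as independent even though the true population mean is exactly zero):

```python
import numpy as np

rng = np.random.default_rng(1)

n_schools, n_pupils, n_sims = 10, 20, 1000
rejections = 0

for _ in range(n_sims):
    # True grand mean is 0, so every rejection below is a false positive.
    school_effects = rng.normal(scale=1.0, size=n_schools)        # u_j
    y = (school_effects[:, None]
         + rng.normal(scale=1.0, size=(n_schools, n_pupils))).ravel()

    # Naive one-sample t-test ignoring the clustering
    n = y.size
    t = y.mean() / (y.std(ddof=1) / np.sqrt(n))
    if abs(t) > 1.96:        # nominal two-sided alpha = 0.05
        rejections += 1

print(rejections / n_sims)   # far above the nominal 0.05
```

The observed rejection rate is many times the nominal 5% level, because the naive standard error is computed as if there were 200 independent observations when the effective sample size is much closer to the 10 schools.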
The Basic Two-Level Regression Model : example
Assume that we have data from J classes, with a different number of
pupils nj in each class.

On the pupil level, we have the outcome variable popularity (Y),
measured by a self-rating scale that ranges from 0 (very unpopular) to
10 (very popular).

Also, we have two explanatory variables on the pupil level: pupil
gender (X1: 0 = boy, 1 = girl) and pupil extroversion (X2, measured on
a self-rating scale ranging from 1 to 10).

We have one class-level explanatory variable, teacher experience (Z:
in years, ranging from 2 to 25).

There are data on 2000 pupils in 100 classes, so the average class size
is 20 pupils.
The Basic Two-Level Regression Model

To analyze these data, we can set up separate regression
equations in each class to predict the outcome variable Y
using the explanatory variables X as follows:

Yij = β0j + β1j X1ij + β2j X2ij + eij

Using variable labels instead of algebraic symbols, the equation reads:

popularityij = β0j + β1j genderij + β2j extroversionij + eij
The Basic Two-Level Regression Model
In this regression equation, β0j is the intercept, β1j is the regression
coefficient (regression slope) for the dichotomous explanatory
variable gender, β2j is the regression coefficient (slope) for the
continuous explanatory variable extroversion, and eij is the usual
residual error term.

The subscript j is for the classes (j = 1 ... J) and the subscript i is for
individual pupils (i = 1 ... nj). The difference from the usual regression
model is that we assume that each class has a different intercept
coefficient β0j, and different slope coefficients β1j and β2j.

Since the intercept and slope coefficients are random variables that
vary across groups (e.g., schools), they are often referred to as
random coefficients.
The Basic Two-Level Regression Model
In our example, the specific values for the intercept and the
slope coefficients are a class characteristic. In general, a class
with a high intercept is predicted to have more popular pupils
than a class with a low value for the intercept.

Similarly, differences in the slope coefficients for gender or
extroversion indicate that the relationship between the pupils'
gender or extroversion and their predicted popularity is not the
same in all classes.

Some classes may have a high value for the slope coefficient of
gender; in these classes, the difference between boys and girls is
relatively large. Other classes may have a low value for the slope
coefficient of gender; in these classes, gender has a small effect
on popularity, which means that the difference between boys
and girls is small.
The Basic Two-Level Regression Model
The next step in the hierarchical regression model is to
explain the variation of the regression coefficients βj by
introducing explanatory variables at the class level:

β0j = γ00 + γ01 Zj + u0j

The above equation predicts the average popularity in a
class (the intercept β0j) by the teacher's experience (Z).
Thus, if γ01 is positive, the average popularity is higher in
classes with a more experienced teacher.

Conversely, if γ01 is negative, the average popularity is
lower in classes with a more experienced teacher.
The Basic Two-Level Regression Model

We hope to explain at least some of the variation by
introducing higher-level variables.

Generally, we will not be able to explain all the
variation, and there will be some unexplained
residual variation.
The Basic Two-Level Regression Model
β1j = γ10 + γ11 Zj + u1j
β2j = γ20 + γ21 Zj + u2j

The first equation (given above) states that the relationship, as expressed by the slope
coefficient β1j, between the popularity (Y) and the gender (X1) of the pupil depends on the
amount of experience of the teacher (Z).

If γ11 is positive, the gender effect on popularity is larger with experienced teachers.

Conversely, if γ11 is negative, the gender effect on popularity is smaller with experienced
teachers.

Similarly, the second equation states that, if γ21 is positive, the effect of extroversion is
larger in classes with an experienced teacher.

Thus, the amount of experience of the teacher acts as a moderator variable for the
relationship between popularity and gender or extroversion; this relationship varies according
to the value of the moderator variable.
The Basic Two-Level Regression Model
The u-terms u0j, u1j, and u2j in the above equations are (random)
residual error terms at the class level. These residual errors uj are
assumed to have a mean of zero, and to be independent of the
residual errors eij at the individual (pupil) level.

The variance of the residual errors u0j is specified as σ²u0, and the
variances of the residual errors u1j and u2j are specified as σ²u1 and
σ²u2, respectively. The covariances between the residual error terms
are denoted by σu01, σu02, and σu12, and are generally not assumed
to be zero.

Note that in the above equations the regression coefficients γ are not
assumed to vary across classes. They therefore have no subscript j to
indicate to which class they belong. Because they apply to all classes,
they are referred to as fixed coefficients.
The Basic Two-Level Regression Model
The model with two pupil-level and one class-level
explanatory variable can be written as a single complex
regression equation by combining the previous equations.
Rearranging terms gives:

Yij = γ00 + γ10 X1ij + γ20 X2ij + γ01 Zj + γ11 X1ij Zj + γ21 X2ij Zj
+ u1j X1ij + u2j X2ij + u0j + eij

Using variable labels instead of algebraic symbols, we have:

popularityij = γ00 + γ10 genderij + γ20 extroversionij +
γ01 experiencej + γ11 genderij × experiencej +
γ21 extroversionij × experiencej + u1j genderij +
u2j extroversionij + u0j + eij
The Basic Two-Level Regression Model
The segment [γ00 + γ10 X1ij + γ20 X2ij + γ01 Zj + γ11 X1ij Zj + γ21 X2ij Zj]
contains the fixed coefficients. It is often called the fixed (or
deterministic) part of the model.

The segment [u1j X1ij + u2j X2ij + u0j + eij] contains the random
error terms, and it is often called the random (or stochastic)
part of the model.

The terms X1ij Zj and X2ij Zj are interaction terms that appear
in the model as a consequence of modeling the varying
regression slope βj of a pupil-level variable Xij with the
class-level variable Zj.

Thus, the moderator effect of Z on the relationship between
the dependent variable Y and the predictor X is expressed in
the single-equation version of the model as a cross-level
interaction.
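That the cross-level interaction terms arise purely from substituting the class-level equations into the pupil-level equation can be checked symbolically. A sketch using SymPy, writing γ as g and dropping the i, j subscripts:

```python
import sympy as sp

# Level-1 predictors, level-2 predictor, coefficients, and error terms
X1, X2, Z = sp.symbols("X1 X2 Z")
g00, g10, g20, g01, g11, g21 = sp.symbols("g00 g10 g20 g01 g11 g21")
u0, u1, u2, e = sp.symbols("u0 u1 u2 e")

# Level-2 equations: each class coefficient depends on teacher experience Z
b0 = g00 + g01 * Z + u0
b1 = g10 + g11 * Z + u1
b2 = g20 + g21 * Z + u2

# Substitute into the level-1 equation Y = b0 + b1*X1 + b2*X2 + e and expand
y = sp.expand(b0 + b1 * X1 + b2 * X2 + e)
print(y)
```

In the expanded expression, the coefficient of X1*Z is g11 and that of X2*Z is g21: the interaction terms appear automatically, without being entered by hand.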
The Basic Two-Level Regression Model
An empty (i.e., lacking predictors) pupil-level model is
specified first:

Yij = β0j + eij

The outcome variable Y for individual i nested in class j is
equal to the average outcome in class j plus an
individual-level error eij.

Because there may also be an effect that is common
to all pupils within the same class, it is necessary to
add a class-level error term. This is done by specifying
a separate equation for the intercept:

β0j = γ00 + u0j

where γ00 is the average outcome for the population and u0j is a
class-specific effect.
The Basic Two-Level Regression Model

Combining the above equations gives:

Yij = γ00 + u0j + eij

Denoting the variance of eij as σ²e and the variance of
u0j as σ²u0, the proportion of the observed variation in the
dependent variable attributable to class-level
characteristics is found by dividing σ²u0 by the total
variance.
The Basic Two-Level Regression Model
The intercept-only model of the equation given
above does not explain any variance in Y. It only
decomposes the variance into two independent
components: σ²e, the variance of the lowest-level
errors eij, and σ²u0, the variance of the highest-level
errors u0j.

Using this model, we can define the intraclass
correlation by the equation:

ρ = σ²u0 / (σ²u0 + σ²e)
The Basic Two-Level Regression Model
The intraclass correlation indicates the proportion of the
variance explained by the grouping structure in the
population.

It simply states that the intraclass correlation is the
proportion of group-level variance compared to the total
variance.

The intraclass correlation can also be interpreted as the
expected correlation between two randomly drawn units
that are in the same group.

A researcher who finds a significant variance component for
σ²u0 may wish to incorporate macro-level variables in an
attempt to account for some of this variation.
Why ICC is useful
1) It can help you determine whether or not a linear mixed
model is even necessary. If you find that the correlation is
zero, the observations within clusters are no more similar
than observations from different clusters, and you can go
ahead and use a simpler analysis technique.

2) It can be theoretically meaningful to understand how much
of the overall variation in the response is explained simply by
clustering.

3) It can also be meaningful to see how the ICC (as well as the
between- and within-cluster variances) changes as variables are
added to the model.
Example: intercept-only model
(consider the popularity data)
Analyze → Mixed Models → Linear
Move class to the Subjects box and click Continue.
Move popular to the Dependent Variable box.
Click Random and check the Include intercept box. Also,
move class to the Combinations box and click Continue.
Click Estimation, check Maximum Likelihood (ML), and click Continue.
Click Statistics, select Parameter estimates and
Tests for covariance parameters, and click Continue.
Finally, click OK.
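The slides use SPSS, but the same intercept-only model can be fitted elsewhere as a cross-check. A sketch with Python's statsmodels, where the data are simulated to resemble the example (the original popularity dataset is not included here, so the variance values 0.7 and 1.2 are invented stand-ins near the estimates reported below; the grouping column is named `klass` only to avoid the Python keyword `class`):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate 100 classes of 20 pupils with class-level variance ~0.7
# and pupil-level variance ~1.2 around a grand mean of 5.0.
n_classes, n_pupils = 100, 20
klass = np.repeat(np.arange(n_classes), n_pupils)
class_effect = rng.normal(scale=np.sqrt(0.7), size=n_classes)
popular = (5.0 + class_effect[klass]
           + rng.normal(scale=np.sqrt(1.2), size=klass.size))
df = pd.DataFrame({"klass": klass, "popular": popular})

# Intercept-only (empty) mixed model with a random intercept per class
model = smf.mixedlm("popular ~ 1", df, groups=df["klass"])
result = model.fit(reml=False)          # ML estimation, as in the SPSS run

var_u0 = result.cov_re.iloc[0, 0]       # class-level variance estimate
var_e = result.scale                    # pupil-level residual variance
icc = var_u0 / (var_u0 + var_e)
print(round(icc, 2))                    # roughly the simulated 0.7/1.9
```

The intercept, the two variance components, and the ICC extracted here correspond to the quantities reported in the SPSS output on the following slides.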
Example: intercept-only model
(some of the outputs)
Information Criteria (smaller is better):
-2 Log Likelihood                     6327.468
Akaike's Information Criterion (AIC)  6333.468
Hurvich and Tsai's Criterion (AICC)   6333.480
Bozdogan's Criterion (CAIC)           6353.270
Schwarz's Bayesian Criterion (BIC)    6350.270

Estimates of Fixed Effects (Dependent Variable: Pupil popularity):
Parameter  Estimate  Std. Error  df      t       Sig.  95% CI (Lower, Upper)
Intercept  5.077859  .086956     99.908  58.396  .000  (4.905339, 5.250379)

Estimates of Covariance Parameters (Dependent Variable: Pupil popularity):
Parameter                             Estimate  Std. Error  Wald Z  Sig.  95% CI (Lower, Upper)
Residual                              1.221794  .039641     30.821  .000  (1.146518, 1.302013)
Intercept [subject = class] Variance  .694460   .106994     6.491   .000  (.513457, .939271)
Example: intercept-only model
The intercept-only model estimates the intercept as 5.08, which is
simply the average popularity across all classes and pupils.

The variance of the pupil-level residual errors, symbolized by σ²e,
is estimated as 1.22. The variance of the class-level residual errors,
symbolized by σ²u0, is estimated as 0.69. All parameter estimates
are significant (P < 0.005).

The ICC for this model is:

ρ = σ²u0 / (σ²u0 + σ²e) = 0.69 / (1.22 + 0.69) = 0.69 / 1.91 = 0.36

Therefore, 36% of the variance of the popularity scores is at the group level.

The deviance is reported as 6327.468. It is a measure of model misfit.
When explanatory variables are added to the model, the deviance is expected
to go down.
Example: Model with explanatory variables
(consider the popularity data)
Analyze → Mixed Models → Linear
Move class to the Subjects box and click Continue.
Move popular to the Dependent Variable box. Also, move
cextrav, csex, and ctexp to the Covariate(s) box.
Click Fixed and move cextrav, csex, and ctexp to the big box under
Model. Make sure Include intercept is ticked.
Click Random and check the Include intercept box. Make sure that
Unstructured is selected as the covariance type. Move csex and cextrav to
the big box under Model. Also, move class to the Combinations box
and click Continue.
Click Estimation, check Maximum Likelihood (ML), and click Continue.
Click Statistics, select Parameter estimates and Tests for
covariance parameters, and click Continue.
Finally, click OK.
Example: Model with explanatory variables
(some of the outputs)
Information Criteria (smaller is better):
-2 Log Likelihood                     4811.402
Akaike's Information Criterion (AIC)  4833.402
Hurvich and Tsai's Criterion (AICC)   4833.534
Bozdogan's Criterion (CAIC)           4906.012
Schwarz's Bayesian Criterion (BIC)    4895.012

Estimates of Fixed Effects (Dependent Variable: Pupil popularity):
Parameter  Estimate  Std. Error  df        t       Sig.  95% CI (Lower, Upper)
Intercept  5.023069  .055954     95.736    89.771  .000  (4.911997, 5.134141)
cextrav    .451432   .024466     96.129    18.451  .000  (.402868, .499996)
csex       1.250717  .037016     2056.780  33.789  .000  (1.178125, 1.323310)
ctexp      .088654   .008497     104.550   10.433  .000  (.071804, .105503)

Estimates of Covariance Parameters (Intercept + csex + cextrav [subject = class];
Dependent Variable: Pupil popularity):
Parameter  Estimate  Std. Error  Wald Z  Sig.  95% CI (Lower, Upper)
Residual   .547243   .018123     30.195  .000  (.512850, .583942)
UN (1,1)   .279165   .045159     6.182   .000  (.203314, .383313)
UN (2,1)   -.031155  .023032     -1.353  .176  (-.076298, .013987)
UN (2,2)   .003885   .000000     (redundant: the test statistic and confidence interval cannot be computed)
UN (3,1)   -.009494  .018750     -.506   .613  (-.046244, .027256)
UN (3,2)   -.002652  .010231     -.259   .796  (-.022704, .017401)
UN (3,3)   .034249   .008499     4.030   .000  (.021058, .055704)

For simplicity, the covariances can be excluded.


Estimates of the variance/covariance parameters when the covariances are removed:

Parameter           Estimate  Std. Error  P-value  95% CI (Lower, Upper)
Residual (eij)      .547243   .018123     < 0.001  (.512850, .583942)
Intercept (u0j)     .279165   .045159     < 0.001  (.203314, .383313)
Sex (u1j)           .003885   .000000     n/a      n/a
Extraversion (u2j)  .034249   .008499     < 0.001  (.021058, .055704)

ICC = σ²u0 / (σ²u0 + σ²e) = .279165 / (.279165 + .547243) = 33.8%
Example: Model with explanatory variables
(Explanations of the outputs)
The model shown in the previous slides includes pupil gender,
pupil extraversion, and teacher experience as explanatory
variables. The regression coefficients for all three variables are
significant (P < 0.001).

The regression coefficient for pupil gender is 1.25. Since pupil
gender is coded 0 = boy, 1 = girl, this means that on average
girls score 1.25 points higher on the popularity measure.

The regression coefficient for pupil extraversion is 0.45, which
means that with each scale point higher on the extraversion
measure, popularity is expected to increase by 0.45 scale
points.

The regression coefficient for teacher experience is 0.09, which
means that for each year of experience of the teacher, the
average popularity score of the class goes up by 0.09 points.
Example: Model with explanatory variables
(Explanations of the outputs)
The same model with the three explanatory variables includes variance
components for the regression coefficients of pupil gender and pupil
extraversion, symbolized by σ²u1 and σ²u2.

The variance of the regression coefficients for pupil extraversion across
classes is estimated as 0.03, with a standard error of 0.008 (P < 0.001).
Therefore, the research hypothesis that the regression slopes for pupil
extraversion vary across classes is supported by the data.

The variance of the regression coefficients for pupil gender is estimated as
0.00. This variance component is clearly not significant, so the
hypothesis that the regression slopes for pupil gender vary across classes
is not supported by the data. Therefore, we can remove the residual
variance term for the gender slopes from the model.

Note that the ICC for this model has decreased from 36% to 33.8%
(only a slight reduction!).
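The final model, with a random intercept and a random slope for extraversion only (the non-significant gender slope variance removed, as suggested above), can also be sketched outside SPSS with statsmodels. The data are simulated here, with parameter values invented to sit near the estimates reported in these slides, and the grouping column is named `klass` to avoid the Python keyword:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Simulated stand-in for the (centered) popularity data
n_classes, n_pupils = 100, 20
klass = np.repeat(np.arange(n_classes), n_pupils)
cextrav = rng.normal(size=klass.size)                 # centered extraversion
csex = rng.integers(0, 2, size=klass.size) - 0.5      # centered gender
ctexp = np.repeat(rng.normal(size=n_classes), n_pupils)  # centered experience

u0 = rng.normal(scale=np.sqrt(0.28), size=n_classes)  # random intercepts
u2 = rng.normal(scale=np.sqrt(0.03), size=n_classes)  # random extraversion slopes

popular = (5.0 + 0.45 * cextrav + 1.25 * csex + 0.09 * ctexp
           + u0[klass] + u2[klass] * cextrav
           + rng.normal(scale=np.sqrt(0.55), size=klass.size))

df = pd.DataFrame({"popular": popular, "cextrav": cextrav,
                   "csex": csex, "ctexp": ctexp, "klass": klass})

# Random intercept plus a random slope for extraversion only
model = smf.mixedlm("popular ~ cextrav + csex + ctexp", df,
                    groups=df["klass"], re_formula="~cextrav")
result = model.fit(reml=False)
print(result.params[["cextrav", "csex", "ctexp"]])
```

With data simulated from these values, the fitted fixed effects land close to the 0.45, 1.25, and 0.09 discussed above, and the extraversion slope variance remains as a random component while the gender slope variance is no longer estimated.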
Example: Model with explanatory variables
(Explanations of the outputs)

The intercept in the empty model is equal to the
overall average popularity across all classes and
pupils, which is 5.08.

The -2 Log Likelihood is the most basic measure for
model selection.

The other four measures are modifications of the log
likelihood which penalize more complex models.
THREE- AND MORE-LEVEL REGRESSION MODELS

In principle, the extension of the two-level
regression model to three and more levels is
straightforward.

There is an outcome variable at the first, the
lowest, level.

In addition, there may be explanatory variables
at all available levels.
