
Application of Structural Equation
Models to Quality of Life

Sik-Yum Lee and Xin-Yuan Song
The Chinese University of Hong Kong, Hong Kong
Suzanne Skevington
University of Bath, Great Britain
Yuan-Tao Hao
Zhongshang University, China
Quality of life (QOL) has become an important concept for health care. As QOL is a
multidimensional concept that is best evaluated by a number of latent constructs, it is
well recognized that latent variable models, such as exploratory factor analysis
(EFA) and confirmatory factor analysis (CFA) are useful tools for analyzing QOL
data. Recently, QOL researchers have realized the potential of structural equation
modeling (SEM), which is a generalization of EFA and CFA in formulating a regres-
sion type equation in the model for studying the effects of the latent constructs to the
QOL or health-related QOL. However, as the items in a QOL questionnaire are usu-
ally measured on an ordinal categorical scale, standard methods in SEM that are
based on the normal distribution may produce misleading results. In this article, we
propose an approach that uses a threshold specification to handle the ordinal categor-
ical variables. Then, on the basis of observed ordinal categorical data, a maximum
likelihood (ML) approach for analyzing CFA and SEM is introduced. This approach
produces the ML estimates of the parameters, estimates of the scores of latent con-
structs, and the Bayesian information criterion for model comparison. The methodol-
ogies are illustrated with a dataset that was obtained from the WHOQOL group.
STRUCTURAL EQUATION MODELING, 12(3), 435–453
Copyright © 2005, Lawrence Erlbaum Associates, Inc.
Requests for reprints should be sent to Sik-Yum Lee, Department of Statistics, The Chinese University
of Hong Kong, Shatin, N.T., Hong Kong. E-mail: sylee@sparc2.sta.cuhk.edu.hk
There is increasing recognition that measures of quality of life (QOL) or health-
related QOL have great value for clinical work and the planning and evaluation of
health care as well as for medical research. It has generally been accepted that
QOL is a multidimensional concept (Staquet, Hayes, & Fayers, 1998) that is best
evaluated by a number of different latent constructs such as physical function,
health status, mental status, and social relationships. As these latent constructs of-
ten cannot be measured objectively and directly, they are treated as latent variables
in QOL analyses. The most popular method used to assess a latent construct is a
survey that incorporates a number of related items that are intended to reflect the
underlying latent construct of interest. Hence, QOL questionnaires often contain a
number of items that are treated as observed (or manifest) variables. For example,
the WHOQOL-BREF assessment (The WHOQOL Group, 1998a) contains 24
items for measuring four latent constructs of QOL.
Exploratory factor analysis (EFA) is a statistical method that is used to group to-
gether the items (manifest variables) that are related to a particular latent construct
(factor) but relatively uncorrelated with other latent constructs. It has been used as
a method for exploring the structure of a new QOL instrument (Fayers & Machin,
1998; The WHOQOL Group, 1998a). Confirmatory factor analysis (CFA) is a nat-
ural extension of EFA that allows the identified latent constructs to be correlated
and any parameter to be fixed at a preassigned value. This model has been used to
confirm the factor structure of the instrument. Hence, hypothesis testing or model
comparison among nested and nonnested models is a basic issue in CFA. It is im-
portant to realize that latent constructs in a CFA model are never regressed on the
other latent constructs. The basic goal of generalizing CFA to structural equation
modeling (SEM; Bentler, 1992; Bollen, 1989; Jöreskog & Sörbom, 1996) is to add
a component for regressing the endogenous latent construct on the exogenous latent
constructs. Hence, the causal effects among the latent constructs can be analyzed.
Pointing out some weaknesses of EFA, Fayers and Hand (1997) argued that SEM
may have great potential in QOL research. In fact, SEM methods based on normal
theory have recently been applied to QOL analyses (Meuleners, Lee, Binns, & Lower,
2003; Power, Bullinger, & Harper, 1999). However, as the factor score estimates
that are obtained by the standard regression method (Bentler, 1992; Bollen, 1989)
have several deficiencies (see, e.g., Fayers & Machin, 1998), they are rarely used
as a method of deriving outcome scores.
Items in a QOL instrument are usually measured on an ordinal categorical
scale, typically with 3 to 5 points. It is well known in the psychometric and statistics
literature that ignoring the discrete ordinal nature and treating the data as continuous
leads to erroneous results (Lee, Poon, & Bentler, 1990; Olsson, 1979; Poon
& Lee, 1987). Consequently, a number of statistical methods, such as polychoric
and polyserial correlations, logit and probit models, and ordinal regression, have
been developed for achieving correct analyses. The discrete ordinal nature of the
items has also drawn much attention in QOL analyses (Fayers & Hand, 1997; Fayers &
Machin, 1998). It has been pointed out that nonrigorous treatment of the ordinal
items as continuous can be subjected to criticism (Glonek & McCullagh, 1995),
and models such as the item response model and ordinal regression that take into
account the ordinal nature are more appropriate (Lall, Campbell, Walters, Morgan,
& MRC CFAS Co-operative, 2002; Olschewski & Schumacker, 1990).
The aim of this article is to introduce recently developed methods in SEM with
ordinal categorical variables for analyzing common QOL instruments with ordinal
categorical items that can be missing at random (MAR; Little & Rubin, 1987). The
objectives are: (a) to determine the measurement properties of the latent constructs
underlying the QOL instrument, and to illustrate the method by using part of the
data that are obtained from the ordinal categorical items of the WHOQOL-BREF
assessment (The WHOQOL Group, 1998a); (b) to assess intercorrelations among
the established latent constructs; (c) to provide a more sensible estimate of latent
construct scores (factor scores); (d) to establish a structural equation model with a
regression type structural equation for assessing the causal effects of the latent
constructs on the overall QOL; and (e) to introduce a model comparison method
that can be applied to compare nested or nonnested models.
BACKGROUND
Most QOL items are not only measured using the discrete ordinal scale, but are
also highly skewed; see Fayers and Machin (1998) for a good example in relation
to the Hospital Anxiety and Depression Scale. Hence, distributions of the items are
nonnormal, and the results produced by the normal theory maximum likelihood
(ML) approach may be misleading. The basic idea of the multivariate probit model
approach or the polychoric correlation approach is relating the ordinal categorical
item to an underlying continuous normal distribution through a threshold specifi-
cation. For example, let z be an ordinal categorical variable that corresponds to an
ordinal item that measures, say, pain with a 4-point scale (1–4). Suppose that for a
given dataset, the proportions of 1, 2, 3, and 4 are .05, .15, .30, and .50, respectively
(see the histogram in Figure 1a, in which most responses fall in the highest categories,
so that the distribution is highly skewed). Clearly, the observations of $z$ cannot be
regarded as coming from a normal distribution. However, they can be treated as
manifestations of an underlying normal variable $y$ that is directly related to pain
but exact continuous measurements of which are not available due to the design of
the discrete ordinal categorical scale in the questionnaire.
The relation of $z$ and $y$ is defined as follows: for $k = 1, 2, 3, 4$,

$$z = k \quad \text{if} \quad \alpha_{k-1} < y \le \alpha_k, \tag{1}$$

where $-\infty = \alpha_0 < \alpha_1 < \alpha_2 < \alpha_3 < \alpha_4 = \infty$, and $\alpha_1$, $\alpha_2$, and $\alpha_3$ are unknown threshold
parameters. Note that $\alpha_2 - \alpha_1$ can be different from $\alpha_3 - \alpha_2$; hence unequal-interval
scales are allowed. For example, the ordinal categorical observations that give the
histogram in Figure 1a can be captured by a standard normal distribution, $N(0, 1)$,
with appropriate thresholds (see Figure 1b). For a random vector $\mathbf{z} = (z_1, \ldots, z_p)$ of
ordinal categorical items, the distribution of the underlying continuous random
vector $\mathbf{y} = (y_1, \ldots, y_p)$ is multivariate normal with a correlation structure. Depending
on the nature of the real problem, the correlation structure can be a particular CFA
or a structural equation model. In this approach, a model is proposed for the latent
random vector $\mathbf{y}$, and a rigorous analysis is conducted on the basis of the observed
dataset $\mathbf{z}_1, \ldots, \mathbf{z}_n$ of ordinal categorical responses that are not assumed to have a
continuous distribution.
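The threshold specification can be made concrete with the hypothetical proportions quoted above. The following is a minimal sketch, assuming a standard normal underlying variable as in the Figure 1b example; the function name `thresholds_from_proportions` and the symbol `alpha` are illustrative choices, not part of the original analysis. Each interior threshold is the standard normal quantile of the cumulative proportion below it.

```python
from statistics import NormalDist

# Category proportions of the hypothetical 4-point item quoted in the text.
proportions = [0.05, 0.15, 0.30, 0.50]


def thresholds_from_proportions(props):
    """Return the interior thresholds that make a N(0, 1) variable
    reproduce the given category proportions (illustrative helper)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    cumulative = 0.0
    cuts = []
    for p in props[:-1]:  # the last cut point is +infinity
        cumulative += p
        cuts.append(std_normal.inv_cdf(cumulative))
    return cuts


alpha = thresholds_from_proportions(proportions)
print([round(a, 3) for a in alpha])
```

Under these assumptions the three thresholds come out at roughly -1.64, -0.84, and 0, so the gaps between adjacent thresholds are unequal, which is the unequal-interval property noted above.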
As the probability density function of z involves a complicated integral of high
dimension, the statistical analysis is nontrivial. Multistage estimation methods for
structural equation models that are based on the polychoric correlations have been
proposed (Lee, Poon, & Bentler, 1995). Recently, the optimal ML approach (Shi &
Lee, 2000) and the Bayesian approach (Lee & Zhu, 2000) have been developed using
powerful tools in statistical computing such as the Monte Carlo EM algorithm
(Wei & Tanner, 1990) and the Gibbs sampler (Geman & Geman, 1984). It has been
shown (Lee, Song, & Lee, 2003; Song & Lee, 2002) that these computing tools are
efficient and able to handle data that are MAR. The analysis presented here is
based on a structural equation model with the ML approach.
FIGURE 1 (a) Histogram of a hypothetical ordinal categorical dataset; (b) The underlying
normal distribution with a threshold specification.
METHODS
Instrument and Data
The WHOQOL-100 (The WHOQOL Group, 1995, 1998b) assessment was developed
by the WHOQOL Group in 15 international field centers. As this assessment
may be too lengthy for some uses, the WHOQOL-BREF (The WHOQOL Group,
1998a) instrument was derived from it by selecting 24 ordinal categorical items out
of the 100 items. It has been shown that the WHOQOL-BREF is highly correlated
with the WHOQOL-100 and is useful to health professionals (The WHOQOL
Group, 1998a). This instrument was established to evaluate four latent domains. The
first 7 items are intended to address physical health, the next 6 items are intended to
address psychological QOL, the 3 items that follow are for social relationships, and
the last 8 items are intended to address the environment. In addition to the 24 items
attributed to domains, the instrument also includes 2 items for the overall QOL and
general health, giving a total of 26 items. All of the items are measured with a 5-point
scale; for example, 1 (not at all/very dissatisfied), 2 (a little/dissatisfied), 3 (moderate/neither),
4 (very much/satisfied), and 5 (extremely/very satisfied).
In this article, only data that were obtained from the United Kingdom with a
sample size n = 475 are used. This dataset contains a number of incomplete
observations, which, for simplicity, were assumed to be MAR. The frequencies of all
the items and the number of incomplete observations are presented in Table 1. As can
be seen from the table, the responses to several items were concentrated in the
highest categories, and hence their distributions are highly skewed. Examples of
these items are Q1, Q4, Q6, Q8, Q16, Q18, Q24, Q25, and Q26. Treating these data as
coming from a normal distribution is not recommended. There were a number of
missing observations, especially for Q17, which is related to the question about
sexual activity. The number of fully observed cases is 416. In contrast
to the listwise deletion approach, incomplete observations with missing entries
were included in the statistical analysis.
Statistical Analysis
Usually, QOL analyses start with an EFA model to identify the appropriate number
of latent constructs that are likely to be present in an instrument. The conclusion
that is drawn from the EFA analysis is then confirmed via CFA, by testing the
goodness-of-fit of the conclusion to the data. For instruments that are specifically
designed to have certain items relating to a particular latent domain, such as the
WHOQOL-BREF instrument (The WHOQOL Group, 1998a), there is sufficient
prior knowledge to suggest that the instrument can be represented by some latent
domains or factors. Hence, statistical analyses can be started with a CFA model
with a given number of factors or a structural equation model with a specific path
diagram (see the examples in Bollen, 1989; Fayers & Hand, 1997; Fayers &
Machin, 1998). Based on a posited model, the first basic statistical analysis is to
obtain good estimates of the unknown parameters, and then on the basis of the esti-
mates and a goodness-of-fit statistic, evaluate the fit of the posited model to the
sample data. Finally, it is desirable to have a method with which to choose a better
model from a number of competing models.
TABLE 1
Frequencies of the Ordinal Scores of the Items
                                             Ordinal Scores          Number of Incomplete
WHOQOL Items                              1    2    3    4    5          Observations
Q1 Overall QOL                            3   41   90  233  107               1
Q2 Overall health                        32  127  104  154   58               0
Q3 Pain and discomfort                   21   65  105  156  127               1
Q4 Medical treatment dependence          21   57   73   83  239               2
Q5 Energy and fatigue                    15   57  166  111  118               8
Q6 Mobility                              16   36   58  120  243               2
Q7 Sleep and rest                        28   87   95  182   83               0
Q8 Daily activities                       7   73   70  224  100               1
Q9 Working capacity                      19   83   88  191   91               3
Q10 Positive feelings                     2   30  141  241   59               2
Q11 Spirituality/personal beliefs        13   45  149  203   61               4
Q12 Memory and concentration              4   40  222  184   21               4
Q13 Bodily image and appearance           9   46  175  137  106               2
Q14 Self-esteem                          13   72  130  210   50               0
Q15 Negative feelings                     4   54  137  239   39               2
Q16 Personal relationships                8   46   68  218  134               1
Q17 Sex                                  25   55  137  149   76              33
Q18 Social support                        2   23   84  228  136               2
Q19 Physical safety and security          2   25  193  191   62               2
Q20 Physical environment                  4   29  187  206   43               6
Q21 Financial resources                  27   56  231  105   54               2
Q22 Information and skills                5   27  176  194   70               3
Q23 Recreation and leisure               10   99  156  163   47               0
Q24 Home environment                      9   27   53  235  151               0
Q25 Health accessibility and quality      0   17   75  321   61               1
Q26 Transport                             8   38   61  253  113               2
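The frequencies in Table 1 can be checked for internal consistency and for the nonnormality discussed in the text. The sketch below copies three rows from the table and computes, for each item, the total count (observed plus incomplete, which should equal n = 475) and the sample skewness of the scores; the helper name `moments` and the choice of items are illustrative.

```python
# Frequencies of scores 1..5 and number of incomplete observations,
# copied from Table 1 for three illustrative items.
TABLE1_ROWS = {
    "Q6 Mobility": ([16, 36, 58, 120, 243], 2),
    "Q17 Sex": ([25, 55, 137, 149, 76], 33),
    "Q25 Health accessibility and quality": ([0, 17, 75, 321, 61], 1),
}
SAMPLE_SIZE = 475


def moments(freqs):
    """Return (n, mean, skewness) of an item from its score frequencies."""
    n = sum(freqs)
    scores = range(1, len(freqs) + 1)
    mean = sum(s * f for s, f in zip(scores, freqs)) / n
    m2 = sum(f * (s - mean) ** 2 for s, f in zip(scores, freqs)) / n
    m3 = sum(f * (s - mean) ** 3 for s, f in zip(scores, freqs)) / n
    return n, mean, m3 / m2 ** 1.5


summary = {}
for item, (freqs, n_incomplete) in TABLE1_ROWS.items():
    n_observed, mean, skewness = moments(freqs)
    total = n_observed + n_incomplete
    summary[item] = (total, mean, skewness)
    print(f"{item}: total={total}, mean={mean:.2f}, skewness={skewness:.2f}")
```

For these three items the skewness coefficient is negative, reflecting responses concentrated in the upper categories with a tail toward the low scores.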
Estimates of the unknown parameters are obtained on the basis of the ordinal
categorical data that are correctly treated by an underlying multivariate normal
distribution with a threshold specification. As the approach that is based on the polychoric
correlation cannot be applied to large dimensional problems (e.g., EQS [Bentler,
1992] only allows a maximum of 20 ordinal categorical variables), approaches that
use advanced tools in statistical computing have been recommended (Lee & Zhu,
2000; Shi & Lee, 2000). We analyzed the ordinal categorical data with the
expectation-maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977). The EM
algorithm is perhaps the best known tool for analyzing missing data and latent
constructs in statistical computing, and it has been extensively used for estimating many
important models in biostatistics (see Bock & Gibbons, 1996; Laird & Ware, 1982,
among others), psychometrics (see Bock & Aitkin, 1981; Rubin, 1991, among others),
and statistics (see Meng & van Dyk, 1997; Shi & Copas, 2002, among others).
This algorithm is particularly useful for analyzing CFA or structural equation models
by treating the latent constructs as hypothetical missing data in the model (Jedidi,
Jagpal, & DeSarbo, 1997; Lee & Tsang, 1999; Rubin & Thayer, 1982; Song, Lee, &
Zhu, 2001). In this article, ML estimates of the unknown parameters in the CFA or
structural equation model with the incomplete ordinal categorical data are obtained
on the basis of the EM procedure that is given in Lee, Song, and Lee (2003), Jedidi et
al. (1997), and Song et al. (2001). This procedure has the following advantages over
the multistage approaches (Lee et al., 1995) that are based on the polychoric
correlations: (a) It gives the ML estimate, which is not only optimal but also useful in
the computation of the Bayesian information criterion (BIC) for model comparison;
(b) it is more efficient and can handle problems with a large number of ordinal
categorical items (variables); (c) it produces estimates of the latent construct scores
(factor scores) as by-products; and (d) it can handle missing data much more efficiently.
A brief description of this procedure is given in the appendix.
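One ingredient that simulation-based E steps for ordinal data commonly rely on is drawing the underlying continuous variable from a normal distribution truncated to the interval implied by the observed category. The sketch below illustrates only that single step under simplifying assumptions (a univariate variable with unit variance and known thresholds); it is not the procedure of the appendix, and the function name `draw_latent` and its arguments are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_latent(z, thresholds, mean=0.0, rng=random):
    """Draw y ~ N(mean, 1) restricted to the interval implied by category z.

    `thresholds` holds the interior cut points; category k (1-based) maps to
    the interval between cut k-1 and cut k, with -inf and +inf at the ends.
    """
    cuts = [float("-inf")] + list(thresholds) + [float("inf")]
    lower, upper = cuts[z - 1], cuts[z]
    # Inverse-CDF sampling: draw a uniform value between the CDF values of
    # the two interval ends, then map it back through the normal quantile.
    p_lo = STD_NORMAL.cdf(lower - mean)
    p_hi = STD_NORMAL.cdf(upper - mean)
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return mean + STD_NORMAL.inv_cdf(u)


# Usage: a respondent who answered category 3 on an item whose thresholds
# are (approximately) those of the hypothetical example in the Background.
example_rng = random.Random(2005)
y_draw = draw_latent(3, [-1.645, -0.842, 0.0], rng=example_rng)
print(round(y_draw, 3))
```

In a full Monte Carlo EM or Gibbs scheme, draws of this kind would be alternated with draws of the latent constructs given the current parameter values; the multivariate and conditional details are omitted here.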
Assessing the goodness-of-fit of a posited model is an important issue in applying
SEM to QOL analysis. This is usually done by testing a null hypothesis that
specifies that the proposed model fits the sample data. The traditional approach is to
use a significance test on the basis of p values that are determined by the asymptotic
$\chi^2$ distribution of the likelihood ratio test (Bentler, 1992; Jöreskog & Sörbom,
1996). As pointed out in the statistics literature (Berger & Delampady, 1987; Kass
& Raftery, 1995), there are some difficulties with this approach. For instance, the p
value is only a measure of evidence against the null hypothesis; hence, not rejecting
the null hypothesis does not mean the null hypothesis is better than the alternative
hypothesis. Moreover, it cannot be used to test nonnested hypotheses. Hence,
methods that do not have these difficulties are proposed in statistics as well as in
SEM. One common approach is to consider the comparison of two nonnested competing
models, say $M_1$ and $M_2$ (when applying to hypothesis testing, $M_1$ and $M_2$
represent two hypotheses, say $H_1$ and $H_2$, respectively). In particular, the following
BIC has been widely applied in the field of SEM (Lee & Song, 2001, 2003a;
Raftery, 1993). Given an observed dataset $\mathbf{Z}$ that may or may not involve incomplete
data,

$$\mathrm{BIC}_{12} = -2\left[\log L_0(\mathbf{Z}; \hat{\theta}_1, M_1) - \log L_0(\mathbf{Z}; \hat{\theta}_2, M_2)\right] + (d_1 - d_2)\log n, \tag{2}$$

where $\hat{\theta}_k$ is the ML estimate of the unknown parameter vector under $M_k$, $d_k$ is the
number of the parameters in $\theta_k$, $n$ is the sample size, and $\log L_0(\mathbf{Z}; \hat{\theta}_k, M_k)$ is the
log-likelihood function of the observed dataset $\mathbf{Z}$ evaluated at $\hat{\theta}_k$ under $M_k$. According
to Kass and Raftery (1995; see also Raftery, 1993), $\mathrm{BIC}_{12}$ is commonly applied to model
comparison as follows: It gives evidence of support to $M_1$ (hence accepting $H_1$) if
$\mathrm{BIC}_{12} < 0$; it gives evidence of support or strong evidence of support to $M_2$
(accepting $H_2$) if $\mathrm{BIC}_{12} > 2$ or $\mathrm{BIC}_{12} > 6$, respectively; and when $M_1$ is nested in $M_2$,
it cannot give a definite conclusion if $0 < \mathrm{BIC}_{12} < 2$. For structural equation models
with fully observed data that come from the normal distribution, the log-likelihood
function values in $\mathrm{BIC}_{12}$ can be obtained directly using LISREL (Jöreskog &
Sörbom, 1996) or EQS (Bentler, 1992); hence, the computation is easy. For structural
equation models with incomplete ordinal categorical data, the log-likelihood
cannot be directly obtained. In this article, it was computed via path sampling
(Gelman & Meng, 1998), a well-known technique in statistical computing that has
been successfully applied to structural equation models (Lee & Song, 2003a,
2003b).
RESULTS
CFA
To study the measurement properties of the QOL instrument, we only used the data
that corresponded to the 24 ordinal items that reflected the latent constructs. We
proposed a CFA model with four correlated factors for assessing the domains:
physical health, psychological QOL, social relationships, and environment. According
to the meaning of the questions, the structure of the proposed CFA is displayed
by the path diagram in Figure 2 (see Fayers & Hand, 1997 for a good description
of a path diagram). This CFA model defined a specific covariance
structure for fitting the data. To assess its goodness-of-fit to the sample data, we
regarded it as $M_2$ and compared it with the general model $M_1$, which has a general
nonstructured covariance matrix. In covariance structure analysis terminology, $M_1$
is known as a saturated model. ML estimates of the parameters in the competing
models were obtained by the EM algorithm, and $\mathrm{BIC}_{12}$ was computed via
Equation 2. The value of $\mathrm{BIC}_{12}$ is 687.0. According to the criterion given in
Raftery (1993), the proposed CFA model is definitely selected. Hence, the observed
data give strong evidence of support to the CFA model with four correlated
factors. This conclusion is more meaningful and stronger than the conclusion
obtained from the common asymptotic chi-square test.
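The decision rule attached to Equation 2 can be sketched in a few lines. The code below assumes that Equation 2 has the form BIC_12 = -2[log L_0(Z; theta_1, M_1) - log L_0(Z; theta_2, M_2)] + (d_1 - d_2) log n, which is the form consistent with the stated rule (positive values favor M_2). The function names and the log-likelihood values in the first usage line are hypothetical; only the value 687.0 and the sample size 475 come from the text.

```python
import math


def bic_12(loglik_1, loglik_2, d_1, d_2, n):
    """BIC for comparing M1 with M2 under the assumed form of Equation 2."""
    return -2.0 * (loglik_1 - loglik_2) + (d_1 - d_2) * math.log(n)


def interpret(bic):
    """Map a BIC_12 value to the verbal scale described in the text."""
    if bic < 0:
        return "supports M1"
    if bic > 6:
        return "strong support for M2"
    if bic > 2:
        return "supports M2"
    return "inconclusive"


# Hypothetical log-likelihoods and parameter counts, for illustration only.
print(round(bic_12(-100.0, -90.0, 10, 12, 475), 3))

# The reported value for the saturated model (M1) versus the CFA model (M2).
print(interpret(687.0))
```

The penalty term (d_1 - d_2) log n is what allows a restricted model such as the CFA to be preferred over the saturated model even though the saturated model always attains the higher likelihood.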
To give a further illustration of the BIC application, we compared the selected
CFA model $M_2$ with $M_3$, which is a similar CFA model with four factors that are
uncorrelated. We found that $\mathrm{BIC}_{32} = 378.25$, and hence $M_2$ should be selected.
That means that a CFA model with correlated factors can fit the data much better.
The completely standardized (see Jöreskog & Sörbom, 1996) ML estimates of
the factor loadings and factor correlations in the selected CFA model with four
correlated factors are given in Figure 2. To save space, threshold parameter estimates
are not presented. On the basis of the meaning of the questions, the latent factors
were interpreted as physical health, $\xi_1$; psychological QOL, $\xi_2$; social relationship,
$\xi_3$; and environment, $\xi_4$. This interpretation agrees with that given by the
WHOQOL Group (1998a). All of the factor loading estimates, except for that
corresponding to sexual activity and social relationship, which is 0.126, were high.
This indicates a strong association between each of the latent constructs (factors)
and its respective items.
FIGURE 2 Path diagram and completely standardized ML estimates of the parameters in the
CFA model. In all diagrams, double arrows are the factor correlations.
Each $\mathbf{z}_i$ ($\mathbf{y}_i$) in the factor analysis model was associated with a vector of factor
scores, $\boldsymbol{\xi}_i = (\xi_{1i}, \xi_{2i}, \xi_{3i}, \xi_{4i})$. The proposed estimation procedure gave the factor score
estimates, $\hat{\boldsymbol{\xi}}_i$, $i = 1, \ldots, n$, as by-products. Histograms that were obtained from
$\{\hat{\xi}_{ki}, i = 1, \ldots, n\}$, $k = 1, 2, 3, 4$, are presented in Figure 3. It is evident that the empirical
distributions of the factor scores were close to normal with zero mean values. The
sample correlations between the factor pairs $(\xi_1, \xi_2)$, $(\xi_1, \xi_3)$, $(\xi_1, \xi_4)$, $(\xi_2, \xi_3)$, $(\xi_2, \xi_4)$,
and $(\xi_3, \xi_4)$ that were obtained from this sample of factor score estimates were
0.74, 0.55, 0.92, 0.70, 0.87, and 0.84, respectively. Although
these correlation estimates were generally larger than the ML estimates of the
correlations given in Figure 2, the differences were not substantial. Hence, these factor
score estimates can be used for further statistical inference.
FIGURE 3 Histogram of the factor score estimates.
Structural Equation Models
To assess the causal effects of the four identified latent constructs on the health-related
QOL, we applied SEM to the dataset that additionally contained the overall QOL and
general health items. We first studied the relation of QOL and the latent constructs
$\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$ by means of SEM. The path diagram of the proposed structural equation
model for analyzing the overall QOL with the 24 ordinal categorical items is
displayed in Figure 4. This structural equation model, which is called $M_4$, defines a
covariance structure for the 25 ordinal categorical items. To assess its
goodness-of-fit to the sample data, it was compared with $M_5$, the general saturated model
with 25 observed variables that had a general nonstructured covariance matrix. We
found that $\mathrm{BIC}_{54} = 492.58$. This result clearly indicated that $M_4$ is much better than
the saturated model. ML estimates of the parameters in $M_4$ are also presented in
Figure 4. As the structural equation model was different from the previous CFA model,
the estimates of the factor correlations among the latent constructs $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$
were not exactly the same as those obtained on the basis of the previous CFA model.
However, as expected, their differences were very minor (in the third decimal place;
hence, some estimates in Figure 4 appeared to be the same as those in Figure 2). This
is also true for the factor loading estimates, although the differences were larger. The
most important additional result that can be obtained using a structural equation
model is the following estimated structural equation in relation to the endogenous
latent construct QOL ($\eta$) and the exogenous latent constructs $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$:

$$\eta = 0.46\,\xi_1 + 0.24\,\xi_2 + 0.21\,\xi_3 + 0.13\,\xi_4 + \delta, \tag{3}$$

with an estimated residual variance of 0.19. This estimated structural equation
indicates that physical health ($\xi_1$) had the most substantial causal effect on QOL ($\eta$),
which was larger than the causal effects of psychological QOL ($\xi_2$) and social
relationship ($\xi_3$), which were in turn larger than the causal effect of environment ($\xi_4$).
It is also interesting to observe that the causal effects of psychological QOL and social
relationship were basically the same. Moreover, it is reassuring that the estimated
residual variance was not large.
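As a rough plausibility check on the estimated structural equation for the overall QOL, the variance explained by the four latent constructs can be approximated from the reported coefficients. The sketch below rests on two loudly stated assumptions: the latent constructs have unit variance (the completely standardized metric), and the sample correlations of the factor score estimates reported in the CFA section stand in for the ML factor correlations, which appear only in the figures. The function name `explained_variance` is illustrative.

```python
# Coefficients of the estimated structural equation for the overall QOL
# and its reported residual variance.
gamma = [0.46, 0.24, 0.21, 0.13]
residual_variance = 0.19

# ASSUMPTION: factor-score sample correlations from the CFA section are used
# as a proxy for the ML factor correlations (indices 0..3 = constructs 1..4).
corr = {
    (0, 1): 0.74, (0, 2): 0.55, (0, 3): 0.92,
    (1, 2): 0.70, (1, 3): 0.87, (2, 3): 0.84,
}


def explained_variance(coefs, correlations):
    """Variance of the linear combination of unit-variance correlated factors."""
    total = sum(c * c for c in coefs)
    for (i, j), r in correlations.items():
        total += 2.0 * coefs[i] * coefs[j] * r
    return total


explained = explained_variance(gamma, corr)
implied_total = explained + residual_variance
print(round(explained, 3), round(implied_total, 3))
```

Under these assumptions the explained variance is about 0.88 and the implied total variance about 1.07, close to the unit variance expected in a standardized solution; the small excess is in line with the remark that the factor-score correlations are somewhat larger than the ML estimates.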
Furthermore, we investigated the relation of the health-related QOL (HRQOL,
$\eta$) with the latent constructs. The latent construct HRQOL was formed by the
overall QOL item and the general health item. The path diagram of the proposed
structural equation model, which is called $M_6$, is displayed in Figure 5. To assess the
goodness-of-fit of $M_6$, we again compared it with the saturated model $M_7$ with 26
observed variables. We found that $\mathrm{BIC}_{76} = 401.35$, which clearly indicated that the
proposed model $M_6$ was much better than the saturated model. The ML estimates
of the parameters in $M_6$ are presented in Figure 5. We also observed that the behavior
of the factor loading and factor correlation estimates was similar to that in the
analysis of $M_4$. Now, the estimated structural equation that addresses the causal
effects of the latent constructs $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$ on HRQOL ($\eta$) is:

$$\eta = 0.68\,\xi_1 + 0.28\,\xi_2 + 0.08\,\xi_3 + 0.05\,\xi_4 + \delta, \tag{4}$$

with an estimated residual variance of 0.08. Hence, for the health-related QOL,
which involves an additional component on the general health of the patients
compared with the single overall QOL, only physical health ($\xi_1$) and psychological
QOL ($\xi_2$) had substantial causal effects, whereas social relationship ($\xi_3$) and
environment ($\xi_4$) were less important.
FIGURE 4 Path diagram and completely standardized ML estimates of parameters in the
structural equation model for the overall QOL.
DISCUSSION
In QOL or HRQOL analysis, the fundamental aim of factor analysis and SEM is to
find an appropriate model for assessing the relations among the observed items in
the questionnaire and the underlying endogenous and exogenous latent constructs.
EFA and CFA models are used to explore and confirm the appropriate number of
latent domains for explaining the items. A main objective of this article is to apply
SEM to study the causal relations of the latent constructs to the QOL or the
HRQOL.
FIGURE 5 Path diagram and completely standardized ML estimates of parameters in the
structural equation model for the HRQOL.
A slightly different approach using a hierarchical measurement model, which is
basically a second-order factor analysis model, may also be considered. In this
approach, a new latent variable $\eta^*$ for addressing QOL is constructed from the
latent factors (physical health, mental health, social relationships, and environment).
This $\eta^*$ can be compared with $\eta$ in Figure 4, which is directly obtained from the
overall QOL (Q1). However, to avoid confusing the SEM approach with a second-order
factor analysis approach, detailed analyses are not included here.
As the observed ordinal categorical observations are clearly nonnormal, more
rigorous statistical methods that take into account the discrete nature of the ob-
served data may be preferable to the standard methods in the commonly used
packages. This article introduces the ML approach for analyzing structural
equation models with ordinal categorical data that can be MAR. We utilized re-
cently developed tools in statistical computing to obtain the results. It is not too
difficult to implement computer programs to obtain these results. In fact, the
software BUGS (Gilks, Thomas, & Spiegelhalter, 1994, freely available at
www.mrc-bsu.com.ac.uk/bugs), can be utilized to complete the E step of the
proposed EM algorithm. However, as far as we know, there is no user-friendly
software for obtaining the ML solution. Admittedly, this is a weakness of the
rigorous ML approach. In behavioral, educational, medical, and social-psychological
research, the use of recently developed tools in statistical computing such
as the EM algorithm and Markov chain Monte Carlo (MCMC) methods is
becoming more and more popular, mainly because they are very helpful in solving
complex problems. Due to the existing demand, we expect that user-friendly
software will soon be developed. In the meantime, we are willing to share our
program with others. For the general problem of assessing QOL, an alternative
approach is to provide interval response scales in the questionnaires (Skevington
& Tucker, 1999).
In this article, we assume for brevity that missing data are MAR with an
ignorable mechanism. The observed-data log-likelihood is $L_0(\mathbf{Z}_{obs}; \theta)$. For the
missing data associated with the sensitive question Q17, it may be better to analyze
them with a nonignorable missing mechanism. To consider this more general situation,
we need to model the matrix of missing indicators, $\mathbf{R}$, via a distribution with
a vector of unknown parameters, say $\varphi$. The observed-data log-likelihood
$L_0^*(\mathbf{Z}_{obs}, \mathbf{R}; \theta, \varphi)$ is more complicated. The EM algorithm can be applied to
$L_0(\mathbf{Z}_{obs}; \theta)$ or $L_0^*(\mathbf{Z}_{obs}, \mathbf{R}; \theta, \varphi)$. However, the treatment of $L_0^*$ is much
more technical. Hence we regard it as a topic for future research.
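The matrix of missing indicators mentioned above, and the contrast between listwise deletion and retaining incomplete cases, can be illustrated on a toy dataset. The responses below are hypothetical, the coding of R (1 for a missing entry, 0 for an observed one) is an assumed convention, and the helper names are illustrative.

```python
# Hypothetical responses of four respondents to three 5-point items;
# None marks a missing entry.
responses = [
    [4, 5, 3],
    [2, None, 4],
    [5, 4, None],
    [3, 3, 3],
]


def missing_indicators(data):
    """Return R with R[i][j] = 1 if entry (i, j) is missing, else 0."""
    return [[1 if value is None else 0 for value in row] for row in data]


def observed_part(row):
    """Return the (item index, score) pairs that were actually observed."""
    return [(j, value) for j, value in enumerate(row) if value is not None]


R = missing_indicators(responses)

# Listwise deletion keeps only rows without any missing entry.
complete_cases = [row for row, r in zip(responses, R) if sum(r) == 0]

# An observed-data analysis keeps every row and uses its observed part.
observed_data = [observed_part(row) for row in responses]

print(R)
print(len(complete_cases), "complete cases out of", len(responses))
print(sum(len(obs) for obs in observed_data), "observed entries retained")
```

Under an ignorable mechanism only the observed parts enter the likelihood and R can be left unmodeled; a nonignorable analysis would additionally specify a distribution for R.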
It is well known that QOL and HRQOL are complicated concepts, and the un-
derlying observed data are very complex in general. Generalizations of the stan-
dard structural equation model, for example, the nonlinear structural equation
models (Lee & Song, 2003b; Lee, Song, & Poon, 2004), multilevel structural
equation models (Song & Lee, 2004a; Lee & Song, 2004), diagnostic analysis
(Lee & Lu, 2004; Song & Lee, 2004b; Lee & Xu, 2004), and finite mixtures of
structural equation models (Lee & Song, 2003a; these are generalizations of
multiple-group structural equation models), appear to have the potential to provide
rigorous statistical analysis of these complicated concepts in a variety of situations.
ACKNOWLEDGMENTS
This research was supported by Hong Kong UGC Earmarked grant 4243/03 H. We
are thankful to Professor M. Power for helpful comments. We are solely responsi-
ble for any mistakes.
REFERENCES
Bentler, P. M. (1992). EQS: Structural equation programmanual. Los Angels: BMDP Statistical Soft-
ware.
Berger, J. O., & Delampady, M. (1987). Testing precise hypothesis. Statistical Science, 3, 317352.
Bock, R. D., & Aitkin M. (1981). Marginal maximum likelihood estimation of item parameters: Appli-
cation of an EM algorithm. Psychometrika, 46, 443461.
Bock, R. D., & Gibbons, R. D. (1996). High dimensional multivariate probit analysis. Biometrics, 52,
11831193.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via
the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 138.
Fayers, P. M., &Hand, D. J. (1997). Factor analysis, causal indicators and quality of life. Quality of Life
Research, 8, 139150.
Fayers, P. M., & Machin, D. (1998). Factor analysis. In M. J. Staquet, R. D. Hayes, & P. M. Fayers
(Ed.), Quality of life assessment in clinical trials (pp. 191223). NewYork: Oxford University Press.
Gelman, A., & Meng, X. L. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13, 163–185.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. Statistician, 43, 169–177.
Glonek, G. F. V., & McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society, Series B, 57, 533–546.
Jedidi, K., Jagpal, H. S., & DeSarbo, W. S. (1997). STEMM: A general finite mixture structural equation model. Journal of Classification, 14, 23–50.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago: Scientific Software International.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Laird, N. M., & Ware, J. H. (1982). Random effects models for longitudinal data. Biometrics, 38, 963–974.
Lall, R., Campbell, M. J., Walters, S. J., Morgan, K., & MRC CFAS Co-operative. (2002). A review of ordinal regression models applied on health-related quality of life assessments. Statistical Methods in Medical Research, 11, 49–67.
Lee, S. Y., & Lu, B. (2004). Case-deletion diagnostics for nonlinear structural equation models. Multivariate Behavioral Research, 38, 275–400.
Lee, S. Y., Poon, W. Y., & Bentler, P. M. (1990). A three-stage estimation procedure for structural equation models with polytomous variables. Psychometrika, 55, 45–51.
Lee, S. Y., Poon, W. Y., & Bentler, P. M. (1995). A 2-stage estimation of structural equation models with continuous and polytomous variables. British Journal of Mathematical and Statistical Psychology, 48, 339–358.
Lee, S. Y., & Song, X. Y. (2001). Hypothesis testing and model comparison in two-level structural equation models. Multivariate Behavioral Research, 36, 639–655.
Lee, S. Y., & Song, X. Y. (2003a). Bayesian model selection for mixtures of structural equation models with an unknown number of components. British Journal of Mathematical and Statistical Psychology, 56, 145–165.
Lee, S. Y., & Song, X. Y. (2003b). Maximum likelihood estimation and model comparison of nonlinear structural equation models with continuous and polytomous variables. Computational Statistics and Data Analysis, 44, 125–142.
Lee, S. Y., Song, X. Y., & Lee, J. (2003). Maximum likelihood estimation of nonlinear structural equation models with ignorable missing data. Journal of Educational and Behavioral Statistics, 28, 111–134.
Lee, S. Y., Song, X. Y., & Poon, W. Y. (2004). Comparison of approaches in estimating interaction and quadratic effects of latent variables. Multivariate Behavioral Research, 39, 37–67.
Lee, S. Y., & Tsang, S. Y. (1999). Constrained maximum likelihood estimation of two-level covariance structure model via EM type algorithms. Psychometrika, 64, 435–450.
Lee, S. Y., & Xu, L. (2004). Influence analyses of nonlinear mixed-effect models. Computational Statistics and Data Analysis, 45, 321–342.
Lee, S. Y., & Zhu, H. T. (2000). Statistical analysis of nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 53, 209–232.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Meng, X. L., & Rubin, D. B. (1993). Maximum likelihood estimation via ECM algorithm: A general framework. Biometrika, 80, 267–278.
Meng, X. L., & van Dyk, D. (1997). The EM algorithm: An old folk-song sung to a fast new tune (with discussion). Journal of the Royal Statistical Society, Series B, 59, 511–567.
Meuleners, L. B., Lee, A. H., Binns, C. W., & Lower, A. (2003). Quality of life for adolescents: Assessing measurement properties using structural equation modeling. Quality of Life Research, 12, 283–290.
Olschewski, M., & Schumacher, M. (1990). Statistical analysis of quality of life data in cancer clinical trials. Statistics in Medicine, 9, 749–763.
Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443–460.
Poon, W. Y., & Lee, S. Y. (1987). Maximum likelihood estimation of multivariate polyserial and polychoric correlation coefficients. Psychometrika, 52, 409–430.
Power, M., Bullinger, M., & Harper, A., on behalf of the WHOQOL Group. (1999). The World Health Organization WHOQOL-100: Tests of the universality of quality of life in 15 different cultural groups worldwide. Health Psychology, 18, 495–505.
Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180). Newbury Park, CA: Sage.
Rubin, D. B. (1991). EM and beyond. Psychometrika, 56, 241–254.
Rubin, D. B., & Thayer, D. T. (1982). EM algorithm for ML factor analysis. Psychometrika, 47, 69–76.
Shi, J. Q., & Copas, J. (2002). Publication bias and meta-analysis for 2 × 2 tables: An average Markov chain Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B, 64, 221–236.
Shi, J. Q., & Lee, S. Y. (2000). Latent variable models with mixed continuous and polytomous data. Journal of the Royal Statistical Society, Series B, 62, 77–87.
Skevington, S. M., & Tucker, C. (1999). Designing response scales for cross-cultural use in health care: Data from the development of the UK WHOQOL. British Journal of Medical Psychology, 72, 51–61.
Song, X. Y., & Lee, S. Y. (2002). Analysis of structural equation model with ignorable missing continuous and polytomous data. Psychometrika, 67, 261–288.
Song, X. Y., & Lee, S. Y. (2004a). Bayesian analysis of two-level nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 57, 29–52.
Song, X. Y., & Lee, S. Y. (2004b). Local influence of two-level latent variable models with continuous and polytomous data. Statistica Sinica, 14, 317–332.
Song, X. Y., Lee, S. Y., & Zhu, H. T. (2001). Model selection in structural equation models with continuous and polytomous data. Structural Equation Modeling, 8, 378–396.
Staquet, M. J., Hayes, R. D., & Fayers, P. M. (Eds.). (1998). Quality of life assessment in clinical trials. New York: Oxford University Press.
Wei, G. C. G., & Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm. Journal of the American Statistical Association, 85, 699–704.
The WHOQOL Group. (1995). The World Health Organization Quality of Life Assessment (WHOQOL): Position paper from the World Health Organization. Social Science and Medicine, 41, 1403–1409.
The WHOQOL Group. (1998a). Development of the World Health Organization WHOQOL-BREF quality of life assessment. Psychological Medicine, 28, 551–558.
The WHOQOL Group. (1998b). The World Health Organization Quality of Life Assessment (WHOQOL): Development and general psychometric properties. Social Science and Medicine, 46, 1569–1585.
APPENDIX
Let $\theta$ be the overall parameter vector of the CFA or structural equation model of interest, and let $Z_{obs}$ be the observed data of the ordinal categorical items. As the structure of $Z_{obs}$ is rather complicated, obtaining the ML solution by directly maximizing the observed-data log-likelihood $L_0(Z_{obs}; \theta)$ is very difficult. As in many applications in statistics, biostatistics, and psychometrics, the data augmentation strategy is applied here to overcome the difficulty. The key idea of data augmentation is to augment the observed data $Z_{obs}$ with the latent quantities, which include $Y = (y_1, \ldots, y_n)$, the continuous measurements underlying $Z$, and $\Omega$, the latent constructs in the model. For example, for a CFA, $\Omega = (\omega_1, \ldots, \omega_n)$, the collection of all latent factors corresponding to $Y$. In the complete dataset $(Y, \Omega, Z_{obs})$, $Y$ and $\Omega$ are treated as missing data. The well-known EM algorithm (Dempster, Laird, & Rubin, 1977) is applied to obtain the ML estimate of $\theta$. This algorithm is implemented as follows. At the rth iteration of the algorithm with a current value $\theta^{(r)}$:
1. E-step: Evaluate $Q(\theta; \theta^{(r)}) = E\{L_c(Y, \Omega, Z_{obs}; \theta) \mid Z_{obs}, \theta^{(r)}\}$, in which the expectation is taken with respect to the conditional distribution of $(Y, \Omega)$ given $Z_{obs}$ at $\theta^{(r)}$.
2. M-step: Determine $\theta^{(r+1)}$ by maximizing $Q(\theta; \theta^{(r)})$.
3. Check convergence. If the algorithm has not converged, then update r to r + 1 and return to the E-step of the (r + 1)th iteration.
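The three steps above can be sketched on a deliberately small example. The code below is not the CFA or structural equation model of this article: it assumes a single ordinal item whose underlying continuous measurement is y ~ N(mu, 1) with known thresholds, so that the only unknown parameter is mu and the E-step has a closed form (the mean of a truncated normal). All function names (`truncated_mean`, `observed_loglik`, `em_ordinal_mean`) and the data are hypothetical and chosen only for illustration.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_mean(mu, lo, hi):
    """E[y | lo < y <= hi] for y ~ N(mu, 1)."""
    a, b = lo - mu, hi - mu
    return mu + (phi(a) - phi(b)) / (Phi(b) - Phi(a))


def observed_loglik(mu, z, cuts):
    """Observed-data log-likelihood of the ordinal categories z."""
    return sum(math.log(Phi(cuts[k + 1] - mu) - Phi(cuts[k] - mu)) for k in z)


def em_ordinal_mean(z, thresholds, mu=0.0, tol=1e-8, max_iter=500):
    """EM for the mean of the latent normal variable behind ordinal data."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    for r in range(1, max_iter + 1):
        # E-step: expected latent measurement for each observed category,
        # evaluated at the current value of mu.
        expected_y = [truncated_mean(mu, cuts[k], cuts[k + 1]) for k in z]
        # M-step: the complete-data log-likelihood is quadratic in mu,
        # so Q is maximized by the average of the expected measurements.
        mu_new = sum(expected_y) / len(expected_y)
        # Convergence check on the change in the parameter.
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            return mu, r
    return mu, max_iter


# Hypothetical data: categories 0..3 defined by thresholds -1, 0, 1.
z = [0, 1, 1, 2, 2, 2, 3, 3, 1, 2]
mu_hat, n_iter = em_ordinal_mean(z, [-1.0, 0.0, 1.0])
print(mu_hat, n_iter)
```

In the models of this article the E-step has no such closed form, which is why it is approximated by simulation as described next.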
For structural equation models with ordinal categorical data, a common and useful method for evaluating the E-step is to approximate the conditional expectation by an average over a sufficiently large number of observations that are simulated from the conditional distribution of $(Y, \Omega)$ given $Z_{obs}$. The basic idea is the familiar one of approximating a mean by the average of a large sample drawn from the underlying distribution. The task of drawing observations from the conditional distribution of $(Y, \Omega)$ given $Z_{obs}$ at $\theta^{(r)}$ is conducted by the most popular MCMC method, namely the Gibbs sampler (Geman & Geman, 1984). The algorithm produces a sample $\{(Y^{(j)}, \Omega^{(j)}): j = 1, \ldots, J\}$ for every E-step at the rth EM iteration. This sample is used to approximate the E-step of the EM algorithm. More specifically,

\[ \hat{Q}(\theta; \theta^{(r)}) = \frac{1}{J} \sum_{j=1}^{J} L_c(Y^{(j)}, \Omega^{(j)}, Z_{obs}; \theta). \]

The M-step updates the unknown parameters by solving the following system of equations:

\[ \frac{\partial \hat{Q}(\theta; \theta^{(r)})}{\partial \theta} = 0. \]

In this article, following the recommendation of Gilks et al. (1994), a closed form solution of the preceding equation is obtained by conditional maximization (Meng & Rubin, 1993). Convergence of this Monte Carlo type EM algorithm is monitored by the approach given in Shi and Copas (2002).

At the E-step of the last iteration of the EM algorithm, where the current value of the parameter vector is $\hat{\theta}$, the ML estimate of $\theta$, we have a sample $\{\Omega^{(j)}; j = 1, \ldots, J\}$ that is simulated by the Gibbs sampler from the conditional distribution of $\Omega$ given $Z_{obs}$. Hence, estimates of the factor scores in $\Omega$ can be obtained from

\[ \hat{\Omega} = \frac{1}{J} \sum_{j=1}^{J} \Omega^{(j)}. \]

These sampling estimates have better statistical properties than those obtained from standard packages (Bentler, 1992; Jöreskog & Sörbom, 1996) via the regression method (see Bock and Aitkin, 1981, and Rubin, 1991).
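The Monte Carlo E-step and the sample-average estimate of the latent quantities can be illustrated with a minimal sketch. It again assumes a toy model rather than the models of this article: one ordinal item with latent measurement y ~ N(mu, 1) and known thresholds. Because the latent values are then independent given the data, the sketch draws them directly by inverting the normal distribution function; this replaces the Gibbs sampler, which is needed only when the latent quantities must be drawn from a joint conditional distribution. The names `draw_truncated`, `e_step_draws`, and `mcem_ordinal_mean`, the choices of J and the number of iterations, and the data are all hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def draw_truncated(mu, lo, hi, rng):
    """Draw y ~ N(mu, 1) restricted to (lo, hi) by inverting the CDF."""
    p_lo = STD_NORMAL.cdf(lo - mu)
    p_hi = STD_NORMAL.cdf(hi - mu)
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return mu + STD_NORMAL.inv_cdf(u)


def e_step_draws(mu, z, cuts, J, rng):
    """Simulate J latent values per observation, given its category, at mu."""
    return [[draw_truncated(mu, cuts[k], cuts[k + 1], rng) for _ in range(J)]
            for k in z]


def mcem_ordinal_mean(z, thresholds, J=500, n_iter=20, seed=0):
    """Monte Carlo EM for mu, plus sample-average estimates of the latent y."""
    rng = random.Random(seed)
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    mu = 0.0
    for _ in range(n_iter):
        # Monte Carlo E-step: the draws stand in for the conditional expectation.
        draws = e_step_draws(mu, z, cuts, J, rng)
        # M-step: Q-hat averages the complete-data log-likelihood over the
        # draws; for this toy model its maximizer is the grand mean.
        mu = sum(sum(d) for d in draws) / (J * len(z))
    # One more E-step at the final estimate: averaging the draws gives the
    # estimate of each latent value, analogous to the factor-score estimate.
    final = e_step_draws(mu, z, cuts, J, rng)
    scores = [sum(d) / J for d in final]
    return mu, scores


# Hypothetical data: categories 0..3 defined by thresholds -1, 0, 1.
mu_hat, scores = mcem_ordinal_mean([0, 1, 1, 2, 2, 3], [-1.0, 0.0, 1.0])
print(mu_hat, scores)
```

A fixed number of iterations is used here only for brevity; it is not the convergence monitoring of Shi and Copas (2002).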
