$\chi^2$ distribution of the likelihood ratio test (Bentler, 1992; Jöreskog & Sörbom,
1996). As pointed out in the statistics literature (Berger & Delampady, 1987; Kass
& Raftery, 1995), there are some difficulties with this approach. For instance, the p
value is only a measure of evidence against the null hypothesis; hence, not rejecting
the null hypothesis does not mean that the null hypothesis is better than the alternative
hypothesis. Moreover, it cannot be used to test nonnested hypotheses. Hence,
methods that do not have these difficulties have been proposed in statistics as well as in
SEM. One common approach is to consider the comparison of two nonnested competing
models, say $M_1$ and $M_2$ (when applied to hypothesis testing, $M_1$ and $M_2$
represent two hypotheses, say $H_1$ and $H_2$, respectively). In particular, the following
BIC has been widely applied in the field of SEM (Lee & Song, 2001, 2003a;
APPLICATION OF STRUCTURAL EQUATION MODELS TO QOL 441
Raftery, 1993). Given an observed dataset $\mathbf{Z}$ that may or may not involve incomplete
data, $\mathrm{BIC}_{12}$ is defined as in Equation 2, where $\hat{\theta}_k$ is the ML estimate of the
unknown parameter vector under $M_k$, $d_k$ is the number of parameters in $\theta_k$, $n$ is the
sample size, and $L_0(\mathbf{Z}; \hat{\theta}_k, M_k)$ is the log-likelihood function of the observed
dataset $\mathbf{Z}$ evaluated at $\hat{\theta}_k$ under $M_k$. According to Kass and Raftery (Raftery, 1993),
$\mathrm{BIC}_{12}$ is commonly applied to model comparison as follows: It gives evidence of
support to $M_1$ (hence accepting $H_1$) if $\mathrm{BIC}_{12} < 0$; it gives evidence of support or
strong evidence of support to $M_2$ (accepting $H_2$) if $\mathrm{BIC}_{12} > 2$ or $\mathrm{BIC}_{12} > 6$,
respectively; and when $M_1$ is nested in $M_2$, it cannot give a definite conclusion if
$0 < \mathrm{BIC}_{12} < 2$. For structural equation models with fully observed data that come
from the normal distribution, the log-likelihood function values in $\mathrm{BIC}_{12}$ can be
obtained directly using LISREL (Jöreskog & Sörbom, 1996) or EQS (Bentler, 1992); hence,
the computation is easy. For structural equation models with incomplete ordinal
categorical data, the log-likelihood cannot be obtained directly. In this article, it was
computed via path sampling (Gelman & Meng, 1998), a well-known technique in statistical
computing that has been successfully applied to structural equation models (Lee & Song,
2003a, 2003b).
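As an illustration of the decision rule described above, the following is a minimal Python sketch, not the authors' program. It assumes that the two maximized log-likelihood values are already available (for example, from LISREL, EQS, or path sampling). The function names `bic_12` and `interpret_bic_12` and the wording of the returned labels are hypothetical, and the boundary cases (values of exactly 0 or between 0 and 2) are treated here as inconclusive.

```python
import math


def bic_12(loglik_1, loglik_2, d_1, d_2, n):
    """BIC for comparing M1 with M2 as in Equation 2.

    loglik_k : maximized observed-data log-likelihood under M_k
    d_k      : number of free parameters in M_k
    n        : sample size
    """
    return -2.0 * (loglik_1 - loglik_2) + (d_1 - d_2) * math.log(n)


def interpret_bic_12(value):
    """Map a BIC_12 value to the verbal rule given in the text."""
    if value < 0:
        return "evidence of support to M1"
    if value > 6:
        return "strong evidence of support to M2"
    if value > 2:
        return "evidence of support to M2"
    # Assumption: values from 0 to 2 are reported as inconclusive.
    return "no definite conclusion"


# Illustrative inputs only (not taken from the article's data).
example = bic_12(loglik_1=-100.0, loglik_2=-120.0, d_1=30, d_2=10, n=100)
label = interpret_bic_12(example)
```

In this illustrative call, the richer model M1 has the larger log-likelihood, but the penalty term for its 20 additional parameters dominates, so the rule favors M2.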
RESULTS
CFA
To study the measurement properties of the QOL instrument, we used only the data
that corresponded to the 24 ordinal items that reflected the latent constructs. We
proposed a CFA model with four correlated factors for assessing the domains:
physical health, psychological QOL, social relationships, and environment. According
to the meaning of the questions, the structure of the proposed CFA model is displayed
by the path diagram in Figure 2 (see Fayers & Hand, 1997, for a good description
of a path diagram). This CFA model defined a specific covariance
structure for fitting the data. To assess its goodness-of-fit to the sample data, we
regarded it as $M_2$ and compared it with the general model $M_1$, which has a general
nonstructured covariance matrix. In covariance structure analysis terminology, $M_1$
is known as a saturated model. ML estimates of the parameters in the competing
models were obtained by the EM algorithm, and $\mathrm{BIC}_{12}$ was computed via
Equation 2. The value of $\mathrm{BIC}_{12}$ is 687.0. According to the criterion given in
Raftery (1993), the proposed CFA model is definitely selected. Hence, the observed
data give strong evidence of support to the CFA model with four correlated
442 LEE, SONG, SKEVINGTON, HAO
$$\mathrm{BIC}_{12} = -2\left[\log L_0(\mathbf{Z}; \hat{\theta}_1, M_1) - \log L_0(\mathbf{Z}; \hat{\theta}_2, M_2)\right] + (d_1 - d_2)\log n \qquad (2)$$
factors. This conclusion is more meaningful and stronger than the conclusion ob-
tained from the common asymptotic chi-square test.
To give a further illustration of the BIC application, we compared the selected
CFA model $M_2$ with $M_3$, which is a similar CFA model with four factors that are
uncorrelated. We found that $\mathrm{BIC}_{32} = 378.25$, and hence $M_2$ should be selected.
That means that a CFA model with correlated factors can fit the data much better.
The completely standardized (see Jöreskog & Sörbom, 1996) ML estimates of
the factor loadings and factor correlations in the selected CFA model with four
correlated factors are given in Figure 2. To save space, threshold parameter estimates
are not presented. On the basis of the meaning of the questions, the latent factors
FIGURE 2 Path diagram and completely standardized ML estimates of the parameters in the
CFA model. In all diagrams, double arrows are the factor correlations.
were interpreted as physical health, $\xi_1$; psychological QOL, $\xi_2$; social relationship,
$\xi_3$; and environment, $\xi_4$. This interpretation agrees with that given by the
WHOQOL Group (1998a). All of the factor loading estimates, except for that
corresponding to sexual activity and social relationship, which is 0.126, were high.
This indicates a strong association between each of the latent constructs (factors)
and its respective items.
Each $\mathbf{z}_i$ ($\mathbf{y}_i$) in the factor analysis model was associated with a vector of factor
scores, $\xi_i = (\xi_{1i}, \xi_{2i}, \xi_{3i}, \xi_{4i})$. The proposed estimation procedure gave the factor
score estimates, $\{\hat{\xi}_i, i = 1, \ldots, n\}$, as by-products. Histograms that were obtained from
$\{\hat{\xi}_{ki}, i = 1, \ldots, n\}$, $k = 1, 2, 3, 4$, are presented in Figure 3. It is evident that the
empirical distributions of the factor scores were close to normal with zero mean values. The
sample correlations for the factor pairs {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} that were
obtained from this sample
FIGURE 3 Histogram of the factor score estimates.
of factor score estimates were {0.74, 0.55, 0.92, 0.70, 0.87, 0.84}. Although
these correlation estimates were generally larger than the ML estimates of the
correlations given in Figure 2, the differences were not substantial. Hence, these factor
score estimates can be used for further statistical inference.
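The pairwise sample correlations of the factor score estimates can be computed as in the following minimal Python sketch. The score matrix used here is synthetic and serves only to make the example runnable; it is not the study data. The function names `pearson` and `pairwise_correlations` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(scores):
    """scores: list of n rows, each holding K factor score estimates.

    Returns a dict keyed by the 1-based factor pair (k, l) with k < l.
    """
    n_factors = len(scores[0])
    columns = [[row[j] for row in scores] for j in range(n_factors)]
    return {
        (a + 1, b + 1): pearson(columns[a], columns[b])
        for a in range(n_factors)
        for b in range(a + 1, n_factors)
    }


# Synthetic scores (assumption): four factors sharing one common component.
rng = random.Random(1)
scores = []
for _ in range(500):
    common = rng.gauss(0.0, 1.0)
    scores.append([common + 0.5 * rng.gauss(0.0, 1.0) for _ in range(4)])

corr = pairwise_correlations(scores)
```

The dictionary keys follow the same pair order as the list of correlations reported in the text.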
Structural Equation Models
To assess the causal effects of the four identified latent constructs on the health-related
QOL, we applied SEM to the dataset that additionally contained the overall QOL and
general health items. We first studied the relation of QOL and the latent constructs
$\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$ by means of SEM. The path diagram of the proposed structural equation
model for analyzing the overall QOL with the 24 ordinal categorical items is displayed
in Figure 4. This structural equation model, which is called $M_4$, defines a
covariance structure for the 25 ordinal categorical items. To assess its goodness-of-fit
to the sample data, it was compared with $M_5$, the general saturated model
with 25 observed variables that had a general nonstructured covariance matrix. We
found that $\mathrm{BIC}_{54} = 492.58$. This result clearly indicated that $M_4$ is much better than
the saturated model. ML estimates of the parameters in $M_4$ are also presented in Figure 4.
As the structural equation model was different from the previous CFA model,
the estimates of the factor correlations among the latent constructs $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$
were not exactly the same as those obtained on the basis of the previous CFA model.
However, as expected, their differences were very minor (in the third decimal place;
hence, some estimates in Figure 4 appeared to be the same as those in Figure 2). This
is also true for the factor loading estimates, although the differences were larger. The
most important additional result that can be obtained using a structural equation
model is the following estimated structural equation relating the endogenous latent
construct QOL ($\eta$) to the exogenous latent constructs $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$:
$$\eta = 0.46\,\xi_1 + 0.24\,\xi_2 + 0.21\,\xi_3 + 0.13\,\xi_4 + \delta \qquad (3)$$
with an estimated residual variance of 0.19. This estimated structural equation indicates
that physical health ($\xi_1$) had the most substantial causal effect on QOL ($\eta$),
which was larger than the causal effects of psychological QOL ($\xi_2$) and social
relationship ($\xi_3$), which were in turn larger than the causal effect of environment ($\xi_4$).
It is also interesting to observe that the causal effects of psychological QOL and social
relationship were basically the same. Moreover, the estimated residual variance was
not large.
Furthermore, we investigated the relation of the health-related QOL (HRQOL,
$\eta$) with the latent constructs. The latent construct HRQOL was formed by the overall
QOL item and the general health item. The path diagram of the proposed structural
equation model, which is called $M_6$, is displayed in Figure 5. To assess the
goodness-of-fit of $M_6$, we again compared it with the saturated model $M_7$ with 26
observed variables. We found that $\mathrm{BIC}_{76} = 401.35$, which clearly indicated that the
proposed model $M_6$ was much better than the saturated model. The ML estimates
of the parameters in $M_6$ are presented in Figure 5. We also observed that the behavior
of the factor loading and factor correlation estimates was similar to that in the
analysis of $M_4$. Now, the estimated structural equation that addresses the causal
effects of the latent constructs $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$ on HRQOL ($\eta$) is:
$$\eta = 0.68\,\xi_1 + 0.28\,\xi_2 + 0.08\,\xi_3 + 0.05\,\xi_4 + \delta \qquad (4)$$
with an estimated residual variance of 0.08. Hence, for the health-related QOL,
which involves an additional component on the general health of the patients when
FIGURE 4 Path diagram and completely standardized ML estimates of parameters in the
structural equation model for the overall QOL.
compared with the single overall QOL, only physical health ($\xi_1$) and psychological
QOL ($\xi_2$) had substantial causal effects, whereas social relationship ($\xi_3$) and
environment ($\xi_4$) were less important.
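The two estimated structural equations can be compared side by side with a small Python sketch. The coefficients and residual variances are those reported in Equations 3 and 4. The proportion of explained variance is computed under the assumption that the solution is completely standardized, so that the endogenous latent variable has unit variance; this derived quantity is an illustration and is not reported in the article. The names `equations` and `summarize` are hypothetical.

```python
# Reported coefficients and residual variances (Equations 3 and 4).
equations = {
    "QOL (Equation 3)": (
        {
            "physical health": 0.46,
            "psychological QOL": 0.24,
            "social relationship": 0.21,
            "environment": 0.13,
        },
        0.19,
    ),
    "HRQOL (Equation 4)": (
        {
            "physical health": 0.68,
            "psychological QOL": 0.28,
            "social relationship": 0.08,
            "environment": 0.05,
        },
        0.08,
    ),
}


def summarize(coefficients, residual_variance):
    """Rank the constructs by coefficient size and derive explained variance.

    Assumption: the endogenous latent variable is standardized to variance 1,
    so the explained proportion is 1 minus the residual variance.
    """
    ranked = sorted(coefficients, key=coefficients.get, reverse=True)
    explained = 1.0 - residual_variance
    return ranked, explained


summary = {name: summarize(*spec) for name, spec in equations.items()}
```

Under the stated assumption, the four constructs would account for most of the variance of the endogenous latent variable in both models, with physical health ranked first in each.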
DISCUSSION
In QOL or HRQOL analysis, the fundamental aim of factor analysis and SEM is to
find an appropriate model for assessing the relations among the observed items in
FIGURE 5 Path diagram and completely standardized ML estimates of parameters in the
structural equation model for the HRQOL.
the questionnaire and the underlying endogenous and exogenous latent constructs.
EFA and CFA models are used to explore and confirm the appropriate number of
latent domains for explaining the items. A main objective of this article is to apply
SEM to study the causal relations of the latent constructs to the QOL or the
HRQOL.
A slightly different approach using a hierarchical measurement model, which is
basically a second-stage factor analysis model, may also be considered. In this
approach, a new latent variable $\eta^*$ for addressing QOL was constructed from the
latent factors (physical health, mental health, social relationships, and environment).
This $\eta^*$ can be compared with $\eta$ in Figure 4 that is directly obtained from the
overall QOL (Q1). However, to avoid confusing the SEM approach with a second-order
factor analysis approach, detailed analyses are not included here.
As the observed ordinal categorical observations are clearly nonnormal, more
rigorous statistical methods that take into account the discrete nature of the ob-
served data may be preferable to the standard methods in the commonly used
packages. This article introduces the ML approach for analyzing structural
equation models with ordinal categorical data that can be MAR. We utilized re-
cently developed tools in statistical computing to obtain the results. It is not too
difficult to implement computer programs to obtain these results. In fact, the
software BUGS (Gilks, Thomas, & Spiegelhalter, 1994, freely available at
www.mrc-bsu.com.ac.uk/bugs), can be utilized to complete the E step of the
proposed EM algorithm. However, as far as we know, there is no user-friendly
software for obtaining the ML solution. Admittedly, this is a weakness of the
rigorous ML approach. In behavioral, educational, medical, and social-psychological
research, the use of recently developed tools in statistical computing such
as the EM algorithm and the Markov chain Monte Carlo (MCMC) methods is
becoming more and more popular, mainly because they are very helpful in solving
complex problems. Due to the existing demand, we expect that user-friendly
software will soon be developed. In the meantime, we are willing to share our
program with others. For the general problem of assessing QOL, an alternative
approach is to provide interval response scales in the questionnaires (Skevington
& Tucker, 1999).
In this article, we assume for brevity that missing data are MAR with an
ignorable mechanism. The observed-data log-likelihood is $L_0(\mathbf{Z}_{obs}; \theta)$. For the
missing data associated with the sensitive question Q17, it may be better to analyze
them with a nonignorable missing mechanism. To consider this more general situation,
we need to model the matrix of missing indicators, $\mathbf{R}$, via a distribution with
a vector of unknown parameters, say $\psi$. The observed-data log-likelihood
$L_0^*(\mathbf{Z}_{obs}, \mathbf{R}; \theta, \psi)$ is more complicated. The EM algorithm can be applied to
$L_0(\mathbf{Z}_{obs}; \theta)$ or $L_0^*(\mathbf{Z}_{obs}, \mathbf{R}; \theta, \psi)$. However, the treatment of $L_0^*$ is much more
technical. Hence, we regard it as a topic for future research.
It is well known that QOL and HRQOL are complicated concepts, and the un-
derlying observed data are very complex in general. Generalizations of the stan-
dard structural equation model, for example, the nonlinear structural equation
models (Lee & Song, 2003b; Lee, Song, & Poon, 2004), multilevel structural
equation models (Song & Lee, 2004a; Lee & Song, 2004), diagnostic analysis
(Lee & Lu, 2004; Song & Lee, 2004b; Lee & Xu, 2004), and finite mixtures of
structural equation models (Lee & Song, 2003a; these are generalizations of
multiple-group structural equation models), appear to have the potential to provide
rigorous statistical analysis of these complicated concepts in a variety of situations.
ACKNOWLEDGMENTS
This research was supported by Hong Kong UGC Earmarked grant 4243/03 H. We
are thankful to Professor M. Power for helpful comments. We are solely responsi-
ble for any mistakes.
REFERENCES
Bentler, P. M. (1992). EQS: Structural equation program manual. Los Angeles: BMDP Statistical Software.
Berger, J. O., & Delampady, M. (1987). Testing precise hypothesis. Statistical Science, 3, 317–352.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–461.
Bock, R. D., & Gibbons, R. D. (1996). High dimensional multivariate probit analysis. Biometrics, 52, 1183–1193.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
Fayers, P. M., & Hand, D. J. (1997). Factor analysis, causal indicators and quality of life. Quality of Life Research, 8, 139–150.
Fayers, P. M., & Machin, D. (1998). Factor analysis. In M. J. Staquet, R. D. Hayes, & P. M. Fayers (Eds.), Quality of life assessment in clinical trials (pp. 191–223). New York: Oxford University Press.
Gelman, A., & Meng, X. L. (1998). Simulating normalizing constant: From importance sampling to bridge sampling to path sampling. Statistical Science, 13, 163–185.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. Statistician, 43, 169–177.
Glonek, G. F. V., & McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society, Series B, 57, 533–546.
Jedidi, K., Jagpal, H. S., & DeSarbo, W. S. (1997). STEMM: A general finite mixture structural equation model. Journal of Classification, 14, 23–50.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago: Scientific Software International.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Laird, N. M., & Ware, J. H. (1982). Random effects models for longitudinal data. Biometrics, 38, 963–974.
Lall, R., Campbell, M. J., Walters, S. J., Morgan, K., & MRC CFAS Co-operative. (2002). A review of ordinal regression models applied on health-related quality of life assessments. Statistical Methods in Medical Research, 11, 49–67.
Lee, S. Y., & Lu, B. (2004). Case-deletion diagnostics for nonlinear structural equation models. Multivariate Behavioral Research, 38, 275–400.
Lee, S. Y., Poon, W. Y., & Bentler, P. M. (1990). A three-stage estimation procedure for structural equation models with polytomous variables. Psychometrika, 55, 45–51.
Lee, S. Y., Poon, W. Y., & Bentler, P. M. (1995). A 2-stage estimation of structural equation models with continuous and polytomous variables. British Journal of Mathematical and Statistical Psychology, 48, 339–358.
Lee, S. Y., & Song, X. Y. (2001). Hypothesis testing and model comparison in two-level structural equation models. Multivariate Behavioral Research, 36, 639–655.
Lee, S. Y., & Song, X. Y. (2003a). Bayesian model selection for mixtures of structural equation models with an unknown number of components. British Journal of Mathematical and Statistical Psychology, 56, 145–165.
Lee, S. Y., & Song, X. Y. (2003b). Maximum likelihood estimation and model comparison of nonlinear structural equation models with continuous and polytomous variables. Computational Statistics and Data Analysis, 44, 125–142.
Lee, S. Y., Song, X. Y., & Lee, J. (2003). Maximum likelihood estimation of nonlinear structural equation models with ignorable missing data. Journal of Educational and Behavioral Statistics, 28, 111–134.
Lee, S. Y., Song, X. Y., & Poon, W. Y. (2004). Comparison of approaches in estimating interaction and quadratic effects of latent variables. Multivariate Behavioral Research, 39, 37–67.
Lee, S. Y., & Tsang, S. Y. (1999). Constrained maximum likelihood estimation of two-level covariance structure model via EM type algorithms. Psychometrika, 64, 435–450.
Lee, S. Y., & Xu, L. (2004). Influence analyses of nonlinear mixed-effect models. Computational Statistics and Data Analysis, 45, 321–342.
Lee, S. Y., & Zhu, H. T. (2000). Statistical analysis of nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 53, 209–232.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Meng, X. L., & Rubin, D. B. (1993). Maximum likelihood estimation via ECM algorithm: A general framework. Biometrika, 80, 267–278.
Meng, X. L., & van Dyk, D. (1997). The EM algorithm: An old folk-song sung to a fast new tune (with discussion). Journal of the Royal Statistical Society, Series B, 59, 511–567.
Meuleners, L. B., Lee, A. H., Binns, C. W., & Lower, A. (2003). Quality of life for adolescents: Assessing measurement properties using structural equation modeling. Quality of Life Research, 12, 283–290.
Olschewski, M., & Schumacker, M. (1990). Statistical analysis of quality of life data in cancer clinical trials. Statistics in Medicine, 9, 749–763.
Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443–460.
Poon, W. Y., & Lee, S. Y. (1987). Maximum likelihood estimation of multivariate polyserial and polychoric correlation coefficients. Psychometrika, 52, 409–430.
Power, M., Bullinger, M., & Harper, A., on behalf of the WHOQOL Group. (1999). The World Health Organization WHOQOL-100: Tests of the universality of quality of life in 15 different cultural groups worldwide. Health Psychology, 18, 495–505.
Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180). Newbury Park, CA: Sage.
Rubin, D. B. (1991). EM and beyond. Psychometrika, 56, 241–254.
Rubin, D. B., & Thayer, D. T. (1982). EM algorithm for ML factor analysis. Psychometrika, 47, 69–76.
Shi, J. Q., & Copas, J. (2002). Publication bias and meta-analysis for 2 × 2 tables: An average Markov chain Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B, 64, 221–236.
Shi, J. Q., & Lee, S. Y. (2000). Latent variable models with mixed continuous and polytomous data. Journal of the Royal Statistical Society, Series B, 62, 77–87.
Skevington, S. M., & Tucker, C. (1999). Designing response scales for cross-cultural use in health care: Data from the development of the UK WHOQOL. British Journal of Medical Psychology, 72, 51–61.
Song, X. Y., & Lee, S. Y. (2002). Analysis of structural equation model with ignorable missing continuous and polytomous data. Psychometrika, 67, 261–288.
Song, X. Y., & Lee, S. Y. (2004a). Bayesian analysis of two-level nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 57, 29–52.
Song, X. Y., & Lee, S. Y. (2004b). Local influence of two-level latent variable models with continuous and polytomous data. Statistica Sinica, 14, 317–332.
Song, X. Y., Lee, S. Y., & Zhu, H. T. (2001). Model selection in structural equation models with continuous and polytomous data. Structural Equation Modeling, 8, 378–396.
Staquet, M. J., Hayes, R. D., & Fayers, P. M. (Eds.). (1998). Quality of life assessment in clinical trials. New York: Oxford University Press.
Wei, G. C. G., & Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm. Journal of the American Statistical Association, 85, 699–704.
The WHOQOL Group. (1995). The World Health Organization Quality of Life Assessment (WHOQOL): Position paper from the World Health Organization. Social Science and Medicine, 41, 1403–1409.
The WHOQOL Group. (1998a). Development of the World Health Organization WHOQOL-BREF quality of life assessment. Psychological Medicine, 28, 551–558.
The WHOQOL Group. (1998b). The World Health Organization Quality of Life Assessment (WHOQOL): Development and general psychometric properties. Social Science and Medicine, 46, 1569–1585.
APPENDIX
Let $\theta$ be the overall parameter vector of the CFA or structural equation model of
interest, and let $\mathbf{Z}_{obs}$ be the observed data of the ordinal categorical items. As the
structure of $\mathbf{Z}_{obs}$ is rather complicated, obtaining the ML solution by directly
maximizing the observed-data log-likelihood $L_0(\mathbf{Z}_{obs}; \theta)$ is very difficult. As in many
applications in statistics, biostatistics, and psychometrics, the data augmentation
strategy is applied here to overcome this difficulty. The key idea of data augmentation is
to augment the observed data $\mathbf{Z}_{obs}$ with the latent quantities, which include
$\mathbf{Y} = (\mathbf{y}_1, \ldots, \mathbf{y}_n)$, the continuous measurements underlying $\mathbf{Z}$, and $\Omega$, the latent
constructs in the model. For example, for a CFA, $\Omega = (\xi_1, \ldots, \xi_n)$, the collection of all
latent factors corresponding to $\mathbf{Y}$. In the complete dataset $(\mathbf{Y}, \Omega, \mathbf{Z}_{obs})$, $\mathbf{Y}$ and $\Omega$ are
treated as missing data. The well-known EM algorithm (Dempster, Laird, & Rubin, 1977)
is applied to obtain the ML estimate of $\theta$. This algorithm is implemented
as follows. At the rth iteration of the algorithm with a current value $\theta^{(r)}$:
1. E-step: Evaluate $Q(\theta; \theta^{(r)}) = E\{L_c(\mathbf{Y}, \Omega, \mathbf{Z}_{obs}; \theta) \mid \mathbf{Z}_{obs}, \theta^{(r)}\}$, in which the
expectation is taken with respect to the conditional distribution of $(\mathbf{Y}, \Omega)$
given $\mathbf{Z}_{obs}$ at $\theta^{(r)}$.
2. M-step: Determine $\theta^{(r+1)}$ by maximizing $Q(\theta; \theta^{(r)})$.
3. Check convergence. If the algorithm has not converged, then update r to r +
1 and return to the E-step of the (r + 1)th iteration.
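The three steps above can be illustrated on a deliberately small problem in which the conditional expectation in the E-step is approximated by averaging simulated latent values. The following Python sketch is not the authors' program. It assumes a single binary item whose underlying continuous measurement is normal with unknown mean and unit variance, with the threshold fixed at 0. Because the latent variable is one-dimensional, direct inverse-CDF draws from the truncated normal distribution stand in for the Gibbs sampler, and a fixed number of iterations replaces a formal convergence check. All function and variable names are hypothetical.

```python
import random
from statistics import NormalDist


def draw_latent(z, dist, p0, rng):
    """Draw y from dist truncated to y > 0 if z == 1, else to y <= 0."""
    low, high = (p0, 1.0) if z == 1 else (0.0, p0)
    u = rng.uniform(low, high)
    # Keep u strictly inside (0, 1) so that inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


def mcem_probit_mean(z_obs, mu0=0.0, n_iter=40, n_draws=20, seed=0):
    """Monte Carlo EM for the mean of the latent normal variable."""
    rng = random.Random(seed)
    mu = mu0
    for _ in range(n_iter):
        # E-step (Monte Carlo): simulate latent values given the observed
        # categories at the current parameter value.
        dist = NormalDist(mu, 1.0)
        p0 = dist.cdf(0.0)  # P(y <= 0) at the current value
        draws = [
            draw_latent(z, dist, p0, rng)
            for z in z_obs
            for _ in range(n_draws)
        ]
        # M-step: the complete-data log-likelihood is quadratic in mu,
        # so its maximizer is the average of the simulated latent values.
        mu = sum(draws) / len(draws)
    return mu


# Toy data (assumption): 140 of 200 responses fall in the upper category.
z_obs = [1] * 140 + [0] * 60
mu_hat = mcem_probit_mean(z_obs)
mle = NormalDist().inv_cdf(0.7)  # closed-form ML estimate for this toy model
```

In this toy model the ML estimate has a closed form, so the Monte Carlo EM output can be checked against it; in the models considered in this article no such closed form is available, which is why the simulation-based E-step is needed.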
For structural equation models with ordinal categorical data, a common and
useful method for evaluating the E-step is to approximate the conditional expectation
by a sufficiently large number of observations that are simulated from the underlying
conditional distribution of $(\mathbf{Y}, \Omega)$ given $\mathbf{Z}_{obs}$. The basic idea is to approximate
the mean by the average of a sufficiently large number of observations from the
underlying distribution. The task of drawing observations from the conditional
distribution of $(\mathbf{Y}, \Omega)$ given $\mathbf{Z}_{obs}$ at $\theta^{(r)}$ is conducted by the most popular MCMC method,
namely the Gibbs sampler (Geman & Geman, 1984). The algorithm produces a sample
$\{(\mathbf{Y}^{(j)}, \Omega^{(j)}): j = 1, \ldots, J\}$ for every E-step at the rth EM iteration. This sample will be
used to approximate the E-step of the EM algorithm. More specifically,

$$Q(\theta; \theta^{(r)}) \approx \frac{1}{J} \sum_{j=1}^{J} L_c(\mathbf{Y}^{(j)}, \Omega^{(j)}, \mathbf{Z}_{obs}; \theta).$$

The M-step updates the unknown parameters by solving the following system of
equations

$$\frac{\partial Q(\theta; \theta^{(r)})}{\partial \theta} = \mathbf{0}.$$

In this article, following the recommendation of Gilks et al. (1994), a closed form
solution of the preceding equation is obtained by conditional maximization (Meng
& Rubin, 1993). Convergence of this Monte Carlo type EM algorithm is monitored
by the approach given in Shi and Copas (2002).
At the E-step of the last iteration of the EM algorithm, where the current value
of the parameter vector is $\hat{\theta}$, the ML estimate of $\theta$, we have a sample $\{\Omega^{(j)}; j = 1, \ldots,$

$$\hat{\Omega} = \frac{1}{J} \sum_{j=1}^{J} \Omega^{(j)}$$