Wei Pan
Stable URL:
http://links.jstor.org/sici?sici=0006-341X%28200103%2957%3A1%3C120%3AAICIGE%3E2.0.CO%3B2-Q
BIOMETRICS 57, 120-125
March 2001
Wei Pan
Division of Biostatistics, University of Minnesota,
MMC 303, 420 Delaware Street SE, Minneapolis, Minnesota 55455, U.S.A.
email: weip@biostat.umn.edu
SUMMARY. Correlated response data are common in biomedical studies. Regression analysis based on the
generalized estimating equations (GEE) is an increasingly important method for such data. However, there
seem to be few model-selection criteria available in GEE. The well-known Akaike Information Criterion
(AIC) cannot be directly applied since AIC is based on maximum likelihood estimation while GEE is
nonlikelihood based. We propose a modification to AIC, where the likelihood is replaced by the quasi-
likelihood and a proper adjustment is made for the penalty term. Its performance is investigated through
simulation studies. For illustration, the method is applied to a real data set.
KEY WORDS: Akaike Information Criterion; Generalized estimating equations; Generalized linear models;
Model selection; Quasi-likelihood.
estimating equations (Liang and Zeger, 1986):

$$S(\beta; R, \mathcal{D}) = \sum_{i=1}^{n} D_i' V_i^{-1} (Y_i - \mu_i) = 0, \tag{1}$$

where $D_i = D_i(\beta) = \partial\mu_i(\beta)/\partial\beta'$ and $V_i$ is a working covariance matrix of $Y_i$. $V_i$ can be expressed in terms of a working correlation matrix $R = R(\alpha)$, $V_i = A_i^{1/2} R(\alpha) A_i^{1/2}$, where $A_i$ is a diagonal matrix with elements $\mathrm{var}(Y_{ij}) = \phi V(\mu_{ij})$, which is specified as a function of the mean $\mu_{ij}$. The $\alpha$ may be some unknown parameters involved in the working correlation structure, which can be estimated through the method of moments or another set of estimating equations.

An attractive point of the GEE approach is that it yields a consistent estimator of $\beta$, $\hat\beta$, even when the working correlation matrix $R$ is misspecified (Liang and Zeger, 1986). For instance, it is often convenient to use a working independence model where $R = I$. Some other popular choices include compound symmetry (CS) (i.e., exchangeable) with $R_{ij} = \rho$ for any $i \neq j$ or first-order autoregressive (AR-1) with $R_{ij} = \rho^{|i-j|}$, where $R_{ij}$ denotes the $(i,j)$th element of $R$. Due to its simplicity, the working independence model is attractive. Many studies have shown that $\hat\beta$ obtained under the independence model is relatively efficient (Zeger, 1988; McDonald, 1993), at least when the correlation between responses is not large. Another compelling reason for using the working independence model is in partly conditional modeling of means for longitudinal data (Pepe and Anderson, 1994). However, for time-varying or cluster-specific covariates, Fitzmaurice (1995) showed that the resulting estimator from the independence model may be very inefficient; its efficiency may be as low as 60% compared with the estimator obtained by using the correct correlation structure. Hence, this poses a model-selection problem in selecting the working correlation structure. Of course, we may also need to decide which covariates are to be included in the regression model $g(\mu_i)$. Below we propose a quasi-likelihood-based model-selection criterion that can be applied to address the above issues.

2.2 Quasi-Likelihood

Now we need to briefly review the quasi-likelihood. For the moment, suppose we only have a scalar response variable, $y$. We first construct the quasi-likelihood function for the mean parameter $\mu = E(y)$ (and dispersion parameter $\phi$); then we will write it in terms of the regression parameter $\beta$.

Based on the model specification $E(y) = \mu$ and $\mathrm{var}(y) = \phi V(\mu)$, the (log) quasi-likelihood function is (McCullagh and Nelder, 1989, p. 325)

$$Q(\mu, \phi; y) = \int_{y}^{\mu} \frac{y - t}{\phi V(t)}\, dt. \tag{2}$$

For instance, with grouped binary data, $y \sim \mathrm{Bin}(n, \pi)$, it is often specified that $V(\mu) = \mu(1 - \mu/n)$; then (up to a random constant) $Q(\mu, \phi; y) = L(\mu, \phi; y)/\phi$, where $L(\mu, \phi; y) = y \log[\mu/(n - \mu)] + n \log(n - \mu)$ is the log likelihood for the binomial distribution. When $\phi = 1$, the quasi-likelihood $Q$ reduces to $L$. However, $\phi > 1$ is extremely useful in modeling overdispersion that commonly occurs in practice. Some common examples of the quasi-likelihood are given in McCullagh and Nelder (1989, p. 326).

With a $1 \times p$ covariate $x$ and a specified regression model $E(y) = \mu = g^{-1}(x\beta)$ and $\mathrm{var}(y) = \phi V(\mu)$, the quasi-likelihood can be written as a function of the regression coefficients $\beta$, i.e., $Q(\beta, \phi; (y, x)) = Q(g^{-1}(x\beta), \phi; y)$.

In the current context, if the working independence model $R = I$ is used, the working assumption is that the paired observations $(Y_{ij}, X_{ij})$ in $\mathcal{D}$ are independent. Hence, the quasi-likelihood based on $\mathcal{D}$ is

$$Q(\beta, \phi; I, \mathcal{D}) = \sum_{i} \sum_{j} Q(\beta, \phi; (Y_{ij}, X_{ij})).$$

It is easy to verify that the left-hand side of the GEE $S(\beta; I, \mathcal{D})$ in (1) is equivalent to $\partial Q(\beta, \phi; I, \mathcal{D})/\partial\beta$. Thus, the GEE (1) can be regarded as a quasi-likelihood score equation.

However, if we use a more general working correlation matrix $R$, there is no guarantee that a corresponding quasi-likelihood exists unless certain conditions are satisfied (McCullagh and Nelder, 1989, p. 333-335). Furthermore, even if it exists, in general it is difficult to construct. How to construct a quasi-likelihood with a general working correlation matrix is beyond the scope of this article. The main goal of this article is to propose a criterion based on $Q(\hat\beta, \phi; I, \mathcal{D})$, the quasi-likelihood under the working independence model with an estimated $\beta$, using any general working correlation structure in GEE.

2.3 AIC and a Modification to AIC in GEE

We first briefly review the derivation of AIC, which will motivate our modification to AIC. A more rigorous and general discussion is available from Linhart and Zucchini (1986). For simplicity of notation, we first assume that the dispersion parameter $\phi$ is known; hence, we can ignore it in the (quasi-)likelihood function. At the end of this section, we will discuss the situation when $\phi$ is unknown.

Suppose we have a candidate model $M_1$ and the true model $M_*$ with log-likelihood functions $L(\beta; \mathcal{D})$ and $L(\beta_*; \mathcal{D})$, respectively. Throughout, we assume that each model can be indexed by the parameter vector $\beta$. A well-known measure of separation between two models is given by the Kullback-Leibler information (Kullback and Leibler, 1951), also known as the cross entropy. The Kullback-Leibler information between $M_1$ and $M_*$ is

$$\Delta_0(\beta, \beta_*) = E_{M_*}[-2L(\beta; \mathcal{D})], \tag{3}$$

where the expectation $E_{M_*}$ is taken with respect to the true distribution of $\mathcal{D}$ (i.e., under model $M_*$). From a set of candidate models $\mathcal{M}$, in which each can be indexed by $\beta$, we would like to choose the model with the smallest $\Delta_0(\beta, \beta_*)$. However, in practice, since both $\beta$ and $\beta_*$ are unknown, we have to estimate $\Delta_0(\beta, \beta_*)$. AIC was motivated as an asymptotically unbiased estimator of $E_{M_*}[\Delta_0(\hat\beta, \beta_*)]$, where $\hat\beta$ is the maximum likelihood estimator (MLE) under any candidate model in $\mathcal{M}$ and the expectation is taken over the random $\hat\beta$. Akaike proposed using AIC as a model-selection criterion, i.e.,

$$\mathrm{AIC} = -2L(\hat\beta; \mathcal{D}) + 2p, \tag{4}$$

where $p$ is the dimension of $\beta$. Model selection is accomplished by selecting from $\mathcal{M}$ the one that minimizes AIC.

Since GEE is nonlikelihood based, we do not have a likelihood function in this context. However, we may have a quasi-likelihood. We propose replacing the likelihood $L$ in (3) by the quasi-likelihood $Q$ under the working independence model and define a new discrepancy as

$$\Delta(\beta, \beta_*, I) = E_{M_*}[-2Q(\beta; I, \mathcal{D})]. \tag{5}$$

We assume that any quasi-likelihood model in $\mathcal{M}$ can be indexed by the parameter vector $\beta$ and that $\beta_*$ is the corresponding parameter for the quasi-likelihood model induced by the true data-generating model $M_*$. For simplicity, with a slight abuse of notation, we suppress the dependence of $\Delta(\beta, \beta_*, I)$ on the true model $M_*$. It is well known that [equation (6), which introduces $\Omega_I$, did not survive extraction] and the latter is positive semidefinite. Under suitable regularity conditions, we can approximate $E_{M_*}[\Delta(\hat\beta, \beta_*, I)]$ as

$$E_{M_*}[\Delta(\hat\beta, \beta_*, I)] \approx E_{M_*}[-2Q(\hat\beta; I, \mathcal{D})] + 2E_{M_*}[(\hat\beta - \beta_*)' S(\hat\beta; I, \mathcal{D})] + 2\,\mathrm{trace}(\Omega_I J), \tag{7}$$

where $J = \mathrm{cov}(\hat\beta)$, which can be consistently estimated by the robust or sandwich covariance estimator, say, $\hat V_r$ (Liang and Zeger, 1986). $\Omega_I$ can also be consistently estimated by its empirical estimator $\hat\Omega_I = -\partial^2 Q(\beta; I, \mathcal{D})/\partial\beta\,\partial\beta'\,|_{\beta=\hat\beta}$. Note that, for $\hat\beta = \hat\beta(R)$, we have $S(\hat\beta; R, \mathcal{D}) = 0$ but not necessarily $S(\hat\beta; I, \mathcal{D}) = 0$ unless $R = I$. By ignoring the second term that is difficult to estimate, we have an estimator of the right-hand side of (7),

$$\mathrm{QIC}(R) \equiv -2Q(\hat\beta(R); I, \mathcal{D}) + 2\,\mathrm{trace}(\hat\Omega_I \hat V_r). \tag{8}$$

This is our proposed quasi-likelihood under the independence model criterion (QIC) for GEE. Our simulation results (see Section 3) show that ignoring the second term in (7) does not dramatically, but does somewhat, influence the performance of QIC($R$), and QIC($I$) is the best. Note that, if the working independence model is used in GEE, by the consistency of $\hat\beta$, $\hat\Omega_I$, and $\hat V_r$ and that $S(\hat\beta; I, \mathcal{D}) = 0$, we know QIC($I$) is an asymptotically unbiased estimator of (7). Furthermore, $\hat\Omega_I$ and $\hat V_r$ are directly available from the model fitting results in many statistical packages, such as SAS and S-Plus. Hence, we recommend the routine use of QIC($I$) whenever possible. QIC can also be applied to select a working correlation structure in GEE: one needs to calculate the QIC for various candidate working correlation structures and then pick the one with the smallest QIC. Note that here the goal of selecting a working correlation structure is to estimate $\beta$ more efficiently.

In practice, since $\phi$ is unknown, we plug in $\hat\phi$, which is estimated from the largest model available. In variable selection, that means we estimate $\phi$ based on the regression model including all covariates. This is similar to estimating the dispersion parameter in linear regression with Mallows' (1973) $C_p$. A more general but also more difficult approach is to use the extended quasi-likelihood (McCullagh and Nelder, 1989, p. 349), which we do not pursue here.

2.4 Remarks

3. Simulations

Simulation studies were conducted to investigate the performance of our proposed model-selection criterion QIC in selecting the working correlation structure and selecting the covariates in a marginal logistic regression model. We used the same true model as in Fitzmaurice (1995). The response variable $Y_{it}$ is binary and its marginal mean is $\mu_{it}$, with

$$\mathrm{logit}(\mu_{it}) = \beta_0 + \beta_1 x_{1,it} + \beta_2 (t - 1), \qquad t = 1, 2, 3 \text{ and } i = 1, \ldots, n,$$

where the $x_{1,it}$ are i.i.d. Bernoulli, i.e., $x_{1,it} = 0$ or 1 with probability 1/2 and $\beta_0 = 0.25 = -\beta_1 = -\beta_2$. The true correlation matrix is CS. We used a large correlation, $\rho = 0.5$, and moderate sample size, $n = 50$ or 100. The joint distribution of the $Y_i$ was simulated from Bahadur's (1961) representation (see Fitzmaurice, 1995, for more details).

For each sample size, $n = 50$ or 100, our proposed method is most likely to correctly select the CS from the three given correlation structures (Table 1). Since the distribution form of the data is known, we can also compute the MLE and thus AIC. For comparison, we also attach the results of using AIC by assuming various correlation matrices.
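As an illustration of how QIC(I) = -2Q(beta_hat; I, D) + 2 trace(Omega_hat_I V_hat_r) might be computed for the marginal logistic model of this section, the following is a minimal NumPy sketch rather than the author's implementation. Several things here are assumptions made only for the example: the function names (`simulate_clusters`, `fit_independence_logit`, `qic_independence`) are hypothetical, the dispersion is fixed at phi = 1 as is usual for binary data, and within-cluster dependence is induced by a shared uniform draw instead of Bahadur's representation, so the induced correlation is not exactly CS with rho = 0.5. The covariate `x3` is an irrelevant variable added purely to show variable selection.

```python
import numpy as np


def simulate_clusters(n_clusters, beta, gamma, rng):
    """Binary responses with marginal mean logit(mu) = b0 + b1*x1 + b2*(t-1), t = 1, 2, 3.

    ASSUMPTION: dependence comes from a uniform draw shared within a cluster and
    used with probability `gamma`; this is a stand-in for Bahadur's representation.
    The marginal means are still exactly mu because each draw is Uniform(0, 1).
    """
    T = 3
    n_obs = n_clusters * T
    cluster = np.repeat(np.arange(n_clusters), T)
    x1 = rng.integers(0, 2, size=n_obs).astype(float)    # i.i.d. Bernoulli(1/2)
    x2 = np.tile(np.arange(T, dtype=float), n_clusters)  # t - 1
    X = np.column_stack([np.ones(n_obs), x1, x2])
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    shared = np.repeat(rng.random(n_clusters), T)
    own = rng.random(n_obs)
    u = np.where(rng.random(n_obs) < gamma, shared, own)
    y = (u < mu).astype(float)
    return X, y, cluster


def fit_independence_logit(X, y, n_iter=50, tol=1e-10):
    """Solve the GEE with R = I and logit link by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - mu)                         # S(beta; I, D) with phi = 1
        omega = X.T @ (X * (mu * (1.0 - mu))[:, None])  # -d^2 Q / d beta d beta'
        step = np.linalg.solve(omega, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def qic_independence(X, y, cluster, beta_hat):
    """Return (QIC(I), trace(Omega_I V_r)) for a logit model fitted with R = I."""
    mu = 1.0 / (1.0 + np.exp(-X @ beta_hat))
    quasi = np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))  # Q(beta; I, D)
    omega = X.T @ (X * (mu * (1.0 - mu))[:, None])
    meat = np.zeros_like(omega)
    for c in np.unique(cluster):
        rows = cluster == c
        s = X[rows].T @ (y[rows] - mu[rows])  # cluster-level score contribution
        meat += np.outer(s, s)
    bread = np.linalg.inv(omega)
    v_robust = bread @ meat @ bread           # sandwich covariance estimator
    penalty = np.trace(omega @ v_robust)
    return -2.0 * quasi + 2.0 * penalty, penalty


rng = np.random.default_rng(0)
beta_true = np.array([0.25, -0.25, -0.25])
X, y, cluster = simulate_clusters(100, beta_true, gamma=0.5, rng=rng)
x3 = rng.integers(0, 2, size=len(y)).astype(float)  # irrelevant covariate (illustrative)
candidates = {
    "x1": X[:, :2],
    "x1, x2": X,
    "x1, x2, x3": np.column_stack([X, x3]),
}
for name, Xc in candidates.items():
    b = fit_independence_logit(Xc, y)
    qic, penalty = qic_independence(Xc, y, cluster, b)
    print(f"{{{name}}}: QIC(I) = {qic:.2f}, trace term = {penalty:.2f}")
```

The candidate with the smallest printed QIC(I) would be the one selected. The sandwich formula in `qic_independence` is valid only for a fit obtained under R = I with the canonical logit link; computing QIC(R) for another working correlation would require the robust covariance from that fit, while the quasi-likelihood term is still evaluated under independence.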
AIC in Generalized Estimating Equations 123
Table 2
Frequency of the set of variables selected by QIC versus AIC for the marginal logistic model from 1000 independent replications. The true model has {x1, x2}, and AIC/CS is calculated correctly using the CS correlation matrix.
[Table body not recoverable from the source; only the column groups n = 50 and n = 100 and the row label "Criterion" survive.]
Table 3
QIC and robust p-values for each covariate in the top four models and the other two models with the WESDR data
Model
Covariate 1 2 3 4 8 10
Intraocular pressure
Systolic blood pressure
Pulse rate
Proteinuria
Duration of diabetes
Glycosylated hemoglobin
Diastolic blood pressure
Body mass index
(Duration of diabetes)^2
(Body mass index)^2
QIC(Ind)
the quasi-likelihood constructed under the working independence model and the naive and robust covariance estimates of estimated regression coefficients. Although using a more general quasi-likelihood seems possible, we choose to use the quasi-likelihood under the working independence model due to its simplicity. However, QIC allows one to use any general working correlation structure to estimate the parameters in GEE. In simulation studies, we found that QIC works well in variable selection and in selecting the working correlation matrix. We were particularly impressed with the performance of QIC(I) in variable selection. Further applications warrant future studies.

ACKNOWLEDGEMENTS

The author thanks Dr Huiman Barnhart for providing the WESDR data set. The author is grateful to Dr Lynn Eberly, two referees, an associate editor, and the editor for extremely thorough and helpful comments that greatly improved the article.

RÉSUMÉ

Correlated response data are common in biomedical studies. Regression analysis based on generalized estimating equations (GEE) is a method of growing importance for such data. However, few model-selection criteria seem to be available for GEE. The well-known Akaike information criterion (AIC) cannot be applied directly, since AIC is based on maximum likelihood estimation whereas GEE is based on quasi-likelihood. We propose a modification of AIC in which the likelihood is replaced by the quasi-likelihood and a suitable adjustment is made to the penalty term. Its performance is evaluated through simulation studies. As an illustration, the method is applied to a real data set.

REFERENCES

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings of the Second International Symposium on Information Theory, B. N. Petrov and F. Csaki (eds), 267-281. Budapest: Akademiai Kiado.
Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In Studies in Item Analysis and Prediction, Volume VI, Stanford Mathematical Studies in the Social Sciences, H. Solomon (ed.), 158-168. Stanford, California: Stanford University Press.
Barnhart, H. X. and Williamson, J. M. (1998). Goodness-of-fit tests for GEE modeling with binary data. Biometrics 54, 720-729.
Fitzmaurice, G. M. (1995). A caveat concerning independence estimating equations with multiple multivariate binary data. Biometrics 51, 309-317.
Hanfelt, J. J. and Liang, K.-Y. (1995). Approximate likelihood ratios for general estimating functions. Biometrika 82, 461-477.
Klein, R., Klein, B. E. K., Moss, S. E., Davis, M. D., and DeMets, D. L. (1984). The Wisconsin Epidemiologic Study of Diabetic Retinopathy: II. Prevalence and risk of diabetic retinopathy when age at diagnosis is less than 30 years. Archives of Ophthalmology 102, 520-526.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics 22, 79-86.
Lehmann, E. L. (1983). Theory of Point Estimation. New York: Wiley.
Li, B. (1993). A deviance function for the quasi-likelihood method. Biometrika 80, 741-753.
Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
Linhart, H. and Zucchini, W. (1986). Model Selection. New York: Wiley.
Mallows, C. L. (1973). Some comments on Cp. Technometrics 15, 661-675.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd edition. London: Chapman and Hall.
McDonald, B. W. (1993). Estimating logistic regression parameters for bivariate binary data. Journal of the Royal Statistical Society, Series B 55, 391-397.
Miller, A. J. (1990). Subset Selection in Regression. London: Chapman and Hall.
Pepe, M. S. and Anderson, G. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communications in Statistics, Series B 23, 939-951.
Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61, 439-447.
Zeger, S. L. (1988). The analysis of discrete longitudinal data: Commentary. Statistics in Medicine 7, 161-168.
Zeger, S. L., Liang, K.-Y., and Albert, P. S. (1988). Models for longitudinal data: A generalized estimating equation approach. Biometrics 42, 121-130.

Received June 1999. Revised December 1999 and June 2000. Accepted June 2000.