
Statistics Research Letters (SRL) Volume 2 Issue 4, November 2013, www.srl-journal.org

Approximate Prediction Intervals for Generalized Linear Mixed Models Having a Single Random Factor

Daniel R. Jeske*1, Cheng-Hsueh Yang2

Department of Statistics, University of California, Riverside, CA, USA

*1daniel.jeske@ucr.edu; 2cyang007@ucr.edu

Abstract

Three methods to construct prediction intervals in a generalized linear mixed model (GLMM) are based on pseudo-likelihood, Laplace, and quadrature approximations. All three of these methods are available in the SAS procedure GLIMMIX. The pseudo-likelihood method involves approximate linearization of the GLMM into a linear mixed model (LMM) framework, while the other two methods utilize approximate conditional mean squared error (MSE) formulas for the empirical best predictor (eBP). We propose a new method based on the unconditional MSE of the eBP, working entirely within the GLMM context, and we confront the inherent computational challenges by proposing a Monte Carlo algorithm to evaluate the plug-in estimator of the unconditional MSE. For three illustrative examples, the negative binomial, Poisson and Bernoulli GLMMs, numerical results show that our prediction interval methodology improves the coverage probability over the three methods available in GLIMMIX. Moreover, our results show that with bootstrap adjustments, our method achieves coverage probabilities satisfactorily close to the nominal level.

Keywords

Count Data; Best Prediction; Pseudo-likelihood; Laplace Approximation

Introduction

The literature on LMMs, including prediction intervals for linear combinations of the fixed and random effects, is extensive. Useful entry points to the classic literature on this topic include Jiang and Lahiri (2006), Harville (2008) and McCulloch, Searle and Neuhaus (2008). The SAS system has implemented much of the relevant literature on LMMs in the procedure MIXED (see, for example, Littell et al. (2006) and references therein).

In contrast, prediction intervals are a less well developed topic for the important class of GLMMs. The best known method is the one implemented in the SAS procedure GLIMMIX (see, for example, SAS Institute (2008)), based on Wolfinger and O'Connell's (1993) pseudo-likelihood approximations to LMMs. The GLIMMIX procedure also provides an option to obtain a prediction interval based on the Laplace approximations discussed in Booth and Hobert (1998), and a variation of that approach using quadrature approximations. The pseudo-likelihood approach involves linearization of the GLMM, while the Laplace and quadrature approaches are based on the conditional mean squared error (MSE) of the empirical best predictor (eBP) rather than the unconditional MSE. Each of these methods provides coverage probabilities that are reasonably close to nominal levels in large samples, but too low in small samples. These observations leave open the possibility that a prediction interval based on a direct (i.e., within the context of the GLMM framework) approximation to the unconditional MSE of the eBP may yield a better approximate prediction interval.

In this paper, we propose a new method to construct prediction intervals for GLMMs that have a single random factor capturing cluster effects in the data. The levels of the random factor could correspond to random block effects in an ANOVA design, random hospital effects in a clinical trial design, or random intercepts in a longitudinal data analysis design. Our GLMM context covers applications where the response variable is count data modeled by distributions such as the Poisson, negative binomial or Bernoulli. Our goal is to develop a better prediction interval method for linear combinations of the underlying fixed and random effects.
Let $y_{ij}$ denote the $j$-th sampling unit within the $i$-th cluster, for $i = 1, \ldots, m$ and $j = 1, \ldots, n_i$. Let $s = (s_1, \ldots, s_m)'$ denote the unobservable random cluster effects and let $\mu_{ij}$ denote the conditional (given $s_i$) mean of the $j$-th observation from the $i$-th cluster. Our GLMM is defined as follows:

a. Conditional on $s_i$, the observations from the $i$-th cluster $\{y_{ij}\}_{j=1}^{n_i}$ are independently distributed from distributions whose probability functions are denoted by $f(\cdot \mid \mu_{ij}, \phi)$.

b. A link function is defined as $g(\mu_{ij}) = x_{ij}'\beta + s_i$, where $x_{ij} = (1, x_{ij1}, \ldots, x_{ij,p-1})'$ is a vector of fixed covariates associated with the $j$-th observation in the $i$-th cluster and $\beta = (\beta_0, \beta_1, \ldots, \beta_{p-1})'$ is a vector of unknown parameters.

c. The random effects $\{s_i\}_{i=1}^{m}$ are independent and identically distributed from a $N(0, \sigma^2)$ distribution.

Although other definitions of GLMMs can be found (see, for example, McCulloch et al. 2008), the framework we use is popular for reasons including the flexibility to extend beyond independent random effects and the availability of PROC GLIMMIX to fit these models. The covariate vector can be used to optionally describe differences between the observations that are attributable to identifiable fixed effects such as treatment effects. The parameter $\phi$ may or may not be needed, depending on the model specification. If, for example, $f$ is a negative binomial distribution, $\phi$ represents the over-dispersion parameter, whereas if $f$ is a Poisson or Bernoulli distribution, $\phi$ is not needed in the model specification.

Our primary focus is on how to compute a prediction interval for $w = \lambda'\beta + \gamma's$, where $\lambda$ and $\gamma$ are known $p \times 1$ and $m \times 1$ vectors of constants, respectively. Define $\theta = (\beta, \sigma^2, \phi)$, where it is understood that the parameter $\phi$ may not be needed. Let the observations from the $i$-th cluster be collectively referred to as $y_i = (y_{i1}, y_{i2}, \ldots, y_{in_i})$, and let all the observations from all of the clusters be collectively referred to as $y = (y_1, \ldots, y_m)$.

Probability functions we will subsequently use are the conditional distribution of $y_i$, given $s_i$,
$$f(y_i \mid s_i; \theta) = \prod_{j=1}^{n_i} f(y_{ij} \mid \mu_{ij}, \phi),$$
the zero-mean Gaussian density for $s_i$, $\varphi(s_i; \sigma^2)$, the joint distribution of $y_i$ and $s_i$,
$$f(y_i, s_i; \theta) = f(y_i \mid s_i; \theta)\,\varphi(s_i; \sigma^2),$$
the marginal distribution of $y_i$,
$$f(y_i; \theta) = \int f(y_i, s_i; \theta)\, ds_i,$$
and the conditional distribution of $s_i$, given $y_i$,
$$f(s_i \mid y_i; \theta) = f(y_i, s_i; \theta)/f(y_i; \theta).$$
The conditional distribution of $s$, given $y$, can be expressed as
$$f(s \mid y; \theta) = \prod_{i=1}^{m} f(s_i \mid y_i; \theta).$$
The integrated likelihood function is
$$L(\theta \mid y) = \prod_{i=1}^{m} f(y_i; \theta)$$
and the maximum likelihood estimator of $\theta$ is defined as $\hat\theta = \arg\max_\theta L(\theta \mid y)$.
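All of these cluster-level integrals are one-dimensional, so they can be evaluated accurately with Gauss-Hermite quadrature (the Appendix makes this observation for every integral used below). The following minimal Python sketch, which is ours and not taken from the paper, illustrates the idea for the Poisson case with $\log \mu_{ij} = \beta_0 + s_i$; the function name and node count are our own choices.

```python
import numpy as np
from scipy.special import gammaln

def cluster_marginal_loglik(y_i, beta0, sigma2, n_nodes=40):
    """Gauss-Hermite approximation to log f(y_i; theta) for a Poisson GLMM
    with a single random intercept s_i ~ N(0, sigma2)."""
    # Probabilists' Hermite rule: integral h(x) exp(-x^2/2) dx ~ sum w_k h(x_k)
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    s = np.sqrt(sigma2) * x                      # nodes on the N(0, sigma2) scale
    mu = np.exp(beta0 + s)                       # conditional mean at each node
    # log f(y_i | s; theta) = sum_j [ y_ij*log(mu) - mu - log(y_ij!) ]
    cond_loglik = y_i.sum() * np.log(mu) - len(y_i) * mu - gammaln(y_i + 1).sum()
    # Divide by sqrt(2*pi) to turn the Hermite weight into the N(0,1) density
    return np.log(np.sum(w * np.exp(cond_loglik)) / np.sqrt(2.0 * np.pi))

y_i = np.array([3, 5, 2, 4, 6])
print(cluster_marginal_loglik(y_i, beta0=1.0, sigma2=0.4))
```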

Proposed Prediction Interval

Best Predictor and Approximate MSE

The minimum MSE predictor of $w$, usually referred to as the best predictor (BP), is $\mu(y;\theta) = \lambda'\beta + \gamma' E(s \mid y;\theta)$. In practice, $\theta$ is unknown and the predictor used is the so-called empirical best predictor (eBP), denoted as $\mu(y;\hat\theta)$. The prediction error of the eBP is $e = w - \mu(y;\hat\theta)$ and the exact MSE is $M(\theta) = E(e^2)$. Consider the decomposition
$$e = [\,w - \mu(y;\theta)\,] + [\,\mu(y;\theta) - \mu(y;\hat\theta)\,]. \qquad (1)$$
Using a conditioning argument, it can be shown that the two terms in (1) are uncorrelated (the first term has conditional mean zero given $y$, while the second term is a function of $y$ alone), and thus we have
$$M(\theta) = E[\,w - \mu(y;\theta)\,]^2 + E[\,\mu(y;\theta) - \mu(y;\hat\theta)\,]^2 = M_1(\theta) + M_2(\theta). \qquad (2)$$
The first term in (2) is simply
$$M_1(\theta) = E[\mathrm{Var}(w \mid y)] = E[\,\gamma'\,\mathrm{Var}(s \mid y;\theta)\,\gamma\,],$$
whereas the second term is more complicated. Let $d(y;\theta) = \partial\mu(y;\theta)/\partial\theta$ and define $A(\theta) = E[\,d(y;\theta)\,d(y;\theta)'\,]$ and $B(\theta) = I^{-1}(\theta)$, where $I(\theta)$ is Fisher's information matrix whose $ij$-th element is given by $-E[\,\partial^2 \log L(\theta \mid y)/\partial\theta_i\,\partial\theta_j\,]$. Following Kackar and Harville (1984), a second-order Taylor expansion of $\mu(y;\hat\theta)$ around $\hat\theta = \theta$ yields $\mu(y;\theta) - \mu(y;\hat\theta) \approx -d(y;\theta)'(\hat\theta - \theta)$ and then ultimately $M_2(\theta) \approx \mathrm{tr}[\,A(\theta)B(\theta)\,]$. An approximation to $M(\theta)$ is therefore
$$M(\theta) \approx E[\,\gamma'\,\mathrm{Var}(s \mid y;\theta)\,\gamma\,] + \mathrm{tr}[\,A(\theta)B(\theta)\,]. \qquad (3)$$

Estimation of Mean Squared Error

Consider estimating $M(\theta)$ by using the plug-in estimator $M(\hat\theta)$. In this section, we discuss the details of computing this estimator. In so doing, we will make repeated use of Algorithm 1, which is designed to receive $\hat\theta$ and an arbitrary function $q(y;\theta)$ (which for our use is a matrix-valued function) and return the plug-in estimator of $E_\theta[q(y;\theta)]$ through Monte Carlo evaluation of $E_{\hat\theta}[q(y;\hat\theta)]$, where $E_{\hat\theta}(\cdot)$ denotes the expectation with respect to the distribution $f(y;\hat\theta)$. Algorithm 1 is specifically suited for situations where $E_\theta[q(y;\theta)]$ does not have a closed-form expression. We also note that $q(y;\theta)$ itself may not have a closed-form expression.

Algorithm 1

1. For $k = 1$ to $K$ (we use $K = 1000$)
2. Simulate $s_i^{(k)}$ independently from a $N(0, \hat\sigma^2)$ distribution, $i = 1, \ldots, m$
3. Compute $\hat\mu_{ij}^{(k)} = g^{-1}(x_{ij}'\hat\beta + s_i^{(k)})$, $i = 1, \ldots, m$, $j = 1, \ldots, n_i$
4. Simulate $y_i^{(k)} = (y_{i1}^{(k)}, \ldots, y_{in_i}^{(k)})$, $i = 1, \ldots, m$, by generating the components independently from the distributions $f(\cdot \mid \hat\mu_{ij}^{(k)}; \hat\phi)$, and let $y^{(k)} = (y_1^{(k)}, \ldots, y_m^{(k)})$
5. Compute $q(y^{(k)}; \hat\theta)$
6. Next $k$
7. Return $\sum_{k=1}^{K} q(y^{(k)}; \hat\theta)/K$

The first term in (3) involves $E_\theta[\mathrm{Var}(s \mid y)]$. The plug-in estimator of this quantity can be obtained from Algorithm 1 by choosing $q(y;\theta)$ equal to the matrix $\mathrm{Var}(s \mid y)$. [Recall that $\mathrm{Var}(s \mid y)$ is a diagonal matrix with elements $\mathrm{Var}(s_i \mid y_i)$.] The second term in (3) separately involves both $A(\theta)$ and $B(\theta)$. Since $A(\theta) = E_\theta[\,d(y;\theta)\,d(y;\theta)'\,]$, Algorithm 1 will return $A(\hat\theta)$ by choosing $q(y;\theta)$ equal to the matrix $d(y;\theta)\,d(y;\theta)'$. Since $B(\theta) = I^{-1}(\theta)$, the plug-in estimator $B(\hat\theta)$ requires the plug-in estimator of the matrix $I(\theta) = -E_\theta[\,\partial^2 \log L(\theta \mid y)/\partial\theta_i\,\partial\theta_j\,]$. Algorithm 1 will return $I(\hat\theta)$ by choosing $q(y;\theta)$ equal to the observed information matrix $I_o(\theta) = -\partial^2 \log L(\theta \mid y)/\partial\theta_i\,\partial\theta_j$. It is noted that the Monte Carlo evaluation algorithm is used to approximate $\mathrm{tr}[\,A(\theta)B(\theta)\,]$ instead of $E_\theta[\,\mu(y;\theta) - \mu(y;\hat\theta)\,]^2$ because in the latter case we would need to find $\hat\theta$ in each iteration of the algorithm.
Approximate Prediction Interval

The reason for our interest in obtaining $M(\hat\theta)$ is to use it in the construction of a prediction interval for $w$ using the formula
$$\mu(y;\hat\theta) \pm z_{\alpha/2}\,\sqrt{M(\hat\theta)}. \qquad (4)$$
The Appendix summarizes all the calculations needed to compute (4). We are also interested in comparing the interval in (4) with the alternative interval
$$\mu(y;\hat\theta) \pm z_{\alpha/2}\,\sqrt{M_1(\hat\theta)}, \qquad (5)$$
based on the use of a (naive) estimator of MSE that ignores the expected increase in MSE caused by having to use the eBP instead of the BP.

Recognizing that neither $[\mu(y;\hat\theta) - w]/\sqrt{M(\hat\theta)}$ nor $[\mu(y;\hat\theta) - w]/\sqrt{M_1(\hat\theta)}$ may be adequately approximated by a $N(0,1)$ distribution, we also consider bootstrap percentile adjustments of (4) and (5). Algorithm 2 below shows how to obtain the bootstrap percentile adjustments.

Algorithm 2

1. Use the data $y$ to obtain parameter estimates $\hat\theta$ and eBPs of the cluster effects $s = (s_1, \ldots, s_m)'$. The eBPs $\hat s_i$ are obtained as described in Section 2.1, choosing $\lambda = 0$ and $\gamma = e_i$, where $e_i$ is zero except for a one in the $i$-th position.
2. Compute $\hat\mu_{ij} = g^{-1}(x_{ij}'\hat\beta + \hat s_i)$, $i = 1, \ldots, m$, $j = 1, \ldots, n_i$
3. For $k = 1$ to $B$ (we use $B = 1000$)
4. Simulate a conditional bootstrap data set $y^{(k)}$, fixing $\hat s$, by generating the components of $y_i^{(k)}$, $i = 1, \ldots, m$, $j = 1, \ldots, n_i$, independently from the distributions $f(\cdot \mid \hat\mu_{ij}; \hat\phi)$, and compute the bootstrap estimator $\hat\theta^{(k)}$
5. Compute $Z_k = [\mu(y^{(k)}; \hat\theta^{(k)}) - \mu(y;\hat\theta)]/\sqrt{M(\hat\theta^{(k)})}$ if using interval (4), or instead $Z_k = [\mu(y^{(k)}; \hat\theta^{(k)}) - \mu(y;\hat\theta)]/\sqrt{M_1(\hat\theta^{(k)})}$ if using interval (5)
6. Next $k$
7. Extract lower and upper $\alpha/2$ percentiles, $L_{\alpha/2}$ and $U_{\alpha/2}$ respectively, from the quantities $\{Z_k\}_{k=1}^{B}$
8. Construct the bootstrap percentile interval
$$\left[\,\mu(y;\hat\theta) - U_{\alpha/2}\sqrt{M(\hat\theta)}\,,\; \mu(y;\hat\theta) - L_{\alpha/2}\sqrt{M(\hat\theta)}\,\right]$$
if using interval (4), or
$$\left[\,\mu(y;\hat\theta) - U_{\alpha/2}\sqrt{M_1(\hat\theta)}\,,\; \mu(y;\hat\theta) - L_{\alpha/2}\sqrt{M_1(\hat\theta)}\,\right]$$
if using interval (5).
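The sketch below implements the mechanics of Algorithm 2 in Python; it is our own illustration, under the assumption that model-specific fitting, prediction, MSE, and conditional-simulation routines are supplied by the caller. The toy Gaussian stand-ins at the end only demonstrate the calling convention.

```python
import numpy as np

rng = np.random.default_rng(2)

def algorithm2(y, theta_hat, fit, predict, mse, simulate, alpha=0.05, B=1000):
    """Bootstrap percentile adjustment of interval (4).
    fit(y*) -> theta*; predict(y, theta) -> mu(y; theta); mse(theta) -> M(theta);
    simulate(theta_hat) -> data set drawn conditionally, with the eBPs held fixed."""
    mu_hat = predict(y, theta_hat)
    z = np.empty(B)
    for k in range(B):                                          # steps 3-6
        y_star = simulate(theta_hat)
        theta_star = fit(y_star)
        z[k] = (predict(y_star, theta_star) - mu_hat) / np.sqrt(mse(theta_star))
    lo, up = np.quantile(z, [alpha / 2.0, 1.0 - alpha / 2.0])   # step 7
    root = np.sqrt(mse(theta_hat))
    return mu_hat - up * root, mu_hat - lo * root               # step 8

# Toy Gaussian stand-ins, purely to show the interface.
y0 = rng.normal(1.0, 1.0, size=50)
print(algorithm2(y0, y0.mean(),
                 fit=lambda ys: ys.mean(),
                 predict=lambda ys, th: th,
                 mse=lambda th: 1.0 / 50,
                 simulate=lambda th: rng.normal(th, 1.0, size=50), B=500))
```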


Illustrative Applications

Negative Binomial GLMM

Here we have
$$f(y_{ij} \mid \mu_{ij}, \phi) = \frac{\Gamma(y_{ij} + \phi)}{\Gamma(y_{ij} + 1)\,\Gamma(\phi)} \left(\frac{\mu_{ij}}{\mu_{ij} + \phi}\right)^{y_{ij}} \left(\frac{\phi}{\mu_{ij} + \phi}\right)^{\phi}$$
and the typical link function is $g(u) = \log u$. For our illustration, we assume $\log \mu_{ij} = \beta_0 + s_i$ and consider a prediction interval for $w = \beta_0 + s_i$. In this case we have $\theta = (\beta_0, \sigma^2, \phi)$ and
$$f(y_i, s_i; \theta) = \prod_{j=1}^{n_i} \left[ \frac{\Gamma(y_{ij} + \phi)}{\Gamma(y_{ij} + 1)\,\Gamma(\phi)} \left(\frac{\mu_{ij}}{\mu_{ij} + \phi}\right)^{y_{ij}} \left(\frac{\phi}{\mu_{ij} + \phi}\right)^{\phi} \right] \varphi(s_i; \sigma^2).$$
Following the computational summary outlined in the Appendix, $E(s_i \mid y_i; \theta)$ can be obtained from (A.1) and used to evaluate $\mu(y;\theta) = \beta_0 + E(s_i \mid y_i; \theta)$. The likelihood function is
$$L(\theta \mid y) = \prod_{i=1}^{m} \int \prod_{j=1}^{n_i} \frac{\Gamma(y_{ij} + \phi)}{\Gamma(y_{ij} + 1)\,\Gamma(\phi)} \left(\frac{e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi}\right)^{y_{ij}} \left(\frac{\phi}{e^{\beta_0 + s_i} + \phi}\right)^{\phi} \varphi(s_i; \sigma^2)\, ds_i,$$
the MLE is obtained as $\hat\theta = \arg\max_\theta L(\theta \mid y)$, and $\hat\theta$ can then be used to evaluate $\mu(y;\hat\theta)$.

To evaluate $M(\hat\theta)$, we use Algorithm 1 by means of referenced formulas in the Appendix to derive the required inputs. $\mathrm{Var}(s_i \mid y_i; \theta)$ can be computed directly using equation (A.2). Let $\psi(u)$ denote the digamma function and write $y_{i\cdot} = \sum_{j=1}^{n_i} y_{ij}$. Computation of $\partial\mu(y;\theta)/\partial\beta_0$, $\partial\mu(y;\theta)/\partial\sigma^2$ and $\partial\mu(y;\theta)/\partial\phi$ is enabled by (A.3), with the required pieces (A.4)-(A.6) given by
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \beta_0} = \frac{\phi\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{e^{\beta_0 + s_i} + \phi}\, f(y_i, s_i; \theta),$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) f(y_i, s_i; \theta),$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \phi} = \left[ \sum_{j=1}^{n_i} \psi(y_{ij} + \phi) - n_i \psi(\phi) + n_i \log\frac{\phi}{e^{\beta_0 + s_i} + \phi} - \frac{y_{i\cdot} - n_i e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi} \right] f(y_i, s_i; \theta),$$
and (A.7)-(A.9) given by integrating each of these expressions over $s_i$.

Finally, the observed information matrix quantities are calculated from (A.10)-(A.15). The quantities needed for these formulas are (A.16)-(A.21), respectively given by
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0^2} = \left\{ \left[\frac{\phi\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{e^{\beta_0 + s_i} + \phi}\right]^2 - \frac{\phi\, e^{\beta_0 + s_i}\,(y_{i\cdot} + n_i \phi)}{(e^{\beta_0 + s_i} + \phi)^2} \right\} f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0\, \partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \frac{\phi\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{e^{\beta_0 + s_i} + \phi}\, f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0\, \partial \phi} = \left\{ \frac{\phi\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{e^{\beta_0 + s_i} + \phi} \left[ \sum_{j=1}^{n_i} \psi(y_{ij} + \phi) - n_i \psi(\phi) + n_i \log\frac{\phi}{e^{\beta_0 + s_i} + \phi} - \frac{y_{i\cdot} - n_i e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi} \right] + \frac{e^{\beta_0 + s_i}\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{(e^{\beta_0 + s_i} + \phi)^2} \right\} f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial (\sigma^2)^2} = \left\{ \left[\frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right)\right]^2 - \frac{s_i^2}{\sigma^6} + \frac{1}{2\sigma^4} \right\} f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \sigma^2\, \partial \phi} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \left[ \sum_{j=1}^{n_i} \psi(y_{ij} + \phi) - n_i \psi(\phi) + n_i \log\frac{\phi}{e^{\beta_0 + s_i} + \phi} - \frac{y_{i\cdot} - n_i e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi} \right] f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \phi^2} = \left\{ \left[ \sum_{j=1}^{n_i} \psi(y_{ij} + \phi) - n_i \psi(\phi) + n_i \log\frac{\phi}{e^{\beta_0 + s_i} + \phi} - \frac{y_{i\cdot} - n_i e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi} \right]^2 + \sum_{j=1}^{n_i} \psi'(y_{ij} + \phi) - n_i \psi'(\phi) + \frac{\phi\, y_{i\cdot} + n_i e^{2(\beta_0 + s_i)}}{\phi\,(e^{\beta_0 + s_i} + \phi)^2} \right\} f(y_i, s_i; \theta),$$
where $\psi'(u)$ denotes the trigamma function.
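As a check on the $\phi$-derivative above, the digamma-based score of a single negative binomial observation can be compared against a numerical derivative. The Python sketch below is ours and uses the $(\mu, \phi)$ parameterization of the text.

```python
import numpy as np
from scipy.special import gammaln, digamma

def nb_logpmf(y, mu, phi):
    """log f(y | mu, phi) for the negative binomial of the text."""
    return (gammaln(y + phi) - gammaln(y + 1) - gammaln(phi)
            + y * np.log(mu / (mu + phi)) + phi * np.log(phi / (mu + phi)))

def nb_score_phi(y, mu, phi):
    """Analytic d/dphi of nb_logpmf; summing over j gives the bracketed
    factor in the partial derivative of f(y_i, s_i; theta) above."""
    return (digamma(y + phi) - digamma(phi)
            + np.log(phi / (mu + phi)) + (mu - y) / (mu + phi))

y, mu, phi, h = 4.0, 3.2, 2.0, 1e-6
numeric = (nb_logpmf(y, mu, phi + h) - nb_logpmf(y, mu, phi - h)) / (2 * h)
print(numeric, nb_score_phi(y, mu, phi))   # the two values should agree closely
```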

Poisson GLMM

Here we have $f(y_{ij} \mid \mu_{ij}) = e^{-\mu_{ij}}\, \mu_{ij}^{y_{ij}}/y_{ij}!$ and the typical link function is $g(u) = \log u$. We again assume $\log \mu_{ij} = \beta_0 + s_i$ and consider a prediction interval for $w = \beta_0 + s_i$. In this case we have $\theta = (\beta_0, \sigma^2)$ and
$$f(y_i, s_i; \theta) = \exp\left[-n_i e^{\beta_0 + s_i} + (\beta_0 + s_i)\, y_{i\cdot}\right] \varphi(s_i; \sigma^2) \Big/ \prod_{j=1}^{n_i} y_{ij}!\,.$$
$E(s_i \mid y_i; \theta)$ can be obtained from (A.1) and used to evaluate $\mu(y;\theta) = \beta_0 + E(s_i \mid y_i; \theta)$. The likelihood function is
$$L(\theta \mid y) \propto \prod_{i=1}^{m} \int \exp\left[-n_i e^{\beta_0 + s_i} + (\beta_0 + s_i)\, y_{i\cdot}\right] \varphi(s_i; \sigma^2)\, ds_i,$$
the MLE is obtained as $\hat\theta = \arg\max_\theta L(\theta \mid y)$, and $\hat\theta$ can then be used to evaluate $\mu(y;\hat\theta)$.

Because the negative binomial distribution becomes the Poisson distribution when $\phi = \infty$, all of the results in the negative binomial section pertaining to evaluating $M(\hat\theta)$ apply to the Poisson GLMM case when the following two conventions are adopted: i) use the above formula for $f(y_i, s_i; \theta)$, and ii) only use the derivative equations that are with respect to $\beta_0$ and/or $\sigma^2$, taking their limiting form as $\phi \to \infty$.
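The limiting convention can be verified numerically: with success probability $\phi/(\mu + \phi)$, the negative binomial pmf converges to the Poisson pmf as $\phi$ grows. A small check (ours, using SciPy) is:

```python
import numpy as np
from scipy.stats import nbinom, poisson

mu, y = 3.0, np.arange(15)
for phi in (1.0, 10.0, 1000.0):
    p = phi / (mu + phi)                      # nbinom(n=phi, p) has mean mu
    gap = np.max(np.abs(nbinom.pmf(y, phi, p) - poisson.pmf(y, mu)))
    print(phi, gap)                           # gap shrinks as phi -> infinity
```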


Bernoulli GLMM

Here we have $f(y_{ij} \mid \mu_{ij}) = \mu_{ij}^{y_{ij}} (1 - \mu_{ij})^{1 - y_{ij}}$ and the typical link function is $g(u) = \log\left(u/(1 - u)\right)$. For our illustration, we assume $\log\left[\mu_{ij}/(1 - \mu_{ij})\right] = \beta_0 + s_i$ and consider a prediction interval for $w = \beta_0 + s_i$. In this case we have $\theta = (\beta_0, \sigma^2)$ and
$$f(y_i, s_i; \theta) = \frac{e^{(\beta_0 + s_i)\, y_{i\cdot}}}{(1 + e^{\beta_0 + s_i})^{n_i}}\, \varphi(s_i; \sigma^2).$$
Following the computational summary outlined in the Appendix, $E(s_i \mid y_i; \theta)$ can be obtained from (A.1) and used to evaluate $\mu(y;\theta) = \beta_0 + E(s_i \mid y_i; \theta)$. The likelihood function is
$$L(\theta \mid y) = \prod_{i=1}^{m} \int \frac{e^{(\beta_0 + s_i)\, y_{i\cdot}}}{(1 + e^{\beta_0 + s_i})^{n_i}}\, \varphi(s_i; \sigma^2)\, ds_i,$$
the MLE is obtained as $\hat\theta = \arg\max_\theta L(\theta \mid y)$, and $\hat\theta$ can then be used to evaluate $\mu(y;\hat\theta)$.

To evaluate $M(\hat\theta)$, we use Algorithm 1 by means of referenced formulas in the Appendix to derive the required inputs. $\mathrm{Var}(s_i \mid y_i; \theta)$ can be computed directly using equation (A.2). Computation of $\partial\mu(y;\theta)/\partial\beta_0$ and $\partial\mu(y;\theta)/\partial\sigma^2$ is enabled by (A.3), with the required pieces (A.4)-(A.5) given by
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \beta_0} = \left( y_{i\cdot} - \frac{n_i e^{\beta_0 + s_i}}{1 + e^{\beta_0 + s_i}} \right) f(y_i, s_i; \theta),$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) f(y_i, s_i; \theta),$$
and (A.7)-(A.8) given by
$$\frac{\partial f(y_i; \theta)}{\partial \beta_0} = \int \left( y_{i\cdot} - \frac{n_i e^{\beta_0 + s_i}}{1 + e^{\beta_0 + s_i}} \right) f(y_i, s_i; \theta)\, ds_i,$$
$$\frac{\partial f(y_i; \theta)}{\partial \sigma^2} = \int \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) f(y_i, s_i; \theta)\, ds_i.$$
Finally, the observed information matrix quantities are calculated from (A.10), (A.11) and (A.13). The quantities needed for these formulas are (A.16), (A.17) and (A.19), respectively given by
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0^2} = \left\{ \left( y_{i\cdot} - \frac{n_i e^{\beta_0 + s_i}}{1 + e^{\beta_0 + s_i}} \right)^2 - \frac{n_i e^{\beta_0 + s_i}}{(1 + e^{\beta_0 + s_i})^2} \right\} f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0\, \partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \left( y_{i\cdot} - \frac{n_i e^{\beta_0 + s_i}}{1 + e^{\beta_0 + s_i}} \right) f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial (\sigma^2)^2} = \left\{ \left[\frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right)\right]^2 - \frac{s_i^2}{\sigma^6} + \frac{1}{2\sigma^4} \right\} f(y_i, s_i; \theta).$$
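For the Bernoulli model the posterior moments (A.1)-(A.2) have no closed form, but the one-dimensional integrals are straightforward with Gauss-Hermite quadrature. The Python sketch below is our illustration; normalizing constants cancel in the ratios.

```python
import numpy as np

def posterior_moments(y_i, beta0, sigma2, n_nodes=60):
    """E(s_i | y_i; theta) and Var(s_i | y_i; theta) for the logit GLMM,
    per (A.1)-(A.2), via probabilists' Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    s = np.sqrt(sigma2) * x
    eta = beta0 + s
    # f(y_i | s; theta) with y_i. successes out of n_i Bernoulli trials
    lik = np.exp(y_i.sum() * eta - len(y_i) * np.log1p(np.exp(eta)))
    norm = np.sum(w * lik)                    # proportional to f(y_i; theta)
    e1 = np.sum(w * s * lik) / norm           # E(s_i | y_i)
    e2 = np.sum(w * s ** 2 * lik) / norm      # E(s_i^2 | y_i)
    return e1, e2 - e1 ** 2

y_i = np.array([1, 0, 1, 1, 0])
print(posterior_moments(y_i, beta0=-0.5, sigma2=2.0))
```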

Performance Comparisons

We use the illustrative applications in Section 3 to compare the coverage probability and the expected width of the proposed prediction interval with the three alternative methods that are available in SAS. Section 4.1 briefly reviews what these three methods are, Section 4.2 outlines the simulation study that was used to compare the prediction intervals, and the results of the comparison study are summarized in Section 4.3.

Prediction Intervals in SAS

The SAS procedure PROC GLIMMIX computes prediction intervals for GLMMs using one of the following three methods: pseudo-likelihood (PL), Laplace (L), and quadrature (Q). In all three methods, an estimate of the predictor and its associated precision is used to construct a $100(1 - \alpha)\%$ prediction interval using normal percentiles.

The PL method is based on Wolfinger and O'Connell


(1993) who proposed an algorithm to calculate the
parameter estimates and the values of fixed and
random effects. The main idea of this algorithm is to
approximate the GLMM as a LMM through use of a
89

www.srl-journal.org

Statistics Research Letters (SRL) Volume 2 Issue 4, November 2013

pseudo-variable obtained through a Taylor series


expansion. The algorithm iterates between updates of
the pseudo-variable and parameter estimator that
result from the LMM computations.
The Laplace method is described in Booth and Hobert
(1998) who utilized a Laplace approximation based on
the work of de Bruijn (1981) to approximate the value
of the BP. An iterative strategy is employed to obtain
the eBP, first approximated using current values of
parameters, and then updated by approximating the
likelihood using another Laplace approximation. This
process continues until convergence is achieved. The
precision of the eBP is evaluated using a Taylor series
approximation to the conditional mean squared error
(CMSE) derived in Booth and Hobert (1998). Zhao et
al. (2006) and Skrondal and Rabe-Hesketh (2009) have
also advocated CMSE as a suitable measure of
precision.
The quadrature method also calculates the BP using a
Laplace approximation, however, the likelihood
function is approximated by an adaptive quadrature
approximation [see, for example, Golub and Welsch
(1969), Abramowitz and Stegun (1972) and Pinheiro
and Chao (2006)]. The advantage of the adaptive
quadrature approximation is to improve the
approximation of the likelihood function by centering
and scaling the quadrature points. Again, the same
iterative strategy as the Laplace method is employed
until convergence criteria is met and CMSE is used to
measure the precision of the eBP. It is worth noting
that because the estimated CMSE is a function of
parameter estimates, its value is not the same for the
Laplace and quadrature methods since these methods
calculate parameter estimates differently.
Simulation Study Design

To compare the proposed prediction intervals with the intervals in SAS, a simulation study was performed to evaluate the coverage probability and expected width of the alternative intervals. In terms of alternative GLMMs, the three illustrative examples presented in Section 3 were considered. In terms of the proposed prediction intervals, both (4) and (5) were included in the simulation study along with their bootstrap adjusted versions, and those four intervals were compared to the three intervals in SAS obtained by using the PL, L and Q methods.

For simulation parameters, we took $m \in \{10, 20\}$ and set $n_i \equiv n$ for $n \in \{5, 10\}$. When considering the negative binomial GLMM, we varied $\phi \in \{1, 2, 5, \infty\}$; the case $\phi = \infty$ corresponds to simulating from the Poisson GLMM. We also varied $\beta_0 \in \{1, 2\}$ and $\sigma^2 \in \{0.2, 0.4\}$. These combinations of parameter values correspond to the response variable having a mean that ranges between 2 and 8 and a variance that ranges between 3 and 94. For the Bernoulli GLMM, we varied $\beta_0 \in \{-2.5, -0.5, 0.5, 2\}$ and $\sigma^2 \in \{0.2, 2\}$. These choices of parameter values correspond to success probabilities that range from about 0.2 to 0.8. For each scenario of parameter settings and sample size values, we simulated 1000 data sets from the GLMM and then evaluated each of the alternative prediction intervals. The percentage of prediction intervals that covered $w = \beta_0 + s_i$ was recorded for each method. With coverage probabilities on the order of 0.95, the standard error of the estimated coverage probabilities is less than 0.01.
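A skeleton of the coverage computation, in Python, is shown below; it is our own sketch, and the crude fixed-width interval used in the demonstration is only a placeholder for the seven methods compared in the study.

```python
import numpy as np

rng = np.random.default_rng(3)

def coverage(make_interval, m=10, n=5, beta0=1.0, sigma2=0.2, n_sim=1000):
    """Monte Carlo estimate of coverage for w = beta0 + s_1 under a Poisson GLMM."""
    hits = 0
    for _ in range(n_sim):
        s = rng.normal(0.0, np.sqrt(sigma2), size=m)
        y = rng.poisson(np.exp(beta0 + s)[:, None], size=(m, n))
        lo, up = make_interval(y)
        hits += (lo <= beta0 + s[0] <= up)
    return hits / n_sim

# Placeholder interval: log of cluster-1 sample mean plus/minus 1.
naive = lambda y: (np.log(y[0].mean() + 0.5) - 1.0, np.log(y[0].mean() + 0.5) + 1.0)
print(coverage(naive))   # with n_sim = 1000, the SE of the estimate is < 0.01
```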

TABLE 1 NEGATIVE BINOMIAL AND POISSON GLMM COVERAGE PROBABILITIES FOR (m, n) = (10, 5). NOMINAL COVERAGE IS 0.95.

| $\phi$ | $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 0.900 | 0.938 | 0.885 | 0.933 | 0.879 | 0.881 | 0.882 |
| 1 | 1 | 0.4 | 0.902 | 0.939 | 0.889 | 0.934 | 0.873 | 0.875 | 0.876 |
| 1 | 2 | 0.2 | 0.905 | 0.936 | 0.901 | 0.931 | 0.875 | 0.876 | 0.876 |
| 1 | 2 | 0.4 | 0.899 | 0.934 | 0.896 | 0.928 | 0.876 | 0.876 | 0.877 |
| 2 | 1 | 0.2 | 0.908 | 0.938 | 0.905 | 0.935 | 0.869 | 0.871 | 0.873 |
| 2 | 1 | 0.4 | 0.911 | 0.940 | 0.907 | 0.936 | 0.871 | 0.872 | 0.872 |
| 2 | 2 | 0.2 | 0.905 | 0.941 | 0.899 | 0.939 | 0.868 | 0.870 | 0.871 |
| 2 | 2 | 0.4 | 0.902 | 0.939 | 0.895 | 0.936 | 0.873 | 0.874 | 0.875 |
| 5 | 1 | 0.2 | 0.898 | 0.943 | 0.892 | 0.938 | 0.878 | 0.879 | 0.881 |
| 5 | 1 | 0.4 | 0.894 | 0.941 | 0.889 | 0.937 | 0.875 | 0.876 | 0.877 |
| 5 | 2 | 0.2 | 0.896 | 0.943 | 0.892 | 0.939 | 0.872 | 0.873 | 0.874 |
| 5 | 2 | 0.4 | 0.893 | 0.942 | 0.887 | 0.936 | 0.876 | 0.877 | 0.879 |
| $\infty$ | 1 | 0.2 | 0.901 | 0.942 | 0.895 | 0.937 | 0.880 | 0.881 | 0.882 |
| $\infty$ | 1 | 0.4 | 0.899 | 0.940 | 0.894 | 0.936 | 0.882 | 0.883 | 0.885 |
| $\infty$ | 2 | 0.2 | 0.892 | 0.941 | 0.887 | 0.938 | 0.879 | 0.881 | 0.882 |
| $\infty$ | 2 | 0.4 | 0.893 | 0.943 | 0.890 | 0.939 | 0.878 | 0.879 | 0.881 |

TABLE 2 NEGATIVE BINOMIAL AND POISSON GLMM COVERAGE PROBABILITIES FOR (m, n) = (10, 10). NOMINAL COVERAGE IS 0.95.

| $\phi$ | $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 0.909 | 0.941 | 0.895 | 0.936 | 0.892 | 0.894 | 0.895 |
| 1 | 1 | 0.4 | 0.912 | 0.942 | 0.909 | 0.938 | 0.890 | 0.891 | 0.891 |
| 1 | 2 | 0.2 | 0.914 | 0.939 | 0.912 | 0.935 | 0.903 | 0.904 | 0.905 |
| 1 | 2 | 0.4 | 0.916 | 0.943 | 0.913 | 0.940 | 0.906 | 0.906 | 0.906 |
| 2 | 1 | 0.2 | 0.911 | 0.939 | 0.908 | 0.936 | 0.908 | 0.910 | 0.911 |
| 2 | 1 | 0.4 | 0.915 | 0.942 | 0.912 | 0.940 | 0.904 | 0.906 | 0.906 |
| 2 | 2 | 0.2 | 0.920 | 0.940 | 0.916 | 0.938 | 0.898 | 0.901 | 0.902 |
| 2 | 2 | 0.4 | 0.917 | 0.939 | 0.915 | 0.935 | 0.896 | 0.897 | 0.899 |
| 5 | 1 | 0.2 | 0.915 | 0.943 | 0.914 | 0.942 | 0.899 | 0.901 | 0.903 |
| 5 | 1 | 0.4 | 0.918 | 0.941 | 0.916 | 0.938 | 0.901 | 0.902 | 0.904 |
| 5 | 2 | 0.2 | 0.909 | 0.941 | 0.905 | 0.937 | 0.902 | 0.904 | 0.905 |
| 5 | 2 | 0.4 | 0.907 | 0.938 | 0.903 | 0.935 | 0.905 | 0.906 | 0.907 |
| $\infty$ | 1 | 0.2 | 0.919 | 0.941 | 0.915 | 0.939 | 0.897 | 0.899 | 0.901 |
| $\infty$ | 1 | 0.4 | 0.921 | 0.943 | 0.918 | 0.940 | 0.893 | 0.894 | 0.896 |
| $\infty$ | 2 | 0.2 | 0.920 | 0.940 | 0.917 | 0.938 | 0.889 | 0.891 | 0.893 |
| $\infty$ | 2 | 0.4 | 0.916 | 0.938 | 0.913 | 0.935 | 0.901 | 0.903 | 0.904 |

TABLE 3 NEGATIVE BINOMIAL AND POISSON GLMM COVERAGE PROBABILITIES FOR (m, n) = (20, 5). NOMINAL COVERAGE IS 0.95.

| $\phi$ | $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 0.916 | 0.936 | 0.912 | 0.933 | 0.912 | 0.914 | 0.915 |
| 1 | 1 | 0.4 | 0.917 | 0.938 | 0.914 | 0.935 | 0.918 | 0.920 | 0.921 |
| 1 | 2 | 0.2 | 0.921 | 0.940 | 0.919 | 0.938 | 0.909 | 0.911 | 0.912 |
| 1 | 2 | 0.4 | 0.923 | 0.941 | 0.922 | 0.940 | 0.913 | 0.913 | 0.915 |
| 2 | 1 | 0.2 | 0.926 | 0.939 | 0.925 | 0.937 | 0.914 | 0.915 | 0.917 |
| 2 | 1 | 0.4 | 0.924 | 0.937 | 0.923 | 0.935 | 0.918 | 0.922 | 0.924 |
| 2 | 2 | 0.2 | 0.915 | 0.942 | 0.912 | 0.940 | 0.920 | 0.921 | 0.923 |
| 2 | 2 | 0.4 | 0.912 | 0.938 | 0.908 | 0.936 | 0.917 | 0.919 | 0.921 |
| 5 | 1 | 0.2 | 0.925 | 0.941 | 0.922 | 0.939 | 0.915 | 0.917 | 0.919 |
| 5 | 1 | 0.4 | 0.928 | 0.944 | 0.925 | 0.941 | 0.911 | 0.911 | 0.913 |
| 5 | 2 | 0.2 | 0.919 | 0.942 | 0.917 | 0.940 | 0.921 | 0.922 | 0.924 |
| 5 | 2 | 0.4 | 0.917 | 0.940 | 0.915 | 0.939 | 0.918 | 0.919 | 0.919 |
| $\infty$ | 1 | 0.2 | 0.918 | 0.938 | 0.915 | 0.937 | 0.915 | 0.916 | 0.918 |
| $\infty$ | 1 | 0.4 | 0.921 | 0.941 | 0.918 | 0.938 | 0.913 | 0.916 | 0.917 |
| $\infty$ | 2 | 0.2 | 0.921 | 0.940 | 0.920 | 0.938 | 0.907 | 0.909 | 0.910 |
| $\infty$ | 2 | 0.4 | 0.925 | 0.942 | 0.921 | 0.940 | 0.910 | 0.912 | 0.912 |

TABLE 4 NEGATIVE BINOMIAL AND POISSON GLMM COVERAGE PROBABILITIES FOR (m, n) = (20, 10). NOMINAL COVERAGE IS 0.95.

| $\phi$ | $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 0.919 | 0.944 | 0.915 | 0.942 | 0.921 | 0.922 | 0.924 |
| 1 | 1 | 0.4 | 0.923 | 0.948 | 0.918 | 0.946 | 0.925 | 0.926 | 0.926 |
| 1 | 2 | 0.2 | 0.925 | 0.943 | 0.924 | 0.942 | 0.918 | 0.919 | 0.921 |
| 1 | 2 | 0.4 | 0.929 | 0.947 | 0.927 | 0.946 | 0.923 | 0.923 | 0.924 |
| 2 | 1 | 0.2 | 0.930 | 0.948 | 0.927 | 0.945 | 0.925 | 0.926 | 0.926 |
| 2 | 1 | 0.4 | 0.928 | 0.945 | 0.926 | 0.943 | 0.924 | 0.927 | 0.929 |
| 2 | 2 | 0.2 | 0.926 | 0.946 | 0.923 | 0.944 | 0.925 | 0.926 | 0.927 |
| 2 | 2 | 0.4 | 0.923 | 0.945 | 0.921 | 0.942 | 0.923 | 0.924 | 0.926 |
| 5 | 1 | 0.2 | 0.926 | 0.943 | 0.925 | 0.942 | 0.929 | 0.931 | 0.932 |
| 5 | 1 | 0.4 | 0.929 | 0.947 | 0.928 | 0.946 | 0.926 | 0.927 | 0.927 |
| 5 | 2 | 0.2 | 0.925 | 0.942 | 0.924 | 0.941 | 0.926 | 0.927 | 0.929 |
| 5 | 2 | 0.4 | 0.927 | 0.944 | 0.926 | 0.943 | 0.921 | 0.923 | 0.925 |
| $\infty$ | 1 | 0.2 | 0.926 | 0.945 | 0.925 | 0.944 | 0.920 | 0.920 | 0.923 |
| $\infty$ | 1 | 0.4 | 0.925 | 0.944 | 0.924 | 0.942 | 0.924 | 0.924 | 0.925 |
| $\infty$ | 2 | 0.2 | 0.924 | 0.943 | 0.923 | 0.940 | 0.919 | 0.921 | 0.922 |
| $\infty$ | 2 | 0.4 | 0.926 | 0.946 | 0.924 | 0.944 | 0.923 | 0.923 | 0.925 |

TABLE 5 NEGATIVE BINOMIAL AND POISSON GLMM EXPECTED WIDTHS. NOMINAL COVERAGE IS 0.95. COLUMN PAIRS GIVE (4) w/BS AND (5) w/BS FOR EACH (m, n).

| $\phi$ | $\beta_0$ | $\sigma^2$ | (10,5) (4) w/BS | (10,5) (5) w/BS | (10,10) (4) w/BS | (10,10) (5) w/BS | (20,5) (4) w/BS | (20,5) (5) w/BS | (20,10) (4) w/BS | (20,10) (5) w/BS |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 2.348 | 2.298 | 1.338 | 1.289 | 0.787 | 0.713 | 0.489 | 0.463 |
| 1 | 1 | 0.4 | 2.576 | 2.498 | 1.548 | 1.493 | 0.812 | 0.795 | 0.492 | 0.471 |
| 1 | 2 | 0.2 | 2.987 | 2.893 | 1.632 | 1.583 | 0.825 | 0.806 | 0.503 | 0.493 |
| 1 | 2 | 0.4 | 3.042 | 2.983 | 1.738 | 1.695 | 0.831 | 0.828 | 0.509 | 0.501 |
| 2 | 1 | 0.2 | 2.563 | 2.478 | 1.284 | 1.238 | 0.756 | 0.748 | 0.499 | 0.488 |
| 2 | 1 | 0.4 | 2.681 | 2.512 | 1.292 | 1.258 | 0.779 | 0.763 | 0.512 | 0.503 |
| 2 | 2 | 0.2 | 3.034 | 2.983 | 1.318 | 1.263 | 0.781 | 0.772 | 0.516 | 0.510 |
| 2 | 2 | 0.4 | 3.142 | 3.015 | 1.322 | 1.301 | 0.792 | 0.781 | 0.522 | 0.518 |
| 5 | 1 | 0.2 | 2.487 | 2.397 | 1.326 | 1.298 | 0.748 | 0.732 | 0.475 | 0.468 |
| 5 | 1 | 0.4 | 2.534 | 2.498 | 1.341 | 1.328 | 0.751 | 0.738 | 0.479 | 0.470 |
| 5 | 2 | 0.2 | 2.834 | 2.795 | 1.358 | 1.348 | 0.762 | 0.749 | 0.485 | 0.479 |
| 5 | 2 | 0.4 | 2.931 | 2.887 | 1.361 | 1.351 | 0.768 | 0.759 | 0.489 | 0.481 |
| $\infty$ | 1 | 0.2 | 2.503 | 2.498 | 1.258 | 1.223 | 0.792 | 0.782 | 0.453 | 0.441 |
| $\infty$ | 1 | 0.4 | 2.612 | 2.583 | 1.301 | 1.286 | 0.813 | 0.802 | 0.455 | 0.446 |
| $\infty$ | 2 | 0.2 | 2.883 | 2.862 | 1.321 | 1.305 | 0.829 | 0.816 | 0.460 | 0.452 |
| $\infty$ | 2 | 0.4 | 2.912 | 2.889 | 1.325 | 1.318 | 0.833 | 0.829 | 0.470 | 0.462 |

TABLE 6 BERNOULLI GLMM COVERAGE PROBABILITIES FOR (m, n) = (10, 5). NOMINAL COVERAGE IS 0.95.

| $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 0.895 | 0.935 | 0.892 | 0.934 | 0.869 | 0.870 | 0.871 |
| -2.5 | 2 | 0.899 | 0.942 | 0.896 | 0.938 | 0.871 | 0.873 | 0.874 |
| -0.5 | 0.2 | 0.908 | 0.941 | 0.905 | 0.939 | 0.875 | 0.876 | 0.877 |
| -0.5 | 2 | 0.903 | 0.939 | 0.901 | 0.936 | 0.873 | 0.875 | 0.876 |
| 0.5 | 0.2 | 0.894 | 0.934 | 0.891 | 0.931 | 0.875 | 0.876 | 0.878 |
| 0.5 | 2 | 0.897 | 0.935 | 0.893 | 0.932 | 0.878 | 0.878 | 0.879 |
| 2.0 | 0.2 | 0.905 | 0.940 | 0.902 | 0.938 | 0.874 | 0.876 | 0.877 |
| 2.0 | 2 | 0.902 | 0.938 | 0.897 | 0.935 | 0.876 | 0.874 | 0.875 |

TABLE 7 BERNOULLI GLMM COVERAGE PROBABILITIES FOR (m, n) = (10, 10). NOMINAL COVERAGE IS 0.95.

| $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 0.911 | 0.942 | 0.907 | 0.938 | 0.903 | 0.906 | 0.907 |
| -2.5 | 2 | 0.905 | 0.940 | 0.902 | 0.936 | 0.897 | 0.897 | 0.900 |
| -0.5 | 0.2 | 0.915 | 0.944 | 0.911 | 0.942 | 0.899 | 0.901 | 0.903 |
| -0.5 | 2 | 0.911 | 0.941 | 0.907 | 0.939 | 0.895 | 0.897 | 0.898 |
| 0.5 | 0.2 | 0.918 | 0.935 | 0.915 | 0.931 | 0.892 | 0.893 | 0.895 |
| 0.5 | 2 | 0.920 | 0.938 | 0.916 | 0.935 | 0.896 | 0.896 | 0.898 |
| 2.0 | 0.2 | 0.917 | 0.943 | 0.914 | 0.938 | 0.901 | 0.904 | 0.906 |
| 2.0 | 2 | 0.912 | 0.941 | 0.908 | 0.939 | 0.898 | 0.901 | 0.903 |

TABLE 8 BERNOULLI GLMM COVERAGE PROBABILITIES FOR (m, n) = (20, 5). NOMINAL COVERAGE IS 0.95.

| $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 0.925 | 0.944 | 0.921 | 0.939 | 0.921 | 0.922 | 0.923 |
| -2.5 | 2 | 0.921 | 0.941 | 0.916 | 0.939 | 0.919 | 0.920 | 0.921 |
| -0.5 | 0.2 | 0.919 | 0.938 | 0.915 | 0.936 | 0.912 | 0.914 | 0.916 |
| -0.5 | 2 | 0.923 | 0.941 | 0.918 | 0.940 | 0.914 | 0.915 | 0.918 |
| 0.5 | 0.2 | 0.924 | 0.942 | 0.919 | 0.940 | 0.922 | 0.922 | 0.924 |
| 0.5 | 2 | 0.928 | 0.945 | 0.923 | 0.941 | 0.923 | 0.923 | 0.925 |
| 2.0 | 0.2 | 0.922 | 0.942 | 0.919 | 0.940 | 0.915 | 0.917 | 0.919 |
| 2.0 | 2 | 0.918 | 0.944 | 0.914 | 0.941 | 0.911 | 0.914 | 0.916 |

TABLE 9 BERNOULLI GLMM COVERAGE PROBABILITIES FOR (m, n) = (20, 10). NOMINAL COVERAGE IS 0.95.

| $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 0.929 | 0.947 | 0.928 | 0.946 | 0.925 | 0.926 | 0.927 |
| -2.5 | 2 | 0.928 | 0.945 | 0.925 | 0.943 | 0.923 | 0.925 | 0.926 |
| -0.5 | 0.2 | 0.933 | 0.946 | 0.929 | 0.945 | 0.919 | 0.921 | 0.922 |
| -0.5 | 2 | 0.931 | 0.945 | 0.927 | 0.941 | 0.918 | 0.921 | 0.922 |
| 0.5 | 0.2 | 0.924 | 0.945 | 0.921 | 0.942 | 0.922 | 0.923 | 0.924 |
| 0.5 | 2 | 0.929 | 0.948 | 0.926 | 0.944 | 0.924 | 0.924 | 0.925 |
| 2.0 | 0.2 | 0.928 | 0.946 | 0.925 | 0.943 | 0.925 | 0.926 | 0.927 |
| 2.0 | 2 | 0.931 | 0.948 | 0.926 | 0.944 | 0.927 | 0.928 | 0.929 |

TABLE 10 BERNOULLI GLMM EXPECTED WIDTHS. NOMINAL COVERAGE IS 0.95. COLUMN PAIRS GIVE (4) w/BS AND (5) w/BS FOR EACH (m, n).

| $\beta_0$ | $\sigma^2$ | (10,5) (4) w/BS | (10,5) (5) w/BS | (10,10) (4) w/BS | (10,10) (5) w/BS | (20,5) (4) w/BS | (20,5) (5) w/BS | (20,10) (4) w/BS | (20,10) (5) w/BS |
|---|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 1.873 | 1.756 | 1.131 | 1.023 | 0.742 | 0.728 | 0.318 | 0.305 |
| -2.5 | 2 | 2.048 | 1.957 | 1.238 | 1.187 | 0.813 | 0.793 | 0.325 | 0.311 |
| -0.5 | 0.2 | 1.948 | 1.865 | 1.235 | 1.113 | 0.793 | 0.763 | 0.298 | 0.288 |
| -0.5 | 2 | 2.240 | 2.187 | 1.358 | 1.238 | 0.821 | 0.798 | 0.302 | 0.293 |
| 0.5 | 0.2 | 1.653 | 1.594 | 0.998 | 0.973 | 0.801 | 0.783 | 0.289 | 0.278 |
| 0.5 | 2 | 1.891 | 1.823 | 1.108 | 1.083 | 0.816 | 0.801 | 0.296 | 0.281 |
| 2.0 | 0.2 | 1.784 | 1.693 | 0.982 | 0.963 | 0.768 | 0.759 | 0.304 | 0.298 |
| 2.0 | 2 | 2.012 | 1.998 | 0.998 | 0.975 | 0.784 | 0.771 | 0.309 | 0.301 |

Results

Tables 1-4 show the coverage probabilities, and Table 5 shows the expected widths, of the seven alternative prediction intervals for the scenarios covering the negative binomial and Poisson GLMMs. Tables 6-9 and Table 10 show the corresponding results for the Bernoulli GLMMs.

It is seen that the three prediction interval methods implemented in SAS have coverage probabilities that are lower than expected, and in general there is very little difference among these three methods. Intervals (4) and (5) have coverage probabilities closer to the nominal level, and their bootstrapped versions offer the best solutions. Other simulations [see Yang (2013)] show that bootstrap adjustments are less effective with the three SAS intervals and generally are not adequate to make the coverage probabilities satisfactory. From Table 5 and Table 10 it can be seen that increasing the number of clusters is more effective than increasing the number of sampling units in terms of reducing the expected width. Namely, when the number of clusters is doubled the expected width is reduced by approximately 75%, compared to approximately 50% when the number of sampling units is doubled. In addition, it is observed that interval (4) has an appropriately slightly wider expected width than interval (5), due to the correction term in the MSE approximation.

Summary

We have developed a new prediction interval methodology for a class of GLMMs that are suitable for analyzing clustered count data. Our approach was to derive an approximation to the unconditional MSE of the eBP, staying within the context of the GLMM, and we compared our proposed method with three existing prediction interval methods implemented in the SAS procedure GLIMMIX based upon pseudo-likelihood, Laplace, and quadrature approximations.

Our simulation study showed that the coverage probabilities for the intervals computed by the three methods in GLIMMIX are too low. The coverage probabilities for our proposed interval (4) with bootstrap adjustments are quite close to the nominal value, with an expected width that decreases rapidly as the number of clusters increases, and less rapidly as the number of sampling units within a cluster increases. Future work includes consideration of GLMMs having two or more random factors.
REFERENCES

Abramowitz, M., Stegun, I., 1972. Handbook of Mathematical Functions. New York: Dover Publications.

Booth, J.G., Hobert, J.P., 1998. Standard errors of prediction in generalized linear mixed models. Journal of the American Statistical Association 93, 262-272.

de Bruijn, N.G., 1981. Asymptotic Methods in Analysis. New York: Dover.

Golub, G.H., Welsch, J.H., 1969. Calculation of Gauss quadrature rules. Mathematics of Computation 23, 221-230.

Harville, D.A., 2008. Accounting for the estimation of variances and covariances in prediction under a general linear model: an overview. Tatra Mountains Mathematical Publications 39, 1-15.

Jiang, J., Lahiri, P., 2006. Mixed model prediction and small area estimation. Test 15, 1-96.

Kackar, R.N., Harville, D.A., 1984. Approximations for standard errors of estimators of fixed and random effects in mixed linear models. Journal of the American Statistical Association 79, 853-862.

Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., Schabenberger, O., 2006. SAS for Mixed Models, Second Edition. Cary, NC: SAS Institute Inc.

McCulloch, C.E., Searle, S.R., Neuhaus, J.M., 2008. Generalized, Linear, and Mixed Models, 2nd edition. New York: Wiley.

Pinheiro, J.C., Chao, E.C., 2006. Efficient Laplacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. Journal of Computational and Graphical Statistics 15, 58-81.

Skrondal, A., Rabe-Hesketh, S., 2009. Prediction in multilevel generalized linear models. Journal of the Royal Statistical Society, Series A 172, 659-687.

Wolfinger, R., O'Connell, M., 1993. Generalized linear mixed models: a pseudo-likelihood approach. Journal of Statistical Computation and Simulation 48, 233-243.

Yang, C., 2013. Prediction Intervals in Generalized Linear Mixed Models. PhD Dissertation, Department of Applied Statistics, University of California, Riverside.

Zhao, Y., Staudenmayer, J., Coull, B.A., Wand, M.P., 2006. General design Bayesian generalized linear mixed models. Statistical Science 21, 35-51.

Appendix

Here we summarize all the formulas needed to compute the prediction interval in (4), including those formulas that are needed in conjunction with use of Algorithm 1. All of the integrals required for these computations are one-dimensional integrals that can be easily evaluated using Gaussian quadrature.

The best predictor $\mu(y;\theta)$ requires the quantities
$$E(s_i \mid y_i; \theta) = \int s_i\, f(s_i \mid y_i; \theta)\, ds_i. \qquad (A.1)$$
A standard optimization routine can be used to maximize the integrated likelihood $L(\theta \mid y)$ to find $\hat\theta$, which in turn can be used to find $\mu(y;\hat\theta)$.

Algorithm 1 can be used three times, as detailed below, to obtain $M(\hat\theta)$. For the first use of Algorithm 1, send it the matrix $\mathrm{Var}(s \mid y) = \mathrm{Diag}[\mathrm{Var}(s_i \mid y_i; \theta)]_{i=1}^{m}$, using
$$\mathrm{Var}(s_i \mid y_i; \theta) = \int s_i^2\, f(s_i \mid y_i; \theta)\, ds_i - \left[E(s_i \mid y_i; \theta)\right]^2. \qquad (A.2)$$

For the second use, send it the matrix
$$d(y;\theta)\, d(y;\theta)' = \left[\frac{\partial \mu(y;\theta)}{\partial \theta}\right]\left[\frac{\partial \mu(y;\theta)}{\partial \theta}\right]',$$
where $\partial\mu(y;\theta)/\partial\theta_l = \lambda'(\partial\beta/\partial\theta_l) + \gamma'\left(\partial E(s \mid y;\theta)/\partial\theta_l\right)$. Evaluations of $\partial\beta/\partial\theta_l$ are either zero or one, and
$$\frac{\partial E(s_i \mid y_i; \theta)}{\partial \theta_l} = \int s_i\, \frac{\partial f(s_i \mid y_i; \theta)}{\partial \theta_l}\, ds_i = \frac{f(y_i;\theta) \int s_i\, \left[\partial f(y_i, s_i; \theta)/\partial\theta_l\right] ds_i - \left[\partial f(y_i;\theta)/\partial\theta_l\right] \int s_i\, f(y_i, s_i; \theta)\, ds_i}{f^2(y_i; \theta)}. \qquad (A.3)$$
Expressions needed to evaluate (A.3) are
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \beta_l} = \left[ \sum_{j=1}^{n_i} x_{ijl}\, \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)/\partial\mu_{ij}}{g'(\mu_{ij})} \right] f(y_i, s_i; \theta), \qquad (A.4)$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) f(y_i, s_i; \theta), \qquad (A.5)$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \phi} = \left[ \sum_{j=1}^{n_i} \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial \phi} \right] f(y_i, s_i; \theta), \qquad (A.6)$$
and the integrated forms of (A.4)-(A.6),
$$\frac{\partial f(y_i; \theta)}{\partial \beta_l} = \int \frac{\partial f(y_i, s_i; \theta)}{\partial \beta_l}\, ds_i, \qquad (A.7)$$
$$\frac{\partial f(y_i; \theta)}{\partial \sigma^2} = \int \frac{\partial f(y_i, s_i; \theta)}{\partial \sigma^2}\, ds_i, \qquad (A.8)$$
$$\frac{\partial f(y_i; \theta)}{\partial \phi} = \int \frac{\partial f(y_i, s_i; \theta)}{\partial \phi}\, ds_i. \qquad (A.9)$$

Finally, for the third use of Algorithm 1, send it the observed information matrix $I_o(\theta)$. Since $\log L(\theta \mid y) = \sum_{i=1}^{m} \log f(y_i;\theta)$ and
$$\frac{\partial^2 \log f(y_i;\theta)}{\partial \theta_j\, \partial \theta_k} = \frac{f(y_i;\theta)\, \partial^2 f(y_i;\theta)/\partial\theta_j\, \partial\theta_k - \left[\partial f(y_i;\theta)/\partial\theta_j\right]\left[\partial f(y_i;\theta)/\partial\theta_k\right]}{f^2(y_i;\theta)},$$
it suffices to combine the expressions (A.7)-(A.9) with expressions for the Hessian matrix of $f(y_i;\theta)$. Starting with (A.7)-(A.9), we find
$$\frac{\partial^2 f(y_i;\theta)}{\partial\beta_k\, \partial\beta_l} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\beta_k\, \partial\beta_l}\, ds_i, \qquad (A.10)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial\sigma^2\, \partial\beta_l} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\sigma^2\, \partial\beta_l}\, ds_i, \qquad (A.11)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial\phi\, \partial\beta_l} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\phi\, \partial\beta_l}\, ds_i, \qquad (A.12)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial(\sigma^2)^2} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial(\sigma^2)^2}\, ds_i, \qquad (A.13)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial\sigma^2\, \partial\phi} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\sigma^2\, \partial\phi}\, ds_i, \qquad (A.14)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial\phi^2} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\phi^2}\, ds_i. \qquad (A.15)$$
It can be seen from (A.10)-(A.15) that what we ultimately need is the Hessian matrix of $f(y_i, s_i; \theta)$, which can be shown, starting with (A.4)-(A.6), to be the following:
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\beta_k\, \partial\beta_l} = \left\{ \left[ \sum_{j=1}^{n_i} x_{ijl}\, \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)/\partial\mu_{ij}}{g'(\mu_{ij})} \right] \left[ \sum_{j=1}^{n_i} x_{ijk}\, \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)/\partial\mu_{ij}}{g'(\mu_{ij})} \right] + \sum_{j=1}^{n_i} x_{ijk}\, x_{ijl} \left[ \frac{\partial^2 \log f(y_{ij} \mid \mu_{ij}, \phi)/\partial\mu_{ij}^2}{g'(\mu_{ij})^2} - \frac{g''(\mu_{ij})}{g'(\mu_{ij})^3}\, \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\mu_{ij}} \right] \right\} f(y_i, s_i; \theta), \qquad (A.16)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\sigma^2\, \partial\beta_l} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \frac{\partial f(y_i, s_i; \theta)}{\partial\beta_l}, \qquad (A.17)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\phi\, \partial\beta_l} = \left[ \sum_{j=1}^{n_i} \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\phi} \right] \frac{\partial f(y_i, s_i; \theta)}{\partial\beta_l} + \left[ \sum_{j=1}^{n_i} x_{ijl}\, \frac{1}{g'(\mu_{ij})}\, \frac{\partial^2 \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\phi\, \partial\mu_{ij}} \right] f(y_i, s_i; \theta), \qquad (A.18)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial(\sigma^2)^2} = \left\{ \left[\frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right)\right]^2 - \frac{s_i^2}{\sigma^6} + \frac{1}{2\sigma^4} \right\} f(y_i, s_i; \theta), \qquad (A.19)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\sigma^2\, \partial\phi} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \frac{\partial f(y_i, s_i; \theta)}{\partial\phi}, \qquad (A.20)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\phi^2} = \left\{ \left[ \sum_{j=1}^{n_i} \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\phi} \right]^2 + \sum_{j=1}^{n_i} \frac{\partial^2 \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\phi^2} \right\} f(y_i, s_i; \theta). \qquad (A.21)$$