
Statistics Research Letters (SRL) Volume 2 Issue 4, November 2013, www.srl-journal.org

Approximate Prediction Intervals for Generalized Linear Mixed Models Having a Single Random Factor

Daniel R. Jeske*1, Cheng-Hsueh Yang2

Department of Statistics, University of California, Riverside, CA, USA

*1daniel.jeske@ucr.edu; 2cyang007@ucr.edu

Abstract

Three methods to construct prediction intervals in a generalized linear mixed model (GLMM) are based on pseudo-likelihood, Laplace, and quadrature approximations. All three of these methods are available in the SAS procedure GLIMMIX. The pseudo-likelihood method involves approximate linearization of the GLMM into a linear mixed model (LMM) framework, while the other two methods utilize approximate conditional mean squared error (MSE) formulas for the empirical best predictor (eBP). We propose a new method based on the unconditional MSE of the eBP, working entirely within the GLMM context, and we confront the inherent computational challenges by proposing a Monte Carlo algorithm to evaluate the plug-in estimator of the unconditional MSE. For three illustrative examples, the negative binomial, Poisson and Bernoulli GLMMs, numerical results show that our prediction interval methodology improves the coverage probability over the three methods available in GLIMMIX. Moreover, our results show that with bootstrap adjustments, our method achieves coverage probabilities satisfactorily close to the nominal level.

Keywords

Count Data; Best Prediction; Pseudo-likelihood; Laplace Approximation

Introduction

The literature on LMMs, including prediction intervals for linear combinations of the fixed and random effects, is extensive. Useful entry points to the classic literature on this topic include Jiang and Lahiri (2006), Harville (2008) and McCulloch, Searle and Neuhaus (2008). The SAS system has implemented much of the relevant literature on LMMs in the procedure MIXED (see, for example, Littell et al. (2006) and references therein).

In contrast, prediction intervals are a less well developed topic for the important class of GLMMs. The best known method is the one implemented in the SAS procedure GLIMMIX (see, for example, SAS Institute (2008)), based on Wolfinger and O'Connell's (1993) pseudo-likelihood approximations to LMMs. The GLIMMIX procedure also provides an option to obtain a prediction interval based on the Laplace approximations discussed in Booth and Hobert (1998), and a variation of that approach using quadrature approximations. The pseudo-likelihood approach involves linearization of the GLMM, while the Laplace and quadrature approaches are based on the conditional mean squared error (MSE) of the empirical best predictor (eBP) rather than the unconditional MSE. Each of these methods provides coverage probabilities that are reasonably close to nominal levels in large samples, but too low in small samples. These observations leave open the possibility that a prediction interval based on a direct (i.e., within the context of the GLMM framework) approximation to the unconditional MSE of the eBP may yield a better approximate prediction interval.

In this paper, we propose a new method to construct prediction intervals for GLMMs that have a single random factor capturing cluster effects in the data. The levels of the random factor could correspond to random block effects in an ANOVA design, random hospital effects in a clinical trial design, or random intercepts in a longitudinal data analysis design. Our GLMM context covers applications where the response variable is count data modeled by distributions such as the Poisson, negative binomial or Bernoulli. Our goal is to develop a better prediction interval method for linear combinations of the underlying fixed and random effects.
Let $y_{ij}$ denote the $j$-th sampling unit within the $i$-th cluster, for $i = 1, \ldots, m$ and $j = 1, \ldots, n_i$. Let $s = (s_1, \ldots, s_m)'$ denote the unobservable random cluster effects and let $\mu_{ij}$ denote the conditional (given $s_i$) mean of the $j$-th observation from the $i$-th cluster. Our GLMM is defined as follows:

a. Conditional on $s_i$, the observations from the $i$-th cluster $\{y_{ij}\}_{j=1}^{n_i}$ are independently distributed from distributions whose probability functions are denoted by $f(\cdot \mid \mu_{ij}, \phi)$.

b. A link function is defined as $g(\mu_{ij}) = x_{ij}'\beta + s_i$, where $x_{ij} = (1, x_{ij1}, \ldots, x_{ij,p-1})'$ is a vector of fixed covariates associated with the $j$-th observation in the $i$-th cluster and $\beta = (\beta_0, \beta_1, \ldots, \beta_{p-1})'$ is a vector of unknown parameters.

c. The random effects $\{s_i\}_{i=1}^{m}$ are independent and identically distributed from a $N(0, \sigma^2)$ distribution.

Although other definitions of GLMMs can be found (see, for example, McCulloch et al. 2008), the framework we use is popular for reasons including the flexibility to extend beyond independent random effects and the availability of PROC GLIMMIX to fit these models. The covariate vector can be used to optionally describe differences between the observations that are attributable to identifiable fixed effects such as treatment effects. The parameter $\phi$ may or may not be needed, depending on the model specification. If, for example, $f$ is a negative binomial distribution, $\phi$ represents the over-dispersion parameter, whereas if $f$ is a Poisson or Bernoulli distribution, $\phi$ is not needed in the model specification.

Our primary focus is on how to compute a prediction interval for $w = \lambda'\beta + \gamma's$, where $\lambda$ and $\gamma$ are known $p \times 1$ and $m \times 1$ vectors of constants, respectively. Define $\theta = (\beta, \sigma^2, \phi)$, where it is understood that the parameter $\phi$ may not be needed. Let the observations from the $i$-th cluster be collectively referred to as $y_i = (y_{i1}, y_{i2}, \ldots, y_{in_i})$, and let all the observations from all of the clusters be collectively referred to as $y = (y_1, \ldots, y_m)$.

Probability functions we will subsequently use are the conditional distribution of $y_i$, given $s_i$,
$$f(y_i \mid s_i; \theta) = \prod_{j=1}^{n_i} f(y_{ij} \mid \mu_{ij}, \phi),$$
the zero-mean Gaussian density for $s_i$, $\varphi(s_i; \sigma^2)$, the joint distribution of $y_i$ and $s_i$,
$$f(y_i, s_i; \theta) = f(y_i \mid s_i; \theta)\,\varphi(s_i; \sigma^2),$$
the marginal distribution of $y_i$,
$$f(y_i; \theta) = \int f(y_i, s_i; \theta)\, ds_i,$$
and the conditional distribution of $s_i$, given $y_i$,
$$f(s_i \mid y_i; \theta) = f(y_i, s_i; \theta)/f(y_i; \theta).$$
The conditional distribution of $s$, given $y$, can be expressed as
$$f(s \mid y; \theta) = \prod_{i=1}^{m} f(s_i \mid y_i; \theta).$$
The integrated likelihood function is
$$L(\theta \mid y) = \prod_{i=1}^{m} f(y_i; \theta)$$
and the maximum likelihood estimator of $\theta$ is defined as $\hat\theta = \arg\max_\theta L(\theta \mid y)$.
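All of these cluster-level integrals are one-dimensional, so they can be evaluated accurately with Gauss-Hermite quadrature (the Appendix makes this observation for every integral used below). The following minimal Python sketch, which is ours and not taken from the paper, illustrates the idea for the Poisson case with $\log \mu_{ij} = \beta_0 + s_i$; the function name and node count are our own choices.

```python
import numpy as np
from scipy.special import gammaln

def cluster_marginal_loglik(y_i, beta0, sigma2, n_nodes=40):
    """Gauss-Hermite approximation to log f(y_i; theta) for a Poisson GLMM
    with a single random intercept s_i ~ N(0, sigma2)."""
    # Probabilists' Hermite rule: integral h(x) exp(-x^2/2) dx ~ sum w_k h(x_k)
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    s = np.sqrt(sigma2) * x                      # nodes on the N(0, sigma2) scale
    mu = np.exp(beta0 + s)                       # conditional mean at each node
    # log f(y_i | s; theta) = sum_j [ y_ij*log(mu) - mu - log(y_ij!) ]
    cond_loglik = y_i.sum() * np.log(mu) - len(y_i) * mu - gammaln(y_i + 1).sum()
    # Divide by sqrt(2*pi) to turn the Hermite weight into the N(0,1) density
    return np.log(np.sum(w * np.exp(cond_loglik)) / np.sqrt(2.0 * np.pi))

y_i = np.array([3, 5, 2, 4, 6])
print(cluster_marginal_loglik(y_i, beta0=1.0, sigma2=0.4))
```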

Proposed Prediction Interval

Best Predictor and Approximate MSE

The minimum MSE predictor of $w$, usually referred to as the best predictor (BP), is $\mu(y;\theta) = \lambda'\beta + \gamma' E(s \mid y;\theta)$. In practice, $\theta$ is unknown and the predictor used is the so-called empirical best predictor (eBP), denoted as $\mu(y;\hat\theta)$. The prediction error of the eBP is $e = w - \mu(y;\hat\theta)$ and the exact MSE is $M(\theta) = E(e^2)$. Consider the decomposition
$$e = [\,w - \mu(y;\theta)\,] + [\,\mu(y;\theta) - \mu(y;\hat\theta)\,]. \qquad (1)$$
Using a conditioning argument, it can be shown that the two terms in (1) are uncorrelated (the first term has conditional mean zero given $y$, while the second term is a function of $y$ alone), and thus we have
$$M(\theta) = E[\,w - \mu(y;\theta)\,]^2 + E[\,\mu(y;\theta) - \mu(y;\hat\theta)\,]^2 = M_1(\theta) + M_2(\theta). \qquad (2)$$
The first term in (2) is simply
$$M_1(\theta) = E[\mathrm{Var}(w \mid y)] = E[\,\gamma'\,\mathrm{Var}(s \mid y;\theta)\,\gamma\,],$$
whereas the second term is more complicated. Let $d(y;\theta) = \partial\mu(y;\theta)/\partial\theta$ and define $A(\theta) = E[\,d(y;\theta)\,d(y;\theta)'\,]$ and $B(\theta) = I^{-1}(\theta)$, where $I(\theta)$ is Fisher's information matrix whose $ij$-th element is given by $-E[\,\partial^2 \log L(\theta \mid y)/\partial\theta_i\,\partial\theta_j\,]$. Following Kackar and Harville (1984), a second-order Taylor expansion of $\mu(y;\hat\theta)$ around $\hat\theta = \theta$ yields $\mu(y;\theta) - \mu(y;\hat\theta) \approx -d(y;\theta)'(\hat\theta - \theta)$ and then ultimately $M_2(\theta) \approx \mathrm{tr}[\,A(\theta)B(\theta)\,]$. An approximation to $M(\theta)$ is therefore
$$M(\theta) \approx E[\,\gamma'\,\mathrm{Var}(s \mid y;\theta)\,\gamma\,] + \mathrm{tr}[\,A(\theta)B(\theta)\,]. \qquad (3)$$

Estimation of Mean Squared Error

Consider estimating $M(\theta)$ by using the plug-in estimator $M(\hat\theta)$. In this section, we discuss the details of computing this estimator. In so doing, we will make repeated use of Algorithm 1, which is designed to receive $\hat\theta$ and an arbitrary function $q(y;\theta)$ (which for our use is a matrix-valued function) and return the plug-in estimator of $E_\theta[q(y;\theta)]$ through Monte Carlo evaluation of $E_{\hat\theta}[q(y;\hat\theta)]$, where $E_{\hat\theta}(\cdot)$ denotes the expectation with respect to the distribution $f(y;\hat\theta)$. Algorithm 1 is specifically suited for situations where $E_\theta[q(y;\theta)]$ does not have a closed-form expression. We also note that $q(y;\theta)$ itself may not have a closed-form expression.

Algorithm 1

1. For $k = 1$ to $K$ (we use $K = 1000$)
2. Simulate $s_i^{(k)}$ independently from a $N(0, \hat\sigma^2)$ distribution, $i = 1, \ldots, m$
3. Compute $\hat\mu_{ij}^{(k)} = g^{-1}(x_{ij}'\hat\beta + s_i^{(k)})$, $i = 1, \ldots, m$, $j = 1, \ldots, n_i$
4. Simulate $y_i^{(k)} = (y_{i1}^{(k)}, \ldots, y_{in_i}^{(k)})$, $i = 1, \ldots, m$, by generating the components independently from the distributions $f(\cdot \mid \hat\mu_{ij}^{(k)}; \hat\phi)$, and let $y^{(k)} = (y_1^{(k)}, \ldots, y_m^{(k)})$
5. Compute $q(y^{(k)}; \hat\theta)$
6. Next $k$
7. Return $\sum_{k=1}^{K} q(y^{(k)}; \hat\theta)/K$

The first term in (3) involves $E_\theta[\mathrm{Var}(s \mid y)]$. The plug-in estimator of this quantity can be obtained from Algorithm 1 by choosing $q(y;\theta)$ equal to the matrix $\mathrm{Var}(s \mid y)$. [Recall that $\mathrm{Var}(s \mid y)$ is a diagonal matrix with elements $\mathrm{Var}(s_i \mid y_i)$.] The second term in (3) separately involves both $A(\theta)$ and $B(\theta)$. Since $A(\theta) = E_\theta[\,d(y;\theta)\,d(y;\theta)'\,]$, Algorithm 1 will return $A(\hat\theta)$ by choosing $q(y;\theta)$ equal to the matrix $d(y;\theta)\,d(y;\theta)'$. Since $B(\theta) = I^{-1}(\theta)$, the plug-in estimator $B(\hat\theta)$ requires the plug-in estimator of the matrix $I(\theta) = -E_\theta[\,\partial^2 \log L(\theta \mid y)/\partial\theta_i\,\partial\theta_j\,]$. Algorithm 1 will return $I(\hat\theta)$ by choosing $q(y;\theta)$ equal to the observed information matrix $I_o(\theta) = -\partial^2 \log L(\theta \mid y)/\partial\theta_i\,\partial\theta_j$. It is noted that the Monte Carlo evaluation algorithm is used to approximate $\mathrm{tr}[\,A(\theta)B(\theta)\,]$ instead of $E_\theta[\,\mu(y;\theta) - \mu(y;\hat\theta)\,]^2$ because in the latter case we would need to find $\hat\theta$ in each iteration of the algorithm.
Approximate Prediction Interval

The reason for our interest in obtaining $M(\hat\theta)$ is to use it in the construction of a prediction interval for $w$ using the formula
$$\mu(y;\hat\theta) \pm z_{\alpha/2}\,\sqrt{M(\hat\theta)}. \qquad (4)$$
The Appendix summarizes all the calculations needed to compute (4). We are also interested in comparing the interval in (4) with the alternative interval
$$\mu(y;\hat\theta) \pm z_{\alpha/2}\,\sqrt{M_1(\hat\theta)}, \qquad (5)$$
based on the use of a (naive) estimator of MSE that ignores the expected increase in MSE caused by having to use the eBP instead of the BP.

Recognizing that neither $[\mu(y;\hat\theta) - w]/\sqrt{M(\hat\theta)}$ nor $[\mu(y;\hat\theta) - w]/\sqrt{M_1(\hat\theta)}$ may be adequately approximated by a $N(0,1)$ distribution, we also consider bootstrap percentile adjustments of (4) and (5). Algorithm 2 below shows how to obtain the bootstrap percentile adjustments.

Algorithm 2

1. Use the data $y$ to obtain parameter estimates $\hat\theta$ and eBPs of the cluster effects $s = (s_1, \ldots, s_m)'$. The eBPs $\hat s_i$ are obtained as described in Section 2.1, choosing $\lambda = 0$ and $\gamma = e_i$, where $e_i$ is zero except for a one in the $i$-th position.
2. Compute $\hat\mu_{ij} = g^{-1}(x_{ij}'\hat\beta + \hat s_i)$, $i = 1, \ldots, m$, $j = 1, \ldots, n_i$
3. For $k = 1$ to $B$ (we use $B = 1000$)
4. Simulate a conditional bootstrap data set $y^{(k)}$, fixing $\hat s$, by generating the components of $y_i^{(k)}$, $i = 1, \ldots, m$, $j = 1, \ldots, n_i$, independently from the distributions $f(\cdot \mid \hat\mu_{ij}; \hat\phi)$, and compute the bootstrap estimator $\hat\theta^{(k)}$
5. Compute $Z_k = [\mu(y^{(k)}; \hat\theta^{(k)}) - \mu(y;\hat\theta)]/\sqrt{M(\hat\theta^{(k)})}$ if using interval (4), or instead $Z_k = [\mu(y^{(k)}; \hat\theta^{(k)}) - \mu(y;\hat\theta)]/\sqrt{M_1(\hat\theta^{(k)})}$ if using interval (5)
6. Next $k$
7. Extract lower and upper $\alpha/2$ percentiles, $L_{\alpha/2}$ and $U_{\alpha/2}$ respectively, from the quantities $\{Z_k\}_{k=1}^{B}$
8. Construct the bootstrap percentile interval
$$\left[\,\mu(y;\hat\theta) - U_{\alpha/2}\sqrt{M(\hat\theta)}\,,\; \mu(y;\hat\theta) - L_{\alpha/2}\sqrt{M(\hat\theta)}\,\right]$$
if using interval (4), or
$$\left[\,\mu(y;\hat\theta) - U_{\alpha/2}\sqrt{M_1(\hat\theta)}\,,\; \mu(y;\hat\theta) - L_{\alpha/2}\sqrt{M_1(\hat\theta)}\,\right]$$
if using interval (5).
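The sketch below implements the mechanics of Algorithm 2 in Python; it is our own illustration, under the assumption that model-specific fitting, prediction, MSE, and conditional-simulation routines are supplied by the caller. The toy Gaussian stand-ins at the end only demonstrate the calling convention.

```python
import numpy as np

rng = np.random.default_rng(2)

def algorithm2(y, theta_hat, fit, predict, mse, simulate, alpha=0.05, B=1000):
    """Bootstrap percentile adjustment of interval (4).
    fit(y*) -> theta*; predict(y, theta) -> mu(y; theta); mse(theta) -> M(theta);
    simulate(theta_hat) -> data set drawn conditionally, with the eBPs held fixed."""
    mu_hat = predict(y, theta_hat)
    z = np.empty(B)
    for k in range(B):                                          # steps 3-6
        y_star = simulate(theta_hat)
        theta_star = fit(y_star)
        z[k] = (predict(y_star, theta_star) - mu_hat) / np.sqrt(mse(theta_star))
    lo, up = np.quantile(z, [alpha / 2.0, 1.0 - alpha / 2.0])   # step 7
    root = np.sqrt(mse(theta_hat))
    return mu_hat - up * root, mu_hat - lo * root               # step 8

# Toy Gaussian stand-ins, purely to show the interface.
y0 = rng.normal(1.0, 1.0, size=50)
print(algorithm2(y0, y0.mean(),
                 fit=lambda ys: ys.mean(),
                 predict=lambda ys, th: th,
                 mse=lambda th: 1.0 / 50,
                 simulate=lambda th: rng.normal(th, 1.0, size=50), B=500))
```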


Illustrative Applications

Negative Binomial GLMM

Here we have
$$f(y_{ij} \mid \mu_{ij}, \phi) = \frac{\Gamma(y_{ij} + \phi)}{\Gamma(y_{ij} + 1)\,\Gamma(\phi)} \left(\frac{\mu_{ij}}{\mu_{ij} + \phi}\right)^{y_{ij}} \left(\frac{\phi}{\mu_{ij} + \phi}\right)^{\phi}$$
and the typical link function is $g(u) = \log u$. For our illustration, we assume $\log \mu_{ij} = \beta_0 + s_i$ and consider a prediction interval for $w = \beta_0 + s_i$. In this case we have $\theta = (\beta_0, \sigma^2, \phi)$ and
$$f(y_i, s_i; \theta) = \prod_{j=1}^{n_i} \left[ \frac{\Gamma(y_{ij} + \phi)}{\Gamma(y_{ij} + 1)\,\Gamma(\phi)} \left(\frac{\mu_{ij}}{\mu_{ij} + \phi}\right)^{y_{ij}} \left(\frac{\phi}{\mu_{ij} + \phi}\right)^{\phi} \right] \varphi(s_i; \sigma^2).$$
Following the computational summary outlined in the Appendix, $E(s_i \mid y_i; \theta)$ can be obtained from (A.1) and used to evaluate $\mu(y;\theta) = \beta_0 + E(s_i \mid y_i; \theta)$. The likelihood function is
$$L(\theta \mid y) = \prod_{i=1}^{m} \int \prod_{j=1}^{n_i} \frac{\Gamma(y_{ij} + \phi)}{\Gamma(y_{ij} + 1)\,\Gamma(\phi)} \left(\frac{e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi}\right)^{y_{ij}} \left(\frac{\phi}{e^{\beta_0 + s_i} + \phi}\right)^{\phi} \varphi(s_i; \sigma^2)\, ds_i,$$
the MLE is obtained as $\hat\theta = \arg\max_\theta L(\theta \mid y)$, and $\hat\theta$ can then be used to evaluate $\mu(y;\hat\theta)$.

To evaluate $M(\hat\theta)$, we use Algorithm 1 by means of referenced formulas in the Appendix to derive the required inputs. $\mathrm{Var}(s_i \mid y_i; \theta)$ can be computed directly using equation (A.2). Let $\psi(u)$ denote the digamma function and write $y_{i\cdot} = \sum_{j=1}^{n_i} y_{ij}$. Computation of $\partial\mu(y;\theta)/\partial\beta_0$, $\partial\mu(y;\theta)/\partial\sigma^2$ and $\partial\mu(y;\theta)/\partial\phi$ is enabled by (A.3), with the required pieces (A.4)-(A.6) given by
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \beta_0} = \frac{\phi\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{e^{\beta_0 + s_i} + \phi}\, f(y_i, s_i; \theta),$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) f(y_i, s_i; \theta),$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \phi} = \left[ \sum_{j=1}^{n_i} \psi(y_{ij} + \phi) - n_i \psi(\phi) + n_i \log\frac{\phi}{e^{\beta_0 + s_i} + \phi} - \frac{y_{i\cdot} - n_i e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi} \right] f(y_i, s_i; \theta),$$
and (A.7)-(A.9) given by integrating each of these expressions over $s_i$.

Finally, the observed information matrix quantities are calculated from (A.10)-(A.15). The quantities needed for these formulas are (A.16)-(A.21), respectively given by
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0^2} = \left\{ \left[\frac{\phi\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{e^{\beta_0 + s_i} + \phi}\right]^2 - \frac{\phi\, e^{\beta_0 + s_i}\,(y_{i\cdot} + n_i \phi)}{(e^{\beta_0 + s_i} + \phi)^2} \right\} f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0\, \partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \frac{\phi\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{e^{\beta_0 + s_i} + \phi}\, f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0\, \partial \phi} = \left\{ \frac{\phi\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{e^{\beta_0 + s_i} + \phi} \left[ \sum_{j=1}^{n_i} \psi(y_{ij} + \phi) - n_i \psi(\phi) + n_i \log\frac{\phi}{e^{\beta_0 + s_i} + \phi} - \frac{y_{i\cdot} - n_i e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi} \right] + \frac{e^{\beta_0 + s_i}\,(y_{i\cdot} - n_i e^{\beta_0 + s_i})}{(e^{\beta_0 + s_i} + \phi)^2} \right\} f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial (\sigma^2)^2} = \left\{ \left[\frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right)\right]^2 - \frac{s_i^2}{\sigma^6} + \frac{1}{2\sigma^4} \right\} f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \sigma^2\, \partial \phi} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \left[ \sum_{j=1}^{n_i} \psi(y_{ij} + \phi) - n_i \psi(\phi) + n_i \log\frac{\phi}{e^{\beta_0 + s_i} + \phi} - \frac{y_{i\cdot} - n_i e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi} \right] f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \phi^2} = \left\{ \left[ \sum_{j=1}^{n_i} \psi(y_{ij} + \phi) - n_i \psi(\phi) + n_i \log\frac{\phi}{e^{\beta_0 + s_i} + \phi} - \frac{y_{i\cdot} - n_i e^{\beta_0 + s_i}}{e^{\beta_0 + s_i} + \phi} \right]^2 + \sum_{j=1}^{n_i} \psi'(y_{ij} + \phi) - n_i \psi'(\phi) + \frac{\phi\, y_{i\cdot} + n_i e^{2(\beta_0 + s_i)}}{\phi\,(e^{\beta_0 + s_i} + \phi)^2} \right\} f(y_i, s_i; \theta),$$
where $\psi'(u)$ denotes the trigamma function.
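As a check on the $\phi$-derivative above, the digamma-based score of a single negative binomial observation can be compared against a numerical derivative. The Python sketch below is ours and uses the $(\mu, \phi)$ parameterization of the text.

```python
import numpy as np
from scipy.special import gammaln, digamma

def nb_logpmf(y, mu, phi):
    """log f(y | mu, phi) for the negative binomial of the text."""
    return (gammaln(y + phi) - gammaln(y + 1) - gammaln(phi)
            + y * np.log(mu / (mu + phi)) + phi * np.log(phi / (mu + phi)))

def nb_score_phi(y, mu, phi):
    """Analytic d/dphi of nb_logpmf; summing over j gives the bracketed
    factor in the partial derivative of f(y_i, s_i; theta) above."""
    return (digamma(y + phi) - digamma(phi)
            + np.log(phi / (mu + phi)) + (mu - y) / (mu + phi))

y, mu, phi, h = 4.0, 3.2, 2.0, 1e-6
numeric = (nb_logpmf(y, mu, phi + h) - nb_logpmf(y, mu, phi - h)) / (2 * h)
print(numeric, nb_score_phi(y, mu, phi))   # the two values should agree closely
```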

Poisson GLMM

Here we have $f(y_{ij} \mid \mu_{ij}) = e^{-\mu_{ij}}\, \mu_{ij}^{y_{ij}}/y_{ij}!$ and the typical link function is $g(u) = \log u$. We again assume $\log \mu_{ij} = \beta_0 + s_i$ and consider a prediction interval for $w = \beta_0 + s_i$. In this case we have $\theta = (\beta_0, \sigma^2)$ and
$$f(y_i, s_i; \theta) = \exp\left[-n_i e^{\beta_0 + s_i} + (\beta_0 + s_i)\, y_{i\cdot}\right] \varphi(s_i; \sigma^2) \Big/ \prod_{j=1}^{n_i} y_{ij}!\,.$$
$E(s_i \mid y_i; \theta)$ can be obtained from (A.1) and used to evaluate $\mu(y;\theta) = \beta_0 + E(s_i \mid y_i; \theta)$. The likelihood function is
$$L(\theta \mid y) \propto \prod_{i=1}^{m} \int \exp\left[-n_i e^{\beta_0 + s_i} + (\beta_0 + s_i)\, y_{i\cdot}\right] \varphi(s_i; \sigma^2)\, ds_i,$$
the MLE is obtained as $\hat\theta = \arg\max_\theta L(\theta \mid y)$, and $\hat\theta$ can then be used to evaluate $\mu(y;\hat\theta)$.

Because the negative binomial distribution becomes the Poisson distribution when $\phi = \infty$, all of the results in the negative binomial section pertaining to evaluating $M(\hat\theta)$ apply to the Poisson GLMM case when the following two conventions are adopted: i) use the above formula for $f(y_i, s_i; \theta)$, and ii) only use the derivative equations that are with respect to $\beta_0$ and/or $\sigma^2$, taking their limiting form as $\phi \to \infty$.
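The limiting convention can be verified numerically: with success probability $\phi/(\mu + \phi)$, the negative binomial pmf converges to the Poisson pmf as $\phi$ grows. A small check (ours, using SciPy) is:

```python
import numpy as np
from scipy.stats import nbinom, poisson

mu, y = 3.0, np.arange(15)
for phi in (1.0, 10.0, 1000.0):
    p = phi / (mu + phi)                      # nbinom(n=phi, p) has mean mu
    gap = np.max(np.abs(nbinom.pmf(y, phi, p) - poisson.pmf(y, mu)))
    print(phi, gap)                           # gap shrinks as phi -> infinity
```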


Bernoulli GLMM

Here we have $f(y_{ij} \mid \mu_{ij}) = \mu_{ij}^{y_{ij}} (1 - \mu_{ij})^{1 - y_{ij}}$ and the typical link function is $g(u) = \log\left(u/(1 - u)\right)$. For our illustration, we assume $\log\left[\mu_{ij}/(1 - \mu_{ij})\right] = \beta_0 + s_i$ and consider a prediction interval for $w = \beta_0 + s_i$. In this case we have $\theta = (\beta_0, \sigma^2)$ and
$$f(y_i, s_i; \theta) = \frac{e^{(\beta_0 + s_i)\, y_{i\cdot}}}{(1 + e^{\beta_0 + s_i})^{n_i}}\, \varphi(s_i; \sigma^2).$$
Following the computational summary outlined in the Appendix, $E(s_i \mid y_i; \theta)$ can be obtained from (A.1) and used to evaluate $\mu(y;\theta) = \beta_0 + E(s_i \mid y_i; \theta)$. The likelihood function is
$$L(\theta \mid y) = \prod_{i=1}^{m} \int \frac{e^{(\beta_0 + s_i)\, y_{i\cdot}}}{(1 + e^{\beta_0 + s_i})^{n_i}}\, \varphi(s_i; \sigma^2)\, ds_i,$$
the MLE is obtained as $\hat\theta = \arg\max_\theta L(\theta \mid y)$, and $\hat\theta$ can then be used to evaluate $\mu(y;\hat\theta)$.

To evaluate $M(\hat\theta)$, we use Algorithm 1 by means of referenced formulas in the Appendix to derive the required inputs. $\mathrm{Var}(s_i \mid y_i; \theta)$ can be computed directly using equation (A.2). Computation of $\partial\mu(y;\theta)/\partial\beta_0$ and $\partial\mu(y;\theta)/\partial\sigma^2$ is enabled by (A.3), with the required pieces (A.4)-(A.5) given by
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \beta_0} = \left( y_{i\cdot} - \frac{n_i e^{\beta_0 + s_i}}{1 + e^{\beta_0 + s_i}} \right) f(y_i, s_i; \theta),$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) f(y_i, s_i; \theta),$$
and (A.7)-(A.8) given by
$$\frac{\partial f(y_i; \theta)}{\partial \beta_0} = \int \left( y_{i\cdot} - \frac{n_i e^{\beta_0 + s_i}}{1 + e^{\beta_0 + s_i}} \right) f(y_i, s_i; \theta)\, ds_i,$$
$$\frac{\partial f(y_i; \theta)}{\partial \sigma^2} = \int \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) f(y_i, s_i; \theta)\, ds_i.$$
Finally, the observed information matrix quantities are calculated from (A.10), (A.11) and (A.13). The quantities needed for these formulas are (A.16), (A.17) and (A.19), respectively given by
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0^2} = \left\{ \left( y_{i\cdot} - \frac{n_i e^{\beta_0 + s_i}}{1 + e^{\beta_0 + s_i}} \right)^2 - \frac{n_i e^{\beta_0 + s_i}}{(1 + e^{\beta_0 + s_i})^2} \right\} f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial \beta_0\, \partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \left( y_{i\cdot} - \frac{n_i e^{\beta_0 + s_i}}{1 + e^{\beta_0 + s_i}} \right) f(y_i, s_i; \theta),$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial (\sigma^2)^2} = \left\{ \left[\frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right)\right]^2 - \frac{s_i^2}{\sigma^6} + \frac{1}{2\sigma^4} \right\} f(y_i, s_i; \theta).$$
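For the Bernoulli model the posterior moments (A.1)-(A.2) have no closed form, but the one-dimensional integrals are straightforward with Gauss-Hermite quadrature. The Python sketch below is our illustration; normalizing constants cancel in the ratios.

```python
import numpy as np

def posterior_moments(y_i, beta0, sigma2, n_nodes=60):
    """E(s_i | y_i; theta) and Var(s_i | y_i; theta) for the logit GLMM,
    per (A.1)-(A.2), via probabilists' Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    s = np.sqrt(sigma2) * x
    eta = beta0 + s
    # f(y_i | s; theta) with y_i. successes out of n_i Bernoulli trials
    lik = np.exp(y_i.sum() * eta - len(y_i) * np.log1p(np.exp(eta)))
    norm = np.sum(w * lik)                    # proportional to f(y_i; theta)
    e1 = np.sum(w * s * lik) / norm           # E(s_i | y_i)
    e2 = np.sum(w * s ** 2 * lik) / norm      # E(s_i^2 | y_i)
    return e1, e2 - e1 ** 2

y_i = np.array([1, 0, 1, 1, 0])
print(posterior_moments(y_i, beta0=-0.5, sigma2=2.0))
```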

Performance Comparisons

We use the illustrative applications in Section 3 to compare the coverage probability and the expected width of the proposed prediction interval with the three alternative methods that are available in SAS. Section 4.1 briefly reviews what these three methods are, Section 4.2 outlines the simulation study that was used to compare the prediction intervals, and the results of the comparison study are summarized in Section 4.3.

Prediction Intervals in SAS

The SAS procedure PROC GLIMMIX computes prediction intervals for GLMMs using one of the following three methods: pseudo-likelihood (PL), Laplace (L), and quadrature (Q). In all three methods, an estimate of the predictor and its associated precision is used to construct a $100(1 - \alpha)\%$ prediction interval using normal percentiles.

The PL method is based on Wolfinger and O'Connell


(1993) who proposed an algorithm to calculate the
parameter estimates and the values of fixed and
random effects. The main idea of this algorithm is to
approximate the GLMM as a LMM through use of a
89

www.srl-journal.org

Statistics Research Letters (SRL) Volume 2 Issue 4, November 2013

pseudo-variable obtained through a Taylor series


expansion. The algorithm iterates between updates of
the pseudo-variable and parameter estimator that
result from the LMM computations.
The Laplace method is described in Booth and Hobert
(1998) who utilized a Laplace approximation based on
the work of de Bruijn (1981) to approximate the value
of the BP. An iterative strategy is employed to obtain
the eBP, first approximated using current values of
parameters, and then updated by approximating the
likelihood using another Laplace approximation. This
process continues until convergence is achieved. The
precision of the eBP is evaluated using a Taylor series
approximation to the conditional mean squared error
(CMSE) derived in Booth and Hobert (1998). Zhao et
al. (2006) and Skrondal and Rabe-Hesketh (2009) have
also advocated CMSE as a suitable measure of
precision.
The quadrature method also calculates the BP using a
Laplace approximation, however, the likelihood
function is approximated by an adaptive quadrature
approximation [see, for example, Golub and Welsch
(1969), Abramowitz and Stegun (1972) and Pinheiro
and Chao (2006)]. The advantage of the adaptive
quadrature approximation is to improve the
approximation of the likelihood function by centering
and scaling the quadrature points. Again, the same
iterative strategy as the Laplace method is employed
until convergence criteria is met and CMSE is used to
measure the precision of the eBP. It is worth noting
that because the estimated CMSE is a function of
parameter estimates, its value is not the same for the
Laplace and quadrature methods since these methods
calculate parameter estimates differently.
Simulation Study Design

To compare the proposed prediction intervals with the intervals in SAS, a simulation study was performed to evaluate the coverage probability and expected width of the alternative intervals. In terms of alternative GLMMs, the three illustrative examples presented in Section 3 were considered. In terms of the proposed prediction intervals, both (4) and (5) were included in the simulation study along with their bootstrap adjusted versions, and those four intervals were compared to the three intervals in SAS obtained by using the PL, L and Q methods.

For simulation parameters, we took $m \in \{10, 20\}$ and set $n_i \equiv n$ for $n \in \{5, 10\}$. When considering the negative binomial GLMM, we varied $\phi \in \{1, 2, 5, \infty\}$; the case $\phi = \infty$ corresponds to simulating from the Poisson GLMM. We also varied $\beta_0 \in \{1, 2\}$ and $\sigma^2 \in \{0.2, 0.4\}$. These combinations of parameter values correspond to the response variable having a mean that ranges between 2 and 8 and a variance that ranges between 3 and 94. For the Bernoulli GLMM, we varied $\beta_0 \in \{-2.5, -0.5, 0.5, 2\}$ and $\sigma^2 \in \{0.2, 2\}$. These choices of parameter values correspond to success probabilities that range from about 0.2 to 0.8. For each scenario of parameter settings and sample size values, we simulated 1000 data sets from the GLMM and then evaluated each of the alternative prediction intervals. The percentage of prediction intervals that covered $w = \beta_0 + s_i$ was recorded for each method. With coverage probabilities on the order of 0.95, the standard error of the estimated coverage probabilities is less than 0.01.
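A skeleton of the coverage computation, in Python, is shown below; it is our own sketch, and the crude fixed-width interval used in the demonstration is only a placeholder for the seven methods compared in the study.

```python
import numpy as np

rng = np.random.default_rng(3)

def coverage(make_interval, m=10, n=5, beta0=1.0, sigma2=0.2, n_sim=1000):
    """Monte Carlo estimate of coverage for w = beta0 + s_1 under a Poisson GLMM."""
    hits = 0
    for _ in range(n_sim):
        s = rng.normal(0.0, np.sqrt(sigma2), size=m)
        y = rng.poisson(np.exp(beta0 + s)[:, None], size=(m, n))
        lo, up = make_interval(y)
        hits += (lo <= beta0 + s[0] <= up)
    return hits / n_sim

# Placeholder interval: log of cluster-1 sample mean plus/minus 1.
naive = lambda y: (np.log(y[0].mean() + 0.5) - 1.0, np.log(y[0].mean() + 0.5) + 1.0)
print(coverage(naive))   # with n_sim = 1000, the SE of the estimate is < 0.01
```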

TABLE 1 NEGATIVE BINOMIAL AND POISSON GLMM COVERAGE PROBABILITIES FOR (m, n) = (10, 5). NOMINAL COVERAGE IS 0.95.

| $\phi$ | $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 0.900 | 0.938 | 0.885 | 0.933 | 0.879 | 0.881 | 0.882 |
| 1 | 1 | 0.4 | 0.902 | 0.939 | 0.889 | 0.934 | 0.873 | 0.875 | 0.876 |
| 1 | 2 | 0.2 | 0.905 | 0.936 | 0.901 | 0.931 | 0.875 | 0.876 | 0.876 |
| 1 | 2 | 0.4 | 0.899 | 0.934 | 0.896 | 0.928 | 0.876 | 0.876 | 0.877 |
| 2 | 1 | 0.2 | 0.908 | 0.938 | 0.905 | 0.935 | 0.869 | 0.871 | 0.873 |
| 2 | 1 | 0.4 | 0.911 | 0.940 | 0.907 | 0.936 | 0.871 | 0.872 | 0.872 |
| 2 | 2 | 0.2 | 0.905 | 0.941 | 0.899 | 0.939 | 0.868 | 0.870 | 0.871 |
| 2 | 2 | 0.4 | 0.902 | 0.939 | 0.895 | 0.936 | 0.873 | 0.874 | 0.875 |
| 5 | 1 | 0.2 | 0.898 | 0.943 | 0.892 | 0.938 | 0.878 | 0.879 | 0.881 |
| 5 | 1 | 0.4 | 0.894 | 0.941 | 0.889 | 0.937 | 0.875 | 0.876 | 0.877 |
| 5 | 2 | 0.2 | 0.896 | 0.943 | 0.892 | 0.939 | 0.872 | 0.873 | 0.874 |
| 5 | 2 | 0.4 | 0.893 | 0.942 | 0.887 | 0.936 | 0.876 | 0.877 | 0.879 |
| $\infty$ | 1 | 0.2 | 0.901 | 0.942 | 0.895 | 0.937 | 0.880 | 0.881 | 0.882 |
| $\infty$ | 1 | 0.4 | 0.899 | 0.940 | 0.894 | 0.936 | 0.882 | 0.883 | 0.885 |
| $\infty$ | 2 | 0.2 | 0.892 | 0.941 | 0.887 | 0.938 | 0.879 | 0.881 | 0.882 |
| $\infty$ | 2 | 0.4 | 0.893 | 0.943 | 0.890 | 0.939 | 0.878 | 0.879 | 0.881 |

TABLE 2 NEGATIVE BINOMIAL AND POISSON GLMM COVERAGE PROBABILITIES FOR (m, n) = (10, 10). NOMINAL COVERAGE IS 0.95.

| $\phi$ | $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 0.909 | 0.941 | 0.895 | 0.936 | 0.892 | 0.894 | 0.895 |
| 1 | 1 | 0.4 | 0.912 | 0.942 | 0.909 | 0.938 | 0.890 | 0.891 | 0.891 |
| 1 | 2 | 0.2 | 0.914 | 0.939 | 0.912 | 0.935 | 0.903 | 0.904 | 0.905 |
| 1 | 2 | 0.4 | 0.916 | 0.943 | 0.913 | 0.940 | 0.906 | 0.906 | 0.906 |
| 2 | 1 | 0.2 | 0.911 | 0.939 | 0.908 | 0.936 | 0.908 | 0.910 | 0.911 |
| 2 | 1 | 0.4 | 0.915 | 0.942 | 0.912 | 0.940 | 0.904 | 0.906 | 0.906 |
| 2 | 2 | 0.2 | 0.920 | 0.940 | 0.916 | 0.938 | 0.898 | 0.901 | 0.902 |
| 2 | 2 | 0.4 | 0.917 | 0.939 | 0.915 | 0.935 | 0.896 | 0.897 | 0.899 |
| 5 | 1 | 0.2 | 0.915 | 0.943 | 0.914 | 0.942 | 0.899 | 0.901 | 0.903 |
| 5 | 1 | 0.4 | 0.918 | 0.941 | 0.916 | 0.938 | 0.901 | 0.902 | 0.904 |
| 5 | 2 | 0.2 | 0.909 | 0.941 | 0.905 | 0.937 | 0.902 | 0.904 | 0.905 |
| 5 | 2 | 0.4 | 0.907 | 0.938 | 0.903 | 0.935 | 0.905 | 0.906 | 0.907 |
| $\infty$ | 1 | 0.2 | 0.919 | 0.941 | 0.915 | 0.939 | 0.897 | 0.899 | 0.901 |
| $\infty$ | 1 | 0.4 | 0.921 | 0.943 | 0.918 | 0.940 | 0.893 | 0.894 | 0.896 |
| $\infty$ | 2 | 0.2 | 0.920 | 0.940 | 0.917 | 0.938 | 0.889 | 0.891 | 0.893 |
| $\infty$ | 2 | 0.4 | 0.916 | 0.938 | 0.913 | 0.935 | 0.901 | 0.903 | 0.904 |

TABLE 3 NEGATIVE BINOMIAL AND POISSON GLMM COVERAGE PROBABILITIES FOR (m, n) = (20, 5). NOMINAL COVERAGE IS 0.95.

| $\phi$ | $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 0.916 | 0.936 | 0.912 | 0.933 | 0.912 | 0.914 | 0.915 |
| 1 | 1 | 0.4 | 0.917 | 0.938 | 0.914 | 0.935 | 0.918 | 0.920 | 0.921 |
| 1 | 2 | 0.2 | 0.921 | 0.940 | 0.919 | 0.938 | 0.909 | 0.911 | 0.912 |
| 1 | 2 | 0.4 | 0.923 | 0.941 | 0.922 | 0.940 | 0.913 | 0.913 | 0.915 |
| 2 | 1 | 0.2 | 0.926 | 0.939 | 0.925 | 0.937 | 0.914 | 0.915 | 0.917 |
| 2 | 1 | 0.4 | 0.924 | 0.937 | 0.923 | 0.935 | 0.918 | 0.922 | 0.924 |
| 2 | 2 | 0.2 | 0.915 | 0.942 | 0.912 | 0.940 | 0.920 | 0.921 | 0.923 |
| 2 | 2 | 0.4 | 0.912 | 0.938 | 0.908 | 0.936 | 0.917 | 0.919 | 0.921 |
| 5 | 1 | 0.2 | 0.925 | 0.941 | 0.922 | 0.939 | 0.915 | 0.917 | 0.919 |
| 5 | 1 | 0.4 | 0.928 | 0.944 | 0.925 | 0.941 | 0.911 | 0.911 | 0.913 |
| 5 | 2 | 0.2 | 0.919 | 0.942 | 0.917 | 0.940 | 0.921 | 0.922 | 0.924 |
| 5 | 2 | 0.4 | 0.917 | 0.940 | 0.915 | 0.939 | 0.918 | 0.919 | 0.919 |
| $\infty$ | 1 | 0.2 | 0.918 | 0.938 | 0.915 | 0.937 | 0.915 | 0.916 | 0.918 |
| $\infty$ | 1 | 0.4 | 0.921 | 0.941 | 0.918 | 0.938 | 0.913 | 0.916 | 0.917 |
| $\infty$ | 2 | 0.2 | 0.921 | 0.940 | 0.920 | 0.938 | 0.907 | 0.909 | 0.910 |
| $\infty$ | 2 | 0.4 | 0.925 | 0.942 | 0.921 | 0.940 | 0.910 | 0.912 | 0.912 |

TABLE 4 NEGATIVE BINOMIAL AND POISSON GLMM COVERAGE PROBABILITIES FOR (m, n) = (20, 10). NOMINAL COVERAGE IS 0.95.

| $\phi$ | $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 0.919 | 0.944 | 0.915 | 0.942 | 0.921 | 0.922 | 0.924 |
| 1 | 1 | 0.4 | 0.923 | 0.948 | 0.918 | 0.946 | 0.925 | 0.926 | 0.926 |
| 1 | 2 | 0.2 | 0.925 | 0.943 | 0.924 | 0.942 | 0.918 | 0.919 | 0.921 |
| 1 | 2 | 0.4 | 0.929 | 0.947 | 0.927 | 0.946 | 0.923 | 0.923 | 0.924 |
| 2 | 1 | 0.2 | 0.930 | 0.948 | 0.927 | 0.945 | 0.925 | 0.926 | 0.926 |
| 2 | 1 | 0.4 | 0.928 | 0.945 | 0.926 | 0.943 | 0.924 | 0.927 | 0.929 |
| 2 | 2 | 0.2 | 0.926 | 0.946 | 0.923 | 0.944 | 0.925 | 0.926 | 0.927 |
| 2 | 2 | 0.4 | 0.923 | 0.945 | 0.921 | 0.942 | 0.923 | 0.924 | 0.926 |
| 5 | 1 | 0.2 | 0.926 | 0.943 | 0.925 | 0.942 | 0.929 | 0.931 | 0.932 |
| 5 | 1 | 0.4 | 0.929 | 0.947 | 0.928 | 0.946 | 0.926 | 0.927 | 0.927 |
| 5 | 2 | 0.2 | 0.925 | 0.942 | 0.924 | 0.941 | 0.926 | 0.927 | 0.929 |
| 5 | 2 | 0.4 | 0.927 | 0.944 | 0.926 | 0.943 | 0.921 | 0.923 | 0.925 |
| $\infty$ | 1 | 0.2 | 0.926 | 0.945 | 0.925 | 0.944 | 0.920 | 0.920 | 0.923 |
| $\infty$ | 1 | 0.4 | 0.925 | 0.944 | 0.924 | 0.942 | 0.924 | 0.924 | 0.925 |
| $\infty$ | 2 | 0.2 | 0.924 | 0.943 | 0.923 | 0.940 | 0.919 | 0.921 | 0.922 |
| $\infty$ | 2 | 0.4 | 0.926 | 0.946 | 0.924 | 0.944 | 0.923 | 0.923 | 0.925 |

TABLE 5 NEGATIVE BINOMIAL AND POISSON GLMM EXPECTED WIDTHS. NOMINAL COVERAGE IS 0.95. COLUMN PAIRS GIVE (4) w/BS AND (5) w/BS FOR EACH (m, n).

| $\phi$ | $\beta_0$ | $\sigma^2$ | (10,5) (4) w/BS | (10,5) (5) w/BS | (10,10) (4) w/BS | (10,10) (5) w/BS | (20,5) (4) w/BS | (20,5) (5) w/BS | (20,10) (4) w/BS | (20,10) (5) w/BS |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.2 | 2.348 | 2.298 | 1.338 | 1.289 | 0.787 | 0.713 | 0.489 | 0.463 |
| 1 | 1 | 0.4 | 2.576 | 2.498 | 1.548 | 1.493 | 0.812 | 0.795 | 0.492 | 0.471 |
| 1 | 2 | 0.2 | 2.987 | 2.893 | 1.632 | 1.583 | 0.825 | 0.806 | 0.503 | 0.493 |
| 1 | 2 | 0.4 | 3.042 | 2.983 | 1.738 | 1.695 | 0.831 | 0.828 | 0.509 | 0.501 |
| 2 | 1 | 0.2 | 2.563 | 2.478 | 1.284 | 1.238 | 0.756 | 0.748 | 0.499 | 0.488 |
| 2 | 1 | 0.4 | 2.681 | 2.512 | 1.292 | 1.258 | 0.779 | 0.763 | 0.512 | 0.503 |
| 2 | 2 | 0.2 | 3.034 | 2.983 | 1.318 | 1.263 | 0.781 | 0.772 | 0.516 | 0.510 |
| 2 | 2 | 0.4 | 3.142 | 3.015 | 1.322 | 1.301 | 0.792 | 0.781 | 0.522 | 0.518 |
| 5 | 1 | 0.2 | 2.487 | 2.397 | 1.326 | 1.298 | 0.748 | 0.732 | 0.475 | 0.468 |
| 5 | 1 | 0.4 | 2.534 | 2.498 | 1.341 | 1.328 | 0.751 | 0.738 | 0.479 | 0.470 |
| 5 | 2 | 0.2 | 2.834 | 2.795 | 1.358 | 1.348 | 0.762 | 0.749 | 0.485 | 0.479 |
| 5 | 2 | 0.4 | 2.931 | 2.887 | 1.361 | 1.351 | 0.768 | 0.759 | 0.489 | 0.481 |
| $\infty$ | 1 | 0.2 | 2.503 | 2.498 | 1.258 | 1.223 | 0.792 | 0.782 | 0.453 | 0.441 |
| $\infty$ | 1 | 0.4 | 2.612 | 2.583 | 1.301 | 1.286 | 0.813 | 0.802 | 0.455 | 0.446 |
| $\infty$ | 2 | 0.2 | 2.883 | 2.862 | 1.321 | 1.305 | 0.829 | 0.816 | 0.460 | 0.452 |
| $\infty$ | 2 | 0.4 | 2.912 | 2.889 | 1.325 | 1.318 | 0.833 | 0.829 | 0.470 | 0.462 |

TABLE 6 BERNOULLI GLMM COVERAGE PROBABILITIES FOR (m, n) = (10, 5). NOMINAL COVERAGE IS 0.95.

| $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 0.895 | 0.935 | 0.892 | 0.934 | 0.869 | 0.870 | 0.871 |
| -2.5 | 2 | 0.899 | 0.942 | 0.896 | 0.938 | 0.871 | 0.873 | 0.874 |
| -0.5 | 0.2 | 0.908 | 0.941 | 0.905 | 0.939 | 0.875 | 0.876 | 0.877 |
| -0.5 | 2 | 0.903 | 0.939 | 0.901 | 0.936 | 0.873 | 0.875 | 0.876 |
| 0.5 | 0.2 | 0.894 | 0.934 | 0.891 | 0.931 | 0.875 | 0.876 | 0.878 |
| 0.5 | 2 | 0.897 | 0.935 | 0.893 | 0.932 | 0.878 | 0.878 | 0.879 |
| 2.0 | 0.2 | 0.905 | 0.940 | 0.902 | 0.938 | 0.874 | 0.876 | 0.877 |
| 2.0 | 2 | 0.902 | 0.938 | 0.897 | 0.935 | 0.876 | 0.874 | 0.875 |

TABLE 7 BERNOULLI GLMM COVERAGE PROBABILITIES FOR (m, n) = (10, 10). NOMINAL COVERAGE IS 0.95.

| $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 0.911 | 0.942 | 0.907 | 0.938 | 0.903 | 0.906 | 0.907 |
| -2.5 | 2 | 0.905 | 0.940 | 0.902 | 0.936 | 0.897 | 0.897 | 0.900 |
| -0.5 | 0.2 | 0.915 | 0.944 | 0.911 | 0.942 | 0.899 | 0.901 | 0.903 |
| -0.5 | 2 | 0.911 | 0.941 | 0.907 | 0.939 | 0.895 | 0.897 | 0.898 |
| 0.5 | 0.2 | 0.918 | 0.935 | 0.915 | 0.931 | 0.892 | 0.893 | 0.895 |
| 0.5 | 2 | 0.920 | 0.938 | 0.916 | 0.935 | 0.896 | 0.896 | 0.898 |
| 2.0 | 0.2 | 0.917 | 0.943 | 0.914 | 0.938 | 0.901 | 0.904 | 0.906 |
| 2.0 | 2 | 0.912 | 0.941 | 0.908 | 0.939 | 0.898 | 0.901 | 0.903 |

TABLE 8 BERNOULLI GLMM COVERAGE PROBABILITIES FOR (m, n) = (20, 5). NOMINAL COVERAGE IS 0.95.

| $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 0.925 | 0.944 | 0.921 | 0.939 | 0.921 | 0.922 | 0.923 |
| -2.5 | 2 | 0.921 | 0.941 | 0.916 | 0.939 | 0.919 | 0.920 | 0.921 |
| -0.5 | 0.2 | 0.919 | 0.938 | 0.915 | 0.936 | 0.912 | 0.914 | 0.916 |
| -0.5 | 2 | 0.923 | 0.941 | 0.918 | 0.940 | 0.914 | 0.915 | 0.918 |
| 0.5 | 0.2 | 0.924 | 0.942 | 0.919 | 0.940 | 0.922 | 0.922 | 0.924 |
| 0.5 | 2 | 0.928 | 0.945 | 0.923 | 0.941 | 0.923 | 0.923 | 0.925 |
| 2.0 | 0.2 | 0.922 | 0.942 | 0.919 | 0.940 | 0.915 | 0.917 | 0.919 |
| 2.0 | 2 | 0.918 | 0.944 | 0.914 | 0.941 | 0.911 | 0.914 | 0.916 |

TABLE 9 BERNOULLI GLMM COVERAGE PROBABILITIES FOR (m, n) = (20, 10). NOMINAL COVERAGE IS 0.95.

| $\beta_0$ | $\sigma^2$ | (4) | (4) w/BS | (5) | (5) w/BS | SAS/PL | SAS/L | SAS/Q |
|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 0.929 | 0.947 | 0.928 | 0.946 | 0.925 | 0.926 | 0.927 |
| -2.5 | 2 | 0.928 | 0.945 | 0.925 | 0.943 | 0.923 | 0.925 | 0.926 |
| -0.5 | 0.2 | 0.933 | 0.946 | 0.929 | 0.945 | 0.919 | 0.921 | 0.922 |
| -0.5 | 2 | 0.931 | 0.945 | 0.927 | 0.941 | 0.918 | 0.921 | 0.922 |
| 0.5 | 0.2 | 0.924 | 0.945 | 0.921 | 0.942 | 0.922 | 0.923 | 0.924 |
| 0.5 | 2 | 0.929 | 0.948 | 0.926 | 0.944 | 0.924 | 0.924 | 0.925 |
| 2.0 | 0.2 | 0.928 | 0.946 | 0.925 | 0.943 | 0.925 | 0.926 | 0.927 |
| 2.0 | 2 | 0.931 | 0.948 | 0.926 | 0.944 | 0.927 | 0.928 | 0.929 |

TABLE 10 BERNOULLI GLMM EXPECTED WIDTHS. NOMINAL COVERAGE IS 0.95. COLUMN PAIRS GIVE (4) w/BS AND (5) w/BS FOR EACH (m, n).

| $\beta_0$ | $\sigma^2$ | (10,5) (4) w/BS | (10,5) (5) w/BS | (10,10) (4) w/BS | (10,10) (5) w/BS | (20,5) (4) w/BS | (20,5) (5) w/BS | (20,10) (4) w/BS | (20,10) (5) w/BS |
|---|---|---|---|---|---|---|---|---|---|
| -2.5 | 0.2 | 1.873 | 1.756 | 1.131 | 1.023 | 0.742 | 0.728 | 0.318 | 0.305 |
| -2.5 | 2 | 2.048 | 1.957 | 1.238 | 1.187 | 0.813 | 0.793 | 0.325 | 0.311 |
| -0.5 | 0.2 | 1.948 | 1.865 | 1.235 | 1.113 | 0.793 | 0.763 | 0.298 | 0.288 |
| -0.5 | 2 | 2.240 | 2.187 | 1.358 | 1.238 | 0.821 | 0.798 | 0.302 | 0.293 |
| 0.5 | 0.2 | 1.653 | 1.594 | 0.998 | 0.973 | 0.801 | 0.783 | 0.289 | 0.278 |
| 0.5 | 2 | 1.891 | 1.823 | 1.108 | 1.083 | 0.816 | 0.801 | 0.296 | 0.281 |
| 2.0 | 0.2 | 1.784 | 1.693 | 0.982 | 0.963 | 0.768 | 0.759 | 0.304 | 0.298 |
| 2.0 | 2 | 2.012 | 1.998 | 0.998 | 0.975 | 0.784 | 0.771 | 0.309 | 0.301 |

Results

Tables 1-4 show the coverage probabilities, and Table 5 shows the expected widths, of the seven alternative prediction intervals for the scenarios covering the negative binomial and Poisson GLMMs. Tables 6-9 and Table 10 show the corresponding results for the Bernoulli GLMMs.

It is seen that the three prediction interval methods implemented in SAS have coverage probabilities that are lower than expected, and in general there is very little difference among these three methods. Intervals (4) and (5) have coverage probabilities closer to the nominal level, and their bootstrapped versions offer the best solutions. Other simulations [see Yang (2013)] show that bootstrap adjustments are less effective with the three SAS intervals and generally are not adequate to make the coverage probabilities satisfactory. From Table 5 and Table 10 it can be seen that increasing the number of clusters is more effective than increasing the number of sampling units in terms of reducing the expected width. Namely, when the number of clusters is doubled the expected width is reduced by approximately 75%, compared to approximately 50% when the number of sampling units is doubled. In addition, it is observed that interval (4) has an appropriately slightly wider expected width than interval (5), due to the correction term in the MSE approximation.

Summary

We have developed a new prediction interval methodology for a class of GLMMs that are suitable for analyzing clustered count data. Our approach was to derive an approximation to the unconditional MSE of the eBP, staying within the context of the GLMM, and we compared our proposed method with three existing prediction interval methods implemented in the SAS procedure GLIMMIX based upon pseudo-likelihood, Laplace, and quadrature approximations.

Our simulation study showed that the coverage probabilities for the intervals computed by the three methods in GLIMMIX are too low. The coverage probabilities for our proposed interval (4) with bootstrap adjustments are quite close to the nominal value, with an expected width that decreases rapidly as the number of clusters increases, and less rapidly as the number of sampling units within a cluster increases. Future work includes consideration of GLMMs having two or more random factors.
REFERENCES

Abramowitz, M., Stegun, I., 1972. Handbook of Mathematical Functions. New York: Dover Publications.

Booth, J.G., Hobert, J.P., 1998. Standard errors of prediction in generalized linear mixed models. Journal of the American Statistical Association 93, 262-272.

de Bruijn, N.G., 1981. Asymptotic Methods in Analysis. New York: Dover.

Golub, G.H., Welsch, J.H., 1969. Calculation of Gauss quadrature rules. Mathematics of Computation 23, 221-230.

Harville, D.A., 2008. Accounting for the estimation of variances and covariances in prediction under a general linear model: an overview. Tatra Mountains Mathematical Publications 39, 1-15.

Jiang, J., Lahiri, P., 2006. Mixed model prediction and small area estimation. Test 15, 1-96.

Kackar, R.N., Harville, D.A., 1984. Approximations for standard errors of estimators of fixed and random effects in mixed linear models. Journal of the American Statistical Association 79, 853-862.

Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., Schabenberger, O., 2006. SAS for Mixed Models, Second Edition. Cary, NC: SAS Institute Inc.

McCulloch, C.E., Searle, S.R., Neuhaus, J.M., 2008. Generalized, Linear, and Mixed Models, 2nd edition. New York: Wiley.

Pinheiro, J.C., Chao, E.C., 2006. Efficient Laplacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. Journal of Computational and Graphical Statistics 15, 58-81.

Skrondal, A., Rabe-Hesketh, S., 2009. Prediction in multilevel generalized linear models. Journal of the Royal Statistical Society, Series A 172, 659-687.

Wolfinger, R., O'Connell, M., 1993. Generalized linear mixed models: a pseudo-likelihood approach. Journal of Statistical Computation and Simulation 48, 233-243.

Yang, C., 2013. Prediction Intervals in Generalized Linear Mixed Models. PhD Dissertation, Department of Applied Statistics, University of California, Riverside.

Zhao, Y., Staudenmayer, J., Coull, B.A., Wand, M.P., 2006. General design Bayesian generalized linear mixed models. Statistical Science 21, 35-51.

Appendix

Here we summarize all the formulas needed to compute the prediction interval in (4), including those formulas that are needed in conjunction with use of Algorithm 1. All of the integrals required for these computations are one-dimensional integrals that can be easily evaluated using Gaussian quadrature.

The best predictor $\mu(y;\theta)$ requires the quantities
$$E(s_i \mid y_i; \theta) = \int s_i\, f(s_i \mid y_i; \theta)\, ds_i. \qquad (A.1)$$
A standard optimization routine can be used to maximize the integrated likelihood $L(\theta \mid y)$ to find $\hat\theta$, which in turn can be used to find $\mu(y;\hat\theta)$.

Algorithm 1 can be used three times, as detailed below, to obtain $M(\hat\theta)$. For the first use of Algorithm 1, send it the matrix $\mathrm{Var}(s \mid y) = \mathrm{Diag}[\mathrm{Var}(s_i \mid y_i; \theta)]_{i=1}^{m}$, using
$$\mathrm{Var}(s_i \mid y_i; \theta) = \int s_i^2\, f(s_i \mid y_i; \theta)\, ds_i - \left[E(s_i \mid y_i; \theta)\right]^2. \qquad (A.2)$$

For the second use, send it the matrix
$$d(y;\theta)\, d(y;\theta)' = \left[\frac{\partial \mu(y;\theta)}{\partial \theta}\right]\left[\frac{\partial \mu(y;\theta)}{\partial \theta}\right]',$$
where $\partial\mu(y;\theta)/\partial\theta_l = \lambda'(\partial\beta/\partial\theta_l) + \gamma'\left(\partial E(s \mid y;\theta)/\partial\theta_l\right)$. Evaluations of $\partial\beta/\partial\theta_l$ are either zero or one, and
$$\frac{\partial E(s_i \mid y_i; \theta)}{\partial \theta_l} = \int s_i\, \frac{\partial f(s_i \mid y_i; \theta)}{\partial \theta_l}\, ds_i = \frac{f(y_i;\theta) \int s_i\, \left[\partial f(y_i, s_i; \theta)/\partial\theta_l\right] ds_i - \left[\partial f(y_i;\theta)/\partial\theta_l\right] \int s_i\, f(y_i, s_i; \theta)\, ds_i}{f^2(y_i; \theta)}. \qquad (A.3)$$
Expressions needed to evaluate (A.3) are
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \beta_l} = \left[ \sum_{j=1}^{n_i} x_{ijl}\, \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)/\partial\mu_{ij}}{g'(\mu_{ij})} \right] f(y_i, s_i; \theta), \qquad (A.4)$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \sigma^2} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) f(y_i, s_i; \theta), \qquad (A.5)$$
$$\frac{\partial f(y_i, s_i; \theta)}{\partial \phi} = \left[ \sum_{j=1}^{n_i} \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial \phi} \right] f(y_i, s_i; \theta), \qquad (A.6)$$
and the integrated forms of (A.4)-(A.6),
$$\frac{\partial f(y_i; \theta)}{\partial \beta_l} = \int \frac{\partial f(y_i, s_i; \theta)}{\partial \beta_l}\, ds_i, \qquad (A.7)$$
$$\frac{\partial f(y_i; \theta)}{\partial \sigma^2} = \int \frac{\partial f(y_i, s_i; \theta)}{\partial \sigma^2}\, ds_i, \qquad (A.8)$$
$$\frac{\partial f(y_i; \theta)}{\partial \phi} = \int \frac{\partial f(y_i, s_i; \theta)}{\partial \phi}\, ds_i. \qquad (A.9)$$

Finally, for the third use of Algorithm 1, send it the observed information matrix $I_o(\theta)$. Since $\log L(\theta \mid y) = \sum_{i=1}^{m} \log f(y_i;\theta)$ and
$$\frac{\partial^2 \log f(y_i;\theta)}{\partial \theta_j\, \partial \theta_k} = \frac{f(y_i;\theta)\, \partial^2 f(y_i;\theta)/\partial\theta_j\, \partial\theta_k - \left[\partial f(y_i;\theta)/\partial\theta_j\right]\left[\partial f(y_i;\theta)/\partial\theta_k\right]}{f^2(y_i;\theta)},$$
it suffices to combine the expressions (A.7)-(A.9) with expressions for the Hessian matrix of $f(y_i;\theta)$. Starting with (A.7)-(A.9), we find
$$\frac{\partial^2 f(y_i;\theta)}{\partial\beta_k\, \partial\beta_l} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\beta_k\, \partial\beta_l}\, ds_i, \qquad (A.10)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial\sigma^2\, \partial\beta_l} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\sigma^2\, \partial\beta_l}\, ds_i, \qquad (A.11)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial\phi\, \partial\beta_l} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\phi\, \partial\beta_l}\, ds_i, \qquad (A.12)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial(\sigma^2)^2} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial(\sigma^2)^2}\, ds_i, \qquad (A.13)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial\sigma^2\, \partial\phi} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\sigma^2\, \partial\phi}\, ds_i, \qquad (A.14)$$
$$\frac{\partial^2 f(y_i;\theta)}{\partial\phi^2} = \int \frac{\partial^2 f(y_i, s_i; \theta)}{\partial\phi^2}\, ds_i. \qquad (A.15)$$
It can be seen from (A.10)-(A.15) that what we ultimately need is the Hessian matrix of $f(y_i, s_i; \theta)$, which can be shown, starting with (A.4)-(A.6), to be the following:
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\beta_k\, \partial\beta_l} = \left\{ \left[ \sum_{j=1}^{n_i} x_{ijl}\, \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)/\partial\mu_{ij}}{g'(\mu_{ij})} \right] \left[ \sum_{j=1}^{n_i} x_{ijk}\, \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)/\partial\mu_{ij}}{g'(\mu_{ij})} \right] + \sum_{j=1}^{n_i} x_{ijk}\, x_{ijl} \left[ \frac{\partial^2 \log f(y_{ij} \mid \mu_{ij}, \phi)/\partial\mu_{ij}^2}{g'(\mu_{ij})^2} - \frac{g''(\mu_{ij})}{g'(\mu_{ij})^3}\, \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\mu_{ij}} \right] \right\} f(y_i, s_i; \theta), \qquad (A.16)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\sigma^2\, \partial\beta_l} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \frac{\partial f(y_i, s_i; \theta)}{\partial\beta_l}, \qquad (A.17)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\phi\, \partial\beta_l} = \left[ \sum_{j=1}^{n_i} \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\phi} \right] \frac{\partial f(y_i, s_i; \theta)}{\partial\beta_l} + \left[ \sum_{j=1}^{n_i} x_{ijl}\, \frac{1}{g'(\mu_{ij})}\, \frac{\partial^2 \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\phi\, \partial\mu_{ij}} \right] f(y_i, s_i; \theta), \qquad (A.18)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial(\sigma^2)^2} = \left\{ \left[\frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right)\right]^2 - \frac{s_i^2}{\sigma^6} + \frac{1}{2\sigma^4} \right\} f(y_i, s_i; \theta), \qquad (A.19)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\sigma^2\, \partial\phi} = \frac{1}{2\sigma^2}\left(\frac{s_i^2}{\sigma^2} - 1\right) \frac{\partial f(y_i, s_i; \theta)}{\partial\phi}, \qquad (A.20)$$
$$\frac{\partial^2 f(y_i, s_i; \theta)}{\partial\phi^2} = \left\{ \left[ \sum_{j=1}^{n_i} \frac{\partial \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\phi} \right]^2 + \sum_{j=1}^{n_i} \frac{\partial^2 \log f(y_{ij} \mid \mu_{ij}, \phi)}{\partial\phi^2} \right\} f(y_i, s_i; \theta). \qquad (A.21)$$