
The American Statistician, Vol. 46, No. 3 (Aug., 1992), pp. 167-174.

Stable URL: http://links.jstor.org/sici?sici=0003-1305%28199208%2946%3A3%3C167%3AETGS%3E2.0.CO%3B2-R

Explaining the Gibbs Sampler

GEORGE CASELLA and EDWARD I. GEORGE*

Computer-intensive algorithms, such as the Gibbs sampler, have become increasingly popular statistical tools, both in applied and theoretical work. The properties of such algorithms, however, may sometimes not be obvious. Here we give a simple explanation of how and why the Gibbs sampler works. We analytically establish its properties in a simple case and provide insight for more complicated cases. There are also a number of examples.

KEY WORDS: Data augmentation; Markov chains; Monte Carlo methods; Resampling techniques.

1. INTRODUCTION

The continuing availability of inexpensive, high-speed computing has already reshaped many approaches to statistics. Much work has been done on algorithmic approaches (such as the EM algorithm; Dempster, Laird, and Rubin 1977) and on resampling techniques (such as the bootstrap; Efron 1982). Here we focus on a different type of computer-intensive statistical method, the Gibbs sampler.

The Gibbs sampler enjoyed an initial surge of popularity starting with the paper of Geman and Geman (1984), who studied image-processing models. The roots of the method, however, can be traced back at least to Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953), with further development by Hastings (1970). More recently, Gelfand and Smith (1990) generated new interest in the Gibbs sampler by revealing its potential in a wide variety of conventional statistical problems.

The Gibbs sampler is a technique for generating random variables from a (marginal) distribution indirectly, without having to calculate the density. Although straightforward to describe, the mechanism that drives this scheme may seem mysterious. The purpose of this article is to demystify the workings of these algorithms by exploring simple cases. In such cases, it is easy to see that Gibbs sampling is based only on elementary properties of Markov chains.

Through the use of techniques like the Gibbs sampler, we are able to avoid difficult calculations, replacing them instead with a sequence of easier calculations. These methodologies have had a wide impact on practical problems, as discussed in Section 6. Although most applications of the Gibbs sampler have been in Bayesian models, it is also extremely useful in classical (likelihood) calculations [see Tanner (1991) for many examples]. Furthermore, these calculational methodologies have also had an impact on theory. By freeing statisticians from dealing with complicated calculations, the statistical aspects of a problem can become the main focus. This point is wonderfully illustrated by Smith and Gelfand (1992).

In the next section we describe and illustrate the application of the Gibbs sampler in bivariate situations. Section 3 is a detailed development of the underlying theory, given in the simple case of a 2 × 2 table with multinomial sampling. From this detailed development, the theory underlying general situations is more easily understood, and is also outlined. Section 4 elaborates the role of the Gibbs sampler in relating conditional and marginal distributions and illustrates some higher dimensional generalizations. Section 5 describes many of the implementation issues surrounding the Gibbs sampler, and Section 6 contains a discussion and describes many applications.

2. ILLUSTRATING THE GIBBS SAMPLER

Suppose we are given a joint density f(x, y_1, . . . , y_p), and are interested in obtaining characteristics of the marginal density

    f(x) = ∫ ⋯ ∫ f(x, y_1, . . . , y_p) dy_1 ⋯ dy_p,    (2.1)

such as the mean or variance. Perhaps the most natural and straightforward approach would be to calculate f(x) and use it to obtain the desired characteristic. However, there are many cases where the integrations in (2.1) are extremely difficult to perform, either analytically or numerically. In such cases the Gibbs sampler provides an alternative method for obtaining f(x).

Rather than compute or approximate f(x) directly, the Gibbs sampler allows us effectively to generate a sample X_1, . . . , X_m ~ f(x) without requiring f(x). By simulating a large enough sample, the mean, variance, or any other characteristic of f(x) can be calculated to the desired degree of accuracy.

It is important to realize that, in effect, the end results of any calculations, although based on simulations, are the population quantities. For example, to calculate the mean of f(x), we could use (1/m) Σ_{i=1}^m X_i and the fact that

    lim_{m→∞} (1/m) Σ_{i=1}^m X_i = ∫ x f(x) dx = E X.    (2.2)

*George Casella is Professor, Biometrics Unit, Cornell University, Ithaca, NY 14853. The research of this author was supported by National Science Foundation Grant DMS 89-0039. Edward I. George is Professor, Department of MSIS, The University of Texas at Austin, Austin, TX 78712. The authors thank the editors and referees, whose comments led to an improved version of this article.

© 1992 American Statistical Association    The American Statistician, August 1992, Vol. 46, No. 3    167
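The law-of-large-numbers step in (2.2) is easy to check numerically. The sketch below is ours, not part of the original article; it assumes only NumPy and uses a Beta(2, 4) density, whose mean is α/(α + β) = 1/3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw m variates from a Beta(2, 4) density and compare the sample mean
# (1/m) * sum(X_i) with the population mean E[X] = 2 / (2 + 4) = 1/3,
# as in (2.2).
m = 100_000
x = rng.beta(2.0, 4.0, size=m)
sample_mean = x.mean()

print(sample_mean)             # close to 1/3 for large m
print(abs(sample_mean - 1/3))  # Monte Carlo error shrinks like 1/sqrt(m)
```

The same averaging applies to any characteristic: replacing `x` by `x**2` in the mean estimates the second moment, and so on.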

Thus, by taking m large enough, any population characteristic, even the density itself, can be obtained to any degree of accuracy.

To understand the workings of the Gibbs sampler, we first explore it in the two-variable case. Starting with a pair of random variables (X, Y), the Gibbs sampler generates a sample from f(x) by sampling instead from the conditional distributions f(x | y) and f(y | x), distributions that are often known in statistical models. This is done by generating a "Gibbs sequence" of random variables

    Y'_0, X'_0, Y'_1, X'_1, Y'_2, X'_2, . . . , Y'_k, X'_k.    (2.3)

The initial value Y'_0 = y'_0 is specified, and the rest of (2.3) is obtained iteratively by alternately generating values from

    X'_j ~ f(x | Y'_j = y'_j)
    Y'_{j+1} ~ f(y | X'_j = x'_j).    (2.4)

We refer to this generation of (2.3) as Gibbs sampling. It turns out that under reasonably general conditions, the distribution of X'_k converges to f(x) (the true marginal of X) as k → ∞. Thus, for k large enough, the final observation in (2.3), namely X'_k = x'_k, is effectively a sample point from f(x).

The convergence (in distribution) of the Gibbs sequence (2.3) can be exploited in a variety of ways to obtain an approximate sample from f(x). For example, Gelfand and Smith (1990) suggest generating m independent Gibbs sequences of length k, and then using the final value of X'_k from each sequence. If k is chosen large enough, this yields an approximate iid sample from f(x). Methods for choosing such k, as well as alternative approaches to extracting information from the Gibbs sequence, are discussed in Section 5. For the sake of clarity and consistency, we have used only the preceding approach in all of the illustrative examples that follow.

Example 1. For the following joint distribution of X and Y,

    f(x, y) ∝ (n choose x) y^{x+α−1} (1−y)^{n−x+β−1},
        x = 0, 1, . . . , n,  0 ≤ y ≤ 1,    (2.5)

suppose we are interested in characteristics of the marginal distribution f(x) of X. The Gibbs sampler allows us to generate a sample from this marginal as follows. From (2.5) it follows (suppressing the overall dependence on n, α, and β) that

    f(x | y) is Binomial(n, y)    (2.6a)
    f(y | x) is Beta(x + α, n − x + β).    (2.6b)

If we now apply the iterative scheme (2.4) to the distributions (2.6), we can generate a sample X_1, X_2, . . . , X_m from f(x) and use this sample to estimate any desired characteristic.

As the reader may have already noticed, Gibbs sampling is actually not needed in this example, since f(x) can be obtained analytically from (2.5) as

    f(x) = (n choose x) [Γ(α+β)/(Γ(α)Γ(β))] [Γ(x+α)Γ(n−x+β)/Γ(α+β+n)],
        x = 0, 1, . . . , n,    (2.7)

the beta-binomial distribution. Here, characteristics of f(x) can be directly obtained from (2.7), either analytically or by generating a sample from the marginal and not fussing with the conditional distributions. However, this simple situation is useful for illustrative purposes.

Figure 1 displays histograms of two samples x_1, . . . , x_m of size m = 500 from the beta-binomial distribution of (2.7) with n = 16, α = 2, and β = 4. The two histograms are very similar, giving credence to the claim that the Gibbs scheme for random variable generation is indeed generating variables from the marginal distribution.

Figure 1. Comparison of Two Histograms of Samples of Size m = 500 From the Beta-Binomial Distribution With n = 16, α = 2, and β = 4. The black histogram sample was obtained using Gibbs sampling with k = 10. The white histogram sample was generated directly from the beta-binomial distribution.

One feature brought out by Example 1 is that the Gibbs sampler is really not needed in any bivariate situation where the joint distribution f(x, y) can be calculated, since f(x) = f(x, y)/f(y | x). However, as the next example shows, Gibbs sampling may be indispensable in situations where f(x, y), f(x), or f(y) cannot be calculated.

Example 2. Suppose X and Y have conditional distributions that are exponential distributions restricted to the interval (0, B), that is,

    f(x | y) ∝ y e^{−yx},  0 < x < B < ∞,    (2.8a)
    f(y | x) ∝ x e^{−xy},  0 < y < B < ∞,    (2.8b)

where B is a known positive constant. The restriction to the interval (0, B) ensures that the marginal f(x) exists. Although the form of this marginal is not easily calculable, by applying the Gibbs sampler to the conditionals in (2.8) any characteristic of f(x) can be obtained.

In Figure 2 we display a histogram of a sample of size m = 500 from f(x) obtained by using the final values from Gibbs sequences of length k = 15.

In Section 4 we see that if B is not finite, then the densities in (2.8) are not a valid pair of conditional densities, in the sense that there is no joint density f(x, y) to which they correspond, and the Gibbs sequence fails to converge.
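The two-variable scheme (2.3)-(2.4), applied to the conditionals (2.6) of Example 1, can be sketched in a few lines of code. The illustration below is ours rather than the authors' (it assumes NumPy); it follows the approach described above of m independent Gibbs sequences of length k, keeping the final X from each.

```python
import numpy as np

def gibbs_beta_binomial(n=16, alpha=2.0, beta=4.0, m=500, k=10, seed=0):
    """Generate m draws from the marginal f(x) of (2.5) by Gibbs sampling.

    Each draw is the final X of an independent Gibbs sequence of length k,
    alternating the conditionals (2.6):
        X | y ~ Binomial(n, y),   Y | x ~ Beta(x + alpha, n - x + beta).
    """
    rng = np.random.default_rng(seed)
    xs = np.empty(m, dtype=int)
    for i in range(m):
        y = rng.uniform()                          # arbitrary start Y'_0
        for _ in range(k):
            x = rng.binomial(n, y)                 # X'_j ~ f(x | y)
            y = rng.beta(x + alpha, n - x + beta)  # Y'_{j+1} ~ f(y | x)
        xs[i] = x
    return xs

sample = gibbs_beta_binomial()
# For n = 16, alpha = 2, beta = 4 the exact beta-binomial mean (2.7)
# is n * alpha / (alpha + beta) = 16/3; the sample mean should be near it.
print(sample.mean())
```

A histogram of `sample` plays the role of the black histogram in Figure 1.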

Gibbs sampling can be used to estimate the density itself by averaging the final conditional densities from each Gibbs sequence. From (2.3), just as the values X'_k = x'_k yield a realization of X_1, . . . , X_m ~ f(x), the values Y'_k = y'_k yield a realization of Y_1, . . . , Y_m ~ f(y). Moreover, the average of the conditional densities f(x | Y'_k = y'_k) will be a close approximation to f(x), and we can estimate f(x) with

    f̂(x) = (1/m) Σ_{i=1}^m f(x | y_i).    (2.9)

The theory behind the calculation in (2.9) is that the expected value of the conditional density is

    E[f(x | Y)] = ∫ f(x | y) f(y) dy = f(x),    (2.10)

a calculation mimicked by (2.9), since y_1, . . . , y_m approximate a sample from f(y). For the densities in (2.8), this estimate of f(x) is shown in Figure 2.

Example 1 (continued): The density estimate methodology of (2.9) can also be used in discrete distributions, which we illustrate for the beta-binomial of Example 1. Using the observations generated to construct Figure 1, we can, analogous to (2.9), estimate the marginal probabilities of X using

    P̂(X = x) = (1/m) Σ_{i=1}^m f(x | y_i).    (2.11)

This estimate is displayed in Figure 3, along with the exact beta-binomial probabilities for comparison.

Figure 3. Comparison of Two Probability Histograms of the Beta-Binomial Distribution With n = 16, α = 2, and β = 4. The black histogram was calculated using Equation (2.11); the Gibbs sequences had length k = 10. The white histogram represents the exact beta-binomial probabilities.

The density estimates (2.9) and (2.11) illustrate an important aspect of using the Gibbs sampler to evaluate characteristics of f(x). The quantities f(x | y_1), . . . , f(x | y_m), calculated using the simulated values y_1, . . . , y_m, carry more information about f(x) than x_1, . . . , x_m alone, and will yield better estimates. For example, an estimate of the mean of f(x) is (1/m) Σ_{i=1}^m x_i, but a better estimate is (1/m) Σ_{i=1}^m E(X | y_i), as long as these conditional expectations are obtainable. The intuition behind this feature is the Rao-Blackwell theorem (illustrated by Gelfand and Smith 1990), and established analytically by Liu, Wong, and Kong (1991).

It is not immediately obvious that a random variable with distribution f(x) can be produced by the Gibbs sequence of (2.3), or that the sequence even converges. That this is so relies on the Markovian nature of the iterations, which we now develop in detail for the simple case of a 2 × 2 table with multinomial sampling.
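The gain from conditioning can be checked by simulation. The sketch below is ours (assuming NumPy): for the beta-binomial conditionals of Example 1, E(X | y) = ny, so the Rao-Blackwellized mean estimate (1/m) Σ E(X | y_i) can be computed alongside the naive (1/m) Σ x_i and the two compared across replications.

```python
import numpy as np

def two_mean_estimates(n=16, alpha=2.0, beta=4.0, m=500, k=10, rng=None):
    """One Gibbs run: return the naive mean estimate (1/m) sum x_i and the
    Rao-Blackwellized estimate (1/m) sum E(X | y_i) = (1/m) sum n * y_i."""
    if rng is None:
        rng = np.random.default_rng()
    xs, ys = np.empty(m), np.empty(m)
    for i in range(m):
        y = rng.uniform()
        for _ in range(k):
            x = rng.binomial(n, y)
            y = rng.beta(x + alpha, n - x + beta)
        xs[i], ys[i] = x, y          # final X and (approximately f(y)) Y
    return xs.mean(), n * ys.mean()

rng = np.random.default_rng(1)
reps = np.array([two_mean_estimates(rng=rng) for _ in range(50)])
# Both columns estimate the beta-binomial mean 16/3; the conditional
# (Rao-Blackwellized) column typically shows the smaller spread.
print(reps[:, 0].var(), reps[:, 1].var())
```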

Figure 2. Histogram of a Sample of Size m = 500 From the Pair of Conditional Distributions in (2.8), With B = 5, Obtained Using Gibbs Sampling With k = 15, Along With an Estimate of the Marginal Density Obtained From Equation (2.9) (solid line). The dashed line is the true marginal density, as explained in Section 4.

Suppose X and Y are each (marginally) Bernoulli random variables with joint distribution given, in terms of the joint probability function, by the 2 × 2 table

              x = 0    x = 1
    y = 0      p_1      p_2
    y = 1      p_3      p_4,    (3.1)

where p_1 + p_2 + p_3 + p_4 = 1. For this distribution, the marginal distribution of X is given by

    f_x = (p_1 + p_3, p_2 + p_4),

a Bernoulli distribution with success probability p_2 + p_4.

The conditional distributions of X | Y = y and Y | X = x are straightforward to calculate. For example, the distribution of X | Y = 1 is Bernoulli with success probability p_4/(p_3 + p_4). All of the conditional probabilities can be expressed in two matrices,

    A_{y|x} = [ p_1/(p_1 + p_3)   p_3/(p_1 + p_3)
                p_2/(p_2 + p_4)   p_4/(p_2 + p_4) ]

and

    A_{x|y} = [ p_1/(p_1 + p_2)   p_2/(p_1 + p_2)
                p_3/(p_3 + p_4)   p_4/(p_3 + p_4) ],

where A_{y|x} has the conditional probabilities of Y given X = x, and A_{x|y} has the conditional probabilities of X given Y = y.

The iterative sampling scheme applied to this distribution yields (2.3) as a sequence of 0's and 1's. The matrices A_{x|y} and A_{y|x} may be thought of as transition matrices giving the probabilities of getting to x states from y states and vice versa; that is, P(X = x | Y = y) = probability of going from state y to state x.

If we are only interested in generating the marginal distribution of X, we are mainly concerned with the X' sequence from (2.3). To go from X'_0 to X'_1 we have to go through Y'_1, so the iteration sequence is X'_0 → Y'_1 → X'_1, and X'_0 → X'_1 forms a Markov chain with transition probability

    P(X'_1 = x_1 | X'_0 = x_0) = Σ_y P(X'_1 = x_1 | Y'_1 = y) P(Y'_1 = y | X'_0 = x_0).    (3.2)

The transition probability matrix of the X' sequence, A_{x|x}, is given by

    A_{x|x} = A_{y|x} A_{x|y},

and now we can easily calculate the probability distribution of any X'_k in the sequence. That is, the transition matrix that gives P(X'_k = x_k | X'_0 = x_0) is (A_{x|x})^k. Furthermore, if we write

    f_k = (P(X'_k = 0), P(X'_k = 1))

to denote the marginal probability distribution of X'_k, then for any k,

    f_k = f_{k−1} A_{x|x}.    (3.3)

It is well known (see, for example, Hoel, Port, and Stone 1972) that, as long as all the entries of A_{x|x} are positive, (3.3) implies that for any initial probability f_0, as k → ∞, f_k converges to the unique distribution f that is a stationary point of (3.3) and satisfies

    f A_{x|x} = f.    (3.4)

Thus, if the Gibbs sequence converges, the f that satisfies (3.4) must be the marginal distribution of X. Intuitively, there is nowhere else for this iteration to go; in the long run we will get X's in the proportion dictated by the marginal distribution. However, it is straightforward to check that (3.4) is satisfied by f_x of (3.1), that is, f_x A_{x|x} = f_x. If we stop the iteration scheme (2.3) at a large enough value of k, we can assume that the distribution of X'_k is approximately f_x. Moreover, the larger the value of k, the better the approximation. This topic is discussed further in Section 5.

The algebra for the 2 × 2 case immediately works for any n × m joint distribution of X's and Y's. We can analogously define the n × n transition matrix A_{x|x}, whose stationary distribution will be the marginal distribution of X. If either (or both) of X and Y are continuous, then the finite dimensional arguments will not work. However, with suitable assumptions, all of the theory still goes through, so the Gibbs sampler still produces a sample from the marginal distribution of X. Equation (3.2) would now represent the conditional density of X'_1 given X'_0, and could be written

    f_{X'_1|X'_0}(x_1 | x_0) = ∫ f_{X'_1|Y'_1}(x_1 | y) f_{Y'_1|X'_0}(y | x_0) dy.

(Sometimes it is helpful to use subscripts to denote the density.) Then, step by step, we could write the conditional densities of X'_2 | X'_1, X'_3 | X'_2, X'_4 | X'_3, . . . . Similar to the k-step transition matrix (A_{x|x})^k, we derive an "infinite transition matrix" with entries that satisfy the relationship

    f_{X'_k|X'_0}(x | x_0) = ∫ f_{X'_k|X'_{k−1}}(x | t) f_{X'_{k−1}|X'_0}(t | x_0) dt,    (3.5)

which is the continuous version of (3.3). The density f_{X'_k|X'_{k−1}} represents a one-step transition, and the other two densities play the role of f_k and f_{k−1}. As k → ∞, it again follows that the stationary point of (3.5) is the marginal density of X, the density to which f_{X'_k|X'_0} converges.

4. CONDITIONALS DETERMINE MARGINALS

Gibbs sampling can be thought of as a practical implementation of the fact that knowledge of the conditional distributions is sufficient to determine a joint distribution (if it exists!). In the bivariate case, the derivation of the marginal from the conditionals is fairly straightforward. Complexities in the multivariate case, however, make these connections more obscure. We first treat the bivariate case and then investigate higher dimensional cases.
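The matrix argument above can be reproduced numerically. In this sketch (ours, not the authors'; NumPy is assumed and the probabilities p_1, . . . , p_4 are an arbitrary illustrative choice), A_{x|x} = A_{y|x} A_{x|y} is formed from a 2 × 2 joint table and powered until f_k settles at the marginal of X, as (3.3) and (3.4) predict.

```python
import numpy as np

# Joint distribution of (X, Y): rows are y = 0, 1; columns are x = 0, 1.
p = np.array([[0.1, 0.2],
              [0.3, 0.4]])          # p1, p2, p3, p4 summing to 1

fx = p.sum(axis=0)                  # marginal of X: (p1 + p3, p2 + p4)
fy = p.sum(axis=1)                  # marginal of Y

A_yx = (p / fx).T                   # A_{y|x}[x, y] = P(Y = y | X = x)
A_xy = p / fy[:, None]              # A_{x|y}[y, x] = P(X = x | Y = y)

A_xx = A_yx @ A_xy                  # one-step transition of the X' chain, (3.2)

f0 = np.array([1.0, 0.0])           # arbitrary starting distribution f_0
fk = f0 @ np.linalg.matrix_power(A_xx, 50)   # f_k = f_{k-1} A_{x|x}, (3.3)

print(fk)   # converges to fx = (0.4, 0.6), the stationary point of (3.4)
```

Since all entries of `A_xx` are positive, any starting `f0` gives the same limit.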

Suppose that, for two random variables X and Y, we know the conditional densities f_{X|Y}(x | y) and f_{Y|X}(y | x). We can determine the marginal density of X, f_X(x), and hence the joint density of X and Y, through the following argument. By definition,

    f_X(x) = ∫ f_{XY}(x, y) dy;

using the fact that f_{XY}(x, y) = f_{X|Y}(x | y) f_Y(y), we have

    f_X(x) = ∫ f_{X|Y}(x | y) f_Y(y) dy,

and if we similarly substitute for f_Y(y), we have

    f_X(x) = ∫ f_{X|Y}(x | y) [∫ f_{Y|X}(y | t) f_X(t) dt] dy
           = ∫ h(x, t) f_X(t) dt,    (4.1)

where h(x, t) = ∫ f_{X|Y}(x | y) f_{Y|X}(y | t) dy. Equation (4.1) defines a fixed-point integral equation for which f_X(x) is a solution. The fact that it is a unique solution is explained by Gelfand and Smith (1990).

Equation (4.1) is the limiting form of the Gibbs iteration scheme, illustrating how sampling from conditionals produces a marginal distribution. As k → ∞ in (3.5),

    f_{X'_k|X'_0}(x | x_0) → f_X(x)

and

    f_{X'_k|X'_{k−1}}(x | t) → h(x, t),    (4.2)

and thus (4.1) is the limiting form of (3.5).

Although the joint distribution of X and Y determines all of the conditionals and marginals, it is not always the case that a set of proper conditional distributions will determine a proper marginal distribution (and hence a proper joint distribution). The next example shows this.

Example 2 (continued): Suppose that B = ∞ in (2.8), so that X and Y have the conditional densities

    f(x | y) = y e^{−yx},  0 < x < ∞,    (4.3a)
    f(y | x) = x e^{−xy},  0 < y < ∞.    (4.3b)

Applying (4.1), the marginal distribution of X is the solution to

    f_X(x) = ∫ [∫ y e^{−yx} t e^{−ty} dy] f_X(t) dt.    (4.4)

Substituting f_X(t) = 1/t into (4.4) yields

    1/x = ∫ [t/(x + t)^2] (1/t) dt,

solving (4.4). Although this is a solution, 1/x is not a density function. When the Gibbs sampler is applied to the conditional densities in (4.3), convergence breaks down. It does not give an approximation to 1/x; in fact, we do not get a sample of random variables from a marginal distribution. A histogram of such random variables is given in Figure 4, which vaguely mimics a graph of f(x) = 1/x.

Figure 4. Histogram of a Sample of Size m = 500 From the Pair of Conditional Distributions in (4.3), Obtained Using Gibbs Sampling With k = 10.

It was pointed out by Trevor Sweeting (personal communication) that Equation (4.1) can be solved using the truncated exponential densities in (2.8). Evaluating the constant in the conditional densities gives f(x | y) = y e^{−yx}/(1 − e^{−By}), 0 < x < B, with a similar expression for f(y | x). Substituting these functions into (4.1) yields the solution f(x) ∝ (1 − e^{−Bx})/x. This density (properly normalized) is the dashed line in Figure 2.

The Gibbs sampler fails when B = ∞ above because ∫ f_X(x) dx = ∞, and there is no convergence as described in (4.2). In a sense, we can say that a sufficient condition for the convergence in (4.2) to occur is that f_X(x) is a proper density, that is, ∫ f_X(x) dx < ∞. One way to guarantee this is to restrict the conditional densities to lie in a compact interval, as was done in (2.8). General convergence conditions needed for the Gibbs sampler (and other algorithms) are explored in detail by Schervish and Carlin (1990), and rates of convergence are also discussed by Roberts and Polson (1990).

As the number of variables in a problem increases, the relationship between conditionals, marginals, and joint distributions becomes more complex. For example, the relationship conditional × marginal = joint does not hold for all of the conditionals and marginals. This means that there are many ways to set up a fixed-point equation like (4.1), and it is possible to use different sets of conditional distributions to calculate the marginal of interest. Such methodologies are part of the general techniques of substitution sampling (see Gelfand and Smith 1990 for an explanation). Here we merely illustrate two versions of this technique.

In the case of two variables, all substitution sampling algorithms are the same. The three-variable case, however, is sufficiently complex to illustrate the differences between algorithms, yet sufficiently simple to allow us to write things out in detail. Generalizing to cases of more than three variables is reasonably straightforward.
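Returning briefly to Example 2: since the conditionals in (2.8) are exponentials truncated to (0, B), they can be drawn exactly by inverting the conditional CDF F(x) = (1 − e^{−yx})/(1 − e^{−yB}). The sketch below is our illustration (assuming NumPy), with B = 5 as in Figure 2; a histogram of its output can be compared against the normalized density (1 − e^{−Bx})/x discussed above.

```python
import numpy as np

def trunc_exp(rate, B, rng):
    """Inverse-CDF draw from f(t) proportional to rate*exp(-rate*t) on (0, B)."""
    u = rng.uniform()
    return -np.log(1.0 - u * (1.0 - np.exp(-rate * B))) / rate

def gibbs_trunc_exp(B=5.0, m=500, k=15, seed=0):
    """Example 2: final values from m Gibbs sequences on the conditionals (2.8)."""
    rng = np.random.default_rng(seed)
    xs = np.empty(m)
    for i in range(m):
        y = rng.uniform(0.1, B)        # arbitrary starting value in (0, B)
        for _ in range(k):
            x = trunc_exp(y, B, rng)   # X | y, exponential truncated to (0, B)
            y = trunc_exp(x, B, rng)   # Y | x, exponential truncated to (0, B)
        xs[i] = x
    return xs

sample = gibbs_trunc_exp()
print(sample.min(), sample.max())   # every draw lies inside (0, 5)
```

The inversion guarantees draws inside (0, B), which is exactly the restriction that makes the marginal proper.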

Suppose we would like to calculate the marginal distribution f_X(x) in a problem with random variables X, Y, and Z. A fixed-point integral equation like (4.1) can be derived if we consider the pair (Y, Z) as a single random variable. We have

    f_X(x) = ∫ [∫∫ f_{X|YZ}(x | y, z) f_{YZ|X}(y, z | t) dy dz] f_X(t) dt,    (4.5)

analogous to (4.1). Cycling between f_{X|YZ} and f_{YZ|X} would again result in a sequence of random variables converging in distribution to f_X(x). This is the idea behind the Data Augmentation Algorithm of Tanner and Wong (1987). By sampling iteratively from f_{X|YZ} and f_{YZ|X}, they show how to obtain successively better approximations to f_X(x).

In contrast, the Gibbs sampler would sample iteratively from f_{X|YZ}, f_{Y|XZ}, and f_{Z|XY}. That is, the jth iteration would be

    X'_j ~ f(x | Y'_j = y'_j, Z'_j = z'_j)
    Y'_{j+1} ~ f(y | X'_j = x'_j, Z'_j = z'_j)
    Z'_{j+1} ~ f(z | X'_j = x'_j, Y'_{j+1} = y'_{j+1}).    (4.6)

The iteration scheme of (4.6) produces a Gibbs sequence

    Y'_0, Z'_0, X'_0, Y'_1, Z'_1, X'_1, . . . ,    (4.7)

with the property that, for large k, X'_k = x'_k is effectively a sample point from f(x). Although it is not immediately evident, the iteration in (4.6) will also solve the fixed-point equation (4.5). In fact, a defining characteristic of the Gibbs sampler is that it always uses the full set of univariate conditionals to define the iteration. Besag (1974) established that this set is sufficient to determine the joint (and any marginal) distribution, and hence can be used to solve (4.5).

As an example of a three-variable Gibbs problem, we look at a generalization of the distribution examined in Example 1.

Example 3. In the distribution (2.5), we now let n be the realization of a Poisson random variable with mean λ, yielding the joint distribution

    f(x, y, n) ∝ (n choose x) y^{x+α−1} (1−y)^{n−x+β−1} e^{−λ} λ^n/n!,
        x = 0, 1, . . . , n,  0 ≤ y ≤ 1,  n = 1, 2, . . . .    (4.8)

Again, suppose we are interested in the marginal distribution of X. Unlike Example 1, here we cannot calculate the marginal distribution of X in closed form. However, from (4.8) it is reasonably straightforward to calculate the three conditional densities. Suppressing dependence on λ, α, and β,

    f(x | y, n) is Binomial(n, y)
    f(y | x, n) is Beta(x + α, n − x + β)
    f(n | x, y) is such that n − x is Poisson(λ(1 − y)).    (4.9)

If we now apply the iterative scheme (4.6) to the distributions in (4.9), we can generate a sequence X_1, X_2, . . . , X_m from f(x) and use this sequence to estimate the desired characteristic. The density estimate of P(X = x), using Equation (2.11), can also be constructed; this is done and is given in Figure 5. This figure can be compared to Figure 3, but here there is a longer right tail from the Poisson variability.

Figure 5. Estimate of the Marginal Probabilities of X Using Equation (2.11), Based on a Sample of Size m = 500 From the Three Conditional Distributions in (4.9) With λ = 16, α = 2, and β = 4. The Gibbs sequences had length k = 10.

The model (4.9) can have practical applications. For example, conditional on n and y, let x represent the number of successful hatchings from n insect eggs, where each egg has success probability y. Both n and y fluctuate across insects, which is modeled in their respective distributions, and the resulting marginal distribution of X is a typical number of successful hatchings among all insects.
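The three-variable scheme (4.6) applied to the conditionals (4.9) is only slightly longer than the bivariate version. The code below is our sketch rather than the authors' (NumPy assumed), with λ = 16, α = 2, β = 4 as in Figure 5.

```python
import numpy as np

def gibbs_poisson_beta_binomial(lam=16.0, alpha=2.0, beta=4.0,
                                m=500, k=10, seed=0):
    """Example 3: draw from the marginal f(x) of (4.8) via scheme (4.6),
    cycling through the full set of univariate conditionals (4.9):
        X | y, n     ~ Binomial(n, y)
        Y | x, n     ~ Beta(x + alpha, n - x + beta)
        N - x | x, y ~ Poisson(lam * (1 - y)).
    """
    rng = np.random.default_rng(seed)
    xs = np.empty(m, dtype=int)
    for i in range(m):
        y, n = rng.uniform(), int(lam)      # arbitrary starting values
        for _ in range(k):
            x = rng.binomial(n, y)
            y = rng.beta(x + alpha, n - x + beta)
            n = x + rng.poisson(lam * (1.0 - y))
        xs[i] = x
    return xs

sample = gibbs_poisson_beta_binomial()
# The sample mean should be close to lam * alpha / (alpha + beta) = 16/3,
# with a longer right tail than Example 1 from the Poisson variability.
print(sample.mean())
```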

popular approaches to extracting information from the the likelihood function and characteristics of likelihood

Gibbs sequence exploit this property by selecting some esti~nators.

large value for k, and then treating any X,' for j 2 k as Although the theory behind Gibbs sampling is taken

a sample from f ( x ) . The problem then becomes that of from Markov chain theory, there is also a connection

choosing the appropriate value of k. to "incomplete data" theory, such as that which forms

A general strategy for choosing such k is to monitor the basis of the EM algorithm. Indeed, both Gibbs

the convergence of some aspect of the Gibbs sequence. sampling and the E M algorithm seem to share common

For example, Gelfand and Smith (1990) and Gelfand, underlying structure. The recent book by Tanner (1991)

Hills, Racine-Poor, and Smith (1990) suggest monitor- provides explanations of all these algorithms and gives

ing density estimates from m independent Gibbs se- many illustrative examples.

quences, and choosing k to be the first point at which The usefulness of the Gibbs sampler increases greatly

these densities appear to be the same under a "felt-tip as the dimension of a problem increases. This is because

pen test." Tanner (1991) suggests monitoring a se- the Gibbs sampler allows us to avoid calculating inte-

quence of weights that measure the discrepancy be- grals like (2.1). which can be prohibitively difficult in

tween the sampled and the desired distribution. Gew- high dimensions. Moreover, calculations of the high

eke (in press) suggests monitoring based on time series dimensional integral can be replaced by a series of one-

considerations. Unfortunately, such monitoring ap- dimensional random variable generations, as in (4.6).

proaches are not foolproof. illustrated by Gelman and Such generations can in many cases be accomplished

Rubin (1991). An alternative may be to choose k based efficiently (see Devroye 1986; Gilks and Wild 1992;

on theoretical considerations, as in Raftery and Ban- Ripley 1987).

field (1990). M.T. Wells (personal communication) has The ultimate value of the Gibbs sampler lies in its

suggested a connection between selecting k and the practical potential. Now that the groundwork has been

cooling parameter in simulated annealing. laid in the pioneering papers of Geman and Geman

(1984), Tanner and Wong (1987), and Gelfand and Smith

5.2 Approaches to Sampling the Gibbs Sequence (1990). research using the Gibbs sampler is exploding.

A natural alternative to sampling the kth or final A partial (and incomplete) list includes applications to

value from many independent repetitions of the Gibbs generalized linear models [Dellaportas and Smith (19901,

sequence, as we did in Section 2, is to generate one who implement the Gilks and Wild methodology, and

long Gibbs sequence and then extract every r-th obser- Zeger and Rizaul Karim (1991)l; to mixture models

vation (see Geyer, in press). For r large enough, this (Diebolt and Robert 1990; Robert 1990; to evaluating

would also yield an approximate iid sample from f(x). computing algorithms (Eddy and Schervish 1990); to

An advantage of this approach is that it lessens the general normal data models (Gelfand, Hill, -and Lee

dependence on initial values. A potential disadvantage 1992); to DNA sequence modeling (Churchill and

Casella 1991; Geyer and Thompson, in press); to ap-

is that the Gibbs sequence may stay in a small subset

of the sample space for a long time (see Gelman and plications in HIV modeling (Lange, Carlin, and Gel-

fand 1990); to outlier problems (Verdinelli and Was-

Rubin 1991).

serman 1990); to logistic regression (Albert and Chib

For large, computationally expensive problems, a less

1991); to supermarket scanner data modeling (Blattberg

wasteful approach to exploiting the Gibbs sequence is

and George 1991); to constrained parameter estimation

to use all realizations of X,' for j 5 k, as in George and

(Gelfand et al. 1992); and to capture-recapture mod-

McCulloch (1991). Although the resulting data will be

eling (George and Robert 1991).

dependent, it will still be the case that the empirical

distribution of X,' converges to f(x). Note that from this [Received December 1990. Revised September 1991.1

point of view one can see that the "efficiency of the

Gibbs sampler" is determined by the rate of this con- REFERENCES

vergence. Intuitively, this convergence rate will be fast-

Albert, J., and Chib, S. (1991), "Bayesian Analysis of Binary and

est when X,' moves rapidly through the sample space, Polychotomous Response Data." technical report, Bowling Green

a characteristic that may be thought of as mixing. Varia- State University, Dept. of Mathematics and Statistics.

tions on these and other approaches to exploiting the Besag. J. (1974), "Spatial Interaction and the Statistical Analysis of

Gibbs sequence have been suggested by Gelman and Life Systems," Jo~irnalof the Royal Statisticul Society, Scr. B, 36,

Rubin (1991), Geyer (in press), Muller (1991), Ritter 192-236.

Blattberg, R . , and George, E . I. (1991), "Shrinkage Estimation of

and Tanner (1990), and Tierney (1991). Price and Promotional Elasticities: Seemingly Unrelated Equa-

tions," Journal o f the American Statistical Associiltio~z,86, 304-

6. DISCUSSION

Both the Gibbs sampler and the Data Augmentation Algorithm have found widespread use in practical problems and can be used by either the Bayesian or classical statistician. For the Bayesian, the Gibbs sampler is mainly used to generate posterior distributions, whereas for the classical statistician a major use is for calculation of [...] (Gelfand et al. 1992); and to capture-recapture modeling (George and Robert 1991).

[Received December 1990. Revised September 1991.]

REFERENCES

Albert, J., and Chib, S. (1991), "Bayesian Analysis of Binary and Polychotomous Response Data," technical report, Bowling Green State University, Dept. of Mathematics and Statistics.
Besag, J. (1974), "Spatial Interaction and the Statistical Analysis of Lattice Systems," Journal of the Royal Statistical Society, Ser. B, 36, 192-236.
Blattberg, R., and George, E. I. (1991), "Shrinkage Estimation of Price and Promotional Elasticities: Seemingly Unrelated Equations," Journal of the American Statistical Association, 86, 304-315.
Churchill, G., and Casella, G. (1991), "Sampling Based Methods for the Estimation of DNA Sequence Accuracy," Technical Report BU-1138-M, Cornell University, Biometrics Unit.
Dellaportas, P., and Smith, A. F. M. (1990), "Bayesian Inference for Generalized Linear Models," technical report, University of Nottingham, Dept. of Mathematics.
Dempster, A. P., Laird, N., and Rubin, D. B. (1977), "Maximum Likelihood From Incomplete Data via the EM Algorithm" (with discussion), Journal of the Royal Statistical Society, Ser. B, 39, 1-38.
Devroye, L. (1986), Non-Uniform Random Variate Generation, New York: Springer-Verlag.
Diebolt, J., and Robert, C. (1990), "Estimation of Finite Mixture Distributions Through Bayesian Sampling," technical report, Université Paris VI, L.S.T.A.
Eddy, W. F., and Schervish, M. J. (1990), "The Asymptotic Distribution of the Running Time of Quicksort," technical report, Carnegie Mellon University, Dept. of Statistics.
Efron, B. (1982), The Jackknife, the Bootstrap, and Other Resampling Plans, National Science Foundation-Conference Board of the Mathematical Sciences Monograph 38, Philadelphia: SIAM.
Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990), "Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling," Journal of the American Statistical Association, 85, 972-985.
Gelfand, A. E., and Smith, A. F. M. (1990), "Sampling-Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 398-409.
Gelfand, A. E., Smith, A. F. M., and Lee, T-M. (1992), "Bayesian Analysis of Constrained Parameter and Truncated Data Problems Using Gibbs Sampling," Journal of the American Statistical Association, 87, 523-532.
Gelman, A., and Rubin, D. (1991), "An Overview and Approach to Inference From Iterative Simulation," technical report, University of California-Berkeley, Dept. of Statistics.
Geman, S., and Geman, D. (1984), "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
George, E. I., and McCulloch, R. E. (1991), "Variable Selection via Gibbs Sampling," technical report, University of Chicago, Graduate School of Business.
George, E. I., and Robert, C. P. (1991), "Calculating Bayes Estimates for Capture-Recapture Models," Technical Report 94, University of Chicago, Statistics Research Center, and Cornell University, Biometrics Unit.
Geweke, J. (in press), "Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments," Proceedings of the Fourth Valencia International Conference on Bayesian Statistics.
Geyer, C. J. (in press), "Markov Chain Monte Carlo Maximum Likelihood," Computer Science and Statistics: Proceedings of the 23rd Symposium on the Interface.
Geyer, C. J., and Thompson, E. A. (in press), "Constrained Maximum Likelihood and Autologistic Models With an Application to DNA Fingerprinting Data," Journal of the Royal Statistical Society, Ser. B.
Gilks, W. R., and Wild, P. (1992), "Adaptive Rejection Sampling for Gibbs Sampling," Journal of the Royal Statistical Society, Ser. C, 41, 337-348.
Hastings, W. K. (1970), "Monte Carlo Sampling Methods Using Markov Chains and Their Applications," Biometrika, 57, 97-109.
Hoel, P. G., Port, S. C., and Stone, C. J. (1972), Introduction to Stochastic Processes, New York: Houghton-Mifflin.
Lange, N., Carlin, B. P., and Gelfand, A. E. (1990), "Hierarchical Bayes Models for the Progression of HIV Infection Using Longitudinal CD4+ Counts," technical report, Carnegie Mellon University, Dept. of Statistics.
Liu, J., Wong, W. H., and Kong, A. (1991), "Correlation Structure and Convergence Rate of the Gibbs Sampler (I): Applications to the Comparison of Estimators and Augmentation Schemes," Technical Report 299, University of Chicago, Dept. of Statistics.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), "Equations of State Calculations by Fast Computing Machines," Journal of Chemical Physics, 21, 1087-1091.
Muller, P. (1991), "A Generic Approach to Posterior Integration and Gibbs Sampling," Technical Report 91-09, Purdue University, Dept. of Statistics.
Raftery, A. E., and Banfield, J. D. (1990), "Stopping the Gibbs Sampler, the Use of Morphology, and Other Issues in Spatial Statistics," technical report, University of Washington, Dept. of Statistics.
Ripley, B. D. (1987), Stochastic Simulation, New York: John Wiley.
Ritter, C., and Tanner, M. A. (1990), "The Griddy Gibbs Sampler," technical report, University of Rochester.
Robert, C. P. (1990), "Hidden Mixtures and Bayesian Sampling," technical report, Université Paris VI, L.S.T.A.
Roberts, G. O., and Polson, N. G. (1990), "A Note on the Geometric Convergence of the Gibbs Sampler," technical report, University of Nottingham, Dept. of Mathematics.
Schervish, M. J., and Carlin, B. P. (1990), "On the Convergence of Successive Substitution Sampling," Technical Report 492, Carnegie Mellon University, Dept. of Statistics.
Smith, A. F. M., and Gelfand, A. E. (1992), "Bayesian Statistics Without Tears: A Sampling-Resampling Perspective," The American Statistician, 46, 84-88.
Tanner, M. A. (1991), Tools for Statistical Inference, New York: Springer-Verlag.
Tanner, M. A., and Wong, W. H. (1987), "The Calculation of Posterior Distributions by Data Augmentation" (with discussion), Journal of the American Statistical Association, 82, 528-550.
Tierney, L. (1991), "Markov Chains for Exploring Posterior Distributions," Technical Report 560, University of Minnesota, School of Statistics.
Verdinelli, I., and Wasserman, L. (1990), "Bayesian Analysis of Outlier Problems Using the Gibbs Sampler," Technical Report 469, Carnegie Mellon University, Dept. of Statistics.
Zeger, S. L., and Karim, M. R. (1991), "Generalized Linear Models With Random Effects: A Gibbs Sampling Approach," Journal of the American Statistical Association, 86, 79-86.


Explaining the Gibbs Sampler

George Casella; Edward I. George

The American Statistician, Vol. 46, No. 3. (Aug., 1992), pp. 167-174.

