
BAYESIAN MODELS

(SINGLE-PARAMETER)

Shirlee Remoto-Ocampo
shirlee.ocampo@dlsu.edu.ph

Components of Bayesian Inference

Prior Distribution - uses probability to quantify uncertainty about unknown quantities (parameters)
Likelihood - relates all variables in a full probability model
Posterior Distribution - the result of using data to update information about the unknown quantities (parameters)

Bayesian inference
Prior information p(θ) on the parameters θ
Likelihood of the data given parameter values, f(y|θ)

Bayesian inference

p(θ|y) = f(y|θ) p(θ) / f(y)

or

p(θ|y) ∝ f(y|θ) p(θ)

Posterior distribution is proportional to likelihood × prior distribution.


Importance of priors
Prior beliefs about uncertain parameters
are a fundamental part of Bayesian
statistics.
When we have few data about the
parameter of interest, our prior beliefs
dominate inference about that parameter.
In any application, effort should be made
to model our prior beliefs accurately.
24-25 January 2007

An Overview of State-of-the-Art Data Modelling


Assessment of Prior Distributions

Discrete Case 1: Very little or no information available
(1) Assign probabilities directly to the various possible values of the uncertain quantity of interest.
Ex. The probability that p = 0.6 is 0.01.
(2) Use of lotteries
Ex. X is obtained with probability P(E); Y is obtained with probability 1 - P(E).
(3) Use of betting odds
If the probability of an event is p, then the odds in favor of that event are p to 1 - p. If the odds in favor of an event are a to b, then the probability of that event is a/(a+b).
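The odds-probability conversion in (3) is easy to sanity-check in code; a minimal sketch (the function names are ours, not from the slides):

```python
def odds_to_prob(a, b):
    """Odds of a to b in favor of an event -> probability a/(a+b)."""
    return a / (a + b)

def prob_to_odds(p):
    """Probability p -> odds p to (1 - p) in favor of the event."""
    return p, 1 - p
```

For example, odds of 1 to 19 in favor of an event correspond to probability 1/20 = 0.05.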

Assessment of Prior Distributions

Discrete Case 2: There is prior information
Rule of thumb: Let the probabilities be as close as possible to the relative frequencies.

Independent case: the joint prior distribution is the product of the marginals.
Uncertain about independence: use conditional distributions.


Assessment of Prior Distributions

Continuous Case 1: Historical or sample information is available

Draw the histogram, smooth it out, and determine whether the resulting curve more or less approximates a common parametric family.

Assessment of Prior Distributions

Use measures such as the mean, median, mode, quantiles, etc.
Alternatively, determine a series of quantiles and plot them. Draw a rough curve through them and identify the distribution.


Continuous Case 2: Very little or no information


PRINCIPLE OF INSUFFICIENT REASON

When nothing is known about θ in advance, let the prior p(θ) be a uniform distribution, that is, let all possible outcomes of θ have the same probability.

Example
1. Prior Distribution: A ball W is randomly thrown (according to a uniform distribution on the table). The horizontal position of the ball on the table is θ, expressed as a fraction of the table width.
2. Likelihood: A ball O is randomly thrown n times. The value of y is the number of times ball O lands to the right of ball W.

UNIFORM-BINOMIAL MODEL

Location Model
The distribution of y - θ given θ is free of θ.
Example:
Show that N(θ, 1) represents a location model.
Other examples:
U(θ - 1/2, θ + 1/2)
Cauchy density
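A quick simulation illustrates the location-model property for N(θ, 1): subtracting θ leaves draws from N(0, 1), whatever θ is. A sketch, not part of the slides (function name is ours):

```python
import random

def shifted_residuals(theta, n=100_000, seed=0):
    """Draw y ~ N(theta, 1) and return y - theta; its law is N(0, 1) for any theta."""
    rng = random.Random(seed)
    return [rng.gauss(theta, 1.0) - theta for _ in range(n)]

# Same seed, different theta: the residuals match (up to floating-point rounding)
a = shifted_residuals(0.0)
b = shifted_residuals(57.3)
```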
Example

Scale Model
The distribution of y/σ given σ is free of σ.
Y has a scale model if there exist a function f and a quantity σ such that the distribution of Y given σ satisfies

f(y|σ) = (1/σ) f(y/σ)

Show that N(0, σ²) represents a scale model.

Other Examples:
Exponential density
U(0, σ)

Location-Scale Model
Y has a location-scale model if there exist a function f and quantities θ and σ such that the distribution of Y given (θ, σ) satisfies

f(y|θ, σ) = (1/σ) f((y - θ)/σ)

Show that N(θ, σ²) is a location-scale model.
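For the N(θ, σ²) claim, standardizing by (y - θ)/σ removes both parameters. A small simulation sketch (names are ours, values illustrative):

```python
import random

def standardized(theta, sigma, n=100_000, seed=1):
    """Draw y ~ N(theta, sigma^2); (y - theta)/sigma should behave like N(0, 1)."""
    rng = random.Random(seed)
    return [(rng.gauss(theta, sigma) - theta) / sigma for _ in range(n)]

z = standardized(10.0, 3.0)
mean = sum(z) / len(z)
var = sum(x * x for x in z) / len(z) - mean ** 2
# mean should be near 0 and var near 1, regardless of theta and sigma
```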


Weak prior information

If we accept the subjective nature of Bayesian statistics but are not comfortable using subjective priors, then, as many have argued, we should try to specify prior distributions that represent no prior information.
These prior distributions are called noninformative, reference, ignorance, or weak priors.
The idea is to have a completely flat prior distribution over all possible values of the parameter.
Unfortunately, this can lead to improper distributions being used.

Proper and Improper Priors

Proper prior - a prior distribution that does not depend on the data and integrates to 1
Improper prior - a prior distribution that does not integrate to 1 (the integral is infinite or does not converge to any positive finite value)

Note: An improper prior distribution can still lead to a proper posterior distribution.

Weak prior information

In our coin tossing example, Be(1,1), Be(0.5,0.5), and Be(0,0) have been recommended as noninformative priors. Be(0,0) is improper.

J(θ) = I(θ) = Fisher information for θ


Jeffreys non-informative prior

1. Consider a one-to-one transformation of the parameter, say φ = h(θ).
2. The prior density p(θ) is equivalent to the prior density on φ:

p(φ) = p(θ) |dθ/dφ|

leading to the noninformative prior

p(θ) ∝ [J(θ)]^(1/2)

Fisher Information: I(θ) = J(θ) = -E[ ∂² log p(y|θ) / ∂θ² | θ ]

Example
Find the Jeffreys noninformative prior for the Poisson distribution (single parameter).

y1, y2, …, yn | θ ~ Poisson(θ)
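As a numeric check on this exercise: for Poisson(θ), -∂² log p(y|θ)/∂θ² = y/θ², so J(θ) = E[y]/θ² = 1/θ and the Jeffreys prior is p(θ) ∝ θ^(-1/2). A sketch that computes the expectation by direct summation over the pmf (truncated at a large count):

```python
import math

def poisson_fisher_info(theta, kmax=200):
    """Compute J(theta) = E[-d^2/dtheta^2 log p(y|theta)] = E[y]/theta^2 numerically."""
    pmf = math.exp(-theta)                  # P(Y = 0)
    total = 0.0
    for y in range(kmax):
        total += pmf * (y / theta ** 2)     # -d^2 log p / dtheta^2 evaluated at y
        pmf *= theta / (y + 1)              # recurrence: P(Y = y+1) from P(Y = y)
    return total

# J(theta) = 1/theta, so the Jeffreys prior is p(theta) proportional to theta**(-0.5)
```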

EXERCISE/SEATWORK


Informative priors
An informative prior is an accurate
representation of our prior beliefs.
An informative prior is essential when we
have few or no data for the parameter of
interest.
Elicitation is the process of translating someone's beliefs into a distribution.

Conjugate priors
When we move away from noninformative
priors, we might use priors that are in a
convenient form.
That is, a form in which combining them with the likelihood produces a distribution from the same family.


Informative Priors


Conjugate Priors
Conjugacy - the property that the posterior distribution follows the same parametric form as the prior distribution
Hyperparameters - the parameters of a prior distribution


Assessment of Likelihood
Case 1: Sufficiently large set of data
1. Make an appropriate frequency histogram for the data.
2. Determine the mean and the variance.
3. Hypothesize a distribution that might fit the data.
4. Estimate the parameters under the hypothesized distribution.
5. Test the goodness of fit of the data against the hypothesized distribution.

Assessment of Likelihood
Case 2: Extremely sparse set of data
Make a smooth assessment of the CDF:
1. Make a preliminary estimate of the cumulative frequencies corresponding to each observed value of the variable of interest, that is, an array of n observations. The kth observation will be the k/(n+1)th quantile.
2. Adjust the estimates so that the whole distribution is smooth and of reasonable shape.
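The k/(n+1) quantile rule in step 1 can be sketched as follows (the data values are made up for illustration):

```python
# Hypothetical observations of the variable of interest
obs = sorted([3.1, 4.7, 2.2, 5.0, 3.8])
n = len(obs)

# The kth smallest observation (k = 1..n) is treated as the k/(n+1) quantile
cdf_estimates = [(k / (n + 1), x) for k, x in enumerate(obs, start=1)]
```

Plotting these (quantile, value) pairs and drawing a smooth curve through them gives the preliminary CDF assessment.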

Assessment of Posterior Distributions

CONJUGATE PRIOR DISTRIBUTIONS
Formal Definition of CONJUGACY
If F is a class of sampling distributions f(y|θ), and P is a class of prior distributions for θ, then the class P is CONJUGATE for F if
p(θ|y) ∈ P for all f(·|θ) ∈ F and p(·) ∈ P.


Conjugate Prior for a Bernoulli Process

Let y1, y2, …, yn be a random sample ~ Bernoulli(θ). Suppose that the prior distribution of θ is Beta(α, β), α, β > 0. Then the posterior distribution of θ is Beta(α + y, β + n - y), where y = Σ yi.
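The Beta update above is a one-liner in code; a minimal sketch (the function name is ours, not from the slides):

```python
def beta_update(alpha, beta, data):
    """Posterior of theta after Bernoulli data under a Beta(alpha, beta) prior."""
    y = sum(data)                       # number of successes
    n = len(data)
    return alpha + y, beta + n - y      # Beta(alpha + y, beta + n - y)

# e.g. Beta(2, 3) prior, 4 successes in 6 trials -> Beta(6, 5)
```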


Recall: Bernoulli Distribution

Recall: Binomial Distribution

Binomial Distribution

Prior: Beta Distribution

Beta and Gamma Functions

Beta Distribution

Beta-Binomial Bayesian Model

Posterior Mean and Variance

Question:
What are the posterior mean and posterior variance of the beta-binomial model? Of the uniform-binomial model?

Poisson-Gamma Bayesian Model

Let y1, y2, …, yn be a random sample ~ Poisson(θ), θ > 0. Suppose that the prior distribution of θ is Gamma(α, β), α, β > 0. Then the posterior distribution of θ is Gamma(α + y, β + n), where y = Σ yi.
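A minimal sketch of the Poisson-Gamma update (rate parametrization of the Gamma, matching the statement above; the function name is ours):

```python
def poisson_gamma_update(alpha, beta, data):
    """Posterior of theta after Poisson counts under a Gamma(alpha, beta) prior."""
    return alpha + sum(data), beta + len(data)   # Gamma(alpha + y, beta + n)
```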


Recall: Poisson Distribution

Recall: Gamma Distribution

Poisson-Gamma Bayesian Model

Posterior Mean

What are the posterior mean and posterior variance for the Poisson-Gamma model?

Gamma-Poisson Model
https://www.youtube.com/watch?v=0XD6C_MQXXE

Negative Binomial-Beta Model

Let y1, y2, …, yn be a random sample ~ Negative Binomial(r, θ), where θ ranges from 0 to 1. Suppose that the prior distribution of θ is Beta(α, β), α, β > 0. Then the posterior distribution of θ is Beta(α + nr, β + y), where y = Σ yi.
EXERCISE!

Exponential-Gamma Model
Let y1, y2, …, yn be a random sample ~ Exp(θ), θ > 0. Suppose that the prior distribution of θ is Gamma(α, β), α, β > 0. Then the posterior distribution of θ is Gamma(α + n, β + y), where y = Σ yi.
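For the exercise, the algebra mirrors the Poisson case; a sketch of the resulting update, assuming the rate parametrization of both the Exponential and the Gamma (function name is ours):

```python
def exp_gamma_update(alpha, beta, data):
    """Posterior of theta after Exponential(theta) data under a Gamma(alpha, beta) prior."""
    return alpha + len(data), beta + sum(data)   # Gamma(alpha + n, beta + sum of y_i)
```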

EXERCISE!


Normal-Normal Model
Normal likelihood with known variance (prior on the mean)

PROBLEM SET

Normal-Normal Bayesian Model


Inverse Gamma-Normal Model

Normal likelihood with known mean (Inverse Gamma prior on the variance)

Conjugate Priors

Prior    | Likelihood                    | Posterior
Beta     | Binomial                      | Beta
Gamma    | Poisson                       | Gamma
Gamma    | Exponential                   | Gamma
Beta     | Negative Binomial             | Beta
Normal   | Normal (with known variance)  | Normal

Conjugate Priors

Prior     | Likelihood   | Posterior
Dirichlet | Multinomial  | Dirichlet

APPLICATIONS
Suppose that the prior information concerning p, the proportion of defectives, can well be represented by a Beta distribution with parameters α = 1 and β = 19. Following the assessment of his prior distribution, the manager takes a random sample of 5 items from the production process, observing one defective item. Determine the posterior distribution of p.
Suppose that the manager decides to take another random sample of 5 items and observes 2 defectives. Determine the new posterior distribution of p.
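A quick check of both posteriors using the Beta conjugate update from earlier (a sketch; the function name is ours):

```python
def update(alpha, beta, n, defectives):
    """Beta(alpha, beta) prior updated by n Bernoulli trials with the given defectives."""
    return alpha + defectives, beta + n - defectives

post1 = update(1, 19, 5, 1)      # first sample of 5 items with 1 defective
post2 = update(*post1, 5, 2)     # second sample of 5 items with 2 defectives
```

The first posterior is Beta(2, 23); after the second sample it becomes Beta(4, 26).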

APPLICATIONS
Suppose that magnetic recording tape is manufactured by a certain process, and that the mean number of defects W on a 1000-ft roll of tape is unknown. Suppose that the prior distribution of W is a gamma distribution with mean 2 and variance 1, and that the number of defects on any roll of tape when W = w has a Poisson distribution with mean w. Suppose further that after a random sample of rolls of tape has been counted, the mean of the posterior distribution of W is 1.6 and the variance is 0.16. Show that 8 rolls of tape were included in the sample.
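One way to verify the answer: a Gamma(α, β) prior (rate form) has mean α/β and variance α/β², so mean 2 and variance 1 give α = 4, β = 2; for a Gamma posterior, mean/variance equals the posterior rate β + n, which pins down n. A sketch of the arithmetic:

```python
prior_mean, prior_var = 2.0, 1.0
beta = prior_mean / prior_var        # alpha/beta = mean and alpha/beta^2 = var -> beta = mean/var
alpha = prior_mean * beta            # alpha = 4

post_mean, post_var = 1.6, 0.16
n = post_mean / post_var - beta      # posterior rate beta + n = mean/var -> n = 8
```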

APPLICATIONS
An unknown proportion W of the items produced by a certain machine is defective. Suppose that the prior distribution of W is Beta with parameters α = 1 and β = 99. Suppose also that items produced by the machine are selected at random and observed one at a time until exactly 5 defective items have been found. If, when sampling terminates, the mean of the posterior distribution of W is 0.02, show that 195 nondefective items were observed during the sampling process.
EXERCISE!

APPLICATIONS
Suppose that two physicists A and B are concerned with obtaining a more accurate value of some physical constant μ, previously known only approximately. Suppose physicist A, being very familiar with this area of study, can make a moderately good guess of what the answer will be, and that his prior opinion about μ can be approximately represented by a normal distribution centered at 900 with a standard deviation of 20. By contrast, suppose that physicist B has had little experience in this area and has rather vague prior beliefs, which can be represented by a normal distribution with mean 800 and a standard deviation of 80.

Suppose now that an unbiased method of experimental measurement is available, and that an observation made by this method, to a sufficient approximation, follows a normal distribution with a standard deviation of 40. Suppose that the result of a single observation is X = 850. Determine the posterior distribution of μ.
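The Normal-Normal posterior with known variance combines precisions (reciprocal variances); a sketch applying the standard conjugate update to both physicists (function name is ours, numbers from the problem):

```python
def normal_update(mu0, sd0, x, sd_obs):
    """Posterior for mu: Normal prior N(mu0, sd0^2), one observation x ~ N(mu, sd_obs^2)."""
    prec = 1 / sd0**2 + 1 / sd_obs**2              # posterior precision
    mean = (mu0 / sd0**2 + x / sd_obs**2) / prec   # precision-weighted average
    return mean, (1 / prec) ** 0.5                 # posterior mean and sd

post_A = normal_update(900, 20, 850, 40)           # physicist A
post_B = normal_update(800, 80, 850, 40)           # physicist B
```

Under this update, A's posterior is N(890, 320) and B's is N(840, 1280): the sharper prior moves less toward the observation.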


Pythagorean Theorem is to Geometry as Bayes' Theorem is to Probability.

THANK YOU!
