
# MATH3871

Assignment 1

Robert Tan
School of Mathematics and Statistics
z5059256@student.unsw.edu.au

Question 1
Let $\theta$ be the true proportion of people over the age of 40 in your community with hypertension.
Consider the following thought experiment:

Part (a)
Making an educated guess, suppose we choose an initial point estimate of $\theta = 0.2$, obtained by
taking the expectation of a Beta(2, 8) distribution. We choose this type of distribution since it
is the conjugate prior of a binomial distribution, which is the distribution of our data.
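For reference, the point estimate is just the mean of the chosen Beta prior:

```latex
% Mean of a Beta(\alpha, \beta) distribution, evaluated at the prior choice
\mathbb{E}[\theta] = \frac{\alpha}{\alpha + \beta} = \frac{2}{2 + 8} = 0.2,
\qquad \theta \sim \mathrm{Beta}(2, 8).
```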

Part (b)
If we survey for hypertension within the community, and the first five people randomly selected
have 4 positives, then our posterior distribution can be evaluated as follows:

$$f_\theta(\theta) \propto \theta (1 - \theta)^7$$

$$L(x \mid \theta) \propto \theta^4 (1 - \theta)$$

$$f_{\theta \mid x}(\theta) \propto p_\theta(\theta)\, p_{x \mid \theta}(x) \propto \theta^5 (1 - \theta)^8$$

So our posterior has a Beta(6, 9) distribution, and the new point estimate using the expected value is $\frac{6}{6 + 9} = 0.4$.
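This is an instance of the general Beta–binomial conjugate update: a $\mathrm{Beta}(\alpha, \beta)$ prior combined with $k$ successes in $n$ Bernoulli trials gives

```latex
\theta \sim \mathrm{Beta}(\alpha, \beta),\quad
k \mid \theta \sim \mathrm{Bin}(n, \theta)
\;\Longrightarrow\;
\theta \mid k \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k),
```

so here $\mathrm{Beta}(2 + 4,\, 8 + 1) = \mathrm{Beta}(6, 9)$.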

Part (c)
If our final survey results are 400 positives out of 1000 people, we can once again compute the
posterior as follows:

$$f_\theta(\theta) \propto \theta (1 - \theta)^7$$

$$L(x \mid \theta) \propto \theta^{400} (1 - \theta)^{600}$$

$$f_{\theta \mid x}(\theta) \propto p_\theta(\theta)\, p_{x \mid \theta}(x) \propto \theta^{401} (1 - \theta)^{607}$$

So our posterior has a Beta(402, 608) distribution, and the new point estimate using the expected value is $\frac{402}{402 + 608} \approx 0.39802$.
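Both updates can be checked numerically; the sketch below (the helper `posterior_mean` is ours, not part of the assignment) simply applies the Beta–binomial update rule:

```python
def posterior_mean(a, b, k, n):
    """Posterior mean of theta: Beta(a, b) prior, k successes in n trials.

    The conjugate update gives a Beta(a + k, b + n - k) posterior,
    whose mean is (a + k) / (a + b + n).
    """
    return (a + k) / (a + b + n)

print(posterior_mean(2, 8, 4, 5))       # Part (b): Beta(6, 9) mean, 0.4
print(posterior_mean(2, 8, 400, 1000))  # Part (c): Beta(402, 608) mean, ~0.39802
```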


Question 2
Let $x_1, \ldots, x_n \in \mathbb{R}^d$ be $n$ iid $d$-dimensional vectors. Suppose that we wish to model $x_i \sim N_d(\mu, \Sigma)$ for $i = 1, \ldots, n$, where $\mu \in \mathbb{R}^d$ is an unknown mean vector, and $\Sigma$ is a known positive definite covariance matrix (so that $\Sigma^{-1}$ exists).

Part (a)
Claim. By adopting the conjugate prior $\mu \sim N_d(\mu_0, \Sigma_0)$, the resulting posterior distribution for $\mu \mid x_1, \ldots, x_n$ is $N_d\left(\mu^*, \Sigma^*\right)$, where

$$\mu^* = \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1} \left(\Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x}\right)$$

and

$$\Sigma^* = \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}.$$

Proof. We have the prior $\mu \sim N_d(\mu_0, \Sigma_0)$, so

$$f(\mu) = \frac{1}{(2\pi)^{d/2} |\Sigma_0|^{1/2}} \exp\left(-\frac{1}{2} (\mu - \mu_0)^\top \Sigma_0^{-1} (\mu - \mu_0)\right).$$

We also have the likelihood function as follows:

$$L\left(x_1, \ldots, x_n \mid \mu\right) = \frac{1}{(2\pi)^{nd/2} |\Sigma|^{n/2}} \exp\left(-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)\right).$$

Calculating the posterior:

$$f_{\mu \mid x_1, \ldots, x_n}(\mu) \propto p(\mu)\, L\left(x_1, \ldots, x_n \mid \mu\right) \propto \exp\left(-\frac{1}{2} (\mu - \mu_0)^\top \Sigma_0^{-1} (\mu - \mu_0) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)\right)$$

Expanding and eliminating the constant terms due to proportionality:

$$\propto \exp\left(-\frac{1}{2} \left(\mu^\top \Sigma_0^{-1} \mu - \mu^\top \Sigma_0^{-1} \mu_0 - \mu_0^\top \Sigma_0^{-1} \mu + n \mu^\top \Sigma^{-1} \mu - \sum_{i=1}^{n} x_i^\top \Sigma^{-1} \mu - \sum_{i=1}^{n} \mu^\top \Sigma^{-1} x_i\right)\right)$$

Adding in a constant term to complete the square and factorising (again, we can do this because of proportionality):

$$\propto \exp\left(-\frac{1}{2} \left(\mu^\top - \left(\mu_0^\top \Sigma_0^{-1} + n \bar{x}^\top \Sigma^{-1}\right) \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\right) \left(\Sigma_0^{-1} + n\Sigma^{-1}\right) \left(\mu - \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1} \left(\Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x}\right)\right)\right)$$


Using $(Ax)^\top = x^\top A^\top$ and the fact that covariance matrices (and their inverses) are symmetric and hence invariant under the transpose, we obtain

$$\left(\left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1} \left(\Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x}\right)\right)^\top = \left(\mu_0^\top \Sigma_0^{-1} + n \bar{x}^\top \Sigma^{-1}\right) \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}.$$

So we have

$$f_{\mu \mid x_1, \ldots, x_n}(\mu) \propto \exp\left(-\frac{1}{2} \left(\mu - \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1} \left(\Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x}\right)\right)^\top \left(\Sigma_0^{-1} + n\Sigma^{-1}\right) \left(\mu - \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1} \left(\Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x}\right)\right)\right)$$

 
which means the posterior distribution is a multivariate normal $N_d\left(\mu^*, \Sigma^*\right)$, where

$$\mu^* = \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1} \left(\Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x}\right)$$

and

$$\Sigma^* = \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}. \qquad \blacksquare$$
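As a quick sanity check, setting $d = 1$ with scalar variances $\sigma_0^2$ and $\sigma^2$ recovers the familiar univariate conjugate-normal result:

```latex
\mu^* = \frac{\mu_0/\sigma_0^2 + n\bar{x}/\sigma^2}{1/\sigma_0^2 + n/\sigma^2},
\qquad
\sigma^{*2} = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}.
```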


Part (b)
We now derive Jeffreys prior $\pi_J(\mu)$ for $\mu$. We have the likelihood function from above:

$$L\left(x_1, \ldots, x_n \mid \mu\right) = \frac{1}{(2\pi)^{nd/2} |\Sigma|^{n/2}} \exp\left(-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)\right).$$

Lemma. If $x$ is an $n \times 1$ vector and $A$ is an $n \times n$ matrix, then we have

$$\frac{d}{dx} \left(x^\top A x\right) = x^\top \left(A^\top + A\right).$$
Proof. We shall use Einstein's summation convention for this proof for clarity. Let $x = (x_1, \ldots, x_n)^\top$, let $e_j$ be the $j$th basis column vector, and let $[A]_{ij} = a_{ij}$. Then

$$\frac{d}{dx} \left(x^\top A x\right) = \frac{d}{dx} \left(x_i a_{ij} e_j^\top x\right) = \frac{d}{dx} \left(x_i a_{ij} x_j\right) = \left(2 a_{ii} x_i + (a_{ij} + a_{ji}) x_j\right) e_i^\top,$$

where $j \neq i$ in the second summand, since differentiating with respect to each $x_i$ we have one $x_i^2$ term and the rest are $x_i x_j$ terms. Regrouping,

$$= \left(x_i a_{ii} + a_{ij} x_j\right) e_i^\top + \left(x_i a_{ii} + a_{ji} x_j\right) e_i^\top = x^\top \left(A^\top + A\right). \qquad \blacksquare$$
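The lemma can also be verified numerically with central finite differences; in the sketch below the matrix `A` and point `x` are arbitrary choices of ours (with `A` deliberately non-symmetric so the lemma is tested in earnest):

```python
# Check d/dx (x^T A x) = x^T (A^T + A) by central finite differences.
A = [[1.0, 2.0], [3.0, 4.0]]   # deliberately non-symmetric
x = [0.5, -1.5]

def quad(v):
    # v^T A v
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

h = 1e-6
numeric = []
for k in range(2):
    xp, xm = x[:], x[:]
    xp[k] += h
    xm[k] -= h
    numeric.append((quad(xp) - quad(xm)) / (2 * h))

# j-th component of x^T (A^T + A) is sum_i x_i (A[j][i] + A[i][j])
analytic = [sum(x[i] * (A[j][i] + A[i][j]) for i in range(2)) for j in range(2)]
print(numeric, analytic)  # both approximately [-6.5, -9.5]
```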


Now, returning to the derivation of Jeffreys prior:

$$\log L\left(x_1, \ldots, x_n \mid \mu\right) = \log\left(\frac{1}{(2\pi)^{nd/2} |\Sigma|^{n/2}}\right) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)$$

$$\frac{d}{d\mu} \log L = -\frac{1}{2}\, \frac{d}{d\mu} \left(n \mu^\top \Sigma^{-1} \mu - \sum_{i=1}^{n} x_i^\top \Sigma^{-1} \mu - \sum_{i=1}^{n} \mu^\top \Sigma^{-1} x_i + \sum_{i=1}^{n} x_i^\top \Sigma^{-1} x_i\right)$$

$$= -\frac{n}{2}\, \frac{d}{d\mu} \left(\mu^\top \Sigma^{-1} \mu - \bar{x}^\top \Sigma^{-1} \mu - \mu^\top \Sigma^{-1} \bar{x}\right)$$

Using the above lemma, with the fact that $\Sigma$ and hence $\Sigma^{-1}$ are symmetric, and noting that $\frac{d}{dx}\left(x^\top A\right) = \left(\frac{d}{dx}\left(A^\top x\right)\right)^\top = A$ where $A$ is an $n \times k$ matrix, with $k \in \mathbb{Z}^{+}$ (a result we can confirm easily using summation notation):

$$\frac{d}{d\mu} \log L = -n \left(\mu^\top \Sigma^{-1} - \bar{x}^\top \Sigma^{-1}\right)$$

$$\frac{d^2}{d\mu^2} \log L = -n \Sigma^{-1}.$$

Therefore

$$\pi_J(\mu) \propto \left(\det\left(-\mathbb{E}\left[\frac{d^2}{d\mu^2} \log L\right]\right)\right)^{1/2} \propto 1,$$

since the determinant of a constant matrix is a constant, and so is its square root. We see that Jeffreys prior for the multivariate normal distribution with fixed covariance matrix and unknown mean vector is simply proportional to a constant. This result is similar to the one for a univariate Gaussian distribution with fixed variance, which also has a constant (improper) distribution for its Jeffreys prior.

Question 3
Part (a)
We know that $p$ is an estimate of the ratio of the area of the circle to the area of the square. This ratio's true value is $\frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}$, so this means $4p$ is an estimate of $\pi$.

Part (b)
```r
> n <- 1000
> x1 <- runif(n, -1, 1)
> x2 <- runif(n, -1, 1)
> ind <- ((x1^2 + x2^2) < 1)
> pi.hat <- 4 * (sum(ind) / n)
> pi.hat
[1] 3.156
```

The above R code gives a one-trial estimate of $\hat{\pi} = 4p = 3.156$.


Part (c)
We know that $b_i$ is a Bernoulli r.v. with probability $\pi/4$, so its variance is $\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)$. Then the sampling variability of $\hat{\pi} = \frac{4}{n} \sum_{i=1}^{n} b_i$ is simply $4^2 \cdot n \cdot \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right) \cdot \frac{1}{n^2} = \frac{\pi(4 - \pi)}{n}$, so by the Central Limit Theorem we have

$$\hat{\pi} \xrightarrow{d} N\left(\pi, \frac{\pi(4 - \pi)}{n}\right).$$

Note: we can re-write this in terms of $p$ as

$$\hat{\pi} \xrightarrow{d} N\left(4p, \frac{16p(1 - p)}{n}\right).$$
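The two forms agree, since substituting $p = \pi/4$ into the second variance recovers the first:

```latex
\frac{16p(1 - p)}{n}
= \frac{16 \cdot \frac{\pi}{4} \left(1 - \frac{\pi}{4}\right)}{n}
= \frac{4\pi - \pi^2}{n}
= \frac{\pi(4 - \pi)}{n}.
```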

Part (d)
```r
n <- 1000
p <- 0.7854                    # true hit probability, pi/4
var <- 16 * p * (1 - p) / n    # theoretical variance of pi.hat
pi.hat <- c()
for (i in 1:1000) {
  x1 <- runif(n, -1, 1)
  x2 <- runif(n, -1, 1)
  ind <- ((x1^2 + x2^2) < 1)
  pi.hat[i] <- 4 * (sum(ind) / n)
}
hist(pi.hat, breaks = 20, freq = FALSE)
x <- seq(min(pi.hat), max(pi.hat), length = 100)
y <- dnorm(x, mean = 4 * p, sd = sqrt(var))
points(x, y, type = "l")
```

[Figure: histogram of pi.hat on the density scale, values roughly 3.00 to 3.30, with the normal density $N(4p, 16p(1-p)/n)$ overlaid]

We can see that the histogram fits the overlay distribution fairly well.


Part (e)
We know that the variance is given by $\frac{16p(1 - p)}{n}$, which is maximised at $p = 0.5$, giving $\frac{4}{n}$. We choose to maximise the variance since this will result in maximal Monte Carlo sampling variability, which is what we need for the most conservative estimate of the sample size $n$ required to estimate $\pi$ to within 0.01 with at least 95% probability. Solving for $n$:

$$P\left(\left|\hat{\pi} - \pi\right| \leq 0.01\right) \geq 0.95$$

We apply the CLT and use a normal approximation to get:

$$P\left(-\frac{0.01}{2/\sqrt{n}} \leq Z \leq \frac{0.01}{2/\sqrt{n}}\right) \geq 0.95 \quad \text{where } Z \sim N(0, 1)$$

$$2\left(P\left(Z \leq \frac{0.01}{2/\sqrt{n}}\right) - 0.5\right) \geq 0.95$$

$$P\left(Z \leq \frac{0.01}{2/\sqrt{n}}\right) \geq 0.975$$

$$\frac{0.01}{2/\sqrt{n}} \geq 1.96$$

$$\sqrt{n} \geq 392 \quad \Longrightarrow \quad n \geq 153664.$$

So n = 153664 is a conservative sample size for estimating to within 0.01 with at least 95%
probability. To be even more conservative, we could round up to 160000 samples (after all, we
are using a normal approximation).
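The closing arithmetic can be reproduced in a few lines (Python here for illustration, though the assignment's own code is in R; 1.96 is the usual approximate 97.5% normal quantile):

```python
import math

z = 1.96      # approximate 97.5% standard normal quantile
eps = 0.01    # target accuracy for |pi.hat - pi|
# Worst-case sd of pi.hat is sqrt(4/n) = 2/sqrt(n), attained at p = 0.5,
# so we need eps / (2 / sqrt(n)) >= z, i.e. sqrt(n) >= 2 * z / eps.
sqrt_n = 2 * z / eps
n = round(sqrt_n ** 2)   # 392^2; round() guards against floating-point error
print(sqrt_n, n)
```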