
Supplementary Note About The MLE

Theorem (Invariance Property of the MLE). If the MLE of a parameter θ is θ̂, then the MLE of g(θ) is g(θ̂).

For example, if the MLE of θ is θ̂, then the MLE of the parameter θ² + sin θ is θ̂² + sin(θ̂). In particular, if an observed data set yields the estimate θ̂ = 0.7226 for θ, then the estimate for θ² + sin θ is (0.7226)² + sin(0.7226) = 1.1835.
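As a quick numerical check of this plug-in computation, here is a short Python sketch:

```python
import math

# By the invariance property, the MLE of theta^2 + sin(theta) is obtained
# by plugging theta_hat into the same function.
theta_hat = 0.7226
estimate = theta_hat ** 2 + math.sin(theta_hat)
print(round(estimate, 4))   # 1.1835
```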

How to calculate the Fisher information

Let l(θ) be the log-likelihood. Here is how the (Fisher) information is calculated.

Case 1. One parameter only

I(θ) = −E[l″(θ)] = E[(l′(θ))²]
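The two expectations in this identity can be spot-checked by simulation. A Monte Carlo sketch (with a made-up choice of distribution: exponential with mean θ, so l(θ) = −ln θ − x/θ, l′ = −1/θ + x/θ², l″ = 1/θ² − 2x/θ³, and both sides should equal 1/θ²):

```python
import random
import statistics

# Monte Carlo sketch of the identity -E[l''] = E[(l')^2] for an
# exponential distribution with mean theta; both sides equal 1/theta^2.
random.seed(3)
theta = 2.0
xs = [random.expovariate(1 / theta) for _ in range(500_000)]
e_lp2 = statistics.fmean((-1 / theta + x / theta ** 2) ** 2 for x in xs)
e_lpp = statistics.fmean(1 / theta ** 2 - 2 * x / theta ** 3 for x in xs)
print(e_lp2, -e_lpp)   # both close to 1/theta**2 = 0.25
```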

Case 2. Several parameters (θ1 , ... , θr )

In this case the information matrix is an r × r matrix whose (i, j)-th entry is

−E[ ∂²l / ∂θi ∂θj ] = E[ (∂l/∂θi)(∂l/∂θj) ]

For example, for two parameters θ1 and θ2, the information matrix is:

  − [ E[∂²l/∂θ1²]      E[∂²l/∂θ1∂θ2] ]   =   −E [ ∂²l/∂θ1²      ∂²l/∂θ1∂θ2 ]
    [ E[∂²l/∂θ1∂θ2]    E[∂²l/∂θ2²]   ]          [ ∂²l/∂θ1∂θ2    ∂²l/∂θ2²   ]

1
The asymptotic variance of the MLE is then equal to

I(θ)⁻¹

In the case of several parameters, the information matrix needs to be inverted. In the professional exams only 2 × 2 matrices will be given. The inverse of any invertible 2 × 2 matrix can be calculated from the following formula:

  [ a   b ]⁻¹ = (1 / (ad − bc)) [  d   −b ]
  [ c   d ]                     [ −c    a ]

Note. The matrix

  [ ∂²l/∂θ1²      ∂²l/∂θ1∂θ2 ]
  [ ∂²l/∂θ1∂θ2    ∂²l/∂θ2²   ]

is called the Hessian matrix.

Note. If n independent observations have been collected, then the amount of information for these n observations is equal to n times the amount of information from a single observation.

Theorem. The asymptotic variance of the MLE is equal to I(θ)⁻¹.

Example (question 13.66 of the textbook) ∗. A distribution has two parameters, α and β. A sample of size 10 produced the following log-likelihood function:

l(α, β) = −2.5α² − 3αβ − β² + 50α + 2β + k

where k is a constant. Estimate the covariance matrix of the MLE (α̂, β̂).

Solution.

∂l/∂α = −5α − 3β + 50

∂l/∂β = −3α − 2β + 2

∂²l/∂α² = −5,   ∂²l/∂β² = −2,   ∂²l/∂α∂β = −3

(information matrix)   I = −E[Hessian matrix] = [ 5   3 ]
                                                [ 3   2 ]

covariance matrix = I⁻¹ = (1 / ((5)(2) − (3)(3))) [  2   −3 ] = [  2   −3 ]
                                                  [ −3    5 ]   [ −3    5 ]
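The inversion can be checked with a few lines of Python applying the 2 × 2 inverse formula to the information matrix of this example:

```python
# Apply the 2x2 inverse formula to the information matrix I = [[5, 3], [3, 2]].
a, b, c, d = 5.0, 3.0, 3.0, 2.0
det = a * d - b * c                  # (5)(2) - (3)(3) = 1
cov = [[d / det, -b / det],
       [-c / det, a / det]]
print(cov)   # [[2.0, -3.0], [-3.0, 5.0]]
```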

Example. A single observation, x, is taken from a normal distribution with mean µ = 0 and variance σ² = θ. The normal distribution has its probability density function given by

f(x) = (1 / (σ√(2π))) e^(−(x−µ)²/(2σ²))

Let θ̂ be the maximum likelihood estimator of θ. Which of the following is the variance of θ̂?

(A) 1/θ   (B) 1/θ²   (C) 1/(2θ)   (D) 2θ   (E) 2θ²

Solution.

µ = 0, σ² = θ ⇒ L(θ) = (1/√(2πθ)) e^(−x²/(2θ)) = (2πθ)^(−1/2) e^(−x²/(2θ))

l(θ) = C − (1/2) ln(θ) − x²/(2θ)   with some constant C

l′(θ) = −1/(2θ) + x²/(2θ²) ⇒ l″(θ) = 1/(2θ²) − x²/θ³

But:

E(X²) = Var(X) + E(X)² = σ² + µ² = θ

Therefore:

E[l″(θ)] = 1/(2θ²) − θ/θ³ = −1/(2θ²) ⇒ I(θ) = −E[l″(θ)] = 1/(2θ²)

asymptotic variance = 1/I(θ) = 2θ²
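Since the MLE here is θ̂ = x², its variance can also be checked directly by simulation. A Monte Carlo sketch (with a made-up value θ = 2, so the variance should be close to 2θ² = 8):

```python
import random
import statistics

# Monte Carlo sketch: with a single observation x ~ N(0, theta), the MLE is
# theta_hat = x^2, and Var(x^2) = E[x^4] - theta^2 = 3*theta^2 - theta^2 = 2*theta^2.
random.seed(1)
theta = 2.0
theta_hats = [random.gauss(0.0, theta ** 0.5) ** 2 for _ in range(400_000)]
mc_var = statistics.pvariance(theta_hats)
print(mc_var)   # close to 2 * theta**2 = 8
```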

Distribution               Asymptotic variance of the MLE

Exponential                θ²/n

Uniform(0, θ)              nθ² / ((n + 1)²(n + 2))

Lognormal                  Var(µ̂) = σ²/n,   Var(σ̂) = σ²/(2n),   Cov(µ̂, σ̂) = 0

Pareto with fixed θ        Var(α̂) = α²/n

Pareto with fixed α        Var(θ̂) = (α + 2)θ² / (nα)

Weibull with fixed τ       Var(θ̂) = θ² / (nτ²)

Example. Verify the formula for the lognormal.

Solution.
Since X is lognormal, ln X ∼ N(µ, σ²), so

f(x) = (1 / (xσ√(2π))) e^(−(ln x − µ)²/(2σ²)) ⇒ L(µ, σ) = f(x1) ⋯ f(xn) = ∏ (1 / (xi σ√(2π))) exp(−(ln(xi) − µ)²/(2σ²))

l(µ, σ) = C − n ln(σ) − ∑ (ln(xi) − µ)² / (2σ²)

where the constant C collects the terms that do not involve the parameters.

∂l/∂µ = ∑ (ln(xi) − µ) / σ² = (∑ ln(xi) − nµ) / σ²

∂²l/∂µ² = −n/σ² ⇒ E[−∂²l/∂µ²] = n/σ²

∂²l/∂µ∂σ = −2 ∑ (ln(xi) − µ) / σ³ ⇒ E[−∂²l/∂µ∂σ] = 0

∂l/∂σ = ∑ (ln(xi) − µ)² / σ³ − n/σ

∂²l/∂σ² = −3 ∑ (ln(xi) − µ)² / σ⁴ + n/σ² ⇒ E[−∂²l/∂σ²] = 3nσ²/σ⁴ − n/σ² = 2n/σ²

I(µ, σ) = [ n/σ²    0     ]   ⇒   variance-covariance matrix Σ = I⁻¹ = [ σ²/n    0       ]
          [ 0       2n/σ² ]                                            [ 0       σ²/(2n) ]
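The entry Var(µ̂) = σ²/n can be spot-checked by simulation, since µ̂ is the sample mean of the log-data. A Monte Carlo sketch, with made-up parameter values µ = 0.5, σ = 1.2, n = 50 (so σ²/n = 0.0288):

```python
import random
import statistics

# Monte Carlo sketch: mu_hat is the sample mean of the log-data, so its
# variance should be close to sigma^2 / n = 1.2**2 / 50 = 0.0288.
random.seed(7)
mu, sigma, n = 0.5, 1.2, 50
mu_hats = []
for _ in range(20_000):
    logs = [random.gauss(mu, sigma) for _ in range(n)]   # the ln(x_i)
    mu_hats.append(statistics.fmean(logs))
var_mu_hat = statistics.pvariance(mu_hats)
print(var_mu_hat)   # close to 0.0288
```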

Delta Method

Theorem. If θ̂ ≈ N(θ, σ²/n), then g(θ̂) ≈ N(g(θ), g′(θ)² σ²/n).

In the professional exams, here is how it is used (we do not really pay careful attention as to whether the distribution is normal or not):

g(X) = g(α) + g′(α)(X − α) + (1/2) g″(α)(X − α)² + ··· ⇒ g(X) ≈ g(α) + g′(α)(X − α)

Var(g(X)) ≈ g′(α)² Var(X)
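The one-variable approximation can be spot-checked by simulation. A Monte Carlo sketch with a made-up choice g(x) = eˣ and X ~ N(a, s²) with small s, where the delta method gives Var(g(X)) ≈ e^(2a) s²:

```python
import math
import random
import statistics

# Monte Carlo sketch of the delta method with g(x) = exp(x):
# if X ~ N(a, s^2) with small s, then Var(g(X)) ≈ g'(a)^2 * s^2 = exp(2a) * s^2.
random.seed(5)
a, s = 1.0, 0.05
xs = [random.gauss(a, s) for _ in range(400_000)]
mc = statistics.pvariance([math.exp(x) for x in xs])
approx = math.exp(2 * a) * s ** 2
print(mc, approx)   # the two values nearly agree
```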

and in the case of two variables:

Var(g(X, Y)) ≈ vᵀ Σ v,   where v = ( ∂g/∂x , ∂g/∂y )ᵀ   and   Σ = [ Var(X)      Cov(X, Y) ]
                                                                   [ Cov(X, Y)   Var(Y)    ]

Example. Claim size X follows a single-parameter Pareto distribution with known parameter θ = 50. We estimate α to be 4, and the variance of the estimator is 0.3. Calculate the variance of the estimate of Pr(X < 100).

Solution.

For a single parameter Pareto distribution we have:


P(X < x) = 1 − (θ/x)^α ⇒ P(X < 100) = 1 − (50/100)^α = 1 − (0.5)^α

If we denote the estimator of α by Y, then the estimator of P(X < 100) is g(Y) = 1 − (0.5)^Y. Then, using the delta method:

g′(Y) = −(0.5)^Y ln(0.5) ⇒ g′(4) = −(0.5)⁴ ln(0.5) = 0.0433

Var(g(Y)) ≈ g′(4)² Var(Y) = (0.0433)² (0.3) = 0.00056 ✓
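A short Python sketch of this delta-method computation:

```python
import math

# Delta method for g(Y) = 1 - 0.5**Y at the estimate alpha = 4, Var(Y) = 0.3.
gp4 = -(0.5 ** 4) * math.log(0.5)        # g'(4) ≈ 0.0433
var_g = gp4 ** 2 * 0.3
print(round(gp4, 4), round(var_g, 5))    # 0.0433 0.00056
```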

Example (questions 13.56 and 13.74 of the textbook) ∗.

(i) The random variable X has the pdf

f(x) = α λ^α (λ + x)^(−α−1),   x, α, λ > 0

It is known that λ = 1000. You are given the following observations:

43 , 145 , 233 , 396 , 775

Determine the MLE of α.

(ii) Estimate the variance of the MLE and use it to construct a 95% confidence interval for
E(X ∧ 500)

Solution to part (i).


L = α⁵ · 1000^(5α) · ∏_{j=1}^{5} (1000 + xj)^(−α−1)

(log-likelihood)   l = 5 ln(α) + 5α ln(1000) − (α + 1) ∑_{j=1}^{5} ln(1000 + xj)

l′(α) = 5/α + 5 ln(1000) − ∑_{j=1}^{5} ln(1000 + xj) = 5/α + 34.5388 − 35.8331

l′(α) = 0 ⇒ α̂ = 3.8629
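The score equation can be solved numerically in a few lines of Python:

```python
import math

# Solve l'(alpha) = 0: alpha_hat = 5 / (sum ln(1000 + x_j) - 5 ln 1000).
xs = [43, 145, 233, 396, 775]
s = sum(math.log(1000 + x) for x in xs)      # ≈ 35.8331
alpha_hat = 5 / (s - 5 * math.log(1000))
print(alpha_hat)   # ≈ 3.8629
```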

Solution to part (ii).

ln f (x) = ln(α) + α ln(λ) − (α + 1) ln(λ + x)

∂² ln f(x) / ∂α² = −1/α²

(based on n = 5 observations)   I(α) = −nE[ ∂² ln f(x) / ∂α² ] = n/α²

Invert it to get:

Var(α̂) = α²/n ⇒ estimated Var(α̂) = α̂²/n = (3.8629)²/5 = 2.9844
g(α) = E(X ∧ 500) = ∫_0^500 x f(x) dx + 500 ∫_500^∞ f(x) dx
     = ∫_0^500 x α 1000^α (1000 + x)^(−α−1) dx + 500 ∫_500^∞ α 1000^α (1000 + x)^(−α−1) dx
     = (using the Pareto integrals) = 1000/(α − 1) − (1500/(α − 1)) (2/3)^α

g′(α) = −1000/(α − 1)² + (2/3)^α [ 1500/(α − 1)² − (1500/(α − 1)) ln(2/3) ]

α̂ = 3.8629 ⇒ g(α̂) = 239.88,   g′(α̂) = −39.428

Var(g(α̂)) ≈ (g′(α̂))² Var(α̂) = (−39.428)² (2.9844) = 4639.45

confidence interval = g(α̂) ± √(Var(g(α̂))) · z₀.₀₂₅ = 239.88 ± (√4639.45)(1.96) = 239.88 ± 133.50
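A numeric sketch of part (ii) in Python, plugging α̂ = 3.8629 and Var(α̂) = α̂²/5 into g and g′ (note the variance evaluates to about 4639.4):

```python
import math

# Delta method for g(alpha) = E(X ^ 500) at alpha_hat = 3.8629.
a = 3.8629
var_a = a ** 2 / 5                                     # ≈ 2.9844
g = 1000 / (a - 1) - 1500 / (a - 1) * (2 / 3) ** a     # ≈ 239.88
gp = (-1000 / (a - 1) ** 2
      + (2 / 3) ** a * (1500 / (a - 1) ** 2 - 1500 / (a - 1) * math.log(2 / 3)))
var_g = gp ** 2 * var_a                                # ≈ 4639.4
half = 1.96 * math.sqrt(var_g)                         # ≈ 133.5
print(g, gp, var_g, (g - half, g + half))
```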

Example ∗. At this point, Examples 62.5 and 62.6 of Finan's study guide were solved in class.

3. Calculations for qx

If T denotes the time of death, then the conditional distribution (T | x < T ≤ x + 1) is assumed to be uniform on (x, x + 1). This is the same assumption we use for large data sets. Under this assumption, for 0 < t < 1 we have:

ₜqx = t · qx

₁₋ₜqx₊ₜ = (1 − t) qx / (1 − t qx)

For example, here is how the first equality is proved:

ₜqx = P(x < T ≤ x + t | x < T) = P(x < T ≤ x + t) / P(x < T)

= [ P(x < T ≤ x + t) / P(x < T ≤ x + 1) ] · [ P(x < T ≤ x + 1) / P(x < T) ]

= P(x < T ≤ x + t | x < T ≤ x + 1) · P(x < T ≤ x + 1 | x < T) = t · qx

where the last step uses the uniform assumption: the first factor equals t, and the second factor is qx.

One problem of interest is the calculation of the MLE for the probability
qx = P ( x < T ≤ x + 1 | x < T ).

Example. A cohort of 500 individuals of age x is observed. The study ends at age x + 1. Five deaths were observed, and 350 individuals left the study at age x + 0.7. Assuming a uniform distribution of death times within the year, find the MLE of qx.

Solution. The probability of death by time x + 0.7 is 0.7qx; therefore the probability of surviving to age x + 0.7 is 1 − 0.7qx. Hence the likelihood function is:

L(qx) = qx⁵ (1 − 0.7qx)³⁵⁰ (1 − qx)¹⁴⁵

The log-likelihood:

l(qx ) = 5 ln(qx ) + 350 ln(1 − 0.7qx ) + 145 ln(1 − qx )

dl/dqx = 5/qx − (350)(0.7)/(1 − 0.7qx) − 145/(1 − qx) = 5/qx − 245/(1 − 0.7qx) − 145/(1 − qx)

Setting the derivative equal to zero gives us:

5(1 − 0.7qx)(1 − qx) − 145qx(1 − 0.7qx) − 245qx(1 − qx) = 0

which simplifies to

700qx² − 797qx + 10 = 0 ⇒ qx = 0.0127
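A short Python sketch solving this quadratic and keeping the root that is a valid probability:

```python
import math

# Solve 700*q^2 - 797*q + 10 = 0; only one root lies in (0, 1).
a, b, c = 700.0, -797.0, 10.0
disc = math.sqrt(b * b - 4 * a * c)
roots = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]
q_x = min(r for r in roots if 0 < r < 1)
print(round(q_x, 4))   # 0.0127
```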

Example ∗. In a one-year mortality study on ten lives of age x, three withdrawals occur at time 0.4 and one death is observed. Mortality is assumed to have a uniform distribution. Determine the maximum likelihood estimate of qx.

Solution.

L(qx) = qx (1 − 0.4qx)³ (1 − qx)⁶

The log-likelihood:

l(qx ) = ln(qx ) + 3 ln(1 − 0.4qx ) + 6 ln(1 − qx )

dl/dqx = 1/qx − 1.2/(1 − 0.4qx) − 6/(1 − qx)

Setting the derivative equal to zero gives us:

(1 − 0.4qx)(1 − qx) − 6qx(1 − 0.4qx) − 1.2qx(1 − qx) = 0

which simplifies to

4qx² − 8.6qx + 1 = 0 ⇒ qx = 0.1234
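Alternatively, the log-likelihood can be maximized numerically, since it is concave on (0, 1). A sketch using a simple ternary search:

```python
import math

# Maximize l(q) = ln q + 3 ln(1 - 0.4 q) + 6 ln(1 - q) on (0, 1).
# l is concave (a sum of concave functions), so ternary search works.
def loglik(q):
    return math.log(q) + 3 * math.log(1 - 0.4 * q) + 6 * math.log(1 - q)

lo, hi = 1e-6, 1 - 1e-6
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if loglik(m1) < loglik(m2):
        lo = m1
    else:
        hi = m2
q_hat = (lo + hi) / 2
print(q_hat)   # ≈ 0.1234
```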

