
Supplementary Note About The MLE

Theorem (Invariance Property of the MLE). If the MLE of a parameter θ is θ̂, then the MLE of g(θ) is g(θ̂).

For example, if the MLE of θ is θ̂, then the MLE of the parameter θ² + sin θ is θ̂² + sin(θ̂). In particular, if an observed data set yields the estimate θ̂ = 0.7226 for θ, then the estimate for θ² + sin θ is (0.7226)² + sin(0.7226) = 1.1835.
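As a quick numerical check of this plug-in computation, here is a short Python sketch:

```python
import math

# By the invariance property, the MLE of theta^2 + sin(theta) is obtained
# by plugging theta_hat into the same function.
theta_hat = 0.7226
estimate = theta_hat ** 2 + math.sin(theta_hat)
print(round(estimate, 4))   # 1.1835
```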

How to calculate the Fisher information

Let l(θ) be the log-likelihood. Here is how the (Fisher) information is calculated.

Case 1. One parameter only

I(θ) = −E[l″(θ)] = E[(l′(θ))²]
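The two expectations in this identity can be spot-checked by simulation. A Monte Carlo sketch (with a made-up choice of distribution: exponential with mean θ, so l(θ) = −ln θ − x/θ, l′ = −1/θ + x/θ², l″ = 1/θ² − 2x/θ³, and both sides should equal 1/θ²):

```python
import random
import statistics

# Monte Carlo sketch of the identity -E[l''] = E[(l')^2] for an
# exponential distribution with mean theta; both sides equal 1/theta^2.
random.seed(3)
theta = 2.0
xs = [random.expovariate(1 / theta) for _ in range(500_000)]
e_lp2 = statistics.fmean((-1 / theta + x / theta ** 2) ** 2 for x in xs)
e_lpp = statistics.fmean(1 / theta ** 2 - 2 * x / theta ** 3 for x in xs)
print(e_lp2, -e_lpp)   # both close to 1/theta**2 = 0.25
```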

Case 2. Several parameters (θ1 , ... , θr )

In this case the information matrix is an r × r matrix whose (i, j)-th entry is

−E[ ∂²l / ∂θi ∂θj ] = E[ (∂l/∂θi)(∂l/∂θj) ]

For example, for two parameters θ1 and θ2, the information matrix is:

  − [ E[∂²l/∂θ1²]      E[∂²l/∂θ1∂θ2] ]   =   −E [ ∂²l/∂θ1²      ∂²l/∂θ1∂θ2 ]
    [ E[∂²l/∂θ1∂θ2]    E[∂²l/∂θ2²]   ]          [ ∂²l/∂θ1∂θ2    ∂²l/∂θ2²   ]

1
The asymptotic variance of the MLE is then equal to

I(θ)⁻¹

In the case of several parameters, the information matrix needs to be inverted. In the professional exams only 2 × 2 matrices will be given. The inverse of any invertible 2 × 2 matrix can be calculated from the following formula:

  [ a   b ]⁻¹ = (1 / (ad − bc)) [  d   −b ]
  [ c   d ]                     [ −c    a ]

Note. The matrix

  [ ∂²l/∂θ1²      ∂²l/∂θ1∂θ2 ]
  [ ∂²l/∂θ1∂θ2    ∂²l/∂θ2²   ]

is called the Hessian matrix.

Note. If n independent observations have been collected, then the amount of information for these n observations is equal to n times the amount of information from a single observation.

Theorem. The asymptotic variance of the MLE is equal to I(θ)⁻¹.

Example (question 13.66 of the textbook) ∗. A distribution has two parameters, α and β. A sample of size 10 produced the following log-likelihood function:

l(α, β) = −2.5α² − 3αβ − β² + 50α + 2β + k

where k is a constant. Estimate the covariance matrix of the MLE (α̂, β̂).

Solution.

∂l/∂α = −5α − 3β + 50

∂l/∂β = −3α − 2β + 2

∂²l/∂α² = −5,   ∂²l/∂β² = −2,   ∂²l/∂α∂β = −3

(information matrix)   I = −E[Hessian matrix] = [ 5   3 ]
                                                [ 3   2 ]

covariance matrix = I⁻¹ = (1 / ((5)(2) − (3)(3))) [  2   −3 ] = [  2   −3 ]
                                                  [ −3    5 ]   [ −3    5 ]
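The inversion can be checked with a few lines of Python applying the 2 × 2 inverse formula to the information matrix of this example:

```python
# Apply the 2x2 inverse formula to the information matrix I = [[5, 3], [3, 2]].
a, b, c, d = 5.0, 3.0, 3.0, 2.0
det = a * d - b * c                  # (5)(2) - (3)(3) = 1
cov = [[d / det, -b / det],
       [-c / det, a / det]]
print(cov)   # [[2.0, -3.0], [-3.0, 5.0]]
```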

Example. A single observation, x, is taken from a normal distribution with mean µ = 0 and variance σ² = θ. The normal distribution has its probability density function given by

f(x) = (1 / (σ√(2π))) e^(−(x−µ)²/(2σ²))

Let θ̂ be the maximum likelihood estimator of θ. Which of the following is the variance of θ̂?

(A) 1/θ   (B) 1/θ²   (C) 1/(2θ)   (D) 2θ   (E) 2θ²

Solution.

µ = 0, σ² = θ ⇒ L(θ) = (1/√(2πθ)) e^(−x²/(2θ)) = (2πθ)^(−1/2) e^(−x²/(2θ))

l(θ) = C − (1/2) ln(θ) − x²/(2θ)   with some constant C

l′(θ) = −1/(2θ) + x²/(2θ²) ⇒ l″(θ) = 1/(2θ²) − x²/θ³

But:

E(X²) = Var(X) + E(X)² = σ² + µ² = θ

Therefore:

E[l″(θ)] = 1/(2θ²) − θ/θ³ = −1/(2θ²) ⇒ I(θ) = −E[l″(θ)] = 1/(2θ²)

asymptotic variance = 1/I(θ) = 2θ²
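Since the MLE here is θ̂ = x², its variance can also be checked directly by simulation. A Monte Carlo sketch (with a made-up value θ = 2, so the variance should be close to 2θ² = 8):

```python
import random
import statistics

# Monte Carlo sketch: with a single observation x ~ N(0, theta), the MLE is
# theta_hat = x^2, and Var(x^2) = E[x^4] - theta^2 = 3*theta^2 - theta^2 = 2*theta^2.
random.seed(1)
theta = 2.0
theta_hats = [random.gauss(0.0, theta ** 0.5) ** 2 for _ in range(400_000)]
mc_var = statistics.pvariance(theta_hats)
print(mc_var)   # close to 2 * theta**2 = 8
```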

Distribution               Asymptotic variance of the MLE

Exponential                θ²/n

Uniform(0, θ)              nθ² / ((n + 1)²(n + 2))

Lognormal                  Var(µ̂) = σ²/n,   Var(σ̂) = σ²/(2n),   Cov(µ̂, σ̂) = 0

Pareto with fixed θ        Var(α̂) = α²/n

Pareto with fixed α        Var(θ̂) = (α + 2)θ² / (nα)

Weibull with fixed τ       Var(θ̂) = θ² / (nτ²)

Example. Verify the formula for the lognormal.

Solution.
Since X is lognormal, ln X ∼ N(µ, σ²), so

f(x) = (1 / (xσ√(2π))) e^(−(ln x − µ)²/(2σ²)) ⇒ L(µ, σ) = f(x1) ⋯ f(xn) = ∏ (1 / (xi σ√(2π))) exp(−(ln(xi) − µ)²/(2σ²))

l(µ, σ) = C − n ln(σ) − ∑ (ln(xi) − µ)² / (2σ²)

where the constant C collects the terms that do not involve the parameters.

∂l/∂µ = ∑ (ln(xi) − µ) / σ² = (∑ ln(xi) − nµ) / σ²

∂²l/∂µ² = −n/σ² ⇒ E[−∂²l/∂µ²] = n/σ²

∂²l/∂µ∂σ = −2 ∑ (ln(xi) − µ) / σ³ ⇒ E[−∂²l/∂µ∂σ] = 0

∂l/∂σ = ∑ (ln(xi) − µ)² / σ³ − n/σ

∂²l/∂σ² = −3 ∑ (ln(xi) − µ)² / σ⁴ + n/σ² ⇒ E[−∂²l/∂σ²] = 3nσ²/σ⁴ − n/σ² = 2n/σ²

I(µ, σ) = [ n/σ²    0     ]   ⇒   variance-covariance matrix Σ = I⁻¹ = [ σ²/n    0       ]
          [ 0       2n/σ² ]                                            [ 0       σ²/(2n) ]
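The entry Var(µ̂) = σ²/n can be spot-checked by simulation, since µ̂ is the sample mean of the log-data. A Monte Carlo sketch, with made-up parameter values µ = 0.5, σ = 1.2, n = 50 (so σ²/n = 0.0288):

```python
import random
import statistics

# Monte Carlo sketch: mu_hat is the sample mean of the log-data, so its
# variance should be close to sigma^2 / n = 1.2**2 / 50 = 0.0288.
random.seed(7)
mu, sigma, n = 0.5, 1.2, 50
mu_hats = []
for _ in range(20_000):
    logs = [random.gauss(mu, sigma) for _ in range(n)]   # the ln(x_i)
    mu_hats.append(statistics.fmean(logs))
var_mu_hat = statistics.pvariance(mu_hats)
print(var_mu_hat)   # close to 0.0288
```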

Delta Method

Theorem. If θ̂ ≈ N(θ, σ²/n), then g(θ̂) ≈ N(g(θ), g′(θ)² σ²/n).

In the professional exams, here is how it is used (we do not really pay careful attention as to whether the distribution is normal or not):

g(X) = g(α) + g′(α)(X − α) + (1/2) g″(α)(X − α)² + ··· ⇒ g(X) ≈ g(α) + g′(α)(X − α)

Var(g(X)) ≈ g′(α)² Var(X)
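The one-variable approximation can be spot-checked by simulation. A Monte Carlo sketch with a made-up choice g(x) = eˣ and X ~ N(a, s²) with small s, where the delta method gives Var(g(X)) ≈ e^(2a) s²:

```python
import math
import random
import statistics

# Monte Carlo sketch of the delta method with g(x) = exp(x):
# if X ~ N(a, s^2) with small s, then Var(g(X)) ≈ g'(a)^2 * s^2 = exp(2a) * s^2.
random.seed(5)
a, s = 1.0, 0.05
xs = [random.gauss(a, s) for _ in range(400_000)]
mc = statistics.pvariance([math.exp(x) for x in xs])
approx = math.exp(2 * a) * s ** 2
print(mc, approx)   # the two values nearly agree
```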

and in the case of two variables:

Var(g(X, Y)) ≈ vᵀ Σ v,   where v = ( ∂g/∂x , ∂g/∂y )ᵀ   and   Σ = [ Var(X)      Cov(X, Y) ]
                                                                   [ Cov(X, Y)   Var(Y)    ]

Example. Claim size X follows a single-parameter Pareto distribution with known parameter θ = 50. We estimate α to be 4, and the variance of the estimator is 0.3. Calculate the variance of the estimate of Pr(X < 100).

Solution.

For a single parameter Pareto distribution we have:


P(X < x) = 1 − (θ/x)^α ⇒ P(X < 100) = 1 − (50/100)^α = 1 − (0.5)^α

If we denote the estimator of α by Y, then the estimator of P(X < 100) is g(Y) = 1 − (0.5)^Y. Then, using the delta method:

g′(Y) = −(0.5)^Y ln(0.5) ⇒ g′(4) = −(0.5)⁴ ln(0.5) = 0.0433

Var(g(Y)) ≈ g′(4)² Var(Y) = (0.0433)² (0.3) = 0.00056 ✓
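A short Python sketch of this delta-method computation:

```python
import math

# Delta method for g(Y) = 1 - 0.5**Y at the estimate alpha = 4, Var(Y) = 0.3.
gp4 = -(0.5 ** 4) * math.log(0.5)        # g'(4) ≈ 0.0433
var_g = gp4 ** 2 * 0.3
print(round(gp4, 4), round(var_g, 5))    # 0.0433 0.00056
```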

Example (questions 13.56 and 13.74 of the textbook) ∗.

(i) The random variable X has the pdf

f(x) = α λ^α (λ + x)^(−α−1),   x, α, λ > 0

It is known that λ = 1000. You are given the following observations:

43 , 145 , 233 , 396 , 775

Determine the MLE of α.

(ii) Estimate the variance of the MLE and use it to construct a 95% confidence interval for
E(X ∧ 500)

Solution to part (i).


L = α⁵ · 1000^(5α) · ∏_{j=1}^{5} (1000 + xj)^(−α−1)

(log-likelihood)   l = 5 ln(α) + 5α ln(1000) − (α + 1) ∑_{j=1}^{5} ln(1000 + xj)

l′(α) = 5/α + 5 ln(1000) − ∑_{j=1}^{5} ln(1000 + xj) = 5/α + 34.5388 − 35.8331

l′(α) = 0 ⇒ α̂ = 3.8629
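The score equation can be solved numerically in a few lines of Python:

```python
import math

# Solve l'(alpha) = 0: alpha_hat = 5 / (sum ln(1000 + x_j) - 5 ln 1000).
xs = [43, 145, 233, 396, 775]
s = sum(math.log(1000 + x) for x in xs)      # ≈ 35.8331
alpha_hat = 5 / (s - 5 * math.log(1000))
print(alpha_hat)   # ≈ 3.8629
```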

Solution to part (ii).

ln f (x) = ln(α) + α ln(λ) − (α + 1) ln(λ + x)

∂² ln f(x) / ∂α² = −1/α²

(based on n = 5 observations)   I(α) = −nE[ ∂² ln f(x) / ∂α² ] = n/α²

Invert it to get:

Var(α̂) = α²/n ⇒ estimated Var(α̂) = α̂²/n = (3.8629)²/5 = 2.9844
g(α) = E(X ∧ 500) = ∫_0^500 x f(x) dx + 500 ∫_500^∞ f(x) dx
     = ∫_0^500 x α 1000^α (1000 + x)^(−α−1) dx + 500 ∫_500^∞ α 1000^α (1000 + x)^(−α−1) dx
     = (using the Pareto integrals) = 1000/(α − 1) − (1500/(α − 1)) (2/3)^α

g′(α) = −1000/(α − 1)² + (2/3)^α [ 1500/(α − 1)² − (1500/(α − 1)) ln(2/3) ]

α̂ = 3.8629 ⇒ g(α̂) = 239.88,   g′(α̂) = −39.428

Var(g(α̂)) ≈ (g′(α̂))² Var(α̂) = (−39.428)² (2.9844) = 4639.45

confidence interval = g(α̂) ± √(Var(g(α̂))) · z₀.₀₂₅ = 239.88 ± (√4639.45)(1.96) = 239.88 ± 133.50
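A numeric sketch of part (ii) in Python, plugging α̂ = 3.8629 and Var(α̂) = α̂²/5 into g and g′ (note the variance evaluates to about 4639.4):

```python
import math

# Delta method for g(alpha) = E(X ^ 500) at alpha_hat = 3.8629.
a = 3.8629
var_a = a ** 2 / 5                                     # ≈ 2.9844
g = 1000 / (a - 1) - 1500 / (a - 1) * (2 / 3) ** a     # ≈ 239.88
gp = (-1000 / (a - 1) ** 2
      + (2 / 3) ** a * (1500 / (a - 1) ** 2 - 1500 / (a - 1) * math.log(2 / 3)))
var_g = gp ** 2 * var_a                                # ≈ 4639.4
half = 1.96 * math.sqrt(var_g)                         # ≈ 133.5
print(g, gp, var_g, (g - half, g + half))
```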

Example ∗. At this point, Examples 62.5 and 62.6 of Finan's study guide were solved in class.

3. Calculations for qx

If T denotes the time of death, then the conditional distribution (T | x < T ≤ x + 1) is assumed to be uniform on (x, x + 1). This is the same assumption we use for large data sets. Under this assumption, for 0 < t < 1 we have:

ₜqx = t · qx

₁₋ₜqx₊ₜ = (1 − t) qx / (1 − t qx)

For example, here is how the first equality is proved:

ₜqx = P(x < T ≤ x + t | x < T) = P(x < T ≤ x + t) / P(x < T)

= [ P(x < T ≤ x + t) / P(x < T ≤ x + 1) ] · [ P(x < T ≤ x + 1) / P(x < T) ]

= P(x < T ≤ x + t | x < T ≤ x + 1) · P(x < T ≤ x + 1 | x < T) = t · qx

where the last step uses the uniform assumption: the first factor equals t, and the second factor is qx.

One problem of interest is the calculation of the MLE for the probability
qx = P ( x < T ≤ x + 1 | x < T ).

Example. A cohort of 500 individuals of age x is observed. The study ends at age x + 1. Five deaths were observed, and 350 individuals left the study at age x + 0.7. Assuming a uniform distribution of death times within the year, find the MLE of qx.

Solution. The probability of death by time x + 0.7 is 0.7qx; therefore the probability of surviving to age x + 0.7 is 1 − 0.7qx. Hence the likelihood function is:

L(qx) = qx⁵ (1 − 0.7qx)³⁵⁰ (1 − qx)¹⁴⁵

The log-likelihood:

l(qx ) = 5 ln(qx ) + 350 ln(1 − 0.7qx ) + 145 ln(1 − qx )

dl/dqx = 5/qx − (350)(0.7)/(1 − 0.7qx) − 145/(1 − qx) = 5/qx − 245/(1 − 0.7qx) − 145/(1 − qx)

Setting the derivative equal to zero gives us:

5(1 − 0.7qx)(1 − qx) − 145qx(1 − 0.7qx) − 245qx(1 − qx) = 0

which simplifies to

700qx² − 797qx + 10 = 0 ⇒ qx = 0.0127
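A short Python sketch solving this quadratic and keeping the root that is a valid probability:

```python
import math

# Solve 700*q^2 - 797*q + 10 = 0; only one root lies in (0, 1).
a, b, c = 700.0, -797.0, 10.0
disc = math.sqrt(b * b - 4 * a * c)
roots = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]
q_x = min(r for r in roots if 0 < r < 1)
print(round(q_x, 4))   # 0.0127
```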

Example ∗. In a one-year mortality study on ten lives of age x, three withdrawals occur at time 0.4 and one death is observed. Mortality is assumed to have a uniform distribution. Determine the maximum likelihood estimate of qx.

Solution.

L(qx) = qx (1 − 0.4qx)³ (1 − qx)⁶

The log-likelihood:

l(qx ) = ln(qx ) + 3 ln(1 − 0.4qx ) + 6 ln(1 − qx )

dl/dqx = 1/qx − 1.2/(1 − 0.4qx) − 6/(1 − qx)

Setting the derivative equal to zero gives us:

(1 − 0.4qx)(1 − qx) − 6qx(1 − 0.4qx) − 1.2qx(1 − qx) = 0

which simplifies to

4qx² − 8.6qx + 1 = 0 ⇒ qx = 0.1234
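Alternatively, the log-likelihood can be maximized numerically, since it is concave on (0, 1). A sketch using a simple ternary search:

```python
import math

# Maximize l(q) = ln q + 3 ln(1 - 0.4 q) + 6 ln(1 - q) on (0, 1).
# l is concave (a sum of concave functions), so ternary search works.
def loglik(q):
    return math.log(q) + 3 * math.log(1 - 0.4 * q) + 6 * math.log(1 - q)

lo, hi = 1e-6, 1 - 1e-6
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if loglik(m1) < loglik(m2):
        lo = m1
    else:
        hi = m2
q_hat = (lo + hi) / 2
print(q_hat)   # ≈ 0.1234
```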

