If the MLE of θ is θ̂, then by the invariance property the MLE of the parameter θ² + sin θ is θ̂² + sin(θ̂). So in particular, if an observed data set yields the estimate θ̂ = 0.7226 for θ, then the estimate of θ² + sin θ is (0.7226)² + sin(0.7226) = 1.1835.
Let l(θ) be the log-likelihood. Here is how the (Fisher) information is calculated:

I(θ) = −E[l′′(θ)] = E[(l′(θ))²]
When θ = (θ1 , ..., θr) is a vector parameter, the information matrix is the r × r matrix whose (i, j)-th entry is

−E[∂²l/∂θi ∂θj] = E[(∂l/∂θi)(∂l/∂θj)]
For example, for two parameters θ1 and θ2 , the information matrix is:

I(θ1 , θ2) = −E [ ∂²l/∂θ1²      ∂²l/∂θ1∂θ2 ]
                [ ∂²l/∂θ1∂θ2    ∂²l/∂θ2²   ]
The asymptotic variance of the MLE is then equal to

I(θ)⁻¹

In the multi-parameter case, the information matrix needs to be inverted. In the professional exams only 2 × 2 matrices will be given. The inverse of any invertible 2 × 2 matrix can be calculated from the following formula:
[ a  b ]⁻¹   =   1/(ad − bc) · [  d  −b ]
[ c  d ]                       [ −c   a ]
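This formula is easy to implement and check directly; a minimal sketch (`inv2x2` is a hypothetical helper, not part of any exam material):

```python
# Direct implementation of the 2x2 adjugate inverse formula above.

def inv2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] as (1/(ad - bc)) * [[d, -b], [-c, a]]."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# check: multiplying the inverse back gives the identity matrix
a, b, c, d = 4.0, 7.0, 2.0, 6.0
(e, f), (g, h) = inv2x2(a, b, c, d)
prod = [[a * e + b * g, a * f + b * h],
        [c * e + d * g, c * f + d * h]]
```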
Note. If n observations have been collected, then the amount of information in these n observations is equal to n times the amount of information from a single observation.
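As a quick numerical sanity check (not from the notes): for a single Poisson(θ) observation, l(θ) = x ln θ − θ − ln x!, so l′(θ) = x/θ − 1 and I(θ) = Var(X)/θ² = 1/θ; for n observations the information is n/θ. A Monte Carlo sketch, assuming NumPy is available:

```python
import numpy as np

# Monte Carlo check that I(theta) = E[(l'(theta))^2] = 1/theta
# for a single Poisson(theta) observation.
rng = np.random.default_rng(0)
theta = 2.0
x = rng.poisson(theta, size=500_000)

score = x / theta - 1.0          # l'(theta) for one observation
info_mc = np.mean(score**2)      # should be close to 1/theta = 0.5

n = 10
info_n = n * info_mc             # information in n observations, ~ n/theta
```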
Example (question 13.66 of the textbook) ∗. A distribution has two parameters, α and β. A sample of size 10 produced the following loglikelihood function:

l(α , β) = −2.5α² − 3αβ − β² + 50α + 2β + k

where k is a constant. Estimate the covariance matrix of the MLE of (α̂ , β̂).
Solution.

∂l/∂α = −5α − 3β + 50

∂l/∂β = −3α − 2β + 2

∂²l/∂α² = −5 ,   ∂²l/∂β² = −2 ,   ∂²l/∂α∂β = −3

(information matrix)  I = −E[Hessian matrix] = [ 5  3 ]
                                               [ 3  2 ]

covariance matrix = I⁻¹ = 1/((5)(2) − (3)(3)) · [  2  −3 ]  =  [  2  −3 ]
                                                [ −3   5 ]     [ −3   5 ]

since the determinant is (5)(2) − (3)(3) = 1.
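A one-line numerical check of this inversion (a sketch, assuming NumPy is available):

```python
import numpy as np

# Invert the information matrix from the example and compare with the
# hand-computed covariance matrix [[2, -3], [-3, 5]].
I = np.array([[5.0, 3.0], [3.0, 2.0]])
cov = np.linalg.inv(I)   # determinant is 1, so the inverse equals the adjugate
```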
Example. A single observation, x, is taken from a normal distribution with mean µ = 0 and variance σ² = θ. The normal distribution has its probability density function given by

f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²))

Let θ̂ be the maximum likelihood estimator of θ. Which of the following is the variance of θ̂?

(A) 1/θ   (B) 1/θ²   (C) 1/(2θ)   (D) 2θ   (E) 2θ²
Solution.

µ = 0 , σ² = θ  ⇒  L(θ) = (1/√(2πθ)) e^(−x²/(2θ)) = (2πθ)^(−1/2) e^(−x²/(2θ))

l(θ) = C − (1/2) ln(θ) − x²/(2θ)   with some constant C

l′(θ) = −1/(2θ) + x²/(2θ²)  ⇒  l′′(θ) = 1/(2θ²) − x²/θ³

But E[x²] = Var(x) = θ. Therefore:

E[l′′(θ)] = 1/(2θ²) − θ/θ³ = −1/(2θ²)  ⇒  I(θ) = −E[l′′(θ)] = 1/(2θ²)

asymptotic variance = 1/I(θ) = 2θ²
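For a single observation the MLE here is θ̂ = x² (set l′(θ) = 0 in the solution above), and since E[x⁴] = 3θ², its variance is exactly 3θ² − θ² = 2θ². A simulation sketch, assuming NumPy is available:

```python
import numpy as np

# One observation x ~ N(0, theta) gives the MLE theta_hat = x^2.
# Check by simulation that Var(theta_hat) = 2 * theta^2.
rng = np.random.default_rng(1)
theta = 1.5
x = rng.normal(0.0, np.sqrt(theta), size=1_000_000)
var_mc = (x**2).var()    # should be near 2 * theta**2 = 4.5
```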
The following asymptotic variances of MLEs are worth remembering:

Lognormal:               Var(µ̂) = σ²/n    Var(σ̂) = σ²/(2n)    Cov(µ̂ , σ̂) = 0
Pareto with fixed θ:     Var(α̂) = α²/n
Pareto with fixed α:     Var(θ̂) = (α + 2)θ²/(nα)
Weibull with fixed τ:    Var(θ̂) = θ²/(nτ²)
Example ∗. Derive the lognormal entries of the table above; that is, for a sample x1 , ..., xn from a lognormal distribution, find the variance-covariance matrix of (µ̂ , σ̂).

Solution. Since ln(xi) is normal with parameters µ and σ²,

l(µ , σ) = C − n ln(σ) − Σ(ln(xi) − µ)²/(2σ²)

∂l/∂µ = Σ(ln(xi) − µ)/σ² = (Σ ln(xi) − nµ)/σ²

∂²l/∂µ² = −n/σ²  ⇒  E[−∂²l/∂µ²] = n/σ²

∂²l/∂µ∂σ = −2Σ(ln(xi) − µ)/σ³  ⇒  E[−∂²l/∂µ∂σ] = 0

∂l/∂σ = Σ(ln(xi) − µ)²/σ³ − n/σ

∂²l/∂σ² = −3Σ(ln(xi) − µ)²/σ⁴ + n/σ²  ⇒  E[−∂²l/∂σ²] = 3nσ²/σ⁴ − n/σ² = 2n/σ²

I(µ , σ) = [ n/σ²    0     ]  ⇒  variance-covariance matrix Σ = I⁻¹ = [ σ²/n    0       ]
           [ 0       2n/σ² ]                                          [ 0       σ²/(2n) ]
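These entries can be spot-checked by simulation (a sketch, assuming NumPy; here µ̂ is the mean of the log-data and σ̂² their mean squared deviation):

```python
import numpy as np

# Empirical check of Var(mu_hat) = sigma^2/n and Var(sigma_hat) ~ sigma^2/(2n)
# for lognormal data (so the log-data y = ln(x) are N(mu, sigma^2)).
rng = np.random.default_rng(2)
mu, sigma, n, reps = 1.0, 0.8, 200, 20_000

y = rng.normal(mu, sigma, size=(reps, n))        # y = ln(x)
mu_hat = y.mean(axis=1)
sigma_hat = np.sqrt(((y - mu_hat[:, None]) ** 2).mean(axis=1))

var_mu = mu_hat.var()        # ~ sigma^2 / n    = 0.0032
var_sigma = sigma_hat.var()  # ~ sigma^2 / (2n) = 0.0016
```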
Delta Method

Theorem. If θ̂ ≈ N(θ , σ²/n), then g(θ̂) ≈ N(g(θ) , g′(θ)² σ²/n).

In the professional exams, here is how it is used (we really don’t pay careful attention as to whether the distribution is normal or not):
g(X) = g(α) + g′(α)(X − α) + (1/2) g′′(α)(X − α)² + · · ·  ⇒  g(X) ≈ g(α) + g′(α)(X − α)

For a function of two estimators:

Var(g(X, Y)) ≈ [ ∂g/∂x  ∂g/∂y ] [ Var(X)      Cov(X, Y) ] [ ∂g/∂x ]
                                [ Cov(X, Y)   Var(Y)    ] [ ∂g/∂y ]
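A sketch of this bivariate delta-method variance (the point estimates and covariance matrix below are illustrative numbers only, assuming NumPy):

```python
import numpy as np

# Delta method for g(X, Y) = X * Y with an assumed covariance matrix
# of the estimators.
x, y = 3.0, 2.0                        # point estimates
cov = np.array([[0.10, 0.02],
                [0.02, 0.05]])         # [[Var(X), Cov], [Cov, Var(Y)]]

grad = np.array([y, x])                # (dg/dx, dg/dy) at (x, y) for g = x*y
var_g = grad @ cov @ grad              # approximate Var(g(X, Y))
```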
Example. Claim size X follows a single-parameter Pareto distribution with known parameter θ = 50. We estimate α to be 4 with variance 0.3 (variance of the estimator). Calculate the variance of the estimate of Pr(X < 100).

Solution. For the single-parameter Pareto, Pr(X < 100) = 1 − (50/100)^α = 1 − (0.5)^α. If we denote the estimator of α by Y, then the estimator of Pr(X < 100) is g(Y) = 1 − (0.5)^Y, with g′(Y) = (0.5)^Y ln 2. Then, using the delta method:

Var(g(Y)) = g′(4)² Var(Y) = (0.0433)²(0.3) = 0.00056 ✓
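A quick numerical check of this computation (a sketch; the derivative of 1 − 0.5^α is (0.5)^α ln 2):

```python
import math

# Delta method for g(alpha) = 1 - 0.5**alpha at alpha_hat = 4
# with Var(alpha_hat) = 0.3 (numbers from the example).
alpha_hat, var_alpha = 4.0, 0.3

g_prime = 0.5**alpha_hat * math.log(2)   # derivative of 1 - 0.5**a
var_g = g_prime**2 * var_alpha           # delta-method variance
```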
(ii) Estimate the variance of the MLE and use it to construct a 95% confidence interval for
E(X ∧ 500)
L(α) = α^5 · 1000^(5α) ∏_{j=1}^{5} (1000 + xj)^(−α−1)

(log-likelihood)  l(α) = 5 ln(α) + 5α ln(1000) − (α + 1) Σ_{j=1}^{5} ln(1000 + xj)

l′(α) = 5/α + 5 ln(1000) − Σ_{j=1}^{5} ln(1000 + xj) = 5/α + 34.5388 − 35.8331

l′(α) = 0  ⇒  α̂ = 3.8629
∂² ln f(x)/∂α² = −1/α²

(based on n = 5 observations)  I(α) = −nE[∂² ln f(x)/∂α²] = n/α²

Invert it to get:

Var(α̂) = α²/n  ⇒  (estimated) Var(α̂) = α̂²/n = (3.8629)²/5 = 2.9844
g(α) = E(X ∧ 500) = ∫_0^500 x f(x) dx + ∫_500^∞ 500 f(x) dx
     = ∫_0^500 x α 1000^α (1000 + x)^(−α−1) dx + ∫_500^∞ 500 α 1000^α (1000 + x)^(−α−1) dx
     = (using the Pareto integrals) 1000/(α − 1) − (2/3)^α · 1500/(α − 1)

g′(α) = −1000/(α − 1)² + (2/3)^α · 1500/(α − 1)² − (2/3)^α ln(2/3) · 1500/(α − 1)

α̂ = 3.8629  ⇒  g(α̂) = 239.88 ,  g′(α̂) = −39.428

Var(g(α̂)) ≈ (g′(α̂))² Var(α̂) = (−39.428)²(2.9844) = 4639.45

confidence interval = g(α̂) ± √(Var(g(α̂))) · z₀.₀₂₅ = 239.88 ± (√4639.45)(1.96) = 239.88 ± 133.50
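The whole chain of this example can be reproduced numerically (a sketch; only the two summary statistics appearing in the log-likelihood are needed, and the derivative of g is taken by a central difference):

```python
import math

# Pareto with theta = 1000 fixed, n = 5 observations, summarized by
# sum(ln(1000 + x_j)) = 35.8331 (value taken from the example).
n, S = 5, 35.8331
alpha_hat = n / (S - n * math.log(1000))      # solves l'(alpha) = 0
var_alpha = alpha_hat**2 / n                  # estimated Var(alpha_hat)

def g(a):
    # E(X ^ 500) for a Pareto with parameters alpha = a, theta = 1000
    return 1000 / (a - 1) - (2 / 3) ** a * 1500 / (a - 1)

# numerical derivative of g at alpha_hat (central difference)
h = 1e-6
g_prime = (g(alpha_hat + h) - g(alpha_hat - h)) / (2 * h)

var_g = g_prime**2 * var_alpha                # delta-method variance
half_width = 1.96 * math.sqrt(var_g)          # 95% CI half-width
```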
Example ∗. At this point, Examples 62.5 and 62.6 of Finan’s study guide were solved in class.
3. Calculations for qx

For the future-lifetime random variable T, the probability of death within t years for a life aged x is

t qx = P(x < T ≤ x + t | x < T) = P(x < T ≤ x + t) / P(x < T)

     = [P(x < T ≤ x + t) / P(x < T ≤ x + 1)] · [P(x < T ≤ x + 1) / P(x < T)]

Under a uniform distribution of deaths over the year of age, the first factor equals t for 0 ≤ t ≤ 1, so t qx = t · qx.

One problem of interest is the calculation of the MLE for the probability

qx = P(x < T ≤ x + 1 | x < T).
Example. A cohort of 500 individuals of age x is observed. The study ends at age x + 1. Five deaths were observed, and as many as 350 of them left the study at age x + 0.7. Assuming a uniform distribution of death times within the year, find the MLE of qx.

Solution. Under the uniform assumption, the probability of death by time x + 0.7 is 0.7qx, and therefore the probability of surviving to age x + 0.7 is 1 − 0.7qx. The remaining 500 − 5 − 350 = 145 individuals survive to age x + 1, each with probability 1 − qx. Therefore, the likelihood function will be:

L(qx) = qx^5 (1 − 0.7qx)^350 (1 − qx)^145

The log-likelihood:

l(qx) = 5 ln(qx) + 350 ln(1 − 0.7qx) + 145 ln(1 − qx)
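The maximizer can be found numerically; a sketch using bisection on l′(qx), which is strictly decreasing on (0, 1) because l is concave:

```python
# Maximize l(q) = 5 ln(q) + 350 ln(1 - 0.7 q) + 145 ln(1 - q)
# by bisection on its derivative.
def dl(q):
    return 5 / q - 245 / (1 - 0.7 * q) - 145 / (1 - q)

lo, hi = 1e-6, 1 - 1e-6        # dl > 0 near 0, dl < 0 near 1
for _ in range(100):
    mid = (lo + hi) / 2
    if dl(mid) > 0:
        lo = mid
    else:
        hi = mid

q_hat = (lo + hi) / 2          # the MLE of qx
```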
Example ∗. In a one-year mortality study on ten lives of age x, three withdrawals occur at time 0.4 and one death is observed. Mortality is assumed to have a uniform distribution. Determine the maximum likelihood estimate of qx.

Solution.

L(qx) = qx (1 − 0.4qx)^3 (1 − qx)^6

The derivative of the log-likelihood:

dl/dqx = 1/qx − 1.2/(1 − 0.4qx) − 6/(1 − qx)

Setting dl/dqx = 0 and clearing denominators gives 4qx² − 8.6qx + 1 = 0, whose root in (0, 1) is q̂x = (8.6 − √57.96)/8 ≈ 0.1234.
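Setting this derivative to zero reduces, after clearing denominators, to a quadratic; a quick check of the resulting root (the quadratic 4qx² − 8.6qx + 1 = 0 is derived here, not stated explicitly in the notes):

```python
import math

# Clearing denominators in 1/q - 1.2/(1 - 0.4 q) - 6/(1 - q) = 0
# gives 4 q^2 - 8.6 q + 1 = 0; take the root lying in (0, 1).
disc = 8.6**2 - 4 * 4 * 1
q_hat = (8.6 - math.sqrt(disc)) / (2 * 4)

# sanity check: the derivative of the log-likelihood vanishes at q_hat
residual = 1 / q_hat - 1.2 / (1 - 0.4 * q_hat) - 6 / (1 - q_hat)
```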