L. Magee                                                          Fall, 2008

1. Given:
   - an observation-specific log likelihood function $\ell_i(\theta) = \ln f(y_i|x_i, \theta)$
   - the log likelihood function $\ell(\theta|y, X) = \sum_{i=1}^{n} \ell_i(\theta)$
   - a data set $(x_i, y_i)$, $i = 1, \ldots, n$
   - a value $\hat{\theta}$ for the maximum likelihood estimator of the parameter vector $\theta$

   briefly describe how you would compute
   (a) the negative Hessian estimator of the variance of $\hat{\theta}$
   (b) the outer product of gradient (OPG) estimator of the variance of $\hat{\theta}$
   (c) a misspecification-consistent variance estimator that follows from interpreting the ML estimator as a method of moments estimator

2. The random variable $y$ has a probability density function
   $f(y) = (1 - \theta) + 2\theta y$ for $0 < y < 1$; $f(y) = 0$ otherwise,
   for $-1 < \theta < 1$. There are $n$ observations $y_i$, $i = 1, \ldots, n$, drawn independently from this distribution.
   (a) (i) Write the cumulative distribution function of $y$.
       (ii) Derive the expected value of $y$.
       (iii) Suggest a method of moments estimator for $\theta$ based on the sample mean $\bar{y}$.
   (b) (i) Write the log likelihood function for $\theta$.
       (ii) Write the first-order condition for the ML estimator of $\theta$.

3. $y_1, \ldots, y_n$ are $n$ independent draws from an exponential distribution. The probability density function of each $y_i$ is $f(y_i|\theta) = \theta^{-1} \exp(-y_i/\theta)$, where $y_i > 0$ and $\theta > 0$. The exponential distribution has the property $E(y_i) = \theta$.
   (a) Derive
       (i) the observation-specific log likelihood function $\ell_i(\theta)$
       (ii) the log likelihood function $\ell(\theta)$
       (iii) the maximum likelihood (ML) estimator of $\theta$, $\hat{\theta}$.
   (b) Derive the following estimators of the variance of $\hat{\theta}$, showing their general formulas as part of your answer.
       (i) the negative Hessian variance estimator
       (ii) the Information matrix variance estimator
       (iii) the outer product of gradient (OPG) variance estimator
       (iv) the misspecification-consistent variance estimator that follows from interpreting the ML estimator as a method of moments estimator

4. Given observations on the scalar $x_i$, $i = 1, \ldots, n$, each $y_i$ is independently drawn according to the conditional pdf
   $f(y_i|x_i, \theta) = (\theta x_i)^{-1} \exp\!\left(-\frac{y_i}{\theta x_i}\right)$
   where $y_i > 0$, $x_i > 0$, and $\theta > 0$. $\theta$ is an unknown scalar parameter.
   (a) Write the observation-specific log likelihood function $\ell_i(\theta)$
   (b) Write the log likelihood function $\ell(\theta) = \sum_i \ell_i(\theta)$
   (c) Derive the maximum likelihood (ML) estimator of $\theta$.
   (d) In this model, $E(y_i|x_i, \theta) = \theta x_i$. Using this fact, suggest another consistent estimator of $\theta$ that is different from the ML estimator in (c). No explanation is required.

5. (16 marks: 4 for each part) Let $y_i$, $i = 1, \ldots, n$ be independently-observed non-negative integers drawn from a Poisson distribution
   $\text{Prob}(y_i|\theta) = \frac{\theta^{y_i} e^{-\theta}}{y_i!}, \quad y_i = 0, 1, 2, \ldots$
   (a) Write the observation-specific log likelihood function $\ell_i(\theta)$
   (b) Write the log likelihood function $\ell(\theta) = \sum_i \ell_i(\theta)$
   (c) Derive $\hat{\theta}$, the maximum likelihood (ML) estimator of $\theta$.
   (d) Derive an estimator of the variance of $\hat{\theta}$ using any one of the four standard methods.
Answers

1. (a) the negative Hessian estimator:
   $V_a = \left( -\sum_{i=1}^{n} \frac{\partial^2 \ell_i}{\partial\theta\,\partial\theta'} \right)^{-1}$, evaluated at $\theta = \hat{\theta}$

   (b) the OPG estimator:
   $V_b = \left( \sum_{i=1}^{n} \left( \frac{\partial \ell_i}{\partial\theta} \right) \left( \frac{\partial \ell_i}{\partial\theta} \right)' \right)^{-1}$, where the $\partial \ell_i/\partial\theta$'s are evaluated at $\theta = \hat{\theta}$
   (c) misspecification-consistent estimator: given the definitions in (a) and (b), it can be written as $V_c = V_a V_b^{-1} V_a$, or
   $V_c = \left( -\sum_{i=1}^{n} \frac{\partial^2 \ell_i}{\partial\theta\,\partial\theta'} \right)^{-1} \left( \sum_{i=1}^{n} \left( \frac{\partial \ell_i}{\partial\theta} \right) \left( \frac{\partial \ell_i}{\partial\theta} \right)' \right) \left( -\sum_{i=1}^{n} \frac{\partial^2 \ell_i}{\partial\theta\,\partial\theta'} \right)^{-1}$

2. (a) (i) The probability density function $f(y) = 0$ when $y < 0$ and $f(y) = 0$ when $y > 1$. Therefore when $y < 0$, the cdf is $F(y) = \int_{-\infty}^{y} f(s)\,ds = 0$. When $0 < y < 1$,
   $F(y) = \int_0^y \left( (1 - \theta) + 2\theta s \right) ds = (1 - \theta)y + \theta y^2$
   and $F(y) = 1$ when $y \geq 1$.
   (ii) $E(y) = \int_0^1 y\left( (1 - \theta) + 2\theta y \right) dy = \left( \tfrac{1}{2}(1 - \theta)y^2 + \tfrac{2}{3}\theta y^3 \right) \Big|_{y=0}^{1} = \tfrac{1}{2}(1 - \theta) + \tfrac{2}{3}\theta = \tfrac{1}{2} + \tfrac{1}{6}\theta$
   (iii) From (ii), $E(y) = \tfrac{1}{2} + \tfrac{1}{6}\theta$, which gives a population moment condition
   $E\!\left( y - (\tfrac{1}{2} + \tfrac{1}{6}\theta) \right) = 0$
   The sample moment condition is
   $n^{-1} \sum_{i=1}^{n} \left( y_i - \tfrac{1}{2} - \tfrac{1}{6}\theta \right) = 0$
   which can be written as $\bar{y} - \tfrac{1}{2} - \tfrac{1}{6}\theta = 0$, and the estimator is $\hat{\theta} = 6\bar{y} - 3$.
   (b) (i) $\ell(\theta) = \sum_{i=1}^{n} \ln(1 - \theta + 2\theta y_i)$
   (ii) $\partial \ell(\theta)/\partial\theta = \sum_{i=1}^{n} \frac{2y_i - 1}{1 - \theta + 2\theta y_i} = 0$ at $\theta = \hat{\theta}$
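The method of moments estimator in (a)(iii) can be checked numerically by drawing from $f(y)$ via the inverse of the cdf derived in (a)(i). A minimal sketch, assuming NumPy; the true value $\theta = 0.5$, the seed, and the sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.5       # arbitrary true value in (-1, 1)
n = 200_000

# Inverse-cdf sampling: solve F(y) = (1-theta)*y + theta*y**2 = u
# i.e. theta*y^2 + (1-theta)*y - u = 0, taking the root in (0, 1).
u = rng.uniform(size=n)
y = (-(1 - theta) + np.sqrt((1 - theta) ** 2 + 4 * theta * u)) / (2 * theta)

# Method of moments estimator from the sample mean: theta_hat = 6*ybar - 3
theta_mom = 6 * y.mean() - 3
print(theta_mom)  # should be close to the true theta = 0.5
```

With a large sample, $6\bar{y} - 3$ lands close to the true $\theta$, consistent with $E(y) = \tfrac{1}{2} + \tfrac{1}{6}\theta$.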
3. (a) (i) $\ell_i(\theta) = -\ln(\theta) - y_i/\theta$
   (ii) $\ell(\theta) = \sum_{i=1}^{n} \ell_i(\theta) = -n\ln(\theta) - \sum_{i=1}^{n} y_i/\theta$
   (iii) $\hat{\theta}$ is the value of $\theta$ that solves $\partial\ell/\partial\theta = 0$.
   $\partial\ell/\partial\theta = -\frac{n}{\theta} + \frac{\sum_{i=1}^{n} y_i}{\theta^2}$
   Therefore $-\frac{n}{\hat{\theta}} + \frac{\sum_{i=1}^{n} y_i}{\hat{\theta}^2} = 0$ gives $\hat{\theta} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}$

   (b) (i) $\partial^2\ell/\partial\theta^2 = \frac{n}{\theta^2} - \frac{2\sum_{i=1}^{n} y_i}{\theta^3}$
   Evaluated at $\theta = \hat{\theta} = \bar{y}$, this is $\frac{n}{\hat{\theta}^2} - \frac{2n\hat{\theta}}{\hat{\theta}^3} = -\frac{n}{\hat{\theta}^2}$.
   The negative Hessian variance estimator is
   $V_1 = \left( -\partial^2\ell(\hat{\theta})/\partial\theta^2 \right)^{-1} = \frac{\hat{\theta}^2}{n}$
   (ii) The Information matrix is minus one times the expected value of the second derivative matrix derived in part (i). The exponential density assumption implies $E(y_i) = \theta$, so
   $E\!\left( \partial^2\ell/\partial\theta^2 \right) = \frac{n}{\theta^2} - \frac{2n\theta}{\theta^3} = \frac{n}{\theta^2} - \frac{2n}{\theta^2} = -\frac{n}{\theta^2}$
   The Information matrix variance estimator is the inverse of the Information matrix, evaluated at $\hat{\theta}$:
   $V_2 = \left( \frac{n}{\hat{\theta}^2} \right)^{-1} = \frac{\hat{\theta}^2}{n}$
   (iii) Evaluate the gradient, or first derivative, of $\ell_i$ at $\hat{\theta}$:
   $\partial\ell_i(\theta)/\partial\theta = -\frac{1}{\theta} + \frac{y_i}{\theta^2} = \frac{y_i - \theta}{\theta^2}$, which at $\theta = \hat{\theta}$ equals $\frac{y_i - \bar{y}}{\hat{\theta}^2}$
   For notational convenience, use $\hat{\sigma}^2 = n^{-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$. Then
   $\sum_{i=1}^{n} \left( \frac{\partial\ell_i(\hat{\theta})}{\partial\theta} \right)^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{\hat{\theta}^4} = \frac{n\hat{\sigma}^2}{\hat{\theta}^4}$
   The outer product of gradient (OPG) variance estimator is the inverse of this:
   $V_3 = \frac{\hat{\theta}^4}{n\hat{\sigma}^2}$
   (Aside: $V_3$ has the odd feature that $\hat{\sigma}^2$ appears in the denominator rather than the numerator. But it turns out that for the exponential distribution, $\text{Var}(y_i) = \theta^2$. Since $\text{plim}(\hat{\sigma}^2) = \text{Var}(y_i)$, then as $n \to \infty$, $\hat{\sigma}^2$ and $\hat{\theta}^2$ both converge to $\theta^2$. So as $n \to \infty$, $V_3$ becomes close to $\frac{\hat{\theta}^2}{n}$.)
   (iv) $V_4 = V_1 (V_3)^{-1} V_1 = \left( \frac{\hat{\theta}^2}{n} \right) \left( \frac{n\hat{\sigma}^2}{\hat{\theta}^4} \right) \left( \frac{\hat{\theta}^2}{n} \right) = \frac{\hat{\sigma}^2}{n}$
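The four estimators can be compared on simulated exponential data. A minimal sketch, assuming NumPy; the true value $\theta = 2$, the seed, and the sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 2.0, 100_000
y = rng.exponential(scale=theta, size=n)

theta_hat = y.mean()                    # ML estimator: theta_hat = ybar
sigma2 = np.mean((y - theta_hat) ** 2)  # n^{-1} * sum (y_i - ybar)^2

V1 = theta_hat ** 2 / n                 # negative Hessian estimator
V2 = theta_hat ** 2 / n                 # Information matrix estimator (same formula here)
V3 = theta_hat ** 4 / (n * sigma2)      # OPG estimator
V4 = sigma2 / n                         # misspecification-consistent (sandwich) estimator

print(V1, V3, V4)  # all close to theta**2 / n = 4e-5 at this sample size
```

Because the model is correctly specified and $\text{Var}(y_i) = \theta^2$ for the exponential distribution, all four values converge to $\theta^2/n$, as noted in the aside to part (iii).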
4. (a) $\ell_i(\theta) = -\ln(\theta) - \ln(x_i) - \frac{y_i}{\theta x_i}$
   (b) $\ell(\theta) = \sum_{i=1}^{n} \ell_i(\theta) = -n\ln(\theta) - \sum_i \ln(x_i) - \frac{1}{\theta}\sum_i \frac{y_i}{x_i}$
   (c) $\partial\ell(\theta)/\partial\theta = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_i \frac{y_i}{x_i}$
   This equals zero when $-n + \frac{1}{\hat{\theta}}\sum_i \frac{y_i}{x_i} = 0$, so $\hat{\theta} = n^{-1}\sum_i \frac{y_i}{x_i}$
   (d) Since $E(y_i|x_i, \theta) = \theta x_i$, then $E\,m(y_i, x_i, \theta) = 0$ where $m = y_i - \theta x_i$. This population moment condition leads to the sample moment condition $n^{-1}\sum_i (y_i - \theta x_i) = 0$. Solving for $\theta$ gives $\tilde{\theta} = \frac{\sum_i y_i}{\sum_i x_i} = \frac{\bar{y}}{\bar{x}}$
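Both estimators can be checked on simulated data. A minimal sketch, assuming NumPy; the true value $\theta = 1.5$, the distribution of the $x_i$, the seed, and the sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n = 1.5, 100_000
x = rng.uniform(0.5, 2.0, size=n)       # arbitrary positive regressor
y = rng.exponential(scale=theta * x)    # so that E(y_i | x_i) = theta * x_i

theta_ml = np.mean(y / x)               # ML estimator from (c)
theta_mm = y.mean() / x.mean()          # moment estimator ybar/xbar from (d)
print(theta_ml, theta_mm)               # both close to the true theta = 1.5
```

Both are consistent; they differ in finite samples because the ML estimator weights each observation by $1/x_i$ while the moment estimator uses only the two sample means.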
5. (a) $\ell_i(\theta) = y_i \ln(\theta) - \theta - \ln(y_i!)$
   (b) $\ell(\theta) = \sum_i \ell_i(\theta) = \ln(\theta) \sum_i y_i - n\theta - \sum_i \ln(y_i!)$
   (c) $\partial\ell(\theta)/\partial\theta = \frac{\sum_i y_i}{\theta} - n = 0$ at $\theta = \hat{\theta}$, so $\hat{\theta} = \frac{\sum_i y_i}{n} = \bar{y}$
   (d) Using the negative Hessian method: $\partial^2\ell(\theta)/\partial\theta^2 = -\frac{\sum_i y_i}{\theta^2}$, evaluated at $\theta = \hat{\theta}$, and therefore
   $\hat{V}(\hat{\theta}) = \left( \frac{\sum_i y_i}{\hat{\theta}^2} \right)^{-1} = \frac{\hat{\theta}^2}{\sum_i y_i} = \frac{\hat{\theta}^2}{n\hat{\theta}} = \frac{\hat{\theta}}{n} = \frac{\bar{y}}{n}$
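A minimal numerical sketch of (c) and (d), assuming NumPy; the true value $\theta = 3$, the seed, and the sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n = 3.0, 100_000
y = rng.poisson(lam=theta, size=n)

theta_hat = y.mean()     # ML estimator: theta_hat = ybar
V = theta_hat / n        # negative Hessian estimate: theta_hat**2 / sum(y_i) = ybar/n
print(theta_hat, V)
```

Note that $\hat{\theta}/n = \bar{y}/n$ matches the usual variance of a sample mean here, since $\text{Var}(y_i) = \theta$ for the Poisson distribution.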