
MAXIMUM LIKELIHOOD ESTIMATORS

To illustrate the principle of maximum likelihood estimation in the context of the linear regression model, let us retain the assumption of a fixed nonstochastic $X$ matrix. The model $y = X\beta + u$ then defines a transformation from $u$ to $y$. The assumption of a multivariate density function for $u$ implies a multivariate density function for $y$, which may be written
$$p(y) = p(u)\left|\frac{\partial u}{\partial y}\right|$$

where $\left|\partial u / \partial y\right|$ indicates the absolute value of the determinant formed from the $n \times n$ matrix of partial derivatives of the elements of $u$ with respect to the elements of $y$:

$$\frac{\partial u}{\partial y} = \begin{bmatrix} \frac{\partial u_1}{\partial y_1} & \frac{\partial u_2}{\partial y_1} & \cdots & \frac{\partial u_n}{\partial y_1} \\ \frac{\partial u_1}{\partial y_2} & \frac{\partial u_2}{\partial y_2} & \cdots & \frac{\partial u_n}{\partial y_2} \\ \vdots & \vdots & & \vdots \\ \frac{\partial u_1}{\partial y_n} & \frac{\partial u_2}{\partial y_n} & \cdots & \frac{\partial u_n}{\partial y_n} \end{bmatrix}$$

In the case of the model postulated in the first equation above, $u = y - X\beta$, so this matrix is the identity matrix, whose determinant is unity. Thus

$$p(y) = p(u).$$
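As a quick illustration, the following Python sketch (the dimensions, design matrix, and coefficients are made up for the example) builds the matrix of partial derivatives of $u = y - X\beta$ with respect to $y$ by finite differences and confirms that it is the identity with unit determinant:

```python
import numpy as np

# Hypothetical dimensions and data, purely for illustration.
rng = np.random.default_rng(0)
n, k = 5, 2
X = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5])

def u_of_y(y):
    return y - X @ beta

y0 = rng.normal(size=n)
eps = 1e-6
# Finite-difference matrix of partials: column j holds the derivatives of
# the elements of u with respect to y_j; for u = y - X @ beta this is I.
J = np.column_stack([
    (u_of_y(y0 + eps * np.eye(n)[:, j]) - u_of_y(y0)) / eps
    for j in range(n)
])
print(np.allclose(J, np.eye(n)), np.linalg.det(J))  # True, ~1.0
```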

It will be recalled that the univariate normal distribution is specified once its mean $\mu$ and its variance $\sigma^2$ are given. The univariate normal density (probability density function, or pdf) of a variable $X$ is given by the familiar formula:

$$p(X) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(X-\mu)^2}{2\sigma^2}\right\}$$
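The formula can be checked numerically; the following sketch (with arbitrary values chosen for $\mu$ and $\sigma^2$) evaluates it directly and compares the result with SciPy's reference implementation:

```python
import numpy as np
from scipy.stats import norm

# Arbitrary illustrative values for the mean and variance.
mu, sigma2 = 1.5, 4.0
x = np.linspace(-5.0, 8.0, 7)

# The familiar formula, evaluated directly.
pdf_formula = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
# SciPy's univariate normal pdf as a reference.
pdf_scipy = norm.pdf(x, loc=mu, scale=np.sqrt(sigma2))
print(np.allclose(pdf_formula, pdf_scipy))  # True
```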

The most important multivariate pdf is the multivariate normal distribution, which is similarly specified in terms of its mean vector $\mu$ and its variance matrix $\Sigma$. When all the $X_i$ in the vector $x = (X_1, X_2, \ldots, X_n)'$ have the same variance $\sigma^2$ and are all pairwise uncorrelated, then $\Sigma = \sigma^2 I$ and the multivariate pdf becomes

$$p(x) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}(x-\mu)'(x-\mu)\right\}$$

If we assume that $u$ is multivariate normal with mean vector $0$ and variance-covariance matrix $\sigma^2 I$, written compactly as $u \sim N(0, \sigma^2 I)$, then this formula for the probability density function (pdf) gives:


$$p(u) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}u'u\right\}$$

Here the prime ($u'$) denotes the transpose of the disturbance vector. Recalling that $u = y - X\beta$, the multivariate probability density function for $y$ is given by:

$$p(y) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)\right\}$$
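As a sanity check, this sketch (using simulated data with assumed values of $\beta$ and $\sigma^2$) evaluates the closed-form density of $y \sim N(X\beta, \sigma^2 I)$ and compares it with SciPy's multivariate normal pdf:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Simulated data; the design, beta, and sigma2 are illustrative assumptions.
rng = np.random.default_rng(1)
n, k, sigma2 = 4, 2, 2.5
X = rng.normal(size=(n, k))
beta = np.array([0.8, -1.1])
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

# Closed form: (2 pi sigma2)^(-n/2) exp{-(y - X beta)'(y - X beta)/(2 sigma2)}.
r = y - X @ beta
p_formula = (2 * np.pi * sigma2) ** (-n / 2) * np.exp(-r @ r / (2 * sigma2))
p_scipy = multivariate_normal.pdf(y, mean=X @ beta, cov=sigma2 * np.eye(n))
print(np.isclose(p_formula, p_scipy))  # True
```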

This density involves both the observations on $y$ and the unknown parameters $\beta$ and $\sigma^2$. Writing $p(y)$ in the form $L(y;\, \beta, \sigma^2)$ emphasizes that it is the probability density for the $y$'s, given the parameters $\beta$ and $\sigma^2$. Alternatively, writing it as $L(\beta, \sigma^2;\, y)$ stresses that for given $y$ it can be regarded as a function of the parameters. It is termed the likelihood function and is conventionally denoted by the symbol $L$. The maximum likelihood principle is to choose as estimators of $\beta$ and $\sigma^2$ the values which maximize the likelihood function, given the sample data $y$. Let $\theta' = [\beta' \;\; \sigma^2]$ denote the (row) vector of unknown parameters and $\hat{\theta}$ the maximum likelihood (ML) estimator. $\hat{\theta}$ is obtained as the solution of the equation

$$\frac{\partial L}{\partial \theta} = 0$$
In practice the derivation of the ML estimators is often simplified by maximizing the logarithm of the likelihood function. To maximize the likelihood we set the first derivative of the log-likelihood, $\ln L$, to zero, that is, we find $\hat{\theta}$ as the solution to

$$\frac{\partial (\ln L)}{\partial \theta} = 0$$
Since

$$\frac{\partial (\ln L)}{\partial \theta} = \frac{1}{L}\frac{\partial L}{\partial \theta}$$
the same vector $\hat{\theta}$ is obtained in either case for any $L > 0$. For convenience, we reproduce the likelihood of $y$:

$$p(y) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)\right\}$$

Taking the natural logarithm of this expression gives:

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)$$
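Since $u = y - X\beta$ has $n$ independent $N(0, \sigma^2)$ elements, this matrix expression should equal the sum of $n$ univariate normal log-densities of the disturbances. The sketch below (simulated data, with assumed true parameter values) verifies this:

```python
import numpy as np
from scipy.stats import norm

# Simulated data; beta_true and sigma2_true are illustrative assumptions.
rng = np.random.default_rng(2)
n, k = 50, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, -2.0, 0.5])
sigma2_true = 1.5
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=n)

def loglik(beta, sigma2):
    # The matrix form of ln L derived in the text.
    r = y - X @ beta
    return (-n / 2 * np.log(2 * np.pi)
            - n / 2 * np.log(sigma2)
            - r @ r / (2 * sigma2))

# Reference: sum of univariate normal log-densities of the disturbances.
check = norm.logpdf(y - X @ beta_true, scale=np.sqrt(sigma2_true)).sum()
print(np.isclose(loglik(beta_true, sigma2_true), check))  # True
```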

Differentiating partially with respect to $\beta$ ($k$ equations) and with respect to $\sigma^2$, and evaluating these derivatives, gives:

$$\frac{\partial (\ln L)}{\partial \beta} = -\frac{1}{2\sigma^2}\left(-2X'y + 2X'X\beta\right) = \frac{1}{\sigma^2}\left(X'y - X'X\beta\right) = 0$$

$$\frac{\partial (\ln L)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\left(y-X\beta\right)'\left(y-X\beta\right) = 0$$

The latter is a first-order condition taken with respect to $\sigma^2$ (not $\sigma$), so the derivative of the $1/\sigma^2$ term is taken by the quotient rule in $\sigma^2$. The simultaneous solution of these $k + 1$ equations gives
$$\hat{\beta} = (X'X)^{-1}X'y$$

and

$$\hat{\sigma}^2 = \frac{e'e}{n}$$

where $e = y - X\hat{\beta}$ is the vector of residuals.
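A short sketch (on simulated data, so the design and parameter values are illustrative) computes both estimators in closed form and verifies that the analytic first-order conditions above vanish at the solution:

```python
import numpy as np

# Simulated data; the design and beta_true are illustrative assumptions.
rng = np.random.default_rng(3)
n, k = 200, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.2, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
e = y - X @ beta_hat                           # residual vector
sigma2_hat = e @ e / n                         # e'e / n

# First-order conditions evaluated at the ML solution: both should be zero.
grad_beta = (X.T @ y - X.T @ X @ beta_hat) / sigma2_hat
grad_sigma2 = -n / (2 * sigma2_hat) + (e @ e) / (2 * sigma2_hat ** 2)
print(np.allclose(grad_beta, 0), np.isclose(grad_sigma2, 0))  # True True
```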

The maximum likelihood estimator $\hat{\beta}$ is seen, in this case, to be identical with the OLS estimator $b$. The estimator of $\sigma^2$, however, differs from the unbiased estimator $s^2$ (the estimated variance of the disturbance term in the OLS model) by the factor $(n - k)/n$.¹ This illustrates the fact that ML estimators are not necessarily unbiased. In this application $\hat{\beta}$ is an unbiased estimator of $\beta$, but $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$.

¹ The estimator of the variance in the ordinary least squares model was found to be $s^2 = e'e/(n - k)$.
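A Monte Carlo sketch (with an arbitrary small design) illustrates the bias factor: over repeated samples the average of $\hat{\sigma}^2$ settles near $((n-k)/n)\,\sigma^2$, while the average of $s^2$ settles near $\sigma^2$:

```python
import numpy as np

# Illustrative small design where the bias factor (n - k)/n is noticeable.
rng = np.random.default_rng(4)
n, k, sigma2 = 20, 4, 2.0
X = rng.normal(size=(n, k))
beta = rng.normal(size=k)
H = np.linalg.solve(X.T @ X, X.T)              # (X'X)^{-1} X'

draws_ml, draws_s2 = [], []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ (H @ y)                        # OLS/ML residuals
    draws_ml.append(e @ e / n)                 # ML estimate of sigma2
    draws_s2.append(e @ e / (n - k))           # unbiased s^2

print(np.mean(draws_ml), (n - k) / n * sigma2)  # both close to 1.6
print(np.mean(draws_s2), sigma2)                # both close to 2.0
```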

