
Estimation Theory
Part 3

Prof M S Prasad, Amity Univ

This lecture note is based on a large number of textbooks on optimal signal processing and is suitable for graduate/postgraduate students. It should be read in conjunction with classroom discussion.
ESTIMATION THEORY : LN-3
A signal detection problem is centered on a receiver that observes a noisy version of a signal and decides which hypothesis is true among the many possible hypotheses. In the binary case, the receiver has to decide between the null hypothesis H0 and the alternative hypothesis H1. Suppose the receiver has decided on the true hypothesis, but some parameter of the signal is still unknown, e.g. amplitude, phase, or the type of disturbance. Estimation theory then comes in, helping to estimate such parameters optimally from a finite set of data samples. The parameter to be estimated may be random or nonrandom. The estimation of random parameters is known as Bayes' estimation, while the estimation of nonrandom parameters is referred to as maximum likelihood estimation (MLE).

Conversely, detection theory often requires estimation of unknown parameters: signal presence is assumed, parameter estimates are incorporated into the detection statistic, and the consistency of the observations with the assumptions is tested. Consequently, detection and estimation theory form a symbiotic relationship, each requiring the other to yield high-quality signal processing algorithms.

Detection is a science while estimation is an art. Understanding the problem, in particular which error criterion to choose, and devising innovative algorithms are key to the estimation procedure.

1. Terminology in Estimation Theory

The parameter estimation problem is to determine, from a set of L observations represented by the L-dimensional vector r, the values of parameters denoted by the vector θ. We write the estimate of this parameter vector as θ^(r), where the "hat" denotes the estimate and the functional dependence on r explicitly denotes the dependence of the estimate on the observations. This dependence is always present, but we frequently denote the estimate compactly as θ^. Because of the probabilistic nature of the problem, a parameter estimate is itself a random vector, having its own statistical characteristics.

The estimation error ε(r) equals the estimate minus the actual parameter value: ε(r) = θ^(r) − θ. It too is a random quantity and is often used in the criterion function. For example, the mean-squared error is given by E[εᵀε]; the minimum mean-squared error estimate would minimize this quantity. The mean-squared error matrix is E[εεᵀ]; on its main diagonal, the entries are the mean-squared estimation errors for each component of the parameter vector, whereas the off-diagonal terms express the correlation between the errors. The mean-squared estimation error E[εᵀε] equals the trace of the mean-squared error matrix: tr(E[εεᵀ]).
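
As a small numerical illustration (this example is mine, not part of the original note), the Python sketch below estimates the mean-squared error matrix of a simple two-parameter estimate by Monte Carlo and confirms that its trace equals the total mean-squared estimation error; the true parameter values, the sample-mean estimator and the noise level are assumptions made only for the demonstration.

# --- illustrative Python sketch (assumptions: toy two-parameter problem, sample-mean estimator) ---
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])      # assumed true parameter vector
L, trials = 50, 20000

errors = np.empty((trials, 2))
for t in range(trials):
    r = theta + rng.normal(0.0, 1.0, size=(L, 2))   # L noisy observations
    theta_hat = r.mean(axis=0)                      # estimate θ^(r): the sample mean
    errors[t] = theta_hat - theta                   # estimation error ε(r)

mse_matrix = errors.T @ errors / trials             # estimate of E[ε εᵀ]
total_mse = np.mean(np.sum(errors**2, axis=1))      # estimate of E[εᵀ ε]
print(mse_matrix)
print(total_mse, np.trace(mse_matrix))              # the two values agree (up to Monte Carlo error)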

Bias

An estimate is said to be unbiased if the expected value of the estimate equals the true value of the parameter: E[θ^ | θ] = θ. Otherwise, the estimate is said to be biased: E[θ^ | θ] ≠ θ. The bias b(θ) is usually considered to be additive, so that b(θ) = E[θ^ | θ] − θ.

When we have a biased estimate, the bias usually depends on the number of observations L. An estimate is said to be asymptotically unbiased if the bias tends to zero for large L: lim_{L→∞} b(θ) = 0. An estimate's variance equals the mean-squared estimation error only if the estimate is unbiased.

An unbiased estimate has a probability distribution whose mean equals the actual value of the parameter. Should the lack of bias be considered a desirable property? If many unbiased estimates are computed from statistically independent sets of observations having the same parameter value, the average of these estimates will be close to this value. This property does not mean that the estimate has less error than a biased one; there exist biased estimates whose mean-squared errors are smaller than those of unbiased ones. In such cases, the biased estimate is usually asymptotically unbiased. Lack of bias is good, but it is just one aspect of how we evaluate estimators.

Consistency

We term an estimate consistent if the mean-squared estimation error tends to zero as the number of observations becomes large: lim_{L→∞} E[εᵀε] = 0. Thus, a consistent estimate must be at least asymptotically unbiased. Unbiased estimates do exist whose errors never diminish as more data are collected: their variances remain nonzero no matter how much data are available. Inconsistent estimates may provide reasonable estimates when the amount of data is limited, but have the counterintuitive property that the quality of the estimate does not improve as the number of observations increases. Although appropriate in the proper circumstances (smaller mean-squared error than a consistent estimate over a pertinent range of values of L), consistent estimates are usually favored in practice.

Efficiency

Estimators can be derived in a variety of ways, but their error characteristics must always be analyzed and compared. In practice, many problems and the estimators derived for them are sufficiently complicated to render analytic studies of the errors difficult, if not impossible. Instead, numerical simulation and comparison with lower bounds on the estimation error are frequently used to assess estimator performance.

An efficient estimate has a mean-squared error that equals a particular lower bound: the Cramér-Rao bound. If an efficient estimate exists (i.e., the Cramér-Rao bound is the greatest lower bound), it is optimum in the mean-squared sense: no other estimate has a smaller mean-squared error.

For many problems no efficient estimate exists. In such cases, the Cramér-Rao bound remains a lower bound, but its value is smaller than that achievable by any estimator. How much smaller is usually not known. However, practitioners frequently use the Cramér-Rao bound in comparisons with numerical error calculations. Another issue is the choice of mean-squared error as the estimation criterion; it may not suffice to fully assess estimator performance in a particular problem. Nevertheless, nearly every problem is subjected to a Cramér-Rao bound computation, and the existence of an efficient estimate is considered.

Some Criteria for a Good Estimator

Since the estimator θ̂ is a random variable and may assume more than one value, some characteristics of a "good" estimate need to be determined.

Unbiased Estimate: We say θ̂ is an unbiased estimator for θ if E[θ̂] = θ.

Biased Estimate: the estimate is biased when E[θ̂] = θ + B(θ), where B is the bias function. If B does not depend on θ, we say the bias is known; if it depends on θ, we say the bias is unknown.

In the case of an unbiased estimator, the true value is matched on average. However, it may still not be the best estimate, since its variance could be large. Hence the second property to check for an unbiased estimate is whether the variance is small.

Minimum Variance Unbiased: θ̂ is a minimum variance unbiased (MVU) estimate of θ if it is unbiased and, for every estimate θ′ such that E[θ′] = θ, var[θ̂] ≤ var[θ′]. That is, θ̂ has the smallest variance among all unbiased estimates of θ.

Consistent Estimate: θ̂ is a consistent estimate of the parameter θ, based on N observations, when the probability that the estimation error exceeds any δ > 0 tends to zero as N grows, that is

lim_{N→∞} P[ |θ̂ − θ| > δ ] = 0.

Instead of this, a simpler pair of criteria is sometimes used:

lim_{N→∞} E[θ̂] = θ and lim_{N→∞} var[θ̂] = 0.

This gives a consistent estimate.

Maximum Likelihood Estimators


When the a priori density of a parameter is not known, or the parameter itself is inconveniently described as a random variable, techniques must be developed that make no presumption about the relative possibilities of parameter values. Lacking this knowledge, we can expect the error characteristics of the resulting estimates to be worse than those of estimates that can use it.

The maximum likelihood estimate θ^ML(r) of a nonrandom parameter is, simply, that value which maximizes the likelihood function (the conditional density p(r | θ) of the observations given the parameter). Assuming that the maximum can be found by evaluating a derivative, θ^ML(r) is defined by

∂ p(r | θ) / ∂θ = 0, evaluated at θ = θ^ML.

The logarithm of the likelihood function may also be used in this maximization.

Example 1

Let r(l), l = 0, ..., L−1, be a sequence of independent, identically distributed Gaussian random variables having an unknown mean θ but a known variance σn². Often we cannot assign a probability density to a parameter of a random variable's density; we simply do not know what the parameter's value is. Maximum likelihood estimates are often used in such problems. In the specific case here, the derivative of the logarithm of the likelihood function equals

∂ ln p(r | θ) / ∂θ = (1/σn²) Σ_{l=0}^{L−1} ( r(l) − θ ).

Setting this to zero, the solution equals the sample average:

θ^ML = (1/L) Σ_{l=0}^{L−1} r(l).

The expected value of this estimate, E[θ^ML | θ], equals the actual value θ, showing that the maximum likelihood estimate is unbiased. The mean-squared error equals σn²/L, from which we infer that this estimate is consistent.
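
A minimal Python sketch of this example (the numerical values of θ, σn and L are assumptions for the demonstration): it forms the ML estimate as the sample average over each record and checks empirically that the estimate is unbiased with mean-squared error close to σn²/L.

# --- illustrative Python sketch of Example 1 (assumed values for theta, sigma_n, L) ---
import numpy as np

rng = np.random.default_rng(1)
theta, sigma_n, L, trials = 3.0, 2.0, 100, 50000

r = rng.normal(theta, sigma_n, size=(trials, L))    # many independent records of length L
theta_ml = r.mean(axis=1)                           # ML estimate = sample average

print(theta_ml.mean())                    # close to theta  -> unbiased
print(theta_ml.var(), sigma_n**2 / L)     # empirical MSE close to sigma_n^2 / L  -> consistent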

Parameter Vectors

The maximum likelihood procedure, or any other procedure, can be easily generalized to situations where more than one parameter must be estimated. Letting θ denote the parameter vector, the likelihood function is now expressed as p(r | θ). The maximum likelihood estimate θ^ML of the parameter vector is given by the location of the maximum of the likelihood function (or, equivalently, of its logarithm).

Using derivatives, the calculation of the maximum likelihood estimate becomes

∇θ ln p(r | θ) = 0, evaluated at θ = θ^ML,

where ∇θ denotes the gradient with respect to the parameter vector. This equation means that we must estimate all of the parameters simultaneously by setting the partial derivative of the likelihood function with respect to each parameter to zero. Given P parameters, we must in most cases solve a set of P nonlinear simultaneous equations to find the maximum likelihood estimates.
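
As an illustration of estimating several parameters jointly (my example, not from the note), the sketch below finds the ML estimates of the mean and variance of Gaussian data by numerically maximizing the log-likelihood over the two parameters, and compares them with the known closed-form answers (the sample mean and the 1/L-normalized sum of squared deviations).

# --- illustrative Python sketch: joint ML estimation of a parameter vector (mean, variance) ---
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
L = 500
r = rng.normal(1.5, 2.0, size=L)            # data with assumed true mean 1.5 and std 2.0

def neg_log_likelihood(params):
    mu, log_var = params                    # work with log-variance so the variance stays positive
    var = np.exp(log_var)
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (r - mu) ** 2 / var)

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))   # solves the simultaneous equations numerically
mu_ml, var_ml = res.x[0], np.exp(res.x[1])

print(mu_ml, r.mean())                        # numerical ML vs sample mean
print(var_ml, np.mean((r - r.mean()) ** 2))   # numerical ML vs 1/L sample variance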



Maximum A Posteriori Estimation

In those cases in which the expected value of the a posteriori density cannot be computed, a related but simpler estimate, the maximum a posteriori (MAP) estimate, can usually be evaluated. The estimate θ^MAP(r) equals the location of the maximum of the a posteriori density. Assuming that this maximum can be found by evaluating a derivative, the MAP estimate is the solution of the equation

∂ [ ln p(r | θ) + ln p(θ) ] / ∂θ = 0, evaluated at θ = θ^MAP.

The only quantities required to compute the MAP estimate are the likelihood function and the a priori density of the parameter.
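
A short sketch of the MAP idea for the Gaussian-mean example with a Gaussian prior on θ (the prior parameters mu0 and sigma0, and all numerical values, are assumptions for the demonstration): it locates the maximum of ln p(r | θ) + ln p(θ) numerically on a grid and compares it with the closed-form value obtained by setting the derivative to zero.

# --- illustrative Python sketch: MAP estimate of a Gaussian mean with an assumed Gaussian prior ---
import numpy as np

rng = np.random.default_rng(3)
theta_true, sigma_n, L = 2.0, 1.0, 25
mu0, sigma0 = 0.0, 3.0                       # assumed prior: theta ~ N(mu0, sigma0^2)

r = rng.normal(theta_true, sigma_n, size=L)

def log_posterior(theta):
    log_lik = -0.5 * np.sum((r - theta) ** 2) / sigma_n**2   # ln p(r | theta) up to a constant
    log_prior = -0.5 * (theta - mu0) ** 2 / sigma0**2        # ln p(theta) up to a constant
    return log_lik + log_prior

grid = np.linspace(-5.0, 10.0, 20001)
theta_map_grid = grid[np.argmax([log_posterior(t) for t in grid])]

# closed form from d/dtheta [ ln p(r|theta) + ln p(theta) ] = 0:
theta_map = (np.sum(r) / sigma_n**2 + mu0 / sigma0**2) / (L / sigma_n**2 + 1 / sigma0**2)
print(theta_map_grid, theta_map)             # both values agree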

Section II

Linear Least Squares Estimator

Suppose we have a known matrix F relating an unknown vector X to the observations Y, and we assume a linear relation between them of the form

Y = FX + e, where e is a residual error, or noise, about which we are not certain.

To estimate X we can formulate the problem as follows: minimize the squared error e subject to the relation Y = FX + e, in such a way that FX becomes approximately equal to Y,

i.e. min ‖e‖² = eᵀe, or min ‖FX − Y‖²,

or Z(X) = ‖FX − Y‖² = (FX − Y)ᵀ(FX − Y) = XᵀFᵀFX − XᵀFᵀY − YᵀFX + YᵀY.

To minimize this we take the first derivative with respect to X and equate it to zero:

∂Z(X)/∂X = 2FᵀFX̂ − 2FᵀY = 0, where X̂ is the estimated value of X,

i.e. X̂ = (FᵀF)⁻¹FᵀY.

Provided F has full column rank, FᵀF is invertible and the solution can be found.
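
A minimal Python sketch of the solution X̂ = (FᵀF)⁻¹FᵀY (F, X and the noise level are arbitrary choices for the demonstration); in numerical practice np.linalg.lstsq is preferred over explicitly forming the inverse, and the sketch shows both give the same answer.

# --- illustrative Python sketch: least-squares solution via the normal equations ---
import numpy as np

rng = np.random.default_rng(4)
F = rng.normal(size=(100, 3))                # assumed full-column-rank model matrix
X_true = np.array([1.0, -0.5, 2.0])
Y = F @ X_true + 0.1 * rng.normal(size=100)  # observations Y = F X + e

X_hat = np.linalg.inv(F.T @ F) @ F.T @ Y          # normal-equation solution
X_lstsq, *_ = np.linalg.lstsq(F, Y, rcond=None)   # numerically preferable equivalent

print(X_hat)
print(X_lstsq)                               # both agree and are close to X_true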

Case 1. Weighted Least Squares

We assign a weighting matrix, say L, to the error e, so that we have the condition: minimize the squared error subject to Y = FX + Le, which can be rewritten as

L⁻¹Y = L⁻¹FX + e.

In this case X̂ = (FᵀWF)⁻¹FᵀWY, where W = (LLᵀ)⁻¹.
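
A sketch of the weighted form X̂ = (FᵀWF)⁻¹FᵀWY with W taken as the inverse of the noise covariance LLᵀ (the particular F, L and noise values below are assumptions for the demonstration); the weighted estimate typically beats the unweighted one when the noise variances differ strongly across observations.

# --- illustrative Python sketch: weighted least squares with W = (L L^T)^{-1} ---
import numpy as np

rng = np.random.default_rng(5)
F = rng.normal(size=(200, 2))
X_true = np.array([2.0, -1.0])
L_mat = np.diag(np.linspace(0.1, 2.0, 200))     # noise shaping: Y = F X + L e, with e ~ (0, I)
Y = F @ X_true + L_mat @ rng.normal(size=200)

W = np.linalg.inv(L_mat @ L_mat.T)              # weight matrix = inverse noise covariance
X_wls = np.linalg.solve(F.T @ W @ F, F.T @ W @ Y)
X_ols = np.linalg.solve(F.T @ F, F.T @ Y)

print(X_wls)    # typically closer to X_true ...
print(X_ols)    # ... than the unweighted estimate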

Is it an Unbiased Estimator?

For an estimator to be unbiased, the mean of the estimated value should equal the true value of the parameter.

Here we have X̂ = (FᵀF)⁻¹FᵀY = (FᵀF)⁻¹Fᵀ(FX + e) = X + (FᵀF)⁻¹Fᵀe.

Now we take the expectation of this equation (since the expectation gives the mean of a random variable):

E[X̂] = X + (FᵀF)⁻¹Fᵀ E[e] = X,

assuming e is a random variable with zero mean (and some variance σ²), so that E[e] = 0.

What is the covariance in the case e ~ (0, I), i.e. zero mean and unit variance?

The covariance of the estimated X is equal to E[(X̂ − X)(X̂ − X)ᵀ].

Simplifying, X̂ − X = (FᵀF)⁻¹FᵀY − X = (FᵀF)⁻¹Fᵀ(FX + e) − X = (FᵀF)⁻¹Fᵀe. Taking this expression and calculating the covariance, we find

E[(X̂ − X)(X̂ − X)ᵀ] = (FᵀF)⁻¹Fᵀ E[eeᵀ] F(FᵀF)⁻¹ = (FᵀF)⁻¹ when E[eeᵀ] = I.

Hence it is an unbiased estimator, since the mean of the estimate equals the true value of the variable to be estimated.
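
The following Monte Carlo sketch (my illustration, with an arbitrary F) checks both claims numerically: the average of X̂ over many noise realizations is close to the true X, and for e ~ (0, I) the empirical covariance of X̂ is close to (FᵀF)⁻¹.

# --- illustrative Python sketch: unbiasedness and covariance of the least-squares estimate ---
import numpy as np

rng = np.random.default_rng(6)
F = rng.normal(size=(40, 2))
X_true = np.array([1.0, 3.0])
trials = 20000

G = np.linalg.inv(F.T @ F) @ F.T             # (F^T F)^{-1} F^T
X_hats = np.empty((trials, 2))
for t in range(trials):
    e = rng.normal(size=40)                  # e ~ (0, I)
    X_hats[t] = G @ (F @ X_true + e)

print(X_hats.mean(axis=0), X_true)           # means agree  -> unbiased
print(np.cov(X_hats.T))                      # empirical covariance ...
print(np.linalg.inv(F.T @ F))                # ... close to (F^T F)^{-1}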

Linear Parameter estimator

In the previous example we assumed that the matrix F was known and we estimated X. Sometimes we need to estimate the parameters of the relationship themselves, based on observations. This problem, formulated in a linear manner, can be expressed as follows:

Ŷ = AX + B, and here we have to find A and B such that the error in estimation is minimum, i.e.

the error e = E[ (Y − Ŷ)² ] is minimum.

Substituting the relation for Ŷ, e = E[ (Y − AX − B)² ].

∂e/∂A = −2 E[ (Y − AX − B) X ] = −2 E[XY] + 2A E[X²] + 2B mX = 0, where E[X] = mX denotes the mean.



Also,

∂e/∂B = −2 E[ (Y − AX − B) ] = 0.

Solving for A and B we get

A = ( E[XY] − mX mY ) / σX² = Cov(X, Y) / σX²,

B = ( E[X²] mY − E[XY] mX ) / σX² = mY − A mX, since E[X²] − mX² = σX².

Thus Ŷ = A (X − mX) + mY, where the value of A can be substituted from above.
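
A sketch of these formulas on synthetic data (the joint distribution of X and Y is my assumption): A and B are computed from sample moments and compared with the slope and intercept returned by np.polyfit.

# --- illustrative Python sketch: linear parameter estimation from moments ---
import numpy as np

rng = np.random.default_rng(7)
N = 100000
X = rng.normal(2.0, 1.5, size=N)
Y = 0.8 * X + 1.0 + rng.normal(0.0, 0.5, size=N)   # Y linearly related to X plus noise

mx, my = X.mean(), Y.mean()
A = (np.mean(X * Y) - mx * my) / X.var()    # Cov(X, Y) / var(X)
B = my - A * mx                             # equivalent to (E[X^2] mY - E[XY] mX) / var(X)

slope, intercept = np.polyfit(X, Y, 1)      # reference least-squares line fit
print(A, slope)                             # both near 0.8
print(B, intercept)                         # both near 1.0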

Minimum Variance Unbiased Estimator (MVUE)

On average, an estimator should yield the true value of the unknown parameter, i.e.

E[Ŝ] = S, where Ŝ = g(X) and X = { x1, x2, ..., x(N−1) }ᵀ is the observation vector.

For an unbiased estimator, E[Ŝ] = ∫ g(x) p(x; S) dx = S.

The bias of the estimator is b(S) = E[Ŝ] − S.

The mean square error (MSE) of the estimator is E[ (Ŝ − S)² ], which after expansion and taking the expectation operation gives

MSE[Ŝ] = var[Ŝ] + b²(S).

This shows that the error is composed of an error due to the variance and an error due to the bias.

Any estimator whose construction depends on the (unknown) bias will be unrealizable. Hence we set the bias to zero and seek the MVUE.
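
A small Monte Carlo sketch (my own example, not from the note) of the decomposition MSE[Ŝ] = var[Ŝ] + b²(S), comparing the biased (1/N) and unbiased (1/(N−1)) estimators of the variance of Gaussian data.

# --- illustrative Python sketch: MSE = variance + bias^2 ---
import numpy as np

rng = np.random.default_rng(8)
true_var, N, trials = 4.0, 10, 200000
x = rng.normal(0.0, np.sqrt(true_var), size=(trials, N))

s_biased = x.var(axis=1, ddof=0)       # 1/N estimator of the variance
s_unbiased = x.var(axis=1, ddof=1)     # 1/(N-1) estimator of the variance

for name, s in [("biased", s_biased), ("unbiased", s_unbiased)]:
    mse = np.mean((s - true_var) ** 2)
    var = s.var()
    bias = s.mean() - true_var
    print(name, mse, var + bias**2)    # the two numbers agree (up to Monte Carlo error)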



Section III

Correlation cancelers

A correlation canceler is the best linear processor/estimator, and hence it is a central concept in a number of optimum signal processing problems.

Assume vectors x and y of dimensions N and M respectively, both having zero mean. We assume that the two signals are correlated with each other, i.e.

Rxy = E[xyᵀ] ≠ 0.

To remove such correlations we use a linear transformation of the form

e = x − Hy,

where the N×M matrix H is chosen in such a manner that e and y are no longer correlated:

Rey = E[eyᵀ] = 0.

Rey = E[eyᵀ] = E[(x − Hy)yᵀ] = E[xyᵀ] − H E[yyᵀ] = Rxy − H Ryy = 0,

i.e. H = Rxy Ryy⁻¹ = E[xyᵀ] E[yyᵀ]⁻¹.

For the covariance matrix we have

Ree = E[eeᵀ] = E[e(xᵀ − yᵀHᵀ)] = Rex − Rey Hᵀ = Rex = E[(x − Hy)xᵀ],

or

Ree = Rxx − H Ryx = Rxx − Rxy Ryy⁻¹ Ryx.

The vector

x̂ = Hy = Rxy Ryy⁻¹ y = E[xyᵀ] E[yyᵀ]⁻¹ y

is obtained by linearly processing the vector y by the matrix H, which is called the linear regression, or orthogonal projection, of x on the vector y. In a sense, x̂ also represents the best "copy," or estimate, of x that can be made on the basis of the vector y.

Thus, the vector e = x − Hy = x − x̂ may be thought of as an estimation error. Alternatively, x̂ = Hy may be regarded not as an estimate of x but rather as an estimate of that part of x which is correlated with y.



If x has a part x1 which is correlated with y, then this part will tend to be canceled as much as
possible from the output e. The linear processor H accomplishes this by converting y into the
best possible copy ˆx1 of x1 and then proceeds to cancel it from the output. The output vector
e is no longer correlated with y. The part x2 of x which is uncorrelated with y remains entirely
unaffected.
Mean-Square-Sense Best Estimator

We have

Ree = E[eeᵀ] = E[(x − Hy)(xᵀ − yᵀHᵀ)] = Rxx − H Ryx − Rxy Hᵀ + H Ryy Hᵀ.

Minimizing this expression with respect to H, that is, setting ∂Ree/∂H = 0, yields the optimum choice of H:

Hopt = Rxy Ryy⁻¹,

with the minimum value for Ree given by

Ree = Rxx − Rxy Ryy⁻¹ Ryx.
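
A numeric sketch of the correlation canceler (the joint statistics of x and y below are assumed only for illustration): it forms H = Rxy Ryy⁻¹ from sample covariances, then checks that e = x − Hy is (nearly) uncorrelated with y and that the residual covariance matches Rxx − Rxy Ryy⁻¹ Ryx.

# --- illustrative Python sketch: correlation canceler H = Rxy Ryy^{-1} ---
import numpy as np

rng = np.random.default_rng(9)
T = 200000
y = rng.normal(size=(2, T))                          # reference vector, M = 2
A = np.array([[1.0, 0.5], [0.2, -1.0], [0.0, 0.3]])
x = A @ y + rng.normal(size=(3, T))                  # x (N = 3): part correlated with y + independent part

Rxy = x @ y.T / T                                    # sample E[x y^T]
Ryy = y @ y.T / T
Rxx = x @ x.T / T

H = Rxy @ np.linalg.inv(Ryy)                         # optimum canceler / regression matrix
e = x - H @ y                                        # canceler output

print(e @ y.T / T)                                   # ~ zero matrix: e uncorrelated with y
print(e @ e.T / T)                                   # residual covariance ...
print(Rxx - Rxy @ np.linalg.inv(Ryy) @ Rxy.T)        # ... equals Rxx - Rxy Ryy^{-1} Ryx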

Inference 1.

If x and y are jointly Gaussian, show that the linear estimate x̂ = Hy is also the conditional mean E[x | y] of the vector x given y.

Proof

We know that under a linear transformation a Gaussian random vector remains Gaussian. Also, if two jointly Gaussian vectors are uncorrelated, then they are also independent of each other. The transformation from the jointly Gaussian pair (x, y) to the uncorrelated pair (e, y) is linear:

[ e ]   [ I_N  −H  ] [ x ]
[ y ] = [ 0    I_M ] [ y ],

where I_N and I_M are identity matrices of sizes N and M respectively.

The conditional mean of x can be written from

x = x̂ + e = Hy + e,

noting that if y is given, then Hy is no longer random. Therefore

E[x | y] = E[(Hy + e) | y] = Hy + E[e | y].

Since e and y are independent, the conditional mean E[e | y] is the same as the unconditional mean E[e], which is zero by the zero-mean assumption. Thus,

E[x | y] = Hy.

The conditional mean E[x|y] is the best unrestricted (i.e., not necessarily linear) estimate of x in
the mean-square sense.
Inference 2.

For a random vector x with mean m and covariance Σ, the best choice of a deterministic vector x̂ which minimizes the quantity Ree = E[eeᵀ], where e = x − x̂, is the mean m itself, that is, x̂ = m. Also, for this optimal choice of x̂, the actual minimum value of the quantity Ree is the covariance Σ.
Proof

Assume that there is a deviation of x̂ from the mean m, that is, x̂ = m + Δ.

Then Ree becomes

Ree = E[eeᵀ] = E[(x − m − Δ)(x − m − Δ)ᵀ] = E[(x − m)(x − m)ᵀ] − E[x − m]Δᵀ − Δ E[xᵀ − mᵀ] + ΔΔᵀ = Σ + ΔΔᵀ,

since E[x − m] = E[x] − m = 0.

Since the matrix ΔΔᵀ is non-negative definite, it follows that Ree will be minimized when Δ = 0, and in this case the minimum value is min Ree = Σ.

Appendix

Cramér-Rao Lower Bound (refer to the classroom discussion for details)

If the pdf p(x; θ) satisfies the regularity condition

E[ ∂ ln p(x; θ) / ∂θ ] = 0 for all θ,

then the variance of any unbiased estimator θ̂ must satisfy

var(θ̂) ≥ 1 / E[ −∂² ln p(x; θ) / ∂θ² ].

An unbiased estimator attaining the bound can be found if and only if

∂ ln p(x; θ) / ∂θ = I(θ) ( g(x) − θ )

for some functions g and I; the estimator θ̂ = g(x) then attains the bound, and its minimum variance is 1/I(θ).
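
A short sketch for the Gaussian-mean example of this note (numerical values assumed for the demonstration): the Fisher information of L independent samples is L/σn², so the Cramér-Rao bound is σn²/L, and the sample-mean estimator attains it, which the Monte Carlo check below confirms.

# --- illustrative Python sketch: Cramér-Rao bound for the Gaussian-mean example ---
import numpy as np

rng = np.random.default_rng(10)
theta, sigma_n, L, trials = 1.0, 2.0, 50, 100000

crlb = sigma_n**2 / L                       # 1 / I(theta), with Fisher information I(theta) = L / sigma_n^2

r = rng.normal(theta, sigma_n, size=(trials, L))
theta_hat = r.mean(axis=1)                  # unbiased estimator: the sample mean

print(theta_hat.var(), crlb)                # the estimator's variance attains the bound (efficient)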

