

Properties of OLS estimators




Population regression line: E(y|x) = \beta_1 + \beta_2 x
Observation = systematic component + random error:

    y_i = \beta_1 + \beta_2 x_i + u_i

Sample regression line estimated using OLS estimators:

    \hat{y} = b_1 + b_2 x

Observation = estimated relationship + residual:

    y_i = \hat{y}_i + e_i  =>  y_i = b_1 + b_2 x_i + e_i


Assumptions underlying the model:
1. Linear model: u_i = y_i - \beta_1 - \beta_2 x_i
2. Error terms have mean zero: E(u_i|x) = 0  =>  E(y_i|x_i) = \beta_1 + \beta_2 x_i
3. Error terms have constant variance (independent of x):
   Var(u_i|x) = \sigma^2 = Var(y_i|x)  (homoscedastic errors)
4. Cov(u_i, u_j) = Cov(y_i, y_j) = 0  (no autocorrelation)
5. x is not a constant and is fixed in repeated samples.

Additional assumption:
6. u_i ~ N(0, \sigma^2)  =>  y_i ~ N(\beta_1 + \beta_2 x_i, \sigma^2)
\beta_1, \beta_2 are population parameters.
Estimators of \beta_1, \beta_2:

    b_2 = \frac{N \sum_{i=1}^N x_i y_i - \sum_{i=1}^N x_i \sum_{i=1}^N y_i}{N \sum_{i=1}^N x_i^2 - \left( \sum_{i=1}^N x_i \right)^2}
        = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^N (x_i - \bar{x})^2}
        = \frac{cov(x, y)}{var(x)}

    b_1 = \bar{y} - b_2 \bar{x}

b_1, b_2 are OLS estimators of the population parameters.
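
As a concrete illustration, the two equivalent formulas for b_2 and the formula for b_1 can be checked numerically. A minimal Python/numpy sketch; the data-generating values (intercept 250, slope 0.75, error s.d. 25) are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: x = monthly income, y = monthly expenditure
x = rng.uniform(100, 1000, size=50)
y = 250.0 + 0.75 * x + rng.normal(0, 25.0, size=50)

# Slope: b2 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: b1 = ybar - b2 * xbar
b1 = y.mean() - b2 * x.mean()

# Equivalent form: sample covariance of (x, y) over sample variance of x
b2_alt = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
assert np.isclose(b2, b2_alt)
print(b1, b2)
```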

Actual estimates of b_1, b_2 depend on the random sample of data and
vary between samples => b_1, b_2 are random variables:
they follow a distribution, i.e. have a mean (expected value)
and a variance.

- Find expected values, variances of b_1, b_2 and the covariance
  between them, i.e. find the sampling distribution
- How do b_1, b_2 compare with other estimators of \beta_1, \beta_2?

Sampling Distribution:
Variance of estimator b_2:

    Var(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}

standard error of b_2:  se(b_2) = \sqrt{Var(b_2)}

Var(b_2) varies positively with \sigma^2 and negatively with \sum (x_i - \bar{x})^2.

Variance of estimator b_1:

    Var(b_1) = \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2}

standard error of b_1:  se(b_1) = \sqrt{Var(b_1)}

Var(b_1) increases in \sigma^2 and \sum x_i^2, and decreases in N and
\sum (x_i - \bar{x})^2, where \sigma^2 is the unknown population variance of u_i.
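To see the sampling-distribution result in action, the theoretical Var(b_2) can be compared with the spread of b_2 over repeated samples. A simulation sketch with hypothetical parameter values (x is held fixed across samples, matching assumption 5):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, beta1, beta2, N = 25.0, 250.0, 0.75, 50   # hypothetical values
x = rng.uniform(100, 1000, size=N)               # fixed in repeated samples

# Theoretical variance of b2: sigma^2 / sum((x - xbar)^2)
var_b2_theory = sigma**2 / np.sum((x - x.mean())**2)

# Repeated samples: redraw the errors, keep x fixed
b2_draws = []
for _ in range(20_000):
    y = beta1 + beta2 * x + rng.normal(0, sigma, size=N)
    b2_draws.append(np.sum((x - x.mean()) * (y - y.mean()))
                    / np.sum((x - x.mean())**2))

print(var_b2_theory, np.var(b2_draws))  # the two should be close
```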
Covariance:
Estimators b_1, b_2 are functions of y_i (the sample data), so the
estimators are correlated because both depend on y_i,
i.e. they are functions of the same sample:

    Cov(b_1, b_2) = \frac{-\bar{x} \sigma^2}{\sum (x_i - \bar{x})^2}

=> the covariance is negative if \bar{x} > 0, i.e. if the slope coefficient is
underestimated then the intercept is overestimated.
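
The sign of the covariance can be checked in the same repeated-sampling setup (again with hypothetical values; since \bar{x} > 0 here, the simulated covariance should come out negative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(100, 1000, size=50)   # xbar > 0, fixed across samples

b1s, b2s = [], []
for _ in range(20_000):
    y = 250.0 + 0.75 * x + rng.normal(0, 25.0, size=50)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b1s.append(y.mean() - b2 * x.mean())
    b2s.append(b2)

print(np.cov(b1s, b2s)[0, 1])                           # simulated covariance
print(-x.mean() * 25.0**2 / np.sum((x - x.mean())**2))  # theoretical value
```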

Probability Distribution of estimators
Estimators b_1, b_2 are normally distributed because of assumption 6,
i.e. u_i ~ N(0, \sigma^2)  =>  y_i ~ N(\beta_1 + \beta_2 x_i, \sigma^2).

b_1, b_2 are linear functions of the normally distributed variable y_i, i.e.

    b_2 = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^N (x_i - \bar{x})^2}
        = \sum_{i=1}^N k_i (y_i - \bar{y})   where   k_i = \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}

so b_2 is a linear function of y_i.

The Central Limit Theorem (Lecture 2) implies that the distribution of
b_1, b_2 will approach the normal as N (sample size) gets larger, if the
first five assumptions hold.

Probability distributions of OLS estimators:

    b_1 ~ N\left( \beta_1,\; \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2} \right)

    b_2 ~ N\left( \beta_2,\; \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \right)

Statistical Properties of OLS estimators
1. Linear Estimator

    b_2 = \sum_{i=1}^N k_i (y_i - \bar{y})   where   k_i = \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}

2. Unbiasedness
The average or expected value of b_2 equals the true value \beta_2,
i.e. E(b_2) = \beta_2: on average OLS gets it right.

Algebraic Proof

    b_2 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - (\sum x_i)^2}

Assume the regression model is very simple: y_i = \beta_2 x_i + u_i, and
substitute for y_i:

    b_2 = \frac{N \sum x_i (\beta_2 x_i + u_i) - \sum x_i \sum (\beta_2 x_i + u_i)}{N \sum x_i^2 - (\sum x_i)^2}

=>  b_2 = \frac{\beta_2 \left( N \sum x_i^2 - (\sum x_i)^2 \right) + N \sum x_i u_i - \sum x_i \sum u_i}{N \sum x_i^2 - (\sum x_i)^2}

=>  b_2 = \beta_2 + \frac{N \sum x_i u_i - \sum x_i \sum u_i}{N \sum x_i^2 - (\sum x_i)^2}

Taking expectations (x is fixed in repeated samples):

    E(b_2) = \beta_2 + \frac{N \sum x_i E(u_i) - \sum x_i \sum E(u_i)}{N \sum x_i^2 - (\sum x_i)^2}

=>  E(b_2) = \beta_2,  because E(u_i) = E(x_i u_i) = 0.

Implication
The sampling distribution of the estimator b_2 is centred around the
population parameter \beta_2:
b_2 is an unbiased estimator of \beta_2.

Note
E(x_i u_i) = 0 is crucial; if this is not true then b_2 would be a biased
estimator, i.e. E(b_2) = \beta_2 + an additional term.

[Figure: sampling densities f(b_2), centred at E(b_2) = \beta_2, and f(B_2),
centred away from \beta_2.]

B_2 would be a biased estimator of \beta_2: f(B_2) is not centred at \beta_2.

b_1 is an unbiased estimator of \beta_1 also, i.e. E(b_1) = \beta_1.
The unbiasedness property hinges on the model being correctly specified,
i.e. E(x_i u_i) = 0, E(u_i) = 0.
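
The role of E(x_i u_i) = 0 can be demonstrated by simulation. A sketch under assumed values, contrasting a well-specified model with one whose error is correlated with x (the 0.05 contamination term is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(100, 1000, size=50)
beta2 = 0.75                       # hypothetical true slope

def slope(y, x):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

ok, bad = [], []
for _ in range(10_000):
    u = rng.normal(0, 25.0, size=50)          # E(u|x) = 0
    ok.append(slope(beta2 * x + u, x))
    v = u + 0.05 * (x - x.mean())             # error correlated with x
    bad.append(slope(beta2 * x + v, x))

print(np.mean(ok))    # close to 0.75: unbiased
print(np.mean(bad))   # close to 0.80: biased by the additional term
```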


3. Efficiency

An estimator is efficient if:
- it is unbiased
- no other unbiased estimator has a smaller variance, i.e. it has the
  minimum possible variance (See DG Sect 3.4 & Fig 3.8)

OLS estimators b_1, b_2 are the Best Linear Unbiased Estimators of
\beta_1, \beta_2 when the first 5 assumptions of the linear model hold.

b_1, b_2: linear
          unbiased
          efficient (have smaller variance than any other linear
          unbiased estimator)
          => BLUE

This result is known as the Gauss-Markov Theorem:
- The first 5 assumptions above must hold
- OLS estimators are the best among all linear and unbiased estimators
  because they are efficient, i.e. they have the smallest variance among
  all other linear and unbiased estimators
- Normality is unnecessary to assume; the G-M result does not depend on
  normality of the dependent variable
- G-M refers to the estimators b_1, b_2, not to actual values of b_1, b_2
  calculated from a particular sample
- G-M applies only to linear and unbiased estimators; there are other
  types of estimators which we can use and these may be better, in which
  case disregard G-M, e.g. a biased estimator may be more efficient than
  an unbiased one which fulfills G-M.

4. Consistency
The other properties hold for small samples; consistency is a large
sample property, i.e. an asymptotic property.
As N → ∞, the sampling distributions of the estimators b_1, b_2
collapse onto \beta_1, \beta_2.
This holds if var(b_1), var(b_2) → 0, which is true for b_1, b_2,
so b_1, b_2 are consistent estimators.
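
Consistency can be illustrated by watching the sampling variance of b_2 shrink as N grows; a sketch with hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(4)

def slope(y, x):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

# Sampling spread of b2 shrinks as N grows (hypothetical parameters)
for N in (25, 100, 400, 1600):
    x = rng.uniform(100, 1000, size=N)
    draws = [slope(0.75 * x + rng.normal(0, 25.0, size=N), x)
             for _ in range(2_000)]
    print(N, np.var(draws))   # variance heads toward 0
```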


Estimator for \sigma^2

\sigma^2 is an unknown population parameter: the variance of the
unobservable error terms.

    \hat{\sigma}^2 = \frac{1}{N-2} \sum_{i=1}^N e_i^2

where e_i is the residual from the sample regression function.
\hat{\sigma}^2 is an unbiased estimator of \sigma^2.
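
In practice \hat{\sigma}^2 is computed from the residuals and then plugged into the variance formulas above to obtain estimated standard errors; a sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(100, 1000, size=50)               # hypothetical data
y = 250.0 + 0.75 * x + rng.normal(0, 25.0, size=50)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()

e = y - b1 - b2 * x                               # residuals
sigma2_hat = np.sum(e**2) / (len(y) - 2)          # divide by N - 2

# Estimated standard errors of b2 and b1
se_b2 = np.sqrt(sigma2_hat / np.sum((x - x.mean())**2))
se_b1 = np.sqrt(sigma2_hat * np.sum(x**2)
                / (len(y) * np.sum((x - x.mean())**2)))
print(sigma2_hat, se_b1, se_b2)
```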

Hypothesis testing in Regression

Probability distributions of OLS estimators:

    b_1 ~ N\left( \beta_1,\; \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2} \right)

    b_2 ~ N\left( \beta_2,\; \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \right)

    \hat{\sigma}^2 = \frac{1}{N-2} \sum_{i=1}^N e_i^2

where e_i is the residual from the sample regression function.

Test values of b_1, b_2 calculated from a particular sample against what
we believe to be the true value, e.g. for the consumption example,
\hat{y} = 250.18 + 0.73 x, where y = monthly expenditure and x = monthly
income.
MPC = 0.73; test if MPC = 0.75: is the difference due to sampling error?
(review hypothesis testing from before Lecture 3)

Hypothesis testing Procedure (as before)
1. Formulate null and alternative hypothesis
2. Calculate sample test statistic and specify its distribution
3. Select rejection region and compare to critical value
4. Accept or reject null hypothesis

[Figure: two-sided test, density f(.) with acceptance region (1-\alpha)
between -CV and CV and rejection regions of \alpha/2 in each tail.]

Calculate ts = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} and compare to the critical value of ts for a
rejection region of \alpha:

If |ts| > ts_{\alpha/2}: reject the null hypothesis (hypothesised value is
outside of the (1-\alpha)% CI).
If |ts| < ts_{\alpha/2}: fail to reject the null hypothesis (hypothesised
value is within the (1-\alpha)% CI).

Need se(b_i) to construct the test statistic;
the test statistic will have a t distribution.

Take b_2 ~ N\left( \beta_2,\; \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \right)

1. Transform to a standard normal variable:
If X ~ N(\mu, \sigma^2) and Z = \frac{X - \mu}{\sigma}, then Z ~ N(0,1).

    Z = \frac{b_2 - \beta_2}{\sqrt{Var(b_2)}} ~ N(0,1),   where   Var(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}

and \sigma^2 is unknown. Estimator:

    \hat{\sigma}^2 = \frac{1}{N-2} \sum_{i=1}^N e_i^2

We know u_i ~ N(0, \sigma^2), so (u_i - 0)/\sigma ~ N(0,1) (standard normal).

    \left( \frac{u_i}{\sigma} \right)^2 ~ \chi^2_1

    \sum_{i=1}^N \left( \frac{u_i}{\sigma} \right)^2 = \frac{\sum_{i=1}^N u_i^2}{\sigma^2} ~ \chi^2_N

\sum u_i^2 and \sigma^2 are both unobservable, i.e. they come from the
population regression function u_i = y_i - \beta_1 - \beta_2 x_i.
We estimate them from the sample regression function
e_i = y_i - b_1 - b_2 x_i, but \frac{\sum e_i^2}{\sigma^2} has only N-2 independent \chi^2
variables, because 2 degrees of freedom are lost in calculating b_1, b_2:

    \frac{\sum e_i^2}{\sigma^2} ~ \chi^2_{N-2}

then substituting \hat{\sigma}^2 = \frac{1}{N-2} \sum e_i^2:

    (N-2) \frac{\hat{\sigma}^2}{\sigma^2} ~ \chi^2_{N-2}
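The \chi^2_{N-2} result can be verified by simulation: over repeated samples, \sum e_i^2 / \sigma^2 should behave like a \chi^2 variable with N-2 degrees of freedom (all parameter values below are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
N, sigma = 20, 25.0                      # hypothetical values
x = rng.uniform(100, 1000, size=N)

draws = []
for _ in range(20_000):
    y = 250.0 + 0.75 * x + rng.normal(0, sigma, size=N)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b1 = y.mean() - b2 * x.mean()
    e = y - b1 - b2 * x
    draws.append(np.sum(e**2) / sigma**2)    # = (N-2) * sigma_hat^2 / sigma^2

# Mean and variance of chi-square with N-2 d.f. are N-2 and 2(N-2)
print(np.mean(draws), N - 2)
print(np.var(draws), 2 * (N - 2))
# Formal check: compare empirical draws to the chi2(N-2) distribution
print(stats.kstest(draws, stats.chi2(df=N - 2).cdf))
```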

1.  Z = \frac{b_2 - \beta_2}{\sqrt{Var(b_2)}} ~ N(0,1)

2.  (N-2) \frac{\hat{\sigma}^2}{\sigma^2} ~ \chi^2_{N-2}
Student-t distribution:
If Z ~ N(0,1) and V ~ \chi^2_m, then:

    t = \frac{Z}{\sqrt{V/m}} ~ t_m

If Z = \frac{b_2 - \beta_2}{\sqrt{Var(b_2)}} and V = (N-2) \frac{\hat{\sigma}^2}{\sigma^2}, then

    t = \frac{(b_2 - \beta_2) / \sqrt{Var(b_2)}}{\sqrt{(N-2) \frac{\hat{\sigma}^2}{\sigma^2} / (N-2)}}

Substituting in for Var(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}, the \sigma^2 cancel and the
N-2 cancel:

    t = \frac{b_2 - \beta_2}{\sqrt{\frac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2}}} = \frac{b_2 - \beta_2}{se(b_2)}

The test statistic for b_1, b_2 follows a t-distribution with N-2 d.f.
We have now specified the distribution for tests on the estimates b_1, b_2.
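
Putting the pieces together, the t statistic for a hypothesis about \beta_2 is computed directly from the sample; a sketch with hypothetical data, testing H_0: \beta_2 = 0.75 (scipy is assumed for the t-distribution tail probability):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.uniform(100, 1000, size=50)               # hypothetical sample
y = 250.0 + 0.75 * x + rng.normal(0, 25.0, size=50)
N = len(y)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
e = y - b1 - b2 * x
sigma2_hat = np.sum(e**2) / (N - 2)
se_b2 = np.sqrt(sigma2_hat / np.sum((x - x.mean())**2))

# t = (b2 - beta2_0) / se(b2) ~ t with N-2 d.f. under H0
beta2_0 = 0.75
t = (b2 - beta2_0) / se_b2
p = 2 * stats.t.sf(abs(t), df=N - 2)   # two-sided p-value
print(t, p)
```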
Select a 1- or 2-sided test.

In probability terms:
Two sided:
    P(t < -t_{\alpha/2}) = P(t > t_{\alpha/2}) = \alpha/2,   P(-t_{\alpha/2} < t < t_{\alpha/2}) = 1 - \alpha
One sided:
    P(t < -t_c) = P(t > t_c) = \alpha
For many degrees of freedom and \alpha = 5%, i.e. a 95% confidence level,
the two-sided critical value is t_{\alpha/2} = 1.96.
Confidence intervals:

    P\left( -t_{\alpha/2} < \frac{b_2 - \beta_2}{se(b_2)} < t_{\alpha/2} \right) = 1 - \alpha

    P(b_2 - t_{\alpha/2} \cdot se(b_2) < \beta_2 < b_2 + t_{\alpha/2} \cdot se(b_2)) = 1 - \alpha

Similarly for b_1:

    P(b_1 - t_{\alpha/2} \cdot se(b_1) < \beta_1 < b_1 + t_{\alpha/2} \cdot se(b_1)) = 1 - \alpha
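
The confidence interval follows from the same ingredients; a sketch with hypothetical data (stats.t.ppf gives the t_{\alpha/2} critical value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.uniform(100, 1000, size=50)               # hypothetical data
y = 250.0 + 0.75 * x + rng.normal(0, 25.0, size=50)
N = len(y)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
se_b2 = np.sqrt((np.sum((y - b1 - b2 * x)**2) / (N - 2))
                / np.sum((x - x.mean())**2))

# (1 - alpha) confidence interval: b2 +/- t_{alpha/2} * se(b2)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=N - 2)
print(b2 - t_crit * se_b2, b2 + t_crit * se_b2)
```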
Hypothesis Testing
1. Formulate null and alternative hypothesis: the alternative depends
   on a 1- or 2-tailed test,
   e.g. H_0: \beta_2 = 0,  H_1: \beta_2 \neq 0 (two sided)
2. Specify test statistic and appropriate distribution:

    t = \frac{b_2 - \beta_2}{se(b_2)} ~ t_{N-2}

3. Choose rejection region: \alpha
4. Calculate test statistic for the sample
5. Reject / fail to reject the null hypothesis:
   if |t| > t_{\alpha/2}, reject null hypothesis (two sided)
   if |t| > t_{\alpha}, reject null hypothesis (one sided)
6. State Conclusion
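
Applying the six steps to the consumption example: the notes give b_2 = 0.73 but not se(b_2) or N, so the values below are hypothetical placeholders chosen only to make the procedure runnable:

```python
from scipy import stats

# Consumption example: b2 = 0.73 from the sample regression
# y_hat = 250.18 + 0.73 x.  Step 1: H0: beta2 = 0.75 vs H1: beta2 != 0.75.
# se(b2) and N are hypothetical -- the notes do not report them.
b2, beta2_0, se_b2, N = 0.73, 0.75, 0.021, 40

t = (b2 - beta2_0) / se_b2                     # step 2/4: test statistic
alpha = 0.05                                   # step 3: rejection region
t_crit = stats.t.ppf(1 - alpha / 2, df=N - 2)

# Step 5: two-sided decision rule |t| > t_crit
if abs(t) > t_crit:
    print(f"|t| = {abs(t):.2f} > {t_crit:.2f}: reject H0")
else:
    print(f"|t| = {abs(t):.2f} <= {t_crit:.2f}: fail to reject H0")
```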




Prediction
For a given x, x_0, use estimates b_1, b_2 to predict y_0:

    E(y|x_0) = \hat{y}_0 = b_1 + b_2 x_0

\hat{y}_0 differs from the actual outcome because:
- b_1, b_2 are estimates and not always equal to \beta_1, \beta_2
- randomness occurs
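
Prediction is then a single evaluation of the estimated line; a sketch with hypothetical data (x_0 = 600 is an arbitrary income level):

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.uniform(100, 1000, size=50)               # hypothetical data
y = 250.0 + 0.75 * x + rng.normal(0, 25.0, size=50)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()

x0 = 600.0                  # a chosen income level (hypothetical)
y0_hat = b1 + b2 * x0       # predicted expenditure at x0
print(y0_hat)
```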
Measuring goodness of fit:
How close is the sample regression to the population regression, i.e.
how well does the estimated model fit the data?

y_i = b_1 + b_2 x_i + e_i: a systematic component that is estimated plus
a residual.

Explained variation: \hat{y}_i = b_1 + b_2 x_i
Unexplained variation: e_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i

    y_i = \hat{y}_i + e_i
    y_i - \bar{y} = (\hat{y}_i - \bar{y}) + e_i

y_i - \bar{y}: total variation around the mean
\hat{y}_i - \bar{y}: variation of fitted values around the mean

    \sum_{i=1}^N (y_i - \bar{y})^2 = \sum_{i=1}^N (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^N e_i^2

    Total = Explained + Residual
    sum of   sum of      sum of
    squares  squares     squares

    TSS = ESS + RSS
If the sample regression line fits perfectly: TSS = ESS, i.e. ESS/TSS = 1.
If the sample regression line is very poor: ESS/TSS = 0.

R^2 is used as a measure of goodness of fit:

    R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_i e_i^2}{TSS}

R^2 is known as the coefficient of determination: it is a descriptive
statistic and should not be used as a measure of the quality of the model.

It measures the proportion of variation explained by the linear model.
In the simple linear model, \sqrt{R^2} = |r|, i.e. the correlation
coefficient between x and y:

    0 \le R^2 \le 1,   -1 \le r \le 1


OLS estimators have the following properties:
1. Linear
2. Unbiased
3. Efficient: it has the minimum variance
4. Consistent

These properties are simply a way to determine which estimator to use.
An estimator that is unbiased but does not have the minimum variance is not good.
An estimator that has the minimum variance but is biased is not good.
An estimator that is unbiased and has the minimum variance of all other estimators is the
best (efficient).
