ECO 318
Instructor: Kanika Mahajan
Textbooks
Jeffrey M. Wooldridge [JMW], Introductory Econometrics: A
Modern Approach, 4th or 5th edition
Causality
Regression based, understanding a particular effect
Forecasting
Explanatory power of the model
Descriptive analysis
Consider the simple single-variable model:
Y = \beta_0 + \beta_1 X + u
How is \beta_1 interpreted if X is continuous?
How important is descriptive analysis?
Read the articles below; we will discuss them in the next class. The links will be emailed.
Suppose X in the previous model is not the only variable that affects the outcome. There is another variable, correlated with X, that also affects the outcome:
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u
Interpret \beta_0, \beta_1, \beta_2?
Under what assumptions can the above be estimated using OLS?
RECAP: MULTIPLE LINEAR REGRESSION (OLS)
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + u

OLS chooses \hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k to minimize

\min \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{i1} - \hat{\beta}_2 X_{i2} - \dots - \hat{\beta}_k X_{ik})^2

The first-order conditions are:

\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{i1} - \hat{\beta}_2 X_{i2} - \dots - \hat{\beta}_k X_{ik}) = 0
\sum_{i=1}^{n} X_{i1} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{i1} - \hat{\beta}_2 X_{i2} - \dots - \hat{\beta}_k X_{ik}) = 0
\vdots
\sum_{i=1}^{n} X_{ik} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{i1} - \hat{\beta}_2 X_{i2} - \dots - \hat{\beta}_k X_{ik}) = 0
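The slides give no code for this step; as a sketch, note that the stacked first-order conditions above are exactly the normal equations X'X\hat{\beta} = X'Y, which can be checked numerically (the data below are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
# Design matrix: a column of ones (intercept) plus k regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)

# Solve the normal equations X'X b = X'Y (the stacked first-order conditions)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# FOCs: residuals sum to zero and are orthogonal to every regressor
resid = Y - X @ beta_hat
print(beta_hat)
print(X.T @ resid)  # each entry should be ~0 up to floating-point error
```

Each entry of `X.T @ resid` corresponds to one of the first-order conditions above.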
Goodness of fit
\mathrm{SST} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \quad \mathrm{SSE} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \quad \mathrm{SSR} = \sum_{i=1}^{n} \hat{u}_i^2
R^2 = \mathrm{SSE}/\mathrm{SST} = 1 - \mathrm{SSR}/\mathrm{SST}
Alternatively, R^2 is the squared correlation coefficient between Y_i and the predicted \hat{Y}_i.
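A quick numerical check of these identities, on simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
u_hat = Y - Y_hat

SST = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
SSE = np.sum((Y_hat - Y.mean()) ** 2)  # explained sum of squares
SSR = np.sum(u_hat ** 2)               # residual sum of squares

R2 = SSE / SST
# SST = SSE + SSR, and R^2 equals the squared correlation of Y and Y_hat
print(R2, 1 - SSR / SST, np.corrcoef(Y, Y_hat)[0, 1] ** 2)
```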
Small-sample properties of the OLS estimator:
1) Unbiasedness
2) Efficiency
Under what assumptions?
MLR.1 Linear in parameters
MLR.2 Random sampling
MLR.3 No perfect collinearity
MLR.4 Zero conditional mean (unbiasedness): E(u_i | X) = 0, i = 1, \dots, n
Each \hat{\beta}_j equals \beta_j plus a linear combination \sum_i w_i u_i of the errors; the weights w_i are non-random (conditional on X) because they depend only on the x's. Thus the \hat{\beta}_j have the same normal distribution as the errors.
MLR.6 is required for inference:
1) z-stat/t-stat: testing individual coefficients
2) t-stat: linear combinations of parameters
3) F-stat/\chi^2: multiple linear restrictions
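As a sketch of case 1), a t-statistic for an individual slope can be built directly from the OLS output, assuming homoskedasticity (all data and parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)  # true slope 0.8

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

k = X.shape[1]
sigma2_hat = resid @ resid / (n - k)            # \hat{\sigma}^2 with df correction
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # Var(\hat{\beta}) under homoskedasticity
se = np.sqrt(np.diag(var_beta))

t_slope = beta_hat[1] / se[1]  # t-stat for H0: beta_1 = 0
print(beta_hat[1], se[1], t_slope)
```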
MLR.6
The population error is independent of the explanatory variables and is normally distributed with zero mean and constant variance:
u \sim N(0, \sigma^2)
This may not always hold. E.g., wages? prices? A clearly non-normal example is the number of visits to a doctor last month.
Normality of the dependent variable?
Usually, even if Y is not normally distributed, some transformation of it is.
Try to find a transformation of Y that is normal and also makes intuitive sense.
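For example, a lognormal "wage" variable is heavily right-skewed, while its log is normal. A small simulation (the distribution parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
# Wages are often right-skewed; simulate a lognormal "wage" variable
wage = np.exp(1.0 + 0.5 * rng.normal(size=100_000))

def skewness(z):
    """Sample skewness: third standardized central moment."""
    z = z - z.mean()
    return np.mean(z ** 3) / np.mean(z ** 2) ** 1.5

# The raw variable is strongly skewed; log(wage) is (by construction) normal
print(skewness(wage), skewness(np.log(wage)))
```

The log transformation also makes intuitive sense here: coefficients in a log-wage regression read as approximate percentage effects.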
[Figure: histograms of Price under different transformations (density on the vertical axis)]
OLS ASYMPTOTICS
Asymptotics
Assumptions for the OLS estimator of the above model to be BLUE:
1) Linear in parameters
2) Random sampling
3) No perfect multicollinearity
4) Zero conditional mean (unbiasedness): E(u_i | X) = 0, i = 1, \dots, n
5) Homoskedasticity (efficiency, inference): \mathrm{Var}(u_i | X) = E(u_i^2 | X) = \sigma^2
Unbiasedness and efficiency are finite/small-sample properties of the OLS estimator, i.e. they hold for any sample size n.
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
= \beta_1 \frac{\sum_{i=1}^{n} (X_i - \bar{X}) X_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} + \frac{\sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
= \beta_1 + \frac{n^{-1} \sum_{i=1}^{n} (X_i - \bar{X}) u_i}{n^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}

Apply the Law of Large Numbers: the numerator converges to the covariance and the denominator to the variance in the population, so

\mathrm{plim}\, \hat{\beta}_1 = \beta_1 + \frac{\mathrm{Cov}(X_i, u_i)}{\mathrm{Var}(X_i)} = \beta_1 \quad [\text{Assumption MLR.4}]
In the multiple regression model, the coefficients are consistent only if the error is uncorrelated with every independent variable.
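A small simulation of this point (the data-generating process and numbers are illustrative, not from the slides): when an omitted variable correlated with X1 is absorbed into the error, OLS does not converge to the true coefficient even as n grows.

```python
import numpy as np

rng = np.random.default_rng(4)

def slope_estimate(n, rho):
    """OLS slope of y on x1 alone when x2 (corr(x1, x2) = rho) is omitted."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    # Regress y on x1 only: x2 moves into the error, which now correlates with x1
    x1c = x1 - x1.mean()
    return (x1c @ y) / (x1c @ x1c)

# Even with a huge sample the estimate does not converge to 2:
# plim = beta_1 + beta_2 * Cov(x1, x2)/Var(x1) = 2 + 1 * 0.6 = 2.6
print(slope_estimate(1_000_000, rho=0.6))
# With rho = 0 the error is uncorrelated with x1 and OLS is consistent
print(slope_estimate(1_000_000, rho=0.0))
```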
Asymptotic normality
MLR.6: required for inference on the estimated parameters.
What if Y is not normal?
Can anything be said about the distribution of the estimated parameters when Y is not conditionally normally distributed?
Here the Central Limit Theorem can be invoked to show that the estimated parameters are asymptotically normal.
CLT: In general, let \bar{Y} be the estimator of the population mean \mu based on a sample Y_1, Y_2, \dots, Y_n. Then
\sqrt{N}(\bar{Y} - \mu) \rightarrow N(0, \sigma^2) as N \rightarrow \infty
We have already seen in STATA, using simulation, that as N increases the sampling distribution gets tighter around the true population parameter. But notice that the distribution also changes with sample size; we do not know this distribution for each N. What the CLT says is that the distribution of the recentered and rescaled estimator gets close to a normal distribution.
Show using an example in STATA.
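The slide calls for a STATA demonstration; the same exercise can be sketched in Python (population and sample sizes are illustrative). We draw from a deliberately non-normal population and look at the recentered, rescaled sample mean:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 3.0, 2.0  # mean and sd of the population below

def sample(n):
    # Shifted exponential: strongly skewed, mean 3, sd 2
    return rng.exponential(scale=2.0, size=n) + 1.0

reps = 20_000
for n in (5, 50, 500):
    # Recentered and rescaled estimator: sqrt(n) * (Ybar - mu)
    z = np.array([np.sqrt(n) * (sample(n).mean() - mu) for _ in range(reps)])
    # As n grows, this distribution approaches N(0, sigma^2): mean ~0, sd ~sigma
    print(n, z.mean().round(3), z.std().round(3))
```

The mean and standard deviation of z stabilize around 0 and \sigma; a histogram of z at n = 500 would look close to a normal curve despite the skewed population.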
Under the Gauss-Markov assumptions MLR.1-MLR.5:
(i) \sqrt{n}(\hat{\beta}_j - \beta_j) \overset{a}{\sim} \mathrm{Normal}(0, \sigma^2 / a_j^2), where \sigma^2 / a_j^2 is the asymptotic variance of \sqrt{n}(\hat{\beta}_j - \beta_j). For slope coefficients, a_j^2 = \mathrm{plim}\, n^{-1} \sum_{i=1}^{n} \hat{r}_{ij}^2, where the \hat{r}_{ij} are the residuals from regressing X_j on the other independent variables (in the simple regression case, a_1^2 = \mathrm{plim}\, n^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2).

As N increases,
\frac{\hat{\beta}_j - \beta_j}{\sqrt{\hat{\sigma}^2 / [\mathrm{SST}_j (1 - R_j^2)]}} \overset{a}{\sim} N(0, 1)
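A simulation sketch of this result for the simple regression case (all parameter values are illustrative): even with deliberately non-normal errors, the standardized slope estimator is approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1 = 1.0, 0.5
n, reps = 400, 10_000
t_stats = np.empty(reps)

for r in range(reps):
    x = rng.normal(size=n)
    u = rng.uniform(-1, 1, size=n) * np.sqrt(3)  # non-normal errors, mean 0, var 1
    y = beta0 + beta1 * x + u
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (n - 2)                          # \hat{\sigma}^2
    sst_x = np.sum((x - x.mean()) ** 2)           # SST_j (R_j^2 = 0: one regressor)
    se = np.sqrt(s2 / sst_x)
    t_stats[r] = (b[1] - beta1) / se              # standardized slope

# Despite the non-normal u, the standardized estimator is close to N(0, 1)
print(t_stats.mean().round(3), t_stats.std().round(3))
```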