
OMITTING RELEVANT VARIABLES

Suppose the correct model has two sets of variables,
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
Compute least squares omitting $X_2$. Denote this estimator by $\tilde\beta_1$.

Some easily proved results:
$V(\tilde\beta_1)$ is smaller than $V(\hat\beta_1)$, i.e., you get a smaller variance when you omit $X_2$. (One interpretation: omitting $X_2$ amounts to using the extra information $\beta_2 = 0$. Even if the information is wrong (see the next result), it reduces the variance.) This is an important result.
(No free lunch)
$E[\tilde\beta_1] = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 \neq \beta_1$.
So $\tilde\beta_1$ is biased.
The bias can be huge. It can reverse the sign of a price coefficient in a "demand equation."
$\tilde\beta_1$ may be more "precise": smaller variance but nonzero bias. If the bias is small, one may still favor the short regression.
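These two results (smaller variance, nonzero bias) are easy to see in a simulation. The following is an illustrative Python/NumPy sketch with invented parameters (true coefficients 1.0 and 0.5, correlated regressors), not anything estimated from the auto data:

```python
import numpy as np

# Illustrative simulation (all parameter values invented for the demo):
# true model y = x1*b1 + x2*b2 + e, with x1 and x2 correlated.
rng = np.random.default_rng(0)
n, reps = 200, 2000
b1_true, b2_true = 1.0, 0.5

short_est, long_est = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.5 * rng.normal(size=n)       # x2 correlated with x1
    y = b1_true * x1 + b2_true * x2 + rng.normal(size=n)
    X = np.column_stack([x1, x2])
    long_est.append(np.linalg.lstsq(X, y, rcond=None)[0][0])  # keep x2
    short_est.append((x1 @ y) / (x1 @ x1))                    # omit x2
long_est, short_est = np.array(long_est), np.array(short_est)

# Short regression centers near b1 + b2*Cov(x1,x2)/Var(x1) = 1.5 (biased),
# yet its sampling variance is smaller than the long regression's.
print("means:", long_est.mean(), short_est.mean())
print("variances:", long_est.var(), short_est.var())
```

The short estimator's mean sits near the bias formula's prediction while its spread across replications is visibly tighter, which is exactly the variance/bias trade-off described above.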

Dr. Rachida Ouysse (ECON3208/ECON3291), Review of Multiple Regression Model: continued. © School of Economics, UNSW.
OMITTED VARIABLES

(Free lunch?) Suppose $X_1'X_2 = 0$. Then the bias goes away. Interpretation: the information is not "right," it is irrelevant. $\tilde\beta_1$ is the same as $\hat\beta_1$.
W. Ch. 3, page 99, shows that
$$V(\hat\beta_1) = \frac{\sigma^2}{SST_1(1 - R_1^2)}$$
where $SST_1$ is the total variation in $X_1$ and $R_1^2$ is the R-squared from the regression of $X_1$ on $X_2$. Furthermore,
$$V(\tilde\beta_1) = \frac{\sigma^2}{SST_1}$$

when $\beta_2 \neq 0$, $\tilde\beta_1$ is biased and $V(\tilde\beta_1) < V(\hat\beta_1)$;

when $\beta_2 = 0$, both $\tilde\beta_1$ and $\hat\beta_1$ are unbiased, and $V(\tilde\beta_1) < V(\hat\beta_1)$.

WHAT AFFECTS THE VARIANCE OF OLS?

The variance of the OLS estimator of $\beta_j$, conditional on the sample values of the independent variables, is
$$V(\hat\beta_j) = \frac{\sigma^2}{SST_j(1 - R_j^2)} \qquad (1)$$
where $SST_j = \sum_{i=1}^{n} (X_{ij} - \bar X_j)^2$ is the total sample variation in $X_j$ and $R_j^2$ is the R-squared from the regression of $X_j$ on all other independent variables, including the constant term.
The larger $\sigma^2$ is, the larger the variance of the OLS estimator. More noise makes it more difficult to estimate the partial effect of any variable.
The larger the total variation in $X_j$, the smaller the variance of $\hat\beta_j$. To increase the in-sample variation of $X_j$, one can increase the sample size!
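Equation (1) is an exact algebraic identity, not an approximation: the $(j,j)$ element of $\sigma^2(X'X)^{-1}$ equals $\sigma^2 / (SST_j(1 - R_j^2))$. A small NumPy check on simulated data (the design, $\sigma^2$, and seed are arbitrary choices for the demo):

```python
import numpy as np

# Check that [sigma^2 (X'X)^{-1}]_{jj} = sigma^2 / (SST_j (1 - R_j^2)).
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])    # constant + two regressors
sigma2 = 2.0

# Direct route: conditional variance matrix sigma^2 (X'X)^{-1}, x1 entry.
V_direct = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

# Equation (1): regress x1 on the remaining regressors (constant, x2).
Z = np.column_stack([np.ones(n), x2])
fitted = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
SST1 = np.sum((x1 - x1.mean()) ** 2)
R1sq = 1 - np.sum((x1 - fitted) ** 2) / SST1
V_formula = sigma2 / (SST1 * (1 - R1sq))

print(V_direct, V_formula)   # the two routes agree
```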

MULTICOLLINEARITY

The variance of an estimated coefficient will tend to be larger if there are other X's in the model that can predict $X_j$. This is reflected in a high $R_j^2$ in equation (1).
The standard error of prediction will also tend to be larger if there are unnecessary or redundant X's in the model.
See W. pages 96-97 for a discussion.
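The quantity $1/(1 - R_j^2)$ in equation (1) is the variance inflation factor (VIF), which Stata reports via estat vif after regress. A hedged sketch of computing it by hand (the helper `vif` and the simulated regressors are illustrative, not a Stata internal):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor 1/(1 - R_j^2) for column j of X,
    where X already contains a constant column."""
    others = np.delete(X, j, axis=1)
    xj = X[:, j]
    fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
    r2 = 1 - np.sum((xj - fitted) ** 2) / np.sum((xj - xj.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)              # unrelated to the others
X = np.column_stack([np.ones(n), x1, x2, x3])

print(vif(X, 1))   # large: x2 predicts x1 almost perfectly
print(vif(X, 3))   # near 1: x3 is not predictable from the rest
```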

STATA ILLUSTRATIVE EXAMPLE: THE 1978 AUTOMOBILE DATASET

Data from the file auto.dta: 74 observations and 12 variables.

Suppose we are interested in what determines the sale price of an automobile.
What is the best we can do so far?
Know your dataset:
describe it and report summary statistics: detect categorical variables, any suspicious values, etc.
In Stata use: describe and summarize
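For readers working outside Stata, pandas offers rough analogues of describe and summarize. A toy sketch (the three columns and their four rows are invented stand-ins, not the real auto.dta records):

```python
import pandas as pd

# Invented stand-in for a few columns of auto.dta (rows are made up).
df = pd.DataFrame({
    "price":   [3291, 5788, 4453, 15906],
    "mpg":     [41, 21, 26, 12],
    "foreign": [1, 0, 0, 1],
})

df.info()               # types and non-missing counts, like describe
print(df.describe())    # count, mean, std, min, max, like summarize
```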

EXAMPLE: AUTOMOBILE DATASET

auto_out.txt
. describe

Contains data from auto.dta


obs: 74 1978 Automobile Data
vars: 12 13 Apr 2005 17:45
size: 3,478 (99.9% of memory free) (_dta has notes)
--------------------------------------------------------------------------------
storage display value
variable name type format label variable label
--------------------------------------------------------------------------------
make str18 %-18s Make and Model
price int %8.0gc Price
mpg int %8.0g Mileage (mpg)
rep78 int %8.0g Repair Record 1978
headroom float %6.1f Headroom (in.)
trunk int %8.0g Trunk space (cu. ft.)
weight int %8.0gc Weight (lbs.)
length int %8.0g Length (in.)
turn int %8.0g Turn Circle (ft.)
displacement int %8.0g Displacement (cu. in.)
gear_ratio float %6.2f Gear Ratio
foreign byte %8.0g origin Car type
--------------------------------------------------------------------------------
Sorted by: foreign
. summarize

Variable | Obs Mean Std. Dev. Min Max


-------------+--------------------------------------------------------
make | 0
price | 74 6165.257 2949.496 3291 15906
mpg | 74 21.2973 5.785503 12 41
rep78 | 69 3.405797 .9899323 1 5
headroom | 74 2.993243 .8459948 1.5 5
-------------+--------------------------------------------------------
trunk | 74 13.75676 4.277404 5 23
weight | 74 3019.459 777.1936 1760 4840
length | 74 187.9324 22.26634 142 233
turn | 74 39.64865 4.399354 31 51
displacement | 74 197.2973 91.83722 79 425
-------------+--------------------------------------------------------
gear_ratio | 74 3.014865 .4562871 2.19 3.89
foreign | 74 .2972973 .4601885 0 1

EXAMPLE: AUTOMOBILE DATASET

. xi: reg log_p weight mpg forXmpg i.foreign


i.foreign _Iforeign_0-1 (naturally coded; _Iforeign_0 omitted)

Source | SS df MS Number of obs = 74


-------------+------------------------------ F( 4, 69) = 25.43
Model | 6.68757029 4 1.67189257 Prob > F = 0.0000
Residual | 4.5359628 69 .065738591 R-squared = 0.5959
-------------+------------------------------ Adj R-squared = 0.5724
Total | 11.2235331 73 .153747029 Root MSE = .2564
------------------------------------------------------------------------------
log_p | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | .0006052 .0000916 6.61 0.000 .0004225 .0007879
mpg | .030484 .013984 2.18 0.033 .0025868 .0583813
forXmpg | -.0391345 .0136981 -2.86 0.006 -.0664614 -.0118077
_Iforeign_1 | 1.498792 .3472995 4.32 0.000 .8059483 2.191636
_cons | 6.006634 .5585861 10.75 0.000 4.892285 7.120983
------------------------------------------------------------------------------

STATA ILLUSTRATIVE EXAMPLE: THE 1978 AUTOMOBILE DATASET

Regression Diagnostics:
After estimating a model, we want to check the entire regression for: normality of the residuals, omitted and unnecessary variables, heteroskedasticity.
We also want to check individual variables for: outliers, collinearity, functional form.
Look at the residuals: in Stata, rvfplot.
Check the residuals for normality.
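As a non-Stata illustration of what rvfplot and a normality check operate on, the sketch below fits OLS on simulated data with deliberately skewed errors and extracts the residual/fitted pairs (all numbers are simulated; shapiro is SciPy's Shapiro-Wilk test, not a Stata command):

```python
import numpy as np
from scipy import stats

# Simulated regression with deliberately skewed (exponential) errors,
# so the residuals should fail a normality check. All values invented.
rng = np.random.default_rng(3)
n = 74                                  # same size as the auto data
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.exponential(size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ b
resid = y - fitted          # rvfplot plots resid (y-axis) against fitted

# Shapiro-Wilk normality test on the residuals (SciPy, not Stata).
stat, p = stats.shapiro(resid)
print(f"Shapiro-Wilk p-value: {p:.4f}")  # a small p flags non-normality
```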

EXAMPLE: AUTOMOBILE DATASET

EXAMPLE: AUTOMOBILE DATASET

Check Residuals for Normality

[Figure: residual diagnostics — density histogram, box plot, symmetry plot, and quantile-normal plot of the residuals. The residual plots indicate non-normality.]

EXAMPLE: AUTOMOBILE DATASET

auto_out.txt
. imtest
Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
Source | chi2 df p
---------------------+-----------------------------
Heteroskedasticity | 13.43 10 0.2005
Skewness | 12.08 4 0.0168
Kurtosis | 1.16 1 0.2815
---------------------+-----------------------------
Total | 26.67 15 0.0315
---------------------------------------------------

. ovtest

Ramsey RESET test using powers of the fitted values of log_p


Ho: model has no omitted variables
F(3, 66) = 6.58
Prob > F = 0.0006

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of log_p

chi2(1) = 0.53
Prob > chi2 = 0.4654

. xi: rreg log_p weight mpg forXmpg i.foreign


i.foreign _Iforeign_0-1 (naturally coded; _Iforeign_0 omitted)
Huber iteration 1: maximum difference in weights = .66655424
Huber iteration 2: maximum difference in weights = .16522857
Huber iteration 3: maximum difference in weights = .08398734
Huber iteration 4: maximum difference in weights = .02511736
EXAMPLE

hettest performs three versions of the Breusch-Pagan (1979) and Cook-Weisberg (1983) test for heteroskedasticity. All three versions present evidence against the null hypothesis that $t = 0$ in $V(\varepsilon) = \sigma^2 \exp(zt)$. If varlist is not specified, the fitted values are used for $z$.
ovtest performs Ramsey's regression specification-error (RESET) test for omitted variables.
imtest performs an information matrix test for the regression model and an orthogonal decomposition into tests for heteroskedasticity, skewness, and kurtosis.
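The mechanics of a Breusch-Pagan-style test can be sketched by hand: regress the squared OLS residuals on the fitted values and compare $nR^2$ to a $\chi^2$ distribution. Note this is Koenker's robust variant, not the exact score statistic hettest reports, and the data below are simulated:

```python
import numpy as np
from scipy import stats

def breusch_pagan(y, X):
    """Koenker's nR^2 version of the Breusch-Pagan test: regress the
    squared OLS residuals on the fitted values, LM = n*R^2 ~ chi2(1)."""
    n = len(y)
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = (y - fitted) ** 2
    Z = np.column_stack([np.ones(n), fitted])
    g = np.linalg.lstsq(Z, e2, rcond=None)[0]
    r2 = 1 - np.sum((e2 - Z @ g) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    lm = n * r2
    return lm, stats.chi2.sf(lm, df=1)

# Simulated data where the error spread grows with x (heteroskedastic).
rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + x + x * rng.normal(size=n)

lm, p = breusch_pagan(y, X)
print(f"LM = {lm:.2f}, p = {p:.4g}")  # a small p rejects constant variance
```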

REGRESSION FIXES

If you detect possible problems with your initial regression, you can:
Check for mis-coded data
Divide your sample or eliminate some observations (such as diesel cars)
Try adding more covariates if the ovtest rejects
Change the functional form of Y or of one of the regressors
Use robust regression

ROBUST REGRESSION

This is a variant on linear regression that downplays the influence of outliers.

It first performs the original OLS regression
Drops observations with Cook's distance > 1
Calculates weights for each observation based on its residual
Performs weighted least squares regression using these weights. Stata command: rreg instead of reg
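The weighting steps above can be sketched as iteratively reweighted least squares. This minimal version uses only Huber weights (Stata's rreg additionally screens on Cook's distance and follows with biweight iterations, as the log on the next slide shows), on invented data with planted outliers:

```python
import numpy as np

def huber_irls(X, y, k=1.345, iters=20):
    """Minimal iteratively reweighted least squares with Huber weights.
    (Stata's rreg also drops points with Cook's distance > 1 and adds
    biweight iterations after the Huber steps.)"""
    b = np.linalg.lstsq(X, y, rcond=None)[0]           # start from OLS
    for _ in range(iters):
        r = y - X @ b
        s = np.median(np.abs(r)) / 0.6745              # robust scale
        u = np.abs(r) / (k * s)
        w = np.where(u <= 1.0, 1.0, 1.0 / u)           # Huber weights
        sw = np.sqrt(w)
        b = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return b

# Invented data: true line y = 1 + 2x, with a few planted outliers.
rng = np.random.default_rng(5)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[:5] += 30.0                      # five wild observations
X = np.column_stack([np.ones(n), x])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_rob = huber_irls(X, y)
print("OLS:   ", b_ols)            # intercept dragged up by the outliers
print("robust:", b_rob)            # stays much closer to (1, 2)
```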

AUTOMOBILE EXAMPLE: ROBUST REGRESSION

. xi: rreg log_p weight mpg forXmpg i.foreign


i.foreign _Iforeign_0-1 (naturally coded; _Iforeign_0 omitted)
Huber iteration 1: maximum difference in weights = .66655424
Huber iteration 2: maximum difference in weights = .16522857
Huber iteration 3: maximum difference in weights = .08398734
Huber iteration 4: maximum difference in weights = .02511736
Biweight iteration 5: maximum difference in weights = .27951612
Biweight iteration 6: maximum difference in weights = .0677738
Biweight iteration 7: maximum difference in weights = .02571395
Biweight iteration 8: maximum difference in weights = .00461356

Robust regression Number of obs = 74


F( 4, 69) = 28.60
Prob > F = 0.0000

------------------------------------------------------------------------------
log_p | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | .0006669 .0000848 7.87 0.000 .0004978 .0008361
mpg | .0485624 .0129485 3.75 0.000 .0227308 .0743939
forXmpg | -.0542761 .0126837 -4.28 0.000 -.0795795 -.0289728
_Iforeign_1 | 1.892195 .3215827 5.88 0.000 1.250655 2.533735
_cons | 5.397624 .5172239 10.44 0.000 4.36579 6.429457
------------------------------------------------------------------------------

