/* INVESTIGATION OF HETEROSKEDASTICITY */
First graph the data
. u hetdat2
. gra manuf gdp, s([country].) xlab ylab
[Scatter plot of manuf against gdp (US $ million), points labelled by country; most observations are bunched at low GDP, with Italy, the UK and France standing apart.]
The data are tightly packed, with the exception of a few outliers (Italy, the UK and France).
Outliers like these can produce heteroskedasticity: countries with similar levels of GDP show
big variation in the level of manufacturing output.
. reg manuf gdp
      Source |       SS       df       MS         Number of obs =      28
-------------+------------------------------      F(  1,    26) =  210.73
       Model |  1.1600e+11     1  1.1600e+11      Prob > F      =  0.0000
    Residual |  1.4312e+10    26   550464875      R-squared     =  0.8902
-------------+------------------------------      Adj R-squared =  0.8859
       Total |  1.3031e+11    27  4.8264e+09      Root MSE      =   23462

-----------------------------------------------------------------------------
       manuf |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         gdp |   .1936932   .0133428   14.517   0.000     .1662666   .2211197
       _cons |   603.8754   5699.688    0.106   0.916       -11112   12319.75
-----------------------------------------------------------------------------
As a first check for heteroskedasticity, inspect the pattern of the residuals. If
heteroskedasticity is present, the residual variance should increase (or decrease) with the
value of gdp.
. predict res, resid
/* this command gets residuals from previous
regression & saves them with the name res */
[Scatter plot of the residuals against gdp (US $ million), points labelled by country; the residuals range from about -50000 to 50000, with Italy a large negative outlier.]
The graph is a little unclear. The residual spread does increase with GDP, but France is an
exception. We need a more formal test to get round the ambiguity.
1) Goldfeld-Quandt Test
. sort gdp
Omit the middle c observations (approximately 20% of the total sample; in this example c = 4).
So (N - c)/2 = (28 - 4)/2 = 12.
. reg manuf gdp if _n<=12
/* regression on the first 12 observations. These should have the smallest
residual variance, if the residual variance increases with the level of GDP. */
      Source |       SS       df       MS         Number of obs =      12
-------------+------------------------------      F(  1,    10) =   27.95
       Model |   438802283     1   438802283      Prob > F      =  0.0004
    Residual |   157002655    10  15700265.5      R-squared     =  0.7365
-------------+------------------------------      Adj R-squared =  0.7101
       Total |   595804938    11  54164085.2      Root MSE      =  3962.4

-----------------------------------------------------------------------------
       manuf |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         gdp |   .2293109   .0433754    5.287   0.000     .1326644   .3259573
       _cons |  -607.1009   2598.401   -0.234   0.820    -6396.699   5182.497
-----------------------------------------------------------------------------
Now repeat the regression for the top 12 observations in the data set.
. reg manuf gdp if _n>=17
/* top 12 observations, after sorting by gdp */
                                                  Number of obs =      12
                                                  F(  1,    10) =   41.88
                                                  Prob > F      =  0.0001
                                                  R-squared     =  0.8073
                                                  Adj R-squared =  0.7880
                                                  Root MSE      =   36874

-----------------------------------------------------------------------------
       manuf |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         gdp |   .1879586    .029043    6.472   0.000     .1232468   .2526705
       _cons |   5556.995   18758.07    0.296   0.773    -36238.59   47352.58
-----------------------------------------------------------------------------
Goldfeld-Quandt Test = RSShigh/RSSlow ~ F((N-c-2k)/2, (N-c-2k)/2)
= 1.358e10/1.57e8 = 86.5 ~ F((28-4-2(2))/2, (28-4-2(2))/2) = F(10,10)
The 5% critical value of F(10,10) is about 2.98, so 86.5 is far above it: reject the null of
homoskedasticity.
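The arithmetic of the test statistic can be checked in a few lines (a Python sketch; the RSS values are taken from the two sub-sample regressions above, and the 5% critical value is from standard F tables):

```python
# Goldfeld-Quandt test: ratio of residual sums of squares from the
# high-variance and low-variance sub-sample regressions.
N, c, k = 28, 4, 2           # sample size, omitted middle obs, parameters per regression

rss_high = 1.358e10          # RSS from the top-12 (largest-GDP) regression
rss_low = 1.57e8             # RSS from the bottom-12 regression

gq = rss_high / rss_low      # ~ F(df, df) under the null of homoskedasticity
df = (N - c - 2 * k) // 2    # (28 - 4 - 4)/2 = 10 degrees of freedom per side

f_crit_5pct = 2.98           # F(10,10) critical value at the 5% level (from tables)

print(round(gq, 1), df, gq > f_crit_5pct)   # 86.5  10  True -> reject homoskedasticity
```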
2) Breusch-Pagan Test
The squared residual for each observation, u^2, is a good proxy for the residual variance.
Hence run the auxiliary regression
u^2 = d0 + d1*GDPi + ei
This will suggest heteroskedasticity if the coefficient on GDP in the auxiliary regression is
significantly different from zero (i.e. the residual variance is correlated with the
right-hand-side variable).
. predict reshat, resid
. g res2=reshat^2
/* square them */
Then regress the square of the residuals on all the right-hand-side variables from the
original regression (in this case just GDP).
. reg res2 gdp
                                                  Number of obs =      28
                                                  F(  1,    26) =    3.12
                                                  Prob > F      =  0.0893
                                                  R-squared     =  0.1070
                                                  Adj R-squared =  0.0727
                                                  Root MSE      =  1.5e+09

-----------------------------------------------------------------------------
        res2 |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         gdp |   1469.225   832.2865     1.77   0.089    -241.5643   3180.015
       _cons |   1.17e+08   3.56e+08     0.33   0.745    -6.14e+08   8.48e+08
-----------------------------------------------------------------------------
The Breusch-Pagan Test statistic is N*R2(auxiliary) = 28*0.107 = 3.00.
This has a chi-squared distribution with degrees of freedom equal to the number of
right-hand-side variables in the auxiliary regression, excluding the constant (in this case 1).
From tables, the critical value at the 5% level is 3.84.
So the estimated chi-squared is below the critical chi-squared, and we do not reject the null
that the residuals are homoskedastic.
(Note that an equivalent test is the F test of the goodness of fit of the model as a whole
from the auxiliary regression.)
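The statistic is simple enough to check directly (a Python sketch, with the auxiliary R-squared from the output above and the chi-squared critical value from tables):

```python
# Breusch-Pagan test: N * R-squared from the auxiliary regression of the
# squared residuals on the right-hand-side variables.
N = 28
r2_aux = 0.107               # R-squared of: reg res2 gdp

bp = N * r2_aux              # ~ chi-squared(q) under the null; q = rhs vars in aux regression
chi2_crit_5pct = 3.84        # chi-squared(1) critical value at 5% (from tables)

print(round(bp, 2), bp < chi2_crit_5pct)   # 3.0  True -> fail to reject homoskedasticity
```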
If the exact form of the heteroskedasticity is not known (likely the case in most
estimations: it may vary across more than one variable, so the tests and methods above are
not valid, and it is unlikely you can say with certainty what the true functional form looks
like), use White-adjusted standard errors (see lecture notes), which fix up the biased OLS
standard errors. (Note that these are not the same as if you knew the true form of the
heteroskedasticity, but they are better than the unadjusted OLS ones.)
Original regression
. reg manuf gdp
      Source |       SS       df       MS         Number of obs =      28
-------------+------------------------------      F(  1,    26) =  210.73
       Model |  1.1600e+11     1  1.1600e+11      Prob > F      =  0.0000
    Residual |  1.4312e+10    26   550464875      R-squared     =  0.8902
-------------+------------------------------      Adj R-squared =  0.8859
       Total |  1.3031e+11    27  4.8264e+09      Root MSE      =   23462

-----------------------------------------------------------------------------
       manuf |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         gdp |   .1936932   .0133428   14.517   0.000     .1662666   .2211197
       _cons |   603.8754   5699.688    0.106   0.916       -11112   12319.75
-----------------------------------------------------------------------------
Now with adjusted standard errors (you should know what the adjustment does; see lecture
notes).
. reg manuf gdp, robust
Regression with robust standard errors            Number of obs =      28
                                                  F(  1,    26) =  116.39
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.8902
                                                  Root MSE      =   23462

-----------------------------------------------------------------------------
             |              Robust
       manuf |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         gdp |   .1936932   .0179542   10.788   0.000     .1567879   .2305985
       _cons |   603.8754   3542.399    0.170   0.866    -6677.629    7885.38
-----------------------------------------------------------------------------
The heteroskedasticity-consistent standard error on gdp is now larger (.018 compared with
.013), and hence the t value is lower and the confidence interval wider than in the
unadjusted regression.
The reason you might worry is that this adjustment is only valid asymptotically (as the
sample size gets very large). With only 28 observations, that is far from being satisfied, so
the adjusted standard errors and t statistics may be as wrong as the original OLS ones.
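For the single-regressor case, the White variance formula from the lecture notes can be sketched in pure Python. The data below are hypothetical toy numbers, purely illustrative; the formula is var(b) = sum((x_i - xbar)^2 * u_i^2) / (sum((x_i - xbar)^2))^2:

```python
import math

# White heteroskedasticity-consistent standard error for the slope in a
# bivariate OLS regression (toy data, purely illustrative).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.3, 2.8, 4.5, 4.9]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx   # OLS slope
a = ybar - b * xbar                                                # OLS intercept

u = [yi - a - b * xi for xi, yi in zip(x, y)]                      # OLS residuals

# White variance: weight each squared residual by its squared x-deviation.
var_white = sum(((xi - xbar) ** 2) * (ui ** 2) for xi, ui in zip(x, u)) / sxx ** 2
se_white = math.sqrt(var_white)
print(round(se_white, 4))   # robust standard error of the slope
```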
2.
. u hetwage
. reg logpay exper exper2
      Source |       SS       df       MS         Number of obs =     970
-------------+------------------------------      F(  2,   967) =   30.04
       Model |  18.7214801     2  9.36074005      Prob > F      =  0.0000
    Residual |  301.357698   967   .31164188      R-squared     =  0.0585
-------------+------------------------------      Adj R-squared =  0.0565
       Total |  320.079178   969  .330319069      Root MSE      =  .55825

-----------------------------------------------------------------------------
      logpay |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       exper |   .0408645   .0052725     7.75   0.000     .0305177   .0512114
      exper2 |  -.0008537   .0001144    -7.46   0.000    -.0010782  -.0006291
       _cons |   1.626793   .0529785    30.71   0.000     1.522827   1.730759
-----------------------------------------------------------------------------
. predict res, resid
. g res2=res^2
. reg res2 exper exper2

                                                  Number of obs =     970
                                                  F(  2,   967) =    1.63
                                                  Prob > F      =  0.1974
                                                  R-squared     =  0.0033
                                                  Adj R-squared =  0.0013
                                                  Root MSE      =  .53934

-----------------------------------------------------------------------------
        res2 |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       exper |   .0034881   .0050939     0.68   0.494    -.0065083   .0134846
      exper2 |  -.0000213   .0001106    -0.19   0.847    -.0002383   .0001956
       _cons |   .2494289   .0511841     4.87   0.000     .1489842   .3498737
-----------------------------------------------------------------------------
The Breusch-Pagan Test statistic is N*R2(auxiliary) = 970*0.0033 = 3.20.
This has a chi-squared distribution with degrees of freedom equal to the number of
right-hand-side variables in the auxiliary regression, excluding the constant (in this case
2: exper and exper2).
From tables, the critical value of chi-squared(2) at the 5% level is 5.99.
So the estimated chi-squared is below the critical chi-squared, and we do not reject the null
that the residuals are homoskedastic.
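As a quick check of the same arithmetic (a Python sketch; the 5% chi-squared critical values are from standard tables):

```python
# Breusch-Pagan statistic for the wage equation: the auxiliary regression
# has two right-hand-side variables (exper and exper2), so q = 2.
chi2_crit_5pct = {1: 3.84, 2: 5.99, 3: 7.81}   # 5% critical values from tables

N, r2_aux, q = 970, 0.0033, 2
bp = N * r2_aux

print(round(bp, 2), bp < chi2_crit_5pct[q])   # 3.2  True -> fail to reject
```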
The regression suggests that pay rises with work experience but at a decreasing rate (the
coefficient on the quadratic term is negative).
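The turning point implied by the quadratic can be computed directly from the estimated coefficients (a quick check in Python):

```python
# Turning point of the quadratic experience profile:
# logpay = b0 + b1*exper + b2*exper^2, so d(logpay)/d(exper) = b1 + 2*b2*exper = 0.
b1 = 0.0408645    # coefficient on exper (from the regression above)
b2 = -0.0008537   # coefficient on exper2

exper_star = -b1 / (2 * b2)   # years of experience at which predicted pay peaks
print(round(exper_star, 1))   # -> 23.9
```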
Now do separate regressions for
a) women
. reg logpay exper exper2 if female==1
      Source |       SS       df       MS         Number of obs =     485
-------------+------------------------------      F(  2,   482) =   13.75
       Model |  7.70631111     2  3.85315556      Prob > F      =  0.0000
    Residual |  135.033678   482   .28015286      R-squared     =  0.0540
-------------+------------------------------      Adj R-squared =  0.0501
       Total |  142.739989   484  .294917334      Root MSE      =  .52929

-----------------------------------------------------------------------------
      logpay |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       exper |    .035764   .0074775    4.783   0.000     .0210716   .0504564
      exper2 |  -.0008903   .0001719   -5.180   0.000    -.0012281  -.0005526
       _cons |   1.587002   .0694657   22.846   0.000     1.450509   1.723495
-----------------------------------------------------------------------------
b) men
. reg logpay exper exper2 if female==0
      Source |       SS       df       MS         Number of obs =     485
-------------+------------------------------      F(  2,   482) =   32.95
       Model |  17.9038006     2  8.95190032      Prob > F      =  0.0000
    Residual |  130.932656   482  .271644515      R-squared     =  0.1203
-------------+------------------------------      Adj R-squared =  0.1166
       Total |  148.836457   484   .30751334      Root MSE      =   .5212

-----------------------------------------------------------------------------
      logpay |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       exper |   .0539586   .0068722    7.852   0.000     .0404554   .0674618
      exper2 |  -.0009829   .0001415   -6.944   0.000     -.001261  -.0007048
       _cons |   1.598233   .0727244   21.977   0.000     1.455337   1.741129
-----------------------------------------------------------------------------
In this case the Goldfeld-Quandt Test = RSShigh/RSSlow ~ F((N-c-2k)/2, (N-c-2k)/2)
becomes RSSwomen/RSSmen (since women are the higher-variance sub-sample; compare the RSS in
the output above), where
N = full sample size = 970
c = number of observations dropped from the middle (zero in this case)
k = number of parameters in the estimated equation (3)
so the degrees of freedom are (970 - 0 - 6)/2 = 482 on each side.
GQ = 135.0/130.9 = 1.03 ~ F(482,482)
1.03 is below the 5% critical value of F(482,482) (approximately 1.16), so do not reject the
null of homoskedasticity.
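With 482 degrees of freedom on each side there is no exact table entry; a rough normal approximation to ln F gives a working 5% critical value. This is a back-of-envelope check, not a substitute for exact tables:

```python
import math

# Rough 5% critical value for F(d, d) with large equal degrees of freedom,
# using the normal approximation to ln F: sd(ln F) ~ sqrt(2/d1 + 2/d2).
def f_crit_5pct_approx(d):
    return math.exp(1.645 * math.sqrt(4.0 / d))

gq = 135.033678 / 130.932656   # RSSwomen / RSSmen from the output above
print(round(gq, 2), round(f_crit_5pct_approx(482), 2))   # 1.03  1.16
```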
. reg logpay exper exper2, robust

                                                  Number of obs =     970
                                                  F(  2,   967) =   31.44
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.0585
                                                  Root MSE      =  .55825

-----------------------------------------------------------------------------
             |              Robust
      logpay |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       exper |   .0408645   .0052659    7.760   0.000     .0305307   .0511984
      exper2 |  -.0008537   .0001216   -7.020   0.000    -.0010923    -.000615
       _cons |   1.626793   .0482309   33.729   0.000     1.532144   1.721442
-----------------------------------------------------------------------------
Note that the sample size is 970, so we do not have to worry too much about small-sample
effects on the White-adjusted standard errors.
3. You have data on firm size, N, and the level of profits, PROF, measured in million, for
118 firms. You estimate the following regression:

(1)   PROF(hat) = 18,000 + 1,000*N - 1.0*N^2       R2 = 0.35    RSS = 20,000
                 (12,000)   (100)    (0.25)
You should recognise that the firm size variables are statistically significant (t values
> 2) but the constant is not. The constant gives the notional amount of profit when firm size
is zero. The R-squared suggests 35% of the variation in profits is explained by the
right-hand-side variables. The coefficients suggest profits rise and then fall with firm
size.
Profits are maximised when dPROF/dN = 1,000 - 2N = 0, i.e. at N = 500.
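The arithmetic of the maximisation can be checked directly (a Python sketch using the coefficients from equation (1)):

```python
# PROF = 18,000 + 1,000*N - 1.0*N^2
# dPROF/dN = 1,000 - 2*N = 0  =>  N* = 500
b1, b2 = 1000.0, -1.0

n_star = -b1 / (2 * b2)                          # profit-maximising firm size
prof_max = 18000 + b1 * n_star + b2 * n_star ** 2

print(n_star, prof_max)   # 500.0  268000.0
```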
Given the information in the question, the only test you could do is the Goldfeld-Quandt
test for heteroskedasticity:
= RSS(high variance sample)/RSS(low variance sample) ~ F((N-c-2k)/2, (N-c-2k)/2)
= 3.5 ~ F(47,47)
(with N = 118 and k = 3, the 47 degrees of freedom per side imply c = 18 observations
omitted from the middle, since (118 - 18 - 6)/2 = 47). Since 3.5 exceeds the 5% critical
value of F(47,47) (roughly 1.6), reject the null: the residuals are heteroskedastic.