
Chapter 6:

Heteroscedasticity

Objectives:

(1) Causes
(2) Consequences
(3) Detection
(4) Solutions

Econometrics 1
What is Heteroscedasticity
• Recall the assumptions of OLS:
  $E(u_i) = 0$
  $E(u_i u_j) = 0, \quad i \neq j$
  $E(u_i^2) = \sigma^2$
• The linear regression model (LRM) assumes that the variance of the equation disturbance term is constant over the whole sample period. That is:
  $\sigma_t^2 = \sigma^2$ for all t, t = 1, ..., T
• The error term has constant variance.

What is Heteroscedasticity
• When the requirement of a constant variance holds, we have homoscedasticity.
• When the requirement of a constant variance is violated, we have a condition of heteroscedasticity.

What is Heteroscedasticity
- The essence (informally) is that the estimated regression line fits some observations better than others, and that the poor fit of these observations can be linked in a systematic fashion to some cause.

- We must try to detect this violation (not always easy or obvious) and then either remove the heteroskedasticity (by transforming the data) or use a more efficient estimator (the Weighted Least Squares or Generalized Least Squares estimator).

Types of Heteroscedasticity
• Type A.) Error variance related to a category: variety, gender, type of industry
• Type B.) Related to the values of $X_i$
• Type C.) Related to the values of $Y_i$
• Type D.) Some function of the explanatory variables $X_1, X_2, X_3$

• The residual for each observation i is the vertical distance between the observed value of the dependent variable and the predicted value of the dependent variable
  – i.e. the difference between the observed value of the dependent variable and the line-of-best-fit value

Case    Price   Predicted price   Residual
   1    19000           19174.0     -174.0
   2    30000           28028.7     1971.3
   3     8100           45738.2   -37638.2
   4    55000           36883.5    18116.5
   5   130000           45738.2    84261.8
   6    55000           45738.2     9261.8
   7    54000           36883.5    17116.5
   8     7500           45738.2   -38238.2
   9    36000           36883.5     -883.5
  10    32000           28028.7     3971.3

N.B. The predicted price is the value on the regression line that corresponds to the value of the independent variable (in this case, number of rooms) for a particular observation.
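The residual column is simply the observed price minus the predicted price. The short Python sketch below reproduces it from the first two columns of the table; the predicted prices are taken as given and no regression is refitted.

```python
# Observed and predicted purchase prices for cases 1-10 in the table above
observed = [19000, 30000, 8100, 55000, 130000, 55000, 54000, 7500, 36000, 32000]
predicted = [19174.0, 28028.7, 45738.2, 36883.5, 45738.2,
             45738.2, 36883.5, 45738.2, 36883.5, 28028.7]

# Residual = observed value minus the value on the fitted regression line
residuals = [round(y - y_hat, 1) for y, y_hat in zip(observed, predicted)]

for case, (y, y_hat, e) in enumerate(zip(observed, predicted, residuals), start=1):
    print(f"{case:>4} {y:>8} {y_hat:>10.1f} {e:>10.1f}")
```

Cases 3, 5, 6 and 8 share the same predicted price (45738.2), yet their residuals range from -38238.2 to 84261.8.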

Diagnose heteroscedasticity by plotting the
residual against the predicted y.
(Assume that this represents multiple observations of y for each given value of x):
[Figure: purchase price plotted against number of rooms, with the fitted regression line; points above the line have positive (+) residuals, points below it have negative (-) residuals.]

Diagnose heteroscedasticity by plotting the residual against the predicted y.

• Each of the residuals has a sampling distribution, and each of these distributions should have the same variance (“homoscedasticity”).
• Clearly, this is not the case in this sample.

If we plot the residual against the number of rooms, we can see that its variance increases with the number of rooms:
[Figure: unstandardized residual plotted against number of rooms.]

We can imagine the sampling distributions of particular residuals as follows:
[Figure: unstandardized residual against number of rooms, with the sampling distribution of the residual sketched at several numbers of rooms; annotation: "There is clear evidence of increasing variance here."]

Consumption function example (cross-section data): creditworthiness as a missing variable?

The Homoskedastic Case


The Heteroskedastic Case


Causes
• What might cause the variance of the residuals to change over the course of the sample?
  – the variance of the error term may be related to:
    • the dependent variable and/or the explanatory variables in the model,
    • or some combination (linear or non-linear) of all the variables in the model,
    • or variables that should be in the model but are not.
  – But why?

(i) Non-constant coefficient
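• Suppose that the slope coefficient varies across i:
  – $y_i = a + b_i x_i + u_i$
• and suppose that it varies randomly around some fixed value b:
  – $b_i = b + e_i$
• then the regression actually estimated (e.g. by OLS in EViews) will be:
  – $y_i = a + (b + e_i) x_i + u_i = a + b x_i + (e_i x_i + u_i)$
  – where $(e_i x_i + u_i)$ is the error term in the regression.
The error term will thus vary with x: if $e_i$ and $u_i$ have zero means, variances $\sigma_e^2$ and $\sigma_u^2$, and are uncorrelated with each other and with $x_i$, then $\operatorname{Var}(e_i x_i + u_i \mid x_i) = x_i^2 \sigma_e^2 + \sigma_u^2$, which grows with $|x_i|$.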
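A quick simulation makes this concrete. The NumPy sketch below draws data from the random-coefficient model with hypothetical values a = 1, b = 2, σ_e = 0.5 and σ_u = 1 (chosen only for illustration), forms the composite error $e_i x_i + u_i$ using the true a and b, and compares its variance for small and large x with $x_i^2 \sigma_e^2 + \sigma_u^2$ averaged over each band of x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, for illustration only
a, b = 1.0, 2.0
sigma_e, sigma_u = 0.5, 1.0
n = 200_000

x = rng.uniform(0.0, 10.0, size=n)
e = rng.normal(0.0, sigma_e, size=n)   # random part of the slope: b_i = b + e_i
u = rng.normal(0.0, sigma_u, size=n)   # ordinary disturbance
y = a + (b + e) * x + u

# Composite error of the fixed-slope model, computed with the true a and b
v = y - (a + b * x)                    # equals e * x + u

low, high = x < 2.0, x > 8.0
var_low, var_high = v[low].var(), v[high].var()

# Theory: Var(v | x) = x^2 sigma_e^2 + sigma_u^2, averaged over each band of x
theory_low = np.mean(x[low] ** 2) * sigma_e**2 + sigma_u**2
theory_high = np.mean(x[high] ** 2) * sigma_e**2 + sigma_u**2

print(f"x < 2: simulated {var_low:.2f}, theory {theory_low:.2f}")
print(f"x > 8: simulated {var_high:.2f}, theory {theory_high:.2f}")
```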

(ii) Omitted variables


• Suppose the “true” model of y is:
  – $y_i = a + b_1 x_i + b_2 z_i + u_i$
• but the model we estimate fails to include z:
  – $y_i = a + b_1 x_i + v_i$
• then the error term in the estimated model ($v_i$) will capture the effect of the omitted variable, and so it will be correlated with z:
  – $v_i = b_2 z_i + u_i$
• and so the covariance matrix of $v$ will be non-scalar (i.e. not of the form $\sigma^2 I$).

(iii) Non-linearities
• If the true relationship is non-linear:
  – $y_i = a + b x_i^2 + u_i$
• but the regression we attempt to estimate is linear:
  – $y_i = a + b x_i + v_i$
• then the residual in this estimated regression will capture the non-linearity, and its variance will be affected accordingly:
  – $v_i = f(x_i^2, u_i)$

(iv) Aggregation
• Sometimes we aggregate our data across groups:
  – e.g. quarterly time-series data on income = average income of a group of households in a given quarter
• If this is so, and the size of the groups used to calculate the averages varies,
  ⇒ the variance of the group mean will vary across observations:
  • larger groups will have a smaller standard error of the mean (if individual values are independent with variance $\sigma^2$, the mean of a group of $n_g$ has variance $\sigma^2 / n_g$).
  ⇒ the size of the measurement error in each value of our variable will depend on the size of the group used.
• Since measurement errors will be captured by the regression residual,
  ⇒ the variance of the regression residual will vary with the size of the underlying groups on which the data are based.
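The $\sigma^2 / n_g$ result behind this argument can be checked by simulation. The sketch below assumes independent household-level measurement errors with a hypothetical standard deviation σ = 10 and three hypothetical group sizes; it averages the errors within each simulated group and compares the variance of those group means with $\sigma^2 / n_g$.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 10.0                 # hypothetical s.d. of household-level measurement error
group_sizes = [5, 20, 80]    # hypothetical numbers of households per group
n_groups = 50_000            # simulated groups per size

simulated_var = {}
for n_g in group_sizes:
    # One row per group; the row mean is the error carried into the aggregated data
    errors = rng.normal(0.0, sigma, size=(n_groups, n_g))
    group_means = errors.mean(axis=1)
    simulated_var[n_g] = group_means.var()
    print(f"n_g = {n_g:>2}: simulated {simulated_var[n_g]:.3f}, "
          f"sigma^2 / n_g = {sigma**2 / n_g:.3f}")
```

The variance falls in proportion to $1/n_g$, so observations built from small groups carry noisier errors.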

The consequences of heteroskedasticity

• The least squares results are no longer efficient, and t-test and F-test results may be misleading.

The consequences
• OLS estimators are still unbiased (unless there are also omitted variables).

• Heteroscedasticity by itself does not cause OLS estimators to be biased or inconsistent*
  – NB: neither bias nor consistency is determined by the covariance matrix of the error term.

• However, if heteroscedasticity is a symptom of omitted variables, measurement errors, or non-constant parameters,
  – ⇒ OLS estimators will be biased and inconsistent.

The consequences
• However, OLS estimators are no longer efficient (minimum variance).

• The formulae used to estimate the coefficient standard errors are no longer correct, so the t-tests will be misleading (if the error variance is positively related to an independent variable, then the estimated standard errors are biased downwards and hence the t-values will be inflated).

– Heteroskedasticity does, however, bias the OLS estimated standard errors for the estimated coefficients:
  • which means that the t-tests will not be reliable: $t = \hat{b} / \operatorname{SE}(\hat{b})$.
– F-tests are also no longer reliable
  • e.g. Chow’s second test is no longer reliable.
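A small Monte Carlo experiment illustrates both consequences: the slope estimate stays centred on its true value, while the conventional OLS standard error understates its actual sampling variability. The sketch assumes a hypothetical design in which x is standard normal and the error standard deviation equals |x| (so the error variance rises with $x^2$), and it also computes White's heteroskedasticity-consistent (HC0) standard error for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)

n, n_reps = 100, 5_000
beta0, beta1 = 1.0, 2.0          # hypothetical true coefficients

slopes, conv_se, hc0_se = [], [], []
for _ in range(n_reps):
    x = rng.normal(0.0, 1.0, size=n)
    u = rng.normal(0.0, 1.0, size=n) * np.abs(x)   # error s.d. equal to |x|
    y = beta0 + beta1 * x + u

    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b

    # Conventional OLS variance s^2 (X'X)^-1, valid only under homoscedasticity
    s2 = resid @ resid / (n - X.shape[1])
    conv_se.append(np.sqrt(s2 * XtX_inv[1, 1]))

    # White (HC0) variance (X'X)^-1 X' diag(e_i^2) X (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    hc0_se.append(np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1]))

    slopes.append(b[1])

print(f"Mean slope estimate:               {np.mean(slopes):.3f}")
print(f"Empirical s.d. of slope estimates: {np.std(slopes):.4f}")
print(f"Mean conventional OLS s.e.:        {np.mean(conv_se):.4f}")
print(f"Mean White (HC0) s.e.:             {np.mean(hc0_se):.4f}")
```

In this design the conventional standard error comes out well below the empirical standard deviation of the slope estimates, so t-values computed from it would be inflated.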

Detecting heteroskedasticity

• Testing for heteroskedasticity is closely related to tests for misspecification generally.

• Unfortunately, there is usually no straightforward way to identify the cause.

Detecting heteroskedasticity

• Visual inspection of residuals

• Formal tests:
  $H_0$: homoskedasticity
  $H_A$: heteroskedasticity

Detecting heteroskedasticity: Formal Tests
• Ramsey RESET test
• White’s test
• Goldfeld-Quandt test: suitable for a simple form of heteroskedasticity
• Breusch-Pagan test: a test of more general forms of heteroskedasticity

Ramsey RESET Test

Goldfeld-Quandt test: Type B
– S.M. Goldfeld and R.E. Quandt, "Some Tests for Homoscedasticity," Journal of the American Statistical Association, Vol. 60, 1965.

• $H_0$: $\sigma_i^2$ is not correlated with a variable z
• $H_1$: $\sigma_i^2$ is correlated with a variable z

Goldfeld-Quandt test: Procedure


(i) Order the observations in ascending order of z (the variable thought to be related to the error variance).
(ii) Omit p central observations (as a rough guide, take p ≈ n/3, where n is the total sample size).
  • This makes the difference in variances between the two remaining groups easier to identify.
(iii) Fit two regressions, i.e. a separate regression to each of the two remaining sets of observations.
  • The number of observations in each sub-sample is (n − p)/2, so we need (n − p)/2 > k, where k is the number of estimated coefficients (including the constant).

Goldfeld-Quandt test: Procedure
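(iv) Calculate the test statistic
  $G = \dfrac{RSS_2 / \left(\tfrac{1}{2}(n - p) - k\right)}{RSS_1 / \left(\tfrac{1}{2}(n - p) - k\right)}$
  where $RSS_1$ and $RSS_2$ are the residual sums of squares from the low-z and high-z sub-samples.
  G has an F distribution: $G \sim F\left[\tfrac{1}{2}(n - p) - k,\; \tfrac{1}{2}(n - p) - k\right]$
• NB: G = F* must be > 1. If it is not, invert it.
• Compare F* with the critical value F_c.
• Problem: in practice we don’t usually know what z is.
  – But if there are several possible z’s, it may not matter which one you choose if they are all highly correlated with each other.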
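The four steps can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not a reference implementation: the function name `goldfeld_quandt`, its arguments, and the simulated data (error standard deviation proportional to x) are hypothetical. When n − p is odd, one extra central observation is dropped so that the two sub-samples have equal size.

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(y, X, z, p):
    """Goldfeld-Quandt test (hypothetical helper following steps (i)-(iv)).

    y: (n,) dependent variable
    X: (n, k) regressors, including a column of ones for the constant
    z: (n,) variable thought to be related to the error variance
    p: number of central observations to omit
    Returns (G, p_value, df).
    """
    n, k = X.shape
    order = np.argsort(z)                 # (i) order observations by z
    m = (n - p) // 2                      # observations in each sub-sample
    df = m - k
    if df <= 0:
        raise ValueError("each sub-sample needs more than k observations")
    low, high = order[:m], order[n - m:]  # (ii) omit the central observations

    def rss(idx):
        # (iii) separate OLS regression on one sub-sample
        coef, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        resid = y[idx] - X[idx] @ coef
        return resid @ resid

    rss1, rss2 = rss(low), rss(high)
    G = (rss2 / df) / (rss1 / df)         # (iv) equal d.f., so G = RSS2 / RSS1
    if G < 1:
        G = 1.0 / G                       # NB: invert so that G > 1
    return G, stats.f.sf(G, df, df), df

# Usage on simulated data whose error s.d. rises with x (hypothetical)
rng = np.random.default_rng(3)
n = 90
x = rng.uniform(1.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * x
X = np.column_stack([np.ones(n), x])
G, p_value, df = goldfeld_quandt(y, X, z=x, p=n // 3)
print(f"G = {G:.2f} with ({df}, {df}) d.f., p-value = {p_value:.4g}")
```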

Breusch-Pagan Test: Type C


• T.S. Breusch and A.R. Pagan, "A Simple Test for Heteroscedasticity and Random Coefficient Variation," Econometrica, Vol. 47, 1979.

  – Assumes that:
    $\sigma_i^2 = a_1 + a_2 z_{2i} + a_3 z_{3i} + \dots + a_m z_{mi}$   [1]
    where the z’s are independent (explanatory) variables. The z’s can be some or all of the original regressors, some other variables, or some transformation of the original regressors that you think causes the heteroscedasticity:
    e.g. $\sigma_i^2 = a_1 + a_2 \exp(x_{1i}) + a_3 x_{3i}^2 + a_4 x_{4i}$

Breusch-Pagan Test
$H_0: \sigma_i^2 = \sigma^2$ for all i
$H_A: \sigma_i^2 = f(Z_1, Z_2, \dots, Z_n) = f(\gamma_1 Z_1 + \dots + \gamma_n Z_n)$

• The idea is that the error variance is related to several explanatory variables.
• These may or may not be regressors in the model; they may be a subset of the regressors or all of them.

$H_0: \gamma_1 = \gamma_2 = \dots = \gamma_n = 0$

Breusch-Pagan Test: Procedure


(i) Obtain the OLS residuals $\hat{u}_i$ from the original regression equation and construct a new variable g:
  $g_i = \hat{u}_i^2 / \hat{\sigma}^2$, where $\hat{\sigma}^2 = RSS / n$ (here RSS is the residual sum of squares of the original regression)
(ii) Regress $g_i$ on the z’s (include a constant in the regression).
(iii) Compute $B = \tfrac{1}{2} ESS$, where ESS is the explained (regression) sum of squares from the regression of $g_i$ on the z’s. B has (asymptotically) a chi-square distribution with m − 1 degrees of freedom.
(iv) Compare B with the critical value of $\chi^2_{m-1}$.
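A minimal Python sketch of steps (i) to (iv), assuming NumPy and SciPy. `breusch_pagan` and `ols_residuals` are hypothetical helpers, and the simulated data use z = x as the single variable driving the variance, so m − 1 = 1.

```python
import numpy as np
from scipy import stats

def ols_residuals(y, X):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def breusch_pagan(y, X, Z):
    """X: original regressors (with a column of ones).
    Z: the z variables (no constant column). Returns (B, p_value, df)."""
    n = len(y)
    u = ols_residuals(y, X)                # (i) OLS residuals
    sigma2_hat = u @ u / n                 # RSS / n from the original regression
    g = u**2 / sigma2_hat

    Zc = np.column_stack([np.ones(n), Z])  # (ii) regress g on a constant and the z's
    g_fitted = g - ols_residuals(g, Zc)
    ess = np.sum((g_fitted - g.mean()) ** 2)   # explained sum of squares
    B = 0.5 * ess                          # (iii)
    df = Zc.shape[1] - 1                   # m - 1
    return B, stats.chi2.sf(B, df), df     # (iv) compare with chi-square(m - 1)

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * x   # error s.d. rises with x
X = np.column_stack([np.ones(n), x])
B, p_value, df = breusch_pagan(y, X, Z=x)
print(f"B = {B:.2f}, d.f. = {df}, p-value = {p_value:.4g}")
```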

Breusch-Pagan Test: Problems
• This test is not reliable if the errors are not normally distributed and the sample size is small.
• Koenker (1981) offers an alternative calculation of the statistic that is less sensitive to non-normality in small samples:
  $B_{Koenker} = nR^2 \sim \chi^2_{m-1}$
  where n and $R^2$ are from the regression of $\hat{u}_i^2$ on the z’s, and $B_{Koenker}$ has a chi-square distribution with m − 1 degrees of freedom.
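Koenker’s version only changes the final step: the squared residuals themselves are regressed on the z’s and the statistic is $nR^2$. The sketch below assumes NumPy and SciPy; `koenker_bp` is a hypothetical helper, and the simulated errors are deliberately non-normal (Laplace draws scaled by x) to mimic the situation described above.

```python
import numpy as np
from scipy import stats

def koenker_bp(y, X, Z):
    """X: original regressors (with a column of ones); Z: z variables (no constant).
    Returns (nR^2, p_value, df)."""
    n = len(y)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ coef) ** 2                  # squared OLS residuals

    Zc = np.column_stack([np.ones(n), Z])     # auxiliary regression: u^2 on the z's
    a, *_ = np.linalg.lstsq(Zc, u2, rcond=None)
    resid = u2 - Zc @ a
    r2 = 1.0 - resid @ resid / np.sum((u2 - u2.mean()) ** 2)
    stat = n * r2
    df = Zc.shape[1] - 1                      # m - 1
    return stat, stats.chi2.sf(stat, df), df

rng = np.random.default_rng(5)
n = 1_000
x = rng.uniform(1.0, 10.0, n)
u = rng.laplace(0.0, 1.0, size=n) * x         # non-normal errors, s.d. rising with x
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
stat, p_value, df = koenker_bp(y, X, Z=x)
print(f"nR^2 = {stat:.2f}, d.f. = {df}, p-value = {p_value:.4g}")
```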

White (1980) Test: General test


– The most general test of heteroscedasticity
  • no specification of the form of heteroscedasticity is required

– (i) Run an OLS regression and use it to calculate $\hat{u}_i^2$ (i.e. the square of each residual).

White (1980) Test: Procedure
– (ii) Use $\hat{u}_i^2$ as the dependent variable in another regression, in which the regressors are:
  • (a) all k original independent variables, and
  • (b) the square of each independent variable (excluding dummy variables), and all 2-way interactions (cross-products) between the independent variables.
  – The square of a dummy variable is excluded because it is identical to the dummy variable itself, so the two would be perfectly collinear.

• Call the total number of regressors (not including the constant term) in this second equation P.

White (1980) Test: Procedure


– (iii) From the results of the second (auxiliary) regression, calculate the test statistic
  $nR^2 \sim \chi^2_P$
  where n is the sample size and $R^2$ is the unadjusted coefficient of determination.
• The statistic is asymptotically (i.e. in large samples) distributed as chi-squared with P degrees of freedom, where P is the number of regressors in the regression, not including the constant.
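Steps (i) to (iii) can be sketched as follows, assuming NumPy and SciPy. `white_test` is a hypothetical helper that treats any column containing only 0/1 values as a dummy and does not square it; the simulated data (two continuous regressors, one dummy, error s.d. rising with the first regressor) are made up for illustration. With k = 3 original regressors, the auxiliary regression has 3 levels, 2 squares and 3 cross-products, so P = 8.

```python
import itertools
import numpy as np
from scipy import stats

def white_test(y, X):
    """X: (n, k) original regressors WITHOUT the constant column.
    Returns (nR^2, p_value, P)."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    u2 = (y - X1 @ coef) ** 2                        # (i) squared OLS residuals

    cols = [X[:, j] for j in range(k)]               # (a) original regressors
    for j in range(k):                               # (b) squares, skipping 0/1 dummies
        if not np.isin(X[:, j], [0, 1]).all():
            cols.append(X[:, j] ** 2)
    for i, j in itertools.combinations(range(k), 2): # (b) 2-way cross-products
        cols.append(X[:, i] * X[:, j])

    Z = np.column_stack([np.ones(n)] + cols)         # (ii) auxiliary regression
    a, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    resid = u2 - Z @ a
    r2 = 1.0 - resid @ resid / np.sum((u2 - u2.mean()) ** 2)   # unadjusted R^2
    P = Z.shape[1] - 1                               # regressors excluding the constant
    stat = n * r2                                    # (iii) nR^2 ~ chi-square(P)
    return stat, stats.chi2.sf(stat, P), P

rng = np.random.default_rng(6)
n = 300
x2 = rng.uniform(1.0, 10.0, n)
x3 = rng.uniform(0.0, 5.0, n)
d = rng.integers(0, 2, n).astype(float)              # dummy variable
y = 1.0 + 2.0 * x2 - x3 + 0.5 * d + rng.normal(0.0, 1.0, n) * x2
X = np.column_stack([x2, x3, d])
stat, p_value, P = white_test(y, X)
print(f"nR^2 = {stat:.2f}, P = {P}, p-value = {p_value:.4g}")
```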

Notes on White’s test:
• The White test does not make any assumptions about the particular form of heteroskedasticity, and so is quite general in application.
  – It does not require that the error terms be normally distributed.
  – However, rejecting the null may be an indication of model specification error, as well as or instead of heteroskedasticity.

Notes on White’s test:


• Its generality is both a virtue and a shortcoming.
  – The null might be rejected because of heteroscedasticity, but it might also be rejected simply as a result of missing variables.
  – It is "nonconstructive" in the sense that its rejection does not provide any clear indication of how to proceed.
• NB: if you use White’s standard errors, eradicating the heteroscedasticity is less important.

White test in Brief
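1.) Run OLS on $Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + e_i$ to obtain $e_i^2$.
2.) Regress $e_i^2 = a_0 + a_1 Z_{1i} + a_2 Z_{2i} + \dots + a_K Z_{Ki} + v_i$, where the Z’s are the X’s, their squares and their cross-products.
3.) Calculate $\chi^{2*} = N \cdot R^2$, where $R^2$ is from (2.).
4.) Compare $\chi^{2*}$ with the critical value of $\chi^2$ with K d.f., where K = number of regressors in (2.), excluding the constant.

- This is a very general test.
- Unlike the other tests, you are not making any specific assumptions about the nature of the heteroskedasticity. This makes things easy!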

White’s test: Problems


• Note that although t-tests become reliable when you use White’s standard errors, F-tests are still not reliable (so Chow’s first test is still not reliable).
• White’s SEs have been found to be unreliable in small samples
  – but revised methods have been developed that allow robust SEs to be calculated for small n.

Ramsey RESET Test
1. Run the intended regression and obtain the residuals, $\hat{\varepsilon}$, and the predicted values, $\hat{y}$.
2. Specify and estimate the auxiliary regression, in which $\hat{\varepsilon}$ from the original regression is regressed on powers of the predicted values.
3. Obtain $R^2$ from the auxiliary regression and calculate the test statistic.
4. If the test statistic exceeds its critical value, reject the null hypothesis.
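The sketch below uses a widely used variant rather than the auxiliary-residual form described above: the F version of RESET, which adds powers of $\hat{y}$ (here $\hat{y}^2$ and $\hat{y}^3$) to the original regression and tests whether their coefficients are jointly zero. It assumes NumPy and SciPy; `reset_f_test` and the simulated quadratic data are hypothetical.

```python
import numpy as np
from scipy import stats

def reset_f_test(y, X, max_power=3):
    """F form of Ramsey's RESET.
    X: (n, k) regressors including the constant column.
    Returns (F, p_value, (q, df2))."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ coef
    rss_r = np.sum((y - y_hat) ** 2)                  # restricted (original) model

    powers = [y_hat ** p for p in range(2, max_power + 1)]
    X_aug = np.column_stack([X] + powers)             # add y_hat^2, ..., y_hat^max_power
    coef_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    rss_u = np.sum((y - X_aug @ coef_aug) ** 2)       # unrestricted model

    q = len(powers)
    df2 = n - X_aug.shape[1]
    F = ((rss_r - rss_u) / q) / (rss_u / df2)
    return F, stats.f.sf(F, q, df2), (q, df2)

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(0.0, 5.0, n)
y = 1.0 + 0.5 * x**2 + rng.normal(0.0, 1.0, n)       # true model is quadratic in x
X = np.column_stack([np.ones(n), x])                  # but a linear model is fitted
F, p_value, dfs = reset_f_test(y, X)
print(f"F = {F:.2f}, d.f. = {dfs}, p-value = {p_value:.4g}")
```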

Heteroscedasticity
Solutions/Correction:
• GLS estimation
• Weighted Least Squares
• Respecification of the model
  – Include relevant omitted variable(s)
  – Express the model in log-linear form
  – Express variables in per capita form
• White’s standard errors
  – Where respecification won’t solve the problem, use robust heteroskedasticity-consistent standard errors (White standard errors)

GLS/Weighted Least Squares
• GLS is an alternative estimator that is more efficient than OLS when the errors are heteroscedastic.

Q: How did we derive the OLS estimator?
A: By minimizing the residual sum of squares (RSS).
• The WLS estimator is derived in (almost) the same way: we minimize the sum of squared residuals after weighting each observation by the inverse of its error variance. This specific example assumes that the variances are known. As we will see later, we can use other estimators or variables as weights and the principle will still work.

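OLS estimator objective:
  $\min \sum_{i=1}^{n} (Y_i - B_1 - B_2 X_i)^2$
WLS or GLS estimator objective:
  $\min \sum_{i=1}^{n} \left( \dfrac{Y_i - B_1 - B_2 X_i}{\sigma_i} \right)^2$

The two are the same except that in GLS each observation’s squared residual is weighted by the inverse of its error variance, $1/\sigma_i^2$.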
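The weighted objective can be minimized by dividing every variable in the regression, including the column of ones, by $\sigma_i$ and running OLS on the transformed data. The Monte Carlo sketch below (NumPy only) assumes the $\sigma_i$ are known, as in this example, and uses a hypothetical design with $\sigma_i = x_i$; it compares the sampling variance of the OLS and WLS slope estimates.

```python
import numpy as np

def wls_known_sigma(y, X, sigma):
    """Minimise sum(((y_i - x_i'b) / sigma_i)^2) by OLS on data divided by sigma_i."""
    w = 1.0 / sigma
    coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return coef

rng = np.random.default_rng(8)
n, n_reps = 50, 3_000
ols_slopes, wls_slopes = [], []
for _ in range(n_reps):
    x = rng.uniform(1.0, 10.0, n)
    sigma = x                                         # hypothetical known error s.d.
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * sigma
    X = np.column_stack([np.ones(n), x])
    ols_slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    wls_slopes.append(wls_known_sigma(y, X, sigma)[1])

print(f"OLS: mean slope {np.mean(ols_slopes):.3f}, variance {np.var(ols_slopes):.4f}")
print(f"WLS: mean slope {np.mean(wls_slopes):.3f}, variance {np.var(wls_slopes):.4f}")
```

Both estimators are centred on the true slope, but the WLS estimates are noticeably less dispersed, which is the efficiency gain described above.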

• A. Weighted Least Squares
  – If the differences in the variability of the error term can be predicted from another variable within the model, the Weight Estimation procedure (available in EViews) can be used.
    • It computes the coefficients of a linear regression model using WLS, such that the more precise observations (that is, those with less variability) are given greater weight in determining the regression coefficients.
  – Problems:
    • A wrong choice of weights can produce biased estimates of the standard errors.
      – Since we can never know for sure whether we have chosen the correct weights, this is a real problem.
    • If the weights are correlated with the disturbance term, then the WLS slope estimates will be inconsistent.

• C. White’s Standard Errors
  – White (op. cit.) developed an algorithm for correcting the standard errors in OLS when heteroscedasticity is present.
  – The correction procedure does not assume any particular form of heteroscedasticity, and so in some ways White has “solved” the heteroscedasticity problem.
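White’s correction replaces $s^2 (X'X)^{-1}$ with the "sandwich" $(X'X)^{-1} X' \operatorname{diag}(\hat{e}_i^2) X (X'X)^{-1}$. The NumPy sketch below computes it (labelled HC0) together with two small-sample variants from the later literature: HC1, which scales by n/(n − k), and HC3, which divides each $\hat{e}_i^2$ by $(1 - h_{ii})^2$, where $h_{ii}$ is the leverage of observation i. These are examples of the revised small-sample methods mentioned earlier. The helper name `robust_se` and the simulated data are hypothetical.

```python
import numpy as np

def robust_se(X, y, kind="HC0"):
    """OLS coefficients with heteroskedasticity-consistent standard errors.
    X: (n, k) regressors including the constant column."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    e = y - X @ coef
    if kind == "HC0":
        w = e**2
    elif kind == "HC1":
        w = e**2 * n / (n - k)
    elif kind == "HC3":
        h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages h_ii
        w = e**2 / (1.0 - h) ** 2
    else:
        raise ValueError(f"unknown kind: {kind}")
    meat = X.T @ (X * w[:, None])                     # X' diag(w) X
    cov = XtX_inv @ meat @ XtX_inv                    # sandwich estimator
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(9)
n = 40
x = rng.uniform(1.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * x      # error s.d. rises with x
X = np.column_stack([np.ones(n), x])

coef, se0 = robust_se(X, y, "HC0")
_, se1 = robust_se(X, y, "HC1")
_, se3 = robust_se(X, y, "HC3")
print("coefficients:", np.round(coef, 3))
print("HC0 s.e.:", np.round(se0, 3))
print("HC1 s.e.:", np.round(se1, 3))
print("HC3 s.e.:", np.round(se3, 3))
```

Because every HC3 weight is at least as large as the corresponding HC0 weight, HC3 standard errors are never smaller than HC0 ones, which is the intended small-sample correction.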

Summary
