For the $k$-variable regression involving the explanatory variables $X_1, X_2, \ldots, X_k$ (where $X_1 = 1$ for all observations to allow for the intercept term), an exact linear relationship is said to exist if the following condition is satisfied:

$$\lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_k X_k = 0 \qquad (1)$$

where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are constants such that not all of them are zero simultaneously.
Today, however, the term multicollinearity is used in a broader sense to include the case where the $X$ variables are intercorrelated but not perfectly so, as follows:

$$\lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_k X_k + v_i = 0 \qquad (2)$$

where $v_i$ is a stochastic error term.
Multicollinearity may also arise from nonlinear relationships among the regressors, as in the following regression model:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i \qquad (3)$$

where, say, $Y$ is the total cost of production and $X$ is the output. The variables $X_i^2$ and $X_i^3$ are obviously functionally related to $X_i$, although the relationship is not linear.
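A minimal simulation sketch of this point (the data and the narrow range of $X$ are assumed for illustration): when the range of $X$ is small, the polynomial terms $X^2$ and $X^3$ are nearly perfectly correlated with $X$.

```python
# Sketch: polynomial terms X^2 and X^3 are highly correlated with X
# when X varies over a narrow range (simulated data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(10, 12, size=100)          # "output" over a narrow range
terms = np.column_stack([X, X**2, X**3])

# All pairwise correlations among X, X^2, X^3 come out close to 1 here.
print(np.corrcoef(terms, rowvar=False).round(4))
```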
Note:

The difference between perfect and less-than-perfect multicollinearity can be seen as follows. Let $\lambda_2 \neq 0$; then (1) can be written as

$$X_{2i} = -\frac{\lambda_1}{\lambda_2} X_{1i} - \frac{\lambda_3}{\lambda_2} X_{3i} - \cdots - \frac{\lambda_k}{\lambda_2} X_{ki} \qquad (4)$$

which shows how $X_2$ is exactly linearly related to the other variables, or how it can be derived from a linear combination of the other $X$ variables. In this situation, the coefficient of correlation between the variable $X_2$ and the linear combination on the right side of (4) is bound to be unity. Similarly, if $\lambda_2 \neq 0$, then (2) can be written as

$$X_{2i} = -\frac{\lambda_1}{\lambda_2} X_{1i} - \frac{\lambda_3}{\lambda_2} X_{3i} - \cdots - \frac{\lambda_k}{\lambda_2} X_{ki} - \frac{1}{\lambda_2} v_i \qquad (5)$$

which shows that $X_2$ is not an exact linear combination of the other $X$'s, because it is also determined by the stochastic error term $v_i$.
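A small numerical check of (4) versus (5), with made-up coefficients: under an exact linear relation the correlation with the linear combination is exactly 1, while adding a stochastic term $v_i$ pushes it below 1.

```python
# Sketch: exact linear relation (4) gives correlation 1 with the
# combination; the stochastic version (5) gives correlation below 1.
import numpy as np

rng = np.random.default_rng(1)
X1 = np.ones(50)                        # intercept column
X3 = rng.normal(size=50)

combo = 2.0 * X1 - 0.5 * X3             # an assumed linear combination
X2_exact = combo                        # as in (4)
X2_noisy = combo + rng.normal(scale=0.3, size=50)  # as in (5)

print(np.corrcoef(X2_exact, combo)[0, 1])   # exactly 1.0
print(np.corrcoef(X2_noisy, combo)[0, 1])   # high, but less than 1
```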
Why Does the Classical Linear Regression Model Assume That There Is No Collinearity Among the $X$'s?

The reason is this: if multicollinearity is perfect, the regression coefficients of the $X$ variables are indeterminate and their standard errors are infinite. If multicollinearity is less than perfect, the regression coefficients, although determinate, possess large standard errors (in relation to the coefficients themselves), which means the coefficients cannot be estimated with great precision or accuracy.
Sources of Multicollinearity:

Multicollinearity may be due to the following factors:

a) There is a tendency of economic variables to move together over time.

b) The use of lagged values of some explanatory variables as separate independent factors in the relationship.

c) The data collection method employed, for example, sampling over a limited range of the values taken by the regressors in the population.

d) Constraints on the model or in the population being sampled. For example, in the regression of electricity consumption on income ($X_2$) and house size ($X_3$), there is a physical constraint in the population in that families with higher incomes generally have larger homes than families with lower incomes.

e) Model specification, for example, adding polynomial terms to a regression model, especially when the range of the $X$ variable is small.

f) An overdetermined model. This happens when the model has more explanatory variables than the number of observations. This could happen in medical research, where there may be a small number of patients about whom information is collected on a large number of variables.
Theorem: If the intercorrelation between the explanatory variables is perfect ($r_{x_i x_j} = \pm 1$), then

a) the estimates of the coefficients are indeterminate, and

b) the standard errors of these estimates become infinitely large.
Proof (a):

Suppose that the relation to be estimated is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$ and that $X_1$ and $X_2$ are related by the exact relation $X_2 = kX_1$, where $k$ is an arbitrary constant. The formula for the estimation of the coefficient $\beta_1$ is (in deviation form)

$$\hat{\beta}_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$$

Substituting $x_2 = kx_1$ gives

$$\hat{\beta}_1 = \frac{(\sum x_1 y)(k^2 \sum x_1^2) - (k \sum x_1 y)(k \sum x_1^2)}{(\sum x_1^2)(k^2 \sum x_1^2) - k^2 (\sum x_1^2)^2} = \frac{0}{0}$$

and similarly $\hat{\beta}_2 = 0/0$. Therefore the parameters are indeterminate, i.e., there is no way of finding separate values of each coefficient.
Proof (b):

If $r_{x_1 x_2} = \pm 1$, the standard errors of the estimates become infinitely large. In the model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$$

the variances of $\hat{\beta}_1$ and $\hat{\beta}_2$ are

$$V(\hat{\beta}_1) = \frac{\sigma_u^2 \sum x_2^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}, \qquad V(\hat{\beta}_2) = \frac{\sigma_u^2 \sum x_1^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$$

If $X_1$ and $X_2$ are perfectly correlated, so that $X_2 = kX_1$ where $k$ is an arbitrary constant, these become

$$V(\hat{\beta}_1) = \frac{\sigma_u^2 \, k^2 \sum x_1^2}{k^2 (\sum x_1^2)^2 - k^2 (\sum x_1^2)^2} = \frac{\sigma_u^2 \, k^2 \sum x_1^2}{0} \to \infty$$

$$V(\hat{\beta}_2) = \frac{\sigma_u^2 \sum x_1^2}{0} \to \infty$$

so both variances, and hence both standard errors, become infinitely large.
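A minimal numerical sketch of Proofs (a) and (b), with simulated data: when $X_2 = kX_1$, the matrix $X'X$ is singular, so the normal equations have no unique solution, and the denominator in the variance formulas is exactly zero.

```python
# Sketch: perfect collinearity (x2 = k*x1) makes X'X singular and the
# variance denominator zero (simulated data).
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=30)
k = 2.0
x2 = k * x1                                  # perfect collinearity

X = np.column_stack([np.ones(30), x1, x2])
XtX = X.T @ X
print(np.linalg.det(XtX))                    # ~0: X'X is singular
print(np.linalg.matrix_rank(XtX))            # rank 2, not 3

# Denominator of V(beta_1) and V(beta_2): zero up to floating point.
denom = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2) ** 2
print(denom)
```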
Estimation in the Presence of Perfect Multicollinearity:

If $X_3$ and $X_2$ are perfectly collinear, there is no way $X_3$ can be kept constant: as $X_2$ changes, so does $X_3$. It means that there is no way to get the separate influences of $X_3$ and $X_2$ from the given sample. In deviation form,

$$y_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i \qquad (1)$$

Since $X_2$ and $X_3$ are perfectly correlated, we may assume that they are related as $x_{3i} = k x_{2i}$. Substituting this into (1), we get

$$y_i = (\hat{\beta}_2 + k\hat{\beta}_3) x_{2i} + \hat{u}_i = \hat{\alpha} x_{2i} + \hat{u}_i \qquad (2)$$

where

$$\hat{\alpha} = \hat{\beta}_2 + k\hat{\beta}_3 \qquad (3)$$

Applying the usual OLS formula to (2), we get

$$\hat{\alpha} = \frac{\sum x_{2i} y_i}{\sum x_{2i}^2} \qquad (4)$$

But (3) gives us only one equation in two unknowns, and there is an infinity of solutions to (3) for given values of $\hat{\alpha}$ and $k$. So in the case of perfect multicollinearity one cannot get a unique solution for the individual regression coefficients. But one can get a unique solution for linear combinations of these coefficients: the linear combination $\alpha = \beta_2 + k\beta_3$ is uniquely estimated by (4).
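A sketch of this point with simulated data and assumed parameter values: OLS of $y$ on $x_2$ alone recovers $\alpha = \beta_2 + k\beta_3$ uniquely, even though $\beta_2$ and $\beta_3$ are separately unidentified.

```python
# Sketch: only the linear combination alpha = beta2 + k*beta3 is
# estimable under perfect collinearity (simulated data).
import numpy as np

rng = np.random.default_rng(3)
k, beta2, beta3 = 2.0, 1.0, 0.5              # assumed true values
x2 = rng.normal(size=200)
x3 = k * x2                                  # perfect collinearity
y = beta2 * x2 + beta3 * x3 + rng.normal(size=200)

alpha_hat = np.sum(x2 * y) / np.sum(x2**2)   # OLS formula (4)
print(alpha_hat, beta2 + k * beta3)          # both close to 2.0
```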
Estimation in the Presence of High but Imperfect Multicollinearity:

Consider again the model in deviation form

$$y_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i \qquad (1)$$

Since $X_2$ and $X_3$ are not perfectly correlated, we may assume that they are related as

$$x_{3i} = k x_{2i} + v_i \qquad (2)$$

where $k \neq 0$ and $v_i$ is a stochastic error term such that $\sum x_{2i} v_i = 0$. In this case, estimation of the regression coefficients $\beta_2$ and $\beta_3$ may be possible. Substituting (2) into (1), we get

$$\hat{\beta}_2 = \frac{(\sum y_i x_{2i})(k^2 \sum x_{2i}^2 + \sum v_i^2) - (k \sum y_i x_{2i} + \sum y_i v_i)(k \sum x_{2i}^2)}{(\sum x_{2i}^2)(k^2 \sum x_{2i}^2 + \sum v_i^2) - (k \sum x_{2i}^2)^2} \qquad (3)$$

where use is made of $\sum x_{2i} v_i = 0$; a similar expression can be derived for $\hat{\beta}_3$. There is no reason to believe a priori that (3) cannot be estimated. But if $v_i$ is sufficiently small, say, very close to zero, (2) will indicate almost perfect collinearity and we shall be back to the indeterminate case of perfect collinearity.
Consequences of Multicollinearity:

In cases of near or high multicollinearity, one is likely to encounter the following consequences:

1. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult.

2. Because of consequence 1, the confidence intervals tend to be much wider, leading to the acceptance of the zero null hypothesis (i.e., that the true population coefficient is zero) more readily.

3. Also because of consequence 1, the $t$ ratio of one or more coefficients tends to be statistically insignificant.

4. Although the $t$ ratio of one or more coefficients is statistically insignificant, $R^2$, the overall measure of goodness of fit, can be very high.

5. The OLS estimators and their standard errors can be sensitive to small changes in the data.
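A simulation sketch of consequences 1 through 4 (all data and coefficients are assumed): with nearly collinear regressors, the slope standard errors are inflated, so the individual $t$ ratios tend to be small even while $R^2$ stays high.

```python
# Sketch: near-perfect collinearity yields small slope t ratios
# together with a high R^2 (simulated data).
import numpy as np

rng = np.random.default_rng(4)
n = 50
x2 = rng.normal(size=n)
x3 = 0.99 * x2 + 0.01 * rng.normal(size=n)   # near-perfect collinearity
X = np.column_stack([np.ones(n), x2, x3])
y = 1 + x2 + x3 + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
s2 = resid @ resid / (n - 3)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

print("t ratios:", (beta / se).round(2))     # slope t's tend to be small
print("R^2:", 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean())))
```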
Large Variances and Covariances of OLS Estimators:

For the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (1)$$

the variances and covariance of $\hat{\beta}_2$ and $\hat{\beta}_3$ are given by

$$V(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)} \qquad (2)$$

$$V(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2 (1 - r_{23}^2)} \qquad (3)$$

$$\operatorname{cov}(\hat{\beta}_2, \hat{\beta}_3) = \frac{-r_{23} \, \sigma^2}{(1 - r_{23}^2) \sqrt{\sum x_{2i}^2} \sqrt{\sum x_{3i}^2}} \qquad (4)$$

where $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$. It is apparent from (2) and (3) that as $r_{23}$ tends toward 1, that is, as collinearity increases, the variances of the two estimators increase, and in the limit when $r_{23} = 1$ they are infinite. It is equally clear from (4) that as $r_{23}$ increases toward 1, the covariance of the two estimators also increases in absolute value.
The speed with which variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as

$$\mathrm{VIF} = \frac{1}{1 - r_{23}^2} \qquad (5)$$

VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As $r_{23}^2$ approaches 1, the VIF approaches infinity. That is, as the extent of collinearity increases, the variance of an estimator increases, and in the limit it can become infinite. Using the VIF, we can write

$$V(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2} \, \mathrm{VIF}, \qquad V(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2} \, \mathrm{VIF}$$

which show that the variances of $\hat{\beta}_2$ and $\hat{\beta}_3$ are directly proportional to the VIF.
For the $k$-variable model, the variance of the $j$th partial regression coefficient can be expressed as

$$V(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2} \left( \frac{1}{1 - R_j^2} \right) = \frac{\sigma^2}{\sum x_j^2} \, \mathrm{VIF}_j \qquad (6)$$

where $R_j^2$ is the $R^2$ in the regression of $X_j$ on the remaining $X$ regressors. The inverse of the VIF is called tolerance (TOL):

$$\mathrm{TOL}_j = \frac{1}{\mathrm{VIF}_j} = 1 - R_j^2 \qquad (7)$$

When $R_j^2 = 1$ (i.e., perfect collinearity), $\mathrm{TOL}_j = 0$, and when $R_j^2 = 0$ (i.e., no collinearity whatsoever), $\mathrm{TOL}_j$ is 1. Because of the intimate connection between VIF and TOL, one can use them interchangeably.
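A minimal sketch of (6) and (7) on simulated data: $\mathrm{VIF}_j$ and $\mathrm{TOL}_j$ are computed directly from the $R_j^2$ of the auxiliary regression of $X_j$ on the other regressors.

```python
# Sketch: VIF_j = 1/(1 - R_j^2) and TOL_j = 1/VIF_j from the auxiliary
# regression of column j on the remaining columns (simulated data).
import numpy as np

def vif(X, j):
    """R^2 of column j on the other columns (plus intercept), then 1/(1-R^2)."""
    others = np.column_stack([np.ones(len(X))] +
                             [X[:, m] for m in range(X.shape[1]) if m != j])
    xj = X[:, j]
    coef = np.linalg.lstsq(others, xj, rcond=None)[0]
    resid = xj - others @ coef
    r2 = 1 - resid @ resid / np.sum((xj - xj.mean())**2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(5)
x2 = rng.normal(size=100)
x3 = 0.95 * x2 + 0.3 * rng.normal(size=100)   # collinear with x2
X = np.column_stack([x2, x3])

for j in range(2):
    v = vif(X, j)
    print(f"VIF_{j+2} = {v:.2f}, TOL_{j+2} = {1/v:.3f}")
```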
Insignificant $t$ Ratios:

To test the null hypothesis that, say, $\beta_2 = 0$, we use the ratio $t = \hat{\beta}_2 / \operatorname{se}(\hat{\beta}_2)$ and compare the estimated $t$ value with the critical $t$ value from the $t$ table. But in cases of high collinearity the estimated standard errors increase dramatically, thereby making the $t$ values smaller. Therefore, in such cases, one will increasingly accept the null hypothesis that the relevant true population value is zero. Part of the problem is the covariances between the estimators, which, as (4) shows, are related to the correlations between the regressors.
Sensitivity of OLS Estimators and Their Standard Errors to Small Changes in Data:
As long as multicollinearity is not perfect, estimation of the regression coefficients is possible, but the estimates and their standard errors become very sensitive to even the slightest change in the data.
Detection of Multicollinearity:

There are different methods of detecting multicollinearity, such as:

a) High $R^2$ but few significant $t$ ratios

b) High pairwise correlations among regressors

c) Examination of partial correlations

d) Auxiliary regressions

e) Eigenvalues and condition index

f) Tolerance and variance inflation factor
a) High $R^2$ but Few Significant $t$ Ratios:

If $R^2$ is high, say, in excess of 0.8, the $F$ test in most cases will reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual $t$ tests will show that none or very few of the partial slope coefficients are statistically different from zero.
b) High Pairwise Correlations Among Regressors:

Consider the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$$

and suppose that $X_{4i} = \lambda_2 X_{2i} + \lambda_3 X_{3i}$, where $\lambda_2$ and $\lambda_3$ are constants, not both zero. Obviously, $X_4$ is an exact linear combination of $X_2$ and $X_3$, giving $R_{4 \cdot 23}^2 = 1$, where $R_{4 \cdot 23}^2$ is the coefficient of determination in the regression of $X_4$ on $X_2$ and $X_3$. We know that

$$R_{4 \cdot 23}^2 = \frac{r_{42}^2 + r_{43}^2 - 2 r_{42} r_{43} r_{23}}{1 - r_{23}^2} \qquad (1)$$

so that here

$$1 = \frac{r_{42}^2 + r_{43}^2 - 2 r_{42} r_{43} r_{23}}{1 - r_{23}^2} \qquad (2)$$

It can be seen that (2) is satisfied by $r_{42} = 0.5$, $r_{43} = 0.5$ and $r_{23} = -0.5$, which are not very high values. Therefore, in models involving more than two explanatory variables, the simple or zero-order correlation will not provide an infallible guide to the presence of multicollinearity. Of course, if there are only two explanatory variables, the zero-order correlations will be sufficient.
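As a quick arithmetic check of the claim above, the stated correlation values can be plugged into (2):

```python
# Check that r42 = 0.5, r43 = 0.5 and r23 = -0.5 satisfy (2).
r42, r43, r23 = 0.5, 0.5, -0.5
R2 = (r42**2 + r43**2 - 2 * r42 * r43 * r23) / (1 - r23**2)
print(R2)   # 1.0, even though no pairwise correlation exceeds 0.5 in size
```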
c) Examination of Partial Correlations:

Suppose the overall $R^2$ is very high but the partial correlations $r_{12 \cdot 34}^2$, $r_{13 \cdot 24}^2$ and $r_{14 \cdot 23}^2$ are comparatively low, which suggests that the variables $X_2$, $X_3$ and $X_4$ are highly intercorrelated and that at least one of these variables is superfluous.
d) Auxiliary Regressions:

One way of finding out which $X$ variable is related to the other $X$ variables is to regress each $X_i$ on the remaining $X$ variables and compute the corresponding $R^2$. Each one of these regressions is called an auxiliary regression, auxiliary to the main regression of $Y$ on the $X$'s. Then

$$F_i = \frac{R_{x_i \cdot x_2 \cdots x_k}^2 / (k - 2)}{\left(1 - R_{x_i \cdot x_2 \cdots x_k}^2\right) / (n - k + 1)} \sim F_{(k-2),\,(n-k+1)}$$

where $n$ stands for the sample size, $k$ stands for the number of explanatory variables including the intercept term, and $R_{x_i \cdot x_2 \cdots x_k}^2$ is the coefficient of determination in the regression of variable $X_i$ on the remaining $X$'s. If the computed $F_i$ exceeds the critical $F$ at the chosen level of significance, it is taken to mean that the particular $X_i$ is collinear with the other $X$'s; if it does not exceed the critical $F$, we say that it is not collinear with the other $X$'s.
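A sketch of this $F$ test on simulated data (the significance level of 0.05 and the data-generating process are assumed for illustration):

```python
# Sketch: auxiliary-regression F test for collinearity of x2 with the
# remaining regressors (simulated data).
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(6)
n, k = 100, 4                               # k includes the intercept term
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + 0.4 * rng.normal(size=n)    # x3 collinear with x2
x4 = rng.normal(size=n)

def aux_r2(target, others):
    """R^2 of the auxiliary regression of `target` on `others` plus intercept."""
    Z = np.column_stack([np.ones(n)] + others)
    resid = target - Z @ np.linalg.lstsq(Z, target, rcond=None)[0]
    return 1 - resid @ resid / np.sum((target - target.mean())**2)

R2 = aux_r2(x2, [x3, x4])
F = (R2 / (k - 2)) / ((1 - R2) / (n - k + 1))
# Compare the computed F with the 5% critical value of F(k-2, n-k+1).
print(F, f_dist.ppf(0.95, k - 2, n - k + 1))
```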
e) Eigenvalues and Condition Index:

From the eigenvalues of the matrix $X'X$ we can derive the condition number

$$k = \frac{\text{maximum eigenvalue}}{\text{minimum eigenvalue}}$$

and the condition index $\mathrm{CI} = \sqrt{k}$. If $k$ is between 100 and 1000, there is moderate to strong multicollinearity, and if it exceeds 1000, there is severe multicollinearity. Alternatively, if the CI ($= \sqrt{k}$) is between 10 and 30, there is moderate to strong multicollinearity, and if it exceeds 30, there is severe multicollinearity.
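A minimal sketch of these diagnostics on simulated, nearly collinear data:

```python
# Sketch: condition number k and condition index CI from the eigenvalues
# of X'X (simulated data).
import numpy as np

rng = np.random.default_rng(7)
n = 100
x2 = rng.normal(size=n)
x3 = 0.99 * x2 + 0.05 * rng.normal(size=n)   # nearly collinear pair
X = np.column_stack([np.ones(n), x2, x3])

eigvals = np.linalg.eigvalsh(X.T @ X)        # eigenvalues of symmetric X'X
kappa = eigvals.max() / eigvals.min()
print("k =", kappa, " CI =", np.sqrt(kappa))
```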
f) Tolerance and Variance Inflation Factor:

The larger the value of $\mathrm{VIF}_j$, the more collinear the variable $X_j$ is with the other regressors. As a rule of thumb, if the VIF of a variable exceeds 10 (which will happen if $R_j^2$ exceeds 0.90), that variable is said to be highly collinear. Alternatively, TOL can be used as a measure of multicollinearity: the closer $\mathrm{TOL}_j$ is to zero, the greater the degree of collinearity of that variable with the other regressors. On the other hand, the closer $\mathrm{TOL}_j$ is to 1, the greater the evidence that $X_j$ is not collinear with the other regressors.
Remedial Measures:

There are some rules for removing multicollinearity:

a) A priori information

b) Combining cross-sectional and time series data

c) Dropping a variable(s) and specification bias

d) Transformation of variables

e) Additional or new data

f) Reducing collinearity in polynomial regressions
a) A Priori Information:

Suppose we consider the model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, where we have a priori information about the relationship between $\beta_2$ and $\beta_3$. We can get such a priori information from previous empirical work, in which the collinearity problem happens to be less serious, or from the relevant theory underlying the field of study. For instance, if we believe a priori that $\beta_3 = 0.10 \beta_2$, the model can be rewritten as $Y_i = \beta_1 + \beta_2 (X_{2i} + 0.10 X_{3i}) + u_i = \beta_1 + \beta_2 X_i + u_i$, which involves only one regressor and hence no collinearity.
b) Combining Cross-Sectional and Time Series Data:

Suppose we have the time series model

$$\ln Y_t = \beta_1 + \beta_2 \ln P_t + \beta_3 \ln I_t + u_t$$

where $Y$ = number of cars sold, $P$ = average price, $I$ = income, and $t$ = time. Our objective is to estimate the price elasticity $\beta_2$ and the income elasticity $\beta_3$. In time series data, the price and income variables generally tend to be highly collinear; therefore, we cannot run the preceding regression as it stands. A way out of this is as follows: if we have cross-sectional data, we can obtain a fairly reliable estimate of the income elasticity $\beta_3$, because in such data, which are at a point in time, the prices do not vary much. Let the cross-sectionally estimated income elasticity be $\hat{\beta}_3$. Using this estimate, we may write the time series regression as

$$Y_t^* = \beta_1 + \beta_2 \ln P_t + u_t$$

where $Y_t^* = \ln Y_t - \hat{\beta}_3 \ln I_t$, that is, $Y^*$ represents the value of $Y$ after removing from it the effect of income. We can now obtain an estimate of the price elasticity $\beta_2$ from the preceding regression.
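A sketch of this two-step idea with entirely simulated data (all elasticities, trends, and noise levels below are assumed): first estimate $\hat{\beta}_3$ from a cross-section, then impose it on the time series and regress $Y^*$ on $\ln P$ alone.

```python
# Sketch: estimate the income elasticity cross-sectionally, then impose
# it on the time series to recover the price elasticity (simulated data).
import numpy as np

rng = np.random.default_rng(8)

# Step 1: cross-section at one point in time -- prices barely vary,
# so ln Y on ln I gives a clean estimate of beta3 (true value 0.8).
lnI_cs = rng.normal(10, 1, size=200)
lnY_cs = 1.0 + 0.8 * lnI_cs + rng.normal(scale=0.2, size=200)
b3_hat = np.polyfit(lnI_cs, lnY_cs, 1)[0]

# Step 2: time series -- price and income trend together (collinear),
# so form Y* = ln Y - b3_hat * ln I and regress it on ln P alone.
t = np.arange(50)
lnI_ts = 9 + 0.02 * t + rng.normal(scale=0.05, size=50)
lnP_ts = 2 + 0.01 * t + rng.normal(scale=0.05, size=50)
lnY_ts = 1.0 - 1.2 * lnP_ts + 0.8 * lnI_ts + rng.normal(scale=0.1, size=50)

Ystar = lnY_ts - b3_hat * lnI_ts
b2_hat = np.polyfit(lnP_ts, Ystar, 1)[0]
print("price elasticity estimate:", b2_hat)   # close to the true -1.2
```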
d) Transformation of Variables:

Suppose we have time series data on consumption expenditure, income, and wealth. Income and wealth are highly correlated. One way of minimizing this dependence is as follows. If the relation

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t \qquad (1)$$

holds at time $t$, it must also hold at time $t - 1$, because the origin of time is arbitrary anyway. Therefore, we have

$$Y_{t-1} = \beta_1 + \beta_2 X_{2,t-1} + \beta_3 X_{3,t-1} + u_{t-1} \qquad (2)$$

If we subtract (2) from (1), we obtain

$$Y_t - Y_{t-1} = \beta_2 (X_{2t} - X_{2,t-1}) + \beta_3 (X_{3t} - X_{3,t-1}) + v_t \qquad (3)$$

where $v_t = u_t - u_{t-1}$. Equation (3) is known as the first difference form. The first difference regression model often reduces the severity of multicollinearity, because there is no a priori reason to believe that the differences of the variables will also be highly correlated. Another type of transformation is the ratio transformation.
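A quick sketch of why first differencing can help (the trending series below are simulated for illustration): income and wealth share a common trend in levels, but their first differences need not be correlated.

```python
# Sketch: two trending series are nearly collinear in levels, but their
# first differences are only weakly correlated (simulated data).
import numpy as np

rng = np.random.default_rng(9)
t = np.arange(100)
income = 100 + 2 * t + rng.normal(scale=3, size=100)
wealth = 500 + 10 * t + rng.normal(scale=15, size=100)

print(np.corrcoef(income, wealth)[0, 1])                    # near 1 in levels
print(np.corrcoef(np.diff(income), np.diff(wealth))[0, 1])  # much smaller
```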