Agenda
Definition of multicollinearity
Is multicollinearity really a problem?
Nature of multicollinearity
Practical consequences
Detection of multicollinearity
Remedial measures
Definition
Meaning of multicollinearity: the presence of a perfect (or less-than-perfect) linear relationship among some or all of the explanatory variables (Xs)
Assumption of OLS: the X variables are independent; that is, they are not correlated with each other
Example
As a numerical example, consider the following hypothetical data:

X2    X3    X*3
10    50    52
15    75    75
18    90    97
24   120   129
30   150   152
Explanation
It is apparent that X3i = 5X2i; therefore, there is perfect collinearity between X2 and X3, since the coefficient of correlation r23 is unity. The variable X*3 was created from X3 by simply adding to it the following numbers, taken from a table of random numbers: 2, 0, 7, 9, 2. Now there is no longer perfect collinearity between X2 and X*3 (X*3i = 5X2i + vi). However, the two variables remain highly correlated: the coefficient of correlation between them is 0.9959.
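The two correlations above can be verified directly from the slides' data; a minimal sketch using numpy:

```python
import numpy as np

# Data from the slides: X3 = 5 * X2 exactly (perfect collinearity);
# X*3 adds the random numbers 2, 0, 7, 9, 2 to X3.
X2 = np.array([10, 15, 18, 24, 30], dtype=float)
X3 = 5 * X2
X3s = X3 + np.array([2, 0, 7, 9, 2], dtype=float)   # X*3

r23 = np.corrcoef(X2, X3)[0, 1]    # perfect collinearity: r = 1
r2s = np.corrcoef(X2, X3s)[0, 1]   # high but no longer perfect

print(round(r23, 4), round(r2s, 4))  # 1.0 0.9959
```

This confirms the figures quoted in the text: r23 is unity, while the correlation between X2 and X*3 is 0.9959.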
Diagram
If I (income) and W (wealth) have an exact linear relationship (i.e., if they move exactly together), there is no way to assess their separate influences on consumption.
Sources of multicollinearity
Model constraints or specification: Electricity consumption = f(Income, House size). Both Xs are important for the model, but they tend to be correlated, because families with high incomes generally have larger homes than otherwise.
Overestimated model: when the model has more Xs than the number of observations. Example: information collected on a large number of variables from a small number of patients.
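The "overestimated model" case can be made concrete: when there are more regressors (k) than observations (n), the design matrix cannot have full column rank, so the OLS normal equations have no unique solution. A minimal sketch with hypothetical numbers (3 patients, 5 variables):

```python
import numpy as np

# Hypothetical overestimated model: n = 3 observations, k = 5 regressors.
rng = np.random.default_rng(0)
n, k = 3, 5
X = rng.normal(size=(n, k))

# The rank of X can be at most n = 3, which is less than k = 5,
# so X'X is singular and the OLS coefficients are not identified.
rank = np.linalg.matrix_rank(X)
print(rank, k)  # 3 5
```

Because rank(X) < k here, this is an extreme (exact) form of multicollinearity built into the data collection itself.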
Data generation
Most economic data are not obtained in controlled laboratory experiments. Examples: GDP, stock prices, profits. If data could be generated experimentally, we would not allow collinearity to arise. Thus, multicollinearity is a sample phenomenon arising out of the non-experimental data typical of the social sciences.
Example
X1    X2    X3
10    50    52
15    75    75
18    90    97
24   120   129
30   150   152
Detection
Klein's rule of thumb: instead of formally testing all auxiliary R2 values, one may adopt Klein's rule of thumb, which suggests that multicollinearity may be a troublesome problem only if the R2 obtained from an auxiliary regression is greater than the overall R2, that is, the R2 obtained from the regression of Y on all the regressors.
Of course, like all other rules of thumb, this one should be used judiciously.
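An auxiliary regression for the slides' data can be run directly: regress X*3 on X2 and compute the auxiliary R2. Klein's rule would compare this value against the overall R2 of the main regression (not computed here, since the slides give no Y series). The related variance inflation factor, VIF = 1/(1 - R2), is a common companion diagnostic; a minimal sketch:

```python
import numpy as np

# Auxiliary regression: X*3 on a constant and X2, using the slides' data.
X2 = np.array([10, 15, 18, 24, 30], dtype=float)
X3s = np.array([52, 75, 97, 129, 152], dtype=float)

A = np.column_stack([np.ones_like(X2), X2])
coef, *_ = np.linalg.lstsq(A, X3s, rcond=None)
resid = X3s - A @ coef

# Auxiliary R-squared: 1 - SSR / SST
sst = (X3s - X3s.mean()) @ (X3s - X3s.mean())
r2_aux = 1 - (resid @ resid) / sst

vif = 1 / (1 - r2_aux)  # variance inflation factor for X*3
print(round(r2_aux, 4), round(vif, 1))  # 0.9918 121.7
```

An auxiliary R2 of about 0.99 (VIF near 122) signals severe collinearity; under Klein's rule it would be troublesome whenever the main regression's R2 is below it.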
Practical consequences
Theory vs. practice: theoretically, the inclusion of two Xs might be warranted, but it may create practical problems. Example: Consumption = f(Income, Wealth). Although income and wealth are logical candidates to explain consumption, they may be correlated, since wealthier people tend to have higher incomes. The ideal solution here would be to have sample observations of both wealthy individuals with low income and high-income individuals with low wealth.
Sensitivity of results:
In the presence of multicollinearity, OLS estimators and their SEs become very sensitive to small changes in data
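This sensitivity can be illustrated with the slides' near-collinear regressors X2 and X*3. The y series below is hypothetical, added only to make the regression runnable; the point is how much the coefficients move when a single observation changes slightly:

```python
import numpy as np

# Near-collinear regressors from the slides; y is a hypothetical series.
X2 = np.array([10, 15, 18, 24, 30], dtype=float)
X3s = np.array([52, 75, 97, 129, 152], dtype=float)
y = np.array([30, 40, 50, 65, 80], dtype=float)   # hypothetical

X = np.column_stack([np.ones_like(X2), X2, X3s])
b_orig, *_ = np.linalg.lstsq(X, y, rcond=None)

# Tiny data change: nudge one observation of y by a single unit.
y_pert = y.copy()
y_pert[2] += 1.0
b_pert, *_ = np.linalg.lstsq(X, y_pert, rcond=None)

print(b_orig)
print(b_pert)  # with near-collinear Xs the slope estimates shift visibly
```

With (near-)collinear regressors, (X'X) is nearly singular, so its inverse magnifies small changes in the data into large changes in the individual coefficient estimates.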
Remedial measures
What can be done if multicollinearity is serious? We have two choices:
1. Do nothing, or
2. Follow some rules of thumb.
Do Nothing.
Why? Because multicollinearity is essentially a data-deficiency problem (micronumerosity), and sometimes we have no choice over the data available for empirical analysis.
Dropping one of the collinear variables
Substitution of lagged variables for other explanatory variables in distributed-lag models
Application of methods incorporating extraneous quantitative information
Introduction of additional equations in the model
Conclusion
We cannot tell which of these methods will work in a given case. No single diagnostic gives us a complete handle on the collinearity problem. Since it is a sample-specific problem, in some situations it may be easy to diagnose, but in others one or more of the various methods will have to be used. In short, there is no easy solution to the problem.
Thank You