Lec1 Regression

An Investigation into Regression Model using EVIEWS
Prepared by: Sayed Hossain Lecturer for Economics Multimedia University Personal website: www.sayedhossain.com Email: sayed.hossain@yahoo.com
Seven assumptions about a good regression model 1. 2. 3. Regression line must be fitted to data strongly. Most of the independent variables should be individually significant to explain dependent variable Independent variables should be jointly significant to influence or explain dependent variable. The sign of the coefficients should follow economic theory or expectation or experiences or intuition. No serial or auto-correlation in the residual (u) The variance of the residual (u) should be constant meaning that homoscedasticity The residual (u) should be normally distributed.
2
4.
5. 6. 7.
(Assumption no. 1)
Regression line must be fitted to data strongly (Goodness of Data Fit)
*** Guideline : R2 => 60 percent (0.60) is better
Goodness of Data Fit Data must be fitted reasonable well. That is value of R2 should be reasonable high, more than 60 percent. Higher the R2 better the fitted data.
www.sayedhossain.com
4
(Assumption no. 2) Most of the independent variables should be individually individually significant ** t- test
t test is done to know whether each and every independent variable (X1, X2 and X3 etc here) is individually significant or not to influence the dependent variable, that is Y here.
5
Individual significance of the variable

Most of the independent variables should be individually significant. This matter can be checked using t test. If the p-value of t statistics is less than 5 percent (0.05) we can reject the null and accept alternative hypothesis. If we can reject the null hypothesis, it means that particular independent variable is significant to influence dependent variable in the population.
6
For Example>>
Variables:
We have four variables, Y, X1, X2 X3 Here Y is dependent and X1, X2 X3 are independent Population regression model Y = Bo + B1X1+ B2X2 + B3X3 + u Sample regression model Y = bo + b1X1+ b2X2 + b3X3 + e Here, sample regression line is a estimator of population regression line. Our target is to estimate population regression line (which is almost impposible or time and money consuming to estimate) from sample regression line. For example, small b1, b2 and b3 are estimators of big B1, B2 and B3
Here, u is the residual for population regression line while e is the residual for sample regression line. e is the estimator of u. We want to know the nature of u from e.
Tips
If the sample collection is done as per the statistical guideline (several random procedures) then sample regression line can be a representative of population regression line. Our target is to estimate the population regression line from a sample regression line.
Setting hypothesis for t test : An example

Null Hypothesis: Bo=0 Alternative hypothesis: Bo0 Null hypothesis : B1=0 Alternative hypothesis: B10 Null Hypothesis : B2=0 Alternative hypothesis: B20 Null Hypothesis : B3=0 Alternative hypothesis: B3 0 Hypothesis setting is always done for population, not for sample. That is why we have taken all big B (from population regression line) but not small b from sample regression line.
9
Hypothesis Setting
Null hypothesis : B1=0 Alternative hypothesis: B10
Since the direction of alternative hypothesisis is , meaning that we assume that there exists a relationship between independent variable (X1 should be here) with dependent variable (Y here) in the population. But it can not say whether the relationship is negative or positive. This direction is a two tail hypothesis.
Null hypothesis : B1=0 Alternative hypothesis: B1<0

` But if we set hypothesis as above, then we assume that in the population, there exists a negative relationship between X1 and Y as the direction in alternative hypothesis is <. It requires one tail test.
(Assumption no. 3) Joint Significace Independent variables should be jointly significant to explain dependent variable ** F- test ANOVA (Analysis of Variance)
11
Joint significance
Independent variables should be jointly significant to explain Y. This can be checked using F-test. If the p-value of F statistic is less than 5 percent (0.05) we can reject the null and accept alternative hypothesis. If we can reject null hypothesis, it means that all the independent variables (X1, X2 X3 ) jointly can influence dependent variable, that is Y here.
12
Joint hypothesis setting

Null hypothesis Ho: B1=B2=B3=0 Alternative H1: Not all Bs are simultaneously equal to zero Here Bo is dropped as it is not associated with any variable.
Here also taken all big B

13
Few things Residual ( u or e) = Actual Y estimated (fitted) Y Residual, error term, disturbance term all are same meaning. Serial correlation and auto-correlation are same meaning.
14
(Assumption no. 4)
The sign of the coefficients should follow economic theory or expectation or experiences of others (literature review) or intuition.
15
Residual Analysis
(Assumption no. 5)
No serial or auto-correlation in the residual (u).
**
Breusch-Godfrey serial correlation LM test : BG test
17
Serial correlation
Serial correlation is a statistical term used to the describe the situation when the residual is correlated with lagged values of itself. In other words, If residuals are correlated, we call this situation serial correlation which is not desirable.
18
How serial correlation can be formed in the model? Incorrect model specification, omitted variables, incorrect functional form, incorrectly transformed data.
19
Detection of serial correlation

Many ways we can detect the existence of serial correlation in the model. An approach of detecting serial correlation is Breusch-Godfrey serial correlation LM test : BG test
20
Hypothesis setting
Null hypothesis Ho: no serial correlation (no correlation between residuals (ui and uj)) Alternative hypothesis H1: serial correlation (correlation between residuals (ui and uj )
21
(Assumption no. 6) The variance of the residual (u) is constant (Homoscedasticity) ***
Breusch-Pegan-Godfrey Test
22
Heteroscedasticity is a term used to the describe the situation when the variance of the residuals from a model is not constant. When the variance of the residuals is constant, we call it homoscedasticity. Homoscedasticity is desirable. If residuals do not have constant variance, we call it hetersocedasticty, which is not desirable.
23
How the heteroscedasticity may form? Incorrect model specification, Incorrectly transformed data,
24
Hypothesis setting for heteroscedasticity Null hypothesis Ho: Homoscedasticity (the variance of residual (u) is constant) Alternative hypothesis H1 : Heteroscedasticity (the variance of residual (u) is not constant )
25
Detection of heteroscedasticity
There are many test involed to detect heteroscedasticity. One of them is Bruesch-Pegan-Godfrey test which we will employ here.
26
(Assumption no. 7) Residuals (u ) should be normally distributed **

Jarque Bera statistics
27
Setting the hypothesis:

Null hypothesis Ho : Normal distribution (the residual (u) follows a normal distribution) Alternative hypothesis H1: Not normal distribution (the residual (u) follows not normal distribution)
Detecting residual normality: Histogram-Normality test (Perform JarqueBera Statistic).

If the p-value of Jarque-Bera statistics is less than 5 percent (0.05) we can reject null and accept the alternative, that is residuals (u) are 28 not normally distributed.
An Emperical Model Development
29
Our hypothetical model

Variables:
We have four variables, Y, X1, X2 X3

Here Y is dependent and X1, X2 and X3 are independent
Population regression model Y = Bo + B1X1+ B2X2 + B3X3 + u Sample regression line Y = bo+ b1X1+ b2X2+b3X3 + e
30
DATA
Sample size is 35 taken from population
DATA
obs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 RESID 0.417167 -0.27926 -0.17833 0.231419 -0.33278 0.139639 -0.01746 -0.14573 0.480882 -0.0297 -0.32756 0.016113 -0.34631 0.485755 0.972181 -0.22757 -0.2685 -0.41902 -0.4259 0.076632 X1 1700 1200 2100 937.5 7343.3 837.9 1648 739.1 2100 274.6 231 1879.1 1941 2317.6 471.4 678 7632.9 510.1 630.6 1500 X2 1.2 1.03 1.2 1 0.97 0.88 0.91 1.2 0.89 0.23 0.87 0.94 0.99 0.87 0.93 0.79 0.93 0.93 0.93 1.03 X3 20000 18000 19000 15163 21000 15329 16141 21876 17115 23400 16127 17688 17340 21000 16000 16321 18027 18023 15634 17886 Y 1.2 0.65 0.6 1.2 0.5 1.2 1 0.65 1.5 1.5 0.75 1 0.6 1.5 2 0.9 0.6 0.6 0.6 1 YF 0.782833 0.929257 0.778327 0.968581 0.832781 1.060361 1.017457 0.795733 1.019118 1.529701 1.077562 0.983887 0.946315 1.014245 1.027819 1.127572 0.868503 1.019018 1.0259 0.923368
DATA
obs 21 22 23 24 RESID -0.37349949 0.183799347 0.195832507 -0.46138707 X1 1618.3 2009.8 1562.4 1200 X2 1.1 0.96 0.96 0.88 X3 16537 17655 23100 13130 Y 0.5 1.15 1.15 0.6 YF 0.873499 0.966201 0.954167 1.061387
25
26 27
0.309577968
-0.21073204 -0.08351157
13103
3739.6 324
1
0.92 1.2
20513
17409 14525
1
0.75 0.75
0.690422
0.960732 0.833512
28
29 30
-0.02060854
0.14577644 -0.06000649
2385.8
1698.5 544
0.89
0.93 0.87
15207
15409 18900
1
1.15 1
1.020609
1.004224 1.060006
31
32 33
-0.50510204
0.870370225 0.274774344
1769.1
1065 803.1
0.45
0.65 0.98
17677
15092 18014
0.85
2.1 1.25
1.355102
1.22963 0.975226
34
35
-0.1496757
0.062732149
1616.7
210
1
1.2
28988
21786
0.75
0.87
0.899676
0.807268
Y, X1, X2 and X3 are actual sample data collected from population YF= Estimated, forecasted or predicted Y RESID (e) = Residuals of the sample regression line that is, e=Actual Y Predicted Y (fitted Y)
Regression Output
Regression output
Dependent Variable: Y Included observations: 98 Variable C X1 X2 X3 Coefficient 1.800 -2.11E-05 -0.7527 -3.95E-06 Std. Error 0.4836 2.58E-05 0.3319 2.08E-05 t-Statistic 3.72 -0.820 -2.267 -0.189 Prob. 0.0008 0.4183 0.0305 0.8509 35 Observation Method: Least Squares
R-squared Adjusted R-squared S.E. of regression
0.1684 0.087 0.3736
Mean dependent var S.D. dependent var Akaike info criterion
0.9834 0.3912 0.9762
Sum squared resid

Log likelihood Durbin-Watson stat
4.328
-13.08 2.184
Schwarz criterion
F-statistic Prob(F-statistic)
1.15
2.093 0.1213
35
Few things
t- statistics= Coeffient / standard error
t-statistics (absolute value) and p values always move in opposite direction
Output
Actual Y, Fitted Y, Residual and its plotting
obs 1 Actual 1.2 Fitted 0.782832991 Residual 0.417167009 Residual Plot | . | .* |
2
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0.65
0.6 1.2 0.5 1.2 1 0.65 1.5 1.5 0.75 1 0.6 1.5 2 0.9 0.6 0.6 0.6 1
0.92925722
0.778327375 0.96858115 0.8327808 1.060360549 1.017457055 0.79573323 1.019118163 1.529701243 1.077562408 0.983887019 0.946314864 1.014244939 1.027819105 1.127572088 0.868503447 1.019018495 1.025899595 0.923368304
-0.27925722
-0.178327375 0.23141885 -0.3327808 0.139639451 -0.017457055 -0.14573323 0.480881837 -0.029701243 -0.327562408 0.016112981 -0.346314864 0.485755061 0.972180895 -0.227572088 -0.268503447 -0.419018495 -0.425899595 0.076631696
|
| | | | | | | | | | | | | | | | | |
.* | .
.*| . . | *. * | . . |*. . * . .*| . . | .* . * . * | . . * . * | . . | .* . | . .* | . .* | . *. | . *. | . . |* .
|
| | | | | | | | | | | | * | | | | |
Output
Actual Y, Fitted Y, Residual and its plotting
obs 21 Actual 0.5 Fitted 0.873499486 Residual -0.373499486 Residual Plot | * | . |
22
23 24 25 26 27 28 29 30 31 32 33 34 35
1.15
1.15 0.6 1 0.75 0.75 1 1.15 1 0.85 2.1 1.25 0.75 0.87
0.966200653
0.954167493 1.061387074 0.690422032 0.960732042 0.833511567 1.020608541 1.00422356 1.060006494 1.355102042 1.229629775 0.975225656 0.899675696 0.807267851
0.183799347
0.195832507 -0.461387074 0.309577968 -0.210732042 -0.083511567 -0.020608541 0.14577644 -0.060006494 -0.505102042 0.870370225 0.274774344 -0.149675696 0.062732149
|
| | | | | | | | | | | | |
. |*.
. |*. *. | . . | * .*| . . *| . . * . . |*. . *| . *. | . . | . . | *. .*| . . |* .
|
| | | | | | | | | *| | | |
Actual Y, Fitted Y and Residual

2.4 2.0 1.6 1.2 1.2 0.8 0.8 0.4 0.4 0.0 -0.4 -0.8 5 10 15 20 Actual 25 Fitted 30 35
Residual
Sample residual
1.0 0.8 0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 5 10 15 20 Y Residuals 25 30 35
(Assumption no. 1) Goodness of Fit Data R-square: 0.1684 It means that 16.84 percent variation in Y can be explained jointly by three independent variables such as X1, x2 and X3. The rest 83.16 percent variation in Y can be explained by residuals or other variables other than X1 X2 and X3.
41
(Assumption no. 2)
Joint Hypothesis : F statistics F statistics: 2.093 and Prob 0.1213
Null hypothesis Ho: B1=B2=B3=0 Alternative H1: Not all Bs are simultaneously equal to zero
Since the p-value is more tha than 5 percent (here 12.13 percent), we can not reject null. In other words, it means that all the independent variables (here X1 X2 and X3) can not jointly explain or influence Y in the population.
42
Assumption No. 3
Independent variable significance
For X1, p-value : 0.4183 Null Hypothesis: B1=0 Alternative hypothesis: B10 Since the p-value is more than 5 percent (0.05) we can not reject null and meaning we accept null meaning B1=0. In other words, X1 can not influence Y in the population.
For X2, p-value: 0.0305 (3.05 percent)

Null Hypothesis: B2=0 Alternative hypothesis: B20
Since p-value (0.03035) is less than 5 percent meaning that we can reject null and accept alternative hypothesis. It means that variable X2 can influence variable Y in the population but what direction we can not say as alternative hypothesis is .
For X3, p-value: 0.8509. So X3 is not significant to explain Y.
43
Assumption No. 4
Sign of the coefficients Our sample model: Y=bo+b1x1+b2x2+b3x3+e Sign we expected after estimation as follows: Y=bo - b1x1 + b2x2 - b3x3 Decision : The outcome did not match with our expectation. So assumption 4 is violated.
Assumption no 5 SERIAL OR AUTOCORRELATION

Breusch-Godfrey Serial Correlation LM Test: F-statistic 1.01 Obs*R-squared 2.288 Prob. F(2,29) 0.3751 Prob. Chi-Square(2) 0.3185
Null hypothesis : No serial correlation in the residuals (u) Alternative: There is serial correlation in the residuals (u)
Since the p-value ( 0.3185) of Obs*R-squared is more than 5 percent (p>0.05), we can not reject null hypothesis meaning that residuals (u) are not serially correlated which is desirable.
45
Assumption no. 6
Heteroscedasticy Test
Breusch-Pegan-Godfrey test (B-P-G Test) F-statistic 1.84 Probability 0.3316 Obs*R-squared 3.600 Probability 0.3080 Null Hypothesis: Residuals (u) are Homoscedastic Alternative: Residuals (u) are Hetroscedastic The p-value of Obs*R-squared shows that we can not reject null. So residuals do have constant variance which is desirable meaning that residuals are homoscedastic. B-P-G test normally done for large sample
46
Assumption no. 7
Residual (u) Normality Test
6
Series: Residuals Sample 1 35 Observations 35 Mean Median Maximum Minimum Std. Dev. Skewness Kurtosis Jarque-Bera Probability
-0.6 -0.4 -0.2 -0.0 0.2 0.4 0.6 0.8 1.0
1.15e-16 -0.029701 0.972181 -0.505102 0.356788 0.880996 3.508042 4.903965 0.086123
Null Hypothesis: residuals (u) are normally distribution Alternative: Not normally distributed Jarque Berra statistics is 4.903 and the corresponding p value is 0.08612. Since p vaue is more than 5 percent we accept null meaning that population residual (u) is normally distrbuted which fulfills the 47 assumption of a good regression line.
Evaluation of our model on the basis of assumptions

1. 2. 3. 4. 5. 6. 7. R-square is very low ( Bad sign) There is no serial correlation (Good sign) Independent variables are not jointly can influence Y (Bad sign) Signs are not as expected (Bad sign) Only X2 variable is significant out of three (Bad sign). Heteroscedasticity problem is not there (Good sign) Residuals are normally distributed (Good sign)
48
References
Essentials of Econometrics by Damodar Gujarati, McGraw Hill Publication. Basic Econometrics by Damodar Gujarati, McGraw Hill Publication. An Introduction to Econometrics by Cheng Ming Yu, Sayed Hossain and Law Siong Hook. McGraw Hill Publication.
Prepared by: Sayed Hossain Lecturer for Economics Multimedia University, Malaysia Peronal website: www.sayedhossain.com Email: sayed.hossain@yahoo.com Year: 2009 Use the information of this website on your own risk. This website shall not be responsible for any loss or expense suffered in connection with the use of this website. PLEASE COMMENT IN MY GUESTBOOK AT: www.sayedhossain.com
50

Lec1 Regression

Diunggah oleh

Informasi Dokumen

Deskripsi Asli:

Judul Asli

Hak Cipta

Format Tersedia

Bagikan dokumen Ini

Bagikan atau Tanam Dokumen

Opsi Berbagi

Apakah menurut Anda dokumen ini bermanfaat?

Apakah konten ini tidak pantas?

Hak Cipta:

Format Tersedia

Lec1 Regression

Diunggah oleh

Hak Cipta:

Format Tersedia

An Investigation into Regression Model using EVIEWS

*** Guideline : R2 => 60 percent (0.60) is better

Individual significance of the variable

Setting hypothesis for t test : An example

Null hypothesis : B1=0 Alternative hypothesis: B1<0

Joint hypothesis setting

Here also taken all big B

Detection of serial correlation

(Assumption no. 7) Residuals (u ) should be normally distributed **

Setting the hypothesis:

Detecting residual normality: Histogram-Normality test (Perform JarqueBera Statistic).

An Emperical Model Development

Our hypothetical model

We have four variables, Y, X1, X2 X3

R-squared Adjusted R-squared S.E. of regression

0.1684 0.087 0.3736

Mean dependent var S.D. dependent var Akaike info criterion

0.9834 0.3912 0.9762

Sum squared resid

Actual Y, Fitted Y and Residual

For X2, p-value: 0.0305 (3.05 percent)

For X3, p-value: 0.8509. So X3 is not significant to explain Y.

Assumption no 5 SERIAL OR AUTOCORRELATION

1.15e-16 -0.029701 0.972181 -0.505102 0.356788 0.880996 3.508042 4.903965 0.086123

Evaluation of our model on the basis of assumptions

Anda mungkin juga menyukai