
OLS estimators:

\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{\text{sample covariance of } X \text{ and } Y}{\text{sample variance of } X}

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
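The two estimator formulas above can be sketched directly in code; the dataset below is made up purely for illustration.

```python
# Minimal sketch of the OLS slope and intercept formulas
# (illustrative made-up data, not from the text).
def ols(X, Y):
    n = len(X)
    xbar = sum(X) / n
    ybar = sum(Y) / n
    # slope = sample covariance of X and Y / sample variance of X
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
         sum((x - xbar) ** 2 for x in X)
    b0 = ybar - b1 * xbar          # intercept = Ybar - b1 * Xbar
    return b0, b1

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
b0, b1 = ols(X, Y)
print(round(b0, 6), round(b1, 6))  # → 2.2 0.6
```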

\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i = Y_i - \hat{Y}_i = actual value - predicted value = size of mistake


\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2 = \sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2/(n-2) = SSR/(n-2)

\hat{\sigma} is the standard error of the residuals, or standard error of the regression (SER).
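Continuing the same made-up dataset (with fitted \hat{\beta}_0 = 2.2, \hat{\beta}_1 = 0.6), the residuals and SER can be sketched as:

```python
# Sketch: residuals and the standard error of the regression (SER).
# b0, b1 are the fitted coefficients from the made-up example data.
import math

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

u = [y - (b0 + b1 * x) for x, y in zip(X, Y)]  # residual: actual - predicted
ssr = sum(e ** 2 for e in u)                   # sum of squared residuals
ser = math.sqrt(ssr / (len(X) - 2))            # SER = sqrt(SSR / (n - 2))
```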


SE(\hat{\beta}_1) = \hat{\sigma} \Big/ \left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]^{1/2}

p-value = 2 \cdot \Pr(t_{n-2} > |t^c|) (2-tail test)

SE(\hat{\beta}_0) = \hat{\sigma} \left[\frac{\sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n}(X_i - \bar{X})^2}\right]^{1/2}

Cov(\hat{\beta}_1, \hat{\beta}_0) = -\hat{\sigma}^2 \bar{X} \Big/ \sum_{i=1}^{n}(X_i - \bar{X})^2

p-value = \Pr(t_{n-2} > t^c) (1-tail test)

If p-value ≤ 0.10 (= 2 × 0.05), reject the null hypothesis at the 10% significance level (2-tail test).
If p-value ≤ 0.05 (= 2 × 0.025), reject the null hypothesis at the 5% significance level (2-tail test).
If p-value ≤ 0.01 (= 2 × 0.005), reject the null hypothesis at the 1% significance level (2-tail test).

Confidence Interval:

\hat{\beta}_1 - t^* \cdot SE(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + t^* \cdot SE(\hat{\beta}_1)
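Putting the SE, t-statistic, and confidence-interval formulas together on the same made-up dataset (the critical value 3.182 for t with n − 2 = 3 degrees of freedom at the 95% level is read from a t-table):

```python
# Sketch: homoskedasticity-only SE of the slope, its t-statistic, and a
# 95% confidence interval (made-up example data; t* = 3.182 for 3 df).
import math

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n, b0, b1 = len(X), 2.2, 0.6
xbar = sum(X) / n

ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
sigma = math.sqrt(ssr / (n - 2))               # SER
sxx = sum((x - xbar) ** 2 for x in X)
se_b1 = sigma / math.sqrt(sxx)                 # SE(beta1)
t_stat = b1 / se_b1                            # test of H0: beta1 = 0
ci = (b1 - 3.182 * se_b1, b1 + 3.182 * se_b1)  # 95% confidence interval
```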

R^2 = \frac{\text{Explained Variation in } Y}{\text{Total Variation in } Y} = 1 - \frac{\text{Random Variation in } Y}{\text{Total Variation in } Y}

R^2 = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = ESS/TSS

R^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - SSR/TSS

R^2 = \hat{\beta}_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2 \Big/ \sum_{i=1}^{n}(Y_i - \bar{Y})^2

SER = \left[\frac{1}{n-2}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2\right]^{1/2} = \left[SSR/(n-2)\right]^{1/2}

Remember: \hat{\sigma}^2 = SER^2
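The two ways of computing R^2 above (ESS/TSS and 1 − SSR/TSS) agree for OLS with an intercept; a quick check on the same made-up dataset:

```python
# Sketch: R^2 as ESS/TSS and as 1 - SSR/TSS, which coincide for OLS
# with an intercept (made-up example data and fitted coefficients).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6
ybar = sum(Y) / len(Y)

yhat = [b0 + b1 * x for x in X]
ess = sum((yh - ybar) ** 2 for yh in yhat)          # explained sum of squares
ssr = sum((y - yh) ** 2 for y, yh in zip(Y, yhat))  # residual sum of squares
tss = sum((y - ybar) ** 2 for y in Y)               # total sum of squares
r2_a = ess / tss
r2_b = 1 - ssr / tss
```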

F^c = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{(ESS/TSS)/k}{(SSR/TSS)/(n - k - 1)}

F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n - k - 1)} = \frac{(R_U^2 - R_R^2)/q}{(1 - R_U^2)/(n - k - 1)}, where q = number of restrictions.

Heteroskedasticity exists when the variance of the error term is not identical for all observations.
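The restricted-vs-unrestricted F formula above can be sketched with purely illustrative numbers (R_U^2 = 0.75 with k = 3 regressors, R_R^2 = 0.60 after imposing q = 2 restrictions, n = 30 observations; none of these come from a real regression):

```python
# Sketch: homoskedasticity-only F-statistic from restricted and
# unrestricted R^2 (all inputs are made-up illustrative values).
def f_stat(r2_u, r2_r, q, n, k):
    # F = [(R2_U - R2_R)/q] / [(1 - R2_U)/(n - k - 1)]
    return ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))

F = f_stat(0.75, 0.60, q=2, n=30, k=3)
```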

Heteroskedasticity-robust (Eicker-Huber-White) standard errors:

SE(\hat{\beta}_1) = \left[\frac{1}{n} \cdot \frac{\frac{1}{n-2}\sum_{i=1}^{n}(X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right]^2}\right]^{1/2}

SE(\hat{\beta}_0) = \left[\frac{1}{n} \cdot \frac{\frac{1}{n-2}\sum_{i=1}^{n}\hat{H}_i^2 \hat{u}_i^2}{\left[\frac{1}{n}\sum_{i=1}^{n}\hat{H}_i^2\right]^2}\right]^{1/2}, where \hat{H}_i = 1 - \left[\bar{X} \Big/ \frac{1}{n}\sum_{i=1}^{n} X_i^2\right] X_i
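A sketch of the robust SE for the slope, following the formula above on the same made-up dataset (with so few observations the number is purely illustrative):

```python
# Sketch: Eicker-Huber-White heteroskedasticity-robust SE of the slope
# (made-up example data; illustrative only at n = 5).
import math

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n, b0, b1 = len(X), 2.2, 0.6
xbar = sum(X) / n

u = [y - (b0 + b1 * x) for x, y in zip(X, Y)]            # residuals
num = sum((x - xbar) ** 2 * e ** 2 for x, e in zip(X, u)) / (n - 2)
den = (sum((x - xbar) ** 2 for x in X) / n) ** 2
se_b1_robust = math.sqrt(num / den / n)
```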

If \ln Y = \beta_0 + \beta_1 X, then \beta_1 = \frac{\Delta \ln Y}{\Delta X} \approx \frac{\Delta Y / Y}{\Delta X} = the percent change in Y given a one-unit change in X (growth rate or rate of return).

If Y = \beta_0 + \beta_1 \ln X, then \beta_1 = \frac{\Delta Y}{\Delta \ln X} \approx \frac{\Delta Y}{\Delta X / X} = the total change in Y given a one-percent change in X.

If \ln Y = \beta_0 + \beta_1 \ln X, then \beta_1 = \frac{\Delta \ln Y}{\Delta \ln X} \approx \frac{\Delta Y / Y}{\Delta X / X} = the percent change in Y given a one-percent change in X (elasticity).
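The elasticity interpretation of the log-log slope can be checked numerically: for the made-up relationship Y = X^2 (so \ln Y = 2 \ln X), a one-percent change in X produces roughly a two-percent change in Y.

```python
# Sketch: the log-log slope is an elasticity. For Y = X^2, the ratio of
# percent changes is approximately 2 (illustrative numbers).
X0, X1 = 100.0, 101.0       # a one-percent change in X
Y0, Y1 = X0 ** 2, X1 ** 2
pct_y = (Y1 - Y0) / Y0      # percent change in Y
pct_x = (X1 - X0) / X0      # percent change in X
elasticity = pct_y / pct_x  # approximately 2 for small changes
```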

Always use one fewer dummy variable than the number of categories being explained (to avoid perfect multicollinearity, the dummy variable trap).
R^2 may be high even though one or several t-statistics are low; such a model is good for prediction, but bad for explanation.

Heteroskedasticity: SEs of estimators are not efficient.
Solution: use the Eicker-Huber-White procedure to estimate heteroskedasticity-robust parameters.
Multicollinearity: SEs of estimators are not efficient; R^2 may be high while one or several t-statistics are low.
Solution: drop the variable(s) responsible for the multicollinearity.
Solution: expand the sample size (perfect collinearity may occur because a binary variable takes on only one value, or because the sample is so small that high correlation occurs among two or more variables). This increases the sample variance of the regressors; if the correlation between variables does not increase, standard errors will decline and t-statistics will rise.

Correlation of errors across observations: SEs of estimators are not correct.
Serial correlation is the most common expression of this problem.
May occur in cross-sectional data with a random sample if there are omitted variables correlated with the sampling strategy, e.g., the sample is based on geographical units and there are omitted variables correlated with geographic influences.
Errors in variables: OLS estimator is biased and inconsistent.
Dependent variable measured with error: standard errors are correct, but larger than they would be without measurement error.
Independent variable measured with error: attenuation bias.
Endogeneity problems: Y_i = \beta_0 + \beta_1 X_i + u_i, i = 1, \dots, n, where E(u_i|X_i) \ne 0.
Omitted variable: OLS estimator is biased and inconsistent.
Incorrect functional form: OLS estimator is biased and inconsistent, similar to omitted-variable bias.
Sample selection bias: OLS estimator is biased and inconsistent.
Simultaneous causality bias: OLS estimator is biased and inconsistent.

Probit: z = \beta_0 + \beta_1 X_1 + \dots is the number of standard deviations from the mean for a normally distributed random variable; \partial z / \partial X_k = \beta_k.

Logit: \ln\left[\frac{p}{1-p}\right] = \beta_0 + \beta_1 X_1 + \dots; \quad \partial \ln[p/(1-p)] / \partial X_k = \beta_k.
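A small check of the logit interpretation above: the log-odds \ln[p/(1-p)] are linear in X, so a one-unit change in X_k shifts the log-odds by exactly \beta_k (the coefficients below are made up for illustration).

```python
# Sketch: in the logit model, log-odds are linear in X, so a one-unit
# change in X shifts ln[p/(1-p)] by beta1 (made-up coefficients).
import math

b0, b1 = -1.0, 0.5

def p(x):
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))   # logistic function

def log_odds(x):
    q = p(x)
    return math.log(q / (1.0 - q))      # recovers z = b0 + b1 * x

delta = log_odds(3.0) - log_odds(2.0)   # equals b1
```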