Multiple linear regression model
y_i = β0 + β1·x_i1 + … + βK·x_iK + ε_i
ε_i: random error (mean usually assumed to be 0, variance σ²)
Linear regression function (conditional mean):
E(y_i | x_i1, …, x_iK) = β0 + β1·x_i1 + … + βK·x_iK
Error term: ε_i = y_i − E(y_i | x_i)

Estimation for multiple linear regression
ŷ_i = b0 + b1·x_i1 + … + bK·x_iK → estimated average value of y_i for given values of x_i1, …, x_iK (e.g. with two regressors: ŷ_i = b0 + b1·x_i1 + b2·x_i2).
b0, b1, …, bK are estimators of the parameters β0, β1, …, βK.
Including too few variables is worse than having too many in the end.

SST = SSR + SSE, where SST is the total sum of squares, SSR the regression (explained) sum of squares and SSE the error sum of squares. The fit of the regression equation to the data improves as SSR increases and SSE decreases.
Coefficient of determination: R² = SSR/SST = 1 − SSE/SST
Interpretation of R²: X% of the variability in the dependent variable can be explained by the variation in the independent variables.

Tests of hypothesis for the regression coefficients
Hypothesis test for all parameters together: F-statistic and decision rule.
Conclusion: the null hypothesis that all parameters are zero is rejected; all parameters together are significant.
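A minimal numerical sketch of these formulas (the data are made up for illustration, and F = (SSR/K)/(SSE/(n−K−1)) is the usual overall F-statistic, which the notes mention but do not spell out):

import numpy as np
from scipy import stats

# Hypothetical data: n observations, K = 2 regressors (illustration only).
rng = np.random.default_rng(0)
n, K = 50, 2
X = rng.normal(size=(n, K))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# OLS estimates b0, b1, ..., bK (intercept column added).
Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ b

# Sums of squares: SST = SSR + SSE, R² = SSR/SST = 1 - SSE/SST.
SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
SSR = SST - SSE
R2 = SSR / SST

# F-test that all slope parameters are zero:
# reject H0 if F > critical value (or p-value < significance level).
F = (SSR / K) / (SSE / (n - K - 1))
F_crit = stats.f.ppf(0.95, K, n - K - 1)
p_val = stats.f.sf(F, K, n - K - 1)
print(b, R2, F, F_crit, p_val)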
Prediction
ŷ_i gives the point prediction; a prediction interval around it is obtained from the standard error of the prediction. For the model with one independent variable the standard error of the prediction is
s_pred = √( s_e² · (1 + 1/n + (x_i − x̄)² / Σ(x_i − x̄)²) )
(numerical sketch after the instrument notes below).

Instrumental variables: with an omitted variable, the model y_i = β0 + β1·x_i + β2·x_i2 + ε_i is estimated without x_i2, so its effect ends up in the error term u_i, which adds a correlation effect between x_i and u_i. For the estimation we need an observable variable, "instrument z1", that satisfies 2 conditions:
1) z1 is uncorrelated with u_i: Cov(z_i1, u_i) = 0
2) z1 is correlated with x_i: Cov(z_i1, x_i) ≠ 0
This reduces the bias, but the variance of the estimated coefficients gets larger.
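A minimal numerical sketch of the prediction interval for the one-regressor case; the data and the new point x_new are made up, and the usual t multiplier is used around ŷ:

import numpy as np
from scipy import stats

# Hypothetical simple-regression data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])
n = len(x)

# OLS fit y = b0 + b1*x and the residual variance s_e².
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s_e2 = np.sum(resid ** 2) / (n - 2)

# s_pred = sqrt( s_e² * (1 + 1/n + (x_new - x̄)² / Σ(x_i - x̄)²) )
x_new = 5.5
s_pred = np.sqrt(s_e2 * (1 + 1 / n + (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)))

# 95% prediction interval: ŷ_new ± t_{0.975, n-2} * s_pred.
y_hat_new = b0 + b1 * x_new
t_crit = stats.t.ppf(0.975, n - 2)
print(y_hat_new - t_crit * s_pred, y_hat_new + t_crit * s_pred)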
Chapter 5: Functional Form
Up to now the regression models were always "linear"; this is not always the case, and the variables must then be transformed.
RESET Test (Ramsey's regression specification error test): can be used to detect issues related to omitted variables and basically tells whether something (a variable) is missing in the model.
2) Estimate the model y_i = β0 + β1·x_i + ε_i and obtain ŷ_i.
4) If H0 is not rejected: misspecification/functional form is not a problem.

Chapter 7:
Multicollinearity
Multicollinearity: two or more correlated independent x variables.
The standard error of a slope estimate is
s_bk = √( s_e² / Σ(x_ik − x̄_k)² ) · √( 1 / (1 − R_k²) )
2) Variance Inflation Factor: VIF = 1/(1 − R_k²), which is always ≥ 1 (R_k² is the R² from regressing x_k on the other independent variables). If VIF is > 10, then multicollinearity is high. It accounts for combinations of several independent variables.

Consequences of heteroscedasticity: the OLS estimated standard errors s_b0, s_b1, …, s_bk are incorrect (too low or too high). This can make confidence intervals and hypothesis tests invalid.
White test for heteroskedasticity:
5) Calculate the chi-squared test statistic = n·R²_e, where R²_e is the R² from the auxiliary regression of the squared residuals.
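A minimal sketch of this chi-squared calculation, plugging in the numbers from the worked example below; the degrees of freedom equal the number of regressors in the auxiliary regression, whose exact composition follows whichever version of the test the course uses:

import numpy as np
from scipy import stats

# Inputs from the example below (sample size, auxiliary-regression R²_e, df).
n = 1091
R2_e = 0.168
df = 2

# Chi-squared test statistic = n * R²_e.
test_stat = n * R2_e
crit = stats.chi2.ppf(0.99, df)        # invX2(0.99, df)
p_val = stats.chi2.sf(test_stat, df)   # X2CDF(test_stat, +inf, df)

# Reject H0 (homoskedasticity) if the statistic exceeds the critical value
# and/or the p-value is below the significance level.
print(test_stat, crit, p_val, test_stat > crit)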
Example: n = 1091, R²_e = 0.168
Test stat: 1091 · 0.168 = 183.288
Crit = invX²(0.99, 2) = 9.21
p-val = X²CDF(183.288, +inf, 2) ≈ 0
> Reject H0 since test stat > critical value and/or p-value < 0.01.

Remedies for Heteroskedasticity
> Transform the data by using log(y_i) instead of y_i (log-linear), then estimate the model again and do the White test. If visual inspection and the White test show that this is still not enough, go to White-corrected standard errors (see next).

Autocorrelation coefficient ρ (details in Chapter 9 below): the errors follow ε_t = ρ·ε_{t−1} + u_t, with u_t having constant variance σ²_u; stationarity condition: −1 < ρ < 1.
If ρ = 0, then there is no autocorrelation.
If ρ > 0, then there is positive autocorrelation.
If ρ < 0, then there is negative autocorrelation (the next day it will be positive, then negative, etc.).

Runs test example (the test itself is described in Chapter 9 below): T1 = 14 and T2 = 18 residuals of each sign, r = 5 runs.
μ_r = 2(14·18)/(14+18) + 1 = 16.75
σ_r = √( (16.75 − 1)(16.75 − 2) / (14 + 18 − 1) ) = 2.7375
s = (5 − 16.75)/2.7375 = −4.2922
Decision: |−4.2922| > 1.996, so reject H0: there is autocorrelation, and with 5 runs vs. 17 expected it is positive autocorrelation.
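A minimal sketch of the runs-test calculation, reproducing the example above (T1 = 14, T2 = 18, r = 5); the critical value is taken as invnorm(1 − α/2) from the test description, ≈1.96 at α = 0.05, where the notes compare against 1.996:

import numpy as np
from scipy import stats

def runs_test(t1, t2, r, alpha=0.05):
    # t1: number of positive residuals, t2: number of negative residuals,
    # r: observed number of runs.
    mu_r = 2 * t1 * t2 / (t1 + t2) + 1                          # expected runs
    sigma_r = np.sqrt((mu_r - 1) * (mu_r - 2) / (t1 + t2 - 1))  # std. deviation
    s = (r - mu_r) / sigma_r                                    # test statistic
    z_crit = stats.norm.ppf(1 - alpha / 2)                      # invnorm(1-a/2)
    return mu_r, sigma_r, s, z_crit, abs(s) > z_crit

# Example from the notes: 14 and 18 residuals of each sign, 5 runs.
print(runs_test(14, 18, 5))   # mu_r = 16.75, sigma_r ≈ 2.7375, s ≈ -4.2922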
Chapter 9: Autocorrelation
Assumption: Cov(ε_t, ε_s) = Corr(ε_t, ε_s) = 0 for t ≠ s → the error terms should not be correlated.
Autocorrelation happens in regression models with time series data (the order of the observations is relevant). It means that the error in period t is related to the error in another period s. The OLS estimators are still unbiased, but the standard errors are biased!
Positive autocorrelation: errors tend to be followed by errors of the same sign. Standard errors too small = confidence intervals too narrow and test statistics too large (some variables may appear significant even if they are not).
Negative autocorrelation: errors tend to be followed by errors of the opposite sign. Standard errors too large = confidence intervals too wide and test statistics too small (some variables may not appear significant even when they are).

d statistic (Durbin–Watson): H0: no autocorrelation (ρ = 0); H1: ρ ≠ 0 (positive or negative autocorrelation). The statistic lies in 0 ≤ d ≤ 4: if ρ = 0 then d = 2, if ρ = 1 then d ≈ 0, and negative autocorrelation pushes d towards 4.

Run Test: formally based on the residuals (as we have used them all along).
H0: no autocorrelation; H1: positive or negative autocorrelation.
T1: number of positive residuals; T2: number of negative residuals; r: number of runs.
Test statistic s = (r − μ_r)/σ_r; critical value Z = invnorm(1 − α/2, 0, 1).
Issue: the run test has low power and may not detect autocorrelation.

Be careful when drawing conclusions about autocorrelation: data that looks like it has autocorrelation may simply have other causes.
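A minimal sketch of the d statistic computed from residuals; the residual values are made up, and the formula d = Σ(e_t − e_{t−1})² / Σ e_t² is the standard Durbin–Watson form, consistent with the 0–4 range and d ≈ 2 under no autocorrelation described above:

import numpy as np

def durbin_watson(resid):
    # d = sum((e_t - e_{t-1})^2) / sum(e_t^2); ranges from 0 to 4,
    # with d close to 2 when there is no autocorrelation.
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Hypothetical residuals whose signs cluster (positive autocorrelation),
# so d should come out well below 2.
e = np.array([0.8, 0.9, 0.7, 0.5, -0.6, -0.8, -0.7, -0.5, 0.4, 0.6])
print(durbin_watson(e))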
Key point of PE
Assignment 1
Assignment 2
Exercises
German Bundesliga
ii. 99% confidence interval for β1: (0.21, 0.72). Now you can be 99% confident that the interval contains the population parameter β1.
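A minimal sketch of how such an interval is computed in general; the slope estimate, its standard error, and the degrees of freedom below are placeholders, not the values from the exercise:

from scipy import stats

# Placeholder values (not the exercise data): slope estimate, its standard
# error, and the residual degrees of freedom n - K - 1.
b1, s_b1, df = 0.5, 0.1, 30

# 99% confidence interval: b1 ± t_{0.995, df} * s_b1.
t_crit = stats.t.ppf(0.995, df)
print(b1 - t_crit * s_b1, b1 + t_crit * s_b1)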