
ECON2209
Slides 02

BF-02

Ch.2 Brief Review

Lecture Plan
A review of probability theory
A review of linear regressions

Simple linear regression and scatter plot

Multiple linear regression
Ordinary least squares
Statistical inference in linear regressions
Model selection

EViews
Quick start (more will be introduced gradually)
Understand EViews output


Ch.2 Brief Review

[Figure: Cumulative Distribution, P(X<=x) plotted against x]

eg. Tomorrow BHP share price; Outcome of the next election

Probability distribution
cumulative distribution (cdf): $F(x) = \mathrm{Prob}(X \le x)$
probability density (pdf): $f(x) = dF(x)/dx$, the derivative of $F(x)$

Expectation operation E[g(X)]

It is the average of g(X) weighted by the pdf/pmf:

$E[g(X)] = \int g(x) f(x)\,dx$   for a continuous RV,

$E[g(X)] = \sum_{\text{all } x} g(x) P(X = x)$   for a discrete RV.
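
The two expectation formulas can be sketched in a few lines of Python. This is only an illustration: the fair die and the integration range [-10, 10] are assumptions made for the example, and the helper names `expect_discrete` and `expect_continuous` are hypothetical.

```python
import math

def expect_discrete(g, pmf):
    """E[g(X)] for a discrete RV given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

def expect_continuous(g, pdf, lo, hi, n=20_000):
    """E[g(X)] for a continuous RV, approximated by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    mids = (lo + (i + 0.5) * h for i in range(n))
    return sum(g(x) * pdf(x) for x in mids) * h

# Assumed discrete example: a fair six-sided die.
die = {k: 1 / 6 for k in range(1, 7)}
mean_die = expect_discrete(lambda x: x, die)                   # E(X)
var_die = expect_discrete(lambda x: (x - mean_die) ** 2, die)  # E[(X - mu)^2]

# Assumed continuous example: standard normal, integrated over [-10, 10].
def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

mean_norm = expect_continuous(lambda x: x, std_normal_pdf, -10, 10)
var_norm = expect_continuous(lambda x: x * x, std_normal_pdf, -10, 10)

print(mean_die, var_die, mean_norm, var_norm)
```

Choosing g(x) = x gives the mean and g(x) = (x - mean)^2 gives the variance, so the same operation covers both summary measures.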


A review of probability theory

Mean and variance of a RV
$\mu = E(X)$: measure of centre location (a point prediction);
$\sigma^2 = E[(X - \mu)^2]$: measure of dispersion.

Joint probability distribution of 2 RVs

joint cdf: $F_{XY}(x,y) = P(X \le x, Y \le y)$
joint pdf: $f_{XY}(x,y) = \partial^2 F_{XY}(x,y) / \partial x\,\partial y$, the partial derivative of $F_{XY}(x,y)$.

Marginal probability distributions

obtained by letting y (or x) go to infinity in $F_{XY}(x,y)$:
$F_X(x) = F_{XY}(x,\infty)$ or $F_Y(y) = F_{XY}(\infty,y)$.


A review of probability theory

Independence
Two RVs X and Y are independent if
$F_{XY}(x,y) = F_X(x) F_Y(y)$
or
$f_{XY}(x,y) = f_X(x) f_Y(y)$.
When independent, $E(XY) = E(X)E(Y)$.

Correlation between 2 RVs

covariance: $\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$
correlation: $\rho = \mathrm{Cov}(X,Y)/(\sigma_X \sigma_Y)$

Conditional distribution of Y given X = x

The conditional pdf is $f_{Y|X}(y|x) = f_{XY}(x,y)/f_X(x)$.
When independent, conditional pdf = marginal pdf.
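
As a rough illustration of marginals, covariance, correlation and the independence check, here is a sketch for two discrete RVs. The joint pmf below is made up for the example and the helper name `marginal` is hypothetical.

```python
import math

# Assumed joint pmf of (X, Y): keys are (x, y), values are probabilities.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def marginal(joint, axis):
    """Marginal pmf of X (axis=0) or Y (axis=1): sum the joint pmf over the other variable."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

p_x, p_y = marginal(joint, 0), marginal(joint, 1)
mu_x = sum(x * p for x, p in p_x.items())
mu_y = sum(y * p for y, p in p_y.items())

# Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
sd_x = math.sqrt(sum((x - mu_x) ** 2 * p for x, p in p_x.items()))
sd_y = math.sqrt(sum((y - mu_y) ** 2 * p for y, p in p_y.items()))
rho = cov / (sd_x * sd_y)

# Independence requires joint = product of marginals at every listed point.
independent = all(math.isclose(p, p_x[x] * p_y[y]) for (x, y), p in joint.items())

print(cov, rho, independent)
```

In this made-up table the joint probabilities differ from the products of the marginals, so X and Y are dependent and the correlation is non-zero.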

A review of probability theory

Conditional expectation
Given X, the conditional expectation of Y, E(Y|X),
is the average of Y weighted by the conditional pdf.
A forecasting model is built on historical data, or an
observed information set.
The conditional (on the data) distribution of future Y,
which utilises available information, is the object we
aim to understand.
X = history up to now (ie, data)
Y = future value of interest.
Then, the conditional distribution
pdf(Y|X) is all we need for forecasting Y.
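
A minimal sketch of conditional expectation as a forecast, for discrete RVs. The joint pmf is an assumption made for illustration (X plays the role of the observed history, Y the future value), and the function names are hypothetical.

```python
# Assumed joint pmf of (X, Y): keys are (x, y), values are probabilities.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def conditional_pmf(joint, x):
    """pmf of Y given X = x: P(X = x, Y = y) / P(X = x)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P(X = x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

def conditional_mean(joint, x):
    """E(Y | X = x): the average of y weighted by the conditional pmf."""
    return sum(y * p for y, p in conditional_pmf(joint, x).items())

forecast_given_0 = conditional_mean(joint, 0)
forecast_given_1 = conditional_mean(joint, 1)
print(forecast_given_0, forecast_given_1)
```

The forecast of Y changes with the observed X, which is the sense in which the conditional distribution uses the available information.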

A review of linear regression models

Simple linear regression model
$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$.

A dependent variable $y_t$ is linearly explained by an independent variable $x_t$ (regressor), subject to a random disturbance $\varepsilon_t$.
Scatter plot (y vs x)
[Figure: scatter plot of Y vs X]

eg. xyz.dat
      X         Y         Z
-1.3044    6.9105   -1.7184
 0.0268    8.9035    1.6342
 0.9392   13.0322   -0.7165
 0.9108   12.1195    1.1775
 0.6078    9.1661    0.1522
-1.4191    9.3989   -0.9960
    ...       ...       ...

A review of linear regression models

Assumptions about the disturbance (error term) $\varepsilon_t$ in
$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$
a) $\varepsilon_t \sim \text{iid}(0, \sigma^2)$: independently and identically distributed (iid) with mean 0 and variance $\sigma^2$;
b) $\varepsilon_t$ is independent of the regressor $x_t$.

These imply
$E(\varepsilon_t | x_t) = 0$;
$\varepsilon_t$ and $x_t$ are uncorrelated, ie, $\mathrm{Cov}(\varepsilon_t, x_t) = 0$;
$E(y_t | x_t) = \beta_0 + \beta_1 x_t$ (population regression function).

If the parameters ($\beta_0, \beta_1, \sigma^2$) are known, y may be predicted by the regression function for any given x.

A review of linear regression models

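Ordinary least squares (OLS) estimation
$(\hat\beta_0, \hat\beta_1)$ minimises the sum of squared residuals (SSR)

$\sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 x_t)^2$

Sample variance of residuals:

$\hat\sigma^2 = \frac{1}{T-2} \sum_{t=1}^{T} (y_t - \hat\beta_0 - \hat\beta_1 x_t)^2 = \frac{1}{T-2} \sum_{t=1}^{T} e_t^2$.

$\hat\sigma^2$ is an unbiased estimator of the variance of the error variable; its square root, $\hat\sigma$, is the standard error of estimate.

Fitted value and residual

$\hat y_t = \hat\beta_0 + \hat\beta_1 x_t$,   $e_t = y_t - \hat y_t = y_t - \hat\beta_0 - \hat\beta_1 x_t$.

fitted value = sample regression function = in-sample forecast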
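
The OLS formulas for the simple regression can be sketched as below. The code uses only the first six (x, y) rows of xyz.dat shown on the scatter-plot slide, so its estimates will differ from full-sample values; the variable names are illustrative, and the closed-form slope S_xy/S_xx is the standard solution of the minimisation problem rather than anything specific to EViews.

```python
# First six observations of xyz.dat as printed on the scatter-plot slide.
x = [-1.3044, 0.0268, 0.9392, 0.9108, 0.6078, -1.4191]
y = [6.9105, 8.9035, 13.0322, 12.1195, 9.1661, 9.3989]

T = len(y)
x_bar = sum(x) / T
y_bar = sum(y) / T

# Closed-form minimiser of the sum of squared residuals.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values (in-sample forecasts) and residuals.
fitted = [beta0_hat + beta1_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

ssr = sum(e * e for e in residuals)
sigma2_hat = ssr / (T - 2)       # sample variance of residuals
sigma_hat = sigma2_hat ** 0.5    # standard error of estimate

print(beta0_hat, beta1_hat, sigma_hat)
```

By construction the residuals sum to zero and are uncorrelated with x in the sample, which mirrors the assumptions placed on the disturbance.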

A review of linear regression models

eg. xyz.dat
$\hat y_t = \hat\beta_0 + \hat\beta_1 x_t$,   $\hat\beta_0 = 9.959319$,  $\hat\beta_1 = 0.949749$.

Residual plot:
[Figure: Residual, Actual and Fitted series]

            X           Y            Z
-1.3044055855    6.910518  -1.71836069
 0.0268445360    8.903510   1.63420482
 0.9391732559   13.032210  -0.71649462
 0.9108356314   12.119516   1.17746037
 0.6078185429    9.166077   0.15218715
-1.4190856454    9.398871  -0.99595505
 1.6450399496   10.861943   1.95295568
-1.5705456746    6.900604  -0.29073338
 1.1725971823    8.557908  -1.35916970
-0.0001469589   10.384578  -0.36981642
-1.9102214403    8.951113  -0.53368660
 0.4106547298   10.502956   0.01555975
-0.2633816925    9.012794   0.39533785
 2.9898141231   13.172481   0.62545839
-1.1719854886    7.294597   1.05345240
 1.8628263016   13.475142  -2.20815358
 1.1237612050   11.766178  -0.54770754
-1.8801405887    6.552416   1.50752962
 0.3094047477   11.078862  -1.85750582
-1.9164071601   11.977582  -2.60940240
 0.0443377940    9.175949   0.07977844
 2.1231233412   10.706387   0.70568532
-0.1525597566    8.633693   1.29592881
-0.0302621863    8.111663  -0.25520209
-0.5267709581    7.630387  -1.46954656
 1.1087466899   10.874706   0.56564230
-0.7870372302   11.471748  -0.29023287
 0.1217747790   10.913112  -1.09615388
-0.6272752055    8.379696  -0.47673444
 3.0619892478   11.959233   1.22468138
 1.0277405158   10.103590   1.03023708
 2.2274783060   11.681388  -0.12631043
-0.0633318382    9.716037  -1.02449478
-0.9195725833    8.469919   1.50979638
-0.4661417205    9.167781   0.19944663
 1.3575193910   10.512578   1.80739691
 1.1639896889   10.094872   0.56168127
 0.4570019947   12.046379  -0.69538078
-1.6182578476    9.805910  -1.20063045
-2.8131818585    5.811298   0.41839468
-0.1329263293   11.314932  -1.72367774
-0.5127844249   10.729959  -1.09610913
 1.3569336398   13.846940  -1.07509952
-0.8396330238   10.350224   0.23852197
 0.6548109587   10.837194   1.16375597
-0.2817164164    7.928579   0.50112855
 0.7701143610   10.755102   0.62021869
-0.0454431404   12.906690  -1.10814255

A review of linear regression models

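Multiple linear regression:
y is explained by more than one x variable.
$y_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_K x_{Kt} + \varepsilon_t$,   $\varepsilon_t \sim \text{iid}(0, \sigma^2)$.

Conditional mean (population regression function)

$E(y_t | x_{1t}, \ldots, x_{Kt}) = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_K x_{Kt}$

OLS estimation
$(\hat\beta_0, \ldots, \hat\beta_K)$ minimises

$\sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 x_{1t} - \cdots - \beta_K x_{Kt})^2$

$\hat\sigma^2 = \frac{1}{T-K-1} \sum_{t=1}^{T} e_t^2$.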


A review of linear regression models

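Sample regression function and residuals
$\hat y_t = \hat\beta_0 + \hat\beta_1 x_{1t} + \cdots + \hat\beta_K x_{Kt}$,   $e_t = y_t - \hat y_t$

eg. xyz.dat
$\hat y_t = \hat\beta_0 + \hat\beta_1 x_t + \hat\beta_2 z_t$,

$\hat\beta_0 = 9.88$,  $\hat\beta_1 = 1.07$,  $\hat\beta_2 = -0.64$.

When we say "regress y on [1, x1, x2]", we mean the OLS estimation of
$y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \varepsilon_t$.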
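
A sketch of what "regress y on [1, x, z]" computes, written in plain Python. It solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination, which is one standard way to obtain the OLS estimates and is not claimed to be what EViews does internally. The helper names `solve` and `ols` are hypothetical, and the data are only the first six rows of xyz.dat, so the numbers will not match full-sample estimates.

```python
def solve(A, b):
    """Solve A beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(y, *regressors):
    """Regress y on [1, x1, ..., xK]; returns coefficients, intercept first."""
    T = len(y)
    X = [[1.0] + [r[t] for r in regressors] for t in range(T)]
    k = len(X[0])
    XtX = [[sum(X[t][i] * X[t][j] for t in range(T)) for j in range(k)] for i in range(k)]
    Xty = [sum(X[t][i] * y[t] for t in range(T)) for i in range(k)]
    return solve(XtX, Xty)

# First six observations of xyz.dat as printed on the scatter-plot slide.
x = [-1.3044, 0.0268, 0.9392, 0.9108, 0.6078, -1.4191]
z = [-1.7184, 1.6342, -0.7165, 1.1775, 0.1522, -0.9960]
y = [6.9105, 8.9035, 13.0322, 12.1195, 9.1661, 9.3989]

beta_hat = ols(y, x, z)
fitted = [beta_hat[0] + beta_hat[1] * xi + beta_hat[2] * zi for xi, zi in zip(x, z)]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sigma2_hat = sum(e * e for e in residuals) / (len(y) - 2 - 1)  # T - K - 1 with K = 2

print(beta_hat, sigma2_hat)
```

The leading column of ones in X is the "1" in [1, x1, x2]; it is what produces the intercept.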


Ch.2 Brief Review

Preparation to use EViews

1) Create a folder on your USB drive, say, F:\BF
2) Unzip it into F:\BF
3) Launch EViews, select Options, General Options, File Locations, set Current Data Path as F:\BF, OK
4) In EViews, click File, Open, Foreign data as Workfile, xyz.dat (in File name), Open, Finish
5) A workfile is created and you will see the Workfile: XYZ window.
6) Read: Help, User Guide I, Part I, Chapter 2


13

Ch.2 Brief Review

1) Create a Workfile (Clicks/keys)
File, New, Workfile, Dated-regular frequency, integer date (in Frequency), 1 (in Start date), 48 (in End date), OK
2) Read data file (in Workfile window)
Proc, Import, Read, xyz.dat (in File name), Open, x y z (separated by a space in Names for series), OK
Steps 1) and 2) are an alternative to Step 4) of page 13 (Preparation to use EViews). They are necessary for seasonality.
3) Find summary statistics (type hist y)
Quick, Series Statistics, Histogram and Stats, y (in Series name), OK
4) Generate new series (type genr w=x+y+z)
Quick, Generate Series, w = x+y+z (in Enter equation), OK
5) Graph data (type scat x y)
Quick, Graph, x y (separated by a space in List of series), OK, Scatter (in Specific), Regression Line (in Fit lines), OK
6) OLS estimation (type ls y c x z)
Quick, Estimate Equation, y c x z (in Equation specification), OK

Ch.2 Brief Review

This top panel can be used to carry out commands directly.
e.g. Type plot x y z and press Enter.

Residual plot and many test statistics may be viewed in
Press Stats to see this table again.

[Figures: scatter plots, Y vs. X and Y vs. Z]

A review of linear regression models

Distribution of OLS estimator
$\hat\beta$ is approximately normal, with mean equal to the true parameter and standard deviation (std error) as reported.
eg. Std error for $\hat\beta_1$ is estimated as $se(\hat\beta_1) = 0.150341$.
Test the null hypothesis $\beta_1 = 1$.
$(\hat\beta_1 - 1)/se(\hat\beta_1) = 0.486$. The null cannot be rejected.

Approximate 95% confidence interval for $\beta$:
$CI = \hat\beta \pm 2\,se(\hat\beta)$
eg. CI: $1.07 \pm 2(0.15) = [0.77, 1.37]$.
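
The test and the interval above are simple arithmetic on the reported output, sketched here. The function names are hypothetical, and the "reject if |t| > 2" rule is the same rough 95% rule of thumb as the interval. Note that the code uses the rounded estimate 1.07, so its ratio comes out slightly below the 0.486 on the slide, which presumably uses the unrounded estimate.

```python
beta1_hat = 1.07      # estimate as rounded on the slides
se_beta1 = 0.150341   # reported standard error

def t_ratio(estimate, hypothesised, se):
    """Distance between estimate and hypothesised value, in standard errors."""
    return (estimate - hypothesised) / se

def approx_ci95(estimate, se):
    """Approximate 95% confidence interval: estimate +/- 2 standard errors."""
    return (estimate - 2 * se, estimate + 2 * se)

t_null_one = t_ratio(beta1_hat, 1.0, se_beta1)   # test H0: beta1 = 1
reject_null = abs(t_null_one) > 2                # rough 5% rule of thumb
ci_low, ci_high = approx_ci95(beta1_hat, se_beta1)

print(round(t_null_one, 3), reject_null, round(ci_low, 2), round(ci_high, 2))
```

The two views agree: the hypothesised value 1 lies inside the interval, and the t-ratio is well below 2 in absolute value.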

A review of linear regression models

t-statistic and probability value (p-value)
$t = \hat\beta / se(\hat\beta)$,   p-value $= \mathrm{Prob}(|\tau| > |t|)$,   $\tau \sim$ Student's t.

p-value = prob of making Type-I error (rejecting true H0) when you reject H0: $\beta = 0$
eg. t-stat for $\hat\beta_2$: -3.699; p-value for $\hat\beta_2$: 0.0006
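
To see where a p-value such as 0.0006 comes from, the two-sided tail probability of a Student's t distribution can be integrated numerically. This is a sketch under stated assumptions: the degrees of freedom are taken to be T - K - 1 = 48 - 2 - 1 = 45 (the slides do not state them), and Simpson's rule replaces the exact distribution function that a statistics package would use. The function names are hypothetical.

```python
import math

def student_t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p_value(t_stat, df, n=2000):
    """Prob(|tau| > |t_stat|) using Simpson's rule on [0, |t_stat|]; n must be even."""
    a = abs(t_stat)
    h = a / n
    total = student_t_pdf(0.0, df) + student_t_pdf(a, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    central_half = total * h / 3      # Prob(0 < tau < |t_stat|)
    return 1.0 - 2.0 * central_half

# Assumed degrees of freedom: T - K - 1 with T = 48 observations, K = 2 regressors.
p_beta2 = two_sided_p_value(-3.699, df=48 - 2 - 1)
print(p_beta2)
```

The result should be close to the reported 0.0006; small differences in the last digit can arise from rounding of the t-statistic.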

Sum of squared residuals (useful to test restrictions)

$SSR = \sum_{t=1}^{T} e_t^2$,   $e_t = y_t - \hat\beta_0 - \hat\beta_1 x_t - \hat\beta_2 z_t$.

OLS minimizes SSR to estimate parameters.

eg. SSR = 76.56223

A review of linear regression models

Mean of the dependent variable (eg. 10.08241)
$\bar y = \frac{1}{T} \sum_{t=1}^{T} y_t$

$SD = \sqrt{\frac{1}{T-1} \sum_{t=1}^{T} (y_t - \bar y)^2}$


A review of linear regression models

S.E. of regression (eg. 1.304371)
$\hat\sigma = \sqrt{\hat\sigma^2} = \sqrt{\frac{1}{T-K-1} \sum_{t=1}^{T} e_t^2}$

K = # of regressors = model size

$R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar y)^2}$

the proportion of variation in y explained by the model

$\bar R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2 / (T-K-1)}{\sum_{t=1}^{T} (y_t - \bar y)^2 / (T-1)}$

the model size (K) penalised in measuring goodness-of-fit
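
The goodness-of-fit measures can be sketched as small functions. The S.E. of regression is recomputed from the reported SSR = 76.56223 with T = 48 and K = 2 (T is taken from the workfile setup, K from the x and z regressors). The five-point series used for R-squared is hypothetical: its "residuals" are made-up numbers chosen only to show the arithmetic, not the output of an actual fit.

```python
import math

def se_of_regression(ssr, T, K):
    """Square root of SSR / (T - K - 1)."""
    return math.sqrt(ssr / (T - K - 1))

def r_squared(y, e):
    """1 - SSR / total variation of y around its mean."""
    y_bar = sum(y) / len(y)
    tss = sum((v - y_bar) ** 2 for v in y)
    return 1 - sum(r * r for r in e) / tss

def adjusted_r_squared(y, e, K):
    """Like R-squared, but both sums are divided by their degrees of freedom."""
    T = len(y)
    y_bar = sum(y) / T
    tss = sum((v - y_bar) ** 2 for v in y)
    ssr = sum(r * r for r in e)
    return 1 - (ssr / (T - K - 1)) / (tss / (T - 1))

# Check against the reported output: SSR = 76.56223, T = 48, K = 2.
sigma_hat = se_of_regression(76.56223, T=48, K=2)

# Hypothetical numbers, for the arithmetic only.
y_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
e_demo = [0.1, -0.2, 0.1, 0.1, -0.1]
r2_demo = r_squared(y_demo, e_demo)
adj_r2_demo = adjusted_r_squared(y_demo, e_demo, K=1)

print(sigma_hat, r2_demo, adj_r2_demo)
```

The adjusted measure is never above the unadjusted one when K is at least 1, because the residual sum is divided by the smaller number T - K - 1.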

A review of linear regression models

Log likelihood (eg. -79.31472)
likelihood = joint pdf of data (as a function of parameters)
reported value = log likelihood evaluated at the OLS estimates, assuming normality for $\varepsilon_t$

Durbin-Watson stat (eg. 1.506278)

$DW = \sum_{t=2}^{T} (e_t - e_{t-1})^2 \Big/ \sum_{t=1}^{T} e_t^2$

for testing if serial correlation (AC) exists in disturbance $\varepsilon_t$.

Roughly, if no auto-correlation, $DW \approx 2$. Significant AC if DW is too different from 2. Diebold recommended 1.5 as cutoff.
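
The Durbin-Watson formula is easy to compute directly from a residual series. In this sketch the two residual sequences are artificial, chosen to show how the statistic moves away from 2; the function name is hypothetical.

```python
def durbin_watson(e):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

# Artificial residuals that flip sign every period (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]
# Artificial residuals that stay on one side for a while (positive autocorrelation).
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

dw_alternating = durbin_watson(alternating)
dw_persistent = durbin_watson(persistent)
print(dw_alternating, dw_persistent)
```

Values well above 2 point to negative autocorrelation and values well below 2 to positive autocorrelation, which is the case most relevant to the 1.5 cutoff.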

A review of linear regression models

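Akaike info criterion (eg. 3.429780)
$AIC = \ln\left(\frac{SSR}{T}\right) + 2\,\frac{K+1}{T}$

selecting the model with smallest AIC

Schwarz criterion (eg. 3.546730)
$SIC = \ln\left(\frac{SSR}{T}\right) + \ln(T)\,\frac{K+1}{T}$

First term: smaller SSR for a more complex model.
Second term: penalty for more parameters.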
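
The reported log likelihood, AIC and Schwarz values can be recomputed from SSR = 76.56223, T = 48 and K = 2. The sketch assumes the standard Gaussian log likelihood evaluated at the OLS estimates, with the error variance estimated by SSR/T; the slides only say "assuming normality", so that exact form is an assumption. Under it, the likelihood-based criteria equal the slide formulas plus the constant 1 + ln(2*pi), which is the same for every model and so does not affect which model is selected.

```python
import math

SSR, T, K = 76.56223, 48, 2   # values from the example output

# Criteria as written on the slide.
aic_slide = math.log(SSR / T) + 2 * (K + 1) / T
sic_slide = math.log(SSR / T) + math.log(T) * (K + 1) / T

# Assumed Gaussian log likelihood at the OLS estimates (variance estimate SSR / T).
log_lik = -T / 2 * (1 + math.log(2 * math.pi) + math.log(SSR / T))

# Likelihood-based versions of the criteria.
aic_from_loglik = -2 * log_lik / T + 2 * (K + 1) / T
sic_from_loglik = -2 * log_lik / T + math.log(T) * (K + 1) / T

constant = 1 + math.log(2 * math.pi)   # model-independent shift

print(round(log_lik, 5), round(aic_from_loglik, 6), round(sic_from_loglik, 6))
print(round(aic_slide, 6), round(sic_slide, 6), round(constant, 6))
```

The likelihood-based values line up with the example figures quoted on the slides (-79.31472, 3.429780, 3.546730), while the slide formulas give numbers that are smaller by the constant.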

A review of linear regression models

Selecting a model: K*+1 minimises SIC

[Figure: ln(SSR/T) and ln(T)(K+1)/T plotted against K+1; SIC, their sum, is minimised at K*+1]

A review of linear regression models

F-statistic (eg. 27.82752)
for testing joint H0: $\beta_1 = \beta_2 = \cdots = \beta_K = 0$
(H0: independent variables do not explain/predict y)

$F\text{-stat} = \frac{(SSR_r - SSR)/K}{SSR/(T-K-1)} \sim F(K, T-K-1)$ distr. under H0,

where $SSR_r$ is the SSR under H0.

Prob(F-statistic): Prob(F-RV > F-stat)
the probability of Type-I error (rejecting true H0) when you reject H0.
eg. 0.000000 = P(F-RV > 27.82752)
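
The F-statistic formula can be sketched as a function, together with its inverse. The restricted SSR is not reported on the slides, so the code backs it out from the reported F-stat, SSR, T = 48 and K = 2; the resulting values are implied by the formula rather than taken from the output, and the function names are hypothetical. Because H0 leaves only the intercept in the model, the restricted SSR is the total variation of y around its mean, so it also implies an R-squared.

```python
def f_statistic(ssr_restricted, ssr, T, K):
    """((SSR_r - SSR) / K) / (SSR / (T - K - 1))."""
    return ((ssr_restricted - ssr) / K) / (ssr / (T - K - 1))

def implied_restricted_ssr(f_stat, ssr, T, K):
    """Invert the F formula to recover SSR_r from a reported F-stat."""
    return ssr + f_stat * K * ssr / (T - K - 1)

SSR, T, K, F_REPORTED = 76.56223, 48, 2, 27.82752

ssr_r = implied_restricted_ssr(F_REPORTED, SSR, T, K)
f_check = f_statistic(ssr_r, SSR, T, K)   # round trip back to the reported value

# Under H0 only the intercept remains, so SSR_r is the total sum of squares of y.
implied_r2 = 1 - SSR / ssr_r

print(ssr_r, f_check, implied_r2)
```

A large F-stat corresponds to a large drop from the restricted to the unrestricted SSR, that is, to regressors that explain a substantial share of the variation in y.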

A review of linear regression models

Residual-based specification tests
A regression model is based on assumptions about $\varepsilon_t$:
a) independent, identical distribution, $\varepsilon_t \sim \text{iid}(0, \sigma^2)$;
b) independent of $x_t$.
If the model is correct, the residuals $e_t = y_t - \hat y_t$ approximate $\varepsilon_t$.
For the correct model,
the residuals should be approximately iid $(0, \sigma^2)$.
If the residuals are not iid, the model must be mis-specified.
Check the model by checking if the residuals are iid.
eg. Check if there is serial correlation in the residuals.
Check if there is heteroskedasticity in the residuals.
Check if normality is supported by data.

Ch.2 Brief Review

Summary
What is a linear regression?
What is the key assumption about linear regressions?
What is the method of estimating a linear regression
model?
How do you use EViews (read, plot, summarise data)?
How do you use EViews to estimate linear regression
models?
Do you understand EViews output?
How do you do inference with EViews output?
Why do we use SIC (or AIC) to choose models?