
Business Forecasting

ECON2209
Slides 02

Lecturer: Minxian Yang

BF-02

my, School of Economics, UNSW

Ch.2 Brief Review

Lecture Plan
A review of probability theory
A review of linear regressions

Simple linear regression and scatter plot


Multiple linear regression
Ordinary least squares
Statistical inference in linear regressions
Model selection

EViews
Quick start (more will be introduced gradually)
Understand EViews output


A review of probability theory

Random variable (RV) X:
A RV is a numerical description of the outcomes of a random experiment.
eg. Tomorrow's BHP share price; the outcome of the next election.

[Figure: Cumulative Distribution, $P(X \le x)$ plotted against $x$]

Probability distribution
cumulative distribution function (cdf): $F(x) = P(X \le x)$
probability density function (pdf): $f(x) = dF(x)/dx$, the derivative of $F(x)$

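Expectation operation E[g(X)]
It is the average of g(X) weighted by the pdf (continuous RV) or pmf (discrete RV):
$$E[g(X)] = \begin{cases} \int_{-\infty}^{\infty} g(x) f(x)\,dx & \text{for a continuous RV,} \\ \sum_{\text{all } x} g(x)\, P(X = x) & \text{for a discrete RV.} \end{cases}$$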
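Both expectation formulas can be checked numerically. The Python sketch below is illustrative only: the fair die (discrete case) and the standard normal X (continuous case) are hypothetical examples, and the integral is approximated by a midpoint rule on the truncated range [-10, 10] rather than computed exactly. Choosing g(x) = x and g(x) = (x - μ)² also gives the mean μ = E(X) and the variance σ² = E[(X - μ)²].

```python
import math

def expect_discrete(g, pmf):
    """E[g(X)] for a discrete RV: sum of g(x) * P(X = x) over all x."""
    return sum(g(x) * p for x, p in pmf.items())

def expect_continuous(g, pdf, lo, hi, n=100_000):
    """E[g(X)] for a continuous RV: integral of g(x) f(x) dx, approximated by the
    midpoint rule on [lo, hi] (probability mass outside [lo, hi] is ignored)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += g(x) * pdf(x)
    return total * h

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# Hypothetical discrete example: a fair die, P(X = x) = 1/6 for x = 1, ..., 6
die = {x: 1 / 6 for x in range(1, 7)}
mean_die = expect_discrete(lambda x: x, die)                   # mu = E(X)
var_die = expect_discrete(lambda x: (x - mean_die) ** 2, die)  # sigma^2 = E[(X - mu)^2]

# Hypothetical continuous example: X ~ N(0, 1), truncated to [-10, 10]
mean_norm = expect_continuous(lambda x: x, std_normal_pdf, -10.0, 10.0)
second_moment = expect_continuous(lambda x: x * x, std_normal_pdf, -10.0, 10.0)

print(mean_die, var_die, mean_norm, second_moment)
```

The die gives E(X) = 3.5 and variance 35/12; the normal case returns values very close to 0 and 1.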


A review of probability theory


Mean and variance of a RV
$\mu = E(X)$: measure of centre location (a point prediction);
$\sigma^2 = E[(X - \mu)^2]$: measure of dispersion.

Joint probability distribution of 2 RVs
joint cdf: $F_{XY}(x,y) = P(X \le x, Y \le y)$
joint pdf: $f_{XY}(x,y) = \partial^2 F_{XY}(x,y) / \partial x\, \partial y$, the (cross) partial derivative of $F_{XY}(x,y)$.

Marginal probability distributions
obtained by letting y (or x) go to infinity in $F_{XY}(x,y)$:
$F_X(x) = F_{XY}(x, \infty)$ or $F_Y(y) = F_{XY}(\infty, y)$.
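For discrete RVs, letting y go to infinity in the joint cdf simply means counting every value of Y. A minimal sketch, assuming a hypothetical 2×2 joint pmf for X, Y ∈ {0, 1} (the probabilities are made up for illustration):

```python
# Hypothetical joint pmf: keys are (x, y), values are P(X = x, Y = y)
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

INF = float("inf")

def joint_cdf(x, y):
    """F_XY(x, y) = P(X <= x, Y <= y)."""
    return sum(p for (xi, yi), p in joint.items() if xi <= x and yi <= y)

def marginal_cdf_x(x):
    """F_X(x) = F_XY(x, infinity): let y go to infinity."""
    return joint_cdf(x, INF)

def marginal_cdf_y(y):
    """F_Y(y) = F_XY(infinity, y): let x go to infinity."""
    return joint_cdf(INF, y)

print(marginal_cdf_x(0), marginal_cdf_y(0))
```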


A review of probability theory


Independence
Two RVs X and Y are independent if, for all x and y,
$F_{XY}(x,y) = F_X(x)F_Y(y)$
or
$f_{XY}(x,y) = f_X(x)f_Y(y)$.
When independent, $E(XY) = E(X)E(Y)$.

Correlation between 2 RVs
covariance: $\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
correlation: $\rho = \text{Cov}(X,Y)/(\sigma_X \sigma_Y)$

Conditional distribution of Y given X = x
The conditional pdf is $f_{Y|X}(y|x) = f_{XY}(x,y)/f_X(x)$.
When X and Y are independent, the conditional pdf equals the marginal pdf.
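A small hypothetical joint pmf is enough to illustrate covariance, correlation, the independence condition and the conditional distribution. In the sketch below the probabilities are made up, and sums replace integrals because the RVs are discrete.

```python
import math

# Hypothetical joint pmf P(X = x, Y = y) of two discrete RVs
joint = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.30, (1, 1): 0.40}

def marginal(axis):
    """Marginal pmf of X (axis=0) or Y (axis=1), obtained by summing out the other RV."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

px, py = marginal(0), marginal(1)
mu_x = sum(x * p for x, p in px.items())
mu_y = sum(y * p for y, p in py.items())

# Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)] and rho = Cov / (sigma_X sigma_Y)
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
sd_x = math.sqrt(sum((x - mu_x) ** 2 * p for x, p in px.items()))
sd_y = math.sqrt(sum((y - mu_y) ** 2 * p for y, p in py.items()))
rho = cov / (sd_x * sd_y)

# Independence requires f_XY(x, y) = f_X(x) f_Y(y) for every (x, y)
independent = all(math.isclose(p, px[x] * py[y]) for (x, y), p in joint.items())

def conditional_y_given_x(x):
    """Conditional pmf of Y given X = x: f_XY(x, y) / f_X(x)."""
    return {y: p / px[x] for (xi, y), p in joint.items() if xi == x}

print(cov, rho, independent, conditional_y_given_x(0))
```

Here Cov(X, Y) = -0.02, so X and Y are not independent, and the conditional pmf of Y given X = 0 (1/3, 2/3) differs from the marginal pmf of Y (0.4, 0.6).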

A review of probability theory


Conditional expectation
Given X, the conditional expectation of Y, E(Y|X), is the average of Y weighted by the conditional pdf.
A forecasting model is built on historical data, or an observed information set.
The conditional (on the data) distribution of future Y, which utilises the available information, is the object we aim to understand.
X = history up to now (ie, data)
Y = future value of interest.
Then the conditional distribution pdf(Y|X) is all we need for forecasting Y.
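To connect conditional expectation with forecasting, the sketch below treats X as the observed information and Y as the future value, using a hypothetical joint pmf. E(Y | X = x) is then a point forecast that changes with what has been observed.

```python
# Hypothetical joint pmf P(X = x, Y = y): X is observed information, Y a future value
joint = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.30, (1, 1): 0.40}

def conditional_expectation(x):
    """E(Y | X = x): the average of Y weighted by P(X = x, Y = y) / P(X = x)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)   # marginal P(X = x)
    return sum(y * p / p_x for (xi, y), p in joint.items() if xi == x)

# The point forecast of Y depends on what has been observed
for x in (0, 1):
    print(f"E(Y | X = {x}) = {conditional_expectation(x):.4f}")
```

Weighting the two conditional forecasts by P(X = 0) = 0.3 and P(X = 1) = 0.7 recovers the unconditional mean E(Y) = 0.6.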

A review of linear regression models


Simple linear regression model
$$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t .$$
A dependent variable $y_t$ is linearly explained by an independent variable $x_t$ (regressor), subject to a random disturbance $\varepsilon_t$.
Scatter plot (y vs x)
eg. xyz.dat
      X        Y        Z
-1.3044   6.9105  -1.7184
 0.0268   8.9035   1.6342
 0.9392  13.0322  -0.7165
 0.9108  12.1195   1.1775
 0.6078   9.1661   0.1522
-1.4191   9.3989  -0.9960
    ...
[Figure: scatter plot, Y vs. X]

A review of linear regression models


Assumptions about the disturbance (error term) $\varepsilon_t$ in $y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$:
a) $\varepsilon_t \sim \text{iid}(0, \sigma^2)$: independently and identically distributed (iid) with mean 0 and variance $\sigma^2$;
b) $\varepsilon_t$ is independent of the regressor $x_t$.

These imply
$E(\varepsilon_t \mid x_t) = 0$;
$\varepsilon_t$ and $x_t$ are uncorrelated, ie, $\text{Cov}(\varepsilon_t, x_t) = 0$;
$E(y_t \mid x_t) = \beta_0 + \beta_1 x_t$ (population regression function).

If the parameters $(\beta_0, \beta_1, \sigma^2)$ are known, y may be predicted by the regression function for any given x.

A review of linear regression models


Ordinary least squares (OLS) estimation
$(\hat\beta_0, \hat\beta_1)$ minimises the sum of squared residuals (SSR)
$$\sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 x_t)^2 .$$
Sample variance of residuals:
$$\hat\sigma^2 = \frac{1}{T-2}\sum_{t=1}^{T} (y_t - \hat\beta_0 - \hat\beta_1 x_t)^2 = \frac{1}{T-2}\sum_{t=1}^{T} e_t^2 ,$$
an unbiased estimator of the error variance $\sigma^2$. Its square root, $\hat\sigma$, is the standard error of estimate.

Fitted value and residual
$\hat y_t = \hat\beta_0 + \hat\beta_1 x_t, \qquad e_t = y_t - \hat y_t = y_t - \hat\beta_0 - \hat\beta_1 x_t .$
fitted value = sample regression function = in-sample forecast
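Setting the derivatives of the SSR with respect to β0 and β1 to zero gives the familiar closed-form OLS solution (not written out on the slide). A minimal Python sketch, using hypothetical toy data rather than xyz.dat; `simple_ols` is an illustrative helper name:

```python
def simple_ols(x, y):
    """OLS for y_t = b0 + b1 x_t + e_t via the closed form
    b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  b0 = ybar - b1 * xbar."""
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]              # in-sample forecasts yhat_t
    resid = [yi - fi for yi, fi in zip(y, fitted)]    # e_t = y_t - yhat_t
    sigma2 = sum(e * e for e in resid) / (T - 2)      # unbiased estimate of sigma^2
    return b0, b1, fitted, resid, sigma2

# Hypothetical toy data (not xyz.dat)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, fitted, resid, sigma2 = simple_ols(x, y)
print(b0, b1, sigma2)
```

For these data the estimates are approximately β̂0 = 2.2 and β̂1 = 0.6 with σ̂² = 0.8, and the residuals sum to zero, as they always do when a constant is included.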

A review of linear regression models


eg. xyz.dat
$\hat y_t = \hat\beta_0 + \hat\beta_1 x_t$, with $\hat\beta_0 = 9.959319$, $\hat\beta_1 = 0.949749$.

Residual plot:
[Figure: actual, fitted and residual series plotted against observation number]

Data: xyz.dat (48 observations of X, Y and Z)
            X          Y            Z
-1.3044055855   6.910518  -1.71836069
 0.0268445360   8.903510   1.63420482
 0.9391732559  13.032210  -0.71649462
 0.9108356314  12.119516   1.17746037
 0.6078185429   9.166077   0.15218715
-1.4190856454   9.398871  -0.99595505
 1.6450399496  10.861943   1.95295568
-1.5705456746   6.900604  -0.29073338
 1.1725971823   8.557908  -1.35916970
-0.0001469589  10.384578  -0.36981642
-1.9102214403   8.951113  -0.53368660
 0.4106547298  10.502956   0.01555975
-0.2633816925   9.012794   0.39533785
 2.9898141231  13.172481   0.62545839
-1.1719854886   7.294597   1.05345240
 1.8628263016  13.475142  -2.20815358
 1.1237612050  11.766178  -0.54770754
-1.8801405887   6.552416   1.50752962
 0.3094047477  11.078862  -1.85750582
-1.9164071601  11.977582  -2.60940240
 0.0443377940   9.175949   0.07977844
 2.1231233412  10.706387   0.70568532
-0.1525597566   8.633693   1.29592881
-0.0302621863   8.111663  -0.25520209
-0.5267709581   7.630387  -1.46954656
 1.1087466899  10.874706   0.56564230
-0.7870372302  11.471748  -0.29023287
 0.1217747790  10.913112  -1.09615388
-0.6272752055   8.379696  -0.47673444
 3.0619892478  11.959233   1.22468138
 1.0277405158  10.103590   1.03023708
 2.2274783060  11.681388  -0.12631043
-0.0633318382   9.716037  -1.02449478
-0.9195725833   8.469919   1.50979638
-0.4661417205   9.167781   0.19944663
 1.3575193910  10.512578   1.80739691
 1.1639896889  10.094872   0.56168127
 0.4570019947  12.046379  -0.69538078
-1.6182578476   9.805910  -1.20063045
-2.8131818585   5.811298   0.41839468
-0.1329263293  11.314932  -1.72367774
-0.5127844249  10.729959  -1.09610913
 1.3569336398  13.846940  -1.07509952
-0.8396330238  10.350224   0.23852197
 0.6548109587  10.837194   1.16375597
-0.2817164164   7.928579   0.50112855
 0.7701143610  10.755102   0.62021869
-0.0454431404  12.906690  -1.10814255

A review of linear regression models


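Multiple linear regression:
y is explained by more than one x variable.
$$y_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_K x_{Kt} + \varepsilon_t, \qquad \varepsilon_t \sim \text{iid}(0, \sigma^2).$$

Conditional mean (population regression function)
$$E(y_t \mid x_{1t}, \ldots, x_{Kt}) = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_K x_{Kt}$$

OLS estimation
$(\hat\beta_0, \ldots, \hat\beta_K)$ minimises
$$\sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 x_{1t} - \cdots - \beta_K x_{Kt})^2 .$$
Sample residual variance:
$$\hat\sigma^2 = \frac{1}{T-K-1}\sum_{t=1}^{T} e_t^2 .$$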
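With K regressors the closed form becomes a matrix least-squares problem. The sketch below assumes NumPy is available and uses numpy.linalg.lstsq, which minimises the sum of squared residuals, on simulated (hypothetical) data; `ols` is an illustrative helper, not part of any library. It prepends the constant column itself, which is what "regress y on [1, x1, x2]" means.

```python
import numpy as np

def ols(y, X):
    """OLS of y on [1, X]: returns (b0, b1, ..., bK), the residuals,
    and the sample residual variance SSR / (T - K - 1)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # prepend constant
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimises the sum of squared residuals
    resid = y - X @ beta
    T, k_plus_1 = X.shape                          # k_plus_1 = K + 1 parameters
    sigma2 = resid @ resid / (T - k_plus_1)
    return beta, resid, sigma2

# Hypothetical simulated data with K = 2 regressors
rng = np.random.default_rng(0)
T = 200
x1, x2 = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=T)
beta, resid, sigma2 = ols(y, np.column_stack([x1, x2]))
print(beta.round(2), round(float(sigma2), 3))
```

A useful check on any OLS fit is that the residuals are orthogonal to every regressor, including the constant; these are the first-order conditions of the minimisation.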


A review of linear regression models


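Sample regression function and residuals
$\hat y_t = \hat\beta_0 + \hat\beta_1 x_{1t} + \cdots + \hat\beta_K x_{Kt}, \qquad e_t = y_t - \hat y_t$

eg. xyz.dat
$\hat y_t = \hat\beta_0 + \hat\beta_1 x_t + \hat\beta_2 z_t$, with $\hat\beta_0 = 9.88$, $\hat\beta_1 = 1.07$, $\hat\beta_2 = -0.64$.

When we say "regress y on [1, x1, x2]", we mean the OLS estimation of
$y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \varepsilon_t$.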


Quick start with EViews


Preparation to use EViews
1) Create a folder on your USB drive, say, F:\BF
2) Download xyz.dat and save it in F:\BF (or download bfData13.zip and unzip it into F:\BF)
3) Launch EViews, select Options, General Options, File Locations, set Current Data Path to F:\BF, OK
4) In EViews, click File, Open, Foreign data as Workfile, xyz.dat (in File name), Open, Finish
5) A workfile is created and you will see the Workfile: XYZ window.
6) Read: Help, User Guide I, Part I, Chapter 2


Quick start with EViews (menu-driven approach)

1) Create a workfile (clicks/keys)
File, New, Workfile, Dated-regular frequency, integer date (in Frequency), 1 (in Start date), 48 (in End date), OK
2) Read the data file (in the Workfile window)
Proc, Import, Read, xyz.dat (in File name), Open, x y z (separated by spaces in Names for series), OK
Steps 1) and 2) are an alternative to Step 4) of "Preparation to use EViews". They are necessary for seasonal data.
3) Find summary statistics (type hist y)
Quick, Series Statistics, Histogram and Stats, y (in Series name), OK
4) Generate a new series (type genr w=x+y+z)
Quick, Generate Series, w = x+y+z (in Enter equation), OK
5) Graph data (type scat x y)
Quick, Graph, x y (separated by a space in List of series), OK, Scatter (in Specific), Regression Line (in Fit lines), OK
6) OLS estimation (type ls y c x z)
Quick, Estimate Equation, y c x z (in Equation specification), OK


[Figure: EViews screenshot, command window and equation output]
The top panel can be used to carry out commands directly, e.g. type plot x y z and press Enter.
The residual plot and many test statistics may be viewed in the View menu. Press Stats to see the output table again.


[Figure: scatter plots, Y vs. X and Y vs. Z]

A review of linear regression models

Distribution of OLS estimator
$\hat\beta$ is approximately normal, with mean equal to the true parameter $\beta$ and standard deviation (std error) as reported in the output:
$$\frac{\hat\beta - \beta}{\text{sd}(\hat\beta)} \sim t(T-K-1) \approx N(0,1).$$
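eg. The std error of $\hat\beta_1$ is estimated as $\text{se}(\hat\beta_1) = 0.150341$.
Test the null hypothesis $\beta_1 = 1$:
$(\hat\beta_1 - 1)/\text{se}(\hat\beta_1) = 0.486$. Since 0.486 is well below 2 (roughly the 5% critical value), the null cannot be rejected.

Approximate 95% confidence interval for $\beta$:
CI $= \hat\beta \pm 2\,\text{se}(\hat\beta)$
eg. CI for $\beta_1$: $1.07 \pm 2(0.15) = [0.77, 1.37]$.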
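The t-ratio and the approximate interval are simple arithmetic. The sketch below plugs in the rounded values 1.07 and 0.15 from this slide; with these rounded inputs the t-ratio is about 0.47 rather than the 0.486 obtained from the unrounded estimates, and either way it is well inside ±2. The function names are illustrative.

```python
def t_stat(beta_hat, se, beta_null=0.0):
    """t statistic for H0: beta = beta_null."""
    return (beta_hat - beta_null) / se

def approx_ci95(beta_hat, se):
    """Approximate 95% confidence interval: beta_hat +/- 2 se(beta_hat)."""
    return beta_hat - 2 * se, beta_hat + 2 * se

# Rounded values from the xyz.dat example
t1 = t_stat(1.07, 0.15, beta_null=1.0)
lo, hi = approx_ci95(1.07, 0.15)
reject = abs(t1) > 2          # rough 5% rule used on the slide
print(round(t1, 3), round(lo, 2), round(hi, 2), reject)
```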

A review of linear regression models


t-statistic and probability value (p-value)
$t = \hat\beta / \text{se}(\hat\beta), \qquad \text{p-value} = P(|t^*| > |t|), \qquad t^* \sim$ Student's t with $T-K-1$ degrees of freedom.
p-value = probability of making a Type-I error (rejecting a true H0) when you reject H0: $\beta = 0$.
eg. t-stat for $\hat\beta_2$: -3.699; p-value for $\hat\beta_2$: 0.0006
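The p-value needs the cdf of Student's t. In practice one would call a statistics library (for example scipy.stats.t.sf); the self-contained sketch below instead integrates the t density numerically with Simpson's rule, assuming df = T - K - 1 = 45 for the xyz.dat regression (T = 48, K = 2).

```python
import math

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

def two_sided_p_value(t, df, n=10_000):
    """P(|t*| > |t|) for t* ~ t(df), using symmetry, p = 1 - 2 * integral_0^|t| pdf(u) du,
    with the integral computed by Simpson's rule (n must be even)."""
    a, b = 0.0, abs(t)
    h = (b - a) / n
    s = t_pdf(a, df) + t_pdf(b, df)
    s += 4 * sum(t_pdf(a + i * h, df) for i in range(1, n, 2))
    s += 2 * sum(t_pdf(a + i * h, df) for i in range(2, n, 2))
    return 1 - 2 * (s * h / 3)

# t-stat for beta_2 in the xyz.dat output; df = 48 - 2 - 1 = 45
p = two_sided_p_value(-3.699, df=45)
print(round(p, 4))
```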

Sum of squared residuals (useful for testing restrictions)
$$\text{SSR} = \sum_{t=1}^{T} e_t^2, \qquad e_t = y_t - \hat\beta_0 - \hat\beta_1 x_t - \hat\beta_2 z_t .$$
OLS minimises SSR to estimate the parameters.
eg. SSR = 76.56223

A review of linear regression models


Mean of the dependent variable (eg. 10.08241)
$$\bar y = \frac{1}{T}\sum_{t=1}^{T} y_t$$
sample mean of y; a measure of central location

S.D. of the dependent variable (eg. 1.908842)
$$\text{SD} = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T} (y_t - \bar y)^2}$$
sample SD of y; a measure of dispersion
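Python's standard statistics module computes both measures; statistics.stdev uses the T - 1 divisor shown above. The data below are hypothetical.

```python
import statistics

def describe(y):
    """Sample mean (central location) and sample SD with divisor T - 1 (dispersion)."""
    return statistics.mean(y), statistics.stdev(y)

y = [2, 4, 5, 4, 5]   # hypothetical toy data
ybar, sd = describe(y)
print(ybar, round(sd, 4))
```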


A review of linear regression models


S.E. of regression (eg. 1.304371)
$$\hat\sigma = \sqrt{\frac{1}{T-K-1}\sum_{t=1}^{T} e_t^2}, \qquad K = \text{number of regressors} = \text{model size}$$
$\hat\sigma$ measures the dispersion of the error term $\varepsilon$.

R-squared and Adjusted R-squared (eg. 0.553, 0.533)
$$R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar y)^2}$$
the proportion of the variation in y explained by the model
$$\bar R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2/(T-K-1)}{\sum_{t=1}^{T} (y_t - \bar y)^2/(T-1)}$$
goodness of fit measured with the model size (K) penalised
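The S.E. of regression and both R² measures follow from SSR and the total sum of squares Σ(y_t - ȳ)². The output does not report the latter directly, but it can be recovered from the S.D. of the dependent variable as (T - 1)·SD². The sketch below uses the xyz.dat values (SSR = 76.56223, SD = 1.908842, T = 48, K = 2) and reproduces the reported 1.304371, 0.553 and 0.533 up to rounding; `fit_stats` is an illustrative name.

```python
import math

def fit_stats(ssr, tss, T, K):
    """S.E. of regression, R-squared and adjusted R-squared from SSR,
    TSS = sum (y_t - ybar)^2, sample size T and K regressors (plus a constant)."""
    se_reg = math.sqrt(ssr / (T - K - 1))
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (ssr / (T - K - 1)) / (tss / (T - 1))
    return se_reg, r2, adj_r2

T, K = 48, 2
tss = (T - 1) * 1.908842 ** 2          # TSS recovered from the reported S.D. of y
se_reg, r2, adj_r2 = fit_stats(76.56223, tss, T, K)
print(round(se_reg, 6), round(r2, 3), round(adj_r2, 3))
```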


A review of linear regression models


Log likelihood (eg. -79.31472)
likelihood = joint pdf of the data (as a function of the parameters)
reported value = log likelihood evaluated at the OLS estimates, assuming normality for $\varepsilon_t$
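Under normal errors, the log likelihood evaluated at the OLS estimates, with the error variance estimated by SSR/T, has the closed form -(T/2)[1 + ln(2π) + ln(SSR/T)]. Assuming this is the quantity EViews reports, the sketch below reproduces -79.31472 from SSR = 76.56223 and T = 48.

```python
import math

def gaussian_loglik_at_ols(ssr, T):
    """Gaussian log likelihood at the OLS estimates with sigma^2 estimated by SSR / T:
    -T/2 * (1 + ln(2*pi) + ln(SSR / T))."""
    return -T / 2 * (1 + math.log(2 * math.pi) + math.log(ssr / T))

print(round(gaussian_loglik_at_ols(76.56223, 48), 5))
```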

Durbin-Watson stat (eg. 1.506278)
$$\text{DW} = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$$
for testing whether serial correlation (autocorrelation, AC) exists in the disturbance $\varepsilon_t$.
Roughly, DW $\approx 2$ if there is no autocorrelation; AC is significant if DW is too far from 2. Diebold recommended 1.5 as a cutoff.
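A direct transcription of the DW formula, applied to two hypothetical residual series: one whose signs alternate (negative autocorrelation) and one whose signs persist (positive autocorrelation).

```python
def durbin_watson(resid):
    """DW = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

# Hypothetical residual series
alternating = [1, -1, 1, -1, 1, -1]   # negative AC
persistent = [1, 1, 1, -1, -1, -1]    # positive AC
print(durbin_watson(alternating), durbin_watson(persistent))
```

The alternating series gives DW = 20/6 ≈ 3.33 and the persistent series gives 4/6 ≈ 0.67; both are far from 2.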

A review of linear regression models


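Akaike info criterion (eg. 3.429780)
$$\text{AIC} = \ln\left(\frac{\text{SSR}}{T}\right) + 2\,\frac{K+1}{T}$$
select the model with the smallest AIC

Schwarz criterion (eg. 3.546730)
$$\text{SIC} = \ln\left(\frac{\text{SSR}}{T}\right) + \ln(T)\,\frac{K+1}{T}$$
select the model with the smallest SIC

The values EViews reports equal these formulas plus the constant $1 + \ln(2\pi)$, because EViews builds the criteria from the Gaussian log likelihood; the constant does not affect which model is selected.

Trade-off in model selection
a more complex model (more parameters) gives a smaller SSR but a larger penalty term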
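The sketch below evaluates the AIC and SIC formulas for the xyz.dat regression (SSR = 76.56223, T = 48, K = 2) and checks that adding 1 + ln(2π) ≈ 2.838 reproduces the printed 3.429780 and 3.546730, assuming EViews defines the criteria as -2 ln L/T plus the penalty with the Gaussian log likelihood.

```python
import math

def info_criteria(ssr, T, K):
    """AIC and SIC as on the slide, for a model with K regressors plus a constant."""
    base = math.log(ssr / T)
    aic = base + 2 * (K + 1) / T
    sic = base + math.log(T) * (K + 1) / T
    return aic, sic

aic, sic = info_criteria(76.56223, T=48, K=2)
const = 1 + math.log(2 * math.pi)     # same for every model, so rankings are unchanged
print(round(aic, 4), round(sic, 4))
print(round(aic + const, 6), round(sic + const, 6))
```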

A review of linear regression models


Selecting a model: choose the model size K* for which K*+1 minimises SIC
$$\text{SIC} = \ln(\text{SSR}/T) + \ln(T)\,(K+1)/T$$
[Figure: ln(SSR/T) and the penalty ln(T)(K+1)/T plotted against K+1; their sum, SIC, is minimised at K*+1]
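The trade-off can be illustrated by scanning over candidate model sizes. The SSR values below are hypothetical (they fall as K grows, but by less and less); the sketch simply picks the K whose SIC is smallest.

```python
import math

def sic(ssr, T, K):
    """Schwarz criterion for a model with K regressors plus a constant."""
    return math.log(ssr / T) + math.log(T) * (K + 1) / T

# Hypothetical SSRs for nested models with K = 0, 1, ..., 5 regressors, T = 48
T = 48
ssr_by_K = [170.0, 120.0, 76.0, 75.0, 74.5, 74.2]
scores = [sic(ssr, T, K) for K, ssr in enumerate(ssr_by_K)]
K_star = min(range(len(scores)), key=lambda k: scores[k])
print(K_star, [round(s, 3) for s in scores])
```

Beyond K = 2 the small drop in ln(SSR/T) no longer pays for the extra penalty ln(T)/T per parameter, so K* = 2.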

A review of linear regression models


F-statistic (eg. 27.82752)
for testing the joint H0: $\beta_1 = \beta_2 = \cdots = \beta_K = 0$
(H0: the independent variables do not explain/predict y)
$$F\text{-stat} = \frac{(\text{SSR}_r - \text{SSR})/K}{\text{SSR}/(T-K-1)} \sim F(K, T-K-1) \text{ distr. under } H_0,$$
where $\text{SSR}_r$ is the SSR under H0.

Prob(F-statistic): P(F-RV > F-stat),
the probability of a Type-I error (rejecting a true H0) when you reject H0.
eg. 0.000000 = P(F-RV > 27.82752)
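The F-statistic can be recomputed from the output. Under H0 only the constant is fitted, so SSR_r = Σ(y_t - ȳ)², recovered here from the reported S.D. of y as (T - 1)·SD². For the p-value the sketch uses the closed-form tail P(F > f) = (1 + 2f/d2)^(-d2/2), which holds only when the numerator degrees of freedom K = 2, as in the xyz.dat regression; for other K a library routine such as scipy.stats.f.sf would be needed.

```python
def f_stat(ssr_r, ssr, T, K):
    """F statistic for H0: beta_1 = ... = beta_K = 0 from restricted and unrestricted SSR."""
    return ((ssr_r - ssr) / K) / (ssr / (T - K - 1))

def f_pvalue_k2(f, df2):
    """P(F-RV > f) for F(2, df2), using (1 + 2f/df2)^(-df2/2).
    Valid only when the numerator degrees of freedom equal 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

T, K = 48, 2
ssr_r = (T - 1) * 1.908842 ** 2        # SSR of the constant-only model
F = f_stat(ssr_r, 76.56223, T, K)
print(round(F, 3), f"{f_pvalue_k2(F, T - K - 1):.6f}")
```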

A review of linear regression models


Residual-based specification tests
A regression model is based on assumptions about $\varepsilon_t$:
a) independently and identically distributed, $\varepsilon_t \sim \text{iid}(0, \sigma^2)$;
b) independent of $x_t$.
If the model is correct, the residuals $e_t = y_t - \hat y_t$ approximate $\varepsilon_t$, so they should be approximately iid$(0, \sigma^2)$.
If the residuals are not iid, the model must be mis-specified.
Check the model by checking whether the residuals are iid, eg.
check whether there is serial correlation in the residuals;
check whether there is heteroskedasticity in the residuals;
check whether normality is supported by the data.
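One simple way to look for serial correlation in residuals (not necessarily the test used later in the course) is to compare the lag-1 sample autocorrelation with the rough ±2/√T band that applies to iid data. The helper names and residual series below are hypothetical.

```python
import math

def lag1_autocorrelation(e):
    """Sample lag-1 autocorrelation of a residual series e_1, ..., e_T."""
    T = len(e)
    ebar = sum(e) / T
    num = sum((e[t] - ebar) * (e[t - 1] - ebar) for t in range(1, T))
    den = sum((v - ebar) ** 2 for v in e)
    return num / den

def looks_serially_correlated(e):
    """Rough check: flag if |r1| > 2 / sqrt(T), the approximate 95% band
    for the lag-1 sample autocorrelation of iid data."""
    return abs(lag1_autocorrelation(e)) > 2 / math.sqrt(len(e))

# Hypothetical residual series
persistent = [1.0] * 10 + [-1.0] * 10   # long runs of the same sign
mixed = [1.0, 1.0, -1.0, -1.0] * 5      # no lag-1 pattern
print(looks_serially_correlated(persistent), looks_serially_correlated(mixed))
```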

Summary
What is a linear regression?
What is the key assumption about linear regressions?
What is the method of estimating a linear regression
model?
How do you use EViews (read, plot, summarise data)?
How do you use EViews to estimate linear regression
models?
Do you understand EViews output?
How do you do inference with EViews output?
Why do we use SIC (or AIC) to choose models?