
Econometrics

Financial Econometrics
Lecturer:
Dr. Damien Cassells.

Textbook:
Brooks, Introductory Econometrics for Finance, 3rd Edition.

Lecture Times:
Week beginning November 13th 2017.
Financial Econometrics
Assessment:
Group project 30%.
Exam 70%.

Contact:
Damien.Cassells@dit.ie
Introduction to Financial
Econometrics
Definitions and Data
What is Econometrics?
Econometrics literally means "measurement in economics".
Econometrics is the application of statistical and mathematical methods to the analysis of economic data, with the purpose of giving empirical content to economic theories and verifying or rejecting them.
What is Econometrics?
Econometrics is the name given to the
study of quantitative tools for analysing
economic or financial data.
"It's my experience that 'economy-tricks' is usually nothing more than a justification of what the author believed before the research began."
The tools of financial econometrics are fundamentally the same as those of econometrics generally.
Brief History of Econometrics
The first empirical demand schedule was published in 1699 by Charles Davenant.
The first modern statistical demand studies were made by Rodolfo Benini in 1907.
The Econometric Society was founded in 1930.
The journal Econometrica was first published in 1933.
Bayesian versus Classical Statistics

The philosophical approach to model-building used throughout is based on classical statistics.
This involves postulating a theory and then setting up a model and collecting data to test that theory.
Based on the results from the model, the theory is supported or refuted.
There is, however, an entirely different approach known as Bayesian
statistics.
Here, the theory and model are developed together.
The researcher starts with an assessment of existing knowledge or beliefs
formulated as probabilities, known as priors.
The priors are combined with the data into a model.

Bayesian versus Classical Statistics

The beliefs are then updated after estimating the model to form a set of
posterior probabilities.
Bayesian statistics is a well-established and popular approach, although less so than the classical one.
Some classical researchers are uncomfortable with the Bayesian use of
prior probabilities based on judgment.
If the priors are very strong, a great deal of evidence from the data would
be required to overturn them.
So the researcher would end up with the conclusions that he/she wanted in
the first place!
In the classical case, by contrast, judgement is not supposed to enter the process, and thus it is argued to be more objective.
Uses of Econometrics
1) Describing economic and financial reality.
2) Testing hypotheses about economic and financial theory.
3) Forecasting future economic or financial activity.
Describing Economic Reality
Consider a demand curve/function for quantity demanded (Q):
Q = f(P, Ps, Yd) (1)
where P is the price of the good, Ps is the price of a related good and Yd is the disposable income of the consumer.
Equation (1) is a general, theoretical functional relationship, but it can be made more explicit:
Q = 31.50 - 0.73P + 0.11Ps + 0.23Yd (2)
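To make this concrete, here is a minimal sketch of how coefficients such as those in equation (2) are obtained by ordinary least squares; the simulated data, seed and statsmodels call are illustrative assumptions, not taken from the lecture:

```python
import numpy as np
import statsmodels.api as sm

# Simulate data roughly consistent with equation (2): illustrative only
rng = np.random.default_rng(42)
n = 200
P = rng.uniform(5, 15, n)        # own price
Ps = rng.uniform(5, 15, n)       # price of a related good
Yd = rng.uniform(50, 150, n)     # disposable income
Q = 31.50 - 0.73 * P + 0.11 * Ps + 0.23 * Yd + rng.normal(0, 2, n)

# OLS regression of Q on P, Ps and Yd, with an intercept
X = sm.add_constant(np.column_stack([P, Ps, Yd]))
results = sm.OLS(Q, X).fit()
print(results.params)   # estimates close to 31.50, -0.73, 0.11, 0.23
```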
Hypothesis Testing
The most common use of econometrics is hypothesis testing.
Hypothesis testing is the evaluation of alternative theories with quantitative evidence.
We can test equation (2) to see whether the good is a normal good: for a normal good, demand rises with income, so the coefficient on Yd should be positive.
At first glance, the estimated coefficient of 0.23 is consistent with this.
Forecasting
Forecasting is the most difficult use of econometrics.
We attempt to predict what is likely to happen in forthcoming economic periods based on what has happened in the past.
The accuracy of such forecasts depends in large measure on the degree to which the past is a good guide to the future.
Steps Required for Any Type of
Quantitative Research
1) Specifying the models or relationships to be studied.
2) Collecting the data needed to quantify the models.
3) Quantifying the models with the data.
Economic, Finance and
Econometric Models
A model is a simplified representation of a real-world process.
Some economists/scientists argue in favour of simplicity because simple models are easier to understand, communicate and test empirically with data.
Economic, Finance and
Econometric Models
Using simple models to explain complex realities invites two criticisms:
1) The model is oversimplified.
2) The assumptions may be unrealistic.
An economic model is a set of assumptions that approximately describes the behaviour of an economy or a sector of an economy.
The Model Should Consist Of…
1) A set of behavioural equations derived
from the economic model. These equations
involve some observed variables and some
disturbances.
2) A statement of whether there are errors
of observation in the observed variables.
3) A specification of the probability
distribution of the disturbances.
Example: Demand Equation
1) Behavioural equation is the following
demand equation:

Qt = α + βPt + εt (3)
where Qt (quantity demanded at time t) and Pt (the price at time t) are observed variables and εt is a disturbance term.
Example: Demand Equation
2) A specification of the probability distribution of εt, which says that E(εt | Pt) = 0 and that the values of εt for the different observations are independently and normally distributed with mean 0 and variance σ².
3) Test the law of demand, i.e. β &lt; 0.


Aims of Econometrics
1) Formulation of econometric models, that is, formulation of economic and finance models in an empirically testable form.
2) Estimation and testing of these models with observed data.
3) Use of these models for prediction and policy purposes.
Data Types
Experimental data comes from experiments designed to evaluate a treatment or policy.
Observational data is obtained by observing actual behaviour outside an experimental setting.
Data is usually collected using surveys.
Data Types
Whether the data are experimental or observational, data sets come in three main types:
1) cross-sectional data;
2) time-series data; and
3) panel data.
Cross-Sectional Data
Data on different entities for a single time period are called cross-sectional data.
With cross-sectional data, we can learn about relationships among variables by studying differences across people, firms or other economic entities during a single time period.
Time-Series Data
Time-series data are data for a single entity collected at multiple time periods and ordered by time.
By tracking a single entity over time, time-series data can be used to study the evolution of variables over time and to forecast future values of those variables.
Time-Series Data
Macroeconomic data measures phenomena such as real GDP, interest rates and the money supply, and is collected at specific points in time.
Financial data measures phenomena such as changes in stock prices, and is collected more frequently.
Panel Data
Some data sets will have both a time-series
and a cross-sectional component.
Panel data (longitudinal data) are data for
multiple entities in which each entity is
observed at two or more time periods.
Panel data can be used to learn about
economic relationships from the
experiences of many different entities in the
data set and from the evolution over time of
the variables of each entity.
Summary
Cross-sectional data consist of multiple entities observed at a single time period.
Time-series data consist of a single entity observed at multiple time periods.
Panel data consist of multiple entities where each entity is observed at two or more time periods.
Qualitative V Quantitative
Quantitative data will have a number corresponding to each entity surveyed.
Qualitative data arise when choices are involved and answers are of a 'yes' or 'no' nature.
Economists will convert qualitative data into numeric values.
Data Transformation
Raw data obtained with a numeric value for the period are 'level' data, e.g. wages = 15 per hour.
For much econometric work, the level of the variable is of primary concern; however, sometimes it is necessary to transform the level data into growth rates.
Levels Versus Growth Rates
Percentage Change = ((Yt - Yt-1) / Yt-1) × 100
where:
Yt is the current level of the variable;
Yt-1 is the previous level of the variable.
Calculate the growth rate of GDP in economy Y given that:
Real GDP 1988 = 110b and
Real GDP 1989 = 120b.
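Applying the formula: ((120 - 110) / 110) × 100 = (10 / 110) × 100 ≈ 9.09%, so real GDP grew by roughly 9.1% between 1988 and 1989.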
Working With Data
Graphical Methods;
Time-series graphs.
Histograms.
Scatter plots.
Descriptive statistics;
Mean.
Standard deviation.
Variance.
Covariance.
Correlation coefficient.
Temperature
[Figure: time-series graph of temperature (degrees Celsius), 6/20/16 to 6/27/16.]
FTSE Index
[Figure: time-series graph of the FTSE Index, 6/20/16 to 6/27/16.]
Find the Average and SD
Date            Temperature   FTSE Index
June 20th 2016  21            6,204.00
June 21st 2016  19            6,226.55
June 22nd 2016  21            6,216.19
June 23rd 2016  20            6,338.10
June 24th 2016  19            6,318.69
June 27th 2016  20            5,982.20
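A short sketch of how these descriptive statistics could be computed; numpy is assumed, and the figures are the ones in the table above:

```python
import numpy as np

temperature = np.array([21, 19, 21, 20, 19, 20], dtype=float)
ftse = np.array([6204.00, 6226.55, 6216.19, 6338.10, 6318.69, 5982.20])

for name, series in [("Temperature", temperature), ("FTSE", ftse)]:
    # ddof=1 gives the sample standard deviation (divides by n - 1)
    print(name, "mean:", series.mean(), "sd:", series.std(ddof=1))

# Covariance and correlation between the two series
print("covariance:", np.cov(temperature, ftse)[0, 1])
print("correlation:", np.corrcoef(temperature, ftse)[0, 1])
```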
FTSE Today
What has happened since the Brexit referendum?
http://markets.ft.com/data/indices/tearsheet/charts?s=FTSE:FSI
http://www.cambridge.org/gb/academic/subjects/sociology/organisational-sociology/emotions-finance-booms-busts-and-uncertainty-2nd-edition?format=PB&isbn=978110763337
Estimation of a Population Mean
Suppose a researcher wants to know the mean value of Y (μY) in a population, such as the mean earnings of women recently graduated from college.
The natural way to estimate this mean is to compute the sample average (Ȳ) from a sample of n independently and identically distributed (i.i.d.) observations, Y1, …, Yn.
Y1, …, Yn are i.i.d. if they are collected by simple random sampling.
Estimation of a Population Mean
We look at the estimation of μY and the properties of Ȳ as an estimator of μY.
An estimator is a function of a sample of data to be drawn randomly from a population. An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample.
Estimation of a Population Mean
An estimator is a random variable because of randomness in selecting the sample, while an estimate is a non-random number.
There are many possible estimators of μY other than Ȳ (e.g. Y1, the first observation), so what are the desirable characteristics of the sampling distribution of an estimator?
Unbiasedness
If you evaluate an estimator many times over repeated randomly drawn samples, it is reasonable to hope that on average you would get the right answer.
A desirable property of an estimator is that the mean of its sampling distribution equals μY; if so, the estimator is said to be unbiased.
Unbiasedness: Mathematically
Let μ̂Y be some estimator of μY (e.g. Ȳ).
The estimator μ̂Y is unbiased if:
E(μ̂Y) = μY,
where E(μ̂Y) is the mean of the sampling distribution of μ̂Y; otherwise μ̂Y is biased.
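A small Monte Carlo sketch of this idea: draw many samples, average each, and check that the sample means centre on μY. The normal distribution, sample size and replication count are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_Y, sigma = 5.0, 2.0          # true population mean and sd (assumed)
n, reps = 30, 10_000            # sample size and number of repeated samples

# Each row is one sample of size n; take the sample average of each
sample_means = rng.normal(mu_Y, sigma, size=(reps, n)).mean(axis=1)

# The mean of the sampling distribution is approximately mu_Y,
# which is what unbiasedness of the sample average requires
print(sample_means.mean())      # close to 5.0
```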
Consistency
A desirable property of an estimator μ̂Y is that, when the sample size is large, the uncertainty about the value of μY arising from random variation in the sample is very small.
It is a desirable property of μ̂Y that the probability that it lies within a small interval of the true value μY approaches 1 as the sample size increases; if so, μ̂Y is consistent.
Variance and Efficiency
Suppose you have two candidate estimators, Ȳ and Y1, both of which are unbiased. How do we choose between them?
One way is to choose the estimator with the tightest sampling distribution, i.e. pick the estimator with the smallest variance.
If Var(Ȳ) &lt; Var(Y1), then Ȳ is said to be more efficient than Y1.
Summary
Let μ̂Y be an estimator of μY. Then:
the bias of μ̂Y is E(μ̂Y) - μY;
μ̂Y is an unbiased estimator of μY if E(μ̂Y) = μY;
μ̂Y is a consistent estimator of μY if the probability that it lies within a small interval of μY tends to 1 as the sample size grows; and
μ̂Y is said to be more efficient than another unbiased estimator μ̃Y if Var(μ̂Y) &lt; Var(μ̃Y).
Properties of Ȳ
How does Ȳ fare as an estimator of μY when judged by the criteria of bias, consistency and efficiency?
The sampling distribution of Ȳ shows that E(Ȳ) = μY.
Therefore Ȳ is an unbiased estimator of μY.
Consistency: the law of large numbers states that Ȳ converges in probability to μY as the sample size grows.
Variance: for i.i.d. sampling, Var(Ȳ) = σ²Y / n, which shrinks as n grows.
Hypothesis Testing
Hypothesis Tests and Confidence Intervals
Hypothesis Testing
Hypothesis testing determines what we can learn about the real world from a sample.
It is almost impossible to prove that a given hypothesis is correct.
All that can be done is to state that a particular sample conforms to a particular hypothesis.
Hypothesis Testing
While we cannot prove that a given hypothesis is correct using hypothesis testing, we can often reject a given hypothesis with a certain degree of confidence.
In such a case, the researcher concludes that it is very unlikely the sample result would have been observed if the hypothesised theory were correct.
Important Issues in Hypothesis
Testing
1) The specification of the hypothesis to be tested.
2) The decision rule to use in deciding whether to reject the hypothesis in question.
3) The kinds of errors that might be encountered.
Hypothesis Specification
To ensure fairness, the researcher should
specify the hypothesis before the equation is
estimated.
The purpose of prior theoretical work is to
match the hypothesis to the underlying
theory as completely as possible.
In making a hypothesis you must state
carefully what you think is not true and
what you think is true.
Null and Alternative Hypothesis
The null hypothesis (H0) is typically a statement of the range of values of the regression coefficient that would be expected to occur if the researcher's theory were not correct.
The alternative hypothesis (HA) is used to specify the range of values of the coefficient that would be expected to occur if the researcher's theory were correct.
Examples
If you anticipate a negative coefficient β, then the correct null hypothesis is:
H0: β ≥ 0 (the values you do not expect);
HA: β &lt; 0 (the values you expect to be true).
To test the null that β is not significantly different from zero in either direction:
H0: β = 0;
HA: β ≠ 0.
One and Two Sided Tests
In the second example, the alternative hypothesis has values on both sides of the null hypothesis: this approach is called a two-sided (two-tailed) test.
The first example was a one-sided (one-tailed) test, i.e. the alternative hypothesis lies on only one side of the null.
Decision Rules of Hypothesis
Testing
A sample statistic must be calculated that allows the null hypothesis to be 'accepted' or rejected, depending on the magnitude of that sample statistic compared with a pre-selected critical value found in a set of tables.
This procedure is referred to as the decision rule.
Decision Rule
The decision rule is formulated before regression estimates are obtained.
The range of possible values of the estimator (β̂) is divided into two regions, an 'acceptance' region and a rejection region, where the terms are expressed relative to the null hypothesis.
The critical value is a value that divides the 'acceptance' region from the rejection region when testing a null hypothesis.
T-Test
The t-test is the test that econometricians usually use to test hypotheses about individual regression slope coefficients.
Tests of more than one coefficient at a time are typically done with the F-test.
The t-test accounts for differences in the units of measurement of the variables and in the standard deviations of the estimated coefficients.
T-Statistic
Consider the regression: Yi = β0 + β1X1i + β2X2i + εi.
We can calculate t-values for each of the estimated coefficients in the equation:
tk = (β̂k - βH0) / S.E.(β̂k),  k = 1, 2, …, K
where:
β̂k = estimated regression coefficient of the kth variable;
βH0 = border value implied by the null hypothesis for βk;
S.E.(β̂k) = estimated standard error of β̂k.
Critical T-Value
The critical t-value (tc) is the value that distinguishes the 'acceptance' region from the rejection region.
Critical t-values are selected from a set of tables depending on whether the test is one-sided or two-sided.
Reject H0 if |tk| &gt; tc and tk has the sign implied by HA.
Degrees of freedom = (n - k - 1), where:
n = sample size;
k = number of independent variables.
P-Values
The P-value, or probability value, denotes the marginal significance level for which the null hypothesis would still be rejected. It is the probability, under the null, of finding a test statistic that exceeds the value of the statistic computed from the sample.
If the P-Value is smaller than the significance
level, the null hypothesis is rejected.
If the P-Value < 0.05, the null hypothesis is
rejected at the 5% level, i.e. the result is
significant.
Likewise for P-Value < 0.1 and the 10% level.
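As a sketch, the t-statistic and its p-value can be computed directly; the coefficient and standard error below are illustrative values chosen to match the demand-income example later in these notes (t ≈ 2.37 with 29 degrees of freedom):

```python
from scipy import stats

beta_hat = 1.288    # estimated coefficient (illustrative)
se_beta = 0.543     # estimated standard error (illustrative)
h0_value = 0.0      # border value under the null hypothesis
df = 29             # degrees of freedom: n - k - 1 = 31 - 1 - 1

t_stat = (beta_hat - h0_value) / se_beta

# Two-sided p-value: probability under H0 of a |t| at least this large
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(t_stat, p_value)    # t is about 2.37; p is below 0.05

# Decision rule: reject H0 at the 5% level since p_value < 0.05
```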
Example
1) Set up a null and alternative hypothesis.
2) Choose a level of significance and therefore a critical t-value.
3) Run the regression and obtain an estimated t-value.
4) Apply the decision rule.
Example
Y = f(X1, X2, X3) + ε
Y = sales of cars;
X1 = real disposable income;
X2 = average car price;
X3 = number of sports utility vehicles sold.
Let n (sample size) = 10.
We expect β1 &gt; 0, and β2 and β3 &lt; 0.
Example
Null hypotheses:
H0: β1 ≤ 0; HA: β1 &gt; 0.
H0: β2 ≥ 0; HA: β2 &lt; 0.
H0: β3 ≥ 0; HA: β3 &lt; 0.
5% significance level: df = (10 - 3 - 1) = 6.
Critical t-value = 1.94.
Apply Decision Rule
Reject H0 if |2.1| &gt; 1.94 and if 4.91 is positive: both conditions hold, so reject the null.
Reject H0 if |5.6| &gt; 1.94 and if 0.00123 is negative: 0.00123 is positive, so fail to reject the null.
Reject H0 if |-0.1| &gt; 1.94 and if -7.14 is negative: |-0.1| &lt; 1.94, so fail to reject the null.
Two-Sided Test
Most hypotheses in regression analysis can be tested with one-sided t-tests; however, two-sided tests are required for:
1) A two-sided test of whether an estimated coefficient is significantly different from zero.
2) A two-sided test of whether an estimated coefficient is significantly different from a non-zero value.
Example: Demand (Yt) and
Income
Yt = α + βIncomet + εt, where n = 31.
We are not sure whether β will be positive or negative. Why?
1) Set up the null hypothesis:
H0: β = 0;
HA: β ≠ 0.
2) Choose a level of significance and therefore a critical t-value.
Use the 5% significance level, df = 31 - 1 - 1 = 29.
Example: Demand (Yt) and
Income
3) Run the regression and obtain estimated t-values.
4) Apply the decision rule by comparing the calculated t-value with the critical t-value.
Reject H0 if |2.37| &gt; 2.05 and if 1.288 ≠ 0.
We reject the null hypothesis. What can we infer?
Type I and Type II Errors
There are two kinds of errors we can make in hypothesis testing.
A Type I error is made if a true null hypothesis is rejected.
A Type II error is made if we do not reject a false null hypothesis.
Choosing a Level of Significance
The level of significance indicates the probability of observing an estimated t-value greater than the critical t-value if the null hypothesis were correct.
It also measures the probability of a Type I error implied by a particular critical t-value.
Is it better to have a low level of significance?
Choosing a Level of Significance
If you lower the level of significance, you increase the probability of a Type II error.
It is recommended to use the 5% significance level.
Confidence Intervals
A confidence interval is a range within which the
true value of an item is likely to fall a specified
percentage of the time.
This percentage is the level of confidence
associated with the level of significance used to
choose the critical t-value in the interval.
Used mainly for forecasting a range of values into
which the forecasted item is likely to fall some
percentage of the time.
Confidence Intervals
For an estimated regression coefficient, the confidence interval can be calculated using the two-sided critical t-value and the standard error of the estimated coefficient:
Confidence interval = β̂ ± (tc × S.E.(β̂)),
where β̂ is the estimator, tc is the critical value and S.E.(β̂) is the standard error of the estimator.
Example
Yt = α + β0Nt + β1Pt + β2It + εt
where:
Yt = demand in time frame t;
Nt = the level of competition in time frame t;
Pt = price in time frame t;
It = average level of income in time frame t.
Example
Set up a 90% confidence interval for β1:
1) Find the critical t-value.
2) Apply the formula.
We expect that 90% of the time the true coefficient will fall between 0.2311 and 0.4783.
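A sketch of this calculation in code; the point estimate and standard error are back-solved from the interval quoted above, so treat them as illustrative:

```python
from scipy import stats

beta_hat = 0.3547   # midpoint of the quoted interval (illustrative)
se_beta = 0.0723    # implied standard error (illustrative)
df = 26             # illustrative degrees of freedom

# A two-sided 90% interval leaves 5% in each tail
t_c = stats.t.ppf(0.95, df)   # about 1.71

lower = beta_hat - t_c * se_beta
upper = beta_hat + t_c * se_beta
print(round(lower, 4), round(upper, 4))   # roughly 0.2311 and 0.4783
```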
Confidence Intervals and
Hypothesis Testing Example
1) All 5 tests are one-sided, so they have the same critical value.
At the 5% significance level, with (32 - 5 - 1) = 26 degrees of freedom, the critical value is 1.71.
Test GDPN:
H0: β ≤ 0;
HA: β &gt; 0.
Reject H0 as |6.81| &gt; 1.71 and 1.43 is positive, as in HA.
Confidence Intervals and
Hypothesis Testing Example
2) The confidence interval equation is β̂ ± (tc × S.E.(β̂)).
The 10% two-sided critical value is the same as the 5% one-sided critical value, so β̂ ± 1.71 × S.E.(β̂) will be the interval.
3) Yes, the important signs were as expected and statistically significant.
Univariate Modelling Issues
Forecasting
Stationarity
A time series is covariance stationary when:
1) it exhibits mean reversion;
2) it has a finite, time-invariant variance; and
3) it has constant variances and covariances.
If a series is non-stationary, the results of classical regression analysis are invalid.
Regressions with non-stationary series may have no meaning ('spurious regressions').
Time Series Modelling and
Forecasting
Univariate time series models (forecasting)
are specified so as to model financial
variables using only information contained
in their own past values and past error
terms.
Dynamic (structural) models, by contrast, are multivariate and try to explain changes in a variable based on past values of other explanatory variables.
ARIMA Models
An important class of time-series models is the ARIMA (Autoregressive Integrated Moving Average) class.
The simplest time-series model is the autoregressive model of order one, the AR(1) model:
Yt = φYt-1 + ut (1)
The assumption is that the time-series behaviour of Yt is determined by its previous value.
AR(1) Model
Yt = φYt-1 + ut (1)
For the AR(1) model, the constraint imposed is that:
|φ| &lt; 1 (2)
If (2) holds, then stationarity applies.
If |φ| &gt; 1, then Yt gets larger in each period (an explosive series).
AR(1) v Exploding AR(1)
[Figure: two simulated series, a stationary AR(1) process (XT) and an exploding AR(1) process (YT).]
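A sketch of how the two panels above could be generated; the coefficients 0.5 and 1.05 are illustrative choices for the stationary and explosive cases:

```python
import numpy as np

def simulate_ar1(phi, T, rng):
    """Simulate Yt = phi * Yt-1 + ut with Y0 = 0."""
    y = np.zeros(T)
    u = rng.normal(0, 1, T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + u[t]
    return y

rng = np.random.default_rng(1)
stationary = simulate_ar1(0.5, 200, rng)   # |phi| < 1: mean-reverting
explosive = simulate_ar1(1.05, 500, rng)   # |phi| > 1: grows without bound
print(stationary[-3:], explosive[-3:])
```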
AR (p) Model
A generalization of the AR(1) model is the
AR(p) model; the number in parenthesis
denotes the order of the autoregressive
process and therefore the number of lagged
dependent variables that the model will
have.
For example, the AR(2) model will be an
autoregressive model of order two, and will
have the form:
Yt = φ1Yt-1 + φ2Yt-2 + ut (3)
AR (p) Model
Similarly the AR(3) model will be an
autoregressive model of order three, and
will have the form:
Yt = φ1Yt-1 + φ2Yt-2 + φ3Yt-3 + ut (4)
And in general the AR(p) model will be an
autoregressive model of order p, and will
have p lagged terms as in the following:
Yt = φ1Yt-1 + φ2Yt-2 + … + φpYt-p + ut (5)
Moving Average Models
The simplest moving average model is that
of order one, or the MA(1) model, which
has the form:
Yt = ut + θut-1 (6)
Thus, the implication behind the MA(1)
model is that Yt depends on the value of the
immediate past error, which is known at
time t.
MA(q) Model
The general form of the MA model has the
form:
Yt = ut + θ1ut-1 + θ2ut-2 + … + θqut-q (7)
ARMA Models
The general form of the ARMA model is an
ARMA(p, q) model of the form:

Yt = φ1Yt-1 + φ2Yt-2 + … + φpYt-p + ut + θ1ut-1 + θ2ut-2 + … + θqut-q (8)
or
Yt = Σi φiYt-i + ut + Σj θjut-j (9)
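In practice an ARMA(p, q) model can be estimated with statsmodels, where ARMA is the special case of ARIMA with d = 0; the simulated data and the (1, 1) order here are illustrative:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Illustrative data: a simulated AR(1) series
rng = np.random.default_rng(2)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.6 * y[t - 1] + rng.normal()

# ARMA(1, 1) = ARIMA(1, 0, 1): no differencing
results = ARIMA(y, order=(1, 0, 1)).fit()
print(results.params)   # estimated AR and MA coefficients
print(results.aic)      # information criterion used for model comparison
```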
Integrated Process ARIMA
Models
ARMA models can only be estimated for time series Yt that are stationary.
Most economic and financial time series show
trends over time, and so the mean of Yt during one
year will be different from its mean in another
year.
Thus, the mean of most economic and financial
time series is not constant over time, which means
that the series are non-stationary.
In order to avoid this problem, and in order to
induce stationarity, we need to detrend the raw
data through a process called differencing.
Integrated Process ARIMA
Models
The first differences of a series Yt are given by the equation:
ΔYt = Yt - Yt-1 (10)
If, after first differencing, a series is stationary, then the series is called integrated of order one, denoted I(1).
If the series is not stationary even after first differencing, then we need to take second differences:
Δ²Yt = ΔYt - ΔYt-1 (11)
Integrated Process ARIMA
Models
If the series becomes stationary after second differencing, then it is integrated of order two, denoted I(2).
In general, if it is stationary after d differences, then it is called I(d).
Thus, we have the ARIMA(p, d, q) model.
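A sketch of differencing in code; a random walk serves as an illustrative I(1) series, and in practice the decision of whether one difference suffices would come from the ACF or a formal test:

```python
import numpy as np

rng = np.random.default_rng(3)
# A random walk is I(1): non-stationary in levels
y = np.cumsum(rng.normal(size=300))

dy = np.diff(y)         # first differences: Yt - Yt-1
d2y = np.diff(y, n=2)   # second differences, if still non-stationary

# The first difference of a random walk is white noise (stationary)
print(dy.mean(), dy.std())
```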


Box-Jenkins Approach
In general Box and Jenkins popularized a
three-stage method aimed at selecting an
appropriate ARIMA model for the purpose
of estimating and forecasting a univariate
time series.
The three stages are:
(a) identification;
(b) estimation; and
(c) diagnostic checking.
Box-Jenkins Approach
Step 1: Calculate the ACF and PACF of the raw data, and check whether the series is stationary or not. If the series is stationary, go to step 3; if not, go to step 2.
Step 2: Take the logarithm and the first differences of the raw data and calculate the ACF and PACF for the first logarithmic differenced series.
Box-Jenkins Approach
Step 3: Examine the graphs of the ACF and PACF and determine which models would be good starting points.
Step 4: Estimate those models.
Step 5: For each of these estimated models:
(a) check to see if the parameter of the longest lag is significant. If not, then you probably have too many parameters and should decrease the order of p and/or q;
Box-Jenkins Approach
Step 5 (continued): For each of these estimated models:
(b) check the ACF and PACF of the errors. If the model has at least enough parameters, then all error ACFs and PACFs will be insignificant;
(c) check the AIC and SBC together with the adjusted R² of the estimated models to detect which model is the parsimonious one (i.e. the one that minimizes AIC and SBC and has the highest adjusted R²).
Step 6: If changes in the original model are needed, go back to step 4.
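A condensed sketch of this workflow; statsmodels is assumed, and the simulated series and the candidate orders searched are arbitrary illustrations:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf, pacf

# Illustrative stationary series: simulated AR(1)
rng = np.random.default_rng(4)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# Identification: inspect the ACF and PACF (steps 1 and 3)
print(acf(y, nlags=10))
print(pacf(y, nlags=10))

# Estimation: fit candidate orders and compare AIC (steps 4-5)
for p, q in [(1, 0), (0, 1), (1, 1), (2, 1)]:
    fit = ARIMA(y, order=(p, 0, q)).fit()
    print((p, q), round(fit.aic, 1))
# Diagnostic checking would then examine the residual ACF/PACF
# of the chosen (lowest-AIC) model
```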
