
Econometrics [EM2008]

Lecture 1
Catching up: two-variable relationships

Irene Mammi

irene.mammi@unive.it

Academic Year 2018/2019

1 / 67
outline

I relationships between two variables


I correlations
I probability distributions
I the two-variable linear regression model
I inference in the two-variable linear regression model
I prediction
I further issues

I References:
I Johnston, J. and J. DiNardo (1997), Econometric Methods, 4th
Edition, McGraw-Hill, New York, Chapters 1 and 2.

2 / 67
examples of bivariate relationships

Figure 1: saving and income (1)

3 / 67
examples of bivariate relationships (cont.)

Figure 2: saving and income (2)

4 / 67
examples of bivariate relationships (cont.)

Figure 3: natural log of gasoline consumption vs natural log of price per gallon

5 / 67
examples of bivariate relationships (cont.)

Figure 4: natural log of gasoline consumption vs natural log of income per capita

6 / 67
examples of bivariate relationships (cont.)

three main characteristics of the scatter diagrams


I sign of the association: do variables move together in a positive or
negative fashion?
I strength of the association: how closely do the variables move together?
I linearity of the association: is the general shape linear?

7 / 67
examples of bivariate relationships (cont.)
I data come in the form of n pairs of observations of the form (Xi , Yi ),
i = 1, 2, . . . , n
I when n gets large, we can consider a bivariate frequency distribution

Table 1: distribution of heights and chest circumferences of 5732 Scottish men

chest circumference (inches)

                     33-35   36-38   39-41   42-44    45+   row totals
height     64-65        39     331     326      26      0          722
(inches)   66-67        40     591    1010     170      4         1815
           68-69        19     312    1144     488     18         1981
           70-71         5     100     479     290     23          897
           72-73         0      17     120     153     27          317
column totals          103    1351    3079    1127     72         5732

Table 2: conditional means for the data in table 1

chest class (inches)                    33-35   36-38   39-41   42-44    45+
mean of height given chest (inches)     66.31   66.84   67.89   69.16  70.53

height class (inches)                   64-65   66-67   68-69   70-71  72-73
mean of chest given height (inches)     38.41   39.19   40.26   40.76  41.80

8 / 67
correlation coefficient

I the correlation coefficient measures the direction and the closeness


of the linear association between two variables
I denote the observations by (Xi , Yi ) with i = 1, . . . , n
I express the data in deviation form from sample means as:

xi = Xi − X yi = Yi − Y

I consider the product xi yi

9 / 67
correlation coefficient (cont.)

Figure 5: coordinates for scatter diagram for paired variables

10 / 67
correlation coefficient (cont.)
I the sign of ∑ni=1 xi yi indicates whether the scatter slopes upward or
downward
I better to express the sum in average terms, giving the sample
covariance:
Cov(X, Y) = ∑i (Xi − X)(Yi − Y)/n = ∑i xi yi /n

I to obtain a measure of association that is invariant with respect to


units of measurement, consider the correlation coefficient r :

r = Cov(X, Y)/(√Var(X) √Var(Y)) = ∑i xi yi /(n sX sY ) = ∑i xi yi /(√(∑i xi²) √(∑i yi²))

where sX and sY are the standard deviations of X and Y


I the correlation coefficient must lie in the range from -1 to +1
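As a quick illustration (not part of the lecture notes; the data values below are made up), a short Python/NumPy sketch computes r from the deviation form and checks that it lies between -1 and +1:

```python
import numpy as np

X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical income data
Y = np.array([0.5, 1.0, 1.3, 2.1, 2.4])    # hypothetical saving data

x = X - X.mean()                           # deviations from sample means
y = Y - Y.mean()
r = (x * y).sum() / np.sqrt((x**2).sum() * (y**2).sum())

print(r)                                   # matches np.corrcoef(X, Y)[0, 1]
assert -1.0 <= r <= 1.0
```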

11 / 67
probability models for two variables
discrete bivariate probability distribution

I consider a discrete bivariate probability distribution as in the table


I each cell entry indicates the probability of the joint occurrence of the
associated X , Y values

Table 3: a bivariate probability distribution

              X1    ···   Xi    ···   Xm    marginal probability
Y1            p11   ···   pi1   ···   pm1   p·1
...           ...         ...         ...   ...
Yj            p1j   ···   pij   ···   pmj   p·j
...           ...         ...         ...   ...
Yp            p1p   ···   pip   ···   pmp   p·p
marginal
probability   p1·   ···   pi·   ···   pm·   1

12 / 67
probability models for two variables (cont.)

I define the probability that X = Xi and Y = Yj as

pij = P(X = Xi , Y = Yj )

I the means for the bivariate distribution are defined by

µX = E(X) = ∑i pi· Xi and µY = E(Y) = ∑j p·j Yj

I the variances are defined as

σX² = var(X) = E[(X − µX )²] = ∑i pi· (Xi − µX )²

σY² = var(Y) = E[(Y − µY )²] = ∑j p·j (Yj − µY )²

13 / 67
probability models for two variables (cont.)

I the covariance is

σXY = Cov(X, Y) = E[(X − µX )(Y − µY )] = ∑i ∑j pij (Xi − µX )(Yj − µY )

I the population correlation coefficient is defined as

corr = ρ = σXY /(σX σY )

14 / 67
probability models for two variables (cont.)
conditional probabilities

I the conditional probability for Y given X is given by

pij /pi· = probability that Y = Yj given that X = Xi = P(Yj |Xi )

I the mean of this distribution is the conditional expectation of Y given X:

µY|Xi = E(Y|Xi ) = ∑j Yj (pij /pi· )

I the variance of this distribution is a conditional variance

σ²Y|Xi = Var(Y|Xi ) = ∑j (pij /pi· )(Yj − µY|Xi )²
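A small sketch of these formulas, using the slide's pij notation on a made-up 3×2 probability table (all numerical values are hypothetical):

```python
import numpy as np

Xvals = np.array([1.0, 2.0, 3.0])          # hypothetical support of X
Yvals = np.array([0.0, 1.0])               # hypothetical support of Y
p = np.array([[0.10, 0.20],                # rows: X_i, columns: Y_j; entries sum to 1
              [0.15, 0.25],
              [0.20, 0.10]])

p_i = p.sum(axis=1)                        # marginal P(X = X_i)
p_j = p.sum(axis=0)                        # marginal P(Y = Y_j)

i = 1                                      # condition on X = Xvals[1]
cond = p[i, :] / p_i[i]                    # conditional distribution of Y given X_i
mu_Y_given_X = (cond * Yvals).sum()        # conditional expectation E(Y | X_i)
var_Y_given_X = (cond * (Yvals - mu_Y_given_X) ** 2).sum()   # conditional variance
print(mu_Y_given_X, var_Y_given_X)
```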

15 / 67
probability models for two variables (cont.)
the bivariate normal distribution
I most famous distribution for continuous variables is the bivariate
normal
I when X and Y follow a bivariate normal distribution, the probability
density function (pdf) is given by
f(x, y) = [1/(2π σX σY √(1 − ρ²))] ×
exp{−[1/(2(1 − ρ²))] [((x − µX )/σX )² − 2ρ ((x − µX )/σX )((y − µY )/σY ) + ((y − µY )/σY )²]}

where x and y stand for the values taken by X and Y and ρ is the
correlation coefficient between X and Y
I integrating over y gives the marginal distribution for X

f(x) = [1/√(2πσX²)] exp[−(1/2)((x − µX )/σX )²],   −∞ < x < ∞

which is normal with mean µX and standard deviation σX


16 / 67
probability models for two variables (cont.)
I likewise, the marginal distribution of Y is normal with mean µY and
standard deviation σY
I the conditional distribution of Y given X is

f(y|x) = f(x, y)/f(x) = [1/(√(2π) σY|X )] exp[−(1/2)((y − µY|X )/σY|X )²]

I the conditional distribution is also seen to be normal: the conditional
mean is
µY|X = α + βx
where α = µY − βµX and β = ρ σY /σX
I the conditional mean is a linear function of the X variable
I the conditional variance is invariant with X and is given by

σY2 |X = σY2 (1 − ρ2 )

I the condition of constant variance is referred to as homoskedasticity
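A simulation sketch of these results (the parameter values are illustrative, not from the lecture): drawing from a bivariate normal, the regression slope of Y on X should be close to ρσY /σX and the conditional variance close to σY²(1 − ρ²):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_X, mu_Y, s_X, s_Y, rho = 1.0, 2.0, 1.5, 0.8, 0.6    # hypothetical parameters
cov = [[s_X**2,          rho * s_X * s_Y],
       [rho * s_X * s_Y, s_Y**2         ]]
X, Y = rng.multivariate_normal([mu_X, mu_Y], cov, size=200_000).T

S = np.cov(X, Y)                              # sample covariance matrix
print(S[0, 1] / S[0, 0], rho * s_Y / s_X)     # regression slope, approx 0.32
print(Y[np.abs(X - mu_X) < 0.05].var(),       # variance of Y for X near mu_X
      s_Y**2 * (1 - rho**2))                  # theoretical sigma_Y^2 (1 - rho^2)
```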


17 / 67
the two-variable linear regression model

I in many bivariate situations, the variables are treated in a symmetrical


way
I economists often have explicit notions, derived from theoretical
models, of causality running from X , say, to Y
I generally f (X , Y ) = f (X ) · f (Y |X ) will be of greater interest than
f (X , Y ) = f (Y ) · f (X |Y )

18 / 67
the two-variable linear regression model (cont.)
a conditional model

I assume Y = vacation expenditure and X = income; draw a sample of


n out of the N households in the population
I there is some bivariate distribution of Y and X for all N households
I economic theory would suggest

E(Y |X ) = g (X )

where g (X ) is expected to be an increasing function of X


I if the conditional expectation is linear in X then

E(Y |X ) = α + βX

I for the i th household this expectation gives

E(Y |Xi ) = α + βXi

19 / 67
the two-variable linear regression model (cont.)

I define Yi to be the actual vacation expenditure of the i th household


and the disturbance (or error) ui as

ui = Yi − E(Y |Xi ) = Yi − α − βXi

I taking expectations of both sides gives E(ui |Xi ) = 0


I the variance of ui is the variance of the conditional distribution, σY2 |X
I for now assume homoskedasticity
I also assume that disturbances are distributed independently of one
another so that they are pairwise uncorrelated

20 / 67
the two-variable linear regression model (cont.)

I collecting these assumptions together gives

E(ui ) = 0 for all i
var(ui ) = E(ui²) = σ² for all i
cov(ui , uj ) = E(ui uj ) = 0 for all i ≠ j

I these assumptions are summarized as

the ui are i.i.d.(0, σ2 )

21 / 67
the two-variable linear regression model (cont.)

estimates and estimators

I consider the simplest version of the two-variable model

Yi = α + βXi + ui with ui i.i.d.(0, σ2 )

I need to estimate three parameters: α, β and σ2


I an estimator is a formula, method, or recipe for estimating an
unknown population parameter: it is a random variable
I an estimate is the numerical value obtained when the formula is
applied to sample data
I we want to fit a straight line to sample data: Ŷi = a + bXi
I many estimators of the pair a, b may be devised

22 / 67
the two-variable linear regression model (cont.)
least-squares estimators

I the dominant estimating principle is that of least squares


I denote the residuals from any fitted straight line by

ei = Yi − Ŷi = Yi − a − bXi i = 1, 2, . . . , n

I each pair of a, b values defines a different line and hence a different


set of residuals
I the residual sum of squares is a function of a and b
I the least-squares principle is

select a, b to minimize the residual sum of squares


RSS = ∑ ei2 = f (a, b )

23 / 67
the two-variable linear regression model (cont.)

Figure 6: residuals from a fitted straight line

24 / 67
the two-variable linear regression model (cont.)
I taking derivatives of RSS with respect to a and b and setting them to
zero gives

∂(∑i ei²)/∂a = −2 ∑i (Yi − a − bXi ) = −2 ∑i ei = 0
∂(∑i ei²)/∂b = −2 ∑i Xi (Yi − a − bXi ) = −2 ∑i Xi ei = 0

I the normal equations for the linear regression of Y on X are

∑i Yi = na + b ∑i Xi
∑i Xi Yi = a ∑i Xi + b ∑i Xi2

I the least-squares estimators are

a = Y − bX

b = ∑i (Xi − X)(Yi − Y)/∑i (Xi − X)² = ∑i xi yi /∑i xi² = r sY /sX = cov(Xi , Yi )/var(Xi )
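A minimal sketch of these formulas in Python (the X, Y values are made up), checking the two normal equations on the residuals:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical regressor
Y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])   # hypothetical dependent variable

x = X - X.mean()
y = Y - Y.mean()
b = (x * y).sum() / (x**2).sum()          # slope: sum(xi*yi) / sum(xi^2)
a = Y.mean() - b * X.mean()               # intercept: Ybar - b*Xbar

e = Y - a - b * X                         # residuals from the fitted line
print(a, b)
print(e.sum())                            # ~0: first normal equation
print((X * e).sum())                      # ~0: second normal equation
```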

25 / 67
the two-variable linear regression model (cont.)

the least-squares line has important properties:


I it minimizes the residual sum of squares RSS
I passes through the mean point (X , Y )
I ∑i ei = 0 in the sample
I cov(ei , Xi ) = 0 in the sample

the error variance σ2 cannot be estimated from a sample of u values since


they are unobservable, but an estimate can be based on the residuals ei
I an unbiased estimator of σ2 is

s² = ∑i ei²/(n − 2)

26 / 67
the two-variable linear regression model (cont.)

decomposition of the sum of squares

I the value of Yi can be decomposed as

Yi = Ŷi + ei

I subtract Y from both sides of the previous equation and get

Yi − Y = (Ŷi − Y) + ei

I by squaring and then summing both sides we get

∑(Yi − Y)² = ∑(Ŷi − Y)² + ∑ ei²


that gives a decomposition of the total sample variation (total sum
of squares) into explained and unexplained components

27 / 67
the two-variable linear regression model (cont.)
I ∑(Yi − Y)² = TSS: total sum of squared deviations in Y
I ∑(Ŷi − Y)² = ESS: explained sum of squares from the regression of
Y on X
I ∑ ei² = RSS: residual, or unexplained, sum of squares from the
regression of Y on X
I the previous decomposition can be rewritten as

TSS = ESS + RSS

I the coefficient of determination R 2 measures the proportion of


variation in Y explained by X within the regression model

R² = ESS/TSS = 1 − RSS/TSS

I the closer R² is to 1, the closer the sample values of Yi to the fitted line
I let r be the sample correlation coefficient; then R² = r²
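Continuing the same kind of sketch (made-up data again), the decomposition and the R² = r² identity can be verified numerically:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
x, y = X - X.mean(), Y - Y.mean()
b = (x * y).sum() / (x**2).sum()
a = Y.mean() - b * X.mean()

Yhat = a + b * X
TSS = ((Y - Y.mean())**2).sum()           # total sum of squares
ESS = ((Yhat - Y.mean())**2).sum()        # explained sum of squares
RSS = ((Y - Yhat)**2).sum()               # residual sum of squares

R2 = ESS / TSS
r = (x * y).sum() / np.sqrt((x**2).sum() * (y**2).sum())
print(np.isclose(TSS, ESS + RSS), np.isclose(R2, r**2))   # both True
```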

28 / 67
inference in the two-variable least-squares model

properties of LS estimators

I focus on the sampling distribution of the LS estimators


I the parameters of interest are α, β and σ2 of the conditional
distribution f (Y |X )
I the only source of variation is variation in the stochastic disturbance u
I we assume X to be nonstochastic (fixed regressor case)
I the LS slope estimator can be written as

b = ∑i wi Yi

where wi = (Xi − X)/∑i (Xi − X)², so that the LS slope estimator is linear in the
Y values

29 / 67
inference in the two-variable least-squares model (cont.)
I by substituting Yi = α + βXi + ui and using the stochastic properties
of u we have

b = α (∑i wi ) + β (∑i wi Xi ) + ∑i wi ui
= β + ∑ i w i ui

from which
E(b ) = β
that is, b is an unbiased estimator of β
I the variance is

var(b) = E[(b − β)²] = E[(∑i wi ui )²]

which, using E(ui uj ) = 0 for i ≠ j and ∑i wi² = 1/∑i (Xi − X)², reduces to

var(b) = σ²/∑i (Xi − X)²

30 / 67
inference in the two-variable least-squares model (cont.)

I similarly, it can be shown that

E(a) = α

var(a) = σ² [1/n + X²/∑i (Xi − X)²]

I the covariance of the two estimators is

cov(a, b) = −σ² X/∑i (Xi − X)²

31 / 67
inference in the two-variable least-squares model (cont.)
Gauss-Markov theorem

I the sampling variances of the LS estimators are the smallest that can
be achieved by any linear unbiased estimator
I looking at estimators of β, let

b∗ = ∑i ci Yi

denote any arbitrary linear unbiased estimator of β

I it can be shown that

var(b∗) = var(b) + σ² ∑i (ci − wi )²

I since ∑i (ci − wi )² ≥ 0, var(b∗) ≥ var(b)


I the LS estimator has minimum variance in the class of linear unbiased
estimators and is the Best Linear Unbiased Estimator (BLUE)

32 / 67
inference in the two-variable least-squares model (cont.)
inference procedures
I up to now results only require the assumption that ui are i.i.d.(0, σ2 )
I inference also requires the assumption of normality
I since linear combinations of normal variables are themselves normally
distributed, the sampling distribution of a, b is bivariate normal
I thus
b ∼ N ( β, σ2 / ∑ (Xi − X )2 )
I the standard deviation of the sampling distribution is referred to as
the standard error of b and denoted by se(b )
I the sampling distribution of the intercept term is

a ∼ N(α, σ² [1/n + X²/∑i (Xi − X)²])

I if σ² were known, a 95% confidence interval for β would be

b ± 1.96 σ/√(∑i (Xi − X)²)

33 / 67
inference in the two-variable least-squares model (cont.)
I we would also have

z = (b − β)/(σ/√(∑i (Xi − X)²)) ∼ N(0, 1)

I a test of the hypothesis H0 : β = β0 is carried out by computing

(b − β0 )/(σ/√(∑i (Xi − X)²)) = (b − β0 )/se(b)

I when σ² is unknown, need further results:

∑i ei²/σ² ∼ χ²(n − 2)

∑i ei² is distributed independently of f (a, b)

34 / 67
inference in the two-variable least-squares model (cont.)
I we have

(b − β)/(s/√(∑i (Xi − X)²)) ∼ t(n − 2)

where s² = ∑i ei²/(n − 2)

I a 95% confidence interval for β is

b ± t0.025 s/√(∑i (Xi − X)²)

I H0 : β = β0 would be rejected if

|b − β0 |/(s/√(∑i (Xi − X)²)) > t0.025 (n − 2)
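A sketch of these computations on made-up data; it assumes SciPy is available for the t quantile, and the null value β0 = 1 is just an example:

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # hypothetical data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])
n = len(Y)

x = X - X.mean()
b = (x * (Y - Y.mean())).sum() / (x**2).sum()
a = Y.mean() - b * X.mean()
e = Y - a - b * X

s2 = (e**2).sum() / (n - 2)                  # unbiased estimate of sigma^2
se_b = np.sqrt(s2 / (x**2).sum())            # standard error of b

t_crit = stats.t.ppf(0.975, df=n - 2)
print(b - t_crit * se_b, b + t_crit * se_b)  # 95% confidence interval for beta
print((b - 1.0) / se_b)                      # t statistic for H0: beta = 1
```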

35 / 67
prediction in the two-variable regression model
I point prediction is given by the regression value corresponding to X0

Ŷ0 = a + bX0 = Y + bx0

where x0 = X0 − X
I the true value of Y for the prediction period or observation is

Y0 = α + βX0 + u0

I the average value of Y taken over the n sample observations is

Y = α + βX + u

I subtracting gives
Y0 = Y + βx0 + u0 − u
I the prediction error is defined as

e0 = Y0 − Ŷ0 = −(b − β)x0 + u0 − u


36 / 67
prediction in the two-variable regression model (cont.)
I the expected prediction error is zero so Ŷ0 is a linear unbiased
predictor of Y0
I the variance of e0 is

var(e0 ) = σ² [1 + 1/n + x0²/∑i (Xi − X)²]

I e0 is a linear combination of normally distributed variables (b, u0 , u)
so that it is also normally distributed

e0 /(σ √(1 + 1/n + x0²/∑(Xi − X)²)) ∼ N(0, 1)

I replacing σ² by s² gives

(Y0 − Ŷ0 )/(s √(1 + 1/n + (X0 − X)²/∑(Xi − X)²)) ∼ t(n − 2)

37 / 67
prediction in the two-variable regression model (cont.)
I everything is known except Y0 , so a 95% confidence interval for Y0 is

(a + bX0 ) ± t0.025 s √(1 + 1/n + (X0 − X)²/∑(Xi − X)²)

I there is an element of uncertainty in predicting Y0 due to the random
drawing u0
I focus interest on the prediction of the mean value of Y0

E(Y0 ) = α + βX0

which allows us to eliminate u0 from the prediction error

I a 95% confidence interval for E(Y0 ) is

(a + bX0 ) ± t0.025 s √(1/n + (X0 − X)²/∑(Xi − X)²)

I nb: the width of the confidence interval increases the further X0 is


from the sample mean X
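A sketch of both intervals on the same made-up data (X0 = 10 is an arbitrary out-of-sample point; SciPy is assumed available for the t quantile):

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])
n = len(Y)
x = X - X.mean()
b = (x * (Y - Y.mean())).sum() / (x**2).sum()
a = Y.mean() - b * X.mean()
s = np.sqrt(((Y - a - b * X)**2).sum() / (n - 2))

X0 = 10.0                                   # hypothetical out-of-sample value
Y0_hat = a + b * X0                         # point prediction
t_crit = stats.t.ppf(0.975, df=n - 2)
h = 1 / n + (X0 - X.mean())**2 / (x**2).sum()

half_Y0 = t_crit * s * np.sqrt(1 + h)       # interval for the outcome Y0
half_EY0 = t_crit * s * np.sqrt(h)          # narrower interval for E(Y0)
print(Y0_hat, Y0_hat - half_Y0, Y0_hat + half_Y0)
print(Y0_hat - half_EY0, Y0_hat + half_EY0)
```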
38 / 67
stochastic properties of disturbance ui : summary

I recall the specific assumptions

E(Y |X ) = α + βX
E(ui ) = 0 for all i
E(ui2 ) = σ2 for all i
E(ui uj ) = 0 for all i ≠ j

I from the homoskedasticity and the fixed regressor assumptions it


follows that
E(Xi uj ) = Xi E(uj ) = 0 for all i, j
I adding the assumption of normality gives

the ui are i.i.d. N (0, σ2 )

39 / 67
time as a regressor
I many economic variables increase or decrease with time
I a linear trend relationship would be modeled as

Y = α + βT + u

where T indicates time


I standard LS estimators can be used to estimate β
I taking first difference of the linear trend equation gives

∆Yt = β + (ut − ut −1 )

I ignoring disturbances, the series increases (decreases) by a constant


amount each period
I for an increasing series (β > 0) the growth rate is decreasing, for a
decreasing series (β < 0) the growth rate is increasing
I an appropriate specification for a series with a constant growth rate
expresses the logarithm of the series as a linear function of time
40 / 67
time as a regressor (cont.)
I without disturbances a constant growth series is given by

Yt = Y0 (1 + g)^t

where g = (Yt − Yt −1 )/Yt −1 is the constant proportionate rate of


growth per period
I taking logs gives
lnYt = α + βt
where α = lnY0 and β = ln(1 + g )
I the β coefficient represents the continuous rate of change ∂lnYt /∂t,
whereas g represents the discrete rate
I formulating a constant growth series in continuous time gives

Yt = Y0 e^(βt) or lnYt = α + βt
I taking first differences gives

∆lnYt = β = ln(1 + g) ≈ g

where the approximation is accurate for small values of g
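A small numerical check of this relationship (g = 0.03 and Y0 = 100 are illustrative): regressing lnYt on t for a deterministic constant-growth series recovers β = ln(1 + g) ≈ g:

```python
import numpy as np

g, Y0, T = 0.03, 100.0, 40
t = np.arange(T)
Y = Y0 * (1 + g) ** t                       # deterministic constant-growth series

lnY = np.log(Y)
tc = t - t.mean()
beta = (tc * (lnY - lnY.mean())).sum() / (tc**2).sum()   # LS slope on time
print(beta, np.log(1 + g), g)               # beta = ln(1.03) = 0.0296, close to g
```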


41 / 67
transformations of variables

log-log transformation

I constant elasticity function

Y = AX^β or lnY = α + β lnX
I β represents the elasticity of Y with respect to X, ε = (dY/dX)(X/Y)

semilog transformation

I constant growth equation

lnY = α + βX + u

I β = (1/Y)(dY/dX) represents the proportionate change in Y per unit
change in X

42 / 67
lagged dependent variable as regressor

I when two variables display trends, successive values tend to be close


together
I can model such behavior by means of an autoregression, e. g. an
AR(1)
Yt = α + βYt −1 + ut
I the LS estimators for the AR(1) equation are

∑ Yt = na + b ∑ Yt−1
∑ Yt Yt−1 = a ∑ Yt−1 + b ∑ (Yt−1 )²

I properties of the LS estimators and inference procedures are not


strictly applicable
I the fixed regressor assumption is violated

43 / 67
lagged dependent variable as regressor (cont.)
I by repeated substitution obtain

Y1 = α + βY0 + u1
Y2 = α + β(α + βY0 + u1 ) + u2
   = α(1 + β) + β²Y0 + (u2 + βu1 )

and, hence, the general equation

Yt = α(1 + β + β² + . . . + β^(t−1))
   + β^t Y0 + (ut + βut−1 + β²ut−2 + . . . + β^(t−1)u1 )

I multiply successively by ut , ut−1 , ut−2 , . . . and take expectations to
get

E(Yt ut ) = σ²
E(Yt ut−1 ) = βσ²
E(Yt ut−2 ) = β²σ²
44 / 67
lagged dependent variable as regressor (cont.)

I Yt is correlated with current and all previous disturbances but


uncorrelated with all future disturbances; Yt −1 is uncorrelated with
current disturbance ut and all future disturbances but is correlated
with all previous disturbances
I nb: the zero covariances assumption does not hold when the regressor
is a lagged value of the dependent variable

45 / 67
intro to asymptotics

I recall the distribution of the mean of a random sample


I suppose X is a random variable with some unknown pdf which has
finite mean µ and finite variance σ2
I n values are drawn independently from the distribution
I sample mean x̄n is a random variable with pdf f (x̄n )
I question: how do a random variable such as x̄n and its pdf behave as
n → ∞?

46 / 67
intro to asymptotics (cont.)

convergence in probability

I the x’s are i.i.d.(µ, σ²) from which

E(x̄n ) = µ and var(x̄n ) = σ²/n

so that x̄n is an unbiased estimator and the variance tends to zero as
n increases
I the distribution of x̄n becomes more and more concentrated in the
neighborhood of µ as n increases
I define µ ± ε to be a neighborhood around µ and

P{µ − ε < x̄n < µ + ε} = P{|x̄n − µ| < ε}

the probability that x̄n lies in the specified interval

47 / 67
intro to asymptotics (cont.)

I since var(x̄n ) declines monotonically with increasing n, there exists an
n∗ and a δ (0 < δ < 1) such that ∀n > n∗

P{|x̄n − µ| < ε} > 1 − δ

I x̄n is said to converge in probability to µ
I an equivalent statement is

limn→∞ P{|x̄n − µ| < ε} = 1

I shorthand expression
plim x̄n = µ
I the sample mean is a consistent estimator of µ

48 / 67
intro to asymptotics (cont.)
convergence in distribution

I the form of the distribution of x̄n is unknown; however, it collapses on
µ since the variance goes to zero in the limit
I consider √n(x̄n − µ)/σ, which has zero mean and unit variance
I the central limit theorem states

limn→∞ P{√n(x̄n − µ)/σ ≤ y} = ∫_{−∞}^{y} (1/√(2π)) e^(−z²/2) dz

I whatever the form of f (x), the limiting distribution of
√n(x̄n − µ)/σ is standard normal
I this process is called convergence in distribution and can be
expressed as

√n x̄n →d N(√n µ, σ²)

I the objective is to use x̄n to make inference about µ
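A simulation sketch of this result (the sample size and number of replications are arbitrary): means of exponential draws, standardized as √n(x̄n − µ)/σ, behave approximately like N(0, 1):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0                        # exponential(1): mean = sd = 1
n, reps = 200, 50_000

xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma        # standardized sample means

print(z.mean(), z.std())                    # ~0 and ~1
print(np.mean(np.abs(z) < 1.96))            # ~0.95, as under N(0, 1)
```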

49 / 67
intro to asymptotics (cont.)

I we can do so by taking the limiting normal distribution as an
approximation for the unknown distribution of x̄n

x̄n ∼a N(µ, σ²/n)

I the unknown σ² can be replaced by the sample variance, which will
be a consistent estimate

50 / 67
intro to asymptotics (cont.)
autoregressive equation

I consider again the AR(1) model

Yt = α + βYt −1 + ut

I it may be estimated by the LS estimators a and b obtained from

∑ Yt = na + b ∑ Yt−1
∑ Yt Yt−1 = a ∑ Yt−1 + b ∑ (Yt−1 )²

I it can be proved that √n(a − α) and √n(b − β) have a bivariate
normal limiting distribution with zero means and finite variances and
covariances
I thus LS estimators are consistent for α and β
I the application of LS formulae to the AR model has an asymptotic, or
large-sample, justification

51 / 67
intro to asymptotics (cont.)

I however, two assumptions required

(1) the ut are i.i.d. with zero mean and finite variance
(2) the Yt series is stationary

52 / 67
stationary and nonstationary series
I consider again the AR(1) model

Yt = α + βYt −1 + ut

I make again the assumptions about the disturbance u

E(ui ) = 0 for all i


E(ui2 ) = σ2 for all i
E(ui uj ) = 0 for all i ≠ j

which define a white noise series


I the equation

Yt = α(1 + β + β² + . . . + β^(t−1))
   + β^t Y0 + (ut + βut−1 + β²ut−2 + . . . + β^(t−1)u1 )

shows Yt as a function of α, β, Y0 , and the current and previous
disturbances
53 / 67
stationary and nonstationary series (cont.)

I assume that the process started a very long time ago so that can
rewrite

Yt = α(1 + β + β² + . . .) + (ut + βut−1 + β²ut−2 + . . .)

I the stochastic properties of the Y series are determined by the


stochastic properties of the u series
I taking expectations of both sides gives

E(Yt ) = α(1 + β + β² + . . .)

which only exists if the infinite geometric series on the RHS has a limit
I the necessary and sufficient condition is

| β| < 1

54 / 67
stationary and nonstationary series (cont.)
I the expectation is then

E(Yt ) = µ = α/(1 − β)

so that the Y series has a constant unconditional mean µ at all
points in time
I to obtain the variance write

(Yt − µ) = ut + βut−1 + β²ut−2 + . . .

I square both sides and take expectations

var(Yt ) = E[(Yt − µ)²]
         = E[ut² + β²ut−1² + β⁴ut−2² + . . . + 2βut ut−1 + 2β²ut ut−2 + . . .]

I the white noise assumptions imply

var(Y) = σY² = σ²/(1 − β²)
55 / 67
stationary and nonstationary series (cont.)
I the Y series has a constant unconditional variance, independent of
time
I define autocovariance: covariance of Y with a lagged value of itself
I the first-order (first-lag) autocovariance is defined as

γ1 = E[(Yt − µ)(Yt−1 − µ)] = βσY²

I the second-order autocovariance is

γ2 = E[(Yt − µ)(Yt−2 − µ)] = β²σY²

I in general the s-th order autocovariance is

γs = β^s σY²   s = 0, 1, 2, . . .

56 / 67
stationary and nonstationary series (cont.)

I nb: the autocovariances depend only on the lag length and are
independent of t
I γ0 is the variance: dividing the covariances by the variance gives the
set of autocorrelation coefficients (or serial correlation
coefficients) defined by

ρs = γs /γ0 s = 0, 1, 2, . . .

I plotting the autocorrelation coefficients against the lag lengths gives


the correlogram of the series
I summing up: when | β| < 1 the mean, the variance, and covariances
of the Y series are constant, independent of time ⇒ the Yt series is
said to be weakly or covariance stationary
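A simulation sketch (α, β, σ chosen arbitrarily with |β| < 1): the sample autocorrelations of a simulated AR(1) series are close to the theoretical ρs = β^s:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, sigma, T = 1.0, 0.7, 1.0, 20_000
Y = np.empty(T)
Y[0] = alpha / (1 - beta)                   # start at the unconditional mean
for t in range(1, T):
    Y[t] = alpha + beta * Y[t - 1] + rng.normal(0.0, sigma)

yc = Y - Y.mean()
gamma0 = (yc**2).mean()                     # sample variance (lag-0 autocovariance)
for s in range(1, 4):
    gamma_s = (yc[s:] * yc[:-s]).mean()     # s-th order sample autocovariance
    print(s, gamma_s / gamma0, beta**s)     # sample vs theoretical rho_s
```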

57 / 67
stationary and nonstationary series (cont.)

Figure 7: correlogram of an AR(1) series

58 / 67
stationary and nonstationary series (cont.)
unit root

I when β = 1 the AR(1) process is said to have a unit root


I the model becomes
Yt = α + Yt −1 + ut
which is called random walk with drift
I the conditional expectation is

E(Yt |Y0 ) = αt + Y0

which increases or decreases without limit as t increases


I the conditional variance is

var (Yt |Y0 ) = E[(Yt − E(Yt |Y0 ))2 ]


= E[(ut + ut −1 + . . . + u1 )2 ]
= tσ2

which increases without limit
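A companion sketch for this case (a driftless random walk, α = 0, is used since the drift does not affect the variance): the sample variance of Yt across replications grows roughly like tσ²:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, T, reps = 1.0, 200, 20_000
u = rng.normal(0.0, sigma, size=(reps, T))
Y = u.cumsum(axis=1)                        # Y_t = u_1 + ... + u_t, with Y_0 = 0

for t in (10, 50, 200):
    print(t, Y[:, t - 1].var(), t * sigma**2)   # sample variance vs t*sigma^2
```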


59 / 67
stationary and nonstationary series (cont.)

I in the unit root case the conditional mean and variance of Y depend on
t and increase without limit, so no constant unconditional mean or
variance exists ⇒ the series is said to be nonstationary, and the
asymptotic results do not hold
I when | β| > 1, the series exhibits explosive behavior

60 / 67
stationary and nonstationary series (cont.)

Figure 8: a stationary AR(1) series and a random walk

61 / 67
stationary and nonstationary series (cont.)

Figure 9: an explosive series

62 / 67
maximum likelihood estimation of the AR model

maximum likelihood estimators

I if some assumptions are made about the specific form of the pdf for
u, it is possible to derive maximum likelihood estimators (MLEs)
of the parameters of the AR model
I MLEs are consistent, asymptotically normal and asymptotically
efficient
I assume that the disturbances ui are i.i.d. N (0, σ2 ) so that the pdf is

f (ui ) = (1/(σ√(2π))) e^(−ui²/2σ²)   i = 1, 2, . . . , n
I arbitrary initial value Y0 ; any observed set of sample values
Y1 , Y2 , . . . , Yn is generated by some set of u values

63 / 67
maximum likelihood estimation of the AR model (cont.)
I the probability of a set of u values is

P(u1 , u2 , . . . , un ) = f (u1 )f (u2 ) · . . . · f (un ) = ∏t f (ut )
                      = (1/(2πσ²)^(n/2)) e^(−∑t ut²/2σ²)

I the joint density of the Y values conditional on Y0 is then

P(Y1 , Y2 , . . . , Yn ) = (1/(2πσ²)^(n/2)) exp[−(1/2σ²) ∑t (Yt − α − βYt−1 )²]

I this density may be interpreted in two ways: (1) for given α, β, and
σ2 it indicates the probability of a set of sample outcomes; (2) it is a
function of α, β, and σ2 , conditional on a set of sample outcomes

64 / 67
maximum likelihood estimation of the AR model (cont.)
I for interpretation (2), we refer to the density as likelihood function:

likelihood function = L(α, β, σ2 ; Y )

I maximizing the likelihood with respect to the three parameters gives


specific values α̂, β̂ and σ̂2 , which maximize the probability of
obtaining the sample values that have actually been observed: these
are the maximum likelihood estimators of the parameters of the AR
model
I the ML estimators solve

∂L/∂α = ∂L/∂β = ∂L/∂σ² = 0
I often it is simpler to maximize the logarithm of the likelihood function

ℓ = lnL

65 / 67
maximum likelihood estimation of the AR model (cont.)
I since ℓ is a monotonic transformation of L, the MLEs may equally be
obtained by solving

∂ℓ/∂α = ∂ℓ/∂β = ∂ℓ/∂σ² = 0
I for the AR model, the log-likelihood (conditional on Y0 ) is

ℓ = −(n/2) ln(2π) − (n/2) lnσ² − (1/2σ²) ∑t (Yt − α − βYt−1 )²

I the α̂, β̂ values that maximize L are those that minimize
∑t (Yt − α − βYt−1 )² ⇒ in this case, the LS and ML estimates of
α and β are identical
I the ML estimator of σ² is

σ̂² = (1/n) ∑t (Yt − α̂ − β̂Yt−1 )²
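A sketch of this equivalence on a simulated AR(1) (parameter values are illustrative): the conditional-ML estimates of α and β are just the LS estimates, and σ̂² divides the residual sum of squares by n:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, sigma, T = 0.5, 0.6, 1.0, 5_000
Y = np.empty(T + 1)
Y[0] = alpha / (1 - beta)                   # start at the unconditional mean
for t in range(1, T + 1):
    Y[t] = alpha + beta * Y[t - 1] + rng.normal(0.0, sigma)

Ylag, Ycur = Y[:-1], Y[1:]
ylag = Ylag - Ylag.mean()
b = (ylag * (Ycur - Ycur.mean())).sum() / (ylag**2).sum()   # LS = conditional ML
a = Ycur.mean() - b * Ylag.mean()
sigma2_hat = ((Ycur - a - b * Ylag)**2).mean()              # ML divides RSS by n
print(a, b, sigma2_hat)                     # close to 0.5, 0.6, 1.0
```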

66 / 67
maximum likelihood estimation of the AR model (cont.)

properties of MLEs

(1) consistency: MLEs are consistent, thus yield consistent estimates of


α, β and σ2
(2) asymptotic normality: the estimators α̂, β̂ and σ̂2 have
asymptotically normal distributions centered at the true parameter
values; asymptotic variances are derived from the information matrix
(more on this later)
(3) asymptotic efficiency: no other consistent and asymptotically
normal estimator can have a smaller asymptotic variance

67 / 67
