
Regression Analysis

Lecture 6: Regression Analysis

MIT 18.S096

Dr. Kempthorne

Fall 2013


Outline

1 Regression Analysis
Linear Regression: Overview
Ordinary Least Squares (OLS)
Gauss-Markov Theorem
Generalized Least Squares (GLS)
Distribution Theory: Normal Regression Models
Maximum Likelihood Estimation
Generalized M Estimation


Multiple Linear Regression: Setup


Data Set
n cases i = 1, 2, . . . , n
1 Response (dependent) variable
yi , i = 1, 2, . . . , n
p Explanatory (independent) variables
xi = (xi,1 , xi,2 , . . . , xi,p )T , i = 1, 2, . . . , n
Goal of Regression Analysis:
Extract/exploit relationship between yi and xi .
Examples
Prediction
Causal Inference
Approximation
Functional Relationships

General Linear Model: For each case i, the conditional distribution [y_i | x_i] is given by
\[
y_i = \hat{y}_i + \epsilon_i, \qquad \hat{y}_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p}
\]
where
β = (β_1, β_2, . . . , β_p)^T are the p regression parameters (constant over all cases)
ε_i is the residual (error) variable (varies over all cases)

Extensive breadth of possible models (see the design-matrix sketch below):
Polynomial approximation: x_{i,j} = (x_i)^j; the explanatory variables are different powers of the same variable x = x_i.
Fourier series: x_{i,j} = sin(j x_i) or cos(j x_i); the explanatory variables are different sin/cos terms of a Fourier series expansion.
Time series regressions: time is indexed by i, and the explanatory variables include lagged response values.

Note: Linearity of ŷ_i (in the regression parameters) is maintained even when the explanatory variables are non-linear functions of x.
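To make the point concrete, here is a minimal NumPy sketch (not from the lecture; variable names are illustrative) that builds polynomial and Fourier design matrices from a single regressor x — the resulting models remain linear in β:

```python
import numpy as np

# Hypothetical single regressor observed on n = 50 cases.
x = np.linspace(0.0, 2.0 * np.pi, 50)

# Polynomial design matrix: columns x^1, ..., x^p (prepend a column of ones for an intercept if desired).
p = 3
X_poly = np.column_stack([x ** j for j in range(1, p + 1)])

# Fourier design matrix: columns sin(j x) and cos(j x), j = 1, ..., J.
J = 2
X_fourier = np.column_stack([np.sin(j * x) for j in range(1, J + 1)] +
                            [np.cos(j * x) for j in range(1, J + 1)])

# Either matrix can play the role of X in y = X beta + eps.
print(X_poly.shape, X_fourier.shape)   # (50, 3) (50, 4)
```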

Steps for Fitting a Model

(1) Propose a model in terms of
    the response variable Y (specify the scale),
    the explanatory variables X_1, X_2, . . . , X_p (include different functions of the explanatory variables if appropriate), and
    assumptions about the distribution of ε over the cases.
(2) Specify/define a criterion for judging different estimators.
(3) Characterize the best estimator and apply it to the given data.
(4) Check the assumptions in (1).
(5) If necessary, modify the model and/or assumptions and go to (1).


Specifying Assumptions in (1) for Residual Distribution


Gauss-Markov: zero mean, constant variance, uncorrelated.
Normal-linear models: the ε_i are i.i.d. N(0, σ²) r.v.'s.
Generalized Gauss-Markov: zero mean, and a general covariance matrix (possibly correlated, possibly heteroscedastic).
Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto, contaminated normal: some fraction (1 − δ) of the ε_i are i.i.d. N(0, σ²) r.v.'s, and the remaining fraction δ follows some contamination distribution).
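A contaminated-normal error vector is easy to simulate; the sketch below is an illustration only (δ and the contamination scale are arbitrary choices, not values from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, delta = 1000, 1.0, 0.1          # delta = contamination fraction (illustrative)
contaminated = rng.random(n) < delta
eps = np.where(contaminated,
               rng.normal(0.0, 10.0 * sigma, n),   # contamination distribution (much wider)
               rng.normal(0.0, sigma, n))          # the (1 - delta) fraction: N(0, sigma^2)
print(eps.std())   # noticeably larger than sigma once delta > 0
```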


Specifying Estimator Criterion in (2)


Least Squares
Maximum Likelihood
Robust (Contamination-resistant)
Bayes (assume βj are r.v.’s with known prior distribution)
Accommodating incomplete/missing data
Case Analyses for (4), Checking Assumptions:
Residual analysis: the model errors ε_i are unobservable; the model residuals for fitted regression parameters β̃_j are
    e_i = y_i − [β̃_1 x_{i,1} + β̃_2 x_{i,2} + · · · + β̃_p x_{i,p}]
Influence diagnostics (identify cases which are highly 'influential')
Outlier detection

Ordinary Least Squares Estimates

Least Squares Criterion: For β = (β_1, β_2, . . . , β_p)^T, define
\[
Q(\beta) = \sum_{i=1}^{n} [y_i - \hat{y}_i]^2
         = \sum_{i=1}^{n} [y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p})]^2
\]
The Ordinary Least-Squares (OLS) estimate β̂ minimizes Q(β).

Matrix Notation:
\[
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
\mathbf{X} = \begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
\vdots  & \vdots  & \ddots & \vdots  \\
x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
\]


Solving for OLS Estimate β̂


 
\[
\hat{\mathbf{y}} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{pmatrix} = \mathbf{X}\beta
\qquad\text{and}\qquad
Q(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta)
\]
The OLS estimate β̂ solves ∂Q(β)/∂β_j = 0 for j = 1, 2, . . . , p:
\[
\begin{aligned}
\frac{\partial Q(\beta)}{\partial \beta_j}
&= \frac{\partial}{\partial \beta_j} \sum_{i=1}^{n} [y_i - (x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p)]^2 \\
&= \sum_{i=1}^{n} 2(-x_{i,j})\,[y_i - (x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p)] \\
&= -2\,\mathbf{X}_{[j]}^T (\mathbf{y} - \mathbf{X}\beta), \quad\text{where } \mathbf{X}_{[j]} \text{ is the } j\text{th column of } \mathbf{X}
\end{aligned}
\]


Solving for OLS Estimate β̂


\[
\frac{\partial Q}{\partial \beta} =
\begin{pmatrix} \partial Q/\partial \beta_1 \\ \partial Q/\partial \beta_2 \\ \vdots \\ \partial Q/\partial \beta_p \end{pmatrix}
= -2 \begin{pmatrix} \mathbf{X}_{[1]}^T(\mathbf{y} - \mathbf{X}\beta) \\ \mathbf{X}_{[2]}^T(\mathbf{y} - \mathbf{X}\beta) \\ \vdots \\ \mathbf{X}_{[p]}^T(\mathbf{y} - \mathbf{X}\beta) \end{pmatrix}
= -2\,\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta)
\]
So the OLS estimate β̂ solves the "Normal Equations"
\[
\mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat\beta) = \mathbf{0}
\;\Longleftrightarrow\; \mathbf{X}^T\mathbf{X}\hat\beta = \mathbf{X}^T\mathbf{y}
\;\Longrightarrow\; \hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
\]

N.B. For β̂ to exist (uniquely), (X^T X) must be invertible, i.e., X must have full column rank.
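As a numerical check of the closed form β̂ = (XᵀX)⁻¹Xᵀy, here is a minimal NumPy sketch on simulated data (names are illustrative), cross-checked against numpy.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                 # full column rank with probability 1
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations: solve X^T X beta = X^T y (preferred over forming an explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with the library least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))    # True
```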

(Ordinary) Least Squares Fit


OLS Estimate:
\[
\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \vdots \\ \hat\beta_p \end{pmatrix}
= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
\]
Fitted Values:
\[
\hat{\mathbf{y}} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{pmatrix}
= \begin{pmatrix} x_{1,1}\hat\beta_1 + \cdots + x_{1,p}\hat\beta_p \\ x_{2,1}\hat\beta_1 + \cdots + x_{2,p}\hat\beta_p \\ \vdots \\ x_{n,1}\hat\beta_1 + \cdots + x_{n,p}\hat\beta_p \end{pmatrix}
= \mathbf{X}\hat\beta = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{H}\mathbf{y}
\]
where H = X(X^T X)^{-1} X^T is the n × n "Hat Matrix".

(Ordinary) Least Squares Fit


The Hat Matrix H projects R^n onto the column space of X.

Residuals: ε̂_i = y_i − ŷ_i, i = 1, 2, . . . , n:
\[
\hat{\boldsymbol\epsilon} = \begin{pmatrix} \hat\epsilon_1 \\ \hat\epsilon_2 \\ \vdots \\ \hat\epsilon_n \end{pmatrix}
= \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I}_n - \mathbf{H})\mathbf{y}
\]
Normal Equations:
\[
\mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat\beta) = \mathbf{X}^T\hat{\boldsymbol\epsilon} = \mathbf{0}_p
\]
N.B. The least-squares residual vector ε̂ is orthogonal to the column space of X.
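Both facts are easy to verify numerically; a self-contained sketch (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix X (X^T X)^{-1} X^T
y_hat = H @ y                             # fitted values H y
eps_hat = y - y_hat                       # residuals (I_n - H) y
print(np.allclose(X.T @ eps_hat, 0.0))    # True: residuals orthogonal to col(X)
print(np.allclose(H @ H, H))              # True: H is idempotent (a projection)
```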

Gauss-Markov Theorem: Assumptions


   
Data
\[
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\qquad\text{and}\qquad
\mathbf{X} = \begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
\vdots  & \vdots  & \ddots & \vdots  \\
x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{pmatrix}
\]
follow a linear model satisfying the Gauss-Markov assumptions if y is an observation of the random vector Y = (Y_1, Y_2, . . . , Y_n)^T and
E(Y | X, β) = Xβ, where β = (β_1, β_2, . . . , β_p)^T is the p-vector of regression parameters, and
Cov(Y | X, β) = σ² I_n, for some σ² > 0.
I.e., the random variables generating the observations are uncorrelated and have constant variance σ² (conditional on X and β).

Gauss-Markov Theorem
For known constants c_1, c_2, . . . , c_p, c_{p+1}, consider the problem of estimating
\[
\theta = c_1\beta_1 + c_2\beta_2 + \cdots + c_p\beta_p + c_{p+1}.
\]
Under the Gauss-Markov assumptions, the estimator
\[
\hat\theta = c_1\hat\beta_1 + c_2\hat\beta_2 + \cdots + c_p\hat\beta_p + c_{p+1},
\]
where β̂_1, β̂_2, . . . , β̂_p are the least squares estimates, is
1) an Unbiased Estimator of θ, and
2) a Linear Estimator of θ, that is, θ̂ = Σ_{i=1}^{n} b_i y_i for some known (given X) constants b_i.

Theorem: Under the Gauss-Markov assumptions, the estimator θ̂ has the smallest ("Best") variance among all Linear Unbiased Estimators of θ, i.e., θ̂ is BLUE.

Gauss-Markov Theorem: Proof


Proof: Without loss of generality, assume c_{p+1} = 0 and define c = (c_1, c_2, . . . , c_p)^T.
The least squares estimate of θ = c^T β is
\[
\hat\theta = \mathbf{c}^T\hat\beta = \mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \equiv \mathbf{d}^T\mathbf{y},
\]
a linear estimate in y with coefficients d = (d_1, d_2, . . . , d_n)^T.
Consider an alternative linear estimate of θ,
\[
\tilde\theta = \mathbf{b}^T\mathbf{y},
\]
with fixed coefficients b = (b_1, . . . , b_n)^T. Define f = b − d and note that
\[
\tilde\theta = \mathbf{b}^T\mathbf{y} = (\mathbf{d} + \mathbf{f})^T\mathbf{y} = \hat\theta + \mathbf{f}^T\mathbf{y}.
\]
If θ̃ is unbiased then, because θ̂ is unbiased,
\[
0 = E(\mathbf{f}^T\mathbf{y}) = \mathbf{f}^T E(\mathbf{y}) = \mathbf{f}^T(\mathbf{X}\beta) \quad\text{for all } \beta \in \mathbb{R}^p
\]
=⇒ f is orthogonal to the column space of X
=⇒ f is orthogonal to d = X(X^T X)^{-1} c.

If θ̃ is unbiased, then the orthogonality of f to d implies
\[
\begin{aligned}
Var(\tilde\theta) = Var(\mathbf{b}^T\mathbf{y}) &= Var(\mathbf{d}^T\mathbf{y} + \mathbf{f}^T\mathbf{y}) \\
&= Var(\mathbf{d}^T\mathbf{y}) + Var(\mathbf{f}^T\mathbf{y}) + 2\,Cov(\mathbf{d}^T\mathbf{y}, \mathbf{f}^T\mathbf{y}) \\
&= Var(\hat\theta) + Var(\mathbf{f}^T\mathbf{y}) + 2\,\mathbf{d}^T Cov(\mathbf{y})\,\mathbf{f} \\
&= Var(\hat\theta) + Var(\mathbf{f}^T\mathbf{y}) + 2\,\mathbf{d}^T(\sigma^2\mathbf{I}_n)\mathbf{f} \\
&= Var(\hat\theta) + Var(\mathbf{f}^T\mathbf{y}) + 2\sigma^2\,\mathbf{d}^T\mathbf{f} \\
&= Var(\hat\theta) + Var(\mathbf{f}^T\mathbf{y}) + 2\sigma^2 \times 0 \\
&\ge Var(\hat\theta)
\end{aligned}
\]


Generalized Least Squares (GLS) Estimates


Consider generalizing the Gauss-Markov assumptions for the linear regression model to
\[
\mathbf{Y} = \mathbf{X}\beta + \boldsymbol\epsilon
\]
where the random n-vector ε satisfies E[ε] = 0_n and E[εε^T] = σ²Σ, with
σ² an unknown scale parameter, and
Σ a known (n × n) positive definite matrix specifying the relative variances and correlations of the component observations.
Transform the data (Y, X) to Y* = Σ^{−1/2} Y and X* = Σ^{−1/2} X; the model becomes
\[
\mathbf{Y}^* = \mathbf{X}^*\beta + \boldsymbol\epsilon^*, \qquad E[\boldsymbol\epsilon^*] = \mathbf{0}_n, \quad E[\boldsymbol\epsilon^*(\boldsymbol\epsilon^*)^T] = \sigma^2\mathbf{I}_n.
\]
By the Gauss-Markov Theorem, the BLUE ("GLS" estimate) of β is
\[
\hat\beta = [(\mathbf{X}^*)^T\mathbf{X}^*]^{-1}(\mathbf{X}^*)^T\mathbf{Y}^* = (\mathbf{X}^T\Sigma^{-1}\mathbf{X})^{-1}\mathbf{X}^T\Sigma^{-1}\mathbf{Y}
\]
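A minimal numerical sketch of this equivalence, assuming a known Σ (here an illustrative AR(1)-type correlation matrix, not from the lecture): whitening by Σ^{−1/2} followed by OLS agrees with the direct GLS formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2
X = rng.normal(size=(n, p))
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))   # assumed-known correlation structure
y = X @ np.array([1.0, -1.0]) + rng.multivariate_normal(np.zeros(n), 0.25 * Sigma)

# Direct GLS: (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Whitening route: Sigma^{-1/2} from the symmetric eigendecomposition, then OLS on (y*, X*)
w, V = np.linalg.eigh(Sigma)
S_inv_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
Xs, ys = S_inv_half @ X, S_inv_half @ y
beta_whitened = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

print(np.allclose(beta_gls, beta_whitened))   # True (up to numerical error)
```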

Normal Linear Regression Models


Distribution Theory:
\[
Y_i = x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p + \epsilon_i = \mu_i + \epsilon_i
\]
Assume {ε_1, ε_2, . . . , ε_n} are i.i.d. N(0, σ²).
=⇒ [Y_i | x_{i,1}, x_{i,2}, . . . , x_{i,p}, β, σ²] ∼ N(μ_i, σ²), independent over i = 1, 2, . . . , n.

Conditioning on X, β, and σ²:
\[
\mathbf{Y} = \mathbf{X}\beta + \boldsymbol\epsilon, \qquad
\boldsymbol\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix} \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)
\]

Distribution Theory

 
\[
\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix} = E(\mathbf{Y} \mid \mathbf{X}, \beta, \sigma^2) = \mathbf{X}\beta
\]


\[
\Sigma = Cov(\mathbf{Y} \mid \mathbf{X}, \beta, \sigma^2) =
\begin{pmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & \cdots & & \sigma^2
\end{pmatrix}
= \sigma^2 \mathbf{I}_n
\]
That is, Σ_{i,j} = Cov(Y_i, Y_j | X, β, σ²) = σ² · δ_{i,j}.

Apply Moment-Generating Functions (MGFs) to derive
the joint distribution of Y = (Y_1, Y_2, . . . , Y_n)^T, and
the joint distribution of β̂ = (β̂_1, β̂_2, . . . , β̂_p)^T.


MGF of Y
For the n-variate r.v. Y and a constant n-vector t = (t_1, . . . , t_n)^T,
\[
\begin{aligned}
M_{\mathbf{Y}}(\mathbf{t}) = E(e^{\mathbf{t}^T\mathbf{Y}}) &= E(e^{t_1 Y_1 + t_2 Y_2 + \cdots + t_n Y_n}) \\
&= E(e^{t_1 Y_1}) \cdot E(e^{t_2 Y_2}) \cdots E(e^{t_n Y_n}) \\
&= M_{Y_1}(t_1) \cdot M_{Y_2}(t_2) \cdots M_{Y_n}(t_n) \\
&= \prod_{i=1}^{n} e^{t_i\mu_i + \frac{1}{2}t_i^2\sigma^2}
 = e^{\sum_{i=1}^{n} t_i\mu_i + \frac{1}{2}\sum_{i,k=1}^{n} t_i \Sigma_{i,k} t_k}
 = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}
\end{aligned}
\]
=⇒ Y ∼ N_n(μ, Σ), multivariate normal with mean μ and covariance Σ.


MGF of β̂
For the p-variate r.v. β̂ and a constant p-vector τ = (τ_1, . . . , τ_p)^T,
\[
M_{\hat\beta}(\tau) = E(e^{\tau^T\hat\beta}) = E(e^{\tau_1\hat\beta_1 + \tau_2\hat\beta_2 + \cdots + \tau_p\hat\beta_p})
\]
Defining A = (X^T X)^{-1} X^T, we can express β̂ = (X^T X)^{-1} X^T y = AY, and
\[
\begin{aligned}
M_{\hat\beta}(\tau) = E(e^{\tau^T\hat\beta}) &= E(e^{\tau^T\mathbf{A}\mathbf{Y}}) \\
&= E(e^{\mathbf{t}^T\mathbf{Y}}), \quad\text{with } \mathbf{t} = \mathbf{A}^T\tau \\
&= M_{\mathbf{Y}}(\mathbf{t}) = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}
\end{aligned}
\]

MGF of β̂
For
\[
M_{\hat\beta}(\tau) = E(e^{\tau^T\hat\beta}) = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}},
\]
plug in
t = A^T τ = X(X^T X)^{-1} τ
μ = Xβ
Σ = σ² I_n
which gives
t^T μ = τ^T β
t^T Σ t = τ^T (X^T X)^{-1} X^T [σ² I_n] X (X^T X)^{-1} τ = τ^T [σ²(X^T X)^{-1}] τ
So the MGF of β̂ is
\[
M_{\hat\beta}(\tau) = e^{\tau^T\beta + \frac{1}{2}\tau^T[\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}]\tau}
\;\Longrightarrow\; \hat\beta \sim N_p\big(\beta,\; \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\big)
\]

Marginal Distributions of Least Squares Estimates

Because
\[
\hat\beta \sim N_p\big(\beta,\; \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\big),
\]
the marginal distribution of each β̂_j is
\[
\hat\beta_j \sim N(\beta_j, \sigma^2 C_{j,j})
\]
where C_{j,j} is the jth diagonal element of (X^T X)^{-1}.


The Q-R Decomposition of X

Consider expressing the (n × p) matrix X of explanatory variables as
\[
\mathbf{X} = \mathbf{Q} \cdot \mathbf{R}
\]
where
Q is an (n × p) orthonormal matrix, i.e., Q^T Q = I_p, and
R is a (p × p) upper-triangular matrix.

The columns of Q = [Q_{[1]}, Q_{[2]}, . . . , Q_{[p]}] can be constructed by performing the Gram-Schmidt orthonormalization procedure on the columns of X = [X_{[1]}, X_{[2]}, . . . , X_{[p]}].

 
If
\[
\mathbf{R} = \begin{pmatrix}
r_{1,1} & r_{1,2} & \cdots & r_{1,p-1} & r_{1,p} \\
0 & r_{2,2} & \cdots & r_{2,p-1} & r_{2,p} \\
0 & 0 & \ddots & \vdots & \vdots \\
0 & 0 & & r_{p-1,p-1} & r_{p-1,p} \\
0 & 0 & \cdots & 0 & r_{p,p}
\end{pmatrix},
\quad\text{then}
\]
\[
\mathbf{X}_{[1]} = \mathbf{Q}_{[1]}\, r_{1,1}
\;\Longrightarrow\;
r_{1,1}^2 = \mathbf{X}_{[1]}^T\mathbf{X}_{[1]}, \qquad \mathbf{Q}_{[1]} = \mathbf{X}_{[1]}/r_{1,1}
\]
\[
\mathbf{X}_{[2]} = \mathbf{Q}_{[1]}\, r_{1,2} + \mathbf{Q}_{[2]}\, r_{2,2}
\;\Longrightarrow\;
\mathbf{Q}_{[1]}^T\mathbf{X}_{[2]} = \mathbf{Q}_{[1]}^T\mathbf{Q}_{[1]}\, r_{1,2} + \mathbf{Q}_{[1]}^T\mathbf{Q}_{[2]}\, r_{2,2} = 1\cdot r_{1,2} + 0\cdot r_{2,2} = r_{1,2}
\]
(known since Q_{[1]} is specified).

With r_{1,2} and Q_{[1]} specified, we can solve for r_{2,2}:
\[
\mathbf{Q}_{[2]}\, r_{2,2} = \mathbf{X}_{[2]} - \mathbf{Q}_{[1]}\, r_{1,2}
\]
Taking the squared norm of both sides:
\[
r_{2,2}^2 = \mathbf{X}_{[2]}^T\mathbf{X}_{[2]} - 2\, r_{1,2}\,\mathbf{Q}_{[1]}^T\mathbf{X}_{[2]} + r_{1,2}^2
\]
(all terms on the RHS are known).
With r_{2,2} specified,
\[
\mathbf{Q}_{[2]} = \frac{1}{r_{2,2}}\left(\mathbf{X}_{[2]} - r_{1,2}\,\mathbf{Q}_{[1]}\right)
\]
Etc. (solve successively for the elements of R and the columns of Q).
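A minimal classical Gram-Schmidt sketch of this column-by-column construction (illustrative only; library QR routines such as numpy.linalg.qr use the more numerically stable Householder approach):

```python
import numpy as np

def gram_schmidt_qr(X):
    """Classical Gram-Schmidt: X (n x p, full column rank) -> Q (n x p), R (p x p upper-triangular)."""
    n, p = X.shape
    Q = np.zeros((n, p))
    R = np.zeros((p, p))
    for j in range(p):
        v = X[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ X[:, j]      # r_{i,j} = Q_[i]^T X_[j]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)          # r_{j,j} = || X_[j] - sum_i r_{i,j} Q_[i] ||
        Q[:, j] = v / R[j, j]                # Q_[j]
    return Q, R

X = np.random.default_rng(0).normal(size=(6, 3))
Q, R = gram_schmidt_qr(X)
print(np.allclose(Q @ R, X), np.allclose(Q.T @ Q, np.eye(3)))   # True True
```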


With the Q-R Decomposition


X = QR (with Q^T Q = I_p and R a p × p upper-triangular matrix):
\[
\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y}
\]
(plug in X = QR and simplify)
\[
Cov(\hat\beta) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2\mathbf{R}^{-1}(\mathbf{R}^{-1})^T
\]
\[
\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T = \mathbf{Q}\mathbf{Q}^T
\]
(giving ŷ = Hy and ε̂ = (I_n − H)y)
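A quick sketch of this QR route on simulated data (illustrative names), using numpy.linalg.qr in its reduced mode (Q is n × p, R is p × p):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=n)

Q, R = np.linalg.qr(X)                        # reduced QR: Q (n x p), R (p x p)
beta_qr = np.linalg.solve(R, Q.T @ y)         # beta_hat = R^{-1} Q^T y
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)   # normal-equations solution, for comparison
print(np.allclose(beta_qr, beta_ne))          # True

H = Q @ Q.T                                   # hat matrix H = Q Q^T
print(np.allclose(H, X @ np.linalg.solve(X.T @ X, X.T)))   # True
```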


More Distribution Theory


Assume y = Xβ + ε, where the {ε_i} are i.i.d. N(0, σ²), i.e.,
ε ∼ N_n(0_n, σ² I_n), or equivalently y ∼ N_n(Xβ, σ² I_n).

Theorem*: For any (m × n) matrix A of rank m ≤ n, the random normal vector y transformed by A,
z = Ay,
is also a random normal vector:
z ∼ N_m(μ_z, Σ_z)
where μ_z = A E(y) = AXβ and Σ_z = A Cov(y) A^T = σ² A A^T.

Earlier, A = (X^T X)^{-1} X^T yields the distribution of β̂ = Ay.
With a different definition of A (and z) we give an easy proof of the following theorem.

Theorem: For the normal linear regression model
\[
\mathbf{y} = \mathbf{X}\beta + \boldsymbol\epsilon,
\]
where X (n × p) has rank p and ε ∼ N_n(0_n, σ² I_n):
(a) β̂ = (X^T X)^{-1} X^T y and ε̂ = y − Xβ̂ are independent r.v.'s
(b) β̂ ∼ N_p(β, σ²(X^T X)^{-1})
(c) Σ_{i=1}^{n} ε̂_i² = ε̂^T ε̂ ∼ σ² χ²_{n−p} (scaled chi-squared r.v.)
(d) For each j = 1, 2, . . . , p,
\[
\hat t_j = \frac{\hat\beta_j - \beta_j}{\hat\sigma\sqrt{C_{j,j}}} \sim t_{n-p} \quad (t\text{-distribution})
\]
where σ̂² = (1/(n − p)) Σ_{i=1}^{n} ε̂_i² and C_{j,j} = [(X^T X)^{-1}]_{j,j}.
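A numerical sketch of (c)-(d) on simulated data (illustrative names, not the lecture's code): σ̂² and the per-coefficient t statistics for testing β_j = 0, which are compared to a t_{n−p} reference distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 80, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(scale=0.7, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
eps_hat = y - X @ beta_hat
sigma2_hat = (eps_hat @ eps_hat) / (n - p)       # sigma_hat^2 = eps_hat' eps_hat / (n - p)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))      # sigma_hat * sqrt(C_jj)
t_stats = beta_hat / se                          # t statistic for H0: beta_j = 0
print(np.round(t_stats, 2))
```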

Proof: Note that (d) follows immediately from (a), (b), and (c).
Define
\[
\mathbf{A} = \begin{pmatrix} \mathbf{Q}^T \\ \mathbf{W}^T \end{pmatrix},
\]
where
A is an (n × n) orthogonal matrix (i.e., A^T = A^{-1}), and
Q is the column-orthonormal matrix in a Q-R decomposition of X.
Note: W can be constructed by continuing the Gram-Schmidt orthonormalization process (which was used to construct Q from X) with X* = [ X  I_n ].
Then consider
\[
\mathbf{z} = \mathbf{A}\mathbf{y} = \begin{pmatrix} \mathbf{Q}^T\mathbf{y} \\ \mathbf{W}^T\mathbf{y} \end{pmatrix}
= \begin{pmatrix} \mathbf{z}_Q \\ \mathbf{z}_W \end{pmatrix}
\quad\text{with } \mathbf{z}_Q \; (p \times 1) \text{ and } \mathbf{z}_W \; ((n-p) \times 1).
\]

The distribution of z = Ay is N_n(μ_z, Σ_z), where
\[
\mu_z = \mathbf{A}\,[\mathbf{X}\beta]
= \begin{pmatrix} \mathbf{Q}^T \\ \mathbf{W}^T \end{pmatrix}[\mathbf{Q}\mathbf{R}\beta]
= \begin{pmatrix} \mathbf{Q}^T\mathbf{Q} \\ \mathbf{W}^T\mathbf{Q} \end{pmatrix}[\mathbf{R}\beta]
= \begin{pmatrix} \mathbf{I}_p \\ \mathbf{0}_{(n-p)\times p} \end{pmatrix}[\mathbf{R}\beta]
= \begin{pmatrix} \mathbf{R}\beta \\ \mathbf{0}_{n-p} \end{pmatrix}
\]
\[
\Sigma_z = \mathbf{A}\,[\sigma^2\mathbf{I}_n]\,\mathbf{A}^T = \sigma^2\,\mathbf{A}\mathbf{A}^T = \sigma^2\mathbf{I}_n
\quad (\text{since } \mathbf{A}^T = \mathbf{A}^{-1})
\]
    
Thus
\[
\mathbf{z} = \begin{pmatrix} \mathbf{z}_Q \\ \mathbf{z}_W \end{pmatrix}
\sim N_n\!\left( \begin{pmatrix} \mathbf{R}\beta \\ \mathbf{0}_{n-p} \end{pmatrix}, \; \sigma^2\mathbf{I}_n \right)
\]
=⇒ z_Q ∼ N_p(Rβ, σ² I_p), z_W ∼ N_{n−p}(0_{n−p}, σ² I_{n−p}), and z_Q and z_W are independent.
The Theorem follows by showing
(a*) β̂ = R^{−1} z_Q and ε̂ = W z_W (i.e., β̂ and ε̂ are functions of different, independent vectors);
(b*) deducing the distribution of β̂ = R^{−1} z_Q by applying Theorem* with A = R^{−1} and "y" = z_Q;
(c*) ε̂^T ε̂ = z_W^T z_W = the sum of (n − p) squared r.v.'s which are i.i.d. N(0, σ²), hence ∼ σ² χ²_{n−p}, a scaled chi-squared r.v.

Proof of (a*):
β̂ = R^{−1} z_Q follows from β̂ = (X^T X)^{−1} X^T y and X = QR with Q^T Q = I_p, since
\[
(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
= (\mathbf{R}^T\mathbf{Q}^T\mathbf{Q}\mathbf{R})^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y}
= \mathbf{R}^{-1}(\mathbf{R}^T)^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y}
= \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y} = \mathbf{R}^{-1}\mathbf{z}_Q.
\]
\[
\begin{aligned}
\hat{\boldsymbol\epsilon} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat\beta
&= \mathbf{y} - (\mathbf{Q}\mathbf{R})(\mathbf{R}^{-1}\mathbf{z}_Q) \\
&= \mathbf{y} - \mathbf{Q}\mathbf{z}_Q \\
&= \mathbf{y} - \mathbf{Q}\mathbf{Q}^T\mathbf{y} = (\mathbf{I}_n - \mathbf{Q}\mathbf{Q}^T)\mathbf{y} \\
&= \mathbf{W}\mathbf{W}^T\mathbf{y} \quad (\text{since } \mathbf{I}_n = \mathbf{A}^T\mathbf{A} = \mathbf{Q}\mathbf{Q}^T + \mathbf{W}\mathbf{W}^T) \\
&= \mathbf{W}\mathbf{z}_W
\end{aligned}
\]

Maximum-Likelihood Estimation
Consider the normal linear regression model
y = Xβ + ε, where the {ε_i} are i.i.d. N(0, σ²), i.e., ε ∼ N_n(0_n, σ² I_n), or y ∼ N_n(Xβ, σ² I_n).
Definitions:
The likelihood function is
L(β, σ²) = p(y | X, β, σ²),
where p(y | X, β, σ²) is the joint probability density function (pdf) of the conditional distribution of y given the data X (known) and the parameters (β, σ²) (unknown).
The maximum likelihood estimates of (β, σ²) are the values maximizing L(β, σ²), i.e., those which make the observed data y most likely in terms of its pdf.

Because the y_i are independent r.v.'s with y_i ∼ N(μ_i, σ²), where μ_i = Σ_{j=1}^{p} β_j x_{i,j},
\[
L(\beta, \sigma^2) = \prod_{i=1}^{n} p(y_i \mid \beta, \sigma^2)
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\left(y_i - \sum_{j=1}^{p}\beta_j x_{i,j}\right)^2}
= \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2}(\mathbf{y}-\mathbf{X}\beta)^T(\sigma^2\mathbf{I}_n)^{-1}(\mathbf{y}-\mathbf{X}\beta)}
\]
The maximum likelihood estimates (β̂, σ̂²) maximize the log-likelihood function (dropping constant terms)
\[
\log L(\beta, \sigma^2) = -\tfrac{n}{2}\log(\sigma^2) - \tfrac{1}{2}(\mathbf{y}-\mathbf{X}\beta)^T(\sigma^2\mathbf{I}_n)^{-1}(\mathbf{y}-\mathbf{X}\beta)
= -\tfrac{n}{2}\log(\sigma^2) - \tfrac{1}{2\sigma^2}\,Q(\beta)
\]
where Q(β) = (y − Xβ)^T(y − Xβ) is the "Least-Squares Criterion"!
The OLS estimate β̂ is also the ML estimate.
The ML estimate of σ² solves
\[
\frac{\partial \log L(\hat\beta, \sigma^2)}{\partial(\sigma^2)} = 0,
\quad\text{i.e.,}\quad
-\frac{n}{2}\frac{1}{\sigma^2} - \frac{1}{2}(-1)(\sigma^2)^{-2}\,Q(\hat\beta) = 0
\;\Longrightarrow\;
\hat\sigma^2_{ML} = Q(\hat\beta)/n = \Big(\sum_{i=1}^{n}\hat\epsilon_i^2\Big)/n \quad (\text{biased!})
\]
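A small simulation illustrating the bias (illustrative parameters, not from the lecture): dividing the residual sum of squares by n understates σ², while dividing by n − p is unbiased.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 30, 5, 1.0
ml, unbiased = [], []
for _ in range(2000):
    X = rng.normal(size=(n, p))
    y = X @ np.ones(p) + rng.normal(scale=sigma, size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    rss = np.sum((y - X @ beta_hat) ** 2)    # Q(beta_hat)
    ml.append(rss / n)                       # ML estimate (biased low by the factor (n - p)/n)
    unbiased.append(rss / (n - p))           # unbiased estimate
print(np.mean(ml), np.mean(unbiased))        # roughly 0.83 vs 1.0 when sigma^2 = 1
```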


Generalized M Estimation
For data (y, X), fit the linear regression model
\[
y_i = \mathbf{x}_i^T\beta + \epsilon_i, \quad i = 1, 2, \ldots, n,
\]
by specifying β = β̂ to minimize
\[
Q(\beta) = \sum_{i=1}^{n} h(y_i, \mathbf{x}_i, \beta, \sigma^2)
\]
The choice of the function h( ) distinguishes different estimators:
(1) Least Squares: h(y_i, x_i, β, σ²) = (y_i − x_i^T β)²
(2) Mean Absolute Deviation (MAD): h(y_i, x_i, β, σ²) = |y_i − x_i^T β|
(3) Maximum Likelihood (ML): assuming the y_i are independent with pdf's p(y_i | β, x_i, σ²), h(y_i, x_i, β, σ²) = −log p(y_i | β, x_i, σ²)
(4) Robust M-Estimator: h(y_i, x_i, β, σ²) = χ(y_i − x_i^T β), where χ( ) is even and monotone increasing on (0, ∞).
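A sketch of how the choice of h(·) changes the fit (illustrative only, not the lecture's code): least squares has the closed form above, while the MAD criterion and a Huber-type χ — a common robust choice, not named in the slide — are minimized numerically here with scipy.optimize.minimize.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, p = 100, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0]) + rng.standard_t(df=2, size=n)   # heavy-tailed errors

def Q(beta, h):
    return np.sum(h(y - X @ beta))

h_ls  = lambda r: r ** 2                       # (1) least squares
h_mad = lambda r: np.abs(r)                    # (2) mean absolute deviation
def h_huber(r, c=1.345):                       # (4) one possible chi(.): quadratic center, linear tails
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r ** 2, c * (a - 0.5 * c))

for name, h in [("LS", h_ls), ("MAD", h_mad), ("Huber", h_huber)]:
    res = minimize(Q, np.zeros(p), args=(h,), method="Nelder-Mead")
    print(name, np.round(res.x, 3))
```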

(5) Quantile Estimator: for a fixed quantile τ, 0 < τ < 1,
\[
h(y_i, \mathbf{x}_i, \beta, \sigma^2) =
\begin{cases}
\tau\,|y_i - \mathbf{x}_i^T\beta|, & \text{if } y_i \ge \mathbf{x}_i^T\beta \\
(1-\tau)\,|y_i - \mathbf{x}_i^T\beta|, & \text{if } y_i < \mathbf{x}_i^T\beta
\end{cases}
\]
E.g., τ = 0.90 corresponds to the 90th quantile / upper decile, and τ = 0.50 corresponds to the MAD estimator.
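This is the "check" (pinball) loss; a minimal numerical sketch in the same style as the previous block (illustrative data and names):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(scale=1.0 + 0.2 * x)    # heteroscedastic noise

def check_loss(beta, tau):
    r = y - X @ beta
    return np.sum(np.where(r >= 0.0, tau * r, (tau - 1.0) * r))   # tau|r| above the line, (1-tau)|r| below

for tau in (0.5, 0.9):
    res = minimize(check_loss, np.zeros(2), args=(tau,), method="Nelder-Mead")
    print(tau, np.round(res.x, 3))   # tau = 0.5 reproduces the MAD (median regression) fit
```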



MIT OpenCourseWare
http://ocw.mit.edu

18.S096 Topics in Mathematics with Applications in Finance


Fall 2013

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
