
5 The Classical Model: Ordinary Least Squares

5.1 Introduction and Basic Assumptions

• We have developed the simple regression model in which we included only an intercept term and

one right-hand side variable.

• Often, however, this is rather naive: we may not be able to explain much of

the variation in Y, and we may have theoretical justification for including other variables.

• We have hinted already at the idea that we would include more than one right-hand side variable

in a model. Indeed, we often have any number of regressors on the right-hand side.

• To this end, we develop the OLS model with more than one right-hand side variable. To keep

things manageable, we will use matrix notation.

• We note that the econometric model must be linear in parameters:

yᵢ = β0 + β1X1i + β2X2i + · · · + βkXki + εᵢ

y = Xβ + ε

• We assume that y = [N × 1], X = [N × k] and includes a constant term (represented by a column

of ones), β = [k × 1], and ε = [N × 1].

• We can only include observations with fully defined values for each Y and X.

• The Full Ideal Conditions:

1. Model is linear in parameters

2. Explanatory variables (X) are fixed in repeated samples (non-stochastic)

3. X has rank k, where k < N.

4. εᵢ are independent and identically distributed.

5. All error terms have a zero mean: E[εᵢ] = 0.

6. All error terms have a constant variance: E[εε′] = σ²I.

7. #6 implies that the error terms have no covariance: E[εᵢεⱼ] = 0 ∀ i ≠ j.

• The linear estimator for β, denoted β̂, is found by minimizing the sum of squared errors over β:

SSE = Σ εᵢ² = Σ (yᵢ − Xᵢβ)²   (sums over i = 1, …, N)

where Xᵢ is the i-th row of X, or in matrix notation

SSE = ε′ε = (y − Xβ)′(y − Xβ).

5.2 Aside: Matrix Differentiation

• We know that ε′ε = y′y − 2β′X′y + β′X′Xβ.

• The second term is clearly linear in β since X′y is a k-element vector of known scalars, whereas

the third term is quadratic in β.

• Looking at the linear term, one could write it as

f(β) = a′β = a1β1 + a2β2 + · · · + akβk = β′a

where a = X′y.

• Taking partial derivatives with respect to each of the βi and arranging the results in a column

vector yields

∂(a′β)/∂β = ∂(β′a)/∂β = [a1, a2, a3, …, ak]′ = a

• For the linear term in the SSE, it immediately follows that

∂(2β′X′y)/∂β = 2X′y

• The quadratic term can be rewritten as β′Aβ where the matrix A is of known constants, i.e.,

X′X. For a three-parameter illustration, writing out the quadratic form (and using the symmetry aij = aji),

f(β) = [β1 β2 β3] [ a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ] [ β1 ; β2 ; β3 ]

= a11β1² + a22β2² + a33β3² + 2a12β1β2 + 2a13β1β3 + 2a23β2β3

• The vector of partial derivatives is then written as

∂f(β)/∂β = [ ∂f/∂β1 ; ∂f/∂β2 ; ∂f/∂β3 ]

= [ 2(a11β1 + a12β2 + a13β3) ; 2(a12β1 + a22β2 + a23β3) ; 2(a13β1 + a23β2 + a33β3) ]

= 2Aβ

• This result holds for any symmetric quadratic form; that is,

∂(β′Aβ)/∂β = 2Aβ

for any symmetric A. From our SSE, we have A = X′X, and substituting we obtain

∂(β′X′Xβ)/∂β = 2(X′X)β.
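
• As a quick numerical check (an illustrative Python/NumPy sketch, not part of the original derivation), one can compare the claimed gradient 2Aβ against a finite-difference approximation:

import numpy as np

# Verify d(b'Ab)/db = 2Ab for a symmetric A by comparing the analytic
# gradient with a central finite-difference approximation.
rng = np.random.default_rng(0)
k = 3
M = rng.normal(size=(k, k))
A = M.T @ M                       # symmetric, of the X'X form
b = rng.normal(size=k)

analytic = 2 * A @ b              # the claimed derivative

h = 1e-6
numeric = np.empty(k)
for i in range(k):
    e = np.zeros(k)
    e[i] = h
    numeric[i] = ((b + e) @ A @ (b + e) - (b - e) @ A @ (b - e)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-4))   # True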

5.3 Derivation of the OLS Estimator

• The minimization of the SSE leads to the first-order necessary conditions:

∂SSE/∂β = −2X′y + 2X′Xβ = 0

• This is the matrix analogue of the first-order conditions from the simple regression model. There

is one first-order condition for every parameter to be estimated.

• We solve these k first-order conditions by dividing by 2, taking X′y to the right-hand side and

solving for β.

• Unfortunately, we cannot divide when it comes to matrices, but we do have the matrix analogue

to division, the inverse matrix.

• Pre-multiply both sides of the resulting equation, X′Xβ = X′y, by (X′X)⁻¹ to obtain

(X′X)⁻¹(X′X)β = (X′X)⁻¹X′y

• The first two matrices on the left-hand side reduce to the identity matrix, à la A⁻¹A = I, which

can be suppressed, so that the estimator for β, denoted β̂, is

β̂ = (X′X)⁻¹X′y

• Note that the matrix-notation version of β̂ is very analogous to the scalar version derived in the

simple regression model.

• (X′X)⁻¹ is the matrix analogue of the denominator of the simple regression estimator β̂, Σxᵢ².

• Likewise, X′y is the matrix analogue of the numerator of the simple regression estimator β̂, Σxᵢyᵢ.

• Remember that β̂ is a vector of estimated parameters, not a scalar.
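
• As an illustration (a minimal NumPy sketch on simulated data; the numbers are not from these notes), β̂ = (X′X)⁻¹X′y can be computed directly:

import numpy as np

# Simulate data satisfying the full ideal conditions, then compute the
# OLS estimator by solving the normal equations (X'X) b = X'y.
rng = np.random.default_rng(42)
N, k = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])  # constant + 2 regressors
beta = np.array([5.0, 1.2, -0.7])
y = X @ beta + rng.normal(scale=2.0, size=N)

# np.linalg.solve is preferred to forming (X'X)^{-1} explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # a vector of k estimates, close to (5.0, 1.2, -0.7)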

• We look again at the first two moments of β̂ in matrix form: E[β̂] and cov(β̂).

• The expectation of β̂: E[β̂] = E[(X′X)⁻¹X′y] where y = Xβ + ε. Therefore,

E[β̂] = E[(X′X)⁻¹X′(Xβ + ε)]

= E[β + (X′X)⁻¹X′ε] : but β is a constant, so

E[β̂] = β + (X′X)⁻¹X′E[ε] : but E[ε] = 0, so

E[β̂] = β + 0

E[β̂] = β

• The cov(β̂) is found by taking E[(β̂ − β)(β̂ − β)′]. This leads to the following:

cov(β̂) = E[(β + (X′X)⁻¹X′ε − β)(β + (X′X)⁻¹X′ε − β)′]

= E[((X′X)⁻¹X′ε)((X′X)⁻¹X′ε)′]

= E[(X′X)⁻¹X′εε′X(X′X)⁻¹]

= (X′X)⁻¹X′E[εε′]X(X′X)⁻¹

= (X′X)⁻¹X′σ²IX(X′X)⁻¹

= σ²(X′X)⁻¹X′X(X′X)⁻¹

cov(β̂) = σ²(X′X)⁻¹

• How do we get an estimate of σ²?

• We use the fitted residuals ε̂ and adjust for the appropriate degrees of freedom:

σ̂² = ε̂′ε̂ / (N − k)

where k is the number of right-hand side variables (including the constant term).

• Note: X′ε̂ = X′[y − Xβ̂] = X′y − X′Xβ̂ = X′y − X′X(X′X)⁻¹X′y = X′y − X′y = 0
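
• Continuing the simulated sketch from above (again illustrative, not from the notes), σ̂², the estimated covariance matrix, and the orthogonality result X′ε̂ = 0 can all be checked numerically:

import numpy as np

# Re-create the simulated example, then estimate sigma^2 and cov(beta_hat).
rng = np.random.default_rng(42)
N, k = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = X @ np.array([5.0, 1.2, -0.7]) + rng.normal(scale=2.0, size=N)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e_hat = y - X @ beta_hat
sigma2_hat = (e_hat @ e_hat) / (N - k)           # SSE / (N - k)
cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)    # sigma^2 (X'X)^{-1}

print(sigma2_hat)                    # near the true value 4.0 (= 2.0^2)
print(np.sqrt(np.diag(cov_hat)))     # estimated standard errors
print(np.allclose(X.T @ e_hat, 0))   # True: X'e_hat = 0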

5.4 The Properties of the OLS Estimator

• Having shown that E[β̂] = β and cov(β̂) = σ 2 (X 0 X)−1 we move to prove the Gauss-Markov

Theorem.

• The Gauss-Markov Theorem states that β̂ is BLUE: the Best Linear Unbiased Estimator. Our

estimator is the "best" because it has the minimum variance of all linear unbiased estimators.

• The proof is relatively straightforward.

• Consider another linear estimator β̃ = C′y where C is some [N × k] matrix.

• For E[β̃] = β it must be true that

E[β̃] = E[C′y] = E[C′(Xβ + ε)]

= E[C′Xβ + C′ε]

= C′Xβ

Thus, for E[β̃] = β it must be true that C′X = I.

• Now consider the following lemmas:

Lemma: β̃ = β̂ + [C′ − (X′X)⁻¹X′]y

Proof: β̃ = C′y, thus

β̃ = C′y + (X′X)⁻¹X′y − (X′X)⁻¹X′y

= β̂ + [C′ − (X′X)⁻¹X′]y

Lemma: β̃ = β̂ + [C′ − (X′X)⁻¹X′]ε

Proof: β̃ = β̂ + [C′ − (X′X)⁻¹X′]y, thus

β̃ = β̂ + [C′ − (X′X)⁻¹X′][Xβ + ε]

= β̂ + C′Xβ − β + [C′ − (X′X)⁻¹X′]ε

but C′X = I from before, so that

β̃ = β̂ + [C′ − (X′X)⁻¹X′]ε

• With these two lemmas we can continue to prove the Gauss-Markov theorem. We have determined

that both β̂ and β̃ are unbiased. Now, we must prove that cov(β̂) ≤ cov(β̃).

cov(β̃) = E[(β̃ − E[β̃])(β̃ − E[β̃])′]

Now, take advantage of our lemmas and that β̃ is unbiased to obtain

cov(β̃) = E[(β̂ + [C′ − (X′X)⁻¹X′]ε − β)(β̂ + [C′ − (X′X)⁻¹X′]ε − β)′]

= E[((X′X)⁻¹X′ε + [·]ε)((X′X)⁻¹X′ε + [·]ε)′] : [·] = [C′ − (X′X)⁻¹X′]

= σ²(X′X)⁻¹ + σ²[C′ − (X′X)⁻¹X′][C′ − (X′X)⁻¹X′]′

where the cross terms vanish: X′C = (C′X)′ = I implies E[(X′X)⁻¹X′εε′[·]′] = σ²[(X′X)⁻¹X′C − (X′X)⁻¹] = 0.

• The matrix [C′ − (X′X)⁻¹X′][C′ − (X′X)⁻¹X′]′ is positive semi-definite. This is the matrix

analogue of saying greater than or equal to zero.

• Thus, cov(β̂) ≤ cov(β̃) and β̂ is BLUE.
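
• A small Monte Carlo sketch (illustrative; the weighting scheme is an arbitrary choice, not from the notes) makes the theorem concrete: another linear unbiased estimator, here a weighted one with C′ = (X′WX)⁻¹X′W so that C′X = I, has a larger sampling variance than OLS under the full ideal conditions:

import numpy as np

# Compare OLS with another linear unbiased estimator built from
# arbitrary positive weights W != I. Both satisfy C'X = I (unbiased);
# under homoskedastic errors OLS should have the smaller variance.
rng = np.random.default_rng(1)
N, reps = 100, 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, 2.0])
W = np.diag(rng.uniform(0.2, 5.0, size=N))      # fixed arbitrary weights

A_ols = np.linalg.solve(X.T @ X, X.T)           # (X'X)^{-1} X'
A_alt = np.linalg.solve(X.T @ W @ X, X.T @ W)   # (X'WX)^{-1} X'W

b_ols = np.empty((reps, 2))
b_alt = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + rng.normal(size=N)
    b_ols[r] = A_ols @ y
    b_alt[r] = A_alt @ y

print(b_ols.mean(axis=0), b_alt.mean(axis=0))   # both near (1, 2): unbiased
print(b_ols.var(axis=0) <= b_alt.var(axis=0))   # [ True  True ]: OLS is best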

• Is β̂ consistent?

• Assume

lim N→∞ (1/N)(X′X) = Qxx, which is nonsingular

• Theorem: plimβ̂ = β.

• Proof: β̂ is asymptotically unbiased. That is, limN →∞ E[β̂] = β.

Rewrite cov(β̂) as

cov(β̂) = σ²(X′X)⁻¹ = (σ²/N) [(1/N)(X′X)]⁻¹

Then

lim N→∞ cov(β̂) = lim N→∞ (σ²/N) Qxx⁻¹ = 0

which implies that the covariance matrix of β̂ collapses to zero, and hence

plim β̂ = β
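
• Consistency can also be seen in simulation (an illustrative sketch): the sampling variance of the OLS slope shrinks roughly like 1/N, so β̂ collapses onto β as N grows:

import numpy as np

# The sampling variance of the OLS slope estimate falls as N grows.
rng = np.random.default_rng(2)
beta = np.array([1.0, 2.0])

for N in (100, 1000, 10000):
    slopes = []
    for _ in range(200):
        X = np.column_stack([np.ones(N), rng.normal(size=N)])
        y = X @ beta + rng.normal(size=N)
        slopes.append(np.linalg.solve(X.T @ X, X.T @ y)[1])
    print(N, np.var(slopes))   # variance shrinks roughly like 1/N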

5.5 Multiple Regression Example: The Price of Gasoline

• Some express concern that there might be price manipulation in the retail gasoline market. To

see if this is true, monthly price, tax, and cost data were gathered from the Energy Information

Administration (www.eia.gov) and the Tax Foundation (www.taxfoundation.org).

• Here is a time plot of the retail and wholesale price of gasoline (U.S. Average)

[Figure: time series of the U.S. average retail (gasprice) and wholesale (wprice) gasoline prices,

in cents per gallon (vertical axis, 0 to 250), plotted against the monthly observation index (obs).]

• Here are the results of a multiple regression analysis:

. reg allgradesprice fedtax avestatetax wholesaleprice obs

      Source |        SS       df       MS         Number of obs =     264
-------------+------------------------------       F(4, 259)     = 8917.25
       Model | 408989.997       4   102247.499     Prob > F      =  0.0000
    Residual | 2969.76256     259   11.4662647     R-squared     =  0.9928
-------------+------------------------------       Adj R-squared =  0.9927
       Total |  411959.76     263   1566.38692     Root MSE      =  3.3862

-------------------------------------------------------------------
      gasprice |  Coef.   Std. Err.     t    P>|t|   [95% Conf. Int]
---------------+---------------------------------------------------
        fedtax |  1.268     .159      7.94   0.000    .953    1.583
   avestatetax |   .725     .203      3.57   0.000    .325    1.125
wholesaleprice |  1.091     .011     92.62   0.000   1.068    1.115
           obs |   .033     .009      3.62   0.000    .015     .051
         _cons |  5.281    2.698      1.96   0.051   -.031   10.594
-------------------------------------------------------------------

• The dependent variable is measured in pennies per gallon, as are the tax and wholesale price variables; obs is a monthly time trend.

• The results suggest:

1. For every penny in federal tax, the retail gasoline price increases by 1.268 pennies.

2. For every penny in state sales tax the price increases by only 0.725 cents.

3. For every penny in wholesale price, the retail price increases by 1.091 pennies.

4. The time trend, which advances by one unit for every month starting in January 1985,

indicates that the average real price of gasoline increases by about 0.03 cents per gallon per

month, everything else equal.

5. The multiple regression results do not suggest a tremendous amount of pricing power on the

part of retail outlets.

6. The R² is very high; approximately 99.3% of the variation in retail gasoline prices is

explained by the variables included in the model (although it should be noted that the data

are time-series in nature and therefore a high R² is expected).

7. To return to the conspiracy theory that prices are actively manipulated by retailers, the

95% confidence interval of the wholesale price parameter is [1.068, 1.115]. At the maximum,

the historical pre-tax markup on marginal cost at the retail level is approximately 11.5%,

which is consistent with the rest of the retail sector.

8. One other conclusion is that wholesale price increases are associated with retail price

increases, and wholesale price decreases are associated with retail price decreases. Or is

this symmetric conclusion too strong for the given estimation?

• What if we defined a dummy variable that took a value of one when the wholesale price of gasoline

declined from one month to the next and included that as an additional regressor? If the retail

market reacts symmetrically to increases and decreases in the wholesale price, this dummy

variable should have an insignificant parameter. (A Python sketch of this construction appears at

the end of this example.)

. tsset obs
time variable: obs, 1 to 264

. gen wpdown = (wholesaleprice < l.wholesaleprice)

. sum wpdown

Variable | Obs Mean Std. Dev. Min Max


-------------+-----------------------------------------------
wpdown | 264 .4659091 .4997839 0 1

. reg allgradesprice fedtax avestatetax wprice wpdown obs

------------------------------------------------
gasprice | Coef. Std. Err. t P>|t|
-------------+----------------------------------
fedtax | 1.328 .132 10.00 0.000
avestatetax | .741 .168 4.39 0.000
wprice | 1.101 .009 112.04 0.000
wpdown | 3.777 .348 10.84 0.000
obs | .027 .007 3.63 0.000
_cons | 2.245 2.258 0.99 0.321
-------------------------------------------------

• The historical data suggest that the price of gasoline is 3.777 cents higher in months when the

wholesale price declines. This points to an asymmetric effect of wholesale price

changes on retail prices of gasoline.
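
• For reference, here is a hedged Python sketch of the same asymmetry check; the file name gasoline.csv and the column names (gasprice, fedtax, avestatetax, wprice, obs) are assumptions for illustration, not the actual data set used above:

import pandas as pd
import statsmodels.formula.api as smf

# Construct a dummy equal to one in months when the wholesale price fell,
# then re-run the retail price regression with the dummy included.
df = pd.read_csv("gasoline.csv").sort_values("obs")   # hypothetical file
df["wpdown"] = (df["wprice"] < df["wprice"].shift(1)).astype(int)

model = smf.ols("gasprice ~ fedtax + avestatetax + wprice + wpdown + obs",
                data=df).fit()
print(model.summary())
# A positive, significant wpdown coefficient indicates retail prices sit
# higher in months when the wholesale price falls, i.e., the asymmetry.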

5.6 Multiple Regression Example: Software Piracy and Economic Freedom

• Many firms in information-providing industries are anxious about software piracy. Various policy

suggestions have been made, and the industry is pursuing legal remedies against individuals.

• However, there might be economic influences on the prevalence of software piracy. Bezmen

and Depken (2006, Economics Letters) look at the impact of various socio-economic factors on

estimated software piracy rates in the United States in 1999, 2000, and 2001.

• Consider the simple regression model in which piracy is related to the natural log of per capita
income (income measured in thousands):

. reg piracy lninc

Regression with robust standard errors          Number of obs =    150
                                                F(1, 148)     =  38.48
                                                Prob > F      = 0.0000
                                                R-squared     = 0.2045
                                                Root MSE      = 8.3097

-----------------------------------------------------------------
        |             Robust
 Piracy |      Coef.  Std. Err.      t    P>|t|  [95% Conf. Interval]
--------+--------------------------------------------------------
  lninc |  -24.39808   3.932914  -6.20   0.000    -32.17  -16.62616
  _cons |   114.1777   13.55488   8.42   0.000  87.39163   140.9638
-----------------------------------------------------------------

• As expected, states with greater income levels have lower levels of software piracy.

• What if we include other factors such as Economic Freedom (from the Fraser Institute), the level

of taxation (from the Tax Foundation), unemployment (from the Bureau of Labor Statistics), and

two dummy variables to control for the years 2000 and 2001:

. reg piracy sfindex statetax lninc unemp yr00 yr01

Regression with robust standard errors          Number of obs =   150
                                                F(6, 143)     = 18.87
                                                Prob > F      = 0.0000
                                                R-squared     = 0.2887
                                                Root MSE      =  7.994
-------------------------------------------------------------------
| Robust
piracy | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+-----------------------------------------------------
sfindex | -2.652959 1.331309 -1.99 0.048 -5.284546 -.0213714
statetax | -1.914663 .6571426 -2.91 0.004 -3.213632 -.6156949
lninc | -24.83959 3.248853 -7.65 0.000 -31.26157 -18.41761
unemp | .2433874 .7928428 0.31 0.759 -1.323819 1.810594
yr00 | 1.568842 1.390505 1.13 0.261 -1.179758 4.317442
yr01 | 3.611868 1.611408 2.24 0.027 .4266101 6.797126
_cons | 150.7459 18.60083 8.10 0.000 113.9778 187.514
-------------------------------------------------------------------

• The parameter estimate on income did not change very much.

1. Because income enters in logs, a 1% increase in income corresponds with a reduction in the piracy rate of roughly 0.25 percentage points (see the worked calculation after this list).

2. States with greater economic freedom tend to pirate less.

3. States with greater taxation (which might proxy for enforcement efforts) tend to pirate less.

4. States with greater unemployment do not experience more piracy.

5. The parameter on yr01 suggests that piracy was greater in 2001 than in 2000 or 1999.
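
• Because income enters in logs while the piracy rate is measured in percentage points, the lninc coefficient is a semi-elasticity; a quick worked calculation using the estimate above:

Δpiracy ≈ β̂lninc × Δln(inc) = −24.84 × ln(1.01) ≈ −0.25

so each 1% rise in per capita income is associated with roughly a quarter of a percentage point less piracy.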

• The upshot: software piracy seems to be an inferior good.
