
5 The Classical Model: Ordinary Least Squares

5.1 Introduction and Basic Assumptions

• We have developed the simple regression model in which we included only an intercept term and

one right-hand side variable.

• Often, however, this is rather naive: we may not be able to explain much of

the variation in Y, and we may have theoretical justification for including other variables.

• We have hinted already at the idea that we would include more than one right-hand side variable

in a model. Indeed, we often have any number of regressors on the right-hand side.

• To this end, we develop the OLS model with more than one right-hand side variable. To keep

things manageable, we will use matrix notation.

• We note that the econometric model must be linear in parameters:

yᵢ = β0 + β1X1i + β2X2i + · · · + βkXki + εᵢ

y = Xβ + ε

• We assume that y = [N × 1], X = [N × k] and includes a constant term (represented by a column

of ones), β = [k × 1], and ε = [N × 1].

• We can only include observations with fully defined values for each Y and X.

• The Full Ideal Conditions:

1. Model is linear in parameters

2. Explanatory variables (X) are fixed in repeated samples (non-stochastic)

3. X has rank k, where k < N.

4. εᵢ are independent and identically distributed.

5. All error terms have a zero mean: E[εᵢ] = 0.

6. All error terms have a constant variance: E[εε′] = σ²I.

7. #6 implies that the error terms have no covariance: E[εᵢεⱼ] = 0 ∀ i ≠ j.

• The linear estimator for β, denoted β̂, is found by minimizing the sum of squared errors over β:

SSE = Σ εᵢ² = Σ (yᵢ − Xᵢβ)²   (sums over i = 1, …, N)

where Xᵢ is the i-th row of X, or in matrix notation

SSE = ε′ε = (y − Xβ)′(y − Xβ).

5.2 Aside: Matrix Differentiation

• We know that ε′ε = y′y − 2β′X′y + β′X′Xβ.

• The second term is clearly linear in β since X′y is a k-element vector of known scalars, whereas

the third term is quadratic in β.

• Looking at the linear term, one could write it as

f(β) = a′β = a1β1 + a2β2 + · · · + akβk = β′a

where a = X′y.

• Taking partial derivatives with respect to each of the βi and arranging the results in a column

vector yields

∂(a′β)/∂β = ∂(β′a)/∂β = [a1, a2, a3, …, ak]′ = a

• For the linear term in the SSE, it immediately follows that

∂(2β′X′y)/∂β = 2X′y

• The quadratic term can be rewritten as β′Aβ where the matrix A is of known constants, i.e.,

X′X. For a three-parameter illustration, writing out the quadratic form (and using the symmetry aij = aji),

f(β) = [β1 β2 β3] [ a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ] [ β1 ; β2 ; β3 ]

= a11β1² + a22β2² + a33β3² + 2a12β1β2 + 2a13β1β3 + 2a23β2β3

• The vector of partial derivatives is then written as

∂f(β)/∂β = [ ∂f/∂β1 ; ∂f/∂β2 ; ∂f/∂β3 ]

= [ 2(a11β1 + a12β2 + a13β3) ; 2(a12β1 + a22β2 + a23β3) ; 2(a13β1 + a23β2 + a33β3) ]

= 2Aβ

• This result holds for any symmetric quadratic form; that is,

∂(β′Aβ)/∂β = 2Aβ

for any symmetric A. From our SSE, we have A = X′X, and substituting we obtain

∂(β′X′Xβ)/∂β = 2(X′X)β.
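
• As a quick numerical check (an illustrative Python/NumPy sketch, not part of the original derivation), one can compare the claimed gradient 2Aβ against a finite-difference approximation:

import numpy as np

# Verify d(b'Ab)/db = 2Ab for a symmetric A by comparing the analytic
# gradient with a central finite-difference approximation.
rng = np.random.default_rng(0)
k = 3
M = rng.normal(size=(k, k))
A = M.T @ M                       # symmetric, of the X'X form
b = rng.normal(size=k)

analytic = 2 * A @ b              # the claimed derivative

h = 1e-6
numeric = np.empty(k)
for i in range(k):
    e = np.zeros(k)
    e[i] = h
    numeric[i] = ((b + e) @ A @ (b + e) - (b - e) @ A @ (b - e)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-4))   # True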

5.3 Derivation of the OLS Estimator

• The minimization of the SSE leads to the first-order necessary conditions:

∂SSE/∂β = −2X′y + 2X′Xβ = 0

• This is the matrix analogue of the first-order conditions from the simple regression model. There

is one first-order condition for every parameter to be estimated.

• We solve these k first-order conditions by dividing by 2, taking X′y to the right-hand side and

solving for β.

• Unfortunately, we cannot divide when it comes to matrices, but we do have the matrix analogue

to division, the inverse matrix.

• Pre-multiply both sides of the resulting equation, X′Xβ = X′y, by (X′X)⁻¹ to obtain

(X′X)⁻¹(X′X)β = (X′X)⁻¹X′y

• The first two matrices on the left-hand side reduce to the identity matrix, à la A⁻¹A = I, which

can be suppressed, so that the estimator for β, denoted β̂, is

β̂ = (X′X)⁻¹X′y

• Note that the matrix-notation version of β̂ is very analogous to the scalar version derived in the

simple regression model.

• (X′X)⁻¹ is the matrix analogue of the denominator of the simple regression estimator β̂, Σxᵢ².

• Likewise, X′y is the matrix analogue of the numerator of the simple regression estimator β̂, Σxᵢyᵢ.

• Remember that β̂ is a vector of estimated parameters, not a scalar.
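
• As an illustration (a minimal NumPy sketch on simulated data; the numbers are not from these notes), β̂ = (X′X)⁻¹X′y can be computed directly:

import numpy as np

# Simulate data satisfying the full ideal conditions, then compute the
# OLS estimator by solving the normal equations (X'X) b = X'y.
rng = np.random.default_rng(42)
N, k = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])  # constant + 2 regressors
beta = np.array([5.0, 1.2, -0.7])
y = X @ beta + rng.normal(scale=2.0, size=N)

# np.linalg.solve is preferred to forming (X'X)^{-1} explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # a vector of k estimates, close to (5.0, 1.2, -0.7)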

• We look again at the first two moments of β̂ in matrix form: E[β̂] and cov(β̂).

• The expectation of β̂: E[β̂] = E[(X′X)⁻¹X′y] where y = Xβ + ε. Therefore,

E[β̂] = E[(X′X)⁻¹X′(Xβ + ε)]

= E[β + (X′X)⁻¹X′ε] : but β is a constant, so

E[β̂] = β + (X′X)⁻¹X′E[ε] : but E[ε] = 0, so

E[β̂] = β + 0

E[β̂] = β

• The cov(β̂) is found by taking E[(β̂ − β)(β̂ − β)′]. This leads to the following:

cov(β̂) = E[(β + (X′X)⁻¹X′ε − β)(β + (X′X)⁻¹X′ε − β)′]

= E[((X′X)⁻¹X′ε)((X′X)⁻¹X′ε)′]

= E[(X′X)⁻¹X′εε′X(X′X)⁻¹]

= (X′X)⁻¹X′E[εε′]X(X′X)⁻¹

= (X′X)⁻¹X′σ²IX(X′X)⁻¹

= σ²(X′X)⁻¹X′X(X′X)⁻¹

cov(β̂) = σ²(X′X)⁻¹

• How do we get an estimate of σ²?

• We use the fitted residuals ε̂ and adjust for the appropriate degrees of freedom:

σ̂² = ε̂′ε̂ / (N − k)

where k is the number of right-hand side variables (including the constant term).

• Note: X′ε̂ = X′[y − Xβ̂] = X′y − X′Xβ̂ = X′y − X′X(X′X)⁻¹X′y = X′y − X′y = 0
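
• Continuing the simulated sketch from above (again illustrative, not from the notes), σ̂², the estimated covariance matrix, and the orthogonality result X′ε̂ = 0 can all be checked numerically:

import numpy as np

# Re-create the simulated example, then estimate sigma^2 and cov(beta_hat).
rng = np.random.default_rng(42)
N, k = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = X @ np.array([5.0, 1.2, -0.7]) + rng.normal(scale=2.0, size=N)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e_hat = y - X @ beta_hat
sigma2_hat = (e_hat @ e_hat) / (N - k)           # SSE / (N - k)
cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)    # sigma^2 (X'X)^{-1}

print(sigma2_hat)                    # near the true value 4.0 (= 2.0^2)
print(np.sqrt(np.diag(cov_hat)))     # estimated standard errors
print(np.allclose(X.T @ e_hat, 0))   # True: X'e_hat = 0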

5.4 The Properties of the OLS Estimator

• Having shown that E[β̂] = β and cov(β̂) = σ 2 (X 0 X)−1 we move to prove the Gauss-Markov

Theorem.

• The Gauss-Markov Theorem states that β̂ is BLUE: the Best Linear Unbiased Estimator. Our

estimator is the "best" because it has the minimum variance of all linear unbiased estimators.

• The proof is relatively straightforward.

• Consider another linear estimator β̃ = C′y where C is some [N × k] matrix.

• For E[β̃] = β it must be true that

E[β̃] = E[C′y] = E[C′(Xβ + ε)]

= E[C′Xβ + C′ε]

= C′Xβ

Thus, for E[β̃] = β it must be true that C′X = I.

• Now consider the following lemmas:

Lemma: β̃ = β̂ + [C′ − (X′X)⁻¹X′]y

Proof: β̃ = C′y, thus

β̃ = C′y + (X′X)⁻¹X′y − (X′X)⁻¹X′y

= β̂ + [C′ − (X′X)⁻¹X′]y

Lemma: β̃ = β̂ + [C′ − (X′X)⁻¹X′]ε

Proof: β̃ = β̂ + [C′ − (X′X)⁻¹X′]y, thus

β̃ = β̂ + [C′ − (X′X)⁻¹X′][Xβ + ε]

= β̂ + C′Xβ − β + [C′ − (X′X)⁻¹X′]ε

but C′X = I from before, so that

β̃ = β̂ + [C′ − (X′X)⁻¹X′]ε

• With these two lemmas we can continue to prove the Gauss-Markov theorem. We have determined

that both β̂ and β̃ are unbiased. Now, we must prove that cov(β̂) ≤ cov(β̃).

cov(β̃) = E[(β̃ − E[β̃])(β̃ − E[β̃])′]

Now, take advantage of our lemmas and that β̃ is unbiased to obtain

cov(β̃) = E[(β̂ + [C′ − (X′X)⁻¹X′]ε − β)(β̂ + [C′ − (X′X)⁻¹X′]ε − β)′]

= E[((X′X)⁻¹X′ε + [·]ε)((X′X)⁻¹X′ε + [·]ε)′] : [·] = [C′ − (X′X)⁻¹X′]

= σ²(X′X)⁻¹ + σ²[C′ − (X′X)⁻¹X′][C′ − (X′X)⁻¹X′]′

where the cross terms vanish: X′C = (C′X)′ = I implies E[(X′X)⁻¹X′εε′[·]′] = σ²[(X′X)⁻¹X′C − (X′X)⁻¹] = 0.

• The matrix [C′ − (X′X)⁻¹X′][C′ − (X′X)⁻¹X′]′ is positive semi-definite. This is the matrix

analogue of saying greater than or equal to zero.

• Thus, cov(β̂) ≤ cov(β̃) and β̂ is BLUE.
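
• A small Monte Carlo sketch (illustrative; the weighting scheme is an arbitrary choice, not from the notes) makes the theorem concrete: another linear unbiased estimator, here a weighted one with C′ = (X′WX)⁻¹X′W so that C′X = I, has a larger sampling variance than OLS under the full ideal conditions:

import numpy as np

# Compare OLS with another linear unbiased estimator built from
# arbitrary positive weights W != I. Both satisfy C'X = I (unbiased);
# under homoskedastic errors OLS should have the smaller variance.
rng = np.random.default_rng(1)
N, reps = 100, 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, 2.0])
W = np.diag(rng.uniform(0.2, 5.0, size=N))      # fixed arbitrary weights

A_ols = np.linalg.solve(X.T @ X, X.T)           # (X'X)^{-1} X'
A_alt = np.linalg.solve(X.T @ W @ X, X.T @ W)   # (X'WX)^{-1} X'W

b_ols = np.empty((reps, 2))
b_alt = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + rng.normal(size=N)
    b_ols[r] = A_ols @ y
    b_alt[r] = A_alt @ y

print(b_ols.mean(axis=0), b_alt.mean(axis=0))   # both near (1, 2): unbiased
print(b_ols.var(axis=0) <= b_alt.var(axis=0))   # [ True  True ]: OLS is best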

• Is β̂ consistent?

• Assume

lim N→∞ (1/N)(X′X) = Qxx, which is nonsingular

• Theorem: plimβ̂ = β.

• Proof: β̂ is asymptotically unbiased. That is, limN →∞ E[β̂] = β.

Rewrite cov(β̂) as

cov(β̂) = σ²(X′X)⁻¹ = (σ²/N) [(1/N)(X′X)]⁻¹

Then

lim N→∞ cov(β̂) = lim N→∞ (σ²/N) Qxx⁻¹ = 0

which implies that the covariance matrix of β̂ collapses to zero, and hence

plim β̂ = β
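
• Consistency can also be seen in simulation (an illustrative sketch): the sampling variance of the OLS slope shrinks roughly like 1/N, so β̂ collapses onto β as N grows:

import numpy as np

# The sampling variance of the OLS slope estimate falls as N grows.
rng = np.random.default_rng(2)
beta = np.array([1.0, 2.0])

for N in (100, 1000, 10000):
    slopes = []
    for _ in range(200):
        X = np.column_stack([np.ones(N), rng.normal(size=N)])
        y = X @ beta + rng.normal(size=N)
        slopes.append(np.linalg.solve(X.T @ X, X.T @ y)[1])
    print(N, np.var(slopes))   # variance shrinks roughly like 1/N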

5.5 Multiple Regression Example: The Price of Gasoline

• Some express concern that there might be price manipulation in the retail gasoline market. To

see if this is true, monthly price, tax, and cost data were gathered from the Energy Information

Administration (www.eia.gov) and the Tax Foundation (www.taxfoundation.org).

• Here is a time plot of the retail and wholesale price of gasoline (U.S. Average)

[Figure: time series of the U.S. average retail (gasprice) and wholesale (wprice) gasoline prices,

in cents per gallon (vertical axis, 0 to 250), plotted against the monthly observation index (obs).]

• Here are the results of a multiple regression analysis:

. reg allgradesprice fedtax avestatetax wholesaleprice obs

      Source |        SS       df       MS         Number of obs =     264
-------------+------------------------------       F(4, 259)     = 8917.25
       Model | 408989.997       4   102247.499     Prob > F      =  0.0000
    Residual | 2969.76256     259   11.4662647     R-squared     =  0.9928
-------------+------------------------------       Adj R-squared =  0.9927
       Total |  411959.76     263   1566.38692     Root MSE      =  3.3862

-------------------------------------------------------------------
      gasprice |  Coef.   Std. Err.     t    P>|t|   [95% Conf. Int]
---------------+---------------------------------------------------
        fedtax |  1.268     .159      7.94   0.000    .953    1.583
   avestatetax |   .725     .203      3.57   0.000    .325    1.125
wholesaleprice |  1.091     .011     92.62   0.000   1.068    1.115
           obs |   .033     .009      3.62   0.000    .015     .051
         _cons |  5.281    2.698      1.96   0.051   -.031   10.594
-------------------------------------------------------------------

• The dependent variable is measured in pennies per gallon, as are the tax and wholesale price variables; obs is a monthly time trend.

• The results suggest:

1. For every penny in federal tax, the retail gasoline price increases by 1.268 pennies.

2. For every penny in state sales tax the price increases by only 0.725 cents.

3. For every penny in wholesale price, the retail price increases by 1.091 pennies.

4. The time trend, which advances by one unit for every month starting in January 1985,

indicates that the average real price of gasoline increases by about 0.03 cents per gallon per

month, everything else equal.

5. The multiple regression results do not suggest a tremendous amount of pricing power on the

part of retail outlets.

6. The R² is very high; approximately 99.3% of the variation in retail gasoline prices is

explained by the variables included in the model (although it should be noted that the data

are time-series in nature and therefore a high R² is expected).

7. To return to the conspiracy theory that prices are actively manipulated by retailers, the

95% confidence interval of the wholesale price parameter is [1.068, 1.115]. At the maximum,

the historical pre-tax markup on marginal cost at the retail level is approximately 11.5%,

which is consistent with the rest of the retail sector.

8. One other conclusion is that wholesale price increases are associated with retail price

increases, and wholesale price decreases are associated with retail price decreases. Or is

this symmetric conclusion too strong for the given estimation?

• What if we defined a dummy variable that took a value of one when the wholesale price of gasoline

declined from one month to the next and included that as an additional regressor? If the retail

market reacts symmetrically to increases and decreases in the wholesale price, this dummy

variable should have an insignificant parameter. (A Python sketch of this construction appears at

the end of this example.)

. tsset obs
time variable: obs, 1 to 264

. gen wpdown = (wholesaleprice < l.wholesaleprice)

. sum wpdown

Variable | Obs Mean Std. Dev. Min Max


-------------+-----------------------------------------------
wpdown | 264 .4659091 .4997839 0 1

. reg allgradesprice fedtax avestatetax wprice wpdown obs

------------------------------------------------
gasprice | Coef. Std. Err. t P>|t|
-------------+----------------------------------
fedtax | 1.328 .132 10.00 0.000
avestatetax | .741 .168 4.39 0.000
wprice | 1.101 .009 112.04 0.000
wpdown | 3.777 .348 10.84 0.000
obs | .027 .007 3.63 0.000
_cons | 2.245 2.258 0.99 0.321
-------------------------------------------------

• The historical data suggest that the price of gasoline is 3.777 cents higher in months when the

wholesale price declines. This points to an asymmetric effect of wholesale price

changes on retail prices of gasoline.
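
• For reference, here is a hedged Python sketch of the same asymmetry check; the file name gasoline.csv and the column names (gasprice, fedtax, avestatetax, wprice, obs) are assumptions for illustration, not the actual data set used above:

import pandas as pd
import statsmodels.formula.api as smf

# Construct a dummy equal to one in months when the wholesale price fell,
# then re-run the retail price regression with the dummy included.
df = pd.read_csv("gasoline.csv").sort_values("obs")   # hypothetical file
df["wpdown"] = (df["wprice"] < df["wprice"].shift(1)).astype(int)

model = smf.ols("gasprice ~ fedtax + avestatetax + wprice + wpdown + obs",
                data=df).fit()
print(model.summary())
# A positive, significant wpdown coefficient indicates retail prices sit
# higher in months when the wholesale price falls, i.e., the asymmetry.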

5.6 Multiple Regression Example: Software Piracy and Economic Freedom

• Many firms in information-providing industries are anxious about software piracy. Various policy

suggestions have been made, and the industry is pursuing legal remedies against individuals.

• However, there might be economic influences on the prevalence of software piracy. Bezmen

and Depken (2006, Economics Letters) look at the impact of various socio-economic factors on

estimated software piracy rates in the United States in 1999, 2000, and 2001.

• Consider the simple regression model in which piracy is related to the natural log of per capita
income (income measured in thousands):

. reg piracy lninc

Regression with robust standard errors          Number of obs =    150
                                                F(1, 148)     =  38.48
                                                Prob > F      = 0.0000
                                                R-squared     = 0.2045
                                                Root MSE      = 8.3097

-----------------------------------------------------------------
        |             Robust
 Piracy |      Coef.  Std. Err.      t    P>|t|  [95% Conf. Interval]
--------+--------------------------------------------------------
  lninc |  -24.39808   3.932914  -6.20   0.000    -32.17  -16.62616
  _cons |   114.1777   13.55488   8.42   0.000  87.39163   140.9638
-----------------------------------------------------------------

• As expected, states with greater income levels have lower levels of software piracy.

• What if we include other factors such as Economic Freedom (from the Fraser Institute), the level

of taxation (from the Tax Foundation), unemployment (from the Bureau of Labor Statistics), and

two dummy variables to control for the years 2000 and 2001:

. reg piracy sfindex statetax lninc unemp yr00 yr01

Regression with robust standard errors          Number of obs =   150
                                                F(6, 143)     = 18.87
                                                Prob > F      = 0.0000
                                                R-squared     = 0.2887
                                                Root MSE      =  7.994
-------------------------------------------------------------------
| Robust
piracy | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+-----------------------------------------------------
sfindex | -2.652959 1.331309 -1.99 0.048 -5.284546 -.0213714
statetax | -1.914663 .6571426 -2.91 0.004 -3.213632 -.6156949
lninc | -24.83959 3.248853 -7.65 0.000 -31.26157 -18.41761
unemp | .2433874 .7928428 0.31 0.759 -1.323819 1.810594
yr00 | 1.568842 1.390505 1.13 0.261 -1.179758 4.317442
yr01 | 3.611868 1.611408 2.24 0.027 .4266101 6.797126
_cons | 150.7459 18.60083 8.10 0.000 113.9778 187.514
-------------------------------------------------------------------

• The parameter estimate on income did not change very much.

1. Because income enters in logs, a 1% increase in income corresponds with a reduction in the piracy rate of roughly 0.25 percentage points (see the worked calculation after this list).

2. States with greater economic freedom tend to pirate less.

3. States with greater taxation (which might proxy for enforcement efforts) tend to pirate less.

4. States with greater unemployment do not experience more piracy.

5. The parameter on yr01 suggests that piracy was greater in 2001 than in 2000 or 1999.
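
• Because income enters in logs while the piracy rate is measured in percentage points, the lninc coefficient is a semi-elasticity; a quick worked calculation using the estimate above:

Δpiracy ≈ β̂lninc × Δln(inc) = −24.84 × ln(1.01) ≈ −0.25

so each 1% rise in per capita income is associated with roughly a quarter of a percentage point less piracy.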

• The upshot: software piracy seems to be an inferior good.
