Introduction to Econometrics

Eco-20042
Lecture 3

Bivariate Regression Analysis

Motivation:
dependent = f( independent or explanatory variables)

e.g.
- defense expenditure = f(GNP)
- \( q_d = f(p_o, p_s, Y, \ldots) \)
- \( l_s = f(\text{wage}, \text{no. of kids}, \text{age of kids}, \ldots) \)

notation:

\( Y = f(X_1, X_2, \ldots) \)

Note: writing Y = f(X) does NOT by itself imply causation; the direction of causation comes from theory

Reminder: objectives of the exercise
- Estimate the mean value of Y for a given X, i.e. \( E(Y \mid X) \)
e.g. mean sales if advertising is 10k
- Test hypotheses suggested by theory
e.g. does advertising affect sales?
- Predict Y
e.g. if advertising increased by 10%, what would happen to sales?

Population Regression Function (PRF)

Example: Law of demand

Y: quantity demanded
X: price
N=55 - assume this is the population

The demand schedule for Widgets

Price (X)   Quantity Demanded (Y)         Number of consumers   Average Y demanded
    1       45, 46, 47, 48, 49, 50, 51             7                    48
    2       44, 45, 46, 47, 48                     5                    46
    3       40, 42, 44, 46, 48                     5                    44
    4       35, 38, 42, 44, 46, 47                 6                    42
    5       36, 39, 40, 42, 43                     5                    40
    6       32, 35, 37, 38, 39, 42, 43             7                    38
    7       32, 34, 36, 38, 40                     5                    36
    8       31, 32, 33, 34, 35, 36, 37             7                    34
    9       28, 30, 32, 34, 36                     5                    32
   10       29, 30, 31                             3                    30
Total                                             55



[Figure: Scattergram of Price and Quantity. Quantity (Y) plotted against price (X), with the Population Regression Line (PRL).]

PRL: gives the average (mean) Y for each level of X.
Mathematically:

\[ E(Y \mid X_i) = B_1 + B_2 X_i \qquad (1) \]

(1) is the Population Regression Function (PRF),

i.e. the line that passes through the conditional means of Y.

\( B_1 \) and \( B_2 \) are the parameters of the PRF.

Stochastic Population Regression Function
Not all points lie on the PRL:

\[ Y_i = B_1 + B_2 X_i + u_i \]
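
As a quick check on the idea that the PRL passes through the conditional means, the sketch below recomputes the means from the widget demand schedule and tests whether they fall on a straight line. The values B1 = 50 and B2 = -2 are not stated in the lecture; they are simply what the table implies. The function name `conditional_means` is illustrative.

```python
# Demand schedule from the lecture table: price -> quantities demanded
schedule = {
    1: [45, 46, 47, 48, 49, 50, 51],
    2: [44, 45, 46, 47, 48],
    3: [40, 42, 44, 46, 48],
    4: [35, 38, 42, 44, 46, 47],
    5: [36, 39, 40, 42, 43],
    6: [32, 35, 37, 38, 39, 42, 43],
    7: [32, 34, 36, 38, 40],
    8: [31, 32, 33, 34, 35, 36, 37],
    9: [28, 30, 32, 34, 36],
    10: [29, 30, 31],
}


def conditional_means(table):
    """Return {X: mean of Y given X} for a price -> quantities table."""
    return {x: sum(ys) / len(ys) for x, ys in table.items()}


means = conditional_means(schedule)

# Slope and intercept implied by the first two conditional means
B2 = means[2] - means[1]
B1 = means[1] - B2 * 1

# Check that every conditional mean lies on the line B1 + B2 * X
on_line = all(abs(means[x] - (B1 + B2 * x)) < 1e-9 for x in means)
print(B1, B2, on_line)
```

For this population the conditional means fall exactly on one line, which is why a linear PRF is a natural description here.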


- \( u_i \) accounts for the fact that individual values of Y are not all equal to the mean value.
- \( u_i \) is the stochastic, or random, error term; it is a random variable.

Properties of \( u_i \):

The error may represent
- variables not included in the model
e.g. income, prices of other goods
- inherent randomness in behaviour
- measurement error
- the principle of parsimony (keeping the model simple)

Sample Regression Function
Generated from a sample of the population:

\[ Y_i = b_1 + b_2 X_i + e_i \]

\( e_i \) is the residual, the estimator of \( u_i \).
\( b_1 \) is the estimator of \( B_1 \).
\( b_2 \) is the estimator of \( B_2 \).
[Figure: Regression Lines from two Samples. Quantity (Y) against Price (X), showing Sample 1, Sample 2 and the SRL for each sample.]

Digression: Linearity

Models need not be linear in the variables, e.g.

\[ E(Y) = B_1 + B_2 X_i^2 \]

\[ E(Y) = B_1 + B_2 \frac{1}{X_i} \]

can be estimated using regression, but NOT models that are non-linear in the parameters, e.g.

\[ E(Y) = B_1 + B_2^2 X_i \]
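
To illustrate why non-linearity in the variables is harmless, the sketch below fits a model of the form E(Y) = B1 + B2 X^2 by first building the transformed regressor Z = X^2 and then running an ordinary bivariate regression of Y on Z. The data are hypothetical and noise-free (generated with B1 = 3, B2 = 0.5), and the helper `ols` uses the standard bivariate least-squares formulas covered in this lecture.

```python
def ols(x, y):
    """Bivariate least squares: return (intercept, slope) of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: Y is quadratic in X but linear in the parameters
X = [1, 2, 3, 4, 5, 6]
Y = [3 + 0.5 * x ** 2 for x in X]

# Transform the variable, then regress Y on Z = X^2
Z = [x ** 2 for x in X]
b1, b2 = ols(Z, Y)
print(round(b1, 6), round(b2, 6))
```

No such transformation of the data turns \( B_2^2 X_i \) into a term that is linear in \( B_2 \), which is why that case falls outside linear regression.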



Estimation of parameters when we have one sample: OLS
How do we find the line?

[Figure: Sales v Advertising. Scatter plot of Sales against Advertising.]

[Figure: Sample Regression for Widget Demand. Quantity (Y) against Price (X), showing qs3 and the SRL for sample 3.]

\[ Y_i = b_1 + b_2 X_i + e_i \]
or
\[ Y_i = \hat{Y}_i + e_i \]
where
\[ \hat{Y}_i = b_1 + b_2 X_i \]

\[ e_i = Y_i - \hat{Y}_i \]
so
\[ e_i = Y_i - b_1 - b_2 X_i \]

Choose \( b_1 \) and \( b_2 \) so as to minimize the residual sum of squares:

\[ \text{minimize } \sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2 \]

Solve using calculus to get:

\[ b_1 = \bar{Y} - b_2 \bar{X} \]

\[ b_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \]



Q (Y)  P (X)     x      y      y^2     x^2      xy   predicted Y        e       e^2         eX
  49      1   -4.5   11.2   125.44   20.25   -50.4     47.5091    1.4909    2.2228    1.490909
  45      2   -3.5    7.2    51.84   12.25   -25.2     45.3515   -0.3515    0.1236   -0.70303
  44      3   -2.5    6.2    38.44    6.25   -15.5     43.1939    0.8061    0.6497    2.418182
  39      4   -1.5    1.2     1.44    2.25    -1.8     41.0364   -2.0364    4.1468   -8.14545
  38      5   -0.5    0.2     0.04    0.25    -0.1     38.8788   -0.8788    0.7723   -4.39394
  37      6    0.5   -0.8     0.64    0.25    -0.4     36.7212    0.2788    0.0777    1.672727
  34      7    1.5   -3.8    14.44    2.25    -5.7     34.5636   -0.5636    0.3177   -3.94545
  33      8    2.5   -4.8    23.04    6.25   -12       32.4061    0.5939    0.3528    4.751515
  30      9    3.5   -7.8    60.84   12.25   -27.3     30.2485   -0.2485    0.0617   -2.23636
  29     10    4.5   -8.8    77.44   20.25   -39.6     28.0909    0.9091    0.8264    9.090909
sum
 378     55    0      0     393.6    82.5   -178      378         0         9.551515  0

mean Y = 378/10 = 37.8
mean X = 55/10 = 5.5
b2 = -2.15758
b1 = 49.66667

In this example:

\[ b_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{-178}{82.5} = -2.1576 \]

\[ b_1 = \bar{Y} - b_2 \bar{X} = 37.8 - (-2.1576)(5.5) = 49.667 \]

So
\[ \hat{Y}_i = 49.667 - 2.1576 X_i \]

Interpretation:
\( b_2 \): ceteris paribus, if price goes up by $1, mean quantity falls by 2.16 units
\( b_1 \): if price were zero, mean quantity would be 49.7 units (often the intercept has no economic meaning)
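
The arithmetic of this example can be reproduced in a few lines. The sketch below applies the deviation-form OLS formulas to the ten (price, quantity) pairs from the worked table; the variable names are illustrative and nothing here goes beyond the lecture's own formulas.

```python
# Sample from the worked table: quantity (Y) at prices 1..10 (X)
Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = list(range(1, 11))

n = len(X)
x_bar = sum(X) / n          # 5.5
y_bar = sum(Y) / n          # 37.8

# Deviations from the sample means (lower-case x and y in the slides)
x = [xi - x_bar for xi in X]
y = [yi - y_bar for yi in Y]

sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # about -178
sum_x2 = sum(xi ** 2 for xi in x)               # 82.5

b2 = sum_xy / sum_x2        # slope
b1 = y_bar - b2 * x_bar     # intercept

print(round(b1, 4), round(b2, 4))
```

Changing `X` and `Y` to another sample drawn from the demand schedule gives a different line, which is the sampling variability discussed in the rest of the lecture.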

Note:
- The OLS line passes through the sample mean values of X and Y
- \( \text{mean}(e) = \sum e_i / n = 0 \)
- residuals and explanatory variables are uncorrelated: \( \sum e_i X_i / n = 0 \)
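
These three properties can be checked numerically on the widget sample. The sketch below refits the line and evaluates each property; the results are zero only up to floating-point rounding, so the comparisons use a small tolerance (the `TOL` value is an arbitrary choice).

```python
Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = list(range(1, 11))
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
      / sum((xi - x_bar) ** 2 for xi in X))
b1 = y_bar - b2 * x_bar

# Residuals e_i = Y_i - b1 - b2 * X_i
e = [yi - b1 - b2 * xi for xi, yi in zip(X, Y)]

TOL = 1e-8
passes_through_means = abs((b1 + b2 * x_bar) - y_bar) < TOL
mean_residual_zero = abs(sum(e) / n) < TOL
residuals_uncorrelated_with_x = abs(sum(ei * xi for ei, xi in zip(e, X)) / n) < TOL

print(passes_through_means, mean_residual_zero, residuals_uncorrelated_with_x)
```

All three hold by construction for any OLS fit with an intercept; they follow from the first-order conditions of the minimisation.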


Hypothesis Testing

Remember, so far we have:
Stochastic Population Regression: \( Y_i = B_1 + B_2 X_i + u_i \)
Sample Regression: \( Y_i = b_1 + b_2 X_i + e_i \)


For the example of widget demand the estimated regression was:
\[ \hat{Y}_i = 49.667 - 2.1576 X_i \]
The estimates \( b_1 \) and \( b_2 \) will differ with each sample, so there is a probability distribution associated with them.



Assumptions of the Classical Linear Regression Model
1 The explanatory variable(s) X is uncorrelated with the disturbance term u.

2 The expected, or mean, value of the disturbance term u is zero: \( E(u_i) = 0 \),
i.e. on average the error term u has no effect on Y.

3 The variance of each \( u_i \) is constant, or homoscedastic: \( \mathrm{var}(u_i) = \sigma^2 \),
i.e. the conditional distribution of each Y population corresponding to a given X has the same variance. The alternative is that we have heteroscedasticity, or unequal variance.

4 There is no correlation between two error terms (no autocorrelation): \( \mathrm{cov}(u_i, u_j) = 0 \) for \( i \neq j \),
i.e. there is no systematic relationship between two error terms. If one u is above the mean value, the other error need not also be above (or below) the mean. The error terms \( u_i \) are random.


Remember:
OLS estimates are random variables: their values will change from sample to sample.

\[ b_1 = \bar{Y} - b_2 \bar{X} \]

\[ b_2 = \frac{\sum x_i y_i}{\sum x_i^2} \]

The variance, or standard error, of the estimates tells us something about the sampling variability of the estimates.



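Formula:

For the relationship \( Y_i = b_1 + b_2 X_i \) we have

\[ \mathrm{Var}(b_1) = \frac{\sum X_i^2}{n \sum x_i^2}\, \sigma^2 \]

\[ \mathrm{Se}(b_1) = \sqrt{\mathrm{var}(b_1)} \]

\[ \mathrm{Var}(b_2) = \frac{\sigma^2}{\sum x_i^2} \]

\[ \mathrm{Se}(b_2) = \sqrt{\mathrm{var}(b_2)} \]

\[ \hat{\sigma}^2 = \frac{\sum e_i^2}{n - 2} = \frac{RSS}{d.f.} \]

\[ \hat{\sigma} = \sqrt{\hat{\sigma}^2} \]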



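Estimator                  Formula                                                                                          Result
\( \hat{\sigma}^2 \)       \( \frac{\sum e_i^2}{n - 2} = \frac{9.5515}{8} \)                                                1.1939
\( \hat{\sigma} \)         \( \sqrt{\hat{\sigma}^2} = \sqrt{1.1939} \)                                                      1.0927
\( \mathrm{Var}(b_1) \)    \( \frac{\sum X_i^2 \, \hat{\sigma}^2}{n \sum x_i^2} = \frac{(385)(1.1939)}{10(82.5)} \)         0.5572
\( \mathrm{Se}(b_1) \)     \( \sqrt{\mathrm{var}(b_1)} = \sqrt{0.5572} \)                                                   0.7464
\( \mathrm{Var}(b_2) \)    \( \frac{\hat{\sigma}^2}{\sum x_i^2} = \frac{1.1939}{82.5} \)                                    0.0145
\( \mathrm{Se}(b_2) \)     \( \sqrt{\mathrm{var}(b_2)} = \sqrt{0.0145} \)                                                   0.1203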
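
The entries in this table follow mechanically from the residuals. The sketch below recomputes them for the widget sample using only the formulas given in the lecture; variable names such as `sigma2_hat` are illustrative.

```python
import math

Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = list(range(1, 11))
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)                      # sum of x_i^2 = 82.5
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar

# Residual sum of squares and the estimate of the error variance
rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y))
sigma2_hat = rss / (n - 2)            # n - 2 degrees of freedom
sigma_hat = math.sqrt(sigma2_hat)

# Variances and standard errors of the OLS estimators
var_b1 = sum(xi ** 2 for xi in X) * sigma2_hat / (n * sxx)
var_b2 = sigma2_hat / sxx
se_b1 = math.sqrt(var_b1)
se_b2 = math.sqrt(var_b2)

print(round(sigma2_hat, 4), round(se_b1, 4), round(se_b2, 4))
```

The standard errors agree with the Std. Err. column of the Stata output for this regression (.7464394 and .1202996).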


\[ \hat{Y}_i = 49.6670 - 2.1576 X_i \]
\[ \mathrm{se} = (0.7464) \quad (0.1203) \]

This tells us that the slope coefficient is -2.1576 and that its standard error is 0.1203, which is a measure of the variability of \( b_2 \) from sample to sample.


Hypothesis Testing
Suppose someone suggests that price has no effect on the quantity demanded. The null hypothesis is that
\[ H_0: B_2 = 0 \]
This hypothesis is in effect a straw man. If sustained, it says that there is no relationship between Y and X to begin with.


If X belongs to the model, one would expect to reject the null hypothesis \( H_0 \) in favour of the alternative hypothesis \( H_1 \), which says \( B_2 \) is different from zero.

\[ H_1: B_2 \neq 0 \]

Remember: we can't simply look at the numerical value of \( b_2 \), because this value is random and will vary from sample to sample. A formal test is required.




Two approaches:
- The confidence interval approach
- The test of significance approach
Both can be used to test any hypothesis about \( B_2 \) as well as \( B_1 \).


General Testing issues
In particular, we know that \( b_2 \) follows the normal distribution, because \( b_2 \) is simply a linear function of u, which is a normally distributed random variable.
If \( b_2 \) is distributed as \( N(B_2, \sigma^2_{b_2}) \), then

\[ Z = \frac{b_2 - B_2}{\mathrm{se}(b_2)} = \frac{b_2 - B_2}{\sigma / \sqrt{\sum x_i^2}} \sim N(0, 1) \]


This allows us to calculate the probability of \( b_2 \) lying within a given range of \( B_2 \).

Problem
We don't know the true \( \sigma \), but we can replace it with its estimator \( \hat{\sigma} \).

If we replace \( \sigma \) with \( \hat{\sigma} \), then

\[ \frac{b_2 - B_2}{\hat{\sigma} / \sqrt{\sum x_i^2}} \sim t_{n-2} \]


The confidence interval approach
Assume that the level of significance \( \alpha \), the probability of committing a type I error, is fixed at 5%.
From the t table, we find that with 8 d.f. \( P(-2.306 \le t \le 2.306) = 0.95 \)

The probability that a t value (for 8 d.f.) lies between the limits (-2.306, 2.306) is 0.95 or 95%.

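These are the critical t values.

Substituting, we have
\[ P\left( -2.306 \le \frac{b_2 - B_2}{\hat{\sigma} / \sqrt{\sum x_i^2}} \le 2.306 \right) = 0.95 \]

\[ P\left( b_2 - 2.306 \frac{\hat{\sigma}}{\sqrt{\sum x_i^2}} \le B_2 \le b_2 + 2.306 \frac{\hat{\sigma}}{\sqrt{\sum x_i^2}} \right) = 0.95 \]

Or more generally:
\[ P[\, b_2 - 2.306\, \mathrm{se}(b_2) \le B_2 \le b_2 + 2.306\, \mathrm{se}(b_2) \,] = 0.95 \]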

Which provides us with the 95% confidence interval for \( B_2 \).

For our example:
\[ -2.1576 - 2.306(0.1203) \le B_2 \le -2.1576 + 2.306(0.1203) \]
\[ -2.4350 \le B_2 \le -1.8802 \]

Because this range does not include the null-hypothesized value of 0, we can reject the null hypothesis that price has no effect on quantity demanded.

Check the confidence interval for \( B_1 \)
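
One way to carry out this check is sketched below. It recomputes both 95% intervals from the widget sample, taking the critical value 2.306 (8 d.f.) from the t table as in the lecture rather than computing it. The helper `conf_interval` is an illustrative name.

```python
import math

Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = list(range(1, 11))
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b1 = y_bar - b2 * x_bar

sigma2_hat = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y)) / (n - 2)
se_b1 = math.sqrt(sum(xi ** 2 for xi in X) * sigma2_hat / (n * sxx))
se_b2 = math.sqrt(sigma2_hat / sxx)

T_CRIT = 2.306  # two-tailed 5% critical value for 8 d.f., from the t table


def conf_interval(estimate, se, t_crit=T_CRIT):
    """Return (lower, upper) bounds: estimate -/+ t_crit * se."""
    return estimate - t_crit * se, estimate + t_crit * se


ci_b2 = conf_interval(b2, se_b2)
ci_b1 = conf_interval(b1, se_b1)
print([round(v, 4) for v in ci_b2], [round(v, 4) for v in ci_b1])
```

Neither interval contains zero, so both the slope and the intercept are statistically different from zero at the 5% level in this sample.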



The test of significance approach to hypothesis testing:
Here the decision to accept or reject \( H_0 \) is made on the basis of the value of the test statistic obtained from the sample data.
In particular, we know that
\[ t = \frac{b_2 - B_2}{\mathrm{se}(b_2)} \]
follows a t distribution with n - 2 d.f.

Let \( H_0: B_2 = B_2^* \), where \( B_2^* \) is a specific numerical value of \( B_2 \); then
\[ t = \frac{b_2 - B_2^*}{\mathrm{se}(b_2)} \]
may be interpreted as the test statistic, which follows a t distribution with n - 2 d.f.


The test requires three pieces of information
- The d.f.: always n - 2 for bivariate regression
- The level of significance: conventionally set at 1%, 5% or 10%
- Whether to use a one-tailed or a two-tailed test

Two-tailed test
\( H_0: B_2 = 0 \)
\( H_1: B_2 \neq 0 \)
Using the formula we have

\[ t = \frac{-2.1576 - 0}{0.1203} \approx -17.94 \]
with 10 - 2 = 8 d.f.

Level of significance    0.01     0.05     0.1
critical t: |t*|         3.355    2.306    1.860



Compare the calculated t value with the critical value, say at the 0.01 level:
|-17.94| > 3.355
Hence reject the null hypothesis that \( B_2 = 0 \) in favour of the alternative.

One-tailed test:
\( H_0: B_2 \ge 0 \)
\( H_1: B_2 < 0 \) (left-sided test)

We already know t = -17.94

Level of significance    0.01      0.05      0.1
critical t: t*          -2.896    -1.860    -1.397

Compare the calculated t value with the critical value, say at the 0.01 level:
-17.94 < -2.896
Hence reject the null hypothesis that \( B_2 \ge 0 \) in favour of the alternative, i.e. the price coefficient is negative, as expected.
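
The two tests above can be sketched in code as follows. The critical values (3.355 two-tailed and -2.896 left-tailed, both at the 1% level with 8 d.f.) are copied from the lecture's tables rather than computed, and the variable names are illustrative.

```python
import math

Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = list(range(1, 11))
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b1 = y_bar - b2 * x_bar
sigma2_hat = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y)) / (n - 2)
se_b2 = math.sqrt(sigma2_hat / sxx)

B2_NULL = 0.0                      # hypothesised value B2* under H0
t_stat = (b2 - B2_NULL) / se_b2    # follows t with n - 2 d.f. under H0

# Critical values at the 1% level for 8 d.f., taken from the t table
T_TWO_TAILED = 3.355
T_LEFT_TAILED = -2.896

reject_two_tailed = abs(t_stat) > T_TWO_TAILED    # H1: B2 != 0
reject_left_tailed = t_stat < T_LEFT_TAILED       # H1: B2 < 0

print(round(t_stat, 2), reject_two_tailed, reject_left_tailed)
```

Setting `B2_NULL` to another value tests a different point hypothesis with the same statistic.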
We have looked at tests on the coefficients; now we look at some other tests.
How good is the fitted regression line overall?
This is measured by \( r^2 \): the coefficient of determination.
How can this be computed?
\[ Y_i = \hat{Y}_i + e_i \]
\[ (Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + e_i \]
\[ y_i = \hat{y}_i + e_i \]
or: total deviation of \( Y_i \) from the mean = explained deviation + unexplained deviation



Squaring and summing gives, with some manipulation,

\[ \sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2 \]

or: total variation in Y about its mean (TSS) = explained variation in Y (ESS) + unexplained variation in Y, or residual sum of squares (RSS)

i.e. TSS = ESS + RSS

\[ 1 = \frac{ESS}{TSS} + \frac{RSS}{TSS} \]



Let
\[ r^2 = \frac{ESS}{TSS} \]
be the coefficient of determination; then

\[ r^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2} \]

Note: \( 0 \le r^2 \le 1 \)

Example:
\[ r^2 = 1 - \frac{9.5515}{393.60} = 0.9757 \]

i.e. approximately 98% of the variation in Y (Quantity) is explained by the regression, in this case by the variable X (Price).
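
The decomposition and the resulting \( r^2 \) can be verified directly on the widget sample. The sketch below computes TSS, ESS and RSS separately and then forms \( r^2 \) both ways; the names are illustrative.

```python
Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = list(range(1, 11))
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
      / sum((xi - x_bar) ** 2 for xi in X))
b1 = y_bar - b2 * x_bar
y_hat = [b1 + b2 * xi for xi in X]

tss = sum((yi - y_bar) ** 2 for yi in Y)                 # total variation
ess = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
rss = sum((yi - yh) ** 2 for yi, yh in zip(Y, y_hat))    # unexplained variation

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss

print(round(tss, 4), round(ess, 4), round(rss, 4), round(r2_from_rss, 4))
```

The three sums match the Total, Model and Residual rows of the Stata analysis-of-variance table for this regression.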


Note: sample correlation: r

\[ r = \pm\sqrt{r^2} \]

so here \( r = -\sqrt{0.9757} = -0.9878 \)

The sign is determined from the graph, the estimated slope coefficient etc.


Test on Overall Model: \( R^2 = 0 \)

\( H_0: R^2 = 0 \), i.e. no explanatory power in the model
\( H_1: R^2 > 0 \)

i.e. the null is that the variables together have no effect on Y (here we only have one variable)


We can show that

\[ F = \frac{ESS / (2 - 1)}{RSS / (n - 2)} \sim F(1, n - 2) \]

If ESS is large and RSS is small then F gets big: reject \( H_0 \)

Also
\[ F = \frac{R^2 / (2 - 1)}{(1 - R^2) / (n - 2)} \]

if \( R^2 = 0 \), F = 0

if \( R^2 = 1 \), \( F = \infty \)



Using our example:

\[ F = \frac{0.9757 / 1}{(1 - 0.9757) / 8} = 321.218 \]

5% critical value F(1,8) = 5.32 from tables

F > CV, so reject \( H_0: R^2 = 0 \)
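
The sketch below computes F both from the sums of squares and from the rounded \( R^2 = 0.9757 \). The small gap between the two results (about 321.66 versus 321.218) comes only from rounding \( R^2 \) to four decimals. It also checks the bivariate identity \( F = t^2 \) for the slope, a standard result that the slides do not state explicitly. The critical value 5.32 is taken from the F table as in the lecture.

```python
Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = list(range(1, 11))
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b1 = y_bar - b2 * x_bar

tss = sum((yi - y_bar) ** 2 for yi in Y)
rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y))
ess = tss - rss

# F from the sums of squares: (ESS / 1) / (RSS / (n - 2))
f_stat = (ess / 1) / (rss / (n - 2))

# F from R^2 rounded to four decimals, as on the slide
R2_ROUNDED = 0.9757
f_from_rounded_r2 = (R2_ROUNDED / 1) / ((1 - R2_ROUNDED) / (n - 2))

# In bivariate regression, F equals the square of the slope's t statistic
t_stat = b2 / ((rss / (n - 2)) / sxx) ** 0.5

F_CRIT_5PCT = 5.32  # F(1, 8) critical value at 5%, from tables
reject_h0 = f_stat > F_CRIT_5PCT

print(round(f_stat, 2), round(f_from_rounded_r2, 3), reject_h0)
```

Either version of F is far above the critical value, so the conclusion does not depend on the rounding.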


Normality tests:

We assumed the errors are normally distributed, and all preceding tests are based on this assumption, so we need to check it.
Look at a histogram of the residuals to see whether they look normally distributed, or perform the Jarque-Bera test. We might come back to this later; there are too few observations here to show it properly.

Regression using Stata:




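. reg quantity price

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =  321.66
       Model |  384.048485     1  384.048485           Prob > F      =  0.0000
    Residual |  9.55151515     8  1.19393939           R-squared     =  0.9757
-------------+------------------------------           Adj R-squared =  0.9727
       Total |       393.6     9  43.7333333           Root MSE      =  1.0927

------------------------------------------------------------------------------
    quantity |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -2.157576   .1202996   -17.94   0.000    -2.434987   -1.880164
       _cons |   49.66667   .7464394    66.54   0.000     47.94537    51.38796
------------------------------------------------------------------------------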


Forecasting/Prediction:

Use the model to forecast the MEAN value of Y given some value of X

Let \( X = X_0 \), e.g. \( X_0 = 3 \)

We want \( E(Y \mid X_0 = 3) \)

\[ \hat{Y}_0 = 49.667 - 2.1576(3) = 43.194 \]

Now \( \hat{Y}_0 \neq Y_0 \): there exists forecasting error, so we need a distribution for \( \hat{Y}_0 \)

Mean: \( E(Y \mid X_0) = B_1 + B_2 X_0 \)



Variance:
\[ \mathrm{var}(\hat{Y}_0) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2} \right] \]

\( \sigma^2 \) is not known, so use \( \hat{\sigma}^2 \);

the standardised \( \hat{Y}_0 \) then follows the t distribution with n - 2 d.f., so we can generate a confidence interval:

\[ P\left[ (b_1 + b_2 X_0) - t_{\alpha/2}\, \mathrm{se}(\hat{Y}_0) \le B_1 + B_2 X_0 \le (b_1 + b_2 X_0) + t_{\alpha/2}\, \mathrm{se}(\hat{Y}_0) \right] = 1 - \alpha \]

Widget example:
\[ \mathrm{var}(\hat{Y}_0) = 1.1939 \left[ \frac{1}{10} + \frac{(3 - 5.5)^2}{82.50} \right] = 0.209844 \]

\[ \mathrm{se}(\hat{Y}_0) = 0.4581 \]

95% confidence interval, where the critical t value with 8 d.f. = 2.306:

\[ P[\, 43.194 - 2.306(0.4581) \le E(Y \mid X_0) \le 43.194 + 2.306(0.4581) \,] = 0.95 \]

or \( 42.138 \le E(Y \mid X_0) \le 44.250 \)

The CI grows as \( X_0 \) moves away from \( \bar{X} \), so one cannot extrapolate very far away from the mean or out of sample.
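
The mean forecast and its interval can be reproduced as follows. This is a sketch using the lecture's formula for var(Ŷ0) and the tabulated critical value 2.306; the function name `se_mean_forecast` is illustrative. The final lines compare the standard error at the sample mean of X with the standard error at the edge of the sample, to show how the interval widens.

```python
import math

Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = list(range(1, 11))
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b1 = y_bar - b2 * x_bar
sigma2_hat = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y)) / (n - 2)

T_CRIT = 2.306  # two-tailed 5% critical value for 8 d.f., from the t table


def se_mean_forecast(x0):
    """Standard error of the estimated mean of Y at X = x0."""
    return math.sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / sxx))


x0 = 3
y0_hat = b1 + b2 * x0
half_width = T_CRIT * se_mean_forecast(x0)
lower, upper = y0_hat - half_width, y0_hat + half_width
print(round(y0_hat, 3), round(lower, 3), round(upper, 3))

# The standard error is smallest at the sample mean of X and grows away from it
se_at_mean = se_mean_forecast(x_bar)
se_at_edge = se_mean_forecast(10)
print(round(se_at_mean, 4), round(se_at_edge, 4))
```

The results differ from the slide values only in the last digit shown, because the slides round \( b_1 \), \( b_2 \) and \( \hat{\sigma}^2 \) before substituting.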



[Figure: Sample Regression for Widget Demand. Quantity (Y) against Price (X), showing quantity, fitted values and the 80% CI around the fitted values, with the sample mean of X marked.]

Illustrative Examples:
1) estimate relationship between average wages and years of schooling;
sample of 13 observations




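. reg wage schooling

      Source |       SS       df       MS              Number of obs =      13
-------------+------------------------------           F(  1,    11) =  108.29
       Model |  95.4255181     1  95.4255181           Prob > F      =  0.0000
    Residual |   9.6928077    11  .881164337           R-squared     =  0.9078
-------------+------------------------------           Adj R-squared =  0.8994
       Total |  105.118326    12  8.75986048           Root MSE      =   .9387

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |   .7240967   .0695813    10.41   0.000     .5709492    .8772442
       _cons |  -.0144527   .8746238    -0.02   0.987    -1.939487    1.910581
------------------------------------------------------------------------------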


\[ \hat{Y}_i = -0.0144 + 0.7241 X_i \]

where Y is average hourly wage rate ($)
X is years of schooling

conclusions:
- if schooling goes up by 1 unit, i.e. 1 year, expect the average hourly wage to increase by approx. 72 cents
- the negative intercept has no particular economic interpretation
- consider t values, confidence intervals, \( R^2 \) etc.





2) Gujarati has data available on a clock auction which included information
on the price of the winning bid, age of clock and number of bidders.



Note: age of clock and number of bidders
How do we expect age of clock to affect winning bid?
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
 observation |        32        16.5    9.380832          1         32
       price |        32    1328.094    393.6495        729       2131
         age |        32     144.625    27.54556        108        194
    numbider |        32     9.53125    2.839632          5         15


Expected relationship: Price and Age. The older the clock, the higher the winning bid, so we expect a positive relationship.

[Figure: scatter plot of Price against Age, with fitted values.]



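\[ \hat{Y}_i = -183.04 + 10.45 X_i \]

where Y is price of winning bid
X is age of clock

. reg price age

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  1,    30) =   34.46
       Model |  2567947.25     1  2567947.25           Prob > F      =  0.0000
    Residual |  2235809.47    30  74526.9823           R-squared     =  0.5346
-------------+------------------------------           Adj R-squared =  0.5191
       Total |  4803756.72    31  154959.894           Root MSE      =     273

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   10.44866   1.780017     5.87   0.000     6.813378    14.08394
       _cons |  -183.0435   261.9194    -0.70   0.490    -717.9542    351.8672
------------------------------------------------------------------------------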

conclusions:
- if age goes up by 1 unit, i.e. 1 year, expect price to increase on average by $10.45
- \( R^2 \) is mid-range at 0.5346

What about number of bidders?

Expected relationship: Price and number of bidders. The more bidders, the higher the price, because a large number of bidders suggests the clock is valuable, so we expect a positive relationship.

[Figure: scatter plot of Price against NumBider, with fitted values.]



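\[ \hat{Y}_i = 807.95 + 54.57 X_i \]

where Y is price of winning bid
X is number of bidders

. reg price numbider

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  1,    30) =    5.50
       Model |  744444.914     1  744444.914           Prob > F      =  0.0258
    Residual |  4059311.81    30  135310.394           R-squared     =  0.1550
-------------+------------------------------           Adj R-squared =  0.1268
       Total |  4803756.72    31  154959.894           Root MSE      =  367.85

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    numbider |   54.57245   23.26605     2.35   0.026     7.056827    102.0881
       _cons |   807.9501   231.0921     3.50   0.001     335.9972    1279.903
------------------------------------------------------------------------------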

Conclusions:
- if the number of bidders goes up by 1 person, expect price to increase on average by $54.57
- Note: \( R^2 \) is low at 0.1550


Today:
We have explored how to estimate the best-fit line, and how to interpret and evaluate coefficients in a bivariate model, using:
- Hypothesis testing for coefficients (t-test, confidence intervals)
- Hypothesis testing for \( R^2 \) (F-test)
- How to predict, and how to see whether it is a good prediction
