1 - Regression: Model Building Methodology

Ch. 1
Regression: Model Building Methodology

Setyo Tri Wahyudi

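Introduction
Correlation:
A measure of the strength of the linear relationship between two variables, e.g. X1 and X2.
The coefficient lies between -1 and 1. In absolute value, a value near 0 means the variables are barely related (uncorrelated); a value of 1 means perfect correlation.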

Regression:
The process of building a mathematical model or function that can be used to predict or determine one variable from one or more other variables.

Types of Regression
Linear Regression
Linear regression is a relationship in which both the independent variable X and the dependent variable Y enter the model raised to the first power.

Linear regression is divided into:
1). Simple linear regression, with the functional form:
$Y = a + bX + e$
2). Multiple linear regression, with the functional form:
$Y = b_0 + b_1 X_1 + \dots + b_k X_k + e$

Functions 1) and 2) describe, respectively, a straight line (simple linear regression) and a plane (multiple linear regression).

Non-Linear Regression
Non-linear regression is a relationship or function in which variable X and/or variable Y can enter as a factor raised to some power.
Some forms of non-linear regression are:
1). Polynomial regression: regression on a single independent variable that appears with successive powers.
$Y = a + bX + cX^2$ (quadratic function)
$Y = a + bX + cX^2 + dX^3$ (cubic function)
$Y = a + bX + cX^2 + dX^3 + eX^4$ (quartic function)
$Y = a + bX + cX^2 + dX^3 + eX^4 + fX^5$ (quintic function), and so on.
2). Hyperbolic regression (reciprocal function)
In hyperbolic regression the independent variable X or the dependent variable Y appears in a denominator, so it is also called regression with a fractional or reciprocal function. One such functional form is:
$1/Y = a + bX$
3). Exponential regression
Exponential regression is regression in which the independent variable X appears as a power or exponent. Its functional form is:
$Y = a e^{bX}$

Simple vs. multiple regression
Simple: there are two variables in the model
- dependent variable: the variable to be predicted, usually called Y
- independent variable: the predictor or explanatory variable, usually called X
$Y = \beta_0 + \beta_1 X_1 + \varepsilon$

Multiple: there are two or more independent variables
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \varepsilon$
Evaluating the Regression Model
Testing the overall model:
$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
$H_a$: at least one of the regression coefficients is $\neq 0$

Significance tests for individual regression coefficients:
$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
$H_0: \beta_2 = 0$, $H_a: \beta_2 \neq 0$
$H_0: \beta_3 = 0$, $H_a: \beta_3 \neq 0$
...
$H_0: \beta_k = 0$, $H_a: \beta_k \neq 0$
Testing the Overall Model (F test)
$H_0: \beta_1 = \beta_2 = 0$
$H_a$: at least one of the regression coefficients is $\neq 0$

$MSR = \frac{SSR}{k}$, $MSE = \frac{SSE}{n - k - 1}$, $F = \frac{MSR}{MSE}$
ANOVA
Source             df   SS          MS        F       p
Regression          2   8189.723    4094.86   28.63   .000
Residual (Error)   20   2861.017    143.1
Total              22   11050.74

$F_{.01,2,20} = 5.85$
$F_{Cal} = 28.63 > 5.85$, reject $H_0$.
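As a quick check of the F test arithmetic, the sketch below recomputes MSR, MSE and F from the sums of squares in the ANOVA table. The variable names are illustrative, and the critical value 5.85 is simply the value quoted on the slide, not computed here.

```python
# Sums of squares and sizes taken from the ANOVA table.
ssr = 8189.723   # regression sum of squares (SSR)
sse = 2861.017   # residual (error) sum of squares (SSE)
n = 23           # number of observations (total df = n - 1 = 22)
k = 2            # number of independent variables

msr = ssr / k                # mean square regression
mse = sse / (n - k - 1)      # mean square error
f_cal = msr / mse            # calculated F statistic

f_crit = 5.85                # F(.01; 2, 20) as quoted on the slide
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_cal:.2f}")
print("reject H0" if f_cal > f_crit else "do not reject H0")
```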
Significance Test of the Regression Coefficients (t test)
$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
$H_0: \beta_2 = 0$, $H_a: \beta_2 \neq 0$

        Coefficients   Std Dev    t Stat   p
X1       0.0177        0.003146    5.63    .000
X2      -0.666         0.2280     -2.92    .008

$t_{.025,20} = 2.086$
$t_{Cal} = 5.63 > 2.086$, reject $H_0$.
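Each t statistic in the coefficient table is the coefficient divided by its standard deviation. The sketch below redoes that division and compares the absolute value with the tabulated critical value; the dictionary layout is only an illustration.

```python
# (coefficient, standard deviation) pairs from the coefficient table.
coefficients = {"X1": (0.0177, 0.003146), "X2": (-0.666, 0.2280)}
t_crit = 2.086   # t(.025, 20) as quoted on the slide

t_stats = {name: b / sd for name, (b, sd) in coefficients.items()}
for name, t in t_stats.items():
    decision = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"{name}: t = {t:.2f} -> {decision}")
```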
Residuals and Sum of Squares Error (SSE)
Observation   Y       $\hat{Y}$   $Y - \hat{Y}$   $(Y - \hat{Y})^2$
 1             43.0    42.466       0.534            0.285
 2             45.1    51.465      -6.365           40.517
 3             49.9    51.540      -1.640            2.689
 4             56.8    58.622      -1.822            3.319
 5             53.9    54.073      -0.173            0.030
 6             57.9    55.627       2.273            5.168
 7             54.9    62.991      -8.091           65.466
 8             58.0    85.702     -27.702          767.388
 9             59.0    48.495      10.505          110.360
10             63.4    61.124       2.276            5.181
11             59.5    68.265      -8.765           76.823
12             63.9    71.322      -7.422           55.092
13             59.7    65.602      -5.902           34.832
14             64.5    75.383     -10.883          118.438
15             76.0    65.442      10.558          111.479
16             89.5    82.772       6.728           45.265
17             82.5    77.659       4.841           23.440
18            101.0    87.187      13.813          190.799
19             84.9    89.356      -4.456           19.858
20            108.0    91.237      16.763          280.982
21            109.0    85.064      23.936          572.936
22             97.9   114.447     -16.547          273.815
23            120.0   112.460       7.540           56.854
SSE                                               2861.017
SSE and Standard Error of the Estimate
$S_e = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{\frac{2861}{23 - 2 - 1}} = 11.96$
where: n = number of observations
k = number of independent variables

ANOVA
Source             df   SS         MS       F       P
Regression          2   8189.7     4094.9   28.63   .000
Residual (Error)   20   2861.0     143.1                    (SSE)
Total              22   11050.7
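The standard error of the estimate is the square root of the mean square error. A minimal sketch of the calculation, using the SSE, n and k of this example:

```python
import math

sse = 2861.017   # residual sum of squares from the ANOVA table
n = 23           # number of observations
k = 2            # number of independent variables

s_e = math.sqrt(sse / (n - k - 1))   # standard error of the estimate
print(f"S_e = {s_e:.2f}")
```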
Coefficient of Determination ($R^2$)
$R^2 = \frac{SSR}{SS_{YY}} = \frac{8189.723}{11050.74} = .741$
$R^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{2861.017}{11050.74} = .741$

ANOVA
Source        df   SS         MS        F       p
Regression     2   8189.7     4094.89   28.63   .000     (SSR)
Total         22   11050.7                               ($SS_{YY}$)
Adjusted $R^2$
$R^2_{adj} = 1 - \frac{SSE / (n - k - 1)}{SS_{YY} / (n - 1)} = 1 - \frac{2861.017 / (23 - 2 - 1)}{11050.74 / (23 - 1)} = 1 - .285 = .715$

ANOVA
Source        df   SS         MS       F       p
Regression     2   8189.7     4094.9   28.63   .000
Total         22   11050.7                              ($SS_{YY}$, df = n - 1)
SSE has df = n - k - 1.
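The two expressions for $R^2$ and the adjusted $R^2$ can be checked numerically. This is a small sketch with illustrative variable names, using the sums of squares from the ANOVA table; the last line uses the equivalent form $1 - (1 - R^2)(n - 1)/(n - k - 1)$.

```python
ssr = 8189.723     # regression sum of squares
sse = 2861.017     # residual sum of squares
ss_yy = 11050.74   # total sum of squares
n, k = 23, 2       # observations, independent variables

r2_from_ssr = ssr / ss_yy
r2_from_sse = 1 - sse / ss_yy
r2_adj = 1 - (sse / (n - k - 1)) / (ss_yy / (n - 1))
r2_adj_alt = 1 - (1 - r2_from_sse) * (n - 1) / (n - k - 1)

print(f"R^2 = {r2_from_ssr:.3f} (from SSR), {r2_from_sse:.3f} (from SSE)")
print(f"adjusted R^2 = {r2_adj:.3f}")
```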
Model-Building
Stepwise Regression
Forward Selection
Backward Elimination
All Possible Regressions
Stepwise Regression
1. Perform k simple regressions and select the best as the initial model.
2. Evaluate each variable not in the model. If none meets the criterion, stop.
3. Add the best variable to the model; re-evaluate the previous variables, and drop any that are not significant.
4. Return to step 2.
Forward Selection
Like stepwise, except that variables are not re-evaluated after entering the model.
Backward Elimination
1. Start with the full model (all k predictors).
2. If all predictors are significant, stop.
3. Otherwise, eliminate the least significant predictor and return to step 2.
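Forward selection and backward elimination can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the entry/exit criterion is a fixed threshold on |t| (`t_crit`, a hypothetical parameter), the helper names are made up for this example, and the data are synthetic with only columns 0 and 2 actually affecting y.

```python
import numpy as np

def abs_t_stats(X, y):
    """|t| of each column of X in an OLS fit that includes an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return np.abs(beta / se)[1:]           # drop the intercept

def forward_selection(X, y, t_crit):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # |t| of each candidate when added to the current model
        score = {j: abs_t_stats(X[:, selected + [j]], y)[-1] for j in remaining}
        best = max(score, key=score.get)
        if score[best] < t_crit:
            break                          # no candidate meets the criterion
        selected.append(best)              # entered variables are not re-evaluated
        remaining.remove(best)
    return selected

def backward_elimination(X, y, t_crit):
    kept = list(range(X.shape[1]))
    while kept:
        t = abs_t_stats(X[:, kept], y)
        worst = int(np.argmin(t))
        if t[worst] >= t_crit:
            break                          # all remaining predictors are significant
        kept.pop(worst)                    # drop the least significant predictor
    return kept

# Synthetic data (assumption): y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

fwd = forward_selection(X, y, t_crit=4.0)
bwd = backward_elimination(X, y, t_crit=4.0)
print("forward selection keeps columns:", fwd)
print("backward elimination keeps columns:", bwd)
```

Full stepwise regression would add one more step to `forward_selection`: after each entry, re-check the variables already in the model and drop any whose |t| has fallen below the threshold.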
Data for Multiple Regression
Y:  World Crude Oil Production
X1: U.S. Energy Consumption
X2: U.S. Nuclear Generation
X3: U.S. Coal Production
X4: U.S. Dry Gas Production
X5: U.S. Fuel Rate for Autos

Y      X1     X2      X3       X4     X5
55.7   74.3    83.5    598.6   21.7   13.30
55.7   72.5   114.0    610.0   20.7   13.42
52.8   70.5   172.5    654.6   19.2   13.52
57.3   74.4   191.1    684.9   19.1   13.53
59.7   76.3   250.9    697.2   19.2   13.80
60.2   78.1   276.4    670.2   19.1   14.04
62.7   78.9   255.2    781.1   19.7   14.41
59.6   76.0   251.1    829.7   19.4   15.46
56.1   74.0   272.7    823.8   19.2   15.94
53.5   70.8   282.8    838.1   17.8   16.65
53.3   70.5   293.7    782.1   16.1   17.14
54.5   74.1   327.6    895.9   17.5   17.83
54.0   74.0   383.7    883.6   16.5   18.20
56.2   74.3   414.0    890.3   16.1   18.27
56.7   76.9   455.3    918.8   16.6   19.20
58.7   80.2   527.0    950.3   17.1   19.87
59.9   81.3   529.4    980.7   17.3   20.31
60.6   81.3   576.9   1029.1   17.8   21.02
60.2   81.1   612.6    996.0   17.7   21.69
60.2   82.1   618.8    997.5   17.8   21.68
60.6   83.9   610.3    945.4   18.2   21.04
60.9   85.6   640.4   1033.5   18.9   21.48
Stepwise: Step 1 - Simple Regression Results for Each Independent Variable
Dependent Variable   Independent Variable   t-Ratio   $R^2$
Y                    X1                     11.77     85.2%
Y                    X2                      4.43     45.0%
Y                    X3                      3.91     38.9%
Y                    X4                      1.08      4.6%
Y                    X5                      3.54     34.2%
All Possible Regressions with Five Independent Variables
Single Predictor:
X1; X2; X3; X4; X5
Two Predictors:
X1,X2; X1,X3; X1,X4; X1,X5; X2,X3; X2,X4; X2,X5; X3,X4; X3,X5; X4,X5
Three Predictors:
X1,X2,X3; X1,X2,X4; X1,X2,X5; X1,X3,X4; X1,X3,X5; X1,X4,X5; X2,X3,X4; X2,X3,X5; X2,X4,X5; X3,X4,X5
Four Predictors:
X1,X2,X3,X4; X1,X2,X3,X5; X1,X2,X4,X5; X1,X3,X4,X5; X2,X3,X4,X5
Five Predictors:
X1,X2,X3,X4,X5
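The list of candidate models above is every non-empty subset of the five predictors. A short sketch that enumerates them and confirms the count, $2^5 - 1 = 31$:

```python
from itertools import combinations

predictors = ["X1", "X2", "X3", "X4", "X5"]

# All subsets of each size, from one predictor up to all five.
models = {
    size: list(combinations(predictors, size))
    for size in range(1, len(predictors) + 1)
}
counts = {size: len(subsets) for size, subsets in models.items()}
total = sum(counts.values())

for size, subsets in models.items():
    print(size, "predictor(s):", "; ".join(",".join(s) for s in subsets))
print("total models:", total)
```

The count doubles with every extra predictor, which is why all-possible-regressions is practical only for a small number of candidate variables.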
Functional Forms of Regression
The term linear in a simple regression model means that the model is linear in the parameters; the variables in the regression model may or may not be linear.
True model is non-linear
[Figure: Income (Y) plotted against Age (X), with marks at 15 and 60, showing the curved PRF and a straight-line SRF.]
Running the wrong (linear) regression model on such data leads to wrong predictions.
Linear vs. Nonlinear
Examples of Linear Statistical Models
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\ln(Y_i) = \beta_0 + \beta_1 X_i + \varepsilon_i$
$Y_i = \beta_0 + \beta_1 \ln(X_i) + \varepsilon_i$
$Y_i = \beta_0 + \beta_1 X_i^2 + \varepsilon_i$
Examples of Non-linear Statistical Models
$Y_i = \beta_0 + \beta_1 X_i^{\beta_2} + \varepsilon_i$
$Y_i = \beta_0 + \beta_1 X_i + \exp(\beta_2 X_i) + \varepsilon_i$
$Y_i^{\beta_2} = \beta_0 + \beta_1 X_i + \varepsilon_i$
Different Functional Forms
1. Linear
2. Log-Log
3. Semilog (Linear-Log or Log-Linear)
4. Polynomial
5. Reciprocal (or inverse)
Pay attention to each form's slope and elasticity.
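Functional Forms of Regression Models
2. Log-log model:
$Y_i = \beta_0 X_i^{\beta_1} e^{\varepsilon_i}$
This is a non-linear model. Transform it into the linear log form:
$\ln Y_i = \ln \beta_0 + \beta_1 \ln X_i + \varepsilon_i$
==> $\ln Y_i = \beta_0^* + \beta_1 \ln X_i + \varepsilon_i$, where $\beta_0^* = \ln \beta_0$
==> $Y_i^* = \beta_0^* + \beta_1^* X_i^* + \varepsilon_i$, where $Y_i^* = \ln Y_i$, $X_i^* = \ln X_i$ and $\beta_1^* = \beta_1$
$\beta_1 = \frac{d \ln Y}{d \ln X} = \frac{dY / Y}{dX / X}$ is the elasticity coefficient.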
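To see the log transformation at work, the sketch below simulates data from a log-log model with made-up parameters (an assumption for illustration only) and recovers the elasticity by regressing ln Y on ln X with `numpy.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 2.0, -0.8                  # hypothetical "true" parameters
X = np.linspace(1.0, 10.0, 200)
eps = rng.normal(scale=0.05, size=X.size)
Y = beta0 * X**beta1 * np.exp(eps)        # Y = beta0 * X^beta1 * e^eps

# Regress ln Y on ln X: the slope estimates beta1 (the elasticity),
# the intercept estimates ln(beta0).
slope, intercept = np.polyfit(np.log(X), np.log(Y), 1)
beta0_hat = np.exp(intercept)

print(f"elasticity estimate: {slope:.3f}")
print(f"beta0 estimate:      {beta0_hat:.3f}")
```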
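[Figure: quantity demanded (Y) against price (X) for $Y = \beta_0 X^{-\beta_1}$, next to the corresponding straight line $\ln Y = \ln \beta_0 - \beta_1 \ln X$ plotted as lnY against lnX.]
[Figure: quantity demanded (Y) against price (X) for $Y = \beta_0 X^{\beta_1}$, next to the corresponding straight line $\ln Y = \ln \beta_0 + \beta_1 \ln X$ plotted as lnY against lnX.]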
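3. Semi-log model:
Log-lin model or lin-log model:
$\ln Y_i = \alpha_0 + \alpha_1 X_i + \varepsilon_i$ (log-lin)
or
$Y_i = \beta_0 + \beta_1 \ln X_i + \varepsilon_i$ (lin-log)

$\alpha_1 = \frac{d \ln Y}{dX} = \frac{1}{Y} \frac{dY}{dX} = \frac{dY / Y}{dX} = \frac{\text{relative change in } Y}{\text{absolute change in } X}$
and
$\beta_1 = \frac{dY}{d \ln X} = \frac{dY}{dX / X} = \frac{\text{absolute change in } Y}{\text{relative change in } X}$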
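Functional Forms of Regression Models (Cont.)
4. Polynomial: quadratic term to capture the nonlinear pattern
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
[Figure: two panels of $Y_i$ against $X_i$, one with $\beta_1 > 0$, $\beta_2 < 0$ and one with $\beta_1 < 0$, $\beta_2 > 0$.]
5. Reciprocal (or inverse) transformations
$Y_i = \beta_0 + \beta_1 \left(\frac{1}{X_i}\right) + \varepsilon_i$
==> $Y_i = \beta_0 + \beta_1 X_i^* + \varepsilon_i$
where $X_i^* = \frac{1}{X_i}$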
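Some features of the reciprocal model
$Y = \beta_0 + \beta_1 \frac{1}{X}$
[Figure: four panels of Y against X, one for each sign combination: $\beta_0 > 0$ and $\beta_1 > 0$; $\beta_0 < 0$ and $\beta_1 > 0$; $\beta_0 > 0$ and $\beta_1 < 0$; $\beta_0 < 0$ and $\beta_1 < 0$. Each panel marks $\beta_0$ on the Y axis, the level the curve approaches as X grows. Where the curve crosses the X axis, it does so at $X = -\beta_1 / \beta_0$.]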
Two conditions for transforming a nonlinear, non-additive equation:
1. A transformation of the variables exists.
2. The sample must provide sufficient information.
Example 1:
Suppose
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_1 X_2$
Transforming:
$X_2^* = X_1^2$
$X_3^* = X_1 X_2$
Rewrite:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2^* + \beta_3 X_3^*$
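Example 1 can be illustrated numerically: once $X_2^*$ and $X_3^*$ are built from the data, ordinary least squares on the transformed regressors recovers the coefficients. The parameter values and the noise-free data below are assumptions chosen only to make the point visible.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.uniform(0.0, 5.0, size=50)
X2 = rng.uniform(0.0, 5.0, size=50)

beta = np.array([1.0, 2.0, 0.5, -1.5])    # hypothetical beta_0 .. beta_3
Y = beta[0] + beta[1] * X1 + beta[2] * X1**2 + beta[3] * X1 * X2

# Transformed regressors: the model is linear in these.
X2_star = X1**2
X3_star = X1 * X2

A = np.column_stack([np.ones_like(X1), X1, X2_star, X3_star])
beta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(np.round(beta_hat, 6))
```

The transformation works because $X_2^*$ and $X_3^*$ can be computed from the data alone, without knowing any parameter.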
Example 2:
$Y = \beta_0 + \frac{\beta_1}{X + \beta_2}$
Transforming:
$X_1^* = \frac{1}{X + \beta_2}$
Rewrite:
$Y = \beta_0 + \beta_1 X_1^*$
However, $X_1^*$ cannot be computed, because $\beta_2$ is unknown.
Application of functional form regression
1. Cobb-Douglas production function:
$Y = \beta_0 L^{\beta_1} K^{\beta_2} e^{\varepsilon}$
Transforming:
$\ln Y = \ln \beta_0 + \beta_1 \ln L + \beta_2 \ln K + \varepsilon$
==> $\ln Y = \beta_0^* + \beta_1 \ln L + \beta_2 \ln K + \varepsilon$, where $\beta_0^* = \ln \beta_0$
$\beta_1 = \frac{d \ln Y}{d \ln L}$: elasticity of output w.r.t. labor input
$\beta_2 = \frac{d \ln Y}{d \ln K}$: elasticity of output w.r.t. capital input
Whether $\beta_1 + \beta_2$ is $> 1$, $= 1$ or $< 1$ gives information about the returns to scale.
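A minimal sketch of how the log form is estimated and how $\beta_1 + \beta_2$ is read: the data are simulated from a Cobb-Douglas function with made-up parameters (0.6 for labor, 0.3 for capital), so every number here is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
L = rng.uniform(1.0, 100.0, size=n)        # labor input (synthetic)
K = rng.uniform(1.0, 100.0, size=n)        # capital input (synthetic)
eps = rng.normal(scale=0.05, size=n)
Y = 1.2 * L**0.6 * K**0.3 * np.exp(eps)    # hypothetical true parameters

# OLS on ln Y = b0* + b1 ln L + b2 ln K + eps
A = np.column_stack([np.ones(n), np.log(L), np.log(K)])
coef, *_ = np.linalg.lstsq(A, np.log(Y), rcond=None)
b0_star, b1, b2 = coef

returns_to_scale = b1 + b2
if returns_to_scale > 1:
    verdict = "increasing returns to scale"
elif returns_to_scale < 1:
    verdict = "decreasing returns to scale"
else:
    verdict = "constant returns to scale"

print(f"labor elasticity   b1 = {b1:.3f}")
print(f"capital elasticity b2 = {b2:.3f}")
print(f"b1 + b2 = {returns_to_scale:.3f} -> {verdict}")
```

In practice the comparison of $\beta_1 + \beta_2$ with 1 would be made with a formal test rather than by inspecting the point estimate.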
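2. Polynomial regression model:
Marginal cost function or total cost function, i.e.
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$ (MC)
or
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon$ (TC)
[Figure: costs plotted against output y, one panel for the MC curve and one for the TC curve.]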
Linear model
$\widehat{GNP} = 100.304 + 1.5325\, M_2$
(1.368) (39.20)

Lin-log model
$\widehat{GNP} = -16329.21 + 2584.78 \ln M_2$
(-23.44) (27.48)

Log-lin model
$\widehat{\ln GNP} = 6.8612 + 0.00057\, M_2$
(100.38) (15.65)

Log-log model
$\widehat{\ln GNP} = 0.5529 + 0.9882 \ln M_2$
(3.194) (42.29)
[Figure: Wage (y) plotted against unemployment (x) with the fitted SRF.]
$\widehat{wage} = 10.343 - 3.808\,(unemploy)$
(4.862) (-2.66)
Reciprocal Model
[Figure: wage (y) plotted against 1/unemploy (1/x) with the fitted SRF; the plot marks -1.428 and $u^N$, the natural rate of unemployment.]
$\widehat{Wage} = -1.4282 + 8.7243 \left(\frac{1}{x}\right)$
(-.0690) (3.063)
$\beta_0$ is statistically insignificant; therefore, -1.428 is not reliable.
$\widehat{\ln wage} = 1.9038 - 1.175 \ln(unemploy)$
(10.375) (-2.618)

$\widehat{\ln wage} = 1.9038 + 1.175 \ln\left(\frac{1}{X}\right)$
(10.37) (2.618)
Antilog(1.9038) = 6.7113, which gives a more meaningful and statistically significant floor for the minimum wage.
Antilog(1.175) = 3.238. Because the model is log-log, the slope itself is an elasticity: a 1% increase in X is associated with a decrease in wage of about 1.175%.
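The antilogs quoted above are plain exponentials and can be verified directly. The sketch also evaluates the fitted log-log equation at X and at 1.01 X to show the percentage change in the predicted wage for a 1% rise in X, which is the standard elasticity reading of a log-log slope; the value X = 5.0 is an arbitrary illustration.

```python
import math

intercept = 1.9038    # estimated intercept of the log-log wage equation
slope = -1.175        # estimated coefficient on ln(unemploy)

antilog_intercept = math.exp(intercept)
antilog_slope = math.exp(abs(slope))

def predicted_wage(x):
    """Predicted wage from ln(wage) = intercept + slope * ln(x)."""
    return math.exp(intercept + slope * math.log(x))

x = 5.0               # arbitrary unemployment level, for illustration
pct_change = 100 * (predicted_wage(1.01 * x) / predicted_wage(x) - 1)

print(f"antilog(1.9038) = {antilog_intercept:.4f}")
print(f"antilog(1.175)  = {antilog_slope:.3f}")
print(f"wage change for a 1% rise in X: {pct_change:.2f}%")
```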
MWD Test for the functional form (MacKinnon, White, Davidson) (Wooldridge, pp.203)
Procedures:
1. Run OLS on the linear model and obtain $\hat{Y}$:
$\hat{Y} = \hat{\alpha}_0 + \hat{\alpha}_1 X_1 + \hat{\alpha}_2 X_2$
2. Run OLS on the log-log model and obtain $\widehat{\ln Y}$:
$\widehat{\ln Y} = \hat{\beta}_0 + \hat{\beta}_1 \ln X_1 + \hat{\beta}_2 \ln X_2$
3. Compute $Z_1 = \ln(\hat{Y}) - \widehat{\ln Y}$
4. Run OLS on the linear model with $Z_1$ added:
$\hat{Y} = \hat{\alpha}_0 + \hat{\alpha}_1 X_1 + \hat{\alpha}_2 X_2 + \hat{\alpha}_3 Z_1$
and check the t-statistic of $\hat{\alpha}_3$:
If $t^*_{\hat{\alpha}_3} > t_c$ ==> reject $H_0$: linear model
If $t^*_{\hat{\alpha}_3} < t_c$ ==> do not reject $H_0$: linear model
MWD test for the functional form (Cont.)
5. Compute $Z_2 = \mathrm{antilog}(\widehat{\ln Y}) - \hat{Y}$
6. Run OLS on the log-log model with $Z_2$ added:
$\widehat{\ln Y} = \hat{\beta}_0 + \hat{\beta}_1 \ln X_1 + \hat{\beta}_2 \ln X_2 + \hat{\beta}_3 Z_2$
and check the t-statistic of $\hat{\beta}_3$:
If $t^*_{\hat{\beta}_3} > t_c$ ==> reject $H_0$: log-log model
If $t^*_{\hat{\beta}_3} < t_c$ ==> do not reject $H_0$: log-log model
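The six steps of the MWD procedure can be sketched with plain NumPy. This is an illustration, not a reference implementation: the helper names `ols` and `mwd_test` are made up for this example, the data are synthetic (generated from a log-log model with hypothetical parameters), and the function returns the two t statistics so they can be compared with a critical value $t_c$ taken from a table.

```python
import numpy as np

def ols(X, y):
    """OLS with intercept; returns (t statistics, fitted values)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    resid = y - fitted
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return beta / se, fitted

def mwd_test(X, y):
    """Return (t of Z1 in the linear model, t of Z2 in the log-log model)."""
    _, y_hat = ols(X, y)                              # step 1
    _, lny_hat = ols(np.log(X), np.log(y))            # step 2
    if np.any(y_hat <= 0):
        raise ValueError("linear fitted values must be positive to take logs")
    z1 = np.log(y_hat) - lny_hat                      # step 3
    t_lin, _ = ols(np.column_stack([X, z1]), y)       # step 4
    z2 = np.exp(lny_hat) - y_hat                      # step 5
    t_log, _ = ols(np.column_stack([np.log(X), z2]), np.log(y))  # step 6
    return t_lin[-1], t_log[-1]

# Synthetic data (assumption): the true model is log-log.
rng = np.random.default_rng(3)
n = 60
X = rng.uniform(2.0, 6.0, size=(n, 2))
y = 2.0 * X[:, 0]**0.7 * X[:, 1]**0.4 * np.exp(rng.normal(scale=0.05, size=n))

t_z1, t_z2 = mwd_test(X, y)
print(f"t statistic of Z1 (H0: linear model):  {t_z1:.3f}")
print(f"t statistic of Z2 (H0: log-log model): {t_z2:.3f}")
```

A large |t| on $Z_1$ counts against the linear model, and a large |t| on $Z_2$ counts against the log-log model; it is possible for neither or both to be rejected.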
MWD TEST: Testing the functional form of regression
Example (Table 7.3):
Step 1: Run the linear model and obtain $\hat{Y}$.
[Regression output for C, X1, X2]
$CV_1 = \frac{\hat{\sigma}}{\bar{Y}} = \frac{1583.279}{24735.33} = 0.064$
Step 2: Run the log-log model and obtain the fitted (estimated) $\widehat{\ln Y}$.
[Regression output for C, LNX1, LNX2]
$CV_2 = \frac{\hat{\sigma}}{\bar{Y}} = \frac{0.07481}{10.09653} = 0.0074$
MWD TEST
Step 4: $H_0$: the true model is linear
[Regression output for C, X1, X2, Z1]
$t_c(0.05, 11) = 1.796$
$t_c(0.10, 11) = 1.363$
$t^* < t_c$ at 5% => do not reject $H_0$
$t^* > t_c$ at 10% => reject $H_0$
MWD Test
Step 6: $H_0$: the true model is the log-log model
[Regression output for C, LNX1, LNX2, Z2]
$t_c(0.025, 11) = 2.201$
$t_c(0.05, 11) = 1.796$
$t_c(0.10, 11) = 1.363$
Since $t^* < t_c$ => do not reject $H_0$
Comparing the C.V.: $\frac{C.V._1}{C.V._2} = \frac{0.064}{0.0074} \approx 8.65$
Criterion for comparing two different functional models:
The coefficient of variation:
$C.V. = \frac{\hat{\sigma}}{\bar{Y}}$
It measures the average error of the sample regression function relative to the mean of Y.

Linear, log-linear, and log-log equations can be meaningfully compared.
The smaller the C.V. of a model, the more preferred the equation (functional model).
Compare two different functional form models:
Model 1: linear model
Model 2: log-log model
Coefficient of Variation (C.V.)
$\frac{\hat{\sigma} / \bar{Y} \text{ of model 1}}{\hat{\sigma} / \bar{Y} \text{ of model 2}} = \frac{2.1225 / 89.612}{0.0217 / 4.4891} = \frac{0.0236}{0.0048} = 4.916$
A ratio of 4.916 means that model 2 is better.
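The C.V. comparison is simple arithmetic and can be reproduced as below. The slide's 4.916 comes from dividing the rounded values 0.0236 and 0.0048; without intermediate rounding the ratio is about 4.90, which leads to the same conclusion.

```python
# Standard error of the regression and mean of the dependent variable.
sigma_1, mean_1 = 2.1225, 89.612    # model 1: linear model
sigma_2, mean_2 = 0.0217, 4.4891    # model 2: log-log model

cv_1 = sigma_1 / mean_1
cv_2 = sigma_2 / mean_2
ratio = cv_1 / cv_2

print(f"C.V. model 1 = {cv_1:.4f}")
print(f"C.V. model 2 = {cv_2:.4f}")
print(f"ratio = {ratio:.2f}")
print("model 2 preferred" if cv_2 < cv_1 else "model 1 preferred")
```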
INDIVIDUAL ASSIGNMENT:
1. Find any data set (book, web)
2. Specify the initial models (based on theory):
a linear model and a log-linear model
3. Perform the MWD test
4. Interpret the results

Submission:
- Next week (17/09/2012)
- Print out
