
Qualitative Response Regression Models

Chapter 15
QUALITATIVE MODELS

1. Binary-Choice Models
2. Multiple-Choice Models
3. Models for Count Data
4. Tobit Models
5. Duration Models
Binary-Choice Models
1. Linear Probability Model (LPM)
2. Probit or Normit Model
3. Logistic Model (Logit)
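Linear Probability Model (LPM)

$Y_i = a + bX_i + e_i$   (1)

$Y_i = 1$ (yes) with probability $p_i$, and $Y_i = 0$ (no) with probability $1 - p_i$: a binomial distribution, not a normal distribution.

$E(Y_i \mid X_i) = a + bX_i + E(e_i) = a + bX_i$, since $E(e_i) = 0$

$E(Y_i \mid X_i) = \sum Y_i \, p(Y_i) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i$   (2)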
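$E(e_i) = (1 - a - bX_i)\, p_i + (0 - a - bX_i)(1 - p_i) = 0$

$\Rightarrow \; p_i = a + bX_i, \qquad 1 - p_i = 1 - a - bX_i$   (3)

$E(e_i^2) = (1 - a - bX_i)^2 p_i + (0 - a - bX_i)^2 (1 - p_i) = (1 - p_i)^2 p_i + p_i^2 (1 - p_i)$

$= p_i (1 - p_i)$   (4)

$\sigma_i^2 = E(Y_i)\,[1 - E(Y_i)]$: the error variance depends on $X_i$ → heteroskedasticity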


[Figure: Y_i plotted against X_i, showing the true regression line E(Y_i) = a + bX_i, fitted lines with overestimated and underestimated slopes, and the actual predictions]
Problems in Estimating the LPM

1. The errors follow a binomial distribution, not a normal distribution (the binomial approaches the normal only as N → ∞).
2. Heteroskedasticity.
3. The condition 0 ≤ E(Y_i) ≤ 1 is not guaranteed.
4. R² is questionable as a measure of goodness of fit.

→ Heteroskedasticity can be handled with weighted least squares (WLS).


Weaknesses of WLS estimation
• There is no guarantee that the estimated Y lies between 0 and 1. If a value falls outside (0,1), the observation must be dropped or set equal to 0.01 or 0.99. In either case WLS is not efficient in finite samples.
• It is very sensitive to possible specification errors. If a specification error is likely, WLS is best not used.
• It is very sensitive to extreme data points.
Probit or Normit Model
• Given the problems of the LPM, the solution is to transform the original model so that its predictions lie in (0,1).
• This is done with a cumulative distribution function (CDF):
  - Normal CDF → normit or probit
  - Logistic CDF → logit
• Normal distribution:
$Z_i = a + bX_i$
$P_i = F(Z_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_i} e^{-s^2/2}\, ds$
• Z_i is not directly observable; it is approximated through an "index" Z_i*.

Example: electing a class president
Z_i > Z_i* → candidate I
Z_i ≤ Z_i* → candidate II
Z_i* is a random variable that follows a normal distribution; s ~ N(0,1)
$Z_i = F^{-1}(P_i) = a + bX_i$
[Figure: F(Z) against Z (from -3 to 3) for the LPM and the probit model]
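Logit Model
• Cumulative probability → logistic:
$P_i = F(Z_i) = F(a + bX_i) = \frac{1}{1 + e^{-Z_i}} = \frac{1}{1 + e^{-(a + bX_i)}}$
$P_i (1 + e^{-Z_i}) = 1 \;\Rightarrow\; P_i + P_i e^{-Z_i} = 1 \;\Rightarrow\; e^{-Z_i} = \frac{1 - P_i}{P_i} \;\Rightarrow\; e^{Z_i} = \frac{P_i}{1 - P_i}$
$L_i = Z_i = \ln\left(\frac{P_i}{1 - P_i}\right) = a + bX_i, \qquad P_i = \frac{n_i}{N_i}$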
Example (P_i = n_i/N_i):
Income ($000)   N_i   n_i   P_i    Z_i = F^{-1}(P_i)
  6              40     8   0.20   -0.84
  8              50    12   0.24   -0.70
 10              60    18   0.30   -0.52
 13              80    28   0.35   -0.38
 15             100    45   0.45   -0.12
 20              70    36   0.51    0.03
 25              65    39   0.60    0.25
 30              50    33   0.66    0.40
 35              40    30   0.75    0.67
 40              25    20   0.80    0.84
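The Z_i column can be reproduced with the inverse of the standard normal CDF. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the table entries appear to be these values cut to two decimals, and here P_i for income 20 is taken exactly as 36/70 rather than the rounded 0.51.

```python
from statistics import NormalDist

# (income X in $000, N_i households, n_i owners) from the example table
groups = [(6, 40, 8), (8, 50, 12), (10, 60, 18), (13, 80, 28), (15, 100, 45),
          (20, 70, 36), (25, 65, 39), (30, 50, 33), (35, 40, 30), (40, 25, 20)]

std_normal = NormalDist(mu=0.0, sigma=1.0)

z_values = []
for x, n_total, n_owners in groups:
    p = n_owners / n_total            # relative frequency P_i
    z = std_normal.inv_cdf(p)         # probit index Z_i = F^{-1}(P_i)
    z_values.append(z)
    print(f"X={x:>2}  P={p:.2f}  Z={z:+.4f}")
```

Because P_i rises with income in every group, the printed Z_i values increase monotonically as well.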
Values of Cumulative Probability Functions
Z_i   P(Z_i), Normal   P(Z_i), Logistic
-3    0.0013           0.0474
-2    0.0228           0.1192
-1    0.1587           0.2689
 0    0.5000           0.5000
 1    0.8413           0.7311
 2    0.9772           0.8808
 3    0.9987           0.9526
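The two CDF columns, and the logit transformation that inverts the logistic CDF, can be evaluated directly. A minimal standard-library sketch; the names `normal_cdf`, `logistic_cdf`, and `logit` are our own labels, not taken from any econometrics package.

```python
import math
from statistics import NormalDist

normal_cdf = NormalDist().cdf   # standard normal CDF (probit)


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF (logit): P = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of logistic_cdf: L = ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))


print(" Z  Normal  Logistic")
for z in range(-3, 4):
    print(f"{z:+d}  {normal_cdf(z):.4f}  {logistic_cdf(z):.4f}")

# Round trip for one group: P = 8/40 = 0.20 gives L = ln(0.25), then back to P
print(round(logit(0.2), 4), round(logistic_cdf(logit(0.2)), 4))
```

The printout makes the heavier tails of the logistic visible: at Z = -3 the logistic CDF is about 35 times the normal one.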
[Figure: Normal distribution curve and logit curve; the logit curve is y = 1/(1 + e^{-x}), plotted for x from -4 to 4]


[Figure: Cumulative distribution curves F(Z) against Z (from -4 to 4) for the probit and logit models]
Table 15.1 (Gujarati, 2003)
FAM = family number; Y = home ownership (1 = owns a house, 0 = does not own a house); X = family income (thousands of $)
FAM  Y   X  |  FAM  Y   X  |  FAM  Y   X
  1  0   8  |   15  0   6  |   28  1  18
  2  1  16  |   16  1  19  |   29  0  11
  3  1  18  |   17  1  16  |   30  0  10
  4  0  11  |   18  0  10  |   31  1  17
  5  0  12  |   19  0   8  |   32  0  13
  6  1  19  |   20  1  18  |   33  1  21
  7  1  20  |   21  1  22  |   34  1  20
  8  0  13  |   22  1  16  |   35  0  11
  9  0   9  |   23  0  12  |   36  0   8
 10  0  10  |   24  0  11  |   37  1  17
 11  1  17  |   25  1  16  |   38  1  16
 12  1  18  |   26  0  11  |   39  0   7
 13  0  14  |   27  1  20  |   40  1  17
 14  1  20  |
The LPM estimated by OLS
Dependent Variable: Y
Method: Least Squares
Sample: 1 40
Included observations: 40
Variable Coefficient Std. Error t-Statistic Prob.
C -0.945686 0.122841 -7.698428 0.0000
X 0.102131 0.008160 12.51534 0.0000
R-squared 0.804761 Mean dependent var 0.525000
Adjusted R-squared 0.799624 S.D. dependent var 0.505736
S.E. of regression 0.226385 Akaike info criterion -0.084453
Sum squared resid 1.947505 Schwarz criterion -9.31E-06
Log likelihood 3.689066 F-statistic 156.6336
Durbin-Watson stat 1.955187 Prob(F-statistic) 0.000000
Interpretation of results
• LPM estimated by OLS:
Ŷ_i = -0.945686 + 0.102131 X_i
(-7.698428) (12.51534)
R² = 0.804761

• b0 = -0.9457: a probability cannot be negative, so b0 is treated as 0; the probability that a household with zero income owns a house is 0%.
• b1 = 0.1021: for each one-unit ($1,000) increase in income, the probability of owning a house rises on average by 0.1021, i.e., by about 10 percentage points.
• For example, at X = 12 the estimated probability of owning a house is
-0.9457 + 0.1021(12) = 0.2795,
so a household with an income of $12,000 owns a house with a probability of about 28%.

Some estimated values are > 1 or < 0:
Family 1: X_i = 8 → Ŷ_i = -0.12864
Family 7: X_i = 20 → Ŷ_i = 1.09693
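The OLS estimates and the out-of-range fitted values above can be reproduced from Table 15.1. A minimal standard-library sketch; `simple_ols` is a helper name of our own, and R² is computed as the explained share of the total sum of squares.

```python
# Table 15.1 (Gujarati, 2003), families 1 to 40 in order
Y = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1,
     1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
X = [8, 16, 18, 11, 12, 19, 20, 13, 9, 10, 17, 18, 14, 20, 6, 19, 16, 10, 8, 18,
     22, 16, 12, 11, 16, 11, 20, 18, 11, 10, 17, 13, 21, 20, 11, 8, 17, 16, 7, 17]


def simple_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = slope * sxy / syy     # explained / total sum of squares
    return intercept, slope, r_squared


b0, b1, r2 = simple_ols(X, Y)
print(f"Y_hat = {b0:.6f} + {b1:.6f} X   R^2 = {r2:.6f}")

# Fitted "probabilities" at three incomes: the LPM does not keep them in [0, 1]
for income in (8, 12, 20):
    print(income, round(b0 + b1 * income, 5))
```

This should print the intercept and slope of the OLS output (-0.945686 and 0.102131), and fitted values of -0.12864 at X = 8 and 1.09693 at X = 20.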
• LPM with weights w_i: Weighted Least Squares (WLS)
$w_i = \hat{Y}_i (1 - \hat{Y}_i)$
$Y_i^* = \frac{Y_i}{\sqrt{w_i}}, \qquad X_i^* = \frac{X_i}{\sqrt{w_i}}$
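Applied to the Table 15.1 data, these steps should reproduce the WLS output that follows (28 included observations). A minimal standard-library sketch; it assumes, as that output suggests, that the weights come from the OLS fitted values and that observations with fitted values outside (0, 1) are dropped rather than set to 0.01 or 0.99. Minimizing Σ(Y_i - a - bX_i)²/w_i is equivalent to regressing Y_i/√w_i on 1/√w_i and X_i/√w_i without an intercept, so the sketch solves the corresponding 2×2 normal equations directly; `simple_ols` is our own helper.

```python
# Table 15.1 (Gujarati, 2003), families 1 to 40 in order
Y = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1,
     1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
X = [8, 16, 18, 11, 12, 19, 20, 13, 9, 10, 17, 18, 14, 20, 6, 19, 16, 10, 8, 18,
     22, 16, 12, 11, 16, 11, 20, 18, 11, 10, 17, 13, 21, 20, 11, 8, 17, 16, 7, 17]


def simple_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Step 1: OLS fit of the LPM and its fitted values
a_ols, b_ols = simple_ols(X, Y)
fitted = [a_ols + b_ols * xi for xi in X]

# Step 2: keep observations with 0 < Y_hat < 1 and form w_i = Y_hat (1 - Y_hat)
kept = [(x, y, f * (1.0 - f)) for x, y, f in zip(X, Y, fitted) if 0.0 < f < 1.0]

# Step 3: weighted normal equations with weights 1 / w_i
s_1 = sum(1.0 / w for _, _, w in kept)
s_x = sum(x / w for x, _, w in kept)
s_xx = sum(x * x / w for x, _, w in kept)
s_y = sum(y / w for _, y, w in kept)
s_xy = sum(x * y / w for x, y, w in kept)
det = s_1 * s_xx - s_x ** 2
a_wls = (s_xx * s_y - s_x * s_xy) / det
b_wls = (s_1 * s_xy - s_x * s_y) / det

print(len(kept), round(a_wls, 6), round(b_wls, 6))
```

The families with incomes just below 20 receive very large weights because their fitted values are close to 1, which illustrates how sensitive WLS is to individual observations.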
The WLS estimates
Dependent Variable: Y/SW
Method: Least Squares
Sample(adjusted): 2 40
Included observations: 28
Excluded observations: 11 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
1/SW -1.245592 0.120555 -10.33211 0.0000
X/SW 0.119589 0.006852 17.45438 0.0000
R-squared 0.981050 Mean dependent var 2.191518
Adjusted R-squared 0.980321 S.D. dependent var 3.556681
S.E. of regression 0.498942 Akaike info criterion 1.516095
Sum squared resid 6.472517 Schwarz criterion 1.611252
Log likelihood -19.22533 F-statistic 1345.999
Durbin-Watson stat 1.882836 Prob(F-statistic) 0.000000
• Observations whose LPM estimated values are > 1 or < 0 are dropped from the sample (SW = √w_i).

$\frac{Y_i}{\sqrt{w_i}} = -1.245592 \frac{1}{\sqrt{w_i}} + 0.119589 \frac{X_i}{\sqrt{w_i}}$
(-10.33211) (17.45438)
R² = 0.981050

• Logit
Two types of data are distinguished:
1. Individual data (data at the individual or micro level)
2. Grouped data (grouped or replicated data)

Based on the data in Table 15.4 (Gujarati, 2003: 598)
Logit with individual data
Dependent Variable: Y
Method: ML - Binary Logit
Sample: 1 580
Included observations: 580
Convergence achieved after 3 iterations
Covariance matrix computed using second derivatives
Variable Coefficient Std. Error z-Statistic Prob.
C -1.602343 0.204034 -7.853317 0.0000
X 0.079066 0.010112 7.818651 0.0000
Mean dependent var 0.463793 S.D. dependent var 0.499118
S.E. of regression 0.469458 Akaike info criterion 1.266556
Sum squared resid 127.3858 Schwarz criterion 1.281601
Log likelihood -365.3014 Hannan-Quinn criter. 1.272422
Restr. log likelihood -400.5033 Avg. log likelihood -0.629830
LR statistic (1 df) 70.40395 McFadden R-squared 0.087894
Probability(LR stat) 0.000000
Obs with Dep=0 311 Total obs 580
Obs with Dep=1 269
Points to note

• Estimation is by maximum likelihood.
• Hypotheses on individual coefficients are tested with the Z statistic, compared against the Z (standard normal) table.
• The conventional R² cannot be used.
• McFadden R²: R²_McF = 1 - (LLF_ur / LLF_r)
  LLF_ur = unrestricted log-likelihood function, with all regressors included in the model
  LLF_r = restricted log-likelihood function, with only the intercept included in the model
• Count R² = number of correct predictions / number of observations
• The hypothesis that all slope coefficients equal zero is tested with the LR statistic, compared against the chi-square distribution (df = number of explanatory variables).

• Interpretation
b1 = 0.079066 → e^{0.079066} = 1.0823
→ each additional unit of income ($1,000) multiplies the odds of owning a house by 1.0823, i.e., it raises the odds by about 8.2%.
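The fit statistics in the individual-data logit output, and the odds-ratio reading of b1, follow directly from the reported numbers. A minimal standard-library sketch; the chi-square tail probability uses the identity P(χ² > x) = erfc(√(x/2)), which holds only for one degree of freedom (one regressor here).

```python
import math

llf_unrestricted = -365.3014   # log likelihood with X included
llf_restricted = -400.5033     # log likelihood with the intercept only
b1 = 0.079066                  # logit slope on income ($000)

mcfadden_r2 = 1.0 - llf_unrestricted / llf_restricted
lr_stat = 2.0 * (llf_unrestricted - llf_restricted)     # chi-square, 1 df
# For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
lr_p_value = math.erfc(math.sqrt(lr_stat / 2.0))
odds_ratio = math.exp(b1)

print(f"McFadden R2 = {mcfadden_r2:.6f}")
print(f"LR = {lr_stat:.4f}, p = {lr_p_value:.2e}")
print(f"odds ratio per $1,000 = {odds_ratio:.4f}")
```

The small gap between this LR value and the reported 70.40395 comes from the rounding of the log likelihoods to four decimals.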
Logit with grouped data → weighted estimation
X_i   N_i   n_i   p_i    1-p_i   p_i/(1-p_i)
 6     40     8   0.20   0.80    0.25
 8     50    12   0.24   0.76    0.32
10     60    18   0.30   0.70    0.43
13     80    28   0.35   0.65    0.54
15    100    45   0.45   0.55    0.82
20     70    36   0.51   0.49    1.06
25     65    39   0.60   0.40    1.50
30     50    33   0.66   0.34    1.94
35     40    30   0.75   0.25    3.00
40     25    20   0.80   0.20    4.00
(For X = 20 the last column uses the unrounded p_i = 36/70.)
(continued)
L_i = ln(p_i/(1-p_i))   w_i = N_i p_i(1-p_i)   SW_i = √w_i   L1_i = L_i √w_i   X1_i = X_i √w_i
-1.3863                   6.40                2.5298        -3.5071            15.1789
-1.1527                   9.12                3.0199        -3.4810            24.1595
-0.8473                  12.60                3.5496        -3.0076            35.4965
-0.6190                  18.20                4.2661        -2.6409            55.4599
-0.2007                  24.75                4.9749        -0.9983            74.6241
 0.0572                  17.49                4.1816         0.2390            83.6318
 0.4055                  15.60                3.9497         1.6015            98.7421
 0.6633                  11.22                3.3496         2.2218           100.4888
 1.0986                   7.50                2.7386         3.0087            95.8514
 1.3863                   4.00                2.0000         2.7726            80.0000
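Regressing L1 on SW and X1 without a separate intercept should reproduce the output that follows, using only N_i and n_i. A minimal standard-library sketch; it works with the unrounded p_i = n_i/N_i (for X = 20 that is 36/70, which is what the L_i and √w_i columns reflect), and it also converts the fitted logit at X = 20 back into a probability. All variable names are our own.

```python
import math

# (income X in $000, N_i households, n_i owners), grouped data as above
groups = [(6, 40, 8), (8, 50, 12), (10, 60, 18), (13, 80, 28), (15, 100, 45),
          (20, 70, 36), (25, 65, 39), (30, 50, 33), (35, 40, 30), (40, 25, 20)]

sw, x1, l1 = [], [], []
for x, n_total, n_owners in groups:
    p = n_owners / n_total                        # p_i, unrounded
    root_w = math.sqrt(n_total * p * (1.0 - p))   # SW = sqrt(w_i)
    sw.append(root_w)
    x1.append(x * root_w)                         # X1 = X_i sqrt(w_i)
    l1.append(math.log(p / (1.0 - p)) * root_w)   # L1 = L_i sqrt(w_i)

# OLS of L1 on SW and X1 without a separate intercept (2x2 normal equations)
s11 = sum(a * a for a in sw)
s12 = sum(a * b for a, b in zip(sw, x1))
s22 = sum(b * b for b in x1)
t1 = sum(a * y for a, y in zip(sw, l1))
t2 = sum(b * y for b, y in zip(x1, l1))
det = s11 * s22 - s12 ** 2
coef_sw = (s22 * t1 - s12 * t2) / det
coef_x1 = (s11 * t2 - s12 * t1) / det

# Dividing L1 by sqrt(w_i) gives L_i = coef_sw + coef_x1 * X_i; invert the logit
logit_20 = coef_sw + coef_x1 * 20
p_20 = 1.0 / (1.0 + math.exp(-logit_20))
print(round(coef_sw, 6), round(coef_x1, 6), round(p_20, 4))
```

Because the weight cancels when L1_i is divided by √w_i, the fitted probability at any income depends only on the two coefficients.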
Dependent Variable: L1
Method: Least Squares
Sample: 1 10
Included observations: 10
Variable Coefficient Std. Error t-Statistic Prob.
SW -1.593238 0.111494 -14.28984 0.0000
X1 0.078669 0.005448 14.44122 0.0000
R-squared 0.963656 Mean dependent var -0.379142
Adjusted R-squared 0.959114 S.D. dependent var 2.672782
S.E. of regression 0.540447 Akaike info criterion 1.784017
Sum squared resid 2.336666 Schwarz criterion 1.844534
Log likelihood -6.920086 F-statistic 212.1217
Durbin-Watson stat 1.136398 Prob(F-statistic) 0.000000
• Interpretation of the logit
• b = 0.078669 → for each one-unit ($1,000) increase in weighted income, the weighted log of the odds of owning a house rises by about 0.08 units, which is not a natural way to read the result.
• Taking the antilog of the regression function (after dividing through by √w_i, so that L_i = -1.593238 + 0.078669 X_i):
$\frac{P_i}{1 - P_i} = e^{-1.593238}\, e^{0.078669 X_i}$
e^{0.078669} = 1.0818 → for each one-unit ($1,000) increase in income, the odds of owning a house are multiplied by 1.0818, i.e., they rise by about 8.18%.
• Computing the probability
At X = 20 ($20,000), where √w_i = 4.1816 and X1_i = 83.6318:
L1_i = -1.593238(4.1816) + 0.078669(83.6318) ≈ -0.0831
Dividing by √w_i = 4.1816 gives L_i ≈ -0.0199, which matches L_i = -1.593238 + 0.078669(20) = -0.019858
-0.019858 = ln(p/(1-p))
e^{-0.019858} = p/(1-p) = 0.9803
0.9803(1-p) = p
0.9803 = p(1.9803)
p = 0.9803/1.9803 = 0.4950
→ a household with an income of $20,000 has a probability of owning a house of about 49.50%.
• Probit (grouped data)
Z_i = -1.0088 + 0.0481 X_i
(-17.330) (19.105)   R² = 0.9786

• Probit (+5) → K_i = Z_i + 5 (the shift keeps the index positive)
K_i = 3.9911 + 0.0481 X_i
(68.560) (19.105)   R² = 0.9786
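The grouped probit line above appears to be a plain (unweighted) OLS regression of the two-decimal Z_i column of the earlier example table on income. A minimal sketch under that assumption; it reproduces the reported coefficients and R², and shows that the +5 shift changes only the intercept.

```python
# Incomes ($000) and the two-decimal probit indices Z_i = F^{-1}(P_i)
X = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
Z = [-0.84, -0.70, -0.52, -0.38, -0.12, 0.03, 0.25, 0.40, 0.67, 0.84]

n = len(X)
mean_x, mean_z = sum(X) / n, sum(Z) / n
sxx = sum((x - mean_x) ** 2 for x in X)
sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(X, Z))
szz = sum((z - mean_z) ** 2 for z in Z)

slope = sxz / sxx
intercept = mean_z - slope * mean_x
r_squared = slope * sxz / szz          # explained / total sum of squares
print(f"Z_hat = {intercept:.4f} + {slope:.4f} X, R^2 = {r_squared:.4f}")
print(f"K_hat = {intercept + 5:.4f} + {slope:.4f} X")   # probit shifted by +5
```

Adding a constant to the dependent variable leaves the slope, its t statistic, and R² unchanged, which is why both probit lines report the same values for them.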
Logit and Probit Models
Which model is preferable?
[Figure: P against X for the probit and logit curves, both bounded by 0 and 1]
• The conditional probability P_i approaches 0 or 1 at a slower rate in logit than in probit.
• The logit model is mathematically simpler.
• Care is needed when interpreting the coefficients estimated by the two models; they are not directly comparable.
The Tobit Model
• An extension of the probit model, developed by James Tobin.
• Example: home ownership → finding out the amount of money a person or family spends on a house in relation to socioeconomic variables.
• Problem: if a consumer does not purchase a house, we have no data on housing expenditure; we have data only on consumers who actually purchase a house.
• Consumers are divided into two groups:
  - n1 consumers: we have information on both the regressors and the regressand
  - n2 consumers: we have information only on the regressors, not on the regressand
• A sample in which the regressand is observed for only some observations is called a censored sample, so the model is also known as a censored regression model or a limited dependent variable regression model.
• We can express the Tobit model as
$Y_i = \beta_1 + \beta_2 X_i + u_i$ if RHS > 0
$Y_i = 0$ otherwise
• The OLS estimates of the parameters obtained from the subset of n1 observations are biased as well as inconsistent; they remain biased even asymptotically.

How is the Tobit model estimated?
• The method of maximum likelihood.
• A comparatively simple alternative, due to James Heckman, is a two-step estimation:
1. Estimate the probability of a consumer owning a house, using the probit model.
2. Estimate the Tobit model after adding to it a variable derived from the probit estimate: the inverse Mills ratio, or hazard rate.
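Step 2 of the Heckman procedure needs the inverse Mills ratio, φ(z)/Φ(z), evaluated at each consumer's fitted probit index z from step 1. A minimal sketch of that ratio only (not of the full two-step estimator), using the standard-library `statistics.NormalDist`; the index values in the loop are made up for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def inverse_mills_ratio(index: float) -> float:
    """lambda = phi(index) / Phi(index), evaluated at a fitted probit index."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Hypothetical fitted probit indices for three consumers
for z in (-1.0, 0.0, 1.0):
    print(z, round(inverse_mills_ratio(z), 4))
```

The ratio falls as the fitted index rises, so it carries the most information for consumers whose predicted probability of buying a house is low.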
• The Heckman procedure yields consistent estimates of the parameters, but they are not as efficient as the ML estimates.
• Illustration of the Tobit model: Ray Fair's model of extramarital affairs.
• Ray Fair collected a sample of 601 men and women who were then married for the first time and analyzed their responses to a question about extramarital affairs.
• 451 individuals had no extramarital affairs, and 150 individuals had one or more affairs.
The variables used in the study are defined as follows:
• Y = number of affairs in the past year
• Z1 = 0 for female and 1 for male
• Z2 = number of years married
• Z3
• Z4 = 0 if no children and 1 if children
• Z5 = religiousness on a scale of 1 to 5, 1 being antireligion
• Z6 = education in years: grade school = 9; high school = 12; PhD or other = 20
• Z7 = occupation, "Hollingshead" scale, 1-7
• Z8 = self-rating of marriage, 1 = very unhappy, 5 = very happy
Modeling Count Data: The Poisson Regression Model
• The regressand is of the count type, such as:
  - the number of vacations taken by a family per year, the number of patents received by a firm per year, the number of visits to the dentist or doctor per year, the number of visits to a grocery store per week, the number of days stayed in a hospital in a given period, etc.
• The variable is discrete.
• The probability distribution suited to count data is the Poisson probability distribution.
• The pdf of the Poisson distribution is
$f(Y_i) = \frac{\mu^{Y_i} e^{-\mu}}{Y_i!}, \qquad Y_i = 0, 1, 2, \ldots$
where f(Y_i) is the probability that the variable Y takes the non-negative integer value Y_i.
It can be shown that
$E(Y) = \mu, \qquad \mathrm{var}(Y) = \mu$
that is, its variance equals its mean.
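The property E(Y) = var(Y) = μ can be checked numerically by summing the pdf over a truncated support. A minimal standard-library sketch; μ = 3 is an arbitrary illustrative value, and the support is cut at 80 because the remaining probability mass is negligible for such a small mean.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """f(y) = mu^y e^(-mu) / y!  for y = 0, 1, 2, ..."""
    return mu ** y * math.exp(-mu) / math.factorial(y)


mu = 3.0                   # illustrative mean, not taken from the text
support = range(0, 80)     # truncated support
total = sum(poisson_pmf(y, mu) for y in support)
mean = sum(y * poisson_pmf(y, mu) for y in support)
variance = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in support)
print(round(total, 10), round(mean, 10), round(variance, 10))
```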
• The Poisson regression model:
$Y_i = E(Y_i) + u_i = \mu_i + u_i$
where the Y's are independently distributed as Poisson random variables with mean μ_i for each individual, expressed as
$\mu_i = E(Y_i) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki}$
where the X's are some of the variables that might affect the mean value.
• For estimation purposes, we write the model as
$Y_i = \frac{\mu^{Y_i} e^{-\mu}}{Y_i!} + u_i$
• Example: data on 100 individuals aged 65 and older. The objective of the study was to record the number of falls (Y) in relation to gender (X2 = 0 for female, 1 for male), a balance index (X3; higher means more stable), a strength index (X4; higher means stronger), and an intervention variable (X1 = 0 for education only, 1 for education plus aerobic exercise training).
Table 15.18
Dependent Variable: Y
Sample: 1 100
Convergence achieved after 7 iterations
Y=EXP(C(0)+C(1)*X1+C(2)*X2+C(3)*X3+C(4)*X4)
Coefficient Std. Error t-Statistic Prob.
C(0) 0.37020 0.34590 1.0701 0.2873
C(1) -1.10036 0.17050 -6.4525 0.0000
C(2) -0.02194 0.11050 -0.1985 0.8430
C(3) 0.01066 0.00270 3.9483 0.0001
C(4) 0.00927 0.00414 2.2380 0.0275
R-squared = 0.4857 Adjusted R-squared = 0.4640
Log likelihood = -197.2096 Durbin-Watson statistic = 1.7358
Interpretation of results
• What we have obtained in Table 15.18 is the estimated mean value for the ith individual, that is:
$\hat{\mu}_i = e^{0.3702 - 1.10036 X_{1i} - 0.02194 X_{2i} + 0.01066 X_{3i} + 0.00927 X_{4i}}$
• Example:
Subject 99 had these values: Y = 4, X1 = 0, X2 = 1, X3 = 50, X4 = 56. We obtain
$\hat{\mu}_{99} = 3.3538$
as the estimated mean value for the 99th subject.
• If we want the probability that a subject similar to subject 99 has fewer than 5 falls per year, we can obtain it as follows:
$P(Y < 5) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4)$
$= \frac{3.3538^0 e^{-3.3538}}{0!} + \frac{3.3538^1 e^{-3.3538}}{1!} + \frac{3.3538^2 e^{-3.3538}}{2!} + \frac{3.3538^3 e^{-3.3538}}{3!} + \frac{3.3538^4 e^{-3.3538}}{4!}$
$\approx 0.7527$
• We can also find the marginal effect of a regressor on the mean value of Y. Suppose we want the effect of a unit increase in the strength index (X4) on the mean:
$\frac{\partial \mu_i}{\partial X_{4i}} = C_4\, e^{C_0 + C_1 X_{1i} + C_2 X_{2i} + C_3 X_{3i} + C_4 X_{4i}} = C_4\, \mu_i$
• The intercept and the variable X2 are individually statistically insignificant.
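Both calculations for subject 99, the five-term Poisson sum for P(Y < 5) and the marginal effect C4·μ̂_99, can be evaluated directly. A minimal standard-library sketch that takes μ̂_99 = 3.3538 as given; `poisson_cdf` is our own helper. Because ∂μ_i/∂X4 is a derivative, it is only a local approximation to the effect of a full one-unit change in X4.

```python
import math

mu_99 = 3.3538    # estimated mean number of falls for subject 99
c4 = 0.00927      # coefficient on the strength index X4, Table 15.18


def poisson_cdf(k: int, mu: float) -> float:
    """P(Y <= k) for a Poisson variable with mean mu."""
    return sum(mu ** y * math.exp(-mu) / math.factorial(y) for y in range(k + 1))


p_fewer_than_5 = poisson_cdf(4, mu_99)   # P(Y < 5) = P(Y = 0) + ... + P(Y = 4)
marginal_effect_x4 = c4 * mu_99          # d mu_i / d X4 = C4 * mu_i
print(round(p_fewer_than_5, 4), round(marginal_effect_x4, 5))
```

Since the marginal effect is proportional to μ_i, it differs across individuals; subjects with a higher fitted mean respond more to the same change in X4.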
Concluding:
• The model makes restrictive assumptions: the mean and the variance of the Poisson process are the same, and the probability of an occurrence is constant at any point in time.
• In general, the results of all nonlinear iterative estimation procedures are valid in large samples only.
Further topics in qualitative response regression models
1. Ordinal Logit and Probit Models
• The response variable can have more than two outcomes, and these outcomes are ordinal in nature (such as a Likert-type scale or ordered categories).
• Ordinal scales → a ranking among the categories (multiple ranked categories)
• Example: 1 = high school education; 2 = college education; 3 = postgraduate education
2. Multinomial Logit and Probit Models
• The categories of the regressand are unordered, with no ranking → nominal categories
• Examples:
  - the choice of transportation mode to work (bicycle, motorbike, car, bus, or train)
  - occupational classifications (unskilled, semiskilled, or highly skilled)
  - occupational choice (self-employed, working for a private firm, or working for a government)
3. Duration Models
• Also called survival analysis or time-to-event data analysis.
• The key variable is the length of time, or spell length, which is modeled as a random variable.
• Examples:
  - What determines the duration of unemployment spells?
  - What factors determine the duration of a strike?
