Ch15 Qualitative Response Regression Models

Qualitative Response
Regression Models
Chapter 15
QUALITATIVE MODELS
1. Binary-Choice Models
2. Multiple-Choice Models
3. Models for Count Data
4. Tobit Models
5. Duration Models
Binary-Choice Models
1. Linear Probability Model (LPM)
2. Probit or Normit Model
3. Logit Model
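Linear Probability Model (LPM)
$Y_i = a + bX_i + e_i$ (1)
$Y_i = 1$ (yes) with probability $p_i$
$Y_i = 0$ (no) with probability $1 - p_i$
→ $Y_i$ follows a binomial distribution, not a normal distribution
$E(Y_i \mid X_i) = a + bX_i + E(e_i)$, with $E(e_i) = 0$
$E(Y_i \mid X_i) = \sum Y_i \, p(Y_i) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i$ (2)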
Linear Probability Model (LPM)
$E(e_i) = (1 - a - bX_i)\,p_i + (0 - a - bX_i)(1 - p_i) = 0$
$p_i = a + bX_i$
$1 - p_i = 1 - a - bX_i$ (3)
$E(e_i^2) = (1 - a - bX_i)^2\,p_i + (0 - a - bX_i)^2 (1 - p_i) = p_i (1 - p_i)$ (4)
$\sigma^2 = E(Y_i)\,(1 - E(Y_i))$ → heteroskedasticity
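The dependence of the error variance on X can be seen numerically. The following is a minimal sketch, assuming hypothetical coefficients a = 0.1 and b = 0.2 (not taken from the source), that evaluates p_i(1 - p_i) at several values of X_i.

```python
def lpm_error_variance(a, b, x):
    """Error variance in the LPM: p * (1 - p), where p = a + b * x."""
    p = a + b * x
    if not 0.0 <= p <= 1.0:
        raise ValueError("a + b*x must lie in [0, 1] to be a probability")
    return p * (1.0 - p)

# Hypothetical coefficients, chosen only so that p stays inside [0, 1].
A, B = 0.1, 0.2
x_values = [0, 1, 2, 3, 4]
variances = [lpm_error_variance(A, B, x) for x in x_values]

for x, v in zip(x_values, variances):
    print(f"X = {x}: var(e) = {v:.2f}")
```

The variance changes with X and peaks where p = 0.5, which is why the LPM disturbances cannot be homoskedastic.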

[Figure: Yi plotted against Xi, showing the fitted line and actual predictions against the true regression line E(Yi) = a + bXi, with regions labelled "overestimated slope" and "underestimated line".]
Problems in Estimating the LPM
1. Binomial distribution → not a normal distribution
   → as N → ∞ → approaches the normal distribution
2. Heteroskedasticity
3. The restriction 0 ≤ E(Yi) ≤ 1 may not be satisfied
4. R² is questionable → goodness of fit (?)
→ The heteroskedasticity can be handled with weighted least squares (WLS)

Weaknesses of WLS Estimation
• There is no guarantee that the estimated values of Y lie between 0 and 1. If a value falls outside the interval (0, 1), the observation must be dropped or set equal to 0.01 or 0.99. In either case WLS is inefficient in small samples.
• It is very sensitive to specification error. If specification error is suspected, WLS is best avoided.
• It is very sensitive to extreme observations.
Probit or Normit Model
• Given the problems with the LPM → a solution is sought → transform the original model so that its predictions lie in the interval (0, 1)
• This is done with a cumulative probability function:
  • Normal → normit or probit
  • Logistic → logit
Probit or Normit Model
• Normal distribution
$Z_i = a + bX_i$
$P_i = F(Z_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_i} e^{-s^2/2}\, ds$
$Z_i$ → not observable → approximated by an "index" $Z_i^*$
Example: election of a class president
$Z_i > Z_i^*$ → candidate I
$Z_i \le Z_i^*$ → candidate II
$Z_i^*$ → random variable → normally distributed
$s \sim N(0, 1)$
$Z_i = F^{-1}(P_i) = a + bX_i$
[Figure: cumulative probability F(Z) against Z (from -3 to 3), comparing the LPM line with the probit curve.]
Logit Model
• Cumulative probability → logistic
$P_i = F(Z_i) = F(a + bX_i) = \frac{1}{1 + e^{-Z_i}} = \frac{1}{1 + e^{-(a + bX_i)}}$
$P_i (1 + e^{-Z_i}) = 1$ → $P_i + P_i e^{-Z_i} = 1$
$e^{-Z_i} = \frac{1 - P_i}{P_i}$ → $e^{Z_i} = \frac{P_i}{1 - P_i}$
$L_i = Z_i = \ln\left(\frac{P_i}{1 - P_i}\right) = a + bX_i$, with $P_i = \frac{n_i}{N_i}$
Example:
I (000 USD)   Ni    ni    Pi     Zi = F^-1(Pi)
 6             40     8   0.20   -0.84
 8             50    12   0.24   -0.70
10             60    18   0.30   -0.52
13             80    28   0.35   -0.38
15            100    45   0.45   -0.12
20             70    36   0.51    0.03
25             65    39   0.60    0.25
30             50    33   0.66    0.40
35             40    30   0.75    0.67
40             25    20   0.80    0.84
Values of Cumulative Probability Functions
Zi     P(Zi), normal   P(Zi), logistic
-3     0.0013          0.0474
-2     0.0228          0.1192
-1     0.1587          0.2689
 0     0.5000          0.5000
 1     0.8413          0.7311
 2     0.9772          0.8808
 3     0.9987          0.9526
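The two cumulative probability functions can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function names `normal_cdf` and `logistic_cdf` are illustrative choices, not names from the source.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic CDF: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# One row per integer Z from -3 to 3, rounded to four decimals.
rows = [(z, round(normal_cdf(z), 4), round(logistic_cdf(z), 4))
        for z in range(-3, 4)]

for z, p_normal, p_logistic in rows:
    print(f"{z:>3}  {p_normal:.4f}  {p_logistic:.4f}")
```

Both functions equal 0.5 at Z = 0, and the logistic curve has the heavier tails: it is further from 0 and from 1 at Z = ±3.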
[Figure: normal distribution curve and logit curve. The logistic function $y = \frac{1}{1 + e^{-x}}$ and the standard normal density $y = (2\pi)^{-0.5} e^{-0.5x^2}$, plotted for x from -4 to 4.]

[Figure: cumulative distribution curves, F(Z) against Z (from -4 to 4), for probit and logit.]
Table 15.1 (Gujarati, 2003)
FAM = Family
Y = Home ownership, where 1 = owns a house; 0 = does not own a house
X = Family income, thousands of $
FAM   Y    X     FAM   Y    X     FAM   Y    X
  1   0    8      15   0    6      28   1   18
  2   1   16      16   1   19      29   0   11
  3   1   18      17   1   16      30   0   10
  4   0   11      18   0   10      31   1   17
  5   0   12      19   0    8      32   0   13
  6   1   19      20   1   18      33   1   21
  7   1   20      21   1   22      34   1   20
  8   0   13      22   1   16      35   0   11
  9   0    9      23   0   12      36   0    8
 10   0   10      24   0   11      37   1   17
 11   1   17      25   1   16      38   1   16
 12   1   18      26   0   11      39   0    7
 13   0   14      27   1   20      40   1   17
 14   1   20
The LPM estimated by OLS
Dependent Variable: Y
Method: Least Squares
Sample: 1 40
Included observations: 40
Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -0.945686     0.122841     -7.698428     0.0000
X            0.102131     0.008160     12.51534      0.0000
R-squared            0.804761    Mean dependent var       0.525000
Adjusted R-squared   0.799624    S.D. dependent var       0.505736
S.E. of regression   0.226385    Akaike info criterion   -0.084453
Sum squared resid    1.947505    Schwarz criterion       -9.31E-06
Log likelihood       3.689066    F-statistic              156.6336
Durbin-Watson stat   1.955187    Prob(F-statistic)        0.000000
Interpretation of Results
• LPM → OLS
$\hat{Y}_i = -0.945686 + 0.102131 X_i$
t = (-7.698428) (12.51534)
R² = 0.804761
• b0 = -0.9457 → a probability cannot be negative, so it is treated as b0 = 0: a household with zero income has a 0% probability of owning a house
• b1 = 0.1021 → for each one-unit ($1,000) increase in income, the probability of owning a house rises on average by 0.1021 (about 10 percentage points)
Interpretation of Results
• For example, at X = 12 the estimated probability of owning a house is
-0.9457 + 0.1021 (12) = 0.2795
→ a household with an income of $12,000 has a probability of about 28% of owning a house
• Some estimated values are greater than 1 or less than 0:
Family 1: Xi = 8 → Ŷi = -0.12864
Family 7: Xi = 20 → Ŷi = 1.09693
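These fitted values follow directly from the estimated coefficients. The following is a minimal sketch that plugs a few income levels into the reported OLS equation and flags fitted values outside [0, 1]; the function name `lpm_fitted` is an illustrative choice.

```python
# OLS estimates reported in the LPM output above.
INTERCEPT = -0.945686
SLOPE = 0.102131

def lpm_fitted(x):
    """Fitted LPM value for family income x (thousands of dollars)."""
    return INTERCEPT + SLOPE * x

fitted = {x: lpm_fitted(x) for x in (8, 12, 20)}

for x, y_hat in fitted.items():
    flag = "" if 0.0 <= y_hat <= 1.0 else "  <- outside [0, 1]"
    print(f"X = {x:>2}: fitted = {y_hat:.5f}{flag}")
```

Incomes of 8 and 20 produce fitted "probabilities" below 0 and above 1, which is the boundary problem of the LPM.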
Interpretation of Results
• LPM with weights $w_i$ → weighted least squares (WLS)
$w_i = \hat{Y}_i (1 - \hat{Y}_i)$
$Y_i^* = \frac{Y_i}{\sqrt{w_i}}$
$X_i^* = \frac{X_i}{\sqrt{w_i}}$
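The weighting step can be sketched in a few lines. This is an illustrative sketch, not the software routine that produced the output in these slides: it takes four rows of Table 15.1, uses the reported OLS estimates to form the fitted values, drops rows whose fitted value falls outside (0, 1), and divides each variable by the square root of the weight. The name `wls_transform` is hypothetical.

```python
import math

# OLS estimates reported for the LPM.
A_HAT, B_HAT = -0.945686, 0.102131

# (family, Y, X) rows taken from Table 15.1.
sample = [(1, 0, 8), (2, 1, 16), (4, 0, 11), (7, 1, 20)]

def wls_transform(rows, a, b):
    """Return (family, Y/sqrt(w), 1/sqrt(w), X/sqrt(w)) for usable rows."""
    transformed = []
    for family, y, x in rows:
        y_hat = a + b * x
        if not 0.0 < y_hat < 1.0:
            continue  # w = y_hat * (1 - y_hat) would be zero or negative
        sw = math.sqrt(y_hat * (1.0 - y_hat))
        transformed.append((family, y / sw, 1.0 / sw, x / sw))
    return transformed

result = wls_transform(sample, A_HAT, B_HAT)
for row in result:
    print(row)
```

Regressing the transformed Y on the transformed constant and the transformed X, without an additional intercept, gives the WLS estimates.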
The WLS
Dependent Variable: Y/SW
Sample (adjusted): 2 40
Excluded observations: 11 after adjusting endpoints
Variable   Coefficient   Std. Error   t-Statistic   Prob.
1/SW       -1.245592     0.120555     -10.33211     0.0000
X/SW        0.119589     0.006852      17.45438     0.0000
R-squared            0.981050    Mean dependent var      2.191518
S.E. of regression   0.498942    Akaike info criterion   1.516095
Sum squared resid    6.472517    Schwarz criterion       1.611252
Log likelihood      -19.22533    F-statistic             1345.999
• Observations whose LPM estimates are greater than 1 or less than 0 are dropped.
$\frac{Y_i}{\sqrt{w_i}} = -1.245592\,\frac{1}{\sqrt{w_i}} + 0.119589\,\frac{X_i}{\sqrt{w_i}}$
t = (-10.33211) (17.45438)
R² = 0.981050
• Logit
Two types of data are distinguished:
1. Individual data (data at the individual or micro level)
2. Grouped data (grouped or replicated data)
Based on the data in Table 15.4 (Gujarati, 2003: 598)
Logit with individual data
Dependent Variable: Y
Method: ML - Binary Logit
Sample: 1 580
Convergence achieved after 3 iterations
Covariance matrix computed using second derivatives
Variable   Coefficient   Std. Error   z-Statistic   Prob.
C          -1.602343     0.204034     -7.853317     0.0000
X           0.079066     0.010112      7.818651     0.0000
Mean dependent var      0.463793    S.D. dependent var      0.499118
Log likelihood         -365.3014    Hannan-Quinn criter.    1.272422
Restr. log likelihood  -400.5033    Avg. log likelihood    -0.629830
LR statistic (1 df)     70.40395    McFadden R-squared      0.087894
Probability(LR stat)    0.000000
Obs with Dep=0          311         Total obs               580
Obs with Dep=1          269
Points to Note
• Estimation is by maximum likelihood
• Hypotheses about the coefficients are tested with the Z statistic → compared with the Z (standard normal) table
• The conventional R² cannot be used
• McFadden R² ($R^2_{McF}$) = 1 - (LLF_ur / LLF_r)
  LLF_ur = unrestricted log-likelihood function, where all regressors are included in the model
  LLF_r = restricted log-likelihood function, where only the intercept is included in the model
• Count R² = number of correct predictions / number of observations
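The McFadden R² and the LR statistic can both be recovered from the two log-likelihood values in the logit output for individual data. The following is a minimal sketch; the variable names are illustrative, and the LR formula 2(LLF_ur - LLF_r) is the standard likelihood-ratio definition rather than something spelled out in these slides.

```python
# Log-likelihood values reported in the logit output (individual data).
LLF_UNRESTRICTED = -365.3014   # all regressors included
LLF_RESTRICTED = -400.5033     # intercept only

# McFadden R-squared: 1 - LLF_ur / LLF_r
mcfadden_r2 = 1.0 - LLF_UNRESTRICTED / LLF_RESTRICTED

# Likelihood-ratio statistic: 2 * (LLF_ur - LLF_r)
lr_statistic = 2.0 * (LLF_UNRESTRICTED - LLF_RESTRICTED)

print(f"McFadden R2 = {mcfadden_r2:.6f}")
print(f"LR statistic = {lr_statistic:.4f}")
```

Both values agree with the reported output up to the rounding of the log-likelihoods.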
Points to Note
• The hypothesis that all slope coefficients are zero → LR statistic → compared with the chi-square (χ²) distribution (df = number of explanatory variables)
• Interpretation
b1 = 0.079066 → $e^{0.079066} = 1.0823$
→ for each one-unit ($1,000) increase in income, the odds of owning a house are multiplied by 1.0823, that is, they rise by about 8.2%
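The antilog step is a one-line computation. The following is a minimal sketch that converts the estimated logit slope into an odds multiplier and a percentage change in the odds; the variable names are illustrative.

```python
import math

B1 = 0.079066  # slope from the logit output (individual data)

odds_multiplier = math.exp(B1)                 # factor applied to the odds per unit of X
percent_change = (odds_multiplier - 1.0) * 100  # same effect as a percentage

print(f"odds multiplier = {odds_multiplier:.4f}")
print(f"change in odds  = {percent_change:.2f}%")
```

The multiplier applies to the odds P/(1 - P), not to the probability P itself.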
Logit with grouped data → weighted
Xi    Ni    ni    pi     1-pi   pi/(1-pi)
 6     40     8   0.20   0.80   0.25
 8     50    12   0.24   0.76   0.32
10     60    18   0.30   0.70   0.43
13     80    28   0.35   0.65   0.54
15    100    45   0.45   0.55   0.82
20     70    36   0.51   0.49   1.06
25     65    39   0.60   0.40   1.50
30     50    33   0.66   0.34   1.94
35     40    30   0.75   0.25   3.00
40     25    20   0.80   0.20   4.00
Continued...
Li = ln(pi/(1-pi))   wi = Ni pi (1-pi)   √wi      L1i = Li √wi   X1i = Xi √wi
-1.3863               6.40               2.5298   -3.5071         15.1789
-1.1527               9.12               3.0199   -3.4810         24.1595
-0.8473              12.60               3.5496   -3.0076         35.4965
-0.6190              18.20               4.2661   -2.6409         55.4599
-0.2007              24.75               4.9749   -0.9983         74.6241
 0.0572              17.49               4.1816    0.2390         83.6318
 0.4055              15.60               3.9497    1.6015         98.7421
 0.6633              11.22               3.3496    2.2218        100.4888
 1.0986               7.50               2.7386    3.0087         95.8514
 1.3863               4.00               2.0000    2.7726         80.0000
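The columns of this table can be rebuilt from Xi, Ni and ni alone. The following is a minimal sketch of that computation using only the standard library; the list names are illustrative.

```python
import math

# Grouped data from the table: income, group size, number of owners.
income = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
group_size = [40, 50, 60, 80, 100, 70, 65, 50, 40, 25]
owners = [8, 12, 18, 28, 45, 36, 39, 33, 30, 20]

rows = []
for x, big_n, small_n in zip(income, group_size, owners):
    p = small_n / big_n                  # relative frequency p_i
    logit = math.log(p / (1.0 - p))      # L_i = ln(p_i / (1 - p_i))
    w = big_n * p * (1.0 - p)            # weight w_i = N_i p_i (1 - p_i)
    sw = math.sqrt(w)
    rows.append((logit, w, sw, logit * sw, x * sw))

for row in rows:
    print("  ".join(f"{value:9.4f}" for value in row))
```

The last two columns are the weighted regressand and regressor used in the grouped-logit regression.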
Dependent Variable: L1
Sample: 1 10
Variable   Coefficient   Std. Error   t-Statistic   Prob.
SW         -1.593238     0.111494     -14.28984     0.0000
X1          0.078669     0.005448      14.44122     0.0000
R-squared        0.963656    Mean dependent var   -0.379142
Log likelihood  -6.920086    F-statistic           212.1217
• Interpretation of the logit
• b = 0.078669 → for each one-unit ($1,000) increase in weighted income, the weighted log-odds of home ownership rise by about 0.08 units → not a natural interpretation!
• Antilog of the regression function
$\frac{P_i}{1 - P_i} = e^{-1.593238\sqrt{w_i}}\; e^{0.078669 X_{1i}}$
$e^{0.078669} = 1.0818$ → for each one-unit increase in weighted income, the weighted odds of home ownership are multiplied by 1.0818, that is, they rise by 8.18%
• Computing the probability
At X = 20 ($20,000), with √wi = 4.1816 and X1i = 83.6318:
L1i = -1.593238 √wi + 0.078669 X1i ≈ -0.0830
dividing by √wi = 4.1816 → Li = -0.019858 (the same as -1.593238 + 0.078669 × 20)
-0.019858 = ln(p/(1-p))
$e^{-0.019858}$ = p/(1-p) = 0.9803
(1 - p) 0.9803 = p
0.9803 = p (1.9803)
p = 0.9803 / 1.9803 = 0.4950
→ a household with an income of $20,000 has a 49.50% probability of owning a house
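Dividing the weighted regression through by √wi leaves the ordinary logit Li = -1.593238 + 0.078669 Xi, so the probability at any income level can be computed in one step. The following is a minimal sketch; `logit_probability` is an illustrative name.

```python
import math

# Grouped-logit estimates (coefficients on SW and X1 in the output above).
INTERCEPT = -1.593238
SLOPE = 0.078669

def logit_probability(x):
    """P = 1 / (1 + e^(-L)), with L = INTERCEPT + SLOPE * x."""
    log_odds = INTERCEPT + SLOPE * x
    return 1.0 / (1.0 + math.exp(-log_odds))

log_odds_20 = INTERCEPT + SLOPE * 20
p_20 = logit_probability(20)

print(f"log-odds at X = 20: {log_odds_20:.6f}")
print(f"probability at X = 20: {p_20:.4f}")
```

Because the logistic function is bounded, the result always lies strictly between 0 and 1, unlike the LPM fitted values.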
• Probit (grouped data)
$Z_i = -1.0088 + 0.0481 X_i$
t = (-17.330) (19.105), R² = 0.9786
• Probit (+5) → $K_i = Z_i + 5$
$K_i = 3.9911 + 0.0481 X_i$
t = (68.560) (19.105), R² = 0.9786
Logit and Probit Models
Which model is preferable?
• The conditional probability Pi approaches zero or one at a slower rate in logit than in probit.
• The logit model has the advantage of mathematical simplicity.
• Care is needed in interpreting the coefficients estimated by the two models; they are not directly comparable.
[Figure: cumulative probability P, from 0 to 1, for the probit and logit curves.]
The Tobit Model
• An extension of the probit model → developed by James Tobin
• Example: home ownership → finding out the amount of money a person or family spends on a house in relation to socioeconomic variables.
• Problem: if a consumer does not purchase a house, we have no data on housing expenditure. We have data only on consumers who actually purchase a house.
The Tobit Model
• Consumers are divided into two groups:
  • n1 consumers: we have information on both the regressors and the regressand
  • n2 consumers: we have information only on the regressors, not on the regressand
  → censored sample
• Also known as a censored regression model or a limited dependent variable regression model
The Tobit Model
• We can express the Tobit model as
$Y_i = \begin{cases} \beta_1 + \beta_2 X_i + u_i & \text{if RHS} > 0 \\ 0 & \text{otherwise} \end{cases}$
• The OLS estimates of the parameters obtained from the subset of n1 observations will be biased as well as inconsistent; that is, they remain biased even asymptotically.
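The bias of OLS on the n1 subset can be illustrated with a small simulation. This is a sketch under assumed values: the true intercept (0), the true slope (1), standard normal errors and the uniform range of X are all hypothetical choices, not taken from the source.

```python
import random

def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)
TRUE_INTERCEPT, TRUE_SLOPE = 0.0, 1.0   # assumed values for illustration

xs = [rng.uniform(-2.0, 2.0) for _ in range(5000)]
latent = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, 1.0) for x in xs]

# Keep only observations with a positive right-hand side (the n1 subset).
subset_x = [x for x, y in zip(xs, latent) if y > 0]
subset_y = [y for y in latent if y > 0]

# The latent outcome is never fully observed in practice; it is used here
# only as a benchmark against which to compare the subset estimate.
slope_latent = ols_slope(xs, latent)
slope_subset = ols_slope(subset_x, subset_y)

print(f"slope using the latent outcome: {slope_latent:.3f}")
print(f"slope using the n1 subset only: {slope_subset:.3f}")
```

The subset slope is pulled towards zero relative to the true slope, and a larger sample does not remove the gap.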
How do we estimate the Tobit model?
• The method of maximum likelihood (ML)
• A comparatively simple alternative → James Heckman
It consists of a two-step estimating procedure:
1. Estimate the probability of a consumer owning a house, which is done on the basis of the probit model
2. Estimate the Tobit model by adding to it a variable that is derived from the probit estimate → the inverse Mills ratio or the hazard rate
• The Heckman procedure yields consistent estimates of the parameters, but they are not as efficient as the ML estimates.
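The variable added in the second step can be computed from the probit index. The following is a minimal sketch, assuming the common convention in which the inverse Mills ratio is the standard normal density divided by the standard normal CDF, both evaluated at the estimated probit index z; the function names are illustrative, and the slides do not state which convention is used.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills_ratio(z):
    """phi(z) / Phi(z), evaluated at a probit index z (assumed convention)."""
    return normal_pdf(z) / normal_cdf(z)

ratios = {z: inverse_mills_ratio(z) for z in (-1.0, 0.0, 1.0)}
for z, ratio in ratios.items():
    print(f"z = {z:+.1f}: inverse Mills ratio = {ratio:.4f}")
```

The ratio shrinks as the index rises, so the correction term matters most for observations that were unlikely to be selected.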
• Illustration of the Tobit model: Ray Fair's model of extramarital affairs.
• Ray Fair collected a sample of 601 men and women, then married for the first time, and analyzed their responses to a question about extramarital affairs.
• 451 individuals had no extramarital affairs, and 150 individuals had one or more affairs.
The variables used in that study are defined as follows:
• Y = number of affairs in the past year
• Z1 = 0 for female and 1 for male
• Z2 = number of years married
• Z3
• Z4 = 0 if no children and 1 if children
• Z5 = religiousness on a scale of 1 to 5, 1 being antireligion
• Z6 = education, years: grade school = 9; high school = 12; PhD or other = 20
• Z7 = occupation, "Hollingshead" scale, 1-7
• Z8 = self-rating of marriage, 1 = very unhappy, 5 = very happy
Modeling Count Data: the Poisson Regression Model
• The regressand is of the count type, such as the number of vacations taken by a family per year, the number of patents received by a firm per year, the number of visits to the dentist or doctor per year, the number of visits to a grocery store per week, the number of days stayed in a hospital in a given period, etc.
• The variable is discrete
• The probability distribution best suited to count data is the Poisson probability distribution
Modeling Count Data: the Poisson Regression Model
• The pdf of the Poisson distribution is
$f(Y_i) = \frac{\mu^{Y} e^{-\mu}}{Y!}, \quad Y = 0, 1, 2, \ldots$
where $f(Y_i)$ is the probability that the variable Y takes non-negative integer values.
It can be proved that
$E(Y) = \mu$
$\operatorname{var}(Y) = \mu$
Its variance is the same as its mean value.
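The equality of mean and variance can be checked numerically. The following is a minimal sketch, assuming a hypothetical mean μ = 2.5 and truncating the infinite support at 60 terms, beyond which the remaining probability mass is negligible for this μ.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability: mu^y * e^(-mu) / y!"""
    return mu ** y * math.exp(-mu) / math.factorial(y)

MU = 2.5              # hypothetical mean, chosen for illustration
support = range(60)   # truncated support; the tail mass is negligible here

total = sum(poisson_pmf(y, MU) for y in support)
mean = sum(y * poisson_pmf(y, MU) for y in support)
variance = sum((y - mean) ** 2 * poisson_pmf(y, MU) for y in support)

print(f"total probability = {total:.6f}")
print(f"mean = {mean:.6f}, variance = {variance:.6f}")
```

Both moments come out equal to μ, which is the restriction the Poisson regression model imposes on the data.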
Modeling Count Data: the Poisson Regression Model
• The Poisson regression model:
$Y_i = E(Y_i) + u_i = \mu_i + u_i$
where the Y's are independently distributed as Poisson random variables with mean $\mu_i$:
$\mu_i = E(Y_i) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki}$
where the X's are some of the variables that might affect the mean value.
Modeling Count Data: the Poisson Regression Model
• For estimation purposes, we write the model as
$Y_i = \frac{\mu^{Y} e^{-\mu}}{Y!} + u_i$
• Example: the data relate to 100 individuals 65 years of age and older. The objective of the study was to record the number of falls (Y) in relation to gender (X2 = 0 for female and 1 for male), a balance index (X3), a strength index (X4), and an intervention variable (X1 = 0 for education only and 1 for education plus aerobic exercise training).
Table 15.18
Dependent Variable: Y
Sample: 1 100
Convergence achieved after 7 iterations
Y=EXP(C(0)+C(1)*X1+C(2)*X2+C(3)*X3+C(4)*X4)
        Coefficient   Std. Error   t-Statistic   Prob.
C(0)     0.37020      0.34590       1.0701       0.2873
C(1)    -1.10036      0.17050      -6.4525       0.0000
C(2)    -0.02194      0.11050      -0.1985       0.8430
C(3)     0.01066      0.00270       3.9483       0.0001
C(4)     0.00927      0.00414       2.2380       0.0275
R-squared = 0.4857             Adjusted R-squared = 0.4640
Log likelihood = -197.2096     Durbin-Watson statistic = 1.7358
Interpretation of Results
• What we have obtained in Table 15.18 is the estimated mean value for the ith individual, that is:
$\hat{\mu}_i = e^{0.3702 - 1.10036 X_{1i} - 0.02194 X_{2i} + 0.01066 X_{3i} + 0.00927 X_{4i}}$
• Example: Subject 99 had these values: Y = 4; X1 = 0; X2 = 1; X3 = 50; and X4 = 56. We obtain
$\hat{\mu}_{99} = 3.3538$
as the estimated mean value for the 99th subject.
Interpretation of Results
• If we want to find out the probability that a subject similar to subject 99 has fewer than 5 falls per year, we can obtain it as follows:
$P(Y < 5) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4)$
$= \frac{(3.3538)^0 e^{-3.3538}}{0!} + \frac{(3.3538)^1 e^{-3.3538}}{1!} + \frac{(3.3538)^2 e^{-3.3538}}{2!} + \frac{(3.3538)^3 e^{-3.3538}}{3!} + \frac{(3.3538)^4 e^{-3.3538}}{4!}$
$= 0.7491$
• We can also find out the marginal effect of a regressor on the mean value of Y. Suppose we want to find out the effect of a unit increase in the strength index (X4) on the mean:
$\frac{\partial \mu}{\partial X_4} = C_4\, e^{C_0 + C_1 X_{1i} + C_2 X_{2i} + C_3 X_{3i} + C_4 X_{4i}} = C_4\, \mu$
• The intercept and the variable X2 are individually statistically insignificant.
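The marginal-effect formula can be checked against a finite difference. The following is a minimal sketch that takes the reported mean for subject 99 (3.3538) and the coefficient C(4) as given, computes C4·μ, and compares it with a numerical derivative of the exponential mean function; the step size h is an arbitrary illustrative choice.

```python
import math

C4 = 0.00927       # coefficient on the strength index X4 (Table 15.18)
MU_HAT = 3.3538    # estimated mean for subject 99, taken from the text

# Analytic marginal effect: d(mu)/d(X4) = C4 * mu
analytic_effect = C4 * MU_HAT

def mean_after_shift(delta_x4):
    """Mean when X4 changes by delta_x4, all other regressors held fixed."""
    return math.exp(math.log(MU_HAT) + C4 * delta_x4)

# Central finite difference around the current X4 value.
h = 1e-4
numeric_effect = (mean_after_shift(h) - mean_after_shift(-h)) / (2.0 * h)

print(f"analytic marginal effect = {analytic_effect:.6f}")
print(f"numeric marginal effect  = {numeric_effect:.6f}")
```

Because the mean function is exponential, the marginal effect is not constant: it scales with the level of the mean itself.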
Concluding remarks:
• The model makes restrictive assumptions: the mean and the variance of the Poisson process are the same, and the probability of an occurrence is constant at any point in time.
• Generally, the results of all nonlinear iterative estimating procedures are valid in large samples only.
Further Topics in Qualitative Response Regression Models
1. Ordinal Logit and Probit Models
• The response variable can have more than two outcomes, and these outcomes are ordinal in nature (such as a Likert-type scale or categories in order).
• Ordinal scales → ranking among the categories (multiple ranked categories)
• Example: 1 = high school education; 2 = college education; and 3 = postgraduate education
2. Multinomial Logit and Probit Models
• The regressand consists of unordered categories with no ranking → nominal categories
• Examples:
  • The choice of transportation mode to work (bicycle, motorbike, car, bus, or train)
  • Occupational classifications (unskilled, semiskilled, or highly skilled)
  • Occupational choice (self-employed, working for a private firm, or working for the government)
3. Duration Models
• Survival analysis or time-to-event data analysis
• The key variable: the length of time or spell length, which is modeled as a random variable.
• Examples:
  • What determines the duration of unemployment spells?
  • What factors determine the duration of a strike?
