
# Binary Choice Models

1. Binary Dependent Variables
2. Probit and Logit Regression
3. Maximum Likelihood Estimation
4. Estimating Binary Models in Eviews
5. Measures of Goodness of Fit
6. Other Limited Dependent Variable Models
7. Exercise

## Binary Dependent Variables

The variable of interest Y is binary: the two possible outcomes are labeled 0 and 1. We want to model Y as a function of explanatory variables X = (X1, . . . , Xp).

Example: Y = employed (1) or unemployed (0); X = educational level, age, marital status, ...

Example: Y = expansion (1) or recession (0); X = unemployment level, inflation, ...

## Can we still use linear regression?

Then

E[Y|X] = β0 + β1 X1 + . . . + βp Xp

and the OLS fitted values are given by

Ŷ = β̂0 + β̂1 X1 + . . . + β̂p Xp.

Problem: the left-hand side of the above equations takes values between 0 and 1, while the right-hand side may take any value on the real line. Note that

E[Y|X] = 0 · P(Y = 0|X) + 1 · P(Y = 1|X) = P(Y = 1|X),

so the conditional expected values are conditional probabilities.

[Figure: scatter plot of a binary data cloud with a linear fit and an S-shaped fit; the linear fit leaves the (0, 1) range, while the S-shaped fit stays within it.]

## Binary regression model:

P(Y = 1|X) = F(β0 + β1 X1 + . . . + βp Xp)

with

(a) F(u) = 1 / (1 + exp(−u)): Logit

(b) F(u) = Φ(u), the standard normal cumulative distribution function: Probit

(c) ...
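Both link functions can be sketched in a few lines; this is an illustrative implementation (not part of the slides), using SciPy's standard normal CDF for the Probit link:

```python
# Illustrative sketch of the two link functions F(u) above.
from math import exp
from scipy.stats import norm

def logit_cdf(u):
    """Logistic CDF: F(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + exp(-u))

def probit_cdf(u):
    """Probit link: standard normal CDF, Phi(u)."""
    return norm.cdf(u)

# Both map the whole real line into (0, 1) and equal 0.5 at u = 0,
# which is exactly what a probability model for Y requires.
print(logit_cdf(0.0))   # 0.5
print(probit_cdf(0.0))  # 0.5
```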

[Figure: the same data cloud with Logit and Probit fits. The difference is small; the Probit function is steeper.]

## Interpretation of parameters

dP(Y = 1|X)/dX1 = β1 f(β0 + β1 X1 + . . . + βp Xp)

with f = F′ always positive. Marginal effects are non-constant: they differ for each value of X. The sign of β1 equals the sign of the marginal effect. Marginal effects can be summarized by evaluating them at the average value X = X̄.

For the Logit model there is an interpretation in terms of the Odds-Ratio (OR):

log OR = log( P(Y = 1|X) / P(Y = 0|X) ) = β0 + β1 X1 + . . . + βp Xp
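A small numerical sketch of these two facts, with one regressor and made-up coefficients β0 = −2, β1 = 0.5 (hypothetical values, not estimates from the slides): the Logit marginal effect β1 F(u)(1 − F(u)) changes with X1, while the log odds-ratio changes by exactly β1 per unit of X1.

```python
import math

beta0, beta1 = -2.0, 0.5          # hypothetical coefficients

def F(u):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-u))

def marginal_effect(x1):
    """dP(Y=1|X)/dX1 = beta1 * f(u), with f(u) = F(u)(1 - F(u)) for the Logit."""
    u = beta0 + beta1 * x1
    return beta1 * F(u) * (1.0 - F(u))

def log_odds(x1):
    """log( P(Y=1|X) / P(Y=0|X) ), which is linear in x1 for the Logit."""
    p = F(beta0 + beta1 * x1)
    return math.log(p / (1.0 - p))

# The marginal effect is not constant across X1 ...
print(marginal_effect(0.0), marginal_effect(4.0))
# ... but the log odds-ratio moves by exactly beta1 per unit of X1:
print(log_odds(1.0) - log_odds(0.0))   # 0.5
```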

The fitted probabilities are

P̂(Y = 1|X = xi) = F(β̂0 + β̂1 xi1 + . . . + β̂p xip).

Set ŷi = 1 if P̂(Y = 1|X = xi) > 0.5 and zero otherwise. (Cut-off values other than 0.5 = 50% are sometimes taken.)
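The cut-off rule above amounts to a one-line classifier; a minimal sketch:

```python
# Turn a fitted probability into a 0/1 prediction via a cut-off (default 0.5).
def classify(prob, cutoff=0.5):
    """Return 1 if the fitted probability exceeds the cut-off, else 0."""
    return 1 if prob > cutoff else 0

print(classify(0.73))        # 1
print(classify(0.40))        # 0
print(classify(0.40, 0.3))   # 1: a lower cut-off flags more observations as 1
```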

## Maximum Likelihood Estimation (MLE)

General principle: let L(θ) be the likelihood, or joint density, of the observations y1, . . . , yn, depending on an unknown parameter θ:

L(θ) = ∏_{i=1}^n f(yi, θ)

(this assumes independent observations). The maximum likelihood estimator θ̂ is the value maximizing L(θ):

θ̂ = argmax_θ log L(θ) = argmax_θ ∑_{i=1}^n log f(yi, θ).

Denote Lmax = L(θ̂).
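The general principle can be sketched numerically. As an illustration (my choice of model, not from the slides), take the exponential density f(y, λ) = λ exp(−λy), whose MLE has the closed form λ̂ = 1/ȳ, and recover it by maximizing the log-likelihood with SciPy:

```python
# Numerical MLE sketch: theta_hat = argmax sum_i log f(y_i, theta),
# here for an exponential density f(y, lam) = lam * exp(-lam * y).
import math
from scipy.optimize import minimize_scalar

y = [0.8, 1.3, 0.4, 2.1, 0.9]   # a tiny made-up sample

def neg_log_likelihood(lam):
    # We minimize the NEGATIVE log-likelihood, which maximizes L(lam).
    return -sum(math.log(lam) - lam * yi for yi in y)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50.0), method="bounded")
print(res.x)                     # numerical MLE
print(len(y) / sum(y))           # closed form 1 / y-bar; should agree
```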

## MLE for Bernoulli Variables

Let yi be the outcome of a 0/1 (failure/success) experiment, with p the probability of success. Then f(1, p) = p and f(0, p) = 1 − p, hence

f(yi, p) = p^yi (1 − p)^(1−yi).

The MLE p̂ maximizes

∑_{i=1}^n {yi log p + (1 − yi) log(1 − p)}.

It is not difficult to check that p̂ = (1/n) ∑_{i=1}^n yi, the percentage of successes in the sample.
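A quick numerical check of this claim, on a small made-up 0/1 sample: maximizing the Bernoulli log-likelihood over a grid lands exactly on the sample fraction of successes.

```python
# Verify that the Bernoulli MLE equals the sample proportion of successes.
import math

y = [1, 0, 1, 1, 0, 1, 0, 1]   # 5 successes out of 8

def log_likelihood(p):
    return sum(yi * math.log(p) + (1 - yi) * math.log(1.0 - p) for yi in y)

# Evaluate on a fine grid over (0, 1) and take the argmax.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat)   # 0.625, i.e. 5/8
```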

## MLE for Probit Model

We condition on the explanatory variables, hence keep them fixed. Then

f(yi, pi) = pi^yi (1 − pi)^(1−yi) with pi = Φ(β0 + β1 Xi1 + . . . + βp Xip),

so the MLE maximizes the log-likelihood

∑_{i=1}^n {yi log Φ(β0 + β1 Xi1 + . . . + βp Xip) + (1 − yi) log(1 − Φ(β0 + β1 Xi1 + . . . + βp Xip))}.

The MLE needs to be computed using a numerical algorithm on the computer. (Similar for the Logit model.)
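What such a numerical algorithm does can be sketched directly. The example below (synthetic data, hypothetical "true" coefficients (−0.5, 1.0)) maximizes the Probit log-likelihood above with a general-purpose optimizer, which is conceptually what packages like Eviews do internally:

```python
# Numerical maximization of the Probit log-likelihood on synthetic data.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
beta_true = np.array([-0.5, 1.0])              # assumed values for the simulation
p = norm.cdf(beta_true[0] + beta_true[1] * x)  # pi = Phi(beta0 + beta1 * xi)
yobs = rng.binomial(1, p)

def neg_loglik(beta):
    eta = beta[0] + beta[1] * x
    pi = np.clip(norm.cdf(eta), 1e-10, 1 - 1e-10)  # guard the logs
    return -np.sum(yobs * np.log(pi) + (1 - yobs) * np.log(1 - pi))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)   # should land near (-0.5, 1.0)
```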

If the model is correctly specified, then:

1. The MLE is consistent and asymptotically normal.
2. The MLE is asymptotically the most precise estimator, hence efficient.
3. Inference (testing, confidence intervals) can be done.

If the model is misspecified, then the MLE may lose the above properties.

## Estimating Binary Models in Eviews

Example: Deny = application for mortgage denied (1) or accepted (0). Sample of 2380 applicants, Boston. Explanatory variables:

- black: dummy race variable, 1 if the applicant is black, 0 otherwise
- pi_rat: ratio of monthly loan payments to monthly income
- married: 1 if married, 0 otherwise
- ltv_med: 1 if the loan-to-value ratio is medium (between 80% and 95%)
- ltv_high: 1 if the loan-to-value ratio is high (above 95%)

(A loan-to-value ratio below 80% is the reference category.)

(1) We first regress deny on a constant, black and pi_rat. In Eviews, within the equation specification, under Estimation Settings we choose Method: BINARY - Binary Choice and select Logit.

Both explanatory variables are highly significant. They have a positive effect on the probability of denial, as expected. They are also jointly highly significant (LR statistic = 152, P < 0.001). The pseudo R-squared is pretty low (R² = 0.08). Below are some descriptives (Categorical regressor stats):

## Predictive accuracy: (Expectation-prediction table)

88% is correctly classified, with a sensitivity of only 4.2% and a specificity of 99.7%. The gain is only 0.25 percentage points w.r.t. a majority forecast (i.e. all applications accepted).

(2) Repeat the analysis, now with all predictor variables.

## Measures of Fit

## Pseudo R-squared

Compare the value of the likelihood of the full model with that of an empty model:

M(full): P(Y = 1|X) = F(β0 + β1 X1 + . . . + βp Xp)

M(empty): P(Y = 1|X) = F(β0)

The likelihood ratio statistic for testing H0: the empty model holds (β1 = . . . = βp = 0) is

LR = 2{log Lmax(Full) − log Lmax(Empty)}.

We reject H0 for large values of LR.

The LR statistic can be used to compare any pair of nested models. Suppose that M1 is a submodel of M2, and we want to test H0: M1 holds (against M2). Then, under H0:

LR = 2{log Lmax(M2) − log Lmax(M1)} ~ χ²_k,

where k is the number of restrictions (i.e. the difference in the number of parameters between M2 and M1).

In practice, we work with the P-value. For example, if k = 4 and LR = 7.8, the P-value is the area under the χ²₄ density to the right of 7.8, which equals about 0.099.

[Figure: density of the chi-squared distribution with 4 degrees of freedom, with the tail area beyond LR = 7.8 marked as the P-value.]
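The worked example can be checked directly: the P-value is the upper-tail probability of the χ²_k distribution at the observed LR value.

```python
# P-value of the LR test: P(chi2_k > LR) for k = 4, LR = 7.8.
from scipy.stats import chi2

lr, k = 7.8, 4
p_value = chi2.sf(lr, df=k)   # survival function = upper-tail probability
print(round(p_value, 3))      # ~0.099, so H0 is not rejected at the 5% level
```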

## Percentage correctly predicted

This is a measure of predictive accuracy, defined as

(1/n) ∑_{i=1}^n I(yi = ŷi).

One minus this fraction is an estimate of the error rate of the prediction rule. [This estimate is over-optimistic, since it is based on the estimation sample. It is better to compute it using an out-of-sample prediction.]
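This computation can be sketched in a few lines; the fitted probabilities below are made up for illustration.

```python
# Percentage correctly predicted: average of I(y_i == y_hat_i).
y_obs  = [1, 0, 0, 1, 1, 0, 1, 0]
p_hat  = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1, 0.8, 0.3]   # made-up fitted probabilities
y_pred = [1 if p > 0.5 else 0 for p in p_hat]       # 0.5 cut-off rule

accuracy = sum(int(y == yh) for y, yh in zip(y_obs, y_pred)) / len(y_obs)
print(accuracy)   # 0.75: six of the eight observations are classified correctly
```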

## Other Limited Dependent Variable Models

Censored regression models. Examples: car expenditures, income of females, ... The value zero will often be observed: a mixture of a discrete (at 0) and a continuous variable (Tobit models).

Truncated regression models. Data above or below a certain threshold are unobserved or censored; these data are not available, so we have a selected sample.

Count data. Examples: number of strikes in a firm, number of car accidents, number of children (Poisson-type models).

Multiple choice data. Example: mode of transport (multinomial logit/probit).

Ordered response data. Examples: educational level, credit ratings (B/A/AA/...) (ordered probit).

## Exercise

We will analyse the data in the file grade.wf1. We have a sample of students, and we want to study the effect of the introduction of a new teaching method, called PSI. The dependent variable is GRADE, indicating whether a student's grade improved or not after the introduction of the new method. The explanatory variables are:

- PSI: a binary variable indicating whether the student was exposed to the new teaching method or not.
- TUCE: the score on a pretest that indicates entering knowledge of the material to be taught.

We now run a LOGIT regression of GRADE on a constant, PSI and TUCE.

1. Why do we add TUCE to the regression model, if we are only interested in the effect of PSI?
2. Interpret the estimated regression coefficients.
3. Take a student with TUCE = 20.
   (a) Estimate the probability that he will increase his grade if he follows the PSI method.
   (b) What is the probability that he increases his grade if he does not follow the PSI method?
   (c) Will this student improve his grade, if PSI = 1?
   (d) Compute the log odds-ratio (for improving the grade or not) for this student, once for PSI = 1 and once for PSI = 0. Compute the difference between these two log odds-ratios and compare it with the regression coefficient of PSI.
4. Compute the percentage of correctly classified observations and comment (you can use View/Expectation-Prediction table).
5. The output shows the value of the LR statistic. How is this value computed?
6. Now run a PROBIT regression. Is there much difference between the estimates? And for the percentage of correctly classified observations?

References:

- Greene, W.H., Econometric Analysis, 5th edition (2003), Prentice Hall.
- Stock, J.H., Watson, M.W., Introduction to Econometrics, 2nd edition (2007), Pearson.