$$p_i = \frac{1}{1 + \exp\left(-(\alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})\right)}$$
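A minimal sketch of the logistic probability, assuming the standard form p = 1 / (1 + exp(-(alpha + sum of beta_j * x_j))). The function name and the coefficient values in the usage note are hypothetical and are not taken from the slides.

```python
import math


def logistic_probability(alpha, betas, xs):
    """Return p = 1 / (1 + exp(-(alpha + sum_j beta_j * x_j)))."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    eta = alpha + sum(b * x for b, x in zip(betas, xs))
    # Pick the algebraically equivalent form that avoids overflow in exp().
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)
```

For example, `logistic_probability(-5.0, [0.1], [50.0])` gives 0.5, because the linear predictor is zero at that point.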
Why can linear regression work reasonably well on binary dependent variables?
| Assumption | Consequence of violation | Notes |
|---|---|---|
| 1. $y_i = a + b x_i + \varepsilon_i$ | Biased parameter estimates | Parameter meanings are hard to interpret, except as a linear approximation to a nonlinear function. Predictions can be < 0 or > 1. |
| 2. $E(\varepsilon_i) = 0$ | Biased intercept estimate | |
| 3. $\mathrm{Var}(\varepsilon_i) = \sigma^2$ | Unbiased estimates, but biased variance of $\hat{b}$ | Biased confidence intervals |
| 4. $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ | Same as 3 | |
| 5. $\varepsilon_i \sim \text{Normal}$ | The t and F statistical tests cannot be used for regression models. | The estimates $\hat{b}$ may still be approximately normal if the sample size is large. |
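If 1) and 2) are true for a binary dependent variable, it can be shown that 3) and 5) are necessarily false: the error variance is p_i(1 - p_i), which changes with x_i, and the error can take only two values for a given x_i, so it cannot be normal. However, the consequences
may not be as serious as you expect.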
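The point about assumption 3 can be checked numerically. The sketch below assumes a linear probability model p_i = a + b * x_i for a 0/1 response, so the error variance is p_i * (1 - p_i). The function name and the values of a, b and x are hypothetical.

```python
def error_variances(a, b, xs):
    """Var(eps_i) = p_i * (1 - p_i) with p_i = a + b * x_i for a 0/1 response."""
    variances = []
    for x in xs:
        p = a + b * x
        if not 0.0 <= p <= 1.0:
            # The linear model can predict outside [0, 1]; flag it explicitly.
            raise ValueError(f"predicted probability {p} is outside [0, 1]")
        variances.append(p * (1.0 - p))
    return variances


# The variance changes with x, so the constant-variance assumption fails.
print(error_variances(0.1, 0.2, [0.0, 2.0, 4.0]))
```

The variance is largest where p_i is near 0.5 and shrinks toward 0 as p_i approaches 0 or 1.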
Logistic regression for binary response variables
Basic Syntax:
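/* Fit a logistic model of chd on age; save the estimates in parms */
proc logistic data=chdage1 outest=parms descending;
   model chd = age /
         selection = stepwise             /* stepwise variable selection */
         ctable pprob = (0 to 1 by 0.1)   /* classification table at these cutpoints */
         outroc = roc1;                   /* save the ROC curve data */
run;

/* Score the data with the saved parameter estimates */
proc score data=chdage1 score=parms out=scored type=parms;
   var age;
run;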
In the events/trials syntax, you specify two variables that contain count data for a binomial experiment. These two variables
are separated by a slash. The value of the first variable, events, is the number of positive responses (or events). The value of
the second variable, trials, is the number of trials.
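To make the events/trials layout concrete, here is a small sketch that collapses individual binary records into one (events, trials) pair per covariate value. The function name and the sample records are hypothetical; this is an illustration of the data layout, not SAS code.

```python
from collections import defaultdict


def to_events_trials(records):
    """Collapse (x, y) records with binary y into {x: (events, trials)}."""
    counts = defaultdict(lambda: [0, 0])
    for x, y in records:
        if y not in (0, 1):
            raise ValueError("response must be 0 or 1")
        counts[x][0] += y  # events: number of positive responses
        counts[x][1] += 1  # trials: number of observations at this x
    return {x: (events, trials) for x, (events, trials) in sorted(counts.items())}


# Hypothetical (age, chd) records.
print(to_events_trials([(25, 0), (25, 1), (30, 1), (30, 1), (25, 0)]))
```

Each resulting pair corresponds to one row of an events/trials data set.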
Interpretation of SAS output - continued
Model Selection Criteria:
Convergence - the difference in parameter estimates between iterations is small enough.
Model Fit Statistics Criteria:
Likelihood Function:
$$L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$
-2 Log L = -2 * log(likelihood)
AIC = -2 * log(max likelihood) + 2 * k
SIC = -2 * log(max likelihood) + log(N) * k
(k = number of estimated parameters, N = number of observations)
Testing Global Null Hypothesis: BETA=0
Likelihood ratio: $-2\,[\ln(L_{\text{intercept}}) - \ln(L_{\text{int + covariates}})]$
Score: based on the 1st and 2nd derivatives of log(L)
Wald: $(\text{coefficient} / \text{std error})^2$
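The fit statistics can be sketched directly from the binary likelihood. The code below assumes the fitted probabilities are already available (for example, from a model fitted elsewhere); the function names and the toy data are hypothetical.

```python
import math


def log_likelihood(ys, ps):
    """log L = sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]."""
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for y, p in zip(ys, ps))


def fit_statistics(ys, ps, k):
    """Return -2 Log L, AIC and SIC for k estimated parameters."""
    n = len(ys)
    minus_2_log_l = -2.0 * log_likelihood(ys, ps)
    return {
        "-2LogL": minus_2_log_l,
        "AIC": minus_2_log_l + 2 * k,
        "SIC": minus_2_log_l + math.log(n) * k,
    }


# Toy example: two observations, intercept-only model (k = 1), p = 0.5 for both.
print(fit_statistics([1, 0], [0.5, 0.5], k=1))
```

AIC and SIC are only meaningful when the probabilities come from the maximum likelihood fit; the toy probabilities above happen to be the intercept-only estimate for this data.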
Interpretation of SAS output - continued
Analysis of Maximum Likelihood Estimates
Parameter estimates and significance test
Odds Ratio Estimates
Odds: $O_i = \exp\left(\sum_{j=0}^{k} \beta_j x_{ij}\right)$ (with $x_{i0} = 1$ for the intercept)
Odds ratio: $O_i / O_j$ per unit change in covariate.
Association of Predicted Probabilities and Observed Responses
Pairs: 43 (event) * 57 (non event) = 2451
Concordant (0 - lower prob vs. 1 - higher prob)
Discordant (0 - higher prob vs. 1 - lower prob)
Tie: all other
ROC is used to visualize the model's prediction strength.
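The pair counts can be sketched as a direct comparison of every event with every non-event. This sketch compares predicted probabilities exactly, which is a simplifying assumption; SAS may group close probabilities before declaring a tie. The function name and the sample probabilities are hypothetical.

```python
def pair_association(event_probs, nonevent_probs):
    """Count concordant, discordant and tied (event, non-event) pairs."""
    concordant = discordant = tied = 0
    for p_event in event_probs:
        for p_nonevent in nonevent_probs:
            if p_event > p_nonevent:
                concordant += 1  # the event has the higher predicted probability
            elif p_event < p_nonevent:
                discordant += 1  # the event has the lower predicted probability
            else:
                tied += 1
    pairs = len(event_probs) * len(nonevent_probs)
    # c statistic: concordant pairs plus half of the ties, as a share of all pairs.
    c = (concordant + 0.5 * tied) / pairs if pairs else None
    return {"pairs": pairs, "concordant": concordant,
            "discordant": discordant, "tied": tied, "c": c}


print(pair_association([0.8, 0.6], [0.3, 0.6, 0.9]))
```

With 43 events and 57 non-events the same loop would visit 43 * 57 = 2451 pairs.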
Interpretation of SAS output - continued
Classification Table:
The model classifies an observation as an event if its estimated probability is greater than
or equal to a given probability cutpoint.
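| Prob. Level | Correct Event (a) | Correct Non Event (b) | Incorrect Event (c) | Incorrect Non Event (d) | Correct (%) | Sensitivity (%) | Specificity (%) | FALSE POS (%) | FALSE NEG (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 57 | 0 | 43 | 0 | 57 | 100 | 0 | 43 | . |
| 0.1 | 57 | 1 | 42 | 0 | 58 | 100 | 2.3 | 42.4 | 0 |
| 0.2 | 55 | 7 | 36 | 2 | 62 | 96.5 | 16.3 | 39.6 | 22.2 |
| 0.3 | 51 | 19 | 24 | 6 | 70 | 89.5 | 44.2 | 32 | 24 |
| 0.4 | 50 | 25 | 18 | 7 | 75 | 87.7 | 58.1 | 26.5 | 21.9 |
| 0.5 | 45 | 27 | 16 | 12 | 72 | 78.9 | 62.8 | 26.2 | 30.8 |
| 0.6 | 41 | 32 | 11 | 16 | 73 | 71.9 | 74.4 | 21.2 | 33.3 |
| 0.7 | 32 | 36 | 7 | 25 | 68 | 56.1 | 83.7 | 17.9 | 41 |
| 0.8 | 24 | 39 | 4 | 33 | 63 | 42.1 | 90.7 | 14.3 | 45.8 |
| 0.9 | 6 | 42 | 1 | 51 | 48 | 10.5 | 97.7 | 14.3 | 54.8 |
| 1 | 0 | 43 | 0 | 57 | 43 | 0 | 100 | . | 57 |

Correct = Total Correct / Total = (a+b) / (a+b+c+d)
Sensitivity = Correct Event / Total Event = a / (a+d)
Specificity = Correct Non Event / Total Non Event = b / (b+c)
FALSE POS = F.Pos / (F.Pos + Pos) = c / (a+c)
FALSE NEG = F.Neg / (F.Neg + Neg) = d / (b+d)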
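The percentage columns follow from the four counts a, b, c and d. A minimal sketch, using the formulas from the table; the function name is hypothetical, and `None` stands in for the missing value that SAS prints as a period.

```python
def classification_percentages(a, b, c, d):
    """Percentages from correct events (a), correct non-events (b),
    incorrect events (c) and incorrect non-events (d)."""
    def pct(numerator, denominator):
        # An undefined ratio is reported as None (SAS prints '.').
        if denominator == 0:
            return None
        return round(100.0 * numerator / denominator, 1)

    return {
        "correct": pct(a + b, a + b + c + d),
        "sensitivity": pct(a, a + d),
        "specificity": pct(b, b + c),
        "false_pos": pct(c, a + c),
        "false_neg": pct(d, b + d),
    }


# Row for probability level 0.5 in the classification table.
print(classification_percentages(45, 27, 16, 12))
```

Running it on the 0.5 row reproduces 72, 78.9, 62.8, 26.2 and 30.8 from the table.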
Logistic regression for polychotomous response variables
Example: Three outcomes
$$p_1 = \Pr(Y = 1 \mid x)$$
$$p_2 = \Pr(Y = 2 \mid x)$$
$$p_3 = \Pr(Y = 3 \mid x) = 1 - p_1 - p_2$$
The cumulative probability model
$$\log\left(\frac{p_1}{1 - p_1}\right) = \alpha_1 + \beta x$$
$$\log\left(\frac{p_1 + p_2}{1 - p_1 - p_2}\right) = \alpha_2 + \beta x$$
The assumption:
A common slope parameter associated with the predictor.
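As a sketch of how the cumulative model yields the three category probabilities, the code below inverts the two cumulative logits and takes differences. The function names and parameter values are hypothetical, and alpha2 is assumed to be at least alpha1 so that the cumulative probabilities are ordered.

```python
import math


def _expit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def cumulative_logit_probs(alpha1, alpha2, beta, x):
    """Return (p1, p2, p3) for the three-outcome cumulative logit model."""
    if alpha2 < alpha1:
        raise ValueError("alpha2 must be >= alpha1 for ordered cumulative probabilities")
    cum1 = _expit(alpha1 + beta * x)  # Pr(Y <= 1) = p1
    cum2 = _expit(alpha2 + beta * x)  # Pr(Y <= 2) = p1 + p2
    return cum1, cum2 - cum1, 1.0 - cum2


print(cumulative_logit_probs(0.0, math.log(3.0), 0.5, 0.0))
```

Because both equations share the slope beta, changing x shifts both cumulative logits by the same amount.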
Logistic regression for polychotomous response variables
Examples:
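/* Fit the cumulative logit model for the multi-level response group */
proc logistic data=diabetes descending;
   model group = glutest;
   output out=probs predicted=prob xbeta=logit;   /* save predicted probabilities and linear predictors */
   format group gp.;                              /* gp. is assumed to be a user-defined format */
run;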
Other SAS Procedures for Logistic Regression Models
| Proc | Model options | Notes |
|---|---|---|
| LOGISTIC | | The events/trials format only works for a binary response. |
| GENMOD | Dist=Binomial Link=logit | One of the generalized linear models. |
| PROBIT | Dist=Logistic | |
| CATMOD | See the CATMOD code below. | Allows for individual parameters in each equation. |
| PHREG | See the PHREG code below. | Trick: events occur at time one; non-events occur at a later time (censored). |

CATMOD:
proc catmod data=diabetes;
   direct glutest;
   response logits / out=cat_prob;
   model group = glutest;
run;
$$\log\left(\frac{p_1}{p_2}\right) = \alpha_1 + \beta_1 x$$
$$\log\left(\frac{p_2}{p_3}\right) = \alpha_2 + \beta_2 x$$

PHREG (t is a dummy time variable; group is the censoring variable):
proc phreg data=diabetes;
   model t * group = glutest;
run;
References
Hosmer, D.W., Jr. and Lemeshow, S. (1989), Applied Logistic Regression,
New York: John Wiley & Sons, Inc.
SAS Institute Inc. (1995), Logistic Regression Examples Using the SAS
System, Cary, NC: SAS Institute Inc.
Allison, P.D. (1999), Logistic Regression Using the SAS System: Theory and
Application, BBU Press and John Wiley & Sons, Inc.