
Conditional logistic regression

(AS02)
EPM304 Advanced Statistical Methods in Epidemiology

Course: PG Diploma/ MSc Epidemiology

This document contains a copy of the study material located within the computer
assisted learning (CAL) session.
If you have any questions regarding this document or your course, please contact
DLsupport via DLsupport@lshtm.ac.uk.
Important note: this document does not replace the CAL material found on your
module CDROM. When studying this session, please ensure you work through the
CDROM material first. This document can then be used for revision purposes to
refer back to specific sessions.
These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of
the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale
or further copying.
London School of Hygiene & Tropical Medicine September 2013 v2.0

Section 1: Conditional logistic regression


Aim
To learn how to use regression methods to analyse data from matched case-control studies.
Objectives

By the end of this session you will be able to fit and interpret:

a model with a single binary variable
a model with effect modification by the matching variable
a model with an exposure having more than 2 levels
models with more than 1 exposure
models for data with more than 1 control per case

This session should take you between 1.5 and 2 hours to complete.

Section 2: Planning your study


In this session you will learn about the regression method for matched case-control
data. This is called conditional logistic regression.
To work through the material you should be familiar with multivariable regression,
specifically logistic regression, and matched case-control studies. Students who
completed the core units and EP202 can refer to the sessions listed opposite.
Occasional students should refer to their preferred text for these subjects together
with session AS02 on matched case-control studies.
The approaches to fitting and interpreting regression models that have already been
presented for (unconditional) logistic regression are used here in exactly the same
way as before.
SC13: Matched analysis for paired binary data
SM07, SM08, SM09: Logistic regression
AS02: Matched case-control studies


2.1: Planning your study


The method of conditional logistic regression is illustrated using data from a matched
case-control study carried out in the state of Rio Grande do Sul, Southern Brazil. The
study was carried out to assess potential risk factors for infant death from diarrhoea.
This is the same data used to illustrate the classical analysis of matched case-control
studies in session AS02.
Click below for details of the study.
Interaction: Button: Details (pop up box appears):
Brazilian case-control study
A matched case-control study to investigate potential risk factors for infant death
from diarrhoea was conducted in Southern Brazil (Victora et al. 1989). In this study,
each infant dying from diarrhoea at less than 1 year of age (a case) was linked with
2 neighbourhood controls matched on age to within one month. This procedure was
designed to provide a control group with a similar age and socio-economic
distribution to that of the cases. Information on social and environmental factors,
birth weight, and type of feeding was collected on 170 cases and 340 controls.
To illustrate the methods for paired sets, each case is matched to one of its controls,
the one that was closest in age. The data are further restricted to case-control pairs
where both the case and control were in the same age group (0-2 months, 3-5
months, 6-11 months).

Section 3: What is conditional logistic regression?


Interaction: Tabs: 1:
In session AS02 you saw that it is essential for a study with a matched design to
have a matched analysis. If the analysis is not matched the true association is
underestimated. This also applies with the regression analysis of matched data.
You will be familiar with logistic regression analysis used to model data from
unmatched case-control studies. The corresponding regression model for matched
case-control data is called conditional logistic regression (CLR).
Interaction: Tabs 2:
These two regression methods are similar except that the second one takes account
of the case-sets; i.e. the estimates are conditional on the cases being linked to the
controls.
Conditional logistic regression is required for individually matched studies.
Instead of individual matching, some studies use frequency matching, which involves
choosing the cases and controls to have the same overall distribution of the
matching factor. For example, equal numbers of cases and controls may be selected
in each age group. In this design, each age group can be regarded as a matched set,
and since there are usually substantial numbers of cases and controls in each set, it
is generally acceptable to use unconditional (ordinary) logistic regression for such
studies, with the matching factor (e.g. age group) included in the model.

3.1: What is conditional logistic regression?


So what does conditional logistic regression do that makes it different from logistic
regression?
The simplest way to describe this is to start with an (unconditional) logistic model.
If we were to use this model for matched data, we would have to somehow account
for the matching within an unconditional logistic model. This could be done by
including a parameter for every case-control set. Click below for an example.
Interaction: Button: Example (card appears on RHS):
Imagine a model with a single binary exposure. To account for the matching (strata
defined by the case sets) in logistic regression we could include a variable set.
If there were N case-control sets, with set 1 the baseline group, then the
corresponding logistic model would be as follows:
Logistic model with single binary exposure and matched case-control sets

$\text{log odds} = \alpha + \beta_1\,\text{exposure} + \beta_2\,\text{set}_2 + \beta_3\,\text{set}_3 + \dots + \beta_N\,\text{set}_N$

3.2: What is conditional logistic regression?


As you can see, such a model would result in many parameter estimates: $\alpha$ and $\beta_1$
to $\beta_N$ (i.e. the constant, the exposure effect, and an estimate for each case-set
except for "set 1"). We are only really interested in the estimate of the exposure effect.
The estimates for the individual sets are of no interest.
More importantly, including the pairing as a factor in the regression model does not
solve the problem. In particular, when one control has been matched to each case,
the odds ratio incorrectly estimated by a logistic regression performed in this way is
the square of the correct odds ratio, which should be obtained from the use of
conditional analysis. The bias is most extreme when the cases and controls form
matched pairs.
Logistic model with single binary exposure and matched case-control sets

$\text{log odds} = \alpha + \beta_1\,\text{exposure} + \beta_2\,\text{set}_2 + \beta_3\,\text{set}_3 + \dots + \beta_N\,\text{set}_N$

3.3: What is conditional logistic regression?


In conditional logistic regression, we formulate the likelihood for the model in a
special way called conditional likelihood. This means that cases are only compared to
controls in the same matched set. Therefore we are able to eliminate the estimates
for each case-set, $\beta_2$ to $\beta_N$, from the model.
Another consequence is that for a conditional likelihood, we do not estimate the
constant term $\alpha$, and therefore only estimate the exposure effect we are interested
in (e.g. $\beta_1$ in the above model). We estimate the exposure effect by comparing
cases and controls, within each matched set, for their exposure values.
Go on to the next card to see how this is done.

3.4: What is conditional logistic regression?


This is done by considering permutations of the values of the exposure variable or
variables between the members of the matched set. For example, for a matched set
with a case-control pair there are 4 permutations, which we here label as 1., 2., 3.,
and 4.:
1. case exposed : control exposed
2. case exposed : control not exposed
3. case not exposed : control exposed
4. case not exposed : control not exposed

The conditional likelihood works out the likelihood of each permutation. The log
(conditional) likelihood of each permutation is then multiplied by the number of
matched sets with that permutation.

Permutation   Conditional likelihood   Frequency
1             L1                       n1
2             L2                       n2
3             L3                       n3
4             L4                       n4

Total log likelihood = n1 log(L1) + n2 log(L2) + n3 log(L3) + n4 log(L4)
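As a worked illustration of these quantities for a matched pair, the sketch below writes out the four conditional likelihoods. It assumes a single binary exposure $x$ (1 = exposed, 0 = not exposed) and writes $\beta$ for the log odds ratio of the exposure; the symbols $x$ and $\beta$ are notation chosen here for illustration rather than taken from the session.

```latex
% Conditional likelihood for one case-control pair: the probability that,
% given the two exposure values in the pair, it is the case who holds x_case.
% The constant and the set term appear in numerator and denominator and cancel.
L = \frac{e^{\beta x_{\text{case}}}}{e^{\beta x_{\text{case}}} + e^{\beta x_{\text{control}}}}

% Permutations 1 and 4 (concordant) and permutations 2 and 3 (discordant)
L_1 = L_4 = \tfrac{1}{2}, \qquad
L_2 = \frac{e^{\beta}}{1 + e^{\beta}}, \qquad
L_3 = \frac{1}{1 + e^{\beta}}

% Total log likelihood in terms of the permutation frequencies
\log L_{\text{total}} = (n_1 + n_4)\log\tfrac{1}{2}
  + n_2 \log\frac{e^{\beta}}{1 + e^{\beta}}
  + n_3 \log\frac{1}{1 + e^{\beta}}
```

Under these assumptions only the discordant permutations (2 and 3) depend on $\beta$, and maximising the total log likelihood gives $\hat{\beta} = \log(n_2 / n_3)$.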

3.5: What is conditional logistic regression?


In fitting a conditional logistic model it is assumed that the relationship between the
disease and the exposure is the same for all case-sets.
Using conditional likelihood estimation the constant and the set terms cancel out in
the calculation. These terms are therefore no longer estimated in the regression.
Conditional logistic regression

We will present each model as
log odds = constant + set + exp
to indicate that the constant and set terms are eliminated by the conditional
likelihood estimation.

3.6: Summary
1. Logistic regression is used to model case-control data, person-by-person.
Conditional logistic regression is used to model matched case-control data, set-by-set (pair-by-pair in the special case of a pair-matched case-control study).
2. When a conditional logistic model is fitted, the constant within each case-set is
eliminated using conditional likelihood.

3. This conditional likelihood accounts for the fact that the case and control(s) are
linked, and uses only case-control sets in which the exposure status of the case
differs from that of at least 1 control.
4. The only estimates that are produced by this model are the effects of the
exposures and/or other variables in the model.
5. There is no constant term in the output from a conditional logistic regression
model.
6. The interpretation of parameter estimates is identical to that of unconditional
logistic regression. And as with unconditional logistic regression, the effects of the
exposure variables are measured using odds ratios.

Section 4: Conditional logistic regression with a single binary exposure

Let's consider an example with a single binary exposure and one control per case,
i.e., matched pairs.
In the Brazilian study of infant mortality due to diarrhoea the effect of low
birthweight was examined. The table below shows the results for this association for
the 86 case-control pairs.
Matched table for association between birth weight and infant
diarrhoea mortality
                        Control
                   < 3.0 kg   > 3.0 kg
Case   < 3.0 kg       12         25
       > 3.0 kg       18         31

OR = 25 / 18 = 1.39
(95% CI: 0.76 to 2.55)

4.1: Conditional logistic regression with a single binary exposure

We can also obtain the results shown below using conditional logistic regression.
The corresponding model is:
Log odds = constant + set + $\beta_1$ Bwt1
The parameter $\beta_1$, given in the model, estimates the effect of low birthweight
compared to high birthweight.

Bwt0: > 3.0 kg
Bwt1: < 3.0 kg

Click swap to see the output from this model below.
Interaction: Button: Swap (table on centre bottom changes to the following):
Conditional logistic regression for birthweight - on a log scale
        Coefficient   Standard error   z       P > |z|   95% confidence limits
Bwt1    0.3285        0.3091           1.063   0.288     -0.277    0.934
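For matched pairs with a single binary exposure, the figures in this output can be reproduced by hand from the two discordant cells of the matched table (25 and 18). The following is a minimal sketch using the standard closed-form results for 1:1 matched data; the function name `paired_clr` and its variable names are chosen here for illustration and are not part of the session or of any statistical package.

```python
import math

def paired_clr(n_case_only, n_control_only):
    """Closed-form conditional estimate for 1:1 matched pairs, binary exposure.

    n_case_only:    pairs where only the case is exposed
    n_control_only: pairs where only the control is exposed
    """
    coef = math.log(n_case_only / n_control_only)        # log odds ratio
    se = math.sqrt(1 / n_case_only + 1 / n_control_only)  # standard error of the log OR
    z = coef / se                                         # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))                  # two-sided normal P-value
    lower, upper = coef - 1.96 * se, coef + 1.96 * se     # 95% limits on the log scale
    return coef, se, z, p, lower, upper

# Discordant pairs from the birthweight table:
# 25 pairs with case < 3.0 kg and control > 3.0 kg, 18 pairs the other way round
coef, se, z, p, lower, upper = paired_clr(25, 18)
odds_ratio = math.exp(coef)
or_limits = (math.exp(lower), math.exp(upper))

print(f"coef={coef:.4f} se={se:.4f} z={z:.3f} P={p:.3f}")
print(f"OR={odds_ratio:.4f} 95% CI {or_limits[0]:.3f} to {or_limits[1]:.3f}")
```

Exponentiating the coefficient and its confidence limits moves the results from the log scale to the odds ratio scale, which is why the concordant pairs (12 and 31) play no part in the estimate.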

4.2: Conditional logistic regression with a single binary exposure

This is the standard output for a conditional logistic regression model on a log scale.
What do you need to do to obtain the corresponding odds ratio for the effect of
birthweight?
Interaction: Button: clouds picture (pop up box appears):
To obtain the odds ratio for the effect of birthweight you should take the exponential
of the parameter estimated in the model (the coefficient).
Calculate the odds ratio for the effect of low birthweight on infant mortality due to
diarrhoea, giving your answer to 4 decimal places.
Odds ratio =
Interaction: Calculation: Odds ratio = ___:
Correct Response 1.3889:
Correct
Yes, the odds ratio is given by
exp(0.3285) = 1.3889
Incorrect Response:
Incorrect
Remember that the odds ratio is equal to the exponential of the coefficient, so:
Odds ratio = exp(0.3285) = 1.3889
Conditional logistic regression for birthweight - on a log scale
        Coefficient   Standard error   z       P > |z|   95% confidence limits
Bwt1    0.3285        0.3091           1.063   0.288     -0.277    0.934

4.3: Conditional logistic regression with a single binary exposure

Notice that the results from the model are identical to the results from the classical
methods. Use the swap button below to compare them. These exact equalities occur
only with matched pair data. Finally, note that there is no constant term on the log
scale, the reason for which was given earlier.
In conditional logistic regression, estimation and hypothesis tests are applied in the
same way as with other regression models.
Interaction: Button: Swap:
Matched table for association between birth weight and infant
diarrhoea mortality
                        Control
                   < 3.0 kg   > 3.0 kg
Case   < 3.0 kg       12         25
       > 3.0 kg       18         31

OR = 25/18 = 1.39
(95% CI 0.76 to 2.55)
Examine the results for the conditional logistic regression below, and answer the
following questions:
Does the result of the Wald test provide evidence against the null hypothesis that
there is no effect of birthweight on infant mortality due to diarrhoea?
How would you interpret the confidence interval for the odds ratio?
Interaction: Pulldown: Does the result of the Wald test provide evidence against the
null hypothesis that there is no effect of birthweight on infant mortality due to
diarrhoea? (Yes/No):
Correct Response No:
Correct
That's right, the P-value for the Wald test is 0.288, which is quite high. We have no
evidence against the null hypothesis of OR = 1.
Incorrect Response Yes:
In fact, the Wald test gives a P-value of 0.288, as you can see in the CLR results
below. This is quite high. We have no evidence against the null hypothesis of OR = 1.
OR = 25 / 18 = 1.39
(95% CI: 0.76 to 2.55)
Interaction: Button: clouds picture (pop up box appears):

You can be 95% confident that the true OR lies between 0.758 and 2.546.

4.4: Conditional logistic regression with a single binary exposure

You can also apply a likelihood ratio test to this model. This would assess the
contribution of birthweight to the fit of the model. In other words, the
likelihood ratio test will assess the strength of the evidence for an association
between birthweight and infant mortality due to diarrhoea.
How do we carry out a LRT to test birth-weight?
Interaction: Button: clouds picture (card appears on right handside):
The likelihood ratio statistic (LRS) is calculated as:
LRS = 2(L1 - L0)
where,
L1 is the log likelihood of the model with the exposure variable
L0 is the log likelihood of the model without the exposure variable
The LRS is then referred to the $\chi^2$ distribution. The number of degrees of freedom is
equal to the number of parameters excluded from the model.

4.5: Conditional logistic regression with a single binary exposure

To carry out a likelihood ratio test for birthweight, compare the log likelihood of this
model with the log likelihood of the simpler model without birthweight. Examine the
models below, then click the LRT button to do the test.
Interaction: Tabs: Model 1:
Log odds = constant + set + $\beta_1$ Bwt1
Log likelihood = -59.0383

Interaction: Tabs: Model 2:
Log odds = constant + set
Log likelihood = -59.6107

Interaction: Button: LRT (card appears on right handside):
LRS = 2(L1 - L0)
= 2(-59.0383 - (-59.6107)) = 1.145
Referred to the $\chi^2$ distribution on 1 degree of freedom, the corresponding P-value
for this LRS is approximately 0.285.
Note: This is very close to the result of the Wald test (P = 0.288) because we have a
single binary exposure, so both tests assess the evidence for the OR being different
from 1 for a single odds ratio (i.e. a single parameter). You can see the Wald test in
the table of model estimates that you saw earlier on this page.
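The two log likelihoods quoted in this test can be rebuilt from the matched table. The sketch below assumes the usual conditional likelihood for a pair (1/2 for a concordant pair, exp(b)/(1 + exp(b)) when only the case is exposed, 1/(1 + exp(b)) when only the control is exposed); the function `pair_loglik` and the variable names are illustrative choices, not taken from the session.

```python
import math

def pair_loglik(beta, n_concordant, n_case_only, n_control_only):
    """Conditional log likelihood for 1:1 matched pairs with a binary exposure."""
    prob = math.exp(beta) / (1 + math.exp(beta))  # pair where only the case is exposed
    return (n_concordant * math.log(0.5)           # concordant pairs do not depend on beta
            + n_case_only * math.log(prob)
            + n_control_only * math.log(1 - prob))

# Counts from the matched birthweight table (86 pairs in total)
n_concordant = 12 + 31
n_case_only, n_control_only = 25, 18

beta_hat = math.log(n_case_only / n_control_only)
ll_with = pair_loglik(beta_hat, n_concordant, n_case_only, n_control_only)
ll_without = pair_loglik(0.0, n_concordant, n_case_only, n_control_only)

lrs = 2 * (ll_with - ll_without)
# Upper tail of the chi-squared distribution on 1 d.f.
p_value = math.erfc(math.sqrt(lrs / 2))

print(f"L1={ll_with:.4f} L0={ll_without:.4f} LRS={lrs:.3f} P={p_value:.3f}")
```

The model without birthweight corresponds to a log odds ratio of zero, so every pair contributes log(1/2) to its log likelihood.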

Section 5: Interaction with a matching variable


If a variable is used for matching cases and controls you cannot examine the effect
of that variable on the odds of disease. This is because, within a case-set, the case
and its control(s) all have the same value of the matching variable.
You can, however, look for interaction between a matching variable and exposure.

5.1: Interaction with a matching variable


To assess for interaction with a matching variable, additional terms that represent
interactions are included in the conditional logistic regression model, but the main
effects for a matching variable are excluded. This is shown opposite, for a binary
exposure variable and a binary matching variable.
Note: If there were more than 2 levels in the exposure variable, or in the matching
variable, you would need more interaction terms in the model. For example, for a
matching variable with 3 levels, and a binary exposure variable, we would have an
extra term Exp1.Exp3, where Exp3 represents the third category of the matching
variable.
Conditional logistic model with interaction term for a matching variable
Log odds = constant + set + $\beta_1$ Exp1 + $\beta_2$ Exp1.Exp2

Where,
Exp1 is the exposure of interest and
Exp2 is the matching variable

5.2: Interaction with a matching variable

The 3 tables presented opposite show data from the Brazilian study. The matched
table for exposure to low birthweight is stratified by the 3 levels of the matching
variable age.
Click swap to see a table of discordant pairs and test for effect modification.
Interaction: Tabs: Agegroup0:
0-2 months
                        Control
                   < 3.0 kg   > 3.0 kg
Case   < 3.0 kg        4          7
       > 3.0 kg        6          8

OR1 = 1.17
Interaction: Tabs: Agegroup1:
3-5 months
                        Control
                   < 3.0 kg   > 3.0 kg
Case   < 3.0 kg        4         12
       > 3.0 kg        7         15

OR2 = 1.71
Interaction: Tabs: Agegroup3:
6-11 months
                        Control
                   < 3.0 kg   > 3.0 kg
Case   < 3.0 kg        4          6
       > 3.0 kg        5          8

OR3 = 1.20
Interaction: Button: Swap (right handside card changes to text below):
Table of discordant pairs
                                Age group (months)
                                0-2    3-5    6-11
Case < 3.0, control > 3.0        7     12      6
Case > 3.0, control < 3.0        6      7      5

$\chi^2$ = 0.35; P = 0.84 on 2 d.f.



Note: There was no evidence for interaction using classical methods, and the odds
ratios across the 3 age-groups are similar:
OR1 = 1.17; OR2 = 1.71; OR3 = 1.20.

5.3: Interaction with a matching variable


The results of a conditional logistic model with birthweight and the interaction with
age-group are shown below. Birth weight is coded 0 and 1, with the baseline group
coded 0. Age group is coded 0, 1, and 2 and the baseline group is coded 0. Notice
there is no main effect for age-group (because cases and controls were matched on
age group, so it is not possible to assess the effect of age group on the odds of
diarrhoea). Click swap to change from the log scale to the odds ratio scale.
Interaction: Button: Swap (graph on centre bottom changes to the following):
CLR birthweight and interaction with age-group - OR scale
                Odds ratio   Standard error   z       P > |z|   95% confidence limits
Bwt1            1.167        0.649            0.277   0.782     0.392    3.471
Bwt1.Agegrp1    1.469        1.075            0.526   0.599     0.350    6.168
Bwt1.Agegrp2    1.029        0.846            0.034   0.973     0.205    5.154
What are the odds ratio estimates for the effect of birthweight in the 3 age-groups,
to 3 decimal places?
Age-group 0 to 2 months
Age-group 3 to 5 months
Age-group 6 to 11 months
Interaction: Calculation: Age-group 0 to 2 months:
Correct Response 1.167:
Correct
That's right, the Bwt1 OR represents the effect of birthweight in the baseline group
of age. Therefore the odds ratio for birthweight in age group 0-2 months is simply
OR1 = 1.167.

Incorrect Response:
No, remember that the lowest age-group is the baseline group, so no interaction
term applies to it. The Bwt1 OR represents the effect of birthweight in the
baseline group of age. Therefore the odds ratio for birthweight in age group 0-2
months is simply OR1 = 1.167. You can read this directly from the table.
Interaction: Calculation: Age-group 3 to 5 months:
Correct Response 1.714:
Correct
Yes, the estimated odds ratio for the effect of birthweight in the age group 3-5
months is given by 1.167 x 1.469 = 1.714.
Incorrect Response:
Sorry, that's not right. The additional effect of birthweight in this age-group is
estimated by the interaction odds ratio of 1.469. Therefore, in infants aged 3 to 5
months, the estimated odds ratio for the effect of birthweight is given by
1.167 x 1.469 = 1.714.
Interaction: Calculation: Age-group 6 to 11 months:
Correct Response 1.201:
Correct
Yes, the estimated odds ratio for the effect of birthweight in age group 6-11 months
is 1.167 x 1.029 = 1.201.
Incorrect Response:
Sorry, that's not right. The additional effect of birthweight in this age-group is
estimated by the interaction odds ratio of 1.029. Therefore, in infants aged 6 to 11
months, the estimated odds ratio for the effect of birthweight is 1.167 x 1.029 = 1.201.
CLR birthweight and interaction with age-group - on a log scale
                Coefficient   Standard error   z       P > |z|   95% confidence limits
Bwt1            0.1542        0.5563           0.277   0.782     -0.936    1.245
Bwt1.Agegrp1    0.3848        0.7319           0.526   0.599     -1.050    1.819
Bwt1.Agegrp2    0.0282        0.8223           0.034   0.973     -1.584    1.640
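The stratum-specific odds ratios asked for above can be obtained either by multiplying odds ratios or, equivalently, by adding coefficients on the log scale and exponentiating. The short sketch below uses the coefficients from the log-scale table; the dictionary and function names are illustrative only.

```python
import math

# Coefficients from the log-scale table (main effect plus two interaction terms)
coef = {"Bwt1": 0.1542, "Bwt1.Agegrp1": 0.3848, "Bwt1.Agegrp2": 0.0282}

def birthweight_or(interaction_term=None):
    """Odds ratio for low birthweight within one age group.

    The baseline age group uses the main effect only; other age groups add
    their interaction coefficient before exponentiating.
    """
    log_or = coef["Bwt1"]
    if interaction_term is not None:
        log_or += coef[interaction_term]
    return math.exp(log_or)

or_0_2 = birthweight_or()                   # baseline: 0-2 months
or_3_5 = birthweight_or("Bwt1.Agegrp1")     # 3-5 months
or_6_11 = birthweight_or("Bwt1.Agegrp2")    # 6-11 months

print(f"{or_0_2:.3f} {or_3_5:.3f} {or_6_11:.3f}")
```

Small differences in the last decimal place compared with multiplying the rounded odds ratios (for example 1.167 x 1.029) are due to rounding of the tabulated values.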

5.4: Interaction with a matching variable


Likelihood ratio test for interaction

To assess whether the odds ratios are heterogeneous across strata of a matching
variable, you can use a likelihood ratio test. Consider the models opposite and click
LRT to calculate the likelihood ratio test.
What can you conclude from this test?
Interaction: Button: clouds picture:
The P-value is high (p=0.84), so you can conclude that there is no evidence that the
effect of birthweight varies with age. We can reasonably proceed on the assumption
that the odds ratio is the same across the different age groups.
Interaction: Tabs: Model 1:
Log odds = constant + set + $\beta_1$ Bwt1 + $\beta_2$ Bwt1.Agegrp1 + $\beta_3$ Bwt1.Agegrp2
Log likelihood = -58.861

Interaction: Tabs: Model 2:
Log odds = constant + set + $\beta_1$ Bwt1
Log likelihood = -59.038

Interaction: Button: LRT (text appears on bottom RHS):
LRS = 2(L1 - L0) = 2(-58.861 - (-59.038)) = 0.354
The corresponding P-value is 0.84.
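As a check on this test, both log likelihoods can be rebuilt from the discordant pairs in each age group. This is a sketch under the assumption that the interaction model is equivalent to fitting a separate odds ratio within each age group, so that each group's discordant pairs are fitted at their own ratio; the helper `max_loglik` and the variable names are chosen here for illustration.

```python
import math

# Discordant pairs by age group, read from the three stratified tables:
# (case < 3.0 kg and control > 3.0 kg, case > 3.0 kg and control < 3.0 kg)
discordant = {"0-2": (7, 6), "3-5": (12, 7), "6-11": (6, 5)}
n_concordant = 12 + 31  # concordant pairs each contribute log(1/2)

def max_loglik(n_case_only, n_control_only):
    """Maximised conditional log likelihood of a group of discordant pairs."""
    total = n_case_only + n_control_only
    return (n_case_only * math.log(n_case_only / total)
            + n_control_only * math.log(n_control_only / total))

ll_concordant = n_concordant * math.log(0.5)

# Model 1: separate birthweight effect in each age group
ll_interaction = ll_concordant + sum(max_loglik(a, b) for a, b in discordant.values())

# Model 2: one common birthweight effect (all discordant pairs pooled)
pooled_case = sum(a for a, _ in discordant.values())
pooled_control = sum(b for _, b in discordant.values())
ll_common = ll_concordant + max_loglik(pooled_case, pooled_control)

lrs = 2 * (ll_interaction - ll_common)
p_value = math.exp(-lrs / 2)  # chi-squared upper tail, exact for 2 d.f.

print(f"L1={ll_interaction:.3f} L0={ll_common:.3f} LRS={lrs:.3f} P={p_value:.2f}")
```

Working from unrounded log likelihoods gives an LRS that differs from 0.354 only in the third decimal place.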

Section 6: Exposures with more than 2 levels


In matched case-control studies it is difficult to use classical methods to analyse
exposures with more than 2 levels. As always, the advantage of regression methods
is that all types of variables can be included in a model. The analysis of exposures
with more than 2 levels is easy with conditional logistic regression.
A conditional logistic model for an exposure with many levels can give:
An estimate for each level compared to the baseline
A test for trend across levels of the exposure.

6.1: Exposures with more than 2 levels


To illustrate a model with more than 2 levels of exposure we will look at exposure to
breast feeding in the Brazilian study dataset. The variable is called Milkgrp and the 3
levels are:
1. Milkgrp: Breast fed
2. Milkgrp: Breast fed + other types of feeding
3. Milkgrp: Other types of feeding only
Consider the table below. How many discordant case-control pairs are there?
Discordant pairs =
Again, from the table, how many concordant case-control pairs are there, i.e., those
that do not contribute to the analysis and do not provide any information about the
effect of the exposure?
Concordant pairs =
Interaction: Calculation: Discordant pairs = ____:
Correct Response 45:
Correct
That's right, there are 5 + 19 + 5 + 10 + 1 + 5 = 45 discordant pairs.
Incorrect Response:
The discordant pairs are those for which the case and control disagree about
feeding habit. Therefore there are 5 + 19 + 5 + 10 + 1 + 5 = 45 discordant pairs.
When answered, the table on the centre bottom also highlights as shown below:
                                      Controls
Cases                  Breast fed   Breast fed + other   Other only
Breast fed                  5               5                 1
Breast fed + other          5               7                 5
Other only                 19              10                27

Interaction: Calculation: Concordant pairs = ___
Correct Response 39:
Correct
That's right, there are 5 + 7 + 27 = 39 concordant pairs.
Incorrect Response:
The concordant pairs do not provide any information about the effect of the
exposure. So the number of case-control pairs that do not contribute to the analysis
is 5 + 7 + 27 = 39 pairs.

6.2: Exposures with more than 2 levels


The table below shows the results for the following conditional logistic model:

Log odds = constant + set + $\beta_2$ milkgrp2 + $\beta_3$ milkgrp3

Click swap to see the results on the odds ratio scale.


Consider the estimates from this table.
Interaction: Button: Swap (the table on centre bottom changes to below):
CLR for type of feeding (Milkgrp) - odds ratio scale
            Odds ratio   Standard error   z       P > |z|    95% confidence limits
Milkgrp2    2.328        1.208            1.628   0.104      0.842    6.436
Milkgrp3    7.355        3.977            3.690   < 0.001    2.549    21.223
Log likelihood = -49.968


When compared with the baseline group of infants who were only breast fed, how
would you interpret the estimates for Milkgrp:
1. for infants who were breast fed with other types of feeding?
2. for infants who were only fed with other types of feeding?
Interaction: Button: clouds picture(1):
The estimated OR for the group with a combination of breast feeding with other
types of feeding is:
OR = 2.33 (0.84 to 6.44), P = 0.10.
There is weak evidence that the odds of infant mortality due to diarrhoea are higher
in infants with mixed feeding, compared to infants who are exclusively breast-fed.
Interaction: Button: clouds picture(2):
The estimate for the group with no breast feeding (i.e. only other types of feeding)
is:
OR = 7.36 (2.55 to 21.22), P < 0.001.
There is strong evidence that the odds of infant mortality due to diarrhoea are higher
in infants with no breast-feeding, compared to infants who are exclusively breast-fed.
CLR for type of feeding (Milkgrp) - on the log scale
            Coefficient   Standard error   z       P > |z|    95% confidence limits
Milkgrp2    0.8449        0.5189           1.628   0.104      -0.172    1.862
Milkgrp3    1.9953        0.5407           3.690   < 0.001    0.9356    3.055
Log likelihood = -49.968

6.3: Exposures with more than 2 levels


An increase in the level of Milkgrp represents an increase in the level of exposure to
non-breast milk. The odds ratio associated with each level is shown in the plot
opposite.
We may be able to simplify this model and describe the effect of increasing exposure
to non-breast milk by a linear trend, instead of separate effects for each level. Click
show to see a table of results for a model assuming a linear trend (linear on the log
odds scale).
Interaction: Button: Show (table appears on centre bottom and graph is altered on
RHS):

CLR for type of feeding (Milkgrp) - assuming a linear trend on the log(odds) scale
           Odds ratio   Standard error   z       P > |z|    95% confidence limits
Milkgrp    2.7242       0.7367           3.706   < 0.001    1.603    4.628

6.4: Exposures with more than 2 levels


Do you think there is a linear trend in the effect of exposure to non-breast milk on
infant mortality due to diarrhoea (on the log(odds) scale)?

Interaction: Button: clouds picture:

It appears that the model assuming a linear trend on the log(odds) scale describes
well the separate effects of each level of exposure. There is an increasing trend in
the effect of exposure to non-breast milk on infant mortality due to diarrhoea.
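One way to see what the linear trend model implies is to compare its odds ratios with those from the model with separate effects. The sketch below assumes Milkgrp is entered as a score 1, 2, 3, so that each one-level increase multiplies the odds by exp(coefficient); the coefficient 1.0022 is the log-scale estimate corresponding to the trend odds ratio of 2.7242, and the variable names are illustrative.

```python
import math

trend_coef = 1.0022                      # log odds ratio per one-level increase in Milkgrp
separate_or = {2: 2.328, 3: 7.355}       # odds ratios from the model with separate effects

# Odds ratio for each level versus level 1 (breast fed) implied by the trend model
implied_or = {level: math.exp(trend_coef * (level - 1)) for level in (2, 3)}

for level in (2, 3):
    print(f"Milkgrp {level}: trend model {implied_or[level]:.3f}, "
          f"separate effects {separate_or[level]:.3f}")
```

Under this assumption the trend model gives roughly 2.72 and 7.42 for levels 2 and 3, against 2.33 and 7.36 from the separate-effects model, which is the informal comparison behind the statement that the trend describes the separate effects well.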

6.5: Exposures with more than 2 levels


Before we can assume a linear trend (on the log(odds) scale), we must test whether
this assumption is valid.
Can you remember how we do this?
Interaction: Button: clouds picture:
To test this assumption we carry out a test for departure from a linear trend. This is
a likelihood ratio test that compares a model with separate effects for each level of
exposure to a model with a common effect for each unit increase in exposure.
Interaction: Button: Swap: (the table on centre bottom changes to below):
CLR for type of feeding (Milkgrp) - assuming a linear trend on the log(odds) scale
           Coefficient   Standard error   z       P > |z|    95% confidence limits
Milkgrp    1.0022        0.2704           3.706   < 0.001    0.472    1.532
Log likelihood = -50.029

The table below shows the result of a CLR model with separate effects for each level
of Milkgrp. Click swap to see the results for a model with a linear effect (on the
log(odds) scale) for Milkgrp.
CLR for type of feeding (Milkgrp) - separate effects, on the log(odds) scale
            Coefficient   Standard error   z       P > |z|    95% confidence limits
Milkgrp2    0.8449        0.5189           1.628   0.104      -0.172    1.862
Milkgrp3    1.9953        0.5407           3.690   < 0.001    0.9356    3.055

6.6: Exposures with more than 2 levels


For a likelihood ratio test to compare these two models, calculate the likelihood ratio
statistic (LRS) to 3 decimal places:
LRS =
Interaction: Calculation: LRS =___:
Correct Response 0.122:
Correct
Yes, the likelihood ratio statistic is given by:
LRS = 2(L1 - L2) = 2(-49.968 - (-50.029)) = 0.122
When the calculation has been answered, the following text and interaction appears
on the top RHS:
On 1 degree of freedom (because the "simpler" model has one fewer parameter than
the model that does not assume a linear trend) this gives P = 0.73.
What can you conclude from this test?
Interaction: Button: clouds picture (pop up box appears):
This P-value is high, so there is no evidence of departure from a linear trend. So we
can work on the basis that a linear trend is a reasonable description of reality, and
we can use the "simpler" model that assumes a linear trend (on the log(odds) scale).
Incorrect Response:
No, that's not right. Remember that the likelihood ratio statistic is given by LRS =
2(L1 - L2). Now, review the values of log likelihood from the tables below, and you
can see that
LRS = 2(-49.968 - (-50.029)) = 0.122.
Interaction: Button: Swap (table on centre bottom changes same as card 6):

Section 7: Multiple controls per case


With classical methods the analysis of matched pairs is simple. But if a case has
more than one control the analysis is more complex - although an analysis can be
performed readily in Stata using the mhodds command, stratifying on matched sets.
However, with conditional logistic regression no new difficulties are introduced.

7.1: Multiple controls per case


To illustrate this you will again use the Brazilian data, but this time consider the "full"
dataset with 2 controls per case.

One of the environmental factors examined in the analysis was access to piped water
(yes or no).

Exposure is defined as having no access to piped water.


Each case is either exposed or not exposed
Also, for each case
0 controls may be exposed,
1 control may be exposed or
both controls may be exposed.

7.2: Multiple controls per case


The exposure status for cases and controls, with 2 controls per case, can be
tabulated as shown opposite. In how many case-sets did the case have no access to
piped water while both controls did have access?
How many case-sets do not tell us anything about the odds of exposure to a lack of
piped water?
Interaction: Calculation: The exposure status for cases and controls, with 2 controls
per case, can be tabulated as shown opposite. In how many case-sets did the case
have no access to piped water while both controls did have access?
Correct Response 18 (pop up box appears):
Correct
Yes, that's right, there were 18 case-sets where the case had no access to piped
water (exposed) and both controls did have access (not exposed).
Incorrect Response (pop up box appears and cell in table on RHS is highlighted):
No, that's not right. Remember that exposure is no access to piped water. Therefore
the cases with no access are exposed, and the controls that do have access are not
exposed.
Therefore, the highlighted cell in the table shows you that there were 18 case-sets
where the case was exposed (no access to piped water) and neither control was exposed.
Frequency of exposure in case-control sets
                      Number of controls exposed
                      0      1      2
Case   Exposed        18     15     26
       Not exposed    98     12      1
Interaction: Calculation: How many case-sets do not tell us anything about the odds
of exposure to a lack of piped water?:
Correct Response 124:
Correct
Yes, there are 124 case-sets where the case and both controls have the same exposure
status, so these do not contribute to the estimate of the odds ratio associated with
exposure to a lack of piped water.
Incorrect Response:

Sorry, that's not right. It is the case-sets where the case and both controls have
the same exposure status that do not contribute to the estimate of the odds ratio
associated with exposure to a lack of piped water.
There were 26 case-sets where the case and both controls had no access to piped
water (exposed), and 98 case-sets where the case and both controls did have access
(not exposed), so 124 case-sets do not tell us anything about the odds of exposure
to a lack of piped water.

7.3: Multiple controls per case


In examining this exposure, the hypothesis is that no access to piped water
increases the odds of being a case (death due to infant diarrhoea). So we would
expect more controls to have piped water compared to the case they are matched to.
The results from a classical Mantel-Haenszel analysis are shown below.
OR = 3.64, (1.80, 7.36) ; P < 0.001
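The Mantel-Haenszel point estimate can be checked by hand from the frequency table for this exposure. The sketch below treats each matched set as its own stratum of size 3 and applies the usual Mantel-Haenszel ratio; the variable names are illustrative only, and it reproduces just the point estimate, not the confidence interval or P-value.

```python
# Matched sets of 1 case + 2 controls, tabulated by case exposure and by the
# number of exposed controls (counts taken from the frequency table in Section 7).
case_exposed = {0: 18, 1: 15, 2: 26}     # key = number of exposed controls
case_unexposed = {0: 98, 1: 12, 2: 1}

CONTROLS_PER_CASE = 2
set_size = CONTROLS_PER_CASE + 1

# Mantel-Haenszel with every matched set as a stratum:
#   numerator   = sum of (exposed cases x unexposed controls) / set size
#   denominator = sum of (unexposed cases x exposed controls) / set size
numerator = sum(n * (CONTROLS_PER_CASE - k) / set_size for k, n in case_exposed.items())
denominator = sum(n * k / set_size for k, n in case_unexposed.items())

mh_odds_ratio = numerator / denominator

# Sets in which the case and both controls share the same exposure status
# add nothing to either sum.
uninformative_sets = case_exposed[2] + case_unexposed[0]

print(f"Mantel-Haenszel OR = {mh_odds_ratio:.2f}")
print(f"Uninformative sets = {uninformative_sets}")
```

The ratio simplifies to 51/14, which agrees with the OR of 3.64 quoted for the classical analysis.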

7.4: Multiple controls per case


The table below shows the results of a conditional logistic regression model for
exposure to no piped water where each case has 2 controls. Note that there is no
constant term in the model.
Consider the table. What does the coefficient 1.2423 represent?
Interaction: Button: clouds picture (pop up box appears):

The coefficient represents the log(odds ratio) for the effect of exposure to no piped
water compared to having access to piped water.
So, what is the odds ratio estimated from the CLR model for the effect of no piped
water on infant mortality due to diarrhoea, to 2 decimal places?
Odds ratio =
Interaction: Calculation: Odds ratio = ___:
Correct Response 3.46:
The odds ratio is given by exp(1.2423) = 3.46.
The corresponding confidence interval is 1.78 to 6.73. You can now see these results
on the odds ratio scale in the table below.
Incorrect Response:
No, remember that the odds ratio is given by the exponential of the coefficient. So in
this case,
Odds ratio = exp(1.2423) = 3.46.
The corresponding confidence interval is 1.78 to 6.73. You can now see these results
on the odds ratio scale in the table below.
CLR for no piped water - on an OR scale
            Odds ratio   Standard error   z       P>|z|    95% confidence limits
Pipewat     3.4637       1.1726           3.67    <0.001   1.7840     6.7251
Log likelihood = -179.300
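As a rough check on the move from the log scale to the OR scale, the sketch below exponentiates the coefficient and rebuilds the 95% confidence limits. It assumes the log-scale standard error, which is not printed in this session, can be recovered from the OR-scale standard error by the delta method (SE of OR is approximately OR times SE of log OR).

```python
import math

coefficient = 1.2423      # log(odds ratio) for Pipewat, as quoted in the text
se_odds_ratio = 1.1726    # standard error on the OR scale, from the table

odds_ratio = math.exp(coefficient)

# Assumption: delta method, SE(OR) ~= OR * SE(log OR), rearranged for SE(log OR).
se_log = se_odds_ratio / odds_ratio

Z_95 = 1.959964           # 97.5th percentile of the standard normal distribution

# Confidence limits are computed on the log scale and then exponentiated.
lower = math.exp(coefficient - Z_95 * se_log)
upper = math.exp(coefficient + Z_95 * se_log)

print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the limits are exponentiated, the interval is symmetric about the coefficient on the log scale but not about the odds ratio itself.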

7.5: Multiple controls per case


Note that the Mantel-Haenszel estimate (OR = 3.64) is only approximately equal to
the conditional MLE (OR = 3.46) in this case. The likelihood ratio test gives a
chi-squared value of 14.93 on 1 degree of freedom (P < 0.001), indicating, like the
classical chi-squared test, that there is very strong evidence of an effect. No access
to piped water increases the odds of infant mortality from diarrhoea.
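To see where the conditional MLE and the likelihood ratio statistic come from, the following sketch writes out the conditional log likelihood for a single binary exposure with 2 controls per case and maximises it numerically. This is a minimal illustration built from the tabulated counts, not the routine a statistical package would use; the function names and the golden-section search are choices made here for simplicity.

```python
import math

# Number of matched sets (1 case + 2 controls), keyed by number of exposed controls.
case_exposed = {0: 18, 1: 15, 2: 26}
case_unexposed = {0: 98, 1: 12, 2: 1}

def conditional_log_likelihood(beta):
    """Each set contributes log[exp(beta * x_case) / sum over members of exp(beta * x)]."""
    psi = math.exp(beta)  # odds ratio
    total = 0.0
    for k, n in case_exposed.items():
        # k + 1 exposed members (including the case) and 2 - k unexposed members
        total += n * (beta - math.log((k + 1) * psi + (2 - k)))
    for k, n in case_unexposed.items():
        # k exposed controls and 3 - k unexposed members (including the case)
        total -= n * math.log(k * psi + (3 - k))
    return total

def maximise(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a function with a single peak."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

beta_hat = maximise(conditional_log_likelihood, -5.0, 5.0)
log_lik_fitted = conditional_log_likelihood(beta_hat)
log_lik_null = conditional_log_likelihood(0.0)   # odds ratio fixed at 1

lrs = 2 * (log_lik_fitted - log_lik_null)
p_value = math.erfc(math.sqrt(lrs / 2))          # chi-squared tail area, 1 df

print(f"Conditional MLE of OR = {math.exp(beta_hat):.2f}")
print(f"Log likelihood = {log_lik_fitted:.3f}")
print(f"LRS = {lrs:.2f}, P = {p_value:.5f}")
```

Sets in which the case and both controls agree contribute only the constant log(1/3) to both log likelihoods, so they cancel out of the likelihood ratio statistic.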

Section 8: Models with several exposures/confounders


In a matched case-control study we can control for important known confounders by
using them as matching variables. However, there are often additional variables, not
matched for, that may have a confounding effect on the main exposure(s) of
interest. Moreover, there are usually several exposures of interest, and we generally
need to examine their joint effects, to see whether their effects are mutually
confounding and to examine for interactions. Methods for doing this were
unsatisfactory without CLR.

8.1: Models with several exposures/confounders


We will now examine a model with 2 exposure variables using the Brazilian data with
only 1 control per case.
We have already examined the effect of access to piped water on infant mortality.
Another exposure of interest was the type of housing. Type of house was categorised
as Shack or Regular house.
The tabs opposite show the pair-matched tables for the association with infant
mortality due to diarrhoea of:
a) access to piped water and
b) housing.
Interaction: Tabs: Piped water:
Access to piped water
                    Controls
Cases               No      Yes
No                  17      11
Yes                 2       56
OR = 5.5
Interaction: Tabs: Housing:
Housing type
                    Controls
Cases               Shack   Regular
Shack               22      18
Regular             7       39
OR = 2.6
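For pair-matched data the odds ratio is the ratio of the two kinds of discordant pairs, and its logarithm should agree with the coefficient from a conditional logistic model containing that exposure alone. The sketch below checks this. Note the assumption: the smaller discordant counts (2 and 7) are inferred here from the reported ORs of 5.5 and 2.6 and a common total of 86 pairs, and the standard error uses the usual large-sample formula for matched pairs.

```python
import math

# (pairs with only the case exposed, pairs with only the control exposed)
# Assumption: the second count in each tuple is inferred from the reported OR.
discordant_pairs = {
    "no piped water": (11, 2),
    "shack housing": (18, 7),
}

results = {}
for exposure, (case_only, control_only) in discordant_pairs.items():
    odds_ratio = case_only / control_only
    log_or = math.log(odds_ratio)
    # Large-sample standard error of log(OR) for matched pairs
    se_log_or = math.sqrt(1 / case_only + 1 / control_only)
    results[exposure] = (odds_ratio, log_or, se_log_or)
    print(f"{exposure}: OR = {odds_ratio:.2f}, log(OR) = {log_or:.4f}, SE = {se_log_or:.4f}")
```

The log odds ratios and standard errors obtained this way match the single-exposure CLR results reported later in this section (1.7047 with SE 0.7687 for piped water, 0.9445 with SE 0.4454 for housing).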

8.2: Models with several exposures/confounders


To examine the joint effect of the two exposures with classical methods, and to
control for housing type when estimating the OR for piped water, we would have to
stratify the matched table for access to piped water by the levels of housing. In
doing this, in each stratum for housing both the cases and controls must have the
same housing type.

Therefore the analysis is restricted to the matched pairs where housing was the
same i.e., the matched pairs should be homogeneous for the confounding factor.
As cases and controls were not matched on housing type, for some pairs housing
type will be different. In fact, only 61 pairs were homogeneous for housing type, as
shown opposite.
An estimate of the OR associated with lack of piped water supply, controlling for
housing, is given by adding the data for the discordant pairs over the strata, giving
OR = (2+3) / (2+0) = 2.5. However, with only 7 discordant pairs, the estimate is
clearly very imprecise, and using the exact test the p-value is 0.45, providing no
evidence of an association. Similarly there is very little scope for testing for
interaction. In this example, relatively few pairs were thrown away, because the
matching on neighbourhood has probably helped to ensure that cases and controls
often had the same type of housing. For many confounding factors, far more pairs
would have to be thrown away. Any attempt to control for more than one confounder
would clearly be even less satisfactory.
Interaction: Tabs: Shack:
Access to piped water: Shack only
                    Cases
Controls            No      Yes
No                  14      2
Yes                 2       4

Interaction: Tabs: Regular:
Access to piped water: Regular house only
                    Cases
Controls            No      Yes
No                  1       0
Yes                 3       35
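The stratified estimate and its exact P-value can be reproduced from the discordant pairs in the two housing strata. The sketch below assumes the exact test is a two-sided binomial test on the discordant pairs (each pair equally likely to fall either way under the null hypothesis, with the smaller tail doubled); that assumption gives the quoted value of 0.45.

```python
from math import comb

# Discordant pairs within strata that are homogeneous for housing type:
# (case without piped water and control with, case with and control without)
strata = {"shack": (2, 2), "regular": (3, 0)}

case_only = sum(a for a, _ in strata.values())       # 2 + 3
control_only = sum(b for _, b in strata.values())    # 2 + 0
pooled_or = case_only / control_only

# Exact test: under the null, case_only follows a Binomial(n, 0.5) distribution.
n = case_only + control_only
k = max(case_only, control_only)
upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
p_value = min(1.0, 2 * upper_tail)                    # two-sided by doubling

print(f"Pooled OR = {pooled_or:.1f} from {n} discordant pairs")
print(f"Exact two-sided P = {p_value:.2f}")
```

With only 7 informative pairs out of the 61 retained, even a 5 to 2 split is entirely compatible with chance, which is why the classical approach is so imprecise here.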

8.3: Models with several exposures/confounders


Conditional logistic regression, on the other hand, allows us to examine the effect of
the two exposures simultaneously without the loss of data. The estimates from this
model will therefore be more precise.
The table below shows the results of a model with the two binary exposures:
1. access to piped water
2. type of housing
Interaction: Button: Swap (table on centre bottom changes to below):
CLR for access to piped water and housing type - on an OR scale
            Odds ratio   Standard error   z       P>|z|    95% confidence limits
House2      2.2289       1.0323           1.730   0.084    0.899      5.525
Pipewat     4.6583       3.6476           1.965   0.049    1.004      21.614
Log likelihood = -54.570

This model gives the effect of each exposure adjusted for the other. Click swap to
see the results on an OR scale. The OR for piped water, controlled for housing type,
is 4.7.
Does this model account for the additional joint effect of the possible interaction
between access to piped water and housing type?
Interaction: Button: clouds picture:
No, there are no interaction terms in this model. This model gives estimates for each
exposure adjusted for the other but assumes that the OR for piped water is the same
for the 2 categories of housing, and that the OR for housing is the same for the 2
categories of piped water.
CLR for access to piped water and housing type - on a log scale
            Coefficient  Standard error   z       P>|z|    95% confidence limits
House2      0.8015       0.4632           1.730   0.084    -0.1063    1.7093
Pipewat     1.5387       0.7830           1.965   0.049    0.0040     3.0733
Log likelihood = -54.570
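The z, P and confidence limit columns for the two-exposure model follow directly from each coefficient and its standard error. A minimal sketch of these Wald calculations, using the log-scale values reported for this model (function and variable names are illustrative):

```python
import math

Z_95 = 1.959964  # 97.5th percentile of the standard normal distribution

# (coefficient, standard error) on the log scale for the two-exposure model
estimates = {"House2": (0.8015, 0.4632), "Pipewat": (1.5387, 0.7830)}

def wald_summary(coefficient, standard_error):
    """Return z, two-sided P and 95% confidence limits on the log scale."""
    z = coefficient / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    lower = coefficient - Z_95 * standard_error
    upper = coefficient + Z_95 * standard_error
    return z, p_value, lower, upper

wald = {name: wald_summary(b, se) for name, (b, se) in estimates.items()}

for name, (z, p_value, lower, upper) in wald.items():
    print(f"{name}: z = {z:.3f}, P = {p_value:.3f}, "
          f"log-scale limits {lower:.4f} to {upper:.4f}, "
          f"OR limits {math.exp(lower):.3f} to {math.exp(upper):.3f}")
```

The lower limit for House2 is negative on the log scale, which corresponds to a lower limit below 1 on the OR scale.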

8.4: Models with several exposures/confounders


To assess the statistical evidence for the association of each exposure with the
outcome, adjusted for the other, we now need to exclude each exposure in turn and
carry out a likelihood ratio test. First, let's look at the model excluding access to
piped water.
Use swap to compare this with the model that includes both exposures.
Interaction: Button: Swap: (table on centre bottom changes to below):
CLR for access to piped water and housing type - on a log scale
            Coefficient  Standard error   z       P>|z|    95% confidence limits
House2      0.8015       0.4632           1.730   0.084    -0.1063    1.7093
Pipewat     1.5387       0.7830           1.965   0.049    0.0040     3.0733
Log likelihood = -54.570
Likelihood ratio test for access to piped water:

LRS = 5.07; P = 0.02.


What can you conclude from this test about the effect of access to piped water on
infant mortality due to diarrhoea?
Interaction: Button: clouds picture:
After taking account of housing type, there is moderately strong evidence of an
association between piped water and the odds of infant mortality from diarrhoea (P
= 0.02).
CLR for housing type, excluding access to piped water - on a log scale
            Coefficient  Standard error   z       P>|z|    95% confidence limits
House2      0.9445       0.4454           2.120   0.034    0.0714     1.817
Log likelihood = -57.106

8.5: Models with several exposures/confounders


Next, we look at the model excluding housing type. Again, use swap to compare
this with the model that includes both exposures.
Interaction: Button: Swap:
CLR for access to piped water and housing type - on a log scale
            Coefficient  Standard error   z       P>|z|    95% confidence limits
House2      0.8015       0.4632           1.730   0.084    -0.1063    1.7093
Pipewat     1.5387       0.7830           1.965   0.049    0.0040     3.0733
Log likelihood = -54.570
Likelihood ratio test for housing type:
LRS = 3.22; P = 0.07.
What can you conclude from this test about the effect of housing type on infant
mortality due to diarrhoea?
Interaction: Button: clouds picture (pop up box appears):
After taking account of piped water, there is weak evidence of an association
between housing type and risk of infant mortality from diarrhoea (P = 0.07).
CLR for access to piped water, excluding housing type - on a log scale
            Coefficient  Standard error   z       P>|z|    95% confidence limits
Pipewat     1.7047       0.7687           2.218   0.027    0.1981     3.211
Log likelihood = -56.181

8.6: Models with several exposures/confounders


Finally, to examine whether the OR for the effect of water supply varies with housing
type, you can include the interaction between access to piped water and housing
type in the model. The estimates from this model are shown below. The likelihood
ratio test for the interaction is also shown opposite.
Likelihood ratio test:
LRS = 0.08; P = 0.78.
Do you think there is evidence that the OR for water supply varies with housing
type?
Interaction: Button: clouds picture:
No, since P = 0.78 there is no evidence of interaction between housing type and
access to piped water.
It is important to recognise that the additional power achieved by CLR is achieved at
the cost of additional assumptions. In the above example, the joint effects of housing
type and water supply are assumed to be the same in all strata. In the classical
analysis, we do not have to assume that the effect of the confounder, housing type,
is constant over strata.
CLR with interaction between access to piped water and housing type - on an OR scale
                  Odds ratio   Standard error   z        P>|z|    95% confidence limits
House2            2.3474       1.1791           1.699    0.089    0.877      6.283
Pipewat           5.6700       6.0627           1.623    0.105    0.697      46.103
House2.pipewat    0.7214       0.8483           -0.278   0.781    0.072      7.229
Log likelihood = -54.532
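The three likelihood ratio statistics in this section can be recovered from the log likelihoods of the nested models. The sketch below assumes that each printed log likelihood is negative (a conditional log likelihood cannot be positive) and that each comparison has 1 degree of freedom, since the models differ by one parameter.

```python
import math

def likelihood_ratio_test(loglik_larger, loglik_smaller):
    """LRS and P-value (chi-squared, 1 df) for two nested models."""
    lrs = 2 * (loglik_larger - loglik_smaller)
    p_value = math.erfc(math.sqrt(lrs / 2))   # chi-squared tail area, 1 df
    return lrs, p_value

LOGLIK_BOTH = -54.570            # House2 + Pipewat
LOGLIK_HOUSE_ONLY = -57.106      # Pipewat excluded
LOGLIK_WATER_ONLY = -56.181      # House2 excluded
LOGLIK_INTERACTION = -54.532     # House2 + Pipewat + House2.pipewat

tests = {
    "piped water": likelihood_ratio_test(LOGLIK_BOTH, LOGLIK_HOUSE_ONLY),
    "housing type": likelihood_ratio_test(LOGLIK_BOTH, LOGLIK_WATER_ONLY),
    "interaction": likelihood_ratio_test(LOGLIK_INTERACTION, LOGLIK_BOTH),
}

for name, (lrs, p_value) in tests.items():
    print(f"{name}: LRS = {lrs:.2f}, P = {p_value:.2f}")
```

In each comparison the larger model is listed first, so the statistic is never negative; the effect being tested is the term present in the larger model but absent from the smaller one.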

Section 9: Summary
The main points of this session are summarised below.
Matching
Matched case-control studies must be analysed taking account of the matching.
Conditional logistic regression

The use of ordinary (i.e. not conditional) logistic regression for individually matched
studies, or frequency-matched studies in which the number of cases and controls in
each stratum is small, will give biased estimates of odds ratios. So for these studies
conditional logistic regression must be used.
Ordinary logistic regression can be used for frequency-matched studies in which
the number of cases and controls in each stratum is not small, for example at least 10
cases and 10 controls in each stratum. In this case, the matching variables must
always be included in the regression model. For example, if cases and controls have
been frequency matched on age and sex, then age group and sex must be included
in the logistic regression model.
Advantages of conditional logistic regression
Classical analysis for matched data is limited.
Conditional logistic regression can deal with:
multiple controls per case
all types of variables
many exposures/confounders
Interpretation
The interpretation of the estimates from a conditional logistic model is the same as in
other regression models.
Owing to the conditional nature of the model, there is no constant estimated in the
model.
Matching variables and interaction
The effect of a matching variable cannot be estimated in the analysis.
However, the effect of an interaction with a matching variable can be estimated in a
model. So we can calculate the odds ratio for an exposure variable, separately for
each level of a matching variable.
