(AS02)
EPM304 Advanced Statistical Methods in Epidemiology
This document contains a copy of the study material located within the computer
assisted learning (CAL) session.
If you have any questions regarding this document or your course, please contact
DLsupport via DLsupport@lshtm.ac.uk.
Important note: this document does not replace the CAL material found on your
module CDROM. When studying this session, please ensure you work through the
CDROM material first. This document can then be used for revision purposes to
refer back to specific sessions.
These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of
the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale
or further copying.
London School of Hygiene & Tropical Medicine September 2013 v2.0
This session should take you between 1.5 and 2 hours to complete.
Matched case-control studies
AS02
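The conditional likelihood is worked out for each permutation of exposure within a
matched set. The log (conditional) likelihood for each permutation is then multiplied
by the number of matched sets with that permutation, and the products are summed to
give the total log likelihood.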
Permutation   Frequency   Conditional likelihood
1             n1          L1
2             n2          L2
3             n3          L3
4             n4          L4
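As a minimal sketch of this weighted sum, assume a pair-matched study with a binary exposure, so that there are four permutations of case and control exposure. The counts are pooled from the three age-group tables for low birthweight that appear later in this session; the function name and argument names are illustrative only.

```python
import math

def pair_log_likelihood(odds_ratio, n_both, n_case_only, n_control_only, n_neither):
    """Log conditional likelihood for 1:1 matched pairs with a binary exposure."""
    # Conditional likelihood of each permutation of (case, control) exposure.
    l_concordant = 0.5                               # both or neither exposed
    l_case_only = odds_ratio / (1.0 + odds_ratio)    # only the case exposed
    l_control_only = 1.0 / (1.0 + odds_ratio)        # only the control exposed
    # Each log likelihood is weighted by the number of pairs with that permutation.
    return ((n_both + n_neither) * math.log(l_concordant)
            + n_case_only * math.log(l_case_only)
            + n_control_only * math.log(l_control_only))

# Pooled counts: 12 pairs both < 3.0 kg, 25 case only, 18 control only, 31 neither.
counts = dict(n_both=12, n_case_only=25, n_control_only=18, n_neither=31)

# For pair-matched data the ratio of discordant pairs maximises this likelihood.
or_hat = counts["n_case_only"] / counts["n_control_only"]
log_lik = pair_log_likelihood(or_hat, **counts)
print(round(or_hat, 3), round(log_lik, 3))
```

The concordant pairs add the same amount to the log likelihood whatever the odds ratio, so they do not affect where the maximum lies.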
3.6: Summary
1. Logistic regression is used to model case-control data, person-by-person.
Conditional logistic regression is used to model matched case-control data, set-by-set (pair-by-pair in the special case of a pair-matched case-control study).
2. When a conditional logistic model is fitted, the constant within each case-set is
eliminated using conditional likelihood.
3. This conditional likelihood accounts for the fact that the case and control(s) are
linked, and uses only case-control sets in which the exposure status of the case
differs from that of at least 1 control.
4. The only estimates that are produced by this model are the effects of the
exposures and/or other variables in the model.
5. There is no constant term in the output from a conditional logistic regression
model.
6. The interpretation of parameter estimates is identical to that of unconditional
logistic regression. And as with unconditional logistic regression, the effects of the
exposure variables are measured using odds ratios.
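Points 2 and 5 can be seen in a short derivation. As a sketch for a single matched pair in set s, write \alpha_s for the set constant, \beta for the log odds ratio, and x_1 and x_0 for the exposure of the case and of the control. This notation is introduced here for illustration and is not part of the CAL material.

```latex
\Pr(\text{observed case is the case} \mid \text{one case in the pair})
  = \frac{e^{\alpha_s + \beta x_1}}{e^{\alpha_s + \beta x_1} + e^{\alpha_s + \beta x_0}}
  = \frac{e^{\beta x_1}}{e^{\beta x_1} + e^{\beta x_0}}
```

The set constant cancels, so only \beta is estimated. When x_1 = x_0 the contribution is 1/2 whatever the value of \beta, which is why sets with no difference in exposure carry no information about the effect.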
The parameter Bwt1, given in the model, estimates the effect of low birthweight
compared to high birthweight.
Bwt0   > 3.0 kg
Bwt1   < 3.0 kg
You can be 95% confident that the true OR lies between 0.758 and 2.546.
Log odds = constant + set + β1 Exp1 + β2 Exp1.Exp2
Where,
Exp1 is the exposure of interest and
Exp2 is the matching variable
The 3 tables presented opposite show data from the Brazilian study. The matched
table for exposure to low birthweight is stratified by the 3 levels of the matching
variable age.
Click swap to see a table of discordant pairs and test for effect modification.
Interaction: Tabs: Agegroup0:
0-2 months
                        Case
                        < 3.0 kg   > 3.0 kg
Control   < 3.0 kg         4          6
          > 3.0 kg         7          8
OR1 = 1.17
Interaction: Tabs: Agegroup1:
3-5 months
                        Case
                        < 3.0 kg   > 3.0 kg
Control   < 3.0 kg         4          7
          > 3.0 kg        12         15
OR2 = 1.71
Interaction: Tabs: Agegroup3:
> 6 months
                        Case
                        < 3.0 kg   > 3.0 kg
Control   < 3.0 kg         4          5
          > 3.0 kg         6          8
OR3 = 1.20
Interaction: Button: Swap (right hand side card changes to text below):
Table of discordant pairs
Age group    Case < 3.0 kg,       Case > 3.0 kg,
(months)     control > 3.0 kg     control < 3.0 kg
0-2                 7                    6
3-5                12                    7
> 6                 6                    5
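The odds ratios quoted for each age group follow directly from the discordant pairs. The sketch below reads the counts from the three matched tables; the dictionary layout and variable names are illustrative, and the oldest group is labelled 6-11 months as in the calculation cards.

```python
# Discordant pairs per age group, read from the three matched tables:
# (case < 3.0 kg with control > 3.0 kg, case > 3.0 kg with control < 3.0 kg)
discordant = {
    "0-2 months": (7, 6),
    "3-5 months": (12, 7),
    "6-11 months": (6, 5),
}

# In a pair-matched analysis the odds ratio is the ratio of the discordant pairs.
stratum_or = {age: case_only / control_only
              for age, (case_only, control_only) in discordant.items()}

# Ratio of each stratum odds ratio to that of the baseline group (0-2 months).
baseline = stratum_or["0-2 months"]
ratio_to_baseline = {age: value / baseline for age, value in stratum_or.items()}

for age in discordant:
    print(age, round(stratum_or[age], 3), round(ratio_to_baseline[age], 3))
```

The ratios to baseline are the quantities that the interaction terms of a conditional logistic model estimate, so each stratum odds ratio is the baseline odds ratio multiplied by its interaction odds ratio.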
Incorrect Response:
No, remember that the lowest age-group is the baseline group. This means that
there is no interaction effect. The Bwt1 OR represents the effect of birthweight in the
baseline group of age. Therefore the odds ratio for birthweight in age group 0-2
months is simply OR1 = 1.167. You can read this directly from the table.
Interaction: Calculation: Age-group 3 to 5 months:
Correct Response 1.714:
Correct
Yes, the estimated odds ratio for the effect of birthweight in the age group 3-5
months is given by 1.167 x 1.469 = 1.714.
Incorrect Response:
Sorry, that's not right. The additional effect of birthweight in this age-group is
estimated as an odds ratio of 1.469 (the Bwt1.Agegrp1 term). Therefore, in infants
aged 3 to 5 months, the estimated odds ratio for the effect of birthweight is given
by 1.167 x 1.469 = 1.714.
Interaction: Calculation: Age-group 6 to 11 months:
Correct Response 1.201:
Correct
Yes, the estimated odds ratio for the effect of birthweight in age group 6-11 months
is 1.167 x 1.029 = 1.201.
Incorrect Response:
Sorry, that's not right. The additional effect of birthweight in this age-group is
estimated as an odds ratio of 1.029 (the Bwt1.Agegrp2 term). Therefore, in infants
aged over 6 months, the estimated odds ratio for the effect of birthweight is
1.167 x 1.029 = 1.201.
CLR birthweight and interaction with age-group - on a log scale
               Coefficient   Standard error   z       P>|z|   95% confidence limits
Bwt1             0.1542         0.5563        0.277   0.782    -0.936    1.245
Bwt1.Agegrp1     0.3848         0.7319        0.526   0.599    -1.050    1.819
Bwt1.Agegrp2     0.0282         0.8223        0.034   0.973    -1.584    1.640
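As a sketch of how the log-scale output is used, the code below converts the three coefficients to odds ratios, rebuilds the 95% confidence limits as coefficient +/- 1.96 standard errors, and combines the main effect with each interaction term. The variable and function names are illustrative only.

```python
import math

# Coefficients and standard errors on the log scale, from the table above.
estimates = {
    "Bwt1": (0.1542, 0.5563),
    "Bwt1.Agegrp1": (0.3848, 0.7319),
    "Bwt1.Agegrp2": (0.0282, 0.8223),
}

def wald_limits(coef, se, z=1.96):
    """95% confidence limits on the log scale: coefficient +/- 1.96 standard errors."""
    return coef - z * se, coef + z * se

for name, (coef, se) in estimates.items():
    low, high = wald_limits(coef, se)
    print(name, round(math.exp(coef), 3), round(low, 3), round(high, 3))

# Odds ratio for low birthweight within each age group.
b, b1, b2 = (estimates[k][0] for k in ("Bwt1", "Bwt1.Agegrp1", "Bwt1.Agegrp2"))
or_age0 = math.exp(b)         # baseline age group: main effect only
or_age1 = math.exp(b + b1)    # main effect combined with the first interaction term
or_age2 = math.exp(b + b2)    # main effect combined with the second interaction term
print(round(or_age0, 3), round(or_age1, 3), round(or_age2, 3))
```

Adding coefficients on the log scale is the same as multiplying odds ratios, which is the calculation used in the response cards. Small differences in the third decimal place can arise from rounding the odds ratios before multiplying.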
To assess whether the odds ratios are heterogeneous across strata of a matching
variable, you can use a likelihood ratio test. Consider the models opposite and click
LRT to calculate the likelihood ratio test.
What can you conclude from this test?
Interaction: Button: clouds picture:
The P-value is high (p=0.84), so you can conclude that there is no evidence that the
effect of birthweight varies with age. We can reasonably proceed on the assumption
that the odds ratio is the same across the different age groups.
Interaction: Tabs: Model 1:
Log odds = constant + β1 Bwt1 + β2 Bwt1.Agegrp1 + β3 Bwt1.Agegrp2
Model without the interaction terms:
Log odds = constant + β1 Bwt1
LRS = 2(L1 - L0)
    = 2(-58.861 - (-59.038))
    = 0.354
The corresponding P-value is 0.84.
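The likelihood ratio calculation can be sketched as follows. The two models differ by the two interaction parameters, so the statistic is referred to a chi-squared distribution with 2 degrees of freedom. The log likelihoods are entered as negative values, on the assumption that the minus signs belong to the printed figures (a conditional log likelihood cannot be positive). The function name is illustrative.

```python
import math

def lrt_p_value_2df(statistic):
    """Upper tail of a chi-squared distribution with 2 degrees of freedom."""
    # For 2 degrees of freedom the survival function has the closed form exp(-x/2).
    return math.exp(-statistic / 2.0)

log_lik_interaction = -58.861   # L1: Bwt1 plus the two Bwt1.Agegrp terms
log_lik_main_effect = -59.038   # L0: Bwt1 only

lrs = 2.0 * (log_lik_interaction - log_lik_main_effect)
p_value = lrt_p_value_2df(lrs)
print(round(lrs, 3), round(p_value, 2))
```

The closed form is used only to keep the sketch free of external libraries; a general chi-squared routine would be needed for other degrees of freedom.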
                                     Cases
Controls               Breast fed   Breast fed + other   Other only
Breast fed                  5               5                19
Breast fed + other          5               7                10
Other only                  1               5                27
The concordant pairs do not provide any information about the effect of the
exposure. So the number of case-control pairs that do not contribute to the analysis
is 5 + 7 + 27 = 39 pairs.
When answered, the table on the centre bottom also highlights the concordant pairs
(shown here in square brackets):
                                     Cases
Controls               Breast fed   Breast fed + other   Other only
Breast fed                 [5]              5                19
Breast fed + other          5              [7]               10
Other only                  1               5               [27]
Log odds = constant + set + β2 milkgrp2 + β3 milkgrp3
CLR for type of feeding (Milkgrp) - on an OR scale
            95% confidence limits
Milkgrp2      0.842      6.436
Milkgrp3      2.549     21.223
When compared with the baseline group of infants who were only breast fed, how
would you interpret the estimates for Milkgrp:
1. for infants who were breast fed with other types of feeding?
2. for infants who were only fed with other types of feeding?
Interaction: Button: clouds picture(1):
The estimated OR for the group with a combination of breast feeding and other
types of feeding is:
OR = 2.33 (0.84 to 6.44), P = 0.10.
There is weak evidence that the odds of infant mortality due to diarrhoea are higher
in infants with mixed feeding, compared to infants who are exclusively breast-fed.
Interaction: Button: clouds picture(2):
The estimated OR for the group with no breast feeding (i.e. only other types of
feeding) is:
OR = 7.36 (2.55 to 21.22), P < 0.001.
There is strong evidence that the odds of infant mortality due to diarrhoea are higher
in infants with no breast-feeding, compared to infants who are exclusively breast-fed.
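The odds ratios, confidence limits and P-values quoted in these answers can be reproduced from the log-scale coefficients (0.8449 and 1.9953) and their standard errors (0.5189 and 0.5407). The sketch below assumes Wald tests based on the normal distribution; the function name is illustrative, and results may differ from the quoted figures in the last decimal place because the inputs are rounded.

```python
import math

def wald_summary(coef, se):
    """Odds ratio, 95% confidence limits, z statistic and two-sided Wald P-value."""
    z = coef / se
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    limits = (math.exp(coef - 1.96 * se), math.exp(coef + 1.96 * se))
    return math.exp(coef), limits, z, p

mixed = wald_summary(0.8449, 0.5189)        # Milkgrp2: breast fed + other
other_only = wald_summary(1.9953, 0.5407)   # Milkgrp3: other only

for label, (odds_ratio, (low, high), z, p) in (("Milkgrp2", mixed),
                                               ("Milkgrp3", other_only)):
    print(label, round(odds_ratio, 2), round(low, 2), round(high, 2),
          round(z, 3), round(p, 3))
```

The confidence interval for Milkgrp2 includes 1 and the interval for Milkgrp3 does not, which matches the weak and strong evidence described in the two answers.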
CLR for type of feeding (Milkgrp) - on the log scale
            Coefficient   Standard error   z       P > |z|   95% confidence limits
Milkgrp2      0.8449         0.5189        1.628   0.104      -0.172     1.862
Milkgrp3      1.9953         0.5407        3.690   < 0.001     0.9356    3.055
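Log likelihood = -49.968
CLR for type of feeding (Milkgrp) - linear effect, on an OR scale
            Odds ratio   Standard error   z       P > |z|   95% confidence limits
Milkgrp       2.7242        0.7367        3.706   < 0.001     1.603     4.628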
The table below shows the result of a CLR model with separate effects for each level
of Milkgrp. Click swap to see the results for a model with a linear effect (on the
log(odds) scale) for Milkgrp.
CLR for type of feeding (Milkgrp) - separate effects, on the log(odds) scale
            Coefficient   Standard error   z       P > |z|   95% confidence limits
Milkgrp2      0.8449         0.5189        1.628   0.104      -0.172     1.862
Milkgrp3      1.9953         0.5407        3.690   < 0.001     0.9356    3.055
One of the environmental factors examined in the analysis was access to piped water
(yes or no).
Sorry, that's not right. It is the case-sets where the case and both controls agree
that do not contribute to the estimate of the odds ratio associated with exposure to
a lack of piped water.
There were 26 case sets where the case and both controls had no access to piped
water (exposed), and 98 case sets where the case and both controls did have access
(not exposed), so 124 case-sets do not tell us anything about the odds of exposure
to a lack of piped water.
Exposure = no access to piped water
Frequency of exposure in case-control sets
                        Number of controls exposed
                          0       1       2
Case   Exposed           18      15      26
       Not exposed       98      12       1
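The frequency table is enough to fit the model by hand. The sketch below writes out the conditional likelihood for sets of 1 case and 2 controls with a binary exposure and maximises it numerically. The golden-section search is simply a convenient dependency-free optimiser, not necessarily the method used by the software behind the CAL output, and all names are illustrative.

```python
import math

# Counts of case-control sets (1 case, 2 controls) from the table above,
# keyed by (case exposed?, number of exposed controls).
sets = {(1, 0): 18, (1, 1): 15, (1, 2): 26,
        (0, 0): 98, (0, 1): 12, (0, 2): 1}

def log_likelihood(beta):
    """Log conditional likelihood for 1:2 matched sets with a binary exposure."""
    theta = math.exp(beta)                      # odds ratio
    total = 0.0
    for (case_exposed, controls_exposed), n in sets.items():
        exposed_in_set = case_exposed + controls_exposed
        # Sum of exp(beta * x) over the three members of the set.
        denominator = exposed_in_set * theta + (3 - exposed_in_set)
        numerator = theta if case_exposed else 1.0
        total += n * math.log(numerator / denominator)
    return total

def maximise(f, low=-5.0, high=5.0, tol=1e-9):
    """Golden-section search for the maximum of a concave function."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while high - low > tol:
        a = high - ratio * (high - low)
        b = low + ratio * (high - low)
        if f(a) < f(b):
            low = a
        else:
            high = b
    return (low + high) / 2.0

beta_hat = maximise(log_likelihood)
print(round(beta_hat, 4), round(math.exp(beta_hat), 2),
      round(log_likelihood(beta_hat), 3))
```

The 26 + 98 = 124 sets in which the case and both controls agree each contribute log(1/3) regardless of the odds ratio, so they shift the log likelihood without influencing the estimate.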
The coefficient represents the log(odds ratio) for the effect of exposure to no piped
water compared to having access to piped water.
So, what is the odds ratio estimated from the CLR model for the effect of no piped
water on infant mortality due to diarrhoea, to 2 decimal places?
Odds ratio =
Interaction: Calculation: Odds ratio = ___:
Correct Response 3.46:
The odds ratio is given by exp(1.2423) = 3.46.
The corresponding confidence interval is 1.78 to 6.73. You can now see these results
on the odds ratio scale in the table below.
Incorrect Response:
No, remember that the odds ratio is given by the exponential of the coefficient. So in
this case,
Odds ratio = exp(1.2423) = 3.46.
The corresponding confidence interval is 1.78 to 6.73. You can now see these results
on the odds ratio scale in the table below.
CLR for no piped water - on an OR scale
           Odds ratio   Standard error   z       P>|z|    95% confidence limits
Pipewat      3.4637        1.1726        3.670   <0.001     1.784     6.7251
Log likelihood = -179.300
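The OR-scale table can be linked back to the log scale. The sketch below assumes that the standard error shown on the OR scale was obtained by the delta method (OR multiplied by the standard error of the log odds ratio), which the printed figures are consistent with. The test statistic and confidence limits are then computed on the log scale and the limits are exponentiated.

```python
import math

odds_ratio = 3.4637      # from the table
se_odds_ratio = 1.1726   # standard error reported on the OR scale

# Assumption: SE(OR) = OR x SE(log OR), i.e. the delta method.
se_log = se_odds_ratio / odds_ratio
coef = math.log(odds_ratio)

# The Wald statistic and the confidence limits are calculated on the log scale.
z = coef / se_log
lower = math.exp(coef - 1.96 * se_log)
upper = math.exp(coef + 1.96 * se_log)
print(round(coef, 4), round(z, 2), round(lower, 3), round(upper, 3))
```

This is why the interval 1.78 to 6.73 is not symmetric about 3.46: it is symmetric about the coefficient on the log scale.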
Access to piped water
                      Cases
                      No      Yes
Controls   No         17       2
           Yes        11      56
OR = 5.5
Interaction: Tabs: Housing:
Housing type
                        Cases
                        Shack   Regular
Controls   Shack          22       7
           Regular        18      39
OR = 2.6
Therefore the analysis is restricted to the matched pairs where housing was the
same i.e., the matched pairs should be homogeneous for the confounding factor.
As cases and controls were not matched on housing type, for some pairs housing
type will be different. In fact, only 61 pairs were homogeneous for housing type, as
shown opposite.
An estimate of the OR associated with lack of piped water supply, controlling for
housing, is given by adding the data for the discordant pairs over the strata, giving
OR = (2+3) / (2+0) = 2.5. However, with only 7 discordant pairs, the estimate is
clearly very imprecise, and using the exact test the p-value is 0.45, providing no
evidence of an association. Similarly there is very little scope for testing for
interaction. In this example, relatively few pairs were thrown away, because the
matching on neighbourhood has probably helped to ensure that cases and controls
often had the same type of housing. For many confounding factors, far more pairs
would have to be thrown away. Any attempt to control for more than one confounder
would clearly be even less satisfactory.
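The pooled estimate and its P-value can be checked with a short calculation. The sketch assumes that the exact test referred to is the two-sided exact binomial test on the discordant pairs (each discordant pair is equally likely to fall either way under the null hypothesis), with the two-sided value taken as twice the smaller tail. The function name is illustrative.

```python
from math import comb

# Discordant pairs among the pairs that are homogeneous for housing type.
case_only = 2 + 3      # case lacks piped water, control has it (shack + regular)
control_only = 2 + 0   # control lacks piped water, case has it (shack + regular)

odds_ratio = case_only / control_only

def exact_two_sided_p(k, n):
    """Two-sided exact binomial P-value for k of n discordant pairs under p = 0.5."""
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2.0 * one_sided)

p_value = exact_two_sided_p(case_only, case_only + control_only)
print(odds_ratio, round(p_value, 2))
```

With only 7 discordant pairs, even a 5 to 2 split is well within what chance alone would produce, which is the imprecision the text describes.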
Interaction: Tabs: Shack:
Access to piped water: Shack only
                      Cases
                      No      Yes
Controls   No         14       2
           Yes         2       4
Access to piped water: Regular only
                      Cases
                      No      Yes
Controls   No          1       0
           Yes         3      35
CLR for access to piped water and housing type - on an OR scale
           Odds ratio   Standard error   z       P>|z|   95% confidence limits
House2       2.2289        1.0323        1.730   0.084     0.899      5.525
Pipewat      4.6583        3.6476        1.965   0.049     1.004     21.614
Log likelihood = -54.570
This model gives the effect of each exposure adjusted for the other. Click swap to
see the results on an OR scale. The OR for piped water, controlled for housing type,
is 4.7.
Does this model account for the additional joint effect of the possible interaction
between access to piped water and housing type?
Interaction: Button: clouds picture:
No, there are no interaction terms in this model. This model gives estimates for each
exposure adjusted for the other but assumes that the OR for piped water is the same
for the 2 categories of housing, and that the OR for housing is the same for the 2
categories of piped water.
CLR for access to piped water and housing type - on a log scale
           Coefficient   Standard error   z       P>|z|   95% confidence limits
House2       0.8015         0.4632        1.730   0.084    -0.1063    1.7093
Pipewat      1.5387         0.7830        1.965   0.049     0.0040    3.0733
Log likelihood = -54.570
Likelihood ratio test for access to piped water:
Likelihood ratio test for housing type:
LRS = 3.22; P = 0.07.
What can you conclude from this test about the effect of housing type on infant
mortality due to diarrhoea?
Interaction: Button: clouds picture (pop up box appears):
After taking account of piped water, there is weak evidence of an association
between housing type and risk of infant mortality from diarrhoea (P = 0.07).
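The likelihood ratio statistic for housing type compares the model containing both piped water and housing type with the model containing piped water alone. The sketch below takes the two log likelihoods as -54.570 and -56.181, assuming that the minus signs belong to the printed figures (a conditional log likelihood cannot be positive). One parameter (House2) is dropped, so the statistic has 1 degree of freedom. The function name is illustrative.

```python
import math

def lrt_p_value_1df(statistic):
    """Upper tail of a chi-squared distribution with 1 degree of freedom."""
    # A chi-squared(1) variable is a squared standard normal, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))

log_lik_with_housing = -54.570      # Pipewat + House2
log_lik_without_housing = -56.181   # Pipewat only

lrs = 2.0 * (log_lik_with_housing - log_lik_without_housing)
p_value = lrt_p_value_1df(lrs)
print(round(lrs, 2), round(p_value, 2))
```

The likelihood ratio P-value is close to the Wald P-value of 0.084 for House2 in the fitted model, as would be expected when both tests address the same hypothesis.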
CLR for access to piped water, excluding housing type - on a log scale
           Coefficient   Standard error   z       P > |z|   95% confidence limits
Pipewat      1.7047         0.7687        2.218   0.027       0.1981    3.211
Log likelihood = -56.181
[Table: CLR for ... water and ... (only the 95% confidence limits are legible)]
95% confidence limits
0.877      6.283
0.697     46.103
0.072      7.229
Log likelihood = -54.532
Section 9: Summary
The main points of this session will appear below as you click through the step card
opposite. Click on any of the list entries below to go back to that card.
Matching
Matched case-control studies must be analysed taking account of the matching.
Conditional logistic regression
The use of ordinary (i.e. not conditional) logistic regression for individually matched
studies, or frequency-matched studies in which the number of cases and controls in
each stratum is small, will give biased estimates of odds ratios. So for these studies
conditional logistic regression must be used.
Ordinary logistic regression can be used for frequency-matched studies in which
the number of cases and controls in each stratum is not small, for example at least 10
cases and 10 controls in each stratum. In this case, the matching variables must
always be included in the regression model. For example, if cases and controls have
been frequency matched on age and sex, then age group and sex must be included
in the logistic regression model.
Advantages of conditional logistic regression
Classical analysis for matched data is limited.
Conditional logistic regression can deal with:
- multiple controls per case
- all types of variables
- many exposures/confounders
Interpretation
The interpretation of the estimates from a conditional logistic model is the same as in
other regression models.
Owing to the conditional nature of the model, there is no constant estimated in the
model.
Matching variables and interaction
The effect of a matching variable cannot be estimated in the analysis.
However, the effect of an interaction with a matching variable can be estimated in a
model. So we can calculate the odds ratio for an exposure variable, separately for
each level of a matching variable.