study (AS03)
EPM304 Advanced Statistical Methods in Epidemiology
This document contains a copy of the study material located within the computer
assisted learning (CAL) session.
If you have any questions regarding this document or your course, please contact
DLsupport via DLsupport@lshtm.ac.uk.
Important note: this document does not replace the CAL material found on your
module CDROM. When studying this session, please ensure you work through the
CDROM material first. This document can then be used for revision purposes to
refer back to specific sessions.
These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of
the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale
or further copying.
London School of Hygiene & Tropical Medicine September 2013 v2.0
To learn about the different sampling schemes for a case-control study and
know what the exposure odds ratio estimates.
Objectives
By the end of this session students will be able to:
This session should take you between 1.5 and 2.5 hours to complete.
FE16
Measures of effect
SM01
Case-control studies
SM04
Section 3: Introduction
Title: Introduction
The tabs below discuss the origin and main advantages of case-control studies.
Interaction: Tabs: 1:
Case-control studies originated within research on the causes of rare diseases.
Can you think why this design was (and still is) particularly efficient for this area of
research?
Interaction: Button: clouds picture (pop up box appears and card on RHS
appears):
With a rare disease only a small proportion of the population are cases, as shown
opposite.
With a case-control study you can usually focus on all the cases, and then study only
a small fraction of the non-case population.
The controls are a sample of the non-cases, selected from the population that
produced the cases.
Interaction: Tabs: 2:
Case-control studies are also useful for diseases with long latent or incubation
periods. This is because they are retrospective in nature, so there is no need to wait
for a disease to develop.
Interaction: Tabs: 3:
Case-control studies have also been used to assess the effects of health
interventions where, for example, randomised controlled trials (RCTs) are not
possible.
For what reasons might it not be possible to use RCTs?
Interaction: Button: clouds picture (pop up box appears ):
It is usually because of ethical or logistic reasons that it is not possible to use
randomised controlled trials.
Examples
Some examples of where case-control studies have been used are given below.
1. Investigation of cervical cancer and women within a screening programme.
2. The assessment of risk factors for Legionnaires' disease.
3. The assessment of the efficacy of BCG vaccination against leprosy.
3.1: Introduction
Case-control studies, for many understandable reasons, originated from research
into rare diseases. More recently, however, they have also been applied to
research on 'common' diseases.
Can you think why using a case-control design within common disease research
could be an advantage over cohort or intervention studies?
Interaction: Button: clouds picture (pop up box appears and card appears on
RHS):
A case-control study can avoid many of the ethical issues that arise with prospective
cohort and intervention studies. This is because the disease status of study
individuals is already determined. Another advantage is that a case-control study
may also be simpler and quicker to conduct, if it can, for example, be conducted
entirely within a health facility.
Example
Case-control studies have been applied to common childhood diseases, such as
diarrhoea and acute respiratory illness.
3.2: Introduction
In case-control studies you must decide at the design stage how to recruit controls.
This is one of the most important decisions for an investigator designing a
case-control study.
Why do you need to choose the 'correct' controls?
Interaction: Button: clouds picture (box appears and card appears on RHS):
Controls are chosen in order to give an unbiased estimate of the appropriate
measure of effect, i.e. the measure defined in the study objective.
The way in which controls are recruited determines both what is estimated (risk
ratio, rate ratio, odds ratio) and the validity of the estimate.
Before the 1950s, the analysis of case-control studies was restricted to a simple
comparison of the proportion of cases exposed and the proportion of controls
exposed. No attempt was made to estimate the size of the exposure effect.
Click below for an example of this.
Interaction: Button: Example (pop up box appears):
In a case-control study to examine the association between smoking and lung
cancer, initially the analysis would address the question:
Do more lung cancer patients have a history of smoking than controls?
The analysis would not attempt to answer the question of the size of the exposure
effect:
How much does smoking increase the risk of lung cancer?
3.3: Introduction
The first estimate for the magnitude of the exposure effect was suggested by
Cornfield in 1951.
He pointed out for the first time that it was possible to estimate the relative risk
associated with exposure, using the ratio of the odds of exposure of cases to the
odds of exposure of controls. It was suggested that this estimate was only possible if
the disease was rare.
In referring to relative risk, no distinction was made between the risk ratio,
rate ratio and odds ratio of disease.
The rare disease assumption remained until 1976 when Miettinen argued that it only
applied to case-control studies:
"...in which the subjects are ascertained after the end of the entire risk-period of
interest."
It did not apply to the case-control studies in chronic disease epidemiology in which
incident cases and controls are recruited concurrently. Miettinen showed that in this
case it was possible to obtain an exposure odds ratio which estimated the rate ratio,
and that no rarity assumption was needed for this.
When these ideas were explored in more detail (by Smith et al. 1984, International
Journal of Epidemiology 13: 87-93; and by Greenland and Thomas 1982, American
Journal of Epidemiology 116: 547-553), it was found that it was possible to design a
case-control study for which the exposure odds ratio gave an estimate of the risk
ratio.
3.4: Introduction
On the following pages we will review the concepts of risk ratio, rate ratio and odds
ratio of disease. If the choice of the sampling scheme for controls is suitable, each of
these measures can be estimated directly in a case-control study by the exposure
odds ratio. This applies to both 'rare' and 'common' diseases.
Sorry, that's not right. The person-time at risk is not used in the risk
formula. Please try again.
Incorrect Response N-D (text appears in bottom LHS):
Sorry, that's not right. This term is not used in the risk formula. Please try
again.
Interaction: Drag and Drop: Risk = denominator:
Correct Response N:
That's right, the denominator of the risk is the number of individuals initially
at risk.
Incorrect Response Y:
Sorry, that's not right. The person-time at risk is not used in the risk
formula. Please try again.
Incorrect Response D:
No, although the risk does depend on the number of cases, that is not
correct. Please try again.
Incorrect Response N-D:
Sorry, that's not right. This term is not used in the risk formula. Please try
again.
When both the numerator and denominator have been placed correctly the following
text appears in a pop up box:
Well done
That's right, the risk is given by the number of cases at the end of follow-up divided
by the number of individuals initially at risk.
Sorry, this term is not involved in the rate calculation. Please try again.
Interaction: Drag and Drop: Rate = denominator:
Correct Response: Y:
That's right, the total person-time at risk (i.e. the shaded area under the
graph) is the denominator of the rate formula.
Incorrect Response: N:
Sorry, this term is not involved in the rate calculation. Please try again.
Incorrect Response: D:
Although rate does depend on the number of new cases, this is not the
correct position of this term. Please try again.
Incorrect Response: N-D:
Sorry, this term is not involved in the rate calculation. Please try again.
When both the numerator and denominator have been placed correctly the following
text appears in a pop up box:
That's right
The number of new cases is related not to the number initially at risk but to the sum
of the lengths of time that each person remained at risk during the period of
observation. This sum is called the total person-time at risk (often expressed as
person-years). The shaded areas in the figures represent the number of
person-years at risk for exposed and unexposed populations. So the rate is
calculated as the number of new cases divided by the person-time at risk. The ratio
of the incidence rate in the exposed group (D1/Y1) to that in the unexposed group
(D0/Y0) is called the rate ratio.
The ratio of risk in the exposed to the risk in the unexposed is called the risk
ratio.
The ratio of the incidence rate in the exposed group to the incidence rate in the
unexposed group is called the rate ratio.
The ratio of the odds in the exposed group to the odds in the unexposed group is
called the odds ratio.
These relative measures are summarised in the table opposite.
Measure       Definition
Risk ratio    (D1 / N1) / (D0 / N0)
Rate ratio    (D1 / Y1) / (D0 / Y0)
Odds ratio    [D1 / (N1 - D1)] / [D0 / (N0 - D0)]
Click 'show' below to see how the exposure odds ratio is calculated from this table.
Interaction: Button: Show (text appears on card on RHS):
Exposure OR = (D1 / D0) / (H1 / H0) = (D1 x H0) / (D0 x H1)
This exposure odds ratio estimates one of the 3 measures of relative incidence;
which one depends on the way in which controls are sampled. We usually refer to it
simply as the exposure odds ratio.
             Cases   Controls
Exposed      D1      H1
Unexposed    D0      H0
Total        D       H
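The exposure odds ratio can be computed directly from the four cells of the case-control table. The short Python sketch below is illustrative only: the function name exposure_odds_ratio and the counts in the usage lines are hypothetical and do not come from any study in this session.

```python
def exposure_odds_ratio(d1, d0, h1, h0):
    """Exposure odds ratio from a case-control table.

    d1, d0: exposed and unexposed cases (D1, D0)
    h1, h0: exposed and unexposed controls (H1, H0)
    """
    if min(d1, d0, h1, h0) <= 0:
        raise ValueError("all four cells must be positive")
    # (D1 / D0) / (H1 / H0), which equals (D1 x H0) / (D0 x H1)
    return (d1 / d0) / (h1 / h0)


# Hypothetical counts, for illustration only
odds_ratio = exposure_odds_ratio(d1=60, d0=40, h1=30, h0=70)
print(round(odds_ratio, 2))  # 3.5
```

Which measure of relative incidence this number estimates depends on how the controls were sampled, not on the arithmetic.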
Measure       Definition                              Alternative formulation
Risk ratio    (D1 / N1) / (D0 / N0)                   (D1 / D0) / (N1 / N0)
Rate ratio    (D1 / Y1) / (D0 / Y0)                   (D1 / D0) / (Y1 / Y0)
Odds ratio    [D1 / (N1 - D1)] / [D0 / (N0 - D0)]     (D1 / D0) / [(N1 - D1) / (N0 - D0)]
This is useful because D1 / D0 is easily estimated from the cases group in a study.
That's right, N is the number of people who are at risk at the start of the
study period.
Incorrect Response: The ratio of exposed to non-exposed for people who are
disease-free at the end of the study. (text appears on bottom RHS):
Sorry, N does not represent the number of people still disease-free at the
end of the study period. Please try again.
Incorrect Response: The ratio of exposed to non-exposed for person-time at risk
during the study. (text appears on bottom RHS):
Sorry, N does not represent person-time. Please try again.
Interaction: Hotspot: Y1 / Y0:
Correct Response: The ratio of exposed to non-exposed for person-time at risk
during the study. (text appears on bottom RHS):
That's right, Y is the total person-time over the duration of the study period.
Incorrect Response: The ratio of exposed to non-exposed for people who are
disease-free at the end of the study. (text appears on bottom RHS):
Sorry, Y does not represent the number of disease-free people at the end of
the study. Please try again.
Incorrect Response: The ratio of exposed to non-exposed for people at risk at the
start of the study. (text appears on bottom RHS):
Sorry, Y does not represent the number of people at risk at the start of the
study. Please try again.
For any case-control study we take two samples from a population: a sample (or
maybe all) of the cases and a sample of controls. Therefore, all case-control studies
can be thought of as being 'nested' within the total population of interest. Click below
to review the illustration of this that you saw earlier.
Interaction: Button: View (card appears on RHS):
The controls are a sample of the non-cases, selected from the population that
produced the cases.
Conceptually, all case-control studies are 'nested' within a cohort, the cohort being
the population from which the cases and controls were drawn.
The sampling schemes outlined on the following pages illustrate this idea.
What does the exposure odds ratio estimate in this sampling scheme?
Interaction: Button: clouds picture (text appears and graph on RHS is altered):
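With an inclusive sampling scheme the exposure odds ratio gives an estimate of the
risk ratio. This is because the controls are sampled from everyone at risk at the
start of the study period, so H1 / H0 estimates N1 / N0, which is the denominator of
the risk ratio in the alternative formulation you saw earlier.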
With a concurrent sampling scheme the exposure odds ratio gives an estimate of a
rate ratio, because (Y1 / Y0) is the denominator of the rate ratio in the alternative
formulation you saw earlier.
The analysis needs to be matched on time of selection, and in our estimate of the
rate ratio we are assuming that it is constant over the study period. If an unmatched
analysis were carried out then the concurrent sampling scheme would give you an
unbiased estimate of the rate ratio only if:
(i) the proportion of the at risk population in the exposed and unexposed groups is
constant over time and
(ii) the rate ratio is constant over time.
Condition (i) will not generally be true in a closed population if the rate ratio is not
equal to 1, but may hold if the population is open (i.e. new individuals enter the
population during the course of the study). If the disease is rare this will not matter
in practice.
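Condition (i) can be checked numerically. The sketch below assumes a hypothetical closed cohort of 1000 exposed and 1000 unexposed people with constant rates of 0.2 and 0.1 per person-year (a rate ratio of 2); these figures and the function name are assumptions made for illustration. It shows the proportion exposed among those still at risk falling over time, which is why condition (i) fails in a closed population when the rate ratio differs from 1.

```python
import math


def proportion_exposed_at_risk(n1, n0, rate1, rate0, t):
    """Proportion exposed among people still disease-free at time t.

    Assumes a closed cohort and constant rates, so the number still at
    risk in each group declines exponentially.
    """
    at_risk1 = n1 * math.exp(-rate1 * t)
    at_risk0 = n0 * math.exp(-rate0 * t)
    return at_risk1 / (at_risk1 + at_risk0)


# t = 0 is the start of year 1, t = 1 the start of year 2, and so on
for year in range(4):
    p = proportion_exposed_at_risk(1000, 1000, 0.2, 0.1, year)
    print(f"start of year {year + 1}: proportion exposed = {p:.3f}")
```

Setting the two rates equal keeps the proportion at 0.5 throughout, and lowering both rates (a rarer disease) makes the drift negligible.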
Usually case-control studies with concurrent sampling are analysed using a matched
analysis.
Note: Excluding controls who later become cases will turn a concurrent design into
an exclusive one, leading to an estimate of the disease odds ratio rather than the rate
ratio.
2. Inclusive sampling samples controls at the beginning of the time period and the
exposure odds ratio gives an estimate of the risk ratio.
3. Concurrent sampling samples controls over time and, hence, the exposure odds
ratio gives an estimate of the rate ratio.
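The link between sampling scheme and measure can be sketched in Python. The example assumes an idealised study in which the controls' exposed-to-unexposed ratio (H1 / H0) exactly equals the ratio in the group they are sampled from, with no sampling error. The function name and the cohort figures are hypothetical.

```python
def expected_exposure_or(d1, d0, n1, n0, y1, y0, scheme):
    """Expected exposure odds ratio under a control sampling scheme.

    d1, d0: cases; n1, n0: people at risk at the start;
    y1, y0: person-time at risk (exposed, unexposed).
    """
    if scheme == "inclusive":      # controls drawn from all at risk at the start
        control_ratio = n1 / n0
    elif scheme == "concurrent":   # controls drawn over time, in proportion to person-time
        control_ratio = y1 / y0
    elif scheme == "exclusive":    # controls drawn from those still disease-free at the end
        control_ratio = (n1 - d1) / (n0 - d0)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return (d1 / d0) / control_ratio


# Hypothetical cohort: 1000 people per group followed for one period
cohort = dict(d1=300, d0=150, n1=1000, n0=1000, y1=850, y0=925)
for scheme in ("inclusive", "concurrent", "exclusive"):
    print(scheme, round(expected_exposure_or(**cohort, scheme=scheme), 2))
```

With a disease this common the three schemes give visibly different values; with far fewer cases the three control ratios, and hence the three estimates, move close together.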
How do you know which of these sampling schemes to use? It depends on:
whether the disease is rare or not
what you need to measure to meet the study objectives
Interaction: Tabs: Rare disease?:
For a rare disease, the exposure odds ratios obtained with the three different
control sampling schemes will be very similar.
For a common disease, the three sampling schemes will yield different values for the
exposure odds ratio.
Interaction: Tabs: Which measure?:
Each sampling scheme produces an exposure odds ratio that estimates either risk
ratio, rate ratio, or odds ratio. The choice of sampling is determined by whichever
measure is most appropriate for the study objectives. This will depend on the type
of disease under study and the nature of the exposure under investigation.
              Risk ratio               Rate ratio               Odds ratio
Definition    (D1 / N1) / (D0 / N0)    (D1 / Y1) / (D0 / Y0)    [D1 / (N1 - D1)] / [D0 / (N0 - D0)]
Sampling      Inclusive                Concurrent               Exclusive
Year 1        1.90                     2.00                     2.10
Year 2        1.82                     2.00                     ___
Year 3        1.74                     ___                      2.35
No, that's not right. Remember that the odds is the total number of cases divided by
the number of individuals still at risk. Therefore, at the end of year 2, this is:
Odds ratio = [(D11 + D21) / N31] / [(D10 + D20) / N30]
= [(181.3 + 148.4) / 670.3] / [(95.2 + 86.1) / 818.7] = 2.22
Interaction: Calculation: Year 3/Rate ratio:
Correct Response 2.00:
Correct
Make sure that you reached the correct answer using the following calculation:
Rate ratio = [(D11 + D21 + D31) / (Y11 + Y21 + Y31)] / [(D10 + D20 + D30) / (Y10 + Y20 + Y30)]
= [(181.3 + 148.4 + 121.5) / (906.3 + 742.0 + 607.5)] / [(95.2 + 86.1 + 77.9) / (951.6 + 861.0 + 779.1)] = 2.00
Incorrect Response:
Sorry, that's not right. The rate ratio for years 1, 2 and 3 is calculated as follows:
Rate ratio = [(D11 + D21 + D31) / (Y11 + Y21 + Y31)] / [(D10 + D20 + D30) / (Y10 + Y20 + Y30)]
= [(181.3 + 148.4 + 121.5) / (906.3 + 742.0 + 607.5)] / [(95.2 + 86.1 + 77.9) / (951.6 + 861.0 + 779.1)] = 2.00
              Risk ratio    Rate ratio    Odds ratio
Sampling      Inclusive     Concurrent    Exclusive
Year 1        1.90          2.00          2.10
Year 2        1.82          2.00          2.22
Year 3        1.74          2.00          2.35
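The three measures for each year can be recomputed from the yearly figures used in the worked calculations for this example. The Python sketch below takes the new cases and person-years from those calculations. The cohort size of 1000 per group is inferred from the numbers still at risk (for example 670.3 = 1000 - 181.3 - 148.4) and should be treated as an assumption, as should all variable names.

```python
# New cases and person-years at risk in years 1-3 (exposed = 1, unexposed = 0),
# as used in the worked calculations for this example.
cases1 = [181.3, 148.4, 121.5]
cases0 = [95.2, 86.1, 77.9]
pyears1 = [906.3, 742.0, 607.5]
pyears0 = [951.6, 861.0, 779.1]
N1 = N0 = 1000.0  # assumed initial cohort sizes (inferred, not stated)


def cumulative_measures(year):
    """Risk, rate and odds ratios using all follow-up up to the end of `year`."""
    d1, d0 = sum(cases1[:year]), sum(cases0[:year])
    y1, y0 = sum(pyears1[:year]), sum(pyears0[:year])
    risk_ratio = (d1 / N1) / (d0 / N0)
    rate_ratio = (d1 / y1) / (d0 / y0)
    odds_ratio = (d1 / (N1 - d1)) / (d0 / (N0 - d0))
    return risk_ratio, rate_ratio, odds_ratio


for year in (1, 2, 3):
    rr, ir, orr = cumulative_measures(year)
    print(f"Year {year}: risk ratio {rr:.2f}, rate ratio {ir:.2f}, odds ratio {orr:.2f}")
```

Only the rate ratio stays at 2.00 as follow-up lengthens; the risk ratio moves towards 1 and the odds ratio moves away from it.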
Note that an exception to the choice of the rate ratio for Situation B arises when the
focus is on the increased risk to an individual over a specific time period, such as in
an investigation of risk factors for death during infancy. In this case the risk ratio is
the summary measure of choice and the issue of invariance does not arise.
              Risk ratio    Rate ratio    Odds ratio
Year 1        0.20          0.18          0.17
Year 2        0.20          ___           0.14
Year 3        ___           0.16          0.12
              Risk ratio    Rate ratio    Odds ratio    VE
Sampling      Inclusive     Concurrent    Exclusive
Year 1        0.20          0.18          0.17          80%
Year 2        0.20          0.17          0.14          80%
Year 3        0.20          0.16          0.12          80%
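These figures are consistent with a simple 'all or nothing' vaccine model, sketched below in Python. The parameters are assumptions chosen for illustration, not values stated in the session: 1000 people per group, 80% of the vaccinated fully protected, and a constant rate of 0.2 per person-year in everyone else. Under these assumptions the risk ratio stays at 0.20 (VE = 80%) while the rate ratio and odds ratio drift away from it as follow-up lengthens.

```python
import math

RATE = 0.2        # assumed rate per person-year in unprotected people
PROTECTED = 0.8   # assumed fraction of the vaccinated who are fully protected
N = 1000.0        # assumed size of each group


def vaccine_measures(t):
    """Risk, rate and odds ratios (vaccinated vs unvaccinated) after t years."""
    risk = 1 - math.exp(-RATE * t)          # cumulative risk if unprotected
    unprotected = N * (1 - PROTECTED)
    d1 = unprotected * risk                 # cases among the vaccinated
    d0 = N * risk                           # cases among the unvaccinated
    # Person-years: protected people contribute time throughout; for the
    # unprotected, cases = RATE x person-years under a constant rate.
    y1 = N * PROTECTED * t + d1 / RATE
    y0 = d0 / RATE
    risk_ratio = (d1 / N) / (d0 / N)
    rate_ratio = (d1 / y1) / (d0 / y0)
    odds_ratio = (d1 / (N - d1)) / (d0 / (N - d0))
    return risk_ratio, rate_ratio, odds_ratio


for year in (1, 2, 3):
    rr, ir, orr = vaccine_measures(year)
    print(f"Year {year}: risk ratio {rr:.2f}, rate ratio {ir:.2f}, "
          f"odds ratio {orr:.2f}, VE {1 - rr:.0%}")
```

The rate ratio drifts because the vaccinated people still at risk become increasingly dominated by those who are fully protected, so the exposed group is not at uniform risk.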
The rate ratio takes account of not only whether the person experiences the disease
or not, but also how often they experience it.
So why do you think the risk ratio and odds ratio are not appropriate?
Interaction: Button: clouds picture (pop up box appears):
Since the disease is recurrent and cases can return to the population at risk, the
numerator for the measure of incidence is not individuals, but episodes. Therefore
the risk ratio and odds ratio are not valid.
Imagine an extreme situation, in which all children experience an episode of
diarrhoea in the first two years of life, whether or not they were exposed. The risk
ratio will tend towards unity as the period of follow-up increases to two years. This is
because the risk in both the exposed group and unexposed group will tend to 1.
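This tendency can be illustrated numerically. The sketch below assumes hypothetical constant episode rates of 4 and 2 per child-year in the exposed and unexposed groups (a rate ratio of 2), and takes the risk as the probability of at least one episode by time t. The rates and the function name are assumptions, not figures from the session.

```python
import math


def risk_ratio_first_episode(rate1, rate0, t):
    """Ratio of the probabilities of at least one episode by time t (years)."""
    risk1 = 1 - math.exp(-rate1 * t)
    risk0 = 1 - math.exp(-rate0 * t)
    return risk1 / risk0


RATE_EXPOSED, RATE_UNEXPOSED = 4.0, 2.0  # hypothetical episodes per child-year
for t in (0.1, 0.5, 1.0, 2.0):
    rr = risk_ratio_first_episode(RATE_EXPOSED, RATE_UNEXPOSED, t)
    print(f"follow-up {t} years: risk ratio = {rr:.2f}, "
          f"rate ratio = {RATE_EXPOSED / RATE_UNEXPOSED:.2f}")
```

The risk ratio shrinks towards 1 as follow-up lengthens, while the rate ratio is unchanged.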
year 1:___:
year 2:___:
Correct
Yes, but make sure that you have calculated it correctly. Remember to use the total
number of cases and the total amount of person-time since the start of follow-up.
Thus,
Rate ratio = (400 / 2000) / (200 / 2000) = 2.0
Incorrect Response:
Sorry, that's not correct. Remember that the rate ratio is given by:
Rate ratio = (D1 / Y1) / (D0 / Y0)
= (400 / 2000) / (200 / 2000) = 2.0
Interaction: Calculation: Rate ratio
year 3:___:
Inclusive
Concurrent
Type of disease: Rare
Type of risk/protective factor: All
Example(s): Most cancers
Invariant measure: All three
Appropriate sampling:
Interaction: Tabs: B:
Situation: B
Type of disease: Common, non-recurrent
Type of risk/protective factor: Risk/protective factors that affect all exposed equally
Example(s): Vaccines that give partial protection to those vaccinated
Cases return to the population at risk: No
Proportion exposed constant: No
Exposed group at uniform risk: Yes
Invariant measure: Rate ratio
Appropriate sampling: Concurrent
Interaction: Tabs: C:
Situation: C
Type of disease: Common, non-recurrent
Type of risk/protective factor: Protective factor that does not give equal protection to all
Example(s): Vaccines that give 'all or nothing' protection
Cases return to the population at risk: No
Proportion exposed constant: No
Exposed group at uniform risk: No
Invariant measure: Risk ratio
Appropriate sampling: Inclusive
Interaction: Tabs: D:
Situation: D
Type of disease: Common, recurrent
Type of risk/protective factor: All
Example(s): Diarrhoea, acute respiratory illness, malaria
Cases return to the population at risk: Yes
Proportion exposed constant: Yes
Exposed group at uniform risk: ?
Invariant measure: Rate ratio
Appropriate sampling: Concurrent
Section 8: Exercise
Exercise 1
The following 3 examples are case-control studies looking at risk factors for cervical
cancer. The population consists of women participating in a screening programme.
Click on each button below to see the example.
Example A
Example B
Example C
Example A:
Example B:
Example C:
Interaction: Pulldown: Example A: _____:
Incorrect Response Disease odds ratio:
No, note that in this example, the controls are selected from all women registered in
the programme. That means that the sampling is inclusive. So what does that imply
about the measure that is estimated by the exposure odds ratio? Please try again.
Correct Response: Risk ratio:
Correct
Yes, the controls are selected from all women registered in the programme, in other
words inclusive sampling. Therefore the exposure odds ratio in this case estimates
the risk ratio.
Incorrect Response: Rate ratio:
No, note that in this example, the controls are selected from all women registered in
the programme. That means that the sampling is inclusive. So what does that imply
about the measure that is estimated by the exposure odds ratio? Please try again.
Interaction: Pulldown: Example B:____:
Incorrect Response: Disease odds ratio:
No, note that in this example, the controls are selected at the time of diagnosis of
each case. That means that the sampling is concurrent. So what does that imply
about the measure that is estimated by the exposure odds ratio? Please try again.
Incorrect Response: Risk ratio:
No, note that in this example, the controls are selected at the time of diagnosis of
each case. That means that the sampling is concurrent. So what does that imply
about the measure that is estimated by the exposure odds ratio? Please try again.
Correct Response: Rate ratio:
Correct
Yes, the controls are selected at the time of diagnosis, in other words this is
concurrent sampling. Therefore the exposure odds ratio in this case estimates the
rate ratio.
Interaction: Pulldown: Example C:___:
Do you expect these measures to be very different for each of the examples
opposite?
Interaction: Button: clouds picture (pop up box appears):
They are likely to be very similar because cervical cancer is a rare disease.
8.1: Exercise
Exercise 2
This exercise refers to a paper in your reader, Mahmood et al (1989).
Read the extracts from the paper until you reach the section 'Materials and Methods'
and then think about the questions opposite. Click the button at the bottom of each
page to read the (suggested) answer in each case.
Interaction: Tabs: Q1:
From what population were the controls recruited?
Interaction: Button: clouds picture (pop up box appears):
Controls were recruited from infants visiting MCHCs (maternal & child health clinics)
for immunisation and/or routine check-up.
Interaction: Tabs: Q2:
What are the potential advantages of this approach?
relatively rare event and so for the outcome investigated in this study the rate ratio,
risk ratio and odds ratio are all likely to be similar.
Concurrent sampling has the practical virtue that the investigator does not need to
worry whether the control has ever had the outcome of interest nor whether they will
develop it at some time in the future. In this study, though, there was a relatively
wide exclusion window of 1 month.
Interaction: Tabs: Q7:
Why do you think infants "3 months of age and older, with no history of being taken
to an MCHC for immunisation" were excluded from the cases?
Interaction: Button: clouds picture (pop up box appears):
Cases who had never been to an MCHC were excluded to ensure that all cases were
potential controls.
Section 9: Summary
The main points of this session will appear below as you click through the pages
opposite. Click on any of the list entries below to go back to that page.
What does the exposure odds ratio measure?
A case-control study provides an estimate of the exposure odds ratio. Depending on
the design of the study, e.g. the sampling of controls, the exposure odds ratio
estimates:
1. the disease odds ratio
2. the risk ratio
3. the rate ratio
The sampling of controls
Controls are sampled from the same population of interest as cases. The 3 sampling
schemes for control selection are:
Interaction: Tabs: Exclusive:
Exclusive sampling
Controls are sampled at the end of the time period, and the exposure odds ratio
gives an estimate of the disease odds ratio.
Interaction: Tabs: Inclusive:
Inclusive sampling
Controls are sampled at the beginning of the time period and the exposure odds ratio
gives an estimate of the risk ratio.
Interaction: Tabs: Concurrent:
Concurrent sampling
Controls are sampled over time and, hence, the exposure odds ratio gives an
estimate of the rate ratio.
Which situation, which sample
The choice of sampling scheme depends on:
1) whether the disease is rare or not
2) the characteristics of the disease - is it recurrent or not
3) the characteristics of the exposure - e.g. does vaccination confer all-or-nothing
protection, or partial protection to all
You must assess these things before deciding which sampling scheme to use.