Comparison of Three Relative Effect Measures in Cohort Studies (Risk Ratio, Rate Ratio, and Hazard Ratio)
This comparison, pages 1 to 6, was presented in Chapter 11. It is repeated here for review. The
topic of this chapter is how to design and analyze case-control studies to obtain the same types
of effect estimates.
For illustration, we will use the following data in life table format from a hypothetical cohort
study.
Life Table of Hypothetical Data
                 Exposed                        Non-Exposed
Follow-up  Begin  Disease  Day-Specific  Begin  Disease  Day-Specific  Day-Specific
day        N      Cases    Risk          N      Cases    Risk          Risk Ratio
1          50     5        0.10          50     2        0.04          2.5
2          30     10       0.33          40     8        0.20          1.7
3          10     10       1.00          20     10       0.50          2.0
Totals     90     25                     110    20
These data can be entered into Stata using the following commands in the gregchapter4.do do-
file.
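clear
* One record per cell: day = last day of follow-up (day of disease or censoring),
* exposure (1 = exposed), disease (1 = case), count = number of subjects in the cell
input day exposure disease count
1 1 1 5
1 1 0 15
2 1 1 10
2 1 0 10
3 1 1 10
1 0 1 2
1 0 0 8
2 0 1 8
2 0 0 12
3 0 1 10
3 0 0 10
end
drop if count==0   // remove empty cells, if any
expand count       // create one observation per subject
drop count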
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript].
University of Utah School of Medicine, 2010.
The risk ratio analysis ignores time-at-risk. For that reason, it implicitly assumes an equal
follow-up time for every study subject.
The risk ratio analysis uses only part of the information in the life table: the number of
subjects at the start of follow-up (Begin N on day 1) and the total number of cases in each
exposure group.
Risk Ratio Analysis Data
             Exposed     Non-Exposed
Disease      25 (50%)    20 (40%)
Non-Disease  25          30
N            50          50
cs disease exposure
| exposure |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 25 20 | 45
Noncases | 25 30 | 55
-----------------+------------------------+----------
Total | 50 50 | 100
| |
Risk | .5 .4 | .45
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .1 | -.0940265 .2940265
Risk ratio | 1.25 | .8064465 1.937512
Attr. frac. ex. | .2 | -.2400079 .4838742
Attr. frac. pop | .1111111 |
+-----------------------------------------------
chi2(1) = 1.01 Pr>chi2 = 0.3149
Analyzing these data in this way, we do not demonstrate a significant effect (RR=1.25, p=0.315).
In fact, this crude RR underestimates each of the day-specific RR estimates.
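The crude risk ratio and the chi-square test in the cs output above can be reproduced by hand from the four cell counts. The following is a minimal Python sketch, not part of the course do-files; the function name risk_ratio_analysis is hypothetical, and only the point estimate and the Pearson chi-square test are reproduced (not Stata's confidence intervals).

```python
import math


def risk_ratio_analysis(a, b, n1, n0):
    """Crude risk ratio and Pearson chi-square test (1 df) for a cohort 2x2 table.

    a, b   -- cases among the exposed and the unexposed
    n1, n0 -- number of exposed and unexposed subjects
    """
    risk1, risk0 = a / n1, b / n0
    rr = risk1 / risk0
    c, d = n1 - a, n0 - b              # non-cases
    n = n1 + n0
    m1, m0 = a + b, c + d              # total cases, total non-cases
    chi2 = n * (a * d - b * c) ** 2 / (m1 * m0 * n1 * n0)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return rr, chi2, p


rr, chi2, p = risk_ratio_analysis(a=25, b=20, n1=50, n0=50)
print(f"RR = {rr:.2f}, chi2(1) = {chi2:.2f}, p = {p:.4f}")
```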
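The rate ratio analysis uses time-at-risk, but only in a crude way. It does not assume an equal
follow-up time for each study subject. It does assume, however, that the risk per unit of time
(the rate) is constant across the follow-up time.
The rate ratio analysis uses only part of the information in the life table: the total number of
cases in each exposure group and the total of the Begin N column, which is the number of
person-days at risk (90 and 110).
ir disease exposure day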
| exposure |
| Exposed Unexposed | Total
-----------------+------------------------+----------
disease | 25 20 | 45
day | 90 110 | 200
-----------------+------------------------+----------
| |
Incidence Rate | .2777778 .1818182 | .225
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Inc. rate diff. | .0959596 | -.0389695 .2308887
Inc. rate ratio | 1.527778 | .8147248 2.900724 (exact)
Attr. frac. ex. | .3454545 | -.2274083 .6552584 (exact)
Attr. frac. pop | .1919192 |
+-----------------------------------------------
(midp) Pr(k>=25) = 0.0800 (exact)
(midp) 2*Pr(k>=25) = 0.1599 (exact)
Analyzing these data in this way, we almost demonstrate a significant effect (IRR=1.53,
one-sided mid-p=0.080; two-sided mid-p=0.160). Notice again that this crude IRR underestimates
each of the day-specific IRR estimates.
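The ir point estimate and its exact p-values can likewise be checked by hand. Conditional on the 45 total cases, under the null hypothesis the number of exposed cases is binomial with probability 90/200, the exposed share of the person-time; the mid-p value counts only half of the probability of the observed count. This minimal Python sketch assumes that is how the exact mid-p in the output is computed, and should reproduce the printed values up to rounding; the function name is hypothetical.

```python
from math import comb


def rate_ratio_analysis(a, b, pt1, pt0):
    """Crude incidence rate ratio with exact (binomial) mid-p values.

    a, b     -- cases among the exposed and the unexposed
    pt1, pt0 -- person-time among the exposed and the unexposed
    """
    irr = (a / pt1) / (b / pt0)
    n = a + b                              # total cases (conditioned on)
    p0 = pt1 / (pt1 + pt0)                 # null probability a case is exposed
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    upper = sum(pmf[a + 1:])               # Pr(K > a)
    midp_one_sided = upper + 0.5 * pmf[a]  # half weight on the observed count
    return irr, midp_one_sided, 2 * midp_one_sided


irr, p1, p2 = rate_ratio_analysis(a=25, b=20, pt1=90, pt0=110)
print(f"IRR = {irr:.3f}, one-sided mid-p = {p1:.4f}, two-sided = {p2:.4f}")
```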
The hazard ratio analysis (Cox regression) uses time-at-risk in the most complete way, using all
of the information in the life table. It does not assume an equal follow-up time for each study
subject, and it allows for, and models, a risk that changes across the follow-up time.
The rate ratio analysis considers only the ratio of cases to total person-time, without
distinguishing follow-up times that end in the event from follow-up times that end in censoring.
Suppose the individual times-at-risk for a sample are 10, 20, and 30. The person-time is
computed as

$$PT = 10 + 20 + 30 = 60,$$

which is equivalent to

$$PT = (\text{mean time}) \times N = \frac{10+20+30}{3} \times 3 = 20 \times 3 = 60.$$
Thus, suppose that in one study group the events occurred early and the censoring occurred
late, while in the other study group an equal number of events occurred late and the censoring
occurred early. The person-time could then be equal for the two study groups, and we would
erroneously conclude that there is no difference in rates (rate ratio = 1) between the two groups.
In such an example, the person-time is equal and the rate ratio = 1, yet Group A, whose events
occur early, clearly shows greater risk of death (Group B survives longer).
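Because the original example data are not shown, the following Python sketch uses made-up data (hypothetical, for illustration only) to show the same phenomenon. In Group A the two deaths occur on day 1 and the other eight subjects are censored late, on day 10; in Group B eight subjects are censored early, on day 8, and the two deaths occur on day 9. The event counts and the person-time, and therefore the rates, are identical.

```python
def summarize(group):
    """Return (events, person_time, rate) for a list of (time, had_event) pairs."""
    events = sum(1 for _, event in group if event)
    person_time = sum(t for t, _ in group)
    return events, person_time, events / person_time


# Hypothetical Group A: 2 early deaths (day 1), 8 subjects censored late (day 10)
group_a = [(1, True)] * 2 + [(10, False)] * 8
# Hypothetical Group B: 8 subjects censored early (day 8), 2 later deaths (day 9)
group_b = [(8, False)] * 8 + [(9, True)] * 2

ev_a, pt_a, rate_a = summarize(group_a)
ev_b, pt_b, rate_b = summarize(group_b)
print("Group A:", ev_a, "events,", pt_a, "person-days")
print("Group B:", ev_b, "events,", pt_b, "person-days")
print("rate ratio =", rate_a / rate_b)

# The timing of the deaths differs sharply, but the rate ratio cannot see it
mean_death_a = sum(t for t, e in group_a if e) / ev_a
mean_death_b = sum(t for t, e in group_b if e) / ev_b
print("mean day of death: A =", mean_death_a, " B =", mean_death_b)
```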
Conclusion: Cox regression is sensitive to a changing risk across time, while Poisson regression
(or a rate ratio analysis) is not. Usually both approaches beat the risk ratio approach, since they
have the advantage of using more information, namely time, in the analysis.
It would be nice if we could gain the additional power of a hazard ratio analysis somehow in a
case-control study.
In a case-control study, we do not use time in the analysis. It would seem, then, that a case-
control study can do no better than the risk ratio analysis from a cohort study, which also does
not use time in the analysis.
If the rare-disease assumption is met, the ordinary case-control study OR is approximately the
RR that would be obtained in a cohort study.
However, if a case-cohort study design is used (we will see how to design one in this chapter),
the OR provides an unbiased estimate of the RR without the need for the rare-disease
assumption.
Furthermore, if a density case-control study design is used, which many also refer to as a
case-cohort study design, the OR provides an unbiased estimate of the HR, also without the need
for the rare-disease assumption. Thus, we can obtain the benefit of an HR analysis, which
improves our chances of demonstrating an exposure-disease association.
We will see how to conduct the sampling required for these three variants of the case-control
study design, use simulation to demonstrate what the OR estimates, and explain why it estimates
these effect measures.
We will use the dataset found in Breslow and Day (1987, Appendix VIII and Appendix ID).
Men (n=679) employed in a nickel refinery in South Wales were investigated to determine
whether the risk of developing carcinoma of the bronchi and nasal sinuses (ICD = 160), which
had been associated with the refining of nickel from previous studies in the 1930s, was present in
this cohort. The data are in the file nickelrefinary.csv. The variables are:
CaseID Case ID
PrimaryICD Primary ICD Code
Exposure Nickel Exposure Level
DateBirth Date of Birth
AgeEmp Age First Employed
AgeBegFol Age Follow-up Began
AgeEndFol Age at Death or Loss
Executing the following commands in the do-file editor, we set up the variables and save them to
a new file. (This has already been done; nickelrefinary.dta is in the datasets & do-files
subdirectory.)
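clear
set mem 10M   // needed only in older versions of Stata; newer versions ignore it
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "IntroEpiCourse\datasets & do-files"   // change both paths to your own directories
insheet using nickelrefinary.csv          // import delimited replaces insheet in newer Stata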
So that we have a dataset in which the rare disease assumption is not met, we next duplicate the
cases so that each case appears five times, and save this augmented dataset to a separate file.
(This is for illustration only; of course you would not do this in an actual data analysis.)
This has already been done. The file nickelrefinary5xcases.dta is in the datasets & do-files
subdirectory.
For illustration, we will assume our N=679 represents the population that we will sample from.
Doing this, we can determine the population effect measures that our samples will be estimates
of. Reading in the original data (not the 5 x cases dataset),
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on nickelrefinary.dta
Open
Statistics
Observational/Epi. analysis
Tables for epidemiologists
case control odds ratio
Main tab: Case variable: tumor
Exposed variable: nickel
OK
cc tumor nickel
Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+----------------------
Cases | 46 10 | 56 0.8214
Controls | 343 280 | 623 0.5506
-----------------+------------------------+----------------------
Total | 389 290 | 679 0.5729
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Odds ratio | 3.755102 | 1.824533 8.48588 (exact)
Attr. frac. ex. | .7336957 | .4519145 .8821572 (exact)
Attr. frac. pop | .6026786 |
+-----------------------------------------------
chi2(1) = 15.41 Pr>chi2 = 0.0001
Statistics
Observational/Epi. analysis
Tables for epidemiologists
Cohort study: risk ratio etc.
Main tab: Case variable: tumor
Exposure variable: nickel
OK
cs tumor nickel
Statistics
Observational/Epi. analysis
Tables for epidemiologists
Incidence rate ratios
Main tab: Case variable: tumor
Exposed variable: nickel
Person-time variable: timerisk
OK
ir tumor nickel timerisk
Statistics
Survival analysis
Set up and utilities
Declare data to be survival time data
Main tab: Time variable: timerisk
Failure event: Failure variable: tumor
Failure values: 1
OK
stset timerisk, failure(tumor==1)
Statistics
Survival analysis
Regression models
Cox proportional hazards model
Model tab: Independent variables: nickel
OK
stcox nickel
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 5.022065 1.765996 4.59 0.000 2.520922 10.00472
------------------------------------------------------------------------------
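Repeating these analyses with the augmented dataset, nickelrefinary5xcases.dta, in which each
case appears five times, we get
. cc tumor nickel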
Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+----------------------
Cases | 230 50 | 280 0.8214
Controls | 343 280 | 623 0.5506
-----------------+------------------------+----------------------
Total | 573 330 | 903 0.6346
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Odds ratio | 3.755102 | 2.635221 5.405224 (exact)
Attr. frac. ex. | .7336957 | .6205252 .8149938 (exact)
Attr. frac. pop | .6026786 |
+-----------------------------------------------
chi2(1) = 61.12 Pr>chi2 = 0.0000
. cs tumor nickel
. stcox nickel
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 4.188323 .660423 9.08 0.000 3.074829 5.705048
------------------------------------------------------------------------------
Considering our total sample as our population, we observed the following population effect
measures.
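The population OR and RR for both versions of the cohort can be computed directly from the cell counts in the two cc tables above. A minimal Python sketch (the function name is hypothetical; the IRR is not reproduced because the person-time totals are not shown):

```python
def or_and_rr(a, b, c, d):
    """Odds ratio and risk ratio from a 2x2 table laid out as in Stata's cc output.

    a, b -- exposed and unexposed cases
    c, d -- exposed and unexposed non-cases (controls)
    """
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + c)) / (b / (b + d))
    return odds_ratio, risk_ratio


populations = {
    "original (almost rare disease)": (46, 10, 343, 280),
    "cases x 5 (frequent disease)": (230, 50, 343, 280),
}
for label, cells in populations.items():
    odds_ratio, risk_ratio = or_and_rr(*cells)
    print(f"{label}: OR = {odds_ratio:.3f}, RR = {risk_ratio:.3f}")
```

Multiplying the cases by five leaves the OR unchanged (3.76) but moves the RR from about 3.43 to about 2.65, which is why the OR approximates the RR only when the disease is rare.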
Most researchers choose their controls only from the non-cases (the "no tumor" subjects) in the
population. First we will do this for one sample.
We will use 2 controls for each case (2:1 sampling ratio). In real practice, you might choose a
greater number, such as 8 controls for each case. We use 2:1 in this illustration, in order to keep
the sample size much smaller than the population size, which makes the simulation more
believable.
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on nickelrefinary.dta
Open
Statistics
Summaries, tables & tests
Tables
Two-way tables with measures of association
Main tab: Row variable: tumor
Column variable: nickel
Cell contents: Within-column relative frequencies
OK
tabulate tumor nickel, column
carcinoma |
of the |
bronchi and | occupational exposure
nasal | to nickel
sinuses | 0. no exp 1. exposu | Total
------------+----------------------+----------
0. no tumor | 280 343 | 623
| 96.55 88.17 | 91.75
------------+----------------------+----------
1. tumor | 10 46 | 56
| 3.45 11.83 | 8.25
------------+----------------------+----------
Total | 290 389 | 679
| 100.00 100.00 | 100.00
From this population 2 × 2 table, we want to use all of the cases (n=56, the entire tumor row) and
twice as many controls (sample 112 controls from the “no tumor” row),
carcinoma |
of the |
bronchi and | occupational exposure
nasal | to nickel
sinuses | 0. no exp 1. exposu | Total
------------+----------------------+----------
0. no tumor | 280 343 | 623 <= sample 56 x 2 = 112 controls
| 96.55 88.17 | 91.75
------------+----------------------+----------
1. tumor | 10 46 | 56 <= use all 56 cases
| 3.45 11.83 | 8.25
------------+----------------------+----------
Total | 290 389 | 679
| 100.00 100.00 | 100.00
First, we set the random number generator seed so we can get the same sample if we need to
replicate our results later,
We now want to sample n=112 if tumor = 0 and just keep all of the cases (tumor = 1),
carcinoma |
of the |
bronchi and | occupational exposure
nasal | to nickel
sinuses | 0. no exp 1. exposu | Total
------------+----------------------+----------
0. no tumor | 56 56 | 112
| 84.85 54.90 | 66.67
------------+----------------------+----------
1. tumor | 10 46 | 56
| 15.15 45.10 | 33.33
------------+----------------------+----------
Total | 66 102 | 168
| 100.00 100.00 | 100.00
cc tumor nickel
Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+------------------------
Cases | 46 10 | 56 0.8214
Controls | 56 56 | 112 0.5000
-----------------+------------------------+------------------------
Total | 102 66 | 168 0.6071
| |
| Point estimate | [95% Conf. Interval]
|------------------------+------------------------
Odds ratio | 4.6 | 2.019615 11.17006 (exact)
Attr. frac. ex. | .7826087 | .5048562 .910475 (exact)
Attr. frac. pop | .6428571 |
+-------------------------------------------------
chi2(1) = 16.17 Pr>chi2 = 0.0001
In our sample, we get an OR of 4.60 (in contrast to the population OR of 3.76), which seems
rather off. However, we cannot judge if this is an unbiased estimate for the population OR,
because this estimate is subject to sampling variability.
[Figure: histogram of the 1,000 simulated sample ORs (x-axis: or)]
Using the mean of the 1,000 ORs as the long-run average, we get OR = 3.81.
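The Stata code for this simulation is in the do-file and is not reproduced here. As a rough stand-in, the following Python sketch (hypothetical function name; it uses Python's random number generator, so individual draws will not match Stata's) keeps all 56 cases, repeats the 2:1 sampling of controls from the non-cases 1,000 times, and averages the resulting ORs. Its average should land near the long-run value reported above, although the exact figure depends on the seed.

```python
import random
import statistics


def simulate_classic_case_control(n_reps=1000, ratio=2, seed=999):
    """Long-run average OR when controls are sampled from the non-cases only.

    Population counts come from the nickel refinery 2x2 table:
    cases: 46 exposed, 10 unexposed; non-cases: 343 exposed, 280 unexposed.
    """
    rng = random.Random(seed)
    a, b = 46, 10                        # all cases are kept
    noncases = [1] * 343 + [0] * 280     # 1 = exposed, 0 = unexposed
    n_controls = ratio * (a + b)         # 2 x 56 = 112 controls
    ors = []
    for _ in range(n_reps):
        controls = rng.sample(noncases, n_controls)  # without replacement
        c = sum(controls)                # exposed controls
        d = n_controls - c               # unexposed controls
        ors.append((a * d) / (b * c))
    return statistics.mean(ors)


print(f"long-run average OR = {simulate_classic_case_control():.2f}")
```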
We see that the sample OR is an essentially unbiased estimate of the population OR (3.81 vs.
3.76), regardless of the rare disease assumption.
The OR is not an unbiased estimate of the RR, however. If the rare disease assumption were met,
it would be a reasonably close estimate. Notice that the OR=3.81 is much closer to the
RR=3.43 in the “almost rare disease” column.
$$M_1 = M_0 = 1(c+d) = 1c + 1d \qquad (1)$$
$$M_1 = M_0 = 2(c+d) = 2c + 2d \qquad (2)$$
Sampling in general,
Rothman (2002, pp.84-86) suggests sampling controls from the entire population, regardless of
case or control status. Thus some cases may be selected as controls as well. (Rothman does not
even mention the classical case-control design, where controls are sampled only from
non-diseased subjects.) When controls are selected this way, that is, from the entire population
at risk, the study design is called a case-cohort design rather than a case-control design
(Rothman and Greenland, 1998, p.108).
$$c = k\,m_1, \qquad d = k\,m_0$$
So if we choose our controls as a fixed fraction of the total row, our odds ratio is identical to
the risk ratio. That is, in a case-cohort study, the OR directly estimates the RR, regardless of
the rare disease assumption.
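The algebra behind this claim takes one line. Write a and b for the exposed and unexposed cases, m_1 and m_0 for the total numbers of exposed and unexposed subjects in the cohort (the total row), and k for the sampling fraction, so that, in expectation, the exposed and unexposed control counts are c = k m_1 and d = k m_0. The sampling fraction cancels:

```latex
\[
\widehat{OR} = \frac{a\,d}{b\,c}
             = \frac{a\,(k\,m_0)}{b\,(k\,m_1)}
             = \frac{a/m_1}{b/m_0}
             = RR
\]
```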
Notice in the second sentence of the abstract, they point out that the controls are sampled
from the “total row of the full cohort 2 x 2 table” when they state,
“...a case-cohort design, which consists of a small random sample of the whole cohort and
all of the disease subjects...”
In the second paragraph, they cite some studies that have used the case-cohort design.
This is a bit more complex, so we will use commands rather than menus, since it will be easier to
see what we are doing.
We then use the Monte Carlo method to obtain the long-run average OR from the original (rare
disease) dataset, and run a similar simulation using the augmented (frequent disease) dataset;
both are in the do-file.
For this sampling approach, we see that the sample OR is an unbiased estimator of the population
RR, as Rothman claims it should be.
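A Python analog of the case-cohort simulation differs from classic control sampling in a single line: controls are drawn from the whole cohort, cases included, rather than from the non-cases only. This is a minimal, hypothetical sketch using Python's random number generator, not the do-file code. With the counts from the 2 × 2 tables, the averages should fall near the RRs computed from those counts (about 3.43 and 2.65) rather than near the OR of 3.76.

```python
import random
import statistics


def simulate_case_cohort(cases_exposed, cases_unexposed,
                         noncases_exposed, noncases_unexposed,
                         ratio=2, n_reps=1000, seed=999):
    """Long-run average OR when controls are sampled from the whole cohort.

    Controls are drawn from everyone (cases included), so a case can also
    appear as a control.
    """
    rng = random.Random(seed)
    a, b = cases_exposed, cases_unexposed
    cohort = ([1] * (cases_exposed + noncases_exposed)
              + [0] * (cases_unexposed + noncases_unexposed))  # 1 = exposed
    n_controls = ratio * (a + b)
    ors = []
    for _ in range(n_reps):
        controls = rng.sample(cohort, n_controls)  # without replacement
        c = sum(controls)                          # exposed controls
        d = n_controls - c                         # unexposed controls
        ors.append((a * d) / (b * c))
    return statistics.mean(ors)


rare = simulate_case_cohort(46, 10, 343, 280)       # RR from the counts: about 3.43
frequent = simulate_case_cohort(230, 50, 343, 280)  # RR from the counts: about 2.65
print(f"rare: {rare:.2f}  frequent: {frequent:.2f}")
```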
For the case-cohort design, the rare-disease assumption is not required for the OR to be an
estimate of RR (Rothman and Greenland, 1998, p.110). We have demonstrated that to be the
case.
Rothman (2002, pp.76-80) suggests sampling controls from the entire population, regardless of
case or control status, but selecting them, in a matched fashion, from subjects whose time at
risk is the same as or longer than that of each case. Again, some cases may be selected as
controls as well. This is called risk-set sampling.
In this study design, we want the OR to be an unbiased estimate of the hazard ratio, HR, where
HR is a type of weighted average of the day-specific risk ratios.
In this design, we also use a type of “total row sampling”; that is, we select our controls from
the “Begin N” columns of the life table.
For the 5+2 cases that occurred on day 1, we sample our controls from the 50+50 persons still at
risk on day 1.
For the 10+8 cases that occurred on day 2, we sample our controls from the 30+40 persons still at
risk on day 2,
and so on.
We do this by forming risk sets. For every case, we form a risk set that includes all subjects with
an equal or longer follow-up time. Then, using a 2:1 sampling ratio, we sample 2 controls from
that risk set and match them to that case.
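Risk-set sampling can be sketched in a few lines. The Python function below is a hypothetical illustration, not the do-file code (Stata's official sttocc command performs this kind of sampling): for each case it builds the risk set of all other subjects whose follow-up time is equal to or longer than the case's, and draws the matched controls from it. The cohort in the usage example is made up.

```python
import random


def risk_set_sample(subjects, n_controls=2, seed=999):
    """Incidence density (risk-set) sampling of matched controls.

    subjects: list of (subject_id, follow_up_time, is_case) tuples.
    For each case, controls are drawn from everyone other than that case whose
    follow-up time is equal to or longer than the case's time, so a subject
    who later becomes a case can still serve as a control.
    Returns a list of matched sets: (case_id, [control_ids]).
    """
    rng = random.Random(seed)
    matched_sets = []
    for case_id, case_time, is_case in subjects:
        if not is_case:
            continue
        risk_set = [sid for sid, t, _ in subjects
                    if sid != case_id and t >= case_time]
        k = min(n_controls, len(risk_set))
        matched_sets.append((case_id, rng.sample(risk_set, k)))
    return matched_sets


# Hypothetical cohort: (id, follow-up time in years, case?)
cohort = [
    (1, 0.70, True), (2, 35.68, False), (3, 16.34, False),
    (4, 1.18, True), (5, 9.00, False), (6, 6.37, False),
    (7, 1.42, True), (8, 19.68, False), (9, 41.78, False),
    (10, 0.50, False),
]
for case_id, controls in risk_set_sample(cohort):
    print("case", case_id, "matched controls", controls)
```

Subject 10 can never be chosen, because its follow-up ends before any case occurs, while cases 4 and 7 are eligible controls for case 1.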
This is identical to sampling from the corresponding row of the Begin N columns.
Just as we saw in the case-cohort study, proportionality is maintained, which guarantees that our
OR is an estimate of RR for each row of the life table.
Cox regression, which computes the HR directly, also summarizes the day-specific RRs (also
called the day-specific HRs), computing a type of weighted average that it reports as the
HR.
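Cox regression is not reproduced here, but the idea of a weighted average of day-specific risk ratios can be illustrated with a different, simpler estimator: the Mantel-Haenszel pooled risk ratio, stratified by follow-up day, applied to the hypothetical life table from the start of the chapter. This is a swapped-in technique for illustration only; it is not the Cox HR, and its value need not equal the HR. The function name is hypothetical.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata.

    strata: list of (a, n1, b, n0) = exposed cases, exposed at risk,
            unexposed cases, unexposed at risk.
    """
    num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in strata)
    den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in strata)
    return num / den


# Day-specific strata from the hypothetical life table (Begin N and cases)
days = [(5, 50, 2, 50), (10, 30, 8, 40), (10, 10, 10, 20)]
for day, (a, n1, b, n0) in enumerate(days, start=1):
    print(f"day {day}: RR = {(a / n1) / (b / n0):.2f}")
print(f"Mantel-Haenszel pooled RR = {mantel_haenszel_rr(days):.2f}")
```

The pooled value falls between the smallest and largest day-specific RRs, well above the crude RR of 1.25 that ignores time.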
Thus, the conditional logistic regression from the case-control study does the same thing as the
Cox regression from a cohort study. (NOTE: the conditional logistic approach is biased and so
should not be used, as is pointed out below.)
Let’s do it.
we get
Conditional (fixed-effects) logistic regression Number of obs = 168
LR chi2(1) = 17.65
Prob > chi2 = 0.0000
Log likelihood = -52.698159 Pseudo R2 = 0.1434
------------------------------------------------------------------------------
_case | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 4.951997 2.131085 3.72 0.000 2.130428 11.51049
------------------------------------------------------------------------------
Notice we used conditional logistic regression to obtain the odds ratio. Given our matching of
time-at-risk (from risk-set sampling), we had to obtain an odds ratio using a matched sample
approach since matched studies require a matched analysis (Rothman and Greenland, 1998, p.
98; Greenland and Thomas, 1982).
+--------------------------------------------------------------------+
| tumor nickel timerisk _case _set _time |
|--------------------------------------------------------------------|
1. | 0. no tumor 1. exposure 35.6823 0 1 .70149994 |
2. | 0. no tumor 1. exposure 16.3425 0 1 .70149994 |
3. | 1. tumor 1. exposure .7014999 1 1 .70149994 |
|--------------------------------------------------------------------|
4. | 0. no tumor 1. exposure 9.000103 0 2 1.1797981 |
5. | 0. no tumor 1. exposure 6.370903 0 2 1.1797981 |
6. | 1. tumor 1. exposure 1.179798 1 2 1.1797981 |
|--------------------------------------------------------------------|
7. | 0. no tumor 1. exposure 19.6768 0 3 1.4220009 |
8. | 0. no tumor 0. no exposure 41.7836 0 3 1.4220009 |
9. | 1. tumor 0. no exposure 1.422001 1 3 1.4220009 |
+--------------------------------------------------------------------+
Notice that for each risk set, the follow-up time of the matched controls was greater than or equal
to the follow-up time of the case.
We see that the OR is a biased estimate of the HR, and so the conditional logistic regression
model should not be used for the analysis. It is close though. Another approach is taught below.
Not all authors use the study design names consistently. For example,
A)
Rothman (2002, pp.84-86) uses the term case-cohort study design to refer to the situation when
follow-up is assumed equal for all subjects, or just simply ignored, so controls are simply
selected from all subjects at risk (from the total row of the 2 × 2 table).
Prentice (1986, p.2) calls this study design a case-cohort design: binary response.
B)
Rothman (2002, pp. 76-80) uses the term density case-control study design to refer to the
situation when follow-up is not equal for all subjects, so risk-set sampling is used to select
controls from all subjects at risk with equal or longer follow-up times (from the appropriate row
of a life table).
Prentice (1986, p. 4) calls this study design a case-cohort design: time to response data.
Jewell (2004, pp.51-53) presents risk-set sampling (density case-control study design) as a way
to use a case-control study to obtain an estimate of the hazard ratio (HR).
Rothman (2002, pp.76-80) presents the density case-control study design as a way to obtain an
estimate of the incidence rate ratio (IRR). Although he does not say so, Rothman is apparently
making the assumption that risk is constant across time. Under that assumption, HR = IRR.
Prentice RL. (1986). A case-cohort design for epidemiologic cohort studies and disease prevention
trials. Biometrika 73:1-11.
King G, Zeng L. (2002). Estimating risk and rate levels, ratios and differences in case-control
studies. Statist Med 21:1409-1427.
Volovics A, van den Brandt PA. (1997). Methods for the analysis of case-cohort studies. Biom J
39(2):195-214.
1) Rossing MA, Daling JR, Weiss NS, Moore DE, Self SG. (1996). Risk of breast cancer in a
cohort of infertile women. Gynecol Oncol 60(1):3-7.
Abstract
The purpose of this study was to assess: (1) the risk of breast cancer associated with use
of ovulation-inducing agents (such as clomiphene citrate) as treatment for infertility; and
(2) the risk associated with ovulatory abnormalities that result in infertility. We
performed a case-cohort study among 3837 women evaluated for infertility at clinics in
Seattle, Washington, at some time during 1974–1985. Computer linkage with a
population-based tumor registry was used to identify women diagnosed with breast cancer
before January 1, 1992. Data regarding infertility testing and treatment were abstracted
from the infertility clinic medical records for women who developed breast cancer and a
randomly selected subcohort. Twenty-seven women in the cohort developed in situ or
invasive breast cancer, in comparison with an expected number of 28.8 cases
(standardized incidence ratio, 0.9; 95% confidence interval (CI), 0.6–1.4). Infertile
women with evidence of an ovulatory abnormality were at a risk of breast cancer similar
to that of women whose infertility was believed to be due to other causes. The risk among
women who had taken clomiphene was reduced relative to infertile women who had not
used this drug (adjusted relative risk, 0.5; 95% CI, 0.2–1.2), but the reduction in risk did
not increase with duration of use. The possibility that use of clomiphene as treatment for
infertility lowers the risk of breast cancer should be examined in other, larger studies.
Notice this study has a long and unequal follow-up, so the density case-control design is well-
suited for this study.
2) Savitz DA, Cai J, van Wijngaarden E, et al (2000). Case-cohort analysis of brain cancer and
leukemia in electric utility workers using a refined magnetic field job-exposure matrix.
American Journal of Industrial Medicine 38:417-425.
3) Voorrips LE, Goldbohm RA, Brants HA, et al. (2000). A prospective cohort study on
antioxidant and folate intake and male lung cancer risk. Cancer Epidemiol Biomarkers Prev
9:357-65.
In the 3rd paragraph of their Data Analysis section, they state they did something special to
adjust the variance estimates, using a software routine they developed:
“Because standard software was not available for case-cohort analysis, specific macros
were developed to account for the additional variance introduced by sampling from the
cohort instead of using the entire cohort (29).”
This approach in Stata fits a Cox regression model to the data with an appropriate variance
estimate, so the p values and confidence intervals are correct.
Usually, when sampling is from a larger cohort, follow-up times are available. Rather than using
the risk-set sampling and conditional logistic regression approach described above, researchers
instead use a Cox regression model with a special variance estimator (at least three such
estimators have been proposed). All of the example papers presented in this chapter followed
this suitably adapted Cox regression analysis approach.
2) sampled cohort
. stcascoh, alpha(.2) seed(999) // .2 or 20% of the cohort
------------------------------------------------------------------------------
| Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 4.652655 1.846243 3.87 0.000 2.137624 10.12676
------------------------------------------------------------------------------
Prentice Scheme
------------------------------------------------------------------------------
| Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 4.587959 1.82057 3.84 0.000 2.1079 9.98594
------------------------------------------------------------------------------
clear
set obs 1
gen hr=.
save hr_simulation, replace   // create a file with 1 missing observation
*
set more off                  // turn off scrolling prompt
* set seed 999                // doesn't work outside of the stcascoh command
forvalues i=1(1)1000 {
  quietly use nickelrefinary, clear
  quietly stset timerisk , failure(tumor==1) id(caseid)
  quietly stcascoh, alpha(.1798)   // sample 18% (n=112) of controls
  quietly stselpre nickel          // fit model
  quietly matrix A=e(b)
  quietly svmat A                  // creates variables from matrix columns
  quietly gen hr = exp(A1) in 1/1  // convert coefficient (log HR) to HR
  quietly keep hr
  quietly keep in 1/1
  quietly append using hr_simulation
  quietly save hr_simulation, replace
  display `i'                      // display iteration number
}
set more on
use hr_simulation, clear
sum hr                             // the initial missing value is ignored
we get
. sum hr
so the long-run average HR is 5.05, compared with the population HR of 5.02; that is, this approach gives an essentially unbiased estimate.
In contrast, the risk-set sampling, followed by conditional logistic regression, which was
demonstrated above and gave a long-run average HR=5.42, produces a biased estimate and so
should not be used.
We have seen that subjects can be included as both cases and controls in the case-cohort
approach. This overlap requires inflating the sample size to compensate. Rothman,
Greenland, and Lash (2008) comment,
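The “1.25” comes from the following reasoning: suppose 80%, or 4/5, of the sampled controls are
“controls only” (that is, not also cases). To bring the number of controls-only back up to 100% of
the intended number, with an equal number of cases, you multiply the control sample size by 5/4,
since (5/4)(4/5) = 1, where 5/4 = 1.25.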
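The arithmetic generalizes: if a fraction p of the sampled controls are expected to also be cases, the control sample is inflated by 1/(1 - p). The sketch below assumes, for illustration, that p equals the cohort's cumulative incidence, which is what one would expect when the controls are a simple random sample of the whole cohort; the function name is hypothetical.

```python
def inflation_factor(overlap_fraction):
    """Factor by which to inflate the control sample when a fraction of the
    sampled controls are expected to also be cases (case-cohort overlap)."""
    return 1 / (1 - overlap_fraction)


print(inflation_factor(0.20))       # 4/5 "controls only", as in the 1.25 example
print(inflation_factor(56 / 679))   # nickel cohort: 56 cases among 679 men
```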
Breslow NE, Day NE. (1987). Statistical Methods in Cancer Research, Vol II: The Design and
Analysis of Cohort Studies, Lyon, France, IARC.
Cai J, Zeng D. (2004). Sample size/power calculation for case-cohort studies. Biometrics
60:1015-1024.
Dupont WD. (2002). Statistical Modeling for Biomedical Researchers: A Simple Introduction to
the Analysis of Complex Data. Cambridge UK, Cambridge University Press.
Greenland S, Thomas DC. (1982). On the need for the rare disease assumption in case-control
studies. Am J Epidemiol 116(3):547-553. with erratum in Am J Epidemiol
1990;131(6):1102.
King G, Zeng L. (2002). Estimating risk and rate levels, ratios and differences in case-control
studies. Statist Med 21:1409-1427.
Jewell NP. (2004). Statistics for Epidemiology. New York, Chapman & Hall/CRC.
National Heart, Lung, and Blood Institute. (1998). Clinical guidelines for the identification,
evaluation, and treatment of overweight and obesity in adults: the evidence report.
Bethesda, MD, National Heart, Lung, and Blood Institute.
Onyike CU, Crum RM, Lee HB, Lyketsos CG, Eaton WW. (2003). Is obesity associated with
major depression? Results from the third national health and nutrition examination
survey. Am J Epidemiol 158(12):1139-1153.
Prentice RL. (1986). A case-cohort design for epidemiologic cohort studies and disease prevention
trials. Biometrika 73:1-11.
Rothman KJ. (2002). Epidemiology: An Introduction. New York, Oxford University Press.
Rothman KJ, Greenland S. (1998). Modern Epidemiology, 2nd ed. Philadelphia, PA.
Rothman KJ, Greenland S, Lash TL. (2008). Case-control studies. In Rothman KJ, Greenland S,
Lash TL, Modern Epidemiology, Philadelphia, Lippincott Williams & Wilkins, 2008,
pp.111-127.
Savitz DA, Cai J, van Wijngaarden E, et al (2000). Case-cohort analysis of brain cancer and
leukemia in electric utility workers using a refined magnetic field job-exposure matrix.
American Journal of Industrial Medicine 38:417-425.
Volovics A, van den Brandt PA. (1997). Methods for the analysis of case-cohort studies. Biom J
39(2):195-214.
Voorrips LE, Goldbohm RA, Brants HA, et al. (2000). A prospective cohort study on antioxidant
and folate intake and male lung cancer risk. Cancer Epidemiol Biomarkers Prev
9:357-65.
Chapter 3-14 (revision 16 May 2010) p. 36