net/publication/254847492
CITATIONS READS
71 2,861
3 authors, including:
Vytautas Kasiulevičius
Vilnius University
52 PUBLICATIONS 162 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Vytautas Kasiulevičius on 14 April 2015.
Summary
sample sizes, making the interpretation of negative results
Sample-size determination is often an important difficult. Conducting a study with an inadequate sample
step in planning an epidemiological study. There are size is not only futile, it is also unethical. Exposing pa-
several approaches to determining sample size. It depends on tients to the risks inherent in a research is justifiable on-
the type of the study. Descriptive, observational and randomized ly if there is a realistic possibility that the results will be-
controlled studies have different formulas to calculate sample nefit those subjects, future subjects, or lead to substantial
size. In this article, we discuss the formulas that can help to scientific progress.
estimate sample size in an epidemiological trial. We present a How many individuals will I need to study? This ques-
few examples from clinical practice, which may contribute to tion is commonly asked by a clinical investigator and ex-
the understanding of this problem. poses one of many issues that are best to be settled before
Keywords: sample size determination actually carrying out a study. Consultation with a statisti-
cian is worthwhile in addressing many issues of study de-
Determining an appropriate sample size for a clinical sign, but a statistician is not always readily available.
trial is an essential step in the statistical design of the pro- Sample Size (n) is the number of individuals in a
ject. An adequate sample size helps ensure that the stu- group under study. The larger the sample size, the grea-
dy will yield reliable information, regardless of whether ter the precision and, thus, power for a given study de-
the ultimate data suggest a clinically important difference sign to detect an effect of a given size. For statisticians,
between the treatments being studied, or the study is in- an n > 30 is usually sufficient for the Central Limit Theo-
tended to measure the accuracy of a diagnostic test or the rem to hold so that normal theory approximations can be
incidence of a disease. Unfortunately, many studies pub- used for measures such as the standard error of the mean.
lished in medical literature are conducted with inadequate However, this sample size (n = 30) is unrelated to the cli-
Address: V. Kasiulevičius nicians’ objective of detecting biologically significant ef-
Santariškių g. 2, Vilnius fects, which determines the specific sample size needed
Tel. 8 5 2365192
El. paštas Vytautas.Kasiulevicius@santa.lt for a specific study [1].
226 V. Kasiulevièius, V. Šapoka, R. Filipavièiûtë2
where N is the size of the population and n is the size of to that of the proportion, except for the measure of variabi-
the sample. If fpc is close to 1, then there is almost no ef- lity. The formula for the mean is shown in the equation.
fect. When fpc is much smaller than 1, then sampling a
large fraction of the population is indeed having an effect ,
on precision.
When the sample size is 50, it does not matter much Where n0 is the sample size, z is the abscissa of the
whether the population is 10 thousand or 10 million. normal curve that cuts off an area at the tails, e is the de-
When the sample size is four thousand, then we sired level of precision (in the same unit of measure as
have about 23% more precision with a population of the variance), and σ2 is the variance of an attribute in the
ten thousand than we would for a population of ten population.
million.
It is possible to calculate sample size directly, without Case-control studies
the fpc calculation: In a case-control study, patients who have developed
a disease are identified and their past exposure to suspec-
. ted etiological factors is compared with that of controls
or referents who do not have the disease. This permits
estimation of odds ratios (but not of attributable risks).
Researchers must be cautious when using the fpc. Allowance is made for potential confounding factors by
Frequently we want to generalize results to a larger popu- measuring them and making appropriate adjustments in
lation. We may have restricted the population for conve- the analysis. This statistical adjustment may be rendered
nience, but we are interested in more than just a convenient more efficiently by matching cases and controls for ex-
population. This extrapolation will add to the uncertainty posure to confounders, either on an individual basis (for
of our estimates, so the last thing we would want to do is example, by pairing each case with a control of the same
to use the fpc to make your confidence intervals narrower. age and sex) or in groups (for example, choosing a con-
The finite population correction factor really applies only trol group with an overall age and sex distribution similar
to “warehouse” type studies, where we are trying to cha- to that of the cases). Unlike in a cohort study, however,
racterize all the data in a single physical or conceptual lo- matching, on its own, does not eliminate confounding.
cation. Warehouse studies are quite common in accounting, Statistical adjustment is still required.
but they are unusual in medical research. In a descriptive
study we also need to know the likely response rate. For Sample size for independent case-control studies
example, if our calculations indicate that we need a mini- The estimated sample size n for independent case-con-
mum sample size of 384, but we only expect a 80% res- trol study is calculated as
ponse rate, then we will need a minimum sample size of
480 to allow for a possible non-response [4].
There are two methods to determine sample size for
variables that are polytomous or continuous. One method
is to combine responses into two categories and then use a
sample size based on proportion. The second method is to
use the formula for the sample size for the mean. Yamane
(1967) provides a simplified formula to calculate sample
sizes for proportions [2]:
sure in controls. p0 can be estimated as the population pre- statistical power. If you are using more than one control
valence of exposure, nc is the continuity corrected sample per case, then this function also provides the education in
size and Zp is the standard normal deviate for probability sample size relative to a paired study that you can obtain
p. If possible, choose a range of odds ratios that you want using your number of controls per case [9].
to detect the statistical power.
The formulas give the minimum number of case sub- Cohort studies
jects required to detect a real odds ratio or case exposure The starting point of a cohort study is the recording of
rate with power and two-sided type I error probability α. healthy subjects with and without exposure to the puta-
This sample size is also given as a continuity-corrected tive agent or the characteristic being studied. Individuals
value intended for the use with corrected chi-square and exposed to the agent under study (index subjects) are fol-
Fisher's exact tests [7, 8, 10]. lowed over time and their health status is observed and
recorded during the course of the study. In order to com-
Sample size for matched case-control studies pare the occurrence of disease in exposed subjects with
The estimated sample size n for matched case-control its occurrence in non-exposed subjects, the health status
study is calculated as of a group of individuals not exposed to the agent under
study (control subjects) is followed in the same way as
that of the group of index subjects.
as a continuity-corrected value intended for use with cor- not receiving the treatment under study give investigators
rected chi-square and Fisher’s exact tests [7–10]. important clues to the effectiveness of the treatment, its si-
de effects, and the parameters that modify these effects. In
Sample size for paired cohort studies the hierarchy of research designs, the results of randomi-
The estimated sample size n for paired cohort studies zed, controlled trials are considered to be evidence of the
is calculated as highest grade, whereas observational studies are viewed
as having less validity because they reportedly overesti-
mate treatment effects [16, 17].
erode the study event rate compared with observed rates Hence, 578 patients would be required in each treat-
in the population. ment group.
The effect of the treatment (P2) in a trial can be expres-
sed as an absolute difference, i. e. the difference between
the rate of the event in the control group and the rate in Calculating the sample size for comparing a con-
the intervention group, or as a relative reduction, i. e. the tinuous outcome between two groups
proportional change in the event rate with the treatment. The number required to participate in each group is
If the rate in the control group is 6.3% and the rate in the given in the equation below:
intervention arm is 4.2%, the absolute difference is 2.1%;
the relative reduction with intervention is 2.1% / 6.3%, or
33%. Investigators should take into consideration any cost
or logistical advantages or disadvantages of the interven-
tional treatment compared with a standard care.
The number required to participate in each group is
given in the equation below:
We will power our trial for a 5% 2-sided test with 80%
power, i. e. zα = 1.96 and zβ = 0.84. Thus, the required
number per group is:
POWER (1-Beta) In the reported trial, only 59/69 = 85.5% had data on
0.05 0.1 0.2 0.5
social functioning (though this probably refers to social
functioning at 18 months). So, anticipating 15% to drop
0.1 10.8222 8.5638 6.1826 2.7055
out gives 141.12/.85 = 166.02 per group, i.e. about 170
0.05 12.9947 10.5074 7.8489 3.8415
per group, i.e. 340 in total.
0.02 15.7704 13.0169 10.0360 5.4119
4. Eng J. Sample size estimation: how many individuals 14. Moher D, Schulz KF, Altman D. The CONSORT
should be studied? Radiology. 2003; 227: 309–13. statement: revised recommendations for improving the
5. Daniel W. Biostatistics: a foundation for analysis in quality of reports or parallelgroup trials. Lancet. 2001;
the health sciences. 7th ed. New York, NY: Wiley, 1999; 357: 1191–94.
180–5, 268–70. 15. Altman DG, Schulz KF, Moher D et al. The re-
6. Israel GD. The Evidence of Sampling Extension Pro- vised CONSORT statement for reporting randomized tri-
gram Impact. Program Evaluation and Organizational De- als: explanation and elaboration. Ann Intern Med. 2001;
velopment, IFAS, University of Florida. PEOD-5. 1992. 134: 663–94.
7. Dupont WD. Power calculations for matched case- 16. Kirby A, Gebski V, Keech AC. Determining the
control studies. Biometrics. 1988; 44: 1157–68. sample size in a clinical trial. MJA. 2002; 177(5): 256–
8. Dupont WD. Power and sample size calculations. 7.
Controlled Clinical Trials. 1990; 11: 116–28. 17. Moher D, Dulberg CS, Wells GA. Statistical power,
9. Casagrande JT, Pike MC, Smith PG. An improved sample size, and their reporting in randomized controlled
approximate formula for calculating sample sizes for trials. JAMA. 1994; 272: 122–4.
comparing two binomial distributions. Biometrics. 1978; 18. Goodman SN, Berlin JA. The use of predicted con-
34: 483–6. fidence intervals when planning experiments and the mis-
10. Schlesselman JJ. Case-Control Studies. New York: use of power when interpreting results. Ann Intern Med.
Oxford University Press, 1982. 1994; 121: 200–6.
11. Breslow NE, Day NE. Statistical Methods in Can- 19. Schulz KF, Grimes DA. Sample size calculations
cer Research: The Analysis of Case-Control Studies. Lyon: in randomized trials: mandatory and mystical. Lancet.
International Agency for Research on Cancer, 1980. 2005; 365: 1348–53.
12. Fleiss JL. Statistical Methods for Rates and Pro-
portions. 2nd ed. New York: Wiley, 1981. Received 16 June, 2006,
13. Meinert CL. Clinical Trials: Design, Conduct and accepted 11 September, 2006
Analysis. New York: Oxford University Press, 1986.
IMTIES DYDŽIO NUSTATYMAS EPIDEMIOLOGINĖSE dydį. Tai priklauso nuo tyrimo rūšies. Aprašomųjų, stebimųjų
STUDIJOSE ir atsitiktinės atrankos būdu kontroliuojamų tiriamųjų imtims
nustatyti naudojamos skirtingos formulės. Šiame straipsnyje
V. Kasiulevičius1, V. Šapoka1, R. Filipavičiūtė2 aptariame tyrime vartojamas formules imties dydžiui nustatyti,
1
Vilniaus universitetas pateikiame keletą pavyzdžių, padėsiančių geriau suvokti pro-
2
Vilniaus universiteto Eksperimentinės ir klinikinės medicinos blemos aktualumą.
institutas Raktažodžiai: imties dydžio nustatymas
Santrauka
Imties nustatymas dažnai yra svarbus numatant epidemio-
loginį tyrimą. Yra keletas metodikų, padedančių nustatyti imties