Prevalence and Sample Size

Opinion TRENDS in Parasitology Vol.22 No.5 May 2006

Parasite prevalence and sample size: misconceptions and solutions

Roger Jovani and José L. Tella
Department of Applied Biology, Estación Biológica de Doñana, Consejo Superior de Investigaciones Científicas, Avenida Maria Luisa s/n, 41013 Sevilla, Spain

Corresponding author: Jovani, R. (jovani@ebd.csic.es).
Available online 13 March 2006
www.sciencedirect.com 1471-4922/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.pt.2006.02.011

Parasite prevalence (the proportion of infected hosts) is a common measure used to describe parasitaemias and to unravel ecological and evolutionary factors that influence host–parasite relationships. Prevalence estimates are often based on small sample sizes because of either low host abundance or logistical problems associated with capture or laboratory analysis. Because prevalence estimates are less accurate with small sample sizes, sample size has been a recurrent problem when dealing with prevalence data. Different methods are currently being applied to overcome this statistical challenge but, far from being different correct ways of solving the same problem, some are clearly wrong and others need improvement.

Introduction
In a given population with N individuals, the prevalence (P) denotes the proportion of individuals infected by a given parasite species or group of species (e.g. a parasite genus). The actual prevalence of a population, however, is usually unknown because the number of sampled hosts (n, the sample size) is generally lower than the total population size (N). We can nevertheless easily obtain an estimate (p) by dividing the number of infected individuals (i) by the number of sampled ones, $p = (i/n) \times 100$, where $i = 0, 1, 2, \ldots, n$ and $n = 1, 2, 3, \ldots, N$. The accuracy of this estimate, however, is known to be affected by sample size. This problem has long concerned researchers because the results of statistical analyses of prevalence data, and the conclusions derived from them, could depend greatly on the number of sampled hosts [1]. Here, we review current methods used to overcome this statistical problem, identify wrong approaches, improve others and suggest more powerful practices.

Basic concepts
The low accuracy of prevalence estimates obtained from small samples has a mathematical basis. Because both the number of infected individuals and the sample size are integers, sampling prevalence (i.e. the prevalence within a given sample of individuals) is constrained to a particular set of values for each sample size (Figure 1). Moreover, the effect of a slight increase in the number of infected individuals on the prevalence estimate changes dramatically as we move from small to large sample sizes. For instance, if we sample six adult lizards and find four to be infected (n = 6, i = 4), we obtain $p = (4/6) \times 100 \approx 67\%$, but if we found five infected, then $p = (5/6) \times 100 \approx 83\%$, a difference of 16 percentage points. However, for n = 100, i = 4 returns p = 4% and i = 5 returns p = 5%, a difference of only 1 percentage point. As a result, slight differences in prevalence can be detected at large sample sizes but not at small ones. In other words, our uncertainty about the real prevalence (i.e. the population prevalence, P) is higher at low sample sizes (Figure 2).

The accuracy with which we calculate prevalence decreases not only at low sample sizes (as expected), but also when the populational prevalence is close to 50% (Figure 2). At first, this property could seem counterintuitive, because rare events (e.g. a parasitised individual at P = 1% or an uninfected individual at P = 99%) are difficult to observe, particularly at low sample sizes. However, when populational prevalence is close to zero or 100%, sample prevalence is also close to or equal to zero or 100%, respectively, precisely because rare events are difficult to detect. At intermediate populational prevalences (e.g. P = 52%), the chances of finding parasitised or non-parasitised individuals are similar, so sample prevalence can fluctuate freely between zero and 100%, falling away from the actual populational prevalence and increasing the uncertainty in the prevalence estimate.

[Figure 1: sampling prevalence (p) plotted against sample size (n), panels (a)–(c).]
Figure 1. All the possible values that prevalence could reach at different sample sizes. For instance, for a sample size of 1, prevalence could be either $(0/1) \times 100 = 0\%$ or $(1/1) \times 100 = 100\%$; for a sample size of 2, there are three possibilities: $(0/2) \times 100 = 0\%$, $(1/2) \times 100 = 50\%$ and $(2/2) \times 100 = 100\%$; and so on. This is illustrated for sample sizes (a) from 1 to 50 on linear axes, (b) from 1 to 3500 on log–log axes, and (c) from 1 to 3500 with a logarithmic x-axis and a linear y-axis.

Minimum sample size
The most intuitive method to overcome the high statistical uncertainty of prevalences calculated from low sample sizes is to reject data obtained from small samples. However, the threshold for establishing a minimum sample size is highly variable because it depends on the subjective decisions of researchers. Consequently, researchers either do not set a minimum sample size (e.g. [2]) or adopt low minimum sample sizes, such as three [3], five [4] or eight [5]; medium ones, such as ten [6], 15 [7] or 20 [8]; or higher ones of up to 30 [9] or even 75 [10]. Larger samples are obviously better, but the relationship is not linear: uncertainty decreases rapidly as sample size increases up to 10–20 individuals, and only slightly with further increases (Figure 2). Thus, a sample size of around 15 could be used as a reasonable trade-off between not losing too much data from analyses and maintaining acceptable levels of uncertainty (around 1/3 of the sampling prevalence). An additional recommendation is to test the robustness of the results using different minimum sample size cut-offs [11,12].

Avoiding zero prevalences
Gregory and Blackburn [13] claimed that the minimum, but not the maximum, prevalence that could be achieved in a sample of a population is affected by sample size. This influential paper led several studies to reject zero prevalences from their analyses or to question their previous use. These authors correctly stated [13] that large sample sizes (e.g. 1000 hosts) are needed to detect very low prevalences (e.g. 0.1%), whereas a prevalence of 100% could be detected with only one sampled individual, if it is infected. They presented figures with both axes log-transformed (similar to Figure 1b), but this hid the symmetric shape of the actual relationship between sample size and prevalence (Figure 1a,c). The smallest non-zero prevalence that a sample of n hosts can yield is $(1/n) \times 100$, and the largest prevalence below 100% is $((n-1)/n) \times 100$, so both extremes require equally large samples. That is, as happens with prevalences near zero, prevalences near 100% (e.g. 99.9%) can only be achieved with large sample sizes (e.g. 1000 hosts). Thus, following this symmetry and the suggestions of Gregory and Blackburn [13], we should also reject 100% prevalences from analyses.

Parasite prevalence differs widely both between and within host species [14,15]. Therefore, when assessing sources of variability in parasite prevalence (e.g. between marine and freshwater habitats [16]), a prevalence of zero has the same ecological relevance as a prevalence of 0.1%, and the same holds for 100% and 99.9%. Thus, we are throwing out very relevant information by rejecting zero and 100% prevalences. Accordingly, zero (see Box 1) and 100% prevalences should be included in the analyses.

Residuals of prevalence on sample size
Another method used to control for the potential effects of sample size is to obtain the residuals of a linear regression of sampling prevalence (the dependent variable) on sample size (the independent variable), and to use them as the dependent variable in comparative studies [17,18]. A clear example of this rationale is a study [18] in which residuals of parasite prevalence against sample size were used in an independent contrasts analysis when a correlation was found between these variables, but not in another analysis of the same study, in which no such relationship was found [18]. This approach follows earlier methods aimed at removing the effect of body size on body-size-related variables, such as home range area [19]. It has also been influenced by the need to control for sample size when analysing parasite richness, because the more host individuals are examined, the more parasite species can be found [7,20].
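The residual method can be made concrete with a short sketch. It assumes an ordinary least-squares regression of sampling prevalence (in %) on sample size, fitted across species; the helper names and the five-species counts (which reuse the lizard-style numbers from the Basic concepts section) are hypothetical. It is shown only to make the procedure explicit, since the rest of this section argues against using its output as a prevalence estimate.

```python
# A minimal sketch of the residual method described above (hypothetical
# helper names; an ordinary least-squares fit is assumed).


def sampling_prevalence(infected, sampled):
    """Sampling prevalence p = (i / n) * 100, in percent."""
    if sampled < 1 or not 0 <= infected <= sampled:
        raise ValueError("need n >= 1 and 0 <= i <= n")
    return infected / sampled * 100


def ols_residuals(x, y):
    """Residuals of y after fitting the least-squares line y = a + b * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical (infected, sampled) counts for five host species.
species = [(4, 6), (5, 6), (4, 100), (5, 100), (0, 12)]
sizes = [n for _, n in species]
prevalences = [sampling_prevalence(i, n) for i, n in species]
residuals = ols_residuals(sizes, prevalences)

for (i, n), p, r in zip(species, prevalences, residuals):
    print(f"i={i:2d}  n={n:3d}  p={p:6.2f}%  residual={r:7.2f}")
```

Each residual is the vertical distance between a species' sampling prevalence and the fitted line, so its value depends on where that species sits along the sample-size axis as well as on its prevalence.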

Adding more support to the residual method, and using empirical log–log plots similar to Figure 1b, Gregory and Blackburn [13] suggested that a negative relationship between sample size and prevalence is expected as a mathematical artefact (and thus needs to be controlled for). In addition, they stated that only negative slopes were found when simulating the effect of sample size (from 1 to 3500) on prevalence estimates when zero values were deliberately avoided. Clearly, however, there is no mathematical relationship between prevalence and sample size per se (Figure 1). We confirmed this by repeating the simulation study of Ref. [13]. We performed 100 simulations in which 200 hypothetical species (or populations) took prevalence values between zero and 100% and sample sizes between 1 and 3500 (R.J. and J.L.T., unpublished). Of the resulting relationships, 49 were negative (Spearman rank correlations, r range = −0.0008 to −0.1714; mean = −0.0565) and 51 positive (r range = 0.0045 to 0.2675; mean = 0.0598), and only two of the negative and four of the positive (weak) correlations were statistically significant. Moreover, the results were identical when zero prevalences were not allowed in the simulations. Our results differ greatly from those of Ref. [13], suggesting that the conclusions of Gregory and Blackburn were based on visual examination of log–log plots similar to Figure 1b.

[Figure 2: standard error of the prevalence estimate plotted against sample size (n) and populational prevalence (P).]
Figure 2. Standard error of prevalence estimates at different sample sizes and populational prevalences. The standard error (SE) is calculated as $SE = 100\sqrt{pq/(n-1)}$, where p is the sample prevalence expressed as a proportion, $q = 1 - p$, and n is the sample size. The white line shows results for n = 15.

Box 1. Zeros are also relevant prevalences
Non-parasitised individuals and populations have clearly not been the focus of parasitologists. This has produced a parasitological literature traditionally biased towards positive prevalence values. For example, the world avian host–haemoparasite catalogues [26,27] report, for a given bird species, all studies that found some parasites, but only one example study for species that were not parasitised. In 1982, the seminal paper by Hamilton and Zuk [28] extended interest in parasites among evolutionary ecologists by suggesting a role for parasites in the evolution of plumage colouration and song in birds. This generated a plethora of hypotheses that were initially tested using data previously published by parasitologists, and thus inherited their biases [7].

These new hypotheses encouraged evolutionary ecologists to initiate extensive parasite surveys, which revealed great variability in parasite burdens among and within species, as well as previously hidden zero prevalences. This led to new questions about the ecological factors, host behaviours and life-history traits shaping such variation in nature [2,5,15,23]. Perhaps echoing the parasitological tradition, doubts arose about the effects of sample size on zero prevalences and on the accuracy of prevalence estimates, including even suggestions that zero prevalences should be excluded from analyses and journal reports [13]. This publication bias has started to creep into ecological and zoological journals, reinforced by the fact that zero prevalences are no longer surprising and thus generate less interest among journal editors. Failure to find parasites in a sample could reflect either a real absence of parasites or a low prevalence and intensity of infection, leading to an apparent absence because of low sample size or low methodological sensitivity [29]. However, the relevant question here is whether including zero prevalences improves the conclusions of the research. We feel that a full understanding of natural variation in parasite burdens should also consider why some populations and species have zero or very low parasite prevalences, whereas others have 100% or very high prevalences. For comparative purposes, a zero prevalence is as informative as a 1% prevalence (and similarly for 99% and 100%), provided it is calculated from an appropriate sample size (as for any other prevalence data). By excluding zero prevalences, we are throwing out extreme values from analyses, and thus excluding host populations or species with potentially interesting ecological or life-history traits that make them completely or almost completely free from parasites.

Thus, we make a plea here for zero prevalences to be reported. As an alternative to scientific journals, we propose the creation of a website, under scientific supervision, compiling data published in leading journals as well as in ‘grey’ literature (such as non-international journals or conference proceedings), old unpublished data submitted by researchers, and data from future surveys of parasite prevalence, whether or not these are eventually published. This would provide an easily accessible and permanently updated source of data for researchers worldwide, becoming one more of the invaluable virtual services provided by natural history museums in this century.

In a more recent but largely unnoticed study of the relationship between prevalence and sample size, Gregory and Woolhouse [21] simulated the effect of sample sizes ranging from 10 to 1280 on the mean and the accuracy of prevalences estimated for a theoretical population with a prevalence of 80%. They concluded that prevalence estimates were unbiased at every sample size and merely less accurate at low sample sizes, which is consistent with our own analysis (Figure 3; see later). However, Gregory and Woolhouse [21] did not link these challenging results with their previous work, and their first recommendations [13] have prevailed among researchers so far.

Curiously, although there is clearly no relationship between prevalence and sample size (Figure 1a), there have been reports not only of null empirical correlations between prevalence and sample size [18], but also of negative [18] and positive [22] correlations. Why? Because of the effect of detecting rare events at low sample sizes. This point is illustrated by simple simulations using different sample sizes and populational prevalences (Figure 3). At low sample sizes (1–15), sampling prevalences and sample sizes were usually positively correlated at low populational prevalences, uncorrelated at intermediate populational prevalences, and negatively correlated at high populational prevalences (Figure 3a). At higher sample sizes (15–100), however, significant correlations were fewer and evenly distributed (Figure 3b). This is because, at low populational prevalence, the chance of finding an infected individual in a small sample is low, but it increases with increasing sample size; the reverse happens at high populational prevalences (Figure 3a). However, when all species have a minimum sample size above 15, the effect of rare events on sample prevalence is buffered at any populational prevalence (Figure 3b).

[Figure 3: Spearman r between n and p (−0.6 to 0.6) plotted against populational prevalence (P, 0–100%), panels (a) and (b).]
Figure 3. Simulation of the effect of sample size on the correlation between sample size and prevalence at different actual populational prevalences. Each point indicates the Spearman correlation coefficient between sample size and prevalence (in red, p < 0.05) for 100 simulated species with sample sizes varying randomly (a) from 1 to 15 or (b) from 15 to 100.

The problem with using the residuals from a regression between prevalence and sample size is that this artificially inflates the prevalence estimates of some species and deflates those of others. Moreover, the method should not be applied even if (by chance) a correlation exists between prevalence and sample size in a given data set. To illustrate this point, one can deliberately create a statistical relationship between sample size and populational prevalence by sampling more individuals from populations with a higher populational prevalence (Figure 4). The simulated sample size and the resulting sample prevalence are, of course, correlated here, because they were deliberately made to be so (Figure 4a). There is also the expected correlation between estimated and real prevalences in the initial data (Figure 4b). However, the correlation between real prevalences and the calculated residuals is null (Figure 4c). This means that the residuals are unrelated to populational prevalences! Residuals thus become statistical artefacts that cannot be used as estimators of prevalence for comparative purposes, and previous results obtained with this method should therefore be treated with caution.

Finally, it is worth noting that relevant biological factors could shape a prevalence–sample-size relationship. For instance, in a recent study, Ricklefs et al. [15] used the number of individuals trapped (the sample size) as an index of bird species abundance in a given study area. In this way, they used the relationship between prevalence and sample size to assess the potential relationship between host density and parasite prevalence. Thus, a prevalence–sample-size relationship should be seen as a potentially interesting pattern in itself, rather than as a statistical artefact to be controlled for.
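The simulation arguments in this section can be outlined with the Python standard library. The sketch below is our own illustration, not the authors' code, and rests on stated assumptions: each host is infected independently with probability P, Spearman's r is computed as Pearson's r on average ranks, and the seeds and helper names are arbitrary. It follows the design given in the Figure 4 caption and adds a single replicate (rather than 100) of the check in which 200 species receive independent prevalences and sample sizes between 1 and 3500.

```python
# Our own illustration of the simulations discussed above, not the authors'
# code. Assumptions: hosts are infected independently with probability P;
# Spearman's r uses average ranks for ties; seeds are arbitrary.
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ols_residuals(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def sample_prevalence(true_percent, n, rng):
    """Sampling prevalence (%) of n hosts drawn from a population with prevalence P."""
    infected = sum(rng.random() < true_percent / 100 for _ in range(n))
    return infected / n * 100


rng = random.Random(2006)

# Figure 4 design: 4 groups of 25 species; higher P is sampled more heavily.
true_P, sizes, observed = [], [], []
for group, P in enumerate([12.5, 37.5, 62.5, 87.5]):
    for n in range(25 * group + 1, 25 * group + 26):
        true_P.append(P)
        sizes.append(n)
        observed.append(sample_prevalence(P, n, rng))

residuals = ols_residuals(sizes, observed)
r_n_p = spearman(sizes, observed)       # correlation built in by design
r_P_p = spearman(true_P, observed)      # estimates still track true P
r_P_res = spearman(true_P, residuals)   # expected to be much weaker
print(f"r(n,p)={r_n_p:.3f}  r(P,p)={r_P_p:.3f}  r(P,residuals)={r_P_res:.3f}")

# One replicate of the Gregory-and-Blackburn-type check: P and n independent.
gb_sizes = [rng.randint(1, 3500) for _ in range(200)]
gb_prev = [sample_prevalence(rng.uniform(0, 100), n, rng) for n in gb_sizes]
r_gb = spearman(gb_sizes, gb_prev)
print(f"r(n,p) with independent P and n: {r_gb:.3f}")
```

One would expect r(n, p) and r(P, p) to be strongly positive, as in Figure 4a,b, and the correlation between true P and the residuals to be much weaker; the exact numbers depend on the seed. The independent-P check should likewise give a correlation close to zero.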

Concluding remarks
Current practices for the analysis of prevalence data must be revised. Statistical tools that take into account the sample size from which each proportion has been obtained [23], that weight by sample size [8], or that use individual infection status (infected or not) as the dependent variable [16] are increasingly being used. In this way, no information is lost because of sample size restrictions, but more weight is given to data from larger samples, as in meta-analysis [24] or generalized linear (mixed) models [25].

However, there are many circumstances in which methods that do not control for sample size must still be used to analyse prevalence data [17], and some decision must be taken about how to select and analyse the data. In these cases, we have shown that the use of residuals is a flawed method; that avoiding zero prevalences is unfounded and entails the loss of very relevant information; and that rejecting prevalence data obtained from low sample sizes should be done judiciously, according to the shape of the curve in Figure 2, even to the extent of testing the robustness of the results at different minimum sample sizes.
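The practical recommendations above (giving more weight to larger samples, and choosing minimum sample sizes from the shape of the Figure 2 curve while testing several cut-offs) can be sketched in a few lines. This is not a generalized linear mixed model or a meta-analysis; it only shows the simplest n-weighted estimate (pooling individual infection statuses), assumes that the Figure 2 standard error takes the usual form for a proportion, SE = 100·sqrt(pq/(n − 1)), and re-runs a placeholder analysis at several cut-offs. Function names and counts are hypothetical.

```python
# A sketch of two sample-size-aware practices (our illustration; helper
# names and data are hypothetical). Assumes SE = 100 * sqrt(p * q / (n - 1)),
# with p as a proportion and q = 1 - p.
import math


def prevalence_se(p_percent, n):
    """Standard error, in percentage points, of a sample prevalence (n >= 2)."""
    p = p_percent / 100
    return 100 * math.sqrt(p * (1 - p) / (n - 1))


def unweighted_mean(samples):
    """Mean of per-sample prevalences: a sample of 1 counts as much as 100."""
    return sum(100 * i / n for i, n in samples) / len(samples)


def pooled_prevalence(samples):
    """Infected over sampled hosts, i.e. prevalences weighted by sample size."""
    return 100 * sum(i for i, _ in samples) / sum(n for _, n in samples)


def robustness(samples, cutoffs, analysis):
    """Re-run `analysis` on samples with n >= each cut-off (none may be empty)."""
    return {c: analysis([(i, n) for i, n in samples if n >= c]) for c in cutoffs}


# Diminishing returns at P = 50%, the worst case in Figure 2.
for n in (2, 5, 10, 15, 20, 50, 100):
    print(f"n={n:3d}  SE={prevalence_se(50, n):5.1f} percentage points")

# Hypothetical (infected, sampled) counts for four host populations.
samples = [(1, 1), (0, 3), (2, 15), (10, 100)]
print(f"unweighted mean: {unweighted_mean(samples):.1f}%")
print(f"pooled (n-weighted): {pooled_prevalence(samples):.1f}%")
print(robustness(samples, cutoffs=[1, 3, 15], analysis=unweighted_mean))
```

With these hypothetical counts, the unweighted mean is pulled upwards by the single host sampled in the first population, whereas the pooled estimate is dominated by the sample of 100; the cut-off loop shows how much a result moves when small samples are dropped.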

[Figure 4: (a) sampling prevalence (p) against sample size (n); (b) sampling prevalence (p) against populational prevalence (P); (c) residuals n–p against populational prevalence (P).]
Figure 4. An illustration of why residuals from prevalence–sample-size regressions cannot be used to correct for low sample sizes. (a) A positive relationship between prevalence and sample size was deliberately created by simulating sample prevalence (p) for 25 species (in red) with a populational prevalence P = 12.5% and n from 1 to 25, 25 species (in blue) with P = 37.5% and n from 26 to 50, 25 (in green) with P = 62.5% and n from 51 to 75, and 25 (in black) with P = 87.5% and n from 76 to 100. (b) The correlation between populational prevalence and sampling prevalence (Spearman r = 0.847, n = 100, p < 0.0001) for the same simulated prevalences. (c) There was no relationship (Spearman r = 0.124, n = 100, p = 0.219) between true populational prevalence and the prevalence estimated as the residuals of the regression line obtained in (a) between sample size and sampling prevalence.

References
1 Read, A.F. and Harvey, P.H. (1989) Reassessment of comparative evidence for Hamilton and Zuk theory on the evolution of secondary sexual characters. Nature 339, 619–620
2 Torchin, M.E. et al. (2003) Introduced species and their missing parasites. Nature 421, 628–630
3 Poiani, A. (1992) Ectoparasitism as a possible cost of social life: a comparative analysis using Australian passerines (Passeriformes). Oecologia 92, 429–441
4 Yezerinac, S.M. and Weatherhead, P.J. (1995) Plumage coloration, differential attraction of vectors and haematozoa infections in birds. J. Anim. Ecol. 64, 528–537
5 Schalk, G. and Forbes, M.R. (1997) Male biases in parasitism of mammals: effects of study type, host age, and parasite taxon. Oikos 78, 67–74
6 Poulin, R. (1996) Sexual inequalities in helminth infections: a cost of being a male? Am. Nat. 147, 287–295
7 Tella, J.L. (2002) The evolutionary transition to coloniality promotes higher blood parasitism in birds. J. Evol. Biol. 15, 32–41
8 Scheuerlein, A. and Ricklefs, R.E. (2004) Prevalence of blood parasites in European passeriform birds. Proc. Biol. Sci. 271, 1363–1370
9 Arneberg, P. et al. (1998) Host densities as determinants of abundance in parasite communities. Proc. R. Soc. Lond. B. Biol. Sci. 265, 1283–1289
10 Poulin, R. and Mouritsen, K.N. (2003) Large-scale determinants of trematode infections in intertidal gastropods. Mar. Ecol. Prog. Ser. 254, 187–198
11 John, J. (1995) Parasites and the avian spleen: helminths. Biol. J. Linn. Soc. 54, 87–106
12 Pruett-Jones, S.G. et al. (1990) Parasites and sexual selection in birds of Paradise. Am. Zool. 30, 287–298
13 Gregory, R.D. and Blackburn, T.M. (1991) Parasite prevalence and host sample size. Parasitol. Today 7, 316–318
14 Sol, D. et al. (2000) Geographical variation in blood parasites in feral pigeons: the role of vectors. Ecography 23, 307–314
15 Ricklefs, R.E. et al. (2005) Community relationships of avian malaria parasites in southern Missouri. Ecol. Monogr. 75, 543–559
16 Mendes, L. et al. (2005) Disease limited distributions? Contrasts in the prevalence of avian malaria in shorebird species using marine and freshwater habitats. Oikos 109, 396–404
17 Harvey, P.H. and Pagel, M.D., eds (1991) The Comparative Method in Evolutionary Biology, Oxford University Press
18 Poulin, R. and Valtonen, E.T. (2001) Nested assemblages resulting from host size variation: the case of endoparasite communities in fish hosts. Int. J. Parasitol. 31, 1194–1204
19 Garland, T., Jr. et al. (1992) Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst. Biol. 41, 18–32
20 Walther, B.A. and Morand, S. (1998) Comparative performance of species richness estimation methods. Parasitology 116, 395–405
21 Gregory, R.D. and Woolhouse, M.E.J. (1993) Quantification of parasite aggregation: a simulation study. Acta Trop. 54, 131–139
22 Pruett-Jones, M. and Pruett-Jones, S. (1991) Analysis and ecological correlates of tick burdens in a New Guinea avifauna. In Bird-Parasite Interactions (Loye, J.E. and Zuk, M., eds), pp. 155–176, Oxford University Press
23 Tella, J.L. et al. (1999) Habitat, world geographic range, and embryonic development of host explain the prevalence of avian hematozoa at small spatial and phylogenetic scales. Proc. Natl. Acad. Sci. U. S. A. 96, 1785–1789
24 Hedges, L.V. and Olkin, I. (1985) Statistical Methods for Meta-Analysis, Academic Press
25 Paterson, S. and Lello, J. (2003) Mixed models: getting the best use of parasitological data. Trends Parasitol. 19, 370–375
26 Bennett, G.F. et al. (1982) Host–Parasite Catalogue of the Avian Haematozoa, Occasional Papers in Biology. Memorial University of Newfoundland
27 Bishop, M.A. and Bennett, G.F. (1992) Host–Parasite Catalogue of the Avian Haematozoa (Suppl. 1), Occasional Papers in Biology. Memorial University of Newfoundland
28 Hamilton, W.D. and Zuk, M. (1982) Heritable true fitness and bright birds: a role for parasites? Science 218, 384–387
29 Cooper, J.E. and Anwar, M.A. (2001) Blood parasites of birds: a plea for more cautious terminology. Ibis 143, 149–150