
Journal of Applied Ecology 1998, 35, 523-531

A taxonomic distinctness index and its statistical properties

K.R. CLARKE and R.M. WARWICK
Centre for Coastal and Marine Sciences, Plymouth Marine Laboratory, Prospect Place, West Hoe, Plymouth PL1 3DR, UK

Summary

1. For biological community data (species-by-sample abundance matrices), Warwick & Clarke (1995) defined two biodiversity indices, capturing the structure not only of the distribution of abundances amongst species but also the taxonomic relatedness of the species in each sample. The first index, taxonomic diversity (Δ), can be thought of as the average taxonomic 'distance' between any two organisms, chosen at random from the sample: this distance can be visualized simply as the length of the path connecting these two organisms, traced through (say) a Linnean or phylogenetic classification of the full set of species involved. The second index, taxonomic distinctness (Δ*), is the average path length between any two randomly chosen individuals, conditional on them being from different species. This is equivalent to dividing taxonomic diversity, Δ, by the value it would take were there to be no taxonomic hierarchy (all species belonging to the same genus). Δ* can therefore be seen as a measure of pure taxonomic relatedness, whereas Δ mixes taxonomic relatedness with the evenness properties of the abundance distribution.
2. This paper explores the statistical sampling properties of Δ and Δ*. Taxonomic diversity is seen to be a natural extension of a form of Simpson's index, incorporating taxonomic (or phylogenetic) information. Importantly for practical comparisons, both Δ and Δ* are shown not to be dependent, on average, on the degree of sampling effort involved in the data collection; this is in sharp contrast with those diversity measures that are strongly influenced by the number of observed species.
3. The special case where the data consist only of presence/absence information is dealt with in detail: Δ and Δ* converge to the same statistic (Δ+), which is now defined as the average taxonomic path length between any two randomly chosen species. Its lack of dependence, in mean value, on sampling effort implies that Δ+ can be compared across studies with differing and uncontrolled degrees of sampling effort (subject to assumptions concerning comparable taxonomic accuracy). This may be of particular significance for historic (diffusely collected) species lists from different localities or regions, which at first sight may seem unamenable to valid diversity comparison of any sort.
4. Furthermore, a randomization test is possible, to detect a difference in the taxonomic distinctness, for any observed set of species, from the 'expected' Δ+ value derived from a master species list for the relevant group of organisms. The exact randomization procedure requires heavy computation, and an approximation is developed, by deriving an appropriate variance formula. This leads to a 'confidence funnel' against which distinctness values for any specific area, pollution condition, habitat type, etc., can be checked, and formally addresses the question of whether a putatively impacted locality has a 'lower than expected' taxonomic spread. The procedure is illustrated for the UK species list of free-living marine nematodes and sets of samples from intertidal sites in two localities, the Exe estuary and the Firth of Clyde.

Key-words: biodiversity, randomization test, sampling effort, unbiasedness, variance estimate.

Journal of Applied Ecology (1998) 35, 523-531

© 1998 British Ecological Society

Correspondence: Dr K. R. Clarke (fax: 01752 633101; e-mail: b.clarke@pml.ac.uk).
Introduction

It is increasingly recognized (e.g. Harper & Hawksworth 1994) that adequate measures of biodiversity within a particular taxonomic group should not be merely functions of the number of species present and their relative abundances, but should also include information on the 'relatedness' of these species. There is now a substantial literature (Faith 1994; Humphries, Williams & Vane-Wright 1995 and references therein) on measures incorporating, principally, phylogenetic relationships amongst species and their possible use in selecting species or reserves of greatest conservation priority. Vane-Wright, Humphries & Williams (1991), Williams, Humphries & Vane-Wright (1991) and May (1990) introduced measures of distinctness based only on the topology of a phylogenetic tree, appropriate when branch lengths are entirely unknown, and Faith (1992, 1994) defined and justified a phylogenetic diversity (PD) measure based on known branch lengths: PD is simply the cumulative branch length of the full tree.

This literature does not appear, to date, to have carried over into the area of environmental monitoring and assessment, where the emphasis is not on choosing species to conserve but monitoring for environmental degradation or the benefits of remediation. The considerations here are rather different: the raw material is often a set of community samples with recorded abundances for each observed species, rather than a single species list, thought of as a complete inventory. The outcome required is not a preferential selection of species from the inventory for conservation status, but an assessment of whether sampled assemblages display some pattern in biodiversity through time or in space. Natural variation, and thus sampling properties of the resulting abundance matrices and derived indices, are of paramount importance. Also, the basic information on species relatedness is often just a Linnean taxonomy (Fig. 1), a crude approximation to a phylogeny but one that does impose an ordering of branch lengths which is interpretable and should be used. For example, even allowing that the only data available are in the form of presence/absences, measures that rely solely on topology and/or species richness would not distinguish between Fig. 2a and Fig. 2b, yet Fig. 2a clearly exhibits greater biodiversity in the sense of richness in higher taxa. Similarly, PD, applied to a Linnean classification (Faith 1994), has a focus exclusively on 'character richness' rather than 'character combinations' (in the terminology of Humphries, Williams & Vane-Wright 1995), so that PD concentrates on higher taxon richness and ignores the evenness component in diversity. Thus, PD would not distinguish between Fig. 2c and Fig. 2d, yet Fig. 2d clearly represents a less taxonomically diverse assemblage than Fig. 2c, both in the sense of possessing greater vulnerability to species loss and in potential functional inefficiency.

An over-riding consideration in a comparative biodiversity study is the extent to which a putative statistic is sensitive to differing sampling effort at different sites or times. It is well-known, and demonstrated starkly in Fig. 3a-c, that standard diversity estimates can be strongly dependent on sampling effort, particularly in so far as they are influenced by the number of species in the sample. Species richness is crucially dependent on sampling effort and it must be expected that only carefully controlled and equitable sampling studies can provide comparative data. Warwick & Clarke (1995), however, define taxonomic diversity/distinctness measures that satisfy the above requirements of incorporating higher taxon richness and evenness concepts (see the Δ+ values in the legend to Fig. 2) but also have an apparent insensitivity to sampling effort (Fig. 3d-f).

In the Methods that follow, the construction of Δ and Δ* is described and the link to the Simpson diversity drawn. It is demonstrated theoretically that, if Δm and Δm* are defined as the values of Δ and Δ* from a subset of m organisms, randomly selected from a total of n individuals, then they are either exactly (Δm) or

[Figure 1: taxonomic tree with levels labelled Families, Genera, Species and Individuals.]

Fig. 1. Part of a taxonomic classification, showing examples of path length weights {w_ij} used to define taxonomic diversity/distinctness measures; conventional diversity indices utilize only the species abundances {x_i; i = 1, ..., s}.
[Figure 2: four taxonomic trees, panels (a) to (d). Rank labels shown: Order, Family, Genus, Species (upper panels) and Family, Genus, Species (lower panels).]

Fig. 2. Some simple, contrasting taxonomic trees for presence/absence data (i.e. ignoring species abundance information). Diversity measures based only on topology of the trees would not distinguish (a) from (b), and measures based on total branch length would not distinguish (c) from (d), but taxonomic distinctness Δ+, based on the average of pair-wise path lengths (equation 4 of the text), does draw these distinctions. Using a simple (1, 2, 3, ...) weighting of path lengths, Δ+ values are (a) 3·0, (b) 1·0, (c) 1·56 and (d) 1·2, placing the four configurations in the intuitively expected distinctness order.
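
As a minimal sketch of how pair-wise path lengths lead to Δ+, the Python below computes the average path length for two hypothetical four-species trees. The actual trees of Fig. 2 are not reproduced here; the two examples are assumptions, chosen only because they give the values quoted in the legend for (a) and (b) under the simple (1, 2, 3, ...) weighting. The function names `path_weight` and `delta_plus` and all taxon labels are illustrative.

```python
from itertools import combinations


def path_weight(a, b):
    """Path-length weight between two species.

    Each species is a tuple of taxon names ordered from the highest rank
    down to species, e.g. (order, family, genus, species). The weight is the
    number of ranks climbed before the two lineages meet: 1 for the same
    genus, 2 for the same family but different genera, and so on.
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return len(a) - shared


def delta_plus(species):
    """Average path length over all pairs of species (presence/absence data)."""
    pairs = list(combinations(species, 2))
    return sum(path_weight(a, b) for a, b in pairs) / len(pairs)


# Hypothetical tree: four species, each in its own family of a single order.
spread_tree = [("O1", f"F{k}", f"G{k}", f"sp{k}") for k in range(1, 5)]
# Hypothetical tree: four species, all in the same genus.
clustered_tree = [("O1", "F1", "G1", f"sp{k}") for k in range(1, 5)]

print(delta_plus(spread_tree))     # 3.0
print(delta_plus(clustered_tree))  # 1.0
```

Both trees contain four species, so species richness alone cannot separate them; only the path lengths differ.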

[Figure 3: six panels, (a) to (f), each plotting an index against subsample size. Panels (a) to (e): x-axis 'No. of individuals in subsample' (logarithmic, 10 to 10000); panel (f): linear x-axis, 0 to 120.]

Fig. 3. Simulation study on the effects of sample size on (bio)diversity indices, using a single, composite sample of abundances of 111 nematode species (c. 10000 individuals) from six sites in the Firth of Clyde (Lambshead 1986). Subsamples of individuals were drawn at random for 10 (logarithmically increasing) subset sizes, with 10 replicate simulations at each size, and the following indices computed: (a) Shannon diversity, (b) Margalef's d (a species richness index that attempts to adjust for sample size), (c) Pielou's J (reflecting evenness of abundances across species), (d) Δ (equation 1), (e) Δ* (equation 3), (f) Δ+ (equation 4). The simulations for the final plot ignored the species abundances and selected fixed numbers of species (from the 111) for computation of Δ+. The conventional diversity indices are seen to be dependent on subsample size, unlike the taxonomic diversity/distinctness measures.
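
The behaviour in panels (d) and (e) can be checked on a toy example. The sketch below is not the simulation used for Fig. 3: instead of random draws from the Clyde data it enumerates every possible subsample of a small hypothetical sample, so that the mean of Δm can be compared exactly with Δ. The abundances `x`, the weight matrix `w` and all function names are assumptions made for illustration; the index formulas follow the verbal definitions of Δ (equation 1) and Δ* (equation 3).

```python
from fractions import Fraction
from itertools import combinations


def pair_sums(x, w):
    """Return (sum of w_ij*x_i*x_j, sum of x_i*x_j) over species pairs i < j."""
    weighted = plain = 0
    for i, j in combinations(range(len(x)), 2):
        weighted += w[i][j] * x[i] * x[j]
        plain += x[i] * x[j]
    return weighted, plain


def taxonomic_diversity(x, w):
    """Delta: expected path length between two randomly chosen individuals."""
    n = sum(x)
    weighted, _ = pair_sums(x, w)
    return Fraction(weighted, n * (n - 1) // 2)


def taxonomic_distinctness(x, w):
    """Delta*: as Delta, but conditional on the two individuals being from different species."""
    weighted, plain = pair_sums(x, w)
    return Fraction(weighted, plain)


def mean_subsample_diversity(x, w, m):
    """Exact mean of Delta_m over every subsample of m individuals."""
    owner = [i for i, count in enumerate(x) for _ in range(count)]
    values = []
    for subset in combinations(range(len(owner)), m):
        y = [0] * len(x)
        for k in subset:
            y[owner[k]] += 1
        values.append(taxonomic_diversity(y, w))
    return sum(values) / len(values)


# Hypothetical data: abundances of four species and their path-length weights.
x = [3, 2, 1, 2]
w = [[0, 1, 2, 3],
     [1, 0, 2, 3],
     [2, 2, 0, 3],
     [3, 3, 3, 0]]

full = taxonomic_diversity(x, w)
for m in (2, 4, 6):
    # Compare the exact subsample mean with the full-sample value.
    print(m, mean_subsample_diversity(x, w, m) == full)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equality is tested without rounding error. The same enumeration applied to Δ* would only be expected to agree approximately, in line with the Taylor series argument of the Appendix.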

approximately (Δm*) unbiased estimates of the respective true Δ and Δ* for the whole sample, whatever the subset size m. The unbiasedness is also shown to be exact in the particular case where the data only records the presence or absence of species, not their abundances.
Methods

DEFINITION OF INDICES

'Taxonomic diversity' (Warwick & Clarke 1995) is defined, using the notation of Fig. 1, as:

$$\Delta = \frac{\sum\sum_{i<j} w_{ij} x_i x_j + \sum_i 0 \cdot x_i (x_i - 1)/2}{\sum\sum_{i<j} x_i x_j + \sum_i x_i (x_i - 1)/2} \equiv \frac{\sum\sum_{i<j} w_{ij} x_i x_j}{n(n - 1)/2} \tag{eqn 1}$$

where x_i (i = 1, ..., s) denotes the abundance of the ith species, n (= Σ_i x_i) is the total number of individuals in the sample and w_ij is the 'distinctness weight' given to the path length linking species i and j in the hierarchical classification. The double summations are over all pairs of species i and j (with i < j, for sake of definiteness). The first form of equation 1 exemplifies the construction of Δ as a pair-wise, averaged (weighted) path length in the diagram; the (null) second term in the numerator is included here to emphasize the zero path length defined for two individuals of the same species. In more formal statistical terms, Δ is the expected path length between any two randomly chosen individuals from the sample. The second, more succinct, form of equation 1 shows the relationship of Δ to standard diversity indices: when w_ij ≡ 1 (for all i < j), i.e. when the taxonomic hierarchy is ignored, Δ reduces to a form of Simpson diversity (e.g. Pielou 1975), namely:

$$\Delta = \frac{2 \sum\sum_{i<j} p_i p_j}{1 - n^{-1}} = \frac{\left(\sum_i p_i\right)^2 - \sum_i p_i^2}{1 - n^{-1}} = \frac{1 - \sum_i p_i^2}{1 - n^{-1}}, \quad \text{where } p_i = x_i / n. \tag{eqn 2}$$

Indeed, the Simpson index was first constructed from the probability that any two organisms, selected at random from the full set of individuals, are from the same species (Simpson 1949). Taxonomic diversity Δ can therefore be seen as a generalization of Simpson diversity, incorporating an element of taxonomic relatedness.

This motivates the introduction of a second index, 'taxonomic distinctness', Δ* (Warwick & Clarke 1995), which is modified to remove some of the overt dependence of Δ on the species abundance distribution represented by the {x_i}. It divides the taxonomic diversity of equation 1 by the Simpson-type index of equation 2, i.e. Δ is divided by its value when the hierarchical classification collapses to the special case of all species belonging to a single genus. By definition, the resulting ratio, Δ*, must be more nearly a function of pure taxonomic relatedness of individuals. The algebraic definition of taxonomic distinctness is:

$$\Delta^* = \frac{\sum\sum_{i<j} w_{ij} x_i x_j}{\sum\sum_{i<j} x_i x_j} \tag{eqn 3}$$

and an alternative way of viewing this is as the expected (weighted) path length between any two randomly chosen individuals from the sample, conditional on them being from different species. Note that, unlike Δ, the expression for Δ* is invariant to a scale change in x, so that Δ* could incorporate straightforwardly cases where the data are not counts of individuals but, say, total biomass for each species. It would also accommodate the use of transformed counts, e.g. the log(1 + x) or x^0·25 transformations commonly used in multivariate community analysis to down-weight the contributions of dominant species (Clarke & Green 1988). A special case (in a sense, the ultimate down-weighting transformation) is the use only of presence/absence information for each species. The {x_i} are then all thought of as equating to unity (for species that are present) and Δ and Δ* reduce to the same statistic, namely:

$$\Delta^+ = \frac{\sum\sum_{i<j} w_{ij}}{s(s - 1)/2} \tag{eqn 4}$$

where s is the number of species present and, for the double summation, i and j range over these s species.

The statistical results of the paper concentrate only on two cases: when the {x_i} are (untransformed) counts and when only presence/absence information is available. They encompass any weighting scheme for the {w_ij}. The examples in this and the companion paper (Warwick & Clarke 1998), however, all use the simplest possible weights, in the context of the class of free-living marine nematodes: w = 1 (species in the same genus), 2 (same family but different genera), 3 (same suborder but different families), 4 (same order but different suborders), 5 (same subclass but different orders), 6 (different subclasses). For example, treating the subtree in Fig. 1 as a complete sample, a simple (0, 1, 2) weighting of the levels gives, from equations 1, 3 and 4, the values Δ = 1·48, Δ* = 1·82 and Δ+ = 1·80. Note, however, that any other set of increasing weights would honour the structure implied by a taxonomic classification. A natural refinement would be for the weights to depend on the quantitative reduction in taxon richness on moving up the hierarchy, although the number of species per genus, genera per family, etc., would need to be set globally for each faunal group, in some way, for the index to be comparable across studies.

For all three indices, the effect of the denominator terms in equations 1, 3 and 4 is to reduce or eliminate direct dependence on the number of species s: thus in Fig. 3d-f there is an apparently static mean value for Δ, Δ* and Δ+, whatever the subsample size (or, in the case of Fig. 3f, whatever the number of species in the subset). Naturally, the variance increases with decreasing information in all cases. The Δ statistics cannot therefore claim to represent all aspects of a community's diversity: if the results of Fig. 3 have some theoretical generality then they can be seen to partition out from species richness some combination of higher taxonomic spread and evenness, making the claim that average distinctness in a sample can be
reliably estimated, while total distinctness in the sample (which clearly depends on the richness) cannot.

MEANS AND VARIANCES

For the model underlying Fig. 3d,e, the full set of species abundances {x_i; i = 1, ..., s} and the total species richness s are thought of as fixed, and the 'true' taxonomic diversity and distinctness values are given by equations 1 and 3, with the {w_ij} being known weights. A random subsample (without replacement) of a fixed number of organisms, m, is taken from the full set of individuals n (= Σ_i x_i), and Δm and Δm* denote the calculated taxonomic diversity and distinctness values from that subsample. The Appendix (case 1) shows quite generally that Δm is an unbiased estimate of Δ, and Δm* is approximately unbiased for Δ* (using a Taylor series), whatever the subsample size m and the structure of the trees.

An important special case is when only presence/absence information is available, and the subsamples now draw species at random (without replacement): m species from the full set of s. The distinctness for the subsample, Δm+, is again an (exactly) unbiased estimate of Δ+ for the full species set, for any m (see the Appendix, case 2). This firms up the suggestion from Fig. 3f that the mean of a number of repeated subsamples at each size is constant, and there is no subsampling bias.

The Appendix develops these results for mean values formally, in order to set the symbolism for derivation of variance formulae, but note that there is a simpler heuristic explanation for the exact unbiasedness of Δm and Δm+. For example, the expectation of Δm+ is just the expected path length between two randomly selected species from a subset of m species. However, the latter subset is selected randomly from the full set of s species, and a random pair from a random sample of m species (m > 1) is also a random pair from the full set of s species. By definition, Δ+ is the expected path length for a randomly selected pair of species from the full set of s species, so it must follow that E(Δm+) = Δ+. Similar reasoning yields the exact unbiasedness result for Δm but not for Δm*, because of the conditionality clause in its definition; recourse needs to be made to the Taylor series approximation of the Appendix.

The Appendix then goes on to show that the variance of the subsample estimate Δm+ has the following form:

$$\mathrm{var}(\Delta_m^+) = \frac{2(s - m)}{m(m - 1)(s - 2)(s - 3)} \left[ (s - m - 1)\,\sigma_w^2 + 2(s - 1)(m - 2)\,\sigma_{\bar{w}}^2 \right] \tag{eqn 5}$$

where:

$$\sigma_w^2 = \left[ \left( \sum_i \sum_{j (\neq i)} w_{ij}^2 \right) / s(s - 1) \right] - \bar{w}^2 \tag{eqn 6}$$

$$\sigma_{\bar{w}}^2 = \left[ \left( \sum_i \bar{w}_i^2 \right) / s \right] - \bar{w}^2 \tag{eqn 7}$$

$$\bar{w}_i = \left( \sum_{j (\neq i)} w_{ij} \right) / (s - 1) \tag{eqn 8}$$

$$\bar{w} = \left( \sum_i \bar{w}_i \right) / s = \left( \sum_i \sum_{j (\neq i)} w_{ij} \right) / [s(s - 1)] \equiv \Delta^+. \tag{eqn 9}$$

These two σ² terms are straightforward properties of the taxonomic tree for the full species set, with σ_w² corresponding to the variance of all path lengths {w_ij} between different species, and σ_w̄² the variance of the mean path lengths {w̄_i} from each species to all others. Note that equation 5 is an exact result, not a Taylor series approximation.

These sampling properties now motivate a statistical test for increase or decrease in observed taxonomic distinctness, based either on direct simulation or approximate confidence intervals (of the usual mean ± 2 SD form), constructed from the variance expression of equation 5.

Results

A PRACTICAL TEST FOR CHANGE IN TAXONOMIC DISTINCTNESS

The fact that, for presence/absence data, the distinctness estimate (Δm+) from a subset of m species unbiasedly estimates the distinctness (Δ+) of the full set, suggests the following test scenario for situations in which, at first sight, no valid diversity comparisons seem possible. The starting assumption is that there exists a reasonably comprehensive species list (inventory) for a region, within which certain localities are postulated to have reduced diversity. If the only data available at these localities are local species lists from one-off studies, and there is no control of the sampling effort expended in each location (or in constructing the regional inventory), then the only conventional diversity measure calculable - the number of species found at each locality - is uninterpretable. However, the above results show that one can unbiasedly compare taxonomic distinctness at a locality with that for the global list. For the null hypothesis of no difference, a randomization test can be performed by repeatedly subsampling species sets of size m, drawn at random from the global list, and constructing the histogram of the resulting Δm+ estimates. These will centre around the global distinctness of Δ+ and the spread of the simulated values can be used to determine if the observed Δm+ for that locality is at variance with the null hypothesis.

Figure 4 is based on a UK species list for free-living marine nematodes (s = 395; see the companion paper Warwick & Clarke 1998), a nematode species list (m = 122) from combined core samples taken over the course of a year at eight sandy sites in the Exe estuary, England, UK (Warwick 1971), and a further nematode species list (m = 111) from six sandy sites in the Clyde estuary, Scotland, UK (Lambshead 1986). For a total of 1000 random samples of size m = 122 (for Fig. 4a), and a further 1000 random samples with m = 111 (for Fig. 4b), drawn from the global list, the Δm+ estimates give the histograms of Fig. 4a,b, showing the typically rather narrow range of distinctness values commensurate with the null hypothesis for
[Figure 4: histograms of frequency against taxonomic distinctness Δ+ (x-axis 4·4 to 4·9) for (a) Exe sands, with Δ+ = 4·75 marked, and (b) Clyde sands.]

Fig. 4. Histogram of Δ+ values for 1000 random subsamples of a fixed number m of species, from a full list of free-living marine nematodes of the UK (s = 395 species): (a) m = 122, (b) m = 111, corresponding to the sublist sizes for combined samples at intertidal sandy sites in the Exe and Clyde estuaries, respectively. The true Δ+ values for both localities are also indicated: for the Clyde, the null hypothesis that the average distinctness equates with that for the UK as a whole is clearly rejected (P < 0·1%).
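
The randomization procedure behind these histograms can be sketched in a few lines of Python. The UK nematode list is not available here, so the master list below is a small hypothetical classification (3 families, each with 2 genera of 3 species); the function names, the fixed random seed and the `n_sim // 40` rule (which gives the 25th lowest and 25th highest of 1000 values) are assumptions made for illustration.

```python
import random
from itertools import combinations


def weight_matrix(taxa):
    """Path-length weights from tuples ordered (family, genus, species)."""
    def weight(a, b):
        shared = 0
        for x, y in zip(a, b):
            if x != y:
                break
            shared += 1
        return len(a) - shared
    return [[weight(a, b) for b in taxa] for a in taxa]


def delta_plus(members, w):
    """Average path length over all pairs in a collection of species indices."""
    pairs = list(combinations(members, 2))
    return sum(w[i][j] for i, j in pairs) / len(pairs)


def randomization_test(w, m, observed, n_sim=1000, seed=0):
    """Compare an observed Delta+ with random sublists of m species.

    Returns (reject, simulated values); reject is True when the observed
    value lies below the k-th lowest or above the k-th highest simulated
    value, with k = n_sim // 40 (25 when n_sim = 1000).
    """
    rng = random.Random(seed)
    sims = sorted(delta_plus(rng.sample(range(len(w)), m), w)
                  for _ in range(n_sim))
    k = max(1, n_sim // 40)
    reject = observed < sims[k - 1] or observed > sims[-k]
    return reject, sims


# Hypothetical master list: 3 families x 2 genera x 3 species = 18 species.
master = [(f"F{f}", f"F{f}G{g}", f"F{f}G{g}s{k}")
          for f in range(3) for g in range(2) for k in range(3)]
w = weight_matrix(master)
global_value = delta_plus(range(len(master)), w)

# An observed value of 1.0 is lower than any six-species sublist can give here.
print(randomization_test(w, 6, observed=1.0)[0])
# An observed value equal to the master-list Delta+ sits near the centre.
print(randomization_test(w, 6, observed=global_value)[0])
```

In practice `observed` would be the Δ+ computed from a locality's own species list, and `m` the length of that list.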

these subsample sizes. The true Δm+ for the Exe estuary sands, of 4·75, lies centrally to the distribution of Fig. 4a and therefore provides no evidence of a different average distinctness at this locality than in the UK region as a whole. To reject the null hypothesis, at approximately the 5% level, the true Δm+ would need to fall below the 25th lowest (of 1000) simulated Δm+ values in the histogram, or above the 25th highest. In contrast, the true Δm+ for the Clyde sands (4·46) is below this lower limit in Fig. 4b and in fact it is smaller than any of the 1000 simulated values, so there is significant evidence of a lower taxonomic distinctness here than for the UK as a whole (P < 0·1%).

The computational burden of this large number of simulations, which needs to be repeated for every locality under test (with a different species subset size), can be heavy, although not usually prohibitive. A much faster, approximate procedure is provided by the variance formula of equation 5. The constants σ_w² and σ_w̄² in this expression are a function only of the tree structure of the global list (of s species) and need to be calculated only once (for all marine nematodes of the UK, for example). The variance expression is then a rather simple function of subsample size m and these constants, so that an approximate 95% confidence 'funnel' (mean ± 2 SD) can easily be constructed over the full range of m-values. Here the mean is equal to Δ+ for the global list (= 4·72 for UK marine nematodes) and the SD is the square root of the variance expression in equation 5. Figure 5 displays this funnel (the smooth, darker lines) and contrasts it with the results of extensive simulation runs (the circles, joined by lighter lines) for subset lists of m = 10, 15, 20, 25, ..., 350 species. At each point there are 1000 random selections and the circles denote the 25th lowest and 25th highest distinctness values (simulated 95% confidence limits). There is clear evidence of a left-skewed distribution for Δm+ in this case (as also for the Chilean nematode data of the companion paper; Warwick & Clarke 1998) but the normality approximation to the lower confidence limit (the important limit in practice) is good enough to suggest that this may be a useful short-cut to the full randomization procedure, in non-borderline cases, when computing power is limiting. An improved empirical fit could doubtless be constructed from an expression for the third moment of Δm+.

Discussion

As shown in Fig. 5, distinctness values for any specific locality, habitat type, pollution condition, etc., can be plotted on the confidence funnel created from a regional species list, to test for significant departures from the null hypothesis (that a particular subsample behaves, in terms of its pair-wise average distinctness, as if it were a random sample from the larger list). The companion paper, Warwick & Clarke (1998), applies and interprets this method in a range of situations.

It is perhaps surprising that a diversity test of any sort should be possible in a case where sampling effort is uncontrolled and the only data consist of presence or absence of species. Indeed, the test could not be expected to have the same sensitivity as that obtainable from a wider range of diversity measures (or multivariate analysis) calculable from abundance data in carefully standardized sampling plans. The key point to recognize here is that certain diversity features, most obviously the number of species recorded in a sample, are highly dependent on the sampling regime, and can only be straightforwardly compared under conditions of comparable sampling effort. The same caveats will apply to other diversity totals, such as PD, the total phylogenetic or taxonomic branch length in a subtree for a particular
[Figure 5: 'UK nematodes'. Taxonomic distinctness Δ+ (y-axis, 4·0 to 5·4) against subset size m (x-axis, 0 to 400), showing the simulated mean (true mean is 4·72), the simulated 95% confidence limits and the theoretical 95% confidence limits, with points marked for Exe sands and Clyde sands.]

Fig. 5. Confidence funnels for the Δ+ randomization test, from the all-UK list of marine nematode species. Circles correspond to direct randomization results for each sublist size, and smooth (thick) lines to approximate limits using the variance formula of equation 5. The dashed line gives the mean Δ+ over each simulation, confirming the theoretical unbiasedness result (Δ+ = 4·72 for the full set of 395 species).
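
A funnel of this kind can be sketched directly from the variance expression of equation 5. In the Python below, the variance is written as 2(s - m)/[m(m - 1)(s - 2)(s - 3)] times [(s - m - 1)σ_w² + 2(s - 1)(m - 2)σ_w̄²], which is one reading of the printed formula; as a check on that reading, the result is compared with the variance obtained by enumerating every sublist of a small hypothetical tree. The six-species weight matrix and all function names are assumptions for illustration, and the formula requires s ≥ 4 and m ≥ 2.

```python
from itertools import combinations
from math import sqrt


def tree_constants(w):
    """Return (mean path length, variance of path lengths, variance of row means)."""
    s = len(w)
    off_diag = [w[i][j] for i in range(s) for j in range(s) if i != j]
    w_bar = sum(off_diag) / (s * (s - 1))
    var_w = sum(v * v for v in off_diag) / (s * (s - 1)) - w_bar ** 2
    row_means = [sum(w[i][j] for j in range(s) if j != i) / (s - 1)
                 for i in range(s)]
    var_w_bar = sum(v * v for v in row_means) / s - w_bar ** 2
    return w_bar, var_w, var_w_bar


def variance_delta_plus(w, m):
    """Variance of Delta+ for a random sublist of m species (equation 5)."""
    s = len(w)
    _, var_w, var_w_bar = tree_constants(w)
    factor = 2 * (s - m) / (m * (m - 1) * (s - 2) * (s - 3))
    return factor * ((s - m - 1) * var_w + 2 * (s - 1) * (m - 2) * var_w_bar)


def funnel(w, m_values):
    """Approximate 95% limits (mean - 2 SD, mean + 2 SD) for each sublist size."""
    mean, _, _ = tree_constants(w)
    limits = []
    for m in m_values:
        sd = sqrt(max(variance_delta_plus(w, m), 0.0))  # guard against rounding
        limits.append((m, mean - 2 * sd, mean + 2 * sd))
    return limits


def enumerated_variance(w, m):
    """Variance of Delta+ over every sublist of m species (brute-force check)."""
    n_pairs = m * (m - 1) / 2
    values = [sum(w[i][j] for i, j in combinations(sub, 2)) / n_pairs
              for sub in combinations(range(len(w)), m)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical six-species tree (weights 1 = same genus, 2 = same family, 3 = otherwise).
w = [[0, 1, 2, 3, 3, 3],
     [1, 0, 2, 3, 3, 3],
     [2, 2, 0, 3, 3, 3],
     [3, 3, 3, 0, 1, 3],
     [3, 3, 3, 1, 0, 3],
     [3, 3, 3, 3, 3, 0]]

for m, low, high in funnel(w, range(2, 7)):
    print(m, round(low, 3), round(high, 3))
```

Because the two variance constants depend only on the master list, `tree_constants` needs to be evaluated once per faunal group; only the factor involving m changes along the funnel.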

locality/condition. They will not apply in general to average properties, such as the pair-wise taxonomic distinctness indices discussed here or, possibly, an average phylogenetic diversity, defined as PD/s. (Note though that, as pointed out earlier, the latter would have certain interpretational drawbacks: average PD takes the same value for Fig. 2c,d. It is also true that average PD calculated from a randomly selected sublist of m species does not unbiasedly estimate average PD for the total list of s species, a fact that can be seen as further limiting the usefulness of this possible alternative formulation.)

Thus, for historic data and/or meta-analyses in which results from different workers are contrasted, there may be little choice but to recognize that only certain aspects of diversity, such as average taxonomic distinctness, may be validly compared. This raises a final question, on the extent to which the comparability of Δ+ is compromised by the differing taxonomic identification skills of different workers. In fact, for Δm+ to remain unbiased for Δ+, it is not necessary to assume that all workers are equally efficient, only that taxonomic accuracy is independent of the taxonomic relatedness of the species involved. To put it another way, certain workers may miss (or misidentify) species but, provided they do so at random across the species pool, in effect the test remains unchanged. (Whether low numbers of species are found because of low sampling effort or a low identification rate is then irrelevant to the construction of Δ+.) Whether such an independence scenario is reasonable in practice is discussed further in the companion paper (Warwick & Clarke 1998).

Acknowledgements

This work forms part of the Marine Biodiversity project of the CCMS Plymouth Marine Laboratory, Natural Environment Research Council, UK, and is part-funded by the UK Ministry of Agriculture, Fisheries and Food (project no. AEII13). We thank Paul Harvey (University of Oxford) for useful contextual comments, and also an anonymous referee who contributed helpful insights into the derivation of results.

References

Clarke, K.R. & Green, R.H. (1988) Statistical design and analysis for a 'biological effects' study. Marine Ecology Progress Series, 46, 213-226.
Faith, D.P. (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation, 61, 1-10.
Faith, D.P. (1994) Phylogenetic pattern and the quantification of organismal biodiversity. Philosophical Transactions of the Royal Society of London Series B, 345, 45-58.
Harper, J.L. & Hawksworth, D.L. (1994) Biodiversity: measurement and estimation. Preface. Philosophical Transactions of the Royal Society of London Series B, 345, 5-12.
Humphries, C.J., Williams, P.H. & Vane-Wright, R.I. (1995) Measuring biodiversity value for conservation. Annual Review of Ecology and Systematics, 26, 93-111.
Lambshead, P.J.D. (1986) Sub-catastrophic sewage and industrial waste contamination as revealed by marine nematode faunal analysis. Marine Ecology Progress Series, 29, 247-260.
May, R.M. (1990) Taxonomy as destiny. Nature, 347, 129-130.
Pielou, E.C. (1975) Ecological Diversity. Wiley, New York.
Simpson, E.H. (1949) Measurement of diversity. Nature, 163, 688.
Vane-Wright, R.I., Humphries, C.J. & Williams, P.H. (1991) What to protect? Systematics and the agony of choice. Biological Conservation, 55, 235-254.
Warwick, R.M. (1971) Nematode associations in the Exe estuary. Journal of the Marine Biological Association of the United Kingdom, 51, 439-454.
Warwick, R.M. & Clarke, K.R. (1995) New 'biodiversity' measures reveal a decrease in taxonomic distinctness with increasing stress. Marine Ecology Progress Series, 129, 301-305.
Warwick, R.M. & Clarke, K.R. (1998) Taxonomic distinctness and environmental assessment. Journal of Applied Ecology, 35, 532-543.
Williams, P.H., Humphries, C.J. & Vane-Wright, R.I. (1991) Measuring biodiversity: taxonomic relatedness for conservation priorities. Australian Systematic Botany, 4, 665-679.

Received 31 August 1997; revision received 2 April 1998

Appendix

CASE 1: RANDOM SUBSAMPLES OF INDIVIDUALS

For the situation represented by Fig. 3d,e, the exact unbiasedness of $\Delta_m$ and asymptotic unbiasedness of $\Delta_m^*$, as estimators for $\Delta$ and $\Delta^*$, respectively, is demonstrated under random subsampling (without replacement) of $m$ organisms from the full set of $n$. The species abundances $\{x_i;\ i = 1, \ldots, s;\ \sum_i x_i = n\}$ and the total number of species $s$ are thought of as fixed, with the taxonomic diversity $\Delta$ and distinctness $\Delta^*$ for the full data set given by equations 1 and 3 of the main text. For a fixed-size ($m$) subset of individuals, denote the abundances of each of the $s$ species by $\{Y_i;\ i = 1, \ldots, s;\ \sum_i Y_i = m\}$, capital letters reflecting the fact that these are the 'random variables'. The estimators of $\Delta$ and $\Delta^*$ from a sample of size $m$ are:

$$\Delta_m = \Big[\sum\sum_{i<j} \omega_{ij} Y_i Y_j\Big] \Big/ [m(m-1)/2] \qquad \text{eqn A1}$$

$$\Delta_m^* = \Big[\sum\sum_{i<j} \omega_{ij} Y_i Y_j\Big] \Big/ \Big[\sum\sum_{i<j} Y_i Y_j\Big]. \qquad \text{eqn A2}$$

The $\{Y_i\}$ are jointly hypergeometric, with probability distribution:

$$\Pr(Y_1 = y_1, Y_2 = y_2, \ldots, Y_s = y_s) = \binom{x_1}{y_1}\binom{x_2}{y_2}\cdots\binom{x_s}{y_s} \Big/ \binom{n}{m} \qquad \text{eqn A3}$$

and mean values:

$$E(Y_i) = m x_i / n \quad (i = 1, \ldots, s). \qquad \text{eqn A4}$$

Using the fact that the expectation of a sum of random variables is the sum of the expectations, even when non-independent, the expectation of $\Delta_m$ is:

$$E(\Delta_m) = \Big[\sum\sum_{i<j} \omega_{ij} E(Y_i Y_j)\Big] \Big/ [m(m-1)/2]. \qquad \text{eqn A5}$$

It can be shown from equation A3 that:

$$E(Y_i Y_j) = [m(m-1) x_i x_j] / [n(n-1)] \quad (i, j = 1, \ldots, s;\ i \neq j) \qquad \text{eqn A6}$$

so that:

$$E(\Delta_m) = \Big[\sum\sum_{i<j} \omega_{ij} x_i x_j\Big] \Big/ [n(n-1)/2] \equiv \Delta. \qquad \text{eqn A7}$$

In a similar way, but this time utilizing an asymptotic Taylor series expansion to express the mean of a ratio as approximately the ratio of the means, and again using equation A6:

$$E(\Delta_m^*) \approx \Big[\sum\sum_{i<j} \omega_{ij} E(Y_i Y_j)\Big] \Big/ \Big[\sum\sum_{i<j} E(Y_i Y_j)\Big] = \Big[\sum\sum_{i<j} \omega_{ij} x_i x_j\Big] \Big/ \Big[\sum\sum_{i<j} x_i x_j\Big] \equiv \Delta^*. \qquad \text{eqn A8}$$

CASE 2: RANDOM SUBLISTS OF SPECIES

The exact unbiasedness of the $\Delta_m^+$ estimator for $\Delta^+$ is demonstrated for random sublists of $m$ species drawn (without replacement) from the full list of $s$ species. In addition, the exact variance of $\Delta_m^+$ is derived, as a basis for confidence funnels, such as that in Fig. 5.

This is a special case of the formulation in case 1, with abundances taking the values $x_i \equiv 1$ for all $i = 1, \ldots, s$ species present in the full set; the taxonomic distinctness $\Delta^+$ of the full tree (395 species in the UK nematode example of Fig. 5) is given by equation 4 of the main text. For a fixed-size ($m$) sublist of species, drawn randomly without replacement, the random variables $\{Y_i;\ i = 1, \ldots, s;\ \sum_i Y_i = m\}$ now take only the values 0 or 1 ('indicator' variables), dependent on whether the $i$th species is absent or present, respectively, from the sublist. The taxonomic distinctness $\Delta_m^+$ for the sublist is defined as:

$$\Delta_m^+ = \Big[\sum\sum_{i<j} \omega_{ij} Y_i Y_j\Big] \Big/ [m(m-1)/2] \qquad \text{eqn A9}$$

where the double summation is over all species $\{i, j = 1, \ldots, s;\ i < j\}$.

The joint probability distribution of the $\{Y_i\}$ now reduces to the simple case:

$$\Pr(Y_1 = y_1, Y_2 = y_2, \ldots, Y_s = y_s) = 1 \Big/ \binom{s}{m} \qquad \text{eqn A10}$$

i.e. all combinations of $m$ species drawn from $s$ are equally likely. It follows that:

$$E(Y_i) = m/s, \quad E(Y_i Y_j) = [m(m-1)] / [s(s-1)] \quad (i, j = 1, \ldots, s;\ i \neq j) \qquad \text{eqn A11}$$

and:

$$E(\Delta_m^+) = \Big[\sum\sum_{i<j} \omega_{ij} E(Y_i Y_j)\Big] \Big/ [m(m-1)/2] = \Big[\sum\sum_{i<j} \omega_{ij}\Big] \Big/ [s(s-1)/2] \equiv \Delta^+ \qquad \text{eqn A12}$$

establishing the unbiasedness of $\Delta_m^+$ as an estimator of $\Delta^+$.

Derivation of the variance result, of equation 5 of the main text, starts from a modified form of equation A9, using the symmetry in the weights ($\omega_{ij} \equiv \omega_{ji}$) and the standard formula for the variance of a sum of random variables:

$$\operatorname{var}(\Delta_m^+) = \operatorname{var}\Big(\Big[\sum_i \sum_{j(\neq i)} \omega_{ij} Y_i Y_j\Big] \Big/ [m(m-1)]\Big) = [m(m-1)]^{-2} \sum_i \sum_{j(\neq i)} \Big[\sum_k \sum_{r(\neq k)} \omega_{ij}\, \omega_{kr} \operatorname{cov}(Y_i Y_j, Y_k Y_r)\Big] \qquad \text{eqn A13}$$

where the four summations are all in the range $1, \ldots, s$. Consider now only the inner pair of summations, over $(k, r)$, for a specific $(i, j)$ (with $j \neq i$). Under the various combinations of subscript equivalences, e.g. $(k = i, r = j)$, $(k = i, r \neq i \text{ or } j)$, $(k \neq i, r = j)$, etc., only three different covariance terms emerge (note that in equations A14-A19 subscripts $i$, $j$, $k$, $r$ all differ):

$$\text{(i)}\quad \operatorname{cov}(Y_i Y_j, Y_i Y_j) \equiv \operatorname{var}(Y_i Y_j) = a - a^2 \qquad \text{eqn A14}$$

where, because the $\{Y_i\}$ are indicator variables taking only the values (0, 1):
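$$a = E(Y_i^2 Y_j^2) \equiv E(Y_i Y_j) = \Pr\{Y_i = 1, Y_j = 1\} = [m(m-1)] / [s(s-1)] \qquad \text{eqn A15}$$

$$\text{(ii)}\quad \operatorname{cov}(Y_i Y_j, Y_i Y_r) = b - a^2 \qquad \text{eqn A16}$$

where:

$$b = E(Y_i^2 Y_j Y_r) = \Pr\{Y_i = 1, Y_j = 1, Y_r = 1\} = [m(m-1)(m-2)] / [s(s-1)(s-2)] \qquad \text{eqn A17}$$

$$\text{(iii)}\quad \operatorname{cov}(Y_i Y_j, Y_k Y_r) = c - a^2 \qquad \text{eqn A18}$$

where:

$$c = E(Y_i Y_j Y_k Y_r) = [m(m-1)(m-2)(m-3)] / [s(s-1)(s-2)(s-3)]. \qquad \text{eqn A19}$$

Summing over the inner pair of subscripts $(k, r)$ in equation A13 gives the $(i, j)$th term as:

$$\omega_{ij}\big[2(a - a^2)\,\omega_{ij} + 2(b - a^2)(\omega_{i\circ} + \omega_{\circ j} - 2\omega_{ij}) + (c - a^2)(\omega_{\circ\circ} - 2\omega_{i\circ} - 2\omega_{\circ j} + 2\omega_{ij})\big] \qquad \text{eqn A20}$$

where, in standard statistical notation, a circle indicates summation across that subscript:

$$\omega_{i\circ} = \sum_{k(\neq i)} \omega_{ik}, \quad \omega_{\circ j} = \sum_{k(\neq j)} \omega_{kj}, \quad \omega_{\circ\circ} = \sum_i \sum_{j(\neq i)} \omega_{ij}. \qquad \text{eqn A21}$$

Substituting equation A20 into equation A13, the summation over the two outer subscripts $(i, j)$ is:

$$(c - a^2)\,\omega_{\circ\circ}^2 + 4(b - c) \sum_i \omega_{i\circ}^2 + 2(a - 2b + c) \sum_i \sum_{j(\neq i)} \omega_{ij}^2 \qquad \text{eqn A22}$$

and using the definitions of $\bar{\omega}$ ($\equiv \Delta^+$) and the 'variance-like' properties $\sigma_\omega^2$ and $\sigma_{\bar{\omega}}^2$ in equations 6-9 of the main text, equation A13 becomes:

$$\operatorname{var}(\Delta_m^+) = m^{-2}(m-1)^{-2}\, s(s-1)\big\{2(a - 2b + c)\,\sigma_\omega^2 + 4(b - c)(s-1)\,\sigma_{\bar{\omega}}^2 + [2(a - 2b + c) + 4(b - c)(s-1) + (c - a^2)\,s(s-1)]\,\bar{\omega}^2\big\} \qquad \text{eqn A23}$$

On substituting for $a$, $b$ and $c$ from equations A15, A17 and A19, the coefficient of $\bar{\omega}^2$ disappears, and the desired variance formula is obtained:

$$\operatorname{var}(\Delta_m^+) = 2(s - m)\,[m(m-1)(s-2)(s-3)]^{-1}\big[(s - m - 1)\,\sigma_\omega^2 + 2(s-1)(m-2)\,\sigma_{\bar{\omega}}^2\big]. \qquad \text{eqn A24}$$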
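The Case 2 results can be checked numerically on a small example. The sketch below is illustrative only: the six-species classification, the step lengths (1, 2, 3) and all function names are hypothetical, and the forms used for $\sigma_\omega^2$ and $\sigma_{\bar{\omega}}^2$ are assumptions (equations 6-9 of the main text are not reproduced in this appendix), chosen so that the step from eqn A22 to eqn A23 holds. It enumerates every sublist of $m$ species and compares the resulting mean and variance with eqn A12 and eqn A24.

```python
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical toy classification: (genus, family) for each of s = 6 species.
TAXA = [("A", "X"), ("A", "X"), ("B", "X"), ("C", "Y"), ("C", "Y"), ("D", "Z")]


def path_length(a, b):
    """Assumed step lengths: 1 within a genus, 2 within a family, 3 otherwise."""
    if a[0] == b[0]:
        return 1.0
    if a[1] == b[1]:
        return 2.0
    return 3.0


def weight_matrix(taxa):
    """Symmetric matrix of path lengths omega_ij, with a zero diagonal."""
    s = len(taxa)
    return [[0.0 if i == j else path_length(taxa[i], taxa[j]) for j in range(s)]
            for i in range(s)]


def delta_plus(w, species):
    """Mean path length over all pairs of the listed species (form of eqn A9)."""
    pairs = list(combinations(species, 2))
    return sum(w[i][j] for i, j in pairs) / len(pairs)


def variance_a24(w, m):
    """Variance of the sublist estimator from eqn A24 (needs s >= 4, m >= 2)."""
    s = len(w)
    off_diag = [w[i][j] for i in range(s) for j in range(s) if j != i]
    grand_mean = sum(off_diag) / (s * (s - 1))  # equals Delta+ of the full list
    row_means = [sum(w[i][j] for j in range(s) if j != i) / (s - 1)
                 for i in range(s)]
    # Assumed definitions of the two 'variance-like' quantities.
    sigma_omega2 = sum(x * x for x in off_diag) / (s * (s - 1)) - grand_mean ** 2
    sigma_mean2 = sum(r * r for r in row_means) / s - grand_mean ** 2
    factor = 2 * (s - m) / (m * (m - 1) * (s - 2) * (s - 3))
    return factor * ((s - m - 1) * sigma_omega2
                     + 2 * (s - 1) * (m - 2) * sigma_mean2)


def brute_force(w, m):
    """Mean and variance of Delta+ over all equally likely sublists of size m."""
    values = [delta_plus(w, sub) for sub in combinations(range(len(w)), m)]
    return mean(values), pvariance(values)


W = weight_matrix(TAXA)
FULL = delta_plus(W, range(len(W)))

for m in (2, 3, 4, 5):
    mu, var = brute_force(W, m)
    print(m, mu, FULL, var, variance_a24(W, m))
```

For each $m$, the enumerated mean should agree with the full-list $\Delta^+$ and the enumerated variance with eqn A24, up to floating-point rounding. Exhaustive enumeration is only feasible for very small $s$; for a list the size of the nematode example, the closed form is what makes a confidence funnel practical.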

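The Case 1 result (eqn A7) can be checked by exhaustive enumeration on a small data set. The sketch below is a hypothetical illustration: the abundances, the weight matrix and the function names are all assumptions, not values from the paper. It treats every subset of $m$ individuals as equally likely, which is the hypergeometric model of eqn A3, and averages the estimator of eqn A1 over all subsets.

```python
from itertools import combinations
from statistics import mean

# Hypothetical abundances x_i for s = 4 species (n = 8 individuals in total).
ABUNDANCES = [3, 2, 2, 1]
# Hypothetical symmetric path lengths omega_ij, zero on the diagonal.
OMEGA = [
    [0, 1, 2, 3],
    [1, 0, 2, 3],
    [2, 2, 0, 3],
    [3, 3, 3, 0],
]
# One species label per individual organism.
POPULATION = [sp for sp, x in enumerate(ABUNDANCES) for _ in range(x)]


def taxonomic_diversity(individuals, w):
    """Eqn A1: mean path length over all pairs of individuals.

    Pairs from the same species contribute zero, so the numerator equals the
    double sum of omega_ij * Y_i * Y_j over species pairs i < j.
    """
    pairs = list(combinations(individuals, 2))
    return sum(w[a][b] for a, b in pairs) / len(pairs)


def taxonomic_distinctness(individuals, w):
    """Eqn A2: mean path length over pairs of individuals from different species."""
    pairs = [(a, b) for a, b in combinations(individuals, 2) if a != b]
    return sum(w[a][b] for a, b in pairs) / len(pairs)


def expected_over_subsamples(stat, population, m, w):
    """Average a statistic over all equally likely subsamples of m individuals."""
    return mean(stat(sub, w) for sub in combinations(population, m))


DELTA = taxonomic_diversity(POPULATION, OMEGA)
DELTA_STAR = taxonomic_distinctness(POPULATION, OMEGA)

# m = 4 exceeds the largest abundance, so every subsample holds at least two species.
print(expected_over_subsamples(taxonomic_diversity, POPULATION, 4, OMEGA), DELTA)
print(expected_over_subsamples(taxonomic_distinctness, POPULATION, 4, OMEGA), DELTA_STAR)
```

The first printed pair should agree up to rounding, reflecting the exact unbiasedness of $\Delta_m$. The second pair need not agree exactly, since eqn A8 is only an asymptotic approximation. Note that `taxonomic_distinctness` is undefined for a subsample drawn entirely from one species, which is why $m$ is chosen larger than the largest abundance here.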