Ecological Informatics
journal homepage: www.elsevier.com/locate/ecolinf
Using multi-target clustering trees as a tool to predict biological water quality indices
based on benthic macroinvertebrates and environmental parameters in the
Chaguana watershed (Ecuador)
Luis Dominguez-Granda a, b, Koen Lock a,*, Peter L.M. Goethals a
Article info
Article history:
Received 22 December 2009
Received in revised form 23 May 2011
Accepted 24 May 2011
Available online 1 June 2011
Keywords:
Aquatic insects
Biological indices
Multitarget clustering trees
Diversity
Macroinvertebrates
Richness
Abstract
Macroinvertebrates were sampled in the Chaguana river basin in SW Ecuador in the wet season (March) and
the dry season (September) of 2005 and 2006. To assess the robustness of several biological indicators,
correlations were calculated between both years and between the wet and the dry season. In addition, it was
tested whether the indices gave significantly different results for sites with a bad, poor, moderate and good ecological
water quality. Composition measures performed poorly in most cases. Abundance, diversity and
richness measures often performed better, and tolerance measures, the so-called biotic indices, performed
very well, including indices developed for temperate regions. By using pruned multitarget clustering trees, it was
possible to predict several well-performing ecological water quality indices simultaneously on the basis of the
occurring key macroinvertebrate taxa or, alternatively, on the basis of key environmental variables. In
contrast to unpruned trees, which were complex, difficult to interpret and performed
worse, pruned trees were transparent. Water quality indices scored high when Hydropsychidae
were present and even higher when Megapodagrionidae were also present. When neither
Hydropsychidae nor Libellulidae were present, the indices reached the lowest scores. However, this model
based on key taxa occurrences did not perform well during validation. Water quality indices scored higher
with increasing dissolved oxygen concentrations and a strong current velocity. The latter model based on
environmental variables also performed well during validation. In the presented study, the ecological water
quality could thus be accurately predicted solely on the basis of dissolved oxygen concentration and current
velocity. It can therefore be concluded that multitarget clustering trees can be easily used as a practical tool for
cost-effective decision support by water quality managers.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
Assessment of river health using biological methods is currently
commonplace in most temperate countries. Several of these methods
have been standardized and included in national and regional
monitoring programs (De Pauw et al., 2006; Hering et al., 2003),
serving as a basis for policy decisions concerning surface water
management. However, this is not the case in most tropical countries,
where physical–chemical methods, some of which require expensive
laboratory analysis, are predominantly used to assess running water
quality. Since most tropical regions consist of developing countries,
their limited technical and financial resources for environmental
issues constrain the establishment of national monitoring programs
and therefore, cost-effective monitoring programs are needed. After a
* Corresponding author at: Ghent University, Laboratory of Environmental Toxicology and Aquatic Ecology, J. Plateaustraat 22, 9000 Ghent, Belgium. Tel.: +32 9 2643996;
fax: +32 9 2643766.
E-mail address: Koen.Lock@UGent.be (K. Lock).
1574-9541/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.ecoinf.2011.05.004
et al., 2009). The latter technique was already used to predict the
presence of several alien macroinvertebrates based on the measured
environmental variables (Everaert et al., in press) and is potentially also
a suitable tool in water quality assessments.
The general objectives of the present study were (1) to evaluate the
suitability of several biological indices for water quality assessment of
the Chaguana watershed in Ecuador and (2) to use multi-target
clustering trees to predict several reliable biological indices simultaneously on the basis of the presence of key macroinvertebrate taxa or,
alternatively, based on key environmental variables. In this way, it was
assessed whether multi-target clustering trees could be used as a tool for
cost-effective water quality assessment.
Fig. 1. Location of the sampling stations in the Chaguana watershed, with indication of the sites with a high (black), an intermediate (grey) and a low human impact (white).
conflicts. In total, 104 samples were taken: 29 during the dry season of
2005, 29 during the wet season of 2005, 24 during the dry season of 2006
and 22 during the wet season of 2006. Macroinvertebrate samples were
always collected by the same operator by means of a standard hand net
consisting of a metal frame holding a conical net (20 × 30 cm, 300 µm
mesh size). Active sampling lasted 3 min in 2005 and was
increased to 10 min in 2006.
Organisms were collected from the different habitats present at the
sampling site. Riffle habitats were sampled by holding the net
downstream while the operator disturbed the substratum by kicking
directly in front of the net opening. Stream edge habitats were sampled
by vigorously sweeping along the stream margins disturbing bottom
and bank substratum. The objective of the sampling was to collect the
most representative taxa of macroinvertebrates at the site examined.
After separation, macroinvertebrates were identified under a stereomicroscope. Taxonomic knowledge of the fauna of Ecuadorian
streams is still scarce; therefore, aquatic insects were identified to family
level with the available literature containing identification keys and
descriptions of the riverine fauna of the region (Domínguez et al., 1994;
Fernández and Dominguez, 2001; Roldán, 1988). Non-insects were
mostly identified at higher taxonomic levels.
2.3. Indices
This study deals with the performance of biological methods
developed in Western Europe, North, South and Central America,
Africa, Asia and Australia for the assessment of the Chaguana river
basin. The Biological Monitoring Working Party (BMWP) (Armitage et
al., 1983), which is the water quality index used in the UK, has its
origin in the Trent Biotic Index, the first biotic index developed for the
assessment of running water. The BMWP was improved by Walley
and Hawkes (1996, 1997). If the BMWP index is divided by the
number of scoring families present in the taxa list, the result is known
as the Average Score Per Taxon (ASPT) index. The BMWP was adapted
for Colombia (BMWP/Col) by Roldán (2003) and for Costa Rica
(BMWP/CR) by Astorga et al. (1997). The Stream Invertebrate Grade
Number Average Level index (SIGNAL) was developed by Chessman
(1995) for the assessment of organic pollution in running waters of SE
Australia and later adjusted for application in the whole country
(Chessman, 2003). This index can be calculated with and without
abundance weighting. The Nepalese Biotic Score (NEPBIOS)
(Sharma and Moog, 1998) is also an adaptation of the BMWP, and Mustow
(2002) developed the BMWP for Thailand (BMWP THAI). The South
African Scoring System (SASS) was originally developed by Chutter
(1972) for river quality assessment in South Africa. Several improvements have been made since, and the fifth version (SASS5) was
applied here (Dickens and Graham, 2002). The Family Biotic Index (FBI)
was developed by Hilsenhoff (1988) for application in the US. The
Iberian BMWP (IBMWP) was adapted for the Iberian Peninsula (Alba-Tercedor and Sanchez-Ortega, 1988). When available, the latest versions
of these indices were applied to take into account their recent
improvements (e.g. updated tolerance values, inclusion of new taxa).
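The relation between a BMWP-type score and its ASPT variant described above can be sketched in a few lines of Python. This is only an illustration: the dictionary `HYPOTHETICAL_SCORES` and the function name `bmwp_and_aspt` are assumptions introduced here, and the family scores are placeholders, not the tolerance values of any published index.

```python
# Placeholder tolerance scores for illustration only.
# They are NOT taken from any published BMWP table.
HYPOTHETICAL_SCORES = {
    "Hydropsychidae": 5,
    "Libellulidae": 6,
    "Megapodagrionidae": 8,
    "Chironomidae": 2,
}


def bmwp_and_aspt(families, scores=HYPOTHETICAL_SCORES):
    """Return (BMWP-type score, ASPT) for the families found in one sample."""
    # Each scoring family counts once, however many individuals were found.
    scoring = {family for family in families if family in scores}
    total = sum(scores[family] for family in scoring)
    # ASPT = total score divided by the number of scoring families.
    aspt = total / len(scoring) if scoring else 0.0
    return total, aspt


sample = ["Hydropsychidae", "Chironomidae", "Libellulidae", "Unlisted family"]
total, aspt = bmwp_and_aspt(sample)
print(total, round(aspt, 2))
```

Families without a score in the table are simply ignored, which is why the ASPT divides by the number of scoring families rather than by the total number of taxa in the sample.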
Table 1
Spearman rank correlations between the years 2005 and 2006 and between the dry and the wet season. The discriminative power (Mann–Whitney U test) is given for the samples
with a bad, poor, moderate and good ecological quality according to the BMWP-Colombia.
Measure                            Year            Season
Abundance measures
  # Individuals                    0.19 nc         0.33 ***
Diversity measures
  Margalef                         0.55 ***        0.49 ***
  Shannon                          0.64 ***        0.49 ***
  Simpson                          0.56 ***        0.31 *
  Evenness                         0.15 nc         0.26 nc
Richness measures
  # Taxa                           0.47 **         0.42 **
  # EPT taxa                       0.72 ***        0.59 ***
  # Ephemeroptera taxa             0.60 ***        0.53 ***
  # Plecoptera taxa                0.80 ***        0.29 *
  # Trichoptera taxa               0.60 ***        0.44 **
  # Diptera taxa                   0.55 ***        0.56 ***
Tolerance measures
  BMWP                             0.59 ***        0.47 ***
  BMWP-ASPT                        0.66 ***        0.53 ***
  IBMWP                            0.56 ***        0.45 ***
  IBMWP-IASPT                      0.40 **         0.32 *
  Family Biotic index              0.052 nc        0.00054 nc
  BMWP/Col                         0.64 ***        0.61 ***
  BMWP-ASPT/Col                    0.53 ***        0.39 **
  BMWP (CR)                        0.66 ***        0.64 ***
  BMWP-ASPT (CR)                   0.74 ***        0.62 ***
  Signal 2 score (ab.)             0.69 ***        0.64 ***
  Signal 2 score (not ab.)         0.68 ***        0.59 ***
  SASS5                            0.68 ***        0.60 ***
  SASS5-ASPT                       0.63 ***        0.50 ***
  NEPBIOS                          0.68 ***        0.60 ***
  NEPBIOS-ASPT                     0.66 ***        0.68 ***
  BMWP THAI                        0.64 ***        0.57 ***
  BMWP-ASPT THAI                   0.52 ***        0.46 ***
Composition measures
  % Hydropsychidae of Trichoptera  0.44 *          0.056 nc
  % EPT                            0.59 ***        0.16 nc
  % Ephemeroptera                  0.48 **         0.23 nc
  % Trichoptera                    0.75 ***        0.58 ***
  % Diptera                        0.046 nc        0.080 nc
  % Chironomidae                   0.012 nc        0.096 nc

Discriminative power (columns: bad–good, bad–moderate, poor–good, bad–poor, poor–moderate, moderate–good). Significance marks are listed per group in source order; cells without a mark are blank in the source, so the marks are not assigned to individual comparisons here.
  Abundance measures: *** *** ** **
  Diversity measures: *** *** *** *** *** ** *** *** *** *** ** *** ** *** *** **
  Richness measures: *** *** *** ** *** *** *** *** *** *** *** ** *** *** *** *** *** * *** *** *** *** *** * *** *** *** ** *** ***
  Tolerance measures: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** * *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ** *** *** *** *** *** ** *** *** *** *** *** *** *** ***
  Composition measures: *** ** *** *** * *** *** *** *** *** *** *** ** *** * *** *** * *** *** *** * *** * *** ** *** * ** * * *** ***
vector of class values, instead of storing a single class value as single-target classification trees do. Each component of the
vector is thus a prediction for one of the target attributes. Multi-target
clustering trees were constructed by top-down induction, and the trees
were pruned by only generating clusters with at least 10 instances in
each subset. The stability of the trees was maximised using a 10-fold
cross-validation procedure. Model performance was evaluated based on
the Pearson correlation.
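The idea of a tree whose leaves store one prediction per target, grown top-down with a minimum of 10 instances per subset, can be illustrated with a small pure-Python sketch. This is not the CLUS implementation: the split heuristic used here (summed squared deviation over unscaled targets) is an assumption, CLUS may normalise or weight targets differently, and the data below are synthetic values invented for the example.

```python
def mean_vector(rows):
    """Column-wise mean of a list of target vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def sse(rows):
    """Squared deviation from the mean vector, summed over all targets."""
    means = mean_vector(rows)
    return sum((v - m) ** 2 for row in rows for v, m in zip(row, means))


def build_tree(X, Y, min_leaf=10):
    """Top-down induction; a split is kept only if both subsets hold >= min_leaf rows."""
    best = None
    for feature in range(len(X[0])):
        # The largest value is skipped: it would leave the upper subset empty.
        for threshold in sorted(set(row[feature] for row in X))[:-1]:
            low = [i for i, row in enumerate(X) if row[feature] <= threshold]
            high = [i for i, row in enumerate(X) if row[feature] > threshold]
            if len(low) < min_leaf or len(high) < min_leaf:
                continue
            score = sse([Y[i] for i in low]) + sse([Y[i] for i in high])
            if best is None or score < best[0]:
                best = (score, feature, threshold, low, high)
    if best is None or best[0] >= sse(Y):
        return {"leaf": mean_vector(Y)}  # one predicted value per target
    _, feature, threshold, low, high = best
    return {
        "feature": feature,
        "threshold": threshold,
        "low": build_tree([X[i] for i in low], [Y[i] for i in low], min_leaf),
        "high": build_tree([X[i] for i in high], [Y[i] for i in high], min_leaf),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its vector of predictions."""
    while "leaf" not in tree:
        branch = "low" if row[tree["feature"]] <= tree["threshold"] else "high"
        tree = tree[branch]
    return tree["leaf"]


# Synthetic data: feature 0 drives both targets, feature 1 is uninformative.
X = [[float(i), float(i % 2)] for i in range(40)]
Y = [[2.0, 35.0] if i < 20 else [6.0, 120.0] for i in range(40)]
tree = build_tree(X, Y, min_leaf=10)
print(predict(tree, [5.0, 1.0]), predict(tree, [30.0, 0.0]))
```

Because every leaf returns a whole vector, a single tree yields predictions for all targets at once, and the `min_leaf` constraint plays the role of the pruning rule by refusing splits that would create small subsets.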
3. Results
The total number of individuals was not constant over the years
because the sampling effort was increased during the second year of
sampling. Nevertheless, the number of individuals already gave a
relatively good idea of the ecological quality (Table 1). The diversity
indices of Margalef and Shannon performed very well, while the
Simpson index performed less well and the evenness performed very
poorly: its results varied depending on the year and the season and it had
a very low discriminative power. Most richness measures, especially
the number of EPT taxa, were good indicators of water quality:
consistent results were obtained over the years and the seasons and
discriminative power was high. Only the number of Plecoptera taxa
did not perform as well. With one exception, the biotic measures
performed very well: results were hardly affected by the year or the
season of sampling and the discriminative power was usually high.
However, the Family Biotic Index varied between the years and the
seasons and only the sites with a high and a low human impact could
be separated. The discriminative power of the indices was usually
higher than that of the average score per taxon (ASPT) variant of the
respective indices. The composition measures were poor indicators;
only the fraction of Trichoptera performed relatively well.
Using a multi-target clustering tree, three ecological water quality
indices with a good performance (number of EPT-taxa, BMWP-Colombia and BMWP-Costa Rica) were predicted based on the
occurring macroinvertebrate taxa (Pearson R = 0.79) (Fig. 2). If
Hydropsychidae were present, water quality was better. When
Megapodagrionidae were also present, the highest water
quality scores were obtained. When neither Hydropsychidae nor
Libellulidae were present, the lowest scores were predicted. However,
when the model developed for the samples of 2005 was validated
with the samples of 2006, the ecological water quality could not be
predicted accurately (Pearson R = 0.21).
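The validation step described above, training on one year and scoring the predictions for the other year with a Pearson correlation, can be sketched as follows. The records, field names and values are hypothetical and serve only to show the mechanics; how the source aggregated the three target indices into a single R is not stated, so the sketch scores one target.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted index values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def split_by_year(samples, train_year=2005, test_year=2006):
    """Temporal hold-out: one year for model building, the other for validation."""
    train = [s for s in samples if s["year"] == train_year]
    test = [s for s in samples if s["year"] == test_year]
    return train, test


# Hypothetical records: "predicted" stands for the output of a fitted tree.
samples = [
    {"year": 2005, "observed": 35.0, "predicted": 40.0},
    {"year": 2005, "observed": 80.0, "predicted": 75.0},
    {"year": 2005, "observed": 120.0, "predicted": 118.0},
    {"year": 2006, "observed": 30.0, "predicted": 30.0},
    {"year": 2006, "observed": 60.0, "predicted": 90.0},
    {"year": 2006, "observed": 90.0, "predicted": 60.0},
]

train, test = split_by_year(samples)
r_test = pearson_r([s["observed"] for s in test], [s["predicted"] for s in test])
print(len(train), len(test), r_test)
```

Comparing the correlation on the training year with the one on the held-out year is what reveals a model that fits its own data well but does not transfer.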
The ecological water quality indices could also be predicted on the
basis of the environmental variables (Pearson R = 0.73) (Fig. 3). When
the dissolved oxygen concentration was lower than 6.83 mg l⁻¹, the
lowest ecological water quality scores were found. When the dissolved
Fig. 2. Multi-target clustering tree which predicted the number of EPT-taxa, the BMWP-Colombia and the BMWP-Costa Rica based on the occurrence of macroinvertebrate
taxa.
Apart from biotic indices, several diversity indices were also calculated:
the Margalef (1951) index, the Simpson (1949) index, the Shannon–Wiener
index (Shannon and Weaver, 1963) and the Shannon evenness
index, which is calculated by dividing the Shannon–Wiener index by the
natural logarithm of the number of taxa.
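As a hedged illustration of these diversity measures, the sketch below computes them from a list of per-taxon abundances. The evenness follows the definition given in the text; the Margalef formula (S − 1)/ln N and the 1 − Σp² form of the Simpson index are common textbook variants assumed here, since the source does not state which forms were used.

```python
import math


def diversity_indices(counts):
    """Diversity measures from per-taxon abundances (one count per taxon)."""
    counts = [c for c in counts if c > 0]
    n_individuals = sum(counts)
    n_taxa = len(counts)
    proportions = [c / n_individuals for c in counts]

    # Margalef: (S - 1) / ln(N); undefined for a single individual.
    margalef = (n_taxa - 1) / math.log(n_individuals) if n_individuals > 1 else 0.0
    # Shannon-Wiener: H = -sum(p * ln p).
    shannon = -sum(p * math.log(p) for p in proportions)
    # Simpson, assumed here in its 1 - sum(p^2) form.
    simpson = 1.0 - sum(p * p for p in proportions)
    # Shannon evenness: H divided by ln(number of taxa), as in the text.
    evenness = shannon / math.log(n_taxa) if n_taxa > 1 else 0.0
    return {
        "margalef": margalef,
        "shannon": shannon,
        "simpson": simpson,
        "evenness": evenness,
    }


result = diversity_indices([10, 10, 10, 10])
print(result)
```

For four equally abundant taxa the evenness reaches its maximum of 1, which makes this a convenient sanity check for any implementation.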
2.4. Statistics
To check the robustness of the biological indicators, Spearman
rank correlations were applied between the years 2005 and 2006 and
between the dry and the wet season. As the BMWP-Colombia
included the highest number of taxa that were present in the
Chaguana watershed and because Colombia is the country
closest to Ecuador among the countries that developed biological indices,
the performance of all indices was evaluated based on the outcome of
the BMWP-Colombia. In addition, the BMWP-Colombia best reflected
the three classes of human impact, which were separated based on the
macroinvertebrate community composition using multivariate analysis (Fig. 1) (Dominguez-Granda et al., in press). Four water quality
classes were recognised based on the outcome of the BMWP-Colombia:
bad (0–40), poor (41–70), moderate (71–100) and good (>100). The
Mann–Whitney U test was used to identify significant differences
between sites with different water quality classes.
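The class boundaries and the two rank-based statistics named above can be sketched in plain Python. The function names are assumptions introduced for this example, ties receive average ranks, and only the test statistics are computed (no p-values); the source does not say which software was used for these tests.

```python
import math


def quality_class(bmwp_col):
    """Map a BMWP-Colombia score to the four classes used in the text."""
    if bmwp_col <= 40:
        return "bad"
    if bmwp_col <= 70:
        return "poor"
    if bmwp_col <= 100:
        return "moderate"
    return "good"


def average_ranks(values):
    """Ranks from 1 to n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def mann_whitney_u(a, b):
    """U statistic of sample a against sample b (rank sum minus its minimum)."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


print(quality_class(85), spearman([1, 2, 3, 4], [10, 20, 30, 45]))
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

A U value of 0 (or of len(a) × len(b)) means the two groups do not overlap at all, which is the situation an index with high discriminative power should approach for distant quality classes.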
Based on a training set consisting of the 58 samples taken in 2005,
multi-target clustering trees were built using CLUS (Blockeel and Struyf,
2002). A test set consisting of the 46 samples taken in 2006 was used for
model validation. The leaves of a multi-target classification tree store a
Fig. 3. Multi-target clustering tree which predicted the number of EPT-taxa, the BMWP-Colombia and the BMWP-Costa Rica based on environmental variables. Splits shown: dissolved oxygen (≤6.83 vs. >6.83 mg l⁻¹), current velocity (≤32 vs. >32 cm s⁻¹) and dissolved oxygen (≤7.62 vs. >7.62 mg l⁻¹). Leaf shown at dissolved oxygen ≤6.83 mg l⁻¹: number of EPT-taxa = 2.0, BMWP-Colombia = 35, BMWP-Costa Rica = 24. Leaf shown at dissolved oxygen >7.62 mg l⁻¹: number of EPT-taxa = 6.3, BMWP-Colombia = 122, BMWP-Costa Rica = 93.