
The Langat River Water Quality Index Based on Principal Component Analysis
Zalina Mohd Ali (a,d), Noor Akma Ibrahim (a), Kerrie Mengersen (b), Mahendran Shitan (a) and Hafizan Juahir (c)
(a) Institute for Mathematical Research, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor DE, Malaysia
(b) School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
(c) Faculty of Environmental Studies, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor DE, Malaysia
(d) School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia
Abstract. The River Water Quality Index (WQI) is calculated using an aggregation function of six water quality sub-indices, together with their relative importance, or weights. The formula is used by the Department of Environment (DOE) to indicate the general status of rivers in Malaysia. The six water quality variables selected for the formula are suspended solids (SS), biochemical oxygen demand (BOD), ammoniacal nitrogen (AN), chemical oxygen demand (COD), dissolved oxygen (DO) and pH. The sub-index calculations, determined by quality rating curves, and their weights were based on expert opinion. However, the sub-indices and the relative importance established in the formula are subjective in nature and do not consider the inter-relationships among the variables. These relationships are important because of the multi-dimensional and complex characteristics of river water. Therefore, a well-known multivariate technique, Principal Component Analysis (PCA), was proposed to re-calculate the water quality index for the Langat River based on the inter-relationship approach. The application of this approach is not well studied in river water quality index development in Malaysia. Hence, the approach in this study is relevant and important, given that the first river water quality index development took place in 1981. The PCA results showed that the weights obtained rank the relative importance of particular variables differently from the classical approach used in the DOE-WQI. Based on the new weights, the Langat River water quality index was calculated, and the comparison between both indexes is also discussed in this paper.
Keywords: Water Quality Index (WQI), Principal Component Analysis (PCA), Langat River
PACS: 92.40.qc, 92.40.Cy, 92.40.qh

INTRODUCTION
Water quality is generally described according to biological, chemical and physical properties [1]. Based on these properties, the quality of water can be expressed via a numerical index, i.e. a Water Quality Index (WQI), by combining measurements of selected water quality variables. The selected variables were assigned respective weights, and the determination process was based on personal evaluation, namely opinion-gathering techniques [2]. The weighting assigned to the selected variables was based on the relative importance given by experts [3]. This weight determination technique has been retained by other researchers, including the Malaysian Department of Environment (DOE) [4]. The selected variables, together with their respective weights, are applied to calculate the water quality index for all rivers in Malaysia. Because each river has different characteristics, the appropriate weights for the water quality variables may differ between rivers. The existing DOE weights are therefore subjective in nature, and no detailed studies have been done to determine the weights objectively. Objective weights can be obtained by using multivariate statistical techniques such as principal component analysis (PCA).
Principal component analysis (PCA) is a statistical method used to determine a new set of artificial variables, namely principal components (PCs). These are linear combinations of the original variables. The method is also known as a variable reduction procedure: it obtains a small number of components that account for most of the variability of the original data set. In river water quality data analysis, PCA has been utilized to characterize
Proceedings of the 20th National Symposium on Mathematical Sciences
AIP Conf. Proc. 1522, 1322-1336 (2013); doi: 10.1063/1.4801283
2013 AIP Publishing LLC 978-0-7354-1150-0/$30.00


and evaluate freshwater quality for different seasons [5], to divide selected water quality variables into groups [6]
and to identify factors based on water quality variables compositional patterns that influence particular regions [7].
PCA has also been used for water quality index development. The water quality index calculation is based on
modified un-rotated principal components and provides valuable insights into the relationship between water quality
and biological community composition [8]. In addition, the water quality index can also be calculated as the
weighted sum of all principal component scores [9] or the sum of selected principal component scores [10]. The
PCA approach was also found to be useful in developing a coastal marine eutrophication index based on the first
principal component criteria accounting for more than 50% of the whole variation [11]. In their study, no data
transformations were considered and data was randomly selected from three different coastal areas.
In river water quality studies, data with no gaps were used in all of the index developments based on PCA. In addition, selected imputation techniques have been used to estimate missing values [10]. However, there is a large number of missing data at all sampling sites in this study (refer to Figure 1). As a result, the traditional PCA approach (i.e. with no normality assumptions or no outlier detection) with complete case analysis for data dimension reduction may result in misleading conclusions in determining the index. Because of this problem, we have not considered unobserved data in this study. Since missing data were not imputed, we have limited the analysis to a parametric approach (i.e. assuming that the distribution of the water quality measurements is approximately multivariate normal with no extreme outliers). We firmly believe that the PCA approach works better under the multivariate normality assumption and that this assumption will be useful for future inference. We may consider PCA with missing data values in our next analysis, and we propose that extreme values (outliers) be considered in a separate analysis or be retained in the same analysis using robust PCA. Instead of using the principal component scores, researchers have also used information from rotated principal components to develop an index [12]. In addition, information from rotated principal components has been summarized to calculate positive relative weights that were used in the index calculation [13]. However, negative relative weights are also important, especially in river water quality, as they show the natural effect between the variables. Therefore, in this study, we examine the possibility of developing a PCA index, taking into account six physicochemical variables, and suggest new scales for assessing water quality on the proposed index. The proposed index is based on new statistical weights defined by the variables' statistical importance from the combination of more than one PC, with consideration given to negative PC loadings (i.e. correlations between the PCs and the standardized data). The proposed methodology is applied to the Langat River in Selangor, Malaysia. The results obtained are compared with the existing PCs methods, and the relationship between the new index and the DOE-WQI is also discussed.

FIGURE 1. Plots of dissolved oxygen (DO) in the lower stream of the Langat River from September 1995 to December 2007

METHODS
Data and Monitoring Sites
In this analysis, data for the Langat River were collected based on the availability of recorded data from 1995-2007. Five main monitoring stations were selected, as listed in Table 1, and their locations are shown in Figure 2. Data from 2008-2009 were used as an independent data set for the validation of the proposed PCA index.
TABLE (1). DOE sampling stations in the study area
DOE Station   Station   DOE Station   Distance from   Grid Reference           Location
Number        Number    Code          Estuary (km)
2814602       1         IL01          4.19            2°52.027' 101°26.241'    Air Tawar Village
2815603       2         IL02          33.49           2°48.952' 101°30.780'    Telok Datuk, near Banting town
2817641       3         IL03          63.43           2°51.311' 101°40.882'    Bridge at Dengkil Village
2917642       4         IL05          86.94           2°59.533' 101°47.219'    Kajang Bridge
3118647       5         IL07          113.99          3°09.953' 101°50.926'    Bridge at Batu 18

FIGURE 2. Locations of the selected sampling stations

Based on the collected data, the water quality status of the Langat River can be evaluated by calculating the Water Quality Index (WQI). The WQI is formed as a weighted sum of six selected water quality variables, namely Suspended Solids (SS), Biochemical Oxygen Demand (BOD), Ammonia Nitrogen (AN), Chemical Oxygen Demand (COD), Dissolved Oxygen (DO) and pH [4]. These variables were selected by a panel of experts as the variables that give some indication of the water quality level, or water quality index, of a river. The relative importance, or weights, determined by the experts are shown in Table 3. The weight of each variable indicates its relative importance in determining the water quality index in Malaysia, and details are discussed in Mustapha (1981) [14].
TABLE (3). The relative importance determined by the experts
Water Quality Variables       Weights
Dissolved Oxygen (%)          0.22
Biochemical Oxygen Demand     0.19
Chemical Oxygen Demand        0.16
Ammonia Nitrogen              0.15
Suspended Solids              0.16
Potential of Hydrogen         0.12
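The weighted-sum form of the DOE-WQI described above can be sketched as follows. This is a minimal illustration only: the weights are those of Table 3, but the sub-index values in `sample` are hypothetical, and the rating curves that convert raw measurements into sub-indices are not reproduced here. The names `DOE_WEIGHTS` and `doe_wqi` are introduced purely for this example.

```python
# Expert-opinion weights from Table 3.
DOE_WEIGHTS = {"DO": 0.22, "BOD": 0.19, "COD": 0.16, "AN": 0.15, "SS": 0.16, "pH": 0.12}


def doe_wqi(sub_indices):
    """Weighted sum of the six sub-indices (each assumed to lie in 0-100)."""
    missing = set(DOE_WEIGHTS) - set(sub_indices)
    if missing:
        raise ValueError(f"missing sub-indices: {sorted(missing)}")
    return sum(DOE_WEIGHTS[k] * sub_indices[k] for k in DOE_WEIGHTS)


# Hypothetical sub-index values for a single sample (not from the paper).
sample = {"DO": 80.0, "BOD": 70.0, "COD": 65.0, "AN": 60.0, "SS": 75.0, "pH": 95.0}
wqi = doe_wqi(sample)
print(round(wqi, 1))
```

Because the six weights sum to one, the index stays on the same 0-100 scale as the sub-indices.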

Data Treatment
PCA was applied to the physicochemical variables defined by the DOE in Table 3. In this study, the PCA approach requires the variables to conform to a normal distribution. Some variables took extremely low or extremely high values, and the skewness and kurtosis of the original data were high. The results in Table 4 show that DO (%) and pH could be treated as normally distributed, whereas the other water quality variables were identified as non-normal. All the non-normal variables were log10 transformed, a method that is very common for environmental data [15]. After the transformation, univariate normality was checked for each variable and the results are shown in Table 5. This is normal practice among researchers as part of data screening. Detecting outliers in water quality data is also important because of their effect on the normality of the data. A detailed discussion of the evaluation of water quality data for outliers can be found in Robinson et al. [16]. The univariate approach treats the variables as independent and uncorrelated. Checking each variable separately may therefore not be very helpful, given the relationships that exist among water quality variables. To the best of our knowledge, most previous studies did not discuss in detail the existence of outliers in their water quality data. In some cases, the outliers were retained in the multivariate analysis, as stated in Gazzaz et al. [17]. Since outlier detection is also important in the analysis, this commonly used practice may not be very helpful in confirming multivariate normality. If the variables are multivariate normal, then each variable is also univariate normal. However, univariate normal variables are not necessarily multivariate normal [18]. Therefore, the data must be screened for multivariate outliers, which can be done using the Mahalanobis distance and its associated p-value with a threshold of 0.001. The data are regarded as multivariate normal if the plot of the distances forms a straight-line pattern or the associated p-values are greater than 0.001. If the associated p-value of an observation is smaller than the threshold, the observation is flagged as an influential outlier. We should stress that the influential outliers detected from the Mahalanobis distance were excluded from our study, because multivariate techniques are sensitive to the presence of outliers [18].
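The Mahalanobis screening step can be sketched in a few lines. This is a simplified illustration, not the procedure used in the study: it handles only two variables (so the covariance matrix can be inverted in closed form and the chi-square p-value with 2 degrees of freedom is simply exp(-d²/2)), whereas the paper screens six variables. The data are synthetic and the function name `mahalanobis_pvalues` is hypothetical.

```python
import math
from statistics import mean


def mahalanobis_pvalues(x, y):
    """Squared Mahalanobis distance and chi-square (2 df) p-value per sample."""
    n = len(x)
    mx, my = mean(x), mean(y)
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy ** 2
    results = []
    for a, b in zip(x, y):
        dx, dy = a - mx, b - my
        # d^2 = v' S^-1 v, using the closed-form inverse of a 2x2 matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        p = math.exp(-d2 / 2)  # survival function of chi-square with 2 df
        results.append((d2, p))
    return results


# Synthetic data: 24 well-behaved points plus one extreme observation.
x = [i % 5 for i in range(24)] + [50]
y = [(2 * i) % 5 for i in range(24)] + [-50]
screen = mahalanobis_pvalues(x, y)
flagged = [i for i, (_, p) in enumerate(screen) if p < 0.001]
print(flagged)
```

With six variables the same logic applies, but the p-value comes from a chi-square distribution with 6 degrees of freedom and the covariance matrix needs a general inverse.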
TABLE (4). Descriptive Statistics (n=466) of the untransformed selected Water Quality Variables in the Langat River
Variables   Min      Max       Mean     Standard    Median   Trimmed Mean   Skewness   Kurtosis
                                        Deviation            (5%)
DO          0.00     116.20    61.48    27.48       64.65    62.22          -0.36      -0.71
BOD         0.30     80.80     5.71     7.27        4.00     4.66           4.91       35.86
COD         0.50     3850.00   49.70    185.46      33.00    35.86          18.93      382.73
AN          0.0050   14.30     1.17     1.30        0.78     1.035          3.17       22.56
SS          0.50     5010.00   294.43   490.19      157.00   220.89         5.15       38.51
PH          3.60     8.34      6.66     0.75        6.84     6.72           -1.40      2.55

Principal Component Analysis


PCA was applied to the standardized, log-transformed physicochemical variables after multivariate normality had been confirmed. The descriptive statistics after the multivariate normality screening are shown in Table 6 and the corresponding plots in Figure 3. Standardization minimizes the differences in measurement scales, units and variance [19]. All six PCs were derived using the SAS PRINCOMP procedure and the SAS PROC FACTOR procedure with default settings. The results from both procedures were the same, and additional information on the factor loadings based on PCA was obtained from PROC FACTOR. The PCA scores can be calculated using Equation 1.
$$PC_{ij} = e_{i1} Z_{1j} + e_{i2} Z_{2j} + \cdots + e_{i6} Z_{6j} \qquad (1)$$

where PC is the principal component score, e is the component loading or weight obtained from the eigenvector, Z is the standardized transformed data, i is the component number and j is the sample number. In PCA, the first PC captures the largest share of the variability of the original data set and is the linear combination of the variables with maximal variance. The second PC is the linear combination with the next largest variability and is uncorrelated with (orthogonal to) the first component. All PCs are arranged in decreasing order of importance according to the variability they explain. The WQI is usually calculated as the weighted sum of the site scores from all possible axes where, theoretically, the number of possible axes equals the number of variables used. Instead of using all the PCs to calculate the WQI score as suggested by Chow-Fraser [9], or only the PCs with eigenvalues of 1 or greater as proposed by Kaiser in 1958 [20], we calculated the index based on the idea of positive weight loadings in rotated PCs by Hurdlikova and Fischer [13], with consideration given to the negative weight loadings in un-rotated PCs.
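Equation 1 is a dot product between one eigenvector and the standardized values of each sample. The sketch below illustrates standardization followed by the score calculation for a toy case with two variables and four samples; the data and the eigenvector `e` are hypothetical and are not taken from the Langat River data set.

```python
from statistics import mean, stdev


def standardize(column):
    """Return z-scores: (x - mean) / sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def pc_scores(loadings, z_columns):
    """Equation 1 for one component: score_j = sum over k of e_k * Z_kj."""
    n_samples = len(z_columns[0])
    return [
        sum(e * z[j] for e, z in zip(loadings, z_columns))
        for j in range(n_samples)
    ]


# Hypothetical toy data: DO (%) and log10(BOD) for four samples.
do_percent = [60.0, 70.0, 80.0, 90.0]
log_bod = [0.9, 0.7, 0.5, 0.3]
z = [standardize(do_percent), standardize(log_bod)]

e = [-0.7071, 0.7071]  # hypothetical unit-length eigenvector (negative on DO)
scores = pc_scores(e, z)
print([round(s, 2) for s in scores])
```

In this toy case the sample with the lowest DO and highest BOD receives the highest score, which mirrors the interpretation of a pollution-oriented component.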


TABLE (5). Descriptive Statistics (n=453) of the transformed selected Water Quality Variables
Variables   Min     Max      Mean    Standard    Median   Trimmed Mean   Skewness   Kurtosis
                                     Deviation            (5%)
DO          0.00    116.20   61.53   27.26       64.70    62.28          -0.37      -0.65
BOD         -0.30   1.91     0.57    0.37        0.60     0.56           0.29       0.05
COD         0.60    2.22     1.51    0.24        1.52     1.52           -0.23      1.44
AN          -2.30   1.16     -0.32   0.81        -0.11    -0.27          -1.29      0.90
SS          -0.30   3.70     2.06    0.71        2.20     2.10           -0.92      1.07
PH          4.05    8.34     6.67    0.71        6.83     6.72           -1.20      1.81

TABLE (6). Descriptive Statistics (n=446) of the transformed selected Water Quality Variables
Variables   Min     Max      Mean    Standard    Median   Trimmed Mean   Skewness   Kurtosis
                                     Deviation            (5%)
DO          0.00    116.20   61.6    27.33       64.95    62.36          -0.37      -0.65
BOD         -0.30   1.91     0.58    0.37        0.60     0.57           0.31       0.05
COD         0.70    2.22     1.52    0.23        1.52     1.52           -0.07      1.20
AN          -2.30   1.16     -0.32   0.80        -0.11    -0.26          -1.3       0.95
SS          -0.30   3.70     2.06    0.70        2.2      2.1            -0.94      1.16
PH          4.05    8.34     6.69    0.68        6.84     6.74           -1.12      1.64

FIGURE 3. Probability plots of Mahalanobis Distance before Outliers Deletion (Left) and after Outliers Deletion (Right)

RESULTS
Correlation Analysis
The Pearson correlation matrices for the variables are given in Tables 7-8, and most of the paired water quality variables show similar results in both. However, the relationships for SS-BOD, SS-COD, COD-DO and COD-BOD differ slightly before and after outlier deletion. A significant positive correlation was found between most pairs of water quality variables, whereas DO was negatively correlated with the pollutants. A significant positive correlation was also found between the two indicators DO and pH. The weak positive correlation between BOD and pH shows that these two variables may be redundant.
TABLE (7). The Pearson correlation matrix of the Untransformed Selected Water Quality Variables (N=466)
Variables   DO               BOD              COD              AN               SS
BOD         -0.355 (0.000)
COD         -0.427 (0.000)   0.532 (0.000)
AN          -0.504 (0.000)   0.460 (0.000)    0.315 (0.000)
SS          -0.488 (0.000)   0.411 (0.000)    0.474 (0.000)    0.464 (0.000)
PH          0.508 (0.000)    0.007 (0.875)    -0.104 (0.024)   -0.283 (0.000)   -0.272 (0.000)
* Corresponding p-values in brackets.


TABLE (8). The Pearson correlation matrix of the Transformed Selected Water Quality Variables (N=446)
Variables   DO               BOD              COD              AN               SS
BOD         -0.366 (0.000)
COD         -0.501 (0.000)   0.629 (0.000)
AN          -0.502 (0.000)   0.480 (0.000)    0.388 (0.000)
SS          -0.484 (0.000)   0.385 (0.000)    0.513 (0.000)    0.476 (0.000)
PH          0.530 (0.000)    0.010 (0.836)    -0.182 (0.000)   -0.289 (0.000)   -0.285 (0.000)
* Corresponding p-values in brackets.

River Water Quality Index Development


The eigenvalues and contributions of the principal components are listed in Table 9. The eigenvalues of the first, second and third principal components were 3.05, 1.15 and 0.60 respectively. The corresponding cumulative contributions were 51%, 70% and 80%. Only the results for the first and second principal components are discussed (i.e. those with eigenvalues greater than 1). The eigenvectors (loadings) for each axis are shown in Table 10. The loadings indicate the relative importance of each variable within the individual axes, which can be judged from the absolute magnitude of the eigenvector loadings. No specific rule was used for picking out the loadings; in this study, loadings greater than 0.40 in absolute value were regarded as large. The eigenvectors were then used to determine the latent variables (i.e. PC scores that signify water quality scores) [10]. When normalized, the scores give the values of the water quality indices under a normal distribution. The calculated indices correspond to areas under the normal curve and may be regarded as a degree of pollution [12]. The values are then multiplied by 100 to give a range of 0-100, with zero representing good water quality. Conversely, other researchers used the sum of weighted PC scores to obtain water quality indices with zero representing low water quality [9]. The number of variables entered into the PCA determines the number of possible PCs. In our case, since six variables were entered, six possible axes or PCs were fitted. From Table 10, the first PC has large positive loadings on COD, AN and SS and a large negative loading on DO. This means that these four variables account for most of the variance explained by the first PC. The second PC has large loadings on BOD and pH, so these two variables account for most of the variance explained by the second PC. The results are also consistent with the correlation analysis.
TABLE (9). Summary of eigenvalues produced by PCA using Standardized Values of Six Water-Quality Variables
PC Axis   Eigenvalue   Proportion of Variation   Cumulative Proportion of
                       Explained                 Variation Explained
1         3.05         0.51                      0.51
2         1.15         0.19                      0.70
3         0.60         0.10                      0.80
4         0.55         0.09                      0.89
5         0.35         0.06                      0.95
6         0.29         0.05                      1.00
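The proportions in Table 9 follow directly from the eigenvalues. The short sketch below recomputes them and applies the eigenvalue-greater-than-one rule mentioned in the text. It divides by the sum of the tabulated eigenvalues (5.99 after rounding) rather than by the theoretical total of 6, which is an assumption of this example; the variable names are illustrative.

```python
eigenvalues = [3.05, 1.15, 0.60, 0.55, 0.35, 0.29]  # from Table 9

total = sum(eigenvalues)
proportion = [ev / total for ev in eigenvalues]

# Running total of the proportions gives the cumulative contribution.
cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

# Components retained under the eigenvalue > 1 rule.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]

for axis, (ev, p, c) in enumerate(zip(eigenvalues, proportion, cumulative), start=1):
    print(axis, ev, round(p, 2), round(c, 2))
print("retained:", retained)
```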

TABLE (10). Eigenvectors produced by PCA using Standardized Values of Six Water-Quality Variables
Variables   PC1      PC2      PC3      PC4      PC5      PC6
DO          -0.45    0.31     0.133    0.24     0.77     0.16
BOD         0.40     0.53     0.01     -0.36    0.26     -0.61
COD         0.44     0.29     -0.54    -0.084   0.13     0.64
AN          0.43     0.005    0.82     -0.08    0.06     0.37
SS          0.43     -0.003   -0.04    0.87     0.03     -0.22
PH          -0.27    0.73     0.14     0.193    -0.56    0.11

The un-rotated PCA/FA loadings (i.e. the correlation coefficients between the two principal components and the standardized water quality variables) are shown in Table 11. Loadings are classified as strong (high) for absolute values > 0.75, moderate for values of 0.75-0.50 and weak (low) for values of 0.50-0.30 [21]. A high loading between the first PC and a variable indicates that the variable is related to the direction of maximum variation in the data set. A strong association between the second PC and a variable indicates that the variable is responsible for the next largest variation in the data, perpendicular to the first PC. The squared loading of a variable on a principal component is the proportion of variance in that variable explained by the principal component. The normalized squared loadings were then used to group the variables, from the highest loading to the lowest, into temporary loadings [22]. The first temporary loading group includes DO (with a weight of 0.21), BOD (0.16), COD (0.20), AN (0.18) and SS (0.18), and the second is formed by pH (0.54). Subsequently, the actual weights were obtained by multiplying the temporary loading of each variable by the weight of its component, i.e. the proportion of the explained variance: 0.73 = 3.05/(3.05 + 1.15) for the first component and 0.27 for the second. Finally, preserving the negative sign of the DO loading in the actual weights, the new PCA weights are determined by normalizing the actual weights. Negative weights (as well as positive weights) should be maintained in calculating the index as long as the sum of the weights is greater than zero.
The negative loading on DO shows the natural effect between DO and the other pollutant variables. Negative loadings on DO were also reported in other studies [7, 6, 23, 15, 17], where the pollution was suspected to have come from domestic wastewater, wastewater treatment plants, industries and agricultural activities. Hence, it is clear that an increase in organic matter will decrease the DO [7]. In another study, DO was eliminated from the WQI calculation because of its negative weights or loadings [24]. However, we carried the negative sign of DO in the component loadings over to the new weights because of this natural effect (i.e. the influence of DO in determining the quality of water in the presence of the pollutants). For comparability, the final PCA weights were rescaled to sum to one (i.e. normalization of the actual weights). The weights reflect the effect on water quality of a decrease or increase in each water quality variable. For instance, the negative sign on DO captures the effect of decreasing DO: the stronger the effect, the higher the DO score (i.e. the negative weight of DO multiplied by the standardized value of DO) will be in a polluted area compared with a clean area (refer to Figure 4). Conversely, the new positive weights of the pollutants (i.e. BOD, COD, AN and SS) show the increasing effect of the pollutants on the river water. The higher the effect of the pollutants, the higher the pollutant score will be. Positive BOD and COD are related to anthropogenic pollution sources and were expected to come from point sources such as sewage treatment plants and industrial effluents. Positive BOD and AN also represent the influence of organic pollutants from point sources (such as discharge from wastewater treatment plants, domestic wastewater and industrial effluent), while positive COD and SS were attributed to erosion from upland areas during rainfall events and soil cultivation. The presence of SS was also attributed to discharge from urban development areas involving land clearing, or more specifically to surface runoff. Detailed information on the pollution sources can be found in Juahir et al. [7] and Mohd Nasir et al. [15].
On the other hand, the weights in the DOE-WQI show the relative importance of the variables in determining the quality of water: the higher the weight, the more important the variable. However, the natural effect of the water quality variables, which is based on the inter-relationships between the variables, is not clear from the DOE-WQI weights. Therefore, we firmly believe the new weights obtained in this study will give beneficial information on the status of the river from a different perspective.
TABLE (11). Summary of Component Loadings and New PCA Weights
Variables                 Component        Squared component       Temporary   Actual    PCA
                          loadings         loadings (normalized)   loadings    Weights   Weights
                          PC 1     PC 2    PC 1     PC 2
DO                        -0.79    0.33    0.21     0.09           0.21        -0.15     -0.29
BOD                       0.69     0.57    0.16     0.28           0.16        0.11      0.22
COD                       0.78     0.32    0.20     0.09           0.20        0.14      0.28
AN                        0.74     0.01    0.18     0.00           0.18        0.13      0.25
SS                        0.75     0.00    0.18     0.00           0.18        0.13      0.26
PH                        -0.48    0.79    0.08     0.54           0.54        0.15      0.29
Variation Explained, VE   3.05     1.15
Proportion of VE          0.73     0.27
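The chain from component loadings to final PCA weights can be retraced with a short calculation. The sketch below starts from the two-decimal PC1 and PC2 loadings reported in Table 11, so its results agree with the published weights only to within rounding (about 0.01). Two points are assumptions of this example rather than statements from the source: the squared loadings are normalized by the column sum of squares, and the assignment of each variable to a component is hard-coded from the text (DO, BOD, COD, AN and SS to PC1; pH to PC2).

```python
variables = ["DO", "BOD", "COD", "AN", "SS", "PH"]

# (PC1, PC2) component loadings as reported in Table 11.
loadings = {
    "DO": (-0.79, 0.33), "BOD": (0.69, 0.57), "COD": (0.78, 0.32),
    "AN": (0.74, 0.01), "SS": (0.75, 0.00), "PH": (-0.48, 0.79),
}
eigenvalues = (3.05, 1.15)

# Component each variable is assigned to (0 = PC1, 1 = PC2), as stated in the text.
assigned_pc = {"DO": 0, "BOD": 0, "COD": 0, "AN": 0, "SS": 0, "PH": 1}

# Column sums of squared loadings, used to normalize the squared loadings.
sum_sq = [sum(loadings[v][k] ** 2 for v in variables) for k in (0, 1)]
# Share of the retained variance carried by each component (0.73 and 0.27).
share = [ev / sum(eigenvalues) for ev in eigenvalues]

actual = {}
for v in variables:
    k = assigned_pc[v]
    temporary = loadings[v][k] ** 2 / sum_sq[k]
    sign = -1.0 if v == "DO" else 1.0  # only DO keeps its negative sign
    actual[v] = sign * temporary * share[k]

# Rescale so that the final weights sum to one.
total = sum(actual.values())
pca_weights = {v: w / total for v, w in actual.items()}

for v in variables:
    print(v, round(actual[v], 2), round(pca_weights[v], 2))
```

Because the signed actual weights sum to a positive number, the normalization keeps the sign of every weight, which is the condition stated in the text for retaining negative weights.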

The PCA approach used in this study summarizes the relative importance based on the inter-relationships between the variables. The results in Table 11 show that all variables have a similar influence in determining the quality of water. In detail, pH and DO have the highest relative importance and BOD the lowest. The negative statistical weight of DO maintains the natural relationship between DO and the pollutants. The positive statistical weight of pH also shows the natural effect of this variable on the quality of water. A study by Mamun et al. [25] suggested that pH should be monitored to assess the suitability of water for other uses. Under the expert opinion approach in Table 3, on the other hand, DO (%) has the highest relative importance, followed by BOD, with pH the lowest. The weights assigned in the DOE-WQI were in accordance with the relative importance of each variable in the overall quality of surface water for general purposes. Hence, we may consider that the strong negative weight on DO in the new WQI formula is related to organic pollution suspected as coming from domestic wastewater, wastewater treatment plants, industries, agricultural activities and forest areas [7]. Therefore, the following expression is used to calculate the PCA-WQI.

$$\text{PCA-WQI} = -0.29 Z_{DO} + 0.22 Z_{BOD} + 0.28 Z_{COD} + 0.25 Z_{AN} + 0.26 Z_{SS} + 0.29 Z_{PH} \qquad (2)$$

The PCA-WQI values in this study lie in the interval between -3 and 3. The PCA-WQI score was transformed to the percentage of the normal score to give a final PCA-WQI score. The final score is more comparable with the DOE-WQI, as the index is in the range of 0-100. The relationship between DOE-WQI and PCA-WQI in Figure 4 clearly indicates that the better the quality of the water, the lower the PCA-WQI value.
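The index calculation and its rescaling to 0-100 can be sketched as below. The weights are the PCA weights of Table 11, with the negative sign on DO. The conversion to a "percentage of the normal score" is interpreted here as 100 times the standard normal cumulative probability of the raw score; that reading is an assumption of this example, and the standardized values in `clean` and `polluted` are hypothetical.

```python
from statistics import NormalDist

# PCA weights from Table 11 (DO keeps its negative sign).
PCA_WEIGHTS = {"DO": -0.29, "BOD": 0.22, "COD": 0.28, "AN": 0.25, "SS": 0.26, "PH": 0.29}


def pca_wqi(z):
    """Raw PCA-WQI: weighted sum of the standardized variables."""
    return sum(PCA_WEIGHTS[k] * z[k] for k in PCA_WEIGHTS)


def final_score(raw):
    """Assumed rescaling: area under the standard normal curve, times 100."""
    return 100.0 * NormalDist().cdf(raw)


# Hypothetical standardized samples: high DO and low pollutants, and the reverse.
clean = {"DO": 1.0, "BOD": -1.0, "COD": -1.0, "AN": -1.0, "SS": -1.0, "PH": 0.0}
polluted = {k: -v for k, v in clean.items()}

for name, sample in (("clean", clean), ("polluted", polluted)):
    raw = pca_wqi(sample)
    print(name, round(raw, 2), round(final_score(raw), 1))
```

Under this reading a lower final score corresponds to cleaner water, which matches the direction described for Figure 4.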

FIGURE 4. Plot of DOE-WQI versus PCA-WQI values

Validation Analysis
In this study, we proposed new scores for the individual variables that preserve the same pattern as the scores of the first two PCs. Both PC scores were calculated as weighted principal component scores, as suggested by Chow-Fraser [9], since the results were more encouraging for our data. From Figure 5, the same pattern was found between the PC1 score and the new scores for the first group of important variables (i.e. DO, COD and SS). However, there was a slight difference for the BOD and AN patterns. A similar pattern was also found between the PC2 score and the new score for pH. A general WQI score was then calculated as (i) the weighted sum of six PCs, (ii) the weighted sum of two PCs, and (iii) the sum using the new PCA-WQI weights. The averages of the general scores (except the score in 1995 at Station 1) were calculated and plotted as shown in Figure 6. The plots show similar patterns in the scores (except for the data in 1995 at Station 2). The results confirmed the potential of the third method, i.e. the new WQI score, which is easier to calculate than the existing methods using the weighted sum of six or two PCs. The new WQI values were clustered into groups using hierarchical agglomerative cluster analysis (HACA), i.e. Ward's method with Euclidean distance as the measure of similarity. The calculation can easily be carried out in the SPSS package, and the results are summarized in Table 12 below.
TABLE (12). Summary of the New PCA-WQI based on HACA
Group   Min     Max     Mean    SD      Status
1       27      74      53.57   13.36   Slightly Polluted
2       75.00   99.00   84.68   6.87    Polluted
3       1.00    25.00   9.70    7.10    Clean


FIGURE 5. Plot of the PC Scores and the Individual Variables Scores


We re-classified the PCA-WQI index range and slightly modified the range for the slightly polluted status so that it is comparable with the DOE-WQI findings, which are based on an expert opinion (EO) approach, as shown in Table 13. The new groups with selected ranges for PCA-WQI (refer to Table 14) were used in the validation with independent data from 2008-2009.
TABLE (13). Summary of the DOE-WQI based on EO
Group   Min     Max     Mean    SD     Status
1       81.00   96.00   90.36   4.13   Clean
2       60.00   80.00   69.15   5.42   Slightly Polluted
3       16.00   59.00   48.53   9.45   Polluted

TABLE (14). Index range of Water Quality based on PC-WQI and DOE-WQI
PC-WQI                              DOE-WQI
Index Range   Status                Index Range   Status
1-25 (3)      Clean                 81-100 (1)    Clean
26-74 (2)     Slightly Polluted     60-80 (2)     Slightly Polluted
75-99 (1)     Polluted              0-59.4 (3)    Polluted
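The two classification scales run in opposite directions, which is easy to overlook when comparing indices. The sketch below maps a score to a status using the index ranges reported for the PCA-WQI (1-25, 26-74, 75-99) and the DOE-WQI (81-100, 60-80, 0-59.4). How non-integer scores falling between the listed ranges are treated is not stated in the source, so the cut-offs used here are an assumption of this example, as are the function names.

```python
def classify_pca_wqi(score):
    """PCA-WQI: low scores indicate clean water."""
    if score <= 25:
        return "Clean"
    if score < 75:
        return "Slightly Polluted"
    return "Polluted"


def classify_doe_wqi(score):
    """DOE-WQI: high scores indicate clean water."""
    if score >= 81:
        return "Clean"
    if score >= 60:
        return "Slightly Polluted"
    return "Polluted"


# Group means from Tables 12 and 13 fall in the expected classes.
print(classify_pca_wqi(9.70), classify_doe_wqi(90.36))
print(classify_pca_wqi(53.57), classify_doe_wqi(69.15))
print(classify_pca_wqi(84.68), classify_doe_wqi(48.53))
```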

The descriptive statistics of the transformed variables for 2008-2009 (Table 15) show that all values fall within the range of the original data set illustrated in Table 6. This result permitted us to use the standardized transformed data from 2008-2009 in the new WQI calculation.
TABLE (15). Descriptive Statistics (n=96) of the transformed selected Water Quality Variables for 2008-2009
Variables   Min     Max      Mean    Standard Deviation   Skewness   Kurtosis
DO          38.50   104.80   72.71   15.80                -0.05      -0.63
BOD         0.00    1.18     0.58    0.28                 -0.32      -0.21
COD         0.30    1.85     1.35    0.28                 -1.08      1.60
AN          -2.30   0.46     -0.62   0.87                 -1.07      -0.16
SS          0.00    3.18     2.15    0.63                 -1.27      1.83
PH          5.84    7.71     6.83    0.38                 -0.47      -0.07


FIGURE 6. Plot of the general scores of WQI for three different methods at station 1-5


Combining the results from both methods, we summarized the general WQI score in Table 16. We then plotted
the data from Table 16, as shown in Figure 7. The plots confirmed the inverse relationship between both methods.

TABLE (16). Summary of general scores for PCA-WQI and DOE-WQI in Station 1-5
        Station 1       Station 2       Station 3       Station 4       Station 5
Year    PC   DOE  N     PC   DOE  N     PC   DOE  N     PC   DOE  N     PC   DOE  N
1995    56   62   1     53   43   2     57   61   2     74   52   2     2    93   2
1996    77   46   3     63   48   4     62   60   4     86   35   3     11   92   2
1997    88   44   2     95   33   2     68   53   4     82   40   4     9    87   2
1998    56   58   4     74   39   5     80   48   6     80   49   6     9    90   5
1999    49   62   6     44   61   6     57   64   6     72   53   6     4    93   5
2000    58   60   5     64   43   6     64   54   5     73   53   6     17   88   6
2001    57   56   6     69   48   9     64   54   9     68   62   9     6    92   6
2002    44   67   5     55   51   11    66   60   12    65   61   12    2    94   6
2003    36   68   6     54   58   12    49   69   12    64   65   12    9    92   6
2004    41   75   6     57   63   12    61   67   12    68   61   12    5    90   6
2005    26   76   6     53   64   12    53   70   12    73   59   12    5    93   6
2006    28   74   6     46   72   11    56   72   12    63   69   12    9    92   6
2007    32   77   6     49   72   12    50   75   12    61   69   12    10   91   6
2008    37   73   6     45   70   12    36   76   12    53   68   12    5    93   6
2009    38   72   6     51   66   12    45   72   12    54   68   12    6    92   6
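The inverse relationship between the two indices can be checked numerically with a Pearson correlation. The sketch below uses the yearly Station 1 general scores for 1995-2009 as read from Table 16; the pairing of each PC score with its DOE score depends on that reading of the table, so treat the inputs as an assumption of this example. The helper `pearson_r` is written out by hand to keep the example self-contained.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Station 1 general scores, 1995-2009, as read from Table 16.
pc_station1 = [56, 77, 88, 56, 49, 58, 57, 44, 36, 41, 26, 28, 32, 37, 38]
doe_station1 = [62, 46, 44, 58, 62, 60, 56, 67, 68, 75, 76, 74, 77, 73, 72]

r = pearson_r(pc_station1, doe_station1)
print(round(r, 2))
```

A strongly negative coefficient is what the inverse relationship described in the text would lead one to expect.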

Classification Analysis
A cross-tabulation of the status assigned by both methods was performed, as shown in Tables 17 and 18. The results show a high percentage of samples in the same category, indicating that the new PCA-WQI places samples in the same group as the DOE-WQI and vice versa. Overall, 81.2% of the original data set and 91.7% of the independent data set are classified in the same group.

TABLE (17). Cross-tabulation analysis of PCA-WQI and DOE-WQI status from 1995-2007

                                           DOE-WQI
PCA-WQI                         Clean  Slightly Polluted  Polluted  Total  Percentage of Same Agreement
Clean                              73                 18         2     93                          78.5
Slightly Polluted                   1                199        53    253                          78.7
Polluted                            0                 10        90    100                          90.0
Total                              74                227       145    446
Percentage of Same Agreement     98.6               87.7      62.1                                 81.2

TABLE (18). Cross-tabulation analysis of PCA-WQI and DOE-WQI status from 2008-2009

                                           DOE-WQI
PCA-WQI                         Clean  Slightly Polluted  Polluted  Total  Percentage of Same Agreement
Clean                               6                  0         0      6                         100.0
Slightly Polluted                   1                 65         0     66                          98.5
Polluted                            0                  7        17     24                          70.8
Total                               7                 72        17     96
Percentage of Same Agreement     85.7               90.3     100.0                                 91.7
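
The agreement percentages in Tables 17 and 18 follow directly from the cell counts. The sketch below is illustrative only and uses hypothetical names (table_17, table_18, agreement). It takes rows as PCA-WQI status and columns as DOE-WQI status, both in the order Clean, Slightly Polluted, Polluted, and divides the diagonal counts by the row totals, the column totals and the grand total.

```python
# Cell counts from Tables 17 and 18.
# Rows: PCA-WQI status; columns: DOE-WQI status.
# Category order in both directions: Clean, Slightly Polluted, Polluted.
table_17 = [
    [73, 18, 2],
    [1, 199, 53],
    [0, 10, 90],
]
table_18 = [
    [6, 0, 0],
    [1, 65, 0],
    [0, 7, 17],
]


def agreement(table):
    """Return row-wise, column-wise and overall agreement percentages."""
    size = len(table)
    diagonal = [table[i][i] for i in range(size)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    grand_total = sum(row_totals)
    return {
        "row": [round(100 * d / t, 1) for d, t in zip(diagonal, row_totals)],
        "column": [round(100 * d / t, 1) for d, t in zip(diagonal, col_totals)],
        "overall": round(100 * sum(diagonal) / grand_total, 1),
    }


for label, table in (("1995-2007", table_17), ("2008-2009", table_18)):
    result = agreement(table)
    print(label, result["row"], result["column"], result["overall"])
```

The overall figure is the share of observations on the diagonal, that is, those given the same status by both indices.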


FIGURE 7. Plots of the general scores of PCA-WQI and DOE-WQI in stations 1-5


CONCLUSIONS
The aim of this study was to develop a water quality index procedure for the Langat River based on an established
method, PCA, using water quality data measured during 1995-2007. After a thorough evaluation, the distribution of
the water quality measurements was assumed to be approximately multivariate normal with no extreme outliers.
Unobserved data were not considered in this study. All selected water quality data were then analyzed using PCA.
The results show strong positive loadings on BOD and COD, a relationship representing influences from non-point
sources (NPS) such as agricultural activities and forest areas. The strong negative loadings on DO are related to
high levels of organic matter consuming large amounts of oxygen. The positive loadings on pH, on the other hand,
reflect natural effects on the Langat River water body: pH varied little over the study period, whereas the other
variables in the same river varied widely.
The loadings were then re-calculated to derive new statistical weights. Because these weights are based on a
modification of the variable loadings, they make the WQI calculation simpler to handle. To validate the new weights,
the PCA-WQI was compared with other existing PC-based methods. We found that the new weights used in the
PCA-WQI calculation generated scores fairly similar to those of the existing methods, which use the weighted sum of
all PCs or of selected PCs. The new PCA-WQI also shows an inverse relationship with the DOE-WQI: the better the
quality of the water, the lower the PCA-WQI value. The results for water quality status in this study are also
consistent with Kambe et al. [10] and Coletti et al. [26]. We also defined a new index range for river water quality
status. Based on this range, the PCA-WQI and the DOE-WQI assigned the same status category in a high percentage
of cases (81.2% of the original data set and 91.7% of the independent data set). Thus, the proposed PCA-WQI
calculation is both simple and sound, and the methodology can be applied to other rivers in Malaysia.

ACKNOWLEDGMENTS
We would like to thank the Malaysian Department of Environment for supplying the data on which this work
was based.

REFERENCES
1. S.E. Cooke, S.M. Ahmed and N.D. Macalphine, Introductory Guide to Surface Water Quality Monitoring in Agriculture,
Alberta: Alberta Agriculture, Food and Rural Development (2005).
2. R. Brown, N. McClelland, R. Deininger, and R. Tozer, Water and Sewage Works, 339-343 (1970).
3. D.G. Smith, Water Research 10, 1237-1244 (1990).
4. Department of Environment, DOE, Malaysian Environmental Quality Reports, Kuala Lumpur: Ministry of Science,
Technology and Environment (1997).
5. A. Z. Garizi, V. Sheikh and A. Sadoddin, International Journal of Environmental Science and Technology 8, 581-592 (2011).
6. P.T.M. Hanh, S. Sthiannopkao, D. The Ba and K-W. Kim, J Environ Eng-ASCE 137, 273-283 (2011).
7. H. Juahir, M. S. Zain, M. Yusoff, T. Tengku Hanidza, A. Mohd Armi and M. Toriman, Environmental Monitoring and
Assessment 173, 625-641 (2010).
8. R. Mahmood, J.J. Messer, F.J. Nemanich, C.I. Liff and D.B. George, Reports, Paper 231.
9. P. Chow-Fraser, Development of the Wetland Water Quality Index (WQI) to Assess Effects of Basin-Wide Land-Use
Alteration on Coastal Marshes of the Laurentian Great Lakes in Coastal wetlands of the Laurentian Great Lakes: health,
habitat and indicators, edited by T.P. Simon and P.M. Stewart, Indiana Biological Survey, Bloomington, IN. Chapter 5.
2006, pp. 137-166.
10. J. Kambe, T. Aoyama, A. Yamauchi and U. Nagashima, Journal of Computer Chemistry Japan, 6, 19-26 (2007).
11. I. Primpas, G. Tsirtsis, M. Karydis and G.D. Kokkoris, Ecological Indicators 10, 178-183 (2010).
12. B.N. Lohani, M. Asce and G. Todino, Journal of Environmental Engineering 110, 1163-1176 (1984).
13. L. Hudrlíková and J. Fischer, Journal of Applied Mathematics 4, 291-298 (2011).
14. N. Mustapha, "Indices for Water Quality Assessment in a River", Master's Thesis, Asian Institute of Technology, 1981.
15. M.F. Mohd Nasir, M.S. Samsudin, I. Mohamad, M.R.A. Awaluddin, M.A. Mansor, H. Juahir and N. Ramli, World Applied
Sciences Journal 14, 73-82 (2011).
16. R. B. Robinson, M. ASCE., C.D. Cox and K.M.A. Odom, Journal of Environmental Engineering 131, 651-65 (2005).
17. N.M. Gazzaz, M.K. Yusoff, M.F. Ramli, A.Z. Aris and H. Juahir, Marine Pollution Bulletin 64, 688-698 (2012).
18. J.F. Hair, R. Anderson, R. Tatham and W. Black, Multivariate Data Analysis, US: Pearson Prentice Hall, 2005.
19. I. Gupta, S. Dhage and R. Kumar, Indian J. Mar. Sci. 38, 170-177 (2009).
20. H.F. Kaiser, Psychometrika 23, 187-200 (1958).
21. Y. Ouyang, P. Nkedi-Kizza, Q.T. Wu, D. Shinde and C.H. Huang, Water Research 40, 3800-3810 (2006).
22. G. Nicoletti, S. Scarpetta and O. Boylaud, Economics department working papers. No. 26, ECO/WKP(99)18 (2000).
23. J. Zhao, G. Fu, K. Lei and Y. Li, Journal of Environmental Sciences 23, 1460-1471 (2011).
24. P. Debels, R. Figueroa, R. Urrutia, R. Barra and X. Niell, Environ Monit Assess 110, 301-322 (2005).
25. A.A. Mamun and A. Idris, Revised Water Quality Indices for the Protection of Rivers in Malaysia, Twelfth International
Water Technology Conference, IWTC12 2008, Alexandria, Egypt, 2008, pp. 1687-1698.
26. C. Coletti, R. Testezlaf, T.A.P. Ribeiro, R.T.G. de Souza and D.A. Pereira, Revista Brasileira de Engenharia Agrícola e
Ambiental 14, 517-522 (2010).


Copyright of AIP Conference Proceedings is the property of American Institute of Physics and its content may
not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written
permission. However, users may print, download, or email articles for individual use.
