ZHANG Shi-wen1, 2, SHEN Chong-yang1, CHEN Xiao-yang2, YE Hui-chun1, HUANG Yuan-fang1 and LAI
Shuang3
1 China Agricultural University/Key Laboratory of Arable Land Conservation (North China), Ministry of Agriculture/Key Laboratory of
Agricultural Land Quality Monitoring, Ministry of Land and Resources, Beijing 100193, P.R.China
2 School of Earth and Environment, Anhui University of Science and Technology, Huainan 232001, P.R.China
Abstract
The spatial interpolation for soil texture does not necessarily satisfy the constant sum and nonnegativity constraints.
Meanwhile, although numeric and categorical variables have been used as auxiliary variables to improve prediction
accuracy of soil attributes such as soil organic matter, they (especially the categorical variables) are rarely used in spatial
prediction of soil texture. The objective of our study was to compare the performance of methods for spatial
prediction of soil texture with consideration of the characteristics of compositional data and auxiliary variables. These
methods include the ordinary kriging with the symmetry logratio transform, regression kriging with the symmetry logratio
transform, and compositional kriging (CK) approaches. The root mean squared error (RMSE), the relative improvement
value of RMSE and Aitchison’s distance (DA) were all utilized to assess the accuracy of prediction and the mean squared
deviation ratio was used to evaluate the goodness of fit of the theoretical estimate of error. The results showed that the
prediction methods utilized in this paper could enable interpolation results of soil texture to satisfy the constant sum and
nonnegativity constraints. The CK approach gave better prediction accuracy and model fit, suggesting that
the CK method is more appropriate for predicting soil texture. The CK method interpolates soil texture directly,
which ensures that it is an optimal unbiased estimator. If the environment variables are appropriately selected as
auxiliary variables, the spatial variability of soil texture can be predicted reasonably and the predicted results
will be satisfactory.
Key words: compositional kriging, auxiliary variables, regression kriging, symmetry logratio transform
of soil data in simulation models have often been shown to improve model predictions (McBratney et al. 1992;
Lathrop et al. 1995; Lilburne and Webb 2002; Chaplot 2005). The spatial variability of soil texture is not an
independent process and is usually correlated with other environment variables. Studies using numeric and
categorical variables to improve the prediction accuracy of soil attributes such as soil organic matter have been
reported. For example, Hengl et al. (2004) proposed a methodological framework for spatial prediction of organic
matter, pH in topsoil and topsoil thickness by comparing regression kriging with ordinary kriging and plain
regression. Chai et al. (2008) compared the performance of the empirical best linear unbiased predictor (E-BLUP)
with residual maximum likelihood (REML) with that of regression kriging for predicting soil organic matter (SOM)
in the presence of different external drifts. Zhang et al. (2011) examined whether inclusion of categorical
variables can improve the accuracy of SOM prediction through systematic analyses of variability. However, reports
on the use of numeric and categorical variables to improve the prediction accuracy of soil texture are very few.
The objective of the paper is to find an appropriate interpolation method for soil texture prediction by testing
various spatial prediction methods which take the interpolation requirements of compositional data and auxiliary
information into account. Compositional kriging (CK), first described and applied in de Gruijter et al. (1997),
was introduced as a straightforward extension of ordinary kriging that complies with the constant sum and
nonnegativity constraints. Also, CK appears to be a promising alternative to indicator kriging, because the order
relation is implicitly taken into account in CK (Isaaks and Srivastava 1989; Walvoort and de Gruijter 2001; Tan
et al. 2009; Zhang et al. 2011). Taking numeric variables (e.g., SOM, elevation and bulk density) and categorical
variables (e.g., land use types, soil types and parent material) as auxiliary variables, soil texture was predicted
using ordinary kriging with the symmetry logratio transform (OK-SLR), regression kriging with the symmetry logratio
transform (RK-SLR) and CK methods. The root mean squared errors (RMSE), the relative improvement (RI) values of
RMSE, the mean squared deviation ratio (MSDR) and Aitchison’s distance (DA) were adopted to evaluate the
performance of different prediction methods.
RESULTS AND DISCUSSION
Prediction of spatial distribution of soil texture using RK-SLR and OK-SLR methods
Soil texture is impacted by many environment variables (e.g., parent material, climate, topography, geology and
hydrology, human activities) during the long-term geochemical process. This article selected some environment
variables, including numeric variables (e.g., SOM, elevation and soil bulk density) and categorical variables
(e.g., land use types, soil types and parent material), to carry out analysis of variance (ANOVA) using SPSS ver.
20.0 software (SPSS Institute 2012). The original data of soil texture were transformed by the SLR approach, and
the predicted results were backtransformed by means of the antisymmetric logratio transform.
As shown in Table 1, results of one-way ANOVA indicated that sandSLR had significant correlations with soil types,
parent material, soil bulk density, elevation and land use types, siltSLR had significant correlations with soil
bulk density and elevation, and claySLR had significant correlations with soil types, parent material and soil bulk
density. Because these environment variables are significantly related to different soil particles, they were
selected to carry out multiple linear stepwise regressions by ordinary least squares. Land use types, soil types
and parent material were taken as categorical variables.
The assignment of categorical variables in the regression was as follows: if a categorical variable includes n
categories, then n-1 dummy variables are produced. One category is taken as the control group, and each of the
other categories is coded by a dummy variable that takes the value 1 or 0. This assignment method ensures that the
independent variables in the regression are not linearly dependent. Details about the assignment method can be
found in previous studies (SPSS Institute 2012; Zhang et al. 2012). The dummy variables derived from the soil type,
parent material and land use type were denoted as “X11, X12, X13, X14”, “X21, X22, X23, X24, X25, X26”, and
“X31, X32, X33, X34”, respectively.
Fitted equations and related parameters were as follows:
(1)
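The dummy-variable assignment described above (n categories, n-1 indicator variables, one control group) can be illustrated with a minimal Python sketch. The function name `dummy_code`, the land-use labels and the choice of control group are hypothetical and are not taken from the study; the sketch only shows the coding scheme, not the SPSS procedure the authors used.

```python
def dummy_code(values, reference):
    """Code a categorical variable with n categories as n-1 dummy variables.

    `reference` is the control group: it gets 0 in every dummy column.
    Returns the list of non-reference levels and one row of 0/1 values per input.
    """
    # Every category except the control group gets its own 0/1 column.
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical land-use labels for five sampling sites (illustrative only).
land_use = ["orchard", "dryland", "paddy", "orchard", "forest"]
levels, rows = dummy_code(land_use, reference="forest")

print("dummy columns:", levels)
for site, row in zip(land_use, rows):
    print(site, row)
```

With four categories, the sketch produces three dummy columns, and the control group is the only category coded as all zeros.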
Results of regression analyses showed that the sandSLR content was positively related to aquic soil and negatively
related to paddy and dryland; the siltSLR content was positively related to bulk density; and the claySLR content
was negatively related to aquic soil and brown soil. There was a significant linear relationship between siltSLR
and the related auxiliary variables (P<0.05) and an extremely significant linear relationship between sandSLR and
claySLR and the related auxiliary variables (P<0.01), which indicated that the selected auxiliary variables could
explain the variability of sandSLR, siltSLR and claySLR to some extent.
We calculated the regression values and residuals of sandSLR, siltSLR and claySLR using eq. (1). Semivariogram
models were obtained using the ArcGIS GA function modules (ESRI 2010), and the models with the minimal residual sum
were selected as the best fitting models. In order to better describe the spatial distribution of soil texture,
anisotropy and trend parameters were obtained by taking the semivariogram model and the interpolation process into
account.
Fig. 1 and Table 2 show the semivariogram models and corresponding parameters of sandSLR, siltSLR, claySLR, and
their residuals. The nugget to sill ratio C0/(C0+C1) shown in Table 2 expresses the share of the total spatial
heterogeneity that arises from random components (Cambardella et al. 1994; Goovaerts 1997). The C0/(C0+C1) ratios
of siltSLR, claySLR and siltSLR residuals were smaller than 50% (Table 2), demonstrating that their spatial
heterogeneities were mainly caused by systematic variability, while the C0/(C0+C1) ratios for sandSLR and claySLR
residuals were larger than 50% (Table 2), which demonstrated that their spatial heterogeneities were mainly caused
by random components. The spatial correlations of sandSLR, siltSLR and claySLR were stronger than those of the
corresponding residuals. In the east (E)-west (W) direction, the semivariograms showed spatial correlation ranges
of 15.84, 19.30 and 13.60 km for sandSLR, siltSLR and claySLR, respectively, which were larger than or equal to
those in the north (N)-south (S) direction, while the ranges in the E-W direction were almost all smaller than
those in the N-S direction for the sandSLR, siltSLR and claySLR residuals, indicating that the spatial variability
of sandSLR, siltSLR and claySLR caused by stochastic factors was stronger than that of their corresponding
residuals. The C0/(C0+C1) ratio of siltSLR was the smallest (C0/(C0+C1)=36.36%), and the C0/(C0+C1) ratio of
sandSLR residual was the largest (C0/(C0+C1)=68.29%). The C0/(C0+C1) ratios of the various types of data were
between 25 and 75%. Therefore, the various types of data had a medium spatial correlation, which is in agreement
with the decreasing trend in range.
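As a rough check on the nugget-to-sill ratios discussed above, the following Python sketch recomputes C0/(C0+C1) from the C0 and C1 values listed in Table 2 and evaluates a spherical semivariogram, the model type named in the table footnote. The function names are illustrative. The "strong" and "weak" labels outside the 25-75% band follow the commonly used Cambardella-style classification and are an assumption here, since the text only states that 25-75% indicates medium spatial correlation.

```python
def nugget_ratio(c0, c1):
    """Nugget-to-sill ratio C0/(C0+C1), in percent."""
    return 100.0 * c0 / (c0 + c1)


def spherical_semivariogram(h, c0, c1, a):
    """Spherical model with nugget c0, autocorrelated variance c1 and range a."""
    if h <= 0:
        return 0.0
    if h >= a:
        return c0 + c1  # the sill is reached at the range
    r = h / a
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)


def dependence_class(ratio_percent):
    """25-75% is 'medium' as in the text; the outer labels are an assumption."""
    if ratio_percent < 25.0:
        return "strong"
    if ratio_percent <= 75.0:
        return "medium"
    return "weak"


# (C0, C1) pairs as listed in Table 2.
TABLE2 = {
    "sandSLR": (0.040, 0.048),
    "siltSLR": (0.012, 0.021),
    "claySLR": (0.041, 0.047),
    "sandSLR residual": (0.029, 0.056),
    "siltSLR residual": (0.014, 0.019),
    "claySLR residual": (0.047, 0.026),
}

for name, (c0, c1) in TABLE2.items():
    ratio = nugget_ratio(c0, c1)
    print(f"{name}: {ratio:.2f}% ({dependence_class(ratio)})")
```

The recomputed ratios can be compared line by line with the C0/(C0+C1) column of Table 2.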
Fig. 1 Semivariogram models of different soil particles and corresponding parameters. The blue line is the fitted model that provides
the best fit through the points, i.e., the line for which the weighted squared difference between each point and the line is as small
as possible. Binned values are shown as red dots and are generated by grouping (binning) empirical semivariogram points together using
square cells that are one lag wide. Average points are shown as blue crosses and are generated by binning empirical semivariogram points
that fall within angular sectors. Binned points show local variation in the semivariogram values, whereas average values show smooth
variation in the semivariogram values. sandSLR, siltSLR and claySLR represent the values of sand, silt and clay transformed by SLR.
Prediction of spatial distribution of soil texture using CK method
Compositional kriging is introduced as a straightforward extension of ordinary kriging (OK) that complies with the
constant sum and nonnegativity constraints. The CK procedure utilized in this article is the John T version, which
contains six buttons and five edit boxes. For details about CK see
Table 2 Semivariogram models of different soil particle compositions and corresponding parameters1)
Variables           C0      C1      C0/(C0+C1) (%)   Major range (km)   Minor range (km)   phi (°)
SandSLR             0.040   0.048   45.45            15.84              15.84              0
SiltSLR             0.012   0.021   36.36            19.30              10.21              19.16
ClaySLR             0.041   0.047   46.59            13.60              13.60              0
SandSLR residual    0.029   0.056   34.12            19.69              18.69              43.06
SiltSLR residual    0.014   0.019   42.42            10.40              20.16              111.45
ClaySLR residual    0.047   0.026   64.38            4.67               7.00               14.06
1) C0 is the nugget variance; C1 is the autocorrelated variance; C0/(C0+C1) is the nugget (C0) to sill (C0+C1) ratio (%); phi (°) is the
angle of anisotropy, i.e., the angle between the major axis of the ellipse and the North, taken in clockwise direction (range: 0-180
degrees). All fitted models used here were spherical models.
Str, number of variogram structures; Model, 1=Spherical, 2=Exponential, 3=Linear with sill, 4=Gaussian; smj1, parameter α of structure 1 in major direction; smn1,
parameter α of structure 1 in minor direction; smj, the major search radius; smn, the minor search radius; Min, the minimum number of conditioning points; Max, the
maximum number of conditioning points within the search ellipse.
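The accuracy measures used in this paper (RMSE, RI, MSDR and Aitchison’s distance) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: RI is taken as the percentage reduction in RMSE relative to a reference method, MSDR as the mean of squared errors divided by the kriging variance, and Aitchison’s distance as the Euclidean distance between centred log-ratio vectors. All function names and the sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error over the validation points."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_improvement(rmse_reference, rmse_evaluated):
    """Percentage reduction in RMSE relative to a reference method (assumed form)."""
    return 100.0 * (rmse_reference - rmse_evaluated) / rmse_reference


def msdr(observed, predicted, kriging_variance):
    """Mean of squared errors scaled by the theoretical kriging variance (assumed form)."""
    n = len(observed)
    return sum((o - p) ** 2 / v
               for o, p, v in zip(observed, predicted, kriging_variance)) / n


def centred_logratio(composition):
    """Log of each part minus the mean log; all parts must be strictly positive."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the centred log-ratio vectors of two compositions."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(centred_logratio(x), centred_logratio(y))))


# Hypothetical sand/silt/clay percentages at one validation point (illustrative only).
observed_texture = [60.0, 30.0, 10.0]
predicted_texture = [55.0, 33.0, 12.0]
print("Aitchison distance:", aitchison_distance(observed_texture, predicted_texture))
print("RI (%):", relative_improvement(2.0, 1.0))
```

Because the distance is computed on log-ratios, it does not change when a composition is rescaled, for example from percentages to fractions.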
accuracy of the RK-SLR and CK methods are improved. Specifically, the relative improvement values of RMSE of sand,
silt and clay for the CK approach reached 46.64, 45.89 and 7.83%, while the relative improvement values of RMSE of
sand, silt and clay for the RK-SLR method reached 13.06, 45.75 and 6.17% (Fig. 2).
Aitchison’s distance (DA) was computed between the predicted ẑ(xi) and the observed z(xi) for all validation
points xi. The scatter plots of DA for the different prediction methods are shown in Fig. 3. A one-tailed paired
difference t test showed that the null hypothesis of no difference between the average DA for the CK and RK-SLR
methods and that for the OK-SLR method should be rejected (pCK-(OK-SLR)=0.042<0.05; pRK-(OK-SLR)=0.048<0.05),
whereas the null hypothesis of no difference between the average DA for the CK and RK-SLR methods should not be
rejected (p(RK-SLR)-CK=0.564>0.05). It can be concluded that predictions obtained with the CK and RK-SLR methods
were more accurate than those obtained with the OK-SLR approach, while predictions obtained with the CK method were
not significantly different from those obtained with the RK-SLR method. The spatial variability of soil texture is
not an independent process; it is correlated with other soil properties. If the auxiliary variables explain the
various soil particle types sufficiently, the prediction results will be satisfactory. This result can also be
found in other studies (Chai et al. 2008; Zhang et al. 2012).
CONCLUSION
The prediction methods utilized in this article enable the interpolation results to satisfy the four requirements
for spatial interpolation of compositional data. By comparison of RMSE, RI and MSDR of the various prediction
methods, CK was more appropriate for soil texture, and its prediction accuracy and model fitting effect for
compositional data were better. Values of RMSE, RI and MSDR of sand for the CK method were 1.73, 46.64% and 6.38,
respectively; values of RMSE, RI and MSDR of silt for the CK method were 1.71, 45.89% and 3.55, respectively;
values of RMSE, RI and MSDR of clay for the CK method were 0.62, 7.83% and 8.88, respectively. Scatter plots of DA
showed that predictions obtained with the CK and RK-SLR methods were more accurate than those obtained with the
OK-SLR approach, while predictions obtained with the CK method were not significantly different from those obtained
with the RK-SLR method. The CK method interpolates soil texture directly and is an unbiased predictor that minimizes
the prediction error variance and complies fully with the nonnegativity and constant sum constraints of
compositional data.
Obviously, if the active inequality constraints were known in advance, the solution of the compositional kriging
system would be rather straightforward. Wismer and Chattergy (1978) provided an efficient iterative algorithm to
find these active constraints. This algorithm, known as the method of Theil and van de Panne (1960), starts with
solving the compositional kriging system with all inequality constraints removed. Its solution is optimal if no
inequality constraints are violated. Otherwise, combinations of the violated inequality constraints are added
iteratively as equality constraints to the CK system until the optimal solution is obtained.
The spatial variability of soil texture is not an independent process. The introduction of auxiliary variables
(especially categorical variables) can better explain the spatial variability of soil texture and give satisfactory
prediction results.
MATERIALS AND METHODS
Study area
The study was conducted in the plain area of Fangshan District with an area of 805 km² (39°30´-39°50´N and
115°41´-116°14´E), located in the southeast of Beijing City (Fig. 4). In the study area, the topography slopes
slightly from the southwest to the northeast with the relative elevation varying between 27 and 390 m. Orchards and
arable land are the main types of land use (Fig. 5). The soil types include brown soil, cinnamon soil, and
fluvo-aquic soil, of which cinnamon soil and fluvo-aquic soil are dominant, occupying 85.47% of the total study
area.
Fig. 4 The map of the study position, sampling sites and elevation.
Fig. 5 The map of land use types and parent materials for the study area.
Data collection and analysis
Soil samples were collected in August 2010. The longitude and latitude of each sampling site were recorded using a
global positioning system receiver. For a specific site, three to five soil samples were collected from the 0-20 cm
layer within a diameter of 10 m surrounding the sampling location and then mixed thoroughly. A total of 1.5 kg of
soil per sampling site was taken from the mixed samples to perform chemical analysis based on the quartering
method. The samples were air-dried and ground to pass a 2-mm sieve. SOM content was determined using the potassium
dichromate wet combustion procedure (NSS 1995). Soil particles were measured using laser grain analyzing equipment
(Mastersizer 2000, Malvern Instruments Ltd., UK), and soil texture was classified based on the International
system. Soil bulk density was calculated by dividing the mass of the oven-dried soil (105°C) by the core volume. We
obtained soil types, types of parent material, land use types and elevation of sampling points from maps of soil
types, land use types in 2007 and DEM using the extraction function of the ArcGIS 10.0 platform. To validate the
performance of the different prediction methods, the data were randomly split into 220 sites as a prediction set
and 52 sites as a validation set (Fig. 4).
Methods
equation is as follows:
(2)
After performing semivariogram analysis and interpolation on the SLR-transformed data, the predicted results are
backtransformed via the antisymmetric logratio transform:
(3)
where, for sampling site i and the jth kind of soil particle, the untransformed quantity is the relative content
and the transformed quantity is its SLR value; the additive constant is taken as 1/2 of the smallest nonzero
percentage of the jth kind of soil particle in the study area; and k is the number of components.
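The SLR transform and its back-transform described above can be illustrated with the following Python sketch. It assumes that the symmetry logratio transform corresponds to the centred log-ratio form (the log of each part minus the mean log over the k parts) and that the small constant is simply added to each part before taking logs; eqs. (2) and (3) are not reproduced here, so both choices are assumptions. The function names and sample values are hypothetical.

```python
import math


def slr_transform(composition, delta=None):
    """Centred log-ratio style transform of one composition (assumed form of SLR).

    `delta` is an optional list of small per-component constants added before
    taking logs, so that zero parts can be handled.
    """
    if delta is None:
        delta = [0.0] * len(composition)
    shifted = [part + d for part, d in zip(composition, delta)]
    logs = [math.log(value) for value in shifted]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def slr_backtransform(values, total=100.0):
    """Map transformed values back to positive parts that sum to `total`."""
    exps = [math.exp(value) for value in values]
    scale = total / sum(exps)
    return [scale * e for e in exps]


# Hypothetical sand/silt/clay percentages at one site (illustrative only).
texture = [60.0, 30.0, 10.0]
transformed = slr_transform(texture)
restored = slr_backtransform(transformed)
print("transformed:", transformed)
print("restored:", restored)

# A composition with a zero part needs the additive constant to be transformable.
with_zero = slr_transform([70.0, 30.0, 0.0], delta=[0.0, 0.0, 0.5])
print("zero part handled:", with_zero)
```

Whatever values the interpolation produces in transformed space, the back-transform returns strictly positive parts with a constant sum, which is how logratio-based kriging satisfies the two constraints.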
(5)
Where σk² is the estimation variance of the kth component of z(xi), Wk is the kth column of weight W, Ck is the n×n
matrix containing the covariances between the data points for component k, and dk is the vector of dimension n
containing the covariances between the data points and the prediction point for component k. This constrained
optimization problem can be converted into an unconstrained one by adding the unbiasedness constraint, with a
Lagrange multiplier for component k, to the objective function. The obtained objective function, i.e., the
Lagrangian, can be minimized by setting its partial first derivatives with respect to the weights and the Lagrange
multiplier equal to zero (Walvoort and de Gruijter 2001). This results in the ordinary kriging system:
(6)
(10)
Where α* is a Lagrange multiplier. These results are called the Kuhn-Tucker stationary conditions (Wismer and
Chattergy 1978). The inequality constraint is said to be active if α <0 and consequently f(x)>0 and x=0. On the
other hand, it is inactive if α=0 and f(x)=0. Hence, active inequality constraints can be considered as equality
constraints, whereas inactive inequality constraints can be left out of consideration. Analogously, the Kuhn-Tucker
conditions for the compositional kriging optimization problem are given by:
(11)
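The ordinary kriging system that results from setting the derivatives of the Lagrangian to zero can be illustrated for a single component with the Python sketch below. It is a sketch under assumptions, not the CK procedure used in the paper: it uses an isotropic spherical covariance with the sandSLR parameters listed in Table 2, writes the system as C w + μ 1 = d with the weights summing to one (the sign convention for the multiplier μ varies between texts), and solves it with a small Gaussian elimination routine. The point coordinates and all function names are hypothetical.

```python
import math


def spherical_covariance(h, c0, c1, a):
    """Covariance implied by a spherical semivariogram (sill minus semivariance)."""
    if h == 0:
        return c0 + c1
    if h >= a:
        return 0.0
    r = h / a
    return c1 * (1.0 - 1.5 * r + 0.5 * r ** 3)


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ordinary_kriging_weights(points, target, c0, c1, a):
    """Solve C w + mu 1 = d subject to sum(w) = 1; return (weights, mu)."""
    def cov(p, q):
        return spherical_covariance(math.hypot(p[0] - q[0], p[1] - q[1]), c0, c1, a)

    n = len(points)
    system = [[cov(p, q) for q in points] + [1.0] for p in points]
    system.append([1.0] * n + [0.0])  # unbiasedness constraint row
    rhs = [cov(p, target) for p in points] + [1.0]
    solution = solve_linear_system(system, rhs)
    return solution[:n], solution[n]


# Hypothetical data locations in km; parameters for sandSLR as listed in Table 2.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 6.0)]
weights, multiplier = ordinary_kriging_weights(points, (1.0, 1.0), 0.040, 0.048, 15.84)
print("weights:", weights, "sum:", sum(weights))
```

Compositional kriging adds the nonnegativity and constant sum requirements on the predicted composition to this system; the sketch stops at the unconstrained ordinary kriging step.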
and Random Forest models. Geoderma, 170, 70-79.
Lilburne L R, Webb T H. 2002. Effect of soil variability, within and between soil taxonomic units, on simulated nitrate leaching under arable farming, New Zealand. Australian Journal of Soil Research, 40, 1187-1199.
Martin-Fernandez J A, Barcelo-Vidal C, Pawlowsky-Glahn V. 1998. Measures of difference for compositional data and hierarchical clustering methods. In: Buccianti A, Nardi G, Potenza R, eds., Proceedings of IAMG’98. Italy. pp. 526-539.
McBratney A B, de Gruijter J J, Brus D J. 1992. Spatial prediction and mapping of continuous soil classes. Geoderma, 54, 39-64.
McBratney A B, Mendonca Santos M L, Minasny B. 2003. On digital soil mapping. Geoderma, 117, 3-52.
McBratney A B, Odeh I O A, Bishop T F A, Dunbar M S, Shatar T M. 2000. An overview of pedometric techniques for use in soil survey. Geoderma, 97, 293-327.
Meul M, Meirvenne M V. 2003. Kriging soil texture under different types of nonstationarity. Geoderma, 112, 217-233.
NSS (National Soil Survey Office). 1995. Chinese Soil Genus Records. vol. 1-6. China Agriculture Press, Beijing. (in Chinese)
Odeh I O A, McBratney A B, Chittleborough D J. 1995. Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression-kriging. Geoderma, 67, 215.
Pang S, Li T X, Wang Y D, Yu H Y, Li X. 2009. Spatial interpolation and sample size optimization for soil copper (Cu) investigation in cropland soil at county scale using cokriging. Agricultural Sciences in China, 8, 1369-1377. (in Chinese)
Simbahan G C, Dobermann A, Goovaerts P, Ping J, Haddix M L. 2006. Fine-resolution mapping of soil organic carbon based on multivariate secondary data. Geoderma, 132, 471-489.
SPSS Institute. 2012. SPSS Software. ver. 20. SPSS, New York, Armonk.
Sumfleth K, Duttmann D. 2008. Prediction of soil property distribution in paddy soil landscapes using terrain data and satellite information as indicators. Ecological Indicators, 8, 485-501.
Tan M Z, Mi S X, Li K L, Chen J. 2009. Influences of different interpolation methods on spatial prediction of compositional data - A case of fuzzy membership values of soil continuous classification. Soils, 41, 998-1003. (in Chinese)
Theil H, van De Panne C. 1960. Quadratic programming as an extension of classical quadratic maximization. Management Science, 7, 1-20.
Walvoort D J J, de Gruijter J J. 2001. Compositional kriging: A spatial interpolation method for compositional data. Mathematical Geology, 33, 951-966.
Wismer D A, Chattergy R. 1978. Introduction to Nonlinear Optimization: A Problem Solving Approach. Elsevier North-Holland, Amsterdam, The Netherlands. p. 395.
Zhang S W, Huang Y F, Shen C Y, Ye H C, Du Y C. 2012. Spatial prediction of soil organic matter using terrain indices and categorical variables as auxiliary information. Geoderma, 171-172, 35-43.
Zhang S W, Wang S T, Liu N, Ye H C, Huang Y F. 2011. Comparison of spatial prediction method for soil texture. Transactions of the Chinese Society of Agricultural Engineering, 27, 333-339. (in Chinese)
Zhao Z, Chow T L, Rees H W, Yang Q, Xing Z, Meng F R. 2009. Predict soil texture distributions using an artificial neural network model. Computers and Electronics in Agriculture, 65, 36-48.