Fig. 2 a Details of SW, SW1 and SW2. b Details of SW, SW3 and SW4. c Details of SW, SW5 and SW6 (SWCCs plotted against suction in kPa on a logarithmic axis)
Geotech Geol Eng
that experts dealing with unsaturated soils need to be cautious in selecting an appropriate SWCC for seepage modeling in order to obtain realistic results.
2.2 Data Preparation for Training MGGP Model
A total of 168 data samples are generated from the FEM analysis discussed above. Each sample comprises five input variables, namely AEV (x1), RWC (x2), hs (x3), slope (x4) and depth (x5), and one output variable, PWP (y). The first 120 samples are chosen as the training data and the remaining 48 as the test data. The training samples include the values of PWP at RWC values of 0.04, 0.05 and 0.06, whereas the testing samples include the values of PWP at RWC values of 0.07 and 0.09. Because the test RWC values lie outside the range covered by the training data, the test samples are used to assess the extrapolation ability of the MGGP model, while only the training data are used to formulate the model.
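The split described above can be sketched as follows. This is a minimal illustration assuming the 168 samples are stored row-wise in a NumPy array with the inputs x1 to x5 in the first five columns and PWP in the last; that layout, the function name and the random placeholder data are hypothetical, not the study's files.

```python
import numpy as np

N_SAMPLES = 168   # total FEM-generated samples
N_TRAIN = 120     # the first 120 samples form the training set


def split_train_test(data: np.ndarray, n_train: int = N_TRAIN):
    """Split rows into (X_train, y_train, X_test, y_test).

    Assumed (hypothetical) layout: columns 0-4 hold the inputs
    x1..x5 (AEV, RWC, hs, slope, depth); column 5 holds PWP (y).
    """
    X, y = data[:, :5], data[:, 5]
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


# Placeholder data standing in for the FEM results (not the study's data).
rng = np.random.default_rng(0)
data = rng.normal(size=(N_SAMPLES, 6))
X_train, y_train, X_test, y_test = split_train_test(data)
print(X_train.shape, X_test.shape)  # (120, 5) (48, 5)
```

Keeping the split positional (rather than random) mirrors the text, where the ordering of the samples places the higher RWC values in the test set.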
3 Multi-gene Genetic Programming
To give an idea of how the evolutionary approach MGGP works, GP is discussed first. GP evolves models from the collected experimental data automatically, without any pre-definition of the model structure (Koza 1996). The mechanism of GP is similar to that of genetic algorithms (GAs). The main difference between them is that the latter evolves solutions represented by
Table 1 Details of SWCC variations used in this study
SWCC | hs | AEV (kPa) | RWC | Slope (linear) | hs variation w.r.t. SW (%) | AEV variation w.r.t. SW (%) | RWC variation w.r.t. SW (%) | Slope variation w.r.t. SW (%)
SW | 0.35 | 0.695 | 0.05 | 0.020 | 0 | 0 | 0 | 0
SW1 | 0.40 | 0.695 | 0.06 | 0.023 | 14 | 0 | 0 | 15
SW2 | 0.45 | 0.695 | 0.07 | 0.026 | 29 | 0 | 0 | 30
SW3 | 0.35 | 1.110 | 0.06 | 0.012 | 0 | 60 | 0 | -35
SW4 | 0.35 | 1.530 | 0.06 | 0.009 | 0 | 120 | 0 | -55
SW5 | 0.45 | 0.903 | 0.07 | 0.020 | 29 | 30 | 0 | 0
SW6 | 0.55 | 1.112 | 0.09 | 0.019 | 57 | 60 | 0 | -5
Fig. 3 Variation of PWP with depth for SW1 and SW2 (PWP in kPa plotted against depth in m; curves SW, SW1 and SW2)
Fig. 4 Variation of PWP with depth for SW3 and SW4 (PWP in kPa plotted against depth in m; curves SW, SW3 and SW4)
strings of fixed length in real or binary form, whereas the former evolves models represented by tree structures of different sizes (Garg et al. 2014a, b).
In GP, the models are generated by randomly combining elements from the user-defined functional and terminal sets. The ramped half-and-half algorithm is applied to generate an initial population of trees with a wide range of shapes and sizes. The functional set F consists of elements such as the basic arithmetic operations (+, -, ×, /, etc.). The terminal set T is defined by the input variables and the range of constants considered in the study. The number of models generated is given by the population size. An example of such a model is shown in Fig. 6. After the models are initialised, their performance is evaluated using a user-defined fitness function. The fitness function commonly used is the root mean square error (RMSE), given by
RMSE = \sqrt{\frac{\sum_{i=1}^{N} \left| G_i - A_i \right|^2}{N}} \times 100    (1)
where G_i is the value predicted by the MGGP model for the ith data sample, A_i is the actual value of the ith data sample and N is the number of training samples.
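The RMSE fitness of Eq. (1) can be computed as below. This is a minimal NumPy sketch; the function name is illustrative, and the `scale` argument is an assumption added to accommodate the factor of 100 that appears in Eq. (1), with `scale=1` giving the conventional RMSE.

```python
import numpy as np


def rmse(predicted, actual, scale: float = 1.0) -> float:
    """Root mean square error between model predictions G_i and actual values A_i.

    `scale` is an illustrative multiplier (e.g. 100 to mirror the factor
    shown in Eq. 1); scale=1 gives the conventional RMSE.
    """
    g = np.asarray(predicted, dtype=float)
    a = np.asarray(actual, dtype=float)
    if g.shape != a.shape:
        raise ValueError("predicted and actual must have the same shape")
    return scale * float(np.sqrt(np.mean((g - a) ** 2)))


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3), about 1.1547
```

Because lower RMSE means a better fit, GP selection operators treat this value as a quantity to minimise.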
Based on their fitness values, the models are ranked and selected for genetic operations such as crossover, mutation and reproduction to form a new population. In crossover, a branch is randomly selected from each of the two parent trees, and the two branches are swapped. In mutation, a randomly selected node of a tree is replaced by a newly generated random branch. New populations (generations) continue to be produced until the termination criterion is met. The termination criterion, set by the user, is reaching either the maximum number of generations or the threshold error of the model, whichever occurs first.
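The crossover and mutation operations described above can be sketched on expression trees in Python. The nested-tuple tree representation, the helper names and the small function set are hypothetical choices for illustration, not GPTIPS internals; depth limits after crossover, which a real implementation would enforce, are not applied here.

```python
import random

# Hypothetical representation: a tree is either a terminal (str variable
# name or float constant) or a tuple (function_name, child, child, ...).
FUNCTIONS = {"plus": 2, "minus": 2, "times": 2, "tanh": 1}
TERMINALS = ["x1", "x2", "x3", "x4", "x5"]


def paths(tree, path=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, path + (i,))


def get(tree, path):
    """Return the subtree found at `path`."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def random_tree(depth, rng):
    """Grow a random tree no deeper than `depth` (grow-style generation)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return rng.choice(TERMINALS)
        return round(rng.uniform(-10, 10), 3)   # random constant in [-10, 10]
    name = rng.choice(list(FUNCTIONS))
    return (name,) + tuple(random_tree(depth - 1, rng) for _ in range(FUNCTIONS[name]))


def crossover(parent1, parent2, rng):
    """Swap a randomly chosen branch of parent1 with one of parent2."""
    p1 = rng.choice(list(paths(parent1)))
    p2 = rng.choice(list(paths(parent2)))
    return (replace(parent1, p1, get(parent2, p2)),
            replace(parent2, p2, get(parent1, p1)))


def mutate(tree, rng, max_depth=3):
    """Replace a randomly chosen node with a newly generated random branch."""
    p = rng.choice(list(paths(tree)))
    return replace(tree, p, random_tree(max_depth, rng))


rng = random.Random(42)
a = ("plus", "x1", ("tanh", "x2"))
b = ("times", "x3", 2.5)
child1, child2 = crossover(a, b, rng)
print(child1)
print(child2)
print(mutate(a, rng))
```

Immutable tuples make each operator return new trees, so parents selected several times in one generation are never modified in place.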
In the MGGP algorithm, each model in the evolutionary stage is formed by combining a set of genes (GP trees). The MGGP algorithm has numerous applications in the fields of engineering and finance (Garg and Tai 2011, 2012a, b, 2013a, b, c; Garg et al. 2013a, b, c, d, e). The step-by-step procedure of the MGGP algorithm is as follows:
BEGIN
Step 1: Define problem
Step 2: MGGP algorithm
Begin
2.a Define parameters such as population size, number of generations, terminal set, functional set, maximum number of genes, maximum depth, etc.
2.b Generate an initial population of genes
2.c Combine genes using the least squares method to form MGGP models
2.d Evaluate the performance of the models based on the fitness function, namely RMSE
2.e Apply genetic operations and form the new population
2.f Cross-check the models' performance against the termination criterion and, if it is not satisfied, GOTO Step 2.c
End;
END;
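Step 2.c is what distinguishes MGGP from standard GP: the outputs of the individual genes are combined linearly, with a bias term and gene weights fitted by ordinary least squares. A minimal NumPy sketch follows; the function names and the two example genes are hypothetical, and in GPTIPS this weighting is carried out internally.

```python
import numpy as np


def combine_genes(gene_outputs, y):
    """Least-squares weights for an MGGP model y = b0 + b1*g1 + ... + bk*gk.

    gene_outputs: array of shape (N, k); column j is the output of gene j
    on the N training samples. Returns (bias, weights).
    """
    G = np.asarray(gene_outputs, dtype=float)
    design = np.column_stack([np.ones(G.shape[0]), G])   # prepend a bias column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1:]


def predict(gene_outputs, bias, weights):
    """Evaluate the weighted sum of gene outputs plus bias."""
    return bias + np.asarray(gene_outputs, dtype=float) @ weights


# Two hypothetical genes, g1 = tanh(x5) and g2 = x2 * x5, on synthetic inputs.
rng = np.random.default_rng(0)
x2, x5 = rng.uniform(0.04, 0.09, 50), rng.uniform(0.0, 8.0, 50)
genes = np.column_stack([np.tanh(x5), x2 * x5])
y = 3.0 + 2.0 * genes[:, 0] - 5.0 * genes[:, 1]   # noise-free synthetic target
bias, weights = combine_genes(genes, y)
print(round(bias, 3), np.round(weights, 3))  # expected to recover 3.0 and [2.0, -5.0]
```

Since the weights are solved for exactly at every evaluation, evolution only has to discover useful gene structures, not their coefficients.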
Fig. 5 Variation of PWP with depth for SW5 and SW6 (PWP in kPa plotted against depth in m; curves SW, SW5 and SW6)
Fig. 6 GP model: 3 tanh(x) + (4 + y), shown as a tree of functions and terminals
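A GP tree such as the one in Fig. 6 is evaluated recursively on each data sample before its fitness can be computed. A minimal Python sketch follows; the nested-tuple encoding and function names are hypothetical, and it reads the figure's expression as 3 tanh(x) + (4 + y) with an explicit multiplication node, which is an assumption about the extracted figure.

```python
import math

# Hypothetical nested-tuple encoding of the Fig. 6 tree, read as
# 3*tanh(x) + (4 + y); the explicit "times" node is an assumption.
FIG6_TREE = ("plus", ("times", 3.0, ("tanh", "x")), ("plus", 4.0, "y"))

OPS = {
    "plus": lambda a, b: a + b,
    "minus": lambda a, b: a - b,
    "times": lambda a, b: a * b,
    "tanh": math.tanh,
}


def evaluate(tree, env):
    """Recursively evaluate a tree; terminals are variable names or constants."""
    if isinstance(tree, tuple):
        op, *children = tree
        return OPS[op](*(evaluate(c, env) for c in children))
    if isinstance(tree, str):
        return env[tree]          # variable terminal, looked up by name
    return float(tree)            # constant terminal


print(evaluate(FIG6_TREE, {"x": 0.0, "y": 1.0}))  # 3*tanh(0) + (4 + 1) = 5.0
```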
3.1 Implementation of MGGP
The evolutionary search in GP for a generalised model is highly influenced by its parameter settings. A few important parameters need to be set properly for the evolved model to attain the desired generalisation ability. In the present work, the parameter settings are selected by trial and error; they are shown in Table 2. The function set F consists of several non-linear mathematical functions and arithmetic operators. A function set with many elements is chosen because it can assist the evolutionary search over a broader variety of non-linear mathematical models. The population size and the number of generations represent, respectively, the number of models in each population and the number of new populations formed by the genetic operations. Both depend considerably on the complexity of the data. Based on the literature review by Garg and Tai (2012b), the population size and number of generations should not be too high when the number of data samples is large, in order to avoid over-fitting. The parameters that influence the size of the search space and the number of models searched within it are the maximum number of genes and the maximum depth of a gene. Based on trial and error and the recommendations of Garg and Tai (2012b), the maximum number of genes and the maximum depth of a gene are kept at 6 and 6, respectively.
The MGGP method for predicting the pore water pressure of soil is implemented in MATLAB R2010b using the software GPTIPS (Searson et al. 2010). GPTIPS is a new genetic programming and symbolic regression code for use with MATLAB, written on the basis of multigene GP (Hinchliffe et al. 1996). The MGGP method is applied to the data set obtained from the FEM analysis in Sect. 2. The best model is selected as the one with the minimum RMSE on the training data over all runs. The performance of the best MGGP model (see Eq. 2) on the training and testing data is discussed in Sect. 4.
MGGP = 1357.0099 0737.0697
sin(tanh(square(tanh(x5)))) 973.9708
tanh(exp(tanh(x5))) 2.8049
exp(tanh(x3)) square(exp(tanh(x5)))
exp(tanh(x3))
x2 x5 598.4632
ppower(ppower(x2,tanh(x5)),x5
0.0001063 ppower(x5,(cos(x5))
x5))) + (48.5523 (sin(square(exp(tanh(x5))))    (2)
4 Results and Discussion
The results obtained from the MGGP model on the training and testing data are shown in Figs. 7, 8 and 9. The square of the correlation coefficient (R²) and the relative error (%) between the predicted and actual values of PWP are given by
R^2 = \left( \frac{\sum_{i=1}^{n} (A_i - \bar{A})(M_i - \bar{M})}{\sqrt{\sum_{i=1}^{n} (A_i - \bar{A})^2 \sum_{i=1}^{n} (M_i - \bar{M})^2}} \right)^2    (3)
Relative error (%) = \frac{\left| M_i - A_i \right|}{A_i} \times 100    (4)
where M_i and A_i are the predicted and actual values, respectively, \bar{M} and \bar{A} are the average values of the predicted and actual values, respectively, and n is the number of samples in the data set being evaluated.
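Eqs. (3) and (4) translate directly into NumPy. In this sketch the function names are illustrative, and dividing by |A_i| rather than A_i in the relative error is an assumption made so that negative PWP values (suction) still yield non-negative percentages; A_i must be non-zero.

```python
import numpy as np


def r_squared(actual, predicted) -> float:
    """Square of the correlation between actual A_i and predicted M_i (Eq. 3)."""
    a = np.asarray(actual, dtype=float)
    m = np.asarray(predicted, dtype=float)
    da, dm = a - a.mean(), m - m.mean()
    r = np.sum(da * dm) / np.sqrt(np.sum(da ** 2) * np.sum(dm ** 2))
    return float(r ** 2)


def relative_error_percent(actual, predicted) -> np.ndarray:
    """Per-sample relative error |M_i - A_i| / |A_i| * 100 (cf. Eq. 4).

    Using |A_i| in the denominator is an assumption for negative PWP values.
    """
    a = np.asarray(actual, dtype=float)
    m = np.asarray(predicted, dtype=float)
    return np.abs(m - a) / np.abs(a) * 100.0


# Hypothetical PWP values in kPa, not the study's data.
actual = np.array([-40.0, -20.0, 10.0, 30.0])
predicted = np.array([-39.0, -21.0, 10.5, 29.4])
print(r_squared(actual, predicted))
print(relative_error_percent(actual, predicted))
```

Because R² defined this way is a squared correlation, it does not penalise a constant offset between predictions and actual values; reporting the relative error alongside it covers that gap.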
Figures 7 and 8 show the performance of the MGGP model on the training and testing data, respectively, in terms of R². Figure 7 indicates that the MGGP model has learned the non-linear relationship between the input and output process variables well, with high R² values. The result of the testing phase shown in Fig. 8 indicates that the MGGP model has good generalisation ability.
Table 2 Parameter settings for MGGP
Parameters | Values assigned
Runs | 15
Population size | 400
Number of generations | 100
Tournament size | 2
Max depth of tree | 6
Max genes | 7
Functional set (F) | Multiply, plus, minus, plog, tan, tanh, sin, cos
Terminal set (T) | (x1, x2, [-10, 10])
Crossover probability rate | 0.85
Reproduction probability rate | 0.10
Mutation probability rate | 0.05
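The selection-related settings in Table 2 can be illustrated with a short Python sketch of tournament selection (size 2, lower RMSE wins) and of choosing a genetic operation with the listed probabilities. The values come from Table 2, but the code, the function names and the placeholder models are hypothetical and do not reproduce GPTIPS itself.

```python
import random

# Values from Table 2; the selection code itself is an illustrative sketch.
TOURNAMENT_SIZE = 2
P_CROSSOVER, P_REPRODUCTION, P_MUTATION = 0.85, 0.10, 0.05


def tournament_select(population, fitness, rng, k=TOURNAMENT_SIZE):
    """Pick k distinct individuals at random and return the one with the lowest RMSE."""
    contestants = rng.sample(range(len(population)), k)
    best = min(contestants, key=lambda i: fitness[i])
    return population[best]


def choose_operator(rng):
    """Choose a genetic operation with the Table 2 probabilities (they sum to 1)."""
    r = rng.random()
    if r < P_CROSSOVER:
        return "crossover"
    if r < P_CROSSOVER + P_REPRODUCTION:
        return "reproduction"
    return "mutation"


rng = random.Random(0)
population = ["model_a", "model_b", "model_c", "model_d"]   # placeholder models
rmse_values = [4.0, 1.5, 3.2, 2.7]                          # hypothetical fitness values
parent = tournament_select(population, rmse_values, rng)
counts = {"crossover": 0, "reproduction": 0, "mutation": 0}
for _ in range(10_000):
    counts[choose_operator(rng)] += 1
print(parent, counts)
```

A small tournament size such as 2 keeps selection pressure low, which helps preserve diversity in the population.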
The box plot of the relative error (%) of the MGGP model on the training and testing data is shown in Fig. 9. It indicates that the MGGP model has low mean relative errors of 2.18 % and 3.22 % on the training and testing data, respectively, which shows that it captures the relationship between the process variables reasonably well.
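The quantities read from a box plot such as Fig. 9 can be computed directly from the per-sample relative errors. A short sketch follows; the input values are hypothetical, and the 1.5 × IQR whisker rule is a common plotting default that the study does not specify.

```python
import numpy as np


def box_plot_summary(errors):
    """Summary statistics typically drawn in a box plot, plus the mean."""
    e = np.asarray(errors, dtype=float)
    q1, median, q3 = np.percentile(e, [25, 50, 75])
    iqr = q3 - q1
    return {
        "mean": float(e.mean()),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        # Tukey whisker fences at 1.5 * IQR (a common, not universal, convention)
        "lower_fence": float(q1 - 1.5 * iqr),
        "upper_fence": float(q3 + 1.5 * iqr),
    }


# Hypothetical relative errors (%), not the study's values.
print(box_plot_summary([1.0, 2.0, 2.5, 3.0, 10.0]))
```

The mean is listed separately because a single large error (10 % here) pulls it above the median, which is why the mean relative error and the box itself can tell slightly different stories.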
5 Conclusion and Future Work
The present work highlights the importance of, and the need for, estimating the relationship between the PWP and the SWCC components of soil. The study conducts an FEM analysis of the behaviour of PWP with respect to various SWCC parameters. Further, a novel MGGP method is proposed to estimate the PWP of the soil from a given set of input parameters. The performance of the MGGP model is compared against the data obtained from the FEM. The results discussed in Sect. 4 show that the predictions of the MGGP model agree well with the FEM-generated data. The high generalisation ability of the MGGP model is beneficial for geotechnical experts who are looking for high-fidelity models that predict soil behaviour under uncertain input process conditions, and therefore the additional cost of measuring input parameters (SWCC, AEV, RWC, slope and hs) can be avoided.
The MGGP method provides a model that represents an explicit mathematical relationship (see Eq. 2) between the input parameters and PWP, and thus it can be used offline to extrapolate PWP. Future work includes introducing a new complexity measure for the MGGP model that can yield more compact and accurate models.
Fig. 7 Statistical fit of the MGGP model on training data
Fig. 8 Statistical fit of the MGGP model on testing data
Fig. 9 Box plot showing the error distribution on training and testing data
References
Biddle PG (1998) Tree root damage to buildings. Volume 1: causes, diagnosis and remedy. Volume 2: patterns of soil drying in proximity to trees on clay soils. Willowmead Publishing Ltd, Wantage
Blight GE (2005) Desiccation of a clay by grass, bushes and trees. Geotech Geol Eng 23(6):697-720
Garg A, Tai K (2011) A hybrid genetic programming-artificial neural network approach for modeling of vibratory finishing process. In: International proceedings of computer science and information technology (ICIIC 2011-International Conference on Information and Intelligent Computing), vol 18, pp 14-19
Garg A, Tai K (2012a) Comparison of regression analysis, artificial neural network and genetic programming in handling the multicollinearity problem. In: Proceedings of 2012 international conference on modelling, identification and control (ICMIC2012), Wuhan, China, 24-26 June 2012. IEEE
Garg A, Tai K (2012b) Review of genetic programming in modeling of machining processes. In: Proceedings of 2012 international conference on modelling, identification and control (ICMIC2012), Wuhan, China, 24-26 June 2012. IEEE
Garg A, Tai K (2013a) Comparison of statistical and machine learning methods in modelling of data with multicollinearity. Int J Model Identif Control 18(4):295-312
Garg A, Tai K (2013b) Modelling of FDM process using genetic programming with classifiers for model selection. In: Proceedings of 43rd international conference on computers and industrial engineering (CIE 43rd), Hong Kong, pp 123-1-10
Garg A, Tai K (2013c) Selection of a robust experimental design for the effective modeling of the nonlinear systems using genetic programming. In: Proceedings of 2013 IEEE symposium series on computational intelligence and data mining (CIDM), Singapore, 16-19 April 2013, pp 293-298
Garg A, Bhalerao Y, Tai K (2013a) Review of empirical modeling techniques for modeling of turning process. Int J Model Identif Control 20:121-129
Garg A, Rachmawati L, Tai K (2013b) Classification-driven model selection approach of genetic programming in modelling of turning process. Int J Adv Manuf Tech 69:1137-1151
Garg A, Garg A, Tai K (2013c) A multi-gene genetic programming model for estimating stress dependent soil water retention curves. Computational Geosciences (in press). doi:10.1007/s10596-013-9381-z
Garg A, Sriram S, Tai K (2013d) Empirical analysis of model selection criteria for genetic programming in modeling of time series system. In: Proceedings of 2013 IEEE conference on computational intelligence for financial engineering and economics (CIFEr), Singapore, 16-19 April 2013, pp 84-88
Garg A, Tai K, Lee CH, Savalani MM (2013e) A hybrid M5-genetic programming approach for ensuring greater trustworthiness of prediction ability in modelling of FDM process. J Intell Manuf (in press). doi:10.1007/s10845-013-0734-1
Garg A, Savalani MM, Tai K (2014a) State-of-the-art in empirical modelling of rapid prototyping processes. Rapid Prototyp J 20(2):164-178
Garg A, Vijayaraghavan V, Mahapatra SS, Tai K, Wong CH (2014b) Performance evaluation of microbial fuel cell by artificial intelligence methods. Expert Syst Appl 41:1389-1399
Geo-Slope, Seep/W (2007) User's guide. Geo-Slope International Limited, Calgary, Alberta
Hinchliffe M, Hiden H, Mckay B, Willis M, Tham M, Barton G (1996) Modelling chemical process systems using a multi-gene genetic programming algorithm. Late Breaking Papers at the Genetic Programming, pp 28-31
Koza JR (1996) On the programming of computers by means of natural selection. MIT Press, USA
Malaya C, Sreedeep S (2010) A study on the influence of measurement procedures on suction-water content relationship of a sandy soil. Geotech Test J ASTM 38(6):1
Richards LA (1931) Capillary conduction of liquids through porous mediums. Physics 1:318-333
Searson DP, Leahy DE, Willis MJ (2010) GPTIPS: an open source genetic programming toolbox for multigene symbolic regression. Int Multiconf Eng Comp Sci 2010:77-80
Shah PH, Sreedeep S, Singh DN (2006) Evaluation of methodologies used for establishing soil-water characteristic curve. J ASTM Int 3:1-11
Sreedeep S (2006) Modeling contaminant transport in unsaturated soils. PhD thesis, Department of Civil Engineering, Indian Institute of Technology Bombay, Mumbai, India
Sreedeep S, Singh DN (2010) A critical review of the methodologies employed for soil suction measurement. Int J Geomech ASCE. Special Issue: Environmental Geotechnology: Contemporary Issues, 99-104
Sreedeep S, Singh DN (2011) Critical review of the methodologies employed for soil suction measurement. Int J Geomech ASCE 11(2):99-104
Van Genuchten MT (1980) A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Sci Soc Am J 44(5):892-898
Yildiz AR (2009a) A novel hybrid immune algorithm for global optimization in design and manufacturing. Robot Comp Integr Manuf 25(2):261-270
Yildiz AR (2009b) An effective hybrid immune-hill climbing optimization approach for solving design and manufacturing optimization problems in industry. J Mater Process Technol 209(6):2773-2780
Yildiz AR (2012a) A comparative study of population-based optimization algorithms for turning operations. Inf Sci 210:81-88
Yildiz AR (2012b) Comparison of evolutionary based optimization algorithms for structural design optimization. Engineering applications of artificial intelligence 26(1):327-333
Yildiz AR (2013a) A new hybrid differential evolution algorithm for the selection of optimal machining parameters in milling operations. Appl Soft Comput 13(3):1561-1566
Yildiz AR (2013b) A new hybrid artificial bee colony algorithm for robust optimal design and manufacturing. Appl Soft Comput 13(5):2906-2912
Yildiz AR (2013c) Optimization of cutting parameters in multi-pass turning using artificial bee colony-based approach. Inf Sci 220:399-407
Yildiz AR (2013d) Hybrid Taguchi-differential evolution algorithm for optimization of multi-pass turning operations. Appl Soft Comput 13(3):1433-1439
Yildiz AR (2013e) Cuckoo search algorithm for the selection of optimal machining parameters in milling operations. Int J Adv Manuf Technol 64(1-4):55-61