
Advanced Machine Learning Methods for Structural Engineering
MAHESH PAL

Department of Civil Engineering


National Institute of Technology
Kurukshetra, 136119, INDIA

Support vector machines

Relevance vector machines

Gaussian process regression

Extreme Learning Machines

Tree-based approaches

Training samples → Learning algorithm → Model/function (also called a hypothesis) → Output values for testing samples

A hypothesis can be considered as a machine that provides the prediction for test data

Neural Networks
Extensively used in civil engineering since the 1990s

Issues
Number of hidden layers and neurons
Weight initialisation and adjustment
Size of training set
Learning rate, momentum and number of iterations
Local minima
Choice of learning algorithm (like back propagation)
Black box: provides no information about the learned relationship

Support Vector Machines (SVM)

Basic theory: in 1965
Margin-based classifier: in 1992
Support vector network: in 1995
Since 1998, the support vector network has been called the
Support Vector Machine (SVM) and used as an
alternative to neural networks.
ANN requires several parameters: learning
rate, momentum, number of nodes in the
hidden layer and the number of hidden
layers.

SVM: structural risk minimisation (SRM)

Based on statistical learning theory proposed in the 1960s by Vapnik and co-workers.
SRM: minimise the probability of misclassifying unknown data drawn randomly.

Neural network: empirical risk minimisation (ERM)
ERM: minimise the misclassification error on the training data.

SVM
Maps data from the original input feature
space to a very high-dimensional feature
space.
Data becomes linearly separable, but the problem
becomes computationally difficult.
A kernel function allows the SVM to work in the
feature space without knowing the mapping or the
dimensionality of the feature space.

SVM kernels need to satisfy Mercer's theorem: for any continuous, symmetric, positive semi-definite kernel function K, there is a feature mapping φ such that:

K(xi, xj) = φ(xi) · φ(xj)

Linear classification in the new space is equivalent to non-linear classification in the original space.
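This identity can be checked numerically. The sketch below (an illustration, not tied to any dataset from the slides) uses the homogeneous polynomial kernel of degree 2 on 2-D inputs, K(x, z) = (x · z)², whose explicit feature map is φ(x) = (x1², √2·x1·x2, x2²):

```python
import numpy as np

def poly_kernel(x, z):
    # Kernel computed directly in the original 2-D space.
    return float(np.dot(x, z)) ** 2

def phi(x):
    # Explicit map into the 3-D feature space.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k_direct = poly_kernel(x, z)              # (1*3 + 2*4)^2 = 121
k_mapped = float(np.dot(phi(x), phi(z)))  # same value via phi
print(k_direct, k_mapped)
```

The two numbers agree, which is exactly why the SVM never needs to construct φ explicitly.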

For a 2-class classification problem, training patterns are linearly separable if:

w · xi + b ≥ +1 for all yi = +1
w · xi + b ≤ −1 for all yi = −1

where w gives the orientation of the discriminating plane and b its offset from the origin.
The classification function will be:

f(x) = sign(w · x + b)

To classify the dataset, there can be a large number of discriminating planes.

SVM tries to find the plane farthest from both classes.
Assume two supporting planes and maximise the distance (called the margin) between them.
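A minimal sketch of the maximum-margin idea on a toy 2-class problem, assuming scikit-learn is available; the support vectors reported by the fitted model are the points lying on the two supporting planes:

```python
import numpy as np
from sklearn.svm import SVC

# Two clearly separated clusters, labelled -1 and +1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3)  # large C approximates a hard margin
clf.fit(X, y)

print(clf.support_vectors_)                    # points on the margin
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))   # one point from each side
```

Only the handful of boundary points end up as support vectors; the rest of the training data could be removed without changing the plane.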

Vapnik proposed Support Vector Regression (SVR) by introducing the ε-insensitive loss function.
This loss function allows the concept of margin to be used for regression problems.
This function is less sensitive to one bad data point.
The purpose of SVR is to find a function having at most ε deviation from the actual target values for all given training data, while being as flat as possible.

The optimisation problem for SV regression is written based on the assumption that there exists a function that provides an error of less than ε on all training pairs.

The support vector regression function can be written as:

f(x) = Σ (αi − αi*) K(xi, x) + b

where αi and αi* are Lagrangian multipliers obtained while solving the optimisation problem.

Advantages
Uses only a subset of the training data (called support vectors)
QP solution, so no local minima
Not many user-defined parameters

Shear strength prediction using data for deep beams

Mahesh Pal and Surinder Deswal, 2011, Support vector regression based shear strength modelling of deep beams. Computers & Structures, 89(13), 1430-1439.

Two kernel functions, a polynomial kernel and a radial basis kernel, were used in the present study.
The use of SVR requires setting user-defined parameters such as the regularisation parameter (C), the type of kernel, kernel-specific parameters and the error-insensitive zone (ε).
Variation in the error-insensitive zone was found to have no effect on the predicted shear strength in the present study, so a default value of 0.0010 was chosen for all experiments.
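The parameters named above map directly onto a scikit-learn SVR. The sketch below is illustrative only, fitted on synthetic data rather than the deep-beam dataset from the cited study; the parameter values are placeholders:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem standing in for the real data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.05, 80)

# kernel type, C, kernel parameter (gamma) and the
# error-insensitive zone epsilon are all user-defined.
model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.001)
model.fit(X, y)

pred = model.predict([[1.5]])
print(pred)
```

Note that `epsilon` here plays the role of the error-insensitive zone ε discussed above.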

Results: comparison of SVR and ANN predictions [results table omitted]

Disadvantages

Choice of kernel function and kernel-specific parameters
The kernel function should satisfy Mercer's theorem
Choice of the regularisation parameter C

Parameter selection
Grid search and trial & error methods:
commonly used approaches
computationally expensive
Other approaches:
Genetic algorithm
Particle swarm optimisation
Their combination with grid search
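A hedged sketch of the grid-search approach for the SVR parameters (C, gamma, epsilon) using cross-validation; the grid values and data are illustrative, not the ones used in any study cited here:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Synthetic data for demonstration.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.1, 60)

# Candidate values for each user-defined parameter.
grid = {"C": [1, 10, 100], "gamma": [0.1, 1.0], "epsilon": [0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), grid, cv=3)
search.fit(X, y)

print(search.best_params_)
```

The cost is one cross-validated fit per grid cell, which is why the slides call this approach computationally expensive; GA or PSO search the same space with far fewer evaluations.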

Relevance Vector Machines (RVM)

Based on a probabilistic Bayesian formulation of a linear model (Tipping, 2001).
Ability to use non-Mercer kernels
Probabilistic output
No need for the parameter C

For an RVM, Tipping (2001) suggested a linear model of the form

y(x) = Σ wi K(x, xi) + w0

with a zero-mean Gaussian prior (hyperparameter αi) on each weight wi and Gaussian noise of variance σ².

An iterative re-estimation method is used to obtain the values of αi and σ².
Only training data with non-zero weights wi contribute to the decision function; these are the relevance vectors.

Major difference from SVM

Selected points are anti-boundary (away from the decision boundary).

Support vectors represent the least prototypical examples (closer to the boundary, difficult to classify).

Relevance vectors are the most prototypical examples (more representative of the class).

Disadvantages
Requires a large computation cost in comparison to SVM.
Designed for the 2-class problem, similar to SVM.
Choice of kernel
May have a problem of local minima

The database contains information about Cement (kg/m³) (C), Fly ash (kg/m³) (F), Water/powder ratio (w/p), Superplasticizer dosage (%) (SP), Sand (kg/m³) (S), Coarse aggregate (kg/m³) (CA) and fc (MPa).

Prediction of Compressive Strength of Self-Compacting Concrete using Least Square Support Vector
Machine and Relevance Vector Machine, KSCE Journal of Civil Engineering (2014) 18(6):1753-1758

Gaussian process regression

GP regression is a nonparametric regression approach.
Gaussian processes are a natural generalisation of the Gaussian distribution, with the mean and covariance being a vector and a matrix.
The use of kernel functions relates GP regression closely to SVM and RVM.
GP regression estimates the kernel hyperparameters by optimising the log marginal likelihood, using its partial derivatives with respect to the hyperparameters.
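A minimal GP regression sketch with an RBF covariance, assuming scikit-learn is available; the hyperparameters are tuned internally by maximising the log marginal likelihood, and the data are synthetic:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic noisy observations of a smooth function.
rng = np.random.default_rng(2)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.05, 40)

# RBF covariance plus a noise term; hyperparameters fitted by
# marginal-likelihood optimisation inside fit().
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
gp.fit(X, y)

mean, std = gp.predict([[2.0]], return_std=True)  # probabilistic output
print(mean, std)
```

Unlike SVR, the prediction comes with a standard deviation, which is the probabilistic output the slides highlight for GP and RVM methods.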

Estimating Compressive Strength of High Performance Concrete with Gaussian Process Regression
Model , Advances in Civil Engineering, Volume 2016, Article ID 2861380, 8 pages

Extreme Learning Machines (ELM)

A neural network classifier

Uses one hidden layer only

No parameters except the number of hidden nodes

Global solution (no local optima, unlike NN)

Performance comparable to SVM and better than back-propagation neural networks

Very fast

http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-WCCI2012.pdf
http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-WCCI2012.pdf

Uses the smallest-norm least-squares solution in place of the gradient-descent based approach used by back-propagation NN.
This solution provides:
1. minimum training error,
2. smallest norm of weights with the best generalisation performance, and
3. a unique solution (global minimum).

Huang, G.-B., Zhu, Q.-Y. and Siew, C.-K., 2006, Extreme learning machine: Theory and applications, Neurocomputing, 70, 489-501.

Disadvantages
Weights are randomly assigned, giving a large variation in accuracy across trials with the same number of hidden nodes.
Difficult to replicate results

Kernelised ELM
A kernel function can be used in place of the hidden layer by modifying the optimisation problem.
Multiclass
Same kernel functions as used with SVM/RVM
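A hedged sketch of the kernelised ELM closed form from Huang et al. (2012): the output weights come from solving (I/C + K)β = y, where K is the kernel matrix over the training data. The RBF kernel, C value and data below are illustrative choices:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Synthetic regression data.
rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X).ravel()

C = 100.0
K = rbf_kernel(X, X)
alpha = np.linalg.solve(np.eye(len(X)) / C + K, y)  # (I/C + K)^-1 y

x_new = np.array([[0.2]])
pred = rbf_kernel(x_new, X) @ alpha
print(pred)
```

Replacing the random hidden layer with a kernel removes the random-weight variability noted above, at the cost of solving an n × n linear system.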

Huang, G.-B., Zhou, H., Ding, X. and Zhang, R., 2012, Extreme Learning Machine for Regression and Multiclass Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42: 513-529.

Comparison of Several Extreme Learning Machine Algorithms for Modeling Concrete Compressive Strength, Applied Mechanics and Materials, Vols. 548-549 (2014), pp. 1735-1738

Random Forest Algorithm

Tree-based algorithm

A multistage or hierarchical algorithm
Breaks up a complex decision into a union of several simpler decisions
Uses different subsets of features/data at various decision levels.

Tree structure: root node → internal nodes → terminal (leaf) nodes

A tree-based algorithm requires:

1) Splitting rules / tree creation [called attribute selection]
Most popular are:
a) Gain ratio criterion (Quinlan, 1993)
b) Gini index (Breiman et al., 1984)
2) Termination rules / pruning rules
Most popular are:
a) Error-based pruning (Quinlan, 1993)
b) Cost-complexity pruning (Breiman et al., 1984)

Random forest

An ensemble of tree-based algorithms

Uses a random set of features (i.e. input variables)
Uses a bootstrapped sample of the original data
A bootstrapped sample consists of ~63% of the original data
The remaining ~37% is left out and called out-of-bag (OOB) data.
Multiclass, and requires no pruning

Prediction of Concrete Mix Strength using Random Forest Model, International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 11, Number 22 (2016), pp. 11024-11029

M5 model tree
The M5 model tree (Quinlan, 1992) is a binary decision tree having linear regression functions at the terminal (leaf) nodes, which can predict continuous numerical attributes.
The splitting criterion treats the standard deviation of the class values that reach a node as a measure of the error at that node, and calculates the expected reduction in this error from testing each attribute at that node.
To remove the problem of over-fitting, the tree is pruned back by replacing a subtree with a leaf.
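The standard-deviation-reduction (SDR) criterion just described can be computed in a few lines; the toy targets and split points below are made up for illustration:

```python
import numpy as np

def sdr(y, left_mask):
    # Expected error reduction from splitting targets y into
    # left/right subsets: sd(y) minus the size-weighted subset sds.
    y_l, y_r = y[left_mask], y[~left_mask]
    n = len(y)
    return (np.std(y)
            - (len(y_l) / n) * np.std(y_l)
            - (len(y_r) / n) * np.std(y_r))

# Targets form two clear groups; x is the attribute being tested.
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

print(sdr(y, x < 0.5))   # split between the groups: large reduction
print(sdr(y, x < 0.2))   # split inside one group: small reduction
```

M5 evaluates this quantity for every candidate attribute and threshold and splits on the one with the largest SDR.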

Behzad Abounia Omran, Qian Chen, and Ruoyu Jin, 2016, Comparison of Data Mining Techniques for Predicting
Compressive Strength of Environmentally Friendly Concrete, Journal of Computing in Civil Engineering, 30(6)

Several approaches to combine the output of multiple regression models have been reported in the literature:
1) using different subsets of training data with a single algorithm
2) using different training parameters with a single algorithm
3) using different algorithms with the same training data
4) using different combinations of features with the same algorithm
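Approach 3) can be sketched as a simple output-averaging ensemble of two different algorithms trained on the same data; the models, parameters and synthetic data below are illustrative choices only:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# One shared synthetic training set for both algorithms.
rng = np.random.default_rng(6)
X = rng.uniform(0, 5, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.05, 100)

models = [SVR(kernel="rbf", C=10.0), DecisionTreeRegressor(max_depth=5)]
for m in models:
    m.fit(X, y)   # same training data, different algorithms

x_new = np.array([[2.5]])
avg = np.mean([m.predict(x_new)[0] for m in models])
print(avg)
```

Averaging tends to cancel the uncorrelated errors of the individual models, which is the motivation behind all four combination strategies listed above.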

Jui-Sheng Chou, Chih-Fong Tsai, Anh-Duc Pham and Yu-Hsin Lu, 2014, Machine learning in concrete strength simulations: Multi-nation data analytics, Construction & Building Materials, 73, 771-780

Which Algorithm to Use?

No algorithm performs better than any other when their performance is averaged uniformly over all possible problems of a particular type (Wolpert and Macready, 1995).

An algorithm must be designed for a particular domain; there is no such thing as a general-purpose algorithm.

Performance is data-dependent.

QUESTIONS?
