
Advanced Machine Learning Methods for Structural Engineering
MAHESH PAL

Department of Civil Engineering


National Institute of Technology
Kurukshetra, 136119, INDIA

Support vector machines

Relevance vector machines

Gaussian process regression

Extreme Learning Machines

Tree-based approaches

Training samples → Learning algorithm → Model/function (also called a hypothesis) → Output values for testing samples

A hypothesis can be considered as a machine that provides the prediction for test data

Neural Networks
Extensively used in civil engineering since the 1990s

Issues
Number of hidden layers and neurons
Weight initialisation and adjustment
Size of training set
Learning rate, momentum and number of iterations
Local minima
Choice of learning algorithm (like back propagation)
Black box: provides no information about the learned relationship

Support Vector Machines (SVM)

Basic theory: in 1965
Margin-based classifier: in 1992
Support vector network: in 1995
Since 1998, the support vector network has been called the
Support Vector Machine (SVM) and used as an
alternative to neural networks.
ANN requires several parameters: learning
rate, momentum, number of nodes in the
hidden layer and the number of hidden
layers.

SVM: structural risk minimisation (SRM)

Based on statistical learning theory proposed in the 1960s by Vapnik and co-workers.
SRM: minimise the probability of misclassifying unknown data drawn randomly.

Neural network: empirical risk minimisation (ERM)
ERM: minimise the misclassification error on the training data.

SVM
Maps data from the original input feature
space to a very high-dimensional feature
space.
Data becomes linearly separable, but the problem
becomes computationally difficult.
A kernel function allows the SVM to work in the
feature space without knowing the mapping or the
dimensionality of the feature space.

SVM kernels need to satisfy Mercer's theorem: for any continuous, symmetric, positive semi-definite kernel function K, there is a feature mapping φ such that:

K(xi, xj) = φ(xi) · φ(xj)

Linear classification in the new space is equivalent to non-linear classification in the original space.
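This identity can be checked numerically. The sketch below (an illustration, not tied to any dataset from the slides) uses the homogeneous polynomial kernel of degree 2 on 2-D inputs, K(x, z) = (x · z)², whose explicit feature map is φ(x) = (x1², √2·x1·x2, x2²):

```python
import numpy as np

def poly_kernel(x, z):
    # Kernel computed directly in the original 2-D space.
    return float(np.dot(x, z)) ** 2

def phi(x):
    # Explicit map into the 3-D feature space.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k_direct = poly_kernel(x, z)              # (1*3 + 2*4)^2 = 121
k_mapped = float(np.dot(phi(x), phi(z)))  # same value via phi
print(k_direct, k_mapped)
```

The two numbers agree, which is exactly why the SVM never needs to construct φ explicitly.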

For a 2-class classification problem, training patterns are linearly separable if:

w · xi + b ≥ +1 for all yi = +1
w · xi + b ≤ −1 for all yi = −1

where w gives the orientation of the discriminating plane and b its offset from the origin.
The classification function will be:

f(x) = sign(w · x + b)

To classify the dataset, there can be a large number of discriminating planes.

SVM tries to find the plane farthest from both classes.
Assume two supporting planes and maximise the distance (called the margin) between them.
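A minimal sketch of the maximum-margin idea on a toy 2-class problem, assuming scikit-learn is available; the support vectors reported by the fitted model are the points lying on the two supporting planes:

```python
import numpy as np
from sklearn.svm import SVC

# Two clearly separated clusters, labelled -1 and +1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3)  # large C approximates a hard margin
clf.fit(X, y)

print(clf.support_vectors_)                    # points on the margin
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))   # one point from each side
```

Only the handful of boundary points end up as support vectors; the rest of the training data could be removed without changing the plane.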

Vapnik proposed Support Vector Regression (SVR) by introducing the ε-insensitive loss function.
This loss function allows the concept of margin to be used for regression problems.
This function is less sensitive to one bad data point.
The purpose of SVR is to find a function having at most ε deviation from the actual target values for all given training data, while being as flat as possible.

The optimisation problem for SV regression is written based on the assumption that there exists a function that provides an error of less than ε on all training pairs.

The support vector regression function can be written as:

f(x) = Σ (αi − αi*) K(xi, x) + b

where αi and αi* are Lagrangian multipliers obtained while solving the optimisation problem.

Advantages
Uses only a subset of the training data (called support vectors)
QP solution, so no local minima
Not many user-defined parameters

Shear strength prediction using data for deep beams

Mahesh Pal and Surinder Deswal, 2011, Support vector regression based shear strength modelling of deep beams. Computers & Structures, 89(13), 1430-1439.

Two kernel functions, a polynomial kernel and a radial basis kernel, were used in the present study.
The use of SVR requires setting user-defined parameters such as the regularisation parameter (C), the type of kernel, kernel-specific parameters and the error-insensitive zone (ε).
Variation in the error-insensitive zone was found to have no effect on the predicted shear strength in the present study, so a default value of 0.0010 was chosen for all experiments.
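The parameters named above map directly onto a scikit-learn SVR. The sketch below is illustrative only, fitted on synthetic data rather than the deep-beam dataset from the cited study; the parameter values are placeholders:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem standing in for the real data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.05, 80)

# kernel type, C, kernel parameter (gamma) and the
# error-insensitive zone epsilon are all user-defined.
model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.001)
model.fit(X, y)

pred = model.predict([[1.5]])
print(pred)
```

Note that `epsilon` here plays the role of the error-insensitive zone ε discussed above.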

Results: comparison of SVR and ANN predictions [results table omitted]

Disadvantages

Choice of kernel function and kernel-specific parameters
The kernel function should satisfy Mercer's theorem
Choice of the regularisation parameter C

Parameter selection
Grid search and trial & error methods:
commonly used approaches
computationally expensive
Other approaches:
Genetic algorithm
Particle swarm optimisation
Their combination with grid search
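A hedged sketch of the grid-search approach for the SVR parameters (C, gamma, epsilon) using cross-validation; the grid values and data are illustrative, not the ones used in any study cited here:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Synthetic data for demonstration.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.1, 60)

# Candidate values for each user-defined parameter.
grid = {"C": [1, 10, 100], "gamma": [0.1, 1.0], "epsilon": [0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), grid, cv=3)
search.fit(X, y)

print(search.best_params_)
```

The cost is one cross-validated fit per grid cell, which is why the slides call this approach computationally expensive; GA or PSO search the same space with far fewer evaluations.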

Relevance Vector Machines (RVM)

Based on a probabilistic Bayesian formulation of a linear model (Tipping, 2001).
Ability to use non-Mercer kernels
Probabilistic output
No need for the parameter C

For an RVM, Tipping (2001) suggested a linear model of the form

y(x) = Σ wi K(x, xi) + w0

with a zero-mean Gaussian prior (hyperparameter αi) on each weight wi and Gaussian noise of variance σ².

An iterative re-estimation method is used to obtain the values of αi and σ².
Only training data with non-zero weights wi contribute to the decision function; these are the relevance vectors.

Major difference from SVM

Selected points are anti-boundary (away from the decision boundary).

Support vectors represent the least prototypical examples (closer to the boundary, difficult to classify).

Relevance vectors are the most prototypical examples (more representative of the class).

Disadvantages
Requires a large computation cost in comparison to SVM.
Designed for the 2-class problem, similar to SVM.
Choice of kernel
May have a problem of local minima

The database contains information about Cement (kg/m³) (C), Fly ash (kg/m³) (F), Water/powder ratio (w/p), Superplasticizer dosage (%) (SP), Sand (kg/m³) (S), Coarse aggregate (kg/m³) (CA) and fc (MPa).

Prediction of Compressive Strength of Self-Compacting Concrete using Least Square Support Vector
Machine and Relevance Vector Machine, KSCE Journal of Civil Engineering (2014) 18(6):1753-1758

Gaussian process regression

GP regression is a nonparametric regression approach.
Gaussian processes are a natural generalisation of the Gaussian distribution, with the mean and covariance being a vector and a matrix.
The use of kernel functions relates GP regression closely to SVM and RVM.
GP regression estimates the kernel hyperparameters by optimising the log marginal likelihood, using its partial derivatives with respect to the hyperparameters.
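A minimal GP regression sketch with an RBF covariance, assuming scikit-learn is available; the hyperparameters are tuned internally by maximising the log marginal likelihood, and the data are synthetic:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic noisy observations of a smooth function.
rng = np.random.default_rng(2)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.05, 40)

# RBF covariance plus a noise term; hyperparameters fitted by
# marginal-likelihood optimisation inside fit().
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
gp.fit(X, y)

mean, std = gp.predict([[2.0]], return_std=True)  # probabilistic output
print(mean, std)
```

Unlike SVR, the prediction comes with a standard deviation, which is the probabilistic output the slides highlight for GP and RVM methods.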

Estimating Compressive Strength of High Performance Concrete with Gaussian Process Regression
Model , Advances in Civil Engineering, Volume 2016, Article ID 2861380, 8 pages

Extreme Learning Machines (ELM)

A neural network classifier

Uses one hidden layer only

No parameters except the number of hidden nodes

Global solution (no local optima, unlike NN)

Performance comparable to SVM and better than back-propagation neural networks

Very fast

http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-WCCI2012.pdf
http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-WCCI2012.pdf

Uses the smallest-norm least-squares solution in place of the gradient-descent based approach used by back-propagation NN.
This solution provides:
1. minimum training error,
2. smallest norm of weights with the best generalisation performance, and
3. a unique solution (global minimum).

Huang, G.-B., Zhu, Q.-Y. and Siew, C.-K., 2006, Extreme learning machine: Theory and applications, Neurocomputing, 70, 489-501.

Disadvantages
Weights are randomly assigned, giving a large variation in accuracy across trials with the same number of hidden nodes.
Difficult to replicate results

Kernelised ELM
A kernel function can be used in place of the hidden layer by modifying the optimisation problem.
Multiclass
Same kernel functions as used with SVM/RVM
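A hedged sketch of the kernelised ELM closed form from Huang et al. (2012): the output weights come from solving (I/C + K)β = y, where K is the kernel matrix over the training data. The RBF kernel, C value and data below are illustrative choices:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Synthetic regression data.
rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X).ravel()

C = 100.0
K = rbf_kernel(X, X)
alpha = np.linalg.solve(np.eye(len(X)) / C + K, y)  # (I/C + K)^-1 y

x_new = np.array([[0.2]])
pred = rbf_kernel(x_new, X) @ alpha
print(pred)
```

Replacing the random hidden layer with a kernel removes the random-weight variability noted above, at the cost of solving an n × n linear system.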

Huang, G.-B., Zhou, H., Ding, X. and Zhang, R., 2012, Extreme Learning Machine for Regression and Multiclass Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42: 513-529.

Comparison of Several Extreme Learning Machine Algorithms for Modeling Concrete Compressive Strength, Applied Mechanics and Materials, Vols. 548-549 (2014), pp. 1735-1738

Random Forest Algorithm

Tree-based algorithm

A multistage or hierarchical algorithm
Breaks up a complex decision into a union of several simpler decisions
Uses different subsets of features/data at various decision levels.

Tree structure: root node → internal nodes → terminal (leaf) nodes

A tree-based algorithm requires:

1) Splitting rules / tree creation [called attribute selection]
Most popular are:
a) Gain ratio criterion (Quinlan, 1993)
b) Gini index (Breiman et al., 1984)
2) Termination rules / pruning rules
Most popular are:
a) Error-based pruning (Quinlan, 1993)
b) Cost-complexity pruning (Breiman et al., 1984)

Random forest

An ensemble of tree-based algorithms

Uses a random set of features (i.e. input variables)
Uses a bootstrapped sample of the original data
A bootstrapped sample consists of ~63% of the original data
The remaining ~37% is left out and called out-of-bag (OOB) data.
Multiclass, and requires no pruning

Prediction of Concrete Mix Strength using Random Forest Model, International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 11, Number 22 (2016), pp. 11024-11029

M5 model tree
The M5 model tree (Quinlan, 1992) is a binary decision tree having linear regression functions at the terminal (leaf) nodes, which can predict continuous numerical attributes.
The splitting criterion treats the standard deviation of the class values that reach a node as a measure of the error at that node, and calculates the expected reduction in this error from testing each attribute at that node.
To remove the problem of over-fitting, the tree is pruned back by replacing a subtree with a leaf.
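The standard-deviation-reduction (SDR) criterion just described can be computed in a few lines; the toy targets and split points below are made up for illustration:

```python
import numpy as np

def sdr(y, left_mask):
    # Expected error reduction from splitting targets y into
    # left/right subsets: sd(y) minus the size-weighted subset sds.
    y_l, y_r = y[left_mask], y[~left_mask]
    n = len(y)
    return (np.std(y)
            - (len(y_l) / n) * np.std(y_l)
            - (len(y_r) / n) * np.std(y_r))

# Targets form two clear groups; x is the attribute being tested.
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

print(sdr(y, x < 0.5))   # split between the groups: large reduction
print(sdr(y, x < 0.2))   # split inside one group: small reduction
```

M5 evaluates this quantity for every candidate attribute and threshold and splits on the one with the largest SDR.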

Behzad Abounia Omran, Qian Chen, and Ruoyu Jin, 2016, Comparison of Data Mining Techniques for Predicting
Compressive Strength of Environmentally Friendly Concrete, Journal of Computing in Civil Engineering, 30(6)

Several approaches to combine the output of multiple regression models have been reported in the literature:
1) using different subsets of training data with a single algorithm
2) using different training parameters with a single algorithm
3) using different algorithms with the same training data
4) using different combinations of features with the same algorithm
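Approach 3) can be sketched as a simple output-averaging ensemble of two different algorithms trained on the same data; the models, parameters and synthetic data below are illustrative choices only:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# One shared synthetic training set for both algorithms.
rng = np.random.default_rng(6)
X = rng.uniform(0, 5, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.05, 100)

models = [SVR(kernel="rbf", C=10.0), DecisionTreeRegressor(max_depth=5)]
for m in models:
    m.fit(X, y)   # same training data, different algorithms

x_new = np.array([[2.5]])
avg = np.mean([m.predict(x_new)[0] for m in models])
print(avg)
```

Averaging tends to cancel the uncorrelated errors of the individual models, which is the motivation behind all four combination strategies listed above.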

Jui-Sheng Chou, Chih-Fong Tsai, Anh-Duc Pham and Yu-Hsin Lu, 2014, Machine learning in concrete strength simulations: Multi-nation data analytics, Construction & Building Materials, 73, 771-780

Which Algorithm to Use?

No algorithm performs better than any other when their performance is averaged uniformly over all possible problems of a particular type (Wolpert and Macready, 1995).

An algorithm must be designed for a particular domain; there is no such thing as a general-purpose algorithm.

Performance is data-dependent.

QUESTIONS?
