
International Journal of Latest Trends in Engineering and Technology (IJLTET), Vol. 1, Issue 1, May 2012, ISSN: 2278-621X

APPLICATION OF NEURAL NETWORKS IN PREDICTIVE DATA MINING FOR INSURANCE


Parveen Sehgal 1, Sangeeta Gupta 2, Dharminder Kumar 3
1 Associate Professor, PPIMT, Hisar
2 Director, Om Institute of Technology and Management (Mgt.), Hisar
3 Chairman, Department of Computer Science & Engineering, GJUS&T, Hisar
1 parveensehgal@yahoo.com, 2 sangeet_gju@yahoo.co.in, 3 dr_dk_kumar_02@yahoo.com


Abstract: Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. This paper presents a study of the application of neural network based data mining techniques for prediction. Neural networks cope well with non-linear, complex and stochastic data, show high accuracy in comparison with traditional techniques, and are therefore preferred in predictive data mining. Experiments are performed by developing predictive models with data sets from the insurance sector.

Keywords: ANN, ART, Error Back Propagation, Prediction Modeling, Predictive Data Mining, Network Training, RBF
INTRODUCTION
A prediction or forecast is a statement about the way things will happen in the future, often but not always based on experience or knowledge. Everyone solves prediction problems every day with varying degrees of success: weather, harvests, stock markets, company sales, customer behaviour, credit risk, natural calamities such as earthquakes, medical outcomes, data prefetching and much else needs to be predicted.
In the technical domain, the predictable parameters of a system can often be expressed and evaluated using equations in the prediction variables; prediction is then simply the evaluation or solution of such equations. Traditional techniques like regression, rule induction and decision trees (Kotsiantis, S. B., 2007) are helpful but require substantial initial hypotheses. In practice, however, we face problems for which such a description would be too complicated or not possible at all. In addition, the solution by traditional methods can be computationally very expensive, and sometimes the solution arrives only after the event to be predicted has happened. Moreover, these techniques are not well suited to problems that are non-linear and stochastic in nature.
Artificial neural networks can be used for prediction with good levels of success. They have a natural propensity for storing experiential knowledge and making it available for use in prediction, and they can approximate virtually any non-linear and complex function to any desired accuracy. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Various types of neural networks can be used for prediction, such as the Multilayer Perceptron, ART and Radial Basis Function networks, among others.


DATA MINING PROCESS BASED ON ANN


A. Data Preparation & Prediction Process:

The data mining process comprises three main phases (Adepoju, G.A., Ogunjuyigbe, S.O.A. & Alawode, K.O., 2007): training data preparation, building the prediction model, and interpretation of the results, as shown in Fig. 1.

Fig. 1 Data Mining Process with Predictive Neural Networks


B. Training the predictive neural network:
Training of the neural network model normally has the following steps:
i) Create an initial network of the chosen type, with random initial weights assigned to the links and the network parameters set.
ii) Run each record from the training database through the network and use the back propagation learning algorithm, or some other method, to correct the error.


iii) Check the stopping criterion (whether a pre-specified minimum training error has been reached).
iv) If the stopping criterion is not met, return to step ii) and repeat with the entire training database.
A minimal sketch of this training loop is given below.
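As an illustration of steps i)-iv), the following is a minimal NumPy sketch of batch back propagation for a single-hidden-layer network; the layer size, learning rate, error threshold and toy data are illustrative assumptions, not the settings used in the experiments reported later.

```python
import numpy as np

def train(X, y, hidden=8, lr=0.1, target_error=0.05, max_epochs=10000):
    """Steps i)-iv): initialize, propagate, back-propagate, check stopping rule."""
    rng = np.random.default_rng(0)
    # i) initial network with small random weights on the links
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    for epoch in range(max_epochs):
        # ii) run the training records through the network (forward pass)
        H = np.tanh(X @ W1)                      # hidden activations
        out = 1.0 / (1.0 + np.exp(-(H @ W2)))    # sigmoid output
        err = out - y
        # ii) back propagation of the error to correct the weights
        d_out = err * out * (1.0 - out)          # gradient at the output units
        d_hid = (d_out @ W2.T) * (1.0 - H**2)    # gradient at the hidden units
        W2 -= lr * (H.T @ d_out)
        W1 -= lr * (X.T @ d_hid)
        # iii) stopping criterion: pre-specified minimum training error
        if 0.5 * np.sum(err**2) < target_error:
            break                                # iv) otherwise repeat the whole set
    return W1, W2

# toy usage on XOR-like data (the constant 1 column serves as a bias input)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train(X, y)
```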

ANN FOR PREDICTION & THEIR TRAINING METHODS

C. ANNs for Prediction
1) Multilayer Perceptron Network: MLP networks are general-purpose, flexible, non-linear models consisting of a number of units organized into multiple layers (Sehgal Parveen, Gupta Sangeeta & Kumar Dharminder, February 2011). The complexity of the MLP network can be changed by varying the number of layers and the number of units in each layer. Given enough hidden units and enough data, it has been shown that MLPs can approximate virtually any function to any desired accuracy.

Fig. 2 Multilayer Perceptron Architecture

MLP uses a supervised learning technique called back propagation for training the network, together with the gradient descent algorithm, which uses the gradient of the error energy to adjust the weights toward convergence. In MLP networks, error back propagation provides a way to train networks with any number of hidden units arranged in any number of layers. The predicted output of the system can at every step be compared with the desired output; the error can then be propagated back through the network to tune the network parameters and reduce the error of the next prediction. The learning process of the BP algorithm is simply the self-adaptation and self-organization of the weight coefficients between the neurons of the network. The goal of training is to find the set of weight values that causes the output of the neural network to match the actual target values as closely as possible. In this way the network can be trained to the limits of the desired accuracy.
2) ART (Adaptive Resonance Theory): ART is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. It describes a number of neural network models which use supervised and unsupervised learning methods, and it addresses problems such as pattern recognition and prediction. There are two basic methods of training ART-based neural networks: slow and fast. In the slow learning method, the degree to which the recognition neuron's weights are trained towards the input vector is computed to continuous values with differential equations, and is thus dependent on the length of time for which the input vector is presented. With fast learning, algebraic equations are used to calculate the degree of weight adjustment to be made.
3) RBF (Radial Basis Function) Networks: A radial basis function network is an artificial neural network that uses radial basis functions as activation functions. An RBF network is a three-layer feed-forward network in which each hidden unit implements a radial activation function and each output unit implements a weighted sum of the hidden units' outputs (Kotsiantis, S. B., 2007); the output is thus a linear combination of radial basis functions. RBF networks are used in function approximation, time series prediction and control.
Various methods have been used to train RBF networks. First, the centres and widths of the hidden layer are determined by clustering algorithms; second, the weights connecting the hidden layer with the output layer are determined. One approach first uses K-means clustering to find cluster centres, which are then used as the centres of the RBF functions, as in the sketch below. However, K-means clustering is a computationally intensive procedure, and it often does not generate an optimal set of centres.
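The two-stage procedure can be sketched as follows, assuming Gaussian basis functions, scikit-learn's KMeans for the centres and a least-squares solve for the output weights; the number of centres and the width are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf(X, y, n_centres=10, width=1.0):
    # Stage 1: centres of the hidden layer via K-means clustering
    centres = KMeans(n_clusters=n_centres, n_init=10).fit(X).cluster_centers_
    # Gaussian radial activations for every sample/centre pair
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2 * width ** 2))
    # Stage 2: output weights as a linear least-squares problem
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centres, w

def predict_rbf(X, centres, w, width=1.0):
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * width ** 2)) @ w

# toy usage: fit y = sin(x) from noisy samples
X = np.linspace(0, 6, 80).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * np.random.randn(80)
centres, w = fit_rbf(X, y)
```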
D. Optimization Methods for Error Curve Convergence
1) Gradient Descent Method: This is one of the simplest and best-known methods for minimizing a general non-linear function; it is also known as the steepest descent method, and it is the learning technique most widely used with the back propagation method for minimization of the error. In back propagation training of predictive neural networks, gradient descent simply means moving downhill on the error surface, in small steps, in the direction opposite to the gradient of the error energy, until the bottom of the error surface is reached (Sehgal Parveen, Gupta Sangeeta & Kumar Dharminder, February 2011). Gradient descent can thus be used to decide the direction and amount of the corrective adjustments to the network parameters during learning. While the method is not commonly used in practice because of its slow convergence rate, understanding its convergence properties can lead to a better understanding of many of the more sophisticated optimization methods (Meza Juan C., 2010). A one-dimensional sketch of the update rule follows.
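As a minimal illustration of the update rule w <- w - eta * gradE(w), here is steepest descent on a simple one-dimensional error function; the function, the learning rate and the starting point are arbitrary illustrative choices.

```python
def gradient_descent(grad, w, lr=0.1, steps=100):
    """Repeatedly step opposite to the gradient of the error."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# E(w) = (w - 3)^2 has its minimum at w = 3; dE/dw = 2(w - 3)
w_min = gradient_descent(lambda w: 2 * (w - 3), w=0.0)
print(w_min)  # converges towards 3.0
```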
2) Conjugate Gradient Descent Method: The conjugate gradient (CG) method is an iterative solver used by many scientific and engineering applications, originally for linear systems of algebraic equations (Ismail Leila, 2011). Compared to gradient descent, the conjugate gradient algorithm takes a more direct path to the optimal set of weight values; it is usually significantly faster and more robust than gradient descent, and it does not require the user to specify learning rate and momentum parameters. The traditional conjugate gradient algorithm uses the gradient to compute a search direction. It then uses a line search algorithm, such as Brent's method, to find the optimal step size along that direction. The line search avoids the need to compute the Hessian matrix of second derivatives, but it requires computing the error at multiple points along the line. The conjugate gradient algorithm with line search (CGL) has been used successfully in many neural network programs and is considered one of the best methods yet invented.
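Ready-made nonlinear CG with a built-in line search is available in standard libraries; the sketch below uses SciPy's 'CG' method on the Rosenbrock function, which stands in here for a network's error surface.

```python
import numpy as np
from scipy.optimize import minimize

def error(w):            # stand-in for the network's error energy
    return (1 - w[0])**2 + 100 * (w[1] - w[0]**2)**2

def grad(w):             # its analytic gradient
    return np.array([-2 * (1 - w[0]) - 400 * w[0] * (w[1] - w[0]**2),
                     200 * (w[1] - w[0]**2)])

res = minimize(error, x0=np.zeros(2), jac=grad, method='CG')
print(res.x)             # approaches the minimum at (1, 1)
```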
3) Scaled Conjugate Gradient Descent Method: The scaled conjugate gradient algorithm uses a numerical approximation of the second derivatives (the Hessian matrix). This allows scaled conjugate gradient to compute the optimal step size in the search direction without having to perform the computationally expensive line search used by the traditional conjugate gradient algorithm, although there is, of course, a cost involved in estimating the second derivatives. The algorithm is widely applied in the field of neural networks and can be used with any type of objective function (Zhuoyu Wang, Linghong Zhou & Lu Chen Chaomin, 2008); variants can speed up the standard SCG algorithm by at least a factor of 2. The step-size computation is sketched below.

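The core idea can be sketched as follows: the curvature of the error along the search direction p is approximated by a finite difference of the gradient, which yields a step size without any line search. This is a simplified single step in the style of Møller's SCG, omitting the lambda-adjustment logic of the full algorithm.

```python
import numpy as np

def scg_step(grad, w, p, sigma=1e-4, lam=1e-6):
    """One scaled-conjugate-gradient step: the step size along the search
    direction p comes from a finite-difference estimate of H @ p,
    so no line search is needed."""
    g = grad(w)
    step = sigma / np.linalg.norm(p)
    s = (grad(w + step * p) - g) / step      # s approximates H @ p
    delta = p @ s + lam * (p @ p)            # curvature along p, regularized by lam
    alpha = -(g @ p) / delta                 # optimal step size along p
    return w + alpha * p

# toy usage on E(w) = w0^2 + 10*w1^2, starting along the steepest descent direction
grad = lambda w: np.array([2.0 * w[0], 20.0 * w[1]])
w = np.array([1.0, 1.0])
print(scg_step(grad, w, p=-grad(w)))
```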
4) OWO-HWO Method: In the learning process of this algorithm, a learning factor is first calculated using conjugate gradient theory; next, only the weights from the input layer to the hidden layer are modified; finally, the outputs of the hidden layer are used to establish linear equations, which are solved to obtain the weights of the output layer. This algorithm achieves faster learning than the previous algorithms (Li Yongming, Cai Xun & Li Ming, September 2011). The final stage is sketched below.
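The final OWO stage, establishing and solving linear equations for the output-layer weights, can be sketched as an ordinary least-squares problem; H and T are assumed to be the hidden-layer output matrix and the target matrix produced by the earlier stages.

```python
import numpy as np

def output_weight_optimization(H, T):
    """Solve the linear system H @ W = T for the output-layer weights W
    in the least-squares sense (the OWO stage)."""
    W, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W

# toy usage: 5 training records, 3 hidden units, 1 output
H = np.random.rand(5, 3)
T = np.random.rand(5, 1)
W = output_weight_optimization(H, T)
```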
5) Simulated Annealing: Optimization methods such as steepest descent and conjugate gradient are highly susceptible to finding local minima if they begin the search in a valley near a local minimum; they have no ability to see the big picture and find the global minimum. Several methods have been tried to avoid local minima. The simplest is to try a number of random starting points and use the one with the best value. A more sophisticated technique, simulated annealing (SA), improves on this: it is a generic method for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It tries widely separated random values and then gradually reduces ("cools") the random jumps, in the hope that the search is getting closer to the global minimum. The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects (Wikipedia, 2012). A compact sketch follows.
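Below is a compact sketch of the accept/cool loop for minimizing a one-dimensional function; the objective, jump size, cooling rate and the standard Metropolis acceptance rule are generic illustrative choices, not tied to the networks discussed here.

```python
import math, random

def simulated_annealing(energy, x, temp=10.0, cooling=0.95, steps=500):
    best = x
    for _ in range(steps):
        candidate = x + random.uniform(-1, 1) * temp   # jump shrinks as we cool
        dE = energy(candidate) - energy(x)
        # always accept improvements; accept worse moves with prob exp(-dE/T)
        if dE < 0 or random.random() < math.exp(-dE / temp):
            x = candidate
        if energy(x) < energy(best):
            best = x
        temp *= cooling                                # gradually reduce the jumps
    return best

# toy usage: a bumpy function with many local minima
f = lambda x: x**2 + 10 * math.sin(3 * x)
print(simulated_annealing(f, x=8.0))
```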

ESTABLISHMENT OF PREDICTIVE MODEL
We have used the Neural Network module of the data mining software SPSS from IBM to implement predictive models for the experiments. Data sets from the insurance sector are used for testing predictive neural network models based on the back propagation technique, with the gradient descent method for error minimization. A choice between two policy types, K01 and KFP, was approximated from the age factor of the policy holders. The results obtained for the created models are shown in Tables 1 to 3.
The model was tested with a variety of parameter settings, such as the learning rate, the total number of hidden layers, the number of neurons in the hidden layers, the stopping criteria and the activation function, and the two best instances of the observed results are captured in the following tables.
The advantage of using neural networks for prediction is that they are able to learn from examples only and that, after their learning is finished, they are able to capture hidden and strongly non-linear dependencies, even when there is significant noise and complexity in the training set. Neural networks are applicable in virtually every situation in which a relationship between the predictor variables (independents, inputs) and the predicted variables (dependents, outputs) exists, even when that relationship is very complex and not easy to articulate in the usual terms of "correlations" or "differences between groups".
Table 1: Classification for Categories of the Predicted Variable
(Dependent Variable: CLIENT_TYPE)

                              Model-1 (Predicted)             Model-2 (Predicted)
Sample    Observed          K01     KFP   Percent Correct   K01     KFP   Percent Correct
Training  K01             28763   33301       46.3%        28763   33301       46.3%
          KFP              5142   59016       92.0%         5244   58914       91.8%
          Overall Percent  26.9%   73.1%      69.5%         26.9%   73.1%      69.5%
Testing   K01              8257    9370       46.8%         8257    9370       46.8%
          KFP              1385   16894       92.4%         1409   16870       92.3%
          Overall Percent  26.9%   73.1%      70.0%         26.9%   73.1%      70.0%
Holdout   K01              4133    4739       46.6%         4133    4739       46.6%
          KFP               757    8295       91.6%          774    8278       91.4%
          Overall Percent  27.3%   72.7%      69.3%         27.4%   72.6%      69.2%



Table 2: Network Information

                                                       Model-1              Model-2
Input Layer      Covariates                            AGE_YEARS            AGE_YEARS
                 Number of Units*
                 Rescaling Method for Covariates       Standardized         Standardized
Hidden Layer(s)  Number of Hidden Layers
                 Number of Units in Hidden Layer 1*
                 Number of Units in Hidden Layer 2*    NA
                 Activation Function                   Hyperbolic tangent   Hyperbolic tangent
Output Layer     Dependent Variables                   CLIENT_TYPE          CLIENT_TYPE
                 Number of Units
                 Activation Function                   Softmax              Hyperbolic tangent
                 Error Function                        Sum of Squares       Sum of Squares
* Excluding the bias unit




Table 3: Model Summary

                                            Model-1                  Model-2
Training   Sum of Squares Error             61639.446                22928.437
           Percent Incorrect Predictions    30.5%                    30.5%
           Stopping Rule Used               2 consecutive steps      3 consecutive steps
                                            with no decrease         with no decrease
                                            in error                 in error
           Training Time                    00:00:01.537             00:00:01.662
Testing    Sum of Squares Error             17375.934                6453.068
           Percent Incorrect Predictions    30.0%                    30.0%
Holdout    Percent Incorrect Predictions    30.7%                    30.8%

STRENGTHS AND WEAKNESSES

As observed from the experimental outcomes, neural networks are very suitable for solving the complex problems of data mining because of their high accuracy in predictive modeling, their high speed and their parallel processing nature. The disadvantages are that a neural network may learn a dependency that is valid over a certain period only, and that the error of prediction cannot, in general, be estimated. Neural networks also have some significant limitations in terms of training time, clarity, dimensionality and the risk of incorrect deployment in the absence of expert implementation. Knowledge models obtained under these paradigms are usually considered to be black-box mechanisms, able to attain very good accuracy rates but very difficult for people to understand.
SCOPE FOR FURTHER RESEARCH
The study can be further extended to a comparison between different types of prediction networks, or the networks can be compared with non-neural traditional techniques such as regression, rule induction and decision trees. New algorithms and software tools may be developed for faster convergence, for finding global minimum solutions, or for more accurate prediction. Different types of prediction models, neural or traditional, can be evaluated by changing the various parameters applicable to the model (Shafil Imran, Ahmad Jamil & Kashif Faisal M., 2006), and the results can be judged in terms of increased prediction accuracy and reduced training time.

CONCLUSIONS
In this paper we have discussed the usefulness of neural network based techniques and network training methods for creating prediction models, and we have demonstrated the use of a multilayer perceptron network to build such a model. Because of the design problems noted above, neural systems need further research before they are widely accepted in industry. Nevertheless, neural networks are becoming very popular for predictive data mining because they have proven their predictive power in comparison with other statistical techniques on real data sets.


REFERENCES
1. Kotsiantis, S. B. (2007). Supervised Machine Learning: A Review of Classification Techniques. Informatica, University of Peloponnese, Greece, Vol. 31, 249-267.
2. Adepoju, G.A., Ogunjuyigbe, S.O.A. & Alawode, K.O. (May 2007). Application of Neural Network to Load Forecasting in Nigerian Electrical Power System. The Pacific Journal of Science and Technology, Volume 8, Number 1, 68-72.
3. Sehgal Parveen, Gupta Sangeeta & Kumar Dharminder (February 2011). A Study of Prediction Modeling Using Multilayer Perceptron (MLP) With Error Back Propagation. Proceedings of the AICTE-sponsored National Conference on Knowledge Discovery & Network Security, ISBN: 978-81-8465-755-5, 17-20.
4. Meza Juan C. (December 2010). Steepest Descent. Wiley Interdisciplinary Reviews: Computational Statistics, Volume 2, Issue 6, 719-722.
5. Ismail Leila (March 2011). Communication Issues in Parallel Conjugate Gradient Method using a Star-Based Network. IEEE International Conference on Computer Applications and Industrial Electronics (ICCAIE), Kuala Lumpur, Print ISBN: 978-1-4244-9054-7, 350-355.
6. Zhuoyu Wang, Linghong Zhou & Lu Chen Chaomin (2008). Application of Optimization with Scaled Conjugate Gradient Algorithm in IMRT. The 2nd IEEE International Conference, Shanghai, China, Print ISBN: 978-1-4244-1747-6, 751-754.
7. Li Yongming, Cai Xun & Li Ming (September 2011). A New Neural Network Algorithm Based on Conjugate Gradient and Output Weight Optimization. IEEE Seventh International Conference on Natural Computation (ICNC), Print ISBN: 978-1-4244-9950-2, ISSN: 2157-9555, 38-42.
8. Wikipedia, The Free Encyclopedia. Simulated annealing. Retrieved February 13, 2012, from http://en.wikipedia.org/wiki/Simulated_annealing
9. Shafil Imran, Ahmad Jamil & Kashif Faisal M. (2006). Impact of Varying Neurons and Hidden Layers in Neural Network Architecture for a Time Frequency Application. IEEE Multitopic Conference, INMIC '06, Islamabad, E-ISBN: 1-4244-0795-8, Print ISBN: 1-4244-0795-8, 188-193.

