HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 94
Abstract— Along with the great increase in credit card transactions, credit card fraud has become increasingly rampant in
recent years. Fraud is now one of the major causes of financial loss, not only for merchants but also for individual
clients. In this paper, clustering and outlier detection techniques are used to find fraudulent activities. In
the first phase, clustering is used to partition the data. In the second phase, two different outlier detection algorithms are
applied to the partitions separately to find the outliers. Finally, the outliers are combined and the fraudulent cases are identified.
—————————— ——————————
1 INTRODUCTION
The use of credit cards is prevalent in modern-day society, and the credit card has become the most popular
mode of payment. Detecting credit card fraud is a difficult task when using normal procedures, so the
development of credit card fraud detection models has recently become significant in both the academic and
business communities.
Detecting fraud means identifying suspicious fraudulent cases. In this paper, clustering and outlier mining
are used to detect the fraudulent cases. Clustering is used to group similar data objects into clusters.
Outliers are defined as data that appear to be inconsistent with the rest of the data. Mining of outliers is
an important research field in the application of fraud detection, and outlier detection has recently been
applied in many domains. In this model, an outlier detection algorithm is employed to find fraudulent
transactions.
2 RELATED WORK
From the point of view of preventing credit card fraud, much research has been carried out with special
emphasis on data mining. Kim and Kim identified the skewed distribution of data and the mix of legitimate
and fraudulent transactions as the two main reasons for the complexity of credit card fraud detection [1].
Sam and Karl suggest a credit card fraud detection model using Bayesian networks and neural network
techniques to learn models of fraudulent credit card transactions [2]. L. Mukhanov presents a credit card
fraud detection model using Bayesian Belief Networks [3].
Some clustering-based outlier detection techniques have also been proposed for fraud detection [4], as has
grid-based outlier detection [5]. There is a lot of research in outlier detection, and many outlier detection
algorithms, such as those based on statistics [6] and distance [7, 8], have found good application.
In this paper, outlier detection is used to set up a detection model that mines fraudulent transactions as
outliers.
3 FRAUD DETECTION MODEL
An outline of the fraud detection model is shown in Fig. 1. The first step in this method is data
preprocessing: the model converts every attribute of the transaction sample to a numerical attribute. The
second and third steps are the detection methods. The final step concludes the fraudulent cases.
4 DETECTION METHODS
4.1 Clustering
Clustering groups the data into clusters of similar objects, which simplifies retrieval of the data [9].
Cluster analysis is a technique for breaking data down into related components in such a way that patterns
and order become visible [10]. Clustering techniques are known as “unsupervised learning” because there is
no class to be predicted. The main goal of clustering data is to find common patterns or to group similar
cases in the data. In this paper, an efficient cluster-based partitioning algorithm is used. It divides the
data into a specified number of partitions, which separate the dense regions from the sparse regions. The
K-means algorithm is applied to cluster the data and to find the sparse and dense regions. K-means
clustering is a method of cluster analysis which aims to partition n observations into k clusters in which
each observation belongs to the cluster with the nearest mean. The cluster mean of
$K_i = \{t_{i1}, t_{i2}, \ldots, t_{im}\}$ is defined as
$m_i = \frac{1}{m} \sum_{j=1}^{m} t_{ij}$
The partitioned data region is shown in Fig. 2.
[...] majority of the data. The degree to which each outlier deviates from the remainder of the data
indicates the severity of the abnormal activity denoted by that outlier. It is based on the mean and
standard deviation of the observed data. In this method, two parameters are used: an upper bound, $N_u$, on
the number of potential outliers, and the probability, α, of incorrectly declaring one or more outliers when
no outliers exist. $N_u \le \frac{1}{2}(N - 1)$, where N is the number of samples. This method works well
for the dense region.
For sparse regions, the Q-test outlier detection method is used. First, the data are arranged in ascending
order. Then the experimental Q-value is calculated: it is the gap between the suspect value and its nearest
neighbor, divided by the range of the values. The Q-value is compared with the critical Q-value (Q_crit),
which is defined by the chosen confidence level. If Q > Q_crit, the suspect value can be considered an
outlier. This method performs well with small sample sizes.
All the detected outliers from the regions are combined. They are ranked by severity and concluded to be
fraudulent cases.
5 DISCUSSIONS
In this detection model, the unsupervised approach is used, so previously unknown frauds can be found.
Models based on the supervised approach must have labeled data for both normal data and anomalies, and they
are only able to detect frauds of a type which has previously occurred. In contrast, unsupervised methods
do not make use of labeled records; they detect changes in behavior or unusual transactions. Unsupervised
learning is a feasible method for learning large and more complex models.
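As an illustration of the unsupervised partitioning step, the following is a minimal sketch of K-means built on the cluster mean defined in Section 4.1. It is not the implementation used in this paper: the function names (`cluster_mean`, `k_means`), the sample points, and the choice of the first k points as initial means are all assumptions made for the example.

```python
def cluster_mean(points):
    """Component-wise cluster mean: m_i = (1/m) * sum over j of t_ij."""
    m = len(points)
    return tuple(sum(coords) / m for coords in zip(*points))


def squared_distance(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Assign each point to the nearest mean, then recompute the means.

    Initialisation with the first k points is an assumption chosen for
    repeatability; the text does not say how the means are initialised.
    """
    means = list(points[:k])
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, means[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous mean.
        new_means = [cluster_mean(c) if c else means[i]
                     for i, c in enumerate(clusters)]
        if new_means == means:  # assignments have stabilised
            break
        means = new_means
    return means, clusters


# Hypothetical, already-numeric transaction attributes (illustration only).
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0), (0.9, 0.8),
          (8.0, 9.0), (9.5, 7.0), (7.0, 10.0)]
means, clusters = k_means(points, k=2)
for mean, members in zip(means, clusters):
    print(mean, len(members))
```

How a partition is labelled dense or sparse is not specified in the text. In this toy data the five tightly grouped points form one partition and the three widely spread points form the other.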
In this process, the clustering and outlier detection
methods are used to find the outliers. Applying the
algorithms to the partitions separately reduces the number
of nearest-neighbor searches and the number of reachability
distance computations. This model mines fraudulent
transactions as outliers.
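The partition-wise detection summarized above can be sketched for the sparse-region case as follows. This is a hedged illustration, not the implementation used in this paper: it covers only the Q-test, the 90% confidence level is an assumed choice, the critical values are the commonly published Dixon table for 3 to 10 samples, and the partition names and amounts are hypothetical. Using the Q-value itself as the severity for ranking is also an assumption.

```python
# Dixon's Q critical values at the 90% confidence level for n = 3..10.
# The confidence level is an assumption; the text does not fix one.
Q_CRIT_90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,
             8: 0.468, 9: 0.437, 10: 0.412}


def q_test(values, q_crit_table=Q_CRIT_90):
    """Return (suspect, Q) if the most extreme value fails the Q-test, else None."""
    n = len(values)
    if n not in q_crit_table:
        raise ValueError("sample size outside the critical-value table")
    ordered = sorted(values)                 # arrange in ascending order
    value_range = ordered[-1] - ordered[0]
    if value_range == 0:
        return None                          # all values equal: nothing to test
    # Q = gap to the nearest neighbour / range, for each end of the data.
    q_low = (ordered[1] - ordered[0]) / value_range
    q_high = (ordered[-1] - ordered[-2]) / value_range
    suspect, q = (ordered[0], q_low) if q_low > q_high else (ordered[-1], q_high)
    return (suspect, q) if q > q_crit_table[n] else None


def detect_in_partitions(partitions):
    """Run the Q-test on each partition separately, then combine and rank."""
    combined = []
    for name, values in partitions.items():
        result = q_test(values)
        if result is not None:
            suspect, q = result
            combined.append((q, name, suspect))
    return sorted(combined, reverse=True)    # largest Q first


# Hypothetical transaction amounts per sparse partition (illustration only).
partitions = {
    "sparse_a": [12.0, 15.0, 14.0, 13.5, 16.0, 95.0],
    "sparse_b": [40.0, 42.0, 41.0, 43.0, 44.0],
    "sparse_c": [5.0, 5.5, 6.0, 5.2, 30.0],
}
ranked = detect_in_partitions(partitions)
for q, name, suspect in ranked:
    print(name, suspect, round(q, 3))
```

Because each partition is tested on its own, a value is compared only with the other members of its partition, which is where the reduction in pairwise computations comes from.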