
Data Mining Techniques for Crime Pattern Recognition and Architecting an Early Warning System
Saurav Jana, 08BM8010 MBA
Vijay Kumar Das, 08BM8003 MBA

Abstract

This paper describes data mining techniques that can be used to detect crime patterns, both region-wise and by behavior or nature. It seeks a framework for an Early Warning System and proposes an architecture for the design of that system.

Introduction

Data mining and data analysis techniques are powerful tools that help law-enforcement officers gather intelligence and counter terrorism. With the increasing use of computerized and automated systems to track crimes, computer data analysts have been helping law-enforcement officers, sleuths and detectives to speed up the process of solving crimes. In this paper we take an approach that combines computer science and criminal justice to study a data mining technique that can help solve crimes faster. We use clustering-based models to help identify crime patterns on the basis of their characteristics. We then use this data to build an Early Warning System for possible future threats and attacks by terrorists and offenders.

Survey of Literature
Many of us are confused by the term 'data mining'. Data mining, as described on Wikipedia [1], is the process of extracting patterns from data. Data mining is
becoming an increasingly important tool to transform this data into information and to give the data meaning through analysis. What data mining does is "discover useful, previously unknown knowledge by analyzing large and complex" datasets [2].

Clustering, or cluster analysis, is the grouping of a set of observations into groups called clusters, so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition and bioinformatics.

Here we discuss the k-means [3] algorithm for clustering. It assigns each point to the cluster whose center, also called the centroid, is nearest. The center is the average of all the points in the cluster; that is, its coordinates are the arithmetic mean, taken separately for each dimension, over all the points in the cluster. Example: the data set has three dimensions and the cluster has two points, X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid Z becomes Z = (z1, z2, z3), where z1 = (x1 + y1)/2, z2 = (x2 + y2)/2 and z3 = (x3 + y3)/2.

A warning system is any system of biological or technical nature deployed by an individual or group to inform of a future danger. Its purpose is to enable the deployer of the warning system to prepare for the danger and act accordingly to mitigate or avoid it.

Early warning can be defined as information that a national government or an international or regional organization receives in advance, so that it can react in a timely and effective way to a crisis situation. An early warning system [4] in the context of crime reporting will need to address the following questions:

1) to identify "who is doing what"
2) to estimate "who can do what" in order to
3) predict "what can really happen tomorrow".

Crime Reporting Systems and Databases:
Most police departments use some electronic systems for crime reporting that have
replaced the traditional time-consuming paper-based crime reports. These crime reports contain the following categories of information: type of crime, date, time, location, weaponry, etc. There is also information about the suspect (identified or unidentified), the victim and the witness. In addition, there is the narrative information, or description of the crime and Modus Operandi (MO), which is usually in text form. Police officers or detectives use free text to record most of their observations that are particular to a case and cannot be captured by checkbox-style pre-determined questions. While the first two categories of information are usually stored in computer databases as numeric, character or date fields of a table, the last category is often stored as free text.

Data mining using Clustering: Use and Implementation
In crime terminology a cluster is a group of crimes in a geo-spatial region, or a hot spot of crime. In data mining terminology a cluster is a group of similar data points, that is, a possible crime pattern. Hence clustering algorithms in data mining amount to identifying groups of records that are similar among themselves but differ from the rest of the groups. Thus we need to work on the variances of the data between the groups and within the groups. In our case some of these clusters or groups will be useful for identifying a crime spree committed by one suspect or the same group of suspects.
Thus, provided with this information, the next challenge is to find the variables that provide the best clustering. The variables can be crime type, suspect race, age, sex, crime weapon, motives, etc. Without a suspected crime pattern, the detective is less likely to build the complete picture from bits of information from different crime incidents.

Today most of this process of connecting the data points is done manually, with the help of multiple spreadsheet reports that the detectives usually get from the computer data analysts, and their own crime logs.
Simple example:

CRIME TYPE*   SUSPECT AGE   VICTIM AGE   WEAPON    SITE*
TA            25            37           Bomb      1
TA            34            56           Bomb      1
TA            28            13           Grenade   2
MA            25            34           Mine      3
MA            35            65           Bomb      3

*Let,
TA - Terrorist attack, labeled as 1.
MA - Maoist attack, labeled as 2.
Sites are Jammu=1, Delhi=2, Jharkhand=3.

Output of statistical tools like SPSS for a two-means cluster analysis on the variables Crime type and Site is given:

From the above figure it is clear that the first cluster contains Sites 1 and 2, both of which come under the terrorist attack type, and the second cluster contains Site 3, which is of the Maoist attack crime type.

The figure can be plotted as:

Figure: Pie chart showing cluster size

The figure shows that we obtained two clusters: cluster 1 consists of three cases, and cluster 2 consists of two cases.
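
As a minimal sketch (an illustration added here, not the SPSS procedure used above), the two-means clustering on Crime type and Site can be reproduced in a few lines of Python. The function names centroid and k_means, and the choice of the first and last rows of the table as initial centroids, are assumptions made for this illustration. The categorical labels are treated as plain numbers, following the legend of the table.

```python
from math import dist

# Rows of the example table, encoded as (crime type, site) with the labels
# from the legend: TA=1, MA=2; Jammu=1, Delhi=2, Jharkhand=3.
CRIMES = [(1, 1), (1, 1), (1, 2), (2, 3), (2, 3)]


def centroid(points):
    """Arithmetic mean of the points, taken separately for each dimension."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute the centroids, until the assignment stops changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = centroid(members)
    return labels, centroids


# Assumption: seed the two centroids with the first and last rows.
labels, centers = k_means(CRIMES, [CRIMES[0], CRIMES[-1]])
print(labels)
print(centers)
```

With these starting centroids the first three rows (Sites 1 and 2) fall into one cluster and the last two rows (Site 3) into the other, which matches the cluster sizes of three and two reported above. Other initial centroids may converge to a different grouping, so the seeding is part of the assumption.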

Figure: Map of the crime sites, obtained by plotting the cluster analysis data.

Thus the map gives valuable information about the crime pattern on the basis of the crime sites and the type of crime committed. This type of technique is very helpful for the crime department in identifying the type of crime committed, by segregating the vast source of information and pulling out the threads that carry the necessary meaning and information.

Early Warning System: Architecture

Typically an Early Warning System (EWAS) consists of five phases [5], which are all associated with each other. The first step is risk assessment, which involves understanding and mapping the threat; it is followed by prevention of the threat. The third phase is about monitoring the risk in order to detect possible early warnings, including forecasting near-future events. In the fourth phase, comprehensible warnings are disseminated to political authorities and the inhabitants. In the fifth phase, appropriate and timely actions are carried out in response to the warnings.

In this paper we discuss the architecture of an EWAS [5] that can be used to warn in case of a threat or attack.

Figure: System architecture

A media monitoring service provides clients with documentation, analysis, or copies of media content of interest to the clients. Services that read and analyze thousands of news items from heterogeneous sources and provide consolidated channels to access the news have become available.
The architecture of EWAS consists of three main components (as shown in the figure):

1. Input (Data gathering sources)
a. News consolidation system
b. Data extraction by Spiner [6]
c. Data gathered manually from the field

2. Processing
a. Semantic entity and relationship management system
b. Social network analysis (SNA) tools
c. Framework for social scientists' interaction, like the SOMA portal [7]
d. Rule-based engine

3. Output system
a. Warning generation system
b. Dissemination system

Component description:
1. Input (Data gathering sources)
a) News consolidation system: The news consolidation system helps the data analyst by providing structured information about entities of interest, obtained by mining news documents from different sources. The Europe Media Monitor (EMM) RSS feed is only one such source for the news consolidation system.
b) Web analytics: one can use "Spiner", which was proposed in a recent research paper [6]. Spiner can be a useful system in a dark web analysis scenario. Efficient wrappers were designed for the disguised websites of different banned terrorist groups, terrorism databases, and government information sites. The system's web robot crawls through these sites and extracts valuable information. From the analysis in the paper, it is evident that Spiner is capable of handling the dynamic and chaotic character of today's World Wide Web by using structural data mining enhanced with social network analysis tools and techniques.
c) Data gathering is done manually.

2. Processing
a) Semantic entity and relationship management system:
The most important things for early warning include actors (such as individuals, groups, organizations, and countries) and resources. Knowledge about these things is expressed through the attributes and character of the actors and is represented as profiles. Finally, the situations, conditions and context in which events take place are important, in that they may create constraints on the actors in their choice of actions, or opportunities that would otherwise not have been possible.

There are numerous ways to refer to an actor of an event, which makes the identification of entities and their relationships a real problem. In addition, in terrorism, actors commonly have name aliases. Hence we need a mechanism to identify the entities and then use the news base to measure the relations among these actors.
b) Social network analysis tool:
After data gathering, we need to provide social scientists with some data mining and social network analysis tools that can be used to detect the main players and masterminds in networks of actors, and to estimate their influence and the dependencies among them.

c) Framework for interface for social scientists: A GUI-based platform is required to provide social scientists and investigators with a tool in which they can create their own rule-based theories, manage and keep updating them, and test, analyze and prototype them.
We assume the use of the Stochastic Opponent Modeling Agents (SOMA) Terrorist Organizations Portal (STOP) [7] for this purpose.

d) Rule-based engine:
A rule-based engine is used, in which a set of rules is decided in advance, so that when any news item or data arrives the best rule for it is identified. The engine also provides mechanisms and schemes for rating and grading the rules, and for tracking the success rate of the rules that reside in the rule-based engine.

e) Warning generation engine:
Warning generation in such a system would be a function of one rule or a number of rules. The task is
accomplished by analyzing the rules of the engine, i.e., warnings can be generated by sequencing the rules flagged true by the rule-based engine.

3. Output (Dissemination system)
a) A dissemination system is needed to deliver the warnings to subscriber devices. The notification system should be able to perform two tasks: managing subscription information and handling the subscriber devices. A simple and realistic approach is to support at least two devices: an email receiver (i.e., an internet-enabled machine) and an SMS and call receiver (a landline or mobile phone).

Conclusion
There is ample scope in data mining to use valuable data and to extract information from it, through further studies in this area as is done in this paper.
EWAS is designed to be an early warning system on which more advanced data mining research can be carried out. There can be more work on Web analytics and structural data mining techniques to enhance the investigative processes further.

References:
[1] Wikipedia.org
[2] David Jensen, "Data Mining in Networks," presentation at CSIS Data Mining Roundtable, Washington, D.C., July 23, 2003.
[3] Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, and Susan Brown, "FGKA: A Fast Genetic K-means Algorithm", in Proc. of the 19th ACM Symposium on Applied Computing.
[4] Nasrullah Memon, Uffe Kock Wiil, "Design and Development of an Early Warning System to Prevent Terrorist Attacks"
[5] Best, C., van der Goot, E., Blackler, K., Garcia, T., Horby, D.: Europe Media Monitor. Technical Report EUR 22173 EN, European Commission (2005)
[6] Memon Nasrullah; Qureshi Abdul Rasool; Hicks David; Harkiolakis Nicholas: Extracting Information from Semi-Structured Documents, In Y. Ishikawa et al. (Eds.): APWeb 2008 Workshops, LNCS 4977, pp. 54–64, 2008
[7] Aaron Mannes, Mary Michael, Amy Pate, Amy Sliva, V.S. Subrahmanian, Jonathan Wilkenfeld. "Stochastic Opponent Modeling Agents: A Case Study with Hezbollah," First International Workshop on Social Computing, Behavioral Modeling, and Prediction, April 2008
