impact on privacy of personal data. This paper presents a research-in-progress study that investigates the need for an expanded role of ethics in data mining.
[14] Countering Terrorism: Integration of Practice and Theory.
[15] This article sheds light on how a fast-growing FBI data mining system, billed as a tool for hunting terrorists, is being used in hacker and domestic criminal investigations and now contains tens of thousands of records from private corporate databases, including car-rental companies, large hotel chains and at least one national department store, as declassified documents obtained by Wired.com show.
[16] This is a report to the National Commission on Terrorist Attacks upon the United States explaining the FBI's counterterrorism program.
[17] Bhavani Thuraisingham, in her book, introduces how data mining has become a useful tool for detecting and preventing terrorism, explaining the technical challenges for data mining, the various types of terrorist threats and how these techniques can provide solutions to counter terrorism.

Data Mining Applications in Counter Terrorism
In this section we give a high-level overview of how data mining, including web data mining, could help counter terrorism. Web data mining goes beyond mining structured data alone; we will also touch on mining unstructured data, mining for business intelligence, web usage mining and web structure mining as forms of web data mining. Data mining can contribute to counter-terrorism because extracting hidden patterns and trends from large quantities of data is very important for detecting and preventing terrorist attacks. We will examine both non real-time and real-time threats and observe how data mining and web data mining can be applied to each. In our terminology, web data mining encompasses data mining, as it deals with mining data on the web as well as mining structured and unstructured data.

Data Mining for Handling Threats
Data used for mining in order to handle threats can be grouped in different ways. One example is a grouping into information-related and non-information-related data. Another is a grouping into real-time and non real-time threats. These groupings are somewhat arbitrary; for example, a non real-time threat becomes a real-time threat when a suspected terrorist decides to attack on a certain date.

Non Real-time Threats:
Non real-time threats are threats that do not have any time constraints. Data might be collected over months and analyzed, and the conclusion reached may be that an attack will not occur. For data mining to work effectively, many examples and patterns are needed, because patterns and historical data are used to make predictions. The prime requisite for carrying out data mining and obtaining useful results is good data. Barriers here include incomplete data and the unwillingness of organizations to share data, so mining tools have to make many assumptions about incomplete and unavailable data. An alternative is to carry out federated data mining under a federated administrator.
The next step is to decide what data needs to be collected. Mostly, data about people, such as where they come from, what they are doing and who their relatives are, is gathered, and groups are then formed of individuals with similar patterns. Individuals with criminal records are kept under high vigilance.

A Study on Evolution of Data Mining Techniques Post 9/11

Once the data is collected, it is formatted and organized. The data may be structured or unstructured, and some of it may not be of much use. Therefore, the data is segmented into critical and non-critical data. Once the outcomes are determined, the mining tools are used to start the mining process.
After that comes the most complex part: deciding how useful the mining results are. The chances of getting a false positive or a false negative are high, and either result could be disastrous. At present, human specialists are needed to work with the mining tools. If a tool states that a certain person is a terrorist, the specialist will have to do further checking before that person is arrested or detained.
A non real-time threat could become a real-time threat. The challenge will then be to find out exactly what the attack will be. This calls for data mining tools that can continue reasoning as new information comes in; i.e., as new information arrives, the warehouse needs to be updated, and the mining tools should be dynamic and take the new data and information into consideration in the mining process.

Real-time Threats:
In the case of real-time threats there are time constraints; that is, such threats may occur within a certain time, so an immediate response is required. There are several types of data mining techniques for real-time threats. Both hypothetical and simulated data need to be used, and as many examples as possible should be gathered from counter-terrorism specialists. Once the examples are gathered and the training of neural networks and other data mining tools has begun, the next task is deciding what sort of models are to be built. Handling real-time threats requires dynamically changing models, and this is the biggest challenge faced.
Real-time data mining is a controversial topic, as many people believe it is an impossible task. Hence the challenge is to redefine data mining and figure out ways to handle real-time threats.
Analyzing data emanating from sensors is a common way of gathering data, e.g. from surveillance cameras placed in various places such as shopping centres and in front of embassies and other public places. The data emanating from these sensors has to be analyzed in real time to detect and prevent attacks. This raises questions of privacy and civil liberties. But the real dilemma is: what are the alternatives? Should privacy be sacrificed to protect the lives of millions of people? Policy makers and lawyers need to work together to come up with viable solutions.

Analyzing the Techniques:
The goal of data mining is to analyze data, make predictions and identify trends. This includes examining various data mining outcomes and discussing how they could be applied to counter-terrorism. The outcomes of these analyses arrived at by making associations, link
system of record for, all the FBI electronic files.

analysts at headquarters with access to more information in far less time than with other FBI investigative systems. The SCOPE database gave an opportunity to test new capabilities in a controlled environment; it has since been replaced by the IDW.

Investigative Data Warehouse
The IDW, whose first phase was delivered in January 2004, now provides analysts with full access to investigative information within FBI files, including ACS and VGTOF data, open-source news feeds, and the files of other federal agencies such as DHS. The IDW provides physical storage for this data and allows users to access it without needing to know its physical location or format. The data in the IDW is at the Secret level, and the addition of TS/SCI-level data is in the planning stages.
The FBI has planned to enhance the IDW by adding further data sources, such as Suspicious Activity Reports, and by making it easier to search. With these enhancements, agents and analysts using new analytical tools will be able to search rapidly for pictures of known terrorists and match or compare them with pictures of other individuals in minutes rather than days. This will help in identifying relationships across cases. The major advantage of this deployment is that a search of up to 100 million pages of international terrorism-related documents will take only seconds.

Analytical Tools
To make the most of the data stored in the IDW, the FBI planned to use advanced analytical tools. These tools allow FBI agents and analysts to look across multiple cases and data sources, identifying relationships and other pieces of information that were not readily available using older FBI systems. The tools will make database searches simple and effective, give analysts new visualization, geomapping, link-charting and reporting capabilities, and allow analysts to request automatic updates to their query results whenever new, relevant data is downloaded into the database. Please refer to Illustrations 1 to 3, which give fictional examples of how some of these tools can assist in drawing connections between discrete pieces of information.

FBI IDW Systems
In August 2006, the Electronic Frontier Foundation (EFF) sought government records concerning the FBI IDW under the Freedom of Information Act (FOIA), and on October 17, 2006, EFF filed a lawsuit. The data on the IDW given here is based upon the records released by 2009, along with public information about the IDW and the datasets included in the data warehouse.
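The link-charting capability described under Analytical Tools, and the fictional examples in Illustrations 1 to 3, come down to finding chains of shared details that connect otherwise separate records. The Python sketch below is a minimal illustration of that idea under stated assumptions, not a description of how the IDW tools actually work: the records, entity names and the helper functions `build_link_graph` and `shortest_link_chain` are all hypothetical. Each record is assumed to list the entities it mentions (people, phone numbers, addresses); entities that appear in the same record are linked, and a breadth-first search returns the shortest chain of links between two entities.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical, fictional records: each lists the entities it mentions.
# This data model is an illustrative assumption, not the IDW's real schema.
RECORDS = [
    {"id": "case-1", "entities": {"Person A", "Phone 555-0101"}},
    {"id": "case-2", "entities": {"Phone 555-0101", "Person B", "Address X"}},
    {"id": "case-3", "entities": {"Address X", "Person C"}},
    {"id": "case-4", "entities": {"Person D", "Rental car R"}},
]


def build_link_graph(records):
    """Link every pair of entities that appear together in the same record.

    Each edge stores the IDs of the records that support it, so that a
    human analyst can trace any connection back to its source documents.
    """
    graph = defaultdict(dict)
    for record in records:
        for a, b in combinations(sorted(record["entities"]), 2):
            graph[a].setdefault(b, set()).add(record["id"])
            graph[b].setdefault(a, set()).add(record["id"])
    return graph


def shortest_link_chain(graph, start, goal):
    """Breadth-first search for the shortest chain of entities from start to goal.

    Returns a list of entities, or None if the two are not connected.
    """
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the predecessors to rebuild the chain.
            chain = []
            while current is not None:
                chain.append(current)
                current = previous[current]
            return chain[::-1]
        for neighbour in graph[current]:
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    graph = build_link_graph(RECORDS)
    chain = shortest_link_chain(graph, "Person A", "Person C")
    print(" -> ".join(chain))
    # Show which (fictional) records support each link in the chain.
    for a, b in zip(chain, chain[1:]):
        print(f"{a} -- {b}: {sorted(graph[a][b])}")
```

Running the script prints `Person A -> Phone 555-0101 -> Address X -> Person C`, followed by the fictional record behind each link. Keeping the supporting record IDs on every link reflects the point made earlier that human specialists must check mining results: an analyst can return to the source documents before drawing any conclusion from a chain.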
The FBI worked with Science Applications International Corporation (SAIC), Convera and Chilliad to develop the project. By March 2006, the IDW had 53 data sources and over half a billion. By September 2008, the IDW had grown to nearly one billion.

IDW System Architecture
According to the FBI project description, the IDW system environment consists of a collection of UNIX and NT servers providing secure access to a cohort of very large-scale storage devices. The servers include application servers, web servers, relational database servers and security filtering servers. The IDW web application can be accessed through FBINet from user desktop units, providing browser-based access to the central database and its access control units. The entire configuration is designed to be scalable, so that it can expand as more data sources and capabilities are added.
A DOJ Inspector General report explained: "Data processing is conducted by a combination of Commercial-Off-the-Shelf (COTS) applications, interpreted scripts, and open-source software applications. Data storage is provided by several Oracle Relational Database Management Systems (DBMS) and in proprietary data formats. Physical storage is contained in Network Attached Storage (NAS) devices and component hard disks. Ethernet switches provide connectivity between components and to FBI LAN/WAN. An integrated firewall appliance in the switch provides network filtering."

IDW Subsystems
According to the IDW Concept of Operations, the IDW has two main subsystems, the IDW Secret (IDW-S) and the IDW-Special Project Team (IDW-SPT). It also consists of a development platform (IDW-D) and a subsystem for maintenance and testing (IDW-I).

IDW Secret
This system is the main subsystem of the IDW authorized to process classified national security data up to, and including, information designated Secret. However, neither Top Secret data nor any Sensitive Compartmented Information (SCI) is authorized to be processed by this system. The IDW Top Secret/Sensitive Compartmented Information level datamart appears to be in the planning stage. This system is the successor of the Secure Counter-Terrorism/collaboration Operation Prototype Environment.

IDW-Special Project Team
A special project to augment the existing IDW system with new capabilities for use by FBI and non-FBI agents on the JTTFs (Joint Terrorism Task Forces) was started in November 2003 by the Counterterrorism Division, along with the Terrorist Financing Operations Section (TFOS). The FBI Office of Intelligence is the executive sponsor of the IDW. The IDW Special Projects Team was originally initiated for the 2004 Threat Task Force. By May 2006, "the Special Project Team provided services to 5 task forces or operations."
As described by the FBI, "The Special Projects Team (SPT) Subsystem allows for the rapid import of new specialized data sources. These data sources are not made available to the general IDW users but instead are provided to a small group of users who have a demonstrated 'need-to-know'. The SPT System is similar in
11. Robb S Todd, FBI's New Data Warehouse A Powerhouse, 2006, http://www.cbsnews.com/stories/2006/08/30/terror/main1949643.shtml
12. Report on the Investigative Data Warehouse, 2009, http://www.eff.org/issues/foia/investigative-data-warehouse-report
13. James Lawler, A Study of Data Mining and Information Ethics in Information Systems Curricula
14. Countering Terrorism: Integration of Practice and Theory, An Invitational Conference, FBI Academy, Quantico, Virginia, February 28, 2002
15. Ryan Singel, Newly Declassified Files Detail Massive FBI Data-Mining Project, 2009, http://www.wired.com/threatlevel/2009/09/fbi-nsac/
16. A Report to the National Commission on Terrorist Attacks upon the United States, The FBI's Counterterrorism Program, 2001
Illustrations
Illustration 1
Illustration 2
Illustration 3