
Journal of Computer Information Systems

ISSN: 0887-4417 (Print) 2380-2057 (Online) Journal homepage: http://www.tandfonline.com/loi/ucis20

Analyzing Relationships in Terrorism Big Data Using Hadoop and Statistics

Kenneth David Strang & Zhaohao Sun

To cite this article: Kenneth David Strang & Zhaohao Sun (2017) Analyzing Relationships in
Terrorism Big Data Using Hadoop and Statistics, Journal of Computer Information Systems, 57:1,
67-75, DOI: 10.1080/08874417.2016.1181497

To link to this article: http://dx.doi.org/10.1080/08874417.2016.1181497

Published online: 21 Jul 2016.


Kenneth David Strang (a) and Zhaohao Sun (b)
(a) State University of New York, Queensbury, NY, USA; (b) Papua New Guinea University of Technology, Lae, Papua New Guinea

ABSTRACT
We used big data software Hadoop in Google News to collect complex high-velocity, high-volume terrorism information. We used big text search to code the factors of interest into nominal fields. We integrated new fields and records into an existing database drawn from other researchers. Our testable hypothesis was that there was a significant relationship between terrorist group ideology and terrorist attack type. Then we used correspondence analysis in SPSS to test our hypothesis. Our hypothesis was supported, so we developed a symmetric model to visualize the hidden relationships between terrorist ideology and attack type. Our purpose was to demonstrate how statistical software methods may be applied in big data analytics. These methods will generalize to other researchers and practitioners. The finding of a significant relationship between terrorist ideology and attack type may generalize to supply chain operations and national security planning.

KEYWORDS
Hadoop; big data analytics; big text analytics; global terrorist ideology

CONTACT Kenneth David Strang, kenneth.strang@gmail.com, State University of New York, School of Business and Economics, 640 Bay Road, Regional Higher Education Building, Queensbury, NY 12804, USA.
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/ucis.
© 2017 International Association for Computer Information Systems

Introduction

Big data analytics is an important topic to study from the business practice standpoint because, according to Gartner Group, organizations spent over US$14 billion on those processes in 2013, and this is increasing at an annual rate of 8% [1]. There is a lack of documentation on how to apply big data analytics in business operations since most studies are conceptual in nature or focus on commercial technology [2-5].

The academic community needs more literature for designing studies and applying methods to analyze big data [6-12]. Big data analytics is so worthy of academic research that the United States National Science Foundation [13] has a dedicated program with US$100 million funding within the Information and Intelligent Systems category. One of the three core practice groups is Information Integration and Informatics, which focuses on developing techniques to analyze data of unprecedented scale, complexity, and rate of acquisition, as well as issues of heterogeneity and complexity with innovative approaches and deep insights [13].

The authors have experience with big data software, namely Hadoop [14], as well as with the related applications of Ambari, Avro, Cassandra, Chukwa, Flume, HBase, Hive, Mahout, Pig, Solr, Spark, Sqoop, Tez, YARN, and ZooKeeper [15]. We find that none of those related applications provides statistical analysis techniques without considerable programming. Mahout is capable of machine learning through data mining but it requires considerable programming once the data is collected. Thus, not only is there a lack of design guidelines for complex, high-velocity, large-volume big data research, but there are also few examples of how to apply statistical techniques. Our first research proposition is that we could use Hadoop for big data collection, and the results could be integrated with text analytics and statistical software. This gives rise to our second proposition, that we can perform statistical analysis of terrorism big data.

Global terrorism presents a risk to business operations [16]. A quote from Garden (2001) about the 9/11 terrorists reminds us about the seriousness and unpredictability of this complex risk: "if men armed only with knives can cause such mass destruction, the world must review its assessment of threats" [17]. Global terrorism data are complex and emerge rapidly, and there is no known limit to the activity volume [18-21].

A few studies have been published about global terrorist behavior [21-24]. However, predictive anti-terrorism analytics are difficult to design and expensive to administer [20, 25, 26]. Most studies focused on descriptive statistics of attack types and religion [21, 24], but we do not understand how ideology is related to terrorist behavior. Thus, our second research proposition is that we could make a contribution to the literature by integrating big data analytics with statistical software to better understand the relationship between global terrorist ideology and attack type.

Literature review

Big data analytics theoretical background

The term and practice of big data analytics has garnered a lot of attention in the peer-reviewed literature, but most of the studies were conceptual in nature or focused on non-operational topics [1, 2]. Big data generally refers to the collection and analysis of large volumes of primary or secondary data [1, 7]. More recently the meaning of big data has
been extended to encompass information management and information processing of increasing volumes of complex, multiple data types [3, 4].

Data set size, complexity, and velocity are the most distinctive features differentiating big data analytics from other methods including general analytics [7]. Complexity alone, however, is not a distinctive feature of big data, since even small files may contain complicated relationships, such as recursive hierarchies within and between qualitative data types identified through cluster analysis [27]. The benefits of big data analytics as a process revolve around having a large sample of thousands or millions of records along with the relevant method to analyze the complex relationships hidden within it. The key problem, though, is that customary analytical tools used by business managers, such as Excel or statistical software, do not easily accommodate large files, and proprietary Executive Information Systems can be too expensive or hard to operate for small to medium-sized organizations. Another key problem seems to be that organizations cannot identify how to analyze these large files, the variety of content data types is too widespread, or the velocity is too intense to manage in a desktop software application [1, 7].

In addition to the vague descriptions of what big data files are, the methods for analyzing them are ambiguous. The generally accepted big data methods include: split files into manageable chunks; compress data to reduce storage cost; and analyze data using statistical techniques to obtain sharper, timelier business insights [5]. From a strategic business perspective, big data can be used to help make critical decisions, based on the concept of the law of large numbers: the more data in the sample, the more we may assume the patterns are representative of the norm, and therefore statistical techniques may be used.

From a research methods standpoint, increasing data volume, complexity, velocity, and variety of data types present many challenges for business analysts. The problem is that not all data types can be accommodated by contemporary statistical applications, nor can these programs handle the large files typical of big data [1]. For example, financial transaction files generally contain numeric data, which may easily be analyzed using statistics, but social behavior big data generally contains nominal data such as words or phrases [7]. Although nominal big data types cannot generally be modeled to predict future events, the analysis of non-metric fields such as attributes could identify patterns within the information. Classification techniques such as some forms of chi-square, logistic regression, and cluster analysis can accommodate nominal data types to identify factors that may predict group membership behavior [28].

The goal of big data analytics is either to describe the past or predict the future. Describing the past could mean summarizing or visualizing patterns, while predicting the future is often thought of as forecasting, predicting, or, in some disciplines, "supervised learning"; "unsupervised learning" therefore refers to non-predictive goals [29]. The principles of big data analytics can be found in disciplines outside of business and management, such as mathematics, statistics, computer science, information technology, and operations research.

The techniques for big data analytics encompass a wide range of mathematical, statistical, and modeling techniques [1]. Unsupervised learning techniques generally include descriptive statistics (e.g., means, standard deviations, medians, or frequencies), distribution shape approximation (e.g., normal or otherwise), cluster analysis (k-means, hierarchical, Bayesian), and dimension reduction (e.g., principal component or factor analysis) [29]. Supervised learning techniques often include regression (simple, multivariate, logistic, probit, ordinal, nonlinear), conditional probability (maximum likelihood estimation, odds ratio) [29], as well as deterministic mathematical programming [8]. Correspondence analysis has sometimes been used to identify hidden relationships in complex qualitative data types such as keywords and categorical factors [28]. When complex quantitative (metric) data types are available, such as scaled factors and ratio dependent variables, principal component analysis, factor analysis, and structural equation modeling can be applied to identify hidden relationships [28].

Text analysis involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics [30, 31]. The overarching goal is, essentially, to turn text into data for analysis, via application of natural language processing (NLP) and analytical methods.

Big text analytics is text analysis applied to big data mining, but alone it falls short of being considered big data analytics because it identifies keyword frequencies but not the hidden relationships between the factors [1, 3, 7]. Big text analytics refers to the process of deriving high-quality information from text, such as patterns and trends, through means such as statistical pattern learning [5, 31]. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output [30]. The difference between big text analytics and merely producing a frequency table of keyword patterns is that the resulting model will show a combination of relevance, novelty, and interestingness within the data [30]. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relationship modeling as would be found in a database design schema [30].

Big data analysis of global terrorism

The most current high-volume, high-velocity primary global terrorism data evidence in English is available in the online news media, from the Associated Press [16]. There is also some secondary data available in the scholarly literature. Weinberg, Pedahzur and Hirsch-Hoefler [18] conducted a meta-analysis of 55 manuscripts containing 73 distinct versions of terrorism. They suggested that terrorism should be defined as: a politically motivated tactic involving the threat or use of force or violence in which the pursuit of publicity
plays a significant role [18]. On a global scale there have been approximately 125,000 recognized terrorism events around the world since 1970 [21]. Thus, global terrorism evidence is complex, high-velocity, and voluminous, which makes it a problem suitable for big data analytics [18-20]. Chunking is a sampling approach that may be applied to high-velocity big data, such as taking samples during stratified or random periods. Given that most primary terrorism activity is reported in the online media, "day" is a natural delimiter for chunking global terrorism big data into portions that researchers may practically deal with.

Anti-terrorism measures are expensive to administer [20, 25]. For example, the 9/11 terrorism event caused approximately US$123 billion in economic losses and the London bombings in 2005 were estimated to cost England UK£2 billion [22, 32]. We still do not know the long-term socioeconomic costs of the Boston bomber terrorist act that occurred in the United States during the spring of 2013.

Recent attacks in Pakistan (150 school children died), Canada (the government was targeted in Ottawa and in Quebec), as well as Australia (Lindt Café siege in Sydney) illustrated that government, businesses, and individuals are impacted by global terrorism, and thus it would be useful to know what and where the risks are [24]. This is a classical big data analytics task because the factors of terrorist group behavior are complex to understand [24], the data are high-velocity [21], and the volume is large [21] since the evidence is scattered around the world.

Terrorist groups tend to have behavioral patterns in their frequency, target country/region, ideology, and attack type [24]. There are patterns between religion (Christian, Hindu, Jewish, Sikh, Muslim) and terrorist ideology, especially with certain diasporas (groups), notably those with Sikh and Muslim beliefs, but the Oklahoma City bomber Timothy McVeigh is proof that Christians may also be terrorists. Some of the modern terrorism groups are Environmentalists: since 1976, environmental terrorists within the United States have carried out over 1000 criminal acts and caused over US$110 million worth of damage.

In general, terrorist ideology can be placed on a continuum from left to right. The left-wing extremists are Communists, Maoists, and Marxists; the left-wing moderates are socialists, environmentalists, and anti-government; the right-wing moderates are Islamic; and toward the right-wing extreme are radical Islamic groups and jihadists. Left-wing extremists (Maoists, Marxists, Communists, and Socialists) oppose capitalism, feudalism, inequality, and exploitation [19, 33]. Moderate left-wing terrorist groups may also be known as Anarchists, differentiated from other criminal violence by the political end-state as their goal instead of merely killing people or damaging property [34]. These are better known to have arisen with the outburst of decolonization during the late 1950s and 1960s, such as in Cuba [35]. The term "narcoterrorists" was used in the early 1980s by President Fernando Belaunde Terry of Peru to describe the Anarchist terrorists who attacked his police forces when they attempted to eliminate drug trafficking [20, 36].

The right-wing moderate conservatism is a political philosophy that resists any change in the existing political, economic, social, and religious institutions and beliefs [33]. It aims to protect traditional values. There are three core Islamic diasporas: Islamism, Salafism, and Wahhabism; all are core beliefs of Muslim communities [19]. The Jihadist terrorist groups are an extreme form of constructed Islamic religion [37]. There are also smaller cult and racially motivated terrorist groups. Finally, we have the lone wolf, who often claims to be part of a recognized terrorist cult [38, 39]. We developed this nominal scale to categorize terrorist group ideology:

1 = Moderate left (Cults);
2 = Extreme left (Marxism);
3 = Left wing (Socialism);
4 = Right strong (Narcoterrorism);
5 = Right strong (Anarchism);
6 = Extreme left (Maoism);
7 = Left moderate (Anti-Government);
8 = Right moderate (Environmentalism);
9 = Right (Islamism);
10 = Extreme right (Jihadism);
11 = Moderate left (Separatism);
12 = Left wing (Communism);
13 = Moderate right (Lone Wolf).

The type of a terrorist attack may be related to, and a signature of, a terrorist ideology and a gauge as to the predictability of future attempts. Terrorist attack types may be categorized into nine nominal types, as briefly enumerated below (adapted from [19, 22, 23, 24, 40]):

(1) Assassinations, where the primary goal is to kill one or more important people;
(2) Armed assaults, usually carried out with firearms, but which could also include knives, clubs, or personal-use explosives such as grenades;
(3) Barricading, taking people hostage only for the purposes of preventing access to or use of a facility, such as a train station or government building, when the intention is not to kidnap people;
(4) Hijacking, taking control of a vehicle such as a plane (most common), bus, train, ship, submarine, or helicopter, generally items that could be moved;
(5) Bombs and explosions, which are generally bombs, often from nuclear fission or fusion or dynamite, but could also be through chemicals or gases;
(6) Kidnapping, which is hostage taking when the objective is to use the people to obtain economic value such as a ransom or political concessions;
(7) Infrastructure attacks, which are on non-human targets of terrorism, such as a road, bridge, building, oil/gas pipeline, canal, train or train track, truck, oil tanker, warehouse, water reservoir, and so on;
(8) Unarmed assaults, which occur when terrorists do not use weapons, only their body (fists, feet, bite, etc.);
(9) Multiple methods, unknown combinations.
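
The two nominal scales listed above lend themselves to simple lookup tables, which is one way the coded records could later be counted into a contingency table. The following is a minimal Python sketch under stated assumptions: the names IDEOLOGY_CODES, ATTACK_CODES, and cross_tabulate are hypothetical, and the toy records are invented for illustration only; none of this is taken from the authors' SPSS or Hadoop workflow.

```python
from collections import Counter

# Ideology codes 1-13, copied from the nominal scale listed above.
IDEOLOGY_CODES = {
    1: "Cults", 2: "Marxism", 3: "Socialism", 4: "Narcoterrorism",
    5: "Anarchism", 6: "Maoism", 7: "Anti-Government",
    8: "Environmentalism", 9: "Islamism", 10: "Jihadism",
    11: "Separatism", 12: "Communism", 13: "Lone Wolf",
}

# Attack type codes 1-9, following the enumerated list above.
ATTACK_CODES = {
    1: "Assassinations", 2: "Armed assaults", 3: "Barricading",
    4: "Hijacking", 5: "Bombs/explosions", 6: "Kidnapping",
    7: "Infrastructure attacks", 8: "Unarmed assaults",
    9: "Multiple methods",
}


def cross_tabulate(records):
    """Count (attack_code, ideology_code) pairs, rejecting unknown codes."""
    counts = Counter()
    for attack, ideology in records:
        if attack not in ATTACK_CODES or ideology not in IDEOLOGY_CODES:
            raise ValueError(f"unknown code pair: {(attack, ideology)}")
        counts[(attack, ideology)] += 1
    return counts


# Invented toy records of (attack_code, ideology_code); NOT real data.
toy_records = [(5, 10), (5, 10), (2, 6), (7, 8), (5, 10)]
table = cross_tabulate(toy_records)

for (attack, ideology), n in sorted(table.items()):
    print(ATTACK_CODES[attack], "x", IDEOLOGY_CODES[ideology], "=", n)
```

Keeping the codes as integers mirrors the nominal coding described in the text; the labels are only attached when results are displayed.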

Literature synthesis and research questions

Our first research proposition is that we could use Hadoop for big data sampling, and the results could be integrated with text analytics and statistical software. This would be valuable to illustrate as a research design for big data analytics studies. We hypothesize that we can summarize terrorist activities by day using Hadoop, by searching the text archives of Google News (http://news.google.com), which carries the Associated Press releases. We anticipate leveraging the work of other researchers to initially build the big database of keywords (e.g., [24]), and then we will extend the data to cover recent periods.

Our second research proposition is that we could make a contribution to the literature by integrating big data analytics with statistical software to better understand the relationship between global terrorist ideology and attack type. We hypothesize that we will be able to identify a significant relationship in the global terrorism big data between ideology and attack type, which would be useful for input to risk planning for businesses, government, or any type of organization. Thus, by fulfilling the two research propositions and testing the general hypothesis, the findings should generalize to academic researchers (as the research design method) as well as to risk management practitioners (both the design and the results).

H1: There will be a significant relationship between terrorist group or individual ideology (left to right) and terrorist attack type.

Methodology

We held a positivist ideology, so we collected a large relevant sample of empirical data and we applied statistical techniques for the analysis.

Descriptive and nonparametric statistical analysis techniques were applied since this study was at the exploratory stage with no a priori models [28] and the objective was to demonstrate how big data analytics could be designed and applied to global terrorism. SPSS version 21 was used for the statistical modeling analysis. The 95% level of confidence was used for all statistical tests. The policy was to retain all outliers since the nature of this study was exploratory and descriptive.

Sample

Strang and Alamieyeseigha [24] collected secondary global terrorism activity data from the newspapers and from the National Consortium for the Study of Terrorism and Responses to Terrorism (START), located at the University of Maryland, USA. One terrorism data set was initially developed by the Pinkerton Global Intelligence Service (PGIS), a private security agency in the United States, covering terrorism activity during 1970-1997. Terrorism acts that occurred between 1998 and March 2008 were collected by the Center for Terrorism and Intelligence Studies (CETIS), in partnership with START. A third data set was created for terrorist acts between April 2008 and October 2011, with efforts led by the Institute for the Study of Violent Groups (ISVG) at the University of New Haven. Beginning with cases that occurred in November 2011 onward, a fourth data set was used [41].

Several other publicly available global terrorism data sets were researched to supplement the above. ITERATE (International Terrorism: Attributes of Terrorist Events) from Duke University (http://library.duke.edu/data/collections/iterate), RDWTI (RAND Database of Worldwide Terrorism Incidents) from RAND Corporation (http://www.rand.org/nsrd/projects/terrorism-incidents.html), and WITS (Worldwide Incidents Tracking System) from the Terrorism Research & Analysis Consortium (http://www.trackingterrorism.org) were accessed.

Big data text search procedures and variables

We obtained the global terrorism big data set (GTBD) in an SPSS SAV format from Strang and Alamieyeseigha [24]; the size was 60.048 MB, and it contained 125,087 records with 133 fields totaling 16,636,571 data points. We extended this data using Hadoop big text analytics accessed via Google Analytics by searching Google News for terrorism keywords during the time period 2013-2014 (one year).

We were interested in the qualitative fields, namely the terrorist attack type, along with the country and ideology. As noted earlier, the global terrorism data set contained a field named terrorist attack type, which was nominally coded into one of nine types: assassinations, armed assaults, bombing/explosions, hijacking, barricading, kidnapping, infrastructure, unarmed assaults, and unknown/mixed. We accessed Hadoop big data text analytics (by searching for the attack type or attack method and noting the phrases that were adjacent). Since we were also interested in the ideology, we used "if" conditions in big text analytics software to search the news media coverage for the 13 ideology keywords identified earlier. We added the ideology as a nominal type to each record so that it could later be counted. For cases where multiple ideologies were identified, we manually reviewed the story and subjectively selected the most relevant keyword. We added 30 more global terrorism records to the global terrorism data (GTD) set (N = 125,117).

Big data nominal field relationship analysis

The logical choice of big data analytic technique for nominal field relationship comparisons where the keywords could be counted (frequencies) would be cluster or correspondence analysis. Both these methods reduce the complexity of the big data into groups or factor dimensions. The purpose of cluster analysis is to group similar records together, generally based on average distance of the mean within the factors, whereas correspondence analysis goes a step further by identifying potential hidden relationships between factors. Cluster analysis identifies similar versus different records, while correspondence analysis determines the strength and direction of the association between factors. Both are capable of producing visualization maps of the results. Since the objective here was to identify important factor relationships between
terrorist attack type and ideology, correspondence analysis was the best choice.

Additionally, a table of big data descriptive statistics is difficult to interpret to support decision-making. A bar or line chart could be developed to visualize the descriptive statistics, but that would not uncover any underlying nonlinear associations between attack type and ideology. Therefore, correspondence analysis was applied to better address the research mandate.

A review of the empirical literature or a research design handbook is generally the starting point when trying to identify appropriate techniques for big data analytics. Strang [42] used a technique similar to cluster analysis to describe the relationship between asset managers and the portfolio techniques they used. Although that was not a big data analytics study, his approach is relevant to note here. He interviewed 39 asset managers from companies listed on the NYSE, asking them about their choice of portfolio management technique, along with other questions about their demographics (gender, company size, and so on). He used multiple correspondence analysis, which he explained was a nonparametric statistical technique to compare the interdependence between more than two nominal factors. Strang [43] also conducted a similar study by interviewing 211 financial managers to identify the nature and quality of the relationship between risk management technique and functional discipline.

Correspondence analysis falls within the family of multivariate exploratory techniques capable of producing statistical estimates and graphical diagrams of nonmetric variable relationships. Common variations include dimensional analysis, multidimensional scaling, simple or multiple correspondence analysis, conjoint analysis, choice models, discriminant analysis, Euclidean distance analysis, spatial segmentation, as well as vector analysis of contingency tables [44].

While common nonparametric procedures such as the chi-square test of independence or goodness of fit can be used on nominal big data types such as terrorist attack type and ideology, chi-square methods do not indicate the nature of the relationship between the content of the factors, only that there may or may not be a relationship between factors in the overall model [44]. Also, as explained earlier, cluster analysis does not examine the nature of factor relationships, only group similarity or dissimilarity.

Correspondence analysis is particularly useful for transforming complex qualitative big data into relationship-interdependence diagrams, by comparing the relationship between two nominal variables in tables as well as visually through a multidimensional plot. Factors that are similar to each other appear close to each other in the plots, which indicates that they are related, without the loss of attribute detail that would occur when applying cluster analysis [42]. In this way it is as powerful as traditional factor analysis in reducing the complexity of big data, but principal component and factor analysis require quantitative, not qualitative, data [44].

Correspondence analysis calculates inertia estimates of variable interdependencies and, if significant, a symmetric plot can be drawn by converting the inertias into Euclidean distance coordinates to visually depict the relative strength of the relationships [44]. According to Strang [42], there are two basic forms of this technique: the first is simple correspondence analysis (with the word "simple" not usually mentioned) and the second is multiple correspondence analysis; simple is used with two factors and multiple correspondence analysis is used when there are more than two factors. Simple correspondence analysis would be ideal to meet our mandate for nominal factor relationship testing and visualization. The advantage of using this over cluster analysis is that it does not try to classify records into groups but rather uses the content of the fields to estimate the correlation between factors, i.e., correspondence.

In SPSS, simple correspondence analysis version 1.1 was developed by the Data Theory Scaling System Group (DTSS), at the Faculty of Social and Behavioral Sciences in Leiden University, The Netherlands [45]. We selected DTSS symmetrical normalization in SPSS because this is the best-practice method to spread the inertia evenly over the row and column factors for a two-dimensional plot [45, 46]. Normalization means to estimate the Euclidean distance scores or inertia (negative or positive association between factors and a centroid location). For reference purposes, three other normalization choices exist [45]. Row principal normalization maximizes and emphasizes the distances between row factors, while column principal normalization emphasizes the same for the column factor (generally the second factor in the correspondence analysis). A fourth option seems to have become available in SPSS, which is principal normalization, whereby the inertia is spread twice in the solution matrix following the symmetrical concept, but a plot is not yet available for this [45], so the big data relationships cannot be visualized.

Results and discussion

Descriptive statistics of terrorism big data

The terrorism big data were coded with the 13 ideologies and the nine attack types. Due to the size of the nominal codes for ideology and attack type, we chose to use numbers for display purposes in the tables and plots (following the same enumeration as in our literature review).

First, we calculated the frequencies of global terrorism attack type and ideologies (N = 125,117). However, it is difficult to interpret the relationships between terrorist attack types and the ideologies by examining only a table of frequencies, so we chose to use a symmetric plot in correspondence analysis to visualize the relationships.

Correspondence analysis of terrorism big data

Table 1 lists the statistical estimates from correspondence analysis of the global terrorism big data set whereby the two nominal factors were attack type by ideology. The proportion of inertia shows the breakdown of the association between terrorist attack type and ideology accounted for by each dimension. The cumulative inertia proportion column reveals that most of the interdependence was identified by the primary and secondary axes, amounting to 95% (0.945) of

Table 1. Correspondence analysis estimates of terrorist attack type by ideology (N = 125,117).

                                      Proportion of inertia        Confidence singular value
Dimension  Singular value  Inertia    Accounted for  Cumulative    Standard deviation  Correlation 2
1          0.385           0.148      0.687          0.687         0.001               0.210
2          0.236           0.056      0.258          0.945         0.002
3          0.085           0.007      0.034          0.978
4          0.065           0.004      0.020          0.998
5          0.018           0.000      0.001          0.999
6          0.010           0.000      0.001          1.000
7          0.005           0.000      0.000          1.000
8          0.002           0.000      0.000          1.000
Total                      0.215      1.000          1.000
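
As a cross-check on Table 1, the inertia columns can be approximately re-derived from the singular values, because in correspondence analysis the inertia of a dimension equals its squared singular value. The sketch below is illustrative only: the variable names are hypothetical, the inputs are the rounded three-decimal singular values printed in the table, and it is not the SPSS procedure the authors ran.

```python
# Singular values for the eight dimensions, copied from Table 1.
singular_values = [0.385, 0.236, 0.085, 0.065, 0.018, 0.010, 0.005, 0.002]

# Inertia of each dimension is the squared singular value.
inertias = [s ** 2 for s in singular_values]
total_inertia = sum(inertias)

# Proportion of total inertia accounted for by each dimension.
proportions = [i / total_inertia for i in inertias]

# Running (cumulative) proportion across dimensions.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

print("Dim  Singular  Inertia  Accounted  Cumulative")
rows = zip(singular_values, inertias, proportions, cumulative)
for dim, (s, i, p, c) in enumerate(rows, start=1):
    print(f"{dim:<4} {s:<9.3f} {i:<8.3f} {p:<10.3f} {c:.3f}")
print(f"Total inertia: {total_inertia:.3f}")
```

Because the inputs are already rounded, the recomputed values can differ from the published ones in the third decimal place; the first two dimensions still account for roughly 95% of the total inertia.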

the total inertia. The first dimension captured 0.687 of the inertia and the second accounted for 0.258 of the total inertia. The third dimension added only 3.4% (0.034) to the accounted-for inertia, which justifies proceeding with a two-dimensional visual model to facilitate decision-making (two factors are easier to plot than three). The estimated overall association (correlation) of the two factors was 0.21 (low).

Table 2 contains the details of the inertia and contribution estimates for correspondence analysis, by row, since we could focus first on attack type and then determine how ideology was related to each attack type. We elected not to display the estimates by column (ideology) since this would not add any value to this analysis, and due to space limitations, but the authors will make this information available to anyone by request. The most important estimates to discuss for big data analysis are the mass, dimension score, inertia, and symmetric contribution.

The symmetric plot in Figure 1 shows the interdependence of global terrorism attack type and ideology. The triangles (blue) are the attack types and the circles (green) represent the ideologies. The primary dimension is the horizontal axis 1 and the secondary dimension is the vertical axis 2 (the primary axis is calculated first and therefore will have larger variance-captured estimates). Interpretations of the four quadrants have been superimposed on the symmetric plot axis labels (as discussed below). The interpretation of the symmetric plot will require referencing back to the estimates in Tables 1 and 2.

First there are some rules about interpreting symmetric plots. Strang [42] likened the interpretation of a dimensional symmetric plot to identifying points of a profile group situated away from the origin, but close to each other. Nearby points will have similar profiles in the subspace, in a geometrical sense, so in a nonlinear sense the content of the factors in this space is related in some underlying way. Mass and inertia are the key estimates in Tables 1 and 2 that identify the contribution of the factors to the axes in the symmetric plot. Mass plays a large role in initially determining the centroids and axes.

Proximity between points on the symmetric plot does not necessarily mean strong relationships because the association is to the axis, with evidence of interdependence to other points of the same category. The strength of a relationship is measured on a symmetric plot based on the degree of similarity of the angle between points from the origin.

The mass theoretically refers to the weight of the cell in the full dimensional matrix (not just one axis). The mass for a variable is a simple calculation of the proportion of frequencies in the cell/total frequencies for the contingency matrix. Larger mass estimates mean that an attack type or ideology appears more frequently in the data. Theoretically, the moment of inertia represents the mass and Euclidian distance of each point from the center of gravity on the dimensional plot (centroid). Inertia is the eigenvalue (λ), calculated by the weighted average of the chi-square distances from the axis centroid to the projections of the profiles (points on the plot in Figure 1). As compared to mass (which essentially measures frequencies), inertia estimates the degree of interdependence between a variable and both dimensional axis quadrants.

Inertia in Tables 1 and 2 is the proportion of the inertia represented by both axis dimensions. The contribution columns in Tables 1 and 2 are similar to the communality for a variable in factor analysis. Thus, quality is a reliability ratio (somewhat like r²), with a minimum expectation of 0.5 but a preferred proportion of 0.7 to meet the strong dimensional quality level threshold.
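The definitions of mass and inertia given above can be illustrated with a minimal sketch. It assumes the textbook formulation of correspondence analysis (mass as a marginal proportion, inertia as the mass-weighted squared chi-square distance of each row profile from the centroid); it is not the SPSS procedure used in this study, and the 3 x 3 table of counts and the function name `masses_and_inertia` are hypothetical.

```python
def masses_and_inertia(counts):
    """Return row masses, per-row inertia and total inertia of a count table."""
    n = sum(sum(row) for row in counts)
    row_mass = [sum(row) / n for row in counts]
    # Column masses; they also form the centroid (average row profile).
    col_mass = [sum(col) / n for col in zip(*counts)]
    row_inertia = []
    for row, mass in zip(counts, row_mass):
        row_total = sum(row)
        profile = [cell / row_total for cell in row]  # row profile, sums to 1
        # Squared chi-square distance from the row profile to the centroid.
        dist2 = sum((p - c) ** 2 / c for p, c in zip(profile, col_mass))
        row_inertia.append(mass * dist2)
    return row_mass, row_inertia, sum(row_inertia)


# Hypothetical counts (rows: attack types, columns: ideologies).
example = [[30, 10, 5],
           [10, 25, 10],
           [5, 10, 40]]
row_mass, row_inertia, total_inertia = masses_and_inertia(example)
print([round(m, 3) for m in row_mass], round(total_inertia, 3))
```

Under this formulation the total inertia equals the chi-square statistic divided by the number of records, and the inertia of each dimension is the square of its singular value. The figures in Table 1 are consistent with that relationship: 0.385 squared is about 0.148, and 0.148 divided by the total of 0.215 is about 0.69, which matches the reported 0.687 up to rounding.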

Table 2. Correspondence analysis details of terrorist attack type by ideology (N = 125,117).

                                                          Symmetric contribution
                                Score in                  Of point to               Of dimension to
                                dimension                 inertia of dimension      inertia of point
Terrorist attack type   Mass    1       2       Inertia   1           2             1       2       Total
1 Assassination         0.111   0.253   0.458   0.011     0.018       0.099         0.248   0.498   0.745
2 Armed assault         0.111   1.249   0.642   0.079     0.451       0.194         0.847   0.137   0.984
3 Barricading           0.111   0.038   0.056   0.000     0.000       0.001         0.136   0.185   0.321
4 Hijacking             0.111   0.001   0.012   0.000     0.000       0.000         0.001   0.176   0.177
5 Bombs                 0.111   1.316   0.796   0.091     0.500       0.299         0.814   0.182   0.996
6 Kidnapping            0.111   0.229   0.091   0.006     0.015       0.004         0.380   0.037   0.417
7 Infrastructure        0.111   0.131   0.922   0.026     0.005       0.401         0.029   0.870   0.899
8 Unarmed assault       0.111   0.000   0.024   0.000     0.000       0.000         0.000   0.142   0.142
9 Multiple method       0.111   0.183   0.057   0.003     0.010       0.002         0.562   0.033   0.594
Total                   1.000                   0.215     1.000       1.000
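As a rough check on the quality thresholds of 0.5 and 0.7, the sketch below sums the two "of dimension to inertia of point" contributions printed in Table 2 and classifies each attack type. It assumes that the "Total" column is the quality the text refers to; the labels "strong", "acceptable", and "weak" and the function name `quality_label` are illustrative only.

```python
# Contributions of dimensions 1 and 2 to the inertia of each point,
# copied from the last columns of Table 2.
contributions = {
    "Assassination": (0.248, 0.498),
    "Armed assault": (0.847, 0.137),
    "Barricading": (0.136, 0.185),
    "Hijacking": (0.001, 0.176),
    "Bombs": (0.814, 0.182),
    "Kidnapping": (0.380, 0.037),
    "Infrastructure": (0.029, 0.870),
    "Unarmed assault": (0.000, 0.142),
    "Multiple method": (0.562, 0.033),
}


def quality_label(quality, minimum=0.5, preferred=0.7):
    """Classify a quality value against the two thresholds."""
    if quality >= preferred:
        return "strong"
    if quality >= minimum:
        return "acceptable"
    return "weak"


labels = {}
for name, (dim1, dim2) in contributions.items():
    quality = dim1 + dim2  # two-dimensional quality of the point
    labels[name] = quality_label(quality)
    print(f"{name:16s} {quality:.3f} {labels[name]}")
```

The sums agree with the Total column of Table 2 to within rounding (for example 0.746 against the printed 0.745). On these figures, assassination, armed assault, bombs, and infrastructure exceed the preferred 0.7 level, while hijacking and unarmed assault are poorly represented in two dimensions.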

Figure 1. Symmetric plot of global terrorist attack type by ideology for 1970–2014.
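When reading a symmetric plot such as Figure 1, the rule that relationship strength depends on the similarity of the angle between points, measured from the origin, can be sketched as follows. The coordinates in the usage lines are hypothetical and are not taken from Table 2, and the function name `angle_between` is an assumption made for this illustration.

```python
import math


def angle_between(p, q):
    """Angle in degrees between two plot points, measured from the origin."""
    dot = p[0] * q[0] + p[1] * q[1]
    norm = math.hypot(p[0], p[1]) * math.hypot(q[0], q[1])
    if norm == 0:
        raise ValueError("a point at the origin has no direction")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Hypothetical (dimension 1, dimension 2) coordinates for illustration only.
attack_point = (-1.2, 0.6)
ideology_near = (-0.9, 0.5)
ideology_far = (0.8, -0.7)
print(round(angle_between(attack_point, ideology_near), 1))
print(round(angle_between(attack_point, ideology_far), 1))
```

A small angle suggests that the two categories lie in the same direction from the centroid and are therefore associated, whereas an angle near 180 degrees suggests opposition. Distance from the origin still matters, because points close to the centroid contribute little to either axis.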

The scores in Tables 1 and 2 are the coordinates or distances from the centroid. These are the principal coordinates of the variable for each axis, which are used for plotting. These represent the proportional contribution of the factor inertia to a specific axis dimension (larger amounts are better). Contributions are estimates of the variance calculated with respect to the entire set of variables and can therefore be interpreted for relative association. Explained points are those whose contributions to the eccentricity are greater than a certain threshold, such as 0.25 [42]. The overall contributions shown in Tables 1 and 2 are like factor scores in factor analysis, because they inherently suggest how related the factors on the same side of the axis are (and vice versa, lack of interdependence is indicated by axis separation). At this point we were able to accept the hypothesis that there was a significant relationship between terrorist ideology (left to right) and terrorist attack type.

The next step was to name the dimensions and identify any clusters based on the decomposition of inertia measures across each axis. The symmetric plot can also be inspected visually to aid interpretation. Several points in the symmetric plot (Figure 1) stand out and thus strongly influence the interpretation of the symmetric plot dimensions. Attack type 2 (armed assault) and ideology 2 (Marxism) stand out on the left part of the x-axis, as do ideology 11 (Separatism) and ideology 5 (Anarchism), which seem related as terrorist ideologies. Likewise, attack type 5 (bombing) and ideology 10 (Jihadism) are in close proximity and stand alone on the top of the y-axis. In similar fashion, attack types 1 (assassination) and 7 (infrastructure) are isolated and close to ideology 8 (Environmentalism) on the bottom of the y-axis. The remaining points (attack types and ideologies) are densely clustered in the middle of the symmetric plot.

Based on the above analysis, x-axis dimension 1 was named "Target specificity (individual vs. mass)" to represent what we thought was a pattern of individual terrorist targets using assassination and armed assaults on the leftmost dimension as compared to using methods of mass destruction such as bombing and destroying infrastructure but not necessarily with a specific target in sight. The y-axis dimension 2 was captioned "Target culture (west vs. east)" since we recognized the Central and North American together with Western European ideologies on the bottom in contrast to the Muslim-dominated Middle East and Southeast Asian ideologies on the top. When viewed together, the dimensions seem to illustrate a pattern of mass destruction attack types against the eastern ideologies such as the Middle East, as compared to assassinations and armed attacks directed toward Western culture ideologies (USA, European Union countries, and Australia).

The symmetric plot and inertia estimates from Tables 1 and 2 could be used to estimate the risk of developing partnerships, removing assets, or expanding existing operations in regions possessing the ideologies where terrorist attack types are more common and destructive. Of course the granularity of the symmetric plot at the region level is too broad (we used it only as an example); instead, the city and country or the geographic coordinates could be used to produce a more accurate estimate of the areas that would be more likely to experience certain types of global terrorism attack. From another perspective, the symmetric plot suggests that there may be patterns of terrorist attack types in certain countries through which goods may be shipped, so this would be of great value as input for decision-making in supply chain management. For example, based on these results, would you recommend using a warehouse or transporting

your valuable products with companies where these terrorist attack types and ideologies are present?

Conclusions

We conducted this study primarily because such a study was recommended by several well-known big data analytics researchers who asserted that there were too few practical worked-out examples in the literature [13]. More so, Kauffman, Srivastava and Vayghan [31] had found that data analytics techniques did not work well with large amounts of data. We specifically addressed that gap in the literature with our study by using a large big data set over 64 MB in size with 125,117 records, showing how to collect, code, and analyze terrorism big data. We used Hadoop big data analytics software through Google to collect and search through the terrorism big data and then we used a well-known statistical program, SPSS, to analyze hidden relationships in the nominal coded factors.

We justified the use of correspondence analysis, from SPSS, as a big data analytical technique, once we had collected and coded terrorism big data through Hadoop. This allowed us to produce descriptive statistics on the factors of interest (attack types and ideologies) and also to identify interdependence relationships between attack type and ideology using a symmetric plot. The plot provided a visual model to complement the statistical estimates from correspondence analysis. We were able to accept the hypothesis that there was a significant relationship between terrorist ideology (left to right) and terrorist attack type.

The results of this terrorism big data analytics could be used by business managers to inform their choice of marketing partners or supply chain providers, or to scale down operations in a potentially risky area. Nonetheless, the purpose of our study was not merely to generalize these specific correspondence analysis results for business decision makers; instead the ultimate goal was to generalize the big data analytics design methodology for other practitioners and researchers.

Another implication is that practitioners may achieve scientific credibility of their findings through the rigor of hypothesis testing by using parametric or nonparametric statistical techniques in big data analytics. These findings generalize broadly to course designers at universities, to organizational researchers, to government defense departments, and to students who are collecting big data with the intention to perform analytics. However, business practitioners, analysts, and academic researchers need more information about how to use these approaches with large data files, and more so, how to design processes to answer real-world business problems. It might even be feasible to call for a big data analytics methodology guidebook to be published for practitioners and researchers.

Another implication of our study is that other researchers ought to publish similar empirical papers to demonstrate other methods for qualitative and quantitative terrorism big data analytics, particularly to investigate complex relationships between the nominal factors and quantitative variables. Big data is getting bigger and more analytics programs are becoming available on the Internet through open source software. We also encourage other researchers and practitioners to share their data with others by indicating how it may be accessed in their publications.

References

[1] Kambatla K, Kollias G, Kumar V, Grama A. 2014. Trends in big data analytics. J Parallel Distrib Comput. 74:2561–2573.
[2] Goldfield NI. 2014. Big data – hype and promise. J Ambulatory Care Manage. 37:195–196.
[3] Baoan L. 2014. Knowledge management based on big data processing. Inf Technol J. 13:1415–1418.
[4] Varian HR. 2014. Big data: New tricks for econometrics. J Econ Perspect. 28:3–27.
[5] Kandalkar NA. 2014. Extracting large data using big data mining. Int J Eng Trends Technol. 9:576–582.
[6] Chen G, Guo X, Leon Zhao J. 2012. ECRA special issue introduction for: Information services in E-Commerce. Electron Commerce Res Appl J. 11:535–536.
[7] Qin HF, Li ZH. 2014. Research on the method of big data analysis. Inf Technol J. 12:1974–1980.
[8] Strang KD. 2012. Applied financial nonlinear programming models for decision making. Int J Appl Decis Sci. 5:370–395.
[9] Strang KD, Brennan L, Vajjhala NR, Hahn J. 2015. Gaps to address in future research design practices. In: Strang KD, editor. Palgrave handbook of research design in business and management, vol. 1. New York: Palgrave Macmillan; p. 545–560.
[10] Westland CJ, Chen G, Ba S. 2013. Guest editors' introduction to the special issue. Electron Commerce Res Appl J. 12:297–298.
[11] Yang Y, Lai H. 2013. Guest editors' introduction: Auctions and negotiation in electronic procurement. Electron Commerce Res Appl J. 12:137–138.
[12] Zhou L, Zhang P, Zimmermann H-D. 2013. Social commerce research: An integrated view. Electron Commerce Res Appl J. 12:61–68.
[13] NSF. 2015. Information and Intelligent Systems (IIS): Core Programs, United States National Science Foundation (NSF), Arlington, VA, Program Solicitation (Grant Eligibility) NSF 14-596, January 2, 2015.
[14] Hoffman BL. 2015. Big data and analytics on IBM power systems: What's the difference between Apache Hadoop and Apache Spark? https://www.ibm.com/developerworks/community/blogs/f0f3cd83-63c2-4744-9021-9ff31e7004a9/entry/What_s_the_Difference_Between_Apache_Hadoop_and_Apache_Spark?lang=en
[15] EMC. 2015. Data science and big data analytics: Discovering, analyzing, visualizing and presenting data. EMC Educational Services. IN: Wiley.
[16] Oprescu F. 2013. The new millenium and the age of terror: Literature and the figure in the carpet. J Study Religions Ideologies. 12:51–71.
[17] Garden T. 2001. Weapon of mass destruction. World Today 57:4–6.
[18] Weinberg L, Pedahzur A, Hirsch-Hoefler S. 2004. The challenges of conceptualizing terrorism. Terrorism Political Violence 16:777–794.
[19] Breckenridge JN, Zimbardo PG, Sweeton JL. 2010. After years of media coverage, can one more video report trigger heuristic judgments? A national study of American terrorism risk perceptions. Behav Sci Terrorism Political Aggression 2:163–178.
[20] Koc-Menard S. 2009. Trends in terrorist detection systems. J Homeland Secur Emergency Manage. 6:16–21.
[21] Rivinius J. 2014. Majority of 2013 terrorist attacks occurred in just a few countries. In Press Release. National Consortium for the Study of Terrorism and Responses to Terrorism (START), University of Maryland, Baltimore, 2014, pp. 1–2.
[22] Hanes E, Machin S. 2014. Hate crime in the wake of terror attacks. J Contemp Criminal Justice 30:247–267.

[23] Bausch AW, Faria JR, Zeitzoff T. 2013. Warnings, terrorist threats and resilience: A laboratory experiment. Conflict Manage Peace Sci. 30:433–451.
[24] Strang KD, Alamieyeseigha S. 2015. What and where are the risks of international terrorist attacks: A descriptive study of the evidence. Int J Risk Contingency Manage. 4:1–18.
[25] Blalock G, Kadiyali V, Simon DH. 2009. Driving fatalities after 9/11: a hidden cost of terrorism. Appl Econ. 41:1717–1729.
[26] Elyas M, Maynard AA, Lonie A. 2014. Towards a systemic framework for digital forensic readiness. J Comput Inf Syst. 54:97–105.
[27] Strang KD. 2009. Using recursive regression to explore nonlinear relationships and interactions: A tutorial applied to a multicultural education study. Pract Assess Res Eval. 14:1–13.
[28] Strang KD. 2015. Selecting research techniques for a method and strategy. In: Strang KD, editor. Palgrave handbook of research design in business and management, vol. 1. New York: Palgrave Macmillan; p. 63–80.
[29] NRC. 2013. Frontiers in massive data analysis. Washington, DC: National Academies Press.
[30] Coussement K, Van Den Poel D. 2008. Integrating the voice of customers through call center emails into a decision support system for churn prediction. Inf Manage J. 45:164–74.
[31] Kauffman RJ, Srivastava J, Vayghan J. 2012. Business and data analytics: New innovations for the management of e-commerce. Electron Commerce Res Appl J. 11:85–88.
[32] Fischhoff B. 2011. Communicating about the risks of terrorism (or anything else). Am Psychologist 66:520–531.
[33] Strang KD. 2012. Multicultural face of organizations. In: Sarlak MA, editor. The new faces of organizations in the 21st century, vol. 5. ON: North American Institute of Science and Information Technology (NAISIT); p. 1–21.
[34] Strang KD. 2012. Student diaspora and learning style impact on group performance. Int J Online Pedagogy Course Des. 2:1–19.
[35] Strang KD. 2009. Multicultural e-education: Student learning styles, culture and performance. In: Song H, Kidd T (eds.). Handbook of research on human performance and instructional technology. Houston, TX: Information Science Reference; p. 392–412.
[36] Kean TH, Hamilton LH, Ben-Veniste R, Fielding FF, Gorelick JS, Gorton S, Kerrey B, Lehman JF, Roemer TJ, Thompson JR, Zelikow PD, Kojm C, Marcus D. 2004. National Commission on Terrorist Acts Upon the United States, US Government, Washington, DC August 21 2004.
[37] Kastenmüller A, Greitemeyer T, Hindocha N, Tattersall AJ, Fischer P. 2013. Disaster threat and justice sensitivity: a terror management perspective. J Appl Social Psychol. 43:2100–2106.
[38] Scott B. 2014. Martin Place Cafe siege: Man Haron Monis named as gunman. In Courier Mail, Weekend ed. Sydney; p. 13.
[39] Habib R. 2014. Man Haron Monis: The day I met the Sydney siege gunman. In: Daily Telegraph. Sydney; p. 12.
[40] START, Global Terrorism Database, Center of Excellence at the United States Department of Homeland Security, USA, National Consortium for the Study of Terrorism and Responses to Terrorism (START), University of Maryland, Baltimore, Database March 1 2014.
[41] START, Global terrorism database codebook: Inclusion criteria and variables, Center of Excellence at the United States Department of Homeland Security, USA, National Consortium for the Study of Terrorism and Responses to Terrorism (START), University of Maryland, Baltimore, Documentation August 30 2014.
[42] Strang KD. 2012. Man versus math: Behaviorist exploration of post-crisis non-banking asset management. J Asset Manage. 13:348–467.
[43] Strang KD. 2012. Nonparametric correspondence analysis of global risk management techniques. Int J Risk Contingency Manage. 1:1–24.
[44] Strang KD. 2015. Cross-sectional survey and correspondence analysis of financial manager behavior. In: Strang KD, editor. Palgrave handbook of research design in business and management, vol. 1. New York: Palgrave Macmillan; p. 223–238.
[45] IBM. 2013. IBM SPSS Statistics for Windows, Version 21 ed. Armonk, NY: International Business Machines Corporation (IBM).
[46] Kaufman L, Rousseeuw PJ. 1990. Finding groups in data: An introduction to cluster analysis. New York: Wiley-Interscience.
