Available at http://www.ijcsonline.com/
Abstract
Big data and big data analytics are playing very important roles in a variety of applications, providing implementation capabilities and tools that support efficient solutions for those applications. This new technology has recently emerged as a very popular research- and practice-oriented framework that implements i) data mining, ii) predictive analysis and forecasting, iii) text mining, iv) virtualization, v) optimization, vi) data security and vii) virtualization tools for processing very large data sets, particularly cloud data, in order to explore new business enterprise applications and decisions. Big data analytics was considered one of the fastest-growing technology trends in 2014 and is expected to keep growing for several more years, with a large number of big data applications, in particular in social networking, cloud computing, healthcare systems and many business systems. Thus, in order to understand this technology, we need to understand how each of the concepts (data mining, virtualization and data security) has evolved and contributed to big data analytics.
With this in mind, we propose a two-part paper that provides a state-of-the-art review of each of these interrelated concepts used in big data analytics, starting with how each concept evolved, its applications, available tools, limitations and current status, so that researchers and developers can understand how this new technology can be used for new applications and for deriving new technologies, tools and frameworks. In this first part of the paper, we focus on the conceptual design of big data applications, big data analytics and solutions, a discussion of a number of open source framework tools, and the roles of data mining and virtualization. Data mining techniques have been used in big data analysis, but recent multimedia big data applications call for newer data mining techniques for managing and analyzing huge amounts of data. Further, the easy representation and display of data analysis results requires an efficient and user-friendly tool that helps users interpret the data in a very simple way. The second part of the paper focuses on the remaining two technologies, data virtualization and data security. Virtualization offers a very efficient tool for representing data, with capabilities for displaying the dynamic behavior of the data. Since modern big data applications are implemented over the Internet, it is important to understand the various cyber-attacks and crimes that affect all the implementation phases of big data multimedia applications. This paper will also provide insights into how applications can be protected from these attacks and how cyber-crime analysis techniques can be used for reliable big data implementation.
Specifically, Part I discusses i) the main concepts, features, applications, implementation issues and capabilities of big data and ii) data mining for mining, analyzing and processing big data, while Part II discusses i) virtualization tools to extract useful information from processed data and ii) data security. The rationale for considering these three subtopics of data mining, virtualization and data security is that all successful, implemented big data applications are derived from big data analytics. We hope these two papers will provide a clear understanding of big data analytics and describe how each of the concepts, such as data mining and text mining, virtualization and data security, has evolved and has now been integrated into big data analytics.
16 | International Journal of Computer Systems, ISSN-(2394-1065), Vol. 03, Issue 01, January, 2016
Gurdeep S Hura et al
Emergent Trends and Challenges in Big Data Analytics, Data Mining, Virtualization and Cyber Crimes: An Integrated
Global Perspective- I
X. DATA COLLECTION
XI. DATA CLEANSING
Many enterprise computing environments use an in-memory platform, or a framework offered as a service, such as SAP HANA, which allows developers to build innovative applications with improved productivity in handling and managing very large big data sets. Analysis of big data may provide insights into the data sets that can be used to grow the business and the marketing of products. Big data analytics is becoming a new technology that has generated interest in the enterprise arena, and as such it has to support a number of architectures that have been developed for those applications [12, 53].
APPLICATIONS
SOLUTIONS
A. MapReduce
MapReduce is a programming model for distributed computing that was created by Google. The framework uses a divide-and-conquer method to split a data problem into small sub-data-set processes and execute them in parallel.
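To make the divide-and-conquer idea concrete, the following is a minimal single-process sketch of the MapReduce model, using word counting as the example job. It only simulates the map, shuffle and reduce phases in memory; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are illustrative assumptions and are not part of Google's or Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in one input split.
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key before reduction.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine every value emitted for one key into a single result.
    return key, sum(values)


def run_job(splits: List[str]) -> Dict[str, int]:
    # In a real cluster each split would be mapped on a different node;
    # here the phases simply run one after another in a single process.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data needs big tools", "data mining and big data analytics"]))
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread across machines; only the shuffle requires moving data between them.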
B. Hadoop
The Apache Software Foundation introduced Apache Hadoop, an open source data computing framework that uses a number of modules to handle, manage and process big data sets. This framework and set of tools for processing large data sets was originally designed to manage clusters of physical machines. It is now also widely used in the cloud, alongside cloud data services such as Amazon's Redshift hosted BI data warehouse, Google's BigQuery data service, IBM's Bluemix cloud platform and Amazon's Kinesis data processing service.
Its design draws on systems published by Google: the Hadoop Distributed File System follows the Google File System, and Hadoop's HBase module follows BigTable, a data storage system introduced by Google. Hadoop is a Java-based framework designed for heterogeneous, open source platforms. Its features include a distributed file system, analytics and data storage platforms, a layer to manage parallel computation, workflow and configuration administration, and many other components needed to process big data sets. The Hadoop Distributed File System runs across the nodes of a Hadoop cluster and connects all the input and output nodes so that together they appear as one big file system. Some of the material described below is derived from [1, 16, 18, 20, 25-26, 36-37, 42, 46].
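As a rough illustration of how HDFS presents one large file system over many nodes, the sketch below splits a file of a given size into fixed-size blocks and assigns each block to several DataNodes. The 128 MB block size and replication factor of 3 are assumed values (they match common Hadoop defaults, but both are configurable), the node names are hypothetical, and the simple round-robin placement is a deliberate simplification of the rack-aware placement policy HDFS actually uses.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes; configurable in HDFS
REPLICATION = 3                 # assumed replication factor; also configurable


@dataclass
class Block:
    index: int           # position of the block within the file
    length: int          # bytes stored; only the last block may be shorter
    replicas: List[str]  # DataNodes that hold a copy of this block


def plan_blocks(file_size: int, datanodes: List[str],
                block_size: int = BLOCK_SIZE,
                replication: int = REPLICATION) -> List[Block]:
    if not 1 <= replication <= len(datanodes):
        raise ValueError("replication must be between 1 and the number of DataNodes")
    blocks: List[Block] = []
    offset, index = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        # Round-robin placement on consecutive nodes: a simplification of the
        # rack-aware policy HDFS really uses, but it keeps replicas distinct.
        replicas = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        blocks.append(Block(index, length, replicas))
        offset += length
        index += 1
    return blocks


if __name__ == "__main__":
    for block in plan_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]):
        print(block)
```

For a 300 MB file this yields two full 128 MB blocks and one 44 MB block, each stored on three different nodes, so the loss of any single node still leaves at least two copies of every block.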
The Hadoop framework offers solutions based on the batch processing concept for handling, managing and processing big data sets. It may not be an appropriate solution for real-time ad hoc query management, but it has become a common solution for processing large amounts of data. Modules such as Pig and Hive, together with Hadoop MapReduce, provide query management. Some efforts have already been made to provide real-time ad hoc query management over large-scale big data sets; these querying systems use SQL to query data stored in Hadoop. Other Hadoop-based solutions built on scalable, distributed relational database systems have also been developed to analyze data sets and extract useful information from them.
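Hive and similar tools let users write SQL-like queries that the system translates into jobs such as MapReduce. The sketch below, assuming a small hypothetical `clicks` table, runs a `GROUP BY` aggregation with Python's built-in `sqlite3` module and then computes the same result with an explicit map (emit the grouping key), shuffle (group) and reduce (sum) step, to show the correspondence. It does not use Hive itself, and the query planning Hive performs is considerably more involved.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample data: (username, number of clicks) per log record.
ROWS: List[Tuple[str, int]] = [
    ("alice", 3), ("bob", 1), ("alice", 2), ("carol", 4), ("bob", 6),
]


def total_clicks_sql(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    # Declarative form: an SQL GROUP BY, as a Hive user would write it.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (username TEXT, n INTEGER)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
        cursor = conn.execute("SELECT username, SUM(n) FROM clicks GROUP BY username")
        return dict(cursor.fetchall())
    finally:
        conn.close()


def total_clicks_mapreduce(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    # The same query as map (emit username -> n), shuffle (group by
    # username) and reduce (sum each group).
    grouped: Dict[str, List[int]] = defaultdict(list)
    for username, n in rows:
        grouped[username].append(n)
    return {username: sum(values) for username, values in grouped.items()}


if __name__ == "__main__":
    print(total_clicks_sql(ROWS))
    print(total_clicks_mapreduce(ROWS))
```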
The volume of data is increasing exponentially in all applications, and it is becoming a big challenge to handle the data and to develop appropriate solutions [25, 26]. The Hadoop framework has become a very popular tool for managing social networking applications over the Internet. Nearly all social networking applications (Facebook, Twitter, LinkedIn, etc.) deal with huge amounts of data from their users. The data come in different forms and need to be presented to users in a very simple and friendly manner. In spite of these features, Hadoop has been seen to take a significant amount of time for real-time analysis and predictive analysis. SQL query tools such as Spark SQL offer fast interactive querying with streaming capabilities, and new tools supporting SQL-like querying have opened the door for Hadoop to be used in enterprise computing applications.
A number of new open source modules interfacing at the application layer of the Hadoop model have been developed to implement a scalable and distributed computing environment, including databases (HBase and Cassandra), querying (Hive and Pig) and coordination services (ZooKeeper) [33-34]. Various functional modules offering different services on the Hadoop framework have recently been introduced. Some of the popular service applications include HDFS, MapReduce, Pig, Hive, JAQL, HBase, Flume, Sqoop, Oozie, ZooKeeper, YARN, Mahout, Ambari, Spark, Whirr, Hue, Lucene, Chukwa, Hama, Cassandra, Impala, etc. [37]. Each module provides a specific function and is used with Hadoop to implement a specific aspect of big data processing, from the collection, storage and administration of big data sets to query management, interpretation and solving across different clusters of a distributed system over the Internet.
The following is a brief description of some of these modules and their services; each of these modules operates at the top layer of the Hadoop model.
MapReduce: This module provides a powerful parallel programming technique for distributed processing on the clusters of the framework [1, 28].
Apache Hive: This module provides an SQL-like interface and a relational model as an application on the framework for storing and retrieving data. A data
Voldemort:
This module is a highly scalable, distributed key-value data store that was originally developed by LinkedIn. In this model, data is automatically replicated and partitioned among the nodes in the distributed system. Each node is independent of the others, so there is no single point of failure. Read and write access is limited to key-value access; as such, it supports only three types of queries: get, put and delete. Because of this simplicity, it offers predictable query performance.
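The following sketch illustrates the ideas described for Voldemort: keys are partitioned across nodes with a hash ring, each key is replicated on several consecutive nodes, and the only operations are get, put and delete. It is a toy in-memory model under stated assumptions (MD5-based ring positions, a replication factor of 2, hypothetical node names), not Voldemort's client API, and it omits versioning, quorum reads and writes, and failure handling.

```python
import bisect
import hashlib
from typing import Dict, List, Optional


class ToyKeyValueStore:
    def __init__(self, nodes: List[str], replication: int = 2):
        if not 1 <= replication <= len(nodes):
            raise ValueError("replication must be between 1 and len(nodes)")
        self.replication = replication
        # Place every node on a hash ring; sorted positions allow binary search.
        self.ring = sorted((self._hash(node), node) for node in nodes)
        self.positions = [pos for pos, _ in self.ring]
        # Each node's local storage, standing in for separate machines.
        self.data: Dict[str, Dict[str, bytes]] = {node: {} for node in nodes}

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def preference_list(self, key: str) -> List[str]:
        # First node clockwise from the key's position, then the next nodes
        # on the ring hold the additional replicas.
        start = bisect.bisect(self.positions, self._hash(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication)]

    def put(self, key: str, value: bytes) -> None:
        for node in self.preference_list(key):
            self.data[node][key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Any replica can answer, so one unavailable node does not block reads.
        for node in self.preference_list(key):
            if key in self.data[node]:
                return self.data[node][key]
        return None

    def delete(self, key: str) -> None:
        for node in self.preference_list(key):
            self.data[node].pop(key, None)
```

Restricting the interface to single-key operations is what makes partitioning this simple: every request touches only the few nodes in one key's preference list.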
Sqoop:
This module transfers data between relational databases and Hadoop [1].
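Sqoop parallelizes an import by dividing a table into ranges of a split column (by default, the table's primary key) and assigning each range to a separate map task. The sketch below computes such ranges from the column's minimum and maximum values and then reads each range from an in-memory SQLite table as comma-delimited text. The table, column and function names are hypothetical, and the even integer division is a simplification of Sqoop's own split logic.

```python
import sqlite3
from typing import List, Tuple


def compute_splits(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    # Divide the inclusive key range [lo, hi] into contiguous, non-overlapping
    # ranges, one per map task (fewer if there are fewer keys than mappers).
    total = hi - lo + 1
    num_mappers = max(1, min(num_mappers, total))
    size, extra = divmod(total, num_mappers)
    splits, start = [], lo
    for i in range(num_mappers):
        end = start + size - 1 + (1 if i < extra else 0)
        splits.append((start, end))
        start = end + 1
    return splits


def import_table(conn: sqlite3.Connection, table: str, key: str,
                 num_mappers: int) -> List[List[str]]:
    # Table and column names are interpolated directly, so they must come
    # from trusted configuration, never from user input.
    lo, hi = conn.execute(f"SELECT MIN({key}), MAX({key}) FROM {table}").fetchone()
    if lo is None:
        return []  # empty table: nothing to import
    outputs = []
    for start, end in compute_splits(lo, hi, num_mappers):
        # Each range would be read by a separate map task and written to its
        # own output file; here each "file" is a list of delimited lines.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}",
            (start, end),
        )
        outputs.append([",".join(str(col) for col in row) for row in rows])
    return outputs
```

Splitting by value range rather than by row count means that gaps or skew in the split column can leave some map tasks with much more work than others, which is worth considering when choosing that column.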
Avro:
This module provides data serialization for processing.
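Avro serializes data according to a schema, so field names are not stored with each record. The sketch below hand-codes two rules from the Avro specification's binary encoding: `long` values use zig-zag variable-length encoding, and `string` values are a length (itself encoded as a `long`) followed by UTF-8 bytes; a record is written as its fields in schema order. The record layout (`name`, `count`) and the function names are hypothetical, and a real application would use an Avro library rather than this code.

```python
from typing import Tuple


def encode_long(n: int) -> bytes:
    # Zig-zag: map signed to unsigned (0->0, -1->1, 1->2, -2->3, ...) so that
    # small magnitudes of either sign give short encodings (64-bit range).
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    # Variable length: 7 bits per byte, low-order group first; the high bit
    # of each byte signals that more bytes follow.
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_long(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    # Returns the decoded value and the position just after it.
    z, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return (z >> 1) ^ -(z & 1), pos


def encode_string(s: str) -> bytes:
    # A string is its UTF-8 byte length (as a long) followed by the bytes.
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def decode_string(buf: bytes, pos: int = 0) -> Tuple[str, int]:
    length, pos = decode_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def encode_record(name: str, count: int) -> bytes:
    # Hypothetical schema: record {name: string, count: long}. Fields are
    # written in schema order with no names or tags, so readers need the schema.
    return encode_string(name) + encode_long(count)
```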
Oozie:
This module defines a systematic workflow for dependent Hadoop jobs [1].
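An Oozie workflow is a directed acyclic graph of actions in which a job runs only after the jobs it depends on have finished. The sketch below, with hypothetical job names, uses Python's standard `graphlib` module (Python 3.9+) to derive a valid execution order, to group jobs into stages that could run concurrently, and to reject cyclic definitions. It models only dependency ordering; real Oozie workflows are defined in XML and also handle control nodes, retries and error transitions.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each job maps to the jobs that must finish first.
WORKFLOW: Dict[str, Set[str]] = {
    "import_logs": set(),
    "import_users": set(),
    "clean_logs": {"import_logs"},
    "join_sessions": {"clean_logs", "import_users"},
    "daily_report": {"join_sessions"},
}


def execution_order(workflow: Dict[str, Set[str]]) -> List[str]:
    # static_order() raises CycleError if the dependencies contain a cycle.
    return list(TopologicalSorter(workflow).static_order())


def parallel_stages(workflow: Dict[str, Set[str]]) -> List[List[str]]:
    # Repeatedly take every job whose dependencies are done; jobs in the same
    # stage are independent of one another and could run concurrently.
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
    print(parallel_stages(WORKFLOW))
```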
Chukwa:
This Hadoop subproject provides a data storage and accumulation system for managing and monitoring distributed systems of clusters [1].
Flume:
This module provides a reliable, distributed mechanism for collecting streaming log data across clusters [1].
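Flume agents move events from a source to a sink through a channel, and delivery is made reliable by removing an event from the channel only after the sink has successfully handed it on. The toy model below illustrates that idea with an in-memory channel: a failed delivery rolls the event back into the channel so it can be retried. The class and method names are hypothetical and do not correspond to Flume's Java API; a real deployment could use a durable channel (such as Flume's file channel) so that events also survive agent restarts.

```python
from collections import deque
from typing import Callable, Deque


class ToyChannel:
    def __init__(self) -> None:
        self._events: Deque[str] = deque()

    def put(self, event: str) -> None:
        # Source side: append a collected log event to the channel.
        self._events.append(event)

    def deliver_one(self, sink: Callable[[str], None]) -> bool:
        # Sink side: take the next event as a small transaction. If the sink
        # succeeds the removal stands (commit); if it raises, the event is put
        # back at the front (rollback) and the error is propagated.
        if not self._events:
            return False
        event = self._events.popleft()
        try:
            sink(event)
        except Exception:
            self._events.appendleft(event)
            raise
        return True

    def __len__(self) -> int:
        return len(self._events)
```

Because a failed hand-off leaves the event in place and in order, the sink can simply retry later without losing or reordering log lines.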
A. Facebook
This social network hosts the largest Hadoop cluster by volume, consisting of over 4,400 nodes and more than 100 petabytes of data. Its Hadoop deployment consists of five components used on big data sets: the Hadoop Core; Scribe, a log data collector; Hive; HiPal, a UI for querying with Hive; and NoCron, an automation framework.
Based on its needs and other requirements, Facebook has defined its own configuration of the Hadoop framework. The underlying file system is a federated HDFS whose redundancy is implemented using RAID technology. Facebook also uses Hive to simplify its analysts' interaction with Hadoop; roughly 90% of its MapReduce jobs are built on Hive [7, 40, 44].
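RAID-style redundancy in HDFS reduces storage overhead by replacing some block replicas with parity blocks computed over a group (stripe) of data blocks. The simplest form of such parity is XOR, sketched below: any single lost block in a stripe can be rebuilt by XOR-ing the parity with the surviving blocks. This is a minimal illustration under that assumption; stronger erasure codes such as Reed-Solomon, which tolerate several lost blocks per stripe, are generally preferred in practice, and the block contents and the `lost_length` parameter here are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length blocks.
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(blocks: List[bytes]) -> bytes:
    # Pad blocks with zero bytes to equal length so a short final block can
    # take part in the stripe, then XOR them all together.
    size = max(len(b) for b in blocks)
    padded = [b.ljust(size, b"\x00") for b in blocks]
    return reduce(xor_bytes, padded)


def recover_block(blocks: List[Optional[bytes]], parity: bytes,
                  lost_length: int) -> bytes:
    # Exactly one entry must be None; XOR-ing the parity with every survivor
    # cancels them out and leaves the missing block (still zero-padded).
    if sum(b is None for b in blocks) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [b.ljust(len(parity), b"\x00") for b in blocks if b is not None]
    rebuilt = reduce(xor_bytes, survivors, parity)
    # The original length would come from file metadata; trim the padding.
    return rebuilt[:lost_length]
```

With one parity block per stripe of n data blocks, the extra storage is 1/n of the data, compared with 200% extra for three-way replication, at the cost of extra computation when a block must be rebuilt.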
B. Twitter
Twitter, another social network that needs a great deal of storage and big data processing, also uses the Hadoop framework. All of its data is stored in the Hadoop Distributed File System using LZO (Lempel-Ziv-Oberhumer) compression. It uses Google's Protocol Buffers to read and write data efficiently in its cluster, relying on the serialization code that Protocol Buffers generates, which is supported by the Hadoop framework. It uses the Scalding framework, which, much like Hive and Pig, provides a simpler way of creating MapReduce jobs [7, 41-42, 44].
C. LinkedIn
In this social network, Hadoop is used to provide the predictive analytics and querying behind features such as People You May Know (PYMK) and Endorsements. Over a billion LinkedIn relationships are processed each day to compute People You May Know recommendations. Hadoop also generates LinkedIn's engagement emails, such as those presenting a user's profile views and the user's connections within their professional network. LinkedIn adopted Apache Pig to avoid writing complex MapReduce programs. It also developed White Elephant, a Hadoop log aggregator and dashboard that visualizes cluster utilization across users, which helps users understand their usage and make better use of the cluster and its Distributed File System [7, 43].
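A common textbook heuristic behind "people you may know" style recommendations is triadic closure: suggest people who share many connections with a user but are not yet connected to them. The sketch below counts mutual connections in a MapReduce-friendly way (every user emits each pair of their own connections, and summing the emissions gives the mutual-connection count per pair) on a small hypothetical graph. It only illustrates the kind of batch computation such a feature involves; LinkedIn's production PYMK system combines many more signals, and the graph and function names here are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def mutual_connection_counts(graph: Dict[str, Set[str]]) -> Counter:
    # Map: every user "introduces" each pair of their own connections.
    # Reduce: summing the emitted 1s gives the mutual-connection count per pair.
    counts: Counter = Counter()
    for connections in graph.values():
        for a, b in combinations(sorted(connections), 2):
            counts[(a, b)] += 1
    return counts


def people_you_may_know(graph: Dict[str, Set[str]], user: str,
                        top_n: int = 3) -> List[Tuple[str, int]]:
    # Assumes an undirected graph: each connection appears in both users' sets.
    candidates = []
    for (a, b), mutual in mutual_connection_counts(graph).items():
        if user in (a, b):
            other = b if a == user else a
            if other not in graph[user]:  # skip people already connected
                candidates.append((other, mutual))
    # Rank by most mutual connections, then alphabetically for a stable order.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_n]


GRAPH: Dict[str, Set[str]] = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy", "dev"},
    "cy": {"ana", "ben", "dev", "eli"},
    "dev": {"ben", "cy"},
    "eli": {"cy"},
}

if __name__ == "__main__":
    print(people_you_may_know(GRAPH, "ana"))
```

In a batch setting the pair counts are computed once for every user at the same time, which is why this formulation maps naturally onto a daily Hadoop job.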
XXI. CONCLUSION
In the first part of the paper, we presented the state of the art of big data and big data analytics and the various issues associated with them. We started with basic definitions and the need for such a technology, and then covered how big data technology evolved, the operations and services offered by big data, various applications, different forms and classifications of data applications, data processing (including data collection, data cleansing, data analysis, data storage, manipulation and interpretation) and big data solutions. We then discussed various open source software frameworks/architectures that have been used extensively in solving different types of big data applications. In addition to these frameworks, a number of application programs have been introduced that can be interfaced with
[7] Hortonworks, Online: http://hortonworks.com/hadoop/ambari/
[8] Apache Hadoop, last access Feb 13, 2015, http://hadoop.apache.org/core/
[9] A. Vailaya, (2012), What's All the Buzz Around Big Data?, IEEE Women in Engineering Magazine, December 2012, pp. 24-31
[10] B. Brown, M. Chui and J. Manyika, (2011), Are you Ready for the era of Big Data?, McKinsey Quarterly, McKinsey Global Institute, October 2011
[11] Begoli, E., & Horey, J. (2012). Design Principles for Effective Knowledge Discovery from Big Data. Software Architecture (WICSA) and European Conference on Software Architecture (ECSA), 2012 Joint Working IEEE/IFIP Conference on (pp. 215-218), last retrieved April 2, 2015, http://dx.doi.org/10.1109/WICSA-ECSA.212.32
[12] B. Gerhardt, K. Griffin and R. Klemann, (2014), Unlocking Value in the Fragmented World of Big Data Analytics, Cisco Internet Business Solutions Group, June 2012, last retrieved Nov 2, 2014, http://www.cisco.com/web/about/ac79/docs/sp/InformationInfomediaries.pdf
[13] Bryant, R. E., Katz, R. H., & Lazowska, E. D. (2008). Big-data computing: Creating revolutionary breakthroughs in commerce, science, and society. In Computing Research Initiatives for the 21st Century. Computing Research Association, 2008. Last retrieved April 4, 2015, http://www.cra.org/ccc/docs/init/Big_Data.pdf
[14] C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun, (2007), Map-reduce for machine learning on multicore. In B. Scholkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pp. 281-288. MIT Press, Cambridge, MA, 2007.
[15] Computer crime - Wikipedia, the free encyclopedia, (2011), March 2015, last access June 11, 2015, https://en.wikipedia.org/wiki/Computer_crime
[16] C. Eaton, D. Deroos, T. Deutsch, G. Lapis and P. C. Zikopoulos, (2012), Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill Companies, ISBN 978-0-07-179053-6, 2012
[17] C. Ranger, R. Raghuraman, A. Penmetsa, G. R. Bradski, and C. Kozyrakis, (2007), Evaluating MapReduce for Multi-core and Multiprocessor Systems, Proc. International Symposium on High-Performance Computer Architecture (HPCA), 2007, pp. 13-24.
[18] C. Tankard, (2012), Big Data Security, Network Security Newsletter, Elsevier, ISSN 1353-4858, July 2012
[19] C. Tankard, (2012), Big Data Security, Network Security Newsletter, Elsevier, ISSN 1353-4858, July 2012
[20] Data Abstraction Best Practices with Cisco Data Virtualization, (2012), last access Nov 11, 2014, www.cisco.com/.../data.../data_abstraction_with_cisco
[21] Jiawei Han (University of Illinois at Urbana-Champaign), Micheline Kamber and Jian Pei (Simon Fraser University), (2012), Data Mining: Concepts and Techniques, Third Edition, Chapter 13, http://www.cse.hcmut.edu.vn/~chauvtn/data_mining/Texts/%5B1%5D%20Data%20Mining%20%20Concepts%20and%20Techniques%20(3rd%20Ed).pdf
[22] Data virtualization: 6 best practices to help the business, (2011), Oct 27, 2011, last access April 12, 2015, www.zdnet.com/.../data-virtualization-6best-practices-to-help-the
[23] Data Virtualization: Achieve Better Business Outcomes, Faster, (2014), Data Center blog, Cisco Systems, Inc., May 6, 2014, last access April 23, 2015, blogs.cisco.com
[24] Data Abstraction Layer | Data Virtualization Layer, (2012), www.compositesw.com//data-abstraction/
[25] Data Virtualization Applied: Effective Solutions to Today's Business and IT Challenges, (2013), last access April 10, 2015, www.tdwi.org/.../Case-for-DataVirtualization
[26] Data Virtualization's Value: Myth or Reality? - DATAVERSITY, a white paper, (2015), Feb 2015, last access May 2, 2015, www.dataversity.net/data-virtualizations-value-myth-or-reality/
[27] Data Virtualization use cases and patterns, (2014), last access May 30, 2015, www.denodo.com/en/page/data-virtualization-usecases-and-patterns
[28] Judith T. Davis and Robert Eve, (2011), Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility, Paperback, Nine Five One Press, printed in US, Sept 2011, ISBN-13: 978-0-9799304-1-6, www.datavirtualizationbook.com, last access May 22, 2015
[29] Donald Miner and Adam Shook, (2012), MapReduce Design Patterns, O'Reilly Media, 2012 Edition
[30] Edward Capriolo, Dean Wampler and Jason Rutherglen, (2012), Programming Hive, O'Reilly Media, 2012 Edition
[31] Effective Solutions to Today's Business and IT Challenges, (2012), last access March 23, 2015, www.purl.manticoretechnology.com/MTC.../mtcURLSrv.aspx?ID...
[32] Gurdeep S. Hura, A chapter on Need for dynamicity in social networking site: An overview from data mining perspective, Data mining in dynamic social networks and fuzzy systems, Chapter I, IGI Global Publishing Company, NY, Dec 2013
[33] G. S. Hura, Chapter 29: Computer Networks: LANs, MANs, WANs, and Wireless, Digital Process Control and Networks, Taylor and Francis Group, June 2011
[34] G. S. Hura, Chapter 30: Internet Fundamentals and Cyber Security Management, Digital Process Control and Networks, Taylor and Francis Group, June 2011
[35] G. S. Hura, A Chapter on Terrestrial Wide Area Networks, Handbook of Computer Networks, 3 Volume Set, Hossein Bidgoli, Editor-in-Chief, John Wiley and Sons, Inc., 2007
[36] Hayes, M. (2013). White Elephant: The Hadoop Tool You Never Knew You Needed. Last access Nov 11, 2014, http://engineering.linkedin.com/hadoop/white-elephant-hadooptool-you-never-knew-you-needed
[37] http://hpccsystems.com/, last access Dec 11, 2014
[38] http://en.wikipedia.org/wiki/Big_data, last access Nov 11, 2014
[39] http://hadoop.apache.org/, last access Nov 11, 2014
[40] http://www.humanfaceofbigdata.com/, last access Nov 11, 2014
[41] IBM System z - Virtualization: Overview, (2012), last access Nov 11, 2014, www.ibm.com/systems/z/advantages/virtualization/
[42] Intel IT Center, (2012), Planning Guide: Getting Started with Hadoop, Steps IT Managers Can Take to Move Forward with Big Data Analytics, June 2012
[43] Intel IT Center, (2012), Peer Research: Big Data Analytics, Intel's IT Manager Survey on How Organizations Are Using Big Data, August 2012, last access Feb 4, 2015, http://www.intel.com/content/dam/www/public/us/en/documents/reports/data-insights-peer-research-report.pdf
[44] Ji, C., Li, Y., Qiu, W., Awada, U., & Li, K. (2012). Big Data Processing in Cloud Computing Environments. Pervasive Systems, Algorithms and Networks (ISPAN), 2012 12th International Symposium on (pp. 17-23). http://dx.doi.org/10.1109/I-SPAN.2012.9
[45] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh and A. H. Byers, (2011), Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011, last access March 2, 2015, http://www.mckinsey.com/~/media/McKinsey/dotcom/Insights%20and%20pubs/MGI/Research/Technology%20and%20Innovation/Big%20Data/MGI_big_data_full_report.ashx
[46] K. Bakshi, (2012), Considerations for Big Data: Architecture and Approach, Aerospace Conference IEEE, Big Sky, Montana, March 2012
[47] M. Smith, C. Szongott, B. Henne and G. Voigt, (2012), Big Data Privacy Issues in Public Social Media, Digital Ecosystems Technologies (DEST), 6th IEEE International Conference on, Campione d'Italia, June 2012
[48] Mainframe Data Virtualization - Rocket Software, (2012), last access April 12, 2015, www.rocketsoftware.com/data-virtualization
[49] Menon, A. (2012). Big data @ facebook. In Proceedings of the 2012 workshop on Management of big data systems (MBDS '12) (pp. 31-32). New York, NY, USA: ACM. http://dx.doi.org/10.1145/2378356.2378364
[50] Michael J. Quinn, (2014), Ethics for the Information Age, Pearson Press, Sixth Edition, 2014, Chapter 7
[51] Paolo Ciuccarelli, Giorgia Lupi and Luca Simeone, (2014), "Visualizing the Data City: Social Media as a Source of Knowledge for Urban Planning and Management", Springer-Verlag
[52] Pokorny, J. (2011). NoSQL databases: a step to database scalability in web environment. In Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services
[53] P. Russom, (2011), Big Data Analytics, TDWI Best Practices Report, TDWI Research, Fourth Quarter 2011, last access April 3, 2015, http://tdwi.org/research/2011/09/best-practices-report-q4-bigdata-analytics/asset.aspx
[54] R. Weiss and L. J. Zgorski, (2012), Obama Administration Unveils Big Data Initiative: Announces $200 Million in new R&D Investments, Office of Science and Technology Policy, Executive Office of the President, March 2012
[55] Rajan, S. et al. (2012). Top Ten Big Data Security and Privacy Challenges. Retrieved from https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Big_Data_Top_Ten_v1.pdf
[56] Reed, B. (2012). ZooKeeper Overview. Last access March 10, 2015, https://cwiki.apache.org/confluence/display/ZOOKEEPER/ProjectDescription
[57] Ryaboy, D. (2012). Twitter at the Hadoop Summit. Last access Nov 11, 2014, http://engineering.twitter.com/2012/06/twitter-at-hadoopsummit.htm
[58] S. Singh and N. Singh, (2011), Big Data Analytics, 2012 International Conference on Communication, Information & Computing Technology, Mumbai, India, IEEE, October 2011
[59] Sanjay P. Ahuja and Bryan Moore, (2013), State of Big data analysis in the cloud, Network and Communication Technologies, Vol. 2, No. 1, pp. 62-68, 2013
[60] S. Ghemawat, H. Gobioff, and S. Leung, (2003), The Google file system, Symposium on Operating Systems Principles, 2003, pp. 29-43.
[61] S. Madden, (2012), From Databases to Big Data, IEEE Internet Computing, June 2012, v. 16, pp. 4-6
[62] Shashank Tiwari, (2011), Professional NoSQL, Wrox Publications, 2011 Edition
[63] Tierney, B., Kissel, E., Swany, M., & Pouyoul, E. (2012). Efficient data transfer protocols for big data. E-Science (e-Science), 2012 IEEE 8th International Conference on (pp. 1-9). http://dx.doi.org/10.1109/eScience.2012.6404462
[64] Tom White, (2012), Hadoop: The Definitive Guide, O'Reilly Media, 2012 Edition
[65] Warren Pettit, (2012), Introduction to Pig, Big Data University, Online, last access March 23, 2015, http://bigdatauniversity.com/bdu-wp/bdu-course/introduction-topig/
[66] Weil, K. (2010). Hadoop at Twitter. Last access Nov 11, 2014, http://engineering.twitter.com/2010/04/hadoop-at-twitter.html
[67] U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, (1996), "From Data Mining to Knowledge Discovery in Databases", American Association for Artificial Intelligence, AI Magazine, Fall 1996, pp. 37-54
[68] Unlocking Agility with Data Virtualization, www.denodo.com/en/video/webinar/unlocking-agility-datavirtualization
[69] Gurdeep S. Hura, Dynamic Reconfigurable Software architecture: A Novel intelligent framework, Proc. MTMI, Virginia Beach, VA, Sept 11-12, 2015
[70] Gurdeep S. Hura and M. Singhal, Data and Computer Communications: Networking and Internetworking, CRC Press, April 2001.