In-Memory Computing: Powering Enterprise High-Performance Computing

Cognizant 20-20 Insights
To succeed in today's digital era, organizations must bring
the next wave of hyperscale computing into mainstream business by
adopting in-memory computing technologies that not only bolster
their large-scale data processing capabilities but also accelerate the
transformation of raw information into applied knowledge.
Executive Summary
Traditional high-performance computing (HPC)/supercomputing,
analytics and mainstream real-time/batch computing are quickly converging.
Mainstream workloads are crossing over into the
high-performance computing arena, demanding faster
analytics and batch processing as well as resource-intensive
computations and algorithms. To succeed in today's accelerating
digital world, enterprises must collect and
analyze enormous amounts of data in real
time, at speeds that most legacy
enterprise HPC technologies and systems were
not originally designed to accommodate.
In our view, organizations need to embark on what
we call Enterprise HPC 2.0. This term refers to the
ecosystem that uses the latest commodity-hardware-based
hyperscale grid technologies, such as in-memory computing (IMC),
compute and data grid technologies, streaming
analytics and graph analytics. These work in conjunction
with infrastructure advancements, such
as solid-state drive (SSD)-enabled technology,
GPGPU acceleration and general-purpose InfiniBand
interconnect technology, that enable IT organizations
to fast-track enterprise computing to
better serve the ever-growing data needs of the
business.
cognizant 20-20 insights | november 2015
Significant enthusiasm is building around the
IMC paradigm for large-scale data analysis. Historically,
in-memory grid technologies were
primarily data-focused and used by organizations
for distributed caching patterns to
achieve low-latency reads of critical transactional data.
However, IMC technology is progressively emerging as a key enabler for
enterprises seeking to accelerate their real-time
decision-making and agility by enabling
Web-scale data processing, capabilities that are
necessary for staying relevant and competitive in
today's digital era.
IMC's impact is typically felt where organizations
are creating new and more innovative ways
of working. A dramatic reduction in memory
hardware costs also favors the growth of IMC
technologies. However, several factors continue
to slow adoption in the enterprise, such as
a fragmented technology and vendor landscape,
a lack of commonly agreed-upon industry
standards, a scarcity of skills and still-emerging
industry best practices.
Given that the technology remains in its adolescence, the selection of the right IMC technology
is critical to any strategic digital business transformation decision. Soaring enterprise workloads
and the use cases that make use of in-memory
processing are informing key decisions around
IMC technology platform selection.
A blind jump into the IMC technology valley will
not yield durable value. Success requires a clear
analysis and understanding of workloads
and business priorities, with the goal of increasing
scalable performance and competitive benefits
for the business. This calls for skilled experts to
perform a focused evaluation. Furthermore, the
multitude of new and emerging products makes it
extremely challenging to select the right product
and approach.
However daunting this decision may seem, it is of
utmost importance for organizations to use IMC
technology to help address their ever-mounting
high-performance and low-latency processing
needs across the enterprise.
This white paper summarizes the features and
benefits of using IMC for large-scale data-set
aggregations using multiple popular IMC
approaches. The paper presents results from an
internal study in which we created
an evaluation scenario to compare various IMC
approaches and technology architectures. The study
results establish that simple migration to an IMC
technology yields performance levels 13 times
greater for a given batch workload previously
implemented using a disk-based architecture.
This paper not only highlights the importance
of embracing the IMC agenda for enterprise
workloads but offers a formal methodology for
choosing the most appropriate IMC platform to fit
given business needs.
In-Memory Computing: A Market Check
Effective use of IMC technology along with a clear
strategy for adoption can help enterprises reap
multiple benefits. Figure 1 lists some of the key
use cases across specific industries. While this is
just an indication, the possibilities are abundant
and are not limited to the specified list.
Figure 1: In-Memory Computing (Enterprise HPC 2.0)
Industries covered: Telecom; Retail; Insurance; Healthcare; Manufacturing; Banking & Financial Services.
Use cases shown: real-time in-store analytics; fast real-time loyalty offers; real-time ad placements; real-time sentiment analysis; faster medical imaging processing; genome analysis; faster claim processing and modeling; faster actuarial science; fraud detection; real-time trading decisions; faster reporting; inventory management; predictive analytics to avoid unplanned downtime.
There have been rapid innovations in the IMC
space recently to enable faster computation
and processing speeds. These include Hadoop
MapReduce, a batch processing framework
that has added support for an in-memory file
system called Tachyon. In addition, IBM has
added Apache Spark, an IMC system, to its z
Systems to bring analytics to mainframes. Also,
SQL Server 2016 Community Technology Preview
2 adds IMC power. This has led to the availability
of a plethora of IMC technology-based products.
However, these products can be classified into
various segments, based on their inherent
architecture and technological approaches. Moreover,
not every IMC system is applicable to every type
of enterprise workload. It is therefore imperative
to have a clear understanding of the pros and
cons of each of these system types in order to
effectively select and utilize IMC systems and
reap the business benefits.
IMC technology has evolved from its earliest
avatar (distributed caching) to today's integrated
in-memory platform that provides storage,
compute and transactional services for large-scale
data sets. These systems fall under the pure-play
IMC technologies category. The alternate IMC
segment applies to products such as Apache
Spark, which, in our view, does not represent
all-encompassing in-memory technology in the
strict sense since it does not provide a platform
for storing large-scale data. However, it provides
a processing platform for large-scale in-memory
computing, is said to provide performance
up to 100 times faster for certain applications [1]
and is being endorsed by IBM [2] and Amazon Web
Services [3].
Figure 2 illustrates the evolution of IMC
technology, some of the popular products under
each segment and the typical workloads for which
they are best used.
Given the rapid pace of innovation, the IMC
product landscape requires the latest skills and
a thorough understanding of a specific IMC
system's architectural underpinnings to validate
its fit and effective use for a given enterprise
workload. Furthermore, with the multiple options
available, enterprises can find it difficult to choose
and make the best use of an IMC technology to
satisfy their high-performance computing needs.
To address these challenges, we at the
Cognizant Hyperscale Computing (HPC) Lab
have launched a structured methodology to help
enterprises realize value from the next wave of
hyperscale computing using Enterprise HPC 2.0,
which leverages in-memory computing grids.
Figure 2: IMC Technology's Progression
Segments: Pure-Play IMC and Alternate IMC.
Categories and definitions:
• Distributed Caches: a cache that partitions its data among all cluster nodes.
• In-Memory Data Grid (IMDG): a data fabric across a large cluster of servers for distributed in-memory storage and management of large data sets.
• In-Memory Compute Grid (IMCG): a platform for computing and transacting on large-scale data sets in parallel.
• In-Memory Data Fabric (IMDF): a next-gen platform that integrates IMDG with IMCG and provides additional features like CEP, streaming, etc.
• In-Memory Database (IMDB): an RDBMS that stores data in memory instead of on disk.
Typical workloads shown:
• Distributed key/value cache for low-latency access.
• Real-time big data initiatives, handling HPC payloads along the lines of MapReduce and MPP, with partial SQL support.
• A single integrated platform for real-time big data management and computing, handling new HPC payloads such as streaming and CEP.
• An in-memory, high-speed alternative to an existing disk-based RDBMS, with full SQL support and no change to the application.
• In-memory computation and processing of data stored on disk.
Products named: Memcached, Ehcache, Pivotal GemFire, Pivotal GemFire XD, Oracle Coherence, GigaSpaces XAP, Hazelcast, Infinispan (JBoss), Apache Spark, Apache Ignite (GridGain), SAP HANA, Oracle Exalytics, Exadata, MS SQL 2014.
IMC Value Creation: Methodology
A clear process, as well as a framework, is required
to establish the business goals and successfully
determine the best-fit IMC technology. This is
vital to garner the utmost value from an IMC-led
transformation. Figure 3 depicts our process for
establishing and identifying the right IMC product
for the business.
Figure 3: IMC Technology Selection Process (IMC assessment methodology: Establishment (Stage I), Refinement (Stage II))
Step 1: Discovery
The business use cases and the workloads to be
implemented via IMC technology play a crucial
role in the selection of the products. So first the
workload is chosen and key goals for implementation are defined.
For this white paper, we studied a retail customer
analytics workload previously processed on a
modern scalable batch model using Apache Pig, a
Hadoop MapReduce-based technology, which has
a disk-based architecture. The nature of the technology
used for this implementation limited the
solution to an offline, batch-based system.
To be better prepared for the disruptive
nature of consumer behavior, where latency
implies loss of business, we preemptively wanted
an alternative solution to support faster and/or
near-real-time performance for the
customer's customers. We devised an internal
study to transform the batch workload using
multiple IMC technologies and successfully applied
an appropriate IMC technology to make it faster.
Next, we defined the key use cases that the
workload requires, which become the input
for the IMC system evaluation matrix. For quick
development of the use case and benchmarking,
we wanted the following core features to be
readily and easily supported by the product, apart
from the in-memory caching features normally
available with such products:
• Bulk data loading.
• SQL support for easy and fast retrieval of data with conditions.
• SQL support for joining multiple data sets based on criteria.
• Support for creating new tables/data sets dynamically on the fly with data from other tables/data sets.
• Support for stored procedures/user-defined functions/MapReduce to handle very specific aggregations.
• In-memory distributed computation capabilities.
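To make the feature list above concrete, the following minimal sketch exercises most of those capabilities against SQLite's in-memory mode from the Python standard library. SQLite is only a single-process stand-in chosen for illustration; it is not one of the IMC products evaluated in this paper and it offers no distributed computation. The table names, columns, sample rows and the spend_band function are all hypothetical.

```python
import sqlite3


def run_feature_sketch():
    """Exercise bulk load, conditional SQL, joins, a UDF and dynamic table creation."""
    # ":memory:" keeps the whole database in RAM (single process, not a grid).
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)")
    cur.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )

    # Bulk data loading: many rows inserted in one call (hypothetical sample data).
    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "gold"), (2, "silver"), (3, "gold")],
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 120.0), (11, 1, 60.0), (12, 2, 40.0), (13, 3, 300.0)],
    )

    # User-defined function for a very specific classification step.
    conn.create_function("spend_band", 1, lambda total: "high" if total >= 200 else "low")

    # New data set created on the fly from a join, a condition and an aggregation.
    cur.execute(
        """
        CREATE TABLE customer_spend AS
        SELECT c.customer_id, c.segment,
               SUM(o.amount) AS total,
               spend_band(SUM(o.amount)) AS band
        FROM customers c
        JOIN orders o ON o.customer_id = c.customer_id
        WHERE o.amount > 50
        GROUP BY c.customer_id, c.segment
        """
    )
    rows = cur.execute(
        "SELECT customer_id, segment, total, band FROM customer_spend ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_feature_sketch():
        print(row)
```

A candidate IMC product could be probed with the same sequence of operations through its own client or JDBC driver; the point of the sketch is the checklist, not the engine.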

Step 2: Analysis
Second, we needed to ascertain the segment of
IMC technology that would best suit the workload
and identify a potential list of IMC systems from
that category that readily support the evaluation
criteria for specific use cases. This list is carefully
chosen after deliberation with the enterprise's
business and architecture stakeholders.
We then performed deep-dive fit and architectural analysis on the selected list and determined
the best-fit match based on the aforementioned
evaluation criteria. From the output of this
analysis, the final list of IMC systems that closely
fit the requirements was determined. Further
proof-of-concept, proof-of-technology and benchmarking were performed on the final list of IMC
systems to validate, establish and recommend the
best-fit IMC system for a given workload.
In our case, we selected an initial list of
potential IMC products from the IMDG, IMDF and
alternate IMC segments, as we needed capabilities
like those of MapReduce to handle the specific
aggregations demanded by the chosen workload.
Distributed caching systems lack these features,
and an IMDB system like SAP HANA, which primarily
supports SQL workloads, was not the right fit in
this case.
Figure 4 lists the IMC systems selected. As an
internal study, we chose a list of products rated
as top vendors and leaders in the given segment
by various leading analysts from a good mix of
commercial and open-source products.
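The segment selection described above amounts to filtering segments by hard requirements before any scoring takes place. The sketch below illustrates that filter with plain Python sets. The capability tags assigned to each segment are simplifying assumptions distilled from the descriptions in this paper, not a vendor-verified matrix, and the names SEGMENT_CAPABILITIES, REQUIRED and matching_segments are hypothetical.

```python
# Capability tags per IMC segment. These are illustrative assumptions
# based on the segment descriptions in the text, not vendor data.
SEGMENT_CAPABILITIES = {
    "Distributed cache": {"caching"},
    "IMDG": {"caching", "sql", "distributed_compute"},
    "IMDF": {"caching", "sql", "distributed_compute", "streaming"},
    "IMDB": {"sql"},
    "Alternate IMC": {"sql", "distributed_compute"},
}

# Hard requirements for the workload in this sketch: SQL access plus
# MapReduce-like distributed computation for specific aggregations.
REQUIRED = {"sql", "distributed_compute"}


def matching_segments(required, catalog):
    """Return the segments whose capabilities cover every requirement."""
    return sorted(name for name, caps in catalog.items() if required <= caps)


if __name__ == "__main__":
    print(matching_segments(REQUIRED, SEGMENT_CAPABILITIES))
```

Under these assumed tags, the filter keeps the IMDG, IMDF and alternate IMC segments and drops distributed caches and IMDB systems, which mirrors the choice described in the text.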
Figure 4: Establishing the Short List (Fitment Analysis)
Pure-play IMC technology, commercial: Pivotal GemFire XD, Oracle Coherence, GigaSpaces XAP.
Pure-play IMC technology, open source: Apache Ignite, Infinispan (JBoss), Hazelcast.
Alternate IMC technology, others: Apache Spark.
Next, we performed a comprehensive product
comparison using a weighted scoring and ranking
model on 20 different attributes and dimensions
based on the specific list of features that were
most essential for quick development and
benchmarking of the use case, as listed in Figure 5. This
methodology helped us quickly shortlist one
data grid system each from the commercial and
open-source categories for our final evaluation.
Figure 5: Scoring the Requirements
Features (weightage 60%): Bulk Data Loading, SQL Queries Support, Stored Procedures Support, Dynamic Data Set Creation, Txn Support, UDF Support, SQL Joins, Sub Queries, JDBC Driver, Caching Patterns (Side Cache, In-line Cache), Replication, Guaranteed Delivery, Change Data Capture, Cloud Integration.
System Environment Setup (weightage 25%): Application Server (Tomcat/Jetty) Integration, Administration Consoles Availability, Monitoring/Management Consoles Availability, HA & Fault Tolerance, Deployment & Configuration Speed.
Dev Environment Setup (weightage 15%): Programming Language Support (.Net/Java), Client SDKs/APIs Support, Spring Data Support.
In-memory data grids offer many other useful
features. IMC vendors have developed unique
selling propositions for their products that need
to be compared, analyzed and leveraged on a
case-by-case basis.
The final considerations were based on the
score ratings depicted in Figure 6, which
compares the three commercial data grids
and the three open-source data grids selected in the previous step,
as depicted in Figure 4.
Figure 6: The Comparative Matrix. Two charts, Open Source IMC Product Comparison (Apache Ignite, Hazelcast, JBoss Infinispan) and Commercial IMC Product Comparison (Oracle Coherence, Pivotal GemFire XD, GigaSpaces XAP), each scoring the products on Features, System Environment Setup and Dev Environment Setup.
Analysis Results
For the final benchmark and evaluation, we
chose Apache Spark as the first product, for
its reputation as the next best IMC technology
to replace the Hadoop MapReduce framework.
Based on the scoring process, we selected Pivotal
GemFire XD from the commercial category (the
community version of GemFire is now available
as Apache Geode); the third product, chosen from
the open-source category, was Apache Ignite.
Both of these products scored the highest as the
potential best-fit technology to meet our needs
(i.e., the other compared products did not support
straightforward SQL joins or subqueries).
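The weighted scoring and ranking model described above can be sketched in a few lines. The category weights (60%, 25%, 15%) come from Figure 5; everything else is an assumption made for illustration: the per-category ratings are expressed as the fraction of criteria a product satisfies, and "Product A" and "Product B" are hypothetical products, not the ones evaluated in the study.

```python
# Category weights taken from Figure 5.
CATEGORY_WEIGHTS = {
    "features": 0.60,
    "system_env_setup": 0.25,
    "dev_env_setup": 0.15,
}


def weighted_score(ratings, weights=CATEGORY_WEIGHTS):
    """Combine per-category ratings (each between 0 and 1) into one score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("category weights must sum to 1")
    return sum(weights[category] * ratings[category] for category in weights)


def rank_products(product_ratings):
    """Return (product, score) pairs sorted from best to worst."""
    scored = {name: weighted_score(r) for name, r in product_ratings.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratings: fraction of criteria met in each category.
EXAMPLE_RATINGS = {
    "Product A": {"features": 0.9, "system_env_setup": 0.6, "dev_env_setup": 1.0},
    "Product B": {"features": 0.5, "system_env_setup": 1.0, "dev_env_setup": 1.0},
}

if __name__ == "__main__":
    for name, score in rank_products(EXAMPLE_RATINGS):
        print(f"{name}: {score:.2f}")
```

Because the features category carries 60% of the weight, a product that is strong on SQL joins, subqueries and bulk loading outranks one that is merely easier to set up, which is consistent with how the short list was decided in this study.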
We followed this with a detailed proof-of-concept
(PoC) and proof-of-technology (PoT) approach
and compared the various aspects of the architectures of the three IMC systems selected.
We then considered their features, differences
and relevance for supporting the large-scale
data aggregation required by the use case and
validated this with a benchmarking process.
Performance Benchmarking
An identical computing cluster consisting of
three nodes was provisioned using the Cognizant
Hyperscale Application Platform, which allows
for fast setup and deployment and provides
monitoring facilities to gather the benchmark
results. The system details of each node and the
IMC software details are shown in Figure 7.
The three systems were then configured with the
default cluster settings to determine the as-is
performance of the IMC systems compared with
traditional Hadoop MapReduce (MR) using Apache
Pig on Apache Hadoop YARN 2.4.0. For all three
systems, the only setting change we made
was to increase the IMC system process's memory
parameters (JVM) such that the total cluster heap
memory size was 250 GB for the in-memory data
cache.
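As a rough illustration of the memory setting above, the sketch below splits the 250 GB total cluster heap evenly across the three nodes and formats the result as a JVM maximum-heap flag. The even split and the rounding down to whole gigabytes are assumptions; the paper does not state how the heap was distributed, and each product exposes JVM options through its own launch scripts or configuration files.

```python
def per_node_heap_gb(total_heap_gb, node_count):
    """Evenly divide the total cluster heap across nodes (an assumed policy)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return total_heap_gb / node_count


def jvm_heap_flag(heap_gb):
    """Format a JVM -Xmx option, rounding down to whole gigabytes."""
    return f"-Xmx{int(heap_gb)}g"


if __name__ == "__main__":
    heap = per_node_heap_gb(250, 3)
    print(f"{heap:.1f} GB per node -> {jvm_heap_flag(heap)}")
```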
Figure 7: Node Details and IMC System Versions
Node details: RAM 128 GB; CPU cores 32; CPU clock speed 2.6 GHz; disk space (TB) value not shown.
Operating system: CentOS release 6.5.
IMC system versions: Apache Spark 1.3.1; Apache Ignite 1.2.0-incubating; Pivotal GemFire XD 1.4.1.
Benchmark Task
Our study compared a batch workload that
performed a good mix of computations to
create new data sets, with computed fields based
on aggregations performed in previous steps. The
original data was persisted in four different
structured data sets with relational integrity between
them based on certain attributes/fields. The study
was done on 50 GB of data with 500 million records
using the traditional MR mode and compared with
the twin approaches of using the alternate IMC product
Apache Spark and using IMDG NewSQL products.
Benchmark Execution
We executed each task three times for each IMC
system and reported the average of the trials.
Each system executed the benchmark tasks
separately to ensure exclusive access to the
cluster's resources. During the tests, we found
that Apache Ignite, unlike the other three systems,
did not provide out-of-the-box support for bulk
ingestion of data from CSV files and was unable to
handle ingestion beyond 1 GB of data
with its default cluster environment settings in a
stable manner. This prevented us from testing the
system for task executions.
Results
Figure 8 depicts the overall performance numbers
of the three IMC systems under different task
scenarios.
It is important to note that although performance
tuning was not considered in our study, for
optimal performance of each system the system
configuration parameters must be tweaked based
on data size, workload types, hardware capacities,
resource utilization, etc. The metrics shown in
Figure 8 would therefore change based on the
system tuning and optimization techniques used.
However, we expect tuning to shorten only the
execution times; the relative performance ranking of
these systems should remain the same when measured
against each other.
Step 3: Recommendation
Third, after creating PoCs and performance-related
benchmarks, we can easily derive, validate and
recommend the best-fit IMC system for any given
workload. We can also consider where these
technologies would potentially give the most durable
benefit for enterprise workloads by performing
such detailed analysis of their architectural
aspects.
For the current workload, we established key
findings for each IMC system, as shown in Figure
9. The results provide evidence
that IMC technology accelerates
computational performance, which enterprises
can harness after due diligence and consideration.
IMC technology can considerably improve
the overall processing times, from data loading
to execution. For the given use case and data
load, processing times improved 13-fold by simply
replacing the MapReduce-based batch system with
an IMC technology. We found that Apache Spark
was best suited for this particular scenario.
Figure 8: Performance Comparison
Workload operations mix: aggregations/computations 50%; data set joins 30%; data set select/create 10%; data set filters 10%.
Data set metrics: input data size (4 data sets) 50 GB; input records count 500 million; output data size (1 denormalized view) 150 GB; output records count 300 million.
Performance metrics: pre-IMC execution time 13 hr 15 min; post-IMC execution time 1 hr 6 sec; total performance improvement by Apache Spark 13x.
Charts: Data Loading Times (50 GB), time taken in minutes, and Task Execution Times (50 GB), time taken in hours, each plotted for Apache Pig, Pivotal GemFire XD and Apache Spark.
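The headline improvement is a ratio of execution times, and each reported time is the average of three trials. The sketch below shows both calculations. It reads Figure 8's pre-IMC and post-IMC execution times as 13 hr 15 min and 1 hr 6 sec, which is an assumption about how the flattened figure values pair up; the three trial durations are hypothetical and only illustrate the averaging step.

```python
from statistics import mean


def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def average_runtime(trial_seconds):
    """Average several trial durations, as done for each benchmark task."""
    return mean(trial_seconds)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


# Times as read from Figure 8 (pairing of values is an assumption).
PRE_IMC_SECONDS = to_seconds(hours=13, minutes=15)
POST_IMC_SECONDS = to_seconds(hours=1, seconds=6)

# Hypothetical trial durations in seconds, for the averaging step only.
EXAMPLE_TRIALS = [3590, 3606, 3622]

if __name__ == "__main__":
    print(f"average of trials: {average_runtime(EXAMPLE_TRIALS)} s")
    print(f"speedup: {speedup(PRE_IMC_SECONDS, POST_IMC_SECONDS):.1f}x")
```

Under that reading the ratio comes to roughly 13.2, in line with the 13-fold improvement reported for this workload.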
Figure 9: Functional Findings
Products compared: Pivotal GemFire XD, Apache Spark, Apache Ignite (incubating).
Strengths noted:
• Ideal for low-latency transactional and operational workloads.
• Ideal for iterative data analysis, caching intermediate data for real-time querying.
• Ideal for live stream analytics and predictive workloads also involving machine learning.
• Ideal for big data analytics, fraud detection, risk analytics, customer intelligence.
• Single integrated platform with additional capabilities such as Compute Grid, Service Grid, CEP Streaming.
• Extensive SQL support.
• Easy to administer and monitor.
• Easy to implement (noted for two of the products).
Limitations noted:
• Not ideal for analytical and predictive workloads in stand-alone mode.
• Not ideal for transactional processing in stand-alone mode.
• Nascent stage and requires maturing from incubation status.
• Lacks support for running iterative loops based on a large number of keys from a specific collection. Processing times deteriorate due to missing feature.
• Rudimentary management and monitoring consoles.
• No out-of-the-box CSV streamer for bulk data ingestion. Large data loading times suffer due to missing feature.
• Lacks support for in-memory data storage.
• Not so easy to implement.
Step 4: Planning
Finally, with the knowledge and validation
achieved in the previous steps, we can then
successfully plan and create an effective IMC
roadmap.
Key Recommendations
Our analysis establishes that IMC is the future
of computing and a key enabling technology for
enterprise HPC workloads that require analytical,
predictive and cognitive capabilities.
As such, we recommend the following:
• Although technology maturity is still uneven, decision-makers must realize that IMC technologies and architectures are well positioned to be adopted and utilized for their mainstream businesses.
• Application development and other IT leaders must look at IMC technology to support a wide range of use cases including batch, analytics, transaction processing and event processing rather than limiting the technology to distributed caching applications.
• Organizations would benefit by shifting to IMC technology when they need to reengineer established applications to increase their performance and scalability for fast transactional data access (e.g., inventory management, financial reference data, real-time transactional data), to offload workloads from legacy systems performing heavyweight offline calculations (e.g., pattern analysis, trade reconciliation, number crunching) or to support real-time stream processing (e.g., real-time analytics, continuous calculation, fraud detection, clickstream analytics).
• When opting for IMC systems from the open-source model, one way to proceed in a fail-proof manner is to conduct a PoC and a PoT to validate the system and then adopt the commercial counterpart of the same system to ensure stable system support.
Even though our study was limited to three
IMC systems, we recommend that enterprises
consider a broader range of products for initial
evaluation. This should be based on criteria most
critical to the business such as available expertise,
business drivers for IMC adoption, preference
for IMC appliance model, cloud support, product
support for post-implementation, mega-vendors,
small-size vendors and newer open-source
options for open integration. All of these considerations are critical to the evaluation matrix. This
should be accompanied by the deep-dive-comparison scoring model approach similar to that which
we followed on a list of parameters such as most
significant use cases, workload patterns of use
cases, short-term and long-term goals, ability to
realize ROI in next three to five years, etc.
A PoC/PoT on shortlisted products would further
reinforce the merits/demerits of any evaluated
product. This would help the enterprise to
make an informed decision to adopt a new IMC
technology that creates impact for their business.
Looking Forward
Although in-memory technology has been around
for many years, the latest advancements in
scale-out architecture, increased automation and
reduced memory costs have increased the
technology's appeal to all enterprises. IMC innovation
continues unabated across the whole
spectrum of IT market segments, from hardware
to application infrastructure to packaged
business applications. New in-memory technologies
can support new and complex workloads
that organizations can confidently apply to
achieve competitive advantage. While we do not
advise general replacement of all workloads and
traditional approaches by IMC technology, our
study suggests that organizations can reap a
high reward with the technology if the platform
is properly vetted, selected and deployed. So,
if you ask us, "What technology can accelerate
data processing 10x and deliver real-time
business insights and information with high
performance and low latency?", our answer would
be Enterprise HPC 2.0 and in-memory computing
technology.
Footnotes
1. Xin, Reynold; Rosen, Josh; Zaharia, Matei; Franklin, Michael; Shenker, Scott; Stoica, Ion, "Shark: SQL and Rich Analytics at Scale," June 2013.
2. http://www.firstpost.com/business/ibms-apache-spark-push-plans-put-spark-bluemix-open-tech-centre-2296260.html.
3. http://searchaws.techtarget.com/news/4500248624/Amazon-Elastic-MapReduce-moves-forward-withApache-Spark.
References
"Taxonomy, Definitions and Vendor Landscape for In-Memory Computing Technologies," Gartner report.
"Hype Cycle for In-Memory Computing Technology, 2014," Gartner report.
Noel Yuhanna, "Market Overview: In-Memory Data Platforms," Forrester report, December 26, 2014.
About the Author

Archana Rao is a Senior Technology Architect within Cognizant HyPerscale Computing Lab, a unit of
the Cognizant Technology Labs business unit. She has 11-plus years of cross-industry IT experience
developing and providing solutions, focusing on architecture and design of enterprise high performance computing (HPC) applications using various compute and data grid technologies such as
Hadoop, Windows HPC, in-memory computing, search grids and NoSQL. Archana's focus is on business
enablement and transformation through HPC technology and architecture, where she has consulted
with many clients implementing strategic technology transformation initiatives. She holds a B.E. in
electrical engineering and electronics from University of Madras, Chennai. Archana can be reached at
Archana.Rao2@cognizant.com | Twitter: @ArchanaRA0.
Acknowledgment
Special thanks to Senthil Ramaswamy Sankarasubramanian, Director, Cognizant HyPerscale Computing
Lab, a unit of Cognizant Technology Labs, for his invaluable feedback during the course of writing this
paper.
About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business
process outsourcing services, dedicated to helping the world's leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative
workforce that embodies the future of work. With over 100 development and delivery centers worldwide
and approximately 218,000 employees as of June 30, 2015, Cognizant is a member of the NASDAQ-100,
the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and
fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.
World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: inquiry@cognizant.com
European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: infouk@cognizant.com
India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: inquiryindia@cognizant.com
Copyright 2015, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is
subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.
TL Codex 1546
