See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/309616918
2 authors:
All content following this page was uploaded by Pedro Roger Magalhães Vasconcelos on 02 November 2016.
Outline
- Introduction
  - Cloud Computing
  - Virtualization
  - KVM
  - OpenVZ
  - MapReduce
  - Hadoop
  - HDFS
- Experimental Evaluation
- Conclusions
The 9th International Conference for Internet Technology and Secured Transactions (ICITST-2014)
Cloud Computing
[Figure: Virtual Machines]
- Uses the Internet to access, use, and process resources
- Multitenancy
- Massive scalability
Cloud Computing
Characteristics:
- Typically hosted on a server farm
- A large number of computers and resources
- Based on a business model
- Resources can be increased or decreased on demand
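The elasticity characteristic above can be shown with a minimal threshold rule. This is a hypothetical sketch and not an OpenNebula feature or API. The function name `scale_decision`, the 80%/30% utilization thresholds, the VM bounds, and the one-VM step are all assumptions made for illustration.

```python
def scale_decision(current_vms: int, avg_utilization: float,
                   high: float = 0.80, low: float = 0.30,
                   min_vms: int = 1, max_vms: int = 10) -> int:
    """Return how many VMs to run next, given average utilization in [0, 1].

    Hypothetical threshold rule: thresholds, bounds and the one-VM step are assumptions.
    """
    if avg_utilization > high and current_vms < max_vms:
        return current_vms + 1   # demand is high: provision one more VM
    if avg_utilization < low and current_vms > min_vms:
        return current_vms - 1   # demand is low: release one VM
    return current_vms           # inside the target band: keep the pool unchanged


if __name__ == "__main__":
    for load in (0.95, 0.50, 0.10):
        print(f"utilization {load:.2f} -> {scale_decision(4, load)} VMs")
```

A real cloud controller would also smooth the utilization signal and wait between scaling steps so the pool does not oscillate.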
Introduction
Today there are many open-source cloud computing solutions for providing infrastructure environments:
- OpenStack
- Eucalyptus
- OpenNebula
- CloudStack
OpenNebula
- Great flexibility regarding hypervisor usage
- Natively supports KVM, Xen, and VMware ESXi
- Drivers provided by the OpenNebula community add support for OpenVZ OS-level virtualization
Virtualization
Main advantages:
- Effective use of hardware
- Virtual machine isolation
- Less physical hardware is needed, so less heat is dissipated
KVM
- Requires a processor with hardware virtualization extensions
Full virtualization:
- A layer, commonly called the hypervisor, sits between the virtualized operating systems and the hardware
- This layer multiplexes system resources among competing operating system instances
- Provides total abstraction of the physical hardware
- Does not require modification of the guest OS
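The toy sketch below makes "multiplexes system resources" concrete by time-slicing one physical CPU among guests in round-robin order. It is only an illustration. `round_robin` is a hypothetical helper, and real hypervisors use far more elaborate schedulers. Under KVM, for example, each vCPU runs as an ordinary host thread that the Linux kernel schedules.

```python
from collections import Counter, deque


def round_robin(guests: list[str], slices: int) -> list[str]:
    """Toy hypervisor model: give `slices` equal CPU time slices to guests in turn."""
    if not guests:
        return []
    queue = deque(guests)
    schedule = []
    for _ in range(slices):
        guest = queue.popleft()   # next guest waiting for the physical CPU
        schedule.append(guest)    # it runs for one time slice
        queue.append(guest)       # then rejoins the back of the queue
    return schedule


if __name__ == "__main__":
    plan = round_robin(["vm1", "vm2", "vm3"], 9)
    print(plan)
    print(Counter(plan))          # each guest receives an equal share of 3 slices
```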
Virtualization
OS-level virtualization:
- Allows a physical server to run multiple isolated OS instances, known as containers
- Works at the operating system layer
- In practice, hypervisors work at the hardware abstraction level, whereas OS-level virtualization works at the system call layer
This paper evaluates the performance of a Hadoop MapReduce cluster on an OpenNebula cloud under full virtualization (KVM) and operating-system-level virtualization (OpenVZ).
MapReduce
- A programming model for processing large datasets
MapReduce
Map phase:
- Processes the input in the form of key/value pairs and generates intermediate key/value pairs
Reduce phase:
- Processes all intermediate values associated with the same intermediate key generated by the Map function
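The two phases can be sketched as a single-process Python function. This is a minimal illustration of the programming model and is not Hadoop's API. `run_mapreduce`, the in-memory grouping that stands in for the shuffle, and the temperature records are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

Pair = tuple[Any, Any]


def run_mapreduce(records: Iterable[Pair],
                  map_fn: Callable[[Any, Any], Iterable[Pair]],
                  reduce_fn: Callable[[Any, list], Iterable[Pair]]) -> list[Pair]:
    """Single-process illustration of the MapReduce model (not Hadoop's API)."""
    # Map phase: every input key/value pair yields intermediate key/value pairs.
    groups: dict[Any, list] = defaultdict(list)
    for key, value in records:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)   # stand-in for the shuffle: group by key
    # Reduce phase: each intermediate key is processed with all of its values,
    # visiting keys in sorted order as a Hadoop reducer would see them.
    output: list[Pair] = []
    for ikey in sorted(groups):
        output.extend(reduce_fn(ikey, groups[ikey]))
    return output


if __name__ == "__main__":
    # Hypothetical input: (line offset, "year,temperature") records.
    readings = [(0, "2013,28"), (1, "2014,31"), (2, "2013,33"), (3, "2014,29")]

    def max_temp_map(_offset, line):
        year, temp = line.split(",")
        yield year, int(temp)

    def max_temp_reduce(year, temps):
        yield year, max(temps)

    print(run_mapreduce(readings, max_temp_map, max_temp_reduce))
    # -> [('2013', 33), ('2014', 31)]
```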
Hadoop
- Hadoop is a distributed programming framework and an open-source implementation of the MapReduce model
- An MR job consists of multiple map and reduce tasks
- There are two types of nodes that control the job execution process:
  - A JobTracker
  - A number of TaskTrackers
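In Hadoop 1.x, TaskTrackers periodically send heartbeats to the JobTracker that report their free task slots. The JobTracker replies with tasks to run and prefers map tasks whose input block is stored locally. The sketch below models only that idea. The class names, the `heartbeat` method, and the one-task-per-slot rule are simplifying assumptions and do not reproduce Hadoop's scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class MapTask:
    task_id: str
    block_hosts: set[str]   # DataNodes that hold a replica of this task's input block


@dataclass
class JobTracker:
    pending: list[MapTask] = field(default_factory=list)

    def heartbeat(self, tracker_host: str, free_slots: int) -> list[str]:
        """Handle a TaskTracker heartbeat reporting free map slots; return task ids to launch."""
        assigned: list[str] = []
        while free_slots > 0 and self.pending:
            # Prefer a data-local task (input block on this host), else take the oldest one.
            task = next((t for t in self.pending if tracker_host in t.block_hosts),
                        self.pending[0])
            self.pending.remove(task)
            assigned.append(task.task_id)
            free_slots -= 1
        return assigned


if __name__ == "__main__":
    jt = JobTracker([MapTask("m0", {"node1", "node2"}),
                     MapTask("m1", {"node3"}),
                     MapTask("m2", {"node2"})])
    print(jt.heartbeat("node3", 1))   # ['m1'] (data-local)
    print(jt.heartbeat("node2", 2))   # ['m0', 'm2']
    print(jt.heartbeat("node1", 1))   # [] (no work left)
```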
HDFS
- HDFS creates multiple replicas of data blocks and distributes them among the cluster nodes
- All data is stored as HDFS files composed of fixed-size data blocks (64 MB) distributed across multiple nodes
- Two types of nodes: a NameNode and a number of DataNodes
  - The NameNode maintains the metadata about the files and the directory tree
  - DataNodes store the data blocks themselves
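The block and replica arithmetic can be sketched as follows. The 64 MB block size comes from the slide. The replication factor of 3 is the usual HDFS default and is an assumption here. The rotating placement is a deliberate simplification, since real HDFS uses a rack-aware placement policy. `split_into_blocks`, `place_replicas`, and the DataNode names are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB block size, as stated on the slide
REPLICATION = 3                  # assumed: the usual HDFS default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block in bytes; only the last block may be shorter."""
    n_blocks = (file_size + block_size - 1) // block_size   # integer ceiling
    sizes = [block_size] * n_blocks
    if n_blocks and file_size % block_size:
        sizes[-1] = file_size % block_size
    return sizes


def place_replicas(n_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Simplified placement: put each block's replicas on distinct DataNodes,
    rotating the starting node per block (real HDFS placement is rack-aware)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(n_blocks)}


if __name__ == "__main__":
    blocks = split_into_blocks(2 * 1024**3)          # a 2 GB input file
    print(len(blocks), "blocks")                     # 32 blocks
    layout = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4", "dn5"])
    print(layout[0], layout[31])
```

For example, a 2 GB file (the largest TeraGen input used in this study) splits into 32 blocks of 64 MB. With three replicas, that means 96 block replicas spread across the DataNodes.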
Experimental Evaluation
To evaluate the performance of the Hadoop cluster on each virtualization platform, the testbed consisted of:
- 2x IBM BladeCenter HS21
- Intel Xeon E5-2620 CPUs at 2.00 GHz (6 cores with Hyper-Threading each)
- 48 GB of RAM
- Connected to a SAN via Fibre Channel
- Running Ubuntu GNU/Linux 14.04.1 LTS amd64
- OpenNebula 4.8.0
- Each Hadoop cluster consists of 6 VMs:
  - 2 vCPUs, 2 GB of vRAM, 1 GB of swap, 10 GB of disk
  - Ubuntu GNU/Linux 12.04 amd64
Experimental Evaluation
- QEMU/KVM 2.0.0
WordCount
- An application that reads text files as input and counts how often each word occurs
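The following is a single-process sketch of WordCount's map and reduce functions. Map emits (word, 1) for every whitespace-separated token, and reduce sums the counts for each word. Splitting on whitespace without case folding is an assumption that, as far as we know, matches the WordCount example bundled with Hadoop. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def wc_map(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for each whitespace-separated, case-sensitive token.
    for word in line.split():
        yield word, 1


def wc_reduce(word: str, ones: list[int]) -> tuple[str, int]:
    # Reduce: add up the 1s emitted for the same word.
    return word, sum(ones)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    grouped: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for word, one in wc_map(line):
            grouped[word].append(one)   # shuffle: collect all 1s per word
    return dict(wc_reduce(w, ones) for w, ones in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the cat sat", "the cat ran"]))
    # -> {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```

Because the reduce step is a plain sum, it can also serve as a combiner that pre-aggregates counts on the map side before the shuffle.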
TeraSort
- The goal of the TeraSort benchmark is to sort a given volume of data as quickly as possible
- It is a benchmark that exercises both the HDFS layer and the MapReduce layer
- TeraSort consists of 3 MR applications:
  - TeraGen is an MR program that generates the data
  - TeraSort samples the input data and uses MR to sort the data into a total order
  - TeraValidate is an MR program that validates that the output is sorted
- We used input data of varying sizes generated with TeraGen: 512 MB, 1 GB, and 2 GB
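The core idea of TeraSort can be sketched in memory. The input is sampled to choose split points, so that each reducer receives a contiguous key range and the concatenated reducer outputs are globally sorted. Hadoop's own implementation uses its sampler and a trie-based partitioner, whereas this sketch uses `bisect` over sorted cut points. `rows_for`, `split_points`, `terasort`, and `teravalidate` are hypothetical names. TeraGen rows are 100 bytes (a 10-byte key and a 90-byte value), so `rows_for` converts an input size into a row count.

```python
import bisect
import random

ROW_BYTES = 100   # TeraGen rows are 100 bytes: a 10-byte key plus a 90-byte value


def rows_for(size_bytes: int) -> int:
    """Number of TeraGen rows that make up (about) size_bytes of input."""
    return size_bytes // ROW_BYTES


def split_points(keys: list[str], n_reducers: int, sample_size: int,
                 rng: random.Random) -> list[str]:
    """Sample the input and choose n_reducers - 1 cut points from the sorted sample."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / n_reducers
    return [sample[int(step * i)] for i in range(1, n_reducers)]


def terasort(keys: list[str], n_reducers: int = 4, sample_size: int = 100,
             seed: int = 0) -> list[str]:
    if not keys:
        return []
    cuts = split_points(keys, n_reducers, sample_size, random.Random(seed))
    partitions: list[list[str]] = [[] for _ in range(n_reducers)]
    for key in keys:
        # Map side: route each key to the reducer that owns its key range.
        partitions[bisect.bisect_right(cuts, key)].append(key)
    # Reduce side: each reducer sorts only its own range; since the ranges do not
    # overlap, concatenating reducer outputs in order gives a total order.
    return [key for part in partitions for key in sorted(part)]


def teravalidate(output: list[str]) -> bool:
    """Validate that every key is less than or equal to the next one."""
    return all(a <= b for a, b in zip(output, output[1:]))


if __name__ == "__main__":
    rng = random.Random(42)
    data = ["".join(rng.choices("ABCDEFGHIJ", k=10)) for _ in range(1000)]
    result = terasort(data)
    print(rows_for(512 * 1024**2), teravalidate(result))   # 5368709 True
```

The slides do not say whether the 512 MB, 1 GB, and 2 GB sizes are binary or decimal units. `rows_for` gives the row count for whichever byte total is passed in.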
TeraSort
- OpenVZ performs better than KVM in the TeraSort and …
TestDFSIO
- TestDFSIO is a read/write test for HDFS
- Useful for stress-testing HDFS, to find …
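TestDFSIO reports both a throughput and an average I/O rate, and the two can diverge. Based on our reading of its result analysis, throughput divides the total megabytes by the sum of the per-task times, while the average I/O rate is the mean of the per-task rates. The sketch below reproduces that arithmetic. `TaskResult` and the two function names are hypothetical, and the example numbers are made up for illustration. They are not measurements from this study.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class TaskResult:
    bytes_processed: int   # bytes written or read by one map task (one file)
    seconds: float         # time that task spent on I/O


def throughput_mb_s(tasks: list[TaskResult]) -> float:
    """Total megabytes divided by the sum of the per-task times."""
    total_mb = sum(t.bytes_processed for t in tasks) / MB
    return total_mb / sum(t.seconds for t in tasks)


def average_io_rate_mb_s(tasks: list[TaskResult]) -> float:
    """Arithmetic mean of each task's own rate in MB/s."""
    return sum(t.bytes_processed / MB / t.seconds for t in tasks) / len(tasks)


if __name__ == "__main__":
    # Made-up example: two 1024 MB files, one task fast and one slow.
    tasks = [TaskResult(1024 * MB, 10.0), TaskResult(1024 * MB, 40.0)]
    print(throughput_mb_s(tasks), average_io_rate_mb_s(tasks))   # 40.96 64.0
```

With two 1024 MB tasks taking 10 s and 40 s, throughput is 2048 / 50 = 40.96 MB/s, while the average I/O rate is (102.4 + 25.6) / 2 = 64.0 MB/s. A single slow task drags throughput down more than it drags down the average rate.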
TestDFSIO
- OpenVZ performance in the write tests was much lower than KVM's
- However, in the read tests OpenVZ performs better than KVM
NNBench
MRBench
- Focuses on the MapReduce layer, as its impact on …
Pi
- OpenVZ also achieves better results than KVM in I/O read tests, as shown by the values of Read Throughput and … benchmark
- However, OpenVZ performs worse than KVM in I/O write tests of large files in TeraGen, and it reached low Write Throughput and Write Average I/O rates in TestDFSIO
- In contrast, in the sequential creation of numerous small files in the NNBench test, the KVM run took almost twice as long as the OpenVZ run
Conclusions
- By using OpenVZ, a Hadoop cluster can achieve high performance on virtualized systems when running jobs that make intensive use of CPU, network, and I/O reads. Jobs …
pedro.roger@alu.ufc.br gisele@lia.ufc.br