
International Journal of Communications and Engineering

Volume 05 No. 5, Issue: 04, March 2012

PERFORMANCE ANALYSIS OF
COMPUTING PROCESS EXECUTION IN
MULTICLOUD ENVIRONMENT
S. Srividhyalakshmi, R. Vinoth
Dhanalakshmi Srinivasan Engineering College
student.topper@gmail.com, softvin84@gmail.com
ABSTRACT:
Cloud computing is gaining acceptance in many IT organizations as an elastic, flexible, and variable-cost way to deploy their service platforms using outsourced resources. Unlike traditional utilities, where a single-provider scheme is common practice, the ubiquitous access to cloud resources easily enables the simultaneous use of different clouds. In this paper we explore this scenario to deploy a computing cluster on top of a multi-cloud infrastructure for solving Many-Task Computing (MTC) applications. In this way, the cluster nodes can be provisioned with resources from different clouds to improve the cost-effectiveness of the deployment, or to implement high-availability strategies. We prove the viability of this kind of solution by evaluating the scalability, performance, and cost of different configurations of a Sun Grid Engine cluster deployed on a multi-cloud infrastructure spanning a local data center and three different cloud sites: Amazon EC2 Europe, Amazon EC2 USA, and ElasticHosts. Although the testbed deployed in this work is limited to a reduced number of computing resources (due to hardware and budget limitations), we have complemented our analysis with a simulated infrastructure model, which includes a larger number of resources and runs larger problem sizes. Data obtained by simulation show that performance and cost results can be extrapolated to large-scale problems and multi-cloud infrastructures.
INDEX TERMS: Cloud computing, computing cluster, multi-cloud infrastructure, computing process.

I. INTRODUCTION
A cloud is a pool of virtualized computer resources. A cloud can:
1) Host a variety of different workloads, including batch-style back-end jobs and interactive, user-facing applications.
2) Allow workloads to be deployed and scaled out quickly through the rapid provisioning of virtual machines or physical machines.
3) Support redundant, self-recovering, highly scalable programming models that allow workloads to recover from many unavoidable hardware/software failures.
4) Monitor resource use in real time to enable rebalancing of allocations when needed.
Cloud computing environments support grid computing by quickly providing physical and virtual servers on which the grid applications can run. Cloud computing should not, however, be confused with grid computing. Grid computing involves dividing a large task into many smaller tasks that run in parallel on separate servers. Grids require many computers, typically in the thousands, and commonly use servers, desktops, and laptops. Clouds also support non-grid environments, such as a three-tier Web architecture running standard or Web 2.0 applications. A cloud is more than a collection of computer resources, because a cloud provides a mechanism to manage those resources.
History


The Cloud is a metaphor for the Internet, derived from its common depiction in network diagrams (or, more generally, of components which are managed by others) as a cloud outline. The underlying concept dates back to 1960, when John McCarthy opined that "computation may someday be organized as a public utility" (indeed, it shares characteristics with the service bureaus which date back to the 1960s), and the term The Cloud was already in commercial use around the turn of the 21st century. Cloud computing solutions had started to appear on the market, though most of the focus at this time was on Software as a Service. 2007 saw increased activity, including Google, IBM, and a number of universities embarking on a large-scale cloud computing research project, around the time the term started gaining popularity in the mainstream press. It was a hot topic by mid-2008, and numerous cloud computing events had been scheduled.
Supercomputers today are used mainly by the military, government intelligence agencies, universities and research labs, and large companies to tackle enormously complex calculations for such tasks as simulating nuclear explosions, predicting climate change, designing airplanes, and analyzing which proteins in the body are likely to bind with potential new drugs. Cloud computing aims to apply that kind of power, measured in the tens of trillions of computations per second, to problems like analyzing risk in financial portfolios, delivering personalized medical information, even powering immersive computer games, in a way that users can tap through the Web. It does that by networking large groups of servers that often use low-cost consumer PC technology, with specialized connections to spread data-processing chores across them. By contrast, the newest and most powerful desktop PCs process only about 3 billion computations a second.
Let's say you're an executive at a large corporation. Your particular responsibilities include making sure that all of your employees have the right hardware and software they need to do their jobs. Buying computers for everyone isn't enough; you also have to purchase software or software licenses to give employees the tools they require. Whenever you have a new hire, you have to buy more software or make sure your current software license allows another user. It's so stressful that you find it difficult to go to sleep on your huge pile of money every night. Instead of installing a suite of software on each computer, you'd only have to load one application. That application would allow workers to log into a Web-based service which hosts all the programs the user would need for his or her job. Remote machines owned by another company would run everything from e-mail to word processing to complex data analysis programs. It's called cloud computing, and it could change the entire computer industry.
In a cloud computing system, there's a significant workload shift. Local computers no longer have to do all the heavy lifting when it comes to running applications; the network of computers that make up the cloud handles them instead. Hardware and software demands on the user's side decrease. The only thing the user's computer needs to be able to run is the cloud computing system's interface software, which can be as simple as a Web browser, and the cloud's network takes care of the rest. Instead of running an e-mail program on your computer, you log in to a Web e-mail account remotely. The software and storage for your account don't exist on your computer; they are on the service's computer cloud.
II. RELATED WORK
A key issue in the adaptive and autonomic computing vision is the automation of managing large application systems and IT infrastructure to serve millions of users with satisfactory performance. With the advent of cloud computing, today's enterprise computing resources and applications are more distributed in data-center environments, and more dynamic through on-demand utility computing under rapidly changing conditions, than ever before. Additionally, various management objectives such as performance benefit, power saving,

serviceability, and management costs are increasingly inter-related and often in conflict. The growing complexity demands the automation of optimizing the utility of IT infrastructures and of addressing the various tradeoffs among such management objectives.
Several trends are opening up the era of Cloud Computing, which is an Internet-based development and use of computer technology. Ever cheaper and more powerful processors, together with the Software as a Service (SaaS) computing architecture, are transforming data centers into pools of computing services on a huge scale. Increasing network bandwidth and reliable yet flexible network connections even make it possible for users to subscribe to high-quality services from data and software that reside solely on remote data centers. Moving data into the cloud offers great convenience to users, since they don't have to care about the complexities of direct hardware management. The services of the pioneering Cloud Computing vendor Amazon, Simple Storage Service (S3) and Elastic Compute Cloud (EC2), are both well-known examples. While these Internet-based online services do provide huge amounts of storage space and customizable computing resources, this computing platform shift is, at the same time, eliminating the responsibility of local machines for data maintenance. As a result, users are at the mercy of their cloud service providers for the availability and integrity of their data. Although the cloud infrastructures are much more powerful and reliable than personal computing devices, a broad range of both internal and external threats to data integrity still exists.
Cloud computing is different from hosting services and assets at an ISP data center. It is all about computing systems that are logically at one place, or virtual resources forming a Cloud, with a user community accessing them over an intranet or the Internet. This means a Cloud could reside on-premises or off-premises, at the service provider's location. There are several types of Cloud computing:
1. Public clouds
2. Private clouds
3. Inter-clouds or Hybrid clouds,
says Mr. B.L.V. Rao, CIO, IT leader, and expert in cloud computing. At the foundation of cloud computing is the broader concept of infrastructure convergence (or Converged Infrastructure) and shared services. This type of data-center environment allows enterprises to get their applications up and running faster, with easier manageability and less maintenance, and enables IT to more rapidly adjust IT resources (such as servers, storage, and networking) to meet fluctuating and unpredictable business demand. Most cloud computing infrastructures consist of services delivered through shared data centers and appearing as a single point of access for consumers' computing needs. Commercial offerings may be required to meet service-level agreements (SLAs), but specific terms are less often negotiated by smaller companies.
III. CLOUD TECHNOLOGIES
Four Selected Clouds: Amazon EC2, GoGrid, ElasticHosts, and Mosso. We identify three categories of cloud computing services [19], [20]: Infrastructure-as-a-Service (IaaS), that is, raw infrastructure and associated middleware; Platform-as-a-Service (PaaS), that is, APIs for developing applications on an abstract platform; and Software-as-a-Service (SaaS), that is, support for running software services remotely. Many clouds already exist, but not all provide virtualization, or even computing services. The scientific community has not yet started to adopt PaaS or SaaS solutions, mainly to avoid porting legacy applications and for lack of the needed scientific computing services, respectively. Thus, in this study we focus only on IaaS providers. We also focus only on public clouds, that is, clouds that are not restricted within an enterprise; such clouds can be used by our target audience, scientists. Based on our recent survey

of the cloud computing providers, we have selected four IaaS clouds for this work. The reasons for this selection are threefold. First, not all the clouds on the market are still accepting clients; FlexiScale puts new customers on a waiting list for over two weeks due to system overload. Second, not all the clouds on the market are large enough to accommodate requests for even 16 or 32 co-allocated resources. Third, our selection already covers a wide range of quantitative and qualitative cloud characteristics, as summarized in Table 1 and our cloud survey [21], respectively. We describe in the following Amazon EC2; the other three, GoGrid (GG), ElasticHosts (EH), and Mosso, are IaaS clouds with provisioning, billing, and availability and performance guarantees similar to Amazon EC2's. The Amazon Elastic
Computing Cloud is an IaaS cloud computing service that opens Amazon's large computing infrastructure to its users. The service is elastic in the sense that it enables the user to extend or shrink their infrastructure by launching or terminating new virtual machines (instances). The user can use any of the instance types currently on offer; the characteristics and cost of the five instance types available in June 2009 are summarized in Table 1. An ECU is the equivalent CPU power of a 1.0-1.2 GHz 2007 Opteron or Xeon processor. The theoretical peak performance can be computed for different instances from the ECU definition: a 1.1 GHz 2007 Opteron can perform 4 flops per cycle at full pipeline, which means that at peak performance one ECU equals 4.4 gigaflops per second (GFLOPS). To create an infrastructure from EC2 resources, the user specifies the instance type and the VM image; the user can specify any VM image previously registered with Amazon, including Amazon's or the user's own. Once the VM image has been transparently deployed on a physical machine (the resource status is running), the instance is booted; at the end of the boot process the resource status becomes installed. The installed resource can be used as a regular computing node immediately after the booting process has finished, via an ssh connection. A maximum of 20 instances can be used concurrently by regular users by default; an application can be made to increase this limit, but the process involves an Amazon representative. Amazon EC2 abides by a Service Level Agreement (SLA) in which the user is compensated if the resources are not available for acquisition at least 99.95 percent of the time.
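The ECU-to-GFLOPS conversion described above reduces to simple arithmetic; the sketch below encodes it, with the instance ECU counts chosen as illustrative placeholders rather than the official June 2009 table:

```python
# Theoretical peak performance from the ECU definition in the text:
# one ECU ~ a 1.1 GHz 2007 Opteron doing 4 flops/cycle at full pipeline,
# i.e. 1.1e9 cycles/s * 4 flops/cycle = 4.4 GFLOPS per ECU.
GFLOPS_PER_ECU = 1.1 * 4  # 4.4 GFLOPS

def peak_gflops(ecus: float) -> float:
    """Theoretical peak performance of an instance with the given ECU count."""
    return ecus * GFLOPS_PER_ECU

# Illustrative ECU counts (placeholders, not the official instance table):
for name, ecus in [("1-ECU instance", 1), ("8-ECU instance", 8)]:
    print(f"{name}: {peak_gflops(ecus):.1f} GFLOPS peak")
```

The same conversion underlies the HPL-efficiency comparisons later in the paper, since it fixes the theoretical ceiling against which measured performance is judged.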
The security of the Amazon services has been investigated elsewhere.
MTC PRESENCE IN SCIENTIFIC COMPUTING WORKLOADS
An important assumption of this work is that the existing scientific workloads already include Many-Task Computing users, that is, users who employ loosely coupled applications comprising many tasks to achieve their scientific goals. In this section, we verify this assumption through a detailed investigation of workload traces taken from real scientific computing environments.
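The trace investigation described above can be sketched minimally as follows. The trace format, field names, and the thresholds for "many" and "short" tasks are all illustrative assumptions for this sketch, not the paper's actual criteria:

```python
from collections import defaultdict

# Hypothetical trace: (user, submit_time_s, runtime_s) tuples.
# A user is flagged as a Many-Task Computing (MTC) user here if they
# submit a large number of short tasks -- an illustrative criterion,
# not the paper's exact definition.
trace = (
    [("alice", t, 30) for t in range(0, 1000, 2)]       # 500 short tasks
    + [("bob", t, 7200) for t in range(0, 86400, 21600)]  # 4 long jobs
)

MIN_TASKS = 100     # assumed threshold for "many tasks"
MAX_RUNTIME = 600   # assumed threshold for a "short" task (seconds)

def mtc_users(jobs):
    """Return the set of users whose submissions match the MTC pattern."""
    per_user = defaultdict(list)
    for user, _submit, runtime in jobs:
        per_user[user].append(runtime)
    return {
        u for u, runtimes in per_user.items()
        if len(runtimes) >= MIN_TASKS
        and all(r <= MAX_RUNTIME for r in runtimes)
    }

print(mtc_users(trace))
```

Running this on the toy trace flags only the user submitting the large batch of short tasks, which is the loosely coupled, many-task pattern the section looks for in real traces.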
Cloud Deployment Models:
The selection of a cloud deployment model depends on the different levels of security and control required. A Private cloud infrastructure is operated solely for a single organization, with the purpose of securing services and infrastructure on a private network. This deployment model offers the greatest level of security and control, but it requires the operating organization to purchase and maintain the hardware and software infrastructure, which reduces the cost-saving benefits of investing in a cloud infrastructure. Rackspace, Eucalyptus, and VMware are example providers of private cloud solutions. A Community cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns.

Figure 1 - Multi-cloud Deployment Architecture


It may be established where organizations have similar requirements and seek to share cloud infrastructure. An example of a community cloud is Google's GovCloud. Public clouds provide services and infrastructure over the Internet to the general public or a large industry group, and are owned by an organization selling cloud services. Major public cloud providers are Google and Amazon. These clouds offer the greatest level of efficiency in shared resources; however, they are also more vulnerable than private clouds. A Hybrid cloud infrastructure, as the name suggests, is a composition of private, public, and/or community clouds, possibly through multiple providers. The reasoning for a hybrid cloud infrastructure is increased security, better management, or failover purposes. For some it may not be feasible to place assets in a public cloud; therefore, many opt for the value of combining different cloud deployment models. The drawback of a hybrid cloud, however, is the requirement of managing multiple different security platforms and communication protocols.
IV. CLOUD PERFORMANCE EVALUATION
In this section we present a performance evaluation of cloud computing services for scientific computing. We design a performance evaluation method that allows an assessment of clouds. To this end, we divide the evaluation procedure into two parts, the first cloud-specific, the second infrastructure-agnostic.
Cloud-specific evaluation. An attractive promise of clouds is that there are always unused resources, so that they can be obtained at any time without additional waiting time. However, the load of other large-scale systems (grids) varies over time due to submission patterns; we want to investigate whether large clouds can indeed bypass this problem. Thus, we test the duration of resource acquisition and release over short and long periods of time. For the short periods, one or more instances of the same instance type are repeatedly acquired and released during a few minutes; the resource acquisition requests follow a Poisson process with arrival rate λ = 1 s⁻¹. For the long periods, an instance is acquired then released every 2 minutes over a period of one week, then hourly averages are aggregated from the 2-minute samples taken over a period of one month.
Infrastructure-agnostic evaluation. There currently is no single accepted benchmark for scientific computing at large scale. In particular, there is no such benchmark for the common scientific computing scenario in which an infrastructure is shared by several independent jobs, despite the large performance losses that such a scenario can incur. To address this issue, our method both uses traditional benchmarks comprising suites of jobs to be run in isolation and replays workload traces taken from real scientific computing environments. We design two types of test workloads: SJSI/MJSI, which run one or more single-process jobs on a single instance (possibly with multiple cores), and SJMI, which runs a single multi-process job on multiple instances. The SJSI, MJSI, and SJMI workloads all involve executing one or more from a list of four open-source benchmarks: LMbench, Bonnie, CacheBench, and the HPC Challenge Benchmark (HPCC) [15]. The characteristics of the benchmarks used and their mapping to the test workloads are summarized in Table 1.
Performance metrics. We use the performance metrics defined by the benchmarks used in this work. We also define and use the HPL efficiency for a real virtual cluster based on instance type T as the ratio between the HPL benchmark performance of the cluster and the performance of a real environment formed with only one instance of the same type, expressed as a percentage.
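One way to read the HPL-efficiency definition above is as measured cluster performance relative to perfect linear scaling of a single instance of the same type; the sketch below encodes that reading (the linear-scaling denominator is our interpretation of the definition, and the numbers are made-up placeholders, not measured values):

```python
def hpl_efficiency(cluster_gflops: float, single_instance_gflops: float,
                   num_instances: int) -> float:
    """HPL efficiency (%) of a virtual cluster: measured cluster HPL
    performance versus ideal linear scaling of one instance."""
    ideal = single_instance_gflops * num_instances
    return 100.0 * cluster_gflops / ideal

# Placeholder numbers: a 16-instance cluster delivering 50 GFLOPS when
# one instance alone delivers 4.4 GFLOPS (one ECU's theoretical peak).
eff = hpl_efficiency(cluster_gflops=50.0, single_instance_gflops=4.4,
                     num_instances=16)
print(f"HPL efficiency: {eff:.1f}%")
```

An efficiency well below 100% on such a metric points at interconnect and virtualization overheads, which is exactly the kind of loss the evaluation in this section is designed to expose.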

Table 1 - Performance of various cloud service providers


Improving Clouds for Scientific Computing:
Tuning applications for virtualized resources. We have shown throughout Section 3.3 that there is no best-performing instance type in clouds: each instance type has preferred instruction mixes and types of applications for which it behaves better than the others. Moreover, a real scientific application may exhibit unstable behavior when run on virtualized resources. Thus, the user is faced with the complex task of choosing a virtualized infrastructure and then tuning the application for it. But is it worth tuning an application for a cloud? To answer this question, we use the hand-tuned benchmarks from CacheBench to test the effect of simple, portable code optimizations such as loop unrolling. We use the experimental setup described in Section 3.2. Figure 7 depicts the performance of the memory hierarchy when performing the Wr (write) hand-tuned then compiler-optimized benchmark of CacheBench on the c1.xlarge instance type, with 1 up to 8 benchmark processes per instance. Up to the L1 cache size, compiler optimization of the unoptimized CacheBench benchmarks leads to less than 60% of the peak performance achieved when the compiler optimizes the hand-tuned benchmarks. This indicates a big performance loss when running applications on EC2, unless time is spent to optimize the applications (high roll-in costs).
When the working set of the application falls between the L1 and L2 cache sizes, the performance of the hand-tuned benchmarks is still better, but by a lower margin. Finally, when the working set of the application is bigger than the L2 cache size, the performance of the hand-tuned benchmarks is lower than that of the unoptimized applications. Given the performance difference between unoptimized and hand-tuned versions of the same applications, and given that tuning for a virtual environment holds promise for stable performance across many physical systems, we raise the tuning of applications for cloud platforms as a future research problem. New providers seem to address most of the bottlenecks we identified in this work by providing cloud instances with high-speed interconnections, like Penguin Computing [24] with their Penguin on Demand (POD) and HPC as a Service offerings. HPC as a Service extends the cloud model by making concentrated, non-virtualized high-performance computing resources available in the cloud.
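The loop-unrolling transformation named above replaces a tight loop body with several copies per iteration, cutting loop overhead and exposing instruction-level parallelism. A structural sketch of the transformation follows (in Python for clarity only; in practice, as in CacheBench, this is done in C and by the compiler, and the payoff depends on cache behavior as the text describes):

```python
def rolled_sum(data):
    """Baseline reduction: one element per loop iteration."""
    total = 0
    for x in data:
        total += x
    return total

def unrolled_sum(data):
    """Same reduction with the loop body unrolled by a factor of 4."""
    total = 0
    n = len(data)
    i = 0
    # Main unrolled loop: process 4 elements per iteration.
    while i + 4 <= n:
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3]
        i += 4
    # Cleanup loop for the remaining 0-3 elements.
    while i < n:
        total += data[i]
        i += 1
    return total

data = list(range(10))
assert rolled_sum(data) == unrolled_sum(data) == 45
```

Both functions compute the same result; the unrolled form trades code size for fewer loop-control operations, which is the kind of portable, hand-applied optimization whose cloud payoff Figure 7 evaluates.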
V. CONCLUSION AND FUTURE WORK
With the emergence of cloud computing as the paradigm in which scientific computing is done exclusively on resources leased, only when needed, from big data centers, e-scientists are faced with a new platform option. However, the initial target of the cloud computing paradigm does not match the characteristics of scientific computing workloads. Thus, in this paper we seek to answer an important research question: is the performance of clouds sufficient for scientific computing? To this end, we perform a comprehensive performance evaluation of a large computing cloud that is already in production. Our main finding is that the performance and the reliability of the tested cloud are low. Thus, this cloud is insufficient for scientific computing at large, though it still appeals to scientists who need resources immediately and temporarily. Motivated by this finding, we have analyzed how to improve the current clouds for scientific computing, and identified two research directions which each hold good potential for improving the performance of today's clouds to the level required by scientific computing. New providers [24] seem to address these directions, and we plan to test their services to see if they can support their claims. We will extend this work with additional analysis of the other services offered by Amazon: storage (S3), database (SimpleDB), queue service (SQS), Private Cloud, and their inter-connection. We will also extend the performance evaluation results by running similar experiments on other IaaS providers and clouds, and also on other real large-scale platforms, such as grids and commodity clusters. In the long term, we intend to explore the two new research topics that we have raised in our assessment of needed cloud improvements.


REFERENCES
1. Amazon, Inc., "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2/, Dec. 2008.
2. GoGrid, "GoGrid Cloud-Server Hosting," http://www.gogrid.com, Dec. 2008.
3. A. Iosup, O.O. Sonmez, S. Anoep, and D.H.J. Epema, "The Performance of Bags-of-Tasks in Large-Scale Distributed Systems," Proc. ACM Int'l Symp. High Performance Distributed Computing (HPDC), pp. 97-108, 2008.
4. I. Raicu, Z. Zhang, M. Wilde, I.T. Foster, P.H. Beckman, K. Iskra, and B. Clifford, "Toward Loosely Coupled Programming on Petascale Systems," Proc. ACM Conf. Supercomputing (SC), p. 22, 2008.
5. A. Iosup, C. Dumitrescu, D.H.J. Epema, H. Li, and L. Wolters, "How Are Real Grids Used? The Analysis of Four Grid Traces and Its Implications," Proc. IEEE Seventh Int'l Conf. Grid Computing, pp. 262-269, 2006.
6. U. Lublin and D.G. Feitelson, "Workload on Parallel Supercomputers: Modeling Characteristics of Rigid Jobs," J. Parallel and Distributed Computing, vol. 63, no. 11, pp. 1105-1122, 2003.
7. D.G. Feitelson, L. Rudolph, U. Schwiegelshohn, K.C. Sevcik, and P. Wong, "Theory and Practice in Parallel Job Scheduling," Proc. Job Scheduling Strategies for Parallel Processing (JSSPP), pp. 1-34, 1997.
8. L. Youseff, R. Wolski, B.C. Gorda, and C. Krintz, "ISCN: Towards a Distributed Scientific Computing Environment," Proc. ISPA Workshops, pp. 474-486, 2006.
9. E. Deelman, G. Singh, M. Livny, J.B. Berriman, and J. Good, "The Cost of Doing Science on the Cloud: The Montage Example," Proc. IEEE/ACM Supercomputing (SC), p. 50, 2008.
10. M.R. Palankar, A. Iamnitchi, M. Ripeanu, and S. Garfinkel, "Amazon S3 for Science Grids: A Viable Solution?" Proc. DADC '08: ACM Int'l Workshop on Data-Aware Distributed Computing, pp. 55-64, 2008.
11. E. Walker, "Benchmarking Amazon EC2 for High-Performance Scientific Computing," ;login:, vol. 33, no. 5, pp. 18-23, Nov. 2008.
12. L. Wang, J. Zhan, W. Shi, Y. Liang, and L. Yuan, "In Cloud, Do MTC or HTC Service Providers Benefit from the Economies of Scale?" Proc. Second Workshop Many-Task Computing on Grids and Supercomputers (SC-MTAGS), 2009.
13. A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, and D. Epema, "The Grid Workloads Archive," Future Generation Computer Systems, vol. 24, no. 7, pp. 672-686, 2008.
14. D. Thain, J. Bent, A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, and M. Livny, "Pipeline and Batch Sharing in Grid Workloads," Proc. IEEE 12th Int'l Symp. High Performance Distributed Computing (HPDC), pp. 152-161, 2003.
15. S. Ostermann, A. Iosup, R. Prodan, T. Fahringer, and D.H.J. Epema, "On the Characteristics of Grid Workflows," Proc. Workshop Integrated Research in Grid Computing (CGIW), pp. 431-442, 2008.
16. The Parallel Workloads Archive Team, "The Parallel Workloads Archive Logs," http://www.cs.huji.ac.il/labs/parallel/workload/logs.html, Jan. 2009.
17. Y.-S. Kee, H. Casanova, and A.A. Chien, "Realistic Modeling and Synthesis of Resources for Computational Grids," Proc. ACM/IEEE Conf. Supercomputing (SC), p. 54, 2004.
18. A. Iosup, O.O. Sonmez, and D.H.J. Epema, "DGSim: Comparing Grid Resource Management Architectures through Trace-Based Simulation," Proc. 14th Int'l Euro-Par Conf. Parallel Processing, pp. 13-25, 2008.
