
Cloud Computing: An Overview

Abhishek Kalapatapu and Mahasweta Sarkar


San Diego State University, San Diego, California

CONTENTS
1.1 Introduction
1.2 Cloud Computing: Past, Present, and Future
1.3 Cloud Computing Methodologies
1.4 The Cloud Architecture and Cloud Deployment Techniques
1.5 Cloud Services
1.6 Cloud Applications
1.7 Issues with Cloud Computing
1.8 Cloud Computing and Grid Computing: A Comparative Study
1.9 Conclusion
1.1 Introduction
Cloud computing, with the revolutionary promise of turning computing into a 5th utility, after water, electricity, gas,
and telephony, has the potential to transform the face of Information Technology (IT), especially the aspects of
service-rendition and service management. Though there are myriad ways of defining the phenomenon of Cloud
Computing, we put forth the one coined by NIST (National Institute of Standards and Technology). According to
them, Cloud Computing is defined as “A model for enabling convenient, on-demand network access to a shared
pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be
rapidly provisioned and released with minimal management effort or service provider interaction” [84]. Loosely
speaking, Cloud computing represents a new way to deploy computing technology to give users the ability to
access, work on, share, and store information using the Internet [762]. The cloud itself is a network of data centers,
each composed of many thousands of computers working together, that can perform the functions of software on a
personal or business computer by giving users access to powerful applications, platforms, and services delivered
over the Internet. It is, in essence, a set of network-enabled services capable of providing scalable, customized
and inexpensive computing infrastructures on demand, which can be accessed in a simple and pervasive way by a
wide range of geographically dispersed users. The Cloud also assures application-based Quality-of-Service (QoS)
guarantees to its users. Thus, Cloud Computing provides users with large pools of resources in a transparent way,
along with a mechanism for managing those resources so that users can access them ubiquitously and without incurring
unnecessary performance overhead. The ideal way to describe Cloud Computing, then, would be to term it
"Everything as a Service," abbreviated as XaaS [100]. Below, we sum up the key features of Cloud Computing:
• Agility – helps in rapid and inexpensive re-provisioning of resources.
• Location Independence – resources can be accessed from anywhere and everywhere.
• Multi-Tenancy – resources are shared amongst a large pool of users.
• Reliability – dependable accessibility of resources and computation.
• Scalability – resources can be provisioned dynamically on demand, helping to avoid various bottleneck scenarios.
• Maintenance – users (companies/organizations) have less work in terms of resource upgrades and management,
which in the new paradigm are handled by the Cloud Computing service providers.
However, Cloud Computing does not consist of only one cloud. The term "Cloud" symbolizes the Internet, which is
itself a network of networks. Also, not all forms of remote computing are Cloud Computing; rather, Cloud Computing
refers to services offered by providers who might have their own systems in place. [18]
1.2 Cloud Computing: Past, Present, and Future
A hundred years ago, companies stopped generating their own power with steam engines and dynamos and plugged
into the newly built electric grid. The cheap power pumped out by such electric utilities did not just change how
businesses operated — it set off a chain reaction of economic and social transformations that brought the modern
world into existence. Today, a similar revolution is under way. Hooked up to the Internet’s global computing grid,
massive information-processing plants have begun pumping data and software code into our homes and businesses.
This time, it's computing (instead of electricity) that's turning into a utility. Nicholas Carr, in his book The
Big Switch: Rewiring the World, from Edison to Google [164], has finely portrayed this transition of the Cloud from its
past to its present form. Cloud Computing has integrated various positive aspects of different computing
paradigms, resulting in a hybrid model that has evolved gradually over the years, beginning in the 1960s when John
McCarthy presciently stated that "computation may someday be organized as a public utility" [608].
"The Cloud" is based on the idea of generating computing facilities on demand. Just as one turns on a faucet to
get water or plugs into an electric socket on the wall to get electricity, Cloud Computing intends to create a
paradigm in which most of the features and functions of today's stand-alone computers can be streamed to a user over
the Internet [764]. Further probing into the philosophy of cloud computing reveals that the concept dates back to
the era of "Mainframes," where the resources (such as memory and computational capability) of powerful centralized
computers owned by large organizations were shared by several users over a small geographical area.
Today, Cloud Computing boasts an architecture in which the powerful computers are replaced by supercomputers,
and perhaps even networks of supercomputers, and the users are dispersed over vast geographic areas, accessing
these computing resources via the Internet (the network of networks). In the past, issues like dearth of bandwidth,
perception, loss of control, trust and feasibility proved to be major roadblocks in realizing the Cloud concept of
service rendition. Today most of these challenges have been overcome, or countermeasures are in place to resolve
them. Faster bandwidth, virtualization, and more mature skills around cloud-type technologies help in
realizing the Cloud Computing paradigm.
The concept of Cloud Computing, nascent as it might appear, has itself undergone significant evolution.
The first generation of Cloud Computing, which evolved along with the "Internet era," was mainly intended for
"e-business services." The current generation of Cloud services has progressed several steps to now include "IT as a
Service," which can be thought of as "consumerized Internet services." Using standardized, highly virtualized
infrastructure and applications, IT can drive higher degrees of automation and consolidation, thus reducing the cost
of maintaining existing solutions and delivering new ones. In addition, externally supplied infrastructure, software,
and platform services are delivering capacity augmentation and a means of using operating expense funding instead
of heavy capital expenditure. [388]

FIGURE 1.1
History of Computing.
Today, "the network is the (very big, very powerful) computer." The capabilities of the Cloud as a centralized
resource can match industrial scales. This implies that the processing power of thousands of machines
embedded in a network has surpassed even the capabilities of the very high-performance supercomputers. By
making this technology available through the network on an on-demand and as-needed basis, the Cloud holds the
promise of giving individuals, businesses, organizations and governments around the world access to extraordinary
computing power from any location and any device. It is a fact that data and information are growing at a breakneck
pace. HP predicted that within three more years (by 2014) there would be more information produced and consumed
than in the history of mankind. Thus, the next version of Cloud will enable access to information through services
that are set in the context of the consumer experience. This is significantly different — it means that data will be
separated from the applications — a paradigm where processes can be broken into smaller pieces and automated
through a collection of services, woven together with access to massive amounts of data. It will eliminate the need
for large scale, complex applications that are built around monolithic processes. Changes can be accomplished by
refactoring service models, and integration achieved by subscribing to new data feeds. This will create new
connections, new capabilities, and new innovations surpassing those that exist today. [388]. Thus it is envisioned
that by 2020 most people will access software applications online and share and access information through the use
of remote server networks, rather than depending primarily on tools and information housed on their individual,
personal computers. It is predicted that cloud computing will become more dominant than the desktop in the next
decade [84]. In other words, most users will perform the majority of their computing and communicating activities
through connections to servers that will be operated and owned by service-providing organizations. Most
technologists and industrialists believe that cloud computing will continue to expand and come to dominate
information transactions because it offers many advantages, allowing users to have easy, instant, and individualized
access to tools and information they need wherever they are, from any networked device. To validate this
claim, the Pew Internet & American Life Project carried out a survey with a highly diverse population
set. 71% of the survey takers believed that by 2020 most people will not do their work with software running on a
general-purpose PC; instead, they will work in Internet-based applications such as Google Docs, and in
applications run from smartphones [630]. However, quality of service guarantees, interoperability between existing
working platforms and security concerns are some of the issues that still continue to plague the growing popularity
of Cloud computing.
FIGURE 1.2
Clouds Past, Present, and Future.
1.3 Cloud Computing Methodologies
Cloud Computing is based on two main techniques — (i) Service Oriented Architecture and (ii) Virtualization.
(i) Service Oriented Architecture (SOA): Since the Cloud computing paradigm treats every task accomplished
as a "Service" rendered to users, it is said to follow the Service Oriented Architecture. This
architecture comprises a flexible set of design principles used during the phases of system development and
integration. The deployment of an SOA-based architecture provides a loosely-integrated suite of services that can
be used within multiple business domains. The enabling technologies in SOA allow services to be discovered,
composed, and executed. For instance, when an end-user wishes to accomplish a certain task, a service can be
employed to discover the required resources for the task. This is followed by a composition service, which
plans the road-map to provide the desired functionality and quality of service to the end-user. [579,761]
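To make the discover-compose-execute idea concrete, the following is a minimal, illustrative Python sketch. It is not taken from any particular SOA toolkit; the registry contents, service names, and the compose_task function are hypothetical, and real SOA deployments would use a proper service registry and messaging layer.

# Illustrative sketch of SOA-style discovery and composition.
# The registry contents and service callables are hypothetical.
from typing import Callable, Dict, List

# A toy service registry: capability name -> service implementation.
SERVICE_REGISTRY: Dict[str, Callable[[str], str]] = {
    "fetch_data": lambda query: f"raw data for '{query}'",
    "analyze": lambda data: f"analysis of [{data}]",
    "report": lambda analysis: f"report: {analysis}",
}

def discover(capability: str) -> Callable[[str], str]:
    """Discovery service: look up a provider for the requested capability."""
    try:
        return SERVICE_REGISTRY[capability]
    except KeyError:
        raise LookupError(f"no service offers capability '{capability}'")

def compose_task(capabilities: List[str], user_input: str) -> str:
    """Composition service: chain discovered services into one workflow."""
    result = user_input
    for capability in capabilities:
        service = discover(capability)
        result = service(result)
    return result

if __name__ == "__main__":
    # An end-user request flows through fetch -> analyze -> report.
    print(compose_task(["fetch_data", "analyze", "report"], "patient vitals"))

The point of the sketch is only the pattern: capabilities are looked up at run time and chained, rather than hard-wired into the client application.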
(ii) Virtualization: The concept of virtualization is to relieve the user from the burden of resource purchases and
installations. The Cloud brings the resources to the users. Virtualization may refer to Hardware (execution of
software in an environment separated from the underlying hardware resources), Memory (giving an application
program the impression that it has contiguous working memory, isolating it from the underlying physical memory
implementation), Storage (the process of completely abstracting logical storage from physical
storage), Software (hosting of multiple virtualized environments within a single Operating System (OS)
instance), Data (the presentation of data as an abstract layer, independent of underlying database systems, structures
and storage) and Network (creation of a virtualized network addressing space within or across network subnets)
[486]. Virtualization has become an indispensable ingredient for almost every Cloud, the most obvious reasons
being the ease of abstraction and encapsulation. Among the other important reasons for which Clouds tend to
adopt virtualization are:
(i) Server and application consolidation – since multiple applications can be run on the same server, resources can be
utilized more efficiently.
(ii) Configurability – since the resource requirements of various applications can differ significantly (some require
large storage, others require higher computation capability), virtualization is the only way to achieve the customized
configuration and aggregation of resources that is not achievable at the hardware level.
(iii) Increased application availability – virtualization allows quick recovery from unplanned outages, as virtual
environments can be backed up and migrated with no interruption in service.
(iv) Improved responsiveness – resource provisioning, monitoring and maintenance can be automated, and common
resources can be cached and reused. [784]
In addition, these benefits of virtualization facilitate the Cloud in meeting stringent SLA (Service Level
Agreement) requirements in a business setting, which otherwise cannot be easily achieved in a cost-effective
manner. Without virtualization, systems have to be over-provisioned to handle peak load and hence waste valuable
resources during idle periods.
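As a back-of-the-envelope illustration of that last point, the short Python sketch below compares the number of servers needed when each application is provisioned for its own peak against a consolidated, virtualized pool. The capacity figure, the workload numbers, and the assumption that peaks do not all coincide are purely hypothetical, chosen only to show the shape of the argument.

# Hypothetical illustration of over-provisioning vs. consolidation.
import math

SERVER_CAPACITY = 100          # arbitrary capacity units per physical server
# (peak demand, average demand) per application, in the same units
apps = [(80, 20), (60, 15), (70, 10), (50, 25)]

# Without virtualization: each app gets dedicated servers sized for its peak.
dedicated = sum(math.ceil(peak / SERVER_CAPACITY) for peak, _ in apps)

# With virtualization: apps share a pool sized for the combined demand,
# assuming (hypothetically) that their peaks do not all coincide.
combined_peak = max(sum(avg for _, avg in apps),
                    max(peak for peak, _ in apps))
consolidated = math.ceil(combined_peak / SERVER_CAPACITY)

print(f"Dedicated servers needed:    {dedicated}")
print(f"Consolidated servers needed: {consolidated}")

Under these invented numbers the dedicated approach needs four servers while the consolidated pool needs one, which is the essence of the idle-capacity waste described above.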
1.4 The Cloud Architecture and Cloud Deployment Techniques
Geared with the knowledge of SOA and Virtualization, we now take a look at the overall Cloud architecture. From
the end user's perspective, Figure 1.3 depicts a basic Cloud Computing architecture involving multiple components.
Cloud architecture closely resembles the UNIX philosophy of involving multiple components which work together
over universal interfaces [762]. Recall that the Cloud computing paradigm represents a Service oriented mechanism
of managing and dispatching resources. Before we delve into studying the actual architecture of Cloud computing it
will be beneficial to examine the possible characteristics that will be required to realize such a system. It is common
knowledge that the architectural requirements of the Cloud will vary depending on the application for which the
Cloud is being used. For instance, social networking applications like Facebook and Orkut will have a very
different set of requirements, constraints and deliverables from the architecture in comparison to, say, a remote
patient health monitoring application. However, some common architectural characteristics can still be identified.
For instance, (i) the system should be scalable with the ability to include thousands to perhaps tens of thousands of
members. (ii) It should be able to interoperate between various service requirements and effectively share resources
amongst its users. (iii) The system should be easy to maintain and upgrade, maintaining user transparency during
these processes. (iv) As outlined earlier, managing resources like servers and storage devices virtually, thereby
creating a virtual organization, is absolutely crucial.
FIGURE 1.3
Basic Cloud Computing Architecture [86].
To mitigate the problem of designing a customized Cloud architecture for each and every application, and also to
streamline the architecture design process of the Cloud, scientists resorted to the age-old concept of a generalized
"layered approach" [163]. As with the classical 7-layer OSI model of data networks [759], the layered model in
Cloud computing serves the same general purpose. Depending on the service requirements of an application, these
layers are shuffled to create a customized architecture. The layered architecture adheres to the principle of Service
Oriented Architecture (SOA), which forms the core of the Cloud computing paradigm. The components of a basic
layered architecture are shown in Figure 1.4 below [630], namely the Client, its required Services, the
Applications that the Client runs, the Platform on which these applications run, the Storage requirement, and finally
the Infrastructure required to support the Client's computing needs.

FIGURE 1.4
Layered Architecture for a Customized Cloud Service [760].
We devote a few sentences to each component below.
Clients: the Clients of a Cloud comprise computer hardware and/or computer software that relies on the
computational capability of the Cloud for application or service delivery. Examples include computers, mobile
devices, operating systems, and browsers.
Services: this refers to the different service models made available by the Cloud like SaaS (Software-as-a-Service),
IaaS (Infrastructure-as-a-Service) and PaaS (Platform-as-a-Service). This layer acts as a middleman between the
user and the vast amount of resources accessible to the user. A resource includes “products, services and solutions
that are delivered and consumed in real time over the Internet” [630]. Examples include location services and
Search Engines among others.
Application: the Cloud enables resource management and user activity tracking from central locations rather than at
each customer’s site, enabling customers to access applications remotely via the Internet. Cloud application services
deliver software as a service over the Internet, eliminating the need to install and run the application on the
customer’s own computer, thereby simplifying (almost eliminating) maintenance and support at the customer’s end.
Examples include Web Application and Peer-to-Peer computing.
Platform: it facilitates deployment of applications without the cost and complexity of buying and managing the
underlying hardware and software layers. This layer delivers a computing platform and/or solution stack as a
service, often consuming Cloud infrastructure and sustaining Cloud applications. Examples include Web
Application Frameworks like Ruby on Rails and Web Hosting.
Storage: the storage layer consists of computer hardware and/or computer software products that are specifically
designed for the storage of Cloud services. Computer hardware comprises huge data centers that are used for
resource sharing. Examples include Amazon SimpleDB and Nirvanix SDN (Storage Delivery Network).
Infrastructure: this layer delivers computer infrastructure, typically a platform virtualization environment as a
service. It includes management of virtual resources too. Rather than purchasing servers, software, data center space
or network equipment, clients instead buy those resources as a fully outsourced service. Examples include Network
Attached Storage and Database services.
The main advantage of such a layered architecture is the ease with which it can be modified to suit a particular
service. The way in which these components interact leads to various architectural styles. There are two basic
architectural styles on which most of the services are based:
• Outside-In: This architectural style is inherently a top-down design emphasizing the functionality of the
components. Implementing this style leads to better architectural layering with various functionalities. It brings
more flexibility, enabling better integration and interoperation of components.
• Inside-Out: This architectural style, on the other hand, is inherently a bottom-up design which takes an
infrastructural point of view of the components. This style is more application oriented than service oriented. [631]
It is to be noted that incorporating new functionalities into a pre-existing architectural scheme is done in an
incremental fashion. The ease of transforming an existing architecture into another depends on the complexity of
the architecture, the functionalities of the components and their integration. The vast landscape of services and
their growing complexity has led to the implementation of innovative architectural styles and several hybrid
architectures. [221,631]
Cloud Deployment Techniques
Cloud deployment is the manner in which a Cloud is designed to provide a particular service. These
deployment methods vary according to the way in which a Cloud provides service to its users; thus, the
deployment technique is user specific [494]. For instance, a deployment technique might depend on the level of
security commissioned for a particular user.
Figure 1.5 depicts the various Cloud deployment techniques which predominantly comprise (i) the Public
deployment, (ii) the Private deployment or (iii) the Hybrid deployment. We discuss each of these deployment
strategies briefly below.
(i) Public Cloud: it is the traditional mainstream Cloud deployment technique whereby resources are dynamically
provisioned by third-party providers who share them with the users and bill the users on a fine-grained utility
computing basis. It offers easy resource management, scalability and flexibility with an economical pay-as-you-go
model, which is extremely viable especially for small businesses. On the negative side, the user lacks visibility and
control over the computing infrastructure. Since computing infrastructures are shared between various
organizations, these Clouds face various security and compliance issues. Amazon's Web Services and Google's
AppEngine are a few examples of Public Clouds, which are also known as external Clouds. [630]

FIGURE 1.5
Cloud Deployment Techniques [325].
(ii) Private Cloud: in this Cloud deployment technique, the computing infrastructure is solely dedicated to a
particular organization or business. These Clouds are more secure because they belong exclusively to a particular
organization [494]. These Clouds are more expensive because one needs in-house expertise for their maintenance.
Private Clouds are further classified based on their location as:
(a) On-Premise Clouds – these refer to Clouds that are for a particular organization hosted by the organization
itself. Examples of such Clouds would include Clouds related to military services which have a considerable
amount of confidential data.
(b) Externally hosted Clouds – these refer to Clouds that are also dedicated to a particular organization but are
hosted by a third party specializing in Cloud infrastructure. These are cheaper than On-Premise Clouds. Examples of
such Clouds would be small businesses using services from VMware, Amazon, etc. (Private Clouds as a whole are also
known as Internal Clouds.) [631]
(iii) Hybrid Cloud: this deployment technique integrates the positive attributes of both the Public Cloud and
Private Cloud paradigms. For instance, in a Hybrid Cloud deployment, critical services with stringent security
requirements may be hosted on Private Clouds while less critical services are hosted on Public Clouds. The
criticality, flexibility and scalability requirements of a service govern its classification into either the Public or
Private Cloud domain. Each Cloud in the Hybrid domain retains its unique identity; however, the Clouds function
synchronously to gracefully accommodate any sudden rise in computing requirements. Hybrid Cloud deployment is
definitely the current trend amongst the major leading Cloud providers [630].
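The placement decision just described can be illustrated with a small, purely hypothetical Python sketch; the criticality scale, the threshold, and the service list below are invented for illustration and do not reflect any real provider's policy.

# Hypothetical sketch: routing services to private vs. public clouds
# based on criticality and security requirements (hybrid deployment).
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    criticality: int           # 1 (low) .. 5 (high), invented scale
    handles_sensitive_data: bool

def placement(service: Service) -> str:
    """Return 'private' for critical/sensitive services, else 'public'."""
    if service.handles_sensitive_data or service.criticality >= 4:
        return "private"
    return "public"

services = [
    Service("patient-records", criticality=5, handles_sensitive_data=True),
    Service("marketing-site", criticality=1, handles_sensitive_data=False),
    Service("batch-analytics", criticality=3, handles_sensitive_data=False),
]

for s in services:
    print(f"{s.name:16s} -> {placement(s)} cloud")

In a real hybrid deployment this policy would also weigh flexibility and scalability requirements, as noted above, and the two clouds would still cooperate to absorb sudden spikes in demand.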
FIGURE 1.6
Types of Cloud Services [569].
(iv) Community Cloud: this deployment technique is similar to Public Clouds with the only difference being the
distribution of the sharing rights on the computing resources. In a Community Cloud, the computing resources are
shared amongst organizations of the same community. So this Cloud covers a particular group of organizations,
which have the same functionalities. For example, all Government organizations within the state of California may
share the computing infrastructure on the Cloud to manage data related to citizens residing in California [608].
1.5 Cloud Services
The Cloud can provide us with a myriad of service models and services. They include SaaS (Software as a Service),
PaaS (Platform as a Service), HaaS (Hardware as a Service), DaaS ([Development, Database, Desktop] as a
Service), IaaS (Infrastructure as a Service), BaaS (Business as a Service), FaaS (Framework as a Service), and OaaS
(Organization as a Service), amongst others [100]. However, Cloud Computing products can be broadly classified
into three main Services (SaaS, PaaS and IaaS), which are showcased in Figure 1.6 along with their relationship to a
user (Enterprise). The following section attempts to familiarize the reader with the several different Cloud
Services that are currently rendered:
Infrastructure-as-a-Service (IaaS): This service provisions hardware-related services like storage and virtual
servers on a pay-as-you-go basis. The main advantage of IaaS is the use of the latest technology at all times with
regard to computer infrastructure, which allows users to achieve faster service. Organizations can use IaaS to quickly
build new versions of applications or environments without incurring unnecessary purchase and configuration delay.
On-demand scaling via resource virtualization and use-based billing makes IaaS competent enough for any kind of
business. The major companies already providing IaaS are Amazon [57,58], Rackspace, GoGrid, AT&T and IBM.
[367]
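As a concrete flavor of how IaaS is typically consumed programmatically, here is a minimal sketch using the boto3 Python SDK against Amazon EC2. It assumes AWS credentials and a region are already configured, and the AMI ID shown is a placeholder rather than a real image.

# Minimal IaaS example: start and stop a virtual server on Amazon EC2.
# Assumes boto3 is installed and AWS credentials/region are configured.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one small instance from a (placeholder) machine image.
response = ec2.run_instances(
    ImageId="ami-00000000000000000",   # placeholder AMI ID, not a real image
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched instance:", instance_id)

# Pay-as-you-go: release the resource when it is no longer needed.
ec2.terminate_instances(InstanceIds=[instance_id])
print("Terminated instance:", instance_id)

The same on-demand create/release pattern, exposed through provider APIs, is what allows organizations to spin up new environments without purchase and configuration delay.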
Platform-as-a-Service (PaaS): PaaS offerings may include facilities for application design, application
development, testing, deployment and hosting, as well as application services such as team collaboration, web
service integration and marshalling, database integration, security, scalability, storage, persistence, state
management, application versioning, application instrumentation and developer community facilitation. These
services may be provisioned as an integrated solution over the web, providing an existing managed higher-level
software infrastructure for building particular classes of applications and services. The platform includes the use of
underlying computing resources, typically billed similarly to IaaS products, although the infrastructure is abstracted
away below the platform. Major PaaS offerings include Google's AppEngine [92], Microsoft Azure,
and www.Force.com. [608,630,631]
Software-as-a-Service (SaaS): Provides specific, already created applications as fully or partially remote services.
Sometimes these take the form of web-based applications; at other times they consist of standard non-remote
applications with Internet-based storage or other network interactions. SaaS allows a user to use the provider's
application through a thin client interface. Users can access a software application hosted by the Cloud vendor on a
pay-per-use basis [527]. It is a multi-tenant platform. The pioneer in this field has been www.Salesforce.com, offering
the online Customer Relationship Management (CRM) space. Other examples are online email providers like Google's
Gmail and Microsoft's Hotmail, Google Docs, and Microsoft's online version of Office called BPOS (Business
Productivity Online Standard Suite). [16,608,630,631]
Beyond the above services, David Linthicum has described a more granular classification of services, which
includes:
Storage-as-a-Service (SaaS): Storage as a Service is a business model that helps a smaller company or individual
rent storage space from a larger company. Storage as a Service is generally seen as a good alternative for
a small or mid-sized business that lacks the capital budget and/or technical personnel to implement and maintain
its own storage infrastructure. It is also being promoted as a way for all businesses to mitigate risks in disaster
recovery, provide long-term retention of records, and enhance both business continuity and availability. Examples
include Nirvanix and Cleversafe's dsNET.
Database-as-a-Service (DbaaS): It constitutes delivery of database software and related physical database storage
as a service. A managed service, offered on a pay-per-usage basis that provides on-demand access to a database for
the storage of application data is what constitutes DbaaS. Examples include Amazon, www.Force.com etc.
Information-as-a-Service (IfaaS): Information as a service accepts the idea that data resides within many systems
and repositories. Its main function is to standardize the access of data by applying a standard set of transformations
to the various sources of data thus enabling service requestors to access the data regardless of vendor or system.
Examples include IBM, Microsoft, etc.
Process-as-a-Service (PraaS): Refers to a remote resource that is able to bind many resources together, either hosted
within the same Cloud computing resource or remotely, to create business processes. These processes are typically
easier to change than applications, and thus provide agility to those who leverage these process engines delivered
on-demand. Process-as-a-Service providers include Appian Anywhere, Akemma, and Intensil.
Integration-as-a-Service (InaaS): Integration-as-a-Service includes most of the features and functions found
within traditional Enterprise Application Integration technology, but delivered as a service. Integration-as-a-Service
takes the functionality of system integration and puts it into the Cloud, providing for data transport between the
enterprise and SaaS applications or third parties. Examples include Amazon SQS, OpSource Connect, Boomi, and
Mule OnDemand.
Security-as-a-Service (SeaaS): Delivers core security services remotely over the Internet like anti-virus, log
management etc. While typically the security services provided are rudimentary, more sophisticated services are
becoming available such as identity management. Security-as-a-Service providers include Cisco, McAfee, Panda
Software, Symantec, Trend Micro and VeriSign.
Management/Governance-as-a-Service (MaaS): Provides the ability to manage one or more Cloud services,
typically simple things such as topology, resource utilization, virtualization, and uptime management. Governance
systems are becoming available as well, such as the ability to enforce defined policies on data and services.
Management/Governance-as-a-Service providers include RightScale, rPath, Xen, and Elastra.
Testing-as-a-Service (TaaS): These systems have the ability to test other Cloud applications, Web sites, and
internal enterprise systems, and do not require a hardware or software footprint within the enterprise. They also have
the ability to test services that are remotely hosted. SOASTA is one of the many Testing-as-a-Service providers
[608,631].
1.6 Cloud Applications
Cloud Applications are not only developed with a business perspective but also take into account activities oriented
towards socializing and sharing information. This information may be as basic as news headlines or more
sensitive in nature, such as searches for health or medical information. Thus Cloud Computing is often a better
alternative than local servers for handling such applications.
Virtualization is the main basis of Cloud Computing, and thus the Cloud is fully enabled for virtual appliances. A
virtual appliance is an application which has all its components bundled and streamlined with the operating system [762].
The main advantage of a Cloud Computing application is that the provider can run various instances of an
application with minimum labor and expense. A Cloud service provider needs to anticipate a few issues before
launching its application in the Cloud computing environment. Keeping these issues in mind, an application must be
designed to scale easily, tolerate failures and include management tools [388]. We discuss these issues in the
following section.
• Scale: An application in a Cloud environment needs maximum scalability, and to ensure this one should
start building the application in the simplest manner, avoiding complex design patterns and enhancements. The next
step would be to split the functions of an application and integrate them loosely. The most important step in
ensuring on-demand scalability of an application is sharding, which can be described as splitting up the system into
many smaller clusters instead of scaling a single system up to serve all users (a small sketch of this idea follows this list).
• Failures: Any application, for one reason or another, is bound to face failure at some point. To tolerate
failures, an application must operate in an asynchronous fashion, and one should spread the load across multiple
clusters so that the impact of a failure is distributed. The best way to tolerate failures is to test the
application against all kinds of failure scenarios; users should also be aware of the real cost incurred if an
application faces any kind of failure.
• Management Tools: Having a proper management tool helps in automating application configuration and
updates, thereby reducing management overhead. A management system not only minimizes economic
expenditure but also optimizes the usage of resources. The most difficult and expensive problem that can be
handled with a proper management tool is variability [795]. This helps in delivering an application that can boast of
consistent performance.
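The following is a minimal, illustrative Python sketch of the sharding idea mentioned in the first bullet above; the shard count and the hash-based user-to-cluster mapping are hypothetical choices, not a prescription from any particular Cloud provider.

# Illustrative sketch: hash-based sharding of users across small clusters.
import hashlib

NUM_SHARDS = 4  # hypothetical number of independent clusters

def shard_for(user_id: str) -> int:
    """Map a user to one of NUM_SHARDS clusters deterministically."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def route_request(user_id: str, payload: str) -> str:
    """Send the request to the cluster that owns this user's data."""
    shard = shard_for(user_id)
    # In a real deployment this would call the service endpoint of that
    # cluster; here we simply report the routing decision.
    return f"user {user_id} -> shard {shard}: handled '{payload}'"

for uid in ["alice", "bob", "carol"]:
    print(route_request(uid, "load profile"))

Because each shard serves only a fraction of the users, a failure or overload in one cluster affects only that fraction, which also supports the fault-distribution point made in the second bullet.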

FIGURE 1.7
Cloud Applications [569].
Applications in Cloud Computing have been designed keeping in mind the various types of services it offers.
Cloud Computing has impacted people's perception of applications over the Internet. The applications of Cloud
computing extend to virtually every field of science and industry. In Figure 1.7 we have highlighted a few areas in
which Cloud Computing has shown great potential for developing applications that have fostered the
growth of businesses and enterprises.
These applications can be categorized into broad areas like datacenters and storage, security, military applications,
the health industry, platforms and software usage, applications with high performance computing resources, and,
last but not least, the growing need for virtual environments. [221,494]
1.7 Issues with Cloud Computing
So far we have focused on the promises that the Cloud holds for users and applications alike. However, there are
issues that need to be resolved before this technology can be exploited to its maximum potential [238]. In the
section below we enumerate and discuss the burning issues which might deter the phenomenal growth of Cloud
Computing technology.
(i) Security Issues: Security is as much of a concern in Cloud computing as in any other computing
paradigm. Cloud Computing can be loosely described as outsourcing of services, which in turn causes users to lose
significant control over their data. There is always a risk of seizure associated with public Clouds. For instance,
an organization sharing data in an environment where other organizations are doing the same is always under the
threat of compromising the security and privacy of its data if any other organization in the shared scheme happens to
violate the security protocols [390]. Moreover, in a virtualized environment one needs to consider the security of not
just the physical host but also the virtual machines, because if the security of a physical host is compromised,
then all the virtual machines it hosts automatically face a security threat, and vice versa. Since the majority of
services in Cloud computing are provided through web browsers, the many security issues associated with browsers
apply as well [392]. Flooding is also a major issue, whereby an attacker sends huge numbers of illegitimate service
requests that cause the system to slow down, thereby hampering the performance of the overall system. Cloud networks
face the potential threat of both Indirect Denial of Service attacks and Distributed Denial of Service attacks.
(ii) Legal and Compliance Issues: Clouds are sometimes bound by geographical boundaries. Although the provision of
services is not location-dependent, this very flexibility exposes Clouds to legal and compliance issues.
These issues mainly concern the vendors, though they still affect the end users. They are broadly
classified as functional (services in the Clouds that have legal implications for both service providers and end users),
jurisdictional (governments administering the laws to be followed) and contractual (terms and conditions). Specific
issues include (a) physical location of the data, referring to where the data is physically located and, if a dispute
occurs, which jurisdiction will help in resolving it; (b) responsibility for the data, i.e., if a vendor is hit by a
disaster, will the businesses using its services be covered by insurance; and (c) Intellectual Property Rights, which
deal with the way trade secrets are maintained [732].
(iii) Performance and QoS Related Issues: For any computing paradigm, performance is of utmost importance.
Quality of Service (QoS) requirements vary as user requirements vary. One of the critical QoS-related issues is how
commercial success can be achieved using Cloud computing in an optimized way. If a provider is not able to deliver the
promised QoS, it may tarnish its reputation [238]. Since Software-as-a-Service (SaaS) deals with the provision of
software on virtualized resources, one faces memory and licensing constraints which directly hamper
the performance of a system.
(iv) Data Management Issues: The main purpose of Cloud Computing is to put the entire data on the Cloud with
minimum infrastructure requirements for the end users. The main issues related to data management are scalability
of data, storage of data, data migration from one Cloud to another, and the different architectures for resource
access [127]. Since data in Cloud computing can include highly confidential information, it is of utmost importance
to manage it effectively. There has been an instance where an online storage service called The Linkup was
shut down after losing access to as much as 45% of its customers' data. Data migration within a Cloud has to be done
very carefully, as huge chunks of data are involved and careless transfers could create bottlenecks at every layer of
the network model [762].
(v) Interoperability Issues: The Cloud computing interoperability idea was conceived by Reuven Cohen, founder and
Chief Technologist of Toronto-based Enomaly Inc. Vint Cerf, a co-designer of the Internet's TCP/IP standards and
widely considered a father of the Internet, spoke about the need for data portability standards for Cloud computing.
Companies such as Microsoft, Amazon, IBM, and Google all own independent Clouds, but these Clouds lack
interoperability amongst themselves. Each service provider has its own architecture, which caters to
a specific application. Making such uniquely distinct Clouds interoperate is a non-trivial problem, and the lack of
standardized protocols in the domain of Cloud computing makes interoperability an even greater challenge. The key
issues hampering the implementation of interoperable Clouds are the large-scale access and computational capabilities
of the Clouds, resource contention, and the dynamic nature of the Cloud. However, interoperability amongst the various
Clouds would only add to the value of this technology, making it more widely accessible, fault tolerant, and thereby
robust.
1.8 Cloud Computing and Grid Computing: A Comparative Study
Cloud computing should not be confused with Grid Computing, Utility Computing and Autonomic Computing.
Grid computing consists of clusters of loosely coupled, networked computers acting in concert to perform very large
tasks. Utility computing packages computing resources as a metered service. Autonomic computing stresses self
management of resources. Grids require many computers, typically in the thousands, and commonly use servers,
desktops, and laptops. Clouds also support non-grid environments [284]. The differences between Utility computing
and Cloud computing are crucial. Utility computing relates to the business model in which application,
infrastructure, resources, hardware and/or software are delivered. In contrast, Cloud computing relates to the way
we design, build, deploy and run applications that operate in a virtualized, shared-resource environment
accompanied by the coveted ability to dynamically grow, shrink and self-heal. Thus the major aspects which
separate Cloud Computing from other computing paradigms are user centric interfaces, on-demand services, QoS
guarantees, autonomous system organization, scalability and flexibility of services [369].
In the mid 1990s, the term “Grid” was coined to describe technologies that would allow consumers to obtain
computing power on demand. Ian Foster and others posited that by standardizing the protocols used to request
computing power, we could spur the creation of a Computing Grid, analogous in form and utility to the electric
power grid. Figure 1.8 showcases the overlap and distinctions amongst several cutting edge computing technologies
[369]. Web 2.0 covers almost the whole spectrum of service-oriented applications, whereas Cloud Computing lies
mostly on the large-scale computing domain. Supercomputing and Cluster computing have been more focused on
traditional non-service applications. Grid Computing overlaps with all these fields and is generally considered to be
catering to smaller scaled computing requirements than the Supercomputers and the Clouds [284].
FIGURE 1.8
Cloud and Grid Computing [283].
Half a decade ago, Ian Foster gave a three point checklist to help define what is, and what is not a Grid. We present
his checklist below:
1.) Coordinates resources that are not subject to centralized control,
2.) Uses standard, open, general-purpose protocols and interfaces, and
3.) Delivers non-trivial qualities of service.
Although the third point holds true for Cloud Computing, neither point one nor two is applicable for today’s Clouds.
The vision for Clouds and Grids are similar but the implementation details and technologies used differ
considerably. In the following section, we discuss the differences in these two technologies based on their
architecture, business model, and resource management techniques.
Architecture
Grids provide protocols and services at five different layers as identified in the Grid protocol architecture shown
in Figure 1.9.
FIGURE 1.9
Cloud Architecture vs. Grid Architecture [283].
At the fabric layer, Grids provide access to different resource types such as computational, storage and network
resources, code repositories, etc. The connectivity layer defines core communication and authentication protocols for
easy and secure network transactions. The resource layer defines protocols for the publication, discovery,
negotiation, monitoring, accounting and payment of sharing operations on individual resources. The collective
layer captures interactions across collections of resources and directory services. The application layer comprises
whatever user applications are built on top of the above protocols and API environments. [284,370]
In contrast, the Cloud architecture presents a four-layer architecture composed of the (i) Fabric, (ii) Unified Resource,
(iii) Platform and (iv) Application layers. (It is to be noted that there are other paradigms of cloud architecture as well.
We, however, choose to focus on the 4-layer architecture for our discussion.)
The Fabric layer contains the raw hardware level resources, such as computational resources, storage resources, and
network resources. The Unified Resource Layer contains resources that have been abstracted/encapsulated (usually
by virtualization) so that they can be exposed to upper layer and end users as integrated resources, for instance, a
virtual computer/cluster, a logical file system, a database system, etc. The Platform layer adds a collection of
specialized tools, middleware and services on top of the unified resources to provide a development and/or
deployment platform. Finally, the Application layer contains the applications that run in the Clouds. Clouds in
general provide services at three different levels, namely IaaS, PaaS, and SaaS, although some providers can choose
to expose services at more than one level. [284]
Resource Management
With regard to resource management we compare Clouds and Grids using the following models:
(i) Computational Model: Most Grids use a batch-scheduled computational model, in which a local resource
manager (LRM), such as PBS, Condor or SGE, manages the computational resources for a Grid site. Users submit
batch jobs via GRAM that occupy a certain amount of resources for a stipulated period of time [787]. In contrast
to the Grid's paradigm of allocating dedicated resources to specific "jobs" for a stipulated amount of time, with a
scheduling algorithm governing the process, the Cloud computing model takes an approach of resource sharing by all
users at all times [371]. The goal here (in Cloud computing) is to achieve low queuing latency for resource access.
Moreover, this design should allow latency-sensitive applications to operate natively on Clouds, although ensuring
the required level of QoS is likely to be one of the major challenges for Cloud Computing, especially as the Cloud
scales to incorporate more users. [284]
(ii) Data Model: The importance of data has caught the attention of the Grid community over the past decade; Data
Grids have been specifically designed to tackle data-intensive applications in Grid environments, with the concept
of virtual data playing a crucial role. Virtual data captures the relationship between data, programs, and
computations and prescribes various abstractions that a data grid can provide. For instance, (a) location transparency,
where data can be requested without regard to data location; a distributed metadata catalog is engaged to keep track
of the locations of each piece of data (along with its replicas) across grid sites, and privacy and access control are
enforced; (b) materialization transparency, where data can be either recomputed on the fly or transferred upon
request, depending on the availability of the data and the cost to re-compute [378]. There is also representation
transparency, where data can be consumed and produced regardless of their actual physical formats and storage;
data are mapped into some abstract structural representation and manipulated in that way. In contrast, the Cloud
computing paradigm is centered around data. Cloud Computing and Client Computing will coexist and evolve
hand in hand, while data management (mapping, partitioning, querying, movement, caching, replication, etc.) will
become more and more important for both Cloud Computing and Client Computing with the increase of data-
intensive applications. [163]
(iii) Virtualization: Grids do not rely on virtualization as much as Clouds do, but that might be more due to policy
issues and having each individual organization maintain full control of their resources (i.e., not by virtualization).
However, there are efforts in Grids to use virtualization as well, such as Nimbus, which provides the same
abstraction and dynamic deployment capabilities. A virtual workspace is an execution environment that can be
deployed dynamically and securely in the Grid. In addition, Nimbus can also provision a virtual cluster for Grid
applications (e.g., a batch scheduler, or a workflow system), which is also dynamically configurable — a growing
trend in Grid Computing. [284]
(iv) Monitoring: Another challenge that virtualization brings to the Clouds is the potential difficulty in retaining
fine grained control over the monitoring of resources. Although many Grids (such as TeraGrid) also enforce
restrictions on what kind of sensors or long-running services a user can launch, Cloud monitoring is not as
straightforward as in Grids, because Grids in general have a different trust model, in which users, via their identity
delegation, can access and browse resources at different Grid sites, and Grid resources are not as highly abstracted and
virtualized as in Clouds. Monitoring can be argued to be less important in Clouds, as users interact with a more
abstract layer that is potentially more sophisticated. This abstract layer could respond to failures and quality of
service (QoS) requirements automatically in a general-purpose way irrespective of application logic. In the near
future, user-end monitoring might be a significant challenge for Clouds, but it will become less important as Clouds
become more sophisticated and more self-maintained and self-healing.
(v) Programming Model: Grids primarily target large-scale scientific computations, so they must scale to leverage
large numbers of resources, and we also naturally want programs to run fast and efficiently in
Grid environments. Programs must also run correctly, so reliability and fault tolerance must be considered. Clouds
(such as Amazon Web Services and Microsoft's Azure Services Platform) have generally adopted Web Services APIs,
where users access, configure and program Cloud services using predefined APIs exposed as Web services, with
HTTP and SOAP being the common protocols chosen for such services. Although Clouds have adopted some common
communication protocols such as HTTP and SOAP, the integration and interoperability of all the services and
applications remain the biggest challenge, as users need to tap into a federation of Clouds instead of a single Cloud
provider. [519]
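To give a flavor of the Web-services style of Cloud programming described above, here is a tiny, hypothetical Python sketch that calls a REST-style endpoint over HTTP and parses a JSON reply. The URL, token, and response fields are invented for illustration and do not belong to any real provider's API.

# Hypothetical example of the Web-services (REST over HTTP) style of
# programming a Cloud service. The endpoint and fields are invented.
import json
import urllib.request

ENDPOINT = "https://api.example-cloud.test/v1/instances"  # placeholder URL

request = urllib.request.Request(
    ENDPOINT,
    headers={"Accept": "application/json",
             "Authorization": "Bearer <token>"},  # placeholder credential
)

with urllib.request.urlopen(request) as response:
    payload = json.load(response)

for instance in payload.get("instances", []):
    print(instance.get("id"), instance.get("state"))

Provider SDKs (such as the boto3 example shown earlier) wrap exactly this kind of HTTP exchange behind predefined API calls.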
(vi) Application Model: Grids generally support many different kinds of applications, ranging from high
performance computing (HPC) to high throughput computing (HTC). HPC applications are efficient at executing
tightly coupled parallel jobs within a particular machine with low-latency interconnects and are generally not
executed across a wide-area network Grid. On the other hand, Cloud computing could in principle cater to a similar
set of applications. The one exception that will likely be hard to achieve in Cloud computing (but has had much
success in Grids) is HPC applications that require fast, low-latency network interconnects for efficient scaling
to many processors. As Cloud computing is still in its infancy, the applications that will run on Clouds are not well
defined, but we can certainly characterize them as loosely coupled, transaction oriented (small tasks on the order
of milliseconds to seconds), and likely to be interactive (as opposed to batch-scheduled, as they currently are in
Grids).
(vii) Security Model: Clouds mostly comprise dedicated data centers belonging to the same organization, and
within each data center, hardware and software configurations and supporting platforms are in general more
homogeneous than those in Grid environments. Interoperability can become a serious issue for
cross-data-center and cross-administration-domain interactions; imagine running your accounting service on Amazon
EC2 while your other business operations are on Google infrastructure. Grids, however, are built on the assumption
that resources are heterogeneous and dynamic, and each Grid site may have its own administration domain and
operational autonomy. Thus security has been engineered into the fundamental Grid infrastructure. The key issues
considered are: single sign-on, so that users can log on only once and have access to multiple Grid sites (which also
facilitates accounting and auditing); delegation (so that a program can be authorized to access resources on a user's
behalf and can further delegate to other programs); privacy; integrity; and segregation. Moreover, resources
belonging to one user cannot be accessed by unauthorized users and cannot be tampered with during transfer. In
contrast, the current security model for Clouds seems to be relatively simpler and less secure than the security
model adopted by the Grids. Cloud infrastructure typically relies on Web forms (over SSL) to create and manage
account information for end-users, and allows users to reset their passwords and receive new passwords via email
in unsafe and unencrypted communications. Note that new users can start using Clouds relatively easily and almost
instantly, with a credit card and/or email address. By contrast, Grids are stricter about security.
Business Models
In a Cloud-based business model, a consumer pays the provider on the basis of resource consumption, akin to the
utility companies charging for basic utilities such as electricity, gas, and water. The model relies on economies of
scale in order to drive prices down for users and profits up for providers. Today, Amazon essentially provides a
centralized Cloud consisting of the Compute Cloud EC2 and the Data Cloud S3. The former is charged per
instance-hour consumed for each instance type and the latter per GB-month of storage used. In
addition, data transfer is charged per TB of data transferred per month, depending on the source and target of the transfer.
The business model for Grids (at least that found in academia or government labs) is project-oriented, in which the
users or the community they represent have a certain number of service units (i.e., CPU hours) they can spend.
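The pay-per-consumption model above is easy to make concrete. The short Python sketch below estimates a monthly bill from usage figures; the rates are placeholders chosen only to illustrate the per-instance-hour, per-GB-month and per-TB-transferred structure, not the actual prices of any provider.

# Hypothetical utility-style billing calculation (placeholder rates).

RATE_PER_INSTANCE_HOUR = 0.10   # $ per instance-hour (illustrative)
RATE_PER_GB_MONTH = 0.03        # $ per GB-month of storage (illustrative)
RATE_PER_TB_TRANSFER = 90.0     # $ per TB transferred (illustrative)

def monthly_bill(instance_hours: float, gb_months: float, tb_transfer: float) -> float:
    """Sum the three utility-style charges for one month."""
    return (instance_hours * RATE_PER_INSTANCE_HOUR
            + gb_months * RATE_PER_GB_MONTH
            + tb_transfer * RATE_PER_TB_TRANSFER)

# Example: two small instances running all month, 500 GB stored, 0.2 TB out.
hours = 2 * 24 * 30
print(f"Estimated bill: ${monthly_bill(hours, 500, 0.2):.2f}")

The contrast with the Grid model is that a Grid user would instead draw down a pre-allocated budget of service units (CPU hours) granted to a project.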
Thus Clouds and Grids share a great deal of commonality in their vision, architecture and technology, but they also differ
in various aspects such as security, programming model, business model, computational model, data model,
applications, and abstractions. We believe a close comparison such as this can help the two communities
understand, share, and evolve infrastructure and technology within and across the two domains, and accelerate
Cloud computing's leap from early prototypes to production systems. [360]
1.9 Conclusion
Cloud Computing plays a significant role in varied areas like e-business, search engines, data mining, virtual
machines, batch-oriented scientific computing, and online TV, amongst many others. Cloud computing has the potential
to become an integral part of our lives. Examples include (i) the Cloud operating system, which provides users with
all the basic features of an operating system like data storage and applications; (ii) mapping services, which help
users find routes to various places; and (iii) telemedicine applications for collecting patients' data and calling
emergency services in dire need.
As an increasing number of businesses move toward Cloud-based services, issues like interoperability, security,
portability, migration and standardized protocols are proving to be critical concerns. For instance, the need for
higher transparency in scheduling tasks with guaranteed QoS is proving to be a challenging issue in the area of data
management over the Clouds. In addition, the service driven model of Cloud Computing often leads to concerns and
queries regarding the Service Level Agreement (SLA) of enterprises.
The research and business communities alike are coming up with new and innovative solutions to tackle the
numerous issues that are emerging as Cloud Computing shapes itself into the future of the Internet. The
potential is endless and the sky is the limit as we watch "a computing environment to elastically provide
virtualized resources as a service over the Internet in a pay-as-you-go manner" [631] bloom and blossom to its
full extent.
APPENDIX: Comparisons of different Cloud computing technologies
In this section, we present a comparative data set that showcases the various solution and service strategies of some
of the major Cloud service providers today, such as Amazon Web Services, GoGrid, FlexiScale, Google, Nimbus and a
few others. A detailed description of this data set is available in [608].
TABLE 1.1
Outages in Different Cloud Services
Vendor       Service and outage                                         Outage duration
Microsoft    Malfunction in Windows Azure                               22 hours
Google       Gmail and Google Apps                                      2.5 hours
Google       Google search outage due to programming error              40 minutes
Google       Gmail site unavailable due to outage in contacts system    1.5 hours
Google       Google App Engine partial outage                           5 hours
Amazon       Authentication overload                                    2 hours
Amazon       Single-bit error leading to protocol blowup                6-8 hours
Flexiscale   Core network failure                                       18 hours

TABLE 1.2
Comparison of Different Cloud Computing Technologies and Solution Providers

Computing Architecture
  Amazon Web Services: Provides IaaS. Gives clients APIs to manage their infrastructure.
  GoGrid: Provides IaaS. Designed to deliver a guaranteed QoS level and reconfigures itself depending on demand.
  Flexiscale: Provides IaaS. Functions similarly to GoGrid's architecture but allows multitier architectures.
Load Balancing
  Amazon Web Services: Round-robin load balancing, HAProxy.
  GoGrid: F5 load balancing. Algorithms used are round-robin, sticky session, SSL, least connect, and source address.
  Flexiscale: Automatic equalization of server load within clusters.
Fault Tolerance
  Amazon Web Services: System alerts automatically, performs failover and resyncs to the last known state.
  GoGrid: Instantly scalable and reliable file-level backup service.
  Flexiscale: Full self-service.
Interoperability
  Amazon Web Services: Supports horizontal interoperability.
  GoGrid: Working towards interoperability.
  Flexiscale: Working towards interoperability.
Storage
  Amazon Web Services: S3 and SimpleDB.
  GoGrid: First connects each server to a private network and then uses transfer protocols.
  Flexiscale: Persistent storage based on a fully virtualized high-end SAN/NAS back end.
Security
  Amazon Web Services: Type-I, firewall, SSL, access control lists.
  GoGrid: No guarantee of security.
  Flexiscale: Customers have their own VLAN.
Programming Framework
  Amazon Web Services: Amazon Machine Image, Amazon MapReduce.
  GoGrid: Uses a REST-like query interface and supports Java, Python and Ruby.
  Flexiscale: Supports C, C++, Java, PHP, Perl and Ruby.
TABLE 1.3
Features of Different PaaS and SaaS Providers
Providers compared: Google App Engine, Azure, Force.com

Computing Architecture
  Google App Engine: Google's geo-distributed architecture.
  Azure: Platform is hosted on Microsoft datacenters; provides an OS and a set of developer cloud services.
  Force.com: Facilitates a multitenant architecture allowing a single application to serve multiple customers.

Load Balancing
  Google App Engine: Automatic scaling and load balancing.
  Azure: Built-in hardware load balancing; containers are used as load balancers.
  Force.com: Load balancing among tenants.

Fault Tolerance
  Google App Engine: Data is automatically pushed to a number of fault-tolerant servers.
  Azure: On failure, SQL Data Services automatically begin using another replica of the container.
  Force.com: Self-management and self-learning.

Interoperability
  Google App Engine: Supports interoperability between platforms of different vendors and programming languages.
  Azure: Supports interoperability between platforms.
  Force.com: Application-level integration between different clouds.

Storage
  Google App Engine: Proprietary database (BigTable distributed storage).
  Azure: SQL Server Data Services (SSDS).
  Force.com: Database deals in terms of relationship fields.

Security
  Google App Engine: Google's Secure Data Connector (SDC); SDC uses TLS-based server authentication and RSA/128-bit or higher AES CBC/SHA.
  Azure: Security Token Service (STS) creates a Security Assertion Markup Language token according to rules.
  Force.com: SysTest SAS 70 Type II.

Programming Framework
  Google App Engine: MapReduce framework supporting Python and Java.
  Azure: Microsoft .NET.
  Force.com: Apex for the database service; supports .NET and Apache Axis (Java, C++).

TABLE 1.4
Comparisons of Different Open Source-Based Cloud Computing Services
Providers compared: Eucalyptus, OpenNebula, Nimbus

Computing Architecture
  Eucalyptus: Can configure multiple clusters in a private cloud.
  OpenNebula: Based on Haizea scheduling; focuses on efficient, dynamic, and scalable management of VMs within private clouds.
  Nimbus: Has a client-side cloud computing interface to the Globus-enabled TeraPort cluster; the Nimbus Context Broker combines several deployed VMs into a "turnkey" virtual cluster.

Load Balancing
  Eucalyptus: Simple load-balancing cloud controller.
  OpenNebula: An Nginx server is used as load balancer with a round-robin or weighted selection mechanism.
  Nimbus: Triggers self-configuring virtual clusters.

Fault Tolerance
  Eucalyptus: Separate clusters within a cloud reduce the risk of correlated failure.
  OpenNebula: The daemon is restarted and a persistent database backend is used to store host and VM information.
  Nimbus: Checks worker nodes periodically and performs recovery operations.

Interoperability
  Eucalyptus: Multiple private clouds use the same backend infrastructure.
  OpenNebula: Interoperable between intra-cloud services.
  Nimbus: Standards: "rough consensus and working code."

Storage
  Eucalyptus: Walrus is used.
  OpenNebula: SQLite3 is used as a backend component; persistent storage for ONE data structures.
  Nimbus: GridFTP and SCP.

Security
  Eucalyptus: WS-Security for authentication; the cloud controller generates the public/private keys.
  OpenNebula: Firewall, virtual private network tunnel.
  Nimbus: PKI credentials required; works with Grid proxies, VOMS, Shibboleth, custom PDPs.

Programming Framework
  Eucalyptus: Supports Hibernate, Axis2, and Java.
  OpenNebula: Supports Java and Ruby.
  Nimbus: Supports Python and Java.
A Taxonomy of Interoperability for IaaS
Ralf Teckelmann
Department of Computer Science Hochschule Furtwangen University, Germany
Anthony Sulistio
Department of Applications, Models and Tools High Performance Computing Center Stuttgart (HLRS), Germany
Christoph Reich
Department of Computer Science Hochschule Furtwangen University, Germany

CONTENTS
3.1 Introduction
3.2 Interoperability of Cloud Platforms
3.3 Taxonomy of Interoperability for IaaS
3.3.1 Access Mechanism
3.3.1.1 Type
3.3.1.2 Service Discovery
3.3.1.3 Discussion
3.3.2 Virtual Appliance
3.3.2.1 Life Cycle
3.3.2.2 Virtualization Platform
3.3.2.3 Virtualization Manager
3.3.2.4 Discussion
3.3.3 Storage
3.3.3.1 Management
3.3.3.2 Organization
3.3.3.3 Discussion
3.3.4 Network
3.3.4.1 Addressing
3.3.4.2 Application-Level Communication
3.3.4.3 Discussion
3.3.5 Security
3.3.5.1 Authentication
3.3.5.2 Authorization
3.3.5.3 Accounting/Logging
3.3.5.4 Encryption
3.3.5.5 User Management
3.3.5.6 Verification
3.3.5.7 Discussion
3.3.6 Service Level Agreement
3.3.6.1 Architecture
3.3.6.2 Template Format
3.3.6.3 Monitoring
3.3.6.4 SLA Objectives
3.3.6.5 Discussion
3.3.7 Other
3.3.7.1 Consensus
3.3.7.2 Regulation and Auditing Standards
3.4 Related Work
3.5 Conclusion and Future Work
The idea behind cloud computing is to deliver Infrastructure-, Platform- and Software-as-a-Service (IaaS, PaaS and SaaS) over the Internet on an easy pay-per-use business model. However, current offerings from cloud providers are based on proprietary technologies. As a consequence, consumers run the risk of a vendor lock-in with little flexibility in moving their services to other providers. This can hinder the advancement of cloud computing to small- and medium-sized enterprises. In this chapter, we present our work in outlining the motivations and current trends of achieving interoperability, especially in the area of IaaS. More specifically, this work delivers a comprehensive taxonomy as a guideline for cloud providers to enable interoperability within their cloud infrastructures. Thus, this taxonomy discusses important topics of IaaS, such as access mechanism, virtual appliance, security, and service-level agreement.
3.1 Introduction
According to the National Institute of Standards and Technology (NIST), there are five essential characteristics of cloud computing, i.e., on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service [509]. On-demand self-service means that customers are able to obtain computing capabilities without human interaction from a service provider. Broad network access defines the need for network-based access and standardized mechanisms in order to facilitate access through heterogeneous platforms.
FIGURE 3.1
Layers of Cloud Computing.
Resource pooling describes providers’ resources as pools of different physical and virtual resources dynamically assigned to consumers. The assignment and reassignment are based on a multi-tenant model in order to achieve high utilization. The possibility of rapidly scaling provisioned capabilities up or down is referred to as rapid elasticity. Moreover, it conveys the impression of infinite resource capabilities obtainable in large quantities at any time. Finally, measured service means resources are automatically monitored, controlled, and if needed, optimized using metering mechanisms appropriate to the resource type. Furthermore, the corresponding utilization is transparently traceable for customers and cloud providers.
The service models of cloud computing consist of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) [509], as shown in Figure 3.1. They differ in the complexity of the provisioned capability, and in how customers can access or use them. IaaS is characterized by the provision of basic computing capabilities for customers. In PaaS, users have control only over their applications and, to a certain extent, the configuration of the hosting environment. In the SaaS model, customers just use a certain application running on a cloud infrastructure.
3.1.1 Motivation
Currently, offerings from first generation clouds, such as Amazon Web Services (AWS) [63], Rackspace [594], and Flexiant’s FlexiScale [273], are based on proprietary middleware [588], thus resulting in isolated environments. This isolation obstructs the further advancement of cloud computing. New models, e.g., federations, are not feasible without interoperability [611]. The same applies to the enforcement of Service-Level Agreements (SLAs) in a hybrid cloud and the management of logically-grouped virtual machines (VMs).
The aforementioned issues lead to a series of disadvantages for customers, such as vendor lock-in and high migration costs. To address these, standardization efforts have to take place in order to support further developments in the second generation of clouds. Standardized exchange mechanisms and interfaces are crucial in order to facilitate interoperability [490]. It is also important that the standardization process keeps pace with the evolution of the cloud.

FIGURE 3.2
The Cloud Platform Layer.
As cloud computing becomes popular, myriads of offerings are announced. However, the first wave of these offerings are IaaS-based products like Amazon Elastic Compute Cloud (EC2) [63], Rackspace, and FlexiScale. PaaS and SaaS solutions are later offered on top of IaaS. Figure 3.2 illustrates this relationship, which includes the Cloud Platform layer and its underlying hardware. Deriving from this stack, interoperability for IaaS has to be achieved first. In order to do so, standardization efforts have to focus on one layer below, i.e., the cloud platform layer, since it encapsulates all technologies used to provide the virtual environment and provides functionalities to the upper layers. As shown in Figure 3.2, the cloud platform layer consists of several Virtualization Platforms that are responsible for the resource virtualization, a Virtualization Manager for managing VMs, and Interfaces that provide a controlled and uniform access to the underlying environment.
This chapter deals with technologies and mechanisms of the cloud platform layer specific to IaaS, in order to achieve interoperability between clouds. Only when the interfaces, documentation, and standards are open and transparent can cloud services be easily migrated to various platforms. To give an overview of related topics and important areas, this work presents a taxonomy. The taxonomy discusses all important issues to outline the needs and trends in current developments aiming at interoperability for IaaS. Important developments, such as the rise of Virtual Appliances (VAs) over VMs and the adoption of SLAs in clouds, are also considered in this taxonomy.
The rest of this chapter is organized as follows. In Section 3.2, the term interoperability is defined and discussed. Furthermore, the benefits of interoperability are pointed out. The core of this work is a taxonomy of IaaS interoperability and is presented in Section 3.3. Section 3.4 provides some related work. Finally, Section 3.5 concludes the chapter and outlines future work.
3.2 Interoperability of Cloud Platforms
According to the Oxford Dictionary of Computing, interoperability refers to “The ability of systems to exchange and make use of information in a straightforward and useful way; this is enhanced by the use of standards in communication and data format” [215]. This definition applies well to cloud computing, since it addresses the current state of cloud solutions. There are multiple systems or clouds that are able to use the information that they have, but are not able to 1) exchange and share it, and 2) understand the information from others. Only the second point is somewhat related to interoperability. There are two kinds of interoperability, i.e., syntactic and semantic [728]. Syntactic interoperability means that clouds are trying to communicate through standardized mechanisms, where interfaces, data formats, and communication protocols are used to achieve interoperability. Semantic interoperability means that the information is not only exchanged but also interpreted in order to use it.

3.2.1 Benefits of Interoperable Clouds


The benefits of interoperability cover business/economic and technical issues. These issues are relevant to stakeholders, i.e., customers, developers, and cloud providers. From a business perspective, advantages to customers are: (i) no vendor lock-in, where a customer is no longer restricted to a single cloud provider; (ii) flexibility, where a customer is able to interact with others using various cloud offerings and distribute his/her applications across several providers; and (iii) cost reduction, where if a customer moves to a different provider, he/she can move major parts of the application without building it from scratch again. From a technical perspective, developers can take many advantages of using interoperable clouds:
• Monitoring of a distributed infrastructure scaled over several clouds can be achieved through a standardized
communication mechanism.
• Consistent and uniform system management of multiple clouds. This addresses the current lack of having multiple information points, e.g., a local monitor for a private cloud, and Amazon CloudWatch [63] for virtual machines in the Amazon EC2.
• Secure verification and service discovery using standardized interfaces.
• SLA enforcement for the possibility of automatically reacting to a SLA violation.
• Fault tolerance by scaling over several clouds.
• Disaster recovery, where distributing information and components prevents complete data loss and reduces
downtime.
FIGURE 3.3
Interoperability of IaaS Taxonomy.
The above advantages provide cloud providers with a case for interoperability. One incentive for cloud providers is an increase in market share, as interoperability further drives wide-spread adoption by users and developers in using cloud computing. This translates to a growth in revenue. Another factor is being able to provide users with an SLA guarantee, in case of hardware failure and/or security attacks. Using a federated cloud model as an example, users’ VMs can be migrated to a different provider with minimal overhead, thus reducing SLA violations.
3.3 Taxonomy of Interoperability for IaaS
Figure 3.3 gives an overview of the taxonomy categories, which can be seen as essential building blocks towards interoperability for IaaS at the cloud platform level. The first category shown in Figure 3.3 is Access Mechanism, where several access types and important topics of service discovery are discussed. Virtual Appliance is the second category and deals with the interoperability of computational resources. Next is Storage, which discusses storage organization and management as important areas for interoperability.
The fourth category shown in Figure 3.3 is Network, which focuses on addressing issues related to high-level communications. Afterwards, Security is considered in this taxonomy. Service Level Agreement (SLA) is discussed next, especially in the area of architecture, template format, and monitoring of SLA objectives. Finally, the Other category is intended for topics like consensus, regulations, and auditing standards that do not belong to other parts.

3.3.1 Access Mechanism


Access mechanisms have a major influence on interoperability since they define how a service can be accessed. Furthermore, they define how a service can be discovered. Two areas in access mechanisms are identified, i.e., Types and Service Discovery, as shown in Figure 3.4.

FIGURE 3.4
Access Mechanism Taxonomy.

3.3.1.1 Type
In an access mechanism, Type means the different ways that a customer has access to services. In the first generation of clouds, common standard types are Application Programming Interface (API), Graphical User Interface (GUI), and Command-Line Interface (CLI), as depicted in Figure 3.4.
Application Programming Interface
APIs offered by cloud providers can be divided into three architectural styles, i.e., Web Service, HTTP Query, and Extensible Messaging and Presence Protocol (XMPP). Simple Object Access Protocol (SOAP) is a commonly-used specification for implementing web services, since it is a stateless message exchange protocol. The common underlying protocol is Hypertext Transfer Protocol (HTTP), but Remote Procedure Call (RPC) is also applicable. However, SOAP does not provide its own syntax within Web Services Description Language (WSDL), an XML-based language that provides a model for describing web services. Thus, it may leave existing systems unable to communicate with each other due to proprietary or implementation-specific solutions. HTTP Query provides an alternative mechanism to SOAP, where it uses either Remote Procedure Call (RPC) or Representational State Transfer (REST). Common RPC analogs are XML-RPC and JSON-RPC. XML-RPC is the precursor of SOAP and uses XML to encode its calls, whereas JSON-RPC uses JavaScript Object Notation (JSON) as the data format. In contrast, REST is an architectural style. Although REST is exhaustively elaborated and explained in [271], no concrete definition is formalized. Thus, it leads to various interpretations of what RESTful means. This is reflected in the APIs of several cloud providers, such as Amazon and Rackspace, that promise RESTfulness but are unable to communicate with each other.
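To make the difference between these styles concrete, the following minimal sketch contrasts a RESTful resource request with a JSON-RPC style call against a hypothetical cloud API. The host name, paths, and method names are illustrative and do not belong to any real provider; the widely used Python requests library is assumed to be available.

```python
import requests  # third-party HTTP client, assumed installed

BASE = "https://api.cloud.example.com"  # hypothetical endpoint

# RESTful style: the URL identifies the resource, the HTTP verb is the operation.
vm = requests.get(f"{BASE}/servers/vm-42", timeout=10).json()
requests.delete(f"{BASE}/servers/vm-42", timeout=10)  # terminate the instance

# JSON-RPC style: one endpoint, the operation is named inside the payload.
rpc_call = {
    "jsonrpc": "2.0",
    "method": "terminateServer",      # illustrative method name
    "params": {"serverId": "vm-42"},
    "id": 1,
}
result = requests.post(f"{BASE}/rpc", json=rpc_call, timeout=10).json()
```

Two providers that both call themselves RESTful may still differ in every URL path and payload shape above, which is exactly why such promises alone do not yield interoperability.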
XMPP is an XML-based protocol. The design goals of XMPP are messaging, presence, and request-response services in a near real-time manner. Like other XML-based communications over HTTP, XMPP can be adopted for programmatic client-to-cloud communication. However, XMPP suffers from low adoption by cloud applications.
Command Line Interface
Many cloud providers offer tools and scripts to support CLI, e.g., Amazon’s EC2 API tools or other third-party tools for AWS. In most cases, these tools are just implementations on top of existing APIs like REST or XML-RPC. Secure Shell (SSH) has also become a standard tool for accessing Unix/Linux-based computational resources.
Graphical User Interface
The main role of GUI is to provide users with easy access to a cloud offering. In general, access using a GUI can be differentiated into Remote Desktop Protocol (RDP) and Web Portals. RDP is a proprietary protocol from Microsoft, and it is used to connect remotely to VMs that run the Windows operating system (OS). In contrast, web portals are used independently of the OS. Thus, many cloud providers like Amazon and Flexiant offer access to their IaaS offerings via web portals.

3.3.1.2 Service Discovery


Figure 3.4 also shows various forms of service discovery like Domain Name System (DNS), HTTP Query, Semantics, and Web Service Discovery. These forms are discussed next.
Domain Name System
DNS Service Discovery (DNS-SD) describes a mechanism to discover services of a certain domain [182]. It is a convention for naming resource-record types, thus allowing the discovery of service instances under a certain domain with basic DNS queries. DNS-SD provides a discovery mechanism which can be integrated into existing DNS structures without major changes. Thus, DNS-SD is a discovery mechanism that is independent of the access mechanism.
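As a rough illustration of what DNS-SD style discovery could look like from a client, the sketch below enumerates instances of a hypothetical service type under an example domain and resolves one instance to a host and port. It assumes the third-party dnspython package; the service type and domain are invented for the example and not taken from any real deployment.

```python
import dns.resolver  # third-party package "dnspython", assumed installed

DOMAIN = "example.com"                    # illustrative domain
SERVICE = f"_cloud-api._tcp.{DOMAIN}"     # hypothetical DNS-SD service type

# PTR records enumerate the service instances registered under the domain.
for ptr in dns.resolver.resolve(SERVICE, "PTR"):
    instance = ptr.target.to_text()
    # Each instance has an SRV record giving the host and port to contact.
    for srv in dns.resolver.resolve(instance, "SRV"):
        print(instance, "->", srv.target.to_text(), srv.port)
```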
HTTP Query
It refers to the HTTP Query described in Section 3.3.1.1. If a web service follows the REST constraint of statelessness [271], a discovery request on a resource should return all related information including the available operations. Moreover, each service is self-describing and discoverable through simple HTTP requests. However, this only refers to the discovery of a service’s capabilities, not the service itself.
Semantics
It aims to discover services by finding their meaning in certain contexts. Resource Description Framework (RDF) and Web Ontology Language (OWL) are commonly-used semantic models. RDF is a general-purpose language for representing information on the Web, and uses XML for the information presentation. In contrast, OWL enables the description of connections between information. Thus, resources can be described using RDF and their relations through OWL in a machine-readable way. As a result, resources can be discovered not only by keywords, but also by meaning. This makes service discovery very flexible, because applications are no longer bound to or developed against a static set of particular services.
Web Service Discovery
There are four aspects to web service discovery, i.e., Registry, Federated, Language, and Multicast. Universal Description, Discovery & Integration (UDDI) is an XML-based registry for classifying and locating web service applications. It uses WSDL for service description and SOAP for communication.
A federated discovery is a hybrid of the distributed and centralized approaches [660]. Several autonomous discovery entities are federated, and each of them provides its information within the federation.
Web Service Inspection Language (WSIL) is a language that defines formats and rules on how information about services is made available [113]. A WSIL document abstracts different description formats through a uniform XML-based format. It provides a mapping of different descriptions to the corresponding service through pointers. Because the WSIL document only holds references and no concrete descriptions, it is extensible and easy to process.
Web Services Dynamic Discovery (WS-Discovery) is a multicast discovery protocol for locating web services in
local area networks [556].

3.3.1.3 Discussion
CLI and GUI are important tools for giving users access to cloud offerings, but they lack interoperability. The only exception is standard tools for remote access to computational resources, which are widely accepted, like SSH or RDP. CLI and GUI are mainly implemented on top of existing APIs.
These APIs are the main access points. They use proprietary XML, JSON, or plain HTTP as data formats, resulting in a high diversity of implementation details. To achieve interoperability for IaaS, best practices or other restrictive rules are necessary, especially when using XML or JSON to describe data, due to their high extensibility and adaptivity. Thus, a consensus about the meanings of syntactic constructs has to be achieved. Furthermore, the appropriateness of existing approaches can be evaluated, and the inappropriate ones can be rejected in order to reduce the diversity.
In order to facilitate interoperability, APIs have to expose service discovery functionalities. A well-accepted approach is the combination of REST and HTTP Query service discovery of a certain domain. The advantages of this approach are ease of integration and simple discovery through well-known and well-defined technologies.
A federated service discovery mechanism can also help interoperability. A federation of mediators like WSIL offers the ability to encapsulate a broad range of services and offer request-specific responses. The mediator approach also works well with a semantic service discovery mechanism. An environment with services described in RDF is summarized using WSIL. Moreover, it can be discovered through the use of OWL, since it leverages the power of semantic discovery, thus leading to a more dynamic and flexible approach. These technologies are also applicable in a RESTful architecture.
3.3.2 Virtual Appliance
The idea of a Virtual Appliance (VA) is to deliver a service as a complete software stack installed on one or more VMs, as proposed by the Distributed Management Task Force (DMTF) [240]. A VA description is more than the XML-based summary of VM descriptions. Figure 3.5 shows the taxonomy of VA, where it consists of Life Cycle, Virtualization Platform, and Virtualization Manager.

FIGURE 3.5
Virtual Appliance Taxonomy.
FIGURE 3.6
The Life Cycle of a Virtual Appliance [240].

3.3.2.1 Life Cycle


As shown in Figure 3.5, there are two important topics of the VA life cycle, i.e., Stages and Update Management.
Stages
According to DMTF, the life cycle of a VA consists of five stages, as shown in Figure 3.6. In this figure, it starts with the Development stage, where images are prepared and meta-data are assembled. Next is the Package and Distribute stage. Because VAs consist of multiple entities, they have to be assembled; however, packaging means nothing more than a collection of all components. The Distribute part is the step of making the package available by uploading it to the cloud. Then, the Deploy stage launches the VA. The Management stage manages the VA, such as performing pause, resume, and stop operations. Finally, in the Retirement stage, the VA is decommissioned and the relevant resources are released.
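A minimal sketch of how these five stages could be tracked programmatically is shown below; the class and the transition table are illustrative only and are not part of any DMTF specification.

```python
from enum import Enum, auto

class VAStage(Enum):
    DEVELOPMENT = auto()
    PACKAGE_AND_DISTRIBUTE = auto()
    DEPLOY = auto()
    MANAGEMENT = auto()
    RETIREMENT = auto()

# Allowed forward transitions between stages (illustrative, following Figure 3.6).
TRANSITIONS = {
    VAStage.DEVELOPMENT: {VAStage.PACKAGE_AND_DISTRIBUTE},
    VAStage.PACKAGE_AND_DISTRIBUTE: {VAStage.DEPLOY},
    VAStage.DEPLOY: {VAStage.MANAGEMENT},
    VAStage.MANAGEMENT: {VAStage.MANAGEMENT, VAStage.RETIREMENT},
    VAStage.RETIREMENT: set(),
}

def advance(current: VAStage, target: VAStage) -> VAStage:
    """Move a virtual appliance to the next stage, rejecting invalid jumps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target
```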
Update Management
Currently, running VMs need to be updated individually by the users for software upgrades or security fixes. Update management can also pose a significant problem to cloud providers, as they need to update all pre-configured images. For PaaS and SaaS providers, update management needs to be done carefully so as to avoid SLA violations that disrupt VMs’ uptime and network performance. Therefore, update management can become a complex problem. Currently, one viable solution is to move from conventional software patching to redeployment [681]. With this solution, the update procedure is shifted into the development phase of the VA life cycle. Thus, VA developers or maintainers can update their images and test them before they are packaged and deployed. As a result, the newly-deployed VA takes over from the older one after its retirement or slowly replaces it.

3.3.2.2 Virtualization Platform


Figure 3.5 also shows the core characteristics of a virtualization platform, i.e., Virtual Disk Format, License, Host Operating System (OS), Host Architecture, and Virtualization Technology. Each of them is described next.
Virtual Disk Format
The current landscape of virtual disk formats (VDFs) is heterogeneous, with formats like VMware’s Virtual Machine Disk (VMDK), Microsoft’s Virtual Hard Disk (VHD), and QEMU’s qcow. However, most of them are proprietary. To advocate interoperability, the support of various VDFs is recommended.
License
Software licensing is one of the most complicated topics because there are many free and proprietary licenses existing nowadays. Virtualization platforms with a free license are preferred in order to support interoperability. Moreover, they are less restrictive, hence enabling broader adoption and agreement.
Host Operating System (OS)
In terms of interoperability, it is important that the host OS supports VMs with a variety of guest OSs, if a virtualization manager is not capable of managing different platforms. Moreover, the host OS should allow paravirtualization of a guest OS to enhance performance, in comparison to using full virtualization.
Host Architecture
Several host architectures were developed over the evolution of computing, like SPARC (Scalable Processor ARChitecture) and x86. The x86 architecture with 64-bit registers (x86-64) is compatible with 32-bit ones, but not vice versa. Since a VM’s guest architecture depends on the host architecture and most x86 processors are used in new PCs and servers, x86-64 should be the standard host architecture to achieve interoperability.
Virtualization Technology
It is an essential part of virtualization, because it enables the running of multiple VMs inside one host or server. Currently, there are four technologies used in today’s virtualization, i.e., full, para, OS-level, and hardware-assisted. Full virtualization allows a VM to run without any OS modifications, for example, a VM with Microsoft Windows OS running inside a Linux-based host OS. In contrast, paravirtualization modifies a guest OS to communicate with a host system to improve performance and efficiency. Hence, VMs and the host need to be running on the same OS.
OS-level virtualization has the appearance of a stand-alone system from the view of running applications. However, the host OS is actually shared by many guest OSs. Hardware-assisted virtualization is a hardware technology that supports virtualization by enhancing the communication between a guest OS and the underlying hardware. For example, Intel’s VT and AMD’s AMD-V processors have built-in support to enable hardware-assisted virtualization.
To support interoperability, it is recommended that the cloud providers use a variety of virtualization technologies
to address the different needs and requirements of their customers.

3.3.2.3 Virtualization Manager


A virtualization manager is responsible for managing VAs on physical hosts. For interoperability, an important task is to move the VAs to different hosts and/or cloud providers. Therefore, Migration is an essential task for the manager, as shown in Figure 3.5. Moreover, this figure shows other relevant interoperability issues like Adaptors/Connectors and License.
Migration
There are two kinds of migration, i.e., cold and live. For a cold migration, the VM is shut down and then moved to another host or server. This is a time- and resource-consuming process, with many disadvantages such as increased downtime and potential SLA violations. A second approach is a live migration, which aims to avoid the aforementioned disadvantages. With this process, data need to be stored externally before the migration begins. Some virtualization managers, such as VMware vSphere and Microsoft Hyper-V, are under closed source development and have restrictive licenses. Thus, they limit the ability to reproduce the same migration procedures and perform uniform benchmark tests [191]. Moreover, with various vendors using different live migration strategies, it would be a challenging task to achieve interoperability in this area.
Adaptors/Connectors
The terms adaptor, connector, broker, and driver refer to the same thing, i.e., the integration of an API in a modular way that allows the translation of operations and commands from one API to the function set of another. This modular approach is adopted by multiple virtualization managers of second generation clouds like OpenNebula [667] and Eucalyptus [544].
The modular approach results from the need to achieve a certain level of interoperability through the integration of a variety of interfaces. Currently, only the proprietary Amazon EC2 API is gaining wide acceptance, since interfaces from the Open Grid Forum (OGF) [551] and DMTF are still in an early adoption state. However, in the future, the adoption of several APIs through adaptors should be a de facto requirement for a virtualization manager.
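The adaptor idea can be sketched as a thin translation layer: one neutral interface for the virtualization manager, with provider-specific adaptors mapping its operations onto each native API. The class and method names below are invented for illustration and do not correspond to the actual driver interfaces of OpenNebula, Eucalyptus, or Amazon EC2.

```python
from abc import ABC, abstractmethod

class CloudAdaptor(ABC):
    """Neutral operations the virtualization manager relies on."""

    @abstractmethod
    def start_instance(self, image_id: str) -> str: ...

    @abstractmethod
    def stop_instance(self, instance_id: str) -> None: ...

class EC2LikeAdaptor(CloudAdaptor):
    """Translates the neutral operations to an EC2-style native API (illustrative)."""

    def __init__(self, client):
        self.client = client  # a provider-specific SDK client, assumed given

    def start_instance(self, image_id: str) -> str:
        # Map the neutral call to whatever "run instance" call the native API exposes.
        return self.client.run_instances(image_id=image_id)

    def stop_instance(self, instance_id: str) -> None:
        self.client.terminate_instances(instance_id=instance_id)

def move_appliance(source: CloudAdaptor, target: CloudAdaptor,
                   instance_id: str, image_id: str) -> str:
    """Stop a VA on one provider and start it on another via the common interface."""
    source.stop_instance(instance_id)
    return target.start_instance(image_id)
```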
License
The same licensing issues discussed for virtualization platforms in Section 3.3.2.2 also apply to a virtualization manager.

FIGURE 3.7
Storage Taxonomy.

3.3.2.4 Discussion
In this chapter, virtual appliances are introduced and presented as a new computational resource for cloud computing, instead of virtual machines. As mentioned earlier, the VA approach provides new possibilities. Thus, its adoption and further enhancement is recommended.
Due to heterogeneity in virtual disk formats, licenses, host and guest OSs, and virtualization technologies, only the host architecture and the virtualization manager can be used to achieve interoperability. The virtualization manager is capable of abstracting various virtualization platforms through an intelligent controller, which orchestrates the platform utilization and allows a heterogeneous environment. However, interoperability among virtualization managers is obscured by closed or proprietary sources and license restrictions. Interoperability can only be achieved through the adoption of virtualization platforms and virtualization managers without such limitations. The adaptors/connectors concept can also help interoperability through a modular integration of various APIs.

3.3.3 Storage
Figure 3.7 shows two important topics in terms of storage interoperability, i.e., Management and Organization. The former addresses storage functionalities and discusses the need for their availability. The latter points out several kinds of storage organization, which are important to prevent incompatibility.

3.3.3.1 Management
Figure 3.7 also shows the three topics of storage management, i.e., Backup, Replication, and Snapshots. Because these topics are sometimes mixed up or treated as alternatives, their differences are highlighted to avoid this confusion. Moreover, they have to be present in order to achieve interoperability of IaaS. The backup functionality is important to guarantee long-term preservation on non-volatile storage media, either for data to be backed up or for VM images to be protected against data loss. On the other hand, replication means that data are not only copied to one location, but distributed to several sites. However, replicated data are not moved onto non-volatile storage media; instead, they are copied to volatile storage for availability and faster access times. Finally, snapshots are full copies of data objects. Compared to backups, snapshots are not moved to non-volatile storage. Snapshots are also different from replication, because in the process only one copy is made to a single location and not to multiple ones.

3.3.3.2 Organization
Storage organization depends on the kind of data to be stored. Figure 3.7 also shows the areas of storage organization, i.e., Schemes and Image.
Schemes
There are three major schemes of storage organization: Block Storage, File System, and Object Storage. Block storage is a kind of storage where data are saved in blocks. A block has a certain size and consists of a structured sequence of bytes. Thus, block storage can be seen as a logical array of unrelated blocks, which are addressed by an index in the array called the Logical Block Address (LBA) [260]. On the other hand, a file system is responsible for imposing a structure on the address space of disks, such that applications can refer to data objects through abstract file names. Finally, object storage is a kind of abstract storage consisting of one or more object stores or devices [260]. An object store is a collection of objects, which consist of data objects and their metadata. With this approach, objects can be stored or modified via methods exposed through a standardized file-like interface. Everything below the method calls is encapsulated.
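The following sketch shows what such a file-like object-store interface might look like, with data objects stored alongside their metadata. It is a toy in-memory stand-in; the method names (put, get, head) and the sample keys are illustrative, not drawn from any particular storage product.

```python
from typing import Optional

class ObjectStore:
    """Toy in-memory object store: objects are (data, metadata) pairs keyed by name."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes, metadata: Optional[dict] = None) -> None:
        # Everything below this call (blocks, files, physical placement) is encapsulated.
        self._objects[key] = (data, dict(metadata or {}))

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def head(self, key: str) -> dict:
        """Return only the metadata, e.g. to decide on placement or replication."""
        return self._objects[key][1]

store = ObjectStore()
store.put("images/va-web.img", b"...disk image bytes...", {"format": "raw", "size_gb": 10})
print(store.head("images/va-web.img"))
```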
Image
Image formats have already been discussed in Section 3.3.2.2. Hence, this section focuses on their relationship to the underlying organization scheme. Hard disk images are one example already mentioned above. A hard disk image is a by-sector copy of a hard disk. The important point is that sectors are smaller than blocks, unlike files, which are mostly larger. Therefore, the image could be placed on block storage more efficiently. Furthermore, all information is within the image and no metadata have to be considered. In the case of a VA image containing files [241], the usage of block storage is not efficient due to additional file sources.
Decoupling pairs of images and related data onto the most appropriate scheme can be a solution, but this is only possible if the schemes are hidden behind intelligent object storage. In this scenario, block and file system storage are chosen based on the type of data. Therefore, interoperability can be achieved through the intelligent placement of data beneath an object storage layer.

FIGURE 3.8
Network Taxonomy.

3.3.3.3 Discussion
Compatibility of underlying storage resources is required to allow the shift of data from one location to another while still retaining the possibility of efficient usage. This issue is highlighted through the introduction of three storage organization schemes, and the statement about the relationship between various images and the underlying storage infrastructure. To avoid compatibility problems, the introduction of an intelligent object storage layer can be a solution. VM images and related data are decoupled and stored on the most appropriate storage. Logical connections can be kept using metadata information. The mentioned functionalities can be supported by such an environment and fulfill interoperability requirements.

3.3.4 Network
Figure 3.8 shows two important topics in terms of network interoperability, i.e., Addressing and Application-Level Communication.

3.3.4.1 Addressing
A major problem for cloud computing is the need for reliable remote access to applications and their underlying VMs, even when they are being moved between subnets within a cloud or to others. Therefore, network addressing plays an important role for interoperability. As shown in Figure 3.8, IP Mobility and the Locator Identifier Separation Protocol (LISP) aim to address the aforementioned problem.
IP Mobility
The possibility to keep its IP address during live migration is essential for a VM; otherwise, it would directly lead to service downtime and SLA violations. Therefore, extended mechanisms need to be considered to allow seamless migration and facilitate interoperability. IP mobility mechanisms are available for both IPv4 and IPv6.
IPv4 mobility is an extension of the IPv4 protocol [247]. The first component is the so-called home agent. This can be a network router on which a VM is working. When the VM moves, a so-called care-of address of the machine is deposited with the home agent. This is the address through which the VM is accessible in the new network. If the address is provided by the router, then this router is called a foreign agent and has the function of forwarding data to the VM. When datagrams are sent to the migrated VM, the home agent tunnels them to the foreign agent. In the other direction, the foreign agent serves as a standard gateway.
IPv6 mobility implements a mechanism similar to IPv4. Care-of addresses and home agents are also used [398], but there are many differences resulting in incompatibility and the inability to inter-operate. The most important difference is that, because of mechanisms like proxy neighbor discovery and stateless or stateful auto-configuration, IPv6 mobility does not need foreign agents [398]. Furthermore, the communication routes differ, because IPv6 mobility allows bidirectional tunneling over the home agent, as well as direct communication to the care-of address for the other communicating entity.
Locator Identifier Separation Protocol
LISP aims to have a single IP address be used for multiple purposes [126]. An IP address both locates and identifies a host. The idea is to separate these functionalities into Endpoint Identifiers (EIDs) and Routing Locators (RLOCs) [263], where EIDs are hosts and RLOCs are router addresses. End-to-end communication is realized as EID-to-EID communication. EIDs are mapped to their corresponding RLOCs through conversion algorithms, resulting in RLOC-based communication on the way between the source and destination networks. Therefore, LISP is placed on edge routers. This model works with both IPv4 and IPv6, and allows mobility across network barriers through the decoupling of identity and location. However, because of the placement of LISP on edge routers, the implementation and adoption is extremely difficult even if the model seems to be powerful. Deriving from this, further research is needed.

3.3.4.2 Application-Level Communication


Figure 3.8 also shows two communication protocols important in terms of interoperability, i.e., HTTP and XMPP. HTTP was already mentioned in the context of RESTful APIs in Section 3.3.1.1. However, in cloud computing, HTTP is the common application-level transport protocol, since HTTP mechanisms apply well to REST and SOAP.
XMPP has also been discussed previously, where it provides XML over TCP. Unfortunately, due to its low adoption in clouds and the deficiencies already pointed out in Section 3.3.1.1, XMPP is not an alternative to HTTP in terms of interoperability.

FIGURE 3.9
Security Taxonomy.

3.3.4.3 Discussion
Interoperability in networking is achieved through the dominant position of the IP protocol family. It is achievable even if IPv4 and IPv6 are incompatible, because there is cooperation among translation mechanisms. The ability of continuous low-level communication is essential, but it is currently not possible among heterogeneous environments. A comprehensive solution through LISP does not seem feasible at the moment. Therefore, homogeneous addressing founded on IPv6 has to become the standard in order to achieve interoperability of cloud platforms. This is due to the exhaustion of the IPv4 address space, and the fact that IPv6 mobility is a core component of the initial protocol design rather than a later addition.
XMPP is currently not suitable for high-level intercloud communication, since it is not widely used by cloud applications. Therefore, HTTP can be seen as the only way to achieve interoperability in application-level communication. Nevertheless, the XMPP approach should be tracked and reviewed against the applicability of REST to it.

3.3.5 Security
Figure 3.9 shows important security topics for cloud interoperability. Authentication, Authorization, and Accounting/Logging introduce the AAA model into this taxonomy. Next, Encryption mechanisms for communication and data are presented. Afterwards, important considerations in the area of User Management are pointed out. Finally, the service discovery specific topic of Verification is discussed.

3.3.5.1 Authentication
Authentication means the confirmation of a stated identity, and it is an essential security mechanism in clouds and other IT areas. In the following, two kinds of mechanism are presented, as shown in Figure 3.9.
HTTP Authentication
HTTP offers two authentication mechanisms, i.e., basic and digest access [285]. Basic authentication is founded on a Base64 encoding of the user ID and password within the Authorization header field. It is not secure, since Base64 is an encoding, not an encryption mechanism. Digest access authentication addresses this problem through MD5-based hashing of the password and several other values, but not the username. Unfortunately, due to the well-known weaknesses of MD5, this mechanism cannot be treated as secure. Moreover, these mechanisms are not interoperable with each other.
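The sketch below, using only the Python standard library, shows why basic authentication offers no confidentiality: the Authorization header is merely a Base64 encoding of the credentials and can be reversed by anyone who observes it. The username and password are of course fictitious.

```python
import base64

user, password = "alice", "s3cret"          # fictitious credentials
token = base64.b64encode(f"{user}:{password}".encode()).decode()
header = f"Authorization: Basic {token}"
print(header)

# Anyone who sees the header can trivially reverse the encoding:
print(base64.b64decode(token).decode())     # -> "alice:s3cret"
```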
Public Key Infrastructure
Public Key Infrastructure (PKI) is a foundation for security mechanisms that allows the use of public keys and certificates instead of logins and passwords [755]. Therefore, PKI provides an opportunity to establish a trustable relationship between an individual or an organization and its credentials (key or certificate), which can be used by other mechanisms to verify an identity and authenticate it. In terms of interoperability, four components of PKI are important, i.e., Trust Model, Cross Certification, Algorithms, and X.509.
There are several trust models available for PKIs. One model is the hierarchical model, where one or more entities, called certification authorities (CAs), build a hierarchy above the credentials of an end-user. In this model, a CA guarantees the confidentiality of the CAs or end-user credentials below it and can be verified against the next higher CA if necessary. Another model is called the Web of Trust. Because no higher authorities are present, confidentiality has to be verified for each end-user individually. In the case of Pretty Good Privacy (PGP), other end-users acknowledge an identity and the consumer decides if he/she trusts these acknowledgements. A third model is the Peer-to-Peer model, where each entity establishes trust with each other entity through individual trust negotiation.
Cross certification is an extension of the hierarchical model. It gives entities of one PKI the chance to verify the confidentiality of entities belonging to other PKIs. The idea is quite simple: if one CA trusts another, an entity trusting one of them can also trust the other one.
In cryptography, the ability to replace algorithms is recommended, because of the need to substitute an algorithm with a stronger one if it gets broken. Thus, it is necessary that communicating systems can negotiate an algorithm they are both capable of. X.509, on the other hand, is an International Telecommunication Union (ITU) standard for important PKI components like public key certificates, certificate revocation lists, and attribute certificates.
3.3.5.2 Authorization
Authorization means the allocation of resources or rights according to the user’s credentials after the user has proven to be who he/she claims to be. Mainly two models, discretionary and role-based access control, are discussed, as shown in Figure 3.9.
Authorization in clouds is achieved through Discretionary Access Control (DAC). Amazon and Rackspace are examples of using DAC in their public clouds. The concept of separation of users and objects hides behind this term [232]. Access in this case is controlled through lists, named Access Control Lists (ACLs). These lists consist of mappings of usernames to the functions that they are able to execute on the corresponding object. The mappings are determined by the owner of the object. This model has great potential for interoperability, because only the ACL format has to be uniform to achieve interoperability. The function terms do not have to be unified, since they are only applied to a specific environment.
Role-based Access Control (RBAC) is based on the assumption that access control decisions are determined by the role of a user [266]. This can be defined by his/her duties, responsibilities, or qualifications. In RBAC, functions are dedicated to users according to their roles, resulting in a concept of different roles with different function sets. Unprivileged users, privileged users, database administrators, or network administrators are examples of roles. Another difference from DAC is that these roles are not determined by users or owners; instead, they are specified by the system. Currently, only one cloud API, VMware’s vCloud with vExpress, implements the RBAC approach. A multitiered RBAC separates user and administrative functions. However, RBAC is not compatible with DAC, since the roles of RBAC systems can differ.
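A minimal sketch of the two authorization models is given below; the ACL format, role names, and function names are invented for illustration and are not drawn from any provider's API.

```python
# Discretionary access control: the object's owner maintains an ACL that maps
# user names to the functions they may execute on that object.
acl = {"bucket-logs": {"alice": {"read", "write"}, "bob": {"read"}}}

def dac_allowed(user: str, obj: str, function: str) -> bool:
    return function in acl.get(obj, {}).get(user, set())

# Role-based access control: the system assigns roles, and functions are bound
# to roles rather than to individual users or object owners.
user_roles = {"alice": {"db_admin"}, "bob": {"unprivileged"}}
role_functions = {"db_admin": {"read", "write", "snapshot"}, "unprivileged": {"read"}}

def rbac_allowed(user: str, function: str) -> bool:
    return any(function in role_functions[role] for role in user_roles.get(user, set()))

print(dac_allowed("bob", "bucket-logs", "write"))   # False
print(rbac_allowed("alice", "snapshot"))            # True
```

The sketch also illustrates why the two models do not inter-operate directly: an ACL can be exchanged as a uniform data structure, whereas role definitions are bound to the system that specifies them.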

3.3.5.3 Accounting/Logging
In conjunction with security, accounting/logging means the recording of events and operations, and the saving of information about them. In terms of security, the importance of accounting data lies in their availability for sustaining trust and compliance. Logs are central information sources for common system and fault analysis. Interoperability between clouds can be seriously affected if no such information is available and there are no mechanisms present to access them.
The concrete information could be made available using the Information Technology Infrastructure Library (ITIL)’s Incident Logging [379]. This ITIL process defines several parameters which have to be present in every event record. Furthermore, it proposes a multi-level categorization for the identification of event importance. Event-based logging would also apply well to a monitoring solution, to be discussed later in Section 3.3.6.3. The influence of ITIL and its well-defined best practice guidelines could lead to acceptance and adoption. As a result, trust and interoperability can be achieved.
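An illustrative, ITIL-inspired event record might carry fields such as the ones below. The exact field set and severity categories are assumptions made for the sake of the example, not a reproduction of the ITIL incident-logging schema.

```python
import json
import time
import uuid

def log_event(source: str, operation: str, severity: str, details: str) -> str:
    """Emit one machine-readable event record with a multi-level severity category."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "source": source,          # e.g. the VM, service, or API endpoint concerned
        "operation": operation,    # what was attempted or performed
        "severity": severity,      # e.g. "low" / "medium" / "high" categorization
        "details": details,
    }
    return json.dumps(record)

print(log_event("vm-42", "live-migration", "medium", "moved to host h17"))
```

Exposing such records through a uniform API, rather than the provider's internal log files, is what makes the accounting information usable across clouds.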

3.3.5.4 Encryption
Figure 3.9 also shows two important topics of encryption, i.e., Communication and Data. Security between endpoints through communication encryption is as important as endpoint security. Especially through the close relationship of REST and SOAP to HTTP, the Secure Socket Layer (SSL), or its newer version Transport Layer Security (TLS), has become the standard encryption protocol for communication over unsecure networks. Virtual Private Networks (VPNs) and SSH are other common mechanisms providing secure communication by default. For Windows-based systems, RDP also provides a secure mechanism. Data encryption is a topic from which almost all cloud providers distance themselves. Currently, the only agreement made is that existing encrypted data can be stored [350]. The problem with encryption methods is the lack of knowledge about their practicality in cloud computing, due to its scale. Mechanisms like encfs, Linux Unified Key Setup (LUKS), or dm-crypt are feasible for local systems with lower-scale data sizes (in terms of gigabytes), and are subject to open licenses. However, their feasibility in cloud scenarios is not yet proven.

3.3.5.5 User Management


In the first generation of clouds, data regarding a user profile are mostly limited to login name, password, e-mail address, and credit card number. The implementation of a secure authentication environment like a PKI raises the amount of data as a direct consequence. This potentially enlarges the profile through certificates, stored public keys and signatures, and more personal or organizational data necessary for such security mechanisms.
A lot of operations in the cloud are based on user interactions that require credentials. Thus, how are profiles treated in the case of federation? One solution is to leverage a Single Sign-On (SSO) technology such as Shibboleth, which allows a single user credential to access and use various services across multiple layers [657]. In Shibboleth, special components like Identity Providers (IdPs) and Service Providers (SPs) are introduced to build an environment which is also able to federate with other SSO systems. A user or system account has to authenticate if he/she wants to use a service, and has to communicate with the SP. Then, the SP verifies the provided credential against information it gets from the IdP. If the user is authenticated, credentials are placed on his/her side; then the SP can be used to authenticate user interactions without disturbing the user. The process can also be realized between SPs and IdPs of federated partners. Moreover, the amount of data exchanged between the IdP, the SP, and the user can be configured, thus improving the security of the user data.

3.3.5.6 Verification
Verification is a security requirement for the identification of services in conjunction with service discovery [126]. Verification means the ability to decide if a service or application can trust others [350]. If it cannot, then interoperability will be hindered, because services are not able to prove their intentions as stated. Thus, the communication will be rejected due to security issues. In order to avoid this, verification through certificates or signatures is needed. Every service of a domain owning a certificate or being signed with a verifiable signature can be assumed to be trustable. For example, RDF or WSIL documents could be signed easily. Overall, verification introduces security into the service discovery process and into the interaction of communicating services. This strengthens interoperability, as the requester can assure itself of the authenticity of its partner.
3.3.5.7 Discussion
HTTP authentication does not provide a secure mechanism, so it is not applicable commercially. PKIs, especially hierarchical ones that use well-known and secure mechanisms, are recommended to achieve interoperability in the short run, and with the support of cross certification or SSO in the long run. X.509, as a standard certificate format used in almost all PKIs, also provides interoperability. Moreover, certificates can be used to verify more than just users: services and service entry points can be provided with certificates as well.
Only DAC has become widely accepted, while RBAC plays a smaller role. Interoperability between DAC and RBAC is not achievable due to their different underlying concepts. Even if RBAC provides several advantages compared to DAC, its adoption is not as common. This leads to interoperability through the broad adoption of ACL-based DAC for authorization.
Accounting/logging enables trust and compliance through the availability of event and operation traceability. Interoperability between logging facilities can be achieved through APIs providing information and hiding the concrete infrastructure. However, their presence is important to achieve the willingness to inter-operate. The functionalities exposed through APIs should follow ITIL guidelines, since these best practices are widely accepted.
The two topics of communication and data encryption are also pointed out in this section. However, data encryption lacks adoption within cloud computing. User management in federated scenarios is also an important topic, because personal data have to be stored, but not transmitted. If an operation needs to be billed, information will be needed by the partners. This process can be arranged through the utilization of federated SSO infrastructures. The operating principle of Shibboleth and the resulting possibilities were highlighted previously. The proposed approach also applies well to the authentication and authorization mechanisms, because of the usage of common strong security and web standards. A critical point is the difference from the underlying attribute-based authorization. Further research efforts are necessary at this point.
FIGURE 3.10
Service-Level Agreement Taxonomy.
Programmatic service discovery is another important feature for automated service delivery and consumption. Verification secures the communication between services by making the trustworthiness of their counterparts verifiable. This strengthens the idea behind interoperable systems. Therefore, verification mechanisms have to be adopted in the form of signing service discovery endpoints with verifiable certificates.

3.3.6 Service Level Agreement


Service Level Agreements (SLAs) are ubiquitous in modern IT systems, and thus they have to be discussed in terms of interoperability [126] [350] [606] [567]. Figure 3.10 shows four important topics, i.e., Architecture, Template Format, Monitoring, and SLA Objectives.

3.3.6.1 Architecture
The Web Service Agreement Specification (WS-A) is the standard for SLA management in web service environments, with a web service-based interface called the agreement layer [87]. While WS-A is XML-based and exposes a rich function set, a RESTful approach is preferred since it is minimal, with extensible function sets such as the Virtual Execution Environment Management Interface (VMI).
3.3.6.2 Template Format
Several template formats for SLAs were developed in past years, for example IBM’s Web Service Level Agreement (WSLA) or the SLA Definition Language (SLAng). One important property is that these template formats are machine-readable documents, i.e., they are electronic representations for the purpose of automated management. In first generation cloud systems, SLAs have predefined metrics and values for SLA Objectives (SLOs), and they are mostly not machine readable [513]. This results in a one-round agreement, where a customer can only accept or reject an agreement. The possibility to negotiate the metrics and values of service level objectives at runtime allows multiple-round agreements and is more dynamic.
One template format developed in an extensible way with such scenarios in mind is WS-A, which is an OGF standard. Similar to WSLA and SLAng, WS-A offers a language to describe an SLA template and all of its components, and a basic XML-based schema for automated management. Moreover, WS-A specifies a protocol for advertising the capabilities of service providers, creating agreements based on creational offers, and monitoring agreement compliance at runtime [87].
In order to achieve interoperability, it is necessary to agree upon a single template format. The possibility to process documents electronically is crucial for automated management, and extensibility to dynamic negotiation is also required. WS-A seems to be a good choice; however, cloud providers have not yet started to offer machine-readable SLAs. Deriving from the choice of XML as the description language for WSLA, SLAng, and WS-A, an XML-based template format seems to be in favor.
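To make the idea of a machine-readable, negotiable template concrete, the sketch below represents a highly simplified SLA as a plain data structure with per-SLO value ranges that leave room for negotiation. This is a schematic stand-in only; it does not follow the actual WS-Agreement, WSLA, or SLAng schemas, and the SLO names and numbers are invented.

```python
# Simplified, schematic SLA template (not a real WS-Agreement document).
sla_template = {
    "provider": "example-cloud",
    "slos": {
        "availability_percent": {"min": 99.0, "max": 99.95, "agreed": None},
        "response_time_ms":     {"min": 50,   "max": 500,   "agreed": None},
    },
}

def negotiate(template: dict, requested: dict) -> dict:
    """One negotiation round: accept each requested value if it lies in the offered range."""
    agreed = {}
    for name, want in requested.items():
        bounds = template["slos"][name]
        if bounds["min"] <= want <= bounds["max"]:
            bounds["agreed"] = want
            agreed[name] = want
    return agreed

print(negotiate(sla_template, {"availability_percent": 99.9, "response_time_ms": 200}))
```

Because the template is data rather than prose, a customer-side tool could compare two providers' offers automatically, which is exactly the capability that first generation, human-readable SLAs lack.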

3.3.6.3 Monitoring
Monitoring of SLAs is important for both cloud providers and customers. A provider wants to ensure that the provision of resources is according to the SLA and that no liability arises. On the other hand, a customer wants to ensure the adherence of cloud providers to the contract. A way to monitor SLAs is to use event-based dynamic monitoring infrastructures capable of associating low-level metrics with the Key Performance Indicators (KPIs) of SLOs. Thresholds can be determined dynamically and re-negotiated between parties that have already accepted them, in case of events forcing a change [199]. In terms of interoperability, a dangerous scenario arises when high static thresholds of SLOs cannot be satisfied by any cloud provider. The consequence is permanent SLA violations and the inability to move to other clouds, because the SLA requirements could not be met. Moreover, insufficient event monitoring by static threshold monitoring systems, which are not capable of re-negotiation, can strain interoperability. The result is possibly more critical SLA violations from shift delays and unavailability. Overall, an event-based dynamic monitoring system fits better with the dynamic nature of clouds, dynamic SLOs, and machine-readable and negotiable SLAs.
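A highly simplified sketch of such an event-based monitor is shown below: low-level metric samples are aggregated into a KPI, compared against a threshold that can be re-negotiated at runtime, and a violation event is emitted when the KPI drifts out of range. All names and numbers are illustrative.

```python
from statistics import mean

class SLOMonitor:
    """Maps low-level samples of one metric onto a KPI and checks a re-negotiable threshold."""

    def __init__(self, kpi_name: str, threshold: float):
        self.kpi_name = kpi_name
        self.threshold = threshold
        self.samples = []

    def record(self, value: float) -> None:
        self.samples.append(value)

    def renegotiate(self, new_threshold: float) -> None:
        # Triggered by an event forcing a change, agreed upon by both parties.
        self.threshold = new_threshold

    def check(self) -> bool:
        """Return True and emit an event if the KPI currently violates the SLO."""
        kpi = mean(self.samples[-10:]) if self.samples else 0.0
        violated = kpi > self.threshold
        if violated:
            print(f"SLA violation event: {self.kpi_name}={kpi:.1f} > {self.threshold}")
        return violated

mon = SLOMonitor("response_time_ms", threshold=200.0)
for sample in (120, 180, 260, 310):   # low-level metric samples
    mon.record(sample)
mon.check()                            # violation with the static threshold
mon.renegotiate(300.0)                 # dynamic re-negotiation relaxes the threshold
mon.check()                            # no longer violated
```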

3.3.6.4 SLA Objectives


SLA Objectives (SLOs) are the core component of an SLA, because they directly influence the interoperability of cloud systems. In a scenario where a customer wants to change his/her cloud provider, for example, the absence of SLOs can make it difficult to compare the old SLA with a new one. In order to achieve interoperability, dynamic negotiation and re-negotiation are favored [199] [513] [87]. This approach applies well to a modular and independently-extensible SLA management entity and the dynamic, extensible, machine-readable SLA template format previously discussed. For SLOs, dynamic negotiation means the participating parties communicate and negotiate SLOs, implied values, and related information, with only the relevant SLOs being of concern.

3.3.6.5 Discussion
APIs are crucial to interoperability for exposing SLA management functionalities. Furthermore, the separation of an SLA managing entity strengthens its interoperability through exchangeability and extensibility. In addition, a modular architecture that supports dynamic SLA negotiations avoids interoperability issues, since SLA terms can be negotiated during runtime. Assuming that machine-readable documents become common, a single accepted SLA template format is essential to achieve interoperable systems, since currently existing formats are not compatible. WS-A seems to be a good choice, due to its extensibility through XML, its orientation towards more complex negotiation processes, and its status as an OGF standard.
The danger to interoperability in SLA monitoring arises from monitoring systems that use static low-level thresholds to realize high-level SLOs. Depending on the monitoring system's behavior, not only can permanent or more critical SLA violations occur, but interoperability can also be hampered by unsatisfiable SLOs. Furthermore, dynamic monitoring systems are better suited to the dynamic nature of clouds and to other SLA-related mechanisms like dynamic SLO negotiation.
KPIs are well known and should be present in cloud computing. However, mechanisms for dynamic negotiation are still required. Interoperability of SLOs can be achieved through the negotiation of KPIs and all other relevant information, and through their aggregation into SLOs. Measurements can be controlled by a dynamic, threshold-based monitoring system that is encapsulated as an SLA service. This service in turn provides the necessary functionality in the form of an API to cloud providers and customers. Such a system would address all of these issues and achieve interoperability.

3.3.7 Other
Besides the mostly functional topics discussed in this taxonomy, other topics can have a major impact on interoperability as well. The topic of Consensus highlights the importance of conformity among initiatives, companies and communities. Some compliance issues that can potentially hamper interoperability are discussed in Regulation and Auditing Standards.

3.3.7.1 Consensus
As already pointed out in the previous sections, standard-compliant implementations and de facto standard tools play a decisive role in cloud computing. Interoperability can be achieved if standards are adopted and certain tools are accepted. Standards Development Organizations (SDOs) and companies try to achieve this by founding alliances and cooperating with other initiatives or companies. Open and academic communities are also part of these alliances. However, a broad consensus on standards and best practices between company- and community-driven projects is necessary to avoid the old battle of open source communities against closed source companies. The first generation of clouds lacked interoperability due to the absence of standards and best practices. The second generation should not fail in the same way; yet diverging business interests are resulting in a smaller number of cloud specifications for protocols and APIs that are still not interoperable.

3.3.7.2 Regulation and Auditing Standards


Regulations can also have a major influence on interoperability, because governmental laws or regulations could restrict or prohibit it. For example, storing data in the cloud may trigger various federal and state privacy and data security law requirements, such as the US Health Insurance Portability and Accountability Act and the EU Data Protection Directive [669]. Thus, privacy and data security laws present a significant compliance challenge for cloud providers.
Auditing standards can also play an important role for interoperability [350]. For example, ITIL provides an auditable catalog of best practices for IT Service Management (ITSM) [379]. However, in cloud computing, IT resources are no longer located solely in the users' own data centers. Therefore, whether such standards are followed is at the discretion of the cloud provider.
3.4 Related Work
Other publications focus on specific problems, such as requirements [567], security [469], or semantics [195], or mainly express general considerations [490] and use cases [239]. However, several publications analyze the cloud computing paradigm by establishing taxonomies [606] [588] [126]. This has the advantage of structuring the components and defining their borders, their comprehensiveness, and their influence on other topics.
Rimal et al. [606] focus on a high-level architectural view of cloud functionalities and capabilities. They use these criteria to compare a few cloud providers and their offerings. However, they do not focus on IaaS and interoperability, which makes their taxonomy more general. Moreover, they consider only existing cloud offerings, not current developments and standards. Prodan and Osterman [588] also use a taxonomy to summarize the elements of a cloud environment, but from a general perspective. This is reflected in their analysis of several cloud and web hosting providers, which are evaluated against their proposed taxonomy. Again, the difference to our work lies in our specialization in IaaS and interoperability, and in the analysis of different technologies.
On the other hand, Bernstein et al. [126] focus on intercloud communication by discussing multiple problems and providing solutions to them. The result of their work is a set of protocols that can be utilized to achieve intercloud communication. In their work, the use of open standards is essential in order to achieve interoperability. However, their approach focuses only on the communication part and does not consider current trends or other important topics, such as virtual appliances, security, storage and SLAs.
3.5 Conclusion and Future Work
This work presented a taxonomy of important categories and topics in the area of interoperability for Infrastructure as a Service (IaaS). First, a detailed explanation of cloud computing was given. Then, the term interoperability was defined and elaborated, and its benefits to the stakeholders, i.e., customers, developers and cloud providers, were explained.
The taxonomy itself spans many essential cloud computing topics, such as access mechanisms, virtual appliances (VAs), security, Service-Level Agreements (SLAs), and general recommendations. Moreover, a comprehensive scope of topics is considered and hot topics are mentioned. The depth and breadth of the categories vary, but the taxonomy outlines a detailed picture of the significant characteristics. However, at several points in this chapter a lack of current solutions was identified; thus, further research is warranted.
As for future work, VA life cycle management, network addressing issues and SLAs need to be studied in more detail. Moreover, the use of the Information Technology Infrastructure Library (ITIL) to provide interoperability is an interesting area to explore. Finally, further work is needed to compare existing cloud standards and offerings with the topics and criteria presented in this taxonomy.

A Taxonomy Study on Cloud Computing Systems and Technologies


Christian Baun
Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Germany
Marcel Kunze
Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Germany

CONTENTS
4.1 Deployment Models
4.1.1 Public Cloud
4.1.2 Private Cloud
4.1.3 Hybrid Cloud
4.2 Delivery Models
4.2.1 Infrastructure as a Service
4.2.1.1 High Performance Computing as a Service
4.2.2 Platform as a Service
4.2.3 Software as a Service
4.2.3.1 Cloud Gaming
4.2.3.2 Cloud Print
4.2.3.3 Cloud OS
4.2.4 Landscape as a Service
4.2.5 Humans as a Service
4.3 Cloud Resource Management
4.3.1 Command-Line Tools
4.3.2 Locally Installed Management Applications with GUI
4.3.3 Browser Extensions
4.3.4 Online Services
4.4 Conclusion
Building on compute and storage virtualization, and leveraging the modern web, cloud computing provides scalable, network-centric, abstracted IT infrastructures, platforms, and applications as on-demand services that are billed by consumption [121]. The key advantages of cloud computing are flexibility, scalability, and usability. These features originate in the combination of virtualization technologies with flexible and scalable web services, and lead to cost savings [97].
There are various deployment and delivery models for cloud services. This chapter gives an overview of the most popular cloud services and categorizes them according to their organizational and technical realization. Afterwards, the existing services, applications and tools that customers can use to work with cloud infrastructure and storage services are discussed.
4.1 Deployment Models
The existing cloud computing services differ with respect to organization and quality of the offering. Three different
deployment models of clouds exist: Public clouds, private clouds, and hybrid clouds.

4.1.1 Public Cloud


In a public cloud, the service provider and the service consumer belong to different organizations. Public clouds always implement commercial business models, and only the actual resource usage is billed. Although public clouds offer very good value due to their supply-side economies of scale, there is the concern that sensitive data might leave the customer's environment. Another concern is the possibility of a vendor lock-in [456]: It might be difficult to quickly access and transfer the data in case of an urgent need. A possible solution here is to use independent cloud services that are compatible in terms of APIs and features.
There are popular success stories of companies that used public cloud services to complete complex tasks in a short
time at a low budget, such as The New York Times12 and Animoto3.
Back in 2007 The New York Times processed 4 TB of data containing their scanned articles from 1851 until the
1980s. The task was to convert the scans of 11 million news stories into PDF format for online distribution. One hundred virtual Linux servers running in the Amazon public cloud needed only 24 hours to convert all the images, at an expense of 500 dollars.
Animoto offers an online service for the creation of videos out of a repository of pictures and music. In April 2008
the service was referred to on Facebook and immediately the number of active users increased from 25,000 to about
250,000 within three days. At peak time, within a single hour, up to 20,000 people signed up and tried to render a
video with the Animoto service. Thanks to the elasticity of the cloud services the company was able to ramp up the
number of (virtual) servers to 450 in a short time, following the demand.4 Another example of a heavy user of public
cloud services is the social network game developer Zynga5. The company develops browser-based games that are
hosted on 12,000 virtual servers.6

4.1.2 Private Cloud


The services of a private cloud are always operated by the same organization the consumer belongs to. The motivation for building and running a private cloud may be security [610] and privacy concerns. However, it is difficult to reach the economies of scale and the availability and security level of a certified professional public cloud service provider. Many private cloud solutions for building infrastructure, platform and storage services are open source, but solutions that are licensed under proprietary terms also exist.
Two examples that enable the customers to integrate public cloud resources in the form of a virtual private cloud
into their local IT infrastructure are the Amazon Virtual Private Cloud (VPC)7 which is a part of the Amazon Web
Services (AWS)8 and the Google Secure Data Connector9 for the Google App Engine10.

FIGURE 4.1
Amazon Virtual Private Cloud (VPC) ©2010 Amazon.
4.1.3 Hybrid Cloud
In a hybrid cloud, services of public and private clouds are combined. The public cloud services in a hybrid cloud
can be leveraged to satisfy peak loads or to spread redundant data inside the cloud to achieve high availability. A
special instantiation of the hybrid cloud is the so-called virtual private cloud where resources of a service provider
are transparently mapped into a customer network by use of a network tunnel.
4.2 Delivery Models
Besides the different deployment models, four different service delivery models exist to characterize service
instances. Figure 4.2 gives an overview on the different cloud service groups. It includes popular public cloud
service offerings as well as private cloud solutions that are Open Source.
FIGURE 4.2
Different Public and Private Cloud Services.

4.2.1 Infrastructure as a Service


Infrastructure as a Service (IaaS) allows hosting virtual server instances without the need to physically access the hardware. The clients have full administrative privileges inside their server instances and are also allowed to define their own level of networking security. The number and type of supported operating systems inside an IaaS depend on the service provider. IaaS in principle allows a complete data center to be virtualized and transferred into the cloud.
The most popular public cloud IaaS offering is the Amazon Elastic Compute Cloud (EC2)11. Popular competitors are GoGrid12, Joyent13, FlexiScale14 and Rackspace Cloud15. Open source IaaS solutions that are popular in industry and science projects are Eucalyptus16, Nimbus17 and OpenNebula18. Two proprietary private cloud IaaS solutions are the Elastic Computing Platform from Enomaly19 and the rather new Nimbula Director, released as a beta in December 2010 by the start-up company Nimbula20, which was founded by a team of ex-Amazon developers who built EC2. Nimbula Director is an IaaS solution that works with the Xen hypervisor [115] and KVM [436]. The solution uses its own Nimbula API and has no compatibility with EC2 or other cloud services from the AWS. EC2 enables the customers to run virtual Linux, Windows Server and OpenSolaris instances inside Amazon's data centers based on Xen hypervisor technology. While the user data inside an instance is lost after its termination, customers may save important data inside persistent Elastic Block Store (EBS)21 volumes.
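As a brief illustration of persistent EBS volumes, the following sketch uses the Python library boto (discussed later in this chapter) to create a volume and attach it to a running instance; the credentials, instance ID and device name are placeholders.

    import boto

    conn = boto.connect_ec2(aws_access_key_id="AKIA...",          # placeholder
                            aws_secret_access_key="SECRET...")    # placeholder

    volume = conn.create_volume(10, "us-east-1a")  # 10 GB volume in one zone
    conn.attach_volume(volume.id, "i-12345678", "/dev/sdf")  # hypothetical instance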
The Amazon Simple Storage Service (S3)22 is a storage service for web objects. It allows storing virtual machine images and provides easily handled, web-based storage for applications. The functionality of EC2 and its API may be considered a de facto standard, and a lot of the existing open source solutions try to be more or less compatible with the EC2 API (see Table 4.1). Eucalyptus and Nimbus both include dedicated S3-compatible storage services, called Walrus for Eucalyptus and Cumulus for Nimbus.
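Because these services share the S3 interface, the same client code can address either Amazon S3 or a private Walrus or Cumulus installation. The following boto sketch is a minimal example; the endpoint host, port, service path and bucket name are illustrative assumptions.

    from boto.s3.connection import S3Connection, OrdinaryCallingFormat

    conn = S3Connection(aws_access_key_id="AKIA...",           # placeholder
                        aws_secret_access_key="SECRET...",     # placeholder
                        is_secure=False,
                        host="walrus.example.org",             # assumed endpoint
                        port=8773,
                        path="/services/Walrus",
                        calling_format=OrdinaryCallingFormat())

    bucket = conn.create_bucket("machine-images")
    key = bucket.new_key("my-vm-image.img")
    key.set_contents_from_filename("my-vm-image.img")  # upload a local file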
Each instance of an IaaS obtains an internal and an external DNS name that are created dynamically during startup.
Both names are lost when an instance is terminated. In order to realize a permanently available service, it is possible
to create elastic IPs and assign them to the instances. An elastic IP can be remapped to another instance at any time.
The elastic IP concept is provided by all public and private cloud IaaS offerings and solutions.
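A minimal boto sketch of the elastic IP concept follows: an address is allocated once and can then be remapped between instances at any time; the credentials and instance IDs are hypothetical.

    import boto

    conn = boto.connect_ec2(aws_access_key_id="AKIA...",         # placeholder
                            aws_secret_access_key="SECRET...")   # placeholder

    address = conn.allocate_address()                         # obtain an elastic IP
    conn.associate_address("i-11111111", address.public_ip)   # bind to instance A
    conn.disassociate_address(address.public_ip)
    conn.associate_address("i-22222222", address.public_ip)    # remap to instance B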
TABLE 4.1
Private Cloud Solutions that are Open Source and Their Level of Compatibility to the Amazon Web Services (AWS APIs Implemented)
Name | EC2 | S3 | EBS
Abiquo (abiCloud) | — | — | —
CloudStack | subset | — | —
Enomaly ECP | — | — | —
Eucalyptus | full support | full support | full support
Nimbus | subset | subset | —
OpenECP | — | — | —
OpenNebula | subset | — | —
Tashi | under dev. | — | —

Instances can be launched on the basis of different machine images and instance types. The images represent
templates of different systems like Windows, Linux etc. The instance types define the number of cores, architecture,
main memory and storage provided, I/O performance and thus the price per hour. With the different instance types
and images, the customers of an IaaS have the freedom to create an arbitrary IT infrastructure on-demand and scale
it in an elastic way.
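For illustration, the following boto sketch launches instances from a machine image with a chosen instance type; the AMI ID and key pair name are hypothetical.

    import boto

    conn = boto.connect_ec2(aws_access_key_id="AKIA...",         # placeholder
                            aws_secret_access_key="SECRET...")   # placeholder

    reservation = conn.run_instances("ami-12345678",   # machine image (template)
                                     min_count=1,
                                     max_count=3,       # scale out elastically
                                     instance_type="m1.large",
                                     key_name="my-keypair")
    for instance in reservation.instances:
        print(instance.id, instance.state)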
The underlying infrastructures and components of public cloud services usually remain opaque to the users. As an example of a private cloud infrastructure service, Figure 4.3 shows the structure and components of a Eucalyptus IaaS. The main components are the Cloud Controller (CLC), Cluster Controller (CC) and Node Controller (NC), plus the storage services Walrus and Storage Controller (SC) [546] [547]. The NC runs, together with a Xen hypervisor or KVM, on every node in the cloud and provides information about free resources to the CC. The CC needs to run at every site of the cloud; it schedules the distribution of virtual machines to the NCs and collects (free) resource information. The CLC runs exactly once inside the private cloud and collects resource information from the CCs. The CLC also operates as a metascheduler in the cloud and controls the database with the user and resource information. Figure 4.3 depicts this structure, including the CLC, CC, NCs, Walrus and SC.

4.2.1.1 High Performance Computing as a Service


A more specialized instance of IaaS is High Performance Computing as a Service (HPCaaS). An important feature of high performance computing is the need for networking with low latency and high bandwidth. To achieve this goal, the machine instances need to be located physically close to each other. The feature of grouping nodes is
supported by OpenNebula. Several cloud service providers offer solutions for HPCaaS. Examples are the Amazon
Cluster Compute Instances23, Gridcore Gompute24, Penguin Computing on Demand25, Sabalcore HPC on
Demand26 and UNIVA UD UniCloud27.
FIGURE 4.3
Structure of a Eucalyptus Private Cloud IaaS.

4.2.2 Platform as a Service


A Platform as a Service (PaaS) is a scalable runtime environment for developers to host web applications. PaaS represents a development and execution environment supporting a single or a few programming languages. The target audience is software developers, as well as end users who are often able to consume the resulting services in a corresponding marketplace. Because the infrastructure is abstracted away, a PaaS intrinsically allows scaling from a single server to many. The end user has no need to care about operating system maintenance and installation of application software packages. There is almost no administrative overhead in the process of service delivery.
Popular public cloud PaaS offerings are Google App Engine28, Bungee Connect29, Zoho Creator30 and Windows Azure31. The Google App Engine allows running web applications written in Python or Java inside Google's data centers. Bungee Connect has its own C-style programming language called Bungee Sky, and Zoho Creator is focused on the creation of online database applications with HTML forms. While PaaS applications usually run in a sandboxed environment, it is an outstanding feature of Windows Azure that it also grants access to the underlying infrastructure.
It is possible to implement and host a private cloud PaaS with, e.g., AppScale3233 [185] [147] and typhoonAE34. Both solutions are compatible with the Google App Engine API. To provide the same functionality and API, these private clouds both use the publicly available Google development server as a fundamental platform. It is a limitation of all PaaS offerings that the users are bound to the given compiler versions and features of the environment.
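As a small illustration of the programming model, the following Python request handler uses the webapp framework of the App Engine SDK of that era; under the stated assumption of API compatibility, the same handler could also be deployed on AppScale or typhoonAE.

    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            # Answer HTTP GET requests with a plain-text greeting.
            self.response.headers["Content-Type"] = "text/plain"
            self.response.out.write("Hello from the platform!")

    application = webapp.WSGIApplication([("/", MainPage)], debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()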

4.2.3 Software as a Service


In a Software as a Service (SaaS) environment, software or applications are operated by a provider and can be consumed as a utility. In general, no software has to be installed at the local site, and the services are accessed by use of a web browser. However, a trust relationship with the provider regarding privacy and availability of the service is required, because not only the application itself but also the data is stored at the provider's site. Popular SaaS offerings are Google Apps35, Salesforce.com36, Gliffy37, Clarizen38, Zoho39, SlideRocket40, Adobe Photoshop Express41 and Apple iWork.com42. For running a private SaaS, Sugar43 and Zimbra44 are two of the existing solutions.
The SaaS category includes not only simple web applications, but also more complex services such as Cloud
Printing, Cloud Gaming or Cloud Operating Systems.
4.2.3.1 Cloud Gaming
Cloud-based gaming services enable users to play 3D video games without the need to operate specific hardware or operating systems locally. Players are furthermore relieved of installing and registering the games. The advantages for game developers and publishers are the increased number of potential customers and the fact that it is impossible to make or use pirated copies of games with a cloud gaming service. The games run on the infrastructure at the provider's site. The graphics and sound output is streamed to the user's device, while input commands are sent back to the provider and interpreted there. To use these services, a broadband Internet connection with low latency is necessary. Popular cloud gaming service providers are OnLive45, Gaikai46 and Otoy47.

4.2.3.2 Cloud Print


Cloud Print48 is a part of Google's new ChromeOS49 strategy and a new concept for accessing printers from any application and from any device with any operating system, without the need to install any driver. Such a service is important due to the growing number of portable networked devices like smart phones and tablet computers. These lack drivers for printers and sometimes even a printing subsystem. The idea of Google Cloud Print is to send the print jobs to a service that processes the jobs and redirects them to a printer.
The printer can be a network printer compatible with this service, or a legacy printer that requires proxy software running on the host it is connected to. Cloud Print is an open standard intended to remove the need to collect and maintain print drivers. HP has announced that it will support Cloud Print and offers a series of inexpensive, compatible printers.

4.2.3.3 Cloud OS
Cloud operating systems or web desktops are browser based operating systems that provide a virtual desktop
running in a web browser. All applications, user data, and configuration are stored by a provider, and the computing
takes place remotely. Examples for well-known cloud operating systems are eyeOS50, Icloud51, Online OS52 and
Cofio53.
It is a major advantage of these services that a desktop in the cloud can be accessed from anywhere in the world by use of a web browser. There is no need to fear security issues like malware and viruses. The availability may be better compared to personal computers administrated by the users themselves. Drawbacks are the speed of the user interface, which is limited by the network connection and the web technologies in use. Cloud operating systems also do not provide the same level of rich features as traditional desktop systems.

4.2.4 Landscape as a Service


A special use case of SaaS is Landscape as a Service (LaaS). The aim of such a service is to provide complex, non-multitenant software like SAP R/3. LaaS makes an entire SAP system landscape available as a set of services. Offers like these enable companies to outsource their complete datacenter, including hardware, software and maintenance. Providers of LaaS are companies like Fluid Operations54 or Zimory55 together with T-Systems.

4.2.5 Humans as a Service


Humans as a Service (HuaaS) follows the principle of crowdsourcing by using the intelligence of the crowd in the web. Whenever a computer is not well suited to fulfill the task at hand and time is a problem, crowdsourcing may be a good solution. Large amounts of human workforce can be leveraged for a small amount of money.
The most popular marketplace for HuaaS is the Amazon Mechanical Turk56 (see Figure 4.4). Typical tasks that are
offered here are translation jobs, ratings, and matching products with product categories.
4.3 Cloud Resource Management
To interact with cloud services that provide SaaS and HuaaS, the users need nothing more than an ordinary browser. The management of cloud platform services like the Google App Engine or compatible private cloud solutions is done with the tools the provider of these services makes available to the users. The existing cloud infrastructure and storage service management tools can be classified into four categories:

FIGURE 4.4
Amazon Mechanical Turk.
• Command-line tools like the AWS tools delivered by Amazon, the Euca2ools57 of Eucalyptus, GSutil58 of Google
Storage and s3cmd59
• Locally installed management applications with graphical user interface like EC2Dream60, Gladinet Cloud
Desktop61 and Cyberduck62
• Firefox browser extensions like ElasticFox63, Hybridfox64 and S3Fox65
• Online services such as the AWS Management Console66 offered by Amazon, the Google Storage Manager67 and
Ylastic68
A trade-off analysis of the available cloud management solutions is shown in Tables 4.2, 4.3, 4.4 and 4.5.

4.3.1 Command-Line Tools


The command-line AWS tools offered by Amazon only support their own public cloud offerings. The AWS tools
are a collection of a large number of command-line applications (e.g., ec2-create-volume, ec2-delete-keypair, ec2-
describe-images, ec2-start-instances,…) that are written in Java and enable the customers to access the whole
functionality of services like EC2, EBS and ELB that are part of the AWS. The Euca2ools of the Eucalyptus project
and GSutil of Google Storage support both public and private cloud infrastructure services.
The Euca2ools are inspired by the AWS tools. These command-line applications have a naming scheme similar to the AWS tools (e.g., euca-create-volume, euca-delete-keypair, euca-describe-images, euca-run-instances,…) and largely accept the same options and environment variables. The implementation is done in Python using the library boto69, which is an interface to cloud services that are interface-compatible with the AWS. GSutil is also implemented with Python and boto.
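To illustrate this interface compatibility, the sketch below points boto's EC2 client at a Eucalyptus front end instead of Amazon; the endpoint host name is an assumption, while port 8773 and the /services/Eucalyptus path are the usual Eucalyptus defaults.

    import boto
    from boto.ec2.regioninfo import RegionInfo

    region = RegionInfo(name="eucalyptus", endpoint="cloud.example.org")  # assumed host
    conn = boto.connect_ec2(aws_access_key_id="ACCESS...",       # placeholder
                            aws_secret_access_key="SECRET...",   # placeholder
                            is_secure=False,
                            region=region,
                            port=8773,
                            path="/services/Eucalyptus")

    for image in conn.get_all_images():
        print(image.id, image.location)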
TABLE 4.2
Command-Line Tools to Interact with Cloud Services
Criterion | Euca2ools | AWS tools | s3cmd | GSutil
License | BSD | Apache v2.0 | GPLv2 | Apache v2.0
Charge | none | none | none | none
Support for EC2 | yes | yes | no | no
Support for S3 | no | no | yes | yes
Support for EBS | yes | yes | no | no
Support for ELB | no | yes | no | no
Eucalyptus | yes | yes | yes | no
Walrus | no | no | yes | yes
Nimbus | yes | yes | no | no
Cumulus | no | no | yes | no
OpenNebula | yes | yes | no | no
CloudStack | no | no | no | no
Google Storage | no | no | no | yes

The command-line tool s3cmd works with S3, as well as with the S3-compatible private cloud storage services Walrus and Cumulus. An advantage of s3cmd is that it is an easy-to-use command-line interface for S3. With just a few options, all important functions of S3 and of interface-compatible cloud storage services can be used by the customers.
A drawback of all command-line tools is that they require a local installation and therefore may lack ease of use, because there is no graphical user interface. The Euca2ools and GSutil both require a Python runtime environment. The AWS tools need the Java Runtime Environment to be installed.

4.3.2 Locally Installed Management Applications with GUI


EC2Dream is a management application for EC2, EBS, RDS and Eucalyptus. The software is open source and runs on Windows, Mac OS X and Linux. Cyberduck is an open source client for various file transfer protocols and cloud storage services. The application runs on Windows and Mac OS X, and it supports directory synchronization and multiple languages. The proprietary application Gladinet Cloud Desktop also supports various cloud storage services and integrates them into the Windows Explorer interface.
TABLE 4.3
Locally Installed Management Applications with Graphical User Interface to Interact with Cloud Services
Criterion | EC2Dream | Cloud Desktop | Cyberduck
License | Apache v2.0 | proprietary | GPLv3
Charge | none | $59.99/year | none
Support for EC2 | yes | no | no
Support for S3 | no | yes | yes
Support for EBS | yes | no | no
Support for ELB | yes | no | no
Eucalyptus | yes | no | no
Walrus | no | no | yes
Nimbus | no | no | no
Cumulus | no | no | no
OpenNebula | no | no | no
CloudStack | no | no | no
Google Storage | no | yes | yes


4.3.3 Browser Extensions
The existing browser extensions only work with the Firefox browser and not with other popular browsers like Internet Explorer, Google Chrome, Safari or Opera. The customers need to install and maintain the extension on their local computer, a fact that does not reflect the cloud paradigm very well. Recent versions of ElasticFox and Hybridfox can interact with EC2, the storage service EBS, and Eucalyptus infrastructure services. The extension S3Fox (see Figure 4.5) is only capable of working with S3.
TABLE 4.4
Browser Extensions to Interact with Cloud Services
Criterion | ElasticFox | Hybridfox | S3Fox
License | Apache v2.0 | Apache v2.0 | proprietary
Charge | none | none | none
Support for EC2 | yes | yes | no
Support for S3 | no | no | yes
Support for EBS | yes | yes | no
Support for ELB | no | no | no
Eucalyptus | yes | yes | no
Walrus | no | no | no
Nimbus | no | no | no
Cumulus | no | no | no
OpenNebula | no | no | no
CloudStack | no | no | no
Google Storage | no | no | no

4.3.4 Online Services


When using online services, the customers have the freedom to use any operating system and a wide range of browsers. However, Amazon's AWS Management Console (see Figure 4.6) works only with Amazon's own cloud service offerings. It is impossible for the users to configure it in a way to manage other cloud infrastructure or storage services. The same holds for the Google Storage Manager (see Figure 4.7), which cannot be configured to work with any cloud service other than Google Storage.
Ylastic is a commercial service from a company that offers just a management tool and no cloud infrastructures or storage services. Ylastic offers support for most AWS cloud services and Eucalyptus infrastructures, but it is not possible to manage other compatible infrastructures, e.g., Nimbus or OpenNebula. As the access keys are stored with the provider of the management service, the customer also has to trust the provider of the management tool regarding privacy and security.
KOALA and the AWS Management Console are compatible with any up-to-date browser. The Google Storage Manager and Ylastic cannot be used with Internet Explorer.
The software service KOALA70 (Karlsruhe Open Application (for) cLoud Administration) is designed to manage all cloud services that are compatible with the AWS API (see Figure 4.8). KOALA71 has been developed in the framework of the Open Cirrus™ cloud computing testbed that is sponsored by HP, Intel and Yahoo! [160]. The service supports not only the public cloud services EC2, S3, EBS and Google Storage, but also private cloud services based on Eucalyptus, Nimbus or OpenNebula; Walrus and the Storage Controller from the Eucalyptus project can likewise be used with this management solution. In addition, it is possible to create and manage elastic load balancers with Amazon's Elastic Load Balancing (ELB) service.
FIGURE 4.5
S3Fox Browser Extension.
KOALA itself is a software service developed for the Google App Engine. Besides public hosting, it can also be run inside private cloud platforms based on AppScale or typhoonAE. Operating KOALA inside a private cloud even allows hosting the management tool inside the same cloud that it manages. A customized version of KOALA for Android and iPhone/iPod touch is also included, which makes it easy to control cloud infrastructures and storage services with mobile devices.
A collection of alternative management solutions is optimized for mobile phones and similar devices and runs natively on, e.g., the Apple or Android platforms: Ylastic's iPhone and Android apps, the iAWSManeger72, Elasticpod73 and directThought74.
FIGURE 4.6
AWS Management Console.
4.4 Conclusion
The growing number of public and private cloud services, as well as tools to work with these services keep the
cloud market in a dynamic condition. Big players like Amazon and Google as well as the open source projects and
start-up companies like Cloudera75 with their Hadoop distribution and Nimbula continuously improve their
solutions, products and tools. This dynamic situation leads to new interesting services like the new Amazon Cluster
GPU instances that are equipped with two Quad-Core Intel Xeon-X5570 Nehalem CPUs and include two Nvidia
Tesla-M2050 GPUs each. Such new services give more opportunities to the customers to use IT resources in an
elastic and favorable way without the need to buy and run their own hardware.

FIGURE 4.7
Google Storage Manager.
TABLE 4.5
Online Services to Interact with Cloud Services
Criterion | KOALA | AWS Console | Ylastic | Storage Manager
License | Apache v2.0 | proprietary | proprietary | proprietary
Charge | none | none | $25/month | none
Support for EC2 | yes | yes | yes | no
Support for S3 | yes | yes | yes | no
Support for EBS | yes | yes | yes | no
Support for ELB | yes | yes | yes | no
Eucalyptus | yes | no | yes | no
Walrus | yes | no | yes | no
Nimbus | yes | no | no | no
Cumulus | no | no | no | no
OpenNebula | yes | no | no | no
CloudStack | no | no | no | no
Google Storage | yes | no | no | yes