
CITTIO Cloud Monitoring Solution - Position Paper

• Project Zeppelin

• WatchTower v4.0
• Cloud Monitoring Architecture


667 Mission Street, Fourth Floor

San Francisco, CA 94105

©2009 CITTIO All Rights Reserved

Industry Evolution
Cloud computing has established itself as one of the most promising technological shifts, one that is about to
change how IT is built, delivered and managed. It encompasses both the technology subsystems and the delivery of
remote compute infrastructure and application services over the network, whether public or private. The
phenomenon is driven by the twin forces of scale and the automation needed to manage that scale functionally
and economically.

In 2008, well-publicized efforts such as Amazon Elastic Compute Cloud (EC2) were followed by
announcements from more traditional players, including IT vendors IBM, HP, Oracle and Citrix, as well as
communication service providers like AT&T and Deutsche Telekom. Now corporate IT is close behind – particularly
those that have built grid architectures in order to benefit from the scale and flexibility of running applications
over large server clusters.

There are a few key factors that triggered the evolutionary cycle to cloud computing.

One was the advent and standardization of virtualization as an enterprise-scale technology over the past few
years. Virtualization provides the fabric to dynamically provision, allocate and manage multiple, heterogeneous
machine instances in a single piece of hardware – dramatically improving the utilization and flexibility of data
center infrastructure. As the workload varies, resource schedulers can trigger the “motion” of virtual machines to
operate in a more appropriate resource environment.

The second was the development of scale-out technologies. Parallelization subsystems like Hadoop can spawn
new processes on additional CPUs without manual intervention. Automatically distributing data and processing
over commonly available server clusters based on increasing workload allows multiple physical systems to act like
one instance.

The third was the maturation of web services standards as the established medium of communication between
applications and infrastructure systems. Because web services are based on open standards, open source
contributions have had a significant influence on the cloud ecosystem. As a result, cloud technologies are more
open and standards-based than the siloed, proprietary architectures of the past.

All of the scale technologies described above make application deployment simpler, faster and on-demand,
because they abstract away the complexity of the underlying physical infrastructure. In essence, computing
infrastructure is moving from a fixed resource on a physical machine instance to an ‘elastic’ resource spread over
one or more machines that can grow and contract based on demand.

That said, there is still a long way to go for mainstream adoption of clouds by enterprises as they wait for the
proven availability, reliability and strong SLA coverage of cloud services – to the level delivered by traditional
hosting and network service providers.

Cloud Computing in the Enterprise

While the changes in application and infrastructure architectures promise to be dramatic, they bring new risks
to the command and control structures in today’s IT operations. Enterprises value the differentiated business
technology processes, data and skills that enable them to stay ahead of their competition. Security and
compliance are thus critical from both a self-preservation and regulatory perspective.

New technologies like virtualization bring a new level of risk, both from unknown security vulnerabilities and
from the limited experience in delivering the ‘five-nines’ reliability that today’s business demands.
However, as virtualization matures, more enterprise organizations are ready to embrace it. The economic and
technological drivers are too strong to have it any other way.

The next frontier for enterprise IT evolution is the adoption of cloud services and technologies. The business
drivers for cloud adoption are just as compelling, if not more so.

For small enterprises, farming out internal applications to the cloud will eliminate the cost and resources required
to buy, build, maintain and manage self-owned infrastructure. For generic business processes, sourcing entire
applications from the cloud as SaaS (software as a service) is also a viable option.

For large enterprises that have the scale of applications, infrastructure and end users, building private clouds
that benefit from the latest technology underpinnings is worth exploring. Alternatively, they could source services
at different points in the stack, including infrastructure, platform middleware and entire applications, from
multiple third-party cloud operators. For example, if a billing application hits peak load at the end
of every month, the extra compute and storage capacity could be sourced from a remote cloud operator. This is
known as ‘cloud-bursting’. Or whole applications like sales automation or intranet portals could be run on remote
cloud infrastructure, much in the manner of a ‘cloud farm-out’.

Technology Challenges of Monitoring Cloud Infrastructure

However, for all of the promises of cloud computing – rapid deployment, higher flexibility, massive scalability and
lower capital expenditure – much of the progress depends on the evolution of next-generation
management technology.

• Most system management solutions continue to rely either on proprietary agent technology or SNMP
(Simple Network Management Protocol) for their performance metric collection today.

The first issue with these technologies is that they were designed for on-premise monitoring and lack
the ability to transfer data securely over public networks. New mechanisms are therefore needed to
monitor cloud application infrastructure remotely. Because application resources can contract and expand,
the mapping of applications to their resource dependencies also needs to be traced as it changes over
time. These changes can span multiple data centers and/or providers.

The second issue is that proprietary agents require complete control of the managed environment, create
vendor lock-in and often add resource, cost and management overheads that make them unfit for cloud
monitoring. By choosing the most relevant, widely adopted standards-based technologies for monitoring
and management, customers can accelerate their deployments, limit their risk exposure and keep
their options open as the market matures.

SNMP presents severe limitations as well. As a simple protocol with a primary focus on managing network
equipment like routers, SNMP does not capture the complexity of today’s systems, their layered
architecture, inter-relationships and diversity (e.g. hardware versus software). Further, SNMP is both a
management information representation and a protocol, and must adhere to strict definitions, which
severely limits its extensibility. Present-day systems management, on the other hand, calls for flexible
object representation and web service accessibility to harness the true potential of these systems.

Management specifications such as the DMTF-sponsored WBEM/CIM-XML standard (Web Based
Enterprise Management and the Common Information Model) and WS-Man (Web Services Management)
therefore become attractive options for cloud monitoring as they are easily deployable on public and
private clouds, capture more complex structures and are inherently Internet-friendly.

• New metrics will need to be created to accurately monitor cloud characteristics of elasticity and resource
availability. To best deliver elasticity, cloud computing infrastructures leverage virtualized systems, server
and network abstraction as well as geographic load balancing and dynamic provisioning. In this context,
notions of consumption of ‘fixed’ resources may not always apply as capacity becomes variable - or elastic
- in the new paradigm, potentially rendering traditional metrics unfit for cloud computing. Given the lack
of control of the underlying environment by the user, it will be important to measure the “availability to
promise” (ATP) of the resources assigned by the cloud system as well as the consistency of that availability
over time. In essence, the ATP measures the ratio between “contracted for” resources (CPU cycles,
storage, bandwidth, etc.) and what has actually been delivered.

• Likewise, new distributed computing subsystems that manage parallelization across tens to hundreds of
machine instances have no standard monitoring instrumentation to speak of. This is the case with Hadoop
for instance. New instrumentation needs to be developed to provide operational visibility and
troubleshooting capability for such critical components.

• Given the lack of appropriate instrumentation, managing and metering usage and SLA delivery at both the
cloud operator and application user ends is still primitive or non-existent. The SLAs defined by cloud
operators today, if any, mostly depend on basic home-grown management applications and can be neither
audited nor verified by independent monitoring. This is little comfort for enterprise managers and a
big barrier to cloud adoption.
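
The “availability to promise” measure described in the list above can be illustrated with a short sketch. It assumes that ATP is computed as delivered capacity divided by contracted capacity, and that consistency is expressed as the coefficient of variation of that ratio. Both choices, the function names and the sample figures are illustrative assumptions, not definitions taken from Project Zeppelin.

```python
from statistics import mean, pstdev

def availability_to_promise(delivered, contracted):
    """Return the delivered/contracted ratio for each sample (assumed ATP definition)."""
    if contracted <= 0:
        raise ValueError("contracted capacity must be positive")
    return [d / contracted for d in delivered]

def atp_summary(delivered, contracted):
    """Summarize ATP over time: average, worst sample, and relative variation."""
    ratios = availability_to_promise(delivered, contracted)
    avg = mean(ratios)
    # Coefficient of variation: a lower value means more consistent delivery.
    variation = pstdev(ratios) / avg if avg else float("inf")
    return {"mean_atp": avg, "worst_atp": min(ratios), "variation": variation}

# Hypothetical samples: measured CPU capacity in MHz against a 2000 MHz contract.
samples_mhz = [2000, 1900, 1500, 2000]
summary = atp_summary(samples_mhz, contracted=2000)
print(summary)
```

A mean close to 1.0 with low variation would indicate that the operator delivers what was contracted, consistently; a low worst-case value points to intermittent shortfalls that an average alone would hide.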

What this all means is that cloud monitoring and management will require a dramatic evolution in the capability of
existing network and systems management vendors. CITTIO intends to support this new and disruptive
transformation of IT management by launching Project Zeppelin, an open source initiative, as a first step to
catalyze management standardization around Cloud Computing.

CITTIO Cloud Monitoring Solution - Step 1: Introducing Project Zeppelin

Project Zeppelin’s goal is to provide a consistent, industry standard way to discover, monitor, evaluate and audit
the performance of cloud infrastructure and applications across disparate cloud operators. Project Zeppelin
includes a set of agents that provides detailed asset, performance, auditing, benchmarking and usage metering
information for cloud infrastructures. It can be easily deployed remotely, yet its data can be securely accessed
across the public Internet.

Instead of building proprietary instrumentation that only has applicability to limited cloud technology domains,
CITTIO has opted to promote an open source initiative that can benefit from the power of the community. CITTIO
has undertaken considerable internal design and development efforts to contribute a well planned architecture
and technology for cloud monitoring that is rooted in the use of next-generation protocols including WBEM/CIM-
XML and WS-Management.

CITTIO expects that the open source development approach will spur the community discussion and contribution
that is necessary to meet the challenges of a rapidly evolving cloud infrastructure technology and vast variability in
platforms and services. CITTIO will continue to make original contributions in key areas and participate actively in
providing thought leadership and direction to the project.

Key features of Project Zeppelin

Project Zeppelin offers the following key features aimed at monitoring and managing the broad range of cloud
infrastructure and technologies –

Standard Linux Management Instrumentation

Since most cloud infrastructures are built upon Linux as an underlying operating system and since many cloud
hosters choose Linux as their OS for their machine instances, Linux instrumentation is critical. Project Zeppelin
leverages the well known SBLIM (pronounced “sublime”) project for building in advanced Linux instrumentation.

Virtualization Framework Monitoring

Virtualization is a foundational element of a cloud infrastructure, allowing for rapid provisioning, easy scaling and
mobility of a machine instance. Project Zeppelin provides deep instrumentation for XenServer via the CIM
implementation built by Citrix as part of Project Kensho and for VMware through its own CIM implementation.
Support for additional virtualization frameworks from Microsoft and Red Hat will be added in forthcoming releases.

Infrastructure Performance Benchmarking

The goal of benchmarking is to test whether the infrastructure delivered by a cloud operator meets the
operator's promise. For example, an IT organization that has purchased the processing power equivalent of a
single-core 2 GHz CPU for a machine instance needs to confirm that it is really getting the contracted processing
power. To do this, it needs to benchmark the CPU’s actual performance periodically over time. There are
scenarios where a “dirty neighbor” (another tenant sharing the same physical hardware) may consume more
resources than expected, leaving the IT organization with less processing power than it actually purchased.
Project Zeppelin instrumentation and data will be key to ensuring SLA compliance.
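
The idea of benchmarking periodically and comparing against the contracted level can be sketched as follows. This is only an illustration: the fixed arithmetic loop stands in for a real benchmark suite, and the baseline, the 10% tolerance and the sample history are hypothetical values chosen for the example.

```python
import time

def cpu_score(iterations=200_000):
    """Run a fixed integer workload and return iterations completed per second."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    elapsed = time.perf_counter() - start
    return iterations / elapsed

def below_baseline(scores, baseline, tolerance=0.10):
    """Return the indexes of samples more than `tolerance` below the baseline."""
    floor = baseline * (1.0 - tolerance)
    return [i for i, score in enumerate(scores) if score < floor]

# Hypothetical history of periodic benchmark scores and an agreed baseline.
history = [1000.0, 980.0, 850.0, 1010.0]
flagged = below_baseline(history, baseline=1000.0)
print("samples below contracted level:", flagged)
print("current score (iterations/s):", round(cpu_score()))
```

Recording such scores at regular intervals, rather than once, is what makes a temporary shortfall caused by a neighboring tenant visible.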

New Hadoop Instrumentation

One of the unique uses of a cloud is the ability to harness the collective processing power of many machines to
perform a complex or compute-intensive job. The software used to do this today is typically some derivative of
MapReduce or Hadoop. Neither of these technologies has any formal instrumentation, which makes running these
massive compute jobs essentially a black-box process. For instance, Hadoop optimization is an onerous series of
trial-and-error runs, each followed by wading through copious logs. Furthermore, if one of these large
compute jobs crashes after running for many hours, the application owner is left with limited capability for
troubleshooting and faces higher overall costs from the rerun(s).

Unique Value Proposition of Project Zeppelin:

Project Zeppelin presents significant value by being the first in the industry to offer the following capabilities:
• Only project that supports XenServer, VMware, Linux, Hadoop and cloud benchmarks in one small-
footprint agent with a single installer
• Only available WS-Man/CIM-XML platform for the cloud
• Only available instrumentation on Hadoop
• Only usage metering platform that allows usage stats to be collected from the bare metal up through the
operating system to the virtualization platform
• Available under an open source license

CITTIO Cloud Monitoring Solution - Step 2: WatchTower v4.0 with Cloud

The open source nature of Project Zeppelin and its use of standards-based components will make it useful and
feasible for independent management vendors and open source management projects to interface with the
Project Zeppelin agents via CIM-XML or WS-Man. For example, MOM (Microsoft Operations Manager) users can
directly connect to Zeppelin agents via WS-Man.

In the second delivery phase of its Cloud Monitoring solution, CITTIO will offer deep support for and integration
with Project Zeppelin agents in its flagship WatchTower monitoring product. This will enable WatchTower to provide
detailed fault and performance visibility of cloud systems comparable to what it delivers today for enterprise-class
physical and virtual infrastructure. CITTIO’s commercial cloud-enabled WatchTower product is
targeted for release in the second half of 2009 and will offer the following key capabilities:

• Provide infrastructure monitoring of the physical data center aspects for cloud operators, including
physical servers, networking, security and storage equipment

• Provide data collection and reporting from Zeppelin agents, including real time analysis and historic
trending from both on-premise and remote locations
• Provide application and service level monitoring for software running on cloud deployments, including
Hadoop, databases, web servers, email servers, etc., from both on-premise and remote locations
• Provide cloud operators with customer-to-resource mapping in a multi-tenant fashion for service impact analysis
• Support customer usage and service level portals for cloud operators and Enterprise IT
• Support benchmarking and SLA audit visibility for Enterprise IT and application operators
• Support usage-based billing and detailed records via WatchTower analytics

Fig 1: Targeted coverage of Cloud monitoring, metering and management tools by CITTIO

Benefits of CITTIO’s combined Cloud Monitoring Solution

CITTIO’s Project Zeppelin and its WatchTower v4.0 monitoring and reporting platform will allow both cloud
operators (those who offer cloud infrastructure and platform services) and application operators (those who put
applications on the cloud from within the Enterprise) to gather deep monitoring statistics on the
performance of remotely hosted (public or private) infrastructure and applications. These capabilities will
combine to offer the following business benefits:

For Application Operators and Cloud Operators:

Fault and Performance Management Visibility and Operational Support

While application operators and cloud operators both wish to abstract away much of the underlying complexity of
the infrastructure, they still demand clear visibility into that infrastructure in order to better tune, manage and
configure their particular applications. CITTIO’s cloud monitoring solution will offer detailed fault and performance
management visibility into the Cloud infrastructure for troubleshooting and operational support.
Additionally, cloud operators will be able to assess the impact of physical hardware and systems failure on
different customers and service tiers and take remedial action as necessary.
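
The failure impact assessment described above amounts to a lookup from physical hosts to machine instances to tenants. The sketch below shows one way this could be modeled; the host names, instance names, customers and tiers are all hypothetical, and it does not represent how WatchTower stores this mapping.

```python
from collections import defaultdict

# Hypothetical inventory: physical host -> machine instances running on it.
HOST_INSTANCES = {
    "host-a": ["vm-1", "vm-2"],
    "host-b": ["vm-3"],
}

# Hypothetical tenancy: machine instance -> (customer, service tier).
INSTANCE_OWNER = {
    "vm-1": ("acme", "gold"),
    "vm-2": ("globex", "silver"),
    "vm-3": ("acme", "silver"),
}

def impacted_customers(failed_hosts):
    """Group the affected instances by (customer, tier) for the failed hosts."""
    impact = defaultdict(list)
    for host in failed_hosts:
        for instance in HOST_INSTANCES.get(host, []):
            impact[INSTANCE_OWNER[instance]].append(instance)
    return dict(impact)

print(impacted_customers(["host-a"]))
```

Because machine instances can move between hosts, a real mapping would have to be refreshed as the environment changes; the static dictionaries here only illustrate the lookup itself.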

Independent and Verifiable Usage Metrics for Billing and Cost Analysis
Project Zeppelin’s instrumentation coupled with the analytics capability in WatchTower will offer cloud operators
a definitive way to measure customer usage in order to produce accurate billing against contracted terms. Cloud
application hosters will also want to gather performance information to ensure they are getting what they are
paying for through independent verification of a cloud operator’s SLAs. This will also enable application operators
to conduct price-to-performance analysis to optimize their spending decisions.
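
As a rough illustration of metering usage against contracted terms, the sketch below aggregates usage samples per customer and bills only what exceeds a contracted allowance. The record layout, the allowance and the overage rate are assumptions made for the example; they are not WatchTower's billing model.

```python
from collections import defaultdict

def total_usage(records):
    """Sum metered quantities per (customer, resource) pair."""
    totals = defaultdict(float)
    for customer, resource, quantity in records:
        totals[(customer, resource)] += quantity
    return dict(totals)

def charge(used, included, overage_rate):
    """Bill only the usage above the contracted allowance."""
    return max(0.0, used - included) * overage_rate

# Hypothetical metering samples: (customer, resource, quantity).
records = [
    ("acme", "cpu_hours", 40.0),
    ("acme", "cpu_hours", 70.0),
    ("acme", "storage_gb", 25.0),
]
totals = total_usage(records)
overage = charge(totals[("acme", "cpu_hours")], included=100.0, overage_rate=0.5)
print(totals, overage)
```

The same totals, collected independently by the application operator, are what would allow a bill to be checked against the operator's own figures.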

Fig 2: CITTIO Cloud Instrumentation for Application Operators and Cloud Operators

For Application Operators and Enterprise IT

Accurate SLA Reporting on Cloud Sourcing

One of the key impediments to Enterprise adoption of clouds is that Enterprise application operators feel as
though they will be “flying blind” when they host critical applications on clouds, because they have little or no
visibility into the underlying infrastructure upon which they rely. Project Zeppelin instrumentation, viewable
through WatchTower v4.0, will enable cloud operators and application operators to accurately verify that their
provider is meeting their requirements on ‘promised’ computing and infrastructure resources in the cloud.

Single Platform Coverage across traditional and Cloud based applications

While seamless and dynamic cloud-burst and policy driven deployment of enterprise applications on 3rd party
provider cloud-farms are still some way off, enterprises will in all likelihood start with non-critical applications
being hosted on the cloud. The ability to view all cloud-hosted and internally hosted applications and infrastructure
in a single monitoring system will save enterprises the cost and effort of running multiple management
platforms. All organizational categorization and aggregation, user-level privileges and workflows will be easily set
up, shared and managed. Enterprises will also gain from WatchTower’s core benefits of rapid deployment,
automation, lean operation and low cost of ownership.

- Appendix A -
Project Zeppelin Cloud Monitoring Architecture and Key Features
Project Zeppelin - Management Stack
The proposed Cloud Management Architecture in Project Zeppelin supports two standards-based protocols, CIM-XML
and WS-Man, both of which are able to pass through the firewall and transmit data across the public Internet
securely. It uses the SFCB (Small Footprint CIM Broker) from the SBLIM (pronounced “sublime”) open source
project as the CIMOM (CIM Object Manager) agent. SBLIM’s goal is to provide a complete Open Source
implementation of a WBEM-based management solution for Linux.
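
To make the CIM-XML access path more concrete, the sketch below builds the XML body and HTTP headers of a CIM-XML EnumerateInstances request of the kind a management client would send to a CIMOM. It only constructs the message and does not contact any agent. The class name Linux_OperatingSystem, the namespace root/cimv2 and the message ID are assumptions chosen for illustration; the classes actually exposed depend on which providers are installed.

```python
import xml.etree.ElementTree as ET

def enumerate_instances_request(class_name, namespace="root/cimv2", message_id="1001"):
    """Build HTTP headers and an XML body for a CIM-XML EnumerateInstances call."""
    cim = ET.Element("CIM", CIMVERSION="2.0", DTDVERSION="2.0")
    message = ET.SubElement(cim, "MESSAGE", ID=message_id, PROTOCOLVERSION="1.0")
    request = ET.SubElement(message, "SIMPLEREQ")
    call = ET.SubElement(request, "IMETHODCALL", NAME="EnumerateInstances")
    # The namespace is sent as one NAMESPACE element per path component.
    path = ET.SubElement(call, "LOCALNAMESPACEPATH")
    for part in namespace.split("/"):
        ET.SubElement(path, "NAMESPACE", NAME=part)
    param = ET.SubElement(call, "IPARAMVALUE", NAME="ClassName")
    ET.SubElement(param, "CLASSNAME", NAME=class_name)
    xml_text = '<?xml version="1.0" encoding="utf-8"?>\n' + ET.tostring(cim, encoding="unicode")
    headers = {
        "Content-Type": 'application/xml; charset="utf-8"',
        "CIMOperation": "MethodCall",
        "CIMMethod": "EnumerateInstances",
        "CIMObject": namespace,
    }
    return headers, xml_text.encode("utf-8")

# Assumed class name, used here only as an example of a Linux instrumentation class.
headers, body = enumerate_instances_request("Linux_OperatingSystem")
print(headers)
print(body.decode("utf-8"))
```

Because the request is plain XML carried over HTTP or HTTPS, it can cross firewalls and be secured with standard transport encryption, which is the property the architecture relies on for remote monitoring.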

Fig. 3: Project Zeppelin’s Cloud Management Architecture

The CIM ‘providers’ (the web-services management interfaces through which a system exposes performance
metrics) can be installed individually, allowing users to control the size and scope of the agent at install time. The
SBLIM ‘provider’ is actually a group of roughly 10 ‘providers’ that can be installed separately as RPMs. (RPM
Package Manager, a core component of many Linux distributions, is a command-line-driven package management
system capable of installing, uninstalling, verifying, querying and updating software packages.)

The XenServer ‘provider’ is taken from Project Kensho and runs under SFCB. The VMware ESX ‘provider’ is taken
from the vendor’s packaged implementation and can also run under SFCB. Other CIM providers can be installed
later and will dynamically plug into the SFCB framework.

As part of Zeppelin’s initial builds, CITTIO has developed a sizable set of instrumentation and ‘providers’ for key
pieces of technology around managing the widely used Xen/Linux virtualization stacks. Future contributions to
Zeppelin from CITTIO will include the development of ‘providers’ for other key systems, including Microsoft's
hypervisor and Red Hat.

Benchmarking information will be provided by the LMBench open source project and exposed via a CIM-XML
provider based on the DMTF CDM (Common Diagnostics Model) specification. This will enable accurate
benchmarking of performance right from the bare metal machine motherboard to the OS and application stack.
Benchmarking parameters will include throughput/bandwidth and latency measures for CPU, memory and I/O.
Running the benchmark both locally and remotely will provide a real comparison of the performance of the cloud
infrastructure against that of the internal data center.

The Zeppelin agent is easily installed via an RPM installer. The RPM installer is easily modifiable in order to control
which specific providers are installed. This will allow a user to control how large and how expansive the Zeppelin
agent is, once installed.

Project Zeppelin - Key Technical Features

• Project Zeppelin offers dual support for both WS-Management and CIM-XML. In doing so Project Zeppelin
offers multiple industry standard methods to access deep cloud instrumentation securely across the open
Internet.
• Deep support for XenServer and VMware ESX virtualization both for the cloud operator and the
application operator. XenServer exposes instrumentation not only to the cloud administrator, but also to
the user of a machine instance.
• Deep support for standard Linux management instrumentation leveraging SBLIM’s WBEM-based solution.
• Auditing and benchmarking capabilities from the motherboard level, to the network card, up through the
operating system performance and configuration using LMBench to ensure that the computing power
made available by a cloud operator meets its promise.
• Ability to send log information from a remote cloud machine instance back to a log management solution
within an IT environment for monitoring and regulatory compliance auditing.
• Support for Hadoop instrumentation
• Support for usage monitoring, metering and billing for contracted services
