
Cloud Computing Infrastructure for Latency Sensitive

Zhitao Wan
Research, Technology and Platforms
Nokia Siemens Networks
zhitao.wan@nsn.com
Abstract- The emerging Cloud Computing paradigm presents an
attractive vision: cheap hardware takes over workload from expensive
dedicated hardware, and an easy-to-develop, easy-to-deploy,
pay-as-you-use model accelerates the application release cycle and
reduces overall cost. As the march to the All-IP network continues,
more latency sensitive, mission-critical operations migrate to the
Cloud. However, compared with the traditional computing model, the
current geographically distributed Cloud introduces unpredictable
latencies caused by dispatching, network transmission and
computation. This challenges Cloud Computing and hinders the effort
to migrate applications to the Cloud. This paper analyzes the latency
in the current Cloud, presents an empirical taxonomy of latency with
four classes, and gives a high-efficiency Cloud Computing
infrastructure for different classes of latency sensitive
applications, together with a proof-of-concept prototype.
Keywords- Cloud Computing; Latency Sensitive; Latency
The term cloud was a metaphor for the Internet, representing the
backbone or a cluster of network elements connected to the
backbone via network interfaces or access equipment such as
routers, switches, modems and so on. The Cloud Computing
concept revitalizes the old concept of computer time-sharing,
which dates back as early as 1961 to John McCarthy.
The redefined Cloud Computing concept addresses a new vision
of the evolution of distributed computing and brings a "Cloud
Revolution". It uses current technologies to put the old
concept into practice for better use of computing, storage, network
and other resources. The most revolutionary result is that it
provides an easy-to-use model to develop and deploy new
applications, the so-called everything as a service (XaaS) [1].
The benefits include [2] "
1. Reduced implementation and maintenance costs
2. Increased mobility for a global workforce
3. Flexible and scalable infrastructures
4. Quick time to market
5. IT department transformation (focus on innovation vs.
maintenance and implementation)
6. "Greening" of the data center
7. Increased availability of high-performance applications
to small/medium-sized businesses"
Today's Cloud Computing is based on virtualization,
scalability, interoperability, quality of service guarantees,
security, failure recovery and other supporting technologies.
Cloud operators must provision computing, network and
storage resources under specified reliability, availability,
performance and security constraints. Many commercial public
Cloud Computing systems are online; the providers include
Amazon, Google, IBM, Microsoft, Joyent, OpSource, Elastra,
Savvis, etc. From the user's perspective there are wholly owned
private Clouds, wholly third-party public Clouds and hybrid
Clouds.
978-1-4244-6871-3/10/$26.00 ©2010 IEEE
With the progress of optics and microelectronics, fiber
replaces copper and provides cheaper and higher bandwidth,
and more powerful and cheaper computers with lower power
consumption and a smaller footprint have emerged. Competition in
the market drives Operators to reduce Total Cost of Ownership
(TCO) while keeping service quality high. Traditionally, to
guarantee response time and reliability, large organizations
purchase and maintain mainframes, workstations and other
expensive hardware for mission-critical operations.
Telecommunication Operators use closed-architecture dedicated
hardware with elaborately designed software. In recent years
COTS (Commercial Off The Shelf) computing systems have been
widely adopted, e.g. ATCA (Advanced Telecom Computing
Architecture) cards and chassis with a general OS such as Linux.
Cloud Computing breaks the wall of the physical enclosure of a
box or chassis. Components distributed physically across
boxes and geographical boundaries can be organized into a
powerful computing entity on demand. But the Cloud also
introduces extra network transmission latency and dispatch
latency. For latency sensitive applications, without a latency
guarantee mechanism, time-out failures will degrade the service
quality to an unacceptable level. This paper analyzes
latency sensitive applications and identifies a high-efficiency
approach to guaranteeing latency in the current Cloud so that it
can accommodate latency sensitive applications.
Each task submitted to a computer should be completed within
an acceptable time frame. A complex scientific
computing task may typically take several months; such
applications are not sensitive to a few extra seconds of
latency. In terms of the ISO OSI 7-layer model, the traffic
of most end users in a network environment belongs to the
application layer. File download and mail exchange are latency
insensitive. One of the most popular latency sensitive
applications is the WWW. The server-side latency of static or
simple dynamic web pages can commonly be assumed to range from
0.3 to 150 ms [3]. Taking the last-mile latency into account, a
delay of up to several seconds is still tolerable to a user
surfing the Internet, even though the WWW is somewhat latency
sensitive. But for online video games, among the most latency
sensitive applications, WWW-level latency is unacceptable;
typically the latency ranges from 20 ms to 100 ms [4].
In the Carrier Network some computing tasks are more
latency sensitive. For example, the RAN C-plane latency
should be less than 50 - 100 ms [5], and the U-plane latency
should be less than 5 ms [5]. The most critical latency
limit applies to the download and upload frame at the air
interface, which should be less than 1 ms [6]. In general, more
latency critical applications, with limits below tens of
microseconds, do not fit today's distributed, network-based
processing.
An empirical taxonomy of latency sensitivity classes is:
1. Insensitive
Includes all tasks that can be put in the background.
2. Second level
Covers common man-machine interaction.
3. Ten milliseconds level
Satisfies most real-time man-machine interaction.
4. Sub-millisecond level
Supports the most critical lower-layer services in current
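The four classes can be sketched as a simple lookup from an application's latency budget. This is a minimal illustration only: the numeric boundaries (1 ms, 100 ms, 10 s) and the names LatencyClass and classify are assumptions made for the example, since the taxonomy itself is empirical and gives no exact thresholds.

```python
from enum import Enum
from typing import Optional


class LatencyClass(Enum):
    INSENSITIVE = 1
    SECOND = 2
    TEN_MILLISECONDS = 3
    SUB_MILLISECOND = 4


# Assumed class boundaries in seconds; the taxonomy does not fix exact values.
SUB_MILLISECOND_LIMIT_S = 1e-3
TEN_MILLISECONDS_LIMIT_S = 100e-3
SECOND_LIMIT_S = 10.0


def classify(budget_s: Optional[float]) -> LatencyClass:
    """Map a latency budget in seconds to a class; None means no deadline."""
    if budget_s is None:
        return LatencyClass.INSENSITIVE
    if budget_s <= SUB_MILLISECOND_LIMIT_S:
        return LatencyClass.SUB_MILLISECOND
    if budget_s <= TEN_MILLISECONDS_LIMIT_S:
        return LatencyClass.TEN_MILLISECONDS
    if budget_s <= SECOND_LIMIT_S:
        return LatencyClass.SECOND
    return LatencyClass.INSENSITIVE
```

Under these assumed boundaries, the 1 ms air interface limit falls in the sub-millisecond class and the 5 ms U-plane limit in the ten milliseconds class.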
Cloud Computing is based on computer networking. The
latency in the Cloud commonly comprises network
transmission, dispatching and computing latency.
A. Network Latency
Generally the total network latency is the RTT (Round-Trip
Time): the interval between the time the request message
is sent out and the time the response message arrives
back at the source.
1. Bandwidth
For a packet of given length the transmission time is
inversely proportional to the network bandwidth.
2. Switch node
Most current switch nodes work in store-and-forward
mode. Sometimes such a node is a critical point that affects
the end-to-end latency.
3. Hops
Each additional hop adds to the accumulated latency.
4. Packet length
A long packet costs more transmission time and may
cause fragmentation and reassembly, which introduce
extra latency.
5. QoS
Latency sensitive applications should have the highest
QoS priority.
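How bandwidth, hops and packet length combine can be sketched with a simple store-and-forward model. It is an illustrative approximation, not a measurement method from this paper: the function names and the per_hop_delay_s parameter (a lump sum for propagation, queueing and switching on each hop) are assumptions, and fragmentation and QoS are ignored.

```python
from typing import Sequence


def transmission_time_s(packet_bytes: int, bandwidth_bps: float) -> float:
    """Time to put one packet on a link: inversely proportional to bandwidth."""
    return packet_bytes * 8 / bandwidth_bps


def one_way_latency_s(packet_bytes: int,
                      link_bandwidths_bps: Sequence[float],
                      per_hop_delay_s: float = 0.0) -> float:
    """Store-and-forward: the whole packet is retransmitted on every hop."""
    return sum(transmission_time_s(packet_bytes, bw) + per_hop_delay_s
               for bw in link_bandwidths_bps)


def rtt_s(request_bytes: int, response_bytes: int,
          link_bandwidths_bps: Sequence[float],
          per_hop_delay_s: float = 0.0) -> float:
    """Round-trip time: request path plus response path over the same links."""
    return (one_way_latency_s(request_bytes, link_bandwidths_bps, per_hop_delay_s)
            + one_way_latency_s(response_bytes, link_bandwidths_bps, per_hop_delay_s))
```

In this model a 1500-byte packet takes 12 microseconds to transmit on a 1 Gbps link, so three such hops cost 36 microseconds one way before any per-hop delay is added.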
B. Dispatching Latency
Geographically distributed resources cost extra time for
resource lookup and task scheduling. The dispatching
mechanisms may be centralized, distributed or both.
C. Computing Latency
The computing latency depends on the hardware and
software, including processor, main memory, co-processor, I/O,
hypervisor, operating system, middleware, run-time
environment, the applications themselves and other supporting
hardware/software components.
CPU architecture, clock frequency, cache size and instruction
set definitely affect the latency. The operating system is
critical for latency control; a real-time OS provides better latency
The workload of a computing system also affects the response
time. A typical benchmark on an x86 computing
system shows a low and smooth response time curve when the
workload is below 0.5. As the workload increases to 0.8 -
0.9 the response time rises dramatically to an unacceptable
level [7].
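The shape of such a curve can be illustrated with the textbook M/M/1 queueing formula R = S / (1 - rho), where S is the mean service time and rho the utilization. This is a substituted model chosen for illustration, not the benchmark reported in [7], and the function name is hypothetical.

```python
def mm1_response_time(service_time_s: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue (illustrative model only)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)
```

In this model the response time is twice the service time at a workload of 0.5 and ten times the service time at 0.9, which matches the qualitative behavior described above.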
A real-time OS suits latency sensitive applications; soft
real-time and hard real-time systems suit different workloads.
D. Managing the Latencies
Cloud Computing adopts complex software stacks to provide an
abstraction of geographically distributed heterogeneous
hardware. The total latency can be broken down into the
sojourn time in each element in the Cloud. A straightforward
way to construct a latency sensitive Cloud is to mark off the
competent elements in the Cloud. Assume the access points to the
latency sensitive service are planned. The networking and
computing elements close to the access points are preferred.
Metering network latency and computing latency is the
precondition for constructing a mesh that covers all competent
elements. In a real Cloud it is not easy to work out such a mesh
because the Cloud itself shrinks and expands dynamically.
Reserving resources for latency sensitive applications is
possible, but it is something of a luxury. An acceptable tradeoff
is to find latency stable elements to serve as the backbone of the
latency sensitive Cloud and to grant latency sensitive traffic a
high enough priority or QoS type.
The reliability of the Cloud, which is built on unreliable
hardware, is guaranteed by failure recovery mechanisms.
Generally, any failure should be detected and recovered. Some
application layer workload is discardable, e.g. live video,
and recovery is not necessary. But for most mission
critical applications real-time recovery is necessary. The
recovery point should be closer to the access point or
geographically far from the failed element.
The capacities of the computing elements in the Cloud are not
identical. To improve efficiency and smooth the service, the
dispatcher should use the metering information to balance
computing and network latency and to dispatch workload fairly
across all possible elements in the Cloud. The criterion
is that the latency is guaranteed.
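A minimal sketch of such a dispatch decision follows. It assumes that each element carries metered network and computing latencies and a current load figure; the Element type, its field names and the least-loaded tie-break are illustrative choices, not the dispatcher used in the prototype.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Element:
    name: str
    network_latency_s: float    # metered RTT from the access point
    computing_latency_s: float  # metered or profiled processing time
    load: float                 # current utilization, 0.0 to 1.0


def pick_element(elements: Iterable[Element],
                 budget_s: float) -> Optional[Element]:
    """Return the least loaded element whose total latency meets the budget."""
    qualified = [
        e for e in elements
        if e.network_latency_s + e.computing_latency_s <= budget_s
    ]
    if not qualified:
        return None  # the latency guarantee cannot be met by any element
    return min(qualified, key=lambda e: e.load)
```

The latency guarantee acts as a hard filter, and fairness is approximated by preferring the least loaded element among those that pass it.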
Figure 1 depicts the generic architecture of the Cloud and its
services [8]. This paper analyzes the infrastructure layer of the
Cloud and presents improvements in this layer to provide
different classes of latency guarantee.
Software as a Service
App Components as a Service
Software Platform as a Service
Virtual Infrastructure as a Service
Physical Infrastructure as a Service
Figure 1. Generic Cloud Architecture and Services
A. General Conception
According to the taxonomy of latency sensitivity classes in Section
III, the Cloud can be classified logically into the white Cloud for
sub-millisecond level applications, the gray Cloud for ten
milliseconds level applications, the dark gray Cloud for second
level applications, and the black Cloud for latency insensitive
applications. Obviously an element in the Cloud may
belong to several classes of the Cloud simultaneously.
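The simultaneous membership can be illustrated with a short sketch: an element that can guarantee a given latency qualifies for every logical Cloud whose limit it meets. The limits below (1 ms, 100 ms, 10 s) and the function name clouds_for are assumptions for the example only.

```python
# Assumed latency limit in seconds for each logical Cloud, tightest first.
CLOUD_LIMITS_S = [
    ("white", 1e-3),           # sub-millisecond level
    ("gray", 100e-3),          # ten milliseconds level
    ("dark gray", 10.0),       # second level
    ("black", float("inf")),   # latency insensitive
]


def clouds_for(guaranteed_latency_s: float) -> list:
    """List every logical Cloud an element with this latency can serve."""
    return [name for name, limit in CLOUD_LIMITS_S
            if guaranteed_latency_s <= limit]
```

An element that guarantees 200 microseconds belongs to all four Clouds, while one that guarantees 20 ms belongs to the gray, dark gray and black Clouds only.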
Hardware and software assisted virtualization technologies
are the basis of today's Cloud infrastructure. Fundamentally,
virtualization technologies enable the abstraction of
underlying resources such as CPU cores, memory, devices and
communication [9]. It means that the physical resources can
be carved up into virtual resources to support provisioning on
As discussed in the previous section, distributed latency
metering is the basis of the latency sensitive Cloud. The most
effective way is practical metering, e.g. sending pings to target
computing elements. Another effective method is knowledge
based: profiling the service code helps to estimate the
likely computing latency, which is effective for screening out
unqualified elements. Knowledge of the networking and
computing elements also helps to mark out real-time capable
elements, in other words, latency stable network itineraries and
latency stable computing systems in the Cloud. Those elements can
be used as the backbone of the latency sensitive Cloud.
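One way to turn ping-style measurements into a "latency stable" decision is sketched below. The criterion (mean RTT within the budget and relative jitter under 10%) and the names is_latency_stable and max_jitter_ratio are assumptions for illustration; the paper does not define a numeric stability test.

```python
import statistics
from typing import Sequence


def is_latency_stable(rtt_samples_s: Sequence[float],
                      budget_s: float,
                      max_jitter_ratio: float = 0.1) -> bool:
    """Judge an element from measured RTT samples (assumed criterion).

    Stable means: the mean RTT fits the budget and the standard deviation
    is at most max_jitter_ratio of the mean.
    """
    if len(rtt_samples_s) < 2:
        return False  # too few samples to judge stability
    mean = statistics.mean(rtt_samples_s)
    jitter = statistics.pstdev(rtt_samples_s)
    return mean <= budget_s and jitter <= max_jitter_ratio * mean
```

The samples would come from practical metering such as repeated pings; elements that pass the test are candidates for the backbone.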
Carrier Ethernet has been deployed widely and supports
up to 100 Gbps. It provides reliable bandwidth at low cost
and is easy to deploy compared with conventional WAN
technologies. Carrier Ethernet can be transported over DWDM,
MPLS or SONET infrastructures. Its UNI (User Network
Interface) supports flexible bandwidth. Because Carrier Ethernet
is a service, customers will want service guarantees in terms of
CIR (Committed Information Rate), CBS (Committed Burst
Size), CoS (Class of Service), etc. It is a good choice for the
stable network itinerary, though not the only one.
A real-time software stack is necessary for those latency
stable computing elements.
Failure recovery is critical for the Cloud, and recovery for
latency sensitive services is even more "latency sensitive". The
recovery element should be a latency stable computing system on a
latency stable network itinerary close to the access point, with
resource reservation or preemptive scheduling.
Other decision mechanisms, including balancing network and
computing latency, should be adopted to shape the Cloud with
more flexibility and robustness.
B. Construct Basic Latency Sensitive Cloud
Ideally the Cloud is a public utility in which each higher layer
service is built on a lower layer abstraction, and each layer hides
the details of the layers below it. The virtualization layer in the
Cloud commonly uses a hypervisor that supports a real-time
scheduler with priority based and time slice based algorithms [9].
A management module is necessary on top of the virtualization
layer, so that the provisioning and allocation of virtual resources
can be made on demand. A straightforward idea for latency
reduction is to use local or neighboring computing elements to
limit network latency. But such localization betrays the sharing
idea of Cloud Computing. So the basic idea in this paper is to mark
out all competent computing and networking elements in the Cloud
and dispatch latency sensitive tasks to them. A spanning tree
rooted at an access point should be constructed in the following
steps.
1. Metering the networking latency
The network connections in the Cloud are heterogeneous, and
the metering is both knowledge based and experiment
based. The latency is the RTT, with the classes discussed in
Section III. Mark off all competent elements.
2. Metering the computing latency
Computing latency is ranked by the computing capacity
of each element in the Cloud that directly connects to
the network elements in the spanning tree from step 1.
3. Tailor the spanning tree by trial
Dispatch trial tasks in the spanning tree to obtain real
latency data with which to tailor the spanning tree. Elements
with latency less than 1/5 of the promised latency belong to the
core Cloud, and those with latency less than 1/3 of the promised
latency belong to the expanding Cloud.
4. Merge core Cloud
Core Clouds that have common networking elements can be merged.
5. Run-time dynamic update
The topology of the Cloud should be dynamically updated
when computers join and leave or workloads migrate.
6. Run-time failure recovery
The dispatcher is soft real-time and will reissue a
failed task to one of the core Cloud elements. For
latency classes other than the sub-millisecond service the
recovery workload should be raised to a higher latency
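Steps 3 and 4 can be sketched as follows. The sketch assumes that trial latencies have already been collected per element, that the expanding Cloud covers elements between 1/5 and 1/3 of the promised latency, and that each core Cloud is represented by the set of networking elements it uses; the function names and data shapes are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def classify_elements(trial_latency_s: Dict[str, float],
                      promised_latency_s: float) -> Tuple[Set[str], Set[str]]:
    """Split elements into core and expanding Cloud by trial latency."""
    core, expanding = set(), set()
    for name, latency in trial_latency_s.items():
        if latency < promised_latency_s / 5:
            core.add(name)
        elif latency < promised_latency_s / 3:
            expanding.add(name)
    return core, expanding


def merge_core_clouds(core_clouds: List[Set[str]]) -> List[Set[str]]:
    """Merge core Clouds that share at least one networking element."""
    merged: List[Set[str]] = []
    for cloud in core_clouds:
        combined = set(cloud)
        remaining = []
        for existing in merged:
            if existing & combined:
                combined |= existing  # shared element: fold into one Cloud
            else:
                remaining.append(existing)
        remaining.append(combined)
        merged = remaining
    return merged
```

With a promised latency of 1 ms, an element measured at 0.1 ms lands in the core Cloud, one at 0.3 ms in the expanding Cloud, and one at 0.6 ms in neither.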
C. Workload Constraints
Generally, the more computing intensive a workload is,
the less latency sensitive it realistically is. One of the most
successful cases is Google, which uses lots of cheap computers
for its data intensive search engine [10]. The more latency
sensitive a workload is, the more sensitive it is to processing
duration. Efforts to limit the duration include adopting
powerful hardware, parallel processing and fine granularity of
workload. Accordingly, limiting the workload unit is an effective
way to limit latency. As technology progresses, network
latency and jitter drop rapidly with jumbo packet support. In
the following proof-of-concept single chip Cloud infrastructure
each sub-frame of the LTE air interface is the processing unit.
D. New Cloud Computing Hardware
Current commercial multi-core and many-core processors
integrate interfaces, e.g. XAUI, XGMII, PCI-e, and dedicated
components for packet processing. A new Cloud Computing
hardware implementation based on these chips is more like
computing units integrated with routers. Compared with a
traditional computer, the network throughput increases
significantly without frequent interrupts to the processors. In an
earlier many-core processor based experimental infrastructure
with Linux OS, the latency is lower than 0.2 millisecond when
the workload of the processors is under 50% [11].
E. Prototype Benchmark Results
The hardware is the same as in the proof of concept for the
sub-millisecond latency sensitive infrastructure [11]: a TILEPro64
PCI-e card [12] with Zero Overhead Linux.
The configuration differs from the earlier latency sensitive
experimental system. Besides the cores reserved for packet
processing, the other cores were assigned to 3 groups, each
with 16 cores. Each group is logically independent. The
network load is under 50%, with slight latency [13]. Figure 2
shows the test results with the number of simulated hops on
the switch chip ranging from 1 to 5. When the CPU load is under
50% the latency is smooth, around 200 microseconds, as in [11]. As
the workload rises above 60% the latency goes up
rapidly, but stays below 2500 microseconds at 90%. This means
the latency sensitive service class downgrades to the ten
milliseconds level. The results show that if the workload of one
group of cores is kept under 50%, the whole system can provide
both sub-millisecond and ten milliseconds class services with 77%
of CPU workload. The overall workload can increase further,
as the overall sub-millisecond service demand decreases, with
finer granularity of core groups.
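One reading of the 77% figure is the average over the three equal groups when one group is held at 50% and the other two run at 90%: (50 + 90 + 90) / 3 is about 77. The 90% load of the other two groups is an assumption taken from the upper end of the measured range, and the function below is only a sketch of that arithmetic.

```python
def overall_utilization(groups: int,
                        reserved_groups: int = 1,
                        reserved_load: float = 0.5,
                        busy_load: float = 0.9) -> float:
    """Average CPU workload over equally sized core groups."""
    busy_groups = groups - reserved_groups
    return (reserved_groups * reserved_load + busy_groups * busy_load) / groups
```

With 3 groups the result is about 0.77; with 6 smaller groups and still one reserved group it rises to about 0.83, which illustrates why finer granularity of core groups allows a higher overall workload.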
[Figure 2 plot: latency versus workload (% of CPU), one curve each for 1 to 5 simulated hops]
Figure 2. Latency of Single Chip Cloud Infrastructure
Cloud Computing is attractive for both big organizations and
small businesses because of its low TCO. Geographically
distributed low cost computing elements save cost, but
network and computing latency is a main disadvantage. The
demand to migrate more mission-critical workloads drives the
evolution of Cloud Computing. The many-core processor
based experimental system with tightly coupled network
components proved to be a competitive hardware platform for the
future Cloud, running commercial ready-to-use software with low
transmission latency.
The earlier sub-millisecond experimental system works well
when the workload of the processors is under 50%. This is a kind
of resource reservation that improves the response time of the
whole system but decreases the utilization ratio. This paper
analyzed the latency of typical applications, gave a taxonomy of
latency sensitive services, and presented a method to identify
latency sensitive elements in the Cloud that also improves the
system utilization ratio. When general software is used for a
latency guarantee infrastructure there is a tradeoff between
performance and power consumption. Compared with a commercial
real-time OS, the free RTLinux [14] is a good choice for future
implementation. A trial run on a Dell T7400 workstation
showed jitter of less than 10 microseconds. The current failure
recovery mechanism is dispatcher based. More reliable
recovery mechanisms, and their behavior and impact on the
latency jitter of the whole system, remain to be investigated.
[1] Gathering Clouds of XaaS!
[2] J.W. Rittinghouse and J.F. Ransome, Cloud Computing:
Implementation, Management, and Security, CRC Press, 2010.
[3] D.P. Olshefski, J. Nieh, D. Agrawal, Inferring Client Response Time at
the Web Server, ACM SIGMETRICS Performance Evaluation Review,
vol. 30, issue 1, 2002, pp. 160-171.
[4] G. Armitage, Networking and online games: understanding and
engineering multiplayer, Wiley, UK, 2006.
[5] 3GPP, TR25.913, "Requirements for Evolved UTRA (E-UTRA) and
Evolved UTRAN (E-UTRAN)".
[7] V. Mainkar and K. S. Trivedi, "Approximate analysis of priority
scheduling systems using stochastic reward nets," in Proc. 13th Int. Conf.
Distributed Computing Systems, Pittsburgh, PA, May 25-28, 1993, pp.
[8] F.E. Gillet, E.G. Brown, J. Staten, C. Lee. The New Tech Ecosystems of
Cloud, Cloud Services, and Cloud Computing. Forrester Research
Report, August 2008.
[10] S. Brin and L. Page. Anatomy of a large-scale hypertextual web search
engine. In Proceedings of the Seventh International World Wide Web
Conference, Apr. 1998.
[11] Z. Wan, "Sub-millisecond Level Latency Sensitive Cloud Computing
Infrastructure", International Congress on Ultra Modern
Telecommunications and Control Systems, 2010.
[12] _card.
[13] W. Li, Y. Li, X. Wang, "Logic of Collision Elimination for Reducing
Propagation Delays on Ethernet and its Application Simulation", The
Eighth International Conference on Electronic Measurement and
Instruments, 2007, pp. 2-304, 2-306.