
Broker Scheduler based on ACO for Federated Cloud-based Scientific Experiments

Elina Pacini†, Cristian Mateos∗ and Carlos García Garino‡
† ITIC Research Institute, Facultad de Ciencias Exactas y Naturales, UNCuyo & CONICET. Mendoza, Argentina. epacini@uncu.edu.ar
∗ ISISTAN-CONICET & UNICEN. Tandil, Buenos Aires, Argentina. cmateos@conicet.gov.ar
‡ ITIC Research Institute & Facultad de Ingeniería, UNCuyo. Mendoza, Argentina. cgarcia@itu.uncu.edu.ar

Abstract—Federated Clouds are infrastructures arranging physical resources from different datacenters. A Cloud broker intermediates between users and datacenters to support the execution of jobs through Virtual Machines (VM). We exploit federated Clouds to run CPU-intensive jobs, in particular Parameter Sweep Experiments (PSE). Specifically, we study a broker-level scheduler based on Ant Colony Optimization (ACO), which aims to select datacenters taking into account both the network latencies and the availability of resources. The lower the network latency, the lower its influence on makespan. Moreover, when more VMs can be allocated in datacenters with lower latency, more physical resources can be exploited, and hence job execution time decreases. Once our broker-level scheduler has selected a datacenter to execute jobs, VMs are allocated in the physical machines of that datacenter by another intra-datacenter scheduler, also based on ACO. Experiments performed using CloudSim and job data from a real PSE show that our ACO-based broker-level scheduler succeeds in reducing the makespan compared to similar schedulers based on latency-aware greedy and round robin heuristics.

Keywords—Federated Cloud, Broker, Scheduling, Ant Colony Optimization

I. INTRODUCTION

Parameter sweep experiments (PSE) are simulations that require performing repeated analyses, where some input parameters are varied among those defining the problem of interest, often associated with complex numerical simulations. Users relying on PSEs need a computing environment that delivers large amounts of computational power over a long period of time. Federated Clouds [1] have recently appeared as an appealing solution because they can obtain extra resources from an arrangement of Cloud providers. Federated Clouds also use brokers [2], which act as intermediaries between the provider of a Cloud service and the users of that service. In particular, executing PSEs on Clouds requires efficient scheduling strategies that appropriately allocate the jobs while reducing the associated makespan. Makespan is the maximum execution time of a set of jobs. In federated Clouds [1], scheduling is performed at three levels: the broker level, the infrastructure level, and the VM level. Firstly, at the broker level, the scheduler selects the most suitable datacenter in which to allocate the VMs requested by a user. This scheduler can deploy the VMs in a remote datacenter when there are insufficient physical resources in the datacenter where the VM creation was issued. Secondly, once a datacenter has been selected, at the infrastructure level, the VMs are allocated onto real hardware through a VM scheduler. Lastly, at the VM level, jobs are assigned for execution to the allocated VMs. However, scheduling is in general an NP-Complete problem. Swarm Intelligence (SI) metaheuristics have been suggested to solve combinatorial optimization problems –such as Cloud scheduling– by simulating the collective behavior of social insect swarms [3]. To date, the most common SI-based strategy is Ant Colony Optimization (ACO).

In this paper, we propose a broker scheduler based on ACO for the efficient execution of PSEs in federated Clouds. The goal is to select the most suitable datacenter taking into account both the network latencies and the availability of resources of each datacenter. The lower the network latency, the lower its influence on the makespan perceived by the user. Moreover, when more VMs can be allocated in datacenters with lower latency, more physical resources can be exploited,
and hence job execution time decreases. Then, once our broker-level scheduler has selected a datacenter to execute jobs, VMs are allocated in the physical machines of that datacenter by another intra-datacenter scheduler based on ACO, previously studied in [4]. To allocate the VMs into hosts, this scheduler must make a number of “queries” (network messages) to hosts to determine their availability upon each VM allocation attempt. The number of queries performed by ACO and the latencies of the datacenters also influence the makespan perceived by the user. Finally, at the VM level, PSE-jobs are assigned to the preallocated VMs by using FIFO, as in [4]. Briefly, in this paper we evaluate how decisions taken at the broker level influence the makespan.

Simulated experiments with job data from a real-world PSE [5] suggest that our ACO-based broker scheduler, in combination with ACO at the infrastructure level and FIFO at the VM level, delivers competitive makespan. Experiments were performed using the CloudSim [6] simulator.
II. RELATED WORK

The last decade has witnessed an astonishing amount of research on bio-inspired techniques, especially ACO, applied to distributed job scheduling [7], [8], [9]. However, to the best of our knowledge, no efforts considering ACO for scheduling at the broker level in federated Clouds exist.

We address the scheduling of PSEs in federated Clouds to minimize the makespan of a set of jobs while considering the influence of network latencies and resource availability among heterogeneous datacenters. Our approach differs from those presented in the literature since they have not considered SI-based strategies at the broker level. In [4] we presented an ACO-based scheduler focused on the infrastructure level. However, this scheduler operates at two levels for Clouds composed of a single datacenter. Then, in [10] we extended this scheduler to operate in federated Clouds. However, SI was again applied at the infrastructure level.

Regarding works which consider schedulers at the broker level, we can mention [2], [11], [12]. However, these works do not provide advanced capabilities to make automatic decisions, based on optimization algorithms, about how to optimally select datacenters and distribute the different VMs and jobs of an application among different Clouds. In particular, in [2] the authors proposed a Cloud broker that restricts the deployment of VMs across multiple heterogeneous datacenters according to some placement constraints defined by the user. Users can also steer the VM allocation by specifying a maximum budget and minimum performance. The implemented algorithms are based on integer programming formulations and enable price-performance placement tradeoffs. Then, in [11] the authors proposed a multi-objective genetic algorithm (MO-GA) to optimize three objectives, namely energy consumption, CO2 emission, and the profit generated by geographically distributed datacenters. Moreover, in [12] the scheduler performs an optimal deployment of the jobs among datacenters optimizing a particular cost function based on different optimization criteria (e.g., monetary cost or performance) and different user constraints (e.g., budget, performance).

A work that deserves special attention is [13], where the authors used at the broker level the Dijkstra algorithm [14] to select the datacenter with the lowest monetary cost, and a GA for allocating VMs. Although this work targets both the broker and the infrastructure levels, the goal was to reduce monetary costs without considering makespan. For scientific applications in general, makespan is very important since it allows users to accelerate result processing [4].

Among the related works found, most have been proposed for Clouds without taking into account the use of SI at the broker level and without considering metrics such as makespan, rendering difficult their applicability to execute PSEs in federated Cloud environments. Moreover, it is worth noting that in this work we have evaluated how network latency and resource availability in each datacenter affect the makespan perceived by the user.

III. APPROACH OVERVIEW

The goal of our scheduler is to minimize the makespan of a set of CPU-intensive PSE jobs in a federated Cloud composed of heterogeneous resources. Conceptually, a PSE is a set N = {1, 2, ..., n} of independent jobs, where each job corresponds to a particular value for a variable of the model being studied by the PSE. The jobs are executed on m Cloud machines. The completion time of a job j in schedule S can be denoted by C_j(S), and hence the makespan is C_max(S) = max_j C_j(S).
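For concreteness, the definition can be checked in a couple of lines; the following minimal Python sketch (the helper name is ours, not part of the paper's implementation) computes the makespan from per-job completion times.

# Makespan of a schedule S: the largest completion time C_j(S) over
# all jobs j; the completion times below are illustrative, in minutes.
def makespan(completion_times):
    return max(completion_times)

print(makespan([12.5, 30.0, 21.7]))  # -> 30.0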
For running applications in federated Clouds, resources should be scheduled at three levels, as shown in Figure 1. A broker is created for each user that connects to the Cloud. Each broker knows which providers are part of the federation; the relation of each broker is depicted with green and blue dotted lines. In addition, Figure 1a illustrates how jobs sent by User N are executed in the datacenter of Cloud Provider 2. Then, Figure 1b shows the intra-datacenter scheduling activities –inside Cloud Provider 2–, i.e., at the infrastructure level and the VM level.

Figure 1: Federated Cloud: Overview. (a) Federated Cloud; (b) Intra-datacenter scheduling.

The proposed scheduler proceeds as follows. Firstly, at the broker level, a datacenter is selected via an ACO-based scheduler. The chosen datacenter is the one that provides the lowest communication latency to the broker and at the same time meets a certain percentage of available resources. In this paper, we consider non-dedicated datacenters, i.e., they might already have allocated VMs. At the infrastructure level (Figure 1b), via another ACO-based VM scheduler, user VMs are allocated in the physical resources (i.e., hosts) belonging to the datacenter selected at the broker level. When there are no available hosts in the datacenter to allocate the VMs, a new datacenter is selected at the broker level. Finally, at the VM level, jobs are assigned to the preallocated VMs through a FIFO policy.

A. Broker Scheduler based on ACO

Each time a user requests a number of VMs in which to execute his/her experiments, an ant is initialized to find the most suitable datacenter (Algorithm 1). To this end, three parameters are initialized. A step parameter keeps track of the number of steps carried out by an ant; maxSteps is a predefined number of steps (i.e., the completion criterion of the ant's work); and vmPercentage is a user-defined percentage used by the ant to decide whether a datacenter has enough resources to allocate at least that percentage of the VMs. To this end, each datacenter keeps track of its resource availability in terms of total available processing power. Resource availability is updated every time a VM is allocated/deallocated in/from a host in the datacenter, and it is then used by the ant to estimate the percentage of VMs that can be allocated in the datacenter.
Algorithm 1 ACO-based datacenter selection algorithm

Procedure ACOBrokerScheduler(user, dcList, vmList)
Begin
  initializeLocalTable()
  initialize(maxSteps)
  suitableDatacenters = getSuitableDatacenters(dcList)
  ant = new Ant(user, suitableDatacenters)
  ant.initialize(step, vmPercentage)
  repeat
    ant.AntBroker(suitableDatacenters, vmList, maxSteps)
  until ant.isFinish()
  selectedDatacenter = dcList.get(ant.getDatacenter())
  ACOVMScheduler(selectedDatacenter, vmList)
  if (vmList.size() > 0)
    ACOBrokerScheduler(ant.getUser(), dcList, vmList)
  end if
End

When an ant is created, a list of suitable datacenters in which the VMs can be allocated is built (getSuitableDatacenters(datacenterList)). A datacenter is suitable if it has hosts with processing power, storage capacity, memory and bandwidth greater than or equal to those required by the VMs. The ant is randomly initialized in one of the obtained datacenters. A local table containing both the datacenter latency and an estimate of the percentage of VMs that can be allocated in each datacenter is created (initializeLocalTable()) by the first ant that visits the datacenter.
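The suitability check just described admits a direct rendering. The sketch below is ours (field names such as "mips" or "bw_mbps" are illustrative, not taken from the scheduler's code), under the assumption that a datacenter qualifies when at least one of its hosts meets every VM requirement.

# A datacenter is suitable if some host meets or exceeds the VM's
# processing power, storage, memory and bandwidth requirements.
def get_suitable_datacenters(datacenters, vm):
    def host_fits(host):
        return (host["mips"] >= vm["mips"]
                and host["storage_gb"] >= vm["storage_gb"]
                and host["ram_gb"] >= vm["ram_gb"]
                and host["bw_mbps"] >= vm["bw_mbps"])
    return [dc for dc in datacenters if any(host_fits(h) for h in dc["hosts"])]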

Algorithm 2 ACO-specific logic

Procedure AntBroker(suitableDatacenters, vmList, maxSteps)
Begin
  dc = getInitialDatacenter(suitableDatacenters)
  While (step < maxSteps) do
    resourcesAvailability = getResourcesAvailability(dc)
    vmPercentageEstimation = calculateVMsPercentage(
        resourcesAvailability, vmList)
    currentLatency = getDatacenterLatency(dc)
    localTable.update(vmPercentageEstimation, currentLatency)
    if (random() < searchRate) then
      nextDatacenter = randomlyChooseNextStep()
    else
      nextDatacenter = chooseNextStep()
    end if
    searchRate = searchRate - decreaseRate
    step = step + 1
    moveTo(nextDatacenter)
  end while
End

In each iteration, the ant estimates the percentage of VMs that can be allocated in the datacenter it is visiting through the calculateVMsPercentage(resourcesAvailability, vmList) method, and collects the latency information of the datacenter through getDatacenterLatency(datacenter). The datacenter information collected by the ant is added to the information table –localTable– maintained in each datacenter, through localTable.update(vmPercentageEstimation, currentLatency). The percentage of VMs that can be allocated in a datacenter is vmPercentageEstimation = (resourcesAvailability / hostProcessingPower) / vmListSize * 100, where resourcesAvailability is the total available processing power of the datacenter in MIPS, hostProcessingPower is the processing power in MIPS of its hosts, and vmListSize is the number of VMs not yet allocated by the ant in any datacenter.
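The estimate can be restated as a small function. This is a sketch under our own naming, mirroring the formula above rather than the actual scheduler code.

# resources_availability: total free processing power of the datacenter (MIPS)
# host_processing_power: per-host processing power (MIPS)
# vm_list_size: VMs the ant has not yet allocated anywhere
def calculate_vms_percentage(resources_availability,
                             host_processing_power, vm_list_size):
    free_host_equivalents = resources_availability / host_processing_power
    return free_host_equivalents / vm_list_size * 100

# E.g., 216,000 free MIPS on 7,200-MIPS hosts with 100 pending VMs:
print(calculate_vms_percentage(216000, 7200, 100))  # -> 30.0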
The information table contains both the estimated percentage of VMs that can be allocated and the latency information of the datacenter the ant is visiting. Besides, the ant adds to the table the latency information of other datacenters, gathered when the ant visited them. To this end, the ant performs a predefined number of steps, i.e., maxSteps, looking for the datacenter that both allows allocating at least the predefined percentage of VMs (the vmPercentage parameter) and has the lowest latency.

Every time an ant moves from one datacenter to another it has two choices: moving to a random datacenter using a constant probability, or searchRate (through the randomlyChooseNextStep() method), or using the information table of the current datacenter (through the chooseNextStep() method). The searchRate decreases by a decreaseRate factor as time passes; thus, the ant becomes less dependent on random choice. Every time an ant visits a datacenter, it updates the datacenter information table with the information of other datacenters, and at the same time collects the information already provided by the table of that datacenter, if any. The information table acts as a pheromone trail that an ant leaves while moving, in order to choose better paths rather than wandering
randomly in the Cloud. The entries of each information table are the datacenters that the ant has visited on its way to select the most suitable datacenter, together with their latency and percentage of VMs to allocate.

When the ant reads the table in a datacenter, it chooses the entry with the lowest latency which also meets the defined percentage of VMs to allocate. If the latency of the visited datacenter is smaller than that of any other datacenter in the table, and in addition it meets the defined percentage of VMs, the ant chooses the datacenter with the smallest latency. This process is repeated until step = maxSteps. Finally, the ant calls the infrastructure-level ACO algorithm with the selected datacenter and the list of VMs to allocate (ACOVMScheduler(selectedDatacenter, vmList)). If not all of the VMs could be allocated in the selected datacenter, ACOBrokerScheduler is executed again to select a new datacenter.
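The ant's movement rule described above admits a compact sketch (a simplified Python rendering under our own naming; the candidate filter and the fallback when no entry meets vmPercentage are our assumptions, and the real scheduler is implemented on top of CloudSim).

import random

# One movement decision: explore a random datacenter with probability
# search_rate, otherwise exploit the current datacenter's table by
# taking the lowest-latency entry that still meets vm_percentage.
def next_datacenter(datacenters, local_table, vm_percentage, search_rate):
    if random.random() < search_rate:
        return random.choice(datacenters)          # exploration
    candidates = [e for e in local_table if e["vm_pct"] >= vm_percentage]
    if not candidates:
        candidates = local_table                   # no entry fits; keep moving
    return min(candidates, key=lambda e: e["latency"])["dc"]  # exploitation

# After each step the ant decays its exploration probability:
# search_rate = max(0.0, search_rate - decrease_rate)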
B. Intra-datacenter Scheduler based on ACO
The scheduler at the infrastructure level is executed to find those hosts –within the datacenter selected at the broker level– that have availability to allocate VMs. Here, each ant works independently and represents a VM “looking” for the best host to which it can be allocated, i.e., an ant is initialized for each VM requested by the user. A master table containing information on the load of each host is initialized. To do this, first, a list of all suitable hosts in which the VM can be allocated is obtained. In each iteration, the ant collects the load information of the host it is visiting and adds this information to its private load history. The ant then updates a load information table of visited hosts, which is maintained in each host. This table contains the ant's own load information, as well as load information of other hosts, added to the table when other ants visited the host. As in the ACO-based broker algorithm, the load table of each host acts as a pheromone trail and is useful to guide other ants to choose better paths. The load of each host is calculated taking into account the CPU utilization made by all the VMs executing on it, i.e., load = numberOfExecutingVMs / numberOfPEsInHost, where numberOfExecutingVMs is the number of VMs executing in the host, and numberOfPEsInHost is the total number of cores in the host. This metric is useful for an ant to choose the least loaded host to allocate its VM.
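The load metric and the choice it supports can be sketched directly (illustrative names and data, not the paper's classes):

# Host load: VMs currently executing divided by the host's core count.
def host_load(executing_vms, pes_in_host):
    return executing_vms / pes_in_host

# hosts: (host_id, executing_vms, pes_in_host) tuples known to the ant.
def least_loaded_host(hosts):
    return min(hosts, key=lambda h: host_load(h[1], h[2]))[0]

print(least_loaded_host([("h1", 6, 8), ("h2", 2, 4), ("h3", 1, 8)]))  # -> h3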
When an ant moves from one host H to another it has two choices: moving to a random host using a constant probability, or searchRate, or using the load table information of H. Again, the search rate decreases by a decreaseRate factor as time passes. This process is repeated until the finishing criterion, i.e., performing a predefined number of steps (maxAntSteps), is met. Finally, the ant delivers its VM to the current host and finishes its task. Besides, every time an ant allocates its associated VM, the total availability of the datacenter in which the VM is allocated is updated. In the same way, every time a VM finishes its task and is released, the total availability of the datacenter in which the VM was released is updated, thus increasing the overall availability of the datacenter.

Since each step by an ant involves moving through the intra-datacenter network to obtain information regarding the availability of the hosts of the selected datacenter, it incurs latencies. We have added a control to minimize the number of steps performed by an ant: every time an ant visits a host that has no VMs allocated yet, i.e., the host load is equal to zero, the ant allocates its associated VM to it directly, without performing further steps. The smaller the number of messages sent to the hosts through the network, the smaller the impact of the latencies on the makespan given to the user.

C. VM Scheduler based on FIFO

Once the VMs have been allocated to hosts at the infrastructure level, the scheduler proceeds to assign the jobs to these VMs. This sub-algorithm uses two lists: one contains the jobs sent by the user, i.e., a PSE, and the other contains all user VMs that are already allocated to a host and hence ready to execute jobs. The algorithm iterates over the list of all jobs, retrieving them by a FIFO policy. Each time a job is obtained from the list, it is submitted to be executed in a VM in a round robin fashion. Internally, the algorithm maintains a queue for each VM containing its list of jobs to be executed. The procedure is repeated until all jobs have been submitted for execution using the allocated VMs. To ensure fairness, jobs within a VM waiting queue are executed one at a time, competing for CPU time with jobs from other VMs on the same host.
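A minimal sketch of this FIFO-plus-round-robin dispatch, assuming jobs and VMs are plain identifiers (the queue layout is our own rendering; the paper's implementation targets CloudSim):

from collections import deque

def assign_jobs(jobs, vms):
    pending = deque(jobs)                  # FIFO over the user's PSE jobs
    queues = {vm: deque() for vm in vms}   # one waiting queue per VM
    i = 0
    while pending:
        queues[vms[i % len(vms)]].append(pending.popleft())  # round robin
        i += 1
    return queues

print(assign_jobs(["j1", "j2", "j3", "j4", "j5"], ["vm1", "vm2"]))
# {'vm1': deque(['j1', 'j3', 'j5']), 'vm2': deque(['j2', 'j4'])}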
IV. EVALUATION

To assess the effectiveness of our scheduler and its constituent policies/techniques, we have processed a real case study: solving a classical PSE [5]. It involves studying a plane strain plate with a central circular hole. The dimensions of the plate were 18 x 10 m, with R = 5 m. The 3D finite element mesh employed had 1,152 elements. To generate the PSE jobs, a material parameter –viscosity η– was selected as the variation parameter. Then, 25 different values for η were considered: x·10^y MPa, with x = 1, 2, 3, 4, 5, 7 and y = 4, 5, 6, 7, plus 1·10^8 MPa. Although not strictly necessary to understand the experiments described below, introductory details on viscoplastic theory and numerical implementation can be found in [5]. The variation of a material parameter in a PSE is also useful in other disciplines (e.g., design), where it is important to test the strength and flexibility of components.

A. CloudSim instantiation

After establishing the problem parameters for the PSEs, we employed a single computer to run the experiments by varying the viscosity parameter η. The execution of the 25 PSE-jobs resulted in 25 input files with different configurations and 25 output files. The tests were solved using the SOGDE finite element solver [15]. Then, we approximated for each experiment the number of executed CPU instructions by NI_i = mipsCPU * T_i, where NI_i is the number of million instructions to be executed by job i, mipsCPU is the processing power of the CPU of our real computer, and T_i is the time it took to run job i on this computer.
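This approximation is plain arithmetic; the numbers below are made up for illustration (the paper does not list the reference machine's MIPS rating or individual T_i values):

# NI_i = mipsCPU * T_i, in millions of instructions (MI), where T_i is
# the measured runtime of job i in seconds (MIPS is a per-second rate).
def job_length_mi(mips_cpu, run_time_seconds):
    return mips_cpu * run_time_seconds

print(job_length_mi(4008, 400))  # -> 1,603,200 MI, within the range reported below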
After gathering real job data, we instantiated the CloudSim toolkit [6]. The experimental scenario consists of a Cloud
with 10 heterogeneous datacenters. The network topology is defined using BRITE [16]. A BRITE topology file is used by CloudSim to define the different nodes that compose a commonly-found federation (i.e., datacenters, brokers) and the network connections among them. This file is used to calculate the latencies in network traffic, which have been assigned taking into account other works in the literature [17]. Each datacenter comprises a different number of hosts, which are not all dedicated, i.e., some of them are busy executing pre-existing VMs. In our scenario, each datacenter has already allocated a random number of VMs, which keep between 30% and 60% of the hosts busy with respect to the total datacenter availability. The characteristics of the datacenters and the machines that compose them are shown in Table I. All hosts had a bandwidth of 1,000 Mbps. Moreover, a user requests 100 VMs to execute his/her PSE-jobs. Each VM has the same characteristics as a t2.small instance of Amazon EC2 (4,988 MIPS, 2 GB RAM, 100 GB image size, 100 Mbps bandwidth and 1 core). The reason why this instance type –t2.small– was chosen to set up the VMs in CloudSim is that the SOGDE code used to execute the real PSE-jobs is a monolithic application, and therefore jobs need only one core to be executed.
Table I: Simulated Cloud datacenters (DC) characteristics (Proc. power, RAM, Storage and Cores are per-host values)

DC   Latency  # Hosts  Proc. power  RAM    Storage  Cores
D1   0.80 s.  30       7,200 MIPS   32 GB  500 GB   8
D2   1.75 s.  50       9,900 MIPS   32 GB  1 TB     6
D3   0.32 s.  20       8,036 MIPS   16 GB  500 GB   8
D4   2.00 s.  30       7,500 MIPS   16 GB  1 TB     8
D5   0.25 s.  50       7,200 MIPS   32 GB  500 GB   8
D6   1.50 s.  10       4,008 MIPS   8 GB   1 GB     4
D7   0.29 s.  20       5,618 MIPS   12 GB  500 GB   6
D8   2.20 s.  20       5,200 MIPS   8 GB   500 GB   4
D9   0.50 s.  50       6,600 MIPS   12 GB  500 GB   8
D10  1.20 s.  50       7,527 MIPS   16 GB  500 GB   8
Each job had a number of instructions between 1,333,293 and 2,712,789 MI. Lastly, the experiments had input files of 291.7 KB and output files of 587.1 KB. We evaluated the performance of executing the user PSE-jobs as we increased the number of jobs to be performed, i.e., 25 * i jobs with i = 40, 80, ..., 400. That is, the base job set comprising the 25 jobs of the plane strain plate PSE, obtained by varying the value of η, was cloned to obtain larger sets.
B. Performed experiments

We compared our ACO-based broker-level scheduler against two alternative broker schedulers, sketched after this list:

• Latency-Aware Greedy (LAG), which maintains a list of all interconnected datacenters sorted by their latencies. Each time a user requires a number of VMs to execute their PSE, LAG selects the datacenter with the lowest latency in the list. Then, whenever a datacenter has no more physical resources to allocate VMs, the algorithm selects the next datacenter in the list with the lowest latency.

• Round Robin (RR), which maintains a list of all network-interconnected datacenters that make up the Cloud, sorted by increasing latency, and assigns each VM required by the user to a datacenter from the list in circular order.
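Under our own simplified naming, the two baselines look as follows (has_capacity is a hypothetical predicate standing in for the datacenter availability query):

from itertools import cycle

# LAG: walk the latency-sorted datacenter list and stay on the first
# datacenter that still has physical resources for the next VM.
def lag_select(sorted_dcs, has_capacity):
    for dc in sorted_dcs:
        if has_capacity(dc):
            return dc
    return None

# RR: deal the requested VMs over the sorted list in circular order,
# ignoring capacity and latency differences between turns.
def rr_assign(vms, sorted_dcs):
    return {vm: dc for vm, dc in zip(vms, cycle(sorted_dcs))}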
Then, we combined the two policies for selecting datacenters at the broker level with ACO at the infrastructure level for allocating the VMs and FIFO for mapping jobs. In our experiments, both the maxSteps and maxAntSteps parameters of ACO at the broker level and the infrastructure level, respectively, have been configured so as to explore up to 80% of the number of datacenters –maxSteps– and of the number of hosts of each datacenter –maxAntSteps–; e.g., the value of maxAntSteps equals 24 when the number of hosts equals 30. Besides, the vmPercentage parameter has been set to 60%, i.e., each datacenter selected by an ant should be able to allocate at least 60% of the VMs requested by the user. Finally, for both ACO algorithms we have set the searchRate and decreaseRate parameters to 0.6 and 0.1, respectively.

Figure 2: Relative makespan reduction regarding RR (series: ACO, LAG; x-axis: number of PSE-jobs; y-axis: relative makespan reduction)

Figure 2 illustrates the relative makespan reduction with respect to the worst competitor –RR– as the number of jobs is increased. In the results, our ACO-based broker-level algorithm is the one that produces the lowest makespan to the user with respect to LAG and RR. The makespan equals 81.23, 97.76 and 118.08 minutes when the number of jobs to be executed is 1,000 for ACO, LAG and RR, respectively, and the gains obtained by ACO with respect to LAG and RR are 16.91% and 31.21%, respectively. However, when the number of jobs is increased to 10,000, the makespan equals 720.75, 737.28 and 757.60 minutes for ACO, LAG and RR, respectively, and the gains of ACO with respect to LAG and RR are 2.24% and 4.86%, respectively. Note that the larger the number of jobs to be executed, the lower the impact of the latencies on the makespan, because the latencies are incurred at the moment of creating the virtual infrastructure.
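The quoted gains are relative makespan reductions with respect to each competitor; a quick arithmetic check (the helper is ours, not from the paper):

def gain(competitor_makespan, aco_makespan):
    # Percentage reduction of ACO's makespan relative to a competitor's.
    return (competitor_makespan - aco_makespan) / competitor_makespan * 100

print(round(gain(97.76, 81.23), 2))   # vs. LAG at 1,000 jobs -> 16.91
print(round(gain(118.08, 81.23), 2))  # vs. RR at 1,000 jobs  -> 31.21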
Since the ACO broker scheduler considers both network latencies and the percentage of VMs that can be allocated in a datacenter, it avoids exploring datacenters with lower latency but which can allocate few VMs, i.e., most VMs are created in datacenters with low latencies. As can be seen in Figure 3, the low-latency datacenter which can allocate at least 60% of the VMs required by the user is D7, in which ~68% of the total VMs are allocated. The remaining VMs which were not allocated in D7, i.e., ~32% of the VMs, were allocated in D5. Note that only 32% of the VMs had to explore both D7 and D5 looking for resource availability.

Second comes the LAG algorithm, which always first selects the datacenter with the lowest latency (see D5 in Table I) but without considering the amount of available resources in such a datacenter. Later, to allocate the VMs, the ACO algorithm at the infrastructure level has to explore up to 80% of the hosts that compose datacenter D5 –50 hosts– to find which ones are available to allocate VMs. As can be seen in Figure 3, only ~35% of the VMs are allocated in datacenter D5, because this datacenter has few available resources. To allocate the remaining ~65% of failed VMs, LAG must select a new low-latency datacenter. The new datacenter –D7– is explored again for each VM which could not be allocated in the previous datacenter, i.e., 65% of the total VMs had to explore both the D5 and D7 datacenters looking for available resources. The number of network messages sent within a datacenter to determine resource availability directly impacts the makespan, because for each message sent, the datacenter latencies delay the answers. The larger the number of VMs which can be allocated in a datacenter with low latency, the fewer the failed VMs that trigger the exploration of new datacenters with greater latency.
Figure 3: Number of allocated VMs by datacenter (series: ACO, LAG, RR; x-axis: datacenters D1-D10; y-axis: number of VMs)

Finally, the greatest makespan was obtained when the datacenters are selected by RR. The reason is that RR explores all datacenters in circular order for each VM to be allocated, without considering their latencies and resource availability (see Figure 3). The number of VMs allocated on each datacenter depends on the number of available hosts and on the characteristics of the hosts that compose each datacenter; the hosts of datacenter D6 do not have enough processing power to allocate the type of VM required. In this algorithm, the greater network latencies of some datacenters have a greater impact on the makespan.
V. CONCLUSIONS

PSEs are popular in computational mechanics (CM) experiments and involve running many CPU-intensive jobs. The growing popularity of federated Clouds and SI-inspired algorithms has increased the research on resource allocation mechanisms. Hence, scheduling at the three associated levels plays a fundamental role.

In this work we have described an ACO-based broker-level scheduler for the efficient selection of datacenters, taking into account both their network latencies and their availability of resources. The broker scheduler was combined with an ACO algorithm for allocating the VMs in the datacenters selected at the broker level, and FIFO for mapping the PSE-jobs. The performed experiments suggest that our ACO-based broker scheduler provides better makespan to the user than LAG and RR.

We will explore other bio-inspired techniques such as Artificial Bee Colony [18]. Another issue is considering multi-tenancy, i.e., providing schedulers for allocating resources to several independent users. We will also extend our scheduler to consider other optimization criteria (e.g., monetary cost). For example, in many Clouds different providers offer multiple types of VMs with different capacities and pricing. Thus, if one application is mapped to VMs from different Cloud providers, it may result not only in different makespans but also in different monetary costs.

Finally, an important issue to consider in federated Clouds is enhancing the scheduler with dynamic optimization capabilities, enabling the dynamic reallocation (migration) of VMs from one host to another. The migration of VMs might allow meeting a specific optimization criterion, such as reducing the number of hosts in use to minimize energy consumption [19], or balancing the workload of all resources to avoid resource saturation and performance slowdown.

ACKNOWLEDGMENTS

We acknowledge the support by ANPCyT (grants PICT-2012-0045, PICT-2014-1430) and UNCuyo (project 06/B308).

REFERENCES

[1] R. Coutinho, L. Drummond, Y. Frota, and D. de Oliveira, “Optimizing virtual machine allocation for parallel scientific workflows in federated clouds,” Future Generation Computer Systems, 2014, in press.
[2] J. Tordsson, R. Montero, R. Moreno Vozmediano, and I. Llorente, “Cloud brokering mechanisms for optimized placement of virtual machines across multiple providers,” Future Generation Computer Systems, vol. 28, no. 2, pp. 358–367, 2012.
[3] J. Kennedy, “Swarm Intelligence,” in Handbook of Nature-Inspired and Innovative Computing. Springer US, 2006, pp. 187–219.
[4] E. Pacini, C. Mateos, and C. García Garino, “Balancing throughput and response time in online scientific clouds via ant colony optimization,” Advances in Engineering Software, vol. 84, pp. 31–47, 2015.
[5] C. García Garino, M. Ribero Vairo, S. Andía Fagés, A. Mirasso, and J.-P. Ponthot, “Numerical simulation of finite strain viscoplastic problems,” Journal of Computational and Applied Mathematics, vol. 246, pp. 174–184, Jul. 2013.
[6] R. Calheiros, R. Ranjan, A. Beloglazov, C. De Rose, and R. Buyya, “CloudSim: A toolkit for modeling and simulation of Cloud Computing environments and evaluation of resource provisioning algorithms,” Software: Practice & Experience, vol. 41, no. 1, pp. 23–50, 2011.
[7] U. Singha and S. Jain, “An analysis of swarm intelligence based load balancing algorithms in a cloud computing environment,” Journal of Hybrid Information Technology, vol. 8, no. 1, pp. 249–256, 2015.
[8] P. Pendharkar, “An ant colony optimization heuristic for constrained task allocation problem,” Journal of Computational Science, vol. 7, pp. 37–47, 2015.
[9] R. Tavares Neto and M. Godinho Filho, “Literature review regarding Ant Colony Optimization applied to scheduling problems: Guidelines for implementation and directions for future research,” Engineering Applications of Artificial Intelligence, vol. 26, no. 1, pp. 150–161, 2013.
[10] E. Pacini, C. Mateos, and C. García Garino, “SI-based Scheduling of Parameter Sweep Experiments on Federated Clouds,” in First HPCLATAM-CLCAR Joint Conference (CARLA), vol. 845, 2014, pp. 28–42.
[11] Y. Kessaci, N. Melab, and E.-G. Talbi, “A pareto-based metaheuristic for scheduling HPC applications on a geographically distributed cloud federation,” Cluster Computing, vol. 16, no. 3, pp. 451–468, 2013.
[12] J. Lucas-Simarro, R. Moreno-Vozmediano, R. Montero, and I. Llorente, “Scheduling strategies for optimal service deployment across multiple clouds,” Future Generation Computer Systems, vol. 29, no. 6, pp. 1431–1441, 2013.
[13] L. Agostinho, G. Feliciano, L. Olivi, E. Cardozo, and E. Guimaraes, “A Bio-inspired approach to provisioning of virtual resources in federated Clouds,” in DASC 2011. IEEE Computer Society, 2011, pp. 598–604.
[14] A. Noda and A. Raith, “A Dijkstra-like method computing all extreme supported non-dominated solutions of the biobjective shortest path problem,” Computers & Operations Research, vol. 57, pp. 83–94, 2015.
[15] C. García Garino, F. Gabaldón, and J. M. Goicolea, “Finite element simulation of the simple tension test in metals,” Finite Elements in Analysis and Design, vol. 42, no. 13, pp. 1187–1197, 2006.
[16] J. Jung, S. Jung, T. Kim, and T. Chung, “A study on the Cloud simulation with a network topology generator,” World Academy of Science, Engineering & Technology, vol. 6, no. 11, pp. 303–306, 2012.
[17] S. Malik, F. Huet, and D. Caromel, “Latency based group discovery algorithm for network aware Cloud scheduling,” Future Generation Computer Systems, vol. 31, pp. 28–39, 2014.
[18] D. Karaboga, B. Gorkemli, C. Ozturk, and N. Karaboga, “A comprehensive survey: artificial bee colony (ABC) algorithm and applications,” Artificial Intelligence Review, March 2012.
[19] R. Jeyarani, N. Nagaveni, and R. Vasanth Ram, “Design and implementation of adaptive power-aware virtual machine provisioner (APA-VMP) using swarm intelligence,” Future Generation Computer Systems, vol. 28, no. 5, pp. 811–821, 2012.
