
International Journal of Cloud Computing and Services Science (IJ-CLOSER)



ISSN: 2089-3337


FAULT TOLERANCE WITH DYNAMIC REPLICATION FOR HIGHER AVAILABILITY IN CLOUD
Akanksha Chandola*, Nipur Singh*, Ajay Rawat**
Kanya Gurukul Campus, Gurukul Kangri Vishvidhyalaya, Dehradun, 248001, India.
University of Petroleum & Energy Science, Dehradun, 248001, India.

ABSTRACT
Computing has moved from traditional desktop/laptop-based computation to the era of virtualized cloud computing infrastructure. To impart quality of service (QoS) for vital application computation, higher availability should be incorporated into the available fault tolerance mechanisms. A fault tolerance mechanism with replication has to be implemented in a realistic environment so as to minimize its impact on QoS. This research work results in a proposed strategy, Dynamic Replication with Higher Availability (DRHA), a fault tolerance mechanism that applies a semi-passive replication strategy at the virtual machine level. The DRHA fault tolerance mechanism was analyzed over properties corresponding to FT policies and demonstrated higher throughput, with all submitted applications completing, while balancing the availability and latency factors.

Keywords:
Cloud Computing
Quality of Service
Dynamic Replication
Availability
Fault Tolerance
Latency
Semi-passive

Copyright 2016 Institute of Advanced Engineering and Science.


All rights reserved.

Corresponding Author:
Akanksha Chandola,
Department of Computer Applications,
Kanya Gurukul Campus, Gurukul Kangri Vishvidhyalaya, Dehradun, 248001, India.

Email: chandola.akanksha@gmail.com

1. INTRODUCTION
Cloud computing offers a virtualized environment for dynamic computing over the internet. Virtualization has changed the trend of computing, making it converge into a virtual world built on the latest technologies and firmware. We propose this fault tolerance system for consumer-centric, service-oriented cloud computing, to provide quality of service (QoS) for client-oriented applications.
Customer requirements in the cloud environment change with high frequency, so the system should adapt to these dynamic requirements and still impart high QoS. A prerequisite for such systems is a dynamic replication strategy, in which the automatic creation, deletion and management of replicas is perpetual and the strategy can accommodate changes in user behavior [1].
Self-healing is a proactive method for tolerating failures in distributed systems using redundancy. Redundant application sets are formulated and kept on standby, ready to swap in and replace any failed instance. When a user requests a file, a major share of bandwidth may be consumed in transferring the file from the server side to the client side; furthermore, the latency grows with the size of the files involved. Reducing access latency and bandwidth consumption is the main aim of replication [1]. With replication one should be able to deliver a flexible, reliable and cost-efficient solution for disaster recovery of virtual machines and protection of data in the environment.
Virtualization is a foundational aspect of cloud as well as high-performance computing and helps deliver services on a pay-per-use basis. Whenever a VM is replicated, all the jobs residing inside it are replicated as well, providing a two-level reliability provision: one at the machine level and another at the processing level.
In this work we contribute a fault tolerance model based on a replication strategy. DRHA-FT is analyzed against other FT policies over properties such as availability, reliability, throughput and
latency. The analysis shows that our policy performs efficiently and could be deployed for vital application computing in the cloud.
Our research work comprises the following contributions:
- We propose a policy for proactive fault tolerance using a replication strategy. We selected a proactive model with a message-passing approach and a replica manager that backs up jobs by cloning VMs.
- We incorporate residency constraints for VM replicas, i.e. where the replicas of a VM should reside so that they can react to failures while balancing the availability and latency metrics.
- We simulated our algorithm and generated results over variable failure rates.
The overall design of the fault tolerance mechanism is defined as:
<ft_unit> DRHAFT = (DRHA, P, A), where P is the set of properties of the ft_unit and A is the set of its attributes.
<ft_unit> DRHAFT = (DRHA, P = {throughput = 99.99%, availability = 0.9833, average latency = low}, A = {mechanism = semi_passive_replication, fault_model = crash fault, number_of_replica = 2k (initially two and k dynamically generated/assigned)})
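For illustration only, the ft_unit descriptor above could be captured as a small data structure; the Python class and field names below are ours and not part of the mechanism itself.

# Illustrative sketch: a minimal representation of the <ft_unit> tuple described above.
from dataclasses import dataclass, field

@dataclass
class FTUnit:
    mechanism: str                                     # fault tolerance mechanism (DRHA)
    properties: dict = field(default_factory=dict)     # P: measured FT properties
    attributes: dict = field(default_factory=dict)     # A: configuration attributes

drha_ft = FTUnit(
    mechanism="DRHA",
    properties={"throughput": 0.9999, "availability": 0.9833, "average_latency": "low"},
    attributes={"replication": "semi_passive", "fault_model": "crash",
                "replicas": "2k"},                     # two initial replicas, k generated dynamically
)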
2. MOTIVATION AND BACKGROUND
This section covers the basic concepts that are the building blocks of our research area.
2.1. Concept and Rate of Failures
Failure has always been a prominent issue when it comes to delivering end results to the customer in any computing environment. Schroeder and Gibson [1] analyzed data collected over a decade at Los Alamos National Laboratory (LANL) and found that failure rates are directly proportional to the number of processors and to the intensity of the workload running on them. The initial phase (infant mortality) and the final phase (wear-out) show higher failure rates in the hardware failure life cycle. Vishwanath et al. [4] studied hardware reliability in the cloud and observed that hard disks account for 78% of total failures, making them the component that hampers reliability the most.
Job failures are generated using the Weibull distribution, a continuous probability distribution. In its most general form the Weibull distribution is defined by three parameters: β, the shape parameter (also known as the Weibull slope); η, the scale parameter; and γ, the location parameter. The probability density function is

f(T) = (β/η) ((T − γ)/η)^(β−1) e^(−((T − γ)/η)^β)    (1)

where

f(T) ≥ 0, T ≥ γ, β > 0, η > 0, −∞ < γ < +∞.    (2)

Weibull distributions with β < 1 have a failure rate that decreases with time, corresponding to infantile or early-life failures. When β is close to or equal to 1, a fairly constant failure rate is observed, which indicates useful-life or random failures. For β > 1 the failure rate increases with time, which corresponds to wear-out failures [12]. The scale parameter η stretches and skews the distribution range. Varying the values of β and η generates enough deviation in the input failure rate for testing the reliability of our strategy.
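As an illustration of how such failure input can be produced, the sketch below (not the authors' simulation code) draws per-job times to failure from a Weibull distribution using Python's standard library; the location parameter is assumed to be zero.

# Minimal sketch: sample Weibull-distributed times to failure for a batch of jobs.
import random

def sample_failure_times(n_jobs, shape, scale, seed=42):
    """Return one Weibull-distributed time-to-failure per job."""
    rng = random.Random(seed)
    # random.weibullvariate takes the scale parameter first, then the shape parameter.
    return [rng.weibullvariate(scale, shape) for _ in range(n_jobs)]

# shape < 1: early-life failures; shape ~ 1: random failures; shape > 1: wear-out failures.
for beta in (0.5, 1.0, 2.0):
    times = sample_failure_times(100, shape=beta, scale=1.0)
    print(f"shape={beta}: mean time to failure ~ {sum(times) / len(times):.2f}")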
2.2. Failure Tolerance Concept and System Design
Failures in a computing environment can lead to serious consequences, so with the emergence of the cloud a lot of effort has gone into tolerating these failures in a reactive, proactive or hybrid manner. In proactive FT, preventive measures are taken to avoid failure experiences and keep an application alive. In contrast, reactive FT tries, at the time a failure is experienced, to recover the running application from that failure [2]. Proactive FT involves higher overhead in failure prediction, as there is no provision for checkpoint/restart, and migration overhead, false alarms, and the availability and reallocation of resources add more to it. The failure prediction module plays a vital role in the proactive FT model; however, its accuracy is not part of our research work, and we use probability-based data in our simulation.
Engelmann et al. [3] provided a foundation for proactive FT in HPC by identifying its architecture and classification. The presented architecture depends on a feedback-loop control mechanism in which system and application health are monitored and, to avoid imminent application failure, preventive steps are taken by relocating the application from an unhealthy to a healthy node. Thanda et al. [3] showed, in the context of cluster computing, that message logging is one of the most efficient ways to achieve fault tolerance, giving the Message Passing Interface an upper edge for deployment in high-performance computing environments.
2.3. Replication Technique and Brief Survey
Data redundancy is implemented in a system to provide availability, reliability, accessibility and QoS for critical application processes. Replication, as the name suggests, is about making critical components or services redundant in order to achieve fault tolerance. While using a replication strategy, consistency between the primary component and its replicas is ensured using suitable protocols.
2.3.1 Types of Replication Techniques
Replication techniques are broadly classified into active and passive replication.
In active replication, every replica behaves in the same manner: any request received is passed on to all replicas, and they work independently to complete it. All replicas maintain the same system state, which is why this is also known as the state-machine approach. This approach provides low response time, as processing continues even after the failure of a single replica; however, its resource consumption is high.
In passive replication, one of the processing units is the primary replica, which is entitled to receive a request, process it and respond. During execution of the request the other backup replicas interact with the primary replica for system-state updates. In case of failure of the primary replica, one of the backup replicas takes over. This approach reduces resource overhead but comes with a higher response time in case of failures. Passive replication has two variations: warm and cold. In warm passive replication, the other replicas are periodically updated with checkpoints of the primary replica's state. In cold passive replication, no backup replica is launched until the primary is detected to have failed [7]. Beyond these traditional types, the techniques were further refined into semi-active and semi-passive replication.
Semi-active replication supports a non-deterministic computation model using the concept of a leader and its followers. Under normal circumstances only the leader component provides output messages, although all replicas process the request. In case of non-deterministic processing it is the leader that computes and passes the processing information on to the followers. This approach ensures a fast reaction to a crash fault without incurring a high cost in the event of an incorrect failure suspicion.
2.3.2 Replication-based Fault Tolerance
To obtain higher reliability and availability, the Byzantine Fault Tolerance (BFT) protocol presented in [8] is a powerful approach that works over an active replication strategy, but it is too expensive for practical usage. It uses 3f+1 replicas (one primary and the remainder as backups) to tolerate f Byzantine faults.
In contrast, the authors in [8] implemented the concept of leader-follower one-backup replication in the Low Latency Fault Tolerance (LLFT) system for distributed applications within a LAN. Using a message protocol it provides strongly consistent group membership, and under failure conditions it shows low-latency reconfiguration and recovery. It provides transparency for applications that undergo crash or timing faults; however, it cannot handle Byzantine faults.
In their research on large-scale graph processing, Peng et al. [8] proposed Imitator, a replication-based fault tolerance mechanism that provides low-overhead and fast crash recovery. The key idea of Imitator is to extend the replication mechanism with additional mirrors, replicas that interact directly with the ongoing process, i.e. the master. In case of failure, the complete state of the master can be reconstructed from the states of its mirrors. The distribution of the mirrors directly affects the scalability of recovery, since the mirrors play a key role in reconstructing the master's state.
Our proposal is inspired by these replication-based fault tolerance mechanisms: the concept of a two-backup scheme is implemented, with the initial state consisting of a master-mirror-replica trio, where the master is an active virtual machine/process, the mirror is a replica in direct interaction with the master, and the replica is another backup that stays idle until the master fails.
2.4 Managing Replicas
A fault tolerance mechanism should ensure that, under a faulty scenario, the availability of job replicas remains high, since it is meant to tolerate failures proactively. Deciding where to deploy replicas therefore requires careful thought, as every placement scenario carries its own weight. Availability in cluster computing is defined at five levels of granularity, from level 5 to level 1, where two replicas reside in different data centers, rooms, racks, servers, or on the same server, respectively. In broader terms, replicas can be placed on multiple machines in the same cluster, on multiple clusters within a data center, or across multiple data centers, in ascending order of failure independence and latency and with decreasing bandwidth usage. Jhawar et al. [5] found the availability of a replica to be highest when the replicas are located in different data centers.
Comparing semi-active and semi-passive replication, the former seems better with respect to availability, but on the other hand its resource overhead is much higher. Similarly, for faults that can spread, the replicas need to be distributed such that they are not affected together, keeping the availability of the job optimal. We assume a cluster comprises a single host, whereas in an actual computing environment a cluster can have a different number of hosts, since a host represents a physical machine and a cluster is a logical combination of more than one machine. We opted for the latter replication strategy, making one replica reside on the same host in the same data center and the second replica on a different host, again in the same data center, thereby trying to balance fault dependency, availability and latency.

3. PROPOSED DYNAMIC REPLICATION FOR HIGHER AVAILABILITY FAULT TOLERANCE MECHANISM
3.1 System Design
While working on the system design we tried to keep it simple, with only the necessary modifications to already existing components. The Message Passing Interface is the most common protocol used for point-to-point communication among a number of units. Since we consider an asynchronous system, components communicate on either the completion of a job or the failure of a job execution; the system synchronizes its components on the successful execution of a job and on its failure. In any span of time, every unit may perform a finite number of computations and send and receive a variable number of messages.
The major difference between passive and semi-passive replication is that, in order to devise a semi-passive replication algorithm, no form of process-controlled crash is a mandatory requirement. At the same time, semi-passive replication retains the key characteristics of passive replication, namely a reduced use of processing power and the possibility of non-deterministic processing [11], which lets our system work with asynchronous transactions. In addition, the interactions between client and server are identical in active and semi-passive replication, but unlike passive replication the client need not reissue the request if the master crashes. The system model and communication design are represented as follows:

Figure 1. Minimal computational setup


This new fault tolerance mechanism supports MPI for communication among VMs and their replicas, for ease of implementation. Jobs are submitted to VMs for execution; in case of failure the master VM stops and its mirror VM is triggered to run. The old mirror VM then becomes the new master and the last replica becomes the new mirror. The running VM is known as the master, and at any instant the master communicates with only one replica, known as the mirror, which is kept in a soft-off state, similar to sleep except that the mirror does not need to be restarted when it takes over as master. While working with replicas it is always tedious to maintain consistency among the replica states and to keep all
replicas homogeneous with the master; the master-mirror-replica concept is implemented as a solution to exactly this problem. For state transition from master to mirror, VM images of the active VM are stored asynchronously in the buffer memory of the processing host machine, and the replica is updated periodically. The replicas thus act as receivers of the latest image of the active VM from memory, coordinating and maintaining synchronization within the master-mirror duo. The system supports lazy replication, which allows conflicts between the commit of a checkpoint update of a transaction running on the active VM and its propagation to the backup VM during failure, resolved at resynchronization using transaction timestamps.
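The buffer-mediated update can be sketched as follows; this is a simplified illustration under our own assumptions, and the names (VM, Buffer, update_mirror) are ours rather than part of the implementation.

# Sketch: the master pushes its latest image to a host-side buffer; the mirror pulls it periodically,
# so only the master-mirror pair stays in sync at any time.
import copy

class Buffer:
    def __init__(self):
        self._image = None
    def put(self, image):            # master pushes its latest image asynchronously
        self._image = copy.deepcopy(image)
    def get(self):                   # mirror pulls the newest image, if any
        return self._image

class VM:
    def __init__(self, name):
        self.name = name
        self.state = {}              # application / job state

def update_mirror(mirror: VM, buf: Buffer) -> bool:
    """Copy the buffered master image to the mirror (an UpdateVmReplica-like step)."""
    image = buf.get()
    if image is None:
        return False                 # buffer empty, nothing to update
    mirror.state = image
    return True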
3.2 Implementation of the Replica Strategy
A notable advantage of semi-passive replication is that it avoids unnecessary state transfer operations. Real-world applications carry large amounts of state, and these state transfers can have a large impact on the normal course of execution, increasing the overhead at application start-up. The authors in [13] demonstrated the issue with an assessment of the Swiss stock exchange. Active replication, in contrast, is appreciated for its low response time even in the case of a crash; it has, however, two important drawbacks: (1) the redundancy of processing implies high resource usage and, more importantly, (2) the handling of requests has to be deterministic [14]. We therefore opted for the semi-passive replication strategy, to decrease the computation overhead and provide an optimal solution at low cost. Under normal computation the mirror is the replica selected for active communication with the master and is therefore updated periodically during the execution cycle. In case of failure of the master process, the mirror takes over control and the replica is selected and activated as the new mirror. The message log is now shared between the new master and mirror, and a new replica is generated dynamically, completing the master-mirror-replica trio. On instantiation of the replica, one more VM backup is generated, so that the new VM becomes the replica, which stays in idle or standby mode. After k failures the scenario keeps repeating the same transition k−2 times; with every passage of time and every failure, only two VMs are in sync with each other. As the number of failures in the system increases by one, the mechanism has the provision of generating k VM replicas dynamically, as required by the applications running on them. More specifically, in this trio the master is the ongoing active VM, the mirror is an actively updated image of the master reserved as a backup, and the replica is the link through which further replicas are generated dynamically as the computing environment requires.
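A compact sketch of this rotation is given below; it is illustrative only, and the spawn_vm hook stands in for whatever mechanism the environment uses to provision a fresh backup VM.

# Sketch: master-mirror-replica rotation on failure. The mirror becomes master, the idle replica
# becomes mirror, and a fresh replica is provisioned so the trio is always complete.
import itertools

class VM:
    def __init__(self, name):
        self.name = name
        self.status = "running"

def handle_master_failure(master, mirror, replica, spawn_vm):
    """Return the new (master, mirror, replica) trio after a master crash."""
    master.status = "failed"
    new_master, new_mirror = mirror, replica   # mirror -> master, replica -> mirror
    new_replica = spawn_vm()                   # dynamically generate the next backup
    return new_master, new_mirror, new_replica

_ids = itertools.count(4)
def spawn_vm():
    return VM(f"VM{next(_ids)}")

trio = (VM("VM1"), VM("VM2"), VM("VM3"))
for _ in range(2):                             # two successive master failures
    trio = handle_master_failure(*trio, spawn_vm)
print([vm.name for vm in trio])                # ['VM3', 'VM4', 'VM5']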
The following figure shows the different stages through which the system transitions under failures:

Figure 2. (a) Initial setup with VM1 as the active VM, and VM2 and VM3 as its primary and secondary replicas, residing on the same and a different host respectively. (b) After the first failure VM1 stops working, VM2 takes over as the master/active VM and VM3 becomes the mirror, i.e. its replica. (c) The next failure leads to the generation of a new VM, say VM4, acting as the mirror, which in turn is the replica of VM3, now the master.
3.3 Placement Policy for Higher Availability
Initially a VM has two replicas, one residing on the same host and the second located on a different host, again within the same data center. Spreading replicas across hosts within the data center raises availability, but on the other hand it increases the latency of triggering the replica and of creating a new replica when the former fails. When a new replica is generated dynamically it also follows this residency rule, being located on another available host once two VM instances have failed on the same machine.
To improve the residency policy for VMs and their replicas, a constraint is added when triggering and initializing mirror VMs: only two VM instances of the same computing scenario are allowed to be allocated on the same host. More precisely, if VM1 fails, its replica, placed on the same host, takes over. For the next failure, having saturated the processing of the same jobs
assigned to the VMs on that host, we switch to the next host, and the new VM executes on a different host.
Data published in [9] compute the overall availability of a replication strategy with respect to different placement schemes. Availability of a replica is highest when it resides in a different data center, moderate within the same group of machines, and lowest inside the same host. To balance latency against availability, we adopt the above constraint for the placement and regeneration of replicas.
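The residency constraint can be sketched as a simple allocation rule; the example below is illustrative (host names and the pick_host helper are ours) and omits host capacity checks.

# Sketch: at most two VM instances of the same computation per host; the next replica moves to
# another host in the same data center.
def pick_host(hosts, placed_per_host, max_per_host=2):
    """Return the first host that still holds fewer than max_per_host replicas."""
    for host in hosts:
        if placed_per_host.get(host, 0) < max_per_host:
            placed_per_host[host] = placed_per_host.get(host, 0) + 1
            return host
    raise RuntimeError("no host satisfies the residency constraint")

hosts = ["host1", "host2", "host3"]
placement = {}
for vm in ["VM1", "VM2", "VM3", "VM4"]:
    print(vm, "->", pick_host(hosts, placement))
# VM1 and VM2 land on host1; VM3 and VM4 land on host2, matching the placement rule above.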

3.4 DRHA Fault Tolerance Algorithm
The main module of the DRHA fault tolerance algorithm creates a datacenter (we work with only one datacenter in our simulation setup) with a variable number of hosts and VMs. In the initial time frame all jobs are assigned to the first VM, the master, and its mirror VM is also initialized to hold up-to-date images of the active VM as backup storage, keeping the work alive in case the master fails; this is realized by the function allocateVmtoHost. A replica is also instantiated on another host, which is the key ingredient for enforcing the residency constraint on VMs under faulty conditions.
Algorithm 1: DRHAFT Algorithm
1  create datacenter_1                              // confined to only one data center
2  create host (host1 to hostn)                     // add all hosts to hostList
3  create vm (vm1 to vmm)                           // add all VMs to vmList
4  allocateVmtoHost(hostList, vmList)
5  create job (job1 to jobi)                        // add heterogeneous jobs to jobList
6  initialize vm_master, vm_mirror, vm_buf
7  for i = 1 to jobList.size()
8      job.status <- executejob(job_i, vm_master)   // get job status in active VM: success/failure
9      if (job.status == success)
10         UpdateVmReplica(vm_mirror, vm_buf)
11     else if (job.status == failure)
12         invoke vm_rep
13         InitilizeVmReplica(vm_mirror, vm_rep, job_i)
14 if (i == jobList.size())                         // all assigned jobs executed successfully
15     msg "Jobs execution completed successfully"
16 else
17     msg "All VMs exhausted, restart the process"
All jobs, as per the capacity of the VM, are assigned to it for execution, and every time a job's status is checked for successful completion or failure. We make the prior assumption that failure of a job implies failure of the VM, since we are dealing with VM-level replication; failures are therefore injected at the VM level and not at the process level. In the failed condition the VM is monitored by the broker and the InitilizeVmReplica function is called, while when the job status is success the UpdateVmReplica function refreshes the mirror VM asynchronously using the buffer. The number of iterations depends on the number of jobs that can be accommodated by the master in a given time frame.
The AllocateVmtoHost algorithm is deployed to enact our proposed residency policy and achieve higher availability. The allocation function uses the concept of rank: at initialization the master VM is given rank 1, the mirror VM rank 2 and the replica rank 3. Replicas generated dynamically later are assigned ranks 4, 5, and so on. Only two VM instances are allowed to run on the same host at any time; the moment the number of failed VMs on a single host reaches two, the next VM is allocated to a different host so as to attain higher availability. Thus replicas with odd rank are assigned to a different host.
After jobs are submitted to the VM, the executejob function is called to maintain the status of the jobs under execution. The simulation environment facilitates monitoring of job and VM failures and records the desired information about them using the appropriate SimEvent and SimEntity classes. Abstracting over simulator-level events, the SimEvent tracks the failure of a job, records the event information and stops the VM from executing further jobs, which then triggers the mirror replica for subsequent execution.
Algorithm 2: AllocateVmtoHost Algorithm

// Initially all jobs are allocated to the first VM, whose status is already set to master VM
1  Input: hostList, vmList    Output: allocation of VMs
2  Integer i, j = 1
3  allocatedHost <- NULL
4  for i = 1 to vmList.size()
5      if (i == 1) then
6          vm_master <- vm_i                        // first VM as master VM
7          vm_mirror <- vm_(i+1)                    // second VM as mirror VM
8      else
9          vm_rep <- vm_i                           // next VM as replica VM, generated dynamically
                                                    // when the number of failures exceeds one
10     if host_j has enough resources for the VM then
11         host_j <- allocate vm_i and vm_(i+1)
12     if (host_j == NULL)
13         do nothing
14         move to next host from hostList
15 return allocation
The function of UpdateVmReplica is to update the status of the mirror VM on successful completion of jobs at the master VM. InitilizeVmReplica is devised to stop the current master VM when a failure is encountered, promote the mirror to master and the replica to new mirror, select a new VM as the new replica in accordance with the residency policy, and return the new master and mirror.
Algorithm 3: executejob (job, vm)
1  Input: job, vm    Output: job and VM status
2  bindjobtovm(job_i, vm_master)                    // as soon as the job is bound to the VM it starts execution
3  if job execution fails
4      vm_master.status <- fail                     // set VM status as fail
5      job_i.status <- fail                         // set job status as fail
6  else
7      vm_master.status <- success                  // set VM status as success
8      job_i.status <- success                      // set job status as success
9      vm_buf <- vm_master image                    // save VM instance to buffer
10 return job and VM status
Algorithm 4: UpdateVmReplica (vm_mirror, buffer)
// update mirror on successful completion of a job at the master VM
1  Input: buffer and vm    Output: VmUpdateAck
2  boolean VmUpdateAck <- false
3  if buffer != NULL
4      vm_mirror <- buffer                          // save buffer data to the mirror VM
5      VmUpdateAck <- true
6  else
7      msg "Buffer empty, no data for further update"
8  return VmUpdateAck

Algorithm 5: InitilizeVmReplica (vmmaster, vmmirror, job)

1  Input: vm_master, vm_mirror, job    Output: vm_master, vm_mirror pair
2  ht_rep <- vm_rep.getHost()
3  rank <- vm_rep.getRank()                         // get rank of the replica VM
4  vm_rep <- vm_buf
5  vm_master <- vm_mirror                           // mirror VM takes over as master VM
6  vm_mirror <- vm_rep                              // replica VM takes over as mirror VM
7  if rank % 2 != 0                                 // rank of the new mirror is odd
8      new_vm <- get_vm(ht)
9      vm_rep <- new_vm                             // create new replica in the same host
10 else                                             // two instances already reside on host ht
11     new_vm <- get_vm(ht++)                       // get new VM from the next host
12     vm_rep <- new_vm                             // invoke this VM as the replica VM
13 return vm_master, vm_mirror
3.5 Low Latency
With the proliferation of on-demand computation in the cloud, speedy processing and minimal latency are becoming critical concerns for cloud providers, both to deliver improved QoS and to support future expansion. Latency is complex to compute, as it depends heavily on different infrastructural layers: internet traffic in the distributed computing environment, abstraction at the virtualization level, priorities over SLA and QoS levels, and the high level of abstraction that hides from clients the location of the datacenters where their jobs are actually executed. All these factors directly influence the latency rate and complicate its exact computation.
The end-to-end latency Tlat, as specified in [6], is the sum of the processing time at the client Tclient and at the server Tserver, the delay at the client end Tdelayc and at the server end Tdelays at the time of failure, and the overall message transmission time Tmsg:

Tlat = Tclient + Tserver + Tdelayc + Tdelays + Tmsg    (3)

At the server a buffer is maintained as an intermediate entity to facilitate communication and periodic revision between the master and mirror VMs. In a faulty scenario the latency increases as the master and mirror replacement takes place; the resulting communication overhead is added to Tdelays and Tdelayc, which are therefore redefined. The transmission delay at the server side for each transaction is represented as

Tdelays = k [Ttobuffer + Tfrombuffer] + (k + f) [Ttrigmirror + Tcomtoclient]    (4)

Tdelayc = (1 + f) [Tcomtoserver]    (5)

where f is the number of failures and Ttrigmirror is the time required to trigger the mirror VM and pass execution control to it. Tcomtoclient is the time to establish communication between the client and the new master VM. Since a two-way communication link exists between client and server, every failure also incurs a delay at the client end in reconnecting to the server.
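A worked sketch of equations (3)-(5) is shown below; the timing values are illustrative, f is the number of failures as above, and k is assumed here to count the buffered checkpoint-update rounds, which the text leaves implicit.

# Sketch: compute the end-to-end latency from its components (values in milliseconds, illustrative).
def t_delay_server(k, f, t_to_buf, t_from_buf, t_trig_mirror, t_com_to_client):
    return k * (t_to_buf + t_from_buf) + (k + f) * (t_trig_mirror + t_com_to_client)   # equation (4)

def t_delay_client(f, t_com_to_server):
    return (1 + f) * t_com_to_server                                                    # equation (5)

def end_to_end_latency(t_client, t_server, t_delay_c, t_delay_s, t_msg):
    return t_client + t_server + t_delay_c + t_delay_s + t_msg                          # equation (3)

t_s = t_delay_server(k=10, f=2, t_to_buf=0.5, t_from_buf=0.5,
                     t_trig_mirror=3.0, t_com_to_client=1.0)
t_c = t_delay_client(f=2, t_com_to_server=1.0)
print(end_to_end_latency(t_client=2.0, t_server=5.0,
                         t_delay_c=t_c, t_delay_s=t_s, t_msg=0.2))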
In this era of fast and reliable communication, where data is transmitted at close to the speed of light, the raw transmission time is almost negligible. When data is transmitted between two VMs residing on the same host or on different hosts, the transmission delays are comparable, making the choice imperceptible. Latency is directly proportional to the hop count in some cases, and the geographical location of data centers also has some influence on it. However, since we have not specified the zonal distribution or location of the various datacenters, we avoid a detailed discussion of it. For our proposed policy we restrict the choice of VMs to the same data center, thereby imparting lower latency, less communication delay and higher QoS at the client end.
4. SIMULATION AND DISCUSSION
Our experiment is simulated using WorkflowSim, an open-source toolkit that extends CloudSim. One hundred heterogeneous jobs are submitted to the VM for execution; as jobs execute successfully the VM replica is updated, whereas in case of failure the VM replica is triggered for further execution and the master VM is set into sleep mode.
By varying the Weibull parameters, the failure rate is varied across simulation runs, and the outputs are recorded for further analysis of throughput, availability, latency and power consumption.
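The experiment logic can be sketched as follows; this is not the WorkflowSim setup itself, and the failure threshold and seed are illustrative assumptions, but it shows how Weibull-driven failures are masked by the replica so that every submitted job completes.

# Sketch: 100 heterogeneous jobs run on the master VM; a Weibull draw decides whether each job
# fails, and on failure the mirror takes over so the job is re-executed and still completes.
import random

def run_experiment(n_jobs=100, shape=1.0, scale=1.0, fail_threshold=0.5, seed=1):
    rng = random.Random(seed)
    completed, failures = 0, 0
    for _ in range(n_jobs):
        if rng.weibullvariate(scale, shape) < fail_threshold:   # failure injected at VM level
            failures += 1          # mirror VM is triggered and the job re-runs there
        completed += 1             # with a live replica the job always finishes
    return completed / n_jobs, failures

throughput, failures = run_experiment(shape=0.5)
print(f"throughput={throughput:.2%}, failures tolerated={failures}")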
4.1 Throughput
Job completion against the failure distribution is shown in Figures 3 (a), (b) and (c), where the number of VMs is generated dynamically according to the failure rate and the number of jobs assigned to the master VM
is varied. The Weibull distribution is the core concept behind failure generation; the shape and scale parameters are each varied over 0.5, 1.0 and 2.0, and we see that for all running instances our proposed DRHA FT algorithm, which deploys the 2k-backup generation policy, achieves one hundred percent throughput.

Figure 3 (a), (b) and (c). Throughput measurement for different failure rates, generated by variation in the Weibull parameters.
4.2 Optimized Availability Factor
A fault tolerance mechanism has to be assessed over several attributes in order to show that it is better or optimal, and some of these attributes are interdependent, trading off against one another. Such is the behavior of latency and availability: the farther we move from the current working node when selecting its backup replica for higher availability, the more we trade off latency. Balancing the two, we allow only two instances of a VM to run on the same host; on the third failure we jump to the next host, again running only two instances on the selected host.
Availability values (normalized to 1) for replication techniques in different deployment scenarios are taken from [11]. Availability for the first failure is 0.9826, as the replica is stored in the same cluster; for the next failure this factor rises to 0.9840, since the mirror replica is located in a different cluster. The factor keeps alternating in this way as the location of the mirror replica changes between the same and a different cluster. As already mentioned, we define a cluster to comprise one host rather than a group of hosts. We restrict placement to the same datacenter, as moving between datacenters would take latency to another level.
Since we worked with dual-core elements, the number of processors within a single host is only two; the same approach could be implemented for quad-core or octa-core processors as well.
The DRHA FT policy is compared with a traditional policy, here called the Random policy, in which n VMs (here n = 10) run on two hosts; when all the VMs of the first host, i.e. n/2 VMs, are exhausted, VMs from the next host are instantiated for execution.

Figure 4. Comparative analysis of the DRHA FT residency policy with the traditional Random policy for the availability of the system at times of failure.
Thus we attain an optimal availability of 0.9833 with the residency policy in our DRHA FT algorithm. A linear analysis shows that, with the passage of time and an increasing number of failures, the average availability is monotonically increasing. With this residency policy we raise the availability factor without compromising the latency rate, keeping the two in equilibrium.
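The reported average can be reproduced from the two per-failure values quoted above; the short sketch below simply alternates them under the two-instances-per-host rule.

# Sketch: average availability over n failures, alternating 0.9826 (replica on the same host/cluster)
# and 0.9840 (replica on a different host/cluster), as quoted from [11].
def average_availability(n_failures):
    per_failure = [0.9826 if i % 2 == 0 else 0.9840 for i in range(n_failures)]
    return sum(per_failure) / len(per_failure)

for n in (1, 2, 5, 10):
    print(n, round(average_availability(n), 4))
# The running average settles around 0.9833, the optimal value reported above.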
5. CONCLUSION
Fault tolerance mechanisms are always subject to evolution and development because they depend on numerous factors: QoS, SLA and energy consumption at the client end, and cost efficiency, robustness, scalability, etc., at the service provider's end. In the near future the same mechanism will be adapted to hosts of varying cluster size and scrutinized over more metrics, such as reliability and power consumption. The computation solution can also be orchestrated with checkpoint updates at the task level. To cope with changing requirements and a diversified environment, a fault tolerance mechanism should be scalable and robust, so we could also investigate a modified policy over these important aspects. The proposed work does not envisage the failure of a datacenter, which, however, could be handled using live migration of the currently working VMs to another healthy datacenter.
ACKNOWLEDGEMENTS
I would like to acknowledge my mentor and fellow researchers for their input in this research work.
REFERENCES
[1] B. Schroeder and G. A. Gibson, "A large-scale study of failures in high-performance computing systems," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 4, pp. 337-350, Oct. 2010.
[2] K. Ranganathan and I. Foster, "Identifying dynamic replication strategies for a high-performance data grid," Lecture Notes in Computer Science, pp. 75-86, Jan. 2001.
[3] C. Engelmann, G. R. Vallee, T. Naughton, and S. L. Scott, "Proactive fault tolerance using preemptive migration," 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2009.
[4] K. V. Vishwanath and N. Nagappan, "Characterizing cloud computing hardware reliability," Proceedings of the 1st ACM Symposium on Cloud Computing - SoCC '10, 2010.
[5] R. Jhawar and V. Piuri, "Fault tolerance and resilience in cloud computing environments," Computer and Information Security Handbook, pp. 125-141, 2013.
[6] W. Zhao, L. E. Moser, and P. M. Melliar-Smith, "End-to-end latency of a fault-tolerant CORBA infrastructure," Performance Evaluation, vol. 63, no. 4-5, pp. 341-363, May 2006.
[7] X. Defago, A. Schiper, and N. Sergent, "Semi-passive replication," Proceedings of the Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281), 1998.
[8] M. Castro and B. Liskov, "Practical Byzantine fault tolerance and proactive recovery," ACM Transactions on Computer Systems, vol. 20, no. 4, pp. 398-461, Nov. 2002.
[9] W. E. Smith, K. S. Trivedi, L. A. Tomek, and J. Ackaret, "Availability analysis of blade server systems," IBM Systems Journal, vol. 47, no. 4, pp. 621-640, 2008.

[10] R. Jhawar and V. Piuri, "Fault tolerance management in IaaS clouds," 2012 IEEE First AESS European Conference on Satellite Telecommunications (ESTEL), Oct. 2012.
[11] D. S. Kim, F. Machida, and K. S. Trivedi, "Availability modeling and analysis of a virtualized system," 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, Nov. 2009.
[12] M. N. Sharif and M. N. Islam, "The Weibull distribution as a general model for forecasting technological change," Technological Forecasting and Social Change, vol. 18, no. 3, pp. 247-256, Nov. 1980.
[13] X. Défago, K. R. Mazouni, and A. Schiper, "Highly available trading system: Experiments with CORBA," Middleware'98, pp. 91-104, 1998.
[14] A. Duminuco, E. Biersack, and T. En-Najjary, "Proactive replication in distributed storage systems using machine availability estimation," Proceedings of the 2007 ACM CoNEXT Conference - CoNEXT '07, 2007.

BIBLIOGRAPHY OF AUTHORS
Akanksha Chandola is an Assistant Professor of Computer Applications at Amrapali Institute of Management and Computer Application, affiliated to Uttarakhand Technical University. He received his Masters in Computer Applications from Hemwati Nandan Garhwal University (now a Central University), Srinagar Garhwal. His current research interests include computer graphics, artificial neural networks, geographical information systems and algorithms. He has about 10 years of experience as faculty, including industrial exposure, and has several publications in national and international conferences.

Dr. (Prof.) Nipur Singh is a Professor at Gurukul Kangri Vishwalvidhyalaya, Haridwar, and currently Head of the Department of Computer Science at Kanya Gurukul Campus, Dehradun. She has about three decades of working experience as an academician. Her major research interests include wireless computing, distributed networking, interconnection networks, ad hoc networks, mobile agents and cloud computing. She has several publications in national and international conferences and journals, and books published in her name.

Ajay Rawat has eleven years of experience in academics and industry in India and abroad. He received the M.S. degree in Software from BITS Pilani, India, and is presently pursuing a Ph.D. in the Department of Computer Science & Engineering, Uttarakhand Technology University, Uttarakhand, India. He has worked with the Department of Computer Application of Graphic Era University, Dehradun, India, in the capacity of Assistant Professor, and as a software developer at NIIT Technologies, New Delhi, India. His areas of interest include cloud computing, fault tolerance, algorithms, etc. He has published various papers in national and international journals. He is certified in OpenStack Cloud.
