What's New?
Architecture changes:
AAM (Legato Automated Availability Manager) is replaced by FDM (Fault Domain Manager)
No dependency on DNS; works on IP addresses
The primary/secondary relationship is replaced by the FDM master/slave relationship
Support for multiple masters and full support for network partitions
Datastore heartbeating - an additional level of heartbeating that avoids false positives; with this we truly know a failure situation
Enhanced isolation validation - avoids false positives when the management network fails
Enhanced admission control policies (host failure)
# Prerequisites
Shared SAN storage, a minimum of 2 running ESXi hosts, vCenter
Recommendation: redundant management network and datastore clusters
# Firewall Requirements
Port 8182, TCP and UDP, inbound and outbound
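A quick way to confirm the TCP side of this requirement is a plain socket connection test. The sketch below is only an illustration: the function name `tcp_port_open` and the example address are hypothetical, and it does not check UDP, which cannot be verified with a simple connect.

```python
import socket

FDM_PORT = 8182  # port listed in the firewall requirements above


def tcp_port_open(host: str, port: int = FDM_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name resolution errors
        return False


# Example call (hypothetical host address):
#   tcp_port_open("192.0.2.10")
```

A result of False only shows that the TCP connection failed from the machine running the check; it does not say which firewall or direction is blocking the traffic.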
# Configuring VMware High Availability
Enabled with a simple click
# Components of High Availability
hostd agent, vCenter, FDM
# Fault Domain Manager
The FDM agent is responsible for communicating to the other hosts in the cluster:
- host resource information
- virtual machine state
- HA properties
It also handles:
- the heartbeat mechanism
- virtual machine placement, restarts, logging and much more
Note: FDM spawns a watchdog process that monitors the agent; in case of failure, the watchdog restarts it.
Previously, no log was sent to syslog. Now the information is logged to /var/log/fdm
# hostd
Responsible for powering on VMs
FDM relies on hostd for information about VMs
FDM halts all operations if hostd is not operational
# vCenter
- responsible for the initial deployment of the HA agent software on the hosts
- communicates cluster changes: for example, if an ESXi host disconnects, it sends an update to unprotect the VMs and updates the list of HA agents
- protection of VMs and status of the cluster
It pushes the FDM agents to the hosts in parallel.
Note: HA responds to failures without vCenter; it is not needed for that.
The exception is stateless ESXi hosts: Auto Deploy and vCenter are a must.
For stateless hosts, a restart priority needs to be set for vCenter and the services it depends on (AD, DNS and the SQL database) to bring vCenter up quicker
# Fundamental Concepts
Master/slave, heartbeating, isolated vs network partitioned, VM protection
# Master Agent
- keeps track of VM state and takes action
- a VM is always the responsibility of a single master agent
- takes responsibility for a VM by taking ownership of the datastore where its vmx file resides
- when the master fails, the election process starts
- an election takes place when:
  - the master fails
  - the master is network partitioned or isolated
  - the master is disconnected from vCenter
  - HA is reconfigured
- the election takes approximately 15 seconds and uses UDP
- the host connected to the highest number of datastores is elected master
- if 2 or more hosts have the same number of datastores, the managed object ID is considered
- the IDs are compared lexically, meaning 99 is bigger than 100
- after the election:
  - each slave establishes a TCP connection with the master (secure, SSL encrypted)
  - Note: slaves don't communicate with each other, only with the master
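The election tie-break described in this section can be sketched in a few lines. This is an illustration only, not FDM's actual code: the `Host` tuple and `elect_master` function are hypothetical names, and the managed object ID is assumed to be compared as a plain string, which is what makes 99 rank above 100.

```python
from typing import NamedTuple


class Host(NamedTuple):
    name: str
    datastores: int  # number of datastores the host is connected to
    moid: str        # numeric part of the managed object ID, kept as a string


def elect_master(hosts):
    """Pick the host with the most datastores; break ties by lexical MoID."""
    # Tuples compare element by element and strings compare character by
    # character, so "99" ranks above "100".
    return max(hosts, key=lambda h: (h.datastores, h.moid))


hosts = [
    Host("esx01", datastores=4, moid="100"),
    Host("esx02", datastores=4, moid="99"),
    Host("esx03", datastores=3, moid="250"),
]
print(elect_master(hosts).name)  # esx02
```

Here esx01 and esx02 tie on datastore count, and esx02 wins because the string "99" sorts after "100".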
# Tasks done by the master
- The master takes ownership of the datastore and locks the protectedlist file on the datastore
- /root of the datastore/.vSphere-HA/{UUID of cluster}-{number of managed object ID}-{random 8-character string}-{name of the host}
- The master updates this file with VM information and shares it with the slaves
- When the master is isolated or fails, the lock on the file expires and the new master relocks it
- The master is responsible for restarting VMs when slaves stop reporting to it
# Slaves
Fewer responsibilities:
- monitors its running VMs and updates the master about them
- monitors the health of the master via heartbeats
# Files for both slave and master - 2 files on every host
# Remote files, stored on shared storage
The poweron file stores the list of running (powered-on) VMs
host-<number>-poweron
- this file is used to store the state of the VMs
- also used by a slave to inform the master that it is isolated
- the top line of the file is 0 or 1: 0 = not isolated, 1 = isolated
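As a rough illustration of the isolation flag, the sketch below reads the first line of a poweron file's contents. The notes only state that the top line is 0 or 1, so everything else is an assumption: the function name `is_isolated` is hypothetical, and the remaining lines are simply ignored.

```python
def is_isolated(poweron_text: str) -> bool:
    """Return True if the first line of the poweron file is 1 (isolated)."""
    lines = poweron_text.splitlines()
    first_line = lines[0].strip() if lines else ""
    if first_line not in ("0", "1"):
        raise ValueError(f"unexpected isolation flag: {first_line!r}")
    return first_line == "1"


# Made-up file contents; the VM entries are placeholders, not a real format.
print(is_isolated("1\nvm-a\nvm-b\n"))  # True
print(is_isolated("0\nvm-a\n"))        # False
```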
# Local files on each host, storing information such as affinity rules, cluster membership and VM-to-host compatibility
/var/log/fdm
# Heartbeating
Network heartbeat - sent every second
Datastore heartbeat - allows the master to correctly determine the state of a slave host
- this is considered only when a slave loses its connection with the master
- uses the poweron file to identify whether the host is isolated
- if the master determines the host has failed by both checks (network and disk), it will power on the host's VMs
- if the master determines the host is isolated, it will take action based on the host isolation response
By default, HA uses 2 datastores for this heartbeat
Selection process: it chooses VMFS over NFS
Filename: host-<number>-hb; it is written by the host every 5 seconds
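The master's decision process described in this section can be sketched roughly as follows. This is a simplified model, not the FDM implementation: the names `SlaveState` and `classify_slave` are hypothetical, and the last branch (host still heartbeating to the datastore without the isolation flag set, treated here as partitioned) is an assumption that the notes above do not spell out.

```python
from enum import Enum


class SlaveState(Enum):
    ALIVE = "alive"
    FAILED = "failed"            # master restarts the host's VMs
    ISOLATED = "isolated"        # host isolation response applies
    PARTITIONED = "partitioned"  # assumption: alive but unreachable over the network


def classify_slave(network_hb: bool, datastore_hb: bool, isolation_flag: bool) -> SlaveState:
    """Model how a master might classify a slave from its heartbeats."""
    if network_hb:
        # Datastore heartbeats are only consulted once network heartbeats stop.
        return SlaveState.ALIVE
    if not datastore_hb:
        # Both checks failed: the host is considered dead.
        return SlaveState.FAILED
    if isolation_flag:
        # The host marked itself isolated in its poweron file.
        return SlaveState.ISOLATED
    return SlaveState.PARTITIONED


print(classify_slave(network_hb=False, datastore_hb=False, isolation_flag=False).value)  # failed
print(classify_slave(network_hb=False, datastore_hb=True, isolation_flag=True).value)    # isolated
```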
## Isolated versus Partitioned
Isolated: the host is not receiving heartbeats from the master, is not receiving election traffic, and cannot ping the isolation address
Partitioned: the host is not receiving heartbeats from the master but is receiving election traffic
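The distinction can be sketched as a small decision function from the host's own point of view. It assumes the usual vSphere HA definitions: a host that no longer receives master heartbeats is partitioned if it still sees election traffic, and isolated if it sees none and also cannot ping its isolation address. The function and parameter names are hypothetical, and the final "undetermined" branch covers a case these notes do not describe.

```python
def host_network_state(master_heartbeat: bool,
                       election_traffic: bool,
                       isolation_address_pingable: bool) -> str:
    """Classify a host's network state from what it can still observe."""
    if master_heartbeat:
        return "connected"
    if election_traffic:
        # Other hosts are still reachable, just not the master.
        return "partitioned"
    if not isolation_address_pingable:
        # No heartbeats, no election traffic, no isolation address.
        return "isolated"
    # Not covered by the notes: no HA traffic, but the isolation address answers.
    return "undetermined"


print(host_network_state(False, True, True))    # partitioned
print(host_network_state(False, False, False))  # isolated
```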
# Virtual Machine Protection
Protects against host failure, host isolation and guest failure
# Restarting Virtual Machines: Restart Priority and Order
This is the current order HA uses when a host fails and restarts need to occur:
1. Agent virtual machines
2. FT secondary virtual machines
3. Virtual machines configured with a high restart priority
4. Virtual machines configured with a medium restart priority
5. Virtual machines configured with a low restart priority
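The restart order listed above amounts to sorting VMs by category. A minimal sketch, assuming each VM is represented as a (name, category) pair; the category labels, VM names and the `restart_sequence` function are made up for illustration.

```python
# Categories in the order HA restarts them, per the list above.
RESTART_ORDER = ["agent", "ft_secondary", "high", "medium", "low"]


def restart_sequence(vms):
    """Return the VMs sorted into restart order (stable within a category)."""
    rank = {category: i for i, category in enumerate(RESTART_ORDER)}
    return sorted(vms, key=lambda vm: rank[vm[1]])


vms = [
    ("web01", "low"),
    ("db01", "high"),
    ("agent-vm", "agent"),
    ("app01", "medium"),
    ("ft-secondary-vm", "ft_secondary"),
]
for name, category in restart_sequence(vms):
    print(name, category)
```

Because `sorted` is stable, VMs with the same category keep their original relative order.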
# Restart Retries
The default is 5 restart attempts, at intervals of 2 minutes
HA doesn't allow more than 33 power-on tasks on a given host
# Scenario
Failed host: failure of a slave
- The slave fails, then the master monitors the disk heartbeat for 15 seconds. If no disk heartbeat is configured, the slave is declared dead
- The slave fails, then the master monitors the disk heartbeat for 15 seconds. If disk