HA

vSphere 5.0
What's New?
Architecture changes
AAM (Legato Automated Availability Manager) is replaced by FDM (Fault Domain Manager)
No dependency on DNS; FDM works on IP addresses
The primary/secondary node relationship is replaced by FDM's master/slave model
Support for multiple masters and full support for network partitions
Datastore heartbeating - an additional level of heartbeating that avoids false positives; with this, HA truly knows when there is a failure situation
Enhanced isolation validation - avoids false positives when the management network fails
Enhanced admission control policies (host failure)
#Prerequisites
Shared storage (SAN), a minimum of 2 running ESXi hosts, and vCenter
Recommendation: redundant management network and datastore clusters
#Firewall Requirements
Port 8182, TCP and UDP, inbound and outbound
#Configuring VMware High Availability
Enabling HA is a simple click
#Components of High Availability
hostd agent, vCenter, FDM
#Fault Domain Manager (FDM)
The agent responsible for communicating the following to the other hosts in the cluster:
- host resource information
- virtual machine states
- HA properties
It also handles:
- the heartbeat mechanism
- virtual machine placement, restarts, logging and much more
Note: FDM spawns a watchdog process that monitors the agent; in case of failure, the watchdog restarts it
Previously, no logs were sent to syslog; the FDM information is now logged under /var/log/fdm
#hostd
Responsible for powering on VMs
FDM relies on hostd for information about VMs
FDM halts all operations if hostd is not operational
#vCenter
- Responsible for the initial deployment of the HA agent software on the hosts; it pushes the FDM agents in parallel
- Communicates cluster changes: for example, if an ESXi host disconnects, it sends an update to unprotect the VMs and to update the list of HA agents
- Responsible for the protection of VMs and the status of the cluster
Note: HA responds to failures without vCenter; vCenter is not needed for that
The exception is stateless ESXi hosts, where Auto Deploy and vCenter are a must
For stateless hosts, the restart priority needs to be set for vCenter and its supporting services (AD, DNS and the SQL database) to bring vCenter up quicker
#Fundamental Concepts
Master/slave, heartbeating, isolated vs network partitioned, VM protection
Master Agent
- Keeps track of VM state and takes action when needed
- A VM is always the responsibility of exactly one master agent
- Takes responsibility for VMs by taking ownership of the datastore where their .vmx files reside
- When the master fails, an election process starts
- An election takes place when:
  - the master fails
  - the master is network partitioned or isolated
  - the master is disconnected from vCenter
  - HA is reconfigured
- The election takes approximately 15 seconds and uses UDP
- The host with the highest number of datastores is elected master
- If 2 or more hosts have the same number of datastores, the managed object ID is considered
- The IDs are compared lexically, meaning 99 is bigger than 100
- After the election:
  - each slave establishes a secure, encrypted (SSL-based) TCP connection with the master
  - Note: slaves don't communicate with each other, only with the master
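The election rules in these notes (most datastores wins, ties broken by a lexical comparison of the managed object ID) can be illustrated with a minimal sketch. This is not FDM's actual implementation; the `Host` class, the `elect_master` function and the `host-<number>` ID format are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    moid: str        # managed object ID; the "host-<number>" form is assumed
    datastores: int  # number of datastores the host sees


def elect_master(hosts):
    """Pick the host with the most datastores.

    Ties are broken by comparing the managed object ID as a string
    (lexically), so "host-99" beats "host-100" because "9" > "1".
    """
    return max(hosts, key=lambda h: (h.datastores, h.moid))


hosts = [
    Host("esx01", "host-100", 4),
    Host("esx02", "host-99", 4),
    Host("esx03", "host-7", 3),
]
winner = elect_master(hosts)
print(winner.name)
```

Here esx01 and esx02 tie on datastore count, and the string comparison is what makes the ID ending in 99 outrank the one ending in 100.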
#Tasks done by the master
- The master takes ownership of the datastore and locks the protectedlist file on the datastore
- Location: /<root of the datastore>/.vSphere-HA/<UUID of cluster>-<number of managed object ID>-<random 8-character string>-<name of the host>
- The master updates this file with VM information and shares it with the slaves
- When the master is isolated or fails, the lock on the file expires and the new master relocks it
- The master is responsible for restarting VMs when slaves stop reporting to it
#Slaves
Fewer responsibilities:
- monitors its running VMs and updates the master about them
- monitors the health of the master via heartbeats
#Files for both Slave and Master - 2 sets of files kept by every host
#Remote files stored on shared storage
The poweron file stores the list of running (powered-on) VMs
host<number>poweron
- this file is used to store the state of the VMs
- also used by a slave to inform the master that it is isolated
- the top line of the file is 0 or 1: 0 = not isolated, 1 = isolated
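To make the role of the top line concrete, here is a small sketch of reading such a file. Only the 0/1 flag on the first line comes from these notes; the rest of the layout (one VM path per following line), the `parse_poweron` function and the sample paths are assumptions, not the real on-disk format.

```python
def parse_poweron(text):
    """Split a poweron-style file into (isolated, vm_list).

    Assumed layout: first line is "0" (not isolated) or "1" (isolated),
    every following non-empty line names one powered-on VM.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() not in ("0", "1"):
        raise ValueError("expected isolation flag 0 or 1 on the first line")
    isolated = lines[0].strip() == "1"
    vms = [line.strip() for line in lines[1:] if line.strip()]
    return isolated, vms


# Hypothetical file content: host reports itself isolated, two VMs running.
sample = "1\n/vmfs/volumes/ds1/web01/web01.vmx\n/vmfs/volumes/ds1/db01/db01.vmx\n"
isolated, vms = parse_poweron(sample)
print(isolated, len(vms))
```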
#Local files on each host store information such as affinity rules, cluster membership and VM-to-host compatibility
/var/log/fdm
clusterconfig - not human readable; holds cluster membership info
compatlist
fdm.cfg - used to enable logging for FDM
hostlist - list of hosts with IP, MAC and heartbeat datastores
#Heartbeating
Network heartbeat - sent every second
Datastore heartbeat - allows the master to correctly determine the state of a slave host
- considered only when the slave loses its connection with the master
- uses the poweron file to identify whether the host is isolated
- if the master determines that the host has failed both checks (network and datastore), it restarts the host's VMs
- if the master determines that the host is isolated, it takes action based on the host isolation response
By default, HA uses 2 datastores for this heartbeat
Selection process: it chooses VMFS over NFS
Filename: host<number>-hb, written by the host every 5 seconds
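The way the master combines the two heartbeats can be sketched as a small decision function. This is only an illustration of the checks listed in these notes, not FDM code; the function name, its parameters and the "partitioned" outcome for a host that still writes datastore heartbeats without the isolation flag are assumptions.

```python
def master_verdict(network_hb, datastore_hb, isolation_flag):
    """Return the master's view of a slave host.

    network_hb     -- True if network heartbeats are still arriving
    datastore_hb   -- True if the host still updates its heartbeat file
    isolation_flag -- True if the top line of the host's poweron file is 1
    """
    if network_hb:
        return "alive"        # datastore heartbeat is not even consulted
    if not datastore_hb:
        return "failed"       # both checks missed: restart the VMs
    if isolation_flag:
        return "isolated"     # act according to the isolation response
    return "partitioned"      # assumption: host is up but unreachable


print(master_verdict(False, False, False))
print(master_verdict(False, True, True))
```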
##Isolated versus Partitioned
Isolated: the host receives neither heartbeats from the master nor election traffic, and cannot ping the isolation address
Partitioned: the host receives election traffic but no heartbeats from the master
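Seen from the host's own side, the distinction comes down to three observations: heartbeats from the master, election traffic, and the isolation address ping. The sketch below assumes the usual definitions (isolated means no heartbeats, no election traffic and no ping reply; partitioned means election traffic without master heartbeats). The function and its "undetermined" fallback are illustrative assumptions, not part of HA.

```python
def classify_host(master_heartbeat, election_traffic, isolation_ping_ok):
    """Classify a host's own network state from three observations."""
    if master_heartbeat:
        return "connected"
    if election_traffic:
        return "partitioned"   # other hosts are reachable, the master is not
    if not isolation_ping_ok:
        return "isolated"      # nothing reachable, not even the isolation address
    return "undetermined"      # case not covered by these notes


print(classify_host(False, True, True))
print(classify_host(False, False, False))
```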
#Virtual Machine Protection
Covers host failure, host isolation and guest failure
#Restarting Virtual Machines: Restart Priority and Order
This is the order HA uses when a host fails and restarts need to occur:
Agent virtual machines
FT secondary virtual machines
Virtual machines configured with a high restart priority
Virtual machines configured with a medium restart priority
Virtual machines configured with a low restart priority
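The restart order above amounts to sorting the affected VMs by category. A minimal sketch follows; the category labels, the `restart_sequence` function and the VM names are made up for illustration and are not HA identifiers.

```python
# Categories in the order HA restarts them (labels are illustrative).
RESTART_ORDER = ["agent", "ft_secondary", "high", "medium", "low"]


def restart_sequence(vms):
    """Sort (name, category) pairs into restart order.

    sorted() is stable, so VMs in the same category keep their input order.
    """
    return sorted(vms, key=lambda vm: RESTART_ORDER.index(vm[1]))


vms = [
    ("fileserver", "low"),
    ("sql01", "high"),
    ("vshield-agent", "agent"),
    ("web01", "medium"),
    ("ft-db-secondary", "ft_secondary"),
]
ordered = [name for name, _ in restart_sequence(vms)]
print(ordered)
```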
#Restart Retries
The default is 5 attempts, at a time interval of 2 minutes
HA doesn't allow more than 32 concurrent power-on tasks on a given host
#Scenario
Failed host: failure of a slave
- The slave fails, and the master monitors the datastore heartbeat for 15 seconds
- If no datastore heartbeat is configured, the host is then declared dead
- If a datastore heartbeat is configured, the master checks it for 5 seconds, and then the host is declared dead
- The VMs are then restarted
Failure of the master
- The master fails, an election takes place (15 seconds), the new master reads the protectedlist file and restarts the VMs
#Isolated host and response
Power off / Shut down / Leave powered on
Isolation Response and Detection
#Selecting an Additional Isolation Address
Storage array IP
Failure Detection Time and Restarting Virtual Machines
Corner Case Scenario: Split-Brain

Adding Resiliency to HA (Network Redundancy)
Link State Tracking
Admission Control
Admission Control Policy
Admission Control Mechanisms
Unbalanced Configurations and Impact on Slot Calculation
Percentage of Cluster Resources Reserved
Failover Hosts
Decision Making Time
Host Failures Cluster Tolerates
Percentage as Cluster Resources Reserved
Specify Failover Hosts
Recommendations
Selecting the Right Percentage
VM and Application Monitoring
Why Do You Need VM/Application Monitoring?
VM Monitoring Implementation Details
Screenshots
Application Monitoring
Advanced Options
