1. VMware vSphere VMFS Technical Overview and Best Practices

VMFS5 Provides Distributed Infrastructure Services for Multiple vSphere Hosts
VMFS enables virtual disk files to be shared by as many as 32 vSphere hosts. Furthermore, it manages storage access for multiple vSphere hosts and enables them to read and write to the same storage pool at the same time.
- Facilitates Dynamic Growth
- Provides Intelligent Cluster Volume Management
- Optimizes Storage Utilization
- Enables High Availability with Lower Management Overhead
- Simplifies Disaster Recovery
Best Practices for Deployment and Use of VMFS

Topics Addressed:

How Large a LUN?
The best way to configure a LUN for a given VMFS volume is to size for throughput first and capacity second. That is, you should aggregate the total I/O throughput for all applications or virtual machines that might run on a given shared pool of storage; then make sure you have provisioned enough back-end disk spindles (and disk array cache) and appropriate storage service to meet the requirements.
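As a rough illustration of this sizing approach, the sketch below aggregates hypothetical per-VM demands and converts them into a back-end spindle estimate. All names and figures are assumptions; the per-spindle IOPS figure in particular depends on your array, disk type, and RAID level.

```python
# Hypothetical sizing sketch: aggregate per-VM throughput demands to size a
# shared VMFS LUN for throughput first, capacity second. The VM names and
# numbers below are illustrative assumptions, not measured values.

vms = [
    # (name, peak IOPS, avg I/O size in KB, capacity in GB)
    ("db01", 1200, 8, 400),
    ("app01", 300, 16, 120),
    ("web01", 150, 4, 80),
]

total_iops = sum(iops for _, iops, _, _ in vms)
total_mb_s = sum(iops * io_kb / 1024 for _, iops, io_kb, _ in vms)
total_gb = sum(gb for _, _, _, gb in vms)

# Assume ~180 IOPS per 15K RPM spindle; adjust for your array and RAID level.
IOPS_PER_SPINDLE = 180
spindles_needed = -(-total_iops // IOPS_PER_SPINDLE)  # ceiling division

print(f"Aggregate demand: {total_iops} IOPS, {total_mb_s:.1f} MB/s, {total_gb} GB")
print(f"Back-end spindles required (rough): {spindles_needed}")
```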
Because there is no single correct answer to the question of how large your LUNs should be for a VMFS volume, the more important question to ask is, "How long would it take to restore the virtual machines on this datastore if it were to fail?" The recovery time objective (RTO) is now the major consideration when deciding how large to make a VMFS datastore. This equates to how long it would take an administrator to restore all of the virtual machines residing on a single VMFS volume if there were a failure that caused data loss. In other words, the main concern is how long it would take to recover from a catastrophic storage failure.
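A back-of-the-envelope version of that question can be computed directly. The figures below are assumptions standing in for your datastore size and the restore rate your backup infrastructure actually sustains.

```python
# Illustrative RTO estimate for sizing a VMFS datastore: how long would a full
# restore of every VM on the datastore take? Both inputs are assumptions;
# substitute your own measured values.

datastore_used_gb = 4096          # data that would need restoring (assumption)
restore_rate_mb_per_s = 120       # sustained restore throughput (assumption)

restore_seconds = datastore_used_gb * 1024 / restore_rate_mb_per_s
print(f"Estimated full-restore time: {restore_seconds / 3600:.1f} hours")
# If that exceeds your RTO, use smaller datastores (or faster restore paths).
```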
Another important question to ask is, "How does one determine whether a certain datastore is overprovisioned or underprovisioned?" vSphere Storage DRS, introduced in vSphere 5.0, can also be a useful feature to leverage for load balancing virtual machines across multiple datastores, from both a capacity and a performance perspective.
Isolation or Consolidation

The basic answer depends on the nature of the I/O access patterns of that virtual machine. If you have a very heavy I/O-generating application, in many cases VMware vSphere Storage I/O Control can assist in managing fairness of I/O resources among virtual machines. Another consideration in addressing the noisy neighbor problem is that it might be worth the potentially inefficient use of resources to allocate a single LUN to a single virtual machine. This can be accomplished using either an RDM or a VMFS volume that is dedicated to a single virtual machine. These two types of volumes perform similarly (within 5 percent of each other) across varying read and write sizes and I/O access patterns.

One school of thought suggests limiting the access of a single LUN to a single virtual machine. In the physical world, this is quite common. When using RDMs, such isolation is implicit, because each RDM volume is mapped to a single virtual machine. The downside to this approach is that as you scale the virtual environment, you soon reach the upper limit of 256 LUNs per host.
The consolidation school wants to gain additional management productivity and resource utilization by pooling the storage resource and sharing it, with many virtual machines running on several vSphere hosts. Dividing this shared resource among many virtual machines enables better flexibility, as well as easier provisioning and ongoing management of the storage resources for the virtual environment. Compared to strict isolation, consolidation normally offers better utilization of storage resources. The cost is additional resource contention, which under some circumstances can lead to a reduction in virtual machine I/O performance. However, vSphere offers Storage I/O Control and vSphere Storage DRS to mitigate these risks. In general, use vSphere Storage DRS to detect and mitigate storage latency and capacity bottlenecks by load balancing virtual machines across multiple VMFS volumes. Additionally, vSphere Storage I/O Control can be leveraged to ensure fairness of I/O resource distribution among many virtual machines sharing the same VMFS datastore.

Because workloads can vary significantly, there is no exact formula that determines the limits of performance and scalability regarding the number of virtual machines per LUN. These limits also depend on the number of vSphere hosts sharing concurrent access to a given VMFS volume. The key is to remember the upper limit of 256 LUNs per vSphere host and to consider that this number can diminish the consolidation ratio if you take the concept of one LUN per virtual machine too far.
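The following sketch makes the arithmetic of that limit concrete, contrasting a one-LUN-per-VM design against consolidated VMFS datastores. The reserved-LUN and VM-per-datastore figures are illustrative assumptions; the 256-LUN ceiling is the documented per-host limit.

```python
# Why one-LUN-per-VM does not scale: each vSphere host can discover at most
# 256 LUNs, and every LUN backing a shared datastore must be presented to
# every host in the cluster.

MAX_LUNS_PER_HOST = 256

def max_vms_isolated(luns_reserved_for_other_uses: int = 16) -> int:
    """Upper bound on VMs if every VM gets its own LUN (RDM or dedicated VMFS)."""
    return MAX_LUNS_PER_HOST - luns_reserved_for_other_uses

def max_vms_consolidated(vms_per_datastore: int = 15,
                         luns_reserved_for_other_uses: int = 16) -> int:
    """Upper bound when VMs share VMFS datastores, one LUN per datastore."""
    return (MAX_LUNS_PER_HOST - luns_reserved_for_other_uses) * vms_per_datastore

print("Isolated (1 LUN per VM):", max_vms_isolated())               # 240 VMs
print("Consolidated (15 VMs per datastore):", max_vms_consolidated())  # 3600 VMs
```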
An RDM file is a special file in a VMFS volume that manages metadata for its mapped device. Employing RDMs provides the advantages of direct access to a physical device while keeping some advantages of a virtual disk in the VMFS file system. In effect, the RDM merges VMFS manageability with raw device access. With RDMs you can:
- Use vMotion to migrate virtual machines using raw volumes.
- Add raw volumes to virtual machines using the VI Client.
- Use file system features such as distributed file locking, permissions and naming.

For most applications, VMFS is the clear choice. It provides the automated file system capabilities that make it easy to provision and manage storage for virtual machines running on a cluster of vSphere hosts. VMFS has an automated hierarchical file system structure with user-friendly file-naming access. It enables a higher disk utilization rate by facilitating the process of provisioning the virtual disks from a shared pool of clustered storage. As you scale the number of vSphere hosts and the total capacity of shared storage, VMFS greatly simplifies the process. It also enables a larger pool of storage than might be addressed via RDMs.
Because the number of LUNs that a given cluster of vSphere hosts can discover is currently capped at 256, you can reach this number rather quickly if you map a set of LUNs to every virtual machine running on the vSphere host cluster. Using RDMs usually requires more frequent and varied dependence on the storage administration team, because each LUN must be sized for the needs of each specific virtual machine to which it is mapped. With VMFS, however, you can carve out many smaller VMDKs for virtual machines from a single VMFS volume. This enables the partitioning of a larger VMFS volume (a single LUN) into several smaller virtual disks, which enables a centralized management utility (vCenter) to be used as a control point. With RDMs, there is no way to break up the LUN and address it as anything more than a single disk for a given virtual machine.
Next, remove access to the data disk from the physical machine and make sure the disk is properly zoned and accessible from the vSphere host. Then create an RDM for the new virtual machine pointing to the data disk. This enables the contents of the existing data disk to be accessed just as they are, without the need to copy them to a new location.

RDM Scenario 2: Using Microsoft Cluster Service in a Virtual Environment. Another common use of RDMs is for MSCS configurations.
When and How to Use Disk Spanning

It is generally best to begin with a single LUN in a VMFS volume. To increase the size of that resource pool, you can provide additional capacity by either 1) adding a new VMFS extent to the VMFS volume or 2) increasing the size of the VMFS volume on an underlying LUN that has been expanded in the array (via a dynamic expansion within the storage array). Adding a new extent to the existing VMFS volume will result in that VMFS volume spanning across more than one LUN. However, until the initial capacity is filled, that additional allocation of capacity is not yet put to use. Expanding the VMFS volume on an existing, larger LUN will also increase the size of the VMFS volume, but it should not be confused with spanning. From a management perspective, it is preferable to host your VMFS volume on a single large LUN with a single extent.
Using multiple LUNs to back multiple extents of a VMFS volume entails presenting every LUN to each of the vSphere hosts sharing the datastore. Although multiple extents might have been required prior to the release of vSphere 5 and VMFS5 to produce VMFS volumes larger than 2TB, VMFS5 now supports single-extent volumes up to 64TB.

Gaining Additional Throughput and Storage Capacity

Additional capacity with disk spanning does not necessarily increase I/O throughput capacity for that VMFS volume. It does, however, result in increased storage capacity.
Suggestions for Rescanning

In prior versions of vSphere, it was recommended that before adding a new VMFS extent to a VMFS volume, you make sure a rescan of the SAN is executed for all nodes in the cluster that share the common pool of storage. However, in more recent versions of vSphere, an automatic rescan is triggered when the target detects a new LUN, so that each vSphere host updates its shared storage information when a change is made on that shared storage resource. This auto-rescan is the default setting in vSphere and is configured to occur every 300 seconds.
2. VMware Fault Tolerance Recommendations and Considerations on VMware vSphere

Figure: VMware High Availability Features Timeline

VMware Fault Tolerance (FT)

VMware FT is a feature available with VMware vSphere 4 (i.e., ESX 4 and vCenter Server 4) that allows a virtual machine to continue running even when the underlying physical server fails. It is a software solution that runs on commodity hardware and does not require any modifications to the guest operating system or applications running inside the virtual machine.
Overview

When VMware FT is enabled on a virtual machine (called the Primary VM), a copy of the Primary VM (called the Secondary VM) is automatically created on another host, chosen by VMware Distributed Resource Scheduler (DRS). If VMware DRS is not enabled, the target host is chosen from the list of available hosts. VMware FT then runs the Primary and Secondary VMs in lockstep with each other, essentially mirroring the execution state of the Primary VM to the Secondary VM. In the event of a hardware failure that causes the Primary VM to fail, the Secondary VM immediately picks up where the Primary VM left off and continues to run without any loss of network connections, transactions, or data.
VMware FT keeps the Primary and Secondary VMs in lockstep using VMware vLockstep technology. vLockstep technology ensures that the Primary and Secondary VMs execute the same x86 instructions in an identical sequence. Here, the Primary VM captures all nondeterministic events and sends them across a VMware FT logging network to the Secondary VM. As both the Primary and Secondary VMs execute the same instruction sequence, both initiate I/O operations. However, the outputs of the Primary VM are the only ones that take effect: disk writes are committed, network packets are transmitted, and so on. All outputs of the Secondary VM are suppressed by ESX. Thus, only a single virtual machine instance appears to the outside world.
Transparent Failover

Along with keeping the Primary and Secondary VMs in sync, VMware Fault Tolerance must rapidly detect and respond to hardware failures of the physical machines running the Primary or the Secondary VM. When vLockstep technology is initiated, the ESX hypervisor starts sending heartbeats over the FT logging network between the ESX hosts where the Primary and Secondary VMs reside. This allows VMware FT to detect immediately if a host fails and to execute a transparent failover, where the remaining VMware FT virtual machine continues running the protected workload without interruption.

Consider a VMware HA cluster of three ESX hosts, two of which are running a Primary and Secondary VM. If the host running the Primary VM fails, the Secondary VM is immediately activated to replace the Primary VM. A new Secondary VM is created, and fault tolerance is re-established in a short period of time. Unlike the initial creation of the Secondary VM, where DRS chooses the target ESX host, for failovers VMware HA chooses the target ESX host for the new Secondary VM. Users experience no interruption in service and no loss of data during the transparent failover.
Lifecycle of a Fault-Tolerant Virtual Machine

Turning on and enabling VMware FT for a virtual machine affects the virtual machine's lifecycle, but it is entirely transparent to the end-user client and does not disrupt client connections or the client's workload. The following steps outline the lifecycle of a VMware FT virtual machine:
1. The administrator selects a virtual machine in either the powered-on or powered-off state and turns on VMware FT.
2. The virtual machine becomes the Primary VM, and a Secondary VM is automatically created and assigned to an ESX host, sharing the same disk as the ESX host running the Primary VM.
3. If the Primary VM is already powered on when VMware FT is turned on, its active state is immediately migrated, using a special form of VMotion, to the Secondary VM on an automatically chosen ESX host. If the Primary VM is powered off, then the migration of its active state to the Secondary VM occurs right after the Primary VM is powered on.
4. The Secondary VM stays synchronized with the Primary VM through VMware vLockstep technology.
5. If the ESX host running the Primary VM goes down, the Secondary VM will immediately go live and become the Primary VM.
6. VMware HA automatically starts a new Secondary VM on another available host to restore protection.
7. The Secondary VM is powered off when the Primary VM powers off or when VMware FT is disabled. The Secondary VM is removed altogether when VMware FT is turned off.
Requirements: Cluster and Host Requirements; Storage Requirements

Networking Recommendations

At a minimum, use 1 GbE NICs for the VMware FT logging network. Use 10 GbE NICs for increased bandwidth of FT logging traffic. Ensure that the networking latency between ESX hosts is low; sub-millisecond latency is recommended for the FT logging network. Use vmkping to measure the latency.
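To make the latency check concrete, here is a minimal sketch wrapping the vmkping utility named above. The peer address, ping count, and output parsing are assumptions for illustration; run it from the ESXi shell, or adapt the command for your environment.

```python
# Minimal sketch: check FT logging network latency against the sub-millisecond
# recommendation using vmkping. Requires Python 3.7+ for capture_output.
import re
import subprocess

PEER_FT_VMKERNEL_IP = "10.0.0.42"  # hypothetical FT logging vmkernel address

out = subprocess.run(["vmkping", "-c", "10", PEER_FT_VMKERNEL_IP],
                     capture_output=True, text=True).stdout

# Parse the round-trip summary line, e.g. "round-trip min/avg/max = a/b/c ms".
match = re.search(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)", out)
if match:
    avg_ms = float(match.group(2))
    verdict = "OK for FT" if avg_ms < 1.0 else "above the sub-millisecond target"
    print(f"Average RTT: {avg_ms} ms ({verdict})")
```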
VMware vSwitch settings on the hosts should also be uniform, such as using the same VLAN for VMware FT logging, to make these hosts available for placement of Secondary VMs. Consider using a VMware vNetwork Distributed Switch to avoid inconsistencies in the vSwitch settings.

Baseline Recommendation: Preferably, each host has separate 1 GbE NICs for FT logging traffic and VMotion. The reason for recommending separate NICs is that the creation of the Secondary VM is done by migrating the Primary VM with VMotion. This can produce significant traffic on the VMotion NIC and could affect VMware FT logging traffic if the NICs are shared. In addition, it is preferable that the VMware FT logging NIC have redundancy, so that no unnecessary failovers occur if a single NIC is lost. As described in the steps below, the VMware FT logging NIC and VMotion NIC can be configured so that they will automatically share the remaining NIC if one or the other NIC fails.
1. Create a vSwitch that is connected to at least two physical NICs.
2. Create a VMware VMkernel connection (displayed as VMkernel Port in vSphere Client) for VMotion and another one for FT traffic.
3. Make sure that different IP addresses are set for the two VMkernel connections.
4. Assign the NIC teaming properties to ensure that vMotion and FT use different NICs as the active NIC:
   a. For VMotion: Set NIC A as active and NIC B as passive.
   b. For FT: Set NIC B as active and NIC A as passive.
Not supported: Source port ID or source MAC address based load balancing policies do not distribute FT logging traffic. However, if there are multiple VMware FT host pairs, some load balancing is possible with an IP-hash load balancing scheme, though IP-hash may require physical switch changes such as EtherChannel setup. VMware FT will not automatically change any vSwitch settings.
VMware FT Usage Scenarios

VMware FT can be used to protect mission-critical workloads, while VMware HA protects the other workloads by restarting the virtual machine in the event of a virtual machine or ESX host failure. Running VMware FT and VMware HA virtual machines on the same ESX host is fully supported. VMware HA also helps protect VMware FT virtual machines in the unlikely case where the ESX hosts running the Primary and Secondary VMs both fail. In that case, VMware HA will trigger the restart of the Primary VM as well as re-spawn a new Secondary VM onto another host.

Note that if the guest operating system in the Primary VM fails, such as from a blue screen in Windows, the Secondary VM will experience the same failure. The VMware HA feature called VM Monitoring will detect this Primary VM failure through VMware Tools heartbeats, and VMware HA will automatically restart the failed Primary VM and re-spawn a new Secondary VM.
VMware FT On-Demand

The process of turning on VMware FT for a virtual machine takes on the order of minutes; turning off VMware FT occurs in seconds. This allows VMware FT to be turned on and off for virtual machines on demand, when needed. Turning VMware FT on and off can also be automated by scheduling the task for certain times using the vSphere CLI. During critical times in your datacenter, such as the last three days of the quarter when any outage can be disastrous, VMware FT on-demand can be scheduled to protect virtual machines for the critical 72 or 96 hours when protection is vital. When the critical period ends, VMware FT is turned off again, and the resources used for the Secondary VM are no longer allocated.
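As a sketch of what such scheduling might look like, the following uses Python's standard-library scheduler around a placeholder toggle function. toggle_ft() is hypothetical and stands in for whatever vSphere CLI or automation call your environment uses; the VM name and window are illustrative.

```python
# Sketch: schedule FT protection around a critical window, per the guidance
# above. toggle_ft() is a hypothetical placeholder, not a real API.
import sched
import time

def toggle_ft(vm_name: str, enabled: bool) -> None:
    # Placeholder: invoke your vSphere CLI/automation tooling here.
    print(f"{'Turning on' if enabled else 'Turning off'} FT for {vm_name}")

scheduler = sched.scheduler(time.time, time.sleep)
window_start = time.time() + 5           # illustrative: 5 seconds from now
window_end = window_start + 96 * 3600    # a 96-hour protection window

scheduler.enterabs(window_start, 1, toggle_ft, ("quarter-end-db", True))
scheduler.enterabs(window_end, 1, toggle_ft, ("quarter-end-db", False))
scheduler.run()  # blocks until the window ends; real use would run as a service
```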
Patching Hosts Running VMware FT Virtual Machines

When ESX hosts are running VMware FT virtual machines, the ESX hosts running the Primary and Secondary VMs must be running the same ESX version and patch level. This requirement must be carefully considered when updating the ESX hosts. The following two approaches are recommended for patching ESX hosts with FT virtual machines.

The first approach is suggested for environments where disabling VMware FT for virtual machines can be tolerated for the amount of time required to update all ESX hosts in the cluster. For each virtual machine protected by VMware FT in the cluster, right-click the virtual machine, highlight Fault Tolerance, and select Disable Fault Tolerance. (Note: turning off VMware FT would work, but turning it back on later would take longer.) After updating all hosts in the cluster to the same version and patch level, right-click each virtual machine you wish to protect with VMware FT, highlight Fault Tolerance, and select Enable Fault Tolerance.
Please note that the performance data of the Secondary VM will be lost when you turn off VMware FT for the virtual machine. This data is not lost when you disable VMware FT.
Recommendations for Reliability

Removing single points of failure from your environment is the most important practice for increasing reliability. Reduce single points of failure by implementing multiple NICs, multiple HBAs, multiple power supplies, storage RAID, etc. Fully redundant NIC teaming and storage multipathing are recommended to improve reliability. VMware FT does attempt a failover if the Primary VM loses all paths to Fibre Channel storage while the Secondary VM still has a connection to Fibre Channel storage, but customers should not rely on this. Instead, they should implement fully redundant NIC teaming and storage multipathing. Other recommendations to improve reliability include:
Uniformity of Hosts: The ESX hosts in your cluster should be as uniform as possible, as described in the Cluster and Host Requirements section. For better performance, the hosts running the Primary and Secondary VMs should operate at roughly the same processor frequencies in order to ensure the highest level of fault tolerance. Processor speed differences greater than 400 MHz may become problematic for CPU-bound workloads. CPU frequency scaling may cause the Secondary VM to run slower than the Primary VM, which in turn will cause the Primary VM to slow down. It is therefore recommended that BIOS-based power management features be used consistently across hosts and that certain settings be avoided on hosts with VMware FT virtual machines.
VMware Distributed Power Management (DPM) will not recommend a host for power-off unless it can successfully recommend VMotion migrations of all virtual machines off that host. Since VMware FT virtual machines are VMware DRS disabled and cannot be migrated by VMotion recommendations, VMware DPM will not recommend powering off any host with running VMware FT virtual machines. However, VMware DPM can still be enabled on a VMware HA cluster running VMware FT virtual machines; it will simply provide power-on or power-off recommendations for hosts not running VMware FT virtual machines.
Placement of Fault-Tolerant Virtual Machines

VMware FT creates Secondary VMs and places them onto another ESX host. If VMware DRS is enabled, DRS decides the target host for the Secondary VM when VMware FT is turned on. If DRS is not enabled, the target host is chosen from the list of available hosts. After a failover, VMware HA decides the target host for the new Secondary VM.

When enabling VMware FT for many virtual machines, you may want to avoid the situation where many Primary and Secondary VMs are placed on the same host. The number of fault-tolerant virtual machines that you can safely run on each host cannot be stated precisely, because the number is based on the ESX host size, the virtual machine size, and workload factors, all of which can vary widely. VMware does expect the number of supportable VMware FT VMs running on a host to be bound by the saturation of the VMware FT logging network. Given this, it is recommended that no more than four Primary and Secondary VMs be placed onto the same ESX host. For running more than four VMware FT virtual machines on a host, refer to the following:
As described in the section on VMware vLockstep technology, the VMware FT logging network traffic depends on the amount of nondeterministic events and external inputs that are recorded at the Primary VM. Since the bulk of this traffic usually consists of incoming network packets and disk reads, one can calculate the amount of networking bandwidth required for VMware FT logging using the following:

VMware FT logging bandwidth ~= (Avg disk reads (MB/s) x 8 + Avg network input (Mbps)) x 1.2 [20% headroom]
The above calculation reserves an additional 20 percent of networking bandwidth on top of the disk and network inputs to the virtual machine. This 20 percent headroom is recommended for transmitting nondeterministic CPU events and for TCP/IP overhead.
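For convenience, here is the same formula implemented directly, so per-VM sizing can be checked quickly. The input figures in the example are illustrative assumptions; take real averages from the virtual machine's Performance tab, as the next paragraph describes.

```python
# The whitepaper's FT logging bandwidth formula, implemented verbatim:
# (Avg disk reads (MB/s) x 8 + Avg network input (Mbps)) x 1.2 headroom.

def ft_logging_bandwidth_mbps(avg_disk_reads_mb_s: float,
                              avg_network_input_mbps: float) -> float:
    return (avg_disk_reads_mb_s * 8 + avg_network_input_mbps) * 1.2

# Example: a VM reading 10 MB/s from disk and receiving 50 Mbps of traffic.
needed = ft_logging_bandwidth_mbps(avg_disk_reads_mb_s=10,
                                   avg_network_input_mbps=50)
print(f"FT logging bandwidth required: {needed:.0f} Mbps")  # 156 Mbps
```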
You can measure the characteristics of your workload through the vSphere Client: click the Performance tab of the virtual machine to see disk and network I/O.

When running multiple VMware FT virtual machines on the same ESX host, mix Primary and Secondary VMs together. The bulk of the VMware FT logging traffic flows from the Primary VM to the Secondary VM; much less traffic flows from the Secondary VM to the Primary VM. Therefore, the bandwidth of the VMware FT logging NICs will be better utilized if each host has a mix of Primary and Secondary VMs, rather than all Primary VMs or all Secondary VMs. Also, the Secondary VM does not perform any I/O to the virtual machine network and disk, so the utilization of the virtual machine network and disk will also be more balanced if a host has a mix of Primary and Secondary VMs.
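As a rough illustration of this placement guidance, the sketch below spreads hypothetical FT pairs across hosts so that each host holds a mix of Primary and Secondary VMs and stays within the four-FT-VM recommendation. Host and VM names are assumptions.

```python
# Spread FT pairs so Primaries and Secondaries are mixed per host and no host
# exceeds the recommended limit of four FT VMs.
from itertools import cycle

hosts = ["esx1", "esx2", "esx3"]
ft_pairs = ["sap-ascs", "bes", "db-lock"]

placement = {h: [] for h in hosts}
host_cycle = cycle(hosts)

for vm in ft_pairs:
    primary_host = next(host_cycle)
    # Place the Secondary on the least-loaded host other than its Primary's.
    secondary_host = min((h for h in hosts if h != primary_host),
                         key=lambda h: len(placement[h]))
    placement[primary_host].append(f"{vm} (Primary)")
    placement[secondary_host].append(f"{vm} (Secondary)")

for host, vms in placement.items():
    assert len(vms) <= 4, f"{host} exceeds four FT VMs"
    print(host, "->", vms)
```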
Timekeeping Recommendations

To avoid time-mismatch issues in a virtual machine after a VMware FT failover, perform the following steps:
1. Synchronize the guest operating system time with a time source; the method will depend on whether the guest is Windows or Linux.
2. Synchronize the time of each ESX server host with a network time protocol (NTP) server.

Windows Guest Operating System Time Synchronization

For Windows Server 2003 guest operating systems, synchronize time with the appropriate domain controllers within their Microsoft Active Directory (AD) domain. In turn, each domain controller should sync its clock with the primary domain controller emulator (PDC Emulator) of the domain. All PDC Emulators should be time-synchronized with the PDC Emulator of the root forest domain. Finally, the PDC Emulator of the root forest domain should be time-synchronized with a stratum 1 time source such as an NTP time server or a hardware atomic clock. If AD is not being used in your environment, synchronize time directly with the NTP time server or another reliable external time source. Please refer to your Windows documentation for details.
Linux Guest Operating System Time Synchronization

For Linux guest operating systems, synchronize time with an NTP server by performing the following steps:
1. Open the VMware Tools Properties dialog box from within the guest. Under Miscellaneous Options, make sure the "Time synchronization between the virtual machine and the ESX Server" option is not checked.
2. Synchronize time with an NTP time server. Please refer to Installing and Configuring Linux Guest Operating Systems for configuration details: http://www.vmware.com/resources/techresources/1076

If your guest operating system is very time-sensitive, synchronize the guest operating system directly with the NTP server. The method to do this varies depending on the guest operating system; please consult your guest operating system documentation for details.
VMware FT Application Recommendations

Here are a few example recommendations for protecting applications with FT.

Example 1: High availability for a multi-tiered SAP application

SAP NetWeaver 7.0 is a service-oriented application and integration platform that serves as the foundation for all other SAP applications. Within this multi-tiered SAP NetWeaver 7.0 application, the ABAP SAP Central Services (ASCS) instance is a single point of failure. (ABAP stands for Advanced Business Application Programming.) ASCS is a group of two servers: the Message Server and the Enqueue Server. The Message Server handles all communications in the SAP system; Message Server failures cause internal communications between SAP dispatchers to fail. Other problems include failures in user logon and in batch job scheduling. The Enqueue Server manages the logical locks for SAP documents and objects during transactions. Enqueue Server failures result in automatic rollbacks of all transactions holding locks, and SAP updates that are requesting locks will be aborted. Since the ASCS is a single point of failure, it requires a high availability solution. For moderate numbers of client connections, a single-vCPU virtual machine running ASCS will suffice.
Running these services in a single-vCPU virtual machine on another host allows them to be protected with VMware FT.

ESX #1: Virtual machine with two vCPUs running the database and SAP Central Instance (minus the Message and Enqueue Servers). Note: This host is also running an SAP-specific load driver benchmark called the Sales and Distribution (SD) Benchmark. This benchmark was used to validate continuous transaction execution with VMware FT during host failover.
ESX #2: Virtual machine with one vCPU running ASCS (i.e., the Message and Enqueue Servers). This virtual machine has VMware FT turned on and acts as the Primary VM.
ESX #3: Virtual machine with one vCPU acting as the Secondary VM for the ASCS.

Upon failure of either ESX #2 or #3, VMware FT allows the virtual machine on the other host to take over execution immediately. Thus, the ASCS services will not lose any data and will not experience any interruption in service. This can be tested by manually checking lock integrity via SAP transaction SM12, the SAP lock management transaction.
If ESX #1 fails, the database (protected via VMware HA) will temporarily go down but will not force a client disconnection for users logged onto separate dialog instance virtual machines (not shown above). The client will only experience a pause until the database comes back online, either when the host is rebooted or when the database virtual machine is rebooted on another host through VMware HA.
Example 2: High availability for the Blackberry Enterprise Server

The Blackberry Enterprise Server (BES) 4.1.6 for Microsoft Exchange enables push-based access in delivering Exchange email, calendar, contacts, scheduling, instant messaging, and other Web services to Blackberry devices. Running BES in a single-vCPU virtual machine can support up to 200 users that receive an average of 100-200 email messages per day. Unless there is a failover mechanism in place, the loss of BES due to hardware failure will result in the disruption of Blackberry users' ability to sync with Exchange. VMware FT can be turned on for the BES virtual machine, as shown in Figure 7, to provide continuous availability that can survive ESX host failures.
ESX #1: Virtual machine with two vCPUs running the database and Microsoft Exchange server.
ESX #2: Virtual machine with one vCPU running BES 4.1.6. This virtual machine has VMware FT turned on and acts as the Primary VM.
ESX #3: Virtual machine with one vCPU acting as the Secondary VM for BES 4.1.6.

A failure of either ESX #2 or #3 results in no loss of email delivery to the Blackberry device; VMware FT ensures that the BES workload is uninterrupted. Currently there are a number of different methods to protect BES from failure, ranging from simple backup plans to having offline stand-by servers prepared. However, VMware FT is the only software solution to offer uninterrupted protection for BES service while remaining cost-effective and user-friendly.
Summary of Performance Recommendations

For each virtual machine there are two VMware FT-related actions that can be taken: turning FT on/off and enabling/disabling FT. Turning on FT prepares the virtual machine for VMware FT by prompting for the removal of unsupported devices, disabling unsupported features, and setting the virtual machine's memory reservation to be equal to its memory size (thus avoiding ballooning or swapping). Enabling FT performs the actual creation of the Secondary VM by live-migrating the Primary VM. (Note: Turning on VMware FT for a powered-on virtual machine will also automatically enable FT for that virtual machine.) Each of these operations has performance implications.
Do not turn on VMware FT for a virtual machine unless you will be using (i.e., enabling) VMware FT for that machine. Turning on VMware FT automatically disables, for that specific virtual machine, some features that can help performance, such as hardware virtual MMU (if the processor supports it). Enabling VMware FT for a virtual machine uses additional resources (for example, the Secondary VM uses as much CPU and memory as the Primary VM). Therefore, make sure you are prepared to devote the resources required before enabling VMware FT.
The live migration that takes place when VMware FT is enabled can briefly saturate the VMotion network link and can also cause spikes in CPU utilization. If the VMotion network link is also being used for other operations, such as VMware FT logging, the performance of those other operations can be impacted. For this reason, it is best to have separate and dedicated NICs for FT logging traffic and for VMotion, especially when multiple VMware FT virtual machines reside on the same host. Because this potentially resource-intensive live migration takes place each time FT is enabled, it is recommended that VMware FT not be frequently enabled and disabled.
Because VMware FT logging traffic is asymmetric (the majority of the traffic flows from Primary to Secondary VM), congestion on the logging NIC can be avoided by distributing primaries onto multiple hosts. For example, on a cluster with two ESX hosts and two virtual machines with VMware FT enabled, placing one of the Primary VMs on each of the hosts allows the network bandwidth to be utilized bidirectionally.

VMware FT virtual machines that receive large amounts of network traffic or perform lots of disk reads can create significant bandwidth on the VMware FT logging NIC. This is true of machines that routinely do these things, as well as machines doing them only intermittently, such as during a backup operation. To avoid saturating the network link used for logging traffic, limit the number of VMware FT virtual machines on each host, or limit the disk read bandwidth and network receive bandwidth of those virtual machines.
Make sure the VMware FT logging traffic is carried by at least a 1 GbE-rated NIC (which should in turn be connected to at least 1 GbE-rated infrastructure). Avoid placing more than four VMware FT-enabled virtual machines on a single host. In addition to reducing the possibility of saturating the network link used for logging traffic, this also limits the number of live migrations needed to create new Secondary VMs in the event of a host failure.
If the Secondary VM lags too far behind the Primary VM (which can happen when the Primary VM is CPU bound and the Secondary VM is not getting enough CPU cycles), the hypervisor may slow down execution on the Primary VM to allow the Secondary VM to catch up. This can be avoided by making sure the hosts on which the Primary and Secondary VMs run are relatively closely matched, with similar CPU make, model, and frequency. It is recommended to disable certain power management settings that do not allow for adjustments based on workload. As another alternative, enabling CPU reservations for the Primary VM (which will be duplicated for the Secondary VM) will help ensure that the Secondary VM gets CPU cycles when it requires them.
Though timer interrupt rates do not significantly affect VMware FT performance, high timer interrupt rates create additional network traffic on the FT logging NIC. Therefore, if possible, reduce timer interrupt rates as described in the "Guest Operating System CPU Considerations" section of Performance Best Practices for VMware vSphere 4.
Fault Tolerance Host Networking Configuration Example

This example describes the host network configuration for Fault Tolerance in a typical deployment with four 1 GbE NICs. This is one possible deployment that ensures adequate service to each of the traffic types identified in the example and could be considered a best practice configuration.

Fault Tolerance provides full uptime during the course of a physical host failure due to power outage, system panic, or similar reasons. Network or storage path failures, or failures of any other physical server components that do not impact the host running state, may not initiate a Fault Tolerance failover to the Secondary VM. Therefore, customers are strongly encouraged to use appropriate redundancy (for example, NIC teaming) to reduce the chance of losing virtual machine connectivity to infrastructure components like networks or storage arrays.
NIC teaming policies are configured on the vSwitch (vSS) Port Groups (or Distributed Virtual Port Groups for vDS) and govern how the vSwitch will handle and distribute traffic over the physical NICs (vmnics) from virtual machines and vmkernel ports. A unique Port Group is typically used for each traffic type, with each traffic type typically assigned to a different VLAN.

- Distribute each NIC team over two physical switches, ensuring L2 domain continuity for each VLAN between the two physical switches.
- Use deterministic teaming policies to ensure that particular traffic types have an affinity to a particular NIC (active/standby) or set of NICs (for example, originating virtual port-id).
- Where active/standby policies are used, pair traffic types to minimize impact in a failover situation where both traffic types will share a vmnic.
- Where active/standby policies are used, configure all the active adapters for a particular traffic type (for example, FT Logging) to the same physical switch. This minimizes the number of network hops and lessens the possibility of oversubscribing the switch-to-switch links.
- VLAN A: Virtual Machine Network Port Group: active on vmnic2 (to physical switch #1); standby on vmnic0 (to physical switch #2).
- VLAN B: Management Network Port Group: active on vmnic0 (to physical switch #2); standby on vmnic2 (to physical switch #1).
- VLAN C: vMotion Port Group: active on vmnic1 (to physical switch #2); standby on vmnic3 (to physical switch #1).
- VLAN D: FT Logging Port Group: active on vmnic3 (to physical switch #1); standby on vmnic1 (to physical switch #2).
vMotion and FT Logging can share the same VLAN (configure the same VLAN number in both port groups), but they require their own unique IP addresses residing in different IP subnets. However, separate VLANs might be preferred if Quality of Service (QoS) restrictions are in effect on the physical network with VLAN-based QoS. QoS is of particular use where competing traffic comes into play, for example, where multiple physical switch hops are used or when a failover occurs and multiple traffic types compete for network resources.
3. VMware vSphere High Availability Deployment Best Practices

vSphere HA reacts to hardware failures and network disruptions by restarting virtual machines on active hosts within the cluster. It detects operating system (OS) failures by continuously monitoring a virtual machine and restarting it as required, and it provides a mechanism to react to application failures. In contrast to other clustering solutions, it provides the infrastructure to protect all workloads within the cluster. Users can combine HA with VMware vSphere Distributed Resource Scheduler (DRS) to protect against failures and to provide load balancing across the hosts within a cluster.
Additionally, care should be taken to remove any inconsistencies that would prevent a virtual machine from being started on any cluster host. Inconsistencies such as the mounting of datastores on only a subset of the cluster hosts, or the implementation of vSphere DRS-required virtual machine-to-host affinity rules, are scenarios to consider carefully. Avoiding these conditions will increase the portability of the virtual machine and provide a higher level of availability.
The overall size of a cluster is another important factor to consider. Smaller clusters require a larger relative percentage of the available cluster resources to be set aside as reserve capacity to handle failures adequately. For example, to ensure that a cluster of three nodes can tolerate a single host failure, about 33 percent of the cluster resources are reserved for failover; a 10-node cluster requires that only 10 percent be reserved.
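The arithmetic behind this example is simple enough to express directly; the sketch below generalizes it to any cluster size, assuming one reserved host's worth of capacity per tolerated host failure.

```python
# Reserve-capacity arithmetic from the example above: to tolerate F host
# failures in an N-host cluster of uniform hosts, roughly F/N of the total
# cluster resources must be held back for failover.

def failover_reserve_fraction(hosts: int, failures_tolerated: int = 1) -> float:
    return failures_tolerated / hosts

for n in (3, 10, 32):
    pct = failover_reserve_fraction(n) * 100
    print(f"{n}-host cluster, 1 host failure tolerated: reserve ~{pct:.0f}%")
# 3 hosts -> ~33%, 10 hosts -> 10%, 32 hosts -> ~3%
```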
In contrast, as cluster size increases, so does the management complexity of the cluster. However, this increase in management complexity is overshadowed by the benefits a large cluster can provide.
Host Versioning

An ideal configuration is one in which all the hosts contained within the cluster use the latest version of ESXi. When adding a host to a vSphere 5.0 cluster, it is always a best practice to upgrade the host to ESXi 5.0 and to avoid using clusters with mixed host versions. Mixed clusters are supported but not recommended, because there are some differences in vSphere HA performance between host versions, and these differences can introduce operational variances in a cluster. These differences arise from the fact that earlier host versions do not offer the same capabilities as later versions. For example, VMware ESX 3.5 hosts do not support certain properties present within ESX 4.0 and greater. These properties were added to ESX 4.0 to inform vSphere HA of conditions warranting a restart of a virtual machine. As a result, HA will not restart virtual machines that crash while running on ESX 3.5 hosts, but it will restart such a virtual machine if it was running on an ESX 4.0 or later host.
As
a
result,
HA
will
not
restart
virtual
machines
that
crash
while
running
on
ESX
3.5
hosts
but
will
restart
such
a
virtual
machine
if
it
was
running
on
an
ESX
4.0
or
later
host.
The
following
apply
if
using
a
vSphere
HAenabled
cluster
that
includes
hosts
with
differing
versions:
Users
should
be
aware
of
the
general
limitations
of
using
a
mixed
cluster,
as
previously
mentioned.
Users
should
also
know
that
ESXi
3.5
hosts
within
a
5.0
cluster
must
include
a
patch
to
address
an
issue
involving
file
locks.
For
ESX
3.5
hosts,
users
must
apply
the
ESX350-201012401-SG
patch.
For
ESXi
3.5,
they
must
apply
the
ESXe350-
201012401-I-BG
patch.
Prerequisite
patches
must
be
applied
before
applying
these
patches.
HA
will
not
enable
an
ESX/ESXi
3.5
host
to
be
added
to
the
cluster
if
it
does
not
meet
the
patch
requirements.
Users
should
avoid
deploying
mixed
clusters
if
VMware
vSphere
Storage
vMotion
or
VMware
vSphere
Storage
DRS
is
required.
The
vSphere
5.0
Availability
Guide
has
more
information
on
this
topic
Options for making vCenter Server highly available include:
- Use of VMware vCenter Server Heartbeat, a specially designed high availability solution for vCenter Server.
- Use of vSphere HA, useful in environments in which the vCenter Server instance is virtualized, such as when using the VMware vCenter Server Appliance.
It is extremely critical when using ESXi Auto Deploy that both the Auto Deploy service and the vCenter Server instance used are highly available. In the event of a loss of the vCenter Server instance, Auto Deploy hosts might not be able to reboot successfully in certain situations. However, it bears repeating here that if vSphere HA is used to make vCenter Server highly available, the vCenter Server virtual machine must be configured with a restart priority of high. Additionally, this virtual machine should be configured to run on two or more hosts that are not managed by Auto Deploy.
This can be done by using a DRS virtual machine-to-host "must run on" rule or by deploying the virtual machine on a datastore accessible to only these hosts. Because Auto Deploy depends upon the availability of vCenter Server in certain circumstances, this ensures that the vCenter Server virtual machine is able to come online. This does not require that vSphere DRS be enabled if users employ DRS rules, because these rules will remain in effect after DRS has been disabled.
Networking Design Considerations

General Networking Guidelines

If the physical network switches that connect the servers support the PortFast (or an equivalent) setting, this should be enabled. If this feature is not enabled, it can take a while for a host to regain network connectivity after booting, due to the execution of lengthy spanning tree algorithms. While this execution is occurring, virtual machines cannot run on the host, and HA will report the host as isolated or dead. Isolation will be reported if the host and an FDM master can access the host's heartbeat datastores.
Host monitoring should be disabled when performing any network maintenance that might disable all heartbeat paths (including storage heartbeats) between the hosts within the cluster, because this might trigger an isolation response.

Configuration of hosts with management networks on different subnets as part of the same cluster is supported. One or more isolation addresses for each subnet should be configured accordingly; refer to the Host Isolation section for more details.
The management network supports the use of jumbo frames as long as the MTU values and physical network switch configurations are set correctly. Ensure that the network supports jumbo frames end to end.
Setting Up Redundancy for vSphere HA Networking

Networking redundancy between cluster hosts is absolutely critical for vSphere HA reliability. Redundant management networking enables the reliable detection of failures.

NOTE: Because this document is primarily focused on vSphere 5.0, its use of the term "management network" refers to the VMkernel network selected for use as a management network. Refer to the vSphere Availability Guide for information regarding the service console network when using VMware ESX 4.1, ESX 4.0, or ESX 3.5x.
Network Adaptor Teaming and Management Networks

Using a team of two network adaptors connected to separate physical switches can improve the reliability of the management network. The cluster is more resilient to failures because the hosts are connected to each other through two network adaptors and through two separate switches, and thus they have two independent paths for cluster communication.
To configure a network adaptor team for the management network, it is recommended to configure the vNICs in the distributed switch configuration for the ESXi host in an active/standby configuration. This is illustrated in the following example.

Requirements:
- Two physical network adaptors
- VLAN trunking
- Two physical switches

The distributed switch should be configured as follows:
- Load balancing set to route based on the originating virtual port ID (default)
- Failback set to No
- vSwitch0: two physical network adaptors (for example, vmnic0 and vmnic2) and two port groups (for example, vMotion and management)

In this example, the management network runs on vSwitch0 as active on vmnic0 and as standby on vmnic2. The vMotion network runs on vSwitch0 as active on vmnic2 and as standby on vmnic0.
It is recommended to use NIC ports from different physical NICs, and it is preferable that the NICs be of different makes and models. Failback is set to No because, in the case of a physical switch failure and restart, ESXi might falsely determine that the switch is back online when its ports first come online. However, the switch itself might not be forwarding any packets until it is fully online. Therefore, when Failback is set to No and an issue arises, both the management network and the vMotion network will be running on the same network adaptor and will continue running until the user manually intervenes.
Management Network Changes in a vSphere HA Cluster

vSphere HA uses the management network as its primary communication path. As a result, it is critical that proper precautions are taken whenever a maintenance action will affect the management network. As a general rule, whenever maintenance is to be performed on the management network, the host-monitoring functionality of vSphere HA should be disabled. This will prevent HA from determining that the maintenance action is a failure and from consequently triggering the isolation responses. If there are changes involving the management network, it is advisable to reconfigure HA on all hosts in the cluster after the maintenance action is completed. This ensures that any pertinent changes are recognized by HA.
Changes that cause a loss of management network connectivity are grounds for performing a reconfiguration of HA. An example of this is the addition or deletion of networks used for management network traffic when the host is not in maintenance mode.
Storage Design Considerations

Following best practices for storage design reduces the likelihood of hosts losing connectivity to the storage used by the virtual machines and to that used by vSphere HA for heartbeating.
To maintain a constant connection between an ESXi host and its storage, ESXi supports multipathing, a technique that enables users to employ more than one physical path to transfer data between the host and an external storage device. In case of a failure of any element in the SAN, such as an adapter, switch or cable, ESXi can move to another physical path that does not use the failed component. In addition to path failover, multipathing provides load balancing, which is the process of distributing I/O loads across multiple physical paths. Load balancing reduces or removes potential bottlenecks.
Storage Heartbeats

A new feature of vSphere HA in vSphere 5.0 makes it possible to use storage subsystems as a means of communication between the hosts of a cluster. Storage heartbeats are used when the management network is unavailable, to enable a slave HA agent to communicate with a master HA agent. The feature also makes it possible to distinguish accurately between the different failure scenarios of dead, isolated or partitioned hosts. Storage heartbeats enable detection of cluster partition scenarios that are not supported with previous versions of vSphere. This results in a more coordinated failover when host isolation occurs.
By default, vCenter Server will automatically select two datastores to use for storage heartbeats. The selection is intended to favor datastores that are connected to the highest number of hosts, and the algorithm is designed to select datastores that are backed by different LUNs or NFS servers. A preference is given to VMware vSphere VMFS-formatted datastores over NFS-hosted datastores.
vCenter Server selects the heartbeat datastores when HA is enabled, when a datastore is added to or removed from a host, and when the accessibility to a datastore changes. Users can, however, configure vSphere HA to give preference to a subset of the datastores mounted by the hosts in the cluster. Alternately, they can require that HA choose only from a subset of these. VMware recommends that users employ the default setting unless there are datastores in the cluster that are more highly available than others. If there are some more highly available datastores, VMware recommends that users configure vSphere HA to give preference to these. VMware does not recommend restricting vSphere HA to using only a subset of the datastores, because this setting restricts the system's ability to respond when a host loses connectivity to one of its configured heartbeat datastores.
NOTE: vSphere HA datastore heartbeating is very lightweight and will not in any way impact the use of the datastores by virtual machines.

Although users can increase the number of heartbeat datastores chosen for each host to four, increasing the number does not make the cluster significantly more tolerant of failures. (See the vSphere Metro Storage Cluster white paper for details about heartbeat datastore recommendations specific to stretched clusters.)
Environments that provide only network-based storage must work optimally with the network architecture to realize fully the potential of the storage heartbeat feature. If the storage network traffic and the management network traffic flow through the same network components, disruptions in network service might disrupt both. It is recommended that these networks be separated as much as possible, or that datastores with a different failure domain be used for heartbeating.
In cases where converged networking is used, VMware recommends that users leave heartbeating enabled. This is because, even with converged networking, failures can occur that disrupt only the management network traffic. For example, the VLAN tags for the management network might be incorrectly changed without impacting those used for storage traffic.
It is also recommended that all hosts within a cluster have access to the same datastores. This promotes virtual machine portability, because the virtual machines can then run on any of the hosts within the cluster. Such a configuration is also beneficial because it maximizes the chance that an isolated or partitioned host can communicate with a master during a network partition or isolation event. If network partitions or isolations are anticipated within the environment, users should ensure that a minimum of two shared datastores is provisioned to all hosts in the cluster.
Cluster Configuration Considerations

Host Isolation

One key mechanism within vSphere HA is the ability for a host to detect when it has become network-isolated from the rest of the cluster. With this information, vSphere is able to take administrator-specified action with respect to the virtual machines running on the host that has been isolated. Depending on network layout and specific business needs, the administrator might wish to tune the vSphere HA response to an isolated host to favor rapid failover, or to leave the virtual machine running so clients can continue to access it. The following section explains how a vSphere HA node detects when it has been isolated from the rest of the cluster, and the response options available to that node after that determination has been made.
Host Isolation Detection

Host isolation detection happens at the individual host level. Isolation fundamentally means a host is no longer able to communicate over the management network. To determine if it is network-isolated, the host attempts to ping its configured isolation addresses. The isolation address used should always be reachable by the host under normal situations, because after five seconds have elapsed with no response from the isolation addresses, the host then declares itself isolated.
The default isolation address is the gateway specified for the management network. Advanced settings can be used to modify the isolation addresses used for your particular environment. The option das.isolationaddress[X] (where X is 0 to 9) is used to configure multiple isolation addresses. Additionally, das.usedefaultisolationaddress is used to indicate whether the default isolation address (the default gateway) should be used to determine if the host is network-isolated. If the default gateway is not able to receive ICMP ping packets, you must set this option to false.
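As an illustration, the fragment below collects these documented advanced options into the key/value form in which they are set on a cluster. The option names are the documented vSphere HA advanced settings; the addresses themselves are hypothetical.

```python
# Sketch: vSphere HA advanced options for a cluster whose default gateway
# does not answer ICMP pings, as described above. Addresses are illustrative.

ha_advanced_options = {
    # Skip the default gateway as an isolation address ...
    "das.usedefaultisolationaddress": "false",
    # ... and supply pingable addresses on each management subnet instead.
    "das.isolationaddress0": "10.10.0.1",
    "das.isolationaddress1": "10.20.0.1",
}

for key, value in sorted(ha_advanced_options.items()):
    print(f"{key} = {value}")
```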
Host Isolation Response

Tuning the host isolation response is typically based on whether loss of connectivity to a host via the management network would also indicate that clients accessing the virtual machine would be affected. In this case, it is likely that administrators would want the virtual machines shut down so that other hosts with operational networks can start them up. If failures of the management network are not likely to be correlated with failures of the virtual machine network, where the loss of the management network simply results in the inability to manage the virtual machines on the isolated host, it is often preferable to leave the virtual machines running while the management network connectivity is restored.
The Host Isolation Response setting provides a means to set the action preferred for the powered-on virtual machines maintained by a host when that host has declared it is isolated. There are three possible isolation response values that can be configured and applied to a cluster or individually to a specific virtual machine:
- Leave Powered On
- Power Off
- Shut Down
With
this
option,
virtual
machines
hosted
on
an
isolated
host
are
left
powered
on.
In
situations
where
a
host
loses
all
management
network
access,
a
virtual
machine
might
still
have
the
ability
to
access
the
storage
subsystem
and
the
virtual
machine
network.
By
selecting
this
option,
the
user
enables
the
virtual
machine
to
continue
to
function
if
this
were
to
occur.
This
is
the
default
isolation
response
setting
in
vSphere
HA
5.0.
Power Off

When this isolation response option is used, the virtual machines on the isolated host are immediately stopped. This is similar to removing the power from a physical host and can induce inconsistency in the file system of the OS used in the virtual machine. The advantage of this action is that vSphere HA will attempt to restart the virtual machine more quickly than when using the Shut Down option.
Shut Down

Through the use of the VMware Tools package installed within the guest OS of a virtual machine, this option attempts to shut down the OS gracefully before powering off the virtual machine. This is more desirable than using the Power Off option because it provides the OS with time to commit any outstanding I/O activity to disk. HA will wait for a default time period of 300 seconds (five minutes) for this graceful shutdown to occur. If the OS is not gracefully shut down by this time, HA will initiate a power-off of the virtual machine.
If
the
OS
is
not
gracefully
shut
down
by
this
time,
it
will
initiate
a
power
off
of
the
virtual
machine.
Changing
the
das.isolationshutdowntimeout
attribute
will
modify
this
timeout
if
it
is
determined
that
more
time
is
required
to
shut
down
an
OS
gracefully.
The
Shut
Down
option
requires
that
the
VMware
Tools
package
be
installed
in
the
guest
OS.
Otherwise,
it
is
equivalent
to
the
Power
Off
setting.
In environments that use only network-based storage protocols, such as iSCSI and NFS, and those that share physical network components between the management and storage traffic, the recommended isolation response is Power Off. With these environments, it is likely that a network outage causing a host to become isolated will also affect the host's ability to communicate with the datastores. This situation might be problematic if both instances of the virtual machine retain access to the virtual machine network. The Power Off isolation response recommendation reduces the impact of this issue by having the isolated HA agent power off the virtual machines on the isolated host.

Table: Recommended isolation policies for converged network configurations.
Host Monitoring

The host monitoring setting determines whether vSphere HA restarts virtual machines on other hosts in the cluster after a host isolation or a host failure, or after a virtual machine crashes for some other reason. This setting does not impact the VM/application monitoring feature. If host monitoring is disabled, isolated hosts won't apply the configured isolation response, and vSphere HA won't restart virtual machines that fail for any reason. Disabling host monitoring also impacts VMware vSphere Fault Tolerance (FT), because it controls whether HA will restart an FT secondary virtual machine after a failure event.
Cluster Partitions
A cluster partition is a situation in which a subset of hosts within the cluster loses the ability to communicate with the rest of the hosts in the cluster but can still communicate with each other. This can occur for various reasons, but the most common cause is the use of a stretched cluster configuration. A stretched cluster is defined as a cluster that spans multiple sites within a metropolitan area.

When a cluster partition occurs, one subset of hosts is still able to communicate to a master node; the other subset cannot. For this reason, the second subset will go through an election process and elect a new master node. It is therefore possible to have multiple master nodes in a cluster partition scenario, one per partition. This situation will last only as long as the partition exists. After the network issue causing the partition is resolved, the master nodes will be able to communicate and discover the multiple master roles. Anytime multiple master nodes exist and can communicate with each other over the management network, all but one will abdicate.

Robust management network architecture helps to avoid cluster partition situations. Additionally, if a network partition occurs, users should ensure that each host retains access to its heartbeat datastores and that the masters are able to access the heartbeat datastores used by the slave hosts.
vSphere Metro Storage Cluster Considerations
VMware vSphere Metro Storage Clusters (vMSC), or stretched clusters as they are often called, are environments that span multiple sites within a metropolitan area (typically up to 100km). Storage systems in these environments typically enable a seamless failover between sites. Because this is a complex environment, a paper specific to the vMSC has been produced. Download it here: http://www.vmware.com/resources/techresources/10299
Auto Deploy Considerations
Auto Deploy utilizes a PXE boot infrastructure to provision a host automatically. No host-state information is stored on the host itself. The best practices recommendation from VMware staff for environments using Auto Deploy is as follows:
- Deploy vCenter Server Heartbeat. vCenter Server Heartbeat delivers high availability for vCenter Server, protecting the virtual and cloud infrastructure from application-, configuration-, OS- or hardware-related outages. (EOA)
- Avoid using Auto Deploy in stretched cluster environments, because this complicates the environment.
- Deploy vCenter Server in a virtual machine. Run the vCenter Server virtual machine in a vSphere HA-enabled cluster and configure the virtual machine with a vSphere HA restart priority of high. Perform one of the following actions:
  - Include two or more hosts in the cluster that are not managed by Auto Deploy and pin the vCenter Server virtual machine to these hosts by using a rule (vSphere DRS required virtual machine-to-host rule). Users can set up the rule and then disable DRS if they do not wish to use DRS in the cluster.
  - Deploy vCenter Server and Auto Deploy in a separate management environment, that is, on hosts managed by a different vCenter Server.
Virtual Machine and Application Health Monitoring
These features enable the vSphere HA agent on a host to detect heartbeat information on a virtual machine through VMware Tools, or through an agent running within the virtual machine that monitors application health. After the loss of a defined number of VMware Tools heartbeats on the virtual machine, vSphere HA will reset the virtual machine.

Virtual machine and application monitoring are not dependent on the virtual machine protection state attribute as reported by the vSphere Client. This attribute signifies that vSphere HA has detected that the preferred state of the virtual machine is to be powered on. For this reason, HA will attempt to restart the virtual machine, assuming that there is nothing restricting the restart. Conditions that might restrict this action include insufficient available resources and a disabled virtual machine restart priority.

This functionality is not available when the vSphere HA agent on a host is in the uninitialized state, as would occur immediately after the vSphere HA agent has been installed on the host, or when the host is not available. Additionally, the number of missed heartbeats is reset after the vSphere HA agent on the host reboots (this should occur rarely, if at all) or after vSphere HA is reconfigured on the host.

Because virtual machines exist only for the purpose of hosting an application, it is highly recommended that virtual machine health monitoring be enabled. All virtual machines must have the VMware Tools package installed within the guest OS.

NOTE: Guest OS sleep states are not currently supported by virtual machine monitoring and can trigger an unnecessary restart of the virtual machine.
vSphere HA and vSphere FT
vSphere HA is often used in conjunction with vSphere FT to provide protection for extremely critical virtual machines where any loss of service is intolerable. vSphere HA detects the use of FT to ensure proper operation. This section describes some of the unique behavior specific to vSphere FT with vSphere HA. Additional vSphere FT best practices can be found in the vSphere 5.0 Availability Guide.

Host Partitions
vSphere HA will restart the secondary virtual machine of a vSphere FT virtual machine pair when the primary virtual machine is running in the same partition as the master HA agent that is responsible for the virtual machine. If this condition is not met, the secondary virtual machine in 5.0 cannot be restarted until the partition ends.
Host Isolation
Host isolation responses are not performed on virtual machines enabled with vSphere FT. The rationale is that the primary and secondary FT virtual machine pairs are already communicating via the FT logging network: either they continue to function and have network connectivity, or they have lost the network and are not heartbeating over the FT logging network, in which case one of them will take over as the primary FT virtual machine. Because vSphere HA does not offer better protection than that, it bypasses FT virtual machines when initiating the host isolation response. Ensure that the FT logging network is implemented with redundancy to provide greater resiliency to failures for FT.
Admission Control
vCenter Server uses HA admission control to ensure that sufficient resources in the cluster are reserved for virtual machine recovery in the event of host failure. Admission control prevents operations that would encroach on the resources reserved for virtual machines restarted due to failure. This mechanism is highly recommended to guarantee the availability of virtual machines. With vSphere 5.0, HA offers the following configuration options for choosing an admission control strategy:
Host Failures Cluster Tolerates (default): HA ensures that a specified number of hosts can fail and that sufficient resources remain in the cluster to fail over all the virtual machines from those hosts. HA uses a concept called slots to calculate available resources and the resources required for failing over virtual machines from a failed host. Under some configurations, this policy might be too conservative in its reservations. The slot size can be controlled using several advanced configuration options. In addition, an advanced option can be used to specify the default slot size value for CPU; this value is used when no CPU reservation has been specified for a virtual machine, and it was changed in vSphere 5.0 from 256MHz to 32MHz. When no memory reservation is specified for a virtual machine, the largest memory overhead for any virtual machine in the cluster will be used as the default slot size value for memory. See the vSphere Availability Guide for more information on slot-size calculation and tuning.

Percentage of Cluster Resources Reserved as failover spare capacity: vSphere HA ensures that a specified percentage of memory and CPU resources is reserved for failover. This policy is recommended for situations where the user must host virtual machines with significantly different CPU and memory reservations in the same cluster or has hosts of different sizes in terms of CPU and memory capacity (vSphere 5.0 adds the ability to specify different percentages for memory and CPU through the vSphere Client). A key difference between this policy and the Host Failures Cluster Tolerates policy is that with this option the capacity set aside for failures can be fragmented across hosts.
Specify a Failover Host: vSphere HA designates a specific host or hosts as failover host(s). When a host fails, HA attempts to restart its virtual machines on the specified failover host(s). The ability to specify more than one failover host is a new feature in vSphere HA 5.0. When a host is designated as a failover host, HA admission control does not allow virtual machines to be powered on on that host, and DRS will not migrate virtual machines to the failover host. It effectively becomes a hot standby.

With each of the three admission control policies, there is a chance in specific scenarios that, at the time of failing over a virtual machine, there might be insufficient contiguous capacity available on a single host to power on a given virtual machine. Although these are corner-case scenarios, this has been taken into account: HA will request that vSphere DRS, if it is enabled, attempt to defragment the capacity in such situations. Further, if a host had been put into standby and vSphere DPM is enabled, it will attempt to power on a host if defragmentation is not sufficient. The best practices recommendation from VMware staff for admission control is as follows:
- Select the Percentage of Cluster Resources Reserved policy for admission control. This policy offers the most flexibility in terms of host and virtual machine sizing and is sufficient for most situations. When configuring this policy, the user should choose a percentage for CPU and memory that reflects the number of host failures they wish to support. For example, if the user wants vSphere HA to set aside capacity for two host failures and there are 10 hosts of equal capacity in the cluster, then they should specify 20 percent (2/10). If the hosts are not of equal capacity, then the user should specify a percentage that equals the capacity of the two largest hosts as a percentage of the cluster capacity.
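As a quick sanity check of this sizing advice, the following Python sketch (illustrative only, not a VMware tool) computes the percentage to reserve for a planned number of host failures, using the combined share of the largest hosts when the cluster is unbalanced:

```python
def reserve_percentage(host_capacities, host_failures):
    """host_capacities: per-host capacity in consistent units (GHz or GB).
    Reserve the combined share of the largest `host_failures` hosts."""
    largest = sorted(host_capacities, reverse=True)[:host_failures]
    return 100.0 * sum(largest) / sum(host_capacities)

# Ten equal hosts, two failures -> 20 percent, matching the example above.
print(reserve_percentage([10] * 10, 2))          # 20.0
# Unequal hosts: reserve the share of the two largest (here ~53.8 percent).
print(reserve_percentage([16, 12, 8, 8, 8], 2))
```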
- If the Host Failures Cluster Tolerates policy is used, attempt to keep virtual machine resource reservations similar across all configured virtual machines. Host Failures Cluster Tolerates uses a notion of slot sizes to calculate the amount of capacity needed as a reserve for each virtual machine. The slot size is based on the largest reserved memory and CPU needed for any virtual machine. Mixing virtual machines of greatly different CPU and memory requirements will cause the slot size calculation to default to the largest possible virtual machine, limiting consolidation. See the vSphere 5.0 Availability Guide for more information on slot-size calculation and on overriding the slot-size calculation in cases where it is necessary to configure different-sized virtual machines in the same cluster.
- If the Failover Host policy is used, decide how many host failures to support, and then specify this number of hosts as failover hosts. Ensure that all cluster hosts are sized equally.
If unequally sized hosts are used with the Host Failures Cluster Tolerates policy, vSphere HA will reserve excess capacity to handle failures of the largest N hosts, where N is the number of host failures specified. With the Percentage of Cluster Resources Reserved policy, unequally sized hosts will require that the user increase the percentages to reserve enough capacity for the planned number of host failures. Finally, with the Specify a Failover Host policy, users must specify failover hosts that are as large as the largest non-failover hosts in the cluster. This ensures that there is adequate capacity in case of failures.
HA added a capability in vSphere 4.1 to balance virtual machine loading on failover, thereby reducing the issue of resource imbalance in a cluster after a failover. With this capability, vMotion migrations are less likely to be needed after a failover. Also in vSphere 4.1, HA invokes vSphere DRS to create more contiguous capacity on hosts. This increases the chance for larger virtual machines to be restarted if some virtual machines cannot be restarted because of resource fragmentation. It does not guarantee enough contiguous resources to restart all the failed virtual machines; it simply means that vSphere will make a best effort to restart all virtual machines with the host resources remaining after a failure.

The admission control policy is evaluated against the current state of the cluster, not the normal state. The normal state means that all hosts are connected and healthy. Admission control does not take into account the resources of hosts that are disconnected or in maintenance mode. Only healthy and connected hosts, including standby hosts if vSphere DPM is enabled, can provide resources that are reserved for tolerating host failures.
Affinity Rules
A virtual machine-host affinity rule specifies that the members of a selected virtual machine DRS group should or must run on the members of a specific host DRS group. Unlike a virtual machine-virtual machine affinity rule, which specifies affinity (or anti-affinity) between individual virtual machines, a virtual machine-host affinity rule specifies an affinity relationship between a group of virtual machines and a group of hosts. There are required rules (designated by the term "must") and preferred rules (designated by the term "should"). See the vSphere Resource Management Guide for more details on setting up virtual machine-host affinity rules.
When restarting virtual machines after a failure, HA ignores the preferential virtual machine-host rules but follows the required rules. If HA violates any preferential rule, DRS will attempt to correct it after the failover is complete by migrating virtual machines. Additionally, vSphere DRS might be required to migrate other virtual machines to make space on the preferred hosts. If required rules are specified, vSphere HA will restart virtual machines only on an ESXi host in the same host DRS group. If no available hosts are in the host DRS group, or the hosts are resource constrained, the restart will fail. Any required rules defined while DRS is enabled are enforced even if DRS is subsequently disabled, so to remove the effect of such a rule, it must be explicitly disabled. Limit the use of required virtual machine-host affinity rules to situations where they are necessary, because such rules can restrict HA target host selection when restarting a virtual machine after a failure.
Log Files
In the latest version of HA, the changes in the architecture enabled changes in how logging is performed. Previous versions of HA stored the operational logging information across several distinct log files. In vSphere HA 5.0, this information is consolidated into a single operational log file. This log file uses a circular log-rotation mechanism, resulting in multiple files, with each file containing a part of the overall retained log history. To improve the ability of VMware support staff to diagnose problems, VMware recommends configuring logging to retain approximately one week of history. The following table provides recommended log capacities for several sample cluster configurations.
The preceding recommendations are sufficient for most environments. If the user notices that the HA log history does not span one week after implementing the recommended settings in the preceding table, they should consider increasing the capacity beyond what is noted. Increasing the log capacity for HA involves specifying the number of log rotations that are preserved and the size of each log file in the rotation. For log capacities up to 30MB, use a 1MB file size; for log capacities greater than 30MB, use a 5MB file size.
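This file-size rule reduces to a simple calculation. The following Python sketch (illustrative only, not a VMware tool) derives a file size and rotation count for a target log capacity; the resulting count is the kind of value one would then apply through the logging advanced options discussed below:

```python
def log_rotation_plan(target_capacity_mb):
    """Rule of thumb from the text: 1MB files up to 30MB of capacity, 5MB beyond."""
    file_size_mb = 1 if target_capacity_mb <= 30 else 5
    rotations = -(-target_capacity_mb // file_size_mb)  # ceiling division
    return file_size_mb, rotations

print(log_rotation_plan(20))  # (1, 20): twenty 1MB files
print(log_rotation_plan(75))  # (5, 15): fifteen 5MB files
```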
1. The default log settings are sufficient for ESXi hosts that are logging to persistent storage.
2. The default log setting is sufficient for ESXi 5.0 hosts if the following conditions are met: (i) they are not managed by Auto Deploy and (ii) they are configured with the default log location in a scratch directory on a vSphere VMFS partition.

NOTE: The name of the vSphere HA logger is Fault Domain Manager (FDM).

General Logging Recommendations for All ESX Versions
- Ensure that the location where the log files will be stored has sufficient space available.
- For ESXi hosts, ensure that logging is being done to a persistent location.
- When changing the directory path, ensure that it is present on all hosts in the cluster and is mapped to a different directory for each host.
- Configure each HA cluster separately.
- In vSphere 5.0, if a cluster contains 5.0 and earlier host versions, setting the das.config.log.maxFileNum advanced option will cause the 5.0 hosts to maintain two copies of the log files: one maintained by the 5.0 logging mechanism discussed in the ESXi 5.0 documentation (see the following) and one maintained by the pre-5.0 logging mechanism, which is configured using the advanced options previously discussed. In vSphere 5.0U1, this issue has been resolved. In that version, to maintain two sets of log files, the new HA advanced configuration option das.config.log.outputToFiles must be set to true, and das.config.log.maxFileNum must be set to a value greater than two. After changing the advanced options, reconfigure HA on each host in the cluster.
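For example, a 5.0U1 cluster that should keep ten rotations would use advanced option values along these lines (the value 10 is an arbitrary illustration; any value greater than two satisfies the requirement):

```
das.config.log.outputToFiles = true
das.config.log.maxFileNum = 10
```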
The log values that users configure in this manner will be preserved across vCenter Server updates. However, applying an update that includes a new version of the HA agent will require HA to be reconfigured on each host for the configured values to be reapplied.
5. Business Continuity and Minimizing Downtime
vSphere makes it possible for organizations to dramatically reduce planned downtime. Because workloads in a vSphere environment can be dynamically moved to different physical servers without downtime or service interruption, server maintenance can be performed without requiring application and service downtime. With vSphere, organizations can:
- Eliminate downtime for common maintenance operations.
- Eliminate planned maintenance windows.
- Perform maintenance at any time without disrupting users and services.
The vSphere vMotion and Storage vMotion functionality in vSphere makes it possible for organizations to reduce planned downtime because workloads in a VMware environment can be dynamically moved to different physical servers or to different underlying storage without service interruption.
Preventing Unplanned Downtime
Key availability capabilities are built into vSphere:
vSphere HA Provides Rapid Recovery from Outages
Unlike other clustering solutions, vSphere HA provides the infrastructure to protect all workloads:
- You do not need to install special software within the application or virtual machine. All workloads are protected by vSphere HA. After vSphere HA is configured, no actions are required to protect new virtual machines; they are automatically protected.
- You can combine vSphere HA with vSphere Distributed Resource Scheduler (DRS) to protect against failures and to provide load balancing across the hosts within a cluster.
- Minimal setup. After a vSphere HA cluster is set up, all virtual machines in the cluster get failover support without additional configuration.
- Reduced hardware cost and setup. The virtual machine acts as a portable container for the applications, and it can be moved among hosts. Administrators avoid duplicate configurations on multiple machines. When you use vSphere HA, you must have sufficient resources to fail over the number of hosts you want to protect with vSphere HA. However, the vCenter Server system automatically manages resources and configures clusters.
- Increased application availability. Any application running inside a virtual machine has access to increased availability. Because the virtual machine can recover from hardware failure, all applications that start at boot have increased availability without increased computing needs, even if the application is not itself a clustered application. By monitoring and responding to VMware Tools heartbeats and restarting nonresponsive virtual machines, vSphere HA also protects against guest operating system crashes.
- DRS and vMotion integration. If a host fails and virtual machines are restarted on other hosts, DRS can provide migration recommendations or migrate virtual machines for balanced resource allocation. If one or both of the source and destination hosts of a migration fail, vSphere HA can help recover from that failure.
vSphere Fault Tolerance Provides Continuous Availability
vSphere HA provides a base level of protection for your virtual machines by restarting virtual machines in the event of a host failure. vSphere Fault Tolerance provides a higher level of availability, allowing users to protect any virtual machine from a host failure with no loss of data, transactions, or connections.

How vSphere HA Works
When you create a vSphere HA cluster, a single host is automatically elected as the master host. The master host communicates with vCenter Server and monitors the state of all protected virtual machines and of the slave hosts. The master host must distinguish between a failed host and one that is in a network partition or that has become network isolated. The master host uses datastore heartbeating to determine the type of failure.
Master and Slave Hosts
When you add a host to a vSphere HA cluster, an agent is uploaded to the host and configured to communicate with other agents in the cluster. Each host in the cluster functions as a master host or a slave host. When vSphere HA is enabled for a cluster, all active hosts (those not in standby or maintenance mode, and not disconnected) participate in an election to choose the cluster's master host. The host that mounts the greatest number of datastores has an advantage in the election. Only one master host exists per cluster; all other hosts are slave hosts. If the master host fails, is shut down, or is removed from the cluster, a new election is held. The master host in a cluster has a number of responsibilities:
- Monitoring the state of slave hosts. If a slave host fails or becomes unreachable, the master host identifies which virtual machines need to be restarted.
- Monitoring the power state of all protected virtual machines. If one virtual machine fails, the master host ensures that it is restarted. Using a local placement engine, the master host also determines where the restart should be done.
- Managing the lists of cluster hosts and protected virtual machines.
- Acting as the vCenter Server management interface to the cluster and reporting the cluster health state.
The slave hosts primarily contribute to the cluster by running virtual machines locally, monitoring their runtime states, and reporting state updates to the master host. A master host can also run and monitor virtual machines. Both slave hosts and master hosts implement the VM and Application Monitoring features.
One of the functions performed by the master host is virtual machine protection. When a virtual machine is protected, vSphere HA guarantees that it attempts to power it back on after a failure. A master host commits to protecting a virtual machine when it observes that the power state of the virtual machine changes from powered off to powered on in response to a user action. If a failover occurs, the master host must restart the virtual machines that are protected and for which it is responsible. This responsibility is assigned to the master host that has exclusively locked a system-defined file on the datastore that contains the virtual machine's configuration file.

NOTE: If you disconnect a host from a cluster, all of the virtual machines registered to that host are unprotected by vSphere HA.
Host Failure Types and Detection
In a vSphere HA cluster, three types of host failure are detected: a host that stops functioning, a host that becomes network isolated, and a host that becomes partitioned from the master host. The master host monitors the liveness of the slave hosts in the cluster. This communication is done through the exchange of network heartbeats every second. When the master host stops receiving these heartbeats from a slave host, it checks for host liveness before declaring the host to have failed.
The liveness check that the master host performs is to determine whether the slave host is exchanging heartbeats with one of the datastores. The master host also checks whether the host responds to ICMP pings sent to its management IP addresses. If the master host is unable to communicate directly with the agent on a slave host, and the slave host neither responds to ICMP pings nor issues heartbeats, the slave host is considered to have failed, and its virtual machines are restarted on alternate hosts. If such a slave host is exchanging heartbeats with a datastore, the master host assumes that it is in a network partition or is network isolated, and so continues to monitor the host and its virtual machines.
Host network isolation occurs when a host is still running but can no longer observe traffic from vSphere HA agents on the management network. If a host stops observing this traffic, it attempts to ping the cluster isolation addresses. If this also fails, the host declares itself isolated from the network. The master host monitors the virtual machines that are running on an isolated host; if it observes that they power off, and the master host is responsible for those virtual machines, it restarts them.

NOTE: If you ensure that the network infrastructure is sufficiently redundant and that at least one network path is available at all times, host network isolation should be a rare occurrence.
Network Partitions
Datastore Heartbeating
When the master host in a vSphere HA cluster cannot communicate with a slave host over the management network, the master host uses datastore heartbeating to determine whether the slave host has failed, is in a network partition, or is network isolated. If the slave host has stopped datastore heartbeating, it is considered to have failed and its virtual machines are restarted elsewhere.
You can use the advanced attribute das.heartbeatdsperhost to change the number of heartbeat datastores selected by vCenter Server for each host. The default is two and the maximum valid value is five.

vSphere HA creates a directory at the root of each datastore that is used for both datastore heartbeating and for persisting the set of protected virtual machines. The name of the directory is .vSphere-HA. Do not delete or modify the files stored in this directory, because this can have an impact on operations.
vSphere HA Security
vSphere HA uses TCP and UDP port 8182 for agent-to-agent communication. The firewall ports open and close automatically to ensure that they are open only when needed. vSphere HA stores configuration information on local storage, or on a ramdisk if there is no local datastore. These files are protected using file system permissions, and they are accessible only to the root user.

For ESXi 5.x hosts, vSphere HA writes to syslog only by default, so logs are placed where syslog is configured to put them. The log file names for vSphere HA are prepended with fdm (fault domain manager), which is a service of vSphere HA.
All communication between vCenter Server and the vSphere HA agent is done over SSL. vSphere HA requires that each host have a verified SSL certificate. Each host generates a self-signed certificate when it is booted for the first time. This certificate can then be regenerated or replaced with one issued by an authority. If the certificate is replaced, vSphere HA needs to be reconfigured on the host. If a host becomes disconnected from vCenter Server after its certificate is updated and the ESXi or ESX Host agent is restarted, vSphere HA is automatically reconfigured when the host is reconnected to vCenter Server. If the disconnection does not occur, because vCenter Server host SSL certificate verification is disabled at the time, verify the new certificate and reconfigure vSphere HA on the host.
Using vSphere HA and DRS Together
Using vSphere HA with Distributed Resource Scheduler (DRS) combines automatic failover with load balancing. When vSphere HA performs failover and restarts virtual machines on different hosts, its first priority is the immediate availability of all virtual machines. After the virtual machines have been restarted, the hosts on which they were powered on might be heavily loaded, while other hosts are comparatively lightly loaded.

In a cluster using DRS and vSphere HA with admission control turned on, virtual machines might not be evacuated from hosts entering maintenance mode. This behavior occurs because of the resources reserved for restarting virtual machines in the event of a failure. You must manually migrate the virtual machines off of the hosts using vMotion.
In some scenarios, vSphere HA might not be able to fail over virtual machines because of resource constraints. This can occur for several reasons:
- HA admission control is disabled and Distributed Power Management (DPM) is enabled. This can result in DPM consolidating virtual machines onto fewer hosts and placing the empty hosts in standby mode, leaving insufficient powered-on capacity to perform a failover.
- VM-Host affinity (required) rules might limit the hosts on which certain virtual machines can be placed.
- There might be sufficient aggregate resources, but these can be fragmented across multiple hosts so that they cannot be used by virtual machines for failover.
In such cases, vSphere HA can use DRS to try to adjust the cluster (for example, by bringing hosts out of standby mode or migrating virtual machines to defragment the cluster resources) so that HA can perform the failovers. If DPM is in manual mode, you might need to confirm host power-on recommendations. Similarly, if DRS is in manual mode, you might need to confirm migration recommendations. If you are using required VM-Host affinity rules, be aware that these rules cannot be violated. vSphere HA does not perform a failover if doing so would violate such a rule.
vSphere HA Admission Control
vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected. Three types of admission control are available:
- Host. Ensures that a host has sufficient resources to satisfy the reservations of all virtual machines running on it.
- Resource Pool. Ensures that a resource pool has sufficient resources to satisfy the reservations, shares, and limits of all virtual machines associated with it.
- vSphere HA. Ensures that sufficient resources in the cluster are reserved for virtual machine recovery in the event of host failure.
Admission control imposes constraints on resource usage, and any action that would violate these constraints is not permitted. Examples of actions that could be disallowed include the following:
- Powering on a virtual machine.
- Migrating a virtual machine onto a host or into a cluster or resource pool.
- Increasing the CPU or memory reservation of a virtual machine.
Of the three types of admission control, only vSphere HA admission control can be disabled. However, without it there is no assurance that the expected number of virtual machines can be restarted after a failure. VMware recommends that you do not disable admission control, but you might need to do so temporarily, for the following reasons:
- If you need to violate the failover constraints when there are not enough resources to support them (for example, if you are placing hosts in standby mode to test them for use with Distributed Power Management (DPM)).
- If an automated process needs to take actions that might temporarily violate the failover constraints (for example, as part of an upgrade directed by vSphere Update Manager).
- If you need to perform testing or maintenance operations.
NOTE: When vSphere HA admission control is disabled, vSphere HA ensures that there are at least two powered-on hosts in the cluster, even if DPM is enabled and could consolidate all virtual machines onto a single host. This is to ensure that failover is possible.
Host Failures Cluster Tolerates Admission Control Policy
You can configure vSphere HA to tolerate a specified number of host failures. With the Host Failures Cluster Tolerates admission control policy, vSphere HA ensures that a specified number of hosts can fail and sufficient resources remain in the cluster to fail over all the virtual machines from those hosts. With this policy, vSphere HA performs admission control in the following way:
1. Calculates the slot size. A slot is a logical representation of memory and CPU resources. By default, it is sized to satisfy the requirements for any powered-on virtual machine in the cluster.
2. Determines how many slots each host in the cluster can hold.
3. Determines the Current Failover Capacity of the cluster. This is the number of hosts that can fail and still leave enough slots to satisfy all of the powered-on virtual machines.
4. Determines whether the Current Failover Capacity is less than the Configured Failover Capacity (provided by the user). If it is, admission control disallows the operation.
As an example, make the following assumptions about a cluster. The cluster consists of three hosts, each with a different amount of available CPU and memory resources. The first host (H1) has 9GHz of available CPU resources and 9GB of available memory, while Host 2 (H2) has 9GHz and 6GB and Host 3 (H3) has 6GHz and 6GB. There are five powered-on virtual machines in the cluster with differing CPU and memory requirements. VM1 needs 2GHz of CPU resources and 1GB of memory, while VM2 needs 2GHz and 1GB, VM3 needs 1GHz and 2GB, VM4 needs 1GHz and 1GB, and VM5 needs 1GHz and 1GB.
1. The slot size is calculated by comparing both the CPU and memory requirements of the virtual machines and selecting the largest of each. The largest CPU requirement (shared by VM1 and VM2) is 2GHz, while the largest memory requirement (for VM3) is 2GB. Based on this, the slot size is 2GHz CPU and 2GB memory.
2. The maximum number of slots that each host can support is determined. H1 can support four slots. H2 can support three slots (the smaller of 9GHz/2GHz and 6GB/2GB), and H3 can also support three slots.
3. The Current Failover Capacity is computed. The largest host is H1, and if it fails, six slots remain in the cluster, which is sufficient for all five of the powered-on virtual machines. If both H1 and H2 fail, only three slots remain, which is insufficient. Therefore, the Current Failover Capacity is one. The cluster has one available slot (the six slots on H2 and H3 minus the five used slots).
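The same arithmetic can be checked in a few lines of code. The following Python sketch (illustrative only, not VMware's implementation) reproduces the three steps for this example cluster:

```python
def slot_size(vms):
    """A slot is the largest CPU reservation paired with the largest memory reservation."""
    return max(cpu for cpu, _ in vms), max(mem for _, mem in vms)

def current_failover_capacity(hosts, vms):
    cpu_slot, mem_slot = slot_size(vms)
    # Slots per host: limited by whichever resource runs out first.
    slots = sorted(min(cpu // cpu_slot, mem // mem_slot) for cpu, mem in hosts)
    failures = 0
    # Conservatively assume the largest hosts fail first.
    while len(slots) > 1 and sum(slots[:-1]) >= len(vms):
        slots.pop()   # remove the current largest host
        failures += 1
    return failures

hosts = [(9, 9), (9, 6), (6, 6)]                # (GHz, GB) for H1, H2, H3
vms = [(2, 1), (2, 1), (1, 2), (1, 1), (1, 1)]  # the five powered-on VMs
print(slot_size(vms))                         # (2, 2): a 2GHz, 2GB slot
print(current_failover_capacity(hosts, vms))  # 1
```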
Percentage of Cluster Resources Reserved Admission Control Policy
You can configure vSphere HA to perform admission control by reserving a specific percentage of cluster CPU and memory resources for recovery from host failures. With the Percentage of Cluster Resources Reserved admission control policy, vSphere HA ensures that a specified percentage of aggregate CPU and memory resources is reserved for failover. With this policy, vSphere HA enforces admission control as follows:
1. Calculates the total resource requirements for all powered-on virtual machines in the cluster.
2. Calculates the total host resources available for virtual machines.
3. Calculates the Current CPU Failover Capacity and Current Memory Failover Capacity for the cluster.
4. Determines whether either the Current CPU Failover Capacity or the Current Memory Failover Capacity is less than the corresponding Configured Failover Capacity (provided by the user). If so, admission control disallows the operation.
vSphere HA uses the actual reservations of the virtual machines. If a virtual machine does not have reservations, meaning that the reservation is 0, a default of 0MB memory and 32MHz CPU is applied.
NOTE: The Percentage of Cluster Resources Reserved admission control policy also checks that there are at least two vSphere HA-enabled hosts in the cluster (excluding hosts that are entering maintenance mode). If there is only one vSphere HA-enabled host, an operation is not allowed, even if there is a sufficient percentage of resources available. The reason for this extra check is that vSphere HA cannot perform failover if there is only a single host in the cluster.
Computing the Current Failover Capacity
The total resource requirement for the powered-on virtual machines consists of two components, CPU and memory. vSphere HA calculates these values as follows:
- The CPU component is calculated by summing the CPU reservations of the powered-on virtual machines. If you have not specified a CPU reservation for a virtual machine, it is assigned a default value of 32MHz (this value can be changed using the das.vmcpuminmhz advanced attribute).
- The memory component is calculated by summing the memory reservation (plus memory overhead) of each powered-on virtual machine.
The total host resources available for virtual machines are calculated by adding together the hosts' CPU and memory resources. These amounts are those contained in the host's root resource pool, not the total physical resources of the host. Resources being used for virtualization purposes are not included. Only hosts that are connected, not in maintenance mode, and free of vSphere HA errors are considered.
The Current CPU Failover Capacity is computed by subtracting the total CPU resource requirements from the total host CPU resources and dividing the result by the total host CPU resources. The Current Memory Failover Capacity is calculated similarly.
Admission Control Using Percentage of Cluster Resources Reserved Policy
The way that the Current Failover Capacity is calculated and used with this admission control policy is shown with an example. Make the following assumptions about a cluster:
- The cluster consists of three hosts, each with a different amount of available CPU and memory resources. The first host (H1) has 9GHz of available CPU resources and 9GB of available memory, while Host 2 (H2) has 9GHz and 6GB and Host 3 (H3) has 6GHz and 6GB.
- There are five powered-on virtual machines in the cluster with differing CPU and memory requirements. VM1 needs 2GHz of CPU resources and 1GB of memory, while VM2 needs 2GHz and 1GB, VM3 needs 1GHz and 2GB, VM4 needs 1GHz and 1GB, and VM5 needs 1GHz and 1GB.
- The Configured Failover Capacity is set to 25%.
The total resource requirements for the powered-on virtual machines are 7GHz and 6GB. The total host resources available for virtual machines are 24GHz and 21GB. Based on this, the Current CPU Failover Capacity is 70% ((24GHz - 7GHz)/24GHz). Similarly, the Current Memory Failover Capacity is 71% ((21GB - 6GB)/21GB). Because the cluster's Configured Failover Capacity is set to 25%, 45% (70% - 25%) of the cluster's total CPU resources and 46% (71% - 25%) of the cluster's memory resources are still available to power on additional virtual machines.
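This percentage math is easy to verify programmatically. A minimal Python sketch (illustrative only) of the policy's calculation for this example:

```python
def current_failover_capacity(host_totals, vm_requirements):
    """Percentage of cluster resources left after the powered-on VMs' requirements."""
    total = sum(host_totals)
    required = sum(vm_requirements)
    return 100 * (total - required) // total  # reported as a whole percentage

cpu = current_failover_capacity([9, 9, 6], [2, 2, 1, 1, 1])  # GHz values
mem = current_failover_capacity([9, 6, 6], [1, 1, 2, 1, 1])  # GB values
print(cpu, mem)                            # 70 71
configured = 25
print(cpu - configured, mem - configured)  # 45 46: headroom for new VMs
```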
Specify Failover Hosts Admission Control Policy
You can configure vSphere HA to designate specific hosts as the failover hosts. With the Specify Failover Hosts admission control policy, when a host fails, vSphere HA attempts to restart its virtual machines on one of the specified failover hosts. If this is not possible (for example, the failover hosts have failed or have insufficient resources), then vSphere HA attempts to restart those virtual machines on other hosts in the cluster.

To ensure that spare capacity is available on a failover host, you are prevented from powering on virtual machines or using vMotion to migrate virtual machines to a failover host. Also, DRS does not use a failover host for load balancing.
NOTE: If you use the Specify Failover Hosts admission control policy and designate multiple failover hosts, DRS does not load balance failover hosts, and VM-VM affinity rules are not supported.
The Current Failover Hosts appear in the vSphere HA section of the cluster's Summary tab in the vSphere Client. The status icon next to each host can be green, yellow, or red:
- Green. The host is connected, not in maintenance mode, and has no vSphere HA errors. No powered-on virtual machines reside on the host.
- Yellow. The host is connected, not in maintenance mode, and has no vSphere HA errors. However, powered-on virtual machines reside on the host.
- Red. The host is disconnected, in maintenance mode, or has vSphere HA errors.
Choosing an Admission Control Policy
You should choose a vSphere HA admission control policy based on your availability needs and the characteristics of your cluster. When choosing an admission control policy, you should consider a number of factors.

Avoiding Resource Fragmentation
Resource fragmentation occurs when there are enough resources in aggregate for a virtual machine to be failed over, but those resources are located on multiple hosts and are unusable because a virtual machine can run on only one ESXi host at a time. The Host Failures Cluster Tolerates policy avoids resource fragmentation by defining a slot as the maximum virtual machine reservation. The Percentage of Cluster Resources policy does not address the problem of resource fragmentation. With the Specify Failover Hosts policy, resources are not fragmented, because hosts are reserved for failover.
NOTE: ESX/ESXi 3.5 hosts are supported by vSphere HA but must include a patch to address an issue involving file locks. For ESX 3.5 hosts, you must apply the patch ESX350-201012401-SG, while for ESXi 3.5 you must apply the patch ESXe350-201012401-I-BG. Prerequisite patches need to be applied before applying these patches.
Enabling or Disabling Admission Control
You can enable or disable admission control for the vSphere HA cluster.
- Enable: Disallow VM power-on operations that violate availability constraints. This enables admission control, enforces availability constraints, and preserves failover capacity. Any operation on a virtual machine that decreases the unreserved resources in the cluster and violates availability constraints is not permitted.
- Disable: Allow VM power-on operations that violate availability constraints. This disables admission control. Virtual machines can, for example, be powered on even if that causes insufficient failover capacity. When you do this, no warnings are presented, and the cluster does not turn red.
If a cluster has insufficient failover capacity, vSphere HA can still perform failovers; it uses the VM Restart Priority setting to determine which virtual machines to power on first. vSphere HA provides three policies for enforcing admission control, if it is enabled. In a multitier application, for example, restart priorities might be assigned as follows:
- Medium. Application servers that consume data in the database and provide results on web pages.
- Low. Web servers that receive user requests, pass queries to application servers, and return results to users.
You can customize this property for individual virtual machines.
To use the Shut down VM setting, you must install VMware Tools in the guest operating system of the virtual machine. Virtual machines that are in the process of shutting down will take longer to fail over while the shutdown completes. Virtual machines that have not shut down in 300 seconds, or in the time specified in the advanced attribute das.isolationshutdowntimeout, are powered off.

NOTE: After you create a vSphere HA cluster, you can override the default cluster settings for Restart Priority and Isolation Response for specific virtual machines. Such overrides are useful for virtual machines that are used for special tasks. For example, virtual machines that provide infrastructure services like DNS or DHCP might need to be powered on before other virtual machines in the cluster.
If a host has its isolation response disabled (that is, it leaves virtual machines powered on when isolated) and the host loses access to both the management and storage networks, a "split brain" situation can arise. In this case, the isolated host loses the disk locks, and the virtual machines are failed over to another host even though the original instances of the virtual machines remain running on the isolated host. When the host comes out of isolation, there will be two copies of each affected virtual machine, although the copy on the originally isolated host does not have access to the vmdk files, and data corruption is prevented. In the vSphere Client, the virtual machines appear to be flipping back and forth between the two hosts.

To recover from this situation, ESXi generates a question on the virtual machine that has lost the disk locks when the host comes out of isolation and realizes that it cannot reacquire them. vSphere HA automatically answers this question, allowing the virtual machine instance that has lost the disk locks to power off and leaving just the instance that holds the disk locks.
VM and Application Monitoring
VM Monitoring restarts individual virtual machines if their VMware Tools heartbeats are not received within a set time. Similarly, Application Monitoring can restart a virtual machine if the heartbeats for an application it is running are not received. You can enable these features and configure the sensitivity with which vSphere HA monitors nonresponsiveness.

When you enable VM Monitoring, the VM Monitoring service (using VMware Tools) evaluates whether each virtual machine in the cluster is running by checking for regular heartbeats and I/O activity from the VMware Tools process running inside the guest. If no heartbeats or I/O activity are received, this is most likely because the guest operating system has failed or VMware Tools is not being allocated any time to complete tasks. In such a case, the VM Monitoring service determines that the virtual machine has failed, and the virtual machine is rebooted to restore service.
Occasionally, virtual machines or applications that are still functioning properly stop sending heartbeats. To avoid unnecessary resets, the VM Monitoring service also monitors a virtual machine's I/O activity. If no heartbeats are received within the failure interval, the I/O stats interval (a cluster-level attribute) is checked. The I/O stats interval determines whether any disk or network activity has occurred for the virtual machine during the previous two minutes (120 seconds). If not, the virtual machine is reset. This default value (120 seconds) can be changed using the advanced attribute das.iostatsinterval.
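The decision flow can be sketched as follows. This Python fragment is a conceptual model only, not VMware code, and the 30-second failure interval is an arbitrary illustration:

```python
IOSTATS_INTERVAL = 120  # seconds; the default, tunable via das.iostatsinterval

def should_reset_vm(seconds_since_heartbeat, seconds_since_io, failure_interval=30):
    """Reset only when heartbeats AND recent I/O are both absent."""
    if seconds_since_heartbeat < failure_interval:
        return False   # VMware Tools heartbeats are still arriving
    if seconds_since_io < IOSTATS_INTERVAL:
        return False   # the guest is quiet but still doing disk/network I/O
    return True        # no heartbeat and no I/O: treat the guest as failed

print(should_reset_vm(45, 60))   # False: I/O was seen within the last 120s
print(should_reset_vm(45, 300))  # True: no heartbeat and no recent I/O
```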
To enable Application Monitoring, you must first obtain the appropriate SDK (or be using an application that supports VMware Application Monitoring) and use it to set up customized heartbeats for the applications you want to monitor. After you have done this, Application Monitoring works much the same way that VM Monitoring does.
vSphere HA Advanced Attributes
The following advanced attributes can be used to tune vSphere HA behavior:
- das.isolationaddress[...]
- das.usedefaultisolationaddress
- das.isolationshutdowntimeout
- das.slotmeminmb
- das.slotcpuinmhz
- das.vmmemoryminmb
- das.vmcpuminmhz
- das.iostatsinterval
- das.ignoreinsufficienthbdatastore
- das.heartbeatdsperhost
NOTE: If you change the value of any of the following advanced attributes, you must disable and then re-enable vSphere HA before your changes take effect:
- das.isolationaddress[...]
- das.usedefaultisolationaddress
- das.isolationshutdowntimeout
- das.defaultfailoverhost
- das.failureDetectionTime
- das.failureDetectionInterval
Monitoring Cluster Validity
A valid cluster is one in which the admission control policy has not been violated. A cluster enabled for vSphere HA becomes invalid (red) when the number of virtual machines powered on exceeds the failover requirements, that is, when the current failover capacity is smaller than the configured failover capacity. If admission control is disabled, clusters do not become invalid.
Admission Control Best Practices
The following recommendations are best practices for vSphere HA admission control:
- Select the Percentage of Cluster Resources Reserved admission control policy. This policy offers the most flexibility in terms of host and virtual machine sizing. In most cases, a calculation of 1/N, where N is the total number of nodes in the cluster, yields adequate sparing.
- Ensure that you size all cluster hosts equally. An unbalanced cluster results in excess capacity being reserved to handle failure of the largest possible node.
- Try to keep virtual machine sizing requirements similar across all configured virtual machines. The Host Failures Cluster Tolerates admission control policy uses slot sizes to calculate the amount of capacity needed in reserve for each virtual machine. The slot size is based on the largest reserved memory and CPU needed for any virtual machine. When you mix virtual machines of different CPU and memory requirements, the slot size calculation defaults to the largest possible, which limits consolidation.
When making changes to the networks that your clustered ESXi hosts are on, VMware recommends that you suspend the Host Monitoring feature. Changing your network hardware or networking settings can interrupt the heartbeats that vSphere HA uses to detect host failures, and this might result in unwanted attempts to fail over virtual machines.

When you change the networking configuration on the ESXi hosts themselves (for example, adding port groups or removing vSwitches), VMware recommends that, in addition to suspending Host Monitoring, you place the hosts on which the changes are being made into maintenance mode. When a host comes out of maintenance mode, it is reconfigured, which causes the network information to be reinspected for the running host. If the host is not put into maintenance mode, the vSphere HA agent runs using the old network configuration information.
Networks Used for vSphere HA Communications
To identify which network operations might disrupt the functioning of vSphere HA, you should know which management networks are being used for heartbeating and other vSphere HA communications.
- On legacy ESX hosts in the cluster, vSphere HA communications travel over all networks that are designated as service console networks. VMkernel networks are not used by these hosts for vSphere HA communications.
- On ESXi hosts in the cluster, vSphere HA communications, by default, travel over VMkernel networks, except those marked for use with vMotion. If there is only one VMkernel network, vSphere HA shares it with vMotion, if necessary. With ESXi 4.x and ESXi 5.0, you must also explicitly enable the Management Network checkbox for vSphere HA to use this network.
NOTE: VMware recommends that you do not configure hosts with multiple vmkNICs on the same subnet. If this is done, be aware that vSphere HA sends packets using any pNIC that is associated with a given subnet if at least one vNIC for that subnet has been configured for management traffic.
Network Isolation Addresses
A network isolation address is an IP address that is pinged to determine whether a host is isolated from the network. This address is pinged only when a host has stopped receiving heartbeats from all other hosts in the cluster. If a host can ping its network isolation address, the host is not network isolated, and the other hosts in the cluster have failed. However, if the host cannot ping its isolation address, it is likely that the host has become isolated from the network, and no failover action is taken.
By default, the network isolation address is the default gateway for the host. Only one default gateway is specified, regardless of how many management networks have been defined. You should use the das.isolationaddress[...] advanced attribute to add isolation addresses for additional networks.
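Conceptually, the isolation check walks the default gateway plus any addresses added via das.isolationaddress, and declares isolation only if none of them respond. A rough Python model (illustrative only; the real agent uses its own ping implementation, the flags below are Linux ping flags, and the addresses are documentation placeholders):

```python
import subprocess

def is_isolated(isolation_addresses):
    """Return True only if no isolation address answers a single ping."""
    for address in isolation_addresses:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", address],  # one ping, 1s timeout (Linux)
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return False  # something answered, so the host is not isolated
    return True

# Default gateway plus one extra address added via das.isolationaddress[...]
print(is_isolated(["192.0.2.1", "192.0.2.254"]))
```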
Other Networking Considerations
Configuring Switches. If the physical network switches that connect your servers support the PortFast (or an equivalent) setting, enable it. This setting prevents a host from incorrectly determining that a network is isolated during the execution of lengthy spanning tree algorithms.

Port Group Names and Network Labels. Use consistent port group names and network labels on VLANs for public networks. Port group names are used to reconfigure access to the network by virtual machines. If you use inconsistent names between the original server and the failover server, virtual machines are disconnected from their networks after failover. Network labels are used by virtual machines to reestablish network connectivity upon restart.

Configure the management networks so that the vSphere HA agent on a host in the cluster can reach the agents on any of the other hosts using one of the management networks. If you do not set up such a configuration, a network partition condition can occur after a master host is elected.
When a network path carrying iSCSI storage traffic is oversubscribed, a bad situation quickly grows worse, and performance further degrades as dropped packets must be resent. There can be multiple reasons for an iSCSI path being overloaded, ranging from oversubscription (too much traffic) to network switches that have a low port buffer.

Another consideration is network bandwidth. Network bandwidth is dependent on the Ethernet standard used (1Gb or 10Gb). There are also other mechanisms, such as port aggregation and bonding links, that deliver greater network bandwidth. When implementing software iSCSI that uses network interface cards rather than dedicated iSCSI adapters, gigabit Ethernet interfaces are required.
These interfaces tend to consume a significant amount of CPU resource. One way of overcoming this demand for CPU resources is to use a TOE (TCP/IP offload engine). TOEs shift TCP packet processing tasks from the server CPU to specialized TCP processors on the network adaptor or storage device.
iSCSI was long considered a technology that did not work well over most shared wide-area networks, and it has prevalently been approached as a local area network technology. However, this is changing. For synchronous replication writes (in the case of high availability) or remote data writes, iSCSI might not be a good fit: the added latency introduces greater delays to data transfers and might impact application performance. Asynchronous replication, which is far less sensitive to latency, makes iSCSI an ideal solution. For example, VMware vCenter Site Recovery Manager may build upon iSCSI asynchronous storage replication for simple, reliable site disaster protection.
iSCSI Architecture
iSCSI initiators must manage multiple, parallel communication links to multiple targets. Similarly, iSCSI targets must manage multiple, parallel communication links to multiple initiators. Several identifiers exist in iSCSI to make this happen, including the iSCSI Name, ISID (iSCSI session identifier), TSID (target session identifier), CID (iSCSI connection identifier) and iSCSI portals.
iSCSI Names
iSCSI nodes have globally unique names that do not change when Ethernet adapters or IP addresses change. iSCSI supports two name formats, as well as aliases. The first name format is the Extended Unique Identifier (EUI); an example of an EUI name is eui.02004567A425678D. The second name format is the iSCSI Qualified Name (IQN); an example of an IQN name is iqn.1998-01.com.vmware:tm-pod04-esx01-6129571c.
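The two formats are regular enough to check mechanically. A small Python sketch (illustrative only; the patterns are simplified relative to the full RFC 3720 grammar) that classifies the examples above:

```python
import re

EUI_PATTERN = re.compile(r"^eui\.[0-9A-Fa-f]{16}$")
# iqn.<yyyy-mm>.<reversed domain>[:<optional identifier>]
IQN_PATTERN = re.compile(r"^iqn\.\d{4}-\d{2}\.[^:]+(:.+)?$")

def iscsi_name_format(name):
    if EUI_PATTERN.match(name):
        return "EUI"
    if IQN_PATTERN.match(name):
        return "IQN"
    return "unknown"

print(iscsi_name_format("eui.02004567A425678D"))                            # EUI
print(iscsi_name_format("iqn.1998-01.com.vmware:tm-pod04-esx01-6129571c"))  # IQN
```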
iSCSI Initiators and Targets
A storage network consists of two types of equipment: initiators and targets. Initiators, such as hosts, are data consumers. Targets, such as disk arrays or tape libraries, are data providers. In the context of vSphere, iSCSI initiators fall into three distinct categories: they can be software, hardware dependent, or hardware independent.
Software iSCSI Adapter
A software iSCSI adapter is VMware code built into the VMkernel. It enables your host to connect to the iSCSI storage device through standard network adaptors. The software iSCSI adapter handles iSCSI processing while communicating with the network adaptor. With the software iSCSI adapter, you can use iSCSI technology without purchasing specialized hardware.
Dependent Hardware iSCSI Adapter
This hardware iSCSI adapter depends on VMware networking and on the iSCSI configuration and management interfaces provided by VMware. This type of adapter can be a card that presents a standard network adaptor and iSCSI offload functionality for the same port. The iSCSI offload functionality depends on the host's network configuration to obtain the IP and MAC addresses, as well as other parameters used for iSCSI sessions. An example of a dependent adapter is the iSCSI-licensed Broadcom 5709 NIC.
Independent Hardware iSCSI Adapter
This type of adapter implements its own networking and iSCSI configuration and management interfaces. An example of an independent hardware iSCSI adapter is a card that presents either iSCSI offload functionality only, or iSCSI offload functionality and standard NIC functionality. The iSCSI offload functionality has independent configuration management that assigns the IP address, MAC address and other parameters used for the iSCSI sessions. An example of an independent hardware adapter is the QLogic QLA4052 adapter.
iSCSI Portals
iSCSI nodes keep track of connections via portals, enabling separation between names and IP addresses. A portal manages an IP address and a TCP port number. Therefore, from an architectural perspective, sessions can be made up of multiple logical connections, and portals track connections via TCP/IP port/address pairs.
iSCSI Implementation Options
With the hardware-initiator iSCSI implementation, the iSCSI HBA provides the translation from SCSI commands to an encapsulated format that can be sent over the network; a TCP offload engine (TOE) does this translation on the adapter. The software-initiator iSCSI implementation leverages the VMkernel to perform the SCSI-to-IP translation and requires extra CPU cycles to perform this work. As mentioned previously, most enterprise-level networking chip sets offer TCP offload or checksum offloads, which vastly reduce CPU overhead.
Mixing iSCSI Options
Having both software iSCSI and hardware iSCSI enabled on the same host is supported. However, using both software and hardware adapters on the same vSphere host to access the same target is not supported: one cannot have the host access the same target via a mix of hardware-dependent, hardware-independent and software iSCSI adapters for multipathing purposes.
Networking Settings
Network design is key to making sure iSCSI works. In a production environment, gigabit Ethernet is essential for software iSCSI. Hardware iSCSI, in a VMware Infrastructure environment, is implemented with dedicated HBAs. iSCSI should be considered a local-area technology, not a wide-area technology, because of latency issues and security concerns. You should also segregate iSCSI traffic from general traffic; Layer 2 VLANs are a particularly good way to implement this segregation.

Beware of oversubscription. Oversubscription occurs when more users are connected to a system than can be fully supported at the same time. Networks and servers are almost always designed with some amount of oversubscription, on the assumption that users do not all need the service simultaneously. If they do, delays are certain and outages are possible. Oversubscription is permissible on general-purpose LANs, but you should not use an oversubscribed configuration for iSCSI. Best practice is to have a dedicated LAN for iSCSI traffic, not to share the network with other traffic, and not to oversubscribe that dedicated LAN.

Finally, because iSCSI leverages the IP network, VMkernel NICs can be placed into teaming configurations. Alternatively, VMware recommends using port binding rather than NIC teaming. Port binding is explained in detail later in this paper, but suffice it to say that with port binding, iSCSI can leverage VMkernel multipath capabilities such as failover on SCSI errors and the Round Robin path policy for performance. In the interest of completeness, both methods are discussed here, but port binding is the recommended best practice.
VMkernel Network Configuration
A VMkernel network is required for IP storage and thus is required for iSCSI. A best practice is to keep the iSCSI traffic separate from other networks, including the management and virtual machine networks.
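As an illustrative sketch only (vSwitch1, the iSCSI1 port group, vmk1 and the addresses below are placeholder names and values, not prescribed ones), a dedicated VMkernel port for iSCSI can be created from the ESXi 5.x shell as follows:

# Create a port group for iSCSI traffic on an existing vSwitch
~# esxcli network vswitch standard portgroup add --portgroup-name=iSCSI1 --vswitch-name=vSwitch1
# Create a VMkernel interface on that port group
~# esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI1
# Assign a static address on the isolated storage subnet
~# esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=10.0.0.11 --netmask=255.255.255.0 --type=static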
IPv6 Supportability Statements
At the time of this writing, there is no IPv6 support for either hardware iSCSI or software iSCSI adapters in vSphere 5.1.
Throughput Options
There are a number of options available to improve iSCSI performance.
1. 10GbE. This is an obvious option to begin with. If you can provide a larger pipe, the likelihood is that you will achieve greater throughput. Of course, if there is not enough I/O to fill a 1GbE connection, then a larger connection isn't going to help you. But let's assume that there are enough virtual machines and enough datastores for 10GbE to be beneficial.
2. Jumbo frames. This feature can deliver additional throughput by increasing the size of the payload in each frame from a default MTU of 1,500 to an MTU of 9,000. However, great care and consideration must be used if you decide to implement it. All devices sitting in the I/O path (iSCSI target, physical switches, network interface cards and VMkernel ports) must be able to implement jumbo frames for this option to provide the full benefits (see the command sketch after this list). For example, if the MTU is not correctly set on the switches, the datastores might mount but I/O will fail. A common issue with jumbo-frame configurations is that the MTU value on the switch isn't set correctly. In most cases, this must be higher than that of the hosts and storage, which are typically set to 9,000; switches must be set higher, to 9,198 or 9,216 for example, to account for IP overhead. Refer to switch-vendor documentation as well as storage-vendor documentation before attempting to configure jumbo frames.
3. Round Robin path policy. Round Robin uses automatic path selection, rotating through all available paths and thereby enabling the distribution of load across the configured paths (see the command sketch after this list). This path policy can help improve I/O throughput. For active/passive storage arrays, only the paths to the active controller will be used in the Round Robin policy; for active/active storage arrays, all paths will be used. For ALUA (Asymmetric Logical Unit Access) arrays, Round Robin uses only the active/optimized (AO) paths, which are the paths to the disk through the managing controller; active/nonoptimized (ANO) paths to the disk through the nonmanaging controller are not used. Not all arrays support the Round Robin path policy. Refer to your storage-array vendor's documentation for recommendations on using this Path Selection Policy (PSP).
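To make options 2 and 3 above concrete, here is a minimal command sketch for an ESXi 5.x host. The vSwitch name, vmk interface and naa.* device identifier are placeholders, and jumbo frames must also be enabled end to end on the physical switches and the array before this does any good:

# Option 2: enable jumbo frames on the vSwitch and the iSCSI VMkernel port
~# esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
~# esxcli network ip interface set --interface-name=vmk1 --mtu=9000
# Option 3: set the Round Robin path selection policy on a device
~# esxcli storage nmp device set --device=naa.600508b4000f02d10000500000790000 --psp=VMW_PSP_RR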
Minimizing Latency
Because iSCSI on VMware uses TCP/IP to transfer I/O, latency can be a concern. To decrease latency, one should always try to minimize the number of hops between the storage and the vSphere host. Ideally, one would not route traffic between the vSphere host and the storage array; both would coexist on the same subnet.
NOTE: If iSCSI port bindings are implemented for the purposes of multipathing, you cannot route your iSCSI traffic.
Routing
A vSphere host has a single routing table for all of its VMkernel Ethernet interfaces, which imposes some limits on network communication. Consider a configuration that uses two Ethernet adapters with one VMkernel TCP/IP stack: one adapter is on the 10.17.1.1/24 IP network and the other on the 192.168.1.1/24 network. Assume that 10.17.1.253 is the address of the default gateway. The VMkernel can communicate with any servers reachable by routers that use the 10.17.1.253 gateway, but it might not be able to talk to all servers on the 192.168 network unless both networks are on the same broadcast domain.
Some systems can offload the iSCSI digest calculations to the network processor, thus reducing the impact on performance.
Flow Control
The general consensus from our storage partners is that hardware-based flow control is recommended for all network interfaces and switches.
Security Considerations
Private Network
iSCSI storage traffic is transmitted in an unencrypted format across the LAN. Therefore, it is considered best practice to use iSCSI on trusted networks only and to isolate the traffic on separate physical switches or to leverage a private VLAN. All iSCSI-array vendors agree that it is good practice to isolate iSCSI traffic for security reasons. This would mean isolating the iSCSI traffic on its own separate physical switches or leveraging a dedicated VLAN (IEEE 802.1Q).
Encryption
iSCSI supports several types of security. IPSec (Internet Protocol Security) is a developing standard for security at the network or packet-processing layer of network communication. IKE (Internet Key Exchange) is an IPSec standard protocol used to ensure security for VPNs. However, at the time of this writing, IPSec was not supported on vSphere hosts.
Authentication
There are also a number of authentication methods supported with iSCSI. At the time of this writing (vSphere 5.1), a vSphere host does not support the Kerberos, SRP or public-key authentication methods for iSCSI; the only authentication protocol supported is CHAP. CHAP verifies identity using a hashed transmission. The target initiates the challenge, and both parties know the secret key; the challenge is periodically repeated to guard against replay attacks. CHAP is a one-way protocol, but it might be implemented in two directions to provide security for both ends. The iSCSI specification defines the CHAP security method as the only must-support protocol, and the VMware implementation uses this security option. Initially, VMware supported only unidirectional CHAP, but bidirectional CHAP is now supported.
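As a hedged sketch only (the adapter name, CHAP user name and secret are placeholders, and the exact option names should be verified against the esxcli reference for your release), unidirectional CHAP can be required on the software iSCSI adapter from the command line:

# Require unidirectional CHAP on the software iSCSI adapter (all values are examples)
~# esxcli iscsi adapter auth chap set --adapter=vmhba33 --direction=uni --level=required --authname=chapuser --secret=Secret123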
iSCSI Datastore Provisioning Steps
1. Create a new VMkernel port group for IP storage on an already existing virtual switch (vSwitch) or on a new vSwitch when it is configured. The vSwitch can be a vSphere Standard Switch (VSS) or a VMware vSphere Distributed Switch.
2. Ensure that the iSCSI storage is configured to export a LUN accessible to the vSphere host iSCSI initiators on a trusted network.
This ensures that if the management network suffers an outage, you continue to have iSCSI connectivity via the VMkernel ports participating in the iSCSI bindings.
NOTE: VMware considers implementing iSCSI multipathing, rather than NIC teaming, a best practice.
Software iSCSI Multipathing Configuration Steps
For port binding to work correctly, the initiator must be able to reach the target directly on the same subnet; iSCSI port binding in vSphere 5.0 does not support routing. In this configuration, if I place my VMkernel ports on VLAN 74, they can reach the iSCSI target without the need of a router. This is an important point that requires further elaboration, because it causes some confusion. If I do not implement port binding and use a standard VMkernel port, then my initiator can reach the targets through a routed network; this is supported and works well. It is only when iSCSI binding is implemented that a direct, non-routed network between the initiators and targets is required. In other words, initiators and targets must be on the same subnet.

There is another important point to note when it comes to the configuration of iSCSI port bindings. On VMware standard switches that contain multiple vmnic uplinks, each VMkernel (vmk) port used for iSCSI bindings must be associated with a single vmnic uplink; the other uplink(s) on the vSwitch must be placed into an unused state. This is a requirement only when there are multiple vmnic uplinks on the same vSwitch. If you are using multiple VSSs with their own vmnic uplinks, then this is not an issue.
Continuing with the network configuration, a second VMkernel (vmk) port is created. Now there are two vmk ports, labeled iSCSI1 and iSCSI2; these will be used for the iSCSI port binding/multipathing configuration. The next step is to configure the bindings and iSCSI targets, which is done in the properties of the software iSCSI adapter. Since vSphere 5.0, there is a Network Configuration tab in the Software iSCSI Initiator Properties window; this is where the VMkernel ports used for binding to the iSCSI adapter are added. After you select the VMkernel adapters for use with the software iSCSI adapter, the Port Group Policy tab will tell you whether or not these adapters are compliant for binding. If you have more than one active uplink on a vSwitch that has multiple vmnic uplinks, the vmk interfaces will not show up as compliant: only one uplink should be active, and all other uplinks should be placed into an unused state.
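The same binding can also be performed from the command line. As a sketch, assuming the software iSCSI adapter is vmhba33 and the two VMkernel ports created earlier are vmk1 and vmk2 (all placeholder names):

# Bind each compliant VMkernel port to the software iSCSI adapter
~# esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
~# esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
# Verify the bindings
~# esxcli iscsi networkportal list --adapter=vmhba33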
Interoperability Considerations
NOTE: Support was introduced for VMware ESXi only, not classic ESX. Not all of our storage partners support iSCSI Boot Firmware Table (iBFT) boot from SAN; refer to the partner's own documentation for clarification.
Why Boot from SAN?
It quickly became clear that there was a need to boot via software iSCSI. Partners of VMware were developing blade chassis containing blade servers, storage and network interconnects in a single rack. The blades were typically diskless, with no local storage, and the requirement was to have the blade servers boot off of an iSCSI LUN using network interface cards with iSCSI capabilities, rather than using dedicated hardware iSCSI initiators.
Compatible Network Interface Card
Much of the configuration for booting via software iSCSI is done via the BIOS settings of the network interface cards and the host. Check the VMware Hardware Compatibility List (HCL) to ensure that the network interface card is compatible. This is important, but a word of caution is necessary: if you select a particular network interface card and see iSCSI listed as a feature, you might assume that you can use it to boot a vSphere host from an iSCSI LUN. This is not the case. To see whether a particular network interface card is supported for iSCSI boot, set the I/O device type to Network (not iSCSI) in the HCL and then check the footnotes. If the footnotes state that iBFT is supported, then the card can be used for boot from iSCSI.
Advanced Settings
There are a number of tunable parameters available when using iSCSI datastores. Before drilling into these advanced settings in more detail, you should understand that the recommended values for some of them might (and probably will) vary from storage-array vendor to storage-array vendor.
LoginTimeout
When iSCSI establishes a session between initiator and target, it must log in to the target. It will try to log in for a period of LoginTimeout seconds; if that is exceeded, the login fails.
LogoutTimeout
When iSCSI finishes a session between initiator and target, it must log out of the target. It will try to log out for a period of LogoutTimeout seconds; if that is exceeded, the logout fails.
RecoveryTimeout
The other options relate to how a dead path is determined. RecoveryTimeout determines how long to wait, in seconds, after PDUs are no longer being sent or received before placing a once-active path into a dead state. Realistically, it's a bit longer than that, because other considerations are taken into account as well.
NoopInterval and NoopTimeout
The noop settings are used to determine whether a path is dead when it is not the active path. iSCSI passively discovers whether such a path is dead by using the noop timeout: the test is carried out on nonactive paths every NoopInterval seconds, and if a response isn't received within NoopTimeout seconds, the path is marked as dead. Unless faster failover times are desirable, it is not necessary to change these parameters from their default settings. Use caution when modifying them, because if paths fail too quickly and then recover, you might have LUNs/devices moving ownership unnecessarily between targets, and that can lead to path thrashing.
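These per-adapter values can be inspected and, where the array vendor recommends it, adjusted with esxcli. This is a sketch only: vmhba33 is a placeholder, and the exact key names should be confirmed against the param get output first (the noop settings, for example, appear as NoopOutInterval and NoopOutTimeout in some releases):

# List current values for all tunable iSCSI parameters on the adapter
~# esxcli iscsi adapter param get --adapter=vmhba33
# Example: change the recovery timeout (seconds), per vendor guidance
~# esxcli iscsi adapter param set --adapter=vmhba33 --key=RecoveryTimeout --value=10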
QFullSampleSize and QFullThreshold
Some of our storage partners require the use of the QFullSampleSize and QFullThreshold parameters to enable the adaptive queue-depth algorithm of VMware. With the algorithm enabled, no additional I/O throttling is required on the vSphere hosts. Refer to your storage-array vendor's documentation to see whether this is applicable to your storage.
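In vSphere 5.1 these two values can be set per device. A sketch, using a placeholder device identifier and the commonly cited example values of 32 and 4 (substitute the values your array vendor actually documents):

# Enable the adaptive queue-depth algorithm for one device (values are examples)
~# esxcli storage core device set --device=naa.600508b4000f02d10000500000790000 --queue-full-sample-size=32 --queue-full-threshold=4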
Disk.DiskMaxIOSize
To improve the performance of virtual machines that generate large I/O sizes, administrators can consider setting the advanced parameter Disk.DiskMaxIOSize. Some of our partners suggest setting this to 128KB to enhance storage performance. However, it is best to understand the I/O size that the virtual machine is generating before setting this parameter; a different size might be more suitable to your application.
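The parameter is host-wide and, as I understand it, expressed in KB. A sketch of setting it to the 128KB figure mentioned above, only after confirming it suits the I/O profile of your virtual machines:

# Set the maximum I/O size the host will pass to storage (value in KB)
~# esxcli system settings advanced set --option=/Disk/DiskMaxIOSize --int-value=128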
DelayedAck
A host receiving a stream of TCP data segments, as in the case of iSCSI, can increase efficiency in both the network and the hosts by sending fewer than one acknowledgment (ACK) segment per data segment received. This is known as a delayed ACK. The common practice is to send an ACK for every other full-sized data segment and not to delay the ACK for a segment by more than a specified threshold, which varies between 100 and 500 milliseconds. vSphere hosts, like most other servers, use delayed ACKs because of their benefits. Some arrays, however, take the very conservative approach of retransmitting only one lost data segment at a time and waiting for the host's ACK before retransmitting the next one. This approach slows read performance to a halt in a congested network and might require the delayed ACK feature to be disabled on the vSphere host. More details can be found in KB article 1002598.
Additional Considerations
Disk Alignment
This recommendation is not specific to iSCSI, because misalignment can have an adverse effect on the performance of all block storage. Nevertheless, to account for every contingency, it should be considered a best practice to have the partitions of the guest OS running within the virtual machine aligned to the storage.
Microsoft Clustering Support
With the release of vSphere 5.1, VMware supports as many as five nodes in a Microsoft cluster. However, at the time of this writing, VMware does not support the cluster quorum disk over the iSCSI protocol.
In-Guest iSCSI Support
A number of in-guest iSCSI software solutions are available. The Microsoft iSCSI driver is one commonly seen running in a virtual machine when the guest OS is a version of Microsoft Windows. The support statement for this driver can be found in KB article 1010547, which states that if you encounter connectivity issues using a third-party software iSCSI initiator to a third-party storage device, you should engage the third-party vendors for assistance. If the third-party vendors determine that the issue is due to a lack of network connectivity to the virtual machine, contact VMware for troubleshooting assistance.
All Paths Down and Permanent Device Loss
All Paths Down (APD) can occur on a vSphere host when a storage device is removed in an uncontrolled manner, or when the device fails and the VMkernel core storage stack cannot detect how long the loss of device access will last. One possible scenario for an APD condition is an FC switch failure that brings down all the storage paths or, in the case of an iSCSI array, a network connectivity issue that similarly brings down all the storage paths.

A related condition known as Permanent Device Loss (PDL) was introduced in vSphere 5.0. The PDL condition enables the vSphere host to take specific actions when it detects that the device loss is permanent. The vSphere host can be informed of a PDL situation by specific SCSI sense codes sent by the target array.

In vSphere 5.1, VMware extended PDL detection to those iSCSI arrays that present only one LUN per target. These arrays were problematic because, after LUN access was lost, the target also was lost, so the vSphere host had no way of reclaiming any SCSI sense codes. With vSphere 5.1, for those iSCSI arrays that have a single LUN per target, an attempt is made to log in again to the target after a dropped session. If there is a PDL condition, the storage system rejects the effort to access the device, and depending on how the array rejects the efforts to access the LUN, the vSphere host can determine whether the device has been lost permanently (PDL) or is temporarily unreachable.
Round Robin Path Policy Setting IOPS=1
A number of our partners have documented that when using the Round Robin path policy, best results can be achieved with an IOPS=1 setting. This might well be true in very small environments with a small number of virtual machines and a small number of datastores. However, as the environment scales, with a greater number of virtual machines and a greater number of datastores, VMware considers the default settings associated with the Round Robin path policy to be sufficient. Consult your storage-array vendor for advice on this setting.
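For completeness, this is what the IOPS=1 tuning looks like on an ESXi 5.x host. The device identifier is a placeholder, and per the preceding paragraph the default should normally be left alone unless your array vendor documents otherwise:

# Switch a Round Robin device to rotate paths after every single I/O
~# esxcli storage nmp psp roundrobin deviceconfig set --device=naa.600508b4000f02d10000500000790000 --type=iops --iops=1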
Data Center Bridging (DCB) Support
Our storage partner Dell now supports iSCSI over DCB under the Partner Verified and Supported Products (PVSP) program of VMware. This applies to the Dell EqualLogic (EQL) array only, with certain Converged Network Adapters (CNAs), and only on vSphere version 5.1. See KB article 2044431 for further details.
7. Best Practices for Running VMware vSphere on Network Attached Storage
Background
VMware introduced support for IP-based storage in release 3 of ESX Server. Prior to that release, the only option for shared storage pools was Fibre Channel (FC). With VI3, both iSCSI and NFS storage were introduced as storage resources that could be shared across a cluster of ESX servers.

The addition of new choices has led a number of people to ask, What is the best storage protocol choice on which to deploy a virtualization project? The answer to that question has been the subject of much debate, and there seems to be no single correct answer. The considerations for this choice tend to hinge on cost, performance, availability and ease of manageability. An additional factor, however, should be the legacy environment and the storage administrator's familiarity with one protocol versus the other, based on what is already installed.

The bottom line is that rather than asking which storage protocol to deploy virtualization on, the questions should be, Which virtualization solution enables one to leverage multiple storage protocols for the virtualization environment? and, Which will give the best ability to move virtual machines from one storage pool to another, regardless of the storage protocol in use, without downtime or application disruption? Once those questions are considered, the clear answer is VMware vSphere.
However, to investigate the options a bit further: the performance of FC is perceived as being a bit more industrial strength than IP-based storage, but for most virtualization environments, NFS and iSCSI provide suitable I/O performance. The comparison has been the subject of many papers and projects; one posted on VMTN is located at http://www.vmware.com/files/pdf/storage_protocol_perf.pdf. The general conclusion reached by that paper is that for most workloads the performance is similar, with a slight increase in ESX Server CPU overhead per transaction for NFS and a bit more for software iSCSI. For most virtualization environments, the end user might not even be able to detect the performance delta between one virtual machine running on IP-based storage and another on FC storage.

The more important consideration that often leads people to choose NFS storage for their virtualization environment is the ease of provisioning and maintaining NFS shared storage pools. NFS storage is often less costly than FC storage to set up and maintain. For this reason, NFS tends to be the choice of small to medium businesses deploying virtualization, as well as the choice for deployment of virtual desktop infrastructures. This paper will investigate the trade-offs and considerations in more detail.
Overview of the Steps to Provision NFS Datastores
Before NFS storage can be addressed by an ESX server, the following issues need to be addressed:
1. A vSwitch must be configured for IP storage access.
2. The ESX host must be configured to run its NFS client.
For more details on NFS storage options and setup, consult the best practices for VMware provided by the storage vendor, for example:
EMC with VMware vSphere 4: Applied Best Practices
NetApp and VMware vSphere Storage Best Practices
Regarding item one above, to configure the vSwitch for IP storage access you will need to create a new vSwitch under the ESX server configuration's Networking tab in vCenter. Indicating that it is a VMkernel type connection will automatically add it to the vSwitch. You will then need to populate the network access information.
Regarding item two above, to configure the ESX host for running its NFS client, you'll need to open a firewall port for the NFS client. To do this, select the Configuration tab for the ESX server in Virtual Center, click Security Profile (listed under the software options), and then check the box for NFS Client listed under the remote access choices in the Firewall Properties screen.
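On later, esxcli-based hosts (ESXi 5.x), the same firewall change can be scripted; a sketch using the built-in nfsClient ruleset:

# Allow outbound NFS client traffic through the host firewall
~# esxcli network firewall ruleset set --ruleset-id=nfsClient --enabled=true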
With these items addressed, an NFS datastore can now be added to the ESX server following the same process used to configure a datastore for block-based (FC or iSCSI) datastores. On the ESX server's Configuration tab in VMware VirtualCenter, select Storage (listed under the hardware options) and then click the Add button. On the Select Storage Type screen, select Network File System; in the next screen, enter the IP address of the NFS server, the mount point for the specific destination on that server, and the desired name for the new datastore. If everything is completed correctly, the new NFS datastore will show up in the refreshed list of datastores available for that ESX server.
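On an esxcli-capable host, the equivalent of those client screens is a single command. Server address, export path and datastore name below are placeholders:

# Mount an NFS export as a datastore
~# esxcli storage nfs add --host=192.168.1.100 --share=/vol/nfs_ds1 --volume-name=nfs_ds1
# Confirm the mount
~# esxcli storage nfs list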
The main differences in provisioning an NFS datastore compared to block-based storage datastores are: for NFS there are fewer screens to navigate through, but more data entry is required than for block-based storage, and the NFS device needs to be specified via an IP address and folder (mount point) on that filer, rather than chosen from a pick list of options. Note also that cross-stack teaming, discussed later in this section, is only an option with a limited number of switches that are available today.
IP hash: A method of switching to an alternate path based on a hash of the IP addresses of both end points, for multiple connections.
Virtual IP (VIF): An interface used by the NAS device to present the same IP address out of two ports from that single array.
Avoiding Single Points of Failure at the NIC, Switch and Filer Levels
The first level of high availability (HA) is to avoid a single point of failure in a NIC card in an ESX server, or in the cable between the NIC card and the switch. This is addressed by having two NICs connected to the same LAN switch, configured as teamed at the switch, with IP hash failover enabled at the ESX server.

The second level of HA is to avoid a single point of failure in the loss of the switch to which the ESX server connects. With this solution, one has four potential NIC cards in the ESX server configured with IP hash failover, and two pairs going to separate LAN switches, with each pair configured as teamed at the respective LAN switches.

The third level of HA protects against the loss of a filer (or NAS head). With storage vendors that provide clustered NAS heads that can take over for one another in the event of a failure, one can configure the LAN such that downtime can be avoided in the event of losing a single filer or NAS head.

An even higher level of performance and HA can build on the previous HA level with the addition of cross-stack EtherChannel-capable switches. With certain network switches, it is possible to team ports across two separate physical switches that are managed as one logical switch. This provides additional resilience as well as some performance optimization: one can get HA with fewer NICs, or have more paths available across which to distribute load sharing.
Caveat: NIC teaming provides failover but not load-balanced performance (in the common case of a single NAS datastore). It is also important to understand that there is only one active pipe for the connection between the ESX server and a single storage target (LUN or mount point). This means that although there may be alternate connections available for failover, the bandwidth for a single datastore, and for the underlying storage, is limited to what a single connection can provide. To leverage more of the bandwidth available between an ESX server and the storage targets, one would need to configure multiple datastores, with each datastore using separate connections between the server and the storage. This is where one often runs into the distinction between load balancing and load sharing: the configuration of traffic spread across two or more datastores configured on separate connections between the ESX server and the storage array is load sharing.
Security Considerations
The VMware vSphere implementation of NFS supports NFS version 3 over TCP. There is currently no support for NFS version 2, UDP or CIFS/SMB. Kerberos is also not supported in ESX Server 4, and as such traffic is not encrypted: storage traffic is transmitted as clear text across the LAN. Therefore, it is considered best practice to use NFS storage on trusted networks only, and to isolate the traffic on separate physical switches or leverage a private VLAN. Another security concern is that the ESX server must mount the NFS server with root access, which raises some concerns about hackers getting access to the NFS server. To address this concern, it is best practice to use either a dedicated LAN or a VLAN to provide protection and isolation.
Additional Attributes of NFS Storage
There are several additional options to consider when using NFS as a shared storage pool for virtualization, among them thin provisioning, de-duplication, and the ease of backup and restore of virtual machines, virtual disks, and even files on a virtual disk via array-based snapshots.

Thin Provisioning
Virtual disks (VMDKs) created on NFS datastores are in thin-provisioned format by default. This capability offers better disk utilization of the underlying storage capacity in that it removes what is often considered wasted disk space.
For the purpose of this paper, VMware defines wasted disk space as space allocated but not used. Thin-provisioning technology removes a significant amount of this wasted disk space. On NFS datastores, the default virtual disk format is thin; as such, less volume storage space is allocated than would be needed for the same set of virtual disks provisioned in thick format.
De-duplication
Some NAS storage vendors offer data de-duplication features that can greatly reduce the amount of storage space required. It is important to distinguish between in-place de-duplication and de-duplication of backup streams. Both offer significant savings in space requirements, but in-place de-duplication seems to be far more significant for virtualization environments. Some customers have been able to reduce their storage needs by up to 75 percent of their previous storage footprint with the use of in-place de-duplication technology.
Summary of Best Practices
Networking Settings
To isolate storage traffic from other networking traffic, it is considered best practice to use either dedicated switches or VLANs for your NFS and iSCSI ESX server traffic. The minimum NIC speed should be 1GbE. In VMware vSphere, use of 10GbE is supported; check the VMware HCL to confirm which models are supported. It is important not to oversubscribe the network connection between the LAN switch and the storage array: the retransmission of dropped packets can further degrade the performance of an already heavily congested network fabric.
Datastore Settings
The default setting for the maximum number of mount points/datastores an ESX server can concurrently mount is eight, although the limit can be increased to 64 in the existing release. If you increase the maximum number of NFS mounts above the default setting of eight, make sure to also increase Net.TcpipHeapSize. If 32 mount points are used, increase Net.TcpipHeapSize to 30MB.
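A sketch of those two adjustments on an ESXi 5.x host, using the 32-mount example above (confirm the values your storage vendor recommends; the heap change takes effect after a reboot):

# Allow up to 32 NFS mounts
~# esxcli system settings advanced set --option=/NFS/MaxVolumes --int-value=32
# Grow the TCP/IP heap to match (in MB; requires a reboot)
~# esxcli system settings advanced set --option=/Net/TcpipHeapSize --int-value=30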
NFS Locking
NFS locking on ESX does not use the NLM protocol; VMware has established its own locking protocol. These NFS locks are implemented by creating lock files on the NFS server. Lock files are named .lck-<fileid>, where <fileid> is the value of the fileid field returned from a GETATTR request for the file being locked. Once a lock file is created, ESX periodically (every NFS.DiskFileLockUpdateFreq seconds) sends updates to the lock file to let other ESX hosts know that the lock is still active. The lock-file updates generate small (84-byte) WRITE requests to the NFS server. Changing any of the NFS locking parameters will change how long it takes to recover stale locks. The following formula can be used to calculate how long it takes to recover a stale NFS lock:

(NFS.DiskFileLockUpdateFreq * NFS.LockRenewMaxFailureNumber) + NFS.LockUpdateTimeout
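As a worked example, assuming the shipping defaults of a 10-second update frequency, three allowed renewal failures and a 5-second update timeout, recovering a stale lock takes (10 * 3) + 5 = 45 seconds; raising any of the three values lengthens that window proportionally.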
If any of these parameters are modified, it's very important that all ESX hosts in the cluster use identical settings. Having inconsistent NFS lock settings across ESX hosts can result in data corruption!
In vSphere, the option to change the NFS.Lockdisable setting has been removed. This was done to remove the temptation to disable the VMware locking mechanism for NFS; it is no longer an option to turn it off in vSphere.
Virtual Machine Swap Space Location
Keeping the virtual machine swap space on the NFS datastore is now considered to be the best practice.
NFS Advanced Options
NFS.DiskFileLockUpdateFreq: Time between updates to the NFS lock file on the NFS server. Increasing this value will increase the time it takes to recover stale NFS locks. (See NFS Locking)
NFS.LockUpdateTimeout: Amount of time VMware waits before aborting a lock update request. (See NFS Locking)
NFS.LockRenewMaxFailureNumber: Number of lock update failures that must occur before VMware marks the lock as stale. (See NFS Locking)
NFS.HeartbeatFrequency: How often the NFS heartbeat world runs to see if any NFS volumes need a heartbeat request. (See NFS Heartbeats)
NFS.HeartbeatTimeout: Amount of time VMware waits before aborting a heartbeat request. (See NFS Heartbeats)
NFS.HeartbeatDelta: Amount of time after a successful GETATTR request before the heartbeat world will issue a heartbeat request for a volume. If an NFS volume is in an unavailable state, an update will be sent every time the heartbeat world runs (every NFS.HeartbeatFrequency seconds). (See NFS Heartbeats)
NFS.HeartbeatMaxFailures: Number of consecutive heartbeat requests that must fail before VMware marks a server as unavailable. (See NFS Heartbeats)
NFS.MaxVolumes: Maximum number of NFS volumes that can be mounted. The TCP/IP heap must be increased to accommodate the number of NFS volumes configured. (See TCP/IP Heap Size)
NFS.SendBufferSize: The size of the send buffer for NFS sockets. This value was chosen based on internal performance testing; customers should not need to adjust it.
NFS.ReceiveBufferSize: The size of the receive buffer for NFS sockets. This value was chosen based on internal performance testing; customers should not need to adjust it.
NFS.VolumeRemountFrequency: How often VMware retries mounting an NFS volume that was initially unmountable. Once a volume is mounted, it never needs to be remounted; the volume may be marked unavailable if VMware loses connectivity to the NFS server, but it will still remain mounted.
8. VMware vSphere 5.0 Upgrade Best Practices
VMware vSphere 5.0: What's New
Industry's Largest Virtual Machines
VMware can support even the largest applications with the introduction of virtual machines that can grow to as many as 32 vCPUs and can use up to 1TB of memory. This is 4x bigger than the previous release; vSphere can now support business-critical applications of any size and dimension.
vSphere High Availability (VMware HA)
A new architecture ensures the most simplified setup and the best guarantees for the availability of business-critical applications. Setup of the most widely used HA technology in the industry has never been easier: VMware HA can now be set up in just minutes.
VMware vSphere Auto Deploy
In minutes, you can deploy more vSphere hosts running the ESXi hypervisor architecture on the fly. Once it is running, Auto Deploy simplifies patching by enabling you to do a one-time patch of the source ESXi image and then push the updated image out to your ESXi hosts, as opposed to the traditional method of applying the same patch to each host individually.
Profile-Driven Storage
You can reduce the steps in the selection of storage resources by grouping storage according to a user-defined policy.
vSphere Storage DRS
Automated load balancing now analyzes storage characteristics to determine the best place for a given virtual machine's data to live, both when it is created and as it is used over time.
vSphere Web Client
This rich browser-based client provides full virtual machine administration and now has multiplatform support and optimized client/server communication, which delivers faster response and a more efficient user experience that helps take care of business needs faster.
VMware vCenter Server Appliance (VCSA)
This preinstalled VMware vCenter Server virtual appliance simplifies the deployment and configuration of vCenter Server, slipstreams future upgrades and patching, and reduces the time and cost associated with managing vCenter Server. (Upgrading to the VMware vCenter Server Appliance from the installable vCenter Server is not supported.)
Licensing Reporting Manager
With the new vSphere vRAM licensing introduced with vSphere 5.0, vCenter Server can show not only installed licenses but also vRAM license memory pooling and its real-time utilization. This allows administrators to see the benefits of vRAM pooling and how to size it as the business grows.
Upgrading to VMware vCenter Server 5.0
The first step in any vSphere migration project should always be the upgrade of vCenter Server; your vCenter Server must be running at version 5.0 in order to manage an ESXi 5.0 host. Upgrading to vCenter Server 5.0 involves upgrading the vCenter Server machine, its accompanying database, and any configured plug-ins, including VMware vSphere Update Manager and VMware vCenter Orchestrator. As of vSphere 4.1, vCenter Server requires a 64-bit server running a 64-bit operating system (OS). If you are currently running vCenter Server on a 32-bit OS, you must migrate to the 64-bit architecture first. With the 64-bit vCenter Server, you also must use a 64-bit data source name (DSN) for the vCenter database.
Planning the Upgrade
It is recommended that you create an inventory of the current components and that you validate compatibility with the requirements of vCenter 5.0.
Requirements
These are supported minimums. Scaling and sizing of vCenter Server and components should be based on the size of the current virtual environment and anticipated growth.
Processor: Two CPUs, 2.0GHz or higher Intel or AMD x86 processors, with processor requirements higher if the database runs on the same machine
Memory: 4GB RAM, with RAM requirements higher if your database runs on the same machine
Disk storage: 4GB, with disk requirements higher if your database runs on the same machine
Networking: 1Gb recommended
OS: 64-bit
Database: a supported database platform
Upgrade Process
The following diagram depicts possible upgrade scenarios.
NOTE: With the release of vSphere 5.0, vCenter Server is also offered as a Linux-based appliance, referred to as the vCenter Server Appliance (VCSA), which can be deployed in minutes. Due to the architectural differences between the installable vCenter and the new VCSA, there is no migration path or database conversion tool to migrate to the VCSA. You must deploy a new VCSA and attach all the infrastructure components before recreating and attaching inventory objects.
We will explore the three most common scenarios:
vCenter 4.0 and Update Manager 4.0 on a 32-bit OS with a local database
vCenter 4.1 and Update Manager 4.1 on a 64-bit OS with a local database, and the requirement to migrate to a remote database
vCenter 4.1 on a 64-bit OS with a remote database and a separate Update Manager server
Backing Up Your vCenter Configuration
Before starting the upgrade procedure, it is recommended that you back up your current vCenter Server to ensure that you can restore the previous configuration in the case of an unsuccessful upgrade. It is important to realize that there are multiple objects that must be backed up to provide the ability to roll back:
SSL certificates
vpxd.cfg
Database
Depending on the type of platform used to host your vCenter Server, it might be possible to simply create a clone or snapshot of your vCenter Server and database to allow for a simple and effective rollback scenario. In most cases, however, it is recommended that you back up each of the aforementioned items separately to allow for a more granular recovery when required, following the database software vendor's best practices and documentation.

The vCenter configuration file vpxd.cfg and the SSL certificates can simply be backed up by copying them to a different location; it is recommended that you copy them to a location external to the vCenter Server. The SSL certificates are located in a folder named SSL under the following folders; vpxd.cfg can be found in the root of these folders:

Windows 2003: %ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter\
Windows 2008: %systemdrive%\ProgramData\VMware\VMware VirtualCenter\

It is important to also document any changes made to the vCenter configuration and to your database configuration settings, such as the database DSN, user name and password. Before any upgrade is undertaken, it is recommended that you back up your database and vCenter Server.
Host Agents
It is recommended that you validate that the current configuration meets the vCenter Server requirements. This can be done manually or by using the Agent Pre-Upgrade Checker, which is provided with the vCenter Server installation media. The Agent Pre-Upgrade Checker will investigate each of the ESX/ESXi hosts in the environment and will report whether or not the agent on each host can be updated.
Upgrading a 32-Bit vCenter 4.0 OS with a Local Database
This scenario describes an upgrade of vCenter Server 4.0 with a local database running on a 32-bit version of a Microsoft Windows 2003 OS. Because vCenter 5.0 is a 64-bit platform, an in-place upgrade is not possible. The VMware Data Migration Tool included with the vCenter Server media can be utilized to migrate data and settings from the old 32-bit OS to the new 64-bit OS. The Data Migration Tool should be unzipped on both the source and destination vCenter Servers.
Backup Configuration Using the Data Migration Tool
1. Stop the vCenter Server services on the source vCenter Server.
2. Open a Command Prompt and go to the location from which datamigration.zip was extracted.
3. Type backup.bat.
4. Decide whether the host patches should be backed up or not. We recommend not backing them up, instead downloading new patches and excluding ESX patches to minimize stored data.
Installing vCenter Using Data Provided by the Data Migration Tool
1. Copy the contents of the source vCenter Server's datamigration folder to the new vCenter Server.
2. Open a Command Prompt and go to the folder containing the datamigration tools that you just copied.
3. Run install.bat.
Using the Data Migration Tool, you can easily migrate a vCenter Server 4.0 installation on a 32-bit OS using Microsoft SQL Server 2005 Express to a 64-bit OS. As with any tool, there are some caveats. For your convenience, we have listed the most-accessed VMware knowledge base articles regarding the Data Migration Tool, as follows:
Backing up the vCenter Server 4.x bundle using the Data Migration Tool fails with the error: Object reference not set to an instance of an object (http://kb.vmware.com/kb/1036228)
Data Migration Tool fails with the error: RESTORE cannot process database VIM_VCDB because it is in use by this session (http://kb.vmware.com/kb/2001184)
vCenter Server 4.1 Data Migration Tool fails with the error: HResult 0x2, Level 16, State 1 (http://kb.vmware.com/kb/1024490)
Using the Data Migration Tool to upgrade from vCenter Server 4.0 to vCenter Server 4.1 fails (http://kb.vmware.com/kb/1024380)
When upgrading to vCenter Server 4.1, running install.bat of the Data Migration Tool fails (http://kb.vmware.com/kb/1029663)
Upgrading a 64-Bit vCenter 4.1 Server with a Remote Database
Of the three scenarios, this is the most straightforward, but we still suggest that you back up your current vCenter configuration and database to provide a rollback scenario.
1. Insert the VMware vCenter Server 5.0 CD.
2. Select vCenter Server and click Install.
3. Select the appropriate language and click OK.
4. Install .NET Framework 3.5 SP1 by clicking Install.
5. The installer should now detect that vCenter is already installed. Upgrade the current installation by clicking Next.
Upgrading a 64-Bit vCenter 4.1 Server with a Local Database to a Remote Database
When upgrading your environment from vCenter Server 4.1 to vCenter Server 5.0, it might also be the right time to make adjustments to your design decisions. One of those changes might be the location of the vCenter Server database: instead of using a local Microsoft SQL Server Express 2005 database, a remote SQL server is used. In this scenario, we will primarily focus on how to migrate the database. The upgrade of vCenter Server 4.1 can be done in two different ways, which we will briefly explain at the end of the migration workflow section.
If vCenter Server is currently installed as a virtual machine, we recommend that you create a new virtual machine for vCenter Server 5.0. That way, in case a rollback is required, the vCenter Server 4.1 virtual machine can be powered on with minimal impact on your management environment.
1. Download Microsoft SQL Server Management Studio Express and install it on your vCenter Server (this guide assumes that you are using SQL Express).
2. Stop the service named VMware VirtualCenter Server.
3. Start the Microsoft SQL Server Management Studio Express application and log in to the local SQL instance.
4. Right-click your vCenter Server database, VIM_VCDB, and under Tasks click Back Up.
5. Copy this database from the selected location to your new Microsoft SQL Database Server.
6. Create a new database on your destination Microsoft SQL Server 2008. Use the database calculator to identify the initial size of the database; leave this set to the default and click OK.
Now that the database has been created, the old database must be restored to this newly created database.
7. Open Microsoft SQL Server Management Studio Express and log in to the local Microsoft SQL Server instance.
8. Unfold Databases, right-click the newly created database and select Restore Database.
9. Select From device and select the correct database. Ensure that the correct database is selected to restore, as depicted in the following.
10. Select Overwrite the existing database (WITH REPLACE).
If you want to reuse your current environment, go to the vCenter Server and recreate the system DSN; if you prefer not to, go to the new vCenter Server and create a new system DSN. If the current vCenter Server environment is reused, take the following steps; if a new vCenter Server is used, skip this step.
We have tested the upgrade without uninstalling vCenter Server. Although it was successful, we recommend removing it every time to prevent any unexpected performance or results: uninstall vCenter Server, then reboot the vCenter Server host.
In both cases, vCenter Server must be reinstalled. Install vCenter Server and, in the installation wizard, select the newly created DSN that connects to your SQL Server 2008 database. Select the Do not overwrite, leave my existing database in place option. Ensure that the authentication type used in SQL Server 2008 is the same as that used on SQL Express 2005. Reset the permissions of the vCenter account that connects to the database as the database owner (dbo) user of the MSDB system database. Details regarding this migration procedure can also be found in VMware knowledge base article 1028601 (http://kb.vmware.com/kb/1028601), Migrating the vCenter Server 4.x database from SQL Express 2005 to SQL Server 2008.
Upgrading to VMware ESXi 5.0
Following the vCenter Server upgrade, you are ready to begin upgrading your ESXi hosts. You can upgrade your ESX/ESXi 4.x hosts to ESXi 5.0 using either the ESXi Installer or vSphere Update Manager. Each method has a unique set of advantages and disadvantages.
Choosing an Upgrade Path
The two upgrade methods work equally well, but there are specific requirements that must be met before a host can be upgraded to ESXi 5.0. The following chart takes into account the various upgrade requirements and can be used as a guide to help determine both your upgrade eligibility and your upgrade path.
Verifying Hardware Compatibility
ESXi 5.0 supports only 64-bit servers; supported servers are listed on the vSphere Hardware Compatibility List (HCL). When verifying hardware compatibility, it's also important to consider firmware versions. VMware will often annotate firmware requirements in the footnotes of the HCL.
Disk Partitioning Requirements
Upgrading an existing ESX/ESXi 4.x host to ESXi 5.0 modifies the host's boot disk. As such, a successful upgrade is highly dependent on having a supported boot-disk partition layout.

Disk Partitioning Requirements for ESXi
ESXi 5.0 uses the same boot-disk layout as ESXi 4.x. Therefore, in most cases the boot-disk partition table does not require modification as part of the 5.0 upgrade.
One notable exception is an ESXi 3.5 host that is upgraded to ESXi 4.x and then immediately upgraded to ESXi 5.0. In ESXi 3.5, the boot banks are 48MB; in ESXi 4.x, the size of the boot banks changed to 250MB. When a host is upgraded from ESXi 3.5 to ESXi 4.x, only one of the two boot banks is resized. This results in a situation where a host has one boot bank at 250MB and the other at 48MB, a condition referred to as having lopsided boot banks. An ESXi host with lopsided boot banks must have a new partition table written to the disk during the upgrade. Update Manager cannot be used to upgrade a host with lopsided boot banks; the ESXi Installer must be used instead.
Disk Partitioning Requirements for ESX
When upgrading an ESX 4.x host to ESXi 5.0, the ESX boot-disk partition table is modified to support the dual-image bank architecture used by ESXi. The VMFS-3 partition is the only partition that is retained; all other partitions on the disk are destroyed.
Limitations of an Upgraded ESXi 5.0 Host
There are some side effects associated with upgrading an ESX host to ESXi 5.0, as compared to performing a fresh installation. These include the following:
Upgraded hosts retain the legacy MSDOS-based partition label and are still limited to a physical boot disk that is less than 2TB in size. Installing ESXi on a disk larger than 2TB requires a fresh install.
Upgraded hosts do not have a dedicated scratch partition. Instead, a scratch directory is created and mounted off a VMFS volume. Aside from the scratch partition, all other disk partitions, such as the boot banks, locker and vmkcore, are identical to those of a freshly installed ESXi 5.0 host.
The existing VMFS partition is not upgraded from VMFS-3 to VMFS-5; you can manually upgrade the VMFS partition after the upgrade.
ESXi 5.0 is compatible with VMFS-3 partitions, so upgrading to VMFS-5 is required only to enable new vSphere 5.0 features. For hosts in which the VMFS partition is on a separate disk from the boot drive, the VMFS partition is left intact and the entire boot disk is overwritten; any extra data on that disk is erased.
Preserving the ESX/ESXi Host Configuration
During the upgrade, most of the ESX/ESXi host configuration is retained. However, not all of the host settings are preserved. The following list highlights key configuration settings that are not carried forward during an upgrade:
Third-Party Software Packages
Some customers run optional third-party software components on their ESX/ESXi 4.x hosts. When upgrading, if third-party components are detected, you are warned that they will be lost during the upgrade. If a host being upgraded contains third-party software components, such as CIM providers or nonstandard device drivers, either these components can be reinstalled after the upgrade or you can use the vSphere 5.0 Image Builder CLI to create a customized ESXi installation image with these packages bundled.
Back up the files /etc/passwd, /etc/groups, /etc/shadow and /etc/gshadow (the /etc/shadow and /etc/gshadow files might not be present on all installations).
Back up any custom scripts.
Back up your .vmx files.
Back up local images, such as templates, exported virtual machines and .iso files.
Backing Up Your ESXi Host Configuration: Procedure
Install the vSphere CLI. In the vSphere CLI, run the vicfg-cfgbackup command with the -s flag to save the host configuration to a specified backup filename:

~# vicfg-cfgbackup --server <ESXi-host-ip> --portnumber <port_number> --protocol <protocol_type> --username <username> --password <password> -s <backup-filename>
In addition, it's a good idea to document the host configuration and to have this information available in the event that problems arise during the host upgrade. Verify that your hardware is supported with ESXi 5.0 by using the vSphere 5.0 Hardware Compatibility List (HCL) at http://www.vmware.com/resources/compatibility/search.php. Consider phasing out older servers and refreshing your hardware in conjunction with an ESXi 5.0 upgrade.
Back up your host before attempting an upgrade; the upgrade process modifies the ESX/ESXi host's boot-disk partition table, preventing automated rollback. Verify that the boot-disk partition table meets the upgrade requirements, particularly regarding the size of the /boot partition and the location of the VMFS partition. The VMFS partition can be preserved only when it is physically located beyond the 1GB mark, that is, after the ESX boot partition, which is partition 4, and after the extended disk partition on the disk (8192 + 1835008 sectors).
Use Image Builder CLI to add optional third-party software components, such as CIM providers and device drivers, to your ESXi 5.0 installation image. Move virtual machines on local storage over to shared storage, where they can be kept highly available using vMotion and Storage vMotion together with VMware HA and DRS.
If the host was upgraded from ESXi 3.5, watch out for lopsided boot banks; upgrade hosts with lopsided boot banks using the ESXi Installer. If the ESXi Installer does not provide an option to upgrade, verify that the required disk space is available (350MB in /boot, 50MB in VMFS).
If a host returns a status of Incompatible because optional third-party software was detected, you can proceed with the upgrade and reinstall the optional software packages afterward, or you can proactively add the optional packages to the ESXi installation image using Image Builder CLI.
Remediating Your Host
After the scan completes and your host is flagged as Non-Compliant, you are ready to perform the upgrade. From the Hosts and Clusters view, select the host/cluster, select the Update Manager tab and select Remediate. You will get a pop-up asking whether you want to install patches, upgrade, or do both. Choose the upgrade option and follow the wizard to complete the remediation.
Assuming that DRS is enabled and running in fully automated mode, Update Manager will proceed to place the host into maintenance mode (if it is not already in maintenance mode) and perform the upgrade. If DRS is not enabled, you must evacuate the virtual machines off the host and put it into maintenance mode before remediating. After the upgrade, the host will reboot, and Update Manager will take it out of maintenance mode and return it to operation.
Using Update Manager to Upgrade an Entire Cluster
You can use Update Manager to remediate an individual host or an entire cluster. If you choose to remediate an entire cluster, Update Manager will roll the upgrade through the cluster, upgrading each host in turn.
You have flexibility in determining how Update Manager will treat the virtual machines during the upgrade: you can choose either to power them off or to use vMotion to migrate them to another host. If you choose to power off the virtual machines, Update Manager will first power off all the virtual machines in the cluster and then proceed to upgrade the entire cluster in parallel. If you choose to migrate the virtual machines, Update Manager will evacuate as many hosts as it can (keeping within the HA admission-control constraints) and upgrade the evacuated hosts in parallel; after they are upgraded, it will move on to the next set of hosts.
Rolling Back from a Failed Update Manager Upgrade
During the upgrade, the files on the boot disk are overwritten. This prevents any kind of automated rollback if problems arise. To restore a host to its pre-upgrade state, reinstall the ESX/ESXi 4.x software and restore the host configuration from the backup.
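The restore step itself can be run from the vSphere CLI with the same vicfg-cfgbackup command used for the backup, this time with the -l flag to load the saved configuration file; a minimal sketch, in which the host address, credentials and filename are placeholders:
~# vicfg-cfgbackup --server <ESXi-host-ip> --username <username> --password <password> -l <backup-filename>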
Upgrading Using the ESXi Installer
Requirements
As a reminder, the following requirements must be met to perform an upgrade using the ESXi Installer:
Placing the Host into Maintenance Mode
Use vMotion/Storage vMotion to evacuate all virtual machines off the host and put the host into maintenance mode. If DRS is enabled in fully automated mode, the virtual machines on shared storage will be automatically migrated when the host is put into maintenance mode. Alternatively, you can power off any virtual machines running on the host.
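If you are working from the host console rather than the vSphere Client, maintenance mode can also be toggled from the ESXi Shell; a sketch using the stock vim-cmd host service operations:
# vim-cmd hostsvc/maintenance_mode_enter
# vim-cmd hostsvc/maintenance_mode_exit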
Booting Off the ESXi 5.0 Installation Media
Connect to the host console and boot the host off the ESXi 5.0 installation media. From the boot menu, select the option to boot from the ESXi Installer.
Selecting Option to Migrate and Preserving the VMFS Datastore
When an existing ESX/ESXi 4.x installation is detected, the ESXi Installer will prompt you either to migrate (upgrade) the host and preserve the existing VMFS datastore, or to do a fresh install (with options to preserve or overwrite the VMFS datastore). Select the Migrate ESX, preserve VMFS datastore option.
Third-Party-Software Warning
If third-party software components are detected, a warning is displayed indicating that these components will be lost. If the identified software components are required, ensure either that they are included with the ESXi installation media (use Image Builder CLI to add third-party software packages to the install media) or that you reinstall them after the upgrade. Press Enter to continue the install or Escape to cancel.
Confirming the Upgrade
The system is then scanned in preparation for the upgrade. When the scan completes, the user is asked to confirm the upgrade by pressing the F11 key. The ESXi Installer will then proceed to upgrade the host to ESXi 5.0. After the installation, the user will be asked to reboot the host. Then reconnect the host and exit maintenance mode.
Post-Upgrade Considerations
Configuring the VMware ESXi 5.0 Dump Collector
A core dump is the state of working memory in the event of host failure. By default, an ESXi core dump is saved to the local boot disk. Use the VMware ESXi Dump Collector to consolidate core dumps onto a network server to ensure that they are available for use if debugging is required. You can install the ESXi Dump Collector on the vCenter Server or on a separate Windows server that has a network connection to the vCenter Server. Refer to the vSphere Installation and Setup Guide for more information on setting up the ESXi Dump Collector.
Configuring the ESXi 5.0 Syslog Collector
Install the vSphere Syslog Collector to enable ESXi system logs to be directed to a network server rather than to the local disk. You can install the Syslog Collector on the vCenter Server or on a separate Windows server that has a network connection to the vCenter Server. Refer to the vSphere Installation and Setup Guide for more information on setting up the ESXi Syslog Collector.
Configuring a Remote Management Host
Most ESXi host administration will be done through the vCenter Server, using the vSphere Client. There also will be occasions when remote command-line access is beneficial, such as for scripting, troubleshooting and some advanced configuration tuning. ESXi provides a rich set of APIs that are accessible using the VMware vSphere Command Line Interface (vCLI) and the Windows-based VMware vSphere PowerCLI.
Upgrading Virtual Machines
After you perform an upgrade, you must determine if you will also upgrade the virtual machines that reside on the upgraded hosts. Upgrading virtual machines ensures that they remain compatible with the upgraded host software and can take advantage of new features. Upgrading your virtual machines entails upgrading the version of VMware Tools as well as the virtual machine's virtual hardware version.
VMware Tools
The first step in upgrading virtual machines is to upgrade VMware Tools. vSphere 5.0 supports virtual machines running both VMware Tools version 4.x and 5.0. Running virtual machines with VMware Tools version 5.0 on older ESX/ESXi 4.x hosts is also supported. Therefore, virtual machines running VMware Tools 4.x or higher do not require upgrading following the ESXi host upgrade. However, only the upgraded virtual machines will benefit from the new features and latest performance benefits associated with the most recent version of VMware Tools.
Virtual Hardware
The second step in upgrading virtual machines is to upgrade the virtual hardware version. Before upgrading the virtual hardware, you must first upgrade the VMware Tools. The hardware version of a virtual machine reflects the virtual machine's supported virtual hardware features. These features correspond to the physical hardware available on the ESXi host on which you create the virtual machine. Virtual hardware features include BIOS and EFI, available virtual PCI slots, maximum number of CPUs, maximum memory configuration, and other characteristics typical to hardware.
One important consideration when upgrading the virtual hardware is that virtual machines running the latest virtual hardware version (version 8) can run only on ESXi 5.0 hosts. Do not upgrade the virtual hardware for virtual machines running in a mixed cluster made up of ESX/ESXi 4.x hosts and ESXi 5.0 hosts. Only upgrade a virtual machine's virtual hardware version after all the hosts in the cluster have been upgraded to ESXi 5.0. Upgrading the virtual machine's virtual hardware version is a one-way operation. There is no option to reverse the upgrade after it is done.
Orchestrated Upgrade of VMware Tools and Virtual Hardware
An orchestrated upgrade enables you to upgrade both the VMware Tools and the virtual hardware of the virtual machines in your vSphere inventory at the same time. Use Update Manager to perform an orchestrated upgrade. You can perform an orchestrated upgrade of virtual machines at the folder or datacenter level. Update Manager makes the process of upgrading the virtual machines convenient by providing baseline groups. When you remediate a virtual machine against a baseline group containing the VMware Tools Upgrade to Match Host baseline and the VM Hardware Upgrade to Match Host baseline, Update Manager sequences the upgrade operations in the correct order. As a result, the guest operating system is in a consistent state at the end of the upgrade.
Upgrading VMware vSphere VMFS
After you perform an ESX/ESXi upgrade, you might need to upgrade your VMFS to take advantage of the new features. vSphere 5.0 supports both VMFS version 3 and version 5, so it is not necessary to upgrade your VMFS volumes unless you need to leverage new 5.0 features. However, VMFS-5 offers a variety of new features, such as larger single-extent volumes (approximately 60TB), larger VMDKs (2TB) with a unified 1MB block size, a smaller subblock (8KB) to reduce the amount of stranded/unused space, and an improvement in performance and scalability via the implementation of the vSphere Storage API for Array Integration (VAAI) primitive Atomic Test & Set (ATS) across all datastore operations. VMware recommends that customers move to VMFS-5 to benefit from these features. A complete set of VMFS-5 enhancements can be found in the What's New in vSphere 5.0 Storage white paper.
Considerations: Upgrade to VMFS-5 or Create New VMFS-5
Although a VMFS-3 that is upgraded to VMFS-5 provides you with most of the same capabilities as a newly created VMFS-5, there are some differences. Both upgraded and newly created VMFS-5 support single-extent volumes up to approximately 60TB, and both support VMDK sizes of 2TB, no matter what the VMFS file block size is. However, the additional differences, although minor, should be considered when making a decision on upgrading to VMFS-5 or creating new VMFS-5 volumes.
VMFS-5 upgraded from VMFS-3 continues to use the previous file block size, which might be larger than the unified 1MB file block size. This can lead to stranded/unused disk space when there are many small files on the datastore.
VMFS-5 upgraded from VMFS-3 continues to use 64KB subblocks, not the new 8KB subblocks. This can also lead to stranded/unused disk space.
VMFS-5 upgraded from VMFS-3 continues to have a file limit of 30720, rather than the new file limit of >100000 for a newly created VMFS-5. This has an impact on the scalability of the file system.
For these reasons, VMware recommends using newly created VMFS-5 volumes if you have the luxury of doing so. You can then migrate the virtual machines from the original VMFS-3 to VMFS-5. If you do not have the available space to create new VMFS-5 volumes, upgrading VMFS-3 to VMFS-5 will still provide you with most of the benefits that come with a newly created VMFS-5.
Online Upgrade
If you do decide to upgrade VMFS-3 to VMFS-5, it is a simple, single-click operation. After you have upgraded the host to ESXi 5.0, go to the Configuration tab > Storage view. Select the VMFS-3 datastore. Above the Datastore Details window, an option to Upgrade to VMFS-5... will be displayed. The upgrade process is online and non-disruptive. Virtual machines can continue to run on the datastore while it is being upgraded.
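If you prefer the command line, the same online upgrade can also be started from the ESXi Shell with vmkfstools; a sketch, where <datastore-name> is a placeholder for your VMFS-3 volume label:
# vmkfstools -T /vmfs/volumes/<datastore-name>
As with the single-click path, the volume remains online during the upgrade, and the version checks described below still apply.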
Upgrading VMFS is a one-way operation. There is no option to reverse the upgrade after it is done. Also, after a file system has been upgraded, it will no longer be accessible by older ESX/ESXi 4.x hosts, so you must ensure that all hosts accessing the datastore are running ESXi 5.0. In fact, there are checks built into vSphere that will prevent you from upgrading to VMFS-5 if any of the hosts accessing the datastore are running a version of ESX/ESXi that is older than 5.0. As with any upgrade, VMware recommends that a backup of your virtual machines be made prior to upgrading your VMFS-3 to VMFS-5.
After the VMFS-5 volume is in place, the size can be extended to approximately 60TB, even if it is a single extent, and 2TB virtual machine disks (VMDKs) can be created, no matter what the underlying file block size is. These features are available out of the box, without any additional configuration steps. Refer to the vSphere Upgrade Guide for more information on features that require VMFS version 5, the differences between VMFS versions 3 and 5, and how to upgrade. The following table provides a matrix showing the supported VMware Tools, virtual hardware and VMFS versions in ESXi 5.0.
9. Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs
This section summarizes our findings and recommends best practices to tune the different layers of an application's environment for similar latency-sensitive workloads. By latency-sensitive, we mean workloads that require optimizing for a few microseconds to a few tens of microseconds end-to-end latencies; we don't mean workloads in the hundreds of microseconds to tens of milliseconds end-to-end latencies. In fact, many of the recommendations in this paper that can help with microsecond-level latency can actually end up hurting the performance of applications that are tolerant of higher latency. Please note that the exact benefits and effects of each of these configuration choices will be highly dependent upon the specific applications and workloads, so we strongly recommend experimenting with the different configuration options with your workload before deploying them in a production environment.
BIOS Settings
Most servers with new Intel and AMD processors provide power savings features that use several techniques to dynamically detect the load on a system and put various components of the server, including the CPU, chipsets, and peripheral devices, into low-power states when the system is mostly idle. There are two parts to power management on ESXi platforms:
1. The BIOS settings for power management, which influence what the BIOS advertises to the OS/hypervisor about whether it should be managing power states of the host or not.
2. The OS/hypervisor settings for power management, which influence the policies of what to do when it detects that the system is idle.
For latency-sensitive applications, any form of power management adds latency to the path where an idle system (in one of several power savings modes) responds to an external event. So our recommendation is to set the BIOS setting for power management to "static high," that is, no OS-controlled power management, effectively disabling any form of active power management. Note that achieving the lowest possible latency and saving power on the hosts and running the hosts cooler are fundamentally at odds with each other, so we recommend carefully evaluating the trade-offs of disabling any form of power management in order to achieve the lowest possible latencies for your application's needs.
Servers with Intel Nehalem class and newer (Intel Xeon 55xx and newer) CPUs also offer two other power management options: C-states and Intel Turbo Boost. Leaving C-states enabled can increase memory latency and is therefore not recommended for low-latency workloads. Even the enhanced C-state known as C1E introduces longer latencies to wake up the CPUs from halt (idle) states to full-power, so disabling C1E in the BIOS can further lower latencies. Intel Turbo Boost, on the other hand, will step up the internal frequency of the processor should the workload demand more power, and should be left enabled for low-latency, high-performance workloads. However, since Turbo Boost can over-clock portions of the CPU, it should be left disabled if the applications require stable, predictable performance and low latency with minimal jitter.
How power management-related settings are changed depends on the OEM make and model of the server. For example, for HP ProLiant servers:
Set the Power Regulator Mode to Static High Mode.
Disable Processor C-State Support.
Disable Processor C1E Support.
Disable QPI Power Management.
Enable Intel Turbo Boost.
For Dell PowerEdge servers:
Set the Power Management Mode to Maximum Performance.
Set the CPU Power and Performance Management Mode to Maximum Performance.
Processor Settings: set Turbo Mode to enabled.
Processor Settings: set C States to disabled.
NUMA
The high latency of accessing remote memory in NUMA (Non-Uniform Memory Access) architecture servers can add a non-trivial amount of latency to application performance. ESXi uses a sophisticated, NUMA-aware scheduler to dynamically balance processor load and memory locality. For best performance of latency-sensitive applications in guest OSes, all vCPUs should be scheduled on the same NUMA node and all VM memory should fit and be allocated out of the local physical memory attached to that NUMA node. Processor affinity for vCPUs to be scheduled on specific NUMA nodes, as well as memory affinity for all VM memory to be allocated from those NUMA nodes, can be set using the vSphere Client under VM Settings -> Options tab -> Advanced General -> Configuration Parameters and adding entries for numa.nodeAffinity=0, 1, ..., where 0, 1, etc. are the processor socket numbers.
Note that when you constrain NUMA node affinities, you might interfere with the ability of the NUMA scheduler to rebalance virtual machines across NUMA nodes for fairness. Specify NUMA node affinity only after you consider the rebalancing issues. Note also that when a VM is migrated (for example, using vMotion) to another host with a different NUMA topology, these advanced settings may not be optimal on the new host and could lead to sub-optimal performance of your application on the new host. You will need to re-tune these advanced settings for the NUMA topology of the new host.
ESXi 5.0 and newer also support vNUMA, where the underlying physical host's NUMA architecture can be exposed to the guest OS by providing certain ACPI BIOS tables for the guest OS to consume. Exposing the physical host's NUMA topology to the VM helps the guest OS kernel make better scheduling and placement decisions for applications to minimize memory access latencies. vNUMA is automatically enabled for VMs configured with more than 8 vCPUs that are wider than the number of cores per physical NUMA node. For certain latency-sensitive workloads running on physical hosts with fewer than 8 cores per physical NUMA node, enabling vNUMA may be beneficial. This is achieved by adding an entry for "numa.vcpu.min = N", where N is less than the number of vCPUs in the VM, in the vSphere Client under VM Settings -> Options tab -> Advanced General -> Configuration Parameters.
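Because these advanced options are ultimately plain key/value entries in the VM's configuration file, the steps above boil down to .vmx lines like the following sketch, in which the socket list and the value of N are illustrative assumptions rather than recommendations:
numa.nodeAffinity = "0,1"
numa.vcpu.min = "4"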
To learn more about this topic, please refer to the NUMA sections in the "vSphere Resource Management Guide" and the white paper explaining the vSphere CPU Scheduler: http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
Choice of Guest OS
Certain older guest OSes like RHEL5 incur higher virtualization overhead for various reasons, such as frequent accesses to virtual PCI devices for interrupt handling, frequent accesses to the virtual APIC (Advanced Programmable Interrupt Controller) for interrupt handling, high virtualization overhead when reading the current time, inefficient mechanisms to idle, and so on. Moving to a more modern guest OS (like SLES11 SP1 or RHEL6 based on 2.6.32 Linux kernels, or Windows Server 2008 or newer) minimizes these virtualization overheads significantly.
For example, RHEL6 is based on a tickless kernel, which means that it doesn't rely on high-frequency timer interrupts at all. For a mostly idle VM, this saves the power consumed when the guest wakes up for periodic timer interrupts, finds out there is no real work to do, and goes back to an idle state. Note, however, that tickless kernels like RHEL6 can incur higher overheads in certain latency-sensitive workloads because the kernel programs one-shot timers every time it wakes up from idle to handle an interrupt, while the legacy periodic timers are pre-programmed and don't have to be programmed every time the guest OS wakes up from idle. To override tickless mode and fall back to the legacy periodic timer mode for such modern versions of Linux, pass the nohz=off kernel boot-time parameter to the guest OS.
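On a typical grub-based Linux guest this means appending the parameter to the kernel line of the boot loader configuration; a sketch, in which the kernel image path and the other arguments are placeholders for whatever your guest already uses:
kernel /vmlinuz-<version> ro root=/dev/sda1 nohz=off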
These newer guest OSes also have better support for MSI-X (Message Signaled Interrupts), which are more efficient than legacy INT-x style APIC-based interrupts for interrupt delivery and acknowledgement from the guest OSes.
Since there is a certain overhead when reading the current time, due to the overhead in virtualizing various timer mechanisms, we recommend minimizing the frequency of reading the current time (using gettimeofday() or currentTimeMillis() calls) in your guest OS, either via the latency-sensitive application doing so directly, or via some other software component in the guest OS doing this. The overhead in reading the current time was especially bad in Linux versions older than RHEL 5.4, due to the underlying timer device they relied on as their time source and the overhead in virtualizing it. Versions of Linux after RHEL 5.4 incur significantly lower overhead when reading the current time.
To learn more about best practices for timekeeping in Linux guests, please see VMware KB 1006427: http://kb.vmware.com/kb/1006427. To learn more about how timekeeping works in VMware VMs, please read http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf.
Physical NIC Settings
Most 1GbE or 10GbE NICs (Network Interface Cards) support a feature called interrupt moderation or interrupt throttling, which coalesces interrupts from the NIC to the host so that the host doesn't get overwhelmed and spend all its CPU cycles processing interrupts. However, for latency-sensitive workloads, the time the NIC delays the delivery of an interrupt for a received packet, or for a packet that has successfully been sent on the wire, is time that increases the latency of the workload. Most NICs also provide a mechanism, usually via the ethtool command and/or module parameters, to disable interrupt moderation. Our recommendation is to disable physical NIC interrupt moderation on the ESXi host as follows:
# esxcli system module parameters set -m ixgbe -p "InterruptThrottleRate=0"
This example applies to the Intel 10GbE driver called ixgbe. You can find the appropriate module parameter for your NIC by first finding the driver using the ESXi command:
# esxcli network nic list
Then find the list of module parameters for the driver used:
# esxcli system module parameters list -m <driver>
Note that while disabling interrupt moderation on physical NICs is extremely helpful in reducing latency for latency-sensitive VMs, it can lead to some performance penalties for other VMs on the ESXi host, as well as higher CPU utilization to handle the higher rate of interrupts from the physical NIC. Disabling physical NIC interrupt moderation can also defeat the benefits of Large Receive Offloads (LRO), since some physical NICs (like Intel 10GbE NICs) that support LRO in hardware automatically disable it when interrupt moderation is disabled, and ESXi's implementation of software LRO has fewer packets to coalesce into larger packets on every interrupt. LRO is an important offload for driving high throughput for large-message transfers at reduced CPU cost, so this trade-off should be considered carefully.
Virtual NIC Settings
ESXi VMs can be configured to have one of the following types of virtual NICs (http://kb.vmware.com/kb/1001805): Vlance, VMXNET, Flexible, E1000, VMXNET 2 (Enhanced), or VMXNET 3. We recommend you choose VMXNET 3 virtual NICs for your latency-sensitive or otherwise performance-critical VMs. VMXNET 3 is the latest generation of our paravirtualized NICs, designed from the ground up for performance, and is not related to VMXNET or VMXNET 2 in any way. It offers several advanced features including multi-queue support: Receive Side Scaling, IPv4/IPv6 offloads, and MSI/MSI-X interrupt delivery. Modern enterprise Linux distributions based on 2.6.32 or newer kernels, like RHEL6 and SLES11 SP1, ship with out-of-the-box support for VMXNET 3 NICs.
VMXNET 3 by default also supports an adaptive interrupt coalescing algorithm, for the same reasons that physical NICs implement interrupt moderation. This virtual interrupt coalescing helps drive high throughputs to VMs with multiple vCPUs with parallelized workloads (for example, multiple threads), while at the same time striving to minimize the latency of virtual interrupt delivery. However, if your workload is extremely sensitive to latency, then we recommend you disable virtual interrupt coalescing for VMXNET 3 virtual NICs as follows. To do so through the vSphere Client, go to VM Settings -> Options tab -> Advanced General -> Configuration Parameters and add an entry for ethernetX.coalescingScheme with the value of disabled. Please note that this new configuration option is only available in ESXi 5.0 and later.
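In the VM's .vmx file this shows up as a single key/value line; a sketch for the first virtual NIC, where the ethernet0 index is an assumption and should match the NIC you are tuning:
ethernet0.coalescingScheme = "disabled"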
An alternative way to disable virtual interrupt coalescing for all virtual NICs on the host, which affects all VMs, not just the latency-sensitive ones, is by setting the advanced networking performance option CoalesceDefaultOn (Configuration -> Advanced Settings -> Net) to 0 (disabled). See http://communities.vmware.com/docs/DOC-10892 for details.
Another feature of VMXNET 3 that helps deliver high throughput with lower CPU utilization is Large Receive Offload (LRO), which aggregates multiple received TCP segments into a larger TCP segment before delivering it up to the guest TCP stack. However, for latency-sensitive applications that rely on TCP, the time spent aggregating smaller TCP segments into a larger one adds latency. It can also affect TCP algorithms like delayed ACK, which now cause the TCP stack to delay an ACK until the two larger TCP segments are received, also adding to the end-to-end latency of the application. Therefore, you should also consider disabling LRO if your latency-sensitive application relies on TCP. To do so for Linux guests, you need to reload the vmxnet3 driver in the guest:
# modprobe -r vmxnet3
Add the following line in /etc/modprobe.conf (Linux version dependent):
options vmxnet3 disable_lro=1
Then reload the driver using:
# modprobe vmxnet3
VM Settings
If your application is multi-threaded or consists of multiple processes that could benefit from using multiple CPUs, you can add more virtual CPUs (vCPUs) to your VM. However, for latency-sensitive applications, you should not overcommit vCPUs as compared to the number of pCPUs (processors) on your ESXi host. For example, if your host has 8 CPU cores, limit the number of vCPUs for your VM to 7. This will ensure that the ESXi vmkernel scheduler has a better chance of placing your vCPUs on pCPUs which won't be contended by other scheduling contexts, like vCPUs from other VMs or ESXi helper worlds.
If your application needs a large amount of physical memory when running unvirtualized, consider configuring your VM with a lot of memory as well, but again, try to refrain from overcommitting the amount of physical memory in the system. You can look at the memory statistics in the vSphere Client under the host's Resource Allocation tab under Memory -> Available Capacity to see how much memory you can configure for the VM after all the virtualization overheads are accounted for.
If you want to ensure that the VMkernel does not deschedule your VM when the vCPU is idle (most systems generally have brief periods of idle time, unless you're running an application which has a tight loop executing CPU instructions without taking a break or yielding the CPU), you can add the following configuration option. Go to VM Settings -> Options tab -> Advanced General -> Configuration Parameters and add monitor_control.halt_desched with the value of false.
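Equivalently, this is a single line in the VM's .vmx file; a minimal sketch:
monitor_control.halt_desched = "false"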
Note that this option should be considered carefully, because it will effectively force the vCPU to consume all of its allocated pCPU time, such that when that vCPU in the VM idles, the VM Monitor will spin on the CPU without yielding it to the VMkernel scheduler, until the vCPU needs to run in the VM again. However, for extremely latency-sensitive VMs which cannot tolerate the latency of being descheduled and scheduled, this option has been seen to help.
A slightly more power-conserving approach, which still results in lower latencies when the guest needs to be woken up soon after it idles, is to use the following advanced configuration parameters (see also http://kb.vmware.com/kb/1018276):
New in vSphere 5.5 is a VM option called Latency Sensitivity, which defaults to Normal. Setting this to High can yield significantly lower latencies and jitter, as a result of the following mechanisms that take effect in ESXi:
Exclusive access to physical resources, including pCPUs dedicated to vCPUs, with no contending threads for executing on these pCPUs.
Full memory reservation eliminates ballooning or hypervisor swapping, leading to more predictable performance with no latency overheads due to such mechanisms.
Halting in the VM Monitor when the vCPU is idle, leading to faster vCPU wake-up from halt, and bypassing the VMkernel scheduler for yielding the pCPU. This also conserves power, as halting makes the pCPU enter a low-power mode, compared to spinning in the VM Monitor with the monitor_control.halt_desched=FALSE option.
Disabling interrupt coalescing and LRO automatically for VMXNET 3 virtual NICs.
Optimized interrupt delivery path for VM DirectPath I/O and SR-IOV passthrough devices, using heuristics to derive hints from the guest OS about optimal placement of physical interrupt vectors on physical CPUs.
To learn more about this topic, please refer to the technical whitepaper: http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf
Hardware-Assisted Virtualization
Most recent processors from both Intel and AMD include hardware features to assist virtualization. These features were released in two generations: the first generation introduced CPU virtualization; the second generation added memory management unit (MMU) virtualization. For the best performance, make sure your system uses processors with second-generation hardware-assist features.
Hardware-Assisted CPU Virtualization (VT-x and AMD-V)
The first generation of hardware virtualization assistance, VT-x from Intel and AMD-V from AMD, became available in 2006. These technologies automatically trap sensitive events and instructions, eliminating the overhead required to do so in software. This allows the use of a hardware virtualization (HV) virtual machine monitor (VMM) as opposed to a binary translation (BT) VMM.
Hardware-Assisted MMU Virtualization (Intel EPT and AMD RVI)
More recent processors also include second-generation hardware virtualization assistance that addresses the overheads due to memory management unit (MMU) virtualization by providing hardware support to virtualize the MMU. ESXi supports this feature both in AMD processors, where it is called rapid virtualization indexing (RVI) or nested page tables (NPT), and in Intel processors, where it is called extended page tables (EPT). Hardware-assisted MMU virtualization allows an additional level of page tables that map guest physical memory to host physical memory addresses, eliminating the need for ESXi to maintain shadow page tables. This reduces memory consumption and speeds up workloads that cause guest operating systems to frequently modify page tables. While hardware-assisted MMU virtualization improves the performance of the vast majority of workloads, it does increase the time required to service a TLB miss, thus potentially reducing the performance of workloads that stress the TLB.
Hardware-Assisted I/O MMU Virtualization (VT-d and AMD-Vi)
An even newer processor feature is an I/O memory management unit that remaps I/O DMA transfers and device interrupts. This can allow virtual machines to have direct access to hardware I/O devices, such as network cards, storage controllers (HBAs) and GPUs. In AMD processors this feature is called AMD I/O Virtualization (AMD-Vi or IOMMU), and in Intel processors the feature is called Intel Virtualization Technology for Directed I/O (VT-d).
Hardware Storage Considerations
Storage performance is a vast topic that depends on workload, hardware, vendor, RAID level, cache size, stripe size, and so on. Consult the appropriate documentation from VMware as well as the storage vendor. Many workloads are very sensitive to the latency of I/O operations; it is therefore important to have storage devices configured correctly. VMware Storage vMotion performance, in particular, is heavily dependent on the available storage infrastructure bandwidth.
Bandwidth
Consider
choosing
storage
hardware
that
supports
VMware
vStorage
APIs
for
Array
Integration
(VAAI).
VAAI
can
improve
storage
scalability
by
offloading
some
operations
to
the
storage
hardware
instead
of
performing
them
in
ESXi.
On
SANs,
VAAI
offers
the
following
features:
Hardware-accelerated
cloning
(sometimes
called
full
copy
or
copy
offload)
frees
resources
on
the
host
and
can
speed
u
workloads
that
rely
on
cloning,
such
as
Storage
vMotion.
Block
zeroing
speeds
up
creation
of
eager-zeroed
thick
disks
and
can
improve
first-time
write
performance
on
lazy-zeroed
thick
disks
and
on
thin
disks.
Scalable
lock
management
(sometimes
called
atomic
test
and
set,
or
ATS)
can
reduce
locking-related
overheads,
speeding
up
thin-disk
expansion
as
well
as
many
other
administrative
and
file
system-intensive
tasks.
This
helps
improve
the
scalability
of
very
large
deployments
by
speeding
up
provisioning
operations
like
boot
storms,
expansion
of
thin
disks,
On NAS devices, VAAI offers the following features:
Hardware-accelerated cloning (sometimes called full copy or copy offload) frees resources on the host and can speed up workloads that rely on cloning. (Note that Storage vMotion does not make use of this feature on NAS devices.)
Space reservation allows ESXi to fully pre-allocate space for a virtual disk at the time the virtual disk is created. Thus, in addition to the thin provisioning and eager-zeroed thick provisioning options that non-VAAI NAS devices support, VAAI NAS devices also support lazy-zeroed thick provisioning.
Though the degree of improvement is dependent on the storage hardware, VAAI can reduce storage latency for several types of storage operations, can reduce the ESXi host CPU utilization for storage operations, and can reduce storage network traffic.
Performance design for a storage network must take into account the physical constraints of the network, not logical allocations. Using VLANs or VPNs does not provide a suitable solution to the problem of link oversubscription in shared configurations. VLANs and other virtual partitioning of a network provide a way of logically configuring a network, but don't change the physical capabilities of links and trunks between switches. If you have heavy disk I/O loads, you might need to assign separate storage processors (SPs) to separate systems to handle the amount of traffic bound for storage.
To optimize storage array performance, spread I/O loads over the available paths to the storage (that is, across multiple host bus adapters (HBAs) and storage processors). Make sure that end-to-end Fibre Channel speeds are consistent to help avoid performance problems. For more information, see KB article 1006602. Configure maximum queue depth for Fibre Channel HBA cards. For additional information see VMware KB article 1267.
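On ESXi 5.x the HBA queue depth is set through a driver module parameter; the sketch below assumes a QLogic FC HBA using the qla2xxx driver and a queue depth of 64 (check KB 1267 for the driver name and parameter that match your adapter), and the change takes effect after a reboot:
# esxcli system module parameters set -m qla2xxx -p ql2xmaxqdepth=64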
Applications or systems that write large amounts of data to storage, such as data acquisition or transaction logging systems, should not share Ethernet links to a storage device with other applications or systems. These types of applications perform best with dedicated connections to storage devices.
For iSCSI and NFS, make sure that your network topology does not contain Ethernet bottlenecks, where multiple links are routed through fewer links, potentially resulting in oversubscription and dropped network packets. Any time a number of links transmitting near capacity are switched to a smaller number of links, such oversubscription is a possibility. Recovering from these dropped network packets results in large performance degradation. In addition to the time spent determining that data was dropped, the retransmission uses network bandwidth that could otherwise be used for new transactions.
For iSCSI and NFS, if the network switch deployed for the data path supports VLAN, it might be beneficial to create a VLAN just for the ESXi host's vmknic and the iSCSI/NFS server. This minimizes network interference from other packet sources.
Be aware that with software-initiated iSCSI and NFS the network protocol processing takes place on the host system, and thus these might require more CPU resources than other storage options.
Local storage performance might be improved with write-back cache. If your local storage has write-back cache installed, make sure it's enabled and contains a functional battery module. For more information, see KB article 1006602.
Hardware Networking Considerations
Before undertaking any network optimization effort, you should understand the physical aspects of the network. The following are just a few aspects of the physical layout that merit close consideration:
Consider using server-class network interface cards (NICs) for the best performance.
Make sure the network infrastructure between the source and destination NICs doesn't introduce bottlenecks. For example, if both NICs are 10 Gigabit, make sure all cables and switches are capable of the same speed and that the switches are not configured to a lower speed.
For the best networking performance, we recommend the use of network adapters that support the following hardware features:
Checksum offload
TCP segmentation offload (TSO)
Ability to handle high-memory DMA (that is, 64-bit DMA addresses)
Ability to handle multiple Scatter Gather elements per Tx frame
Jumbo frames (JF)
Large receive offload (LRO)
On some 10 Gigabit Ethernet hardware network adapters, ESXi supports NetQueue, a technology that significantly improves performance of 10 Gigabit Ethernet network adapters in virtualized environments.
In addition to the PCI and PCI-X bus architectures, we now have the PCI Express (PCIe) architecture. Ideally, single-port 10 Gigabit Ethernet network adapters should use PCIe x8 (or higher) or PCI-X 266, and dual-port 10 Gigabit Ethernet network adapters should use PCIe x16 (or higher). There should preferably be no bridge chip (e.g., PCI-X to PCIe or PCIe to PCI-X) in the path to the actual Ethernet device (including any embedded bridge chip on the device itself), as these chips can reduce performance.
Multiple physical network adapters between a single virtual switch (vSwitch) and the physical network constitute a NIC team. NIC teams can provide passive failover in the event of hardware failure or network outage and, in some configurations, can increase performance by distributing the traffic across those physical network adapters.
Hardware BIOS Settings
General BIOS Settings
Make sure you are running the latest version of the BIOS available for your system. Make sure the BIOS is set to enable all populated processor sockets and to enable all cores in each socket. Enable Turbo Boost in the BIOS if your processors support it. Make sure hyper-threading is enabled in the BIOS for processors that support it.
Some NUMA-capable systems provide an option in the BIOS to disable NUMA by enabling node interleaving. In most cases you will get the best performance by disabling node interleaving (in other words, leaving NUMA enabled).
Make sure any hardware-assisted virtualization features (VT-x, AMD-V, EPT, RVI, and so on) are enabled in the BIOS. Disable from within the BIOS any devices you won't be using. This might include, for example, unneeded serial, USB, or network ports.
Cache prefetching mechanisms (sometimes called DPL Prefetch, Hardware Prefetcher, L2 Streaming Prefetch, or Adjacent Cache Line Prefetch) usually help performance, especially when memory access patterns are regular. When running applications that access memory randomly, however, disabling these mechanisms might result in improved performance.
If the BIOS allows the memory scrubbing rate to be configured, we recommend leaving it at the manufacturer's default setting.
Power Management BIOS Settings
VMware ESXi includes a full range of host power management capabilities in the software that can save power when a host is not fully utilized. We recommend that you configure your BIOS settings to allow ESXi the most flexibility in using (or not using) the power management features offered by your hardware, then make your power-management choices within ESXi. In order to allow ESXi to control CPU power-saving features, set power management in the BIOS to "OS Controlled Mode" or equivalent. Even if you don't intend to use these power-saving features, ESXi provides a convenient way to manage them.
Availability of the C1E halt state typically provides a reduction in power consumption with little or no impact on performance. When Turbo Boost is enabled, the availability of C1E can sometimes even increase the performance of certain single-threaded workloads. We therefore recommend that you enable C1E in BIOS. However, for a very few workloads that are highly sensitive to I/O latency, especially those with low CPU utilization, C1E can reduce performance. In these cases, you might obtain better performance by disabling C1E in BIOS, if that option is available.
C-states deeper than C1/C1E (i.e., C3, C6) allow further power savings, though with an increased chance of performance impacts. We recommend, however, that you enable all C-states in BIOS, then use ESXi host power management to control their use.
ESXi and Virtual Machines
ESXi General Considerations
This subsection provides guidance regarding a number of general performance considerations in ESXi. Plan your deployment by allocating enough resources for all the virtual machines you will run, as well as those needed by ESXi itself. Allocate to each virtual machine only as much virtual hardware as that virtual machine requires. Provisioning a virtual machine with more resources than it requires can, in some cases, reduce the performance of that virtual machine as well as other virtual machines sharing the same host.
Disconnect or disable any physical hardware devices that you will not be using. These might include devices such as:
o COM ports
o LPT ports
o USB controllers
o Floppy drives
o Optical drives (that is, CD or DVD drives)
o Network interfaces
o Storage controllers
Disabling hardware devices (typically done in BIOS) can free interrupt resources. Additionally, some devices, such as USB controllers, operate on a polling scheme that consumes extra CPU resources. Lastly, some PCI devices reserve blocks of memory, making that memory unavailable to ESXi.
Unused or unnecessary virtual hardware devices can impact performance and should be disabled. For example, Windows guest operating systems poll optical drives (that is, CD or DVD drives) quite frequently. When virtual machines are configured to use a physical drive, and multiple guest operating systems simultaneously try to access that drive, performance could suffer. This can be reduced by configuring the virtual machines to use ISO images instead of physical drives, and can be avoided entirely by disabling optical drives in virtual machines when the devices are not needed.
ESXi 5.0 introduces virtual hardware version 8. By creating virtual machines using this hardware version, or upgrading existing virtual machines to this version, a number of additional capabilities become available. Some of these, such as support for virtual machines with up to 1TB of RAM and up to 32 vCPUs, support for virtual NUMA, and support for 3D graphics, can improve performance for some workloads. This hardware version is not compatible with versions of ESXi prior to 5.0, however, and thus if a cluster of ESXi hosts will contain some hosts running pre-5.0 versions of ESXi, the virtual machines running on hardware version 8 will be constrained to run only on the ESXi 5.0 hosts. This could limit vMotion choices for Distributed Resource Scheduling (DRS) or Distributed Power Management (DPM).
ESXi CPU Considerations
CPU virtualization adds varying amounts of overhead depending on the percentage of the virtual machine's workload that can be executed on the physical processor as-is and the cost of virtualizing the remainder of the workload:
For many workloads, CPU virtualization adds only a very small amount of overhead, resulting in performance essentially comparable to native.
Many workloads to which CPU virtualization does add overhead are not CPU-bound; that is, most of their time is spent waiting for external events such as user interaction, device input, or data retrieval, rather than executing instructions. Because otherwise-unused CPU cycles are available to absorb the virtualization overhead, these workloads will typically have throughput similar to native, but potentially with a slight increase in latency.
For a small percentage of workloads, for which CPU virtualization adds overhead and which are CPU-bound, there might be a noticeable degradation in both throughput and latency.
If an ESXi host becomes CPU saturated (that is, the virtual machines and other loads on the host demand all the CPU resources the host has), latency-sensitive workloads might not perform well. In this case you might want to reduce the CPU load, for example by powering off some virtual machines or migrating them to a different host (or allowing DRS to migrate them automatically). It is a good idea to periodically monitor the CPU usage of the host. This can be done through the vSphere Client or by using esxtop or resxtop. Below we describe how to interpret esxtop data:
o If the load average on the first line of the esxtop CPU panel is equal to or greater than 1, this indicates that the system is overloaded.
o The usage percentage for the physical CPUs on the PCPU line can be another indication of a possibly overloaded condition. In general, 80% usage is a reasonable ceiling and 90% should be a warning that the CPUs are approaching an overloaded condition. However, organizations will have varying standards regarding the desired load percentage.
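For remote monitoring from a vCLI or vMA system, a typical resxtop invocation is sketched below; the host address and user name are placeholders, and you will be prompted for the password:
# resxtop --server <ESXi-host-ip> --username <username>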
Configuring a virtual machine with more virtual CPUs (vCPUs) than its workload can use might cause slightly increased resource usage, potentially impacting performance on very heavily loaded systems. Common examples of this include a single-threaded workload running in a multiple-vCPU virtual machine or a multi-threaded workload in a virtual machine with more vCPUs than the workload can effectively use.
Most guest operating systems execute an idle loop during periods of inactivity. Within this loop, most of these guest operating systems halt by executing the HLT or MWAIT instructions. Some older guest operating systems (including Windows 2000 (with certain HALs), Solaris 8 and 9, and MS-DOS), however, use busy-waiting within their idle loops. This results in the consumption of resources that might otherwise be available for other uses (other virtual machines, the VMkernel, and so on). ESXi automatically detects these loops and de-schedules the idle vCPU. Though this reduces the CPU overhead, it can also reduce the performance of some I/O-heavy workloads. For additional information see VMware KB articles 1077 and 2231.
The guest operating system's scheduler might migrate a single-threaded workload amongst multiple vCPUs, thereby losing cache locality.
UP vs. SMP HALs/Kernels
NOTE: When changing an existing virtual machine running Windows from multi-core to single-core, the HAL usually remains SMP. For best performance, the HAL should be manually changed back to UP.
Hyper-Threading
Hyper-threading technology (sometimes also called simultaneous multithreading, or SMT) allows a single physical processor core to behave like two logical processors, essentially allowing two independent threads to run simultaneously. Unlike having twice as many processor cores, which can roughly double performance, hyper-threading can provide anywhere from a slight to a significant increase in system performance by keeping the processor pipeline busier. If the hardware and BIOS support hyper-threading, ESXi automatically makes use of it. For the best performance we recommend that you enable hyper-threading.
Be careful when using CPU affinity on systems with hyper-threading. Because the two logical processors share most of the processor resources, pinning vCPUs, whether from different virtual machines or from a single SMP virtual machine, to both logical processors on one core (CPUs 0 and 1, for example) could cause poor performance.
ESXi provides configuration parameters for controlling the scheduling of virtual machines on hyper-threaded systems (Edit virtual machine settings > Resources tab > Advanced CPU). When choosing hyper-threaded core sharing choices, the Any option (which is the default) is almost always preferred over None. The None option indicates that when a vCPU from this virtual machine is assigned to a logical processor, no other vCPU, whether from the same virtual machine or from a different virtual machine, should be assigned to the other logical processor that resides on the same core. That is, each vCPU from this virtual machine should always get a whole core to itself and the other logical CPU on that core should be placed in the halted state. This option is like disabling hyper-threading for that one virtual machine.
For nearly all workloads, custom hyper-threading settings are not necessary. In cases of unusual workloads that interact badly with hyper-threading, however, choosing the None hyper-threading option might help performance. For example, even though the ESXi scheduler tries to dynamically run higher-priority virtual machines on a whole core for longer durations, you can further isolate a high-priority virtual machine from interference by other virtual machines by setting its hyper-threading sharing property to None.
Non-Uniform Memory Access (NUMA)
By default, ESXi NUMA scheduling and related optimizations are enabled only on systems with a total of at least four CPU cores and with at least two CPU cores per NUMA node. On such systems, virtual machines can be separated into the following two categories:
Virtual machines with a number of vCPUs equal to or less than the number of cores in each physical NUMA node. These virtual machines will be assigned to cores all within a single NUMA node and will be preferentially allocated memory local to that NUMA node. This means that, subject to memory availability, all their memory accesses will be local to that NUMA node, resulting in the lowest memory access latencies.
Virtual machines with more vCPUs than the number of cores in each physical NUMA node (called wide virtual machines). These virtual machines will be assigned to two (or more) NUMA nodes and will be preferentially allocated memory local to those NUMA nodes. Because vCPUs in these wide virtual machines might sometimes need to access memory outside their own NUMA node, they might experience higher average memory access latencies than virtual machines that fit entirely within a NUMA node.
Host Power Management in ESXi
Power Policy Options in ESXi
ESXi 5.0 offers the following power policy options:
High performance - This power policy maximizes performance, using no power management features.
Balanced - This power policy (the default in ESXi 5.0) is designed to reduce host power consumption while having little or no impact on performance.
Low power - This power policy is designed to more aggressively reduce host power consumption at the risk of reduced performance.
Custom - This power policy starts out the same as Balanced, but allows for the modification of individual parameters.
While the default power policy in ESX/ESXi 4.1 was High performance, in ESXi 5.0 the default is now Balanced. This power policy will typically not impact the performance of CPU-intensive workloads. Rarely, however, the Balanced policy might slightly reduce the performance of latency-sensitive workloads. In these cases, selecting the High performance power policy will provide the full hardware performance.
ESXi Memory Considerations
Memory Overhead
Virtualization causes an increase in the amount of physical memory required, due to the extra memory needed by ESXi for its own code and for data structures. This additional memory requirement can be separated into two components:
1. A system-wide memory space overhead for the VMkernel and various host agents (hostd, vpxa, etc.).
2. An additional memory space overhead for each virtual machine.
The per-virtual-machine memory space overhead can be further divided into the following categories:
Memory reserved for the virtual machine executable (VMX) process. This is used for data structures needed to bootstrap and support the guest (i.e., thread stacks, text, and heap).
Memory reserved for the virtual machine monitor (VMM). This is used for data structures required by the virtual hardware (i.e., TLB, memory mappings, and CPU state).
Memory reserved for various virtual devices (i.e., mouse, keyboard, SVGA, USB, etc.).
Memory reserved for other subsystems, such as the kernel, management agents, etc.
The amounts of memory reserved for these purposes depend on a variety of factors, including the number of vCPUs, the configured memory for the guest operating system, whether the guest operating system is 32-bit or 64-bit, and which features are enabled for the virtual machine. For more information about these overheads, see vSphere Resource Management.
Memory Sizing
You should allocate enough memory to hold the working set of applications you will run in the virtual machine, thus minimizing thrashing. You should also avoid over-allocating memory. Allocating more memory than needed unnecessarily increases the virtual machine memory overhead, thus consuming memory that could be used to support more virtual machines.
Memory Overcommit Techniques
ESXi uses five memory management mechanisms (page sharing, ballooning, memory compression, swap to host cache, and regular swapping) to dynamically reduce the amount of machine physical memory required for each virtual machine.
Page Sharing: ESXi uses a proprietary technique to transparently and securely share memory pages between virtual machines, thus eliminating redundant copies of memory pages. In most cases, page sharing is used by default regardless of the memory demands on the host system. (The exception is when using large pages, as discussed in Large Memory Pages for Hypervisor and Guest Operating System on page 28.)
Ballooning: If the virtual machine's memory usage approaches its memory target, ESXi will use ballooning to reduce that virtual machine's memory demands. Using a VMware-supplied vmmemctl module installed in the guest operating system as part of the VMware Tools suite, ESXi can cause the guest operating system to relinquish the memory pages it considers least valuable. Ballooning provides performance closely matching that of a native system under similar memory constraints. To use ballooning, the guest operating system must be configured with sufficient swap space.
Memory Compression: If the virtual machine's memory usage approaches the level at which host-level swapping will be required, ESXi will use memory compression to reduce the number of memory pages it will need to swap out. Because the decompression latency is much smaller than the swap-in latency, compressing memory pages has significantly less impact on performance than swapping out those pages.
Swap to Host Cache: If memory compression doesn't keep the virtual machine's memory usage low enough, ESXi will next forcibly reclaim memory using host-level swapping to a host cache (if one has been configured). Swap to host cache is a new feature in ESXi 5.0 that allows users to configure a special swap cache on SSD storage. In most cases this host cache (being on SSD) will be much faster than the regular swap files (typically on hard disk storage), significantly reducing access latency. Thus, although some of the pages ESXi swaps out might be active, swap to host cache has a far lower performance impact than regular host-level swapping.
Regular Swapping: If the host cache becomes full, or if a host cache has not been configured, ESXi will next reclaim memory from the virtual machine by swapping out pages to a regular swap file. Like swap to host cache, some of the pages ESXi swaps out might be active. Unlike swap to host cache, however, this mechanism can cause virtual machine performance to degrade significantly due to its high access latency.
While ESXi uses page sharing, ballooning, memory compression, and swap to host cache to allow significant memory overcommitment, usually with little or no impact on performance, you should avoid overcommitting memory to the point that active memory pages are swapped out with regular host-level swapping.
In the vSphere Client, select the virtual machine in question, select the Performance tab, then look at the value of Memory Balloon (Average). An absence of ballooning suggests that ESXi is not under heavy memory pressure and thus memory overcommitment is not affecting the performance of that virtual machine.
In the vSphere Client, select the virtual machine in question, select the Performance tab, then compare the values of Consumed Memory and Active Memory. If consumed is higher than active, this suggests that the guest is currently getting all the memory it requires for best performance.
In the vSphere Client, select the virtual machine in question, select the Performance tab, then look at the values of Swap-In and Decompress. Swapping in and decompressing at the host level indicate more significant memory pressure.
Check for guest operating system swap activity within that virtual machine. This can indicate that ballooning might be starting to impact performance, though swap activity can also be related to other issues entirely within the guest (or can be an indication that the guest memory size is simply too small).
Memory Swapping Optimizations

Because ESXi uses page sharing, ballooning, and memory compression to reduce the need for host-level memory swapping, don't disable these techniques. If you choose to overcommit memory with ESXi, be sure you have sufficient swap space on your ESXi system.
At the time a virtual machine is first powered on, ESXi creates a swap file for that virtual machine equal in size to the difference between the virtual machine's configured memory size and its memory reservation. The available disk space must therefore be at least this large (plus the space required for VMX swap, as described in Memory Overhead on page 25).
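The swap file sizing rule above reduces to simple arithmetic, sketched below. The VMX-swap figure is a placeholder assumption, since its real size depends on the virtual machine's configuration (see Memory Overhead on page 25).

```python
# A small sketch of the swap file sizing rule described above: the per-VM
# swap file equals configured memory size minus the memory reservation.
# VMX swap is a separate placeholder input, since its actual size depends
# on the VM configuration.

def vm_swap_file_mb(configured_mem_mb: int, reservation_mb: int) -> int:
    if reservation_mb > configured_mem_mb:
        raise ValueError("reservation cannot exceed configured memory")
    return configured_mem_mb - reservation_mb

def required_disk_mb(configured_mem_mb: int, reservation_mb: int,
                     vmx_swap_mb: int) -> int:
    # Disk must hold the swap file plus the VMX swap space.
    return vm_swap_file_mb(configured_mem_mb, reservation_mb) + vmx_swap_mb

if __name__ == "__main__":
    # An 8192MB VM with a 2048MB reservation needs a 6144MB swap file.
    print(vm_swap_file_mb(8192, 2048))        # 6144
    print(required_disk_mb(8192, 2048, 100))  # 6244 (100MB VMX swap assumed)
```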
You can optionally configure a special host cache on an SSD (if one is installed) to be used for the new swap to host cache feature.

NOTE Placing the regular swap file on SSD and using swap to host cache on SSD (as described above) are two different approaches to improving host swapping performance. Because it is unusual to have enough SSD space for a host's entire swap file needs, we recommend using local SSD for swap to host cache.

If you can't use SSD storage, place the regular swap file on the fastest available storage. This might be a Fibre Channel SAN array or a fast local disk.

Placing swap files on local storage (whether SSD or hard drive) could potentially reduce vMotion performance. This is because if a virtual machine has memory pages in a local swap file, they must be swapped in to memory before a vMotion operation on that virtual machine can proceed.

Regardless of the storage type or location used for the regular swap file, for the best performance, and to avoid the possibility of running out of space, swap files should not be placed on thin-provisioned storage.
Large Memory Pages for Hypervisor and Guest Operating System

In addition to the usual 4KB memory pages, ESXi also provides 2MB memory pages (commonly referred to as large pages). By default ESXi assigns these 2MB machine memory pages to guest operating systems that request them, giving the guest operating system the full advantage of using large pages. The use of large pages results in reduced memory management overhead and can therefore increase hypervisor performance. If an operating system or application can benefit from large pages on a native system, that operating system or application can potentially achieve a similar performance improvement on a virtual machine backed with 2MB machine memory pages.

Use of large pages can also change page sharing behavior. While ESXi ordinarily uses page sharing regardless of memory demands, it does not share large pages. Therefore with large pages, page sharing might not occur until memory over-commitment is high enough to require the large pages to be broken into small pages.
ESXi Storage Considerations

VMware vStorage APIs for Array Integration (VAAI)

For the best storage performance, consider using VAAI-capable storage hardware. The performance gains from VAAI (described in Hardware Storage Considerations on page 11) can be especially noticeable in VDI environments (where VAAI can improve boot-storm and desktop workload performance), in large data centers (where VAAI can improve the performance of mass virtual machine provisioning and of thin-provisioned virtual disks), and in other large-scale deployments.
LUN Access Methods, Virtual Disk Modes, and Virtual Disk Types

You can use RDMs in virtual compatibility mode or physical compatibility mode:

- Virtual mode specifies full virtualization of the mapped device, allowing the guest operating system to treat the RDM like any other virtual disk file in a VMFS volume.

- Physical mode specifies minimal SCSI virtualization of the mapped device, allowing the greatest flexibility for SAN management software or other SCSI target-based software running in the virtual machine.
ESXi supports multiple virtual disk types:

- Thick: Thick virtual disks, which have all their space allocated at creation time, are further divided into two types: eager zeroed and lazy zeroed.

  - Eager-zeroed: An eager-zeroed thick disk has all space allocated and zeroed out at the time of creation. This increases the time it takes to create the disk, but results in the best performance, even on the first write to each block.

  - Lazy-zeroed: A lazy-zeroed thick disk has all space allocated at the time of creation, but each block is zeroed only on first write. This results in a shorter creation time, but reduced performance the first time a block is written to. Subsequent writes, however, have the same performance as on eager-zeroed thick disks.

- Thin: Space required for a thin-provisioned virtual disk is allocated and zeroed upon first write, as opposed to upon creation. There is a higher I/O cost (similar to that of lazy-zeroed thick disks) during the first write to an unwritten file block, but on subsequent writes thin-provisioned disks have the same performance as eager-zeroed thick disks (see the sketch after this list).
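The trade-off among the three disk types can be summarized as a small cost model. The sketch below uses made-up latency numbers chosen only to show the relationships (eager-zeroed pays the zeroing cost at creation; the other two pay it on first write); it is not measured data.

```python
# Illustrative model of first-write versus steady-state behavior for the
# three virtual disk types described above. The latency figures are
# placeholder assumptions, not measurements.

WRITE_US = 200          # assumed cost of a normal block write (microseconds)
ZERO_PENALTY_US = 300   # assumed extra cost to zero (and allocate) a block

def first_write_us(disk_type: str) -> int:
    if disk_type == "eager-zeroed thick":
        return WRITE_US                    # zeroed at creation time
    if disk_type in ("lazy-zeroed thick", "thin"):
        return WRITE_US + ZERO_PENALTY_US  # zeroing deferred to first write
    raise ValueError(f"unknown disk type: {disk_type}")

def subsequent_write_us(disk_type: str) -> int:
    return WRITE_US  # all three types converge after the first write

if __name__ == "__main__":
    for d in ("eager-zeroed thick", "lazy-zeroed thick", "thin"):
        print(f"{d:22s} first={first_write_us(d)}us "
              f"subsequent={subsequent_write_us(d)}us")
```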
Partition Alignment

The alignment of file system partitions can impact performance. VMware makes the following recommendations for VMFS partitions: like other disk-based filesystems, VMFS filesystems suffer a performance penalty when the partition is unaligned. Using the vSphere Client to create VMFS partitions avoids this problem, since, beginning with ESXi 5.0, it automatically aligns VMFS3 or VMFS5 partitions along the 1MB boundary.
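As a quick illustration of the alignment rule, the following sketch checks whether a partition's starting offset falls on a 1MB boundary; the sector size and example offsets are assumptions.

```python
# A minimal check of the 1MB alignment rule mentioned above: a partition is
# aligned if its starting byte offset is an exact multiple of 1MB.

SECTOR_BYTES = 512
ONE_MB = 1024 * 1024

def is_1mb_aligned(start_sector: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    return (start_sector * sector_bytes) % ONE_MB == 0

if __name__ == "__main__":
    print(is_1mb_aligned(2048))  # True: 2048 * 512 bytes = 1MB boundary
    print(is_1mb_aligned(63))    # False: legacy 63-sector offset is unaligned
```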
SAN Multipathing

By default, ESXi uses the Most Recently Used (MRU) path policy for devices on Active/Passive storage arrays. Do not use the Fixed path policy for Active/Passive storage arrays, in order to avoid LUN path thrashing.

NOTE With some Active/Passive storage arrays that support ALUA (described below), ESXi can use the Fixed path policy without risk of LUN path thrashing.

By default, ESXi uses the Fixed path policy for devices on Active/Active storage arrays. When using this policy you can maximize the utilization of your bandwidth to the storage array by designating preferred paths to each LUN through different storage controllers. For more information, see the VMware SAN Configuration Guide.

In addition to the Fixed and MRU path policies, ESXi can also use the Round Robin path policy, which can improve storage performance in some environments. The Round Robin policy provides load balancing by cycling I/O requests through all Active paths, sending a fixed (but configurable) number of I/O requests through each one in turn.

If your storage array supports ALUA (Asymmetric Logical Unit Access), enabling this feature on the array can improve storage performance in some environments. ALUA, which is automatically detected by ESXi, allows the array itself to designate paths as Active Optimized. When ALUA is combined with the Round Robin path policy, ESXi cycles I/O requests through these Active Optimized paths.
Storage I/O Resource Allocation

VMware vSphere provides mechanisms to dynamically allocate storage I/O resources, allowing critical workloads to maintain their performance even during peak load periods when there is contention for I/O resources. This allocation can be performed at the level of the individual host or for an entire datastore.

The storage I/O resources available to an ESXi host can be proportionally allocated to the virtual machines running on that host by using the vSphere Client to set disk shares for those virtual machines (select Edit virtual machine settings, choose the Resources tab, select Disk, then change the Shares field).

The maximum storage I/O resources available to each virtual machine can be set using limits. These limits, set in I/O operations per second (IOPS), can be used to provide strict isolation and control for certain workloads. By default these limits are set to unlimited; when set to any other value, ESXi enforces them even if the underlying datastores are not fully utilized.
An entire datastore's I/O resources can be proportionally allocated to the virtual machines accessing that datastore using Storage I/O Control (SIOC). When enabled, SIOC evaluates the disk share values set for all virtual machines accessing a datastore and allocates that datastore's resources accordingly. SIOC can be enabled using the vSphere Client (select a datastore, choose the Configuration tab, click Properties... (at the far right), then under Storage I/O Control add a checkmark to the Enabled box).

With SIOC disabled (the default), all hosts accessing a datastore get an equal portion of that datastore's resources, and any share values determine only how each host's portion is divided amongst its virtual machines. With SIOC enabled, the disk shares are evaluated globally, and the portion of the datastore's resources each host receives depends on the sum of the shares of the virtual machines running on that host relative to the sum of the shares of all the virtual machines accessing that datastore.
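The difference between local and global share evaluation is easiest to see as arithmetic. The sketch below models the SIOC-enabled case described above; the host names and share values are illustrative assumptions.

```python
# A sketch of the global share evaluation described above. With SIOC
# enabled, each host's slice of a datastore's I/O resources is proportional
# to the sum of shares of its VMs relative to all VMs using the datastore.

def host_portions(shares_by_host: dict[str, list[int]]) -> dict[str, float]:
    """Map each host to its fraction of the datastore's I/O resources."""
    total = sum(sum(v) for v in shares_by_host.values())
    return {host: sum(v) / total for host, v in shares_by_host.items()}

if __name__ == "__main__":
    # hostA runs two VMs (1000 + 2000 shares); hostB runs one VM (1000 shares).
    cluster = {"hostA": [1000, 2000], "hostB": [1000]}
    for host, frac in host_portions(cluster).items():
        print(f"{host}: {frac:.0%} of datastore I/O resources")
    # hostA: 75%, hostB: 25% -- unlike the SIOC-disabled 50/50 split.
```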
General ESXi Storage Recommendations

I/O latency statistics can be monitored using esxtop (or resxtop), which reports device latency, time spent in the kernel, and latency seen by the guest operating system.

Make sure that the average latency for storage devices is not too high. This latency can be seen in esxtop (or resxtop) by looking at the GAVG/cmd metric. A reasonable upper value for this metric depends on your storage subsystem. If you use SIOC, you can use your SIOC setting as a guide: your GAVG/cmd value should be well below your SIOC setting. The default SIOC setting is 30 ms, but if you have very fast storage (SSDs, for example) you might have reduced that value. For further information on average latency see VMware KB article 1008205.
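A simple way to apply this guidance programmatically is to compare an observed GAVG/cmd value against the SIOC congestion threshold, as in the sketch below. The 50% "well below" margin is an assumption, not a VMware-defined constant.

```python
# A small helper sketch for the esxtop guidance above: compare an observed
# GAVG/cmd latency against the SIOC congestion threshold. The margin is an
# illustrative interpretation of "well below".

def latency_ok(gavg_ms: float, sioc_threshold_ms: float = 30.0,
               margin: float = 0.5) -> bool:
    """True if observed device latency is comfortably under the threshold."""
    return gavg_ms < sioc_threshold_ms * margin

if __name__ == "__main__":
    print(latency_ok(8.0))    # True: 8 ms is well below the 30 ms default
    print(latency_ok(25.0))   # False: too close to the threshold
    print(latency_ok(4.0, sioc_threshold_ms=10.0))  # fast SSD-backed setting
```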
You can adjust the maximum number of outstanding disk requests per VMFS volume, which can help equalize the bandwidth across virtual machines using that volume. For further information see VMware KB article 1268.

If you will not be using Storage I/O Control and often observe QFULL/BUSY errors, enabling and configuring queue depth throttling might improve storage performance. This feature can significantly reduce the number of commands returned from the array with a QFULL/BUSY error. If any system accessing a particular LUN or storage array port has queue depth throttling enabled, all systems (both ESX hosts and other systems) accessing that LUN or storage array port should use an adaptive queue depth algorithm. Note that queue depth throttling is not compatible with Storage DRS. For more information about both QFULL/BUSY errors and this feature see KB article 1008113.
Running Storage Latency Sensitive Applications

By default the ESXi storage stack is configured to drive high storage throughput at low CPU cost. While this default configuration provides better scalability and higher consolidation ratios, it comes at the cost of potentially higher storage latency. Applications that are highly sensitive to storage latency might therefore benefit from the following:

Adjust the host power management settings. Some of the power management features in newer server hardware can increase storage latency. Disable them as follows:

- Set the ESXi host power policy to Maximum performance (as described in Host Power Management in ESXi on page 23; this is the preferred method) or disable power management in the BIOS (as described in Power Management BIOS Settings on page 14).

- Disable C1E and other C-states in the BIOS (as described in Power Management BIOS Settings on page 14).

- Enable Turbo Boost in the BIOS (as described in General BIOS Settings on page 14).
ESXi Networking Considerations

In a native environment, CPU utilization plays a significant role in network throughput: to process higher levels of throughput, more CPU resources are needed. The effect of CPU resource availability on the network throughput of virtualized applications is even more significant. Because insufficient CPU resources will limit maximum throughput, it is important to monitor the CPU utilization of high-throughput workloads.

Use separate virtual switches, each connected to its own physical network adapter, to avoid contention between the VMkernel and virtual machines, especially virtual machines running heavy networking workloads.

To establish a network connection between two virtual machines that reside on the same ESXi system, connect both virtual machines to the same virtual switch. If the virtual machines are connected to different virtual switches, traffic will go through the wire and incur unnecessary CPU and network overhead.
Network I/O Control (NetIOC)

Network I/O Control (NetIOC) allows the allocation of network bandwidth to network resource pools. You can either select from among seven predefined resource pools (Fault Tolerance traffic, iSCSI traffic, vMotion traffic, management traffic, vSphere Replication (VR) traffic, NFS traffic, and virtual machine traffic) or you can create user-defined resource pools. Each resource pool is associated with a portgroup and, optionally, assigned a specific 802.1p priority level.

Network bandwidth can be allocated to resource pools using either shares or limits:

- Shares can be used to allocate to a resource pool a proportion of a network link's bandwidth equivalent to the ratio of its shares to the total shares. If a resource pool doesn't use its full allocation, the unused bandwidth is available for use by other resource pools.

- Limits can be used to set a resource pool's maximum bandwidth utilization (in Mbps) from a host through a specific virtual distributed switch (vDS). These limits are enforced even if a vDS is not saturated, potentially limiting a resource pool's bandwidth while simultaneously leaving some bandwidth unused. On the other hand, if a resource pool's bandwidth utilization is less than its limit, the unused bandwidth is available to other resource pools.

NetIOC can guarantee bandwidth for specific needs and can prevent any one resource pool from impacting the others.
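The shares-plus-limits behavior described above can be modeled as a small water-filling allocation: divide bandwidth in proportion to shares, cap any pool at its limit, and redistribute what the capped pools leave unused. The sketch below is a model of that behavior, not the vDS implementation; the pool names and numbers are illustrative assumptions.

```python
# A sketch of NetIOC-style allocation on one link: bandwidth is divided in
# proportion to shares, each pool is capped at its limit (if any), and
# bandwidth unused due to caps is redistributed among the remaining pools.

def allocate(link_mbps: float, pools: dict[str, tuple[int, float | None]]):
    """pools maps name -> (shares, limit_mbps or None). Returns name -> Mbps."""
    alloc = {name: 0.0 for name in pools}
    active = dict(pools)
    remaining = link_mbps
    while active and remaining > 1e-9:
        total_shares = sum(s for s, _ in active.values())
        capped = {}
        for name, (shares, limit) in active.items():
            fair = remaining * shares / total_shares
            if limit is not None and alloc[name] + fair >= limit:
                capped[name] = limit - alloc[name]  # pool hits its cap
        if not capped:
            for name, (shares, _) in active.items():
                alloc[name] += remaining * shares / total_shares
            break
        for name, extra in capped.items():
            alloc[name] += extra
            remaining -= extra
            del active[name]
    return alloc

if __name__ == "__main__":
    pools = {"vMotion": (50, 2000), "NFS": (100, None), "VM": (100, None)}
    print(allocate(10000.0, pools))
    # vMotion is capped at 2000 Mbps; its leftover share is split between
    # NFS and VM, so each receives 4000 Mbps of the 10Gb link.
```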
DirectPath I/O

vSphere DirectPath I/O leverages Intel VT-d and AMD-Vi hardware support (described in Hardware-Assisted I/O MMU Virtualization (VT-d and AMD-Vi) on page 10) to allow guest operating systems to directly access hardware devices. In the case of networking, DirectPath I/O allows the virtual machine to access a physical NIC directly rather than using an emulated device (E1000) or a paravirtualized device (VMXNET, VMXNET3). While DirectPath I/O provides limited increases in throughput, it reduces CPU cost for networking-intensive workloads.

DirectPath I/O is not compatible with certain core virtualization features, however. The list varies with the hardware on which ESXi is running:

- New for vSphere 5.0, when ESXi is running on certain configurations of the Cisco Unified Computing System (UCS) platform, DirectPath I/O for networking is compatible with vMotion, physical NIC sharing, snapshots, and suspend/resume. It is not compatible with Fault Tolerance, NetIOC, memory overcommit, VMCI, or VMSafe.

- For server hardware other than the Cisco UCS platform, DirectPath I/O is not compatible with vMotion, physical NIC sharing, snapshots, suspend/resume, Fault Tolerance, NetIOC, memory overcommit, or VMSafe.

Typical virtual machines and their workloads don't require the use of DirectPath I/O. For workloads that are very networking intensive and don't need the core virtualization features mentioned above, however, DirectPath I/O might be useful to reduce CPU usage.
SplitRx Mode

SplitRx mode, a new feature in ESXi 5.0, uses multiple physical CPUs to process network packets received in a single network queue. This feature can significantly improve network performance for certain workloads, including:

- Multiple virtual machines on one ESXi host all receiving multicast traffic from the same source. (SplitRx mode will typically improve throughput and CPU efficiency for these workloads.)

- Traffic via the vNetwork Appliance (DVFilter) API between two virtual machines on the same ESXi host. (SplitRx mode will typically improve throughput and maximum packet rates for these workloads.)

This feature, which is supported only for VMXNET3 virtual network adapters, is individually configured for each virtual NIC using the ethernetX.emuRxMode variable in each virtual machine's .vmx file (where X is replaced with the network adapter's ID). The possible values for this variable are:

ethernetX.emuRxMode = "0" (disables splitRx mode for ethernetX)
ethernetX.emuRxMode = "1" (enables splitRx mode for ethernetX)
To change this variable through the vSphere Client:

1. Select the virtual machine you wish to change, then click Edit virtual machine settings.
2. Under the Options tab, select General, then click Configuration Parameters.
3. Look for ethernetX.emuRxMode (where X is the number of the desired NIC). If the variable isn't present, click Add Row and enter it as a new variable.
4. Click on the value to be changed and configure it as you wish.

The change will not take effect until the virtual machine has been restarted.
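For a powered-off virtual machine whose .vmx file you can reach, the same change can be made by editing the file directly, as in the sketch below. The helper and the example file content are hypothetical, and hand-editing .vmx files assumes the VM is powered off and you have datastore access; the vSphere Client procedure above remains the supported route.

```python
# An offline sketch: setting ethernetX.emuRxMode directly in .vmx text.
# Hypothetical helper for illustration; not a VMware-provided tool.

import re

def set_emu_rx_mode(vmx_text: str, nic_index: int, enabled: bool) -> str:
    """Return vmx_text with ethernet<N>.emuRxMode set to "1" or "0"."""
    key = f"ethernet{nic_index}.emuRxMode"
    line = f'{key} = "{"1" if enabled else "0"}"'
    pattern = re.compile(rf'^{re.escape(key)}\s*=.*$', re.MULTILINE)
    if pattern.search(vmx_text):
        return pattern.sub(line, vmx_text)               # update existing row
    return vmx_text.rstrip("\n") + "\n" + line + "\n"    # or append a new row

if __name__ == "__main__":
    sample = 'ethernet0.present = "TRUE"\n'
    print(set_emu_rx_mode(sample, 0, enabled=True))
    # ethernet0.present = "TRUE"
    # ethernet0.emuRxMode = "1"
```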
Running Network Latency Sensitive Applications

By default the ESXi network stack is configured to drive high network throughput at low CPU cost. While this default configuration provides better scalability and higher consolidation ratios, it comes at the cost of potentially higher network latency. Applications that are highly sensitive to network latency might therefore benefit from the following:

- Use VMXNET3 virtual network adapters.

- Adjust the host power management settings (set the host power policy to Maximum performance, disable C1E and other C-states, and enable Turbo Boost in the BIOS).

- Disable VMXNET3 virtual interrupt coalescing for the desired NIC. In some cases this can improve performance for latency-sensitive applications; in other cases (most notably applications with high numbers of outstanding network requests) it can reduce performance.
Guest Operating Systems

- Install the latest version of VMware Tools in the guest operating system.

- Disable screen savers and Window animations in virtual machines.

- Schedule backups and virus scanning programs in virtual machines to run at off-peak hours.

For the most accurate timekeeping, consider configuring your guest operating system to use NTP, Windows Time Service, the VMware Tools time-synchronization option, or another timekeeping utility suitable for your operating system. We recommend, however, that within any particular virtual machine you use either the VMware Tools time-synchronization option or another timekeeping utility, but not both.
Measuring Performance in Virtual Machines

Be careful when measuring performance from within virtual machines. Timing numbers measured from within virtual machines can be inaccurate, especially when the processor is overcommitted.

NOTE One possible approach to this issue is to use a guest operating system that has good timekeeping behavior when run in a virtual machine, such as a guest that uses the NO_HZ kernel configuration option (sometimes called "tickless timer"). More information about this topic can be found in Timekeeping in VMware Virtual Machines (http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf).

Measuring performance from within virtual machines can fail to take into account resources used by ESXi for tasks it offloads from the guest operating system, as well as resources consumed by virtualization overhead.
Guest Operating System CPU Considerations

Many operating systems keep time by counting timer interrupts. The timer interrupt rates vary between different operating systems and versions. For example:

- Unpatched 2.4 and earlier Linux kernels typically request timer interrupts at 100 Hz (that is, 100 interrupts per second), though this can vary with version and distribution.

- Linux kernels have used a variety of timer interrupt rates, including 100 Hz, 250 Hz, and 1000 Hz, again varying with version and distribution.

- The most recent 2.6 Linux kernels introduce the NO_HZ kernel configuration option (sometimes called "tickless timer"), which uses a variable timer interrupt rate.

- Microsoft Windows operating system timer interrupt rates are specific to the version of Microsoft Windows and the Windows HAL that is installed. Windows systems typically use a base timer interrupt rate of 64 Hz or 100 Hz.

- Running applications that make use of the Microsoft Windows multimedia timer functionality can increase the timer interrupt rate. For example, some multimedia applications or Java applications increase the timer interrupt rate to approximately 1000 Hz.
In addition to the timer interrupt rate, the total number of timer interrupts delivered to a virtual machine also depends on a number of other factors: virtual machines running SMP HALs/kernels (even if they are running on a UP virtual machine) require more timer interrupts than those running UP HALs/kernels, and the more vCPUs a virtual machine has, the more interrupts it requires.

Delivering many virtual timer interrupts negatively impacts virtual machine performance and increases host CPU consumption (a rough arithmetic sketch follows the list below). If you have a choice, use guest operating systems that require fewer timer interrupts. For example:
- If you have a UP virtual machine, use a UP HAL/kernel.

- In some Linux versions, such as RHEL 5.1 and later, the divider=10 kernel boot parameter reduces the timer interrupt rate to one tenth of its default rate. See VMware KB article 1006427 for further information.

- Kernels with tickless-timer support (NO_HZ kernels) do not schedule periodic timers to maintain system time. As a result, these kernels reduce the overall average rate of virtual timer interrupts, thus improving system performance and scalability on hosts running large numbers of virtual machines.
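As rough arithmetic, the aggregate timer interrupt load on a host is the sum over virtual machines of vCPU count times guest timer rate, which shows why lower-rate or tickless guests scale better. The VM mix used below is an illustrative assumption.

```python
# Rough arithmetic sketch of aggregate timer interrupt load on a host, per
# the discussion above: each vCPU of a VM receives interrupts at the
# guest's timer rate, so host load grows with both rates and vCPU counts.

def host_timer_interrupts_per_sec(vms: list[tuple[int, int]]) -> int:
    """vms is a list of (vcpus, timer_rate_hz) pairs."""
    return sum(vcpus * rate for vcpus, rate in vms)

if __name__ == "__main__":
    # Ten 2-vCPU Windows VMs at 64 Hz, plus five 4-vCPU Linux VMs at 1000 Hz.
    mix = [(2, 64)] * 10 + [(4, 1000)] * 5
    print(host_timer_interrupts_per_sec(mix))  # 21280 interrupts/second
```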
Virtual NUMA (vNUMA)

Virtual NUMA (vNUMA), a new feature in ESXi 5.0, exposes NUMA topology to the guest operating system, allowing NUMA-aware guest operating systems and applications to make the most efficient use of the underlying hardware's NUMA architecture. Virtual NUMA, which requires virtual hardware version 8, can provide significant performance benefits, though the benefits depend heavily on the level of NUMA optimization in the guest operating system and applications.

You can obtain the maximum performance benefits from vNUMA if your clusters are composed entirely of hosts with matching NUMA architectures. This is because the very first time a vNUMA-enabled virtual machine is powered on, its vNUMA topology is set based in part on the NUMA topology of the underlying physical host on which it is running. Once a virtual machine's vNUMA topology is initialized it doesn't change unless the number of vCPUs in that virtual machine is changed. This means that if a vNUMA virtual machine is moved to a host with a different NUMA topology, the virtual machine's vNUMA topology might no longer be optimal for the underlying physical NUMA topology, potentially resulting in reduced performance.
Size your virtual machines so they align with physical NUMA boundaries. For example, if you have a host system with six cores per NUMA node, size your virtual machines with a multiple of six vCPUs (i.e., 6 vCPUs, 12 vCPUs, 18 vCPUs, 24 vCPUs, and so on).

NOTE Some multi-core processors have NUMA node sizes that are different than the number of cores per socket. For example, some 12-core processors have two six-core NUMA nodes per processor.
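The sizing rule above is a simple divisibility check, sketched below. The NUMA node size must come from your actual hardware and, per the NOTE above, may differ from the core count per socket; the six-core value here is an assumption.

```python
# A minimal sizing check for the vNUMA guidance above: flag vCPU counts
# that are not a multiple of the physical NUMA node size.

def vnuma_aligned(vcpus: int, cores_per_numa_node: int) -> bool:
    return vcpus % cores_per_numa_node == 0

if __name__ == "__main__":
    for vcpus in (4, 6, 8, 12):
        ok = vnuma_aligned(vcpus, cores_per_numa_node=6)
        print(f"{vcpus:2d} vCPUs: {'aligned' if ok else 'misaligned'}")
    # 6 and 12 align with six-core NUMA nodes; 4 and 8 do not.
```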
Guest Operating System Storage Considerations

The default virtual storage adapter in ESXi 5.0 is either BusLogic Parallel, LSI Logic Parallel, or LSI Logic SAS, depending on the guest operating system and the virtual hardware version. However, ESXi also includes a paravirtualized SCSI storage adapter, PVSCSI (also called VMware Paravirtual). The PVSCSI adapter offers a significant reduction in CPU utilization as well as potentially increased throughput compared to the default virtual storage adapters, and is thus the best choice for environments with very I/O-intensive guest applications.

The depth of the queue of outstanding commands in the guest operating system SCSI driver can significantly impact disk performance. A queue depth that is too small, for example, limits the disk bandwidth that can be pushed through the virtual machine.

In some cases large I/O requests issued by applications in a virtual machine can be split by the guest storage driver. Changing the guest operating system's registry settings to issue larger block sizes can eliminate this splitting, thus enhancing performance. For additional information see VMware KB article 9645697.

Make sure the disk partitions within the guest are aligned.
Guest Operating System Networking Considerations

The default virtual network adapter emulated in a virtual machine is either an AMD PCnet32 device (vlance) or an Intel E1000 device (E1000). However, VMware also offers the VMXNET family of paravirtualized network adapters, which provide better performance than these default adapters and should be used for optimal performance within any guest operating system for which they are available.

For the best performance, use the VMXNET3 paravirtualized network adapter for operating systems in which it is supported. This requires that the virtual machine use virtual hardware version 7 or later, and that VMware Tools be installed in the guest operating system.

The VMXNET3, Enhanced VMXNET, and E1000 devices support jumbo frames for better performance. (Note that the vlance device does not support jumbo frames.) To enable jumbo frames, set the MTU size to 9000 in both the guest network driver and the virtual switch configuration. The physical NICs at both ends and all the intermediate hops/routers/switches must also support jumbo frames.
In ESXi, TCP Segmentation Offload (TSO) is enabled by default in the VMkernel, but is supported in virtual machines only when they are using the VMXNET3 device, the Enhanced VMXNET device, or the E1000 device. TSO can improve performance even if the underlying hardware does not support TSO.

In some cases, low receive throughput in a virtual machine can be caused by insufficient receive buffers in the receiver's network device. If the receive ring in the guest operating system's network driver overflows, packets will be dropped in the VMkernel, degrading network throughput. A possible workaround is to increase the number of receive buffers, though this might increase the host physical CPU workload.

For VMXNET, the default number of receive and transmit buffers is 100 each, with the maximum possible being 128. For Enhanced VMXNET, the default numbers of receive and transmit buffers are 150 and 256, respectively, with the maximum possible receive buffers being 512. You can alter these settings by changing the buffer size defaults in the .vmx (configuration) files for the affected virtual machines. For additional information see VMware KB article 1010071.
Receive-side scaling (RSS) allows network packet receive processing to be scheduled in parallel on multiple CPUs. Without RSS, receive interrupts can be handled on only one CPU at a time. With RSS, received packets from a single NIC can be processed on multiple CPUs concurrently. This helps receive throughput in cases where a single CPU would otherwise be saturated with receive processing and become a bottleneck. To prevent out-of-order packet delivery, RSS schedules all of a flow's packets to the same CPU.
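The in-order guarantee comes from hashing each flow to a fixed CPU. The toy model below illustrates the idea; real NICs use a Toeplitz hash over the 5-tuple, for which Python's built-in hash() merely stands in.

```python
# A toy model of the RSS property described above: a hash of the flow tuple
# picks the CPU, so every packet of one flow lands on the same CPU while
# different flows spread across CPUs. Python's hash() is randomized across
# runs but stable within one process, which suffices for this demo.

NUM_CPUS = 4

def rss_cpu(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Map a flow to a CPU index; the same flow always maps to the same CPU."""
    return hash((src_ip, src_port, dst_ip, dst_port)) % NUM_CPUS

if __name__ == "__main__":
    flows = [("10.0.0.1", 51000 + i, "10.0.0.2", 80) for i in range(6)]
    for f in flows:
        # Repeated packets of a flow hash identically -> in-order delivery.
        assert rss_cpu(*f) == rss_cpu(*f)
        print(f"flow src_port={f[1]} -> CPU {rss_cpu(*f)}")
```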
Virtual Infrastructure Management

Use resource settings (that is, Reservation, Shares, and Limits) only if needed in your environment.

If you expect frequent changes to the total available resources, use Shares, not Reservation, to allocate resources fairly across virtual machines. If you use Shares and you subsequently upgrade the hardware, each virtual machine stays at the same relative priority (keeps the same number of shares) even though each share represents a larger amount of memory or CPU.

Use Reservation to specify the minimum acceptable amount of CPU or memory, not the amount you would like to have available. After all resource reservations have been met, ESXi allocates the remaining resources based on the number of shares and the resource limits configured for your virtual machine. When specifying the reservations for virtual machines, always leave some headroom for memory virtualization overhead and migration overhead. In a DRS-enabled cluster, reservations that fully commit the capacity of the cluster or of individual hosts in the cluster can prevent DRS from migrating virtual machines between hosts. As you approach fully reserving all capacity in the system, it also becomes increasingly difficult to make changes to reservations and to the resource pool hierarchy without violating admission control.
VMware vCenter

This section lists VMware vCenter practices and configurations recommended for optimal performance. It also includes a few features that are controlled or accessed through vCenter.

The performance of vCenter Server is dependent in large part on the number of managed entities (hosts and virtual machines) and the number of connected VMware vSphere Clients. Exceeding the maximums specified in Configuration Maximums for VMware vSphere 5.0, in addition to being unsupported, is thus likely to impact vCenter Server performance.

Whether run on virtual machines or physical systems, make sure you provide vCenter Server and the vCenter Server database with sufficient CPU, memory, and storage resources for your deployment size.

To minimize the latency of vCenter operations, keep to a minimum the number of network hops between the vCenter Server system and the vCenter Server database.

Although VMware vCenter Update Manager can be run on the same system and use the same database as vCenter Server, for maximum performance, especially on heavily loaded vCenter systems, consider running Update Manager on its own system and providing it with a dedicated database. Similarly, VMware vCenter Converter can be run on the same system as vCenter Server, but doing so might impact performance, especially on heavily loaded vCenter systems.
VMware vCenter Database Considerations

VMware vCenter Database Network and Storage Considerations

To minimize the latency of operations between vCenter Server and the database, keep to a minimum the number of network hops between the vCenter Server system and the database system.

The hardware on which the vCenter database is stored, and the arrangement of the files on that hardware, can have a significant effect on vCenter performance. The vCenter database performs best when its files are placed on high-performance storage. The database data files generate mostly random read I/O traffic, while the database transaction logs generate mostly sequential write I/O traffic. For this reason, and because their traffic is often significant and simultaneous, vCenter performs best when these two file types are placed on separate storage resources that share neither disks nor I/O bandwidth.
VMware vCenter Database Configuration and Maintenance

Configure the vCenter statistics level to a setting appropriate for your uses. This setting can range from 1 to 4, but a setting of 1 is recommended for most situations. Higher settings can slow the vCenter Server system. You can also selectively disable statistics rollups for particular collection levels.

To avoid frequent log file switches, ensure that your vCenter database logs are sized appropriately for your vCenter inventory. For example, with a large vCenter inventory running with an Oracle database, the size of each redo log should be at least 512MB.

vCenter Server starts up with a database connection pool of 50 threads. This pool is then dynamically sized, growing adaptively as needed based on the vCenter Server workload, and does not require modification. However, if a heavy workload is expected on the vCenter Server, the size of this pool at startup can be increased, to a maximum of 128 threads. Note that this might result in increased memory consumption by vCenter Server and slower vCenter Server startup.

Update the statistics of the tables and indexes on a regular basis for better overall performance of the database. As part of the regular database maintenance activity, check the fragmentation of the index objects and recreate indexes if needed (i.e., if fragmentation is more than about 30%).
Microsoft SQL Server Database Recommendations

If you are using a Microsoft SQL Server database, the following points can improve vCenter Server performance:

- Setting the transaction logs to Simple recovery mode significantly reduces the database logs' disk space usage as well as their storage I/O load. If it isn't possible to set this to Simple, make sure to have a high-performance storage subsystem.

- To further improve database performance for large inventories, place tempDB on a different disk than either the database data files or the database transaction logs.

- We recommend a fill factor of about 70% for the four VPX_HIST_STAT tables (vpx_hist_stat1, vpx_hist_stat2, vpx_hist_stat3, and vpx_hist_stat4). If the fill factor is set too high, the server must take time splitting pages when they fill up. If the fill factor is set too low, the database will be larger than necessary due to the unused space on each page, thus increasing the number of pages that need to be read during normal operations.
Oracle Database Recommendations

If you are using an Oracle database, the following points can improve vCenter Server performance:

- When using Automatic Memory Management (AMM) in Oracle 11g, or Automatic Shared Memory Management (ASMM) in Oracle 10g, allocate sufficient memory for the Oracle database.

- Set appropriate PROCESSES or SESSIONS initialization parameters. Oracle creates a new server process for every new connection that is made to it. The number of connections an application can make to the Oracle instance thus depends on how many processes Oracle can create. PROCESSES and SESSIONS together determine how many simultaneous connections Oracle can accept. In large vSphere environments (as defined in vSphere Installation and Setup for vSphere 5.0) we recommend setting PROCESSES to 800.

- If database operations are slow, after checking that the statistics are up to date and the indexes are not fragmented, you should move the indexes to separate tablespaces (i.e., place the tables and the primary key (PK) constraint index in one tablespace and the other indexes (i.e., BTree) in another tablespace).

- For large inventories (i.e., those that approach the limits for the number of hosts or virtual machines), increase the db_writer_processes parameter to 4.
the virtual machines. This is one more reason to configure virtual machines with only as many vCPUs and only as much virtual memory as they need.

Have virtual machines in DRS automatic mode when possible, as they are considered for cluster load balancing migrations across the ESXi hosts before the virtual machines that are not in automatic mode.

Powered-on virtual machines consume memory resources (and typically consume some CPU resources) even when idle. Thus even idle virtual machines, though their utilization is usually small, can affect DRS decisions. For this and other reasons, a marginal performance increase might be obtained by shutting down or suspending virtual machines that are not being used.
Resource pools help improve manageability and troubleshooting of performance problems. We recommend, however, that resource pools and virtual machines not be made siblings in a hierarchy. Instead, each level should contain only resource pools or only virtual machines.

DRS affinity rules can keep two or more virtual machines on the same ESXi host (VM/VM affinity) or make sure they are always on different hosts (VM/VM anti-affinity). DRS affinity rules can also be used to make sure a group of virtual machines runs only on (or has a preference for) a specific group of ESXi hosts (VM/Host affinity) or never runs on (or has a preference against) a specific group of hosts (VM/Host anti-affinity).

In most cases leaving the affinity settings unchanged will provide the best results. In rare cases, however, specifying affinity rules can help improve performance.
To change affinity settings, select a cluster from within the vSphere Client, choose the Summary tab, click Edit Settings, choose Rules, click Add, enter a name for the new rule, choose a rule type, and proceed through the GUI as appropriate for the rule type you selected.

Besides the default setting, the affinity setting types are:

- Keep Virtual Machines Together: This affinity type can improve performance due to lower latencies of communication between machines.

- Separate Virtual Machines: This affinity type can maintain maximal availability of the virtual machines. For instance, if they are both web server front ends to the same application, you might want to make sure that they don't both go down at the same time. Also, co-location of I/O-intensive virtual machines could end up saturating the host I/O capacity, leading to performance degradation. DRS currently does not make virtual machine placement decisions based on their I/O resource usage.

- Virtual Machines to Hosts (including Must run on..., Should run on..., Must not run on..., and Should not run on...): These affinity types can be useful for clusters with software licensing restrictions or specific availability zone requirements.
To allow DRS the maximum flexibility:

- Place virtual machines on shared datastores accessible from all hosts in the cluster.

- Make sure virtual machines are not connected to host devices that would prevent them from moving off of those hosts.

The drmdump files produced by DRS can be very useful in diagnosing potential DRS performance issues during a support call. For particularly active clusters, or those with more than about 16 hosts, it can be helpful to keep more such files than can fit in the default maximum drmdump directory size of 20MB. This maximum can be increased using the DumpSpace option, which can be set using DRS Advanced Options.
Cluster Sizing and Resource Settings

Exceeding the maximum number of hosts, virtual machines, or resource pools for each DRS cluster specified in Configuration Maximums for VMware vSphere 5.0 is not supported. Even if it seems to work, doing so could adversely affect vCenter Server or DRS performance.

Carefully select the resource settings (that is, reservations, shares, and limits) for your virtual machines. Setting reservations too high can leave few unreserved resources in the cluster, thus limiting the options DRS has to balance load. Setting limits too low could keep virtual machines from using extra resources available in the cluster to improve their performance. Use reservations to guarantee the minimum requirement a virtual machine needs, rather than what you might like it to get. Note that shares take effect only when there is resource contention. Note also that additional resources reserved for virtual machine memory overhead need to be accounted for when sizing resources in the cluster.

If the overall cluster capacity might not meet the needs of all virtual machines during peak hours, you can assign relatively higher shares to virtual machines or resource pools hosting mission-critical applications to reduce the performance interference from less-critical virtual machines.

If you will be using vMotion, it's a good practice to leave some unused CPU capacity in your cluster. As described in VMware vMotion on page 51, when a vMotion operation is started, ESXi reserves some CPU resources for that operation.
DRS Performance Tuning

The migration threshold for fully automated DRS (cluster > DRS tab > Edit... > vSphere DRS) allows the administrator to control the aggressiveness of the DRS algorithm. The migration threshold should be set to more aggressive levels when the following conditions are satisfied:

- The hosts in the cluster are relatively homogeneous.

- The virtual machines' resource utilization does not vary much over time and you have relatively few constraints on where a virtual machine can be placed.

The migration threshold should be set to more conservative levels in the converse situations.

NOTE If the most conservative threshold is chosen, DRS will apply only those move recommendations that must be taken either to satisfy hard constraints, such as affinity or anti-affinity rules, or to evacuate virtual machines from a host entering maintenance or standby mode.
VMware Distributed Power Management (DPM)

VMware Distributed Power Management (DPM) conserves power by migrating virtual machines to fewer hosts when utilization is low. DPM is most appropriate for clusters in which composite virtual machine demand varies greatly over time; for example, clusters in which overall demand is higher during the day and significantly lower at night. If demand is consistently high relative to overall cluster capacity, DPM will have little opportunity to put hosts into standby mode to save power.

Because DPM uses DRS, most DRS best practices (described in VMware Distributed Resource Scheduler (DRS) on page 52) are relevant to DPM as well.

DPM considers historical demand in determining how much capacity to keep powered on, and it keeps some excess capacity available for changes in demand. DPM will also power on additional hosts when needed for unexpected increases in the demand of existing virtual machines or to allow virtual machine admission.

The aggressiveness of the DPM algorithm can be tuned by adjusting the DPM Threshold in the cluster settings menu. This parameter controls how far outside the target utilization range per-host resource utilization can be before DPM makes host power-on/power-off recommendations. The default setting for the threshold is 3 (medium aggressiveness).

For datacenters that often have unexpected spikes in virtual machine resource demands, you can use the DPM advanced options MinPoweredOnCpuCapacity (default 1 MHz) and MinPoweredOnMemCapacity (default 1 MB) to ensure that a minimum amount of CPU or memory capacity is kept powered on in the cluster.
DPM can be disabled on individual hosts that are running mission-critical virtual machines, and VM/Host affinity rules can be used to ensure that these virtual machines are not migrated away from those hosts.

DPM can be enabled or disabled on a predetermined schedule using Scheduled Tasks in vCenter Server. When DPM is disabled, all hosts in a cluster will be powered on. This might be useful, for example, to reduce the delay in responding to load spikes expected at certain times of the day, or to reduce the likelihood of some hosts being left in standby for extended periods.

In a cluster with VMware High Availability (HA) enabled, DRS/DPM maintains excess powered-on capacity to meet the High Availability settings. The cluster might therefore not allow additional virtual machines to be powered on, and/or some hosts might not be powered down, even when the cluster appears to be sufficiently idle. These factors should be considered when configuring HA.

If VMware HA is enabled in a cluster, DPM always keeps a minimum of two hosts powered on. This is true even if HA admission control is disabled or if no virtual machines are powered on.
VMware Storage Distributed Resource Scheduler (Storage DRS)

A new feature in vSphere 5.0, Storage Distributed Resource Scheduler (Storage DRS) provides I/O load balancing across datastores within a datastore cluster (a new vCenter object). This load balancing can avoid storage performance bottlenecks or address them if they occur.

When deciding which datastores to group into a datastore cluster, try to choose datastores that are as homogeneous as possible in terms of host interface protocol (i.e., FCP, iSCSI, NFS), RAID level, and performance characteristics. We recommend not mixing SSDs and hard disks in the same datastore cluster.

While a datastore cluster can have as few as two datastores, the more datastores a datastore cluster has, the more flexibility Storage DRS has to better balance that cluster's I/O load.

As you add workloads you should monitor datastore I/O latency in the performance chart for the datastore cluster, particularly during peak hours. If most or all of the datastores in a datastore cluster consistently operate with latencies close to the congestion threshold used by Storage I/O Control (set to 30ms by default, but sometimes tuned to reflect the needs of a particular deployment), this might be an indication that there aren't enough spare I/O resources left in the datastore cluster. In this case, consider adding more datastores to the datastore cluster or reducing the load on that datastore cluster.
NOTE Make sure, when adding more datastores to increase I/O resources in the datastore cluster, that your changes do actually add resources, rather than simply creating additional ways to access the same underlying physical disks.

By default, Storage DRS affinity rules keep all of a virtual machine's virtual disks on the same datastore (using intra-VM affinity). However, you can give Storage DRS more flexibility in I/O load balancing, potentially increasing performance, by overriding the default intra-VM affinity rule. This can be done for either a specific virtual machine (from the vSphere Client, select Edit Settings > Virtual Machine Settings, then deselect Keep VMDKs together) or for the entire datastore cluster (from the vSphere Client, select Home > Inventory > Datastores and Datastore Clusters, select a datastore cluster, select the Storage DRS tab, click Edit, select Virtual Machine Settings, then deselect Keep VMDKs together).

Inter-VM anti-affinity rules can be used to keep the virtual disks from two or more different virtual machines from being placed on the same datastore, potentially improving performance in some situations. They can be used, for example, to separate the storage I/O of multiple workloads that tend to have simultaneous but intermittent peak loads, preventing those peak loads from combining to stress a single datastore.
VMware High Availability

VMware High Availability (HA) minimizes virtual machine downtime by monitoring hosts, virtual machines, or applications within virtual machines, then, in the event a failure is detected, restarting virtual machines on alternate hosts.

When vSphere HA is enabled in a cluster, all active hosts (those not in standby mode, maintenance mode, or disconnected) participate in an election to choose the master host for the cluster; all other hosts become slaves. The master has a number of responsibilities, including monitoring the state of the hosts in the cluster, protecting the powered-on virtual machines, initiating failover, and reporting cluster health state to vCenter Server. The master is elected based on the properties of the hosts, with preference being given to the one connected to the greatest number of datastores. Serving in the role of master will have little or no effect on a host's performance.

When the master host can't communicate with a slave host over the management network, the master uses datastore heartbeating to determine the state of that slave host. By default, vSphere HA uses two datastores for heartbeating, resulting in very low false failover rates. In order to reduce the chances of false failover even further (at the potential cost of a very slight performance impact), you can use the advanced option das.heartbeatdsperhost to change the number of heartbeat datastores (up to a maximum of five).

Enabling HA on a host reserves some host resources for HA agents, slightly reducing the available host capacity for powering on virtual machines. When HA is enabled, vCenter Server reserves sufficient unused resources in the cluster to support the failover capacity specified by the chosen admission control policy. This can reduce the number of virtual machines the cluster can support.
VMware Fault Tolerance

For each virtual machine there are two FT-related actions that can be taken: turning FT on or off, and enabling or disabling FT.

Turning on FT prepares the virtual machine for FT by prompting for the removal of unsupported devices, disabling unsupported features, and setting the virtual machine's memory reservation to be equal to its memory size (thus avoiding ballooning or swapping). Enabling FT performs the actual creation of the secondary virtual machine by live-migrating the primary. Each of these operations has performance implications.

Don't turn on FT for a virtual machine unless you will be using (i.e., enabling) FT for that machine. Turning on FT automatically disables some features for the specific virtual machine that can help performance, such as hardware virtual MMU (if the processor supports it).

Enabling FT for a virtual machine uses additional resources (for example, the secondary virtual machine uses as much CPU and memory as the primary virtual machine). Therefore make sure you are prepared to devote the resources required before enabling FT.

The live migration that takes place when FT is enabled can briefly saturate the vMotion network link and can also cause spikes in CPU utilization. If the vMotion network link is also being used for other operations, such as FT logging (the transmission of all the primary virtual machine's inputs (incoming network traffic, disk reads, etc.) to the secondary host), the performance of those other operations can be impacted. For this reason it is best to have separate and dedicated NICs (or use Network I/O Control, described in Network I/O Control (NetIOC) on page 34) for FT logging traffic and vMotion, especially when multiple FT virtual machines reside on the same host. Because this potentially resource-intensive live migration takes place each time FT is enabled, we recommend that FT not be frequently enabled and disabled.
FT-enabled virtual machines must use eager-zeroed thick-provisioned virtual disks. Thus when FT is enabled for a virtual machine with thin-provisioned virtual disks or lazy-zeroed thick-provisioned virtual disks, these disks need to be converted. This one-time conversion process uses fewer resources when the virtual machine is on storage hardware that supports VAAI (described in Hardware Storage Considerations on page 11).

Because FT logging traffic is asymmetric (the majority of the traffic flows from primary to secondary), congestion on the logging NIC can be reduced by distributing primaries onto multiple hosts. For example, on a cluster with two ESXi hosts and two virtual machines with FT enabled, placing one of the primary virtual machines on each of the hosts allows the network bandwidth to be utilized bidirectionally.

FT virtual machines that receive large amounts of network traffic or perform lots of disk reads can create significant bandwidth on the NIC specified for the logging traffic. This is true of machines that routinely do these things as well as machines doing them only intermittently, such as during a backup operation. To avoid saturating the network link used for logging traffic, limit the number of FT virtual machines on each host or limit the disk read bandwidth and network receive bandwidth of those virtual machines.

Make sure the FT logging traffic is carried by at least a Gigabit-rated NIC (which should in turn be connected to at least Gigabit-rated network infrastructure).

Avoid placing more than four FT-enabled virtual machines on a single host. In addition to reducing the possibility of saturating the network link used for logging traffic, this also limits the number of simultaneous live migrations needed to create new secondary virtual machines in the event of a host failure.
If the secondary virtual machine lags too far behind the primary (which usually happens when the primary virtual machine is CPU bound and the secondary virtual machine is not getting enough CPU cycles), the hypervisor might slow the primary to allow the secondary to catch up. The following recommendations help avoid this situation:

- Make sure the hosts on which the primary and secondary virtual machines run are relatively closely matched, with similar CPU make, model, and frequency.

- Make sure that power management scheme settings (both in the BIOS and in ESXi) that cause CPU frequency scaling are consistent between the hosts on which the primary and secondary virtual machines run.

- Enable CPU reservations for the primary virtual machine (which will be duplicated for the secondary virtual machine) to ensure that the secondary gets CPU cycles when it requires them.

Though timer interrupt rates do not significantly affect FT performance, high timer interrupt rates create additional network traffic on the FT logging NICs. Therefore, if possible, reduce timer interrupt rates as described in Guest Operating System CPU Considerations on page 39.
VMware vCenter Update Manager

VMware vCenter Update Manager provides a patch management framework for VMware vSphere. It can be used to apply patches, updates, and upgrades to VMware ESX and ESXi hosts, VMware Tools and virtual hardware, and so on.

Update Manager Setup and Configuration

- When there are more than 300 virtual machines or more than 30 hosts, separate the Update Manager database from the vCenter Server database.

- When there are more than 1000 virtual machines or more than 100 hosts, separate the Update Manager server from the vCenter Server and the Update Manager database from the vCenter Server database.

- Allocate separate physical disks for the Update Manager patch store and the Update Manager database.

- To reduce network latency and packet drops, keep to a minimum the number of network hops between the Update Manager server system and the ESXi hosts.

- In order to cache frequently used patch files in memory, make sure the Update Manager server host has at least 2GB of RAM.

Update Manager General Recommendations

In the compliance view for all attached baselines, latency increases linearly with the number of attached baselines. We therefore recommend the removal of unused baselines, especially when the inventory size is large.

Upgrading VMware Tools is faster if the virtual machine is already powered on. Otherwise, Update Manager must power on the virtual machine before the VMware Tools upgrade, which could increase the overall latency.

Upgrading virtual machine hardware is faster if the virtual machine is already powered off. Otherwise, Update Manager must power off the virtual machine before upgrading the virtual hardware, which could increase the overall latency.

NOTE Because VMware Tools must be up to date before virtual hardware is upgraded, Update Manager might need to upgrade VMware Tools before upgrading virtual hardware. In such cases the process is faster if the virtual machine is already powered on.
It is impossible to cover all the different virtual network infrastructure design deployments based on the various combinations of server types, network adapters, and network switch capability parameters. In this paper, the following four commonly used deployments, based on standard rack server and blade server configurations, are described.

It is assumed that the network switch infrastructure has standard layer 2 switch features (high availability, redundant paths, fast convergence, port security) available to provide reliable, secure, and scalable connectivity to the server infrastructure.
Virtual Infrastructure Traffic

The vSphere virtual network infrastructure carries different traffic types. To manage the virtual infrastructure traffic effectively, vSphere and network administrators must understand the different traffic types and their characteristics. The following are the key traffic types that flow in the vSphere infrastructure, along with their traffic characteristics:

- Management traffic: This traffic flows through a vmknic and carries VMware ESXi host-to-VMware vCenter configuration and management communication, as well as ESXi host-to-ESXi host high availability (HA) related communication. This traffic has low network utilization but has very high availability and security requirements.

- VMware vSphere vMotion traffic: With advancements in vMotion technology, a single vMotion instance can consume almost a full 10Gb of bandwidth. A maximum of eight simultaneous vMotion instances can be performed on a 10Gb uplink; four simultaneous vMotion instances are allowed on a 1Gb uplink. vMotion traffic has very high network utilization and can be bursty at times. Customers must make sure that vMotion traffic doesn't impact other traffic types, because it might consume all available I/O resources. Another property of vMotion traffic is that it is not sensitive to throttling, which makes it a very good candidate on which to perform traffic management.

- Fault-tolerant traffic: When VMware Fault Tolerance (FT) logging is enabled for a virtual machine, all the logging traffic is sent to the secondary fault-tolerant virtual machine over a designated vmknic port. This process can require a considerable amount of bandwidth at low latency, because it replicates the I/O traffic and memory-state information to the secondary virtual machine.

- iSCSI/NFS traffic: IP storage traffic is carried over vmknic ports. This traffic varies according to disk I/O requests. With end-to-end jumbo frame configuration, more data is transferred with each Ethernet frame, decreasing the number of frames on the network. These larger frames reduce the overhead on servers/targets and improve IP storage performance. On the other hand, congested and lower-speed networks can cause latency issues that disrupt access to IP storage. It is recommended that users provide a high-speed path for IP storage and avoid any congestion in the network infrastructure.

- Virtual machine traffic: Depending on the workloads that are running on the guest virtual machine, the traffic patterns will vary from low to high network utilization. Some of the applications running in virtual machines might be latency sensitive, as is the case with VoIP workloads.

Table 1 summarizes the characteristics of each traffic type.
To understand the different traffic flows in the physical network infrastructure, network administrators use network traffic management tools. These tools help monitor the physical infrastructure traffic but do not provide visibility into virtual infrastructure traffic. With the release of vSphere 5, VDS now supports the NetFlow feature, which enables exporting the internal (virtual machine-to-virtual machine) virtual infrastructure flow information to standard network management tools. Administrators now have the required visibility into virtual infrastructure traffic. This helps administrators monitor the virtual network infrastructure traffic through a familiar set of network management tools. Customers should make use of the network data collected from these tools during capacity planning or network design exercises.
Example Deployment Components

After looking at the different design considerations, this section provides a list of components that are used in an example deployment. This example deployment helps illustrate some standard VDS design approaches. The following are some common components in the virtual infrastructure. The list doesn't include storage components that are required to build the virtual infrastructure. It is assumed that customers will deploy IP storage in this example deployment.

Hosts

Four ESXi hosts provide compute, memory and network resources according to the configuration of the hardware. Customers can have different numbers of hosts in their environment, based on their needs. One VDS can span across 350 hosts. This capability to support large numbers of hosts provides the required scalability to build a private or public cloud environment using VDS, which is an excellent use case.
Clusters

A cluster is a collection of ESXi hosts and associated virtual machines with shared resources. Customers can have as many clusters in their deployment as are required. With one VDS spanning across 350 hosts, customers have the flexibility of deploying multiple clusters with a different number of hosts in each cluster. For simple illustration purposes, two clusters with two hosts each are considered in this example deployment. One cluster can have a maximum of 32 hosts.

VMware vCenter Server

VMware vCenter Server centrally manages a vSphere environment. Customers can manage VDS through this centralized management tool, which can be deployed on a virtual machine or a physical host. The vCenter Server system is not shown in the diagrams, but customers should assume that it is present in this example deployment. It is used only to provision and manage VDS configuration. When provisioned, hosts and virtual machine networks operate independently of vCenter Server. All components required for network switching reside on ESXi hosts. Even if the vCenter Server system fails, the hosts and virtual machines will still be able to communicate.
Network Infrastructure

Physical network switches in the access and aggregation layers provide connectivity between ESXi hosts and to the external world. These network infrastructure components support standard layer 2 protocols, providing secure and reliable connectivity.

Along with the preceding four components of the physical infrastructure in this example deployment, some of the virtual infrastructure traffic types are also considered during the design. The following section describes the different traffic types in the example deployment.

Important Virtual and Physical Switch Parameters

Before going into the different design options in the example deployment, let's take a look at the virtual and physical network switch parameters that should be considered in all of the design options. There are some key parameters that vSphere and network administrators must take into account when designing VMware virtual networking. Because the configuration of virtual networking goes hand in hand with physical network configuration, this section will cover both the virtual and physical switch parameters.
VDS Parameters

VDS simplifies the challenges of the configuration process by providing a single pane of glass to perform virtual network management tasks. As opposed to configuring a vSphere Standard Switch (VSS) on each individual host, administrators can configure and manage one single VDS. All centrally configured network policies on VDS get pushed down to the host automatically when the host is added to the distributed switch. In this section, an overview of key VDS parameters is provided.

Host Uplink Connections (vmnics) and dvuplink Parameters

VDS has a new abstraction, called dvuplink, for the physical Ethernet network adaptors (vmnics) on each host. It is defined during the creation of the VDS and can be considered a template for individual vmnics on each host. All the properties, including network adaptor teaming, load balancing and failover policies on VDS and dvportgroups, are configured on dvuplinks. These dvuplink properties are automatically applied to vmnics on individual hosts when a host is added to the VDS and each vmnic on the host is mapped to a dvuplink. This dvuplink abstraction therefore provides the advantage of consistently applying teaming and failover configurations to all the hosts' physical Ethernet network adaptors (vmnics).
Figure 2 shows two ESXi hosts with four Ethernet network adaptors each. When these hosts are added to the VDS, with four dvuplinks configured on a dvuplink port group, administrators must assign the network adaptors (vmnics) of the hosts to dvuplinks. To illustrate the mapping of the dvuplinks to vmnics, Figure 2 shows one type of mapping, where the ESXi host's vmnic0 is mapped to dvuplink1, vmnic1 to dvuplink2 and so on. Customers can choose a different mapping if required, where vmnic0 can be mapped to a different dvuplink instead of dvuplink1. VMware recommends having consistent mapping across different hosts because it reduces complexity in the environment.

Figure 2. dvuplink-to-vmnic Mapping

As a best practice, customers should also try to deploy hosts with the same number of physical Ethernet network adaptors with similar port speeds. Also, because the number of dvuplinks on VDS depends on the maximum number of physical Ethernet network adaptors on a host, administrators should take that into account during dvuplink port group configuration. Customers always have an option to modify this dvuplink configuration based on new hardware capabilities.
Traffic Types and dvportgroup Parameters

Similar to port groups on standard switches, dvportgroups define how the connection is made through the VDS to the network. The VLAN ID, traffic shaping, port security, teaming and load balancing parameters are configured on these dvportgroups. The virtual ports (dvports) connected to a dvportgroup share the same properties configured on that dvportgroup. When customers want a group of virtual machines to share the security and teaming policies, they must make sure that the virtual machines are part of one dvportgroup. Customers can choose to define different dvportgroups based on the different traffic types they have in their environment, or based on the different tenants or applications they support in the environment. If desired, multiple dvportgroups can share the same VLAN ID.

In this example deployment, the dvportgroup classification is based on the traffic types running in the virtual infrastructure. After administrators understand the different traffic types in the virtual infrastructure and identify specific security, reliability and performance requirements for individual traffic types, the next step is to create unique dvportgroups associated with each traffic type.
As was previously mentioned, the dvportgroup configuration defined at the VDS level is automatically pushed down to every host that is added to the VDS. For example, in Figure 2, the two dvportgroups, PG-A (yellow) and PG-B (green), defined at the distributed switch level are each available on each of the ESXi hosts that are part of that VDS.

dvportgroup Specific Configuration

After customers decide on the number of unique dvportgroups they want to create in their environment, they can start configuring them. The configuration options and parameters are similar to those available with port groups on vSphere standard switches. There are some additional options available on VDS dvportgroups that are related to teaming setup and are not available on vSphere standard switches. Customers can configure the following key parameters for each dvportgroup.
Teaming: As part of the teaming algorithm support, VDS provides a unique approach to load balancing traffic across the teamed network adaptors. This approach is called load-based teaming (LBT), which distributes the traffic across the network adaptors based on the percentage utilization of traffic on those adaptors. The LBT algorithm works in both the ingress and egress directions of the network adaptor traffic, as opposed to the hashing algorithms that work only in the egress direction (traffic flowing out of the network adaptor). LBT also prevents the worst-case scenario that might happen with hashing algorithms, where all traffic hashes to one network adaptor of the team while other network adaptors are not used to carry any traffic. To improve the utilization of all the links and network adaptors, VMware recommends the use of this advanced VDS feature, LBT. The LBT approach is recommended over EtherChannel on physical switches and route-based IP hash configuration on the virtual switch.
Port security: Port security policies at the port group level enable customers to protect against certain activities that might compromise security. For example, a hacker might impersonate a virtual machine and gain unauthorized access by spoofing the virtual machine's MAC address. VMware recommends setting MAC Address Changes and Forged Transmits to Reject, to help protect against attacks launched by a rogue guest operating system. Customers should set Promiscuous Mode to Reject unless they want to monitor the traffic for network troubleshooting or intrusion detection purposes.
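For administrators who script these settings, the following hedged sketch shows how the recommended Reject values might be applied to a dvportgroup, assuming the open-source pyVmomi bindings (which this paper does not cover). The connection details and the port group name "PG-A" are placeholders, and the property names should be checked against the vSphere API reference for your release.

```python
# Hedged sketch, assuming the open-source pyVmomi bindings and a vSphere
# 5.x-style API. The connection details and the port group name ("PG-A")
# are placeholders; verify property names against your API version.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()

# Locate the dvportgroup by name using a container view.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
pg = next(p for p in view.view if p.name == "PG-A")

# Build a port policy that rejects MAC address changes, forged transmits
# and promiscuous mode, per the recommendation above.
sec = vim.dvs.VmwareDistributedVirtualSwitch.SecurityPolicy(
    allowPromiscuous=vim.BoolPolicy(value=False),
    macChanges=vim.BoolPolicy(value=False),
    forgedTransmits=vim.BoolPolicy(value=False))

spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
    configVersion=pg.config.configVersion,
    defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
        securityPolicy=sec))
pg.ReconfigureDVPortgroup_Task(spec)
Disconnect(si)
```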
NIOC

Network I/O Control (NIOC) is the traffic management capability available on VDS. The NIOC concept revolves around resource pools that are similar in many ways to the ones existing for CPU and memory. vSphere and network administrators now can allocate I/O shares to different traffic types, similarly to allocating CPU and memory resources to a virtual machine. The share parameter specifies the relative importance of a traffic type over other traffic and provides a guaranteed minimum when the other traffic competes for a particular network adaptor. The shares are specified in abstract units, numbered 1 to 100. Customers can provision shares to different traffic types based on the amount of resources each traffic type requires. This capability of provisioning I/O resources is very useful in situations where there are multiple traffic types competing for resources.
For example, in a deployment where vMotion and virtual machine traffic types are flowing through one network adaptor, it is possible that vMotion activity might impact the virtual machine traffic performance. In this situation, shares configured in NIOC provide the required isolation to the vMotion and virtual machine traffic types and prevent one flow (traffic type) from dominating the other.

NIOC configuration provides one more parameter that customers can utilize if they want to put any limits on a particular traffic type. This parameter is called the limit. The limit configuration specifies the absolute maximum bandwidth for a traffic type on a host and is specified in Mbps. NIOC limits and shares parameters work only on outbound traffic, i.e., traffic that is flowing out of the ESXi host.
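A small conceptual sketch may help fix the semantics: shares yield a guaranteed fraction of an uplink under contention, while a limit caps the traffic type unconditionally, even when spare bandwidth exists. This is plain illustrative Python, not VMware code.

```python
# Conceptual sketch (not VMware code): how NIOC shares and limits combine
# for one traffic type on a single uplink. Shares set the guaranteed
# fraction under contention; a limit, if set, caps the traffic even when
# spare bandwidth exists.

def effective_bandwidth_mbps(shares, total_active_shares, uplink_mbps, limit_mbps=None):
    guaranteed = (shares / total_active_shares) * uplink_mbps
    return min(guaranteed, limit_mbps) if limit_mbps is not None else guaranteed

# A 10-share traffic type among 55 active shares on a 1Gb (1000Mbps) uplink:
print(effective_bandwidth_mbps(10, 55, 1000))                   # ~181.82 under contention
print(effective_bandwidth_mbps(10, 55, 1000, limit_mbps=150))   # capped at 150
```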
VMware recommends that customers utilize this traffic management feature whenever they have multiple traffic types flowing through one network adaptor, a situation that is more prominent with 10 Gigabit Ethernet (GbE) network deployments but can happen in 1GbE network deployments as well. The common use case for NIOC in 1GbE network adaptor deployments is when the traffic from different workloads or different customer virtual machines is carried over the same network adaptor. As multiple workloads' traffic flows through a network adaptor, it becomes important to provide I/O resources based on the needs of the workload.

With the release of vSphere 5, customers now can make use of the new user-defined network resource pools capability and can allocate I/O resources to the different workloads or different customer virtual machines, depending on their needs. This user-defined network resource pools feature provides granular control in allocating I/O resources and meeting the service-level agreement (SLA) requirements for virtualized tier 1 workloads.
Bidirectional Traffic Shaping

Besides NIOC, there is another traffic-shaping feature available in the vSphere platform. It can be configured at the dvportgroup or dvport level. Customers can shape both inbound and outbound traffic using three parameters: average bandwidth, peak bandwidth and burst size. Customers who want more granular traffic-shaping controls to manage their traffic types can take advantage of this VDS capability along with the NIOC feature.

It is recommended that network administrators in your organization be involved when configuring these granular traffic parameters. These controls make sense only when there are oversubscription scenarios, caused by an oversubscribed physical switch infrastructure or virtual infrastructure, that are causing network performance issues. So it is very important to understand the physical and virtual network environment before making any bidirectional traffic-shaping configurations.
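The three shaping parameters behave much like a token bucket, where average bandwidth sets the refill rate, burst size the bucket depth and peak bandwidth the instantaneous ceiling. The sketch below is a conceptual illustration of that interaction, not the actual ESXi shaper.

```python
# Conceptual token-bucket illustration of the three traffic-shaping knobs:
# average bandwidth refills the bucket, burst size bounds the bucket depth,
# and peak bandwidth caps the instantaneous send rate. A teaching sketch,
# not the ESXi implementation.

class Shaper:
    def __init__(self, avg_bps, peak_bps, burst_bytes):
        self.avg_bps = avg_bps          # long-term refill rate
        self.peak_bps = peak_bps        # max rate while tokens remain
        self.burst_bytes = burst_bytes  # bucket depth
        self.tokens = burst_bytes

    def tick(self, seconds):
        """Refill tokens at the average rate, never exceeding the burst size."""
        self.tokens = min(self.burst_bytes, self.tokens + self.avg_bps / 8 * seconds)

    def allowed_bytes(self, seconds):
        """Bytes sendable in this interval: bounded by tokens and the peak rate."""
        return min(self.tokens, self.peak_bps / 8 * seconds)

    def send(self, nbytes):
        self.tokens -= nbytes

shaper = Shaper(avg_bps=100e6, peak_bps=500e6, burst_bytes=1e6)
shaper.tick(0.01)
print(shaper.allowed_bytes(0.01))  # min(bucket depth, 10ms at the peak rate)
```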
switch technology, the Ethernet network adaptor connections can be terminated on two different physical switches. The clustered physical switch technology is referred to by different names by networking vendors. For example, Cisco calls its switch clustering solution Virtual Switching System (the Nexus 6K and 7K use vPC); Brocade calls its solution Virtual Cluster Switching. Refer to the networking vendor guidelines and configuration details when deploying switch clustering technology.
Link-State Tracking

Link-state tracking is a feature available on Cisco switches to manage the link state of downstream ports (ports connected to servers) based on the status of upstream ports (ports connected to aggregation/core switches). When there is any failure on the upstream links connected to aggregation or core switches, the associated downstream link status goes down. The server connected on the downstream link is then able to detect the failure and reroute the traffic onto other working links. This feature therefore provides protection from network failures due to failed upstream ports in non-mesh topologies. Unfortunately, this feature is not available on all vendors' switches, and even if it is available, it might not be referred to as link-state tracking. Customers should talk to their switch vendors to find out whether a similar feature is supported on their switches.
Figure 3 shows the resilient mesh topology on the left and a simple loop-free topology on the right. VMware highly recommends deploying the mesh topology shown on the left, which provides a highly reliable, redundant design and doesn't need a link-state tracking feature. Customers who don't have high-end networking expertise and are also limited in the number of switch ports might prefer the deployment shown on the right. In this deployment, customers don't have to run STP because there are no loops in the network design.

The downside of this simple design is seen when there is a failure in the link between the access and aggregation switches. In that failure scenario, the server will continue to send traffic on the same network adaptor even when the access layer switch is dropping the traffic at the upstream interface. To avoid this black-holing of server traffic, customers can enable link-state tracking on the virtual and physical switches and indicate any failure between the access and aggregation switch layers to the server through link-state information.
VDS has its default network failover detection configuration set to link status only. Customers should keep this configuration if they are enabling the link-state tracking feature on physical switches. If link-state tracking capability is not available on the physical switches, and there are no redundant paths available in the design, customers can make use of the beacon probing feature available on VDS. The beacon probing function is a software solution available on virtual switches for detecting link failures upstream from the access layer physical switch to the aggregation/core switches. Beacon probing is most useful with three or more uplinks in a team.
dvuplink Configuration

To support the maximum of eight 1GbE network adaptors per host, the dvuplink port group is configured with eight dvuplinks (dvuplink1 through dvuplink8). On the hosts, dvuplink1 is associated with vmnic0, dvuplink2 is associated with vmnic1, and so on. It is a recommended practice to change the names of the dvuplinks to something meaningful and easy to track. For example, dvuplink1, which gets associated with a vmnic on a motherboard, can be renamed LOM-uplink1; dvuplink2, which gets associated with a vmnic on an expansion card, can be renamed Expansion-uplink1.
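Renaming can be done from the vSphere Client; for scripted environments, the following hedged sketch shows one way it might be done through the API, assuming the open-source pyVmomi bindings. The switch name and uplink labels are placeholders, and the ConfigSpec details should be verified against your API version.

```python
# Hedged sketch, assuming pyVmomi: renaming the dvuplinks of a VDS to the
# meaningful names suggested above. The switch name ("VDS-01") and uplink
# labels are placeholders; the list supplied must match the number of
# uplinks configured on the switch (eight in this example deployment).
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.DistributedVirtualSwitch], True)
dvs = next(d for d in view.view if d.name == "VDS-01")

names = ["LOM-uplink1", "LOM-uplink2"] + [f"Expansion-uplink{i}" for i in range(1, 7)]
spec = vim.DVSConfigSpec(
    configVersion=dvs.config.configVersion,
    uplinkPortPolicy=vim.DVSNameArrayUplinkPortPolicy(uplinkPortName=names))
dvs.ReconfigureDvs_Task(spec)
Disconnect(si)
```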
If the hosts have some Ethernet network adaptors as LAN on motherboard (LOM) and some on expansion cards, for better resiliency, VMware recommends selecting one network adaptor from the LOM and one from an expansion card when configuring network adaptor teaming. To configure this teaming on a VDS, administrators must pay attention to the dvuplink-to-vmnic association, along with the dvportgroup configuration where network adaptor teaming is enabled. In the network adaptor-teaming configuration on a dvportgroup, administrators must choose the various dvuplinks that are part of a team. If the dvuplinks are named appropriately according to the host vmnic association, administrators can select LOM-uplink1 and Expansion-uplink1 when configuring the teaming option for a dvportgroup.
dvportgroup Configuration

As described in Table 2, there are five different port groups that are configured for the five different traffic types. Customers can create up to 5,000 unique port groups per VDS. In this example deployment, the decision on creating different port groups is based on the number of traffic types. According to Table 2, dvportgroup PG-A is created for the management traffic type. There are other dvportgroups defined for the other traffic types. The following are the key configurations of dvportgroup PG-A:

Teaming option: Explicit failover order provides a deterministic way of directing traffic to a particular uplink. By selecting dvuplink1 as an active uplink and dvuplink2 as a standby uplink, management traffic will be carried over dvuplink1 unless there is a failure on dvuplink1. All other dvuplinks are configured as unused. Configuring the failback option to No is also recommended, to avoid the flapping of traffic between two network adaptors. The failback option determines how a physical adaptor is returned to active duty after recovering from a failure. If failback is set to No, a failed adaptor is left inactive, even after recovery, until another currently active adaptor fails and requires a replacement.
VMware recommends isolating all traffic types from each other by defining a separate VLAN for each dvportgroup. There are several other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs. For example, customers can configure PVLAN to provide isolation when there are limited VLANs available in the environment.

As you follow the dvportgroup configuration in Table 2, you can see that each traffic type is carried over a specific dvuplink, with the exception of the virtual machine traffic type. The virtual machine traffic type uses two active links, dvuplink7 and dvuplink8, and these links are utilized through the LBT algorithm. As was previously mentioned, the LBT algorithm is much more efficient than the standard hashing algorithm in utilizing link bandwidth.
Table 3. Static Design Configuration with iSCSI Multipathing and Multi-Network Adaptor vMotion

As shown in Table 3, there are two entries each for the vMotion and iSCSI traffic types. Also shown is a list of the additional dvportgroup configurations required to support the multi-network adaptor vMotion and iSCSI multipathing processes. For multi-network adaptor vMotion, dvportgroups PG-B1 and PG-B2 are listed, configured with dvuplink3 and dvuplink4 respectively as active links. And for iSCSI multipathing, dvportgroups PG-D1 and PG-D2 are connected to dvuplink5 and dvuplink6 respectively as active links. Load balancing across the multiple dvuplinks is performed by the multipathing logic in the iSCSI process and by the ESXi platform in the vMotion process. Configuring the teaming policies for these dvportgroups is not required.
The FT, management and virtual machine traffic-type dvportgroup configurations and the physical switch configuration for this design remain the same as those described in Design Option 1 of the previous section. This static design approach improves on the first design by using advanced capabilities such as iSCSI multipathing and multi-network adaptor vMotion. But at the same time, this option has the same challenges related to underutilized resources and inflexibility in allocating additional resources on the fly to different traffic types.

Design Option 2 - Dynamic Configuration with NIOC and LBT

After looking at the traditional design approach with static uplink configurations, let's take a look at the VMware-recommended design option that takes advantage of advanced VDS features such as NIOC and LBT.
In this design, the connectivity to the physical network infrastructure remains the same as that described in the static design option. However, instead of allocating specific dvuplinks to individual traffic types, the ESXi platform utilizes those dvuplinks dynamically. To illustrate this dynamic design, each virtual infrastructure traffic type's bandwidth utilization is estimated. In a real deployment, customers should first monitor the virtual infrastructure traffic over a period of time to gauge the bandwidth utilization, and then come up with bandwidth numbers for each traffic type. The following are some bandwidth numbers estimated by traffic type for the scenario:

Management traffic (<1Gb)
vMotion (1Gb)
FT (1Gb)
iSCSI (1Gb)
Virtual machine (2Gb)
Based on this bandwidth information, administrators can provision appropriate I/O resources to each traffic type by using the NIOC feature of VDS. Let's take a look at the VDS parameter configurations for this design, as well as the NIOC setup. The dvuplink port group configuration remains the same, with eight dvuplinks created for the eight 1GbE network adaptors. The dvportgroup configuration is described in the following section.

dvportgroup Configuration

In this design, all dvuplinks are active and there are no standby and unused uplinks, as shown in Table 4. All dvuplinks are therefore available for use by the teaming algorithm. The following are the key parameter configurations of dvportgroup PG-A:
Teaming option: LBT is selected as the teaming algorithm. With the LBT configuration, the management traffic initially will be scheduled based on the virtual port ID hash. Depending on the hash output, management traffic is sent out over one of the dvuplinks. Other traffic types in the virtual infrastructure can also be scheduled on the same dvuplink initially. However, when the utilization of the dvuplink goes beyond the 75 percent threshold, the LBT algorithm will be invoked and some of the traffic will be moved to other, underutilized dvuplinks. It is possible that management traffic will be moved to other dvuplinks when such an LBT event occurs.
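The following illustrative sketch captures the rebalancing idea just described: once a dvuplink crosses the 75 percent utilization threshold, a flow is moved to the least-utilized uplink. The data structures and the choice of which flow to move are simplifications, not the ESXi implementation.

```python
# Illustrative sketch of the load-based teaming decision described above:
# when a dvuplink's utilization crosses the 75 percent threshold, move a
# flow to the least-utilized uplink. A simplification, not the ESXi code.

THRESHOLD = 0.75

def rebalance(uplinks):
    """uplinks: dict of uplink name -> list of (flow, mbps); capacity 1000Mbps."""
    capacity = 1000.0
    util = {u: sum(m for _, m in flows) / capacity for u, flows in uplinks.items()}
    for u, flows in uplinks.items():
        if util[u] > THRESHOLD and len(flows) > 1:
            target = min(util, key=util.get)          # least-utilized uplink
            flow = min(flows, key=lambda f: f[1])     # move a small flow
            flows.remove(flow)
            uplinks[target].append(flow)
            return f"moved {flow[0]} from {u} to {target}"
    return "no move needed"

links = {"dvuplink1": [("vMotion", 700.0), ("mgmt", 100.0)], "dvuplink2": []}
print(rebalance(links))  # moves mgmt off the saturated dvuplink1
```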
The failback option controls the return from a standby link to an active uplink after the active uplink comes back into operation following a failure. This failback option applies when there are active and standby dvuplink configurations. In this design, there are no standby dvuplinks. So when an active uplink fails, the traffic flowing on that dvuplink is moved to another working dvuplink. If the failed dvuplink comes back, the LBT algorithm will schedule new traffic on that dvuplink. This option is left at the default.

VMware recommends isolating all traffic types from each other by defining a separate VLAN for each dvportgroup. There are several other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs. For example, they can configure PVLAN to provide isolation when there are limited VLANs available in the environment.
As you follow the dvportgroup configuration in Table 4, you can see that each traffic type has all dvuplinks active and that these links are utilized through the LBT algorithm. Let's now look at the NIOC configuration described in the last two columns of Table 4.

Table 4. Dynamic Design Configuration with NIOC and LBT

The NIOC configuration in this design helps provide the appropriate I/O resources to the different traffic types (through shares). Based on the previously estimated bandwidth numbers per traffic type, the shares parameter is configured in the NIOC shares column in Table 4. The shares values specify the relative importance of specific traffic types, and NIOC ensures that during contention scenarios on the dvuplinks, each traffic type gets the allocated bandwidth. For example, a shares configuration of 10 for vMotion, iSCSI and FT allocates equal bandwidth to these traffic types. Virtual machines get the highest bandwidth with 20 shares, and management gets lower bandwidth with 5 shares.
To illustrate how share values translate to bandwidth numbers, let's take an example of a 1Gb-capacity dvuplink carrying all five traffic types. This is a worst-case scenario in which all traffic types are mapped to one dvuplink. This will never happen when customers enable the LBT feature, because LBT will balance the traffic based on the utilization of the uplinks. This example shows how much bandwidth each traffic type will be allowed on one dvuplink during a contention or oversubscription scenario and when LBT is not enabled.
Total shares: management (5) + vMotion (10) + FT (10) + iSCSI (10) + virtual machine (20) = 55
1Gb = 1000Mbps
o Management: 5 shares; (5/55) x 1000 = 90.91Mbps
o vMotion: 10 shares; (10/55) x 1000 = 181.82Mbps
o FT: 10 shares; (10/55) x 1000 = 181.82Mbps
o iSCSI: 10 shares; (10/55) x 1000 = 181.82Mbps
o Virtual machine: 20 shares; (20/55) x 1000 = 363.64Mbps
Note: Given a workload requirement for a port group specified in Mbps, the same relationship can be used in reverse to identify the required share value.

To calculate the bandwidth numbers during contention, you should first calculate the percentage of bandwidth for a traffic type by dividing its share value by the total available share number (55). In the second step, the total bandwidth of the dvuplink (1Gb) is multiplied by the percentage of bandwidth calculated in the first step. For example, 5 shares allocated to management traffic translate to 90.91Mbps of bandwidth for the management process on a fully utilized 1Gb network adaptor.
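The arithmetic above, and the inverse question raised in the note, can be expressed compactly. The following sketch is plain illustrative Python: it mirrors the example's numbers and derives the share value needed for a given bandwidth requirement.

```python
# Worked sketch reproducing the arithmetic above and the inverse question
# raised in the note: from a bandwidth requirement in Mbps, derive the
# share value needed. Pure illustration; the numbers mirror the example.

SHARES = {"management": 5, "vMotion": 10, "FT": 10, "iSCSI": 10, "vm": 20}
UPLINK_MBPS = 1000.0
total = sum(SHARES.values())  # 55

# Forward: shares -> guaranteed Mbps under full contention on one uplink.
for name, s in SHARES.items():
    print(f"{name}: {s / total * UPLINK_MBPS:.2f} Mbps")

# Inverse: shares s needed so a traffic type gets `need` Mbps alongside
# `other` shares held by everyone else: need/uplink = s/(s+other).
def required_shares(need_mbps, other_shares, uplink_mbps=UPLINK_MBPS):
    frac = need_mbps / uplink_mbps
    return frac * other_shares / (1.0 - frac)

print(required_shares(200, other_shares=35))  # shares needed for 200 Mbps
```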
In this example, a custom share configuration is discussed, but a customer can make use of the predefined high (100), normal (50) and low (25) share values when assigning shares to different traffic types.

The vSphere platform takes these configured share values and applies them per uplink. The schedulers running at each uplink are responsible for making sure that the bandwidth resources are allocated according to the shares. In the case of an eight-1GbE network adaptor deployment, there are eight schedulers running. Depending on the number of traffic types scheduled on a particular uplink, the scheduler will divide the bandwidth among the traffic types, based on the share numbers. For example, if only FT (10 shares) and management (5 shares) traffic are flowing through dvuplink5, FT traffic will get double the bandwidth of management traffic, based on the shares value. Also, when there is no management traffic flowing, all bandwidth can be utilized by the FT process. This flexibility in allocating I/O resources is the key benefit of the NIOC feature.
The NIOC limits parameter of Table 4 is not configured in this design. The limits value specifies an absolute maximum on egress traffic for a traffic type and is specified in Mbps. This configuration provides a hard limit on any traffic, even if I/O resources are available to use. Using the limits configuration is not recommended unless you really want to cap the traffic even though additional resources are available.

There is no change in physical switch configuration in this design approach, even with the choice of the new LBT algorithm. The LBT teaming algorithm doesn't require any special configuration on physical switches. Refer to the physical switch settings described in Design Option 1.
This design does not provide higher than 1Gb bandwidth to the vMotion and iSCSI traffic types, as is the case with the static design using multi-network adaptor vMotion and iSCSI multipathing. The LBT algorithm cannot split one infrastructure traffic type across multiple dvuplink ports to utilize all the links. So even if vMotion dvportgroup PG-B has all eight 1GbE network adaptors as active uplinks, vMotion traffic will be carried over only one of the eight uplinks.

The main advantage of this design is evident in scenarios where the vMotion process is not using the uplink bandwidth and other traffic types are in need of additional resources. In these situations, NIOC makes sure that the unused bandwidth is allocated to the other traffic types that need it. This dynamic design option is the recommended approach because it takes advantage of the advanced VDS features and utilizes I/O resources efficiently. This option also provides active-active resiliency, where no uplinks are in standby mode.
In this design approach, customers allow the vSphere platform to make the optimal decisions on scheduling traffic across multiple uplinks. Some customers who have restrictions in the physical infrastructure, in terms of bandwidth capacity across different paths and limited availability of the layer 2 domain, might not be able to take advantage of this dynamic design option. When deploying this design option, it is important to consider all the different traffic paths that a traffic type can take and to make sure that the physical switch infrastructure can support the specific characteristics required for each traffic type. VMware recommends that vSphere and network administrators work together to understand the impact of the vSphere platform's traffic scheduling feature on the physical network infrastructure before deploying this design option.

Every customer environment is different, and the requirements for the traffic types are also different. Depending on the needs of the environment, a customer can modify these design options to fit their specific requirements. For example, customers can choose to use a combination of the static and dynamic design options when they need higher bandwidth for iSCSI and vMotion activities.
In this hybrid design, four uplinks can be statically allocated to the iSCSI and vMotion traffic types while the remaining four uplinks are used dynamically for the remaining traffic types (it may also be that the IP storage infrastructure uses separate physical switches). Table 5 shows the traffic types and associated port group configurations for the hybrid design. As shown in the table, management, FT and virtual machine traffic will be distributed on dvuplink1 to dvuplink4 through the vSphere platform's traffic scheduling features, LBT and NIOC. The remaining four dvuplinks are statically assigned to the vMotion and iSCSI traffic types.

Figure 5. Rack Server with Two 10GbE Network Adaptors
Design Option 1 - Static Configuration

The static configuration approach for a rack server deployment with 10GbE network adaptors is similar to the one described in Design Option 1 of the rack server deployment with eight 1GbE adaptors. There are a few differences in the configuration: the number of dvuplinks is changed from eight to two, and the dvportgroup parameters are different. Let's take a look at the configuration details on the VDS front.

dvuplink Configuration

To support the maximum of two Ethernet network adaptors per host, the dvuplink port group is configured with two dvuplinks (dvuplink1, dvuplink2). On the hosts, dvuplink1 is associated with vmnic0 and dvuplink2 is associated with vmnic1.
dvportgroup Configuration

As described in Table 6, there are five different dvportgroups that are configured for the five different traffic types. For example, dvportgroup PG-A is created for the management traffic type. The following are the other key configurations of dvportgroup PG-A:

Teaming option: An explicit failover order provides a deterministic way of directing traffic to a particular uplink. By selecting dvuplink1 as an active uplink and dvuplink2 as a standby uplink, management traffic will be carried over dvuplink1 unless there is a failure on it. Configuring the failback option to No is also recommended, to avoid the flapping of traffic between two network adaptors. The failback option determines how a physical adaptor is returned to active duty after recovering from a failure. If failback is set to No, a failed adaptor is left inactive, even after recovery, until another currently active adaptor fails, requiring its replacement.

VMware recommends isolating all traffic types from each other by defining a separate VLAN for each dvportgroup. There are various other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs.
Table 6 provides the configuration details for all the dvportgroups. According to the configuration, dvuplink1 carries management, iSCSI and virtual machine traffic; dvuplink2 handles vMotion, FT and virtual machine traffic. As you can see, the virtual machine traffic type makes use of two uplinks, and these uplinks are utilized through the LBT algorithm.

With this deterministic teaming policy, customers can decide to map different traffic types to the available uplink ports, depending on environment needs. For example, if iSCSI traffic needs higher bandwidth and other traffic types have relatively low bandwidth requirements, customers can decide to keep only iSCSI traffic on dvuplink1 and move all other traffic to dvuplink2. When deciding on these traffic paths, customers should understand the physical network connectivity and the paths' bandwidth capacities.
Physical Switch Configuration

The external physical switch, to which the rack server's network adaptors are connected, has a trunk configuration with all the appropriate VLANs enabled. As described in the physical network switch parameters sections, the following switch configurations are performed based on the VDS setup described in Table 6. Enable STP on the trunk ports facing ESXi hosts, along with the PortFast mode and BPDU guard feature. The teaming configuration on VDS is static, and therefore no link aggregation is configured on the physical switches. Because of the mesh topology deployment shown in Figure 5, the link-state tracking feature is not required on the physical switches.

Table 6. Static Design Configuration
This static design option provides flexibility in the traffic path configuration, but it cannot protect against one traffic type dominating others. For example, there is a possibility that a network-intensive vMotion process might take away most of the network bandwidth and impact virtual machine traffic. Bidirectional traffic-shaping parameters at the port group and port levels can provide some help in managing different traffic rates. However, using this approach for traffic management requires customers to limit the traffic on the respective dvportgroups. Limiting traffic to a certain level through this method puts a hard limit on the traffic types, even when bandwidth is available to utilize. This underutilization of I/O resources because of hard limits is overcome through the NIOC feature, which provides flexible traffic management based on the shares parameters. Design Option 2, described in the following section, is based on the NIOC feature.
Design Option 2 - Dynamic Configuration with NIOC and LBT

This dynamic design option is the VMware-recommended approach that takes advantage of the NIOC and LBT features of the VDS. Connectivity to the physical network infrastructure remains the same as that described in Design Option 1. However, instead of allocating specific dvuplinks to individual traffic types, the ESXi platform utilizes those dvuplinks dynamically. To illustrate this dynamic design, each virtual infrastructure traffic type's bandwidth utilization is estimated. In a real deployment, customers should first monitor the virtual infrastructure traffic over a period of time to gauge the bandwidth utilization, and then come up with bandwidth numbers. The following are some bandwidth numbers estimated by traffic type:

Management traffic (<1Gb)
vMotion (2Gb)
FT (1Gb)
iSCSI (2Gb)
Virtual machine (2Gb)
These bandwidth estimates are different from the ones considered for the rack server deployment with eight 1GbE network adaptors. Let's take a look at the VDS parameter configurations for this design. The dvuplink port group configuration remains the same, with two dvuplinks created for the two 10GbE network adaptors. The dvportgroup configuration is as follows.

dvportgroup Configuration

In this design, all dvuplinks are active and there are no standby and unused uplinks, as shown in Table 7. All dvuplinks are therefore available for use by the teaming algorithm. The following are the key configurations of dvportgroup PG-A:
Teaming option: LBT is selected as the teaming algorithm. With the LBT configuration, management traffic initially will be scheduled based on the virtual port ID hash. Based on the hash output, management traffic will be sent out over one of the dvuplinks. Other traffic types in the virtual infrastructure can also be scheduled on the same dvuplink with the LBT configuration. Subsequently, if the utilization of the uplink goes beyond the 75 percent threshold, the LBT algorithm will be invoked and some of the traffic will be moved to other, underutilized dvuplinks. It is possible that management traffic will get moved to other dvuplinks when such an event occurs. There are no standby dvuplinks in this configuration, so the failback setting is not applicable for this design approach. The default setting for the failback option is Yes.

VMware recommends isolating all traffic types from each other by defining a separate VLAN for each dvportgroup. There are several other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs.
As you follow the dvportgroup configuration in Table 7, you can see that each traffic type has all the dvuplinks as active and that these uplinks are utilized through the LBT algorithm. Let's take a look at the NIOC configuration. The NIOC configuration in this design not only helps provide the appropriate I/O resources to the different traffic types but also provides SLA guarantees by preventing one traffic type from dominating others. Based on the bandwidth assumptions made for the different traffic types, the shares parameters are configured in the NIOC shares column in Table 7.
To illustrate how share values translate to bandwidth numbers in this deployment, let's take an example of a 10Gb-capacity dvuplink carrying all five traffic types. This is a worst-case scenario in which all traffic types are mapped to one dvuplink. This will never happen when customers enable the LBT feature, because LBT will move the traffic types based on the uplink utilization. The following example shows how much bandwidth each traffic type will be allowed on one dvuplink during a contention or oversubscription scenario and when LBT is not enabled:
Total shares: management (5) + vMotion (20) + FT (10) + iSCSI (20) + virtual machine (20) = 75
10Gb = 10000Mbps
o Management: 5 shares; (5/75) x 10Gb = 667Mbps
o vMotion: 20 shares; (20/75) x 10Gb = 2.67Gbps
o FT: 10 shares; (10/75) x 10Gb = 1.33Gbps
o iSCSI: 20 shares; (20/75) x 10Gb = 2.67Gbps
o Virtual machine: 20 shares; (20/75) x 10Gb = 2.67Gbps
For each traffic type, first the percentage of bandwidth is calculated by dividing the share value by the total available share number (75); then the total bandwidth of the dvuplink (10Gb) is used to calculate the bandwidth share for the traffic type. For example, 20 shares allocated to vMotion traffic translate to 2.67Gbps of bandwidth for the vMotion process on a fully utilized 10GbE network adaptor.

In this 10GbE deployment, customers can provide bigger pipes to individual traffic types without the use of trunking or multipathing technologies. This was not the case with the eight-1GbE deployment.
There is no change in physical switch configuration in this design approach, so refer to the physical switch settings described in Design Option 1 in the previous section.

Table 7. Dynamic Design Configuration

This design option utilizes the advanced VDS features and provides customers with a dynamic and flexible design approach. In this design, I/O resources are utilized effectively and SLAs are met based on the shares allocation.
Blade Server in Example Deployment

Blade servers are server platforms that provide higher server consolidation per rack unit as well as lower power and cooling costs. The blade chassis that host the blade servers have proprietary architectures, and each vendor has its own way of managing resources in the blade chassis. It is difficult in this document to look at all of the various blade chassis available on the market and to discuss their deployments. In this section, we will focus on some generic parameters that customers should consider when deploying VDS in a blade chassis environment. From a networking point of view, all blade chassis provide the following two options:

Integrated switches: With this option, the blade chassis enables built-in switches to control traffic flow between the blade servers within the chassis and the external network.

Pass-through technology: This is an alternative method of network connectivity that enables the individual blade servers to communicate directly with the external network.
In this document, the integrated switch option is described, where the blade chassis has a built-in Ethernet switch. This Ethernet switch acts as an access layer switch, as shown in Figure 6. This section discusses a deployment in which the ESXi host is running on a blade server. The following two types of blade server configuration will be described in the next section:

Blade server with two 10GbE network adaptors
Blade server with hardware-assisted multiple logical network adaptors

For each of these two configurations, various VDS design approaches will be discussed.
Blade Server with Two 10GbE Network Adaptors

This deployment is quite similar to that of a rack server with two 10GbE network adaptors, in which each ESXi host is provided with two 10GbE network adaptors. As shown in Figure 6, an ESXi host running on a blade server in the blade chassis is also provided with two 10GbE network adaptors.

Figure 6. Blade Server with Two 10GbE Network Adaptors

In this section, two design options are described. One is a traditional static approach, and the other is the VMware-recommended dynamic configuration with the NIOC and LBT features enabled. These two approaches are exactly the same as the deployments described in the Rack Server with Two 10GbE Network Adaptors section. Only blade chassis-specific design decisions will be discussed as part of this section. For all other VDS and switch-related configurations, refer to the Rack Server with Two 10GbE Network Adaptors section of this document.
Design Option 1 - Static Configuration

The configuration of this design approach is exactly the same as that described in the Design Option 1 section under Rack Server with Two 10GbE Network Adaptors. Refer to Table 6 for dvportgroup configuration details. Let's take a look at the blade server-specific parameters that require attention during the design. Network and hardware reliability considerations should be incorporated during the blade server design as well. In these blade server designs, customers must focus on the following two areas:

High availability of blade switches in the blade chassis
Connectivity of blade server network adaptors to internal blade switches
High availability of blade switches can be achieved by having two Ethernet switching modules in the blade chassis. And the connectivity of the two network adaptors on the blade server should be such that one network adaptor is connected to the first Ethernet switch module and the other network adaptor is hooked to the second switch module in the blade chassis.

Another aspect that requires attention in the blade server deployment is the network bandwidth availability across the midplane of the blade chassis and between the blade switches and the aggregation layer. If there is an oversubscription scenario in the deployment, customers must think about utilizing the traffic shaping and prioritization (802.1p tagging) features available in the vSphere platform. The prioritization feature enables customers to tag the important traffic coming out of the vSphere platform. These high-priority tagged packets are then treated according to priority by the external switch infrastructure.
During congestion scenarios, the switch will drop lower-priority packets first and avoid dropping the important, high-priority packets.

This static design option provides customers with the flexibility to choose different network adaptors for different traffic types. However, when allocating traffic across only two 10GbE network adaptors, administrators ultimately will schedule multiple traffic types on a single adaptor. As multiple traffic types flow through one adaptor, the chance of one traffic type dominating others increases. To avoid the performance impact of noisy neighbors (a dominating traffic type), customers must utilize the traffic management tools provided in the vSphere platform. One of the traffic management features is NIOC, and that feature is utilized in Design Option 2, which is described in the following section.
Design Option 2 - Dynamic Configuration with NIOC and LBT

This dynamic configuration approach is exactly the same as that described in the Design Option 2 section under Rack Server with Two 10GbE Network Adaptors. Refer to Table 7 for the dvportgroup configuration details and NIOC settings. The physical switch-related configuration in the blade chassis deployment is the same as that described in the rack server deployment. For the blade center-specific recommendations on reliability and traffic management, refer to the previous section. VMware recommends this design option, which utilizes the advanced VDS features and provides customers with a dynamic and flexible design approach. With this design, I/O resources are utilized effectively and SLAs are met based on the shares allocation.
Blade Server with Hardware-Assisted Logical Network Adaptors (HP Flex-10- or Cisco UCS-like Deployment)

Some of the new blade chassis support traffic management capabilities that enable customers to carve up I/O resources. This is achieved by providing logical network adaptors for the ESXi hosts. Instead of two 10GbE network adaptors, the ESXi host now sees multiple physical network adaptors that operate at different configurable speeds. As shown in Figure 7, each ESXi host is provided with eight Ethernet network adaptors that are carved out of two 10GbE network adaptors.

Figure 7. Multiple Logical Network Adaptors
This deployment is quite similar to that of the rack server with eight 1GbE network adaptors. However, instead of 1GbE network adaptors, the capacity of each network adaptor is configured at the blade chassis level. In the blade chassis, customers can carve out network adaptors of different capacities based on the need of each traffic type. For example, if iSCSI traffic needs 2.5Gb of bandwidth, a logical network adaptor with that amount of I/O resources can be created on the blade chassis and provided to the blade server.

As for the configuration of the VDS and blade chassis switch infrastructure, the configuration described in Design Option 1 under Rack Server with Eight 1GbE Network Adaptors is more relevant for this deployment. The static configuration option described in that design can be applied as is in this blade server environment. Refer to Table 2 for the dvportgroup configuration details, and to the switch configurations described in that section for physical switch configuration details.
The question now is whether the NIOC capability adds any value in this specific blade server deployment. NIOC is a traffic management feature that helps in scenarios where multiple traffic types flow through one uplink or network adaptor. If in this particular deployment only one traffic type is assigned to a specific Ethernet network adaptor, the NIOC feature will not add any value. However, if multiple traffic types are scheduled over one network adaptor, customers can make use of NIOC to assign appropriate shares to the different traffic types. This NIOC configuration will ensure that bandwidth resources are allocated to traffic types and that SLAs are met.
As an example, let's consider a scenario in which vMotion and iSCSI traffic is carried over one 3Gb logical uplink. To protect the iSCSI traffic from network-intensive vMotion traffic, administrators can configure NIOC and allocate shares to each traffic type. If the two traffic types are equally important, administrators can configure shares with equal values (10 each). With this configuration, when there is a contention scenario, NIOC will make sure that the iSCSI process gets half of the 3Gb uplink bandwidth while leaving the vMotion process its share.

VMware recommends that the network and server administrators work closely together when deploying the traffic management features of the VDS and blade chassis. To achieve the best end-to-end quality of service (QoS) result, a considerable amount of coordination is required during the configuration of the traffic management features.
Operational Best Practices

After a customer successfully designs the virtual network infrastructure, the next challenges are how to deploy the design and how to keep the network operational. VMware provides various tools, APIs and procedures to help customers effectively deploy and manage their network infrastructure. The following are some key tools available in the vSphere platform:

VMware vSphere Command-Line Interface (vSphere CLI)
VMware vSphere API
Virtual network monitoring and troubleshooting (NetFlow, port mirroring)

In the following section, we will briefly discuss how vSphere and network administrators can utilize these tools to manage their virtual network. Refer to the vSphere documentation for more details on the tools.
VMware vSphere Command-Line Interface

vSphere administrators have several ways to access vSphere components through vSphere interface options, including the VMware vSphere Client, the vSphere Web Client and the vSphere Command-Line Interface. The vSphere CLI command set enables administrators to perform configuration tasks by using a vSphere vCLI package installed on supported platforms or by using VMware vSphere Management Assistant (vMA). Refer to the Getting Started with vSphere CLI document for more details on the commands: http://www.vmware.com/support/developer/vcli. The entire networking configuration can be performed through vSphere vCLI, helping administrators automate the deployment process.
VMware vSphere API

The networking setup in the virtualized datacenter involves configuration of virtual and physical switches. VMware has provided APIs that enable network switch vendors to get information about the virtual infrastructure, which helps them automate the configuration of the physical switches and the overall process. For example, vCenter can trigger an event after the vMotion process of a virtual machine is performed. After receiving this event trigger and related information, the network vendors can reconfigure the physical switch port policies such that when the virtual machine moves to another host, the VLAN/access control list (ACL) configurations are migrated along with the virtual machine. Multiple networking vendors have provided this automation between physical and virtual infrastructure configurations through integration with vSphere APIs.
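As an illustration of what such API integration looks like from a script, the following hedged sketch uses the open-source pyVmomi bindings (not covered in this paper) to do a read-only walk of the distributed switches, printing uplink names and, on vSphere 5-era switches, the NIOC network resource pools. Hostname and credentials are placeholders, and property names should be checked against the API reference for your release.

```python
# Hedged sketch, assuming the open-source pyVmomi bindings: a read-only
# walk of the distributed switches visible to vCenter, printing uplink
# names and (where present) the NIOC network resource pools.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.DistributedVirtualSwitch], True)

for dvs in view.view:
    print(dvs.name, dvs.config.uplinkPortPolicy.uplinkPortName)
    for pool in getattr(dvs, "networkResourcePool", []) or []:
        alloc = pool.allocationInfo
        print(" ", pool.name, "shares:", alloc.shares.shares, "limit:", alloc.limit)

Disconnect(si)
```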
Customers should check with their networking vendors to learn whether such an automation tool exists that will bridge the gap between physical and virtual networking and simplify the operational challenges.

Virtual Network Monitoring and Troubleshooting

Monitoring and troubleshooting network traffic in a virtual environment require tools similar to those available in the physical switch environment. With the release of vSphere 5, VMware gives network administrators the ability to monitor and troubleshoot the virtual infrastructure through features such as NetFlow and port mirroring. NetFlow capability on a distributed switch, along with a NetFlow collector tool, helps monitor application flows and measure flow performance over time. It also helps in capacity planning and in ensuring that I/O resources are utilized properly by different applications, based on their needs. Port mirroring capability on a distributed switch is a valuable tool that helps network administrators debug network issues in a virtual infrastructure. Granular control over monitoring ingress, egress or all traffic of a port helps administrators fine-tune what traffic is sent for analysis.
vCenter Server on a Virtual Machine

As mentioned earlier, vCenter Server is used only to provision and manage VDS configurations. Customers can choose to deploy it on a virtual machine or a physical host, depending on their management resource design requirements. In vCenter Server failure scenarios, the VDS will continue to provide network connectivity, but no VDS configuration changes can be performed. By deploying vCenter Server on a virtual machine, customers can take advantage of vSphere platform features such as vSphere High Availability (HA) and VMware Fault Tolerance (FT) to provide higher resiliency to the management plane.

In such deployments, customers must pay more attention to the network configurations. This is because if the networking for the virtual machine hosting vCenter Server is misconfigured, the network connectivity of vCenter Server is lost, and this misconfiguration must be fixed. However, customers need vCenter Server to fix the network configuration, because only vCenter Server can configure a VDS.
As a workaround to this situation, customers must connect through the vSphere Client directly to the host where the vCenter Server virtual machine is running. Then they must reconnect the virtual machine hosting vCenter Server to a VSS that is also connected to the management network of the hosts. After the virtual machine running vCenter Server is reconnected to the network, it can manage and configure VDS. Refer to the community article Virtual Machine Hosting a vCenter Server Best Practices for guidance regarding the deployment of vCenter on a virtual machine: http://communities.vmware.com/servlet/JiveServlet/previewBody/14089-102-1-16292/VMhostVCBestPracitices.html.
Conclusion
A VMware vSphere distributed switch provides customers with the right measure of features, capabilities and operational simplicity for deploying a virtual network infrastructure. As customers move on to build private or public clouds, VDS provides the scalability required for such deployments. Advanced capabilities such as NIOC and LBT are key to achieving better utilization of I/O resources and to providing better SLAs for virtualized business-critical applications and multitenant deployments. Support for standard networking visibility and monitoring features such as port mirroring and NetFlow helps administrators manage and troubleshoot a virtual infrastructure through familiar tools. VDS is also an extensible platform that enables integration with other networking vendors' products through open vSphere APIs.
12. VMware Network I/O Control: Architecture, Performance and Best Practices
The Network I/O Control (NetIOC) feature available in VMware vSphere 4.1 (vSphere) addresses these challenges by introducing a software approach to partitioning physical network bandwidth among the different types of network traffic flows. It does so by providing appropriate quality of service (QoS) policies enforcing traffic isolation, predictability and prioritization, thereby helping IT organizations overcome the contention that results from consolidation. The experiments conducted in VMware performance labs using industry-standard workloads show that NetIOC:
Maintains NFS and/or iSCSI storage performance in the presence of other network traffic such as vMotion and bursty virtual machines
Provides network service level guarantees for critical virtual machines
Ensures adequate bandwidth for VMware Fault Tolerance (VMware FT) logging
Ensures predictable vMotion performance and duration
Facilitates any situation where a minimum or weighted level of service is required for a particular traffic type, independent of other traffic types
Topics addressed:
Use cases and application of NetIOC with 10GbE, in contrast to traditional 1GbE deployments
The NetIOC technology and architecture used within the vNetwork Distributed Switch (vDS)
How to configure NetIOC from the vSphere Client
Examples of NetIOC usage to illustrate possible deployment scenarios
Results from actual performance tests using NetIOC to illustrate how NetIOC can protect and prioritize traffic in the face of network contention and oversubscription
Best practices for deployment
Moving from 1GbE to 10GbE
Virtualized datacenters are characterized by new and complex types of network traffic flows such as vMotion and VMware FT logging traffic. In today's virtualized datacenters, where 10GbE connectivity is still not commonplace, networking is typically based on large numbers of 1GbE physical connections that are used to isolate different types of traffic flows and to provide sufficient bandwidth.
Table 1. Typical Deployment and Provisioning of 1GbE NICs with vSphere 4.0
Provisioning a large number of GbE network adapters to accommodate the peak bandwidth requirements of these different types of traffic flows has a number of shortcomings:
Limited bandwidth: Flows from an individual source (virtual machine, vMotion interface, and so on) are limited and bound to the bandwidth of a single 1GbE interface, even if more bandwidth is available within a team
Excessive complexity: Use of large numbers of 1GbE adapters per server leads to excessive complexity in cabling and management, with an increased likelihood of misconfiguration
Higher capital costs: Large numbers of 1GbE adapters require more physical switch ports, which in turn leads to higher capital costs, including additional switches and rack space
Lower utilization: Static bandwidth allocation to accommodate peak bandwidth for different traffic flows means poor average network bandwidth utilization
10GbE provides ample bandwidth for all the traffic flows to coexist and share the same physical 10GbE link. Flows that were limited to the bandwidth of a single 1GbE link are now able to use as much as 10GbE. While the use of a 10GbE solution greatly simplifies the networking infrastructure and addresses all the shortcomings listed above, a few challenges still need to be addressed to maximize the value of a 10GbE solution. One means of optimizing the 10GbE network bandwidth is to prioritize the network traffic by traffic flows. This ensures that latency-sensitive and critical traffic flows can access the bandwidth they need. NetIOC enables the convergence of diverse workloads on a single networking pipe. It provides sufficient controls to the vSphere administrator, in the form of limits and shares parameters, to enable and ensure predictable network performance when multiple traffic types contend for the same physical network resources.
NetIOC Architecture
Prerequisites for NetIOC
NetIOC is supported only with the vNetwork Distributed Switch (vDS). With vSphere 4.1, a single vDS can span up to 350 ESX/ESXi hosts (500 as of vSphere 5.5), providing a simplified and more powerful management environment than the per-host switch model using the vNetwork Standard Switch (vSS). The vDS also provides a superset of features and capabilities over those of the vSS, such as network vMotion, bidirectional traffic shaping and private VLANs. Configuring and managing a vDS involves the use of distributed port groups (DV Port Groups) and distributed virtual uplinks (dvUplinks). DV Port Groups are port groups associated with a vDS, similar to the port groups available with the vSS. dvUplinks provide a level of abstraction for the physical NICs (vmnics) on each vSphere host.
NetIOC Feature Set
NetIOC provides users with the following features:
Isolation: ensures traffic isolation so that a given flow will never be allowed to dominate over others, thus preventing drops and undesired jitter
Shares: allow flexible partitioning of networking capacity to help users deal with overcommitment when flows compete aggressively for the same resources
Limits: enforce a traffic bandwidth limit on the overall vDS set of dvUplinks
Load-Based Teaming: efficiently uses a vDS set of dvUplinks for networking capacity
NetIOC Traffic Classes
The NetIOC concept revolves around resource pools that are similar in many ways to the ones that already exist for CPU and memory. NetIOC classifies traffic into six predefined resource pools as follows:
vMotion
iSCSI
FT logging
Management
NFS
Virtual machine traffic
Figure 1. NetIOC Architecture
Shares
A user can specify the relative importance of a given resource-pool flow using shares that are enforced at the dvUplink level. The underlying dvUplink bandwidth is then divided among the resource-pool flows based on their relative shares, in a work-conserving way. This means that unused capacity will be redistributed to other contending flows and won't go to waste. As shown in Figure 1, the network flow scheduler is the entity responsible for enforcing shares and is therefore in charge of the overall arbitration under overcommitment. Each resource-pool flow has its own dedicated software queue inside the scheduler, so that packets from a given resource pool won't be dropped due to high utilization by other flows.
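To make the work-conserving division concrete, here is a minimal Python sketch of proportional-share allocation. It is an illustrative model only, not the ESX network flow scheduler, and the flow names, shares and demands in the example are hypothetical:

```python
def allocate(capacity_mbps, flows):
    """Divide uplink capacity among flows in proportion to their shares,
    handing capacity a flow cannot use back to the others (work-conserving).
    flows maps a name to (shares, demand_mbps)."""
    alloc = {name: 0.0 for name in flows}
    active = dict(flows)
    remaining = float(capacity_mbps)
    while active and remaining > 1e-9:
        total = sum(shares for shares, _ in active.values())
        done = [name for name, (shares, demand) in active.items()
                if demand <= remaining * shares / total]
        if not done:
            # Every remaining flow can absorb its full fair share.
            for name, (shares, _) in active.items():
                alloc[name] = remaining * shares / total
            break
        for name in done:
            alloc[name] = active[name][1]   # flow fully satisfied
            remaining -= alloc[name]
            del active[name]
    return alloc

# Uncontended: VM traffic gets the whole 10Gb link despite only 25 shares.
print(allocate(10000, {"vm": (25, 9500), "vmotion": (100, 0)}))
# Contended: bandwidth splits 2:1 according to relative shares.
print(allocate(10000, {"vm": (100, 10000), "vmotion": (50, 10000)}))
```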
Limits
A user can specify an absolute shaping limit for a given resource-pool flow using a bandwidth capacity limiter. As opposed to shares, which are enforced at the dvUplink level, limits are enforced on the overall vDS set of dvUplinks, which means that a flow of a given resource pool will never exceed the given limit for a vDS out of a given vSphere host.
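An absolute cap of this kind is classically implemented with a token-bucket shaper. The sketch below is an illustrative model of such a limiter (NetIOC's internal implementation is not published here), with hypothetical rate and burst values:

```python
import time

class TokenBucket:
    """Illustrative rate limiter: tokens refill at rate_mbps; a packet
    may be sent only if enough tokens are available."""
    def __init__(self, rate_mbps, burst_mbits=10.0):
        self.rate = rate_mbps          # refill rate, megabits/second
        self.capacity = burst_mbits    # maximum burst size, megabits
        self.tokens = burst_mbits
        self.last = time.monotonic()

    def try_send(self, packet_mbits):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_mbits <= self.tokens:
            self.tokens -= packet_mbits
            return True    # within the configured limit
        return False       # exceeds the limit; hold or drop

# e.g., cap a resource pool at 2000Mbps regardless of free link capacity
shaper = TokenBucket(rate_mbps=2000)
```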
Load-Based Teaming (LBT)
vSphere 4.1 introduced a load-based teaming (LBT) policy that ensures that vDS dvUplink capacity is used optimally. LBT avoids the situation, possible under other teaming policies, in which some of the dvUplinks in a DV Port Group's team are idle while others are completely saturated, simply because the teaming policy is statically determined (IP hashing). LBT dynamically reshuffles port binding, based on load and dvUplink usage, to make efficient use of the available bandwidth. LBT moves ports only to dvUplinks configured for the corresponding DV Port Group's team. Note that LBT does not consult shares or limits when rebinding ports from one dvUplink to another. LBT is not the default teaming policy in a DV Port Group, so it is up to the user to configure it as the active policy. LBT will move a flow only when the mean send or receive utilization on an uplink exceeds 75 percent of capacity over a 30-second period, and it will not move flows more often than every 30 seconds.
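The rebinding rule just stated reduces to a simple condition. The following sketch models that decision logic (an illustrative model, not the ESX teaming code):

```python
SATURATION = 0.75      # move only above 75 percent mean utilization
MIN_GAP_S = 30.0       # and no more often than every 30 seconds

def may_rebind(mean_tx, mean_rx, now_s, last_move_s):
    """mean_tx/mean_rx: mean uplink utilization in [0, 1] over the
    trailing 30-second window; times are in seconds."""
    if now_s - last_move_s < MIN_GAP_S:
        return False                 # respect the 30-second move interval
    return max(mean_tx, mean_rx) > SATURATION

# Example: uplink at 82 percent mean send utilization, last move 45s ago.
print(may_rebind(0.82, 0.40, now_s=100.0, last_move_s=55.0))  # True
```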
Configuring NetIOC
NetIOC is configured through the vSphere Client in the Resource Allocation tab of the vDS, within the Home > Inventory > Networking panel. NetIOC is enabled by clicking Properties... on the right side of the panel and then checking Enable network I/O control on this vDS in the pop-up box. The Limits and Shares for each traffic type are configured by right-clicking the traffic type (for example, Virtual Machine Traffic) and selecting Edit Settings... This brings up a Network Resource Pool Settings dialog box in which you can select the Limits and Shares values for that traffic type.
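These settings can also be inspected and enabled programmatically through the vSphere API. The sketch below uses the open-source pyVmomi SDK and assumes a vim.dvs.VmwareDistributedVirtualSwitch object already looked up from an established session; adjusting shares or limits would go through the UpdateNetworkResourcePool method, omitted here for brevity:

```python
from pyVmomi import vim  # assumes an established pyVmomi session

def enable_netioc(dvs):
    # Equivalent to checking "Enable network I/O control on this vDS".
    dvs.EnableNetworkResourceManagement(enable=True)

def show_resource_pools(dvs):
    # The predefined pools (vMotion, iSCSI, FT logging, management, NFS,
    # virtual machine traffic) are listed under networkResourcePool.
    for pool in dvs.networkResourcePool:
        alloc = pool.allocationInfo
        print(pool.key, "limit:", alloc.limit, "shares:", alloc.shares.shares)
```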
NetIOC Usage
Unlike limits, which are specified in absolute units of Mbps, shares are used to specify the relative importance of flows. Shares are specified in abstract units with a value ranging from 1 to 100. In this section, we provide an example that describes the usage of shares.
Figure 6. NetIOC shares usage example
Figure 6 highlights the following characteristics of shares:
In the absence of any other traffic, a particular traffic flow gets 100 percent of the available bandwidth, even if it was configured with only 25 shares
During periods of contention, the bandwidth is divided among the traffic flows based on their relative shares
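As a numeric check of the second point, a two-flow contention case on a 10Gbps uplink works out as follows (the share values are hypothetical):

```python
# Two flows saturating one 10Gbps dvUplink, with 75 and 25 shares.
link_gbps = 10
shares = {"vm_traffic": 75, "vmotion": 25}
total = sum(shares.values())
for flow, s in shares.items():
    print(f"{flow}: {link_gbps * s / total:.1f} Gbps")
# vm_traffic: 7.5 Gbps, vmotion: 2.5 Gbps. If vMotion later goes idle,
# the work-conserving scheduler lets VM traffic use the full 10Gbps.
```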
NetIOC Performance
In this section, we describe in detail the test-bed configuration, the workloads used to generate the network traffic flows, and the test results.
Test Configuration
In our test configuration, we used an ESX cluster that comprised two Dell PowerEdge R610 servers running the GA release of ESX 4.1. Each of the servers was configured with dual-socket, quad-core 2.27GHz Intel Xeon L5520 processors, 96GB of RAM, and a 10GbE Intel Oplin NIC. The following figure depicts the hardware configuration used in our tests. The complete hardware details are provided in Appendix A.
Figure 7. Physical Hardware Setup Used in the Tests
In our test configuration, we used a single vDS that spanned both vSphere hosts. We configured the vDS with a single dvUplink (dvUplink1). The 10GbE physical NIC port on each of the two vSphere hosts was mapped to dvUplink1. We also configured the vDS with four DV Port Groups; using four distinct DV Port Groups enabled us to easily track the network bandwidth usage of the different traffic flows. As shown in Figure 8, on both vSphere hosts, the virtual network adapters (vNICs) of all the virtual machines used for virtual machine traffic, and the VMkernel interfaces (vmknics) used for vMotion, NFS and VMware FT logging, were configured to use the same 10GbE physical network adapter through the vDS interface.
Figure 8. vDS Configuration Used in the Tests
Figure 9. Setup for the Test Scenario 1
At first, we measured the bandwidth requirements of the SPECweb2005 virtual machine traffic and vMotion traffic flows in isolation. The bandwidth usage of the virtual machine traffic while running 17,000 SPECweb2005 user sessions was a little more than 7Gbps during the steady-state interval of the benchmark. The peak network bandwidth usage of the vMotion traffic flow used in our tests was measured at more than 8Gbps. Thus, if both traffic flows used the same physical resources, the aggregate bandwidth requirements would certainly exceed the 10GbE interface capacity. In the test scenario, during the steady-state period of the SPECweb2005 benchmark, we initiated a vMotion traffic flow, which resulted in both the vMotion traffic and the virtual machine traffic flows contending for the same physical 10GbE link.

Figure 10 shows the performance of the SPECweb2005 workload in a virtualized environment without NetIOC. The graph plots the number of SPECweb2005 user sessions that meet the QoS requirements (Time Good) at a given time. In this graph, the first dip corresponds to the start of the steady-state interval of the SPECweb2005 benchmark, when the statistics are cleared. The second dip corresponds to the loss of QoS due to vMotion traffic competing for the same physical network resources.
Figure 10. SPECweb2005 Performance without NetIOC
We note that when we repeated the same test scenario several times, the loss of performance shown in the graph varied, possibly due to the nondeterministic nature of vMotion traffic. Nevertheless, these tests clearly demonstrate that the lack of any network resource management controls results in a loss of both the performance and the predictability required to guarantee the SLAs demanded by critical traffic flows.
Figure 11 shows the performance of the SPECweb2005 workload in a virtualized environment with NetIOC controls in place. We configured the virtual machine traffic with twice the number of shares configured for vMotion traffic. In other words, we ensured that the virtual machine traffic had twice the priority of vMotion traffic when both traffic flows competed for the same physical network resources. Our tests revealed that although the duration of the vMotion doubled due to the controls enforced by NetIOC, the SPECweb2005 performance, as shown in Figure 11, was unperturbed by the vMotion traffic.
Figure 11. SPECweb2005 Performance with NetIOC
Test Scenario 2: Using Four Traffic Flows (NFS Traffic, Virtual Machine Traffic, VMware FT Traffic and vMotion Traffic)
In this test scenario, we chose a very realistic customer deployment scenario that featured fault-tolerant Web servers. A recent VMware customer survey found that Web servers top the list of popular applications used in conjunction with the VMware FT feature. This is no coincidence, because fault-tolerant Web servers provide some compelling capabilities that are not available in typical Web server-farm deployments, which use load balancers to redirect user requests when a Web server goes down. Such load-balancer-based solutions may not be the most customer-friendly for Web sites that provide very large downloads, such as driver updates and documentation. As an example, consider the failure of a Web server while a user is downloading a large user manual. In a load-balancer-based Web-farm deployment, this will cause the user request to fail (or time out), and the user will need to resubmit the request. On the other hand, in a VMware FT-enabled Web server environment, the user will not experience such a failure, because a secondary hypervisor has full information on the pending I/O operations from the failed primary virtual machine and commits all the pending I/O. Refer to VMware vSphere 4 Fault Tolerance: Architecture and Performance for more information on VMware FT.
As shown in Figure 12, our test bed was configured such that all the traffic flows used in the test mix contended for the same network resources. The complete experimental setup details for these tests are provided in Appendix B.
Figure 12. Setup for the Test Scenario 2
The test mix comprised:
Two VMware FT-enabled Web server virtual machines serving SPECweb2005 benchmark requests (generating virtual machine traffic and VMware FT logging traffic)
One virtual machine (VM3) accessing an NFS store (generating NFS traffic)
One virtual machine (VM4) running a SPECjbb2005 workload (used to generate vMotion traffic)
At first, we measured the network bandwidth usage of all four traffic flows in isolation. Table 2 describes the network bandwidth usage.
Table 2. Network Bandwidth Usage of the Four Traffic Flows Used in the Test Environment
The goal was to evaluate the latencies of critical traffic flows, including VMware FT and NFS traffic, in a virtualized environment with and without NetIOC controls when four traffic flows contended for the same physical network resources. The test scenario had three phases:
Phase 1: The SPECweb2005 workload in the two VMware FT-enabled virtual machines was in the steady state.
Phase 2: The NFS workload in VM3 became active. The SPECweb2005 workload in the other two virtual machines continued to be active.
Phase 3: VM4, running the SPECjbb2005 workload, was subject to vMotion while the NFS and SPECweb2005 workloads remained active in the other virtual machines.
The following figures depict the performance of the different traffic flows in the absence of NetIOC. Let us first consider the performance of the VMware FT-enabled Web server virtual machines. The graph plots the number of SPECweb2005 user sessions that meet the QoS requirements (Time Good) at a given time. In this graph, the first dip corresponds to the start of the steady-state interval of the SPECweb2005 benchmark, when the statistics are cleared. The second dip corresponds to the loss of QoS due to multiple traffic flows competing for the same physical network resources. The number of SPECweb2005 user sessions that met the QoS requirements dropped by about 67 percent during the period of contention.

We note that the SPECweb2005 performance degradation in the VMware FT environment was much more severe in the absence of NetIOC than what we observed in the first test scenario. This is because in a VMware FT environment the primary and secondary virtual machines run in vLockstep, so the network link between the primary and secondary ESX hosts plays a critical role in performance. During periods of heavy contention on the network link, the primary virtual machine will make little or no forward progress.
Figure 13. SPECweb2005 Performance in a VMware FT Environment without NetIOC
Figure 14. NFS Access Latency without NetIOC
Similarly, we noticed a significant jump in the NFS store access latencies. As shown in Figure 14, the maximum I/O latency reported by IOmeter increased from a mere 162ms to 2,166ms (a factor of 13).
Figure 15. Network Bandwidth Usage of Traffic Flows in Different Phases without NetIOC
A detailed explanation of the bandwidth usage in each phase follows:
Phase 1: In this phase, the VMware FT-enabled VM1 and VM2 were active, and the SPECweb2005 benchmark was in its steady-state interval. The aggregate network bandwidth usage of the virtual machine traffic flow and the VMware FT logging traffic flows was less than 4Gbps.
Phase 2: At the beginning of this phase, VM3 became active and added an NFS traffic flow to the test mix. This resulted in three traffic flows competing for the network resources. Even so, there was no difference in the QoS, as the aggregate bandwidth usage was still less than 5Gbps.
Phase 3: The addition of the vMotion traffic flow to the test mix resulted in the aggregate bandwidth requirements of the four traffic flows exceeding the capacity of the physical 10GbE link. Lacking any control mechanism to manage access to the 10GbE bandwidth, vSphere shared the bandwidth among all the traffic flows. Critical traffic flows, including the VMware FT and NFS traffic flows, got the same treatment as the vMotion traffic flow, which resulted in a significant drop in performance.
The performance requirements of the different traffic flows must be considered when putting network I/O resource controls in place. In general, the bandwidth requirement of the VMware FT logging traffic is expected to be much smaller than the requirements of the other traffic flows. However, given its impact on performance, we configured VMware FT logging traffic with the highest priority over the other traffic flows. We also ensured that the NFS traffic and virtual machine traffic flows had higher priority than vMotion traffic. Figure 16 shows the shares assigned to the different traffic flows. Figure 17 shows the network bandwidth usage of the different traffic flows in the different phases. As shown in the figure, thanks to the network I/O resource controls, vSphere was able to enforce priority among the traffic flows, so the bandwidth usage of the critical traffic flows remained unperturbed during the period of contention.
Figure 17. Network Bandwidth Usage of Traffic Flows in Different Phases with NetIOC
The following figures show the performance of the SPECweb2005 and NFS workloads in a VMware FT-enabled virtualized environment with NetIOC in place. As shown in the figures, vSphere was able to ensure service-level guarantees to both workloads in all the phases.
Figure 18. SPECweb2005 Performance in FT Environment with NetIOC
The maximum I/O latency reported by IOmeter remained unchanged at 162ms in all the phases, and the SPECweb2005 performance remained unaffected by the network bandwidth usage spike caused by the vMotion traffic flow.
Test Scenario 3: Using Multiple vMotion Traffic Flows
In this final test scenario, we show how NetIOC can be used in combination with Traffic Shaper to provide a comprehensive network convergence solution in a virtualized datacenter environment. While NetIOC enables you to limit vMotion traffic initiated from a vSphere host, it cannot prevent performance loss when multiple vMotion traffic flows initiated on different vSphere hosts converge onto a single vSphere host and possibly overwhelm it. We will show how a solution based on NetIOC and Traffic Shaper can prevent such an event, unlikely as it is.

vSphere 4.0 introduced support for traffic shaping, providing some rudimentary controls on network bandwidth usage; for instance, it provided bandwidth usage controls only at the port level and did not enforce prioritization among traffic flows. These controls are provided for both egress and ingress traffic. In a vSphere deployment, egress and ingress are defined with respect to a vDS (or vSS): traffic going into a vDS is ingress/input, and traffic leaving a vDS is egress/output.
So, from the perspective of a vNIC port (or vmknic port), network traffic from the physical network (or vmnic) will ingress into the vDS and egress from the vDS to the vNIC. Similarly, traffic from the vNIC will ingress into the vDS and egress to the physical network (or vmnic). In other words, ingress and egress are to be interpreted as follows:
Ingress traffic: traffic from a vNIC (or vmknic) to the vDS
Egress traffic: traffic from the vDS to the vNIC (or vmknic)
In this final test scenario, we added a third vSphere host to the same cluster that we used in our previous tests. As shown in Figure 20, the cluster used for this test comprised three vSphere hosts. We initiated vMotion traffic (peak network bandwidth usage of 9Gbps) from vSphere Host 2, and vMotion traffic (peak network bandwidth usage close to 1Gbps) from vSphere Host 3. Both of these traffic flows converged onto the same destination vSphere host (Host 1). Below, we describe the results of the three test configurations.
Without NetIOC
As a point of reference, we first disabled NetIOC in our test configuration. Our tests indicated that, without any controls, the receive link on Host 1 was fully saturated by the multiple vMotion traffic flows, whose aggregate network bandwidth usage exceeded the link capacity.
With NetIOC
As shown in Figure 21, we used NetIOC to enforce limits on vMotion traffic.
Figure 21. NetIOC Settings to Enforce Limits on vMotion Traffic Flow
Figure 22 shows the Rx network bandwidth usage on Host 1 (with NetIOC controls in place) as multiple vMotion traffic flows converge on it.
Figure 22. Rx Network Bandwidth Usage on Host 1 with Multiple vMotions (with NetIOC On)
A detailed explanation of the bandwidth usage in each phase follows:
Phase 1: In this phase, vMotion from Host 3 to Host 1 was active. Due to the 1GbE link capacity on Host 3, the bandwidth usage of
With NetIOC and Traffic Shaper
Figure 24 shows the Rx network bandwidth usage on Host 1 (with both NetIOC and Traffic Shaper controls in place) as multiple vMotion traffic flows converge on it. A detailed explanation of the bandwidth usage in each phase follows:
Phase 1: In this phase, vMotion from Host 3 to Host 1 was active. Due to the 1GbE link capacity on Host 3, the bandwidth usage of
Lack of NetIOC can result in an unpredictable loss in performance of critical traffic flows during periods of contention. NetIOC can effectively provide service-level guarantees to critical traffic flows: our test results showed that NetIOC eliminated a performance drop of as much as 67 percent observed in an unmanaged scenario. NetIOC in combination with Traffic Shaper provides a comprehensive network convergence solution, enabling capabilities that are not available with any of the hardware solutions in the market today.
13. Storage I/O Control Technical Overview and Considerations for Deployment
What's new in vSphere 5.0:
vSphere Storage I/O Control now supports NFS
Storage quality-of-service priorities can be set per virtual machine for better access to storage resources for high-priority applications
Storage I/O Control (SIOC) provides storage I/O performance isolation for virtual machines, enabling VMware vSphere administrators to comfortably run important workloads in a highly consolidated virtualized storage environment. It protects all virtual machines from the undue negative performance impact of misbehaving, I/O-heavy virtual machines, often known as the noisy-neighbor problem. Furthermore, SIOC can protect the service level of critical virtual machines by giving them preferential I/O resource allocation during periods of congestion. SIOC achieves these benefits by extending the constructs of shares and limits, used extensively for CPU and memory, to manage the allocation of storage I/O resources. SIOC improves upon the previous host-level I/O scheduler by detecting and responding to congestion occurring at the array, and by enforcing share-based allocation of I/O resources across all virtual machines and hosts accessing a datastore.
With SIOC, vSphere administrators can mitigate the performance loss of critical workloads due to high congestion and storage latency during peak load periods. The use of SIOC produces better and more predictable performance behavior for workloads during periods of congestion.
Consider the case in which the I/O shares for the virtual disks (VMDKs) of several virtual machines are set to different values: it is the local scheduler that prioritizes the I/O traffic, and only when the local HBA becomes congested. This host-level capability existed in ESX Server for several years prior to vSphere 4.1. It is this local, host-level disk scheduler that also enforces the limits set for a given virtual machine disk. If a limit is set for a given VMDK, the I/O will be controlled by the local disk scheduler so as not to exceed the defined amount of I/O per second.
vSphere 4.1 added two key capabilities: (1) the enforcement of I/O prioritization across all ESX servers that share a common datastore, and (2) the detection of array-side bottlenecks. These are accomplished by way of a datastore-wide distributed disk scheduler that uses per-virtual-machine I/O shares to determine whether device queues need to be throttled back on a given ESX server to allow a higher-priority workload to get better performance.
The datastore-wide disk scheduler totals the disk shares for all the VMDKs that a virtual machine has on the given datastore. The scheduler then calculates the percentage of shares the virtual machine holds, compared to the total number of shares of all the virtual machines running on the datastore. This percentage of shares is displayed in the list of details shown in the Virtual Machines tab for each datastore, as seen in Figure 2.
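The entitlement computation just described is a simple proportion; a minimal sketch follows (the virtual machine names and share values are hypothetical, chosen to match the example in Figure 3):

```python
# Sum of the disk shares of each VM's VMDKs on one datastore.
vm_shares = {"VM A": 1500, "VM C": 1000}
total = sum(vm_shares.values())

# Each running VM's entitlement is its fraction of the total shares.
for vm, shares in vm_shares.items():
    print(f"{vm}: {100 * shares / total:.0f}% of the datastore's I/O shares")
# VM A: 60%, VM C: 40%
```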
As described before, SIOC engages only after a certain device-level latency is detected on the datastore. Once engaged, it begins to assign fewer I/O queue slots to virtual machines with lower shares and more I/O queue slots to virtual machines with higher shares. It throttles back the I/O of the lower-priority virtual machines, those with fewer shares, in exchange for the higher-priority virtual machines getting more access to issue I/O traffic. However, it is important to understand that the maximum number of I/O queue slots that can be used by the virtual machines on a given host cannot exceed the maximum device-queue depth for the device queue of that ESX host. The ESX maximum queue depth varies by HBA model; the maximum value is typically in the range of 32 to 128. The lowest that SIOC can reduce the device queue depth to is 4.
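The slot assignment can be modeled as a share-proportional split of the array-queue budget, clamped between the SIOC floor of 4 and the HBA maximum. This is an illustrative model, not the ESX scheduler; the 30-slot budget and the share values mirror the Figure 3 example:

```python
SIOC_MIN_QDEPTH = 4     # SIOC never reduces a device queue below 4
HBA_MAX_QDEPTH = 32     # typical HBA maximum (varies by model, 32 to 128)

def assign_queue_slots(array_slots, vm_shares):
    """Split an array-queue budget across VMs in proportion to their
    shares, clamped to the device-queue bounds described above."""
    total = sum(vm_shares.values())
    slots = {}
    for vm, shares in vm_shares.items():
        raw = array_slots * shares / total
        slots[vm] = max(SIOC_MIN_QDEPTH, min(HBA_MAX_QDEPTH, round(raw)))
    return slots

# 30 array queue slots, shares as in the Figure 3 example:
print(assign_queue_slots(30, {"VM A": 1500, "VM C": 1000}))
# -> {'VM A': 18, 'VM C': 12}: VM A keeps its 18 in-flight I/Os
```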
Figure 3a shows that, without SIOC, a virtual machine with a lower number of shares, VM C, may get a larger percentage of the available storage-array device-queue slots, and thus greater storage array performance, while a virtual machine with higher I/O shares, VM A, gets less than its fair share and reduced storage array performance. However, with SIOC engaged on that datastore, as in Figure 3b, the lower-priority virtual machine, running by itself on a separate host, will be assigned a reduced number of I/O queue slots. That will result in fewer storage array queue slots being used and a reduction in average device latency. The reduction in average device latency gives VM A and VM B higher storage performance, as the same number of I/Os that they previously issued now complete faster due to the reduced latency of each of those I/Os.
For instance, assume that VM A was using 18 I/O slots, as shown in Figure 3a. Without SIOC, the storage array latency could be unbounded, and the I/O workloads being performed by the lower-priority VM C could cause a high storage device latency of, say, 40ms. In this example, VM A would have 18 I/Os at 40ms worth of storage performance. Once enabled, SIOC controls the latency at the configured congestion threshold, say 30ms. SIOC determines the number of storage array queue slots that can be used while still maintaining an average device latency below the SIOC congestion threshold. Although SIOC does not directly manage the storage array queue, it is able to control it indirectly by managing the ESX device queues that feed into it. As shown in Figure 3b, SIOC has determined that 30 host-side storage queue slots can be used while still maintaining the desired average device latency. SIOC then distributes those storage array queue slots to the various virtual machine workloads according to their priorities.
The net effect in this example is that VM C is throttled back to use only its correct relative share of the storage array. VM A, entitled to 60 percent of the queue slots (1500/2500 = 60 percent), is still able to issue the same 18 I/Os, but at a reduced 30ms latency. SIOC provides VM A greater storage performance by controlling VM C and ensuring that it uses only its appropriate allocation of the total storage resources.
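In Little's-law terms, the benefit to VM A is easy to quantify as a back-of-the-envelope check (not a measured result):

```python
# Little's law: throughput = outstanding I/Os / latency per I/O.
outstanding_ios = 18
for label, latency_s in (("without SIOC", 0.040), ("with SIOC", 0.030)):
    print(f"{label}: {outstanding_ios / latency_s:.0f} IOPS")
# without SIOC: 450 IOPS; with SIOC: 600 IOPS from the same 18 queue slots
```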
By throttling the ESX device-queue depths in proportion to the priorities of the virtual machines that are using them, SIOC is able to control storage congestion at the storage array and distribute storage array performance appropriately.
Figure 3. SIOC Device-Queue Management with Prioritized Disk Shares
SIOC provides isolation and prioritized distribution of storage resources even when vSphere administrators have not manually set individual disk-share priorities on each VMDK of each virtual machine. SIOC also protects virtual machines that are running on more highly consolidated ESX servers.
In Figures 4a and 4b, all virtual machine disks have default (1000 shares), or equal, disk shares. Without SIOC, VM A and VM B are penalized and not given equal access to storage resources simply because they are running together on the same ESX server and sharing the same ESX device queue, whereas VM C, running on a less consolidated ESX host, is given unfair preference for storage resources. Even administrators who do not wish to set VMDK disk shares individually can benefit from this feature. SIOC gives these vSphere administrators the ability to enable storage isolation for all virtual machines accessing a datastore by simply checking a single check box at the datastore level. This new storage management capability offered by SIOC allows vSphere administrators to run more highly consolidated virtual environments by preventing imbalances of storage resource allocation during times of storage contention.
Figure 4. SIOC Device-Queue Management with Equal Disk Shares
In these examples, SIOC is able to fully manage the storage array queue by throttling the ESX host device queues. This is possible because all the workloads impacting the storage array queue are coming from the ESX hosts and are under SIOC's control. However, SIOC is able to provide storage workload isolation and prioritization even in scenarios in which external workloads, not under SIOC's control, are competing with those that it controls. In this scenario, SIOC will first automatically detect the situation and will then increase the number of device-queue slots it makes available to the virtual machine workloads, so that they can compete more fairly for total storage resources against the external workloads. Using this approach, SIOC is able to maintain a balance between workload isolation/prioritization and storage I/O throughput even when it cannot directly control or influence the external workload. This behavior continues as long as the external workload persists; SIOC resumes normal operation once it stops detecting the external workload.
Enabling Storage I/O Control
Because SIOC is an attribute of a datastore, it is set under the properties of a specific datastore. By default, SIOC is not enabled on a datastore. The default latency value at which SIOC engages is 30ms, but this value can be modified by selecting the Advanced option where SIOC is enabled in the vCenter interface, as shown in Figure 5.
Figure 5. Datastore Properties: SIOC Enablement and Congestion Threshold Setting
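For scripted rollouts, the same setting can be applied through the vSphere API. The pyVmomi sketch below assumes an established session (si) and a datastore object already looked up from the inventory; the threshold value is a placeholder:

```python
from pyVmomi import vim

def enable_sioc(si, datastore, threshold_ms=30):
    # StorageIORMConfigSpec carries the enablement flag and the
    # congestion threshold in milliseconds.
    spec = vim.StorageResourceManager.IORMConfigSpec(
        enabled=True, congestionThreshold=threshold_ms)
    srm = si.content.storageResourceManager
    # Returns a task; monitor it for completion before relying on SIOC.
    return srm.ConfigureDatastoreIORM_Task(datastore=datastore, spec=spec)
```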
SIOC can be used with any FC, iSCSI or locally attached block storage device that is supported with vSphere 4.1. Review the vSphere 4.1 Hardware Compatibility List (http://www.vmware.com/go/hcl) for the entire list of supported storage devices. SIOC is supported with FC and iSCSI storage devices that have automated tiered storage capabilities. However, when using SIOC with automated tiered storage, the SIOC congestion threshold must be set appropriately to make sure the storage device's automated tiering capabilities are not impacted by SIOC. At this time, SIOC is not supported with NFS storage devices or with Raw Device Mapping (RDM) virtual disks. SIOC is also not supported with datastores that have multiple extents or that are managed by multiple vCenter Servers. For complete step-by-step instructions on how to enable SIOC, how to change the default latency threshold for a datastore, and other limitations, consult the documentation or see Managing Storage I/O Resources (Chapter 4) in the vSphere 4.1 Resource Management Guide (http://www.vmware.com/pdf/vsphere4/r41/vsp_41_resource_mgmt.pdf).
Considerations for Deploying Storage I/O Control
Configuring Disk Shares
Disk shares specify the relative priority a virtual machine has for a given storage resource. When you assign disk shares to a virtual disk/virtual machine, you specify the priority of that virtual machine's access to storage resources relative to other powered-on virtual machines. Disk shares in vSphere 4.1 can be leveraged both at a local, per-ESX-host level and, when SIOC is enabled and actively prioritizing storage resources, at a datastore level. Disk shares are set by selecting Edit Settings for a virtual machine and are set on each VMDK, as seen in Figure 6.
When SIOC is not enabled, disk shares and the relative priority they specify are enforced only at the local ESX host level, and then only when the local HBAs are saturated. Virtual machines running on the same ESX host will be prioritized relative to other virtual machines on the same host, but not relative to virtual machines running on other ESX hosts. When SIOC is enabled and actively throttling the ESX hosts to control storage latencies, disk shares and relative priorities are enforced across all the ESX servers that access the SIOC-controlled datastore. So a virtual machine running on one ESX host will have access to storage resources based on the number of disk shares it has, compared to the total number of disk shares in use on the datastore by all virtual machines across all ESX hosts in the shared storage environment.
If a virtual machine does not fully use its allocation of I/O access, the extra I/O slots are redistributed proportionally to the other virtual machines that are actively issuing I/O requests on the datastore.
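That proportional redistribution is the storage analogue of the work-conserving network scheduler sketched earlier. Here is a minimal single-pass model (the VM names, shares and demands are hypothetical):

```python
def redistribute(total_slots, vm_shares, vm_demand):
    """Give each VM its share-proportional slice of the queue slots,
    then hand unused slots to VMs still actively issuing I/O."""
    total_shares = sum(vm_shares.values())
    used = {}
    for vm, shares in vm_shares.items():
        entitled = total_slots * shares / total_shares
        used[vm] = min(entitled, vm_demand[vm])
    spare = total_slots - sum(used.values())
    hungry = {vm: s for vm, s in vm_shares.items()
              if vm_demand[vm] > used[vm]}
    for vm, s in hungry.items():     # single redistribution pass
        used[vm] += spare * s / sum(hungry.values())
    return used

# VM B is mostly idle; its unused slots flow to VM A and VM C.
print(redistribute(32, {"A": 2000, "B": 1000, "C": 1000},
                   {"A": 40, "B": 2, "C": 40}))
# -> roughly {'A': 20.0, 'B': 2, 'C': 10.0}
```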
Figure 6. Virtual Machine Properties
Disk Shares and IOP Limits
As part of vSphere 4.1, I/O-per-second (IOPS) limits can be set at a per-VMDK level to further manage and prioritize virtual machine workloads. Limits (expressed in terms of IOPS) are implemented at the local-disk scheduler level and are always enforced, regardless of whether SIOC is enabled.
Configuring the Storage I/O Control Congestion Latency Value
SIOC is designed to engage and enforce storage I/O shares only when the storage resource becomes contended. This is very similar to CPU scheduling, in which shares are enforced only when the resource is contended. To determine when a storage device is contended, SIOC uses a congestion-threshold latency value that vSphere administrators can specify. The default congestion-threshold latency in vSphere 4.1, 30ms, is a conservative value that should work well for most users. The SIOC congestion-threshold value is configurable, so vSphere administrators have the opportunity to maximize the benefits of SIOC as suited to their own virtual environment and storage-management preferences. This section discusses the considerations and recommendations for changing this key parameter.
The SIOC threshold represents a balance between (1) isolation and prioritized access to the storage resource at lower latencies and (2) higher throughput. When the SIOC congestion threshold is set low, SIOC can begin prioritizing storage access earlier and throttle storage workloads more aggressively in order to maintain a datastore-wide latency below the congestion threshold. The more aggressive throttling needed to maintain a lower latency might reduce the overall storage throughput. When the congestion threshold is set higher, SIOC will not engage and begin prioritizing resources among virtual machines until the higher latency is reached. With a higher SIOC congestion latency, SIOC does not need to throttle storage workloads as much in order to maintain the storage latency below the congestion threshold, which may allow for higher overall storage throughput.
The default congestion threshold has been set to minimize the impact of throttling on storage throughput while still providing reasonably low storage latency and isolation for high-priority virtual machines. In most cases it is not necessary to modify the storage congestion threshold from its default value. However, a user may decide to modify the value depending on the type and speed of the storage device, the characteristics of the workloads in the virtual environment, and the preferred balance between workload isolation/prioritization and workload throughput. Because various storage devices have different latency characteristics, users may need to modify the congestion threshold depending on their storage type. See Table 1 to determine the recommended range of values for your storage-device type.
Table 1. SIOC Congestion Threshold Recommendations
The congestion threshold may also need to be adjusted when using automated tiered storage devices. These are systems that contain two or more types of storage media and automatically and transparently migrate data between the storage types in order to optimize I/O performance. These systems typically try to keep the most frequently accessed, or hot, data on faster storage such as SSD, and less frequently accessed, or cold, data on slower media such as SAS or FC disks. This means that the type of storage media backing a particular LUN can change over time.
For full-LUN auto-tiering storage devices, in which the entire LUN is migrated between different storage tiers, use the recommended value or range for the slowest tier of storage in the device. For example, in a full-LUN auto-tiering storage device that contains SSD and Fibre Channel disks, use the congestion threshold value that is recommended for Fibre Channel. With sub-LUN or block-level auto-tiering storage, in which individual storage blocks inside a LUN are migrated between storage tiers, combine the recommended congestion threshold values/ranges for each storage type in the auto-tiering storage device. For example, in a sub-LUN/block-level auto-tiering storage device that contains an SSD storage tier and a Fibre Channel storage tier, use an SIOC congestion threshold value in the range of 10ms to 30ms.
The exact SIOC congestion-threshold value to use is based on your individual storage-device characteristics and your preference for isolation (a smaller SIOC congestion-threshold value) or throughput (a larger SIOC congestion-threshold value). For example, in the SSD-FC scenario, the more SSD storage you have in the array, the more your storage device's characteristics will match those of the SSD storage type, and thus the closer your threshold should be to the SSD recommended value of 10ms, the low end of the combined SSD-FC range. Customers can use the midpoint of the range as a conservative congestion threshold value that balances the preference for isolation against the preference for throughput. In the SSD-FC example, with its range of 10ms to 30ms, the conservative congestion threshold value would be 20ms.
When modifying the SIOC congestion threshold, keep in mind that the SIOC latency is a normalized latency metric, calculated and normalized for I/O size and the aggregate number of IOPS across all the storage workloads accessing the datastore. SIOC uses a normalized latency to take into consideration that not all storage workloads are the same. Some storage workloads may issue larger I/O operations that naturally result in longer device latencies to service those larger requests. Normalizing the storage-workload latencies allows SIOC to compare and prioritize workloads more accurately by bringing them all to a common measurement.
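The exact normalization that ESX applies is internal, but the idea can be illustrated with a toy model that discounts each latency sample by the I/O size relative to a reference size (all constants here are hypothetical):

```python
REF_IO_KB = 8.0   # hypothetical reference I/O size for normalization

def normalized_latency(samples):
    """samples: (latency_ms, io_size_kb) pairs from all workloads on a
    datastore. Discount each latency by how much larger than the
    reference size the I/O was, then average across the samples."""
    scaled = [lat / max(1.0, size / REF_IO_KB) for lat, size in samples]
    return sum(scaled) / len(scaled)

# A 256KB transfer at 35ms normalizes far below an 8KB one at 35ms.
print(normalized_latency([(35.0, 256.0)]))  # ~1.1ms after discounting
print(normalized_latency([(35.0, 8.0)]))    # 35.0ms
```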
Because the SIOC value is normalized, the actual observed latency as seen from the guest OS inside the virtual machine, or from an individual ESX host, may differ from the calculated SIOC-normalized latency for the datastore.
Monitoring Storage I/O Control Effects
SIOC includes new metrics inside vCenter that allow users to observe SIOC's actions and latency measurements. There are two new SIOC metrics in vCenter: SIOC normalized latency and SIOC aggregated IOPS. The SIOC normalized latency is the value that SIOC calculates per datastore and compares with the SIOC congestion latency threshold to determine what actions, if any, to take. SIOC calculates these metrics every four seconds, and they are refreshed in the vCenter display every 20 seconds. These metrics can be viewed on the datastore performance screen inside vCenter, as seen in Figure 7.
Additionally, vCenter reports the device-queue depths for each ESX host. The ESX hosts' device-queue depth metrics can be reviewed to determine what actions SIOC is taking on individual ESX hosts and their device queues in order to maintain the datastore-wide SIOC latency under the set congestion threshold.
Figure 7. vCenter Datastore Performance and SIOC Metrics
SIOC detects the moment when external workloads, not under SIOC's control, may be impacting the virtual environment's storage resources. When SIOC detects an external workload, it will trigger a Non-VI workload detected informational alert in vCenter. In most cases, this alert is purely informational and requires no action on the part of the vSphere administrator. However, the alert may be an indicator of an incorrectly configured SIOC environment. vSphere administrators should verify that they are running a supported SIOC configuration and that all datastores that utilize the same disk spindles have SIOC enabled with identical SIOC congestion-threshold values. The alert might also be triggered by some backup products and other administrative workloads that bypass the ESX host and directly access the datastore in order to accomplish their tasks. SIOC is supported in these configurations, and the alert can be safely ignored for these products. Refer to VMware KB article 1020651 for more details on the Non-VI workload detected alert.
Benefits of Using Storage I/O Control
SIOC enables improved I/O resource management under a multitude of conditions and provides peace of mind when running business-critical, I/O-intensive applications in a shared VMware virtualization environment.
Provides performance protection
A common concern in any shared-resource environment is that one consumer may get far more than its fair share of the resource and adversely impact the performance of the other users that share it. SIOC provides the ability, at the datastore level, to support multitenant environments that share a datastore by enabling service-level protections during periods of congestion. SIOC prevents a single virtual machine from monopolizing the I/O throughput of a datastore, even when the virtual machines have default (equal-value) I/O shares set.
Detects and manages bottlenecks at the array only when congestion exists
SIOC detects a bottleneck at the datastore level and manages I/O queue slot distribution across the ESX servers that share the datastore. SIOC expands I/O resource control beyond the bounds of a single ESX server to work across all ESX servers that share a datastore. When SIOC is enabled on a datastore and no congestion exists at the device level, it will not be engaged in managing I/O resources and will have no effect on I/O latency or throughput. In an optimized and well-configured environment, SIOC may engage only at certain peak periods during the day. During these times of congestion, and in the presence of external or non-SIOC-controlled workloads, SIOC strikes a balance between aggregate throughput and the enforcement of virtual machine I/O shares.
SIOC also helps vSphere administrators understand when more I/O throughput (device capacity) is needed. If SIOC is engaged for significant periods of time during the day, it raises the question of whether a change in the storage configuration is needed. In this case, an administrator might consider either adding more I/O capacity or using VMware Storage vMotion to migrate I/O-intensive virtual machines to an alternate datastore.
Enables higher levels of consolidation with less storage expense
SIOC enables vSphere administrators to maximize their storage investments by running more virtual machines on their existing storage infrastructure, with confidence that periodic peaks of high I/O activity will be controlled. Without SIOC, administrators will often overprovision their storage to avoid the latency issues that pop up during peak periods of storage activity. With SIOC, administrators can comfortably run more virtual machines on a single datastore, confident that the storage I/O will be controlled and managed at the device level.
Leveraging SIOC can reduce storage costs, because overprovisioning a storage environment to the point that no contention ever occurs can be prohibitively expensive. Alternatively, the cost of storage may drop dramatically by leveraging SIOC to manage the I/O queue slot allocations, ensuring proportional fairness and prioritization of virtual machines based on their I/O shares.
Conclusion
SIOC offers I/O prioritization for virtual machines accessing shared storage resources. It allows vSphere administrators to align high-priority virtual machine traffic with better-performing, lower-latency storage, as compared to lower-priority virtual machines. It monitors datastore latency and engages when a preset congestion threshold has been exceeded. SIOC gives vSphere administrators a new means of managing their VMware virtualized environments by allowing quality of service to be expressed for storage workloads. As such, SIOC is a big step forward in the journey toward automated, policy-based management of shared storage resources. SIOC provides the means to better control a consolidated shared-storage resource by providing datastore-wide I/O prioritization, helping to manage traffic on a shared and congested datastore. With the introduction of SIOC in vSphere 4.1, vSphere administrators now have a new tool to help them increase consolidation density with peace of mind, knowing that during periodic peaks of I/O activity, prioritization and proportional fairness will be enforced across all the virtual machines accessing the shared resource.