
Clearing the Confusion – vSphere Virtual Cores, Virtual Sockets, and Virtual CPU (vCPU)
This is a topic that I have been confused on more than once. I would read the help documentation
and the VMware KB article, think I understood it, and then later say "wait, what?"

First, let’s look at a host…

Analyzing Maximum vSphere vCPUs on a Host

In vSphere 5.1 (I'll update this for future versions), you can configure up to 64 vCPUs on
a virtual machine if you have vSphere Enterprise Plus (the limit goes down as the edition of
vSphere is reduced). BUT, you are also limited to assigning no more vCPUs than your physical
server has available as logical CPUs.

If we take a look at one server in my lab, it's a Dell T610 with a single physical CPU socket that
has 4 cores (quad-core) and hyperthreading enabled, which doubles the number of logical
processors presented, for a total of 8 logical CPUs:

vCPU Confusion

What this means is that the maximum number of vCPUs that I could configure for a VM on this
host would be 8. Let’s verify.
If we edit the settings of a VM on that host, we see that we can configure it with 8 virtual
sockets and 1 virtual core per socket, 4 sockets and 2 cores per socket, 2 sockets and 4 cores per
socket, or 1 socket and 8 cores per socket (all of which, if you multiply, total 8):

vCPU Properties
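
If it helps to see the arithmetic, here is a small illustrative Python sketch (nothing VMware-specific, just the multiplication) that lists the socket/core layouts that exactly use up the 8 logical CPUs of this host:

```python
# Illustrative only: list socket/core layouts for a host exposing 8 logical CPUs.
def vcpu_layouts(max_logical_cpus):
    """Yield (virtual_sockets, cores_per_socket) pairs that fit on the host."""
    for sockets in range(1, max_logical_cpus + 1):
        for cores in range(1, max_logical_cpus + 1):
            if sockets * cores <= max_logical_cpus:
                yield sockets, cores

# Dell T610 example: 1 socket x 4 cores x 2 (hyperthreading) = 8 logical CPUs
logical_cpus = 1 * 4 * 2
for sockets, cores in vcpu_layouts(logical_cpus):
    if sockets * cores == logical_cpus:  # show only the layouts that use all 8
        print(f"{sockets} socket(s) x {cores} core(s) per socket = {sockets * cores} vCPUs")
```

Running it prints the same four combinations the edit-settings dialog offers (1 x 8, 2 x 4, 4 x 2, and 8 x 1).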

On another host, a Dell M610, I have 2 physical sockets, 4 cores per socket, with hyperthreading
enabled, which gives me a total of 16 logical processors:

2 Physical Sockets, 4 Cores per Socket, Hyperthreading Enabled = 16 Logical Processors

If I look at a VM on that host (note that these VMs need to be hardware version 8 or above), I
can configure any combination of virtual sockets and cores that totals no more than 16 (16 x 1, 1 x
16, 2 x 8, 8 x 2, 4 x 4, etc.):
16 Maximum Cores

Now that you know the limitations of the physical hosts and hypervisor, let's look at why this
differentiation of virtual sockets vs. virtual cores is available and what you should choose.

The Guest OS Knows the Sockets and Cores

Warning! A very important part of understanding this is that when you configure a vCPU on a
VM, that vCPU is actually a virtual core, not a virtual socket. Also, a vCPU has traditionally
been presented to the guest OS in a VM as a single-core, single-socket processor.

What you might not have thought about is that the guest operating system knows not only the
number of "CPUs" but also the number of sockets and cores it has available. As
Kendrick Coleman shows in his post on vCPU for License Trickery, you can use the CPU-Z
utility to find out how many sockets and cores your virtual machine has.
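
If you don't have CPU-Z handy, the same information is available from WMI inside the guest. Here is a minimal Python sketch that assumes a Windows guest where the legacy wmic utility is still available (on newer builds you may need Get-CimInstance Win32_Processor in PowerShell instead):

```python
# Sketch: count the sockets and cores the Windows guest OS sees, similar to
# what CPU-Z reports. Assumes the legacy "wmic" utility is available.
import subprocess

output = subprocess.check_output(
    ["wmic", "cpu", "get", "DeviceID,NumberOfCores,NumberOfLogicalProcessors"],
    text=True,
)
# One data row per Win32_Processor instance, i.e. per virtual socket.
rows = [line.split() for line in output.splitlines()[1:] if line.strip()]

sockets = len(rows)
cores_per_socket = int(rows[0][1])  # NumberOfCores column of the first socket
print(f"{sockets} socket(s), {cores_per_socket} core(s) per socket")
```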

Does it make any difference for the performance of the applications inside if the OS thinks it has
4 sockets and 2 cores per socket or 1 socket with 8 cores? As far as I can tell, NO (but I
welcome your comments). The guest OS simply schedules the threads from each process onto its
CPU cores, and the VMkernel scheduler in turn places each of those vCPUs onto a logical CPU
of the physical host.
Task Manager Breakdown

If it doesn’t have any effect on performance, why would VMware even offered this option to
specify the number of sockets per core for each VM? The answer is that it’s all related to
software licensing for the OS and applications.

OS and Application Licensing Per Socket

Many (too many) operating systems and applications are licensed "per socket". You might pay
$5,000 per socket for a particular application. Let's say that Windows Server 2003 is limited to
running on "up to 4 CPUs" (or sockets). Say that you had a physical server with 4 quad-core
CPUs, for a total of 16 cores, and then enabled hyperthreading for a total of 32 logical CPUs. If
you configured your VM with 4 "CPUs" (the default of 1 core per virtual socket), as the license
specified, those 4 vCPUs would only run on 4 physical cores. However, if you had installed that
same Windows OS on the same physical server, it would have run on up to 4 sockets but, with
each socket having 4 cores, it would have offered up to 16 cores to Windows (while still not
breaking your end-user license agreement). In other words, you would get to use more cores and
likely receive more throughput.
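
The arithmetic behind that comparison is simple enough to put in a few lines of Python; the 4-socket license cap and the layouts below are just the hypothetical numbers from the example above:

```python
# Illustrative arithmetic: usable cores under a hypothetical "up to 4 sockets" license.
licensed_sockets = 4

layouts = {
    "VM, 4 virtual sockets x 1 core (old default)": (4, 1),
    "Physical install, 4 sockets x 4 cores":        (4, 4),
    "VM, 4 virtual sockets x 4 cores":              (4, 4),
}

for name, (sockets, cores_per_socket) in layouts.items():
    usable_cores = min(sockets, licensed_sockets) * cores_per_socket
    print(f"{name}: {usable_cores} cores usable within the license")
```

Configuring the VM as 4 virtual sockets with 4 cores per socket gets you the same 16 cores as the physical install, while still presenting only 4 sockets to the license check.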

In the end, what you are doing here is gaining granular control over how many virtual sockets
and how many virtual cores per socket are presented to each virtual machine. This way, you can
ensure you get the performance you need without having to buy extra licenses and without
violating your EULA.
A Tale of Two Metrics: Windows CPU or vCenter VM CPU
A not uncommon question from our customers, or even from our own support people, is "Why
does monitoring a Windows system running on VMware report different CPU data than
monitoring the virtual machine from the ESXi host? The ESX monitoring must be wrong!"

For example, here is LogicMonitor graphing the CPU load of a Windows system running as a
virtual machine on ESXi. In this case, the CPU data is gathered via WMI, by querying the
Windows OS:

Here is the same machine at the same time, but this is how ESXi sees the load:

So which view is right? Why do they differ? Which should you pay attention to?

Both views are right. This guest virtual machine was a Windows system with four vCPUs. I
was running HyperPi, set to use 3 CPUs. So from the point of view of Windows (the top graph),
it had three CPUs running at 100%, for an average of about 75% across the whole system, which
is what the top graph shows.

However, from the point of view of ESX, this was not the only guest using those 4 CPUs. The
hypervisor has limited CPU resources, and it was sharing them with quite a few other systems.
Looking at a graph of just the top 10 VMs by CPU usage shows that when the system under test
(LMNOD1) increases its CPU demands, it takes CPU resources from other systems:

This explains why, even though the Windows system was provisioned with 4 vCPUs, it was
only able to get slightly less than 50% of the capacity of those 4 virtual CPUs.

So why did Windows think it was getting more CPU than ESXi was giving it? Because it
doesn't know it's virtualized.

There are times when the guest OS (Windows Perfmon, etc.) will show lower CPU usage than
VMware reports. The guest doesn't know anything about the CPU used to virtualize the
hardware resources it is requesting. ESXi does, and accurately attributes that load. Comparing
the top two graphs, you can see that outside the period of the load test, Windows reports slightly
lower CPU usage than ESXi does.

There are also times when the guest OS will report higher CPU usage than the hypervisor does,
as in the period when the load test was running above. Windows doesn't know that, part of the
time, the CPUs are being taken away from it and given to other guest machines. As far as it
knows, it is using all the CPU it can on 3 of the 4 CPUs, so that must equal 75% load. It doesn't
know that part of the time it has no access to the CPUs because they are being used elsewhere,
so the real CPU usage is only about 50%.
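
Reduced to arithmetic, the two views look like this. The two-thirds scheduling fraction below is an assumption picked to roughly match the graphs, not a measured value:

```python
# Illustrative numbers only, roughly matching the graphs described above.
vcpus = 4
busy_vcpus = 3                    # HyperPi pegging 3 of the 4 vCPUs

# Guest view: Windows averages across its own CPUs, busy or not.
guest_reported = busy_vcpus / vcpus * 100             # 75%

# Host view: assume ESXi could only schedule this VM's vCPUs about
# two thirds of the time because other VMs were competing for cores.
scheduled_fraction = 2 / 3                            # assumption
host_reported = guest_reported * scheduled_fraction   # ~50%

print(f"Guest view (WMI/Perfmon): {guest_reported:.0f}%")
print(f"Host view (ESXi):         {host_reported:.0f}%")
```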

There are other reasons that the internal measurements can differ, such as clock skew and
adjustments, but suffice it to say that if you are looking for an absolute view of how CPUs are
being used, you cannot trust the guest's view of itself.

Does that mean the Guest's view is meaningless? Far from it. If you are monitoring the guest
itself with LogicMonitor, or using Perfmon or Task Manager on it directly, and see high CPU
usage - this is still a valid indicator that the guest is trying to do a lot of work, and running out of
CPU resources. Whether it is running out of resources because it is actually using all the capacity
of the physical CPU (which would be the case in a standalone machine running at 100%), or
running out of CPU resources because it is sharing them unknowingly with other virtual systems
does not matter. The relevant fact remains that the system is using all the CPU it can. It may
warrant adjusting CPU resources, reservations or shares, or investigating the workload.

Is the ESXi host's view of CPU important, then? It is if you are the vCenter administrator and
want to know which systems are actually using real CPU resources. (Overview graphs like the
one shown above are very helpful for investigating these issues quickly.) But you cannot use the
ESXi host's view of the CPU load to tell whether a system wants more than the resources allocated
to it - only how much of the resources allocated to it are being used. And if other systems are
competing for the same resources, any one guest may not get the resources it's allocated. If you
are looking at ESX, a better way to tell whether a guest wants more resources is to look at
CPU Ready (which means that the guest had work ready to schedule on a CPU, but no CPU
resource was available). The graph for the VM above, during the period of the load test, clearly
shows a big spike in CPU ready time:
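
If you pull the CPU Ready "summation" counter yourself (it is reported in milliseconds per sample), you usually want to convert it to a percentage. Here is a small sketch of the commonly used conversion, assuming the 20-second realtime chart interval; verify the interval for your own chart before trusting the numbers:

```python
# Convert a CPU Ready "summation" value (milliseconds of ready time per
# sample) into a percentage of the sample interval.
def cpu_ready_percent(ready_ms, interval_s=20, num_vcpus=1):
    """Fraction of the interval the VM had work ready but no CPU to run on.
    Dividing by num_vcpus gives the per-vCPU average for VM-level counters."""
    return ready_ms / (interval_s * 1000 * num_vcpus) * 100

# Example: a 4-vCPU VM reporting 8,000 ms of ready time in one 20 s sample.
print(f"{cpu_ready_percent(8000, num_vcpus=4):.1f}% CPU ready per vCPU")  # 10.0%
```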
So, whether you are looking at vCenter, Perfmon, or the LogicMonitor view of the ESXi host or
the guest - you need to understand what you are looking at, how it matters, and why the different
views don't agree.

(Note: with VMware Tools installed, Perfmon in the guest can show you the accurate CPU
usage under the "VM Processor" counter. This will be the same CPU usage you see in vCenter,
or in LogicMonitor when monitoring ESXi or vCenter. But you still need to know what it means
and how to interpret it.)
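
As a quick check from inside the guest, you can sample that counter from the command line. Below is a sketch using typeperf; the exact counter path can vary by Tools version, so confirm it in Perfmon first (the path below is my assumption):

```python
# Sketch: sample the VMware Tools "VM Processor" counter from a Windows guest.
import subprocess

counter = r"\VM Processor(_Total)\% Processor Time"  # assumed counter path
output = subprocess.check_output(
    ["typeperf", counter, "-sc", "1"],  # take a single sample and exit
    text=True,
)
print(output)
```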

vCPU configuration. Performance impact between virtual sockets and virtual cores?
A question that I frequently receive is whether there is a difference in virtual machine performance
when the virtual machine is created with multiple cores per socket instead of multiple sockets.

Single core CPU


VMware introduced multi-core virtual CPUs in vSphere 4.1 to avoid the socket restrictions used by
operating systems. In vSphere, a vCPU is presented to the operating system as a single-core CPU
in a single socket, which limits the number of vCPUs the operating system can use. Typically the
OS vendor only restricts the number of physical CPUs (sockets) and not the number of logical
CPUs (better known as cores).

For example, Windows 2008 Standard is limited to 4 physical CPUs, and it will not utilize any
additional vCPUs if you configure the virtual machine with more than 4 vCPUs. To work around
this physical-socket limitation, VMware introduced the vCPU configuration options "virtual
sockets" and "cores per socket". With this change you can, for example, configure the virtual
machine with 1 virtual socket and 8 cores per socket, allowing the operating system to use 8 vCPUs.
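
If you prefer to script that change rather than use the vSphere Client, something along these lines with pyVmomi should work. Treat it as a rough sketch: the vCenter address, credentials, and VM name are placeholders, the VM must be powered off, and the disableSslCertValidation argument needs a reasonably recent pyVmomi.

```python
# Rough pyVmomi sketch: present 8 vCPUs as 1 virtual socket x 8 cores per socket.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",           # placeholder
                  user="administrator@vsphere.local",   # placeholder
                  pwd="secret",                          # placeholder
                  disableSslCertValidation=True)
content = si.RetrieveContent()

# Simplified VM lookup by name, for illustration only.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "win2008-std")  # placeholder name

spec = vim.vm.ConfigSpec()
spec.numCPUs = 8             # total vCPUs = virtual sockets x cores per socket
spec.numCoresPerSocket = 8   # so this is 1 virtual socket with 8 cores

vm.ReconfigVM_Task(spec=spec)   # wait for the task before powering on again
Disconnect(si)
```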

Just to show it works, I initially equipped the VM running Windows 2008 Standard with 8 vCPUs,
each presented as a single-core virtual socket.

When reviewing the CPU configuration inside the guest OS, Task Manager showed only 4 CPUs,
confirming that Windows used just 4 of the 8 vCPUs.

I then reconfigured the virtual machine to present the 8 vCPUs as a single socket with 8 cores
per socket, and powered the virtual machine back on:

Performance impact
OK, so it worked. Now the big question: will it make a difference to use multiple sockets or one
socket? How will the VMkernel utilize the physical cores? Might it impact any NUMA
configuration? The answer can be very short: No! There is no performance impact between
using virtual cores or virtual sockets (other than the number of usable vCPUs, of course).

Abstraction layer
This is because of the power of the abstraction layer. Virtual sockets and virtual cores are
"constructs" presented upstream to the tightly isolated software container that we call a virtual
machine. When you run an operating system, it detects the hardware (layout) within the virtual
machine. The virtual machine's vCPU count is simply the number of cores multiplied by the
number of sockets. Let's use the example of a 2-virtual-socket, 2-virtual-core configuration.
The light blue box shows the configuration the virtual machine presents to the guest OS. For
each vCPU, the VMkernel schedules a Virtual Machine Monitor (VMM) world; when a CPU
instruction leaves the virtual machine, it gets picked up by the corresponding vCPU VMM world.
The socket configuration is transparent to the VMkernel.

NUMA
When a virtual machine powers on in a NUMA system, it is assigned a home node where
memory is preferentially allocated. The vCPUs of a virtual machine are grouped into a NUMA
client, and this NUMA client is scheduled on a physical NUMA node. For more information
about NUMA, please read the article "Sizing VMs and NUMA nodes". Although it does not
cover the most current vSphere release, the basics remain the same.

To verify that the sockets have no impact on the NUMA scheduler, I powered up a new virtual
machine and configured it with two sockets of 2 cores each. The host running the virtual
machine is a dual-socket quad-core machine with HT enabled. Providing 4 vCPUs to the virtual
machine ensures that it will fit inside a single NUMA node.
When reviewing the memory configuration of the virtual machine in ESXTOP, we can deduce
that it's running on a single physical CPU, using 4 cores on that die. Open the console, run
ESXTOP, and press M for the memory view. Use V (capital V) to display VM worlds only. Press F
and select G for NUMA stats. You might want to disable other fields to reduce the amount of
information on your screen.

The NHN column identifies the current NUMA home node, which in Machine2's case is NUMA
node 0. N%L indicates how much of the memory accessed by the NUMA client is local; it shows
100%, indicating that all vCPUs access local memory. The GST_ND0 column indicates how much
memory is provided by node 0 to the guest. This number is equal to the NLMEM counter, which
indicates the current amount of local memory being accessed by the VM on that home node.

vNUMA
What if you have a virtual machine with more than 8 vCPUs? (For clarity, the life of a wide-NUMA
VM starts at a vCPU count of 9.) Then the VMkernel presents the NUMA client home nodes to the
guest OS. As with normal scheduling, the socket configuration is also transparent in this
case.

Why differentiate between sockets and cores?


Well, there is a difference, and it has to do with the CPU Hot Add feature. When the CPU Hot
Plug option is enabled, you can only increase the virtual socket count.
In short, using virtual sockets or virtual cores does not impact the performance of the virtual
machine. It only affects the initial configuration and the ability to assign more vCPUs when your
operating system restricts the maximum number of physical CPUs. Always check whether your VM
configuration is in compliance with the vendor's licensing rules before increasing the vCPU count!
