Wednesday, 6 January 2016

Which is better? 1 vCPU or 2 vCPU standard VMs


I came across a comment claiming it's better to use two vCPUs in your VM template rather than a single
vCPU: the VM is supposed to perform better, schedule better and scale better than with just one. I had my doubts, but I've always liked testing these things on a real server to see what happens. My test rig has a 4-core Xeon with plenty of GHz, so I set up 4 and then 8 VMs, ran them through different loads and configurations, and captured the ESXTOP results below.
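
The per-VM workload in every test below is just a fixed amount of CPU burn plus roughly 200MB of memory. The runs themselves were driven by a load tool, but a crude Python burner along these lines gives the same shape if you want to reproduce it; the duty_cycle parameter and the busy/sleep period are my own illustration, not the tool's settings:

import time

def burn_cpu(duty_cycle=1.0, period=0.1):
    # Spin for duty_cycle of each period, sleep for the rest.
    while True:
        busy_until = time.time() + period * duty_cycle
        while time.time() < busy_until:
            pass                                  # keep the core busy
        time.sleep(period * (1.0 - duty_cycle))

if __name__ == "__main__":
    ballast = bytearray(200 * 1024 * 1024)        # ~200MB memory load
    burn_cpu(duty_cycle=1.0)                      # 1.0 for tests 1-3, 0.75 for tests 4-5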

Test #1: 4 x 1vCPU VMs (5 minute test, using 1 core on each VM at 100%, 200MB memory load)

No %VMWAIT
Constant %RDY
Constant %OVRLP
So, it copes well, nothing too crazy here. 

Test #2: 4 x 2vCPU VMs but only 1 core maxed (5 minute test, using 1 core only on each VM at 100%, 200MB memory load)


Periodic %VMWAIT
Constant %RDY
Constant %OVRLP
So, this actually performed better: the load tool (loadmaster) still ran a single thread, but the guest scheduled it across both available vCPUs and got better overall performance. Interesting! 
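
If you want to see that from inside the guest, on a Linux VM the 'processor' field of /proc/self/stat tells you which CPU the process last ran on. A rough sketch (Linux only, purely illustrative, not what the load tool does):

import time

def current_cpu():
    # Field 39 of /proc/self/stat is 'processor', the CPU the process last ran on.
    with open("/proc/self/stat") as f:
        fields = f.read().rsplit(")", 1)[1].split()   # skip pid and comm
    return int(fields[36])

if __name__ == "__main__":
    last = None
    while True:
        busy_until = time.time() + 0.1
        while time.time() < busy_until:
            pass                                      # keep the thread busy
        cpu = current_cpu()
        if cpu != last:
            print("now running on vCPU", cpu)
            last = cpu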

Test #3: 4 x 2vCPU VMs, but only 2 of the VMs maxing out a core (5 minute test, using 1 core at 100% on 2 VMs only, other 2 VMs idle, 200MB memory load)

No CoStop issues seen
Constant %OVRLP on busy VMs
Constant %RDY on all 4 VMs
Periodic %VMWAIT on 2 idle VMs
So, this time things aren't too bad, but the idle VMs are probably starved a bit until they start ramping up as well. 

Now we go into overcommitment, exceeding the physical cores available by stacking up more than 4 vCPUs:

Test #4: 8 x 1vCPU VMs (5 minute test, using 1 core each at 75% load, 200MB memory load)

Constant %OVRLP on all VMs
Constant %RDY on all VMs
Pegged the physical cores, but only %RDY really stands out. The ESXi scheduler is doing its job nicely!


Test #5: 8 x 2vCPU VMs (5 minute test, using 1 core each at 75% load, 200MB memory load)

The only difference between this and the last test is the increased number of vCPUs per VM; each VM is still only running a single-threaded 75% load, but that thread switches between Core 0 and Core 1 inside the VM.
Constant %VMWAIT on some VMs
Constant %CSTP on all VMs
Same workload, but now we're getting scheduling conflicts, as CPU overprovisioning is 16 vCPU to 4 pCPU versus 8 vCPU to 4 pCPU in the previous test. Which one do you think performs better?! Co-Stop isn't too high yet, but with the same number of VMs we're heading into performance trouble territory. 

Test #6: 8 x 2vCPU VMs (5 minute test, using 2 threads @ 37% load, 200MB memory load)

This is similar to the previous test, but we're now directly addressing the second vCPU in each VM.
Constant %CSTP on all VMs – a very high level despite a similar workload; just by running two threads instead of one, performance here would be awful. 
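
For anyone wanting to recreate that two-thread shape, "directly addressing the second vCPU" just means pinning a worker to each core instead of letting the guest move one thread around. A rough sketch for a Linux guest (os.sched_setaffinity does the pinning, and separate processes are used so Python's GIL doesn't get in the way; again this is illustrative, not the tool I used):

import os
import time
from multiprocessing import Process

def burn(cpu, duty_cycle=0.37, period=0.1):
    os.sched_setaffinity(0, {cpu})                # pin this worker to one vCPU (Linux only)
    while True:
        busy_until = time.time() + period * duty_cycle
        while time.time() < busy_until:
            pass                                  # busy part of the cycle
        time.sleep(period * (1.0 - duty_cycle))   # idle part of the cycle

if __name__ == "__main__":
    for cpu in (0, 1):                            # one worker per vCPU, each at ~37%
        Process(target=burn, args=(cpu,)).start()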


Notes:
%OVRLP – Time spent on behalf of a different resource pool/VM or world while the local one was scheduled. Not included in %SYS.
%VMWAIT – Time the world spent blocked waiting on some event (this is %WAIT minus idle time).
%RDY – Time the world was ready to run but waiting for the scheduler to hand it a physical CPU.
%CSTP – Co-stop: time a vCPU of a multi-vCPU VM spent stopped, waiting for its sibling vCPUs to be co-scheduled.

NWLD – Number of members in a running world's resource pool or VM. 
(increases when # vCPU goes from 1 to 2)
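
The figures above are straight out of ESXTOP, but if you want them in a file, batch mode (esxtop -b -d 5 -n 60 > stats.csv, run on the host) writes every counter out as a perfmon-style CSV. A quick sketch for pulling out just the ready and co-stop columns; the header substrings below are an assumption about how the counters are named in your build:

import csv

WANTED = ("% Ready", "% CoStop")                  # assumed header substrings

with open("stats.csv", newline="") as f:          # hypothetical batch-mode output file
    reader = csv.reader(f)
    header = next(reader)
    cols = [i for i, name in enumerate(header) if any(w in name for w in WANTED)]
    print([header[i] for i in cols])              # which counters we kept
    for row in reader:
        print([row[i] for i in cols])             # one sample per esxtop interval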

So, from what I saw, if you never overprovision your physical CPU and keep a 1:1 mapping (i.e. the total number of virtual CPU cores never exceeds the total number of physical cores), then you might actually get better performance with single-threaded workloads. 

Once you get into overcommitment, you're looking at issues. You're opening up more paths for VMs onto the physical cores, and while VMware's scheduler does an amazing job, with like-for-like workloads a lower number of vCPUs performs better, or at least I would expect it to based on the ESXTOP results above. 
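
For completeness, the overcommitment arithmetic is just the total vCPUs assigned divided by the physical cores available. A throwaway helper with the ratios from the configurations above:

def overcommit_ratio(vcpus_per_vm, physical_cores):
    return sum(vcpus_per_vm) / physical_cores     # above 1.0 means more vCPUs than cores

print(overcommit_ratio([1] * 4, 4))               # test 1:     1.0 (1:1 mapping)
print(overcommit_ratio([1] * 8, 4))               # test 4:     2.0 (8 vCPU on 4 pCPU)
print(overcommit_ratio([2] * 8, 4))               # tests 5-6:  4.0 (16 vCPU on 4 pCPU)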

So if you have a static environment, you have a choice. If you're a consultant, and not hands-on day after day as an admin in a particular customer's environment, then I'd say you're taking a chance with 2 vCPUs in the template. I'd expect that customer to be calling you within a year, complaining about really bad performance during critical end-of-month periods, and while you'd normally suspect storage in that situation, here the cause is a configuration with too many vCPUs, and fixing it means right-sizing all the VMs and taking downtime for each one, which isn't always easy or possible...... makes your choice, takes your chances!