Sql-server – CPU column 5x higher on one machine vs another

performanceprofilersql servervirtualisation

We're running into a strange issue in a deployed version of our product where the performance is just not very good. When we restore a backup of their DB on one of our machines, it performs as we would expect.

While there are plenty of obvious things to look at, I'm focused on one area for the moment: the number of CPU cycles is 5-7 times higher in profile traces on their machine than it is on ours – running the same queries against the same database.

How does the CPU column in SQL perfmon get calculated on a vm? Do cycles get clocked while the virtual CPU is waiting for a real one? Could the switching of the real CPU under the virtual one lead to cache loss and higher CPU numbers? Am I wasting my time here?

Details:
– Both servers are on ESX hosts with SAN storage.
– I believe their infrastructure may be overloaded.
– VM has 4 CPUs, 16 (or more) GB ram minimal real load on either from within the VM
– SAN host is running loads far less than max
– ESX hosts are said to be not overloaded (although their infrastructure is expanding soon) – I will be digging into this more tomorrow.

I know there are a lot of other variables involved, but my curiosity is killing me about the huge difference in CPU numbers for the same queries on the same DB – even if it ends up not being the problem, I'd love to know more about what causes these differences!

Best Answer

Over-committing CPU and memory on a VM can certainly give you false positives. The true performance bottlenecks are "hidden" from the OS, so you may not see it as easily as you would on a normal windows box. I have seen this countless times before, so we always recommend reserves on assigned resources, and where possible, dedicated LUNs for the SQL Server.

Also, if your client is using "older" (for e.g. pre-Nehalem) processors, the actual virtualisation may be impacted by shortcomings of the hardware. I'd never recommend putting a SQL Server VM (especially not a high-IO one) on a host that has older hardware. Nehalem family introduced VT-d (if memory serves), which (similarly to how VT-x attaches a VM ID tag onto the TLBs, enabling the hardware to know which VM a processor call belongs to) means your hardware is optimised for directed I/O. All of these help in freeing up the hypervisor from having to keep track of these when emulating processor, memory and I/O calls.

Also, other things that may impact performance at a VM level is DRS, and VMotion, but the newer the hardware, the less of a concern they become, especially when you have VT-d, VT-c and VT-n enabled processors and chipset.

Pending feedback from you, I will be extending this answer, so please provide comments...