I have a SQL Server 2008 R2 failover cluster with a passive node and an active node. The servers are physically identical. I have two instances of SQL Server, one set to use all the CPUs in the first 3 NUMA nodes, the other set to use all the CPUs in the 4th NUMA node.
This seems to be working okay, even when failover occurs, but is this a bad idea?
What would happen in failover if the passive server didn't have as many processors?
Best Answer
I would not say that its a bad idea, but its always prefer not to touch the CPU affinity on sql server unless you know for sure that the problem is due to excessive context switching (and this too might be not the actual problem, but just a symptom).
Forcing affinity means that you take away sql server's ability to move processes between schedulers.
Consider below example : (using coreinfo from sysinternals) On a 2 socket, 12 CPU (24 logical cpu with hyper-threading enabled) with 2 NUMA Node machine, we get 12 CPU per NUMA node.
Remember that the schedulers are not bound to cores and SQL Server does its own thread scheduling. For some reason, if a non SQL process maxes out CPU 1 which is running a thread on scheduler 1, then SQL server has the ability to put that scheduler onto any available CPU e.g. CPU 4. SQL Server has its own load balancer that will move a thread from one CPU to another.
If you set processor affinity, then you remove the ability from sql server to switch scheduler. So scheduler 1 is bound to CPU 1 and it has to run there only.
Some Excerpts from my book library ...
From SQL Server Professional Internals and Troubleshooting Book :
From Microsoft SQL Server 2008 Internals book :
If you want to limit the CPU that a SQL instance uses on a server, use Windows System Resource Manager instead.
As always TEST, TEST and TEST your entire workload to avoid any serious performance issues if you decide to use "CPU Affinity".
Suggestion : Leave it to default - as if you get it wrong, things will be worse and troubleshooting would be more difficult !
performance problems can occur for the instance, due to NUMA node scheduler imbalances
References :
“Don’t Touch That Button!” Four Dangerous Settings in SQL Server(video archived - so no longer available)