I would not say that its a bad idea, but its always prefer not to touch the CPU affinity on sql server unless you know for sure that the problem is due to excessive context switching (and this too might be not the actual problem, but just a symptom).
This seems to be working okay, even when failover occurs, but is this a bad idea?
Forcing affinity means that you take away sql server's ability to move processes between schedulers.
Consider below example : (using coreinfo from sysinternals) On a 2 socket, 12 CPU (24 logical cpu with hyper-threading enabled) with 2 NUMA Node machine, we get 12 CPU per NUMA node.
Logical to Physical Processor Map:
**---------------------- Physical Processor 0 (Hyperthreaded)
--**-------------------- Physical Processor 1 (Hyperthreaded)
----**------------------ Physical Processor 2 (Hyperthreaded)
------**---------------- Physical Processor 3 (Hyperthreaded)
--------**-------------- Physical Processor 4 (Hyperthreaded)
----------**------------ Physical Processor 5 (Hyperthreaded)
------------**---------- Physical Processor 6 (Hyperthreaded)
--------------**-------- Physical Processor 7 (Hyperthreaded)
----------------**------ Physical Processor 8 (Hyperthreaded)
------------------**---- Physical Processor 9 (Hyperthreaded)
--------------------**-- Physical Processor 10 (Hyperthreaded)
----------------------** Physical Processor 11 (Hyperthreaded)
Logical Processor to Socket Map:
************------------ Socket 0
------------************ Socket 1
Logical Processor to NUMA Node Map:
************------------ NUMA Node 0
------------************ NUMA Node 1
Remember that the schedulers are not bound to cores and SQL Server does its own thread scheduling. For some reason, if a non SQL process maxes out CPU 1 which is running a thread on scheduler 1, then SQL server has the ability to put that scheduler onto any available CPU e.g. CPU 4. SQL Server has its own load balancer that will move a thread from one CPU to another.
If you set processor affinity, then you remove the ability from sql server to switch scheduler. So scheduler 1 is bound to CPU 1 and it has to run there only.
Some Excerpts from my book library ...
From SQL Server Professional Internals and Troubleshooting Book :
One common use for soft NUMA is when a SQL Server is hosting an application that has several different groups of users with very different query requirements. After configuring your theoretical 16-processor server for soft NUMA, assigning 2 × 4 CPU nodes and one 8-CPU node to a third NUMA node, you would next confi gure connection affinity for the three nodes to different ports, and then change the connection settings for each class of workload, so that workload A is “affinitized” to port x, which connects to the first NUMA node; workload B is affi nitized to port y, which connects to the second NUMA node, and all other workloads are affi nitized to port z, which is set to connect to the third NUMA node.
From Microsoft SQL Server 2008 Internals book :
In some situations, you might want to limit the number of CPUs available but not bind a particular scheduler to a single CPU—for example, if you are using a multiple-CPU machine for server consolidation. Suppose that you have a 64-processor machine on which you are running eight SQL Server instances and you want each instance to use eight of the processors. Each instance has a different affi nity mask that specifies a different subset of the 64 processors, so you might have affi nity mask values 255 (0xFF), 65280 (0xFF00), 16711680 (0xFF0000), and 4278190080 (0xFF000000). Because the affi nity mask is set, each instance has hard binding of scheduler to CPU. If you want to limit the number of CPUs but still not constrain a particular scheduler to running on a specific CPU, you can start SQL Server with trace flag 8002. This lets you have CPUs mapped to an instance, but within the instance, schedulers are not bound to CPUs.
If you want to limit the CPU that a SQL instance uses on a server, use Windows System Resource Manager instead.
As always TEST, TEST and TEST your entire workload to avoid any serious performance issues if you decide to use "CPU Affinity".
Suggestion : Leave it to default - as if you get it wrong, things will be worse and troubleshooting would be more difficult !
What would happen in failover if the passive server didn't have as many processors?
performance problems can occur for the instance, due to NUMA node scheduler imbalances
References :
Clustering is complex, and there are lots of moving parts (no pun intended). Let me try to break this down into more manageable chunks:
From a terminology perspective, there's your Windows Server Failover Cluster (WSFC), and your SQL Server Failover Cluster Instances (FCI). I try to avoid saying "Cluster" and use these acronyms to avoid ambiguity.
Quorum:
The quorum is the number of votes necessary to transact business on your WSFC. Depending on your WSFC configuration, voters can be nodes (servers), a drive, or a file share. You need more than 50% of your votes in order for the WSFC to be online. If you lose 50% or more of your voters, then the WSFC and all clustered services (including your FCI) will go offline and not come back until you have (or force) quorum.
In your configuration, you have two nodes, and one file share for a total of three votes. Any one of those voters can go offline. When you lost the file share, you still had two nodes online, so your WSFC and all clustered services stayed online.
Cluster Owner/Host Server:
When you say that "Node2 was now specified as the active node by Windows", I suspect you are referring to the "Current Host Server" for the cluster. So what is that?
Your WSFC has a network name and an IP address. That name & IP has to be tied to a machine that is part of your cluster. More specifically, it can be tied to any one machine in your cluster. This is part of your WSFC, but not your FCI.
In your scenario, you have three FCIs on a two-node WSFC. It would be a perfectly valid to have one FCI on Node1, and two FCIs on Node2. And the "Current Host Server" for the WSFC could be either node. SQL Server won't care.
So what happened: As you said, there were no adverse effects on the databases. I'd expect that, because SQL Server isn't tied to that WSFC host server. I don't think I wouldn't have expected the host server to move when the file share failed--but I'd let your Windows guys dig into that more. From a SQL perspective, everything worked as expected.
Best Answer
No, you cannot reliably use VMDK files for shared storage in VMware-virtualized Windows Server Failover Clustering. RDMs (Raw device mappings) are the only supported method of provisioning disks in this scenario. Use RDMs in physical compatibility mode (pass-through RDM). You cannot use virtual disks or RDMs in virtual compatibility mode (non-pass-through RDM) for shared storage. You can use VMDKs for the non-shared drives. See Setup for Failover Clustering and Microsoft Cluster Service from VMware for more details.
Since you have Veeam, you should use the SQL Server backup tools included with that package; see Veeam's help center for vSphere backup and restore for more details.