SQL Server Cluster – Setting Up Multiple Nodes for High Availability

clusteringfailoverhigh-availabilitysql server

Does it make sense to think about a failover cluster configuration where less than 50% of all included nodes are standby?

For example instead of setting up 4 two-node clusters where one is active and one is on standby on each cluster – doesn't it make more sense to create 1 cluster with 4 active nodes and two on standby/passive for failover? That makes a saving of 2 nodes in total.

Sure, there is a risk in case multiple nodes fail unplanned and have to share one of the previous standby nodes, but in case you decide to take that risk over provisioning the standby hardware resources?

Looking for "multiple node failover clustering" I couldn't find more on this.

Best Answer

Firstly, having 4 separate 2 node clusters with one FCI deployment in each seems to me to be a little extreme. The second model you describe does not really make sense to me. Instead of either deployment I would suggest running 2 Clusters of 4 nodes (you would need a witness for each) and consider one node in each as a dedicated failover node (i.e. that you install 3 SQL FCIs to each cluster).

Also you should try to deploy on Windows 2012 R2 because of its dynamic quorum and dynamic witness capabilities (2012 only had dynamic quorum) -which will help to maintain maximum uptime upon node failure.

As you will be aware, the biggest issue with not having enough dedicated failover nodes is that ultimately you are exposed to SQL co-existence considerations (such as setting the correct max-memory upon failover if shared with other SQL FCIs amongst other things). Furthermore in truly Highly Available environments you probably would not want to fail-back post failover (otherwise it is a further addition of down-time), so sharing dedicated failover nodes would probably necessitate failback of one instance (assuming two had failed over).

However, that said, all system deployments are a compromise and provisioning too many redundant failover nodes does not make sense and it is important to find the right balance between HA requirement and cost effectiveness.

As a general rule of thumb you should consider an approximate 1 dedicated (failover node) per (2 SQL FCI) 3 node cluster, 2 dedicated per (5 FCI) 7 node cluster and 3 dedicated per (8 FCI) 11 node cluster. These figures are generous approximations from nearly 20 years personal experience of SQL Failover Clustering but obviously there are many other factors that can affect your design strategy.

p.s. There are some technical Clustering reasons why it makes sense not to deploy a huge number of cluster nodes so I would probably not go beyond a 5 node cluster anyway (and in that case I would probably still be happy with 1 dedicated failover node).

Related Solutions

Sql-server – SQL Server Mirroring kicks in before cluster has time to fail over

Database mirroring questions from TechEd: failovers and partner timeouts

... when the cluster is failing over and the principal instance is unavailable for 60-90 seconds, the other instances in the mirroring session will time-out after ten seconds and initiate a mirroring failover... The solution to the problem is to change the partner time-out value - a facility that isn't well known. To change the partner time-out value for a mirroring session, use the following code:
ALTER DATABASE mydatabase SET PARTNER TIMEOUT 90;

Sql-server – Is it bad to assign specific processors to SQL Server 2008 R2 cluster instances

I would not say that its a bad idea, but its always prefer not to touch the CPU affinity on sql server unless you know for sure that the problem is due to excessive context switching (and this too might be not the actual problem, but just a symptom).

This seems to be working okay, even when failover occurs, but is this a bad idea?

Forcing affinity means that you take away sql server's ability to move processes between schedulers.

Consider below example : (using coreinfo from sysinternals) On a 2 socket, 12 CPU (24 logical cpu with hyper-threading enabled) with 2 NUMA Node machine, we get 12 CPU per NUMA node.

Logical to Physical Processor Map:
**----------------------  Physical Processor 0 (Hyperthreaded)
--**--------------------  Physical Processor 1 (Hyperthreaded)
----**------------------  Physical Processor 2 (Hyperthreaded)
------**----------------  Physical Processor 3 (Hyperthreaded)
--------**--------------  Physical Processor 4 (Hyperthreaded)
----------**------------  Physical Processor 5 (Hyperthreaded)
------------**----------  Physical Processor 6 (Hyperthreaded)
--------------**--------  Physical Processor 7 (Hyperthreaded)
----------------**------  Physical Processor 8 (Hyperthreaded)
------------------**----  Physical Processor 9 (Hyperthreaded)
--------------------**--  Physical Processor 10 (Hyperthreaded)
----------------------**  Physical Processor 11 (Hyperthreaded)

Logical Processor to Socket Map:
************------------  Socket 0
------------************  Socket 1

Logical Processor to NUMA Node Map:
************------------  NUMA Node 0
------------************  NUMA Node 1

Remember that the schedulers are not bound to cores and SQL Server does its own thread scheduling. For some reason, if a non SQL process maxes out CPU 1 which is running a thread on scheduler 1, then SQL server has the ability to put that scheduler onto any available CPU e.g. CPU 4. SQL Server has its own load balancer that will move a thread from one CPU to another.

If you set processor affinity, then you remove the ability from sql server to switch scheduler. So scheduler 1 is bound to CPU 1 and it has to run there only.

Some Excerpts from my book library ...

From SQL Server Professional Internals and Troubleshooting Book :

One common use for soft NUMA is when a SQL Server is hosting an application that has several different groups of users with very different query requirements. After conﬁguring your theoretical 16-processor server for soft NUMA, assigning 2 × 4 CPU nodes and one 8-CPU node to a third NUMA node, you would next conﬁ gure connection afﬁnity for the three nodes to different ports, and then change the connection settings for each class of workload, so that workload A is “afﬁnitized” to port x, which connects to the ﬁrst NUMA node; workload B is afﬁ nitized to port y, which connects to the second NUMA node, and all other workloads are afﬁ nitized to port z, which is set to connect to the third NUMA node.

From Microsoft SQL Server 2008 Internals book :

In some situations, you might want to limit the number of CPUs available but not bind a particular scheduler to a single CPU—for example, if you are using a multiple-CPU machine for server consolidation. Suppose that you have a 64-processor machine on which you are running eight SQL Server instances and you want each instance to use eight of the processors. Each instance has a different afﬁ nity mask that speciﬁes a different subset of the 64 processors, so you might have afﬁ nity mask values 255 (0xFF), 65280 (0xFF00), 16711680 (0xFF0000), and 4278190080 (0xFF000000). Because the afﬁ nity mask is set, each instance has hard binding of scheduler to CPU. If you want to limit the number of CPUs but still not constrain a particular scheduler to running on a speciﬁc CPU, you can start SQL Server with trace ﬂag 8002. This lets you have CPUs mapped to an instance, but within the instance, schedulers are not bound to CPUs.

If you want to limit the CPU that a SQL instance uses on a server, use Windows System Resource Manager instead.

As always TEST, TEST and TEST your entire workload to avoid any serious performance issues if you decide to use "CPU Affinity".

Suggestion : Leave it to default - as if you get it wrong, things will be worse and troubleshooting would be more difficult !

What would happen in failover if the passive server didn't have as many processors?

performance problems can occur for the instance, due to NUMA node scheduler imbalances

References :

affinity mask Server Configuration Option
~~“Don’t Touch That Button!” Four Dangerous Settings in SQL Server~~ (video archived - so no longer available)
Changes in affinity settings of SQL Server 2008 R2
SQL Server and Soft NUMA
Setting a Processor Affinity in SQL Server – the (unwanted) Side-Effects

Best Answer

Related Solutions

Sql-server – SQL Server Mirroring kicks in before cluster has time to fail over

Sql-server – Is it bad to assign specific processors to SQL Server 2008 R2 cluster instances

Related Question