Sql-server – SQL Server cluster questions please

clusteringsql serversql-server-2008-r2

We have SQL Server 2008 R2 cluster active/active/active (a, b, c) and three instances run on this environment.

We did a test on one of the instances, and failover test.

Manual failover from node c to b. After 3 minutes, the application cannot connect to SQL Server.
Back to node c we are OK and application is happy
Manual failover from node c to a. After 3 minutes, the application cannot connect to SQL Server.
Back to node c we are OK and application is happy

Please give me some helps.

Best Answer

There are a few things that I can think of off the top of my head. You're running a multi-instance failover cluster so in theory I'd expect to see each node to be sized such that at any given point in time it can handle the load of all three instances. Chances are that this is not the case, but maybe it is. Ideally, you'd also have a spare node that can handle failures but that doesn't sound like it's the case here.

There are some configurations that you can check to ensure that you've not set yourself up for failure and the first one I'd check would be to run

sp_configure 'max server memory (MB)'

If your run value is 2147483647 then you've got it set to allow SQL Server to take as much memory as it thinks it needs. This is set per instance so when you have multiple instances trying to consume all available RAM you will get memory pressure.

Having said that (read: actually, start here), you've not given us any other information about what you've done to discover why the application stops responding. Is it just the application that connects to the C node that chokes, or does the original application also not work? This could end up being something as simple as the application connection string is connecting to the IP/DNS name of the C node and not the VIP. If this is the case, then when C is no longer serving SQL Server then you're not going to be able to connect.

Step 1: Ensure the connection strings are actually connecting to the instance/VIP name and not the nodes.

Step 1.5: (Thanks to Thomas Stringer), make sure that you're giving the new instance enough time to actually recover the database. Connect to the instance via SSMS and see if your databases are in recovery.

Step 2: If Step 1 is correct, then get on the node that is running multiple instances and see what's going on. I'd recommend using PerfMon because "Task Manager is a dirty, filthy liar" and looking at metrics for the various subsystems starting with Memory, Network, CPU, and Disk IO. This answer contains much of what you'd need in order to check for resource pressure assuming you have connectivity to the instance and the databases are all fully recovered.

Related Solutions

Sql-server – SQL Server cluster failover

Failover clustering works to provide constant service in the event of a failure (defining a failure as an "abnormal termination of the previously active application, server, system, or network" - http://en.wikipedia.org/wiki/Failover). Manual manipulation (start, restart, pause, and stop) upon the service being clustered (SQL Server, File Services, etc.) does not qualify as an abnormal termination of the service.

If I recall, at least in Windows Server 2008, you can simulate a failover test case in the Failover Management Console under Administration Tools. Check this out for a list of testing methods: http://blogs.technet.com/b/vipulshah/archive/2009/06/17/failover-cluster-testing-methods.aspx

Sql-server – Is it bad to assign specific processors to SQL Server 2008 R2 cluster instances

I would not say that its a bad idea, but its always prefer not to touch the CPU affinity on sql server unless you know for sure that the problem is due to excessive context switching (and this too might be not the actual problem, but just a symptom).

This seems to be working okay, even when failover occurs, but is this a bad idea?

Forcing affinity means that you take away sql server's ability to move processes between schedulers.

Consider below example : (using coreinfo from sysinternals) On a 2 socket, 12 CPU (24 logical cpu with hyper-threading enabled) with 2 NUMA Node machine, we get 12 CPU per NUMA node.

Logical to Physical Processor Map:
**----------------------  Physical Processor 0 (Hyperthreaded)
--**--------------------  Physical Processor 1 (Hyperthreaded)
----**------------------  Physical Processor 2 (Hyperthreaded)
------**----------------  Physical Processor 3 (Hyperthreaded)
--------**--------------  Physical Processor 4 (Hyperthreaded)
----------**------------  Physical Processor 5 (Hyperthreaded)
------------**----------  Physical Processor 6 (Hyperthreaded)
--------------**--------  Physical Processor 7 (Hyperthreaded)
----------------**------  Physical Processor 8 (Hyperthreaded)
------------------**----  Physical Processor 9 (Hyperthreaded)
--------------------**--  Physical Processor 10 (Hyperthreaded)
----------------------**  Physical Processor 11 (Hyperthreaded)

Logical Processor to Socket Map:
************------------  Socket 0
------------************  Socket 1

Logical Processor to NUMA Node Map:
************------------  NUMA Node 0
------------************  NUMA Node 1

Remember that the schedulers are not bound to cores and SQL Server does its own thread scheduling. For some reason, if a non SQL process maxes out CPU 1 which is running a thread on scheduler 1, then SQL server has the ability to put that scheduler onto any available CPU e.g. CPU 4. SQL Server has its own load balancer that will move a thread from one CPU to another.

If you set processor affinity, then you remove the ability from sql server to switch scheduler. So scheduler 1 is bound to CPU 1 and it has to run there only.

Some Excerpts from my book library ...

From SQL Server Professional Internals and Troubleshooting Book :

One common use for soft NUMA is when a SQL Server is hosting an application that has several different groups of users with very different query requirements. After conﬁguring your theoretical 16-processor server for soft NUMA, assigning 2 × 4 CPU nodes and one 8-CPU node to a third NUMA node, you would next conﬁ gure connection afﬁnity for the three nodes to different ports, and then change the connection settings for each class of workload, so that workload A is “afﬁnitized” to port x, which connects to the ﬁrst NUMA node; workload B is afﬁ nitized to port y, which connects to the second NUMA node, and all other workloads are afﬁ nitized to port z, which is set to connect to the third NUMA node.

From Microsoft SQL Server 2008 Internals book :

In some situations, you might want to limit the number of CPUs available but not bind a particular scheduler to a single CPU—for example, if you are using a multiple-CPU machine for server consolidation. Suppose that you have a 64-processor machine on which you are running eight SQL Server instances and you want each instance to use eight of the processors. Each instance has a different afﬁ nity mask that speciﬁes a different subset of the 64 processors, so you might have afﬁ nity mask values 255 (0xFF), 65280 (0xFF00), 16711680 (0xFF0000), and 4278190080 (0xFF000000). Because the afﬁ nity mask is set, each instance has hard binding of scheduler to CPU. If you want to limit the number of CPUs but still not constrain a particular scheduler to running on a speciﬁc CPU, you can start SQL Server with trace ﬂag 8002. This lets you have CPUs mapped to an instance, but within the instance, schedulers are not bound to CPUs.

If you want to limit the CPU that a SQL instance uses on a server, use Windows System Resource Manager instead.

As always TEST, TEST and TEST your entire workload to avoid any serious performance issues if you decide to use "CPU Affinity".

Suggestion : Leave it to default - as if you get it wrong, things will be worse and troubleshooting would be more difficult !

What would happen in failover if the passive server didn't have as many processors?

performance problems can occur for the instance, due to NUMA node scheduler imbalances

References :

affinity mask Server Configuration Option
~~“Don’t Touch That Button!” Four Dangerous Settings in SQL Server~~ (video archived - so no longer available)
Changes in affinity settings of SQL Server 2008 R2
SQL Server and Soft NUMA
Setting a Processor Affinity in SQL Server – the (unwanted) Side-Effects

Best Answer

Related Solutions

Sql-server – SQL Server cluster failover

Sql-server – Is it bad to assign specific processors to SQL Server 2008 R2 cluster instances

Related Question