SQL Server 2012 – Automatic Failover in Availability Group

availability-groupsfailoversql serversql-server-2012

I have 2 nodes in my SQL server fail over cluster. I use SQL Server 2012 Availability Groups for HA purpose.

The problem is that when one my server goes down the automatic fail over does not happen!

How can I enable automatic fail over?

Best Answer

If you want to enable automatic failover, you will need to ensure that you have an "automatic failover set", to include two replicas (in your case, that'd be your two nodes) both set with synchronous commit, as well as automatic failover.

When you setup the Availability Group, this can be accomplished through the GUI, but if you had this scripted out, it would appear in the CREATE AVAILABILITY GROUP command. If you take a look at the syntax and possible parameters of CREATE AVAILABILITY GROUP, you would simply see the opportunity to set AVAILABILITY_MODE = SYNCHRONOUS_COMMIT as well as set FAILOVER_MODE = AUTOMATIC. This is essentially the steps to setup automatic failover.

Likewise, if this is already setup you can modify these replica settings with ALTER AVAILABILITY GROUP ... MODIFY REPLICA ON ... and specify the same above parameters.

Before continuing on, though, with the above modifications it is definitely going to be worth verifying what you already have in place. A quick query that will tell you what your commit mode and failover mode is for your current configuration is below:

select
    ar.replica_server_name,
    availability_group_name = ag.name,
    ar.availability_mode_desc,
    ar.failover_mode_desc
from sys.availability_replicas ar
inner join sys.availability_groups ag
on ar.group_id = ag.group_id
order by availability_group_name, replica_server_name;

Feel free to edit your question and put the output of the above query for review.

EDIT: In order to address your comment, it is worth pointing out that to have automatic failover you must be utilizing synchronous commit.

Related Solutions

Sql-server – SQL Server 2012, Network for: AlwaysOn Availability Group with AlwaysOn AlwaysOn Failover Cluster Instances

Yes AG support multiple subnets as described here. Also make sure that your data provider supports MultiSubnetFailover .. .NET Framework 4 supports it.

To answer your question ...

IF you use .NET framework 4 or 3.5 then the provider will support it as described here.

Also, a good reference to SQL Server Multi-Subnet Clustering is well documented.

With legacy client libraries or third party data providers, you cannot use the MultiSubnetFailover parameter in your connection string. To help ensure that your client application works optimally with multi-subnet FCI in SQL Server 2012, try to adjust the connection timeout in the client connection string by 21 seconds for each additional IP address. This ensures that the client’s reconnection attempt does not timeout before it is able to cycle through all IP addresses in your multi-subnet FCI.

SQL Server 2012 – Fix AlwaysOn Availability Group Automatic Failover

If I disconnect DEV-AWEB5

Define "disconnect", if you will. My guess is you kept the box up but took SQL Server down.

I cannot connect to the Group Listener (DevListener), but I can ping it and it will respond to my ping

That's because the listener is just a virtual network name (VNN) within the WSFC cluster resource group for the represented availability group. Your DEV_AWEB5 node still owns the cluster resource group, but it's just the AG cluster resource most likely that is in a failed state. The VNN must still be online (expected behavior). It's simply pointing to whatever node is owning that resource group (in this case, DEV-AWEB5). In fact, if you had PowerShell remoting enabled, and you ran the following:

Invoke-Command -ComputerName "YourListenerName" -ScriptBlock { $env:computername }

Likewise, if you can RDP into DEV-AWEB5 (provided you have the capability and accessibility, etc.) then you'd be able to RDP using the listener name (mstsc /v:YourListenerName). It's just a VNN.

The return of that would be the computer name of your owning node.

By all of your symptoms, I'd be willing to bet that you've reached your failover threshold. The failover threshold determines how many times the cluster will attempt to failover your resource group in a specified time period. The default of these values max failovers n - 1 (where n is the count of nodes) in a period of 6 hours. You can see that through the following WSFC PowerShell command:

Get-ClusterGroup -Name "YourAgName" |
    Select-Object Name, FailoverThreshold, FailoverPeriod

That just gives you the settings (which you can modify if you so choose, of course).

The best way to prove that this is the case for you, you would need to generate the cluster log (the system event logs only go into detail as far as " has failed", or something like that).

Get-ClusterLog -Node "YourClusterNode" -TimeSpan <amount_of_minutes_since_failure>

That'll by default get put into the "C:\Windows\Cluster\Reports" folder, and the file is called "Cluster.log".

If you were to open up that cluster log, you should be able to find the following string in there, indicating exactly what happened and why it happened:

Not failing over group [YourClusterGroupName], failoverCount [# of failovers], failover threshold [failover threshold value], nodeAvailCount [node available count].

The above message is simply WSFC telling you that it will not failover your group because it's happened too much (you hit the threshold).

Why does this happen? Simply to prevent the Ping-Pong effect of cluster resources going back and forth too frequently between nodes.

Whereas this would be common to hit these thresholds in failover testing, in production it would typically point to a problem that should be investigated.

Best Answer

Related Solutions

Sql-server – SQL Server 2012, Network for: AlwaysOn Availability Group with AlwaysOn AlwaysOn Failover Cluster Instances

SQL Server 2012 – Fix AlwaysOn Availability Group Automatic Failover

Related Question