SQL Server – Automatic failover in an Always On availability group is not triggered when the primary replica goes offline

availability-groups, clustering, sql-server

I have set up a two-node Always On availability group. Both replicas use synchronous commit with automatic failover. When I perform a manual failover, everything works as expected: the primary and secondary roles switch, there is no data loss, and the listener keeps working.

However, if I abruptly shut down the primary server (for testing purposes), the secondary does not become the primary. The primary replica gets stuck in the "Resolving" state, and the listener is unreachable until the primary server is back online.

In short, the availability group works for manual failover but not for automatic failover. It seems as if the cluster is not aware that one of the nodes is no longer available.

Do I need any particular configuration to enable something like "unavailability detection"? I assumed this was built-in behavior.

I ran a Cluster Validation, and the following warnings were reported:

  • The cluster is not configured with a quorum witness. As a best practice, configure a quorum witness to help achieve the highest availability of the cluster. (Should I really change the quorum settings through Failover Cluster Manager?)
  • This resource does not have all the nodes of the cluster listed as Possible Owners. The clustered role that this resource is a member of will not be able to start on any node that is not listed as a Possible Owner. (I read that changing this through Failover Cluster Manager is not recommended, so I didn't do it.)
  • Node NODE01 is reachable from Node NODE02 by only one pair of network interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available, or consider adding additional networks to the cluster. (Does this mean adding redundant network cards?)
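The rule stated in the second warning (a clustered role cannot start on a node that is not a Possible Owner) can be illustrated with a toy model. This is a hypothetical sketch in plain Python, not the Windows Failover Clustering API; the function names and node lists are made up for illustration, reusing the node names from the warnings.

```python
# Hypothetical toy model of the "Possible Owners" rule quoted in the
# cluster validation warning. Not the Windows Failover Clustering API.


def nodes_that_can_host(cluster_nodes: list[str], possible_owners: set[str]) -> list[str]:
    """Return the cluster nodes on which the resource is allowed to start."""
    return [node for node in cluster_nodes if node in possible_owners]


def can_fail_over(failed_node: str, cluster_nodes: list[str], possible_owners: set[str]) -> bool:
    """True if at least one surviving node is a Possible Owner of the resource."""
    survivors = [node for node in cluster_nodes if node != failed_node]
    return len(nodes_that_can_host(survivors, possible_owners)) > 0


nodes = ["NODE01", "NODE02"]
print(can_fail_over("NODE01", nodes, {"NODE01", "NODE02"}))  # True
print(can_fail_over("NODE01", nodes, {"NODE01"}))  # False
```

In this model, a resource whose only Possible Owner is the failed node has nowhere to come online, which is the situation the warning describes.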

Best Answer

Configure a quorum witness in Failover Cluster Manager. For the cluster to stay online, a majority of its votes must be online, and you have only two nodes. If one node goes down, only one of the two is online, which is not a majority, so the cluster cannot stay online and the availability group cannot fail over. If you add a witness, such as a file share hosted somewhere other than the two nodes, there are three votes. When one node goes down, two of the three votes are still online, which is a majority, so the cluster remains online.
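The majority arithmetic in the answer can be checked with a small sketch. It assumes static voting where each node and the witness hold exactly one vote. It deliberately ignores dynamic quorum and dynamic witness (Windows Server 2012 and later), which can adjust votes at runtime. `has_quorum` is a hypothetical helper, not part of any clustering API.

```python
# Minimal sketch of node-and-witness majority arithmetic.
# Assumption: every node and the witness each hold exactly one vote
# (static voting; dynamic quorum / dynamic witness are not modeled).


def has_quorum(total_votes: int, votes_online: int) -> bool:
    """The cluster stays up only while a strict majority of votes is online."""
    return votes_online > total_votes // 2


# Two nodes, no witness: lose one node -> 1 of 2 votes, not a majority.
print(has_quorum(total_votes=2, votes_online=1))  # False

# Two nodes plus a witness: lose one node -> 2 of 3 votes, a majority.
print(has_quorum(total_votes=3, votes_online=2))  # True
```

Using a strict "more than half" check is what makes an even split (1 of 2) fail. That is why an odd total vote count, reached here by adding the witness, tolerates the loss of one node.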