Sql-server – Testing WSFC failover puts secondary replica in RESOLVING mode

availability-groups, clustering, failover, sql-server

I have two locations, each with its own WSFC.

  1. Cluster 1: WSFC, Datacenter1, Node1, Node2, Listener1

  2. Cluster 2: WSFC, Datacenter2, Node3, Node4, Listener2

I created an availability group with the EXTERNAL cluster type.

screenshot of AG options in SSMS with "EXTERNAL" selected as the cluster type, which is circled

When I simulate a failover by switching nodes, the database goes into RESOLVING mode.

screenshot of SSMS object explorer showing the DB in question is not synchronizing, and one of the AG replicas is resolving

Looking into the SQL Server error log on the primary replica:

The state of the local availability replica in availability group 'AG_AW2017' has changed from 'NOT_AVAILABLE' to 'RESOLVING_NORMAL'. The state changed because the local instance of SQL Server is starting up. For more information, see the SQL Server error log or cluster log. If this is a Windows Server Failover Clustering (WSFC) availability group, you can also see the WSFC management console.

Looking into the SQL Server error log on the secondary replica:

A connection timeout has occurred on a previously established connection to availability replica 'Primary' with id [1B93C7DC-75E3-4180-9748-AC1B662781A9]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role
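The state shown in Object Explorer and in the log messages above can also be read from the availability group DMVs. The query below is a minimal sketch, assuming SQL Server 2017 or later and a login with VIEW SERVER STATE permission; the column aliases are illustrative only. A replica that is stuck would be expected to show `RESOLVING` in `role_desc`.

```sql
-- Sketch: list each replica's role and connection state as seen by this instance.
-- Run it on each node; a secondary only reports full detail for its local replica.
SELECT ag.name                        AS ag_name,
       ar.replica_server_name,
       ars.is_local,
       ars.role_desc,                 -- PRIMARY, SECONDARY, or RESOLVING
       ars.operational_state_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = ars.replica_id
JOIN sys.availability_groups AS ag
    ON ag.group_id = ars.group_id
ORDER BY ag.name, ar.replica_server_name;
```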

If I try to use WSFC as the cluster type, I am not able to add the replica. I get this error:

Cannot connect to Listener2. The specified instance of SQL Server is not part of the same Windows Server Failover Cluster (WSFC) as the primary node.

I've spent quite a bit of time on this, but I still can't figure out what exactly the problem is.

Best Answer

From the Microsoft Docs for CREATE AVAILABILITY GROUP:

CLUSTER_TYPE

Introduced in SQL Server 2017. Used to identify if the availability group is on a Windows Server Failover Cluster (WSFC). Set to WSFC when availability group is on a failover cluster instance on a Windows Server failover cluster. Set to EXTERNAL when the cluster is managed by a cluster manager that is not a Windows Server failover cluster, like Linux Pacemaker. Set to NONE when availability group not using WSFC for cluster coordination. For example, when an availability group includes Linux servers with no cluster manager.

Since you mention you're using Windows Server Failover Clustering, the implication is that you are using Windows, and should not have the "Cluster Type" set to EXTERNAL. It should be set to WSFC.
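As a rough sketch of what that looks like in T-SQL: the first query checks which cluster type an existing availability group was created with, and the second statement shows the general shape of creating one with `CLUSTER_TYPE = WSFC`. The database name and endpoint URLs are placeholders, not values from the question. Only Node1 and Node2 are listed as replicas because, as the error in the question indicates, replicas of a WSFC-type availability group must belong to the same Windows Server Failover Cluster as the primary.

```sql
-- Check the cluster type of existing availability groups (SQL Server 2017+).
SELECT name, cluster_type_desc   -- WSFC, EXTERNAL, or NONE
FROM sys.availability_groups;

-- Sketch only: create the AG with WSFC as the cluster type.
-- [YourDatabase] and the endpoint URLs are hypothetical placeholders.
-- Run on the intended primary (for example Node1).
CREATE AVAILABILITY GROUP [AG_AW2017]
WITH (CLUSTER_TYPE = WSFC)
FOR DATABASE [YourDatabase]
REPLICA ON
    N'Node1' WITH (
        ENDPOINT_URL      = N'TCP://node1.example.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC
    ),
    N'Node2' WITH (
        ENDPOINT_URL      = N'TCP://node2.example.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC
    );

-- Then, on the secondary (Node2 in this sketch), join it to the group:
-- ALTER AVAILABILITY GROUP [AG_AW2017] JOIN;
```

With the cluster type set to WSFC, the Windows cluster itself coordinates failover, so switching nodes in Failover Cluster Manager should no longer leave the replicas waiting on an external cluster manager that does not exist.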