SQL Server – Temporary 'Resolving' state on synchronous HA: is there a way around this?

availability-groups, sql-server

We're having an issue where any loss of communication between our two HA nodes causes the primary node to go into a 'resolving' state temporarily. This lasts only 1-3 seconds; however, we serve a database to an application that doesn't retry its connection and requires users to close and reopen it, meaning they lose work in progress.

The cluster consists of two nodes, one active primary and one non-readable secondary, with a file share witness. All of our other applications tolerate this temporary blip; however, as mentioned, it has a highly visible impact on the users of this application. Is this a given when running HA in synchronous commit?

I'm also concerned about patching the secondary, since I will be kicking users off when the secondary goes down. Would switching to asynchronous commit remove this constraint, or is this an issue with how it has been configured?

Best Answer

It's difficult to answer the first part of your question, as it seems to be related to the network and/or cluster configuration rather than to the availability mode. You should take a good look at your cluster logs to find the root cause.

1. Checking cluster status
   a. Log in to the Failover Cluster Manager console.
   b. Click on the cluster name.
   c. Check the summary details for errors.
   d. Click on Networks under the cluster name.
   e. Check the Summary and Network Connections tabs at the bottom of the window for errors.
   f. Since this appears to be a two-node cluster with a file share witness supporting Always On, make sure each node and the file share witness has a vote in the cluster.

2. Checking for file share witness errors
   a. Confirm the share is online and available to the network.

3. Obtaining cluster logs
   a. Log in to a cluster node as an Administrator.
   b. Examine both the System and Application event logs.
   c. Use the PowerShell cmdlet Get-ClusterLog -UseLocalTime. This saves a Cluster.log file to C:\Windows\Cluster\Reports; examine the log.

As to your second question regarding patching, I assume that the nodes and the file share witness each have a vote. If the secondary node is being patched (Windows patching or a SQL Server SP/CU), the AG on the primary node will remain available provided quorum is maintained between the primary node and the file share witness. If quorum is not maintained, the AG will shut down in order to protect the databases. The availability mode is not a factor.
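To make the vote arithmetic concrete, here is a minimal Python sketch of a static majority check. It is an illustration only, not how the cluster service is implemented: the voter names and the `has_quorum` helper are hypothetical, it assumes one vote each for the primary, the secondary, and the file share witness, and it ignores dynamic quorum and dynamic witness, which can adjust votes at runtime.

```python
def has_quorum(voters: dict[str, bool]) -> bool:
    """Return True if strictly more than half of all voters are reachable.

    `voters` maps a (hypothetical) voter name to whether it is currently
    reachable. Every voter is assumed to hold exactly one vote.
    """
    total_votes = len(voters)
    online_votes = sum(1 for reachable in voters.values() if reachable)
    return online_votes > total_votes // 2


# Secondary is down for patching; primary and witness can still see each other.
patching_secondary = {"primary": True, "secondary": False, "witness": True}

# Primary has lost contact with both the secondary and the witness.
primary_isolated = {"primary": True, "secondary": False, "witness": False}

print("Patching secondary, quorum held:", has_quorum(patching_secondary))
print("Primary isolated, quorum held:", has_quorum(primary_isolated))
```

With three votes, two reachable voters form a majority, so taking the secondary offline alone should not cost the primary quorum. Losing the witness at the same time leaves one vote out of three, which is the case where the AG goes offline.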