Sql-server – AlwaysOn failover DR testing

availability-groupsfailoversql-server-2016

I have always-on group running on 4 node cluster. Each DC has 2 node running on it

  • Data Center1:
    • Node1: Sync Replica – Automatic failover
    • Node2: Sync Replica – Automatic failover
  • Data Center2:
    • Node3: Sync Replica – Manual failover
    • Node4: Async Replica – Manual failover

While DR testing, we brought down nodes running on DC1 and I tried to bring Always-ON group on node3 (sync replica), however when I tried to failover, it gives a message that there will be data loss. Would there be any data loss if I failover manually to node3 even though it set as sync replica?

Best Answer

While DR testing, we brought down nodes running on DC1

Bringing down the primary server normally will cause an automatic failover to a sync AG. However, you also brought down the only other sync secondary configured for automatic failover. This puts the AG in a situation where there is no primary, which requires forced manual failover. From Perform a Forced Manual Failover of an Always On Availability Group (SQL Server):

If the primary replica becomes unavailable when the WSFC cluster has a healthy quorum, you can force failover (with possible data loss), to any replica whose role is in the SECONDARY or RESOLVING state. If possible, force failover to a synchronous-commit secondary replica that was synchronized when the primary replica was lost.

A forced manual failover will always warn of possible data loss. You need to take the steps outlined in that article to make sure you are avoiding data loss. See the section Potential Ways to Avoid Data Loss After Quorum is Forced.

You're in the "If a synchronized synchronous-commit secondary replica comes online" situation. You'll want to check the is_failover_ready column (in the sys.dm_hadr_database_replica_cluster_states DMV) for each database that's part of this AG. If they are all 1, then you can force a failover without data loss.


Another possibility is that, since node3 is in the DR data center (DC2), one or more of the databases in that availability group (AG) is not in the SYNCHRONIZED state.

Even though node3 is configured as a sync replica, it can fall behind the primary and then be considered not ready for failover. You should be able to observe this in the AG dashboard, specifically the "Failover Readiness" column in the list of databases. They should all say "No Data Loss" in before you attempt a failover.

Screenshot of AG dashboard showing the failover readiness column

You'll need to wait on node3 to "catch up" (synchronize) with the primary before initiating the failover.