Sql-server – Data Loss when forced failed over to secondary replica on Sql Server 2014

availability-groupsfailoversql server 2014

Setup:

  • 2 Availability Groups
  • 2 Replicas on Synchronous Commits.
  • 1 Replica on Asynchronous Commit.
  • Automatic Failover configured for the two replicas on Synchronous Commits.
  • Manual Failover configured for the single replica on Asynchronous Commit.

Problem:

  • ESX Server for the primary replica went down.
  • Automatic failover didn't happen.
  • Manual forced failover was performed using the command:

    ALTER AVAILABILITY GROUP AccountsAG FORCE_FAILOVER_ALLOW_DATA_LOSS;

  • This resulted in a lot of data loss.

Question:

  • What am I missing here ?
  • Why did the data loss happen? My understanding is that since two replicas was on synchronous commit, there would be no data loss when either one of them goes down.
  • Is it the data in the Log transactions that was lost. Would periodic log shrink prevented this data loss ?

Best Answer

What am I missing here

You should first read How It Works: Always On–When Is My Secondary Failover Ready?. This would tell you about scenarios when your secondary is not failover ready and what could lead to "no automatic" failover

Why did the data loss happen? My understanding is that since two replicas was on synchronous commit, there would be no data loss when either one of them goes down

The data loss happened because you forced failover over to replica which was not synchronized. Look at the word "FORCE_FAILOVER_ALLOW_DATA_LOSS" it says itself that if you use me you "might" face data loss. What this command does is this beings online database discarding any transaction log blocks which originated on primary but somehow did not made on secondary and since secondary has no information about it it cannot replay it and hence you might face data loss when database comes online.

Is it the data in the Log transactions that was lost. Would periodic log shrink prevented this data loss ?

Let us call it transaction log block not data. This log block was on primary but due to unexpected shutdown it was not delivered to secondary and secondary was not synchronized. If you are able to bring primary online after some time without doing any forced failover you might have been able to save data loss but since you forced failover it is not possible. May be later if you bring primary replica online and the information is there in transaction log you can get it in primary replica. Shrinking has NO business here at all.