Sql-server – AlwaysON AG synchronous commit mode disconnected transaction log effect and mitigation

availability-groupssql server

I have an AlwaysON AG with one primary and one secondary replica in synchronous commit mode. The databases in the AG all have autogrowth disabled (standard policy within organisation). There is planned hardware maintenance on the server that hosts the secondary replica that will result in the server being offline for 4-8 hours. My undestanding is that when this happens, the connected state of the secondary replica will change to DISCONNECTED, the send queue of the databases in the primary replica will accumulate unsent transaction log records. For as long as the secondary replica remains offline, all of the current log records remain active (autogrowth is disabled), transaction log backups will not truncate the log with the potential of the log filling. Can someone please confirm if this is correct? Is there any way we can mitigate this, ie. ensuring there's no index maintenance during this window or switching to aynschronous commit mode?

That was my understanding, but after reading this msdn article https://msdn.microsoft.com/en-us/library/ff877931%28v=sql.110%29.aspx?f=255&MSPPError=-2147217396 I am confused. There is a note that states: "If primary's session-timeout period is exceeded by a secondary replica, the primary replica temporarily shifts into asynchronous-commit mode for that secondary replica. When the secondary replica reconnects with the primary replica, they resume synchronous-commit mode."

Does this pertain to my situation as described above and if so does this mean that the potential for the transaction to fill is mitigated by design? Can this temporary shift in commit mode be seen when querying sys.availabilty_replicas or via any extended event or is it a purely a under the covers change?

Best Answer

Can someone please confirm if this is correct?

Yes, you are 100% correct!

Is there any way we can mitigate this, ie. ensuring there's no index maintenance during this window or switching to aynschronous commit mode?

The mitigation is not with the mode that the replicas run it, it's removing that replica from the AG.

Does this pertain to my situation as described above and if so does this mean that the potential for the transaction to fill is mitigated by design?

Yes, that's what will happen if you change nothing. No, the transaction log will still fill up. Regardless of synchronous or asynchronous modes the primary must keep all log blocks that are required for the replica that is the furthest behind.

In your scenario, there are only two replicas a primary and a secondary. The secondary will be offline for a while. You've already correctly stated what will happen, the send queue will get larger as the replica is offline and stays that way. This happens no matter the mode.

The mitigation is to remove the replica that will be down for the extended period of time. This allows the primary to properly truncate the log with proper transaction log backups.

Once the secondary is back up, since it was kicked out of the AG all of the databases participating in that AG will be in a restoring state. To get back to a good steady state and bring the replica back into the AG, take the log backups that were run during the period the secondary replica was offline and apply them to the databases, leaving them in a restoring state (with no recovery). Once you get close, pause all transaction log backups on the primary and join the replica back into the AG. Resume the log backups on the primary.