SQL Server – Unable to Truncate Transaction Log, log_reuse_wait_desc – AVAILABILITY_REPLICA

availability-groupssql serversql server 2014transaction-logtransactional-replication

This morning i was woken up by a transaction log full alert on one of our database. This server is an alwayson cluster and also a transactional replication subscriber.
I checked log_reuse_wait_desc and it showed logbackup. Someone had accidentally disabled the logbackup jobs 4 days earlier, I re-enabled the log backup job and the log got cleared. Since it was 4am I thought I will go to office later that morning and shirnk the log as it has grown to 400GB.

10AM- Im in office and I check the log usage before shrinking and it was around 16%. I was surprised and check the log_reuse_wait_desc, which showed replication. I was confused cause this was a replication subscriber. We then saw that the db was enabled for CDC and thought that might be the cause, so disabled CDC and now the log_reuse_wait_desc shows AVAILABILITY_REPLICA.

The log usage meanwhile still steadily growing and its at 17% now. I check the alwayson dashboard and check the sent and redo queue and both are virtually zero. I am not sure why the log reuse is showing as AVAILABILITY_REPLICA and unable to clear the log.

Any idea why this is happening?

Best Answer

If you do this:

SELECT * FROM sys.databases

And the log_reuse_wait_desc shows AVAILABILITY_REPLICA, that means SQL Server is waiting to send log data to one of your Always On Availability Group replicas. One of the replicas may be lagging behind due to a slow network, or it may be down altogether.

If you check the AG dashboard and it shows no queues, you may have been a victim of thread exhaustion. It's a known issue that the AG dashboard stops updating after worker thread exhaustion. You'll need to check the status on each replica directly rather than relying on the primary. Nick's note in that Connect item says that you can just alter a replica's properties to restart replication, but that doesn't always work (especially if you have hundreds of databases on a replica with a large amount of data that needs to be sent, and restarting replication can just cause the worker thread exhaustion again.)

If the last guy set up an AG replica and it's not supposed to exist anymore, then it's time to remove that AG and/or replica. Just be careful that apps aren't pointing to the listener name in order to connect to your SQL Server.