Sql-server – AOAG Database went into NOT SYNC\SUSPECT MODE

availability-groupssql serversql server 2014transaction-log

I have SQL Server AOAG 2014 with 3 replicas. 2 Syncronous Commit with Automatic Failover and 1 (BCP) Asynchronos with Manual Failover.

It is a DEV environment and I scheduled log backups every 15 mins.

BCP Replica AOAG Database went into "NOT SNCHRONIZING\SUSPECT" mode.

When I checked the logs it says LOG DRIVE is 100 GB and it is completely occupied with this specific Database Log.

When I checked SYS.DATABASES –> Log_reuse_wait_desc — I have seen "AVAILABILITY REPLICA".

I tried to backup log of this particular database but still I cannot shrink this particular database.

Finally , I have to re-initialize the database by removing it from AOAG and then by re-adding. then backup log for this particular DB and shrink it . This fixed the issue.

What else can I do for this particular situation.

Best Answer

This situation is caused by the transaction logs not being properly hardened on one of your secondary replicas. This could happen is one of them was offline for a period of time, or was unable to connect correctly.

To help identify the replica having the problem you can query sys.dm_hadr_database_replica_states, in particular pay attention to the log_send_queue_size column.

There are some odd things that happen with AGs sometimes, particulary after a machine has been restarted, whereby one of the databases in the AG will just stop syncing (and there will be no logged errors or issues). Rather than remove it from the AG entirely, and then add it back in you can attempt to restart the hadr_endpoint on the secondary that is being the problem, and then watch to see if traffic starts flowing again.