SQL Server 2016 – Distributed Availability Group Direct Seeding FAILED, SQL Error

availability-groups, distributed-availability-groups, sql-server-2016

We just started setting up distributed availability groups to replicate our production databases to a new reporting cluster. The first availability group that we set up for replication worked without any issues. The second availability group has much larger databases (over 3 TB in total); seeding took much longer, and two of the five databases failed. We configured the distributed availability group to use direct seeding, and querying the sys.dm_hadr_automatic_seeding DMV shows current_state as FAILED, with failure_state 2 (SQL Error) or 21 (Seeding Check Message Timeout).
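A query along these lines shows the seeding state per database. It is a sketch that assumes the documented columns of sys.dm_hadr_automatic_seeding and joins to the availability group catalog views only to get readable names; run it on the replica where seeding was attempted.

```sql
-- List automatic seeding attempts, newest first, with readable AG and database names.
SELECT ag.name               AS ag_name,
       adc.database_name,
       s.start_time,
       s.completion_time,
       s.current_state,
       s.failure_state,
       s.failure_state_desc,
       s.error_code,
       s.number_of_attempts
FROM sys.dm_hadr_automatic_seeding AS s
JOIN sys.availability_groups AS ag
    ON ag.group_id = s.ag_id
JOIN sys.availability_databases_cluster AS adc
    ON adc.group_database_id = s.ag_db_id
ORDER BY s.start_time DESC;
```

The failure_state_desc and error_code columns are usually the quickest pointer to what went wrong for a given database.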

What can we do to troubleshoot this issue?

Best Answer

The AlwaysOn Professional blog has some general troubleshooting steps for direct seeding and also includes some details about trace flag 9567 to enable compression during seeding, but I didn't find any details about the SQL Error or Seeding Timeout.
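For reference, a minimal sketch of turning that trace flag on. It assumes the flag is set on the primary replica and that the extra CPU cost of compressing the seeding stream is acceptable there; it is not something I verified as a fix for the failures above.

```sql
-- Enable compression of the automatic seeding data stream (trace flag 9567).
-- The -1 argument applies the flag globally; a flag set this way does not
-- survive a restart of the SQL Server service.
DBCC TRACEON (9567, -1);

-- Confirm that the flag is now active.
DBCC TRACESTATUS (9567);
```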

We have previously had issues with large databases in availability groups, but these are usually resolved by restoring the latest transaction log backups from the primary on the replica.

In this case the databases were listed on the secondary availability group as recovering, so I tried applying the latest transaction log backups from the primary and then joining the database to the secondary availability group:

-- Restore transaction log backups from the primary and leave the database in the restoring state (NORECOVERY).
-- Multiple backup files may need to be restored, from oldest to newest.
RESTORE LOG [stackoverflow] FROM DISK = '\\Backups\SQL\_Trans\StackOverflow_AG\StackOverflow\StackOverflow_LOG_20170810_175400.trn' WITH NORECOVERY;

-- Join the database to the secondary availability group and resume data movement.
ALTER DATABASE [stackoverflow] SET HADR AVAILABILITY GROUP = [StackOverflow_RAG];
ALTER DATABASE [stackoverflow] SET HADR RESUME;

This worked for both of the failed databases and fixed the replication issues. Our reporting cluster now has all databases kept in sync from the primary availability group.
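To confirm the result without relying on the SSMS dashboard, a check like the following can be run on the secondary. It is a sketch that assumes only the documented columns of sys.dm_hadr_database_replica_states.

```sql
-- Show the synchronization state of every availability database hosted locally.
SELECT DB_NAME(drs.database_id)        AS database_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc,
       drs.redo_queue_size,            -- KB of log not yet redone on this replica
       drs.last_hardened_time
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1
ORDER BY database_name;
```

A database that joined successfully should report a synchronizing or synchronized state rather than NOT SYNCHRONIZING.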

Related Solutions

SQL Server – Object Auditing with Availability Groups

No, auditing only reflects the one server that is running the audits. If you want to centralize audit results from multiple servers, you'll need to combine that data yourself.

Keep in mind that if someone really wanted to circumvent this, they'd simply add another replica on any server they want, do the querying they need, and then destroy the replica. An audit on the primary would only capture the fact that a replica was added (and even then, only if you're auditing for those events.)

SQL Server – Availability Group Database Stuck in Not Synchronizing Mode

Since the server had been offline for a while we thought it may have gone outside the recovery window of the primary. We decided to try applying the latest transaction logs on the database to see if that would kick-start the recovery process:

-- Remove database from availability group:
ALTER DATABASE [StackExchange.Bicycles.Meta] SET HADR OFF;

-- Apply t-logs to catch up. This can be done manually in SSMS or via:
RESTORE LOG [StackExchange.Bicycles.Meta] FROM DISK = '\\ny-back01\backups\SQL\_Trans\SENetwork_AG\StackExchange.Bicycles.Meta\StackExchange.Bicycles.Meta_LOG_20160217_033201.trn' WITH NORECOVERY;

-- Re-join database to availability group
ALTER DATABASE [StackExchange.Bicycles.Meta] SET HADR AVAILABILITY GROUP = [SENetwork_AG];
ALTER DATABASE [StackExchange.Bicycles.Meta] SET HADR RESUME;

After running the above on the secondary server for both databases, they were able to start synchronizing again.
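If it is unclear which log backups need to be applied, the backup history in msdb can list them in restore order. This is a sketch: it assumes it is run on the instance that took the log backups (history is recorded there), and the @since cutoff is a hypothetical value to adjust to the point where the secondary fell behind.

```sql
DECLARE @database_name sysname  = N'StackExchange.Bicycles.Meta';
DECLARE @since         datetime = DATEADD(DAY, -1, GETDATE());  -- hypothetical cutoff

-- List log backups for the database since the cutoff, oldest first,
-- which is the order in which they must be restored WITH NORECOVERY.
SELECT bs.backup_start_date,
       bs.backup_finish_date,
       bs.first_lsn,
       bs.last_lsn,
       bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
    ON bmf.media_set_id = bs.media_set_id
WHERE bs.database_name = @database_name
  AND bs.type = 'L'                      -- 'L' marks transaction log backups
  AND bs.backup_finish_date >= @since
ORDER BY bs.backup_finish_date;
```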

UPDATE: We had a similar issue where after a Manual AG Failover one of the databases on the new primary replica was stuck in Not Synchronizing mode (switched to Not Synchronizing / Recovery Pending after restarting SQL Server), and the above steps worked to resolve that issue as well.
