SQL Server – Databases in Synchronizing State on Failover Server

availability-groupssql server

Current Configuration: I have two SQL Servers, which are connected by the failover mechanism, meaning one is primary and one is secondary.

I restarted both services on both servers sequentially.

On the secondary server, all the databases are in the "Synchronizing" state.

AG replication mode is synchronous. It has been 12hr it's still showing "Synchronizing" state.

What I did so far to resolve

Drop the database from the secondary server and remove the database from AG group on the primary server and add it back.

Any insight on this, I am using SQL Server 2017.

Best Answer

Firstly, if you could drop the database from any one of replica, it means the database is not yet part of any of availability groups, before dropping the database it must be removed from availability group first.

Assuming that your current scenario has 2 nodes in 1 cluster, I would recommend first to add witness (preferred File-share from different server other than 2 nodes) in WSFC to ensure the quorum healthy, so that while you restarting 1 server there would be 2 votes available (1 Node, 1 File-share witness), and the WSFC could smoothly fail-over the roles (Availability groups) from one to another node.

For the synchronizing issue, please do the following

  1. Make sure the network connection is stable between 2 nodes especially with endpoint port (Default: 5022) which you can identity from any availability group properties right click on AG Name -> Properties -> look at availability Replicas section (bottom) example: TCP://<IP ADDRESS>:5022. Ensure the failover mode is Automatic enter image description here

  2. Try Suspend data movement and Resume, you can do this for any one of the database that is already part of availability group enter image description here

  3. If step 2 completed without any error, verify the hardening of LOG by REDO worker is normal which you can identity with following query, result can describe with the delay between LAST SENT (Primary) - LAST RECEIVED (Secondary)- LAST HARDENED (Secondary) which is not available in AG Dashboard

select  db.name, 
        db.database_id, 
        ag.name as GroupName, 
        state_desc, 
        recovery_model_desc, 
        log_reuse_wait_desc,
        AGDB.truncation_lsn,
        Rep.replica_server_name,
        rep.endpoint_url,
        DBRepStats.is_primary_replica,
        DBRepStats.synchronization_health_desc,
        DBRepStats.database_state_desc,
        (redo_queue_size / 1024.0) as redo_queue_size_MB,
        last_redone_time,
        last_redone_lsn,
        DBRepStats.end_of_log_lsn,
        DBRepStats.last_sent_lsn,
        DBRepStats.last_sent_time,
        DBRepStats.last_received_lsn,
        DBRepStats.last_received_time,
        DBRepStats.last_hardened_lsn,
        DBRepStats.last_hardened_time
from sys.databases as db
    left outer join sys.availability_databases_cluster as AGDB on db.group_database_id = AGDB.group_database_id
    left outer join sys.dm_hadr_database_replica_states as DBRepStats on db.group_database_id = DBRepStats.group_database_id
    left outer join sys.availability_replicas as Rep on DBRepStats.group_id = Rep.group_id and DBRepStats.replica_id = Rep.replica_id 
    left outer join sys.availability_groups as AG on DBRepStats.group_id = AG.group_id
where db.database_id > 4
go
  1. If no luck with above steps, have a look at following sections
    1. "AlwaysOn_health" from Management -> Extended Events -> Sessions (via SSMS)
    2. SQL Error Log
    3. WSFC Log