Sql-server – Very slow merge replication synchronization

data synchronizationmerge-replicationperformancesql server

I had an unforeseen issue with a sub going down (generic error, was probably a connection issue) and now that I've reconnected the sub it is insanely slow re-syncing. It seems to download about 1000 changes then sit there for about an hour before adding another 1000.

sub snapshot

Any ideas why this is going so slowly? This is just one table of dozens and I feel like this won't catch up before Monday (when I have users meant to be coming online).

Any help is appreciated!
Thanks!

UPDATE

It's been running for over 24 hours now and is slowly chewing through the records but at a still very slow pace. It now toggles between the shot above and 'Enumerating changes in filtered articles using Subscriber's assigned partition ID'. Yes, there is a filter in place but it's been there since I first set up replication and has never caused an issue before.

SECOND UPDATE

When I go into Replication Monitor there is an error:

The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active.

But what is strange is that there is zero system activity and I have another subscriber on this publication working as well a secondary replication from the same publisher to the same subscriber that is working perfectly. I've tried stopping and restarting, taking and applying a fresh snapshot but nothing seems to work. There are no blocked processes, no active queries that are deadlocking or anything else I can think to check.

To be clear:

  • Publication A => Sub 1: WORKING
  • Publication A => Sub 2: NOT WORKING
  • Publication B => Sub 1: WORKING
  • Publication B => Sub 2: WORKING

If anyone has any help, I'd much appreciate it.

Best Answer

There are many factors that can affect replication. Some common factors that I've come across are related to slow networking, blocking and storage issues. But the one that may be a factor is the performance problem with virtual log files. If you are initializing a database with a large amount of data and the default growth factors are in place, then sql server will grow the data 1mb at a time and the log 10 percent at a time. To check for the VLF issue, run dbcc loginfo. If that command returns over 100 records, I would be concerned. If it returns thousands of records, then it will have a real impact on performance. There are plenty of articles written on this subject. The basic fix is to adjust the autogrowth settings on all data and log files to reasonable sizes, then shrink the log file and initialize it back to the original size. I would check all of the databases, including the distribution database. There could also be other reasons why the distribution database is slow to distribute the transactions. I've experienced this on several transactional replication systems in the past and was suprised at the impact of correcting this every time.

I would also suspect locking at the publisher. Identify the spids involved in the replication and use the dynamic management view sys.dm_os_waiting_tasks to determine which waits are involved in these sessions. This will help identify what these subscriptions are waiting on.

Related Question