Sql-server – Merge replication fails on executing query

My merge replication was working fine this morning. Since an hour or 2 ago, my synchronisation status windows keeps showing this message:

Thread id 3108 will wait for 15 second(s) before retrying the query on Publisher 'MSI-SQL-MAS01'.

Can anyone tell me what's going on? The database is 500Gb in size and it's also pretty heavily used. However, the replication must work, we need to keep the database at the subscriber in sync.

Also this is what I see after a minute or so:

A query executing on Publisher 'MSI-SQL-MAS01' failed because the connection was chosen as the victim in a deadlock. Please rerun the merge process if you still see this error after internal retries by the merge process.

Like I just said, the replication was working fine this morning. I also restarted the merge agent like the error suggested but with no results.

I also tried

update sysmergepublications set [generation_leveling_threshold] = 0

at both subscriber and publisher database but no luck it seems.. only getting the following error sometimes too:

The merge process is retrying a failed operation made to article 'xx' – Reason: 'The merge process was unable to synchronize the row.'.

Thanks in advance.

I don't see my former error anymore. The only thing I get now is:

"The merge process is retrying a failed operation made to article 'xx' – Reason: 'The merge process was unable to synchronize the row.'."

The profiler makes no sense to me. I don't see any relevant information.

If I reinitialize the subscription, what does it do? I hope I do not need a new snapshot for that because that would be impossible at this time. The database is over 500Gb in size and the snapshot takes over 9 hours to complete. The database is in production all the time and very heavily used, so a lock on the database to create a snapshot is really a no-go.

Alright this is what I see in the conflict viewer:

The same row was updated at both 'master_server' and 'slave_server'. The resolver chose the update from 'master_server' as the winner.

How is it possible that the same row gets updated at both master and slave server while our application is connected to the master server?
It looks like something is updating the same row at both servers with different content and made merge agent crash?

Best Answer

"It looks like something is updating the same row at both servers with different content and made merge agent crash?" This is handled by merge conflict tables, and would not cause the issues you are describing. These conflict tables are located on the publisher database, and are named like: MSMerge_conflict__.

To answer your question about what reinitialization does, by default, reinitialization will take a snapshot of your published articles, drop the articles on the subscriber side, recreate the articles on the subscriber side, and then bulk load data from the snapshot into the subscriber articles. Since this is a production environment, and those articles need to be available on the subscriber side, this should only be used as a last resort.

What you can do is query the MSrepl_errors table on the Distribution database. This will provide you with a command_id and an xact_seqno. You can use these values as inputs into the sys.sp_browsereplcmds stored procedure. This will provide you with command text that is actually failing. Using this information, you can better understand the nature of the failure. If a particular row cannot be inserted or deleted at the subscriber, you may have to either delete the existing row (to allow the insert) or insert a dummy row (to allow the delete), respectively.

I hope this information helps,

Matt

Best Answer

Related Solutions

Sql-server – How to check if subscriber replication initialization has timed out

Sql-server – Copy snapshot files from Publisher, Paste and Apply on subscriber in SQL Merge Replication

Related Question