Sql-server – High tempdb disk I/O during merge replication of BLOBs

merge-replicationperformancesql server

Having a merge publication for replicating BLOBs (type image), got very high tempdb disk I/O for my size of data. Publication is download-only and has no filters.

High disk I/O is caused by synchronization (when no subscribers are synchronizing, everything is ok), strongly correlated with number of subscribers. It happens even when no data is changed at Publisher between synchronizations, and that bothers me.

  • Size of replicated table: 7MB (total count of rows is about 100)
  • tempdb I/O : ~30 MB/sec for write (log and data files)
  • Number of subscribers: slightly over 100, each synchronizing every 30
    minutes (more or less evenly).
  • Retention period set to 14 days

Using SQL Server 2008 at Publisher, SQL Server 2005-2008R2 at Subscribers.
All subscribers use Web Synchronization.

Additionally, synchronization at subscriber takes a lot of time, with multiple occurrences in replmerg.log like these:

DatabaseReconciler, 2015/04/21 12:13:40.348, 3604, 25088,  S2,  
INFO: [WEBSYNC_PROTOCOL]  
Sending client ReconcilerPhase WebSyncReconcilerPhase_RegularDownload     

DatabaseReconciler, 2015/04/21 12:13:47.063, 3604, 25194,  S2,  
INFO: [WEBSYNC_PROTOCOL]  
Received server ReconcilerPhase WebSyncReconcilerPhase_LastRegularDownload

Tried setting @stream_blob_columns on and off with no effect.

The question is: Is it a good idea to use merge replication to send these blobs to subscribers? We have other publications (though they have no BLOB columns) with a lot of data without tempdb problem. Is it an SQL Server flaw, or bad setup?

Publisher and Distributor are on the same instance, SQL Server 2008 SP4, cannot be sure about Subscribers, some of them maybe not up-to-date. Anyway, I can prepare a test environment with few subscribers having controlled versions, if it seems to help.

Confirmed, that excessive tempdb usage caused by execution of sys.sp_MSenumgenerations90. Checked MSMerge_genhistory table, found over 1.2 millions of records where pubid is null.

Found this conversation with replication guru:

Executed sp_mergemetadataretentioncleanup with no effect.

Found a remark on a case like this (too much rows in MSmerge_genhistory) where deleting rows where pubid is null and genstatus=1 helped to fix replication.

Deleted rows where pubid is null (implying that all Subscribers are synchronized, and which are not – will be reinitialized in "manual mode") and disk IO is back to normal again!

I have a feeling, that this situation could be caused by the fact, that all of my Subscribers are anonymous via WebSync and most of them have the same hostname. I'll try to check, if -hostname key helps not to multiply records in MSmerge_genhistory.

Best Answer

There is a TechNet blog that discusses some merge replication issues with blobs on a SQL Server 2008 or higher server.

http://blogs.technet.com/b/claudia_silva/archive/2011/10/31/replication-watch-out-for-stream-blob-columns-when-setting-up-replication-on-your-sql-2008-server.aspx

Note that the author cautions about which settings to use when there are SQL Server 2005 clients, such as you have.