Sql-server – Distributor latency – slow sp_MSget_repl_commands performance

replicationsql serversql-server-2008-r2sql-server-2012transactional-replication

One of our publishers is subscribed to by three subscribers, each subscriber is hosted on its own physical server. Two of those subscribers are 2008R2 and the third 2012. The 2012 subscriber is experiencing latency daily during peak times.

After running some extended event tracing for a couple of days I can see that the latency co-insides with slow sp_MSget_repl_commands execution times at the distributor, and more specifically the final select query found at line 724 "– Mini join along. Only agent_id and article_id columns in dbo.MSsubscriptions". The execution time for this statement can run into >= 20 minutes.

Does the distributor reader thread wait for the entire execution of sp_MSget_repl_commands to complete before starting to write replicated commands to the command buffers? Or does it start writing replicated commands to the command buffers as soon as they are returned by the SP? I see that the problematic statement is hinted with OPTION (FAST 1) which might suggest that this is the case. If the reader thread does wait for sp_MSget_repl_commands to complete before starting to write replicated commands to the command buffer then that might explain the latency, otherwise I am unsure. Is there anything else obvious I can monitor?

The MSdistribution_history.comments suggest that the holdup is at the subscriber. However no replicated subscriber stored procedure are completing in >= 100 ms and there is no obvious physical server resource contention. I’ve ruled out the network because there are also other subscriber databases on the same physical server being populated across the same network which are not experiencing latency.

Best Answer

To answer my own question to an extent. Commands appear to be written to the command buffer as soon as sp_MSget_repl_commands makes them available. If the writer thread is unable to write the command to the subscriber then sp_MSget_repl_commands spins until the command has been written. Once written the writer notifies the reader and frees one of the two command buffers for the next batch of commands. This explains the long execution of sp_MSget_repl_commands.