Sql-server – Transactional Replication not sending all commands

replicationsql serversql-server-2008-r2transactional-replication

I have a transactional replication topology as follows:

NY Publication Database same server holds distributor

NY – Subscriber
LON – Subscriber
HK – Subscriber

We are using bog standard, one-way transactional replication. All machines SQL 2008 R2

The snapshot gets applied fine to all subscribers, and then starts distributing commands, as you would expect, but to the one subscriber only (London), a whole bunch 2000-4 million commands just get "stuck". You can see the commands in msrepl_commands, you can see the transactions in msrepl_transaction. Replication monitor shows that there are xxxxx undelivered commands. All commands are being delivered without issue to the other 2 subscribers.

The problem is that they just stay there. Replication monitor says that there is nothing to be replicated, and nothing happens, they just stay where they are. The only way I have found around it, it to tear down the subscription and re-initialize!?!

I am aware that there are network "blips" between NY and Lon, and I can see in msrepl_errors, that there are some communication failures. However, I would expect replication to pick them back up? The distribution agent is running fine.

How do I get around this issue, and force SQL Server to pick these commands up, and deliver them.

Best Answer

Examine the output of sp_helpsubscriptionerrors. One issue I have seen is with DDL commands; they are treated differently in replication so if manual schema changes are being made to the Publisher, they can sometimes fail. There may also be specific DML commands failing on the subscriber due to transactional inconsistencies. If this is the case, you can use the distribution agent profile called "Continue On Data Consistency Errors" which will ignore duplicate key and other errors.

Latency could be causing the problem (according to MS, latency and throughput are the most important factors in transactional replication); you can track that with sp_posttracertoken. Both the Log Reader and Distribution agents have parameters which can be tweaked to compensate for that. Try lowering these values, particularly on the Log Reader.

Distribution Agent (from MSDN):

MaxDeliveredTransactions Is the maximum number of push or pull transactions applied to Subscribers in one synchronization. A value of 0 indicates that the maximum is an infinite number of transactions. Other values can be used by Subscribers to shorten the duration of a synchronization being pulled from a Publisher.

CommitBatchThreshold Is the number of replication commands to be issued to the Subscriber before a COMMIT statement is issued. The default is 1000.

Log Reader Agent (from MSDN):

MaxCmdsInTran Specifies the maximum number of statements grouped into a transaction as the Log Reader writes commands to the distribution database. Using this parameter allows the Log Reader Agent and Distribution Agent to divide large transactions (consisting of many commands) at the Publisher into several smaller transactions when applied at the Subscriber. Specifying this parameter can reduce contention at the Distributor and reduce latency between the Publisher and Subscriber. Because the original transaction is applied in smaller units, the Subscriber can access rows of a large logical Publisher transaction prior to the end of the original transaction, breaking strict transactional atomicity. The default is 0, which preserves the transaction boundaries of the Publisher.