Sql-server – Replication monitoring refresher for distribution job affecting performance

replicationsql-server-2005sql-server-agenttransactional-replication

I have a performance problem due to a sql agent job executing dbo.sp_replmonitorrefreshjob.

The environment is a 2 node Cluster with SQL Server 2005 SP2 and with
transaction replication. The OS is Windows Server 2003 R2.

This job runs a stored procedure on distribution database and Application Team experienced total Integration Inbound & Outbound transaction failure due to multiple queries being blocked on the Database. Upon looking deeper into the queries running on the database the following query was identified to have been running for over 9 Hours and when this job(exec dbo.sp_replmonitorrefreshjob) is killed, the issue goes away.
This job was setup to set with 2147 retry attempts and retry interval 1 so it is pretty much running 24/7.

Some of the code for this Stored Procedure is

ALTER procedure [sys].[sp_replmonitorrefreshjob] 
    (
    @iterations tinyint = 0 -- 0 - run continuously, non 0 - run for specified iterations
    ,@profile bit = 0 -- for internal use - DO NOT DOCUMENT (remove this before release)
     )

Is there a way to improve this job and how often does it need to run. If we can get away with this job and if there is another option let me know.
Upgrade is not an option to different version or even service pack is not a question.

Best Answer

Generally, replication monitor doesn't do a very good job. Have a look at this script by Robert Davis that measures latency:

http://www.sqlsoldier.com/wp/sqlserver/measuringtransactionalreplicationlatencywithouttracertokens