SQL Server – Merge replication issue – backup and maintenance jobs run until morning, and on most servers they are stuck

merge-replication sql-server-2008

We have a merge replication environment with the following setup:

Publisher: Server1, which replicates a single 125 GB database. The Distributor is configured on the same server, which is hosted in a Hyper-V environment.

Subscribers: approximately 280 servers, all using push subscriptions (the agents run at the Distributor).

NOTE: Our environment has a high volume of DML changes during the day, so every server runs a daily maintenance and backup job with the following steps:

Step 1: Check integrity of the complete database
Step 2: Rebuild/reorganize indexes
Step 3: Back up and verify
Step 4: Clean up old backups
Step 5: Clean up old job history

The job normally completes in 4-5 hours, although the duration varies somewhat from server to server.
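The five job steps described above correspond roughly to the following T-SQL. This is an illustrative sketch, not the exact job script: the database name `MyDatabase`, the backup path `D:\Backups\MyDatabase_full.bak`, the 14-day retention, and the 5%/30% fragmentation thresholds are all assumptions. Step 4 is interpreted here as trimming backup history in msdb; deleting old .bak files is usually handled by a maintenance plan cleanup task or an OS-level script and is not shown. The script uses `GO` batch separators, as in SSMS or sqlcmd.

```sql
-- Hypothetical database name; substitute your own.
USE [MyDatabase];
GO

-- Step 1: integrity check of the complete database.
DBCC CHECKDB (N'MyDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

-- Step 2: REBUILD indexes at >= 30% fragmentation, REORGANIZE at 5-30%.
-- Thresholds and the 1000-page floor are rules of thumb (assumed), not requirements.
-- Assumes non-partitioned indexes (one stats row per index).
DECLARE @sql nvarchar(max);
DECLARE index_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT N'ALTER INDEX ' + QUOTENAME(i.name)
         + N' ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name)
         + CASE WHEN ps.avg_fragmentation_in_percent >= 30
                THEN N' REBUILD;' ELSE N' REORGANIZE;' END
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    JOIN sys.objects AS o ON o.object_id = ps.object_id
    JOIN sys.schemas AS s ON s.schema_id = o.schema_id
    WHERE ps.avg_fragmentation_in_percent >= 5
      AND ps.page_count >= 1000
      AND i.index_id > 0          -- skip heaps
      AND i.is_disabled = 0       -- skip disabled indexes
      AND o.is_ms_shipped = 0;    -- skip system objects
OPEN index_cursor;
FETCH NEXT FROM index_cursor INTO @sql;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM index_cursor INTO @sql;
END;
CLOSE index_cursor;
DEALLOCATE index_cursor;
GO

-- Step 3: full backup with checksums, then verify the backup file (hypothetical path).
BACKUP DATABASE [MyDatabase]
    TO DISK = N'D:\Backups\MyDatabase_full.bak'
    WITH CHECKSUM, INIT;
RESTORE VERIFYONLY
    FROM DISK = N'D:\Backups\MyDatabase_full.bak'
    WITH CHECKSUM;
GO

-- Steps 4 and 5: purge msdb backup history and SQL Agent job history older than 14 days (assumed retention).
DECLARE @cutoff datetime = DATEADD(DAY, -14, GETDATE());
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff;
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @cutoff;
GO
```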

The Snapshot Agent runs daily at 00:05.

PROBLEM:

Since Sunday night, the backup and maintenance job has been running until morning, and on most servers it is stuck on step 1, DBCC CHECKDB ('my database'). Clients are also complaining that the application crashes, they cannot log in, and it is very slow.

On checking, the backup job's SPID is waiting on the OLEDB wait type. However, we have not configured any linked servers on the Publisher or on the subscribers, except for a few head-office servers that we use for major imports, small updates, and pushing out database changes.
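Both observations can be checked directly on an affected server. The sketch below first lists any linked servers defined on the instance, then shows the current wait of every task belonging to the stuck session (a parallel DBCC CHECKDB can have several). The SPID value `55` is a placeholder, not a value from this system.

```sql
-- Linked servers defined on this instance (the local server row has is_linked = 0).
SELECT s.name,
       s.product,
       s.provider,
       s.data_source
FROM sys.servers AS s
WHERE s.is_linked = 1;

-- Current waits for every task of one session.
DECLARE @spid int = 55;  -- placeholder: SPID of the stuck DBCC/backup request
SELECT wt.session_id,
       wt.exec_context_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.blocking_session_id,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.session_id = @spid
ORDER BY wt.exec_context_id;
```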

Using the query below, I can see percent_complete increasing for the DBCC SPID, but that does not tell me why it is stuck on this step:

```sql
SELECT session_id,
       percent_complete
FROM sys.dm_exec_requests
WHERE percent_complete > 0;
```
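percent_complete on its own only shows that work is progressing. A wider version of the same query, sketched below, also returns the current and last wait type, any blocking session, cumulative CPU and I/O, and the server's own completion estimate, which helps show where the time is going. Filtering `command` with `LIKE N'DBCC%'` and `LIKE N'BACKUP%'` is an assumption about how these requests are labelled; remove the WHERE clause to see every request.

```sql
-- Progress plus wait information for running DBCC and BACKUP requests.
-- estimated_completion_time is in milliseconds and is only a rough internal estimate.
SELECT r.session_id,
       r.command,
       r.status,
       r.percent_complete,
       r.estimated_completion_time / 60000.0 AS est_minutes_remaining,
       r.wait_type,
       r.wait_time AS wait_time_ms,
       r.last_wait_type,
       r.blocking_session_id,
       r.cpu_time,
       r.reads,
       r.writes
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE N'DBCC%'
   OR r.command LIKE N'BACKUP%';
```

Running it a few minutes apart and comparing `reads`, `cpu_time`, and `wait_time_ms` shows whether the request is doing work or mostly waiting.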

It does not matter when I run the job, or whether I run only the DBCC CHECKDB step on its own: it never completes.

I also checked with our Windows team: no updates were rolled out over the weekend, and no changes have been made on the application side.

Any suggestions as to what the problem could be?

Best Answer

It turned out that the antivirus software was updated last week, and all servers were rebooted over the past weekend. As a test, we disabled some of the installed antivirus features on one server, and the job then completed successfully in its normal time (4-5 hours). Hopefully this works on the other servers too. Fingers crossed!
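A common follow-up (general SQL Server guidance, not something confirmed in this thread) is to exclude SQL Server's data, log, and backup locations from on-access antivirus scanning rather than disabling antivirus features wholesale. The sketch below lists the database file and backup paths on an instance, plus the average I/O latency per database file. Because the servers were rebooted at the weekend, those cumulative latency figures cover roughly the period since the antivirus update. Which folders to exclude is an assumption to agree with your security team.

```sql
-- Data and log file locations: candidate folders for antivirus exclusions.
SELECT DB_NAME(mf.database_id) AS database_name,
       mf.type_desc,
       mf.physical_name
FROM sys.master_files AS mf
ORDER BY database_name, mf.type_desc;

-- Backup destinations recorded in msdb.
SELECT DISTINCT bmf.physical_device_name
FROM msdb.dbo.backupmediafamily AS bmf;

-- Average read/write stall per file since the last restart (milliseconds).
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;
```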