Sql-server – Why are the Log Ship Jobs Blocking Each Other

log-shippingsql-server-2017

I have 5 databases at a DR site all being log shipped to once a minute using the built in SQL Server log ship jobs. The server is running SQL Server 2017.

For the last week the log shipping keeps going down due to blocking, when I look at the blocked queries there are several queries all deleting from msdb.dbo.log_shipping_monitor_error_detail and they're all blocking each other…

enter image description here

If I add get_locks=1 to sp_whoisactive they all look very similar…

Session 104…

<Database name="msdb">
  <Locks>
    <Lock request_mode="S" request_status="GRANT" request_count="1" />
  </Locks>
  <Objects>
    <Object name="log_shipping_monitor_error_detail" schema_name="dbo">
      <Locks>
        <Lock resource_type="OBJECT" request_mode="IX" request_status="GRANT" request_count="21" />
        <Lock resource_type="PAGE" page_type="*" index_name="nc3lsmonitor_error_detail" request_mode="U" request_status="GRANT" request_count="4" />
        <Lock resource_type="PAGE" page_type="*" index_name="nc3lsmonitor_error_detail" request_mode="U" request_status="WAIT" request_count="2" />
      </Locks>
    </Object>
  </Objects>
</Database>

Session 109…

<Database name="msdb">
  <Locks>
    <Lock request_mode="S" request_status="GRANT" request_count="1" />
  </Locks>
  <Objects>
    <Object name="log_shipping_monitor_error_detail" schema_name="dbo">
      <Locks>
        <Lock resource_type="OBJECT" request_mode="IX" request_status="GRANT" request_count="21" />
        <Lock resource_type="PAGE" page_type="*" index_name="nc3lsmonitor_error_detail" request_mode="U" request_status="GRANT" request_count="3" />
        <Lock resource_type="PAGE" page_type="*" index_name="nc3lsmonitor_error_detail" request_mode="U" request_status="WAIT" request_count="2" />
      </Locks>
    </Object>
  </Objects>
</Database>

The queries never unblock and never get registered as deadlocks, at the minute when I see it happen I have to kill the connections then it goes back to working again for anything from a few minutes to a few days when it will eventually get stuck again. The table in question that its trying to delete from only has about a thousand rows in it.

At this point I'm running out of ideas.

Edit : Complete version details from @@Version are ….

Microsoft SQL Server 2017 (RTM-CU15-GDR) (KB4505225) – 14.0.3192.2
(X64) Jun 15 2019 00:45:05 Copyright (C) 2017 Microsoft
Corporation Standard Edition (64-bit) on Windows Server 2016 Standard
10.0 (Build 14393: ) (Hypervisor)

Best Answer

What I think you can try here is to change the run times of each of your log shipping jobs. If you have 5 LS backup jobs running at the same time, change to time to 1 minute/2 minutes apart, the same to the LS restores and copy jobs.