Sql-server – How to prevent SQL Server Job History deadlocks

deadlockjobssql serversql-server-2012

Out of the ~50 instances I currently manage, I have two that each have one deadlock almost every night involving "INSERT INTO msdb.dbo.sysjobhistory" (always the victim) and "DELETE FROM msdb.dbo.sysjobhistory" called by "EXEC msdb.dbo.sp_purge_jobhistory", both sessions run by the SQL Server Agent service account. I have never had such a deadlock on any of the other instances, nor at any previous jobs. One each of these servers, the deadlocks occur at different times during the night (1:30 Saturdays run by a weekly schedule for sp_purge_jobhistory or 2:01 run by a step in the nightly schedule for syspolicy_purge_history).

There doesn't seem to be anything significantly different between these two "problem" instances and any of the others. One is a 2012 SP3 CU3 test server and the other is 2012 SP2 production instance.

Any ideas as to how I can prevent this?

Best Answer

I wrote this (a much simplified version of the code in @Taiob's blog post he linked to), and then call it from a job instead of the Microsoft-provided sp_purge_jobhistory:

--  Create Procedure PurgeJobHistory.sql

USE master;
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

IF NOT EXISTS (SELECT 1 FROM sys.procedures p JOIN sys.schemas s ON p.schema_id = s.schema_id WHERE s.name = N'dbo' AND p.name = N'PurgeJobHistory') BEGIN
   EXEC('CREATE PROCEDURE [dbo].[PurgeJobHistory] AS BEGIN SET NOCOUNT ON; END');
END;
GO

ALTER PROCEDURE [dbo].[PurgeJobHistory]
    @numberOfDaysToKeepSucceeded    SMALLINT    =   7
    ,@numberOfDaysToKeepOther       SMALLINT    =   30
AS
/*
Purpose:        Purges old rows from msdb.dbo.sysjobhistory, providing for different retention for success vs. other statuses
Author:         Mark Freeman (@m60freeman)
Last Edited:    2017-03-29
Compatibility:  SQL Server 2008 and newer
Adapted from:   http://sqlworldwide.com/manage-sql-agent-job-history/
License:        This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
                    http://creativecommons.org/licenses/by-nc-sa/3.0/
*/
SET NOCOUNT ON;  
SET DEADLOCK_PRIORITY LOW;

DECLARE @LastDateToRetainSucceeded  INT
        ,@LastDateToRetainOther     INT
        ,@RowsDeleted               INT
        ,@RowCount                  INT;

--Get the actual date of the most recent rows that we wish to maintain (converted match sysjobhistory.run_date)
SELECT  @LastDateToRetainSucceeded  = CONVERT(INT, CONVERT(VARCHAR(200), (GETDATE() - @numberOfDaysToKeepSucceeded), 112))
        ,@LastDateToRetainOther     = CONVERT(INT, CONVERT(VARCHAR(200), (GETDATE() - @numberOfDaysToKeepOther), 112))
        ,@RowCount                  = COUNT(1)
  FROM  msdb.dbo.sysjobhistory sjh WITH (NOLOCK);

RAISERROR('Starting rows in history: %i', 0, 1, @RowCount) WITH NOWAIT;

--DELETE sysjobhistory rows 1,000 at a time in a loop until we have nothing left to delete
SET @RowsDeleted = -1;

WHILE @RowsDeleted != 0 BEGIN
    --Handle rows with a Succeeded status
    BEGIN TRAN;
        DELETE  TOP (1000)
          FROM  msdb.dbo.sysjobhistory WITH (READPAST)      --Ignore locked rows
         WHERE  run_status  =   1                           --Succeeded
           AND  run_date    <   @LastDateToRetainSucceeded;

        SELECT  @RowsDeleted = @@ROWCOUNT;
        RAISERROR('Succeeded rows deleted: %i', 0, 1, @RowsDeleted) WITH NOWAIT;
    COMMIT;

    WAITFOR DELAY '00:00:01';
END;

SET @RowsDeleted = 1;

WHILE @RowsDeleted != 0 BEGIN
    --Handle rows with all other status values
    BEGIN TRAN;
        DELETE  TOP (1000)
          FROM  msdb.dbo.sysjobhistory WITH (READPAST)      --Ignore locked rows
         WHERE  run_status  IN  (0, 2, 3)                   --Failed, Retry, and Canceled are all treated as failures.
           AND  run_date    <   @LastDateToRetainOther;

        SELECT  @RowsDeleted = @@ROWCOUNT;
        RAISERROR('Other rows deleted:     %i', 0, 1, @RowsDeleted) WITH NOWAIT;
    COMMIT;

    WAITFOR DELAY '00:00:01';
END;

SELECT  @RowCount   = COUNT(1)
  FROM  msdb.dbo.sysjobhistory sjh WITH (NOLOCK);

RAISERROR('Ending rows in history: %i', 0, 1, @RowCount) WITH NOWAIT;
GO

It cut us down from having over 1 million rows in msdb.dbo.sysjobhistory to only about 1/4 as many, and the deletes are no longer involved in deadlocks.

Thanks to @Taiob and @ScottHodgin for their input!