SQL Server 14.0 – Deleting Millions of Records

Tags: delete, sql-server, sql-server-2017

I have two tables containing 200 million records each.
I have to delete approximately 70 million records from them, based on an integer value in a column.

I'm deleting them in chunks of 4000, using the following script:

DECLARE @BATCHSIZE INT, @ITERATION INT, @TOTALROWS INT, @MSG VARCHAR(500)
DECLARE @STARTTIME DATETIME, @ENDTIME DATETIME

SET NOCOUNT ON;
SET DEADLOCK_PRIORITY LOW;
SET @BATCHSIZE = 4000
SET @ITERATION = 0 
SET @TOTALROWS = 0 

WHILE @BATCHSIZE>0
BEGIN   
    SET @STARTTIME = GETDATE();

    BEGIN TRANSACTION
    DELETE TOP(@BATCHSIZE)
    FROM [mydb].[dbo].tableA 
    WHERE [mydb].[dbo].tableA.Code not IN (
            SELECT Code
            FROM [mydb].[dbo].TableB)

    SET @BATCHSIZE=@@ROWCOUNT   -- 0 once nothing is left to delete, so WHILE @BATCHSIZE>0 ends the loop
    SET @ITERATION=@ITERATION+1
    SET @TOTALROWS=@TOTALROWS+@BATCHSIZE
    COMMIT TRANSACTION;

    SET @ENDTIME = GETDATE();

    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR) + '  >> ' + CAST(DATEDIFF(millisecond, @STARTTIME,@ENDTIME) AS VARCHAR)
    RAISERROR (@MSG, 0, 1) WITH NOWAIT
END

TableA contains six columns: five integers and one NVARCHAR(64).
There is a nonclustered index on the Code column, and the clustered index is on the PK.
TableB contains only one column, Code, which is its PK.
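
Roughly, the tables look like this (column names other than Code and PK are placeholders, not my real ones):

-- Illustrative schema only; real column names differ
CREATE TABLE dbo.tableA
(
    PK    INT           NOT NULL,   -- clustered primary key
    Code  INT           NOT NULL,   -- nonclustered index on this column
    Col3  INT           NOT NULL,
    Col4  INT           NOT NULL,
    Col5  INT           NOT NULL,
    Name  NVARCHAR(64)  NULL,
    CONSTRAINT PK_tableA PRIMARY KEY CLUSTERED (PK)
);

CREATE NONCLUSTERED INDEX IX_tableA_Code ON dbo.tableA (Code);

CREATE TABLE dbo.TableB
(
    Code INT NOT NULL,
    CONSTRAINT PK_TableB PRIMARY KEY CLUSTERED (Code)
);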

After running the script for a couple of hours, it becomes very, very slow.

At the beginning each iteration executed in about 250 ms; after running for some hours it climbs to around 2 minutes per iteration.

The database uses the SIMPLE recovery model. It's not used by anyone, and it runs on a dedicated machine with 256 GB of RAM.

I've tried rebuilding the indexes every hour and shrinking the database (not the files, because my user lacks permission), but it stays slow.
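
The hourly maintenance I've been running is essentially the following (the exact statements are illustrative of what I run, not a fix I'm proposing):

-- Roughly the maintenance I've tried every hour
ALTER INDEX ALL ON [mydb].[dbo].tableA REBUILD;   -- rebuild all indexes on the table
DBCC SHRINKDATABASE (mydb);                       -- shrink the database; I can't shrink individual files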

If I start deleting records from another table, it shows exactly the same behaviour: it starts very fast and then slows down further after every iteration.

How can I restore the initial conditions? What can I do to improve the delete?

Best Answer

For large deletes in batches, consider specifying a clustered index key range instead of using TOP so that a clustered index seek can be used in the plan. Below is an example.

DECLARE
      @BATCHSIZE INT = 4000
    , @ITERATION INT = 0
    , @TOTALROWS INT = 0
    , @MSG VARCHAR(500)
    , @STARTTIME DATETIME
    , @ENDTIME DATETIME
    , @StartValue int = 0
    , @EndValue int = 0
    , @MaxValue int = (SELECT MAX(PK) FROM [mydb].[dbo].tableA);

SET NOCOUNT ON;
SET DEADLOCK_PRIORITY LOW;

WHILE @StartValue <= @MaxValue
BEGIN

    SET @EndValue = @StartValue + @BATCHSIZE;   -- each batch covers a fixed PK range rather than a fixed row count

    SET @STARTTIME = GETDATE();

    DELETE FROM [mydb].[dbo].tableA 
    WHERE [mydb].[dbo].tableA.Code NOT IN (
            SELECT Code
            FROM [mydb].[dbo].TableB
        )
        AND [mydb].[dbo].tableA.PK >= @StartValue
        AND [mydb].[dbo].tableA.PK < @EndValue;

    SET @TOTALROWS=@TOTALROWS+@@ROWCOUNT;
    SET @ITERATION=@ITERATION+1;


    SET @ENDTIME = GETDATE();

    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR) + '  >> ' + CAST(DATEDIFF(millisecond, @STARTTIME,@ENDTIME) AS VARCHAR);

    RAISERROR (@MSG, 0, 1) WITH NOWAIT;

    SET @StartValue = @EndValue;

END;
GO
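
A note on the design: the loop advances by a fixed clustered-key range rather than a fixed row count, so the number of rows deleted per iteration varies (and can be zero where the key range has gaps), but each iteration seeks directly to its slice of the clustered index instead of re-scanning rows that earlier batches already examined and kept, which keeps the per-batch cost roughly constant. If the smallest PK value is far from zero, initializing @StartValue from SELECT MIN(PK) avoids empty iterations at the start.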