Sql-server – Optimize delete query in SQL Server 2008 R2 SP1

performancequery-performancesql serversql-server-2008-r2

I have a delete query which takes around 6 mins to complete. I am looking for ways to optimise it and if its duration can be shortened. Below is its anonymized (due to security constraints) estimated execution plan.

Execution plan:

enter image description here

Query:

DELETE TOP(10000) Object1
   FROM Object1
   LEFT JOIN Object2
   ON 
    Object1.TID = Object2.TID
   WHERE 
    Object2.TID IS NULL

Record count details:

  1. sp_spaceused Object1 — 124164707
  2. sp_spaceused Object2 — 27799877

Index details:

1) Object1

  • CL index on TID(-) DESC, TIndex(-) DESC
  • No other indexes apart from above index

2) Object2

  • CL index on TID(-) DESC
  • NC index [NodeID] ASC
  • NC index [DateTime] ASC
  • NC index [TimeStamp] ASC
  • NC index [TType] ASC

SQL Server Environment details:

  • SQL Server 2008 R2 SP1
  • Edition: Enterprise
  • Version: 10.50.2500.0

DDL:

--Object1
CREATE TABLE [dbo].[Object1](
    [TID] [bigint] NOT NULL,
    [TIndex] [tinyint] NOT NULL,
    [OID] [varchar](1000) NOT NULL,
    [OIDName] [varchar](100) NOT NULL,
    [OIDValue_ANSI] [varchar](1000) NOT NULL CONSTRAINT [DF_Object1_OIDValue_ANSI]  DEFAULT (''),
    [RawValue_ANSI] [varchar](1000) NOT NULL CONSTRAINT [DF_Object1_RawValue_ANSI]  DEFAULT (''),
    [OIDValue_Unicode] [nvarchar](1000) NULL,
    [RawValue_Unicode] [nvarchar](1000) NULL,
    [OIDValue]  AS (isnull([OIDValue_Unicode],[OIDValue_ANSI])),
    [RawValue]  AS (isnull([RawValue_Unicode],[RawValue_ANSI])),
 CONSTRAINT [PK_Object1] PRIMARY KEY CLUSTERED 
(
    [TID] DESC,
    [TIndex] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

--Object2
    CREATE TABLE [dbo].[Object2](
        [TID] [bigint] IDENTITY(97196339,1) NOT NULL,
        [EID] [int] NOT NULL,
        [DateTime] [datetime] NOT NULL CONSTRAINT [DF_Object2_RecivedDateTime]  DEFAULT (getdate()),
        [IP] [varchar](50) NOT NULL CONSTRAINT [DF_Object2_IP]  DEFAULT ('0.0.0.0'),
        [C_ANSI] [varchar](255) NOT NULL CONSTRAINT [DF_Object2_C_ANSI]  DEFAULT (''),
        [C_Unicode] [nvarchar](255) NULL,
        [Community]  AS (isnull([Community_Unicode],[Community_ANSI])),
        [Tag_ANSI] [varchar](100) NOT NULL CONSTRAINT [DF_Object2_Tag_ANSI]  DEFAULT (''),
        [Tag_Unicode] [nvarchar](100) NULL,
        [Tag]  AS (isnull([Tag_Unicode],[Tag_ANSI])),
        [Acknowledged] [tinyint] NOT NULL CONSTRAINT [DF_Object2_Acknowledged]  DEFAULT ((0)),
        [Hname_ANSI] [varchar](255) NOT NULL CONSTRAINT [DF_Object2_Hname_ANSI]  DEFAULT (''),
        [Hname_Unicode] [nvarchar](255) NULL,
        [Hostname]  AS (isnull([Hostname_Unicode],[Hostname_ANSI])),
        [NodeID] [bigint] NOT NULL CONSTRAINT [DF_Object2_NodeID]  DEFAULT ((0)),
        [TType] [varchar](100) NOT NULL CONSTRAINT [DF_Object2_TType]  DEFAULT (''),
        [ColorCode] [int] NULL,
        [TimeStamp] [timestamp] NOT NULL,
     CONSTRAINT [PK_Object2] PRIMARY KEY CLUSTERED 
    (
        [TID] DESC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

I tried replacing LEFT JOIN with NOT IN and NOT EXISTS but it didn't help much. Rather NOT IN performed worst and NOT EXISTS was slightly better (5.8 secs) but still not that better. For all these testings, I used below select query since I can't use actual delete:

SELECT Object1.TID
       FROM Object1
       LEFT JOIN Object2
       ON 
        Object1.TID = Object2.TID
       WHERE 
        Object2.TID IS NULL

I would appreciate if someone could provide thoughts/suggestions on it. If you need more details, I'll try to provide it as much as I can considering security limitations I have.

Additional info:

The delete runs daily as part of SQL job. I am doing Naive Batching as described in Take Care When Scripting Batches (by Michael J. Swart) to delete rows until all of the rows are deleted.

Number of rows to delete depends on the data at that point of time the SQL job runs. Sorry but don't have the exact number as it varies daily and I never recorded it.

The following query takes 6 min 30s:

SELECT COUNT(*)
FROM Object1
LEFT JOIN Object2
    ON Object2.TID = Object1.TID
WHERE
    Object2.TID IS NULL;

The result is 0 rows (since we delete it daily we won't be getting much records).

The following query returns 123,529,024 records:

SELECT COUNT(*)
FROM Object1
LEFT JOIN Object2
    ON Object1.TID = Object2.TID

Best Answer

Using Michael J. Swart's Take Care When Scripting Batches code as a base, I'd like to offer another possible solution which uses a temp table to track the next set of rows to be deleted. The code keeps track of the max TID value from the previous delete and utilizes that information to keep from re-scanning the entire table over and over. I'd be curious to know if it would work for your situation.

--------------------------
--initial demo data set up
--------------------------
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#Object1') IS NOT NULL
    DROP TABLE #Object1
IF OBJECT_ID('tempdb..#Object2') IS NOT NULL
    DROP TABLE #Object2
CREATE TABLE #Object1 (TID BIGINT)
CREATE TABLE #Object2 (TID BIGINT)

DECLARE @UpperLimit INT;

SET @UpperLimit = 1000;

WITH n
AS (
    SELECT x = ROW_NUMBER() OVER (
            ORDER BY s1.[object_id]
            )
    FROM sys.all_objects AS s1
    CROSS JOIN sys.all_objects AS s2
    CROSS JOIN sys.all_objects AS s3
    )
--Populate Object1 with some test data
INSERT INTO #Object1 (tid)
SELECT x
FROM n
WHERE x BETWEEN 1
        AND @UpperLimit;

SELECT *
FROM #Object1

--Populate Object2 with some test data
insert into #Object2(TID) values(10),(20),(30)


------------------------------
--the real delete code
------------------------------
DECLARE @LargestKeyProcessed BIGINT = - 1
    ,@NextBatchMax INT
    ,@RC INT = 1;

IF OBJECT_ID('tempdb..#Object1RowsToDelete') IS NOT NULL
    DROP TABLE #Object1RowsToDelete

--Create a temp table to hold the TID values for
--Object1 rows where there does not exist a corresponding row
--on Object2 for TID
CREATE TABLE #Object1RowsToDelete (TID BIGINT NOT NULL)

--Careful batch loop until all intended rows are deleted
WHILE (@RC > 0)
BEGIN
    --Truncate the temp table
    TRUNCATE TABLE #Object1RowsToDelete

    --Populate the temp table with the next set of TID's 
    --that need to be deleted
    INSERT INTO #Object1RowsToDelete
    SELECT TOP (10) o1.TID      --Alter TOP as needed
    FROM #Object1 o1
    LEFT JOIN #Object2 o2 ON O1.TID = O2.TID
    WHERE o2.TID IS NULL
        AND O1.TID > @LargestKeyProcessed
    ORDER BY O1.TID ASC;

    --Delete from Object1 by joining against the temp table
    DELETE O1
    FROM #Object1 o1
    JOIN #Object1RowsToDelete d ON d.TID = o1.TID

    SET @RC = @@ROWCOUNT;

    --The max TID in the temp table is where we need to start on 
    --the next interation of the while loop
    SET @LargestKeyProcessed = (
            SELECT max(TID)
            FROM #Object1RowsToDelete
            )
print @LargestKeyProcessed
END

--Select the remaining rows from Object1
--In this example, only 10, 20 and 30 remain because
--those TID's were on Object2
SELECT *
FROM #Object1