Sql-server – Optimize delete query in SQL Server 2008 R2 SP1

performancequery-performancesql serversql-server-2008-r2

I have a delete query which takes around 6 mins to complete. I am looking for ways to optimise it and if its duration can be shortened. Below is its anonymized (due to security constraints) estimated execution plan.

Execution plan:

Query:

DELETE TOP(10000) Object1
   FROM Object1
   LEFT JOIN Object2
   ON 
    Object1.TID = Object2.TID
   WHERE 
    Object2.TID IS NULL

Record count details:

sp_spaceused Object1 — 124164707
sp_spaceused Object2 — 27799877

Index details:

1) Object1

CL index on TID(-) DESC, TIndex(-) DESC
No other indexes apart from above index

2) Object2

CL index on TID(-) DESC
NC index [NodeID] ASC
NC index [DateTime] ASC
NC index [TimeStamp] ASC
NC index [TType] ASC

SQL Server Environment details:

SQL Server 2008 R2 SP1
Edition: Enterprise
Version: 10.50.2500.0

DDL:

--Object1
CREATE TABLE [dbo].[Object1](
    [TID] [bigint] NOT NULL,
    [TIndex] [tinyint] NOT NULL,
    [OID] [varchar](1000) NOT NULL,
    [OIDName] [varchar](100) NOT NULL,
    [OIDValue_ANSI] [varchar](1000) NOT NULL CONSTRAINT [DF_Object1_OIDValue_ANSI]  DEFAULT (''),
    [RawValue_ANSI] [varchar](1000) NOT NULL CONSTRAINT [DF_Object1_RawValue_ANSI]  DEFAULT (''),
    [OIDValue_Unicode] [nvarchar](1000) NULL,
    [RawValue_Unicode] [nvarchar](1000) NULL,
    [OIDValue]  AS (isnull([OIDValue_Unicode],[OIDValue_ANSI])),
    [RawValue]  AS (isnull([RawValue_Unicode],[RawValue_ANSI])),
 CONSTRAINT [PK_Object1] PRIMARY KEY CLUSTERED 
(
    [TID] DESC,
    [TIndex] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

--Object2
    CREATE TABLE [dbo].[Object2](
        [TID] [bigint] IDENTITY(97196339,1) NOT NULL,
        [EID] [int] NOT NULL,
        [DateTime] [datetime] NOT NULL CONSTRAINT [DF_Object2_RecivedDateTime]  DEFAULT (getdate()),
        [IP] [varchar](50) NOT NULL CONSTRAINT [DF_Object2_IP]  DEFAULT ('0.0.0.0'),
        [C_ANSI] [varchar](255) NOT NULL CONSTRAINT [DF_Object2_C_ANSI]  DEFAULT (''),
        [C_Unicode] [nvarchar](255) NULL,
        [Community]  AS (isnull([Community_Unicode],[Community_ANSI])),
        [Tag_ANSI] [varchar](100) NOT NULL CONSTRAINT [DF_Object2_Tag_ANSI]  DEFAULT (''),
        [Tag_Unicode] [nvarchar](100) NULL,
        [Tag]  AS (isnull([Tag_Unicode],[Tag_ANSI])),
        [Acknowledged] [tinyint] NOT NULL CONSTRAINT [DF_Object2_Acknowledged]  DEFAULT ((0)),
        [Hname_ANSI] [varchar](255) NOT NULL CONSTRAINT [DF_Object2_Hname_ANSI]  DEFAULT (''),
        [Hname_Unicode] [nvarchar](255) NULL,
        [Hostname]  AS (isnull([Hostname_Unicode],[Hostname_ANSI])),
        [NodeID] [bigint] NOT NULL CONSTRAINT [DF_Object2_NodeID]  DEFAULT ((0)),
        [TType] [varchar](100) NOT NULL CONSTRAINT [DF_Object2_TType]  DEFAULT (''),
        [ColorCode] [int] NULL,
        [TimeStamp] [timestamp] NOT NULL,
     CONSTRAINT [PK_Object2] PRIMARY KEY CLUSTERED 
    (
        [TID] DESC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

I tried replacing LEFT JOIN with NOT IN and NOT EXISTS but it didn't help much. Rather NOT IN performed worst and NOT EXISTS was slightly better (5.8 secs) but still not that better. For all these testings, I used below select query since I can't use actual delete:

SELECT Object1.TID
       FROM Object1
       LEFT JOIN Object2
       ON 
        Object1.TID = Object2.TID
       WHERE 
        Object2.TID IS NULL

I would appreciate if someone could provide thoughts/suggestions on it. If you need more details, I'll try to provide it as much as I can considering security limitations I have.

Additional info:

The delete runs daily as part of SQL job. I am doing Naive Batching as described in Take Care When Scripting Batches (by Michael J. Swart) to delete rows until all of the rows are deleted.

Number of rows to delete depends on the data at that point of time the SQL job runs. Sorry but don't have the exact number as it varies daily and I never recorded it.

The following query takes 6 min 30s:

SELECT COUNT(*)
FROM Object1
LEFT JOIN Object2
    ON Object2.TID = Object1.TID
WHERE
    Object2.TID IS NULL;

The result is 0 rows (since we delete it daily we won't be getting much records).

The following query returns 123,529,024 records:

SELECT COUNT(*)
FROM Object1
LEFT JOIN Object2
    ON Object1.TID = Object2.TID

Best Answer

Using Michael J. Swart's Take Care When Scripting Batches code as a base, I'd like to offer another possible solution which uses a temp table to track the next set of rows to be deleted. The code keeps track of the max TID value from the previous delete and utilizes that information to keep from re-scanning the entire table over and over. I'd be curious to know if it would work for your situation.

--------------------------
--initial demo data set up
--------------------------
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#Object1') IS NOT NULL
    DROP TABLE #Object1
IF OBJECT_ID('tempdb..#Object2') IS NOT NULL
    DROP TABLE #Object2
CREATE TABLE #Object1 (TID BIGINT)
CREATE TABLE #Object2 (TID BIGINT)

DECLARE @UpperLimit INT;

SET @UpperLimit = 1000;

WITH n
AS (
    SELECT x = ROW_NUMBER() OVER (
            ORDER BY s1.[object_id]
            )
    FROM sys.all_objects AS s1
    CROSS JOIN sys.all_objects AS s2
    CROSS JOIN sys.all_objects AS s3
    )
--Populate Object1 with some test data
INSERT INTO #Object1 (tid)
SELECT x
FROM n
WHERE x BETWEEN 1
        AND @UpperLimit;

SELECT *
FROM #Object1

--Populate Object2 with some test data
insert into #Object2(TID) values(10),(20),(30)


------------------------------
--the real delete code
------------------------------
DECLARE @LargestKeyProcessed BIGINT = - 1
    ,@NextBatchMax INT
    ,@RC INT = 1;

IF OBJECT_ID('tempdb..#Object1RowsToDelete') IS NOT NULL
    DROP TABLE #Object1RowsToDelete

--Create a temp table to hold the TID values for
--Object1 rows where there does not exist a corresponding row
--on Object2 for TID
CREATE TABLE #Object1RowsToDelete (TID BIGINT NOT NULL)

--Careful batch loop until all intended rows are deleted
WHILE (@RC > 0)
BEGIN
    --Truncate the temp table
    TRUNCATE TABLE #Object1RowsToDelete

    --Populate the temp table with the next set of TID's 
    --that need to be deleted
    INSERT INTO #Object1RowsToDelete
    SELECT TOP (10) o1.TID      --Alter TOP as needed
    FROM #Object1 o1
    LEFT JOIN #Object2 o2 ON O1.TID = O2.TID
    WHERE o2.TID IS NULL
        AND O1.TID > @LargestKeyProcessed
    ORDER BY O1.TID ASC;

    --Delete from Object1 by joining against the temp table
    DELETE O1
    FROM #Object1 o1
    JOIN #Object1RowsToDelete d ON d.TID = o1.TID

    SET @RC = @@ROWCOUNT;

    --The max TID in the temp table is where we need to start on 
    --the next interation of the while loop
    SET @LargestKeyProcessed = (
            SELECT max(TID)
            FROM #Object1RowsToDelete
            )
print @LargestKeyProcessed
END

--Select the remaining rows from Object1
--In this example, only 10, 20 and 30 remain because
--those TID's were on Object2
SELECT *
FROM #Object1

pgAdmin timing

When you execute a query from the query tool, the message pane shows something like:

Total query runtime: 62 ms.

And the status line shows the same time. I quote pgAdmin help about that:

The status line will show how long the last query took to complete. If a dataset was returned, not only the elapsed time for server execution is displayed, but also the time to retrieve the data from the server to the Data Output page.

If you want to see the time on the server you need to use SQL EXPLAIN ANALYZE or the built in Shift + F7keyboard shortcut or Query -> Explain analyze. Then, at the bottom of the explain output you get something like this:

Total runtime: 0.269 ms

Sql-server – Parent-Child Tree Hierarchical ORDER

OK, enough brain cells are dead.

SQL Fiddle

WITH cte AS
(
  SELECT 
    [ICFilterID], 
    [ParentID],
    [FilterDesc],
    [Active],
    CAST(0 AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters]
  WHERE [ParentID] = 0
  UNION ALL
  SELECT 
    i.[ICFilterID], 
    i.[ParentID],
    i.[FilterDesc],
    i.[Active],  
    Level + CAST(i.[ICFilterID] AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters] i
  INNER JOIN cte c
    ON c.[ICFilterID] = i.[ParentID]
)

SELECT 
  [ICFilterID], 
  [ParentID],
  [FilterDesc],
  [Active]
FROM cte
ORDER BY [Level];

Best Answer

Related Solutions

Postgresql – Postgres multiple joins slow query, how to store default child record

pgAdmin timing

Sql-server – Parent-Child Tree Hierarchical ORDER

Related Question