I have a delete query which takes around 6 mins to complete. I am looking for ways to optimise it and if its duration can be shortened. Below is its anonymized (due to security constraints) estimated execution plan.
Execution plan:
Query:
DELETE TOP(10000) Object1
FROM Object1
LEFT JOIN Object2
ON
Object1.TID = Object2.TID
WHERE
Object2.TID IS NULL
Record count details:
sp_spaceused Object1
— 124164707sp_spaceused Object2
— 27799877
Index details:
1) Object1
- CL index on TID(-) DESC, TIndex(-) DESC
- No other indexes apart from above index
2) Object2
- CL index on TID(-) DESC
- NC index [NodeID] ASC
- NC index [DateTime] ASC
- NC index [TimeStamp] ASC
- NC index [TType] ASC
SQL Server Environment details:
- SQL Server 2008 R2 SP1
- Edition: Enterprise
- Version: 10.50.2500.0
DDL:
--Object1
CREATE TABLE [dbo].[Object1](
[TID] [bigint] NOT NULL,
[TIndex] [tinyint] NOT NULL,
[OID] [varchar](1000) NOT NULL,
[OIDName] [varchar](100) NOT NULL,
[OIDValue_ANSI] [varchar](1000) NOT NULL CONSTRAINT [DF_Object1_OIDValue_ANSI] DEFAULT (''),
[RawValue_ANSI] [varchar](1000) NOT NULL CONSTRAINT [DF_Object1_RawValue_ANSI] DEFAULT (''),
[OIDValue_Unicode] [nvarchar](1000) NULL,
[RawValue_Unicode] [nvarchar](1000) NULL,
[OIDValue] AS (isnull([OIDValue_Unicode],[OIDValue_ANSI])),
[RawValue] AS (isnull([RawValue_Unicode],[RawValue_ANSI])),
CONSTRAINT [PK_Object1] PRIMARY KEY CLUSTERED
(
[TID] DESC,
[TIndex] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
--Object2
CREATE TABLE [dbo].[Object2](
[TID] [bigint] IDENTITY(97196339,1) NOT NULL,
[EID] [int] NOT NULL,
[DateTime] [datetime] NOT NULL CONSTRAINT [DF_Object2_RecivedDateTime] DEFAULT (getdate()),
[IP] [varchar](50) NOT NULL CONSTRAINT [DF_Object2_IP] DEFAULT ('0.0.0.0'),
[C_ANSI] [varchar](255) NOT NULL CONSTRAINT [DF_Object2_C_ANSI] DEFAULT (''),
[C_Unicode] [nvarchar](255) NULL,
[Community] AS (isnull([Community_Unicode],[Community_ANSI])),
[Tag_ANSI] [varchar](100) NOT NULL CONSTRAINT [DF_Object2_Tag_ANSI] DEFAULT (''),
[Tag_Unicode] [nvarchar](100) NULL,
[Tag] AS (isnull([Tag_Unicode],[Tag_ANSI])),
[Acknowledged] [tinyint] NOT NULL CONSTRAINT [DF_Object2_Acknowledged] DEFAULT ((0)),
[Hname_ANSI] [varchar](255) NOT NULL CONSTRAINT [DF_Object2_Hname_ANSI] DEFAULT (''),
[Hname_Unicode] [nvarchar](255) NULL,
[Hostname] AS (isnull([Hostname_Unicode],[Hostname_ANSI])),
[NodeID] [bigint] NOT NULL CONSTRAINT [DF_Object2_NodeID] DEFAULT ((0)),
[TType] [varchar](100) NOT NULL CONSTRAINT [DF_Object2_TType] DEFAULT (''),
[ColorCode] [int] NULL,
[TimeStamp] [timestamp] NOT NULL,
CONSTRAINT [PK_Object2] PRIMARY KEY CLUSTERED
(
[TID] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I tried replacing LEFT JOIN
with NOT IN
and NOT EXISTS
but it didn't help much. Rather NOT IN
performed worst and NOT EXISTS
was slightly better (5.8 secs) but still not that better. For all these testings, I used below select query since I can't use actual delete:
SELECT Object1.TID
FROM Object1
LEFT JOIN Object2
ON
Object1.TID = Object2.TID
WHERE
Object2.TID IS NULL
I would appreciate if someone could provide thoughts/suggestions on it. If you need more details, I'll try to provide it as much as I can considering security limitations I have.
Additional info:
The delete runs daily as part of SQL job. I am doing Naive Batching as described in Take Care When Scripting Batches (by Michael J. Swart) to delete rows until all of the rows are deleted.
Number of rows to delete depends on the data at that point of time the SQL job runs. Sorry but don't have the exact number as it varies daily and I never recorded it.
The following query takes 6 min 30s:
SELECT COUNT(*)
FROM Object1
LEFT JOIN Object2
ON Object2.TID = Object1.TID
WHERE
Object2.TID IS NULL;
The result is 0 rows (since we delete it daily we won't be getting much records).
The following query returns 123,529,024 records:
SELECT COUNT(*)
FROM Object1
LEFT JOIN Object2
ON Object1.TID = Object2.TID
Best Answer
Using Michael J. Swart's Take Care When Scripting Batches code as a base, I'd like to offer another possible solution which uses a temp table to track the next set of rows to be deleted. The code keeps track of the max TID value from the previous delete and utilizes that information to keep from re-scanning the entire table over and over. I'd be curious to know if it would work for your situation.