Our application uses a popular 3rd-party extension to Entity Framework that allows for valuable things that native Entity Framework is not capable of or designed to do, like bulk deletes and updates.
It seems that the query pattern produced by the bulk delete API method from this library is not optimized well by the SQL Server query optimizer.
I am using SQL Server 2014. Consider the following table and index (names have been changed to protect the innocent):
CREATE TABLE [dbo].[a_staging_table]
(
[staging_ky] [BIGINT] IDENTITY(1,1) NOT NULL,
[col_a] [INT] NOT NULL,
[col_b] [VARCHAR](15) NOT NULL,
[col_c] [VARCHAR](256) NOT NULL,
[col_d] [INT] NULL,
[col_e] [INT] NOT NULL,
[col_f] [VARCHAR](3) NOT NULL,
[col_g] [VARCHAR](3) NULL,
[col_h] [VARCHAR](20) NULL,
[col_i] [CHAR](4) NOT NULL,
[col_j] [INT] NOT NULL,
[col_k] [VARCHAR](10) NULL,
[col_l] [DATETIME] NOT NULL,
[col_m] [VARCHAR](25) NOT NULL,
[col_n] [DATETIME] NOT NULL,
[col_o] [VARCHAR](25) NOT NULL,
[col_p] [BIGINT] NOT NULL,
[col_q] [BIT] NOT NULL,
[col_r] [DATETIME] NULL,
[col_s] [VARCHAR](25) NULL,
PRIMARY KEY CLUSTERED ([staging_ky] ASC)
);
GO
CREATE NONCLUSTERED INDEX [IDX_dbo_a_staging_table_col_p]
ON [dbo].[a_staging_table]([col_p] ASC)
WITH (FILLFACTOR = 100);
GO
The application is trying to delete all records from the table where col_p is equal to a certain value, and the generated SQL query from Entity Framework looks like this:
DELETE [dbo].[a_staging_table]
FROM [dbo].[a_staging_table] AS j0 INNER JOIN (
SELECT
1 AS [C1],
[Extent1].[staging_ky] AS [staging_ky]
FROM [dbo].[a_staging_table] AS [Extent1]
WHERE [Extent1].[col_p] = @p__linq__0
) AS j1 ON (j0.[staging_ky] = j1.[staging_ky])
Looking at the query plan, SQL Server does not optimize away the join between the table and itself, and instead, the plan does an index scan with a nested loops join and seek to the same index, followed by a table spool, prior to the delete.
A more traditional and optimal delete statement is something like this statement, which just does an index seek, prior to the clustered index delete:
DELETE [dbo].[a_staging_table]
FROM [dbo].[a_staging_table] AS j0
WHERE [j0].[col_p] = @p__linq__0
The actual query plans for both statements are posted at:https://www.brentozar.com/pastetheplan/?id=SygZoD6um
Is this query pattern generated by this library a known query anti-pattern for a delete statement in SQL Server?
Best Answer
This strikes me as a slightly curious question to ask. Why does the answer to that question matter? What would you do with that information? If the library that you're using doesn't meet your business needs, then four options come to mind:
With all of that said, I think that most SQL Server people would view that query with suspicion. I also think that it's fair to say that joining to tables that you don't need is an anti-pattern.
All of the relational databases that I've worked with have limitations around the inferences that they can make for eliminating unnecessary joins. One of the reasons that you get that ugly plan in SQL Server is that there isn't optimizer support for eliminating an inner self-join on a table's primary key. This can be observed with the following simple demo:
The following query:
Results in join elimination:
But a similar query with an
INNER JOIN
does not:The optimizer doesn't support every possible query transformation. In this example, as programmers we can deduce that a join isn't necessary but the optimizer cannot. This might be unsupported in the product because Microsoft viewed this scenario as not common enough to address, the check was not worth the compile time overhead, or because it would be a difficult check to implement.
For completeness, the spool is present in the query plan due to halloween protection.