Sql-server – Strange query results for DELETE with ROW_NUMBER

sql serversql server 2014

In the following DELETE statement I try to delete all rows except the first one ordered by some criteria. (The actual query makes more sense, this is just a repro. All the sys.objects stuff is just for generating test data.)

Note the filter r <> 1. Yet, the OUTPUT clause outputs deleted rows with r = 1. How can this be?

USE tempdb
SET XACT_ABORT ON

BEGIN TRAN

    SELECT *
    INTO #o
    FROM sys.objects

    SELECT TOP 2 name FROM #o ORDER BY object_id --debug output

    DELETE k
    OUTPUT Deleted.name, Deleted.r
    FROM (
        SELECT k.*, ROW_NUMBER() OVER (ORDER BY object_id) r
        FROM #o k
    ) k
    WHERE r <> 1 --OUTPUT returns rows with (r = 1)

    SELECT TOP 2 name FROM #o ORDER BY object_id --debug output

ROLLBACK

Query results:

enter image description here

(The 3rd result set is complete – only one row.)

Note, that all rows except the first one were deleted. The numbering order of column r does not seem to match what was requested. And there is a row with r = 1.

This is SQL Server 2014 CU3 with trace flag 4199 enabled.

Best Answer

This is expected behaviour at the moment

the function gets evaluated on the DELETE stream.

So it actually behaves like this (pseudo code)

DELETE k
OUTPUT Deleted.name, 
       ROW_NUMBER() OVER (ORDER BY Deleted.object_id) as r
FROM (
    SELECT k.*, ROW_NUMBER() OVER (ORDER BY object_id) r
    FROM #o k
) k
WHERE r <> 1 --OUTPUT returns rows with (r = 1)

Although this is the currently defined expected behaviour it isn't really reasonable and they say

Long term, we need to actually fix the behavior of OUTPUT clause to match that of the ANSI SQL standard which will result in change of results. So we will look at the correct semantics for a future version of SQL Server since there might be apps that rely on the current behavior.

I haven't tested on SQL Server 2014 but on 2012 the plan looks as the below.

enter image description here

After the delete operator (to the left) the column values from the deleted rows are sorted back into object_id order and the row_number re-applied.

It looks like the same is happening in your case from the results. (temp tables have a negative id and are sorted first before sysrscols which has a low positive object_id of 3).

As well as the dubious semantics of the result the second sort by object_id doesn't seem strictly necessary in this plan as it looks likely that they will already be sorted in that order in any event.

Regarding workarounds for this specific case changing the output clause to OUTPUT Deleted.name, 1 + Deleted.r AS r would work.

For more complicated WHERE clauses I think you'd need a pass to calculate the row_number and then a join. e.g.

ALTER TABLE #o
  ADD CONSTRAINT PK PRIMARY KEY (object_id);

WITH k AS
(
 SELECT *, 
        ROW_NUMBER() OVER (ORDER BY object_id) r
FROM #o
)
MERGE k AS k1
using k AS k2
ON k1.object_id = k2.object_id
WHEN matched AND k2.r <> 1 THEN
  DELETE
OUTPUT Deleted.name,
       k2.r;

enter image description here

Results

enter image description here