How to Use SQL Delete with a Subquery

sql serversql server 2014subquery

The following code was added by one of our developers to delete duplicate records from the table:

DELETE  SubQuery

FROM
(
    SELECT  ID
            ,FK1
            ,FK2
            ,CreatedDateTime
            ,ROW_NUMBER() OVER(PARTITION BY FK1, FK2 ORDER BY CreatedDateTime) AS RowNumber

    FROM    Table
)
AS SubQuery

WHERE   RowNumber > 1

When reviewing the code, I assumed that it wouldn't work, however testing it in our test environment (SQL 2014) shows that it does!

How does SQL know to resolve the sub query and delete the the records from table?

Best Answer

The subquery you have in your code is called a derived table. It's not a base table but a table that "lives" during the time that the query runs. Like views (which are also called viewed tables) - and in recent versions CTEs which is another, 4th way to "define" a table inside a query - they are similar to a table in many ways. You can select from them, you can use them in from or to join them to other tables (base or not!).

In some DBMS, (not all DBMS have implemented this the same way) these tables/views are updatable. And "updatable" means that we can also update, insert into or delete from them.

There are restrictions though and this is expected. Imagine if the subquery was a join of 2 (or 17 tables). What would delete mean then? (from which tables should rows be deleted?) Updatable views is a very complicated matter. There's a recent (2012) book, entirely on this subject, written by Chris Date, well known expert in relational theory: View Updating and Relational Theory.

When the derived table (or view) is a very simple query, like it has only one base table (possibly restricted by a WHERE) and no GROUP BY, then every row of the derived table corresponds to one row in the underlying base table, so it is easy^* to update, insert or delete from this.

When the code inside the subquery is more complex, it depends on whether the rows of the derived table/view can be traced/resolved to rows from one of the underlying base tables.

For SQL Server, you can read more in the Updatable Views paragraph in MSDN: CREATE VIEW.

Updatable Views

You can modify the data of an underlying base table through a view, as long as the following conditions are true:

Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base table.

The columns being modified in the view must directly reference the underlying data in the table columns. The columns cannot be derived in any other way, such as through the following:

An aggregate function: AVG, COUNT, SUM, MIN, MAX, GROUPING, STDEV, STDEVP, VAR, and VARP.

A computation. The column cannot be computed from an expression that uses other columns. Columns that are formed by using the set operators UNION, UNION ALL, CROSSJOIN, EXCEPT, and INTERSECT amount to a computation and are also not updatable.

The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses.

TOP is not used anywhere in the select_statement of the view together with the WITH CHECK OPTION clause.

The previous restrictions apply to any subqueries in the FROM clause of the view, just as they apply to the view itself. Generally, the Database Engine must be able to unambiguously trace modifications from the view definition to one base table.

^{Actually delete is easier, less complex than update. SQL Server only needs the primary keys or some other way to identify which rows of the base table are to be deleted. For update, there is an additional (rather obvious) restriction that we can't update a computed column. You can try to modify your query to do an update. Updating the CreatedDateTime will probably work just fine but trying to update the computed RowNumber column will raise an error. And insert is even more complex, as we'd have to provide values for all the columns of the base table that don't have a DEFAULT constraint.}

Related Solutions

SQL Server – DELETE from Subquery or Derived Table

Convert your select statement into a CTE, and DELETE FROM the CTE, as in:

;WITH del AS
(
SELECT *
FROM (
    SELECT RN = ROW_NUMBER() OVER (
            PARTITION BY SC.ID ORDER BY SC.id
            )
        ,SC.*
    FROM [ETL].[Stage_Claims] SC
    WHERE ID IN (
            SELECT ID
            FROM (
                SELECT RN = ROW_NUMBER() OVER (
                        PARTITION BY ID ORDER BY id
                        )
                    ,ID
                FROM [ETL].[DUPS_Claims]
                ) AS t1
            WHERE RN > 1
            )
    ) AS t2
WHERE RN > 1
)
DELETE FROM del;

Standard warning: You should test this in a non-production environment.

You can simplify your query quite a bit, and likely get better performance by using the below query, which does not make use of the intermediate table, DUPS_Claims, since it is absolutely unnecessary:

;WITH cte AS
(
    SELECT sc.ID
        , rn = ROW_NUMBER() OVER (PARTITION BY sc.ID ORDER BY sc.ID)
    FROM ETL.Stage_Claims sc
)
DELETE
FROM cte 
WHERE rn > 1;

I created a non-clustered, non-unique index on both tables, then looked at the execution plans for both variations.

The first variant:

The second variant:

The first variant scans the index twice, whereas the second variant clearly only needs to scan the index a single time, and doesn't require a relatively expensive merge join in my somewhat contrived example. My sample ETL.Stage_Claims table contains 89 unique ID values, each duplicated 89 times, for a total of 7921 rows.

In case CTEs aren't your thing, you can use this approach to delete from a derived table, instead:

DELETE c
FROM (
    SELECT sc.ID
        , rn = ROW_NUMBER() OVER (PARTITION BY sc.ID ORDER BY sc.ID)
    FROM ETL.Stage_Claims sc
) c
WHERE rn > 1;

The query plan for the above DELETE from the derived table:

Sql-server – SQL Server Updated Record on Read-Only Filegroup

I have a confession to make.

Once, when I was young, I built an ETL process started with changing read-only filegroups to read-write, doing its ETL work, and then setting them back to read-only.

So just in case you have a coworker who was diabolical like I was (I was young, I needed the money), you can test by:

Change the name of the read-only filegroup - that way, if someone has hard-coded scripts that alter the filegroup by name, their scripts will fail, and you'll catch the culprit. Or, a little harder:
Use Profiler or Extended Events to track anyone who does an ALTER DATABASE.

Best Answer

Related Solutions

SQL Server – DELETE from Subquery or Derived Table

Sql-server – SQL Server Updated Record on Read-Only Filegroup

Related Question