Sql-server – DELETE statement conflicted with the REFERENCE constraint

deletereferential-integritysql server

My situation looks like this:

Table STOCK_ARTICLES:

ID *[PK]*
OTHER_DB_ID
ITEM_NAME

Table LOCATION:

ID *[PK]*
LOCATION_NAME

Table WORK_PLACE:

ID *[PK]*
WORKPLACE_NAME

Table INVENTORY_ITEMS:

ID *[PK]*
ITEM_NAME
STOCK_ARTICLE *[FK]*
LOCATION *[FK]*
WORK_PLACE *[FK]*

The 3 FKs in INVENTORY_ITEMS reference the "ID" columns in the respective other tables, obviously.

The relevant tables here are STOCK_ARTICLE and INVENTORY_ITEMS.

Now there is a SQL job consisting of several steps (SQL scripts) that "synchronizes" the database mentioned above with another database (OTHER_DB). One of the steps inside this job is for "cleanup". It deletes all records from STOCK_ITEMS where there's no corresponding record in the other database with the same ID.
It looks like this:

DELETE FROM STOCK_ARTICLES
 WHERE
    NOT EXISTS
     (SELECT OTHER_DB_ID FROM
     [OTHER_DB].[dbo].[OtherTable] AS other
               WHERE other.ObjectID = STOCK_ARTICLES.OTHER_DB_ID)

But this step always fails with:

The DELETE statement conflicted with the REFERENCE constraint
"FK_INVENTORY_ITEMS_STOCK_ARTICLES". The conflict occurred in database
"FIRST_DB", table "dbo.INVENTORY_ITEMS", column 'STOCK_ARTICLES'.
[SQLSTATE 23000] (Error 547) The statement has been terminated.
[SQLSTATE 01000] (Error 3621). The step failed.

So the problem is that it can't delete records from STOCK_ARTICLES when they are referenced by INVENTORY_ITEMS.
But this cleanup needs to work.
Which means that I probably have to extend the cleanup script so that it first identifies the records that should be deleted from STOCK_ITEMS, but can't because the corresponding ID is referenced from inside INVENTORY_ITEMS.
Then it should first delete those records inside INVENTORY_ITEMS, and after that delete the records inside STOCK_ARTICLES.
Am I right?
How would the SQL code look like then?

Thank you.

Best Answer

That is the whole point of foreign key constraints: they stop you deleting data that is referred to elsewhere in order to maintain referential integrity.

There are two options:

  1. Delete the rows from INVENTORY_ITEMS first, then the rows from STOCK_ARTICLES.
  2. Use ON DELETE CASCADE for the in the key definition.

1: Deleting In Correct Order

The most efficient way to do this varies depending on the complexity of the query that decides which rows to delete. A general pattern might be:

BEGIN TRANSACTION
SET XACT_ABORT ON
DELETE INVENTORY_ITEMS WHERE STOCK_ARTICLE IN (<select statement that returns stock_article.id for the rows you are about to delete>)
DELETE STOCK_ARTICLES WHERE <the rest of your current delete statement>
COMMIT TRANSACTION

This is fine for simple queries or for deleting a single stock item, but given your delete statement contains a WHERE NOT EXISTS clause nesting that within WHERE IN might produce a very inefficient plan so test with a realistic data set size and rearrange the query if needed.

Also note the transaction statements: you want to make sure both the deletes complete or neither of them do. If the operation is already happening within a transaction you will obviously need to alter this to match your current transaction and error handling process.

2: Use ON DELETE CASCADE

If you add the cascade option to your foreign key then SQL Server will automatically do this for you, removing rows from INVENTORY_ITEMS to satisfy the constraint that nothing should refer to the rows you are deleting. Just add ON DELETE CASCADE to the FK definition like so:

ALTER TABLE <child_table> WITH CHECK 
ADD CONSTRAINT <fk_name> FOREIGN KEY(<column(s)>)
REFERENCES <parent_table> (<column(s)>)
ON DELETE CASCADE

An advantage here is that the delete is one atomic statement reducing (though, as usual, not 100% removing) the need to worry about transaction and lock settings. The cascade can even operate over multiple parent/child/grand-child/... levels if there is only one path between parent and all the descendants (search for "multiple cascade paths" for examples of where this might not work).

NOTE: I, and many others, consider cascaded deletes to be dangerous so if you use this option be very careful to properly document it in your database design so you and other developers don't trip over the danger later. I avoid cascading deletes wherever possible for this reason.

A common problem caused with cascaded deletes is when someone updates data by dropping and recreating rows instead of using UPDATE or MERGE. This is often seen where "update the rows that already exist, insert those that don't" (sometimes called an UPSERT operation) is needed and people unaware of the MERGE statement find it easier to do:

DELETE <all rows that match IDs in the new data>
INSERT <all rows from the new data>

than

-- updates
UPDATE target 
SET    <col1> = source.<col1>
  ,    <col2> = source.<col2>
       ...
  ,    <colN> = source.<colN>
FROM   <target_table> AS target JOIN <source_table_or_view_or_statement> AS source ON source.ID = target.ID
-- inserts
INSERT  <target_table>
SELECT  *
FROM    <source_table_or_other> AS source
LEFT OUTER JOIN
        <target_table> AS target
        ON target.ID = source.ID
WHERE   target.ID IS NULL

The problem here is that the delete statement will cascade to child rows, and the insert statement won't recreate them, so while updating the parent table you accidentally lose data from the child table(s).

Summary

Yes, you have to delete the child rows first.

There is another option: ON DELETE CASCADE.

But ON DELETE CASCADE can be dangerous, so use with care.

Side note: use MERGE (or UPDATE-and-INSERT where MERGE is not available) when you need an UPSERT operation, not DELETE-then-replace-with-INSERT to avoid falling into traps laid by other people using ON DELETE CASCADE.