Sql-server – Simulate DELETE CASCADE

cascadeforeign keysql serversql-server-2016

I need to delete many rows from one table persons, that is referenced by many other tables as FK.

If I had DELETE CASCADE, I could just delete the records from persons, and automatically delete from the other tables, but I don't like the idea.

I would like a script that takes as input a DELETE statement, and outputs all the DELETE statements that are needed. Ideally, it would tell me how many records would be deleted.

Example:

Input:

DELETE FROM persons WHERE person_id < 1000000

Output:

-- This would delete 124,345 records
DELETE FROM persons_addresses WHERE person_id < 1000000
-- This would delete 82,954 records
DELETE FROM persons_phone numbers WHERE person_id < 1000000
...
-- This would delete 999,999 records
DELETE FROM persons WHERE person_id < 1000000

Best Answer

The information required to generate the queries and the counts is all available in the catalog views, like sys.columns and sys.foreign_key_columns. We need to find all of the child tables, and then count how many rows in each child table meet the same criteria as the parent ID.

CREATE PROCEDURE dbo.GeneratedPathedDeletes
  @ParentTable     nvarchar(512),
  @ParentColumn    nvarchar(128),
  @DeleteCriteria  nvarchar(255)
AS
BEGIN
  DECLARE @sql nvarchar(max) = N'',
          @src nvarchar(max) = N'SELECT ''DELETE $t$ WHERE $c$ $clause$;'' UNION ALL SELECT 
          ''-- This would delete '' + (SELECT RTRIM(COUNT(*)) FROM $t$ WHERE $c$ $clause$) 
          + '' rows.'';';

  SELECT @sql += REPLACE(REPLACE(REPLACE(@src,N'$t$',t),N'$c$',c),N'$clause$',@DeleteCriteria)
  FROM
  (
    SELECT t = QUOTENAME(OBJECT_SCHEMA_NAME(pt.parent_object_id))
             + '.' + QUOTENAME(OBJECT_NAME(pt.parent_object_id)),
           c = QUOTENAME(pc.name)
    FROM sys.foreign_key_columns AS pt
    INNER JOIN sys.columns AS pc
       ON pt.parent_object_id = pc.[object_id]
      AND pt.parent_column_id = pc.column_id
    INNER JOIN sys.columns AS rc
       ON pt.referenced_column_id = rc.column_id
      AND pt.referenced_object_id = rc.[object_id]
    WHERE pt.referenced_object_id = OBJECT_ID(@ParentTable)
      AND rc.name = @ParentColumn
  ) AS x;

  -- final delete of parent table:
  SELECT @sql += REPLACE(REPLACE(REPLACE(@src,N'$t$',@ParentTable),
    N'$c$',@ParentColumn),N'$clause$',@DeleteCriteria);

  EXEC sys.sp_executesql @sql;
END
GO

Sample usage:

EXEC dbo.GeneratedPathedDeletes
     @ParentTable    = N'dbo.persons',
     @ParentColumn   = N'person_id',
     @DeleteCriteria = N' < 100000'; 

My sample tables were much smaller, but my output looked like this:

-- This would delete 12 rows.
DELETE dbo.persons_addresses WHERE person_id < 1000000;

-- This would delete 4 rows.
DELETE dbo.persons_phone_numbers WHERE person_id < 1000000;

-- This would delete 2 rows.
DELETE dbo.persons WHERE person_id < 1000000;

Limitations:

  • This doesn't perform the deletes - you still have to run those manually based on the output.
  • This doesn't handle cyclic paths or grandchildren; only traditional parent-child.
  • This doesn't handle multi-column keys or, probably, self-referencing keys (didn't even try).
  • More complex clauses, like an IN clause, or multiple ranges, will require more work.
  • This isn't secure from SQL Injection - if you're accepting weapons from users and passing them into this procedure, please see my tips part 1 and part 2.