Sql-server – Purging of unreferenced data via unconditional DELETE, by leaning on foreign key constraints

Tags: constraint, database-design, foreign-key, sql-server, sql-server-2014

I'm designing a database from which old data has to be purged regularly for legal reasons, and I'm trying to figure out the best way to organise the purging.

Dealing with the main tables is trivial, since they all have a month column and cascading deletes are set up for all owned rows in detail tables. So I can simply iterate over a list of these "master" tables and delete rows older than a given month.

However, there are some tables for which things aren't that simple. They can be referenced from quite a few other tables but their rows have to disappear when they are no longer referenced (because of data protection laws).

I could write some code to find out which rows in a certain table aren't referenced anywhere else, based on the foreign key metadata in the database schema.

However, I'd rather lean on the foreign key constraints instead and simply use the moral equivalent of DELETE FROM @TableName. The constraints keep referenced rows from getting deleted and all unreferenced rows disappear as intended. Hence I would simply make a second list of table names to which the unconditional DELETE should be applied during a purge, and that's it.

That solution would certainly be ideal: you can't make it any simpler, and it is directly based on the declared database schema, i.e. the foreign key constraints.

Would that be considered acceptable practice? Are there any drawbacks*?

*) apart from the fact that the consequences of someone dropping a required foreign key constraint are not only unusually dire but also unusually delayed (until the next purge at the end of the year)

Additional considerations

I found the fly in the ointment: the constraint not only blocks the deletion, it also results in an error state and abortion of the current statement. Hence the scheme doesn't work as is. Back to the drawing board then… Any pointers welcome.
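The behaviour can be reproduced with a minimal sketch. The tables dbo.DemoParent and dbo.DemoChild are hypothetical and exist only for this illustration; error number 547 is the constraint-conflict error SQL Server raises in this situation.

```sql
-- Hypothetical demo tables, not part of the real schema
CREATE TABLE dbo.DemoParent (Id int NOT NULL PRIMARY KEY);
CREATE TABLE dbo.DemoChild
(
    Id int NOT NULL PRIMARY KEY,
    ParentId int NOT NULL REFERENCES dbo.DemoParent (Id)
);

INSERT INTO dbo.DemoParent (Id) VALUES (1), (2);
INSERT INTO dbo.DemoChild (Id, ParentId) VALUES (10, 1);

BEGIN TRY
    -- Row 1 is referenced, row 2 is not; the statement fails as a whole
    DELETE FROM dbo.DemoParent;
END TRY
BEGIN CATCH
    -- Expect error 547 (conflict with a REFERENCE constraint)
    PRINT CONCAT('Error ', ERROR_NUMBER(), ': ', ERROR_MESSAGE());
END CATCH;

-- Both rows are still present, including the unreferenced row 2
SELECT Id FROM dbo.DemoParent;
```

Because a DELETE statement is atomic, a single referenced row is enough to keep every unreferenced row in place as well.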

Best Answer

One option would be to create views for these "inconvenient" tables that only return rows where all foreign key relationships have been broken. Then you could do unconditional deletes from those views.

You would have the overhead of coding each of these views*, but that's a one-time startup cost. This is essentially the same as what you mentioned already:

"I could write some code to find out which rows in a certain table aren't referenced anywhere else"

But in the end, you will have the "unconditional deletes" in your cleanup job.

Note: the approach below would not scale well on really large tables without supporting indexes, and it may also need some form of batching (e.g., putting a TOP 100 and ORDER BY in the view definition).
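As a rough sketch of what that could look like, using the example tables created further down: the index names and the batch size are assumptions, and instead of putting TOP into the view definition this variant batches directly against the base table with the same NOT EXISTS predicates.

```sql
-- Hypothetical index names; they support the NOT EXISTS lookups
CREATE INDEX IX_MainTable1_InconvenientTableId
    ON dbo.MainTable1 (InconvenientTableId);
CREATE INDEX IX_MainTable2_InconvenientTableId
    ON dbo.MainTable2 (InconvenientTableId);
GO

DECLARE @BatchSize int = 1000;  -- assumed value, tune for your workload
DECLARE @Deleted int = 1;

-- Delete unreferenced rows in batches until a pass removes nothing
WHILE @Deleted > 0
BEGIN
    DELETE TOP (@BatchSize) it
    FROM dbo.InconvenientTable AS it
    WHERE NOT EXISTS (SELECT NULL FROM dbo.MainTable1 AS mt1
                      WHERE mt1.InconvenientTableId = it.Id)
      AND NOT EXISTS (SELECT NULL FROM dbo.MainTable2 AS mt2
                      WHERE mt2.InconvenientTableId = it.Id);

    SET @Deleted = @@ROWCOUNT;
END;
```

Smaller batches keep each transaction and its locks short, at the cost of more passes over the table.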

Say you have two "main tables" that reference this "inconvenient table" (the one you need to remove rows from when they are no longer referenced):

CREATE TABLE dbo.InconvenientTable
(
    Id int IDENTITY(1,1) NOT NULL,
    [Description] varchar(10) NOT NULL,

    CONSTRAINT PK_InconvenientTable 
        PRIMARY KEY (Id)
);
GO

CREATE TABLE dbo.MainTable1
(
    Id int IDENTITY(1,1) NOT NULL,
    [Description] varchar(10) NOT NULL,
    InconvenientTableId int NOT NULL,

    CONSTRAINT FK_MainTable1_InconvenientTable 
        FOREIGN KEY (InconvenientTableId)
        REFERENCES dbo.InconvenientTable (Id)
);

CREATE TABLE dbo.MainTable2
(
    Id int IDENTITY(1,1) NOT NULL,
    [Description] varchar(10) NOT NULL,
    InconvenientTableId int NOT NULL,

    CONSTRAINT FK_MainTable2_InconvenientTable 
        FOREIGN KEY (InconvenientTableId)
        REFERENCES dbo.InconvenientTable (Id)
);

The inconvenient table has 3 rows, all 3 of which are referenced in MainTable1, while only 1 is referenced in MainTable2.

INSERT INTO dbo.InconvenientTable
    ([Description])
VALUES
    ('One'),
    ('Two'),
    ('Three');
GO

INSERT INTO dbo.MainTable1
    ([Description], InconvenientTableId)
VALUES
    ('One', 1),
    ('Two', 2),
    ('Three', 3);

INSERT INTO dbo.MainTable2
    ([Description], InconvenientTableId)
VALUES
    ('Two', 2);

Now we need a view that shows all rows in dbo.InconvenientTable that are not referenced by the two main tables:

CREATE VIEW dbo.InconvenientTable_RowsToDelete
AS
SELECT it.*
FROM dbo.InconvenientTable it
WHERE 
    NOT EXISTS (SELECT NULL FROM dbo.MainTable1 mt1 WHERE mt1.InconvenientTableId = it.Id)
    AND NOT EXISTS (SELECT NULL FROM dbo.MainTable2 mt2 WHERE mt2.InconvenientTableId = it.Id);
GO

Currently all rows are referenced, so selecting from the view returns 0 rows.
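The check can be as simple as a plain SELECT against the view. This is a sketch of what such a query might look like, not necessarily the exact one used here:

```sql
-- Expected to return no rows while every row is still referenced
SELECT Id, [Description]
FROM dbo.InconvenientTable_RowsToDelete;
```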

Now let's delete the 'Two' row from both main tables:

DELETE dbo.MainTable2 WHERE InconvenientTableId = 2;
DELETE dbo.MainTable1 WHERE InconvenientTableId = 2;

And now the view returns that unreferenced row (the 'Two' row).

Now we can delete everything from the view, which removes the 'Two' row from our inconvenient table for compliance reasons.
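A sketch of that cleanup step, assuming the view is updatable (it selects from a single base table, so it normally should be):

```sql
-- Unconditional delete through the view: only unreferenced rows are visible to it
DELETE FROM dbo.InconvenientTable_RowsToDelete;

-- With the sample data above, 'One' and 'Three' should remain
SELECT Id, [Description]
FROM dbo.InconvenientTable;
```

The foreign keys stay in place as a safety net: if the view's predicates ever drift from the real constraints, the DELETE fails with an error instead of removing a referenced row.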

*You could also attempt to automate the creation of the views with dynamic SQL and metadata queries, but that seems risky.
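For illustration only, such automation might look roughly like the sketch below. It assumes single-column foreign keys (multi-column keys are skipped, so the generated view would be incomplete for them), hard-codes the view name, and only prints the generated definition so it can be reviewed before anyone runs it.

```sql
DECLARE @target nvarchar(512) = N'dbo.InconvenientTable';
DECLARE @predicates nvarchar(max) = N'';
DECLARE @crlf nchar(2) = CHAR(13) + CHAR(10);

-- One NOT EXISTS predicate per single-column foreign key that references @target
SELECT @predicates += @crlf
    + N'    AND NOT EXISTS (SELECT NULL FROM '
    + QUOTENAME(OBJECT_SCHEMA_NAME(fkc.parent_object_id))
    + N'.' + QUOTENAME(OBJECT_NAME(fkc.parent_object_id))
    + N' AS child WHERE child.'
    + QUOTENAME(COL_NAME(fkc.parent_object_id, fkc.parent_column_id))
    + N' = it.'
    + QUOTENAME(COL_NAME(fkc.referenced_object_id, fkc.referenced_column_id))
    + N')'
FROM sys.foreign_key_columns AS fkc
WHERE fkc.referenced_object_id = OBJECT_ID(@target)
  -- skip multi-column foreign keys; they need a combined predicate
  AND NOT EXISTS (SELECT NULL
                  FROM sys.foreign_key_columns AS x
                  WHERE x.constraint_object_id = fkc.constraint_object_id
                    AND x.constraint_column_id <> fkc.constraint_column_id);

DECLARE @sql nvarchar(max) =
      N'CREATE VIEW dbo.InconvenientTable_RowsToDelete' + @crlf
    + N'AS' + @crlf
    + N'SELECT it.*' + @crlf
    + N'FROM ' + @target + N' AS it' + @crlf
    + N'WHERE 1 = 1' + @predicates + N';';

PRINT @sql;  -- review the output before executing it
```

Part of the risk is that a generated view silently goes stale when a new foreign key is added, so it would have to be regenerated before every purge.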

Related Solutions

Sql-server – View foreign key constraints so I can delete a table

And here's how to generate the script @Shark showed for all the tables you want to drop. Let's say you have the following tables:

USE tempdb;
GO

CREATE TABLE dbo.z(z INT PRIMARY KEY); -- we won't delete this one

CREATE TABLE dbo.a
(
 a INT PRIMARY KEY FOREIGN KEY REFERENCES dbo.z(z)
);

CREATE TABLE dbo.b
(
 b INT PRIMARY KEY, 
 a INT FOREIGN KEY REFERENCES dbo.a(a)
);

CREATE TABLE dbo.c
(
 c INT PRIMARY KEY, 
 b INT FOREIGN KEY REFERENCES dbo.b(b), 
 a INT FOREIGN KEY REFERENCES dbo.a(a)
);

-- we won't drop this table either, but we'll need to drop
-- the constraint:

CREATE TABLE dbo.d
(
 d INT, 
 c INT FOREIGN KEY REFERENCES dbo.c(c)
);

But we only want to drop a, b, and c.

-- load the tables you want to delete into a table variable:

DECLARE @tables_to_delete TABLE (t NVARCHAR(512));

INSERT @tables_to_delete VALUES('dbo.a'),('dbo.b'),('dbo.c');


DECLARE @sql NVARCHAR(MAX) = N'';

-- build a list of the foreign keys you'll have to drop first:

SELECT @sql += CHAR(13) + CHAR(10) + N'ALTER TABLE ' 
    + QUOTENAME(OBJECT_SCHEMA_NAME(f.parent_object_id))
    + '.' + QUOTENAME(OBJECT_NAME(f.parent_object_id))
    + ' DROP CONSTRAINT ' + QUOTENAME(f.name) + ';'
FROM sys.foreign_keys AS f
INNER JOIN @tables_to_delete AS t
ON f.referenced_object_id = OBJECT_ID(t.t);

-- then the DROP TABLE commands:

SELECT @sql += CHAR(13) + CHAR(10) + N'DROP TABLE '
    + t + ';'
FROM @tables_to_delete;

PRINT @sql;
-- EXEC sp_executesql @sql;

Result (the constraint names will look different if you run this):

ALTER TABLE [dbo].[b] DROP CONSTRAINT [FK__b__a__2D27B809];
ALTER TABLE [dbo].[c] DROP CONSTRAINT [FK__c__a__30F848ED];
ALTER TABLE [dbo].[c] DROP CONSTRAINT [FK__c__b__300424B4];
ALTER TABLE [dbo].[d] DROP CONSTRAINT [FK__d__c__32E0915F];
DROP TABLE dbo.a;
DROP TABLE dbo.b;
DROP TABLE dbo.c;

When you're happy about the result, uncomment the EXEC line.

(Note, you won't be able to validate the script in its entirety when using PRINT if the script is very large. The script is truncated by Management Studio because it still has an archaic limit to how many characters it will show. The string won't be truncated like this when it gets passed to sp_executesql.)
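One possible workaround, sketched here with a dummy @sql value standing in for the generated script, is to print the string in chunks of 4000 characters. PRINT starts a new line for every chunk, so a statement that straddles a chunk boundary will appear split across two lines in the output.

```sql
-- Dummy script of 16,000 characters standing in for the generated @sql
DECLARE @sql nvarchar(max) =
    REPLICATE(CONVERT(nvarchar(max), N'-- filler line' + CHAR(13) + CHAR(10)), 1000);

DECLARE @pos int = 1;
DECLARE @len int = DATALENGTH(@sql) / 2;  -- length in characters

-- Print the script 4000 characters at a time
WHILE @pos <= @len
BEGIN
    PRINT SUBSTRING(@sql, @pos, 4000);
    SET @pos += 4000;
END;
```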

Mysql – Foreign Key Constraint fails

You need to have already declared the table that the foreign key references, before you can define a foreign key that references it.

Once you declare the second table, you can then declare the first table. Tested here on MySQL 5.5.27.
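As an illustration of the ordering, here is a minimal MySQL sketch with two hypothetical tables (parent and child are made-up names, not the tables from the question). The referenced table is created first:

```sql
-- Referenced table must exist before the foreign key is declared
CREATE TABLE parent (
    id INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE child (
    id INT NOT NULL,
    parent_id INT NOT NULL,
    PRIMARY KEY (id),
    CONSTRAINT fk_child_parent
        FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;
```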

If you need to bypass the validation, you can do this:

SET FOREIGN_KEY_CHECKS = 0;
-- declare tables
SET FOREIGN_KEY_CHECKS = 1;

http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_foreign_key_checks
