Sql-server – Patterns for Deleting large amounts of data

sql-server-2008-r2

So I'm looking at how to delete a large amount of data from a handful of tables that do not have the key I need to easily isolate the rows that need to be deleted. The situation is as so:

I have a ID let's say a RequestID. I've decided that all of an arbitrary RequestID's data entries are invalid and so I want to purge them out of my tables to clean up.

Table A is a dimensional table that has my RequestID and it also has all the associated URL IDs for any particular RequestID.

Table B also contains dimensional data but does not have RequestID, so I have to use Table A to look up which records in Table B are valid delete candidates.

These tables are anywhere from 1 million to a billion rows, so the deletes have to be batched to work properly.

My thought was to do something like this but it doesn't seem very performant:

WHILE EXISTS (SELECT TOP 1 1 FROM TableB JOIN TableA ON TableA.URLID = TableB.URLID)
BEGIN
DELETE TOP 50000 a
FROM TableB a
JOIN TableA ON TableB.URLID = TableA.URLID
WHERE TableA.RequestID = <some_value>
END

I'm not sure how better I could delete the data

Edit: sorry I forgot to include requestID in the delete code example

Best Answer

but you are missing RequestID hopefully you have index on URLID and RequestID

select 1 
while (@@rowcount > 0)
BEGIN
  DELETE TOP (50000) b 
  FROM TableB b 
  JOIN TableA a
    ON b.URLID = a.URLID 
   AND a.RequestID = @RequestID
END

Disable FK can help but be sure you are not breaking any FK