So I'm looking at how to delete a large amount of data from a handful of tables that do not have the key I need to easily isolate the rows that need to be deleted. The situation is as so:
I have a ID let's say a RequestID. I've decided that all of an arbitrary RequestID's data entries are invalid and so I want to purge them out of my tables to clean up.
Table A is a dimensional table that has my RequestID and it also has all the associated URL IDs for any particular RequestID.
Table B also contains dimensional data but does not have RequestID, so I have to use Table A to look up which records in Table B are valid delete candidates.
These tables are anywhere from 1 million to a billion rows, so the deletes have to be batched to work properly.
My thought was to do something like this but it doesn't seem very performant:
WHILE EXISTS (SELECT TOP 1 1 FROM TableB JOIN TableA ON TableA.URLID = TableB.URLID)
BEGIN
DELETE TOP 50000 a
FROM TableB a
JOIN TableA ON TableB.URLID = TableA.URLID
WHERE TableA.RequestID = <some_value>
END
I'm not sure how better I could delete the data
Edit: sorry I forgot to include requestID in the delete code example
Best Answer
but you are missing RequestID hopefully you have index on URLID and RequestID
Disable FK can help but be sure you are not breaking any FK