SQL Server – Breaking up a long-running query into batches runs fine for a while and then produces a fatal error

Tags: delete, sql-server, sql-server-2012, t-sql

I'm using a variation of the "careful batching" approach described in Take Care When Scripting Batches by Michael J. Swart to delete records in batches from a table that contains 38MM records. The script below runs for a while and returns results until it hits about the 30MM-record mark (roughly 30 minutes in), when I get this error:

Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command.
The results, if any, should be discarded.

Part of the issue is that each week I get a new feed of records, and these records are added to the repository of existing records from previous feeds. A newly received record can dictate that an older record be deleted, and that older record can be anywhere in the table, all the way back to the oldest feed.

So when this script loops through the table, it might only be looking at 100,000 records per batch on one side of the join (the "latest" alias), but it still has to search the entire table for the existence of the older record (the "previous" alias) that would need to be deleted. I'm thinking the performance gains I might see in a normal delete situation with this approach are eliminated in this scenario. Should I abandon this approach?

Before I tried this approach I was just using an update statement without the batching, and I was getting a transaction log error. It would run for about 4 hours and then fail.

DECLARE
    @latestFeed INT = 90,
    @LargestKeyProcessed INT = -1,
    @NextBatchMax INT = 1,
    @RC INT = (SELECT MAX(record_id) FROM table_B);

WHILE (@NextBatchMax < @RC)
BEGIN
    -- Find the upper key boundary of the next 100,000-row batch.
    SELECT TOP (100000) @NextBatchMax = record_id
    FROM table_B
    WHERE record_id > @LargestKeyProcessed
    ORDER BY record_id ASC;

    -- Delete older records superseded by a "latest" record with condition_1 = '3'.
    DELETE previous
    OUTPUT Deleted.*
    INTO table_A
    FROM table_B AS previous
    INNER JOIN table_B AS latest
        ON previous.field_1 = latest.field_1
        AND previous.field_2 = latest.field_2
        AND previous.field_3 = latest.field_3
    WHERE previous.feed_id < latest.feed_id
        AND previous.condition_1 IN ('1', '2')
        AND latest.condition_1 = '3'
        AND latest.feed_id = 90
        AND latest.record_id > @LargestKeyProcessed
        AND latest.record_id <= @NextBatchMax;

    -- Delete older records superseded by a "latest" record with condition_1 = '2'.
    DELETE previous
    OUTPUT Deleted.*
    INTO table_A
    FROM table_B AS previous
    INNER JOIN table_B AS latest
        ON previous.field_1 = latest.field_1
        AND previous.field_2 = latest.field_2
        AND previous.field_3 = latest.field_3
    WHERE previous.feed_id < latest.feed_id
        AND previous.condition_1 IN ('1', '2')
        AND latest.condition_1 = '2'
        AND latest.feed_id = 90
        AND latest.record_id > @LargestKeyProcessed
        AND latest.record_id <= @NextBatchMax;

    SET @LargestKeyProcessed = @NextBatchMax;
END;

Since these aren't my production tables, just mirrors of them, I truncated them and re-inserted all the records. I then removed the second DELETE statement shown in my code so that it was at least simpler. This time it ran to completion without the severe error.

NEXT ATTEMPT

I'm adding a transaction to the WHILE loop so that each iteration commits its own transaction. I'm also reducing the batch size to 50,000 to see if that helps at all with the speed of the processing.
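
A sketch of what that iteration looks like, assuming the same loop and variables as above (only the explicit transaction and the smaller batch size change; the two DELETE statements are unchanged and omitted here for brevity):

WHILE (@NextBatchMax < @RC)
BEGIN
    -- find the upper key boundary of the next 50,000-row batch
    SELECT TOP (50000) @NextBatchMax = record_id
    FROM table_B
    WHERE record_id > @LargestKeyProcessed
    ORDER BY record_id ASC;

    BEGIN TRANSACTION;

    -- the two DELETE ... OUTPUT INTO table_A statements from above go here,
    -- still filtered on @LargestKeyProcessed and @NextBatchMax

    COMMIT TRANSACTION;

    SET @LargestKeyProcessed = @NextBatchMax;
END;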

It also occurred to me that there's no reason to include the entire table in this. By that I mean, I know that the only records that will trigger the delete of a previous record are in the latest feed. So I'm setting the starting value of @LargestKeyProcessed like so:

@LargestKeyProcessed INT = (SELECT MIN(record_id) - 1
                            FROM table_B
                            WHERE feed_id = 90),

While the script will still have to look through the rest of the table for the actual records to delete, there's no reason it needs to look through the entire table to find the records triggering a delete.
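
Folded back into the original declaration, that looks something like this (a sketch using the same variable and table names as above; only the starting value of @LargestKeyProcessed changes):

DECLARE
    @latestFeed INT = 90,
    -- start just below the first record of the latest feed instead of at -1
    @LargestKeyProcessed INT = (SELECT MIN(record_id) - 1
                                FROM table_B
                                WHERE feed_id = 90),
    @NextBatchMax INT = 1,
    @RC INT = (SELECT MAX(record_id) FROM table_B);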

I ran DBCC CHECKDB. It reported:

CHECKDB found 0 allocation errors and 0 consistency errors in database

Best Answer

Jon,

One possible option with the statement you posted would be to put an explicit BEGIN TRANSACTION before the first DELETE statement and a COMMIT right before the SET of the @LargestKeyProcessed variable. This commits each batch's deletes before you move to the next loop iteration and should help with the log file, since it breaks the work up into small transactions instead of one large one generated implicitly by the loop.

Additionally, if your database is in the FULL recovery model, you will need to take frequent transaction log backups so that transactions, once they have committed or rolled back, can be cleared from the log and the log file doesn't need to keep growing. If your database is in the SIMPLE recovery model, the log space is reused automatically once each transaction commits or rolls back.
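
For example, a log backup taken between batches might look something like this (the database name and backup path are placeholders; the actual schedule and destination depend on your environment):

-- Placeholder database name and path: back up the log between batches
-- so that committed transactions can be cleared and the log file
-- doesn't need to grow.
BACKUP LOG [YourDatabase]
TO DISK = N'D:\Backups\YourDatabase_log.trn';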