SQL Server – Breaking up a long-running query into batches runs fine for a while and then produces a fatal error

Tags: delete, sql-server, sql-server-2012, t-sql

I'm using a variation of the "careful batching" approach described in Take Care When Scripting Batches by Michael J. Swart to delete records in batches from a table that contains 38MM records. The script below runs for a while and returns results until it hits about the 30MM-record mark (roughly 30 minutes in), when I get this error:

Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command.
The results, if any, should be discarded.

Part of the issue is that each week I get a new feed of records, and these records are added to the repository of existing records from previous feeds. A newly received record can dictate that an older record be deleted, and that older record can be anywhere in the table, all the way back to the oldest feed.

So when this script loops through the table, it might only be looking at 100,000 records per batch on one side of the join (the "latest" alias), but it still has to search the entire table for the existence of the older record (the "previous" alias) that would need to be deleted. I'm thinking the performance gains I might see in a normal delete situation with this approach are eliminated in this scenario. Should I abandon this approach?

Before I tried this approach I was just using an update statement without the batching, and I was getting a transaction log error. It would run for about 4 hours and then fail.

DECLARE
    @latestFeed INT = 90,
    @LargestKeyProcessed INT = -1,
    @NextBatchMax INT = 1,
    @RC INT = (SELECT MAX(record_id) FROM table_B);

WHILE (@NextBatchMax < @RC)
BEGIN
    -- Find the upper key boundary of the next 100,000-row batch.
    SELECT TOP (100000) @NextBatchMax = record_id
    FROM table_B
    WHERE record_id > @LargestKeyProcessed
    ORDER BY record_id ASC;

    -- Delete older records superseded by a "latest" record with condition_1 = '3'.
    DELETE previous
    OUTPUT Deleted.*
    INTO table_A
    FROM table_B AS previous
    INNER JOIN table_B AS latest
        ON previous.field_1 = latest.field_1
        AND previous.field_2 = latest.field_2
        AND previous.field_3 = latest.field_3
    WHERE previous.feed_id < latest.feed_id
        AND previous.condition_1 IN ('1', '2')
        AND latest.condition_1 = '3'
        AND latest.feed_id = 90
        AND latest.record_id > @LargestKeyProcessed
        AND latest.record_id <= @NextBatchMax;

    -- Delete older records superseded by a "latest" record with condition_1 = '2'.
    DELETE previous
    OUTPUT Deleted.*
    INTO table_A
    FROM table_B AS previous
    INNER JOIN table_B AS latest
        ON previous.field_1 = latest.field_1
        AND previous.field_2 = latest.field_2
        AND previous.field_3 = latest.field_3
    WHERE previous.feed_id < latest.feed_id
        AND previous.condition_1 IN ('1', '2')
        AND latest.condition_1 = '2'
        AND latest.feed_id = 90
        AND latest.record_id > @LargestKeyProcessed
        AND latest.record_id <= @NextBatchMax;

    SET @LargestKeyProcessed = @NextBatchMax;
END;

Since these aren't my production tables, just mirrors of them, I truncated them and re-inserted all the records. I then removed the second DELETE statement shown in my code so that it was at least simpler. This time it ran to completion without the severe error.

NEXT ATTEMPT

I'm adding a transaction to the WHILE loop so that each iteration commits its own transaction. I'm also reducing the batch size to 50,000 to see if that helps at all with the speed of the processing.
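
A sketch of what that iteration looks like, assuming the same loop and variables as above (only the explicit transaction and the smaller batch size change; the two DELETE statements are unchanged and omitted here for brevity):

WHILE (@NextBatchMax < @RC)
BEGIN
    -- find the upper key boundary of the next 50,000-row batch
    SELECT TOP (50000) @NextBatchMax = record_id
    FROM table_B
    WHERE record_id > @LargestKeyProcessed
    ORDER BY record_id ASC;

    BEGIN TRANSACTION;

    -- the two DELETE ... OUTPUT INTO table_A statements from above go here,
    -- still filtered on @LargestKeyProcessed and @NextBatchMax

    COMMIT TRANSACTION;

    SET @LargestKeyProcessed = @NextBatchMax;
END;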

It also occurred to me that there's no reason to include the entire table in this. By that I mean, I know that the only records that will trigger the delete of a previous record are in the latest feed. So I'm setting the starting value of @LargestKeyProcessed like so:

@LargestKeyProcessed INT = (SELECT MIN(record_id) - 1
                            FROM table_B
                            WHERE feed_id = 90),

While the script will still have to look through the rest of the table for the actual records to delete, there's no reason it needs to look through the entire table to find the records triggering a delete.
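
Folded back into the original declaration, that looks something like this (a sketch using the same variable and table names as above; only the starting value of @LargestKeyProcessed changes):

DECLARE
    @latestFeed INT = 90,
    -- start just below the first record of the latest feed instead of at -1
    @LargestKeyProcessed INT = (SELECT MIN(record_id) - 1
                                FROM table_B
                                WHERE feed_id = 90),
    @NextBatchMax INT = 1,
    @RC INT = (SELECT MAX(record_id) FROM table_B);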

I ran DBCC CHECKDB. It reported:

CHECKDB found 0 allocation errors and 0 consistency errors in database

Best Answer

Jon,

One possible option with the statement you posted would be to put an explicit BEGIN TRANSACTION before the first DELETE statement and a COMMIT right before the SET of the @LargestKeyProcessed variable. This commits each batch's deletes before you move to the next loop iteration and should help with the log file, since it breaks the work up into small transactions instead of one large one generated implicitly by the loop.

Additionally, if your database is in the FULL recovery model, you will need to take frequent transaction log backups so that transactions, once they have committed or rolled back, can be cleared from the log and the log file doesn't need to keep growing. If your database is in the SIMPLE recovery model, the log space is reused automatically once each transaction commits or rolls back.
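
For example, a log backup taken between batches might look something like this (the database name and backup path are placeholders; the actual schedule and destination depend on your environment):

-- Placeholder database name and path: back up the log between batches
-- so that committed transactions can be cleared and the log file
-- doesn't need to grow.
BACKUP LOG [YourDatabase]
TO DISK = N'D:\Backups\YourDatabase_log.trn';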