SQL Server – Keep transaction log size under control during batch processing

sql-server, sql-server-2008, transaction-log

tl;dr

I think my verbosity has obscured the real question I'm asking here, so I apologise for that. My main issue is that the checkpoint command appeared to be working for at least 800 iterations of the loop: the log size remained static at around 1GB during that period. Then, sometime overnight, the checkpoint evidently failed to free the space within the log file and it consumed the drive. No database backups were scheduled. I'm at a loss as to what caused this behaviour.

Is there anything I can do to ensure that the log space is freed on completion of each loop iteration under both FULL and SIMPLE recovery modes?


I am trying to alter the schema (and make some data changes) on an extremely large table without disrupting the applications and services that depend on it, while also keeping the transaction log at a manageable size.

So I have decided to process the table data in a loop, either taking a log backup or issuing a checkpoint at the end of each iteration, depending on the recovery model. Each iteration migrates some data into a new table (the old and new tables are UNION ALLed behind a view to keep the data available while it is only partially migrated).
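(Roughly, the view looks like this; the view name and column list are simplified placeholders, not the real schema:)

create view dbo.[huge_table_view]
as
    select [fk], [col1], [col2] from dbo.[huge_table]
    union all
    select [fk], [col1], [col2] from dbo.[new_table];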

However, the transaction log is blowing out on the development server under SIMPLE recovery mode, but in a very odd way. I monitored the log for the first ~800 iterations of the loop (out of a total of ~3000) before going home, and the log remained at a size consistent with the number of records being processed per iteration, so the method appeared to be working. However, when I arrived the next morning, the log had exploded and consumed the entire disk, with only 400 iterations to go.

I do not understand why the process appeared to work initially, and then failed at some seemingly arbitrary point. I would have expected the log to grow steadily if my logic was wrong, but this did not happen.

I truncated the log so it started at ~500KB. It grew to around 1.2GB after the first loop iteration and stayed at approximately that size for the first few hundred iterations. Based on that, it seemed as though the method was sound.
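(For reference, "monitoring" here just means checking the log file size between iterations with something along these lines:)

dbcc sqlperf (logspace);

-- or, for just this database's log file (size is reported in 8KB pages)
select name, size / 128 as size_mb
  from sys.database_files
 where type_desc = N'LOG';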

My script is summarised by the following pseudo-code:

-- variable declarations (abridged in the original pseudo-code)
declare @id       int,            -- assuming an integer key
        @c        int = 1,
        @logpath  nvarchar(260),
        @recovery nvarchar(60);

-- recovery model of the current database: SIMPLE on Dev, FULL in Production
select @recovery = recovery_model_desc
  from sys.databases
 where name = db_name();

select distinct [fk] into #tmp from [huge_table];

while exists (select 1 from #tmp) begin
    select top (1) @id = [fk] from #tmp;

    -- Actual query is more complicated...
    delete [huge_table]
    output deleted.* into [new_table]
      from [huge_table] a
     where a.[fk] = @id;

    -- Keep the log down: checkpoint under SIMPLE, log backup under FULL
    if @recovery = N'SIMPLE'
        checkpoint;
    else begin
        set @logpath = N'\\path\to\backups\log_' + cast(@c as varchar(10)) + N'.trn';
        backup log [database_name] to disk = @logpath;
        set @c = @c + 1;
    end

    -- remove the key we just migrated (not just an arbitrary row)
    delete #tmp where [fk] = @id;
end

My choices at this point appear to be either (a) fix the problem somehow, or (b) assume that the same degenerate behaviour will not occur under FULL recovery mode in Production. I'm naturally repulsed by option (b), but not sure how to go about option (a).

How can I ensure that the size of the transaction log behaves itself? Is there something I am missing?

My only idea at the moment is to append something like the following to the end of the loop:

if @logsize > 10 * 1024   -- log size in MB, i.e. 10GB or some other arbitrary number
    dbcc shrinkfile (N'database_log_file', 0, truncateonly);

But, aside from being awful in and of itself, I can't think of any way to translate this approach to a database under FULL recovery mode, so it's really not an option for me at this point.
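(For completeness, @logsize above would just come from something like this; size is reported in 8KB pages:)

select @logsize = size / 128   -- MB
  from sys.database_files
 where type_desc = N'LOG';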

EDIT: Also, I checked the log_reuse_wait_desc column after the failure, but the result for the database in question was NOTHING, so that provided no enlightenment. I was able to truncate the log via dbcc shrinkfile after the fact, so the log was not in any state that prevented truncation.
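(For reference, the column lives in sys.databases, so the check is just:)

select name, log_reuse_wait_desc
  from sys.databases
 where name = db_name();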

Best Answer

Even with the SIMPLE recovery model, the transaction log can still blow up if there's a long-running transaction going on. I'm wondering if something on your Dev server was running while your script was doing its thing and prevented the log from truncating, because the MinLSN belonged to some long-running transaction.
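Next time it happens, something along these lines would show whether an old open transaction is pinning the MinLSN (the 10-minute cutoff is arbitrary):

-- oldest active transaction in the current database
dbcc opentran;

-- or via the DMVs: transactions that have been open for more than 10 minutes
select st.session_id, at.transaction_begin_time, at.name
  from sys.dm_tran_active_transactions at
  join sys.dm_tran_session_transactions st
    on st.transaction_id = at.transaction_id
 where at.transaction_begin_time < dateadd(minute, -10, getdate());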

To answer the question, though: unless you restrict access to the DB (for example by running your script under ALTER DATABASE ... SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE, or something similar like SINGLE_USER), there's no way to force the log file to free space, as other transactions may be running and the transaction log has to stay consistent for those transactions.
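Spelled out (the database name is a placeholder, and remember to set it back to MULTI_USER afterwards):

alter database [database_name] set restricted_user with rollback immediate;

-- ... run the migration loop here ...

alter database [database_name] set multi_user;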

Check TechNet (or BOL: Checkpoints and the Active Portion of the Log) for more information.