Sql-server – How are these two SQL Server rollbacks different

database-internalsrollbacksql serversql-server-2008-r2transaction-log

In SQL Server 2008 R2, how are these two rollbacks different:

  1. Run an ALTER statement, for a few minutes, and then hit 'Cancel Executing'. It takes a few minutes to rollback completely.

  2. Run the same ALTER statement, but this make sure that the LDF file is not big enough for it to complete successfully. Once the LDF limit is met and no 'autogrowth' is allowed, query execution stops immediately (or a rollback happens) with this error message:

The statement has been terminated.
Msg 9002, Level 17, State 4, Line 1
The transaction log for database 'SampleDB' is full. 
To find out why space in the log cannot be reused, see the 
log_reuse_wait_desc column in sys.databases

How are these two different on the following points?

  1. Why is the second 'rollback' instantaneous? I'm not entirely sure if it could be called a rollback. My guess is, the transaction log is written as the execution progresses and once it realizes that there is not enough space to fully complete the task, it just stops with some 'end' message, without commit.

  2. What happens when the first rollback takes so much time (is a rollback single threaded)?
    2.1. Does SQL Server go back and undo the entries made in the LDF file?
    2.2. The LDF file size gets smaller at the end of the rollback (from DBCC SQLPERF(LOGSPACE))

  3. One additional question: During the second scenario, SQL Server starts consuming LDF file pretty quickly. In my case, it increased from 18% usage to 90% usage in the first few minutes (< 4 mins). But once it reached 99%, it stayed there for another 8 mins, while fluctuating usage between 99.1% to 99.8%. It goes up (99.8%) and down (99.2%) and up again (99.7%) and down (99.5%) a few times before the error is thrown. What is happening behind the scenes?

Any MSDN links that could help explain this more are appreciated.

At Ali Razeghi suggestion, I'm adding perfmon : Disk Bytes/sec

Scenario 1:

Scenario 1

Scenario 2:

Scenario 2

Best Answer

As indicated above, after running more tests, I arrived at an calculated conclusions. I summarized all of them into a blog post here, but I'll copy some content over to this post for posterity.

Conjecture (based on some tests)

As of now, I do not have a clear explanation as to why this is. But following are my estimations based on the artifacts gathered during the tests.

Rollback happens in both scenarios. One is explicit rollback (user hitting Cancel button), the other is implicit (Sql Server making that decision internally).

In both scenarios, the traffic going to the log file is consistent. See the images below:

Scenario 1:

Scenario 1:

Scenario 2:

Scenario 2

  • One artifact that reinforced this line of thinking is capturing Sql Trace during both the scenarios.

    • Scenario 1 is self evident a.k.a. when we hit ‘Cancel’, it rolls back.
    • In Scenario 2, the error message is displayed after performing ‘rollback’ implicitly. In Sql Trace, we see the error message “The transaction log for database ‘SampleDB’ is full” a long time before the message is displayed on the screen. So, my guess is that rollbacks happens in both scenarios, but the error message is Scenario 2 is displayed after successfully and completely performing the rollback.
  • Scenario 2 seems to take longer as as it progresses much further along, so the rollback takes longer.

Unexplained behavior:

  • Why does the log file usage vary so much?
    • It increases to 90%, then down to 85%, then up to 99% and hovers there for a long time. I see it going up and down like this several times: 99.2 %, 99.8 %, 99.1 %, 99.7 %. Why does this happen?
    • One possible explanation is that, there might be a background process (something like Log Flush) that cleans up log file every few minutes. And everytime it kicks in, some entries are cleared up, resulting in more free space available.

Any ideas to help explain this behavior in a better way are welcome.