I had a large transaction fail due to the transaction log filling up yesterday, it couldn't be rolled back and SQL Server has restarted the specific DB to perform recovery.
This is not totally correct when a transaction starts in SQL Server it reserves space in transaction log in case the transaction has to rollback. From Transaction Log architecture BOL doc
Rollback operations are also logged. Each transaction reserves space
on the transaction log to make sure that enough log space exists to
support a rollback that is caused by either an explicit rollback
statement or if an error is encountered. The amount of space reserved
depends on the operations performed in the transaction, but generally
it is equal to the amount of space used to log each operation. This
reserved space is freed when the transaction is completed
So I guess you can understand now.
What I can't understand it why it's taking so long?
Query rollback operation is mostly single threaded
while when same query executes it can use parallelism/multiple threads to perform same operation making it very fast as compared to the rollback. I suggest you read What Bob Dorr has to say about rollback taking long time. To get more insight about what occurs when rollback is in progress please read This article. Restarting SQL Server service is not going to help much, after restart the rollbak process had to start again. You have to wait and let the rollback process complete.
You can also use sys.dm_exec_requests to track your rollback.
select
session_id,
command,
status,
percent_complete
from sys.dm_exec_requests
where command IN ('killed/rollback','rollback','db_startup')
The other reason which Remus mentioned is too many VLF's. If you have to many VLF's recovery can take lot of time. Also other factor that can delay recovery is if SQL Server service account does not have Perform Volume maintenacce task right or instant file initialization
. You can use this link to check if IFI is present or not.
EDIT:
From The output you posted
session_id command status percent_complete
35 DB STARTUP background 86.06061
The startup process is 86 % complete.
Just keep monitoring it, eventually it will complete. Also as stated rollback process can get blocked and can incur waits so keep monitoring that as well.
2012 x64 Standard SP2
You have standard edition which cannot take advantage of fast recovery which is present in enterprise edition. With fast recovery database can come online after second phase of recovery(the redo phase) however it is alwasy not the case so please read the article. As an additional reading you can read When is fast recovery used. Standard edition limitation could be one more reason why it is taking database time to come online.
One other thing, the rollback session (35) is itself blocking another system process from performing CHECKPOINT
Yes it can and you cannot do anything as you have no control over system process in terms you cannot kill it.
The confusion seems to be with the verbiage - online/offline is sort of confusing in this context.
It refers to whether the operations being performed (a reindex or a table alter) are performed online or not, and not whether the state of the object is online or offline. Online means that the object does not have an exclusive lock on it, and is possibly accessible to other queries/operations as a result. Offline means it is locked by the operation and can't be touched by other queries no matter what.
The benefit for Enterprise is that I can do my schema changes or index maintenance and not (potentially) adversely impact other queries running. That doesn't mean that other, less restrictive, locks won't be taken (row level, page level), but it does mean that I'm not totally boxed out of the object. It does not guarantee things will keep running the same, though, as many other factors play into this. Check out the individual commands you're interested in running to see the myriad options that may effect locking behaviors.
For example, I've seen an online index reorg end up effectively locking the table from all other SELECT
statements until I tuned the MAXDOP
it was using, even though it wasn't actually taking a lock on the whole table.
An object can't be taken online or offline, although a database can (not relevant to this topic though).
Resources:
Best Answer
Here's the excerpt from the Recovery and the transaction log documentation that describes the Fast recovery feature: