Sql-server – Understanding The Transaction Log

sql serversql-server-2016transaction-log

I used to work with MySQL/PhpMyAdmin for web application and portable SQLCE for light local program. I just started with SQL Server Express 2016 and SSMS and just cannot understand Transaction Log. I have read various posts including MSDN: The Transaction Log and A beginner’s guide to SQL Server transaction logs, but I still cannot conceptually understanding even one single need for Transaction Log (the *.ldf file) and my lack of understanding leads me to think the log file is just a complete waste of disk space.

In MSDN it states the transaction log supports the following operations:

Individual transaction recovery.

So it logs all my query and data? Then what is the point of having the
*.mdf data file in the first place? Alternatively, if I back up my data and database schema, I can always recreate my database and insert back my data, and I have no need for my *.ldf

Recovery of all incomplete transactions when SQL Server is started.

“Incomplete Transaction" – does that mean if there is a power outage before a process is completed, the log file will resume/repair the transaction automatically when the power is back on and server is running?

Rolling a restored database, file, filegroup, or page forward to the
point of failure.

"Rolling"… does that mean there are subversions of database within the log file like a backup machine?

Supporting transactional replication.

I found no explanation. Does it mean I can repeatedly insert duplicate rows of the same data? It make no sense.

Supporting high availability and disaster recovery solutions: Always
On Availability Groups, database mirroring, and log shipping.

So..if my *.mdf database file got corrupted, the log file will mirror and recreate the whole thing uncorrupted?

At last, the log file seems contain everything there is in a *.mdf data file plus a good number of functions wrapping it. I thought either the *.mdf file is not needed, or alternatively if my schema and data is regularly backup, then there is no need for log file. But I am sure my understanding is incorrect.

Best Answer

I'm going to attempt to show you why you have a vested interest in the Sql Server Transaction log.

Bear with me as I lay out a few assumptions...

  • For simplicity sake, I'm assuming FULL RECOVERY model.
  • Because you're following BEST PRACTICES, you've separated the MDF/NDF files onto different drives than where your LDF file is stored. So, you have your MDF on Drive-E and your LDF on Drive-F. You'll see why this is important later.
  • Your company wants to minimize data loss in the event of a disaster, so you're actually backing up your transaction logs every minute.
  • For disaster recovery purposes, you have some sort of HA/DR configured. Let's just go with simple transactional replication.

Here's a simple scenario using two different transactions:

--Start transaction for Salary
BEGIN TRANSACTION

UPDATE Payroll SET YourSalary = YourSalary + 5000.00

COMMIT
--Congrats, you just got a $5,000 raise!


--Start a transaction for BankCheckingAccount
BEGIN TRANSACTION

--Subtract $500.00 from your bank checking account
UPDATE YourBankCheckingAccount SET AccountBalance = AccountBalance - 500.00

--No COMMIT yet.

When those statements executed, Sql Server read data into memory (if it wasn't already there), updated the data pages in memory and wrote that information into the transaction log indicating the update. It tracks the BEFORE and AFTER of the data modification in the transaction log. The data pages only have the 'final' result. However, at this point, the modified data pages are not 'necessarily' written to disk for performance reasons. Imagine how slow things would be if you had to wait on disk for each data 'page' modification to be written. Data modifications are always written to the log before a 'completion' response is returned to the user. That's the 'D' part of ACID. At this point, the data page in memory is flagged as IsDirty meaning that one or more modifications have been made to it since it was read into memory and those changes have not been written to disk. These 'dirty' pages are written to disk during a CHECKPOINT.

Now, back up your database (without committing that transaction). The back up process initiates the CHECKPOINT process and the 'dirty' pages are written to disk. Keep in mind that you have NOT committed this transaction and are free to undo (rollback) it at any time. Your updated information is in that backup. That backup also has enough information (from the log) to identify committed and uncommitted transactions at the time the backup was taken. If you restored that backup to another database, I'm sure you'd like your $500.00 back, right? During a restore to the other database, Sql Server would put the data page (with your -$500.00 account balance) back to disk. However, after the data pages have been restored, Sql Server goes through 'crash-recovery' and uses the transaction log to identify transactions that need to be undone (due to uncommitted transactions like yours) and also transactions that need to be 'redone' (due to data being modified and committed after the backup started, but are not reflected in the actual data pages because the updates occurred AFTER the CHECKPOINT.)

Now, on to another scenario still using our transactions. Your Salary was updated (and committed) 5 minutes ago. Your E-drive, which hosts the MDF file suffers a disk failure and Sql Server goes down due to not being able to access it. Your F-Drive (which hosts the LDF file) is intact. Again, because you're following BEST PRACTICES, you've been taking FULL (and possibly DIFFERENTIAL) backups. Your last FULL backup was 1 hour ago, but that was BEFORE your Salary increase transaction took place. The information about your Salary increase is in the transaction log. At this point, you should attempt to do a 'tail-of-the-log' backup of the transaction log to try and salvage any transactions that occurred between the last log backup and the failure. When you restore your backup from 1 hour ago and subsequent log restores, Sql Server looks through the transaction log for transactions that need to be undone (not committed) or redone (committed). Your salary increase needs to be 'redone' because it has been committed but it's not in your backup. Your transaction log (and backups) saves the day.

The Sql Server transaction log is a 'circular' file. Let's say your transaction log was created with a maximum size of 1gb. You have complete control over how you define your transaction log space. As transactions occur, information about those transactions is written to the transaction log. Eventually, you'll run out of that 1GB space allocated for your log. When you backup your transaction log (remember, you're doing that every minute), Sql Server is able to 'reuse (circular)' transaction log space allocated as long as transactions logged in that space have committed. Transaction log space allocated to transactions that are are still 'in-flight' (not committed) OR that have not been completely replicated (you're using transactional replication) cannot be release during a transaction log backup because the log needs to be used to 'rollback' the update should the user decide to 'rollback' the transaction or the system goes down. In transactional replication (or other HA/DR scenarios like mirroring, etc.), those transactions need to be completely replicated before the log space can be reused. Actually, in our scenario, our uncommitted bank account transaction will prevent the log from being reused and you'll eventually run out of log space even though you're taking log backups every minute. So make sure transactions are being committed or rolled back in a timely fashion.

Speaking of 'Rolling a restored database, file, filegroup, or page forward to the point of failure.'

Your company wants to minimize data loss and you're taking transaction log backups each minute. Remember, only the log knows the BEFORE and AFTER information for a transaction. The only thing in the MDF/NDF files is the data as it is NOW and that's only if a CHECKPOINT has occurred since the update. Now assume your entire server dies. You have FULL backups AND you have transaction log backups to the minute just before the server died. Knowing that this possibility could happen, you knew to store your backups on a backup server that was isolated from THIS server. Your $5000 raise is in those backups AND you want your $500.00 back into your checking account. Those changes are NOT in your FULL backup because it was taken over an hour ago. When you get around to being able to restore your backups, you restore your FULL (and maybe a DIFFERENTIAL) and all of the transaction logs you have since the FULL/DIFF (just up to a minute before disaster). Sql Server then goes through crash-recovery. It makes sure your $500.00 back account transaction is 'undone' because it was uncommitted and then makes sure your $5000 raise is 'redone' because it was committed.

As far as replication, Sql Server uses the transaction log to replicate these changes to remote servers, not the data pages. Also, the transaction log space cannot be 'reused' until the transactions participating in replication or mirroring have been 'replayed' on the target server.

To summarize - the transaction log is your BEST friend.

  • It supports 'Individual transaction recovery' and 'Recovery of all incomplete transactions when SQL Server is started' during crash-recovery to undo uncommitted transactions and redo committed transactions because there is no guarantee these data page modifications were actually written to disk before disaster struck.
  • Similarly, it supports 'Rolling a restored database, file, filegroup, or page forward to the point of failure' again using the restoration of transaction logs and crash-recovery.

  • It supports 'transactional replication' and 'high availability and disaster recovery solutions: Always On Availability Groups, database mirroring, and log shipping', by sending the transaction logs to the remote server for 're-play'

My attempt to explain the usefulness of the transaction log is minimal at best. Refer to Paul Randal's excellent post about how all of this stuff works.