Sql-server – SQL Server differential backup LSN values

backuprestoresql servertransaction-log

I'm looking at restoring databases on SQL Server and I'm not understanding something related to the LSNs. Here is a screenshot from the Restore Database dialog in Management Studio for a sample database I created.

enter image description here

You'll notice that there are two differential backups. If each differential backup contains the data for the transactions of all the previous differential backups, why would the FirstLSN of the second differential backup (22000000041800004) be higher than the FirstLSN of the first differential backup (22000000039800037)? If it covers the same transactions why wouldn't the LSNs overlap?

If you look at Microsoft's documentation on this it states

For data and differential backups, the first LSN identifies the earliest log record that is needed to perform recovery with this backup.

So if I am going to recover the database using the second differential backup, wouldn't it stand to reason that the second differential backup's FirstLSN would start where the full backup left off (22000000038800001)? Restoring the second differential would apply all the transactions that occurred after the full backup was created wouldn't it?

I'm not understanding how the system accounts for the LSN gap between the full backup and the second differential.

Best Answer

why would the FirstLSN of the second differential backup (22000000041800004) be higher than the FirstLSN of the first differential backup (22000000039800037)?

When dealing with Data (Full, file, etc) or differential version of said backups the FirstLSN and LastLSN actually mean something different than is being inferred.

FirstLSN when dealing with the differentials means that's the LSN which is needed in order start recovery with that backup (meaning that's where it's going to start the process, at that lsn).

LastLSN when dealing with the differentials means that's the LSN at which any subsequent restore operations (in this case, your log backups) need to start. If you wanted to apply any extra logs, for example, the log backup that can first be applied needs to hold the LastLSN value in it in order to be applied.

However, this definition changes when dealing with the log backups which would be your original thought on the matter.

Edit: Just found this, which may help.

Edit: Updated question so updating answer.

So if I am going to recover the database using the second differential backup, wouldn't it stand to reason that the second differential backup's FirstLSN would start where the full backup left off (22000000038800001)?

It's a differential, so it only holds what little log it needs to be transactionally consistent. It wouldn't have to be that, as that's not the firstLSN where it is starting recovery. This differential could be from a full backup base taken 1.5 years ago (please, please don't let it go that long between fulls) and would not have any LSNs in most likely the last 1.5 years in it. It'll have all of the changed extents, but no log transactions (again, it'll have a tiny amount to be consistent). It's just telling you that it's going to start it's recovery process at that LSN.

Restoring the second differential would apply all the transactions that occurred after the full backup was created wouldn't it?

It would restore ALL of the extents that were changed since the last full backup. These may or may not reflect actual changes. For example, a large insert statement that put 1000 rows into an empty table that was subsequently rolled back. The extents allocated to that table, while nothing would be in there since it was rolled back, will still be apart of the differential as they were changed. All the differential does (well, not all but mostly) is look a the DCM and backup those extents, figure out how much log it needs and presto you have a differential backup. So it won't apply the transactions (as in replaying them) but it will hold whatever their consequences are.

I'm not understanding how the system accounts for the LSN gap between the full backup and the second differential.

It's not really a gap, it holds everything that has changed since that time (not as transactions but as what extents were changed) and then a little log. The "gap" is the changed extents part and what "makes up for it" is the little bit of log in the differential backup to make it consistent.

But why do full and differential backups need to store the LSN values?

They need to know information about themselves as well. For example, assume that a full backup was lost but you have a differential based on the newer full backup and you have an older full backup. You could restore the original old backup, logs, then the differential when you finally got to where it needs to be and continue on... but how would you know where that needed to be if you didn't store the values somewhere? How would we as admins know what order items came in if we didn't have these values?