SQL Server Backup – Why Differential Backup Can’t Specify Its Base

backupsql server

This is my first DBA.SE post, so please inform me of any mistakes, thanks!

I am a new DBA (not an IT pro, just no one else in the company to do it), so the more basic the explanation the better. I have been reading about database backup strategies (or, as I have learned to call them, "restore strategies"). I understand what Full, Differential, and Transaction Log backups do, but I want to know why a differential backup can only be based on the most recent full backup.

If a differential backup is everything that has changed since the last full backup, then why can't the differential be based off of any backup of my choosing? To be more clear, I'm asking about specifying the base when the backup is taken, not when restoring. I am assuming that when restoring you would choose the correct base and corresponding differential to perform the restore (not using a differential made from base B to restore from base A).

What is the reason that prevents this functionality from being possible? I figure that there must be a reason, I just don't know what it is.

Note: I understand that the base cannot be specified, but my question is why not? (I'm also not interested in discussion about "why would you?")

Analogy

Here's an analogy for how I understand a differential backup:

I have an Excel file with some data in cells.

On day 1, I make a copy of this file and store it somewhere else (the "full backup").

On day 2, I look at the file and compare it to the backup copy that I made on day 1, and I note all the cells that have changed and what their new values are (a "differential backup"). I am not noting every change made to a cell, only what its final value is. If cell A1 started as "Alfred", changed to "Betty", "Charlie", then "Dave", I would only note that "A1 is now Dave".

On day 3, I compare the current file with the backup file again and note the changes (another "differential backup" with the same base as day 2). Again, only noting final values per cell at the time observed, not all values that the cell has been throughout the day.

On day 4, I compare again and note changes again. Continuing with cell A1, now it says "Sarah", even if it was 10 other names throughout the day, and all I note is "Now A1 is Sarah".

On day 5, my file gets messed up; so, I look at the backup copy that I made on day 1, then the final states noted on day 4, and I apply the changes noted to the backup copy and now I have the file "restored" to how it was on day 4. So, I look at the backup made on day 1, see that on day 4 cell A1 ended as "Sarah", and change the backup cell A1 to be "Sarah".

Why would it matter if I had made another backup copy ("full") of the file on day 2? Why wouldn't it still be possible to compare (read, "take a differential backup of") the file on day 3 or 4 with the copy made on day 1? As I understand it, SQL Server would require me to compare (when taking another differential backup) to a full backup made on day 2 (if one had been made)- no other option.

Best Answer

A differential backup uses what is called the differential change map to build a list of pages that have been modified since the last full backup. This list is a "differential" list, hence the name of the backup type, and the reason the backup can only ever be restored over top of the associated full backup.

Performing a full backup resets the differential change map. From that point forward, any page that is modified is recorded in the map. If you then take a differential, that backup only contains pages that have been modified since the last full backup, and recorded in the map.

In your analogy, the two full backups, which serve as a base for the entire restore process would likely have different contents, and therefore different differential maps. If you restore a diff based on the first backup over the 2nd backup, the database would likely be corrupted. In fact, SQL Server prevents the restore of a diff backup over anything except the original full backup it is based upon.

When you ask SQL Server to take a differential backup, the only "base" for the differential is the single differential change map present in the database at the time the differential backup starts. This is why you cannot specify the base for the differential backup.


In response to a comment from @MartinSmith - you might be able to use COPY_ONLY backups to restore a differential backup over a number of full backups. Consider the following scenario:

  1. BACKUP DATABASE xyz TO DISK = 'path_to_backup.bak';
  2. BACKUP DATABASE xyz TO DISK = 'path_to_backup_2.bak' WITH COPY_ONLY;
  3. BACKUP DATABASE xyz TO DISK = 'path_to_backup_3.bak' WITH COPY_ONLY;
  4. BACKUP DATABASE xyz TO DISK = 'path_to_backup_4.bak' WITH COPY_ONLY;
  5. BACKUP DATABASE xyz TO DISK = 'path_to_backup_diff.bak' WITH DIFFERENTIAL;

The differential backup in step 5 should be capable of being restored over any of the backups taken in steps 1 to 4, since the differential change map is only cleared when the full backup in step 1 occurs. The COPY_ONLY backups in steps 2, 3, and 4, do not reset the change map. Since the differential change map accumulates changes made since the full backup, each of the successive COPY_ONLY backups contains enough information for the differential backup to work against any of the previous 4 backups.

Although it seems like it should work, in practice, restoring a differential over top of a copy_only backup results in the following error:

Msg 3136, Level 16, State 1, Line 1
This differential backup cannot be restored because the database has not been restored to the correct earlier state.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

I've created a SQL Server 2012 platform repro for testing differential and copy_only restores, and saved the file on gist.github.com - WARNING the script will drop any database named RestoreTest as its first step.