SQL Server Backup – How to Guard Against Backup Corruption Due to Spurious Bit Errors

backup, corruption, sql server

There's always a very small chance that a backup that was just written to disk is corrupted. Disks are not 100% reliable. Bits can be flipped with very low probability. I have seen that happen on multiple machines on otherwise well-functioning disks.

It is possible to detect such errors using backup checksums. But what do you do when an error is found? If a log backup is corrupted, the log backup chain is broken. If it is a full backup, the differential backups that depend on it are now unusable. A new full backup is needed in order to restart the usual backup schedule.

What can be done to avoid losing backups randomly due to spurious data corruption?

Best Answer

It is possible to detect such errors using backup checksums.

If you use the CHECKSUM option, then yes, the error should be found, but only when you run a RESTORE or a RESTORE VERIFYONLY against the backup. If you don't check it, you'll never know.
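As a minimal sketch of that write-then-verify cycle, the T-SQL below takes a backup with checksums and then re-reads it. The database name `SalesDb` and the path `D:\Backups\...` are placeholders I made up for illustration, not anything from the question.

```sql
-- Hypothetical database name and backup path; adjust to your environment.
-- CHECKSUM verifies page checksums while reading the data and writes a
-- checksum over the whole backup. INIT overwrites any existing backup
-- sets in this file.
BACKUP DATABASE [SalesDb]
TO DISK = N'D:\Backups\SalesDb_full.bak'
WITH CHECKSUM, INIT;

-- Re-read the backup file and validate its checksums without restoring it.
-- This fails with an error if the file was damaged after it was written.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\SalesDb_full.bak'
WITH CHECKSUM;
```

Keep in mind that RESTORE VERIFYONLY only checks that the backup is readable and that its checksums match. An actual test restore onto another server remains the stronger proof that the backup is usable.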

But what do you do when an error is found?

That depends on what is wrong and where it happened. First, make sure your entire restore sequence is still valid and can actually be carried out. Second, figure out what caused the corruption and, if necessary, move to different storage, drivers, etc. Corruption of this kind should be rare, if it happens at all.

If a log backup is corrupted the log backup chain is broken.

This may or may not be true, depending on your restore strategy. If you have a differential backup that bridges the gap, that is perfectly valid, and you can still use the log backups taken after it. If you're extremely worried about a single bit flip in a log backup, mirror the log backups to another storage solution.
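One way to mirror log backups is the MIRROR TO clause of BACKUP, which writes the same backup to a second device in a single operation. The sketch below assumes an edition that supports mirrored backup media sets (Enterprise, as far as I know); the database name, local path, and UNC share are all hypothetical.

```sql
-- Hypothetical names and paths. In a real job the file names would normally
-- carry a timestamp so that each log backup gets its own pair of files.
-- FORMAT is required when a new mirrored media set is created.
BACKUP LOG [SalesDb]
TO DISK = N'D:\Backups\SalesDb_log.trn'
MIRROR TO DISK = N'\\backupserver\sqlbackups\SalesDb_log.trn'
WITH FORMAT, CHECKSUM;
```

If one copy later fails verification, the other copy can be used in the restore sequence, so the log chain stays usable. Where MIRROR TO is not available, copying each backup file to a second location after it is written gives a similar result, although any corruption introduced before the copy is carried along with it.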

If it is a full backup, the differential backups that depend on it are now unusable. A new full backup is needed in order to restart the usual backup schedule.

You're correct on the first part, but I don't know what you consider the "usual" backup schedule. If you have the log backups, you can restore an earlier full backup and continue to roll forward past the bad full and the differentials based on it. You're definitely going to want more than a single recovery sequence.
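Rolling forward past a bad full backup could look roughly like the following. This is a sketch under the assumption that an older full backup and an unbroken run of log backups since then still exist; every file name here is invented for illustration.

```sql
-- Hypothetical file names. Start from the last known-good full backup,
-- which is older than the corrupted one.
RESTORE DATABASE [SalesDb]
FROM DISK = N'D:\Backups\SalesDb_full_previous.bak'
WITH NORECOVERY;

-- Apply every log backup taken since that older full backup, in order.
-- A later full backup does not break the log chain, so the logs simply
-- carry the database past the point where the bad full was taken.
RESTORE LOG [SalesDb]
FROM DISK = N'D:\Backups\SalesDb_log_01.trn'
WITH NORECOVERY;

RESTORE LOG [SalesDb]
FROM DISK = N'D:\Backups\SalesDb_log_02.trn'
WITH NORECOVERY;

-- After the last log backup has been applied, bring the database online.
RESTORE DATABASE [SalesDb] WITH RECOVERY;
```

The cost is restore time: the further back the good full backup is, the more log backups have to be replayed.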

What can be done to avoid losing backups randomly due to spurious data corruption?

Mirror your backups, for starters. Have more than a single recovery sequence. Check the backups, and take action when something does go wrong so that you're not exposed to risk. Disks die and things go wrong. While it shouldn't happen often, it will most likely happen to you at some point in your career. Having a good backup and restore strategy that accounts for this is what will help.
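As one possible starting point for checking the backups, the backup history in msdb records whether each backup was taken with checksums. The query below is only a sketch; the seven-day window is an arbitrary choice of mine.

```sql
-- List backups from the last 7 days (arbitrary window) that were taken
-- without checksums. These cannot be validated by a checksum verify later.
SELECT database_name,
       type,                  -- D = full, I = differential, L = log
       backup_finish_date,
       has_backup_checksums,
       is_damaged
FROM msdb.dbo.backupset
WHERE backup_finish_date >= DATEADD(DAY, -7, GETDATE())
  AND has_backup_checksums = 0
ORDER BY backup_finish_date DESC;

-- On SQL Server 2014 and later, checksums can be made the default for
-- all backups at the instance level.
EXEC sp_configure 'backup checksum default', 1;
RECONFIGURE;
```

A query like this only shows which backups can be verified. It does not replace running RESTORE VERIFYONLY or periodic test restores.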