Sql-server – Skipping VM backups for high workload SQL Server

backupsql servervirtualisation

For those who have virtualized SQL Servers with very high volume transaction workload (very high rate of data change), I am curious how you handle VM backups, or if you even do them.

Of course, I manage a separate chain of transactionally-consistent native database backups. But the standard policy at the virtualization level has always been to backup all VMs. This is currently using a typical 3rd party VM backup solution (Initiate VM snapshot, copy snapshot data to backup appliance, release snapshot). However, these servers have an extremely high rate of data change, resulting in the change files growing large very quickly. And the system is reaching a point where it is choking on the amount of data that needs merged during the snapshot release.

I am toying with the idea of skipping the VM backups altogether. I don't permit any other apps to be installed on SQL Servers, and the base OS and SQL configuration is pretty vanilla. So it wouldn't take much in a failure scenario to just deploy a new VM from template, install SQL bits on it, and start restoring database backups.

Thoughts? Alternatives? Pitfalls I'm missing? Thanks for your input.

Best Answer

One option is to set the disks that contain the data and log files of the user databases to independent disks, so the VM snapshot backups will not include those disks. As long as your database backup strategy is sound, you don't need to include the data and log files in the VM backup. This may not be possible, of course, if you have applications and database files installed on the same disk.

One helpful thing here is to have your master, model, and msdb on one of the disks that is included in the VM backup, so that when the VM backup is restored you won't have to restore the system databases, which is a bit of a pain. I scripted a process to move system databases--can be found at Script to move SQL Server System Database Files.

If this approach is used, restoring the server will take some work as you'll have to restore the VM backup, then create the disks and directories for the database files, then restore databases, etc.

A side benefit of this approach is that if you have really large volumes (multi-terrabyte), a VM snapshot of all of the disks will typically pause the server for several seconds, so this method will effectively eliminate that "stunning" effect. If you are using clusters/availability groups, the pause can cause availability groups to go offline for a bit.

One reason why you don't want to stop backing up the VM is that if you build a new server or restore the OS from an old backup, it will require a lot of time to get all of the Windows Updates installed, and this could amount to a significant amount of downtime. If you have a VM template (or whatever deployment method that is in use) that is kept updated this of course wouldn't be an issue.