SQL Server – Automating Backups of a Large Number of Databases


In a multi-tenant web application where a separate database is set up for each tenant, each database can be backed up and restored manually through a fairly conventional approach.

I am considering this architecture for an application that could have 5,000-10,000 tenants, and running such a process manually for that many tenant databases would be cumbersome.

If I were to implement such a process for a large number of tenant databases:

  • Is there likely to be an excessive performance overhead for such a large number of distinct backups?

  • How would such a large number of backup processes be automated?

  • How could one keep track of such a large number of backups and ensure that restore operations use the correct backup and target the correct database?

  • How could one implement controls to monitor the backup status and integrity of such a large number of databases?

Best Answer

If the backups on those 5-10K databases are run serially, there shouldn't be a material performance difference from running a backup on one giant database. You might get by with running a few backups simultaneously, if your databases aren't large and you have good I/O capacity, but I wouldn't count on that.

You will want to stay away from the "maintenance plan" type jobs for this, because you need more control over what happens and those jobs aren't going to create the kind of logs you will want. Plus, plans have weird ways of failing that don't always get noticed, particularly on the older SQL Server versions. Writing a procedure to back up all of the databases on a server is pretty easy. That procedure should keep a log table that records which database was backed up, when it was backed up, and which file it went to. File names should probably have some sort of timestamp in them, to make it easy to find the correct files. I'd ensure that there is a way to "shard" all of those backup files into different folders and filesystems; don't just dump them into one folder. You will need an automated way to either archive or delete 'old' backups. It would be good if that also went into that log table and flagged each backup file as 'available', 'archived', 'deleted', etc. And, of course, you need enough storage to hold as many backups of each database as you are willing to keep.
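
A minimal sketch of that kind of procedure is below. The dbo.BackupLog table, the @BackupRoot path convention, and the per-database folder layout are all assumptions to adapt; note that BACKUP won't create folders for you, so the per-database directories must already exist.

    -- Hypothetical log table; one row per backup attempt.
    CREATE TABLE dbo.BackupLog (
        BackupLogId   int IDENTITY(1,1) PRIMARY KEY,
        DatabaseName  sysname       NOT NULL,
        BackupFile    nvarchar(500) NOT NULL,
        BackupStarted datetime2(0)  NOT NULL,
        BackupEnded   datetime2(0)  NULL,
        Status        varchar(20)   NOT NULL  -- 'running', 'available', 'archived', 'deleted'
    );
    GO

    CREATE PROCEDURE dbo.BackupAllDatabases
        @BackupRoot nvarchar(300)  -- e.g. N'\\backupserver\sqlbackups\'
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @db sysname, @file nvarchar(500), @sql nvarchar(max);

        DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
            SELECT name FROM sys.databases
            WHERE name <> N'tempdb' AND state_desc = N'ONLINE';

        OPEN dbs;
        FETCH NEXT FROM dbs INTO @db;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            -- One folder per database, timestamp in the file name.
            -- FORMAT needs SQL Server 2012+; use CONVERT/REPLACE on older versions.
            SET @file = @BackupRoot + @db + N'\' + @db + N'-FULL-'
                      + FORMAT(SYSDATETIME(), 'yyyy.MM.dd-HH.mm.ss') + N'.bak';

            INSERT dbo.BackupLog (DatabaseName, BackupFile, BackupStarted, Status)
            VALUES (@db, @file, SYSDATETIME(), 'running');

            -- Database name can't be a variable, so go dynamic for that part.
            SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@db)
                     + N' TO DISK = @file WITH CHECKSUM, INIT;';
            EXEC sp_executesql @sql, N'@file nvarchar(500)', @file = @file;

            UPDATE dbo.BackupLog
            SET BackupEnded = SYSDATETIME(), Status = 'available'
            WHERE DatabaseName = @db AND BackupFile = @file;

            FETCH NEXT FROM dbs INTO @db;
        END
        CLOSE dbs; DEALLOCATE dbs;
    END;
    GO

Real code needs TRY/CATCH around the BACKUP so one failed database doesn't stop the whole run, plus a companion job that walks dbo.BackupLog to archive or delete old files and update Status.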

Automated restore is trickier. Partly because you can quickly wipe out the wrong database, and partly because you need a way to kick database users out in order to start the restore.
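
The core of a single automated restore, once you are certain of the target and the file, looks something like this (the tenant database name and UNC path are placeholders):

    -- Kick everyone out and take exclusive control of the tenant database.
    ALTER DATABASE [Tenant0042] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- REPLACE is exactly what makes restoring over the wrong database so easy,
    -- so validate the target name against the backup log before running this.
    RESTORE DATABASE [Tenant0042]
    FROM DISK = N'\\backupserver\sqlbackups\Tenant0042\Tenant0042-FULL-2012.04.18-09.24.00.bak'
    WITH REPLACE, CHECKSUM;

    -- Let users back in.
    ALTER DATABASE [Tenant0042] SET MULTI_USER;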

Backups can be read via RESTORE HEADERONLY and RESTORE FILELISTONLY to validate that you are about to restore what you think you are. I would try to build that information into the name of the file, because it is much easier to look at a filename than to fiddle with RESTORE commands. You can write a couple of quickie CLR commands to do directory listings and such; I'm no C# genius, so I found a couple of examples on the web when I had to do that. Just pick a good format for a file name and then stick with it. Something like SERVERNAME-INSTANCENAME-DATABASENAME-FULL-2012.04.18-09.24.00.bak. That way, it is easy to see where the backup came from and when it was taken. Make sure that your restoration scheme can handle restoring a database to a different server and/or under a different database name. A common problem when restoring to the same server under a different database name is a collision with the file names; you need to move the data and log to new files.
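
For example, to inspect a file before trusting it, and to restore it under a different name with relocated files (the logical file names here are assumptions; take the real ones from the FILELISTONLY output):

    -- Which database is in this file, what type of backup, and when was it taken?
    RESTORE HEADERONLY
    FROM DISK = N'\\backupserver\sqlbackups\Tenant0042\Tenant0042-FULL-2012.04.18-09.24.00.bak';

    -- List the logical file names inside the backup; MOVE needs them.
    RESTORE FILELISTONLY
    FROM DISK = N'\\backupserver\sqlbackups\Tenant0042\Tenant0042-FULL-2012.04.18-09.24.00.bak';

    -- Restore as a copy under a new name, relocating the files so they don't
    -- collide with the original database's files on the same server.
    RESTORE DATABASE [Tenant0042_Copy]
    FROM DISK = N'\\backupserver\sqlbackups\Tenant0042\Tenant0042-FULL-2012.04.18-09.24.00.bak'
    WITH MOVE N'Tenant0042'     TO N'D:\Data\Tenant0042_Copy.mdf',
         MOVE N'Tenant0042_log' TO N'L:\Log\Tenant0042_Copy.ldf',
         RECOVERY;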

All of that assumes that the databases are running in SIMPLE recovery mode. If the databases are not running in SIMPLE mode, your problems multiply. You will need more space to hold backups, since you will need the tlog backups as well as the full backups. Running transaction log backups could be a real problem, because a single job to back them all up might not run in an acceptable window. (If you guarantee point-in-time recovery to within 15 minutes, that won't help if the job needs 30 minutes to run.) If the tlog sequence gets broken, or you lose a full backup somehow, you need to be able to take fresh full backups of one or more of the databases, depending on what went wrong. It would be useful to periodically pick databases and restore them onto a different instance, to make sure that everything works. DBAs aren't as good as their last backup; they are as good as their last restore.
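
If you do run FULL recovery, each database needs frequent log backups, and a point-in-time restore chains them together. Roughly, with all names and times as placeholders:

    -- Frequent log backups keep the chain intact and bound the data-loss window.
    BACKUP LOG [Tenant0042]
    TO DISK = N'\\backupserver\sqlbackups\Tenant0042\Tenant0042-LOG-2012.04.18-09.30.00.trn'
    WITH CHECKSUM, INIT;

    -- Point-in-time restore: the full backup with NORECOVERY, then the log
    -- backups in sequence, stopping just before the mistake happened.
    RESTORE DATABASE [Tenant0042]
    FROM DISK = N'\\backupserver\sqlbackups\Tenant0042\Tenant0042-FULL-2012.04.18-09.24.00.bak'
    WITH NORECOVERY, REPLACE;

    RESTORE LOG [Tenant0042]
    FROM DISK = N'\\backupserver\sqlbackups\Tenant0042\Tenant0042-LOG-2012.04.18-09.30.00.trn'
    WITH STOPAT = '2012-04-18T09:28:00', RECOVERY;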

Restoration code will be more complicated still if you intend for tenants to be able to do their own restores.

In addition to backing up those databases, you will want to create similar processes to run DBCC CHECKDB and to do re-indexing. I would look at some of the existing code that is available on DBA blogs; you might even be able to find something that you can rework into a backup procedure.
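
A CHECKDB sweep can follow the same cursor pattern as the backup procedure; a bare-bones version, with corruption surfacing as errors in the job output, might look like this:

    -- Run integrity checks on every online database except tempdb.
    DECLARE @db sysname, @sql nvarchar(max);

    DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
        SELECT name FROM sys.databases
        WHERE name <> N'tempdb' AND state_desc = N'ONLINE';

    OPEN dbs;
    FETCH NEXT FROM dbs INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'DBCC CHECKDB (' + QUOTENAME(@db) + N') WITH NO_INFOMSGS;';
        EXEC sp_executesql @sql;  -- with NO_INFOMSGS, only problems produce output
        FETCH NEXT FROM dbs INTO @db;
    END
    CLOSE dbs; DEALLOCATE dbs;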

Lastly, test everything like your business depends on it. 'Cause, it might. Make sure that you test situations where the backup fails (out of space for the backup file, maybe?) or where databases are offline, set to restricted access, or damaged somehow. When you run your tests, monitor the performance of the system, preferably while it already has some load on it.