If the backups on those 5-10K databases are run serially, there shouldn't be a material performance difference compared to running a backup on one giant database. You might get away with running a few backups simultaneously if your databases aren't large and you have good I/O capacity, but I wouldn't count on it.
You will want to stay away from the "maintenance plan" type jobs for this, because you need more control over what happens and those jobs aren't going to create the kind of logs you will want. Plus, plans have weird ways of failing that don't always get noticed, particularly on the older SQL Server versions. Writing a procedure to back up all of the databases on a server is pretty easy. That procedure should keep a log table that records which database was backed up, when it was backed up, and what file it went to. File names should probably have some sort of timestamp in them, to make it easy to find the correct files.

I'd ensure that there is a way to "shard" all of those backup files into different folders and filesystems; don't just dump them into one folder. You will need an automated way to either archive or delete 'old' backups. It would be good if that also went into that log table and flagged each backup file as 'available', 'archived', 'deleted', etc. And, of course, you need enough storage to hold as many backups of each database as you are willing to keep.
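A rough sketch of that procedure in T-SQL. The log table name (dbo.BackupLog), the folder layout, and the file-name format are illustrative assumptions, not a finished implementation:

    -- Assumed log table; adjust names and columns to taste.
    CREATE TABLE dbo.BackupLog (
        BackupLogID  int IDENTITY(1,1) PRIMARY KEY,
        DatabaseName sysname       NOT NULL,
        BackupFile   nvarchar(400) NOT NULL,
        BackupStart  datetime2(0)  NOT NULL,
        BackupEnd    datetime2(0)  NULL,
        Status       varchar(20)   NOT NULL  -- 'running', 'available', 'archived', 'deleted', 'failed'
    );
    GO

    CREATE PROCEDURE dbo.BackupAllDatabases
        @RootFolder nvarchar(200)  -- e.g. N'X:\Backups\'; assumes per-database subfolders already exist
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @db sysname, @file nvarchar(400), @stamp nvarchar(20),
                @sql nvarchar(max), @logid int;

        DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
            SELECT name
            FROM sys.databases
            WHERE name <> N'tempdb'          -- tempdb can't be backed up
              AND state_desc = N'ONLINE';    -- offline/damaged databases need separate handling

        OPEN dbs;
        FETCH NEXT FROM dbs INTO @db;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            -- 2012-04-18 09:24:00 -> 2012.04.18-09.24.00
            SET @stamp = REPLACE(REPLACE(REPLACE(CONVERT(nvarchar(19), GETDATE(), 120),
                         N'-', N'.'), N' ', N'-'), N':', N'.');
            SET @file  = @RootFolder + @db + N'\'
                       + REPLACE(@@SERVERNAME, N'\', N'-') + N'-' + @db
                       + N'-FULL-' + @stamp + N'.bak';

            INSERT dbo.BackupLog (DatabaseName, BackupFile, BackupStart, Status)
            VALUES (@db, @file, GETDATE(), 'running');
            SET @logid = SCOPE_IDENTITY();

            SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@db)
                     + N' TO DISK = @file WITH CHECKSUM, INIT;';
            BEGIN TRY
                EXEC sp_executesql @sql, N'@file nvarchar(400)', @file = @file;
                UPDATE dbo.BackupLog SET BackupEnd = GETDATE(), Status = 'available'
                WHERE BackupLogID = @logid;
            END TRY
            BEGIN CATCH
                UPDATE dbo.BackupLog SET BackupEnd = GETDATE(), Status = 'failed'
                WHERE BackupLogID = @logid;
            END CATCH;

            FETCH NEXT FROM dbs INTO @db;
        END
        CLOSE dbs;
        DEALLOCATE dbs;
    END
    GO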
Automated restore is trickier. Partly because you can quickly wipe out the wrong database, and partly because you need a way to kick database users out in order to start the restore.
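Clearing users off is typically an ALTER DATABASE dance around the restore; database and file names here are hypothetical:

    -- Kick everyone out and roll back their open transactions.
    ALTER DATABASE TenantDb42 SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    RESTORE DATABASE TenantDb42
    FROM DISK = N'X:\Backups\TenantDb42\PRODSQL01-MSSQLSERVER-TenantDb42-FULL-2012.04.18-09.24.00.bak'
    WITH REPLACE;

    -- Let users back in (the restored copy may already be MULTI_USER, but this is cheap insurance).
    ALTER DATABASE TenantDb42 SET MULTI_USER;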
Backups can be read via RESTORE HEADERONLY and RESTORE FILELISTONLY to validate that you are about to restore what you think you are. I would try to build that information into the name of the file, because it is much easier to look at a filename than to fiddle with RESTORE commands. You can write a couple of quickie CLR procedures to do directory listings and such; I'm no C# genius, so I found a couple of examples on the web when I had to do that. Just pick a good format for a file name and then stick with it.
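For example (the file name is hypothetical):

    -- What is in this file, and where and when was it taken?
    RESTORE HEADERONLY
    FROM DISK = N'X:\Backups\TenantDb42\PRODSQL01-MSSQLSERVER-TenantDb42-FULL-2012.04.18-09.24.00.bak';

    -- Which logical data/log files does it contain?
    RESTORE FILELISTONLY
    FROM DISK = N'X:\Backups\TenantDb42\PRODSQL01-MSSQLSERVER-TenantDb42-FULL-2012.04.18-09.24.00.bak';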
Something like SERVERNAME-INSTANCENAME-DATABASENAME-FULL-2012.04.18-09.24.00.bak. That way, it is easy to see where the backup came from and when it was taken. Make sure that your restoration scheme can handle restoring a database to a different server and/or under a different database name. A common problem when restoring to the same server under a different database name is a collision with the file names; you need to use new files.
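Restoring under a new name on the same server means MOVE-ing the physical files; the logical file names below are hypothetical (get the real ones from RESTORE FILELISTONLY):

    RESTORE DATABASE TenantDb42_copy
    FROM DISK = N'X:\Backups\TenantDb42\PRODSQL01-MSSQLSERVER-TenantDb42-FULL-2012.04.18-09.24.00.bak'
    WITH MOVE N'TenantDb42'     TO N'X:\Data\TenantDb42_copy.mdf',
         MOVE N'TenantDb42_log' TO N'Y:\Logs\TenantDb42_copy.ldf';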
All of that assumes the databases are running in SIMPLE recovery mode. If they are not, your problems multiply. You will need more space to hold backups, since you will need the tlog backups as well as the full backups. Running transaction log backups could be a real problem, because a single job to back them all up might not finish in an acceptable window. (If you guarantee point-in-time recovery to within 15 minutes, that won't help if the job needs 30 minutes to run.) If the tlog sequence gets broken, or you lose a full backup somehow, you need to be able to take fresh backups of one or more of the databases, depending on what went wrong. It would be useful to periodically pick databases and restore them onto a different instance, to make sure that everything works. DBAs aren't as good as their last backup; they are as good as their last restore.
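The per-database log backup itself is one statement (names hypothetical again):

    -- A broken log chain makes this fail, or makes the result useless, until a new full backup exists.
    BACKUP LOG TenantDb42
    TO DISK = N'X:\Backups\TenantDb42\PRODSQL01-MSSQLSERVER-TenantDb42-LOG-2012.04.18-09.39.00.trn'
    WITH CHECKSUM;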
Restoration code will be more complicated, particularly if you have the idea that tenants can do their own restores.
Also, in addition to backing up those databases, you will want to create similar processes to run DBCC CHECKDB and to do re-indexing. I would look at some of the existing code that is available on DBA blogs. You might even be able to find something that you can rework into a backup procedure.
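A CHECKDB loop can follow the same cursor pattern as the backup sketch above:

    DECLARE @db sysname, @sql nvarchar(max);
    DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
        SELECT name FROM sys.databases
        WHERE name <> N'tempdb' AND state_desc = N'ONLINE';
    OPEN dbs;
    FETCH NEXT FROM dbs INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'DBCC CHECKDB(' + QUOTENAME(@db) + N') WITH NO_INFOMSGS;';
        EXEC sp_executesql @sql;   -- log success/failure to a table, as with the backups
        FETCH NEXT FROM dbs INTO @db;
    END
    CLOSE dbs;
    DEALLOCATE dbs;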
Lastly, test everything like your business depends on it. 'Cause, it might. Make sure that you test situations where the backup fails (out of space for the backup file, maybe?) or where there are databases that are offline, set to limited access, or damaged somehow. When you run your tests, monitor the performance of the system, preferably when it already has some load on it.
Best Answer
It doesn't really make sense that you're able to do the backup with the --tab option but not without it, and I think that should be investigated.

Concerns about integrity are generally concerns with the backup, and you should be concerned about the backup, because --single-transaction issues a START TRANSACTION WITH CONSISTENT SNAPSHOT before the backup, ensuring that what gets written to the file represents the state of all the tables at one consistent point in time. Without this, your "user" table, being alphabetically later than your "history" table, could (for example) contain a new user in the backup file without the history of that account's creation, since the backup of "history" finished before the backup of "user" started.
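In effect, the dump session does something like this (a simplified sketch, not mysqldump's literal output):

    START TRANSACTION WITH CONSISTENT SNAPSHOT;  -- pin one point in time
    SELECT * FROM `history`;  -- every table is read within that same snapshot
    SELECT * FROM `user`;
    COMMIT;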
You say you want to restore within a transaction, but that's not possible (by default) and probably not really what you intend at any rate.
It's not possible, by default, because of the way mysqldump works. Examine a backup file: both CREATE TABLE and DROP TABLE cause an implicit commit. This isn't the only concern, because non-conflicting writes to the tables could still be done by other threads.

There are two other ways to prevent changes on a MySQL server:
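The first is a global read lock:

    FLUSH TABLES WITH READ LOCK;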
This attempts to close all tables and acquire a global lock that prevents any thread from writing. The problem here, again, is with the standard behavior of mysqldump and what it writes to the dump file for execution during the restore: the individual table locks it adds are not compatible with the global lock.
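By default, the dump brackets each table's rows with statements along these lines (a sketch; exact output varies by version and options):

    LOCK TABLES `t1` WRITE;
    -- INSERT INTO `t1` ... the table's rows ...
    UNLOCK TABLES;

With the global read lock held, those LOCK TABLES ... WRITE statements can't succeed, so the restore blocks or errors out.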
You can disable this behavior with --skip-add-locks, but then you wouldn't be able to write data into the tables anyway while holding the global read lock, because it prevents you from writing just as it does the other threads... so that one seems to be a non-starter.

Alternately, you could force the entire server into read_only mode (SET GLOBAL read_only = ON; accounts with the SUPER privilege are exempt, which is what would let your restore session keep writing).

If you're concerned with other threads modifying the tables during the restore, but you don't want the entire server to be read-only for the duration of the operation, I only see one really solid option: restore all of the tables under an alternate name, and then, in a single SQL statement, rename them all to their correct names.
When multiple tables are renamed in a single statement, the entire statement is atomic.
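Something like this, assuming the dump was loaded into tables suffixed _new (illustrative names):

    RENAME TABLE t1 TO t1_old, t1_new TO t1;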
This acquires a global mutex that prevents t1 from being accessed until all of the renames are complete... but it would require manipulating your dump file to give the tables alternate names, though it might also work if you restored into an alternate database name and renamed the tables in fully-qualified form.
Another approach would be to drop and re-create all of the tables and lock all of them for write before inserting the data, after removing any LOCK TABLES and UNLOCK TABLES statements from the dump file(s). The one additional step you'd need here: after acquiring the locks, run through every table and delete any data that other threads inserted between the time you created the tables and the time you acquired the locks, or the restore would fail.
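A rough sketch of that sequence, with hypothetical tables t1 and t2 (the real list comes from your dump):

    -- After re-creating the tables, block all other writers.
    LOCK TABLES t1 WRITE, t2 WRITE;

    -- Session-level: allows cleanup without tripping foreign key constraints.
    SET FOREIGN_KEY_CHECKS = 0;

    -- Remove anything other threads slipped in before the locks were taken.
    DELETE FROM t1;
    DELETE FROM t2;

    -- ... replay the INSERT statements from the dump here ...

    SET FOREIGN_KEY_CHECKS = 1;
    UNLOCK TABLES;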
SET FOREIGN_KEY_CHECKS = 0 is a session-level setting that would allow your session to freely remove any stray data without regard to foreign key constraints.

The bottom line is that there is not a simple, straightforward way to keep all hands off of the data you are attempting to restore. If faced with that requirement, I would be inclined to shut down the application server, or otherwise prevent access to the server with a mechanism outside of mysql itself, such as IP-level filtering.