At the lower end (500 tenants / 10000 users) this is how I did it. First, you have a "control" database that is global, central and contains all of the information about tenants and users (I really don't think you want to manage these as SQL auth logins). So imagine a database called "Control" with the following tables:
CREATE TABLE dbo.Instances
(
    InstanceID INT PRIMARY KEY,
    Connection VARCHAR(255)
    --, ...
);

INSERT dbo.Instances SELECT 1, 'PROD1\Instance1';
INSERT dbo.Instances SELECT 2, 'PROD2\Instance1';
-- ...
CREATE TABLE dbo.Tenants
(
    TenantID INT PRIMARY KEY,
    Name NVARCHAR(255) NOT NULL UNIQUE,
    InstanceID INT -- foreign key tells which instance this tenant's DB is on
    --, ...
);
INSERT dbo.Tenants SELECT 1, 'MyTenant', 1;
-- ...
CREATE TABLE dbo.Users
(
    UserID INT PRIMARY KEY,
    Username VARCHAR(320) NOT NULL UNIQUE,
    PasswordHash VARBINARY(64), -- because you never store plain text, right?
    TenantID INT -- foreign key
    --, ...
);
INSERT dbo.Users SELECT 1, 'foo@bar.com', 0x43..., 1;
In our case, when we added a new tenant we built the database dynamically, but not at the moment the admin user clicked OK in the UI. Instead, a background job pulled new databases off a queue every 5 minutes, set model to SINGLE_USER, and created each new database serially. We did this to (a) avoid making the admin user wait for database creation, and (b) prevent two admin users from trying to create a database at the same time, where one of them would be denied the lock on model (required when creating a new database).
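A rough sketch of that queue-driven job follows. The table and procedure names (dbo.CreateQueue, the 8-digit padding) are illustrative assumptions, not taken from the original system:

```sql
-- Hypothetical queue table; the admin UI inserts a row, a SQL Agent job drains it.
CREATE TABLE dbo.CreateQueue
(
    TenantID    INT PRIMARY KEY,
    EnqueuedAt  DATETIME NOT NULL DEFAULT GETDATE(),
    ProcessedAt DATETIME NULL
);
GO

-- Body of the job that runs every 5 minutes:
DECLARE @TenantID INT, @db SYSNAME, @sql NVARCHAR(MAX);

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT TenantID FROM dbo.CreateQueue
    WHERE ProcessedAt IS NULL
    ORDER BY EnqueuedAt;

OPEN c;
FETCH NEXT FROM c INTO @TenantID;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @db = N'Tenant' + RIGHT('00000000' + CONVERT(VARCHAR(8), @TenantID), 8);

    -- Keep other sessions out of model while we create the database,
    -- mirroring the single_user trick described above:
    ALTER DATABASE model SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    SET @sql = N'CREATE DATABASE ' + QUOTENAME(@db) + N';';
    EXEC sys.sp_executesql @sql;

    ALTER DATABASE model SET MULTI_USER;

    UPDATE dbo.CreateQueue SET ProcessedAt = GETDATE()
    WHERE TenantID = @TenantID;

    FETCH NEXT FROM c INTO @TenantID;
END
CLOSE c;
DEALLOCATE c;
```

In a real system you would also apply the size, autogrowth, file-placement, and recovery settings chosen in the UI before marking the row processed.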
Databases were created with the name scheme Tenant000000xx, where xx represented Tenants.TenantID. This made maintenance jobs quite easy, instead of having all kinds of databases named BurgerKing, McDonalds, KFC, etc. Not that we were in fast food; just using that as an example.
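As a sketch, that name can be derived from Tenants.TenantID with simple string padding (the 8-digit width is an assumption based on the Tenant000000xx pattern):

```sql
-- Pad TenantID to 8 digits to form the database name, e.g. 42 -> Tenant00000042.
DECLARE @TenantID INT = 42;
SELECT N'Tenant' + RIGHT('00000000' + CONVERT(VARCHAR(8), @TenantID), 8);
```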
The reason we didn't pre-allocate thousands of databases, as the comment suggested, is that our admin users usually had some idea of how big the tenant would become, whether they were high priority, etc. So they had basic choices in the UI that would dictate the initial size and autogrowth settings, which disk subsystem the data/log files would go to, the recovery settings, which backup schedule to hinge off of, and even smarts about which instance to deploy the database to in order to best balance usage (though our admins could override this). Once the database was created, the Tenants table was updated with the chosen instance, an admin user was created for the tenant, and our admins were e-mailed the credentials to pass along to the new tenant.
If you're using a single point of entry, it is not feasible to allow multiple tenants to have users with the same username. We opted to use e-mail address, which - if all users work for the company and use their corporate e-mail address - should be fine. Though our solution eventually became more complex for two reasons:
- We had consultants that worked for more than one of our clients, and needed access to multiple
- We had tenants who themselves were actually comprised of multiple tenants
So, we ended up with a TenantUsers table that allowed one user to be associated with multiple tenants.
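A minimal sketch of that mapping table, following the same conventions as the tables above (the exact column list is an assumption):

```sql
CREATE TABLE dbo.TenantUsers
(
    TenantID INT NOT NULL, -- foreign key to dbo.Tenants
    UserID   INT NOT NULL, -- foreign key to dbo.Users
    PRIMARY KEY (TenantID, UserID) -- one row per (tenant, user) association
    --, ...
);

-- Associate user 1 with tenant 1; a consultant gets one row per tenant.
INSERT dbo.TenantUsers SELECT 1, 1;
```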
Initially when a user logs in, the app will know the connection string for the control database only. When a login is successful, it can then build a connection string based on the information it found. E.g.
SELECT i.Connection
  FROM dbo.Instances AS i
  INNER JOIN dbo.Tenants AS t
    ON i.InstanceID = t.InstanceID
  INNER JOIN dbo.TenantUsers AS u
    ON t.TenantID = u.TenantID -- join on Tenants; Instances has no TenantID
  WHERE u.UserID = @UserID;
Now the app could connect to the user's database (each user had a default tenant) or the user could select from any of the tenants they could access. The app would then simply retrieve the new connection string, and redirect to the home page for that tenant.
If you get into this 10MM user area you propose, you'll definitely need this to be balanced better. You may want to federate the application so that they have different points of entry connecting to different control databases. If you give each tenant a subdomain (e.g. TenantName.YourApplicationDomain.com) then you can do this behind the scenes with DNS/routing without interrupting them when you need to scale out further.
There is a lot more to this - like @Darin I am only scratching the surface here. Let me know if you need a non-free consult. :-)
Best Answer
If the backups on those 5-10K databases are run serially, there shouldn't be a material performance difference compared to running a backup on one giant database. You might get by with running a few backups simultaneously, if your databases aren't large and you have good I/O capacity, but I wouldn't count on that.
You will want to stay away from the "maintenance plan" type jobs for this, because you need more control over what happens, and those jobs aren't going to create the kind of logs you will want. Plus, plans have weird ways of failing that don't always get noticed, particularly on the older SQL Server versions. Writing a procedure to back up all of the databases on a server is pretty easy. That procedure should keep a log table that describes which database was backed up, when it was backed up, and what file it went to. File names should probably have some sort of timestamp in them, to make it easy to find the correct files. I'd ensure that there is a way to "shard" all of those backup files into different folders and filesystems; don't just dump them into one folder. You will need an automated way to either archive or delete 'old' backups. It would be good if that also went into the log table and flagged each backup file as 'available', 'archived', 'deleted', etc. And, of course, you need enough storage to hold as many backups of each database as you are willing to keep.
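The backup procedure plus log table might be sketched like this. The log table name, folder path, and Tenant% filter are assumptions for illustration:

```sql
-- Log table: one row per backup taken, with a lifecycle status column.
CREATE TABLE dbo.BackupLog
(
    LogID        INT IDENTITY PRIMARY KEY,
    DatabaseName SYSNAME NOT NULL,
    BackupFile   NVARCHAR(512) NOT NULL,
    BackupTime   DATETIME NOT NULL,
    Status       VARCHAR(20) NOT NULL DEFAULT 'available' -- 'archived', 'deleted'
);
GO

CREATE PROCEDURE dbo.BackupAllDatabases
    @Folder NVARCHAR(256) -- vary this per database to shard across filesystems
AS
BEGIN
    DECLARE @db SYSNAME, @file NVARCHAR(512), @now DATETIME;

    DECLARE c CURSOR LOCAL FAST_FORWARD FOR
        SELECT name FROM sys.databases
        WHERE name LIKE N'Tenant%' AND state_desc = 'ONLINE';

    OPEN c;
    FETCH NEXT FROM c INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @now = GETDATE();
        -- Timestamped file name, e.g. Tenant00000042-FULL-2012-04-18-09.24.00.bak
        SET @file = @Folder + @db + N'-FULL-'
                  + REPLACE(REPLACE(CONVERT(NVARCHAR(19), @now, 120),
                            N':', N'.'), N' ', N'-') + N'.bak';

        BACKUP DATABASE @db TO DISK = @file WITH INIT;

        INSERT dbo.BackupLog (DatabaseName, BackupFile, BackupTime)
        VALUES (@db, @file, @now);

        FETCH NEXT FROM c INTO @db;
    END
    CLOSE c;
    DEALLOCATE c;
END
```

Real code would wrap the BACKUP in TRY/CATCH and log failures too, so one damaged database doesn't silently stop the run.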
Automated restore is trickier. Partly because you can quickly wipe out the wrong database, partly because you need a way to kick database users out in order to start the restore.
Backups can be read via RESTORE HEADERONLY and RESTORE FILELISTONLY to validate that you are about to restore what you think you are. I would still try to build that information into the name of the file, because it is much easier to look at a filename than to fiddle with RESTORE commands. You can write a couple of quickie CLR functions to do directory listings and such; I'm no C# genius, so I found a couple of examples on the web when I had to do that. Just pick a good format for a file name and then stick with it. Something like SERVERNAME-INSTANCENAME-DATABASENAME-FULL-2012.04.18-09.24.00.bak. That way, it is easy to see where the backup came from and when it was taken. Make sure that your restoration scheme can handle restoring a database to a different server and/or under a different database name. A common problem when restoring to the same server under a different database name is a collision with the file names; you need to use new files.
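The validate-then-restore-as-a-copy flow looks roughly like this. The paths and the logical file names in the MOVE clauses are assumptions; in practice you get the logical names from the FILELISTONLY output:

```sql
DECLARE @file NVARCHAR(512) =
    N'X:\Backups\PROD1-Instance1-Tenant00000042-FULL-2012.04.18-09.24.00.bak';

-- Check what the file actually contains before touching any database:
RESTORE HEADERONLY   FROM DISK = @file; -- database name, backup type, finish date
RESTORE FILELISTONLY FROM DISK = @file; -- logical file names needed for MOVE

-- Restore under a different name; MOVE gives the copy its own physical files
-- so it doesn't collide with the original database's files on the same server:
RESTORE DATABASE Tenant00000042_Copy
    FROM DISK = @file
    WITH MOVE 'Tenant00000042'     TO N'Y:\Data\Tenant00000042_Copy.mdf',
         MOVE 'Tenant00000042_log' TO N'Z:\Log\Tenant00000042_Copy.ldf';
```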
All of that assumes that the databases are running in SIMPLE recovery mode. If the databases are not running in SIMPLE mode, your problems multiply. You will need more space to hold backups, since you will need the tlog backups as well as the full backups. Running transaction log backups could be a real problem, because a single job to back them all up might not finish in an acceptable window. (If you guarantee point-in-time recovery to 15 minutes, that won't help if the job needs 30 minutes to run.) If the tlog sequence gets broken, or you lose the full backup somehow, you need to be able to take fresh backups of one or more of the databases, depending on what went wrong. It would also be useful to periodically pick databases and restore them onto a different instance, to make sure that everything works. DBAs aren't as good as their last backups; they are as good as their last restore.
Restoration code will be more complicated, particularly if you have the idea that tenants can do their own restores.
Also, in addition to backing up those databases, you will want to create similar processes to run DBCC CHECKDB and to do re-indexing. I would look at some of the existing code that is available on DBA blogs. You might even be able to find something that you can rework into a backup procedure.
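A CHECKDB sweep follows the same loop shape as the backup procedure. This is a bare-bones sketch; the Tenant% filter and the error handling are assumptions, and real code would log to a table rather than PRINT:

```sql
-- Run DBCC CHECKDB across all tenant databases, surfacing any failures.
DECLARE @db SYSNAME;

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases
    WHERE name LIKE N'Tenant%' AND state_desc = 'ONLINE';

OPEN c;
FETCH NEXT FROM c INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        DBCC CHECKDB (@db) WITH NO_INFOMSGS;
    END TRY
    BEGIN CATCH
        -- In real code: INSERT into a log table and alert someone.
        PRINT @db + N': ' + ERROR_MESSAGE();
    END CATCH
    FETCH NEXT FROM c INTO @db;
END
CLOSE c;
DEALLOCATE c;
```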
Lastly, test everything like your business depends on it. 'Cause, it might. Make sure that you test situations where the backup fails (out of space for the backup file, maybe?) or that there are databases that are offline or set to limited access or are damaged somehow. When you run your tests, monitor the performance of the system, preferably when it has some load already on it.