SQL Server – Limit to the Number of Databases on One Server

azure-vm, scalability, sql-server

I'm setting up a SaaS system, where we're planning to give each customer their own database. The system is already set up so that we can easily scale out to additional servers if the load becomes too great; we're hoping to have thousands, or even tens of thousands of customers.

Questions

Is there any practical limitation on the number of micro-databases you can/should have on one SQL Server?
Can it affect performance of the server?
Is it better to have 10,000 databases of 100 MB each, or one database of 1 TB?

Additional information

When I say "micro-databases", I don't really mean "micro"; I just mean that we're aiming for thousands of customers, so each individual database would only be a thousandth or less of the total data storage. In reality, each database would be around the 100MB mark, depending on how much usage it gets.

The main reason to use 10,000 databases is for scalability. Fact is, V1 of the system has one database, and we have had some uncomfortable moments when the DB was straining under the load.

It was straining CPU, memory, I/O – all of the above. Even though we fixed those problems, they made us realize that at some point, even with the best indexing in the world, if we're as successful as we hope to be, we simply can't put all our data in one big honkin' database. So for V2 we're sharding, so we can split the load between multiple DB servers.

I've spent the last year developing this sharded solution. It's one license per server, but anyway that's taken care of since we're using VMs on Azure. Reason the question comes up now is because previously we were offering only to large institutions and setting up each one ourselves. Our next order of business is a self-service model where anyone with a browser can sign up and create their own database. Their databases will be much smaller and much more numerous than the large institutions.

We tried Azure SQL Database Elastic Pools. Performance was very disappointing, so we switched back to regular VMs.

Best Answer

I've worked on SQL Servers with 8 to 10 thousand databases on a single instance. It's not pretty.

Restarting the server can take an hour or more. Think about the recovery process for 10,000 databases.

You cannot use SQL Server Management Studio to reliably locate a database in the Object Explorer.

Backups are a nightmare, since for backups to be worthwhile you need to have a workable disaster recovery solution in place. Hopefully your team is great at scripting everything.
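As a rough illustration of the kind of scripting involved, here is a minimal sketch that takes a full backup of every online user database in turn. The folder X:\Backups\ is a placeholder, and treating database_id > 4 as "user databases" is a simplifying assumption. A real job would also need log backups, error handling, and regular restore tests.

```sql
-- Sketch only: full backup of each online user database, one at a time.
-- Assumption: X:\Backups\ exists and the service account can write to it.
DECLARE @name sysname, @sql nvarchar(max);
DECLARE @dir nvarchar(260) = N'X:\Backups\';  -- hypothetical path

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE database_id > 4            -- skip master, tempdb, model, msdb
      AND state_desc = N'ONLINE';

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- QUOTENAME protects the identifier; REPLACE escapes quotes in the file name.
    SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@name)
             + N' TO DISK = N''' + @dir + REPLACE(@name, N'''', N'''''')
             + N'.bak'' WITH INIT, CHECKSUM;';
    EXEC sys.sp_executesql @sql;

    FETCH NEXT FROM db_cursor INTO @name;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;
```

With thousands of databases, a serial loop like this can take a long time, which is part of why the backup window becomes a planning problem in its own right.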

You start doing things like naming databases with numbers, like M01022, and T9945. Trying to make sure you're working in the correct database, e.g. M001022 instead of M01022, can be maddening.

Allocating memory for that many databases can be excruciating; SQL Server ends up doing a lot of I/O, which can be a real drag on performance. Consider a system that records carbon use details across 4 tables for 10,000 companies. If you do that in one database, you only need 4 tables; if you do that in 10,000 databases, all of a sudden you need 40,000 tables in memory. The overhead of dealing with that number of tables in memory is substantial. Any query you design that will be run against those tables will require at least 10,000 plans in the plan cache if there are 10,000 databases in use.
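If you want to see how the plan cache is spread across databases on an existing instance, something like the following sketch may help. It assumes you have VIEW SERVER STATE permission and groups cached plans by the dbid plan attribute, so plans without a meaningful dbid may show up under a NULL name.

```sql
-- Sketch: number and total size of cached plans per database.
-- The dbid attribute is the database the plan was compiled against.
SELECT
    DB_NAME(CONVERT(int, pa.value))                  AS database_name,
    COUNT(*)                                         AS cached_plans,
    SUM(CONVERT(bigint, cp.size_in_bytes)) / 1048576 AS cached_mb
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute = N'dbid'
GROUP BY CONVERT(int, pa.value)
ORDER BY cached_plans DESC;
```

On a one-database-per-customer system, the same query text tends to appear once per active database in this list, since each database gets its own plan.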

The list above is just a small sampling of problems you'll need to plan for when operating at that kind of scale.

You'll probably run into things like the SQL Server service taking a very long time to start up, which can cause Service Controller errors. You can increase the service startup timeout yourself by creating the following registry entry:

Subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
Name:   ServicesPipeTimeout
Type:   REG_DWORD
Data:   The number of milliseconds before timeout occurs during service startup

For example, to wait 600 seconds (10 minutes) before the service times out, type 600000.

Since writing my answer I've realized the question is talking about Azure. Perhaps doing this on SQL Database is not so problematic; perhaps it is more problematic. Personally, I'd probably design a system using a single database, perhaps sharded vertically across multiple servers, but certainly not one-database-per-customer.

Related Solutions

Sql-server – standard language/interface for programmatic ETL in SQL Server

There is a tool that enables this - http://www.varigence.com/products/biml.html

There's a commercial version, but we also include some of the BIML functionality in BIDS Helper, a free tool. http://bidshelper.codeplex.com/

I'm happy to answer any questions that you might have about it.

This is a tool that my company provides.

Sql-server – Handling growing number of Tenants in Multi-tenant Database Architecture

At the lower end (500 tenants / 10000 users) this is how I did it. First, you have a "control" database that is global, central and contains all of the information about tenants and users (I really don't think you want to manage these as SQL auth logins). So imagine a database called "Control" with the following tables:

CREATE TABLE dbo.Instances
(
  InstanceID INT PRIMARY KEY,
  Connection VARCHAR(255)
  --, ...
);

INSERT dbo.Instances SELECT 1, 'PROD1\Instance1';
INSERT dbo.Instances SELECT 2, 'PROD2\Instance1'; -- InstanceID must be unique (primary key)
-- ...

CREATE TABLE dbo.Tenants
(
  TenantID INT PRIMARY KEY,
  Name NVARCHAR(255) NOT NULL UNIQUE,
  InstanceID INT -- Foreign key tells which instance this tenant's DB is on
  --, ...
);

INSERT dbo.Tenants SELECT 1, 'MyTenant', 1;
-- ...

CREATE TABLE dbo.Users
(
  UserID INT PRIMARY KEY,
  Username VARCHAR(320) NOT NULL UNIQUE,
  PasswordHash VARBINARY(64), -- because you never store plain text, right?
  TenantID INT -- foreign key
  --, ...
);

INSERT dbo.Users SELECT 1, 'foo@bar.com', 0x43..., 1;

In our case when we added a new tenant we would build the database dynamically, but not when the admin user clicked OK in the UI... we had a background job that pulled new databases off a queue every 5 minutes, set model to single_user, and then created each new database serially. We did this to (a) prevent the admin user from waiting for database creation and (b) to avoid two admin users trying to create a database at the same time or otherwise getting denied the ability to lock model (required when creating a new database).
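A minimal sketch of that queue-draining job might look like the following. The dbo.DatabaseQueue table and its columns are hypothetical (the original schema isn't shown), and the sketch leaves out the sizing, file placement, and recovery options that a real job would apply. Scheduling it every 5 minutes is assumed to be handled by something like SQL Server Agent.

```sql
-- Hypothetical queue of databases waiting to be created.
CREATE TABLE dbo.DatabaseQueue
(
  QueueID      INT IDENTITY(1,1) PRIMARY KEY,
  DatabaseName SYSNAME   NOT NULL,
  RequestedAt  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
  CreatedAt    DATETIME2 NULL      -- NULL means "not created yet"
);

-- Body of the background job: create pending databases one at a time.
DECLARE @QueueID int, @db sysname, @sql nvarchar(max);

WHILE 1 = 1
BEGIN
  SELECT TOP (1) @QueueID = QueueID, @db = DatabaseName
  FROM dbo.DatabaseQueue
  WHERE CreatedAt IS NULL
  ORDER BY QueueID;

  IF @@ROWCOUNT = 0 BREAK;         -- queue is empty

  IF DB_ID(@db) IS NULL            -- skip names that already exist
  BEGIN
    SET @sql = N'CREATE DATABASE ' + QUOTENAME(@db) + N';';
    EXEC sys.sp_executesql @sql;
  END

  UPDATE dbo.DatabaseQueue
  SET CreatedAt = SYSUTCDATETIME()
  WHERE QueueID = @QueueID;
END
```

Because only this one job ever issues CREATE DATABASE, two requests can never compete for the lock on model, which is the point of serializing the work.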

Databases were created with the name scheme Tenant000000xx where xx represented Tenants.TenantID. This made maintenance jobs quite easy, instead of having all kinds of databases named BurgerKing, McDonalds, KFC etc. Not that we were in fast food, just using that as an example.
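For illustration, a zero-padded name like that can be built from the TenantID as shown below. The 8-digit width is inferred from the Tenant000000xx pattern and is an assumption; note that RIGHT would silently truncate an ID longer than 8 digits.

```sql
-- Sketch: build a fixed-width database name from a tenant ID.
DECLARE @TenantID int = 42;  -- example value

SELECT N'Tenant'
     + RIGHT(REPLICATE(N'0', 8) + CONVERT(nvarchar(10), @TenantID), 8)
       AS DatabaseName;      -- Tenant00000042
```

Fixed-width names sort correctly and are easy to match with a pattern such as LIKE N'Tenant[0-9]%' in maintenance scripts.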

The reason we didn't pre-allocate thousands of databases as the comment suggested is that our admin users usually had some idea of how big the tenant would become, whether they were high priority, etc. So they had basic choices in the UI that would dictate their initial size and autogrowth settings, which disk subsystem their data/log files would go to, their recovery settings, backup schedule to hinge off of, and even smarts about which instance to deploy the database to in order to best balance usage (though our admins could override this). Once the database was created, the tenant table was updated with the chosen instance, an admin user was created for the tenant, and our admins were e-mailed the credentials to pass along to the new tenant.

If you're using a single point of entry, it is not feasible to allow multiple tenants to have users with the same username. We opted to use e-mail address, which - if all users work for the company and use their corporate e-mail address - should be fine. Though our solution eventually became more complex for two reasons:

We had consultants that worked for more than one of our clients, and needed access to multiple tenants
We had tenants who themselves were actually comprised of multiple tenants

So, we ended up with a TenantUsers table that allowed one user to be associated with multiple tenants.
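The original TenantUsers definition isn't shown, so the following is only a guess at a minimal shape for it in the Control database. The IsDefault column and the filtered index are hypothetical additions to support a default tenant per user.

```sql
-- Sketch: many-to-many link between users and tenants.
CREATE TABLE dbo.TenantUsers
(
  UserID    INT NOT NULL REFERENCES dbo.Users (UserID),
  TenantID  INT NOT NULL REFERENCES dbo.Tenants (TenantID),
  IsDefault BIT NOT NULL DEFAULT 0,  -- hypothetical: marks the user's default tenant
  PRIMARY KEY (UserID, TenantID)
);

-- Hypothetical rule: at most one default tenant per user.
CREATE UNIQUE INDEX UX_TenantUsers_Default
  ON dbo.TenantUsers (UserID)
  WHERE IsDefault = 1;
```

With a link table like this, the TenantID column on dbo.Users becomes redundant and could be dropped, though whether the original system did so isn't stated.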

Initially when a user logs in, the app will know the connection string for the control database only. When a login is successful, it can then build a connection string based on the information it found. E.g.

SELECT i.Connection
  FROM dbo.Instances AS i
  INNER JOIN dbo.Tenants AS t
  ON i.InstanceID = t.InstanceID
  INNER JOIN dbo.TenantUsers AS u
  ON t.TenantID = u.TenantID -- join on Tenants; Instances has no TenantID column
  WHERE u.UserID = @UserID;

Now the app could connect to the user's database (each user had a default tenant) or the user could select from any of the tenants they could access. The app would then simply retrieve the new connection string, and redirect to the home page for that tenant.

If you get into this 10MM user area you propose, you'll definitely need this to be balanced better. You may want to federate the application so that they have different points of entry connecting to different control databases. If you give each tenant a subdomain (e.g. TenantName.YourApplicationDomain.com) then you can do this behind the scenes with DNS/routing without interrupting them when you need to scale out further.

There is a lot more to this - like @Darin I am only scratching the surface here. Let me know if you need a non-free consult. :-)