Database Design – Multi-Tenant Data Architecture Performance for 10000 Tenants


We are trying to decide on a database design for a new web application. We expect to have close to 10,000 tenants and would like to keep their data in separate databases if it makes sense to do so. Each tenant will have a database of around 20 MB; they are mostly tracking the personal data of 50 to 100 youth and 100 to 300 adults, plus events, attendance, awards, etc. I do not know how many users in total would log in at once, but each tenant could have several.

As I understand them, my options are:

A shared-nothing approach, giving each tenant a separate database.
A schema approach, giving each tenant separate tables.
A shared-everything approach, where each tenant has its own TenantID.

I would ideally like the shared-nothing approach, but I am unclear about how many databases you can have in SQL Server before you run into performance issues. Managing multiple databases is not a major concern for us, but if SQL Server becomes unresponsive due to the number of databases, that is the kind of information I am seeking. Keep in mind the databases are small, 20 MB on average. I also need to be able to easily restore backups for individual tenants. I have read that you can have roughly 32,000 databases in SQL Server, but other posts suggest you shouldn't try to get anywhere close to this number. Even 1,000 to 2,000 databases can cause performance issues, but it was unclear what database sizes caused the issues. If anyone has used thousands of small databases in a SQL Server environment, please let me know.

Performance-wise, which approach would be the best route for us to take? Please also take into consideration that having multiple SQL Server licenses would not be ideal cost-wise, but if it is necessary, please let me know.

I have also read several posts here and the Multi-Tenant Data Architecture article, but I haven't seen a clear winner for a large number of tenants with really small databases.

If anyone has experience with something similar, please let me know how you handled it and whether it is working out well. Any real-world experience or advice would be greatly appreciated.

Thank you.

Best Answer

Design a shared-everything system. This can be deployed in a shared-nothing way, i.e. with each tenant in its own database, should that prove desirable. The extra development effort to type the additional predicate is very small and easy to do up front. The reverse, where you refactor a shared-nothing system to become multi-tenant, will be a significant project and difficult to get right retrospectively. You will need some scripts or ETL packages to load/unload one tenant from a database. This will help with size and load balancing, privacy concerns and the like.
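A minimal sketch of what the "additional predicate" looks like in practice, assuming a hypothetical dbo.Members table for the youth/adult records described in the question. Putting TenantID first in the clustered key is one reasonable choice (not the only one): each tenant's rows are stored together, and unloading a single tenant becomes a simple range operation.

```sql
-- Hypothetical shared-everything table: every tenant-owned table carries TenantID.
-- TenantID leads the clustered primary key, so each tenant's rows are stored together.
CREATE TABLE dbo.Members
(
  TenantID INT           NOT NULL,
  MemberID INT           NOT NULL,
  FullName NVARCHAR(200) NOT NULL,
  IsYouth  BIT           NOT NULL,
  CONSTRAINT PK_Members PRIMARY KEY CLUSTERED (TenantID, MemberID)
);

-- The "additional predicate": every query is filtered on the caller's tenant.
DECLARE @TenantID INT = 1;

SELECT MemberID, FullName
  FROM dbo.Members
  WHERE TenantID = @TenantID
    AND IsYouth = 1;
```

If a tenant is later moved into its own database, the same table and the same queries work unchanged; the predicate simply always matches one TenantID there.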

The schema-based approach has all the disadvantages of single-database and none of the advantages of multi-tenant. You will end up with a lot of dynamic code, stored procedures will be difficult to get right and statement recompiles will be ubiquitous, increasing response times.
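To illustrate why the schema-per-tenant approach drags in dynamic code: a schema name cannot be passed as a parameter, so a shared procedure has to splice it into the statement text. This is a sketch that assumes a hypothetical layout in which every tenant schema contains its own Members table; each distinct schema name produces different statement text, and therefore a separate entry in the plan cache.

```sql
-- Hypothetical layout: one schema per tenant (Tenant1.Members, Tenant2.Members, ...).
-- The schema name cannot be a parameter, so it is spliced into dynamic SQL.
CREATE PROCEDURE dbo.GetYouthMembers
  @SchemaName SYSNAME
AS
BEGIN
  SET NOCOUNT ON;

  -- QUOTENAME guards against injection through the schema name.
  DECLARE @sql NVARCHAR(MAX) =
      N'SELECT MemberID, FullName FROM '
    + QUOTENAME(@SchemaName)
    + N'.Members WHERE IsYouth = 1;';

  -- Each distinct schema name yields distinct statement text, and so its own cached plan.
  EXEC sys.sp_executesql @sql;
END;
GO
```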

Invariably some of your tenants will be much larger than others. My experience is that you will end up with a handful of tenants representing 30%-70% of the total data volume. These will each have their own DB and, perhaps, infrastructure. They will consume 60% of your support and maintenance time. The others will all fit into one or two databases.

I've run development instances with roughly 1,000 databases online sharing about 8GB of memory. This worked well enough for dev. I'd never try this in production as the risk of problems snowballing would be too high.

If you're feeling brave, you may like to experiment with auto-close on one or more databases. This will save some resources on the instance. This is definitely not recommended; I mention it merely as an observation.
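For reference only, since the answer explicitly advises against it: auto-close is a per-database option set with ALTER DATABASE. The sketch below creates a throwaway database (the name TenantDemo is hypothetical) purely so the statement has something to act on, then lists the databases that have the option enabled.

```sql
-- Throwaway database so the example is self-contained; TenantDemo is a hypothetical name.
CREATE DATABASE TenantDemo;
GO

-- Not recommended: the database is shut down cleanly when its last user disconnects,
-- and must be reopened (with startup cost) on the next connection.
ALTER DATABASE TenantDemo SET AUTO_CLOSE ON;

-- Which databases on the instance currently have auto-close enabled?
SELECT name, is_auto_close_on
  FROM sys.databases
  WHERE is_auto_close_on = 1;
```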

Related Solutions

Sql-server – Handling growing number of Tenants in Multi-tenant Database Architecture

At the lower end (500 tenants / 10000 users) this is how I did it. First, you have a "control" database that is global, central and contains all of the information about tenants and users (I really don't think you want to manage these as SQL auth logins). So imagine a database called "Control" with the following tables:

CREATE TABLE dbo.Instances
(
  InstanceID INT PRIMARY KEY,
  Connection VARCHAR(255)
  --, ...
);

INSERT dbo.Instances SELECT 1, 'PROD1\Instance1';
INSERT dbo.Instances SELECT 2, 'PROD2\Instance1';
-- ...

CREATE TABLE dbo.Tenants
(
  TenantID INT PRIMARY KEY,
  Name NVARCHAR(255) NOT NULL UNIQUE,
  InstanceID INT REFERENCES dbo.Instances (InstanceID) -- foreign key: which instance this tenant's DB is on
  --, ...
);

INSERT dbo.Tenants SELECT 1, 'MyTenant', 1;
-- ...

CREATE TABLE dbo.Users
(
  UserID INT PRIMARY KEY,
  Username VARCHAR(320) NOT NULL UNIQUE,
  PasswordHash VARBINARY(64), -- because you never store plain text, right?
  TenantID INT REFERENCES dbo.Tenants (TenantID) -- foreign key
  --, ...
);

INSERT dbo.Users SELECT 1, 'foo@bar.com', 0x43..., 1;

In our case, when we added a new tenant we would build the database dynamically, but not when the admin user clicked OK in the UI. Instead, we had a background job that pulled new database requests off a queue every 5 minutes, set model to single_user, and then created each new database serially. We did this to (a) prevent the admin user from waiting for database creation and (b) avoid two admin users trying to create a database at the same time or otherwise being denied the ability to lock model (required when creating a new database).
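The queue-plus-background-job pattern can be sketched in T-SQL roughly as follows. The queue table dbo.TenantDatabaseQueue and its columns are assumptions, as is the exact naming expression (it follows the Tenant000000xx scheme described in the answer, padded to eight digits). The author's job also set model to single_user before creating databases; that step is left out of this sketch. Running the loop from a single scheduled job (for example, a SQL Server Agent job every 5 minutes) is what keeps creation serial.

```sql
-- Hypothetical queue table in the Control database.
CREATE TABLE dbo.TenantDatabaseQueue
(
  TenantID    INT       NOT NULL PRIMARY KEY,
  RequestedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
  CreatedAt   DATETIME2 NULL      -- NULL = still waiting to be created
);
GO

-- Body of the background job: take pending requests one at a time,
-- oldest first, and create each database serially.
DECLARE @TenantID INT, @DbName SYSNAME, @sql NVARCHAR(MAX);

WHILE 1 = 1
BEGIN
  SELECT TOP (1) @TenantID = TenantID
    FROM dbo.TenantDatabaseQueue
    WHERE CreatedAt IS NULL
    ORDER BY RequestedAt;

  IF @@ROWCOUNT = 0 BREAK;  -- queue is empty

  -- "Tenant" + TenantID zero-padded to 8 digits, e.g. 42 -> Tenant00000042.
  SET @DbName = N'Tenant' + RIGHT(N'00000000' + CAST(@TenantID AS NVARCHAR(10)), 8);
  SET @sql = N'CREATE DATABASE ' + QUOTENAME(@DbName) + N';';
  EXEC sys.sp_executesql @sql;

  UPDATE dbo.TenantDatabaseQueue
    SET CreatedAt = SYSUTCDATETIME()
    WHERE TenantID = @TenantID;
END;
```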

Databases were created with the name scheme Tenant000000xx where xx represented Tenants.TenantID. This made maintenance jobs quite easy, instead of having all kinds of databases named BurgerKing, McDonalds, KFC etc. Not that we were in fast food, just using that as an example.
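One practical payoff of the fixed naming scheme is that a maintenance job can discover every tenant database with a single pattern instead of a hand-maintained list. A sketch, assuming eight-digit names such as Tenant00000042 and using DBCC CHECKDB as a stand-in for whatever per-database task the job actually performs:

```sql
DECLARE @name SYSNAME, @sql NVARCHAR(MAX);

-- Every online database matching the TenantNNNNNNNN pattern.
DECLARE tenant_dbs CURSOR LOCAL FAST_FORWARD FOR
  SELECT name
    FROM sys.databases
    WHERE name LIKE N'Tenant[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
      AND state_desc = N'ONLINE';

OPEN tenant_dbs;
FETCH NEXT FROM tenant_dbs INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
  -- Example task: integrity check of each tenant database.
  SET @sql = N'DBCC CHECKDB (' + QUOTENAME(@name) + N') WITH NO_INFOMSGS;';
  EXEC sys.sp_executesql @sql;

  FETCH NEXT FROM tenant_dbs INTO @name;
END;

CLOSE tenant_dbs;
DEALLOCATE tenant_dbs;
```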

The reason we didn't pre-allocate thousands of databases, as the comment suggested, is that our admin users usually had some idea of how big the tenant would become, whether it was high priority, etc. So they had basic choices in the UI that would dictate the initial size and autogrowth settings, which disk subsystem the data/log files would go to, the recovery settings, the backup schedule to hinge off of, and even smarts about which instance to deploy the database to in order to best balance usage (though our admins could override this). Once the database was created, the Tenants table was updated with the chosen instance, an admin user was created for the tenant, and our admins were e-mailed the credentials to pass along to the new tenant.
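As an illustration of how those UI choices could map onto DDL, here is a sketch for a hypothetical "large, high-priority" tenant. The database name, file paths (standing in for separate data and log disk subsystems), sizes and recovery model are all placeholder assumptions, not values from the answer, and the directories must already exist on the target instance.

```sql
-- Placeholder choices for a hypothetical large tenant (TenantID 42).
CREATE DATABASE Tenant00000042
ON PRIMARY
  ( NAME = Tenant00000042_data,
    FILENAME = N'D:\SQLData\Tenant00000042.mdf',  -- data-file disk subsystem
    SIZE = 256MB,
    FILEGROWTH = 64MB )
LOG ON
  ( NAME = Tenant00000042_log,
    FILENAME = N'L:\SQLLogs\Tenant00000042.ldf',  -- log-file disk subsystem
    SIZE = 64MB,
    FILEGROWTH = 64MB );
GO

-- Recovery setting chosen in the UI; FULL allows point-in-time restores from log backups.
ALTER DATABASE Tenant00000042 SET RECOVERY FULL;
```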

If you're using a single point of entry, it is not feasible to allow multiple tenants to have users with the same username. We opted to use e-mail addresses, which, if all users work for the company and use their corporate e-mail address, should be fine. Our solution eventually became more complex, though, for two reasons:

We had consultants who worked for more than one of our clients and needed access to multiple tenants.
We had tenants who were themselves composed of multiple tenants.

So, we ended up with a TenantUsers table that allowed one user to be associated with multiple tenants.
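A minimal sketch of such a mapping table, assuming it lives in the Control database next to dbo.Tenants and dbo.Users. The IsDefault column and the filtered unique index (at most one default tenant per user) are assumptions layered on the answer's remark that each user had a default tenant.

```sql
-- Many-to-many mapping: one user can belong to several tenants.
CREATE TABLE dbo.TenantUsers
(
  TenantID  INT NOT NULL REFERENCES dbo.Tenants (TenantID),
  UserID    INT NOT NULL REFERENCES dbo.Users (UserID),
  IsDefault BIT NOT NULL DEFAULT 0,  -- hypothetical: tenant to open after login
  CONSTRAINT PK_TenantUsers PRIMARY KEY (TenantID, UserID)
);

-- At most one default tenant per user.
CREATE UNIQUE INDEX UX_TenantUsers_Default
  ON dbo.TenantUsers (UserID)
  WHERE IsDefault = 1;
```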

Initially when a user logs in, the app will know the connection string for the control database only. When a login is successful, it can then build a connection string based on the information it found. E.g.

SELECT i.Connection
  FROM dbo.Instances AS i
  INNER JOIN dbo.Tenants AS t
  ON i.InstanceID = t.InstanceID
  INNER JOIN dbo.TenantUsers AS u
  ON t.TenantID = u.TenantID
  WHERE u.UserID = @UserID;

Now the app could connect to the user's database (each user had a default tenant), or the user could select from any of the tenants they could access. The app would then simply retrieve the new connection string and redirect to the home page for that tenant.
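When a user switches tenant, the lookup should also confirm that the user is actually mapped to the requested tenant. A sketch of that as a procedure in the Control database: the procedure name is hypothetical, it relies only on the TenantID and UserID columns of TenantUsers used in the query above, and the computed DatabaseName assumes the Tenant000000xx naming convention described earlier, padded to eight digits.

```sql
-- Hypothetical procedure: return connection details for a tenant the user picked,
-- but only if the user is mapped to that tenant in TenantUsers.
CREATE PROCEDURE dbo.GetTenantConnection
  @UserID   INT,
  @TenantID INT
AS
BEGIN
  SET NOCOUNT ON;

  SELECT i.Connection,
         DatabaseName = N'Tenant' + RIGHT(N'00000000' + CAST(t.TenantID AS NVARCHAR(10)), 8)
    FROM dbo.TenantUsers AS tu
    INNER JOIN dbo.Tenants AS t
      ON t.TenantID = tu.TenantID
    INNER JOIN dbo.Instances AS i
      ON i.InstanceID = t.InstanceID
    WHERE tu.UserID = @UserID
      AND tu.TenantID = @TenantID;  -- no row returned = no access
END;
GO
```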

If you get into this 10MM user area you propose, you'll definitely need this to be balanced better. You may want to federate the application so that they have different points of entry connecting to different control databases. If you give each tenant a subdomain (e.g. TenantName.YourApplicationDomain.com) then you can do this behind the scenes with DNS/routing without interrupting them when you need to scale out further.

There is a lot more to this; like @Darin, I am only scratching the surface here. Let me know if you need a non-free consult. :-)

Sql-server – SQL Server Database Diagram in SQL Management Studio creating multiple schemas

Database Diagrams is one of the three Visual Database Tools.

After reading through the Visual Database Tools F1 help, which makes no mention of choosing a schema before creating a new table, I concluded that the designer automatically creates new tables in the dbo schema.

To work around this and still use Database Diagrams, open the Properties window for your table.

Create a new schema:


Name your schema, define permissions etc.


Change the schema of your table in design view using the Properties window.


For me this approach is still faster than writing scripts; I didn't have to switch to a different designer tool, and it works.
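For comparison, the scripted equivalent of these designer steps is short. A sketch in which the schema name Sales and both tables are hypothetical; dbo.Orders stands in for a table the designer already created in dbo.

```sql
-- CREATE SCHEMA must be the first statement in its batch.
CREATE SCHEMA Sales AUTHORIZATION dbo;
GO

-- Create a new table directly in the new schema...
CREATE TABLE Sales.Customers
(
  CustomerID INT           NOT NULL PRIMARY KEY,
  Name       NVARCHAR(100) NOT NULL
);

-- ...or move an existing table out of dbo.
-- dbo.Orders stands in for a table the designer created in dbo.
CREATE TABLE dbo.Orders (OrderID INT NOT NULL PRIMARY KEY);
ALTER SCHEMA Sales TRANSFER dbo.Orders;
```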

Hope it helps somebody.
