SQL Server – Multiple Databases for One Web Solution Design Pattern

design-pattern, multi-tenant, sql-server

I am working on a new project and I am in the process of designing the Microsoft SQL Server database behind the system.

The system will need to support an unlimited number of customers, and each customer has multiple users. Each customer will have their own subset of data, and the amount of data is going to be vast: the worst-case scenario is on the order of 5 million or more rows per month, across multiple tables, per customer.

To optimize performance per customer, I was considering a design with one system database and multiple customer databases. My thinking is mainly that I see no reason why one customer should have to query through other customers' data. The issue, as I see it, is that this design would require relationships between the customer databases and the system database (cross-database relationships).

Am I overthinking the performance issue?

Best Answer

Given that you haven't started coding the app or DB yet, but do have a sense of the scope of the data (even if that will change somewhat as time goes on): no, you are not over-thinking this. I wouldn't spend months pondering this one aspect of the system design, but not thinking about it at all would be a bit foolish, since refactoring app code is far easier than refactoring databases (databases carry "state").

My preference for multi-tenant systems, especially if dealing with large amounts of data and/or great variation of data sizes between clients, is to go the Database-per-Customer route. There are obviously advantages and disadvantages to any approach, but here is what I see from having gone through this a few times and trying it both ways:

ADVANTAGES

  • One customer does not adversely affect another performance-wise. This is mainly an issue when customers have greatly varied amounts of data, since parameter sniffing will cache the plan for the first CustomerID passed into a Stored Procedure, and that CustomerID might have 10 million rows whereas the next call to the Stored Procedure is for a customer with 1,000 rows (see the sketch after this list).
  • You don't need a CustomerID field in every table (lookup tables aside); carrying that field everywhere complicates indexing and performance tuning.
  • You don't need to worry as much about cross-customer data privacy issues caused by forgetting to include the CustomerID field in every JOIN and WHERE condition (either Cartesian products or simply Customer A seeing Customer B's data).
  • When a customer leaves, you just shut off their DB and archive it. No need to worry about purging obsolete data.
  • If a customer needs their data, you can simply send them their DB without worrying about cross-customer data issues.
  • You can host customer DBs geographically closer to each particular customer, especially when using a cloud-based system like Azure SQL Database, AWS, etc.
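
To make the parameter sniffing and CustomerID points above more concrete, here is a minimal sketch of what a query tends to look like in the shared-database shape versus the per-customer-database shape. The object names (dbo.Orders, dbo.GetRecentOrders) are made up for illustration, not taken from the question:

    -- Shared DB (hypothetical names): every table, query, and index carries CustomerID,
    -- and the plan compiled for the first @CustomerID value is reused for later calls,
    -- even when the next customer has a wildly different row count.
    CREATE PROCEDURE dbo.GetRecentOrders
        @CustomerID int
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT o.OrderID, o.OrderDate, o.TotalAmount
        FROM   dbo.Orders AS o
        WHERE  o.CustomerID = @CustomerID   -- must never be forgotten, here or in any JOIN
        AND    o.OrderDate >= DATEADD(MONTH, -1, GETDATE());
        -- Typical band-aids: OPTION (RECOMPILE), OPTIMIZE FOR, etc., each with its own cost.
    END;
    GO

    -- Database-per-customer: the same procedure is deployed into each customer's own DB
    -- (shown here only for comparison), with no CustomerID column and no CustomerID
    -- predicate; the cached plan only ever reflects that one customer's data distribution.
    CREATE PROCEDURE dbo.GetRecentOrders
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT o.OrderID, o.OrderDate, o.TotalAmount
        FROM   dbo.Orders AS o
        WHERE  o.OrderDate >= DATEADD(MONTH, -1, GETDATE());
    END;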

DISADVANTAGES

  • App code connectivity needs to take into account the Database, not just Server
  • More potential for Connection Pool Fragmentation if you include the Database in the ConnectionString rather than changing the DB context after making the Connection
  • DB code releases get a little more complicated since they need to be applied across all DBs (though there are ways to minimize this complication)
  • Increased potential for differing code/schema between customers if you don't have a solid release process (see the item directly above), which would be a maintenance nightmare
  • Support gets slightly more complicated since you now need to know which DB to go to instead of just needing a CustomerID
  • If the customer model changes such that, in the future, a "parent" company of Customer A and Customer B can be the true customer, then those customers live in two separate DBs and are harder to manage and report on as a single entity.
  • Backups get a little more complicated now that you have 200+ databases (though there are ways to minimize this complication; a sketch of the typical per-DB loop follows this list)
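
As a rough illustration of those "apply it to every customer DB" chores (releases and backups), here is a sketch of the kind of loop such a job ends up running. The CustomerDB_ naming convention and the backup share path are assumptions for the example, not anything from the question, and real deployments usually lean on tooling (sqlcmd, SSDT, a maintenance solution) rather than hand-rolled cursors:

    -- Hypothetical maintenance loop: find every customer database by naming convention
    -- and build one command per DB. Shown here as a backup; a release script would be
    -- pushed out across the same list of databases.
    DECLARE @DbName sysname,
            @Sql    nvarchar(max);

    DECLARE CustomerDbs CURSOR LOCAL FAST_FORWARD FOR
        SELECT name
        FROM   sys.databases
        WHERE  name LIKE N'CustomerDB[_]%'   -- assumed naming convention
        AND    state_desc = N'ONLINE';

    OPEN CustomerDbs;
    FETCH NEXT FROM CustomerDbs INTO @DbName;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @Sql = N'BACKUP DATABASE ' + QUOTENAME(@DbName)
                 + N' TO DISK = N''\\backupshare\sql\' + @DbName + N'.bak'''
                 + N' WITH COMPRESSION, CHECKSUM;';   -- hypothetical UNC path

        EXEC sys.sp_executesql @Sql;

        FETCH NEXT FROM CustomerDbs INTO @DbName;
    END;

    CLOSE CustomerDbs;
    DEALLOCATE CustomerDbs;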

Regarding the thought of having a single SystemData DB and the related concern of not being able to FK across DBs: how much system lookup data do you expect to have? Assuming it is not 1 GB of data, you can create a Schema in each customer DB called "System" or "Common" or "Lookups" or whatever, and load that data into that schema in every customer DB. Then you can FK to those tables. And by having one DB per customer you will already need a release process that updates the code and schema of each customer DB in exactly the same manner (so that they always stay in sync with a common structure), so updating lookup tables fits right into that process. The main downside is that you are duplicating this data in each customer DB, so 200 times perhaps? But that is over the course of years; you aren't starting out with 200 customers, right?
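
A minimal sketch of that layout inside one customer DB, using assumed names (a "Lookups" schema, an OrderStatus lookup, a dbo.Orders table) purely for illustration:

    -- Inside each customer database (all names are assumptions for the sketch):
    CREATE SCHEMA Lookups;
    GO

    -- Local copy of the shared/system lookup data, kept in sync by the normal release process.
    CREATE TABLE Lookups.OrderStatus
    (
        OrderStatusID tinyint      NOT NULL CONSTRAINT PK_OrderStatus PRIMARY KEY,
        StatusName    nvarchar(50) NOT NULL
    );
    GO

    -- Customer-specific tables can now FK to the lookup data locally, which is
    -- not possible when the lookup tables live only in a separate "system" DB.
    CREATE TABLE dbo.Orders
    (
        OrderID       int          IDENTITY(1, 1) NOT NULL CONSTRAINT PK_Orders PRIMARY KEY,
        OrderDate     datetime2(0) NOT NULL CONSTRAINT DF_Orders_OrderDate DEFAULT (SYSUTCDATETIME()),
        OrderStatusID tinyint      NOT NULL
            CONSTRAINT FK_Orders_OrderStatus REFERENCES Lookups.OrderStatus (OrderStatusID)
    );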


Also, please keep in mind that Table Partitioning is not "sharding" (as you might be thinking of in terms of MongoDB or PostgreSQL/Greenplum, etc). It is not meant to magically solve the problem of multi-tenancy or distributed data. It is meant to make bulk data loads and unloads quick and efficient, and it does quite well at that intended purpose. Occasionally it even helps queries on very large tables (1 billion rows or more). But indexes and statistics are still per-table, not per-partition (though you can create filtered indexes/statistics per partition key to help make up for this). And while you can (in more recent versions) rebuild indexes per partition, starting with SQL Server 2012 partitioned indexes no longer get a "free" UPDATE STATISTICS WITH FULLSCAN when doing a REBUILD. Non-partitioned indexes still do :-).
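
For reference, those pieces look roughly like this; the monthly boundaries, filegroup, and object names are invented for the sketch:

    -- Hypothetical monthly partitioning of one large table.
    CREATE PARTITION FUNCTION pfMonthly (datetime2(0))
        AS RANGE RIGHT FOR VALUES (N'2024-01-01', N'2024-02-01', N'2024-03-01');
    GO

    CREATE PARTITION SCHEME psMonthly
        AS PARTITION pfMonthly ALL TO ([PRIMARY]);
    GO

    CREATE TABLE dbo.Readings
    (
        ReadingID    bigint         IDENTITY(1, 1) NOT NULL,
        ReadingDate  datetime2(0)   NOT NULL,
        ReadingValue decimal(18, 4) NOT NULL,
        CONSTRAINT PK_Readings PRIMARY KEY CLUSTERED (ReadingDate, ReadingID)
    ) ON psMonthly (ReadingDate);
    GO

    -- Indexes can be rebuilt per partition (partition 2 here is January 2024,
    -- given the RANGE RIGHT boundaries above).
    ALTER INDEX PK_Readings ON dbo.Readings
        REBUILD PARTITION = 2;
    GO

    -- Statistics remain per index/table, so a filtered statistic scoped to one
    -- partition's key range is the usual workaround; and since the partitioned
    -- REBUILD above no longer refreshes statistics with a full scan, that has to
    -- be requested explicitly.
    CREATE STATISTICS st_Readings_2024_01
        ON dbo.Readings (ReadingDate)
        WHERE ReadingDate >= '2024-01-01' AND ReadingDate < '2024-02-01';
    GO

    UPDATE STATISTICS dbo.Readings st_Readings_2024_01 WITH FULLSCAN;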

For what it's worth, I worked on a multi-tenant system that did attempt to use table partitioning in this manner (it was not my idea). It caused a lot of headaches, frustration, and extra work trying to fix weird execution plans, and after a few years it was mostly agreed that it probably shouldn't have been done.