At the lower end (500 tenants / 10000 users) this is how I did it. First, you have a "control" database that is global, central and contains all of the information about tenants and users (I really don't think you want to manage these as SQL auth logins). So imagine a database called "Control" with the following tables:
CREATE TABLE dbo.Instances
(
InstanceID INT PRIMARY KEY,
Connection VARCHAR(255)
--, ...
);
INSERT dbo.Instances SELECT 1, 'PROD1\Instance1';
INSERT dbo.Instances SELECT 2, 'PROD2\Instance1';
-- ...
CREATE TABLE dbo.Tenants
(
TenantID INT PRIMARY KEY,
Name NVARCHAR(255) NOT NULL UNIQUE,
InstanceID INT -- Foreign key tells which instance this tenant's DB is on
--, ...
);
INSERT dbo.Tenants SELECT 1, 'MyTenant', 1;
-- ...
CREATE TABLE dbo.Users
(
UserID INT PRIMARY KEY,
Username VARCHAR(320) NOT NULL UNIQUE,
PasswordHash VARBINARY(64), -- because you never store plain text, right?
TenantID INT -- foreign key
--, ...
);
INSERT dbo.Users SELECT 1, 'foo@bar.com', 0x43..., 1;
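For completeness, the foreign keys hinted at in the column comments above could be declared explicitly. This is a minimal sketch; the constraint names are my own, but the tables and columns come straight from the definitions above:

```sql
-- Enforce the relationships noted in the comments above.
ALTER TABLE dbo.Tenants
    ADD CONSTRAINT FK_Tenants_Instances
    FOREIGN KEY (InstanceID) REFERENCES dbo.Instances (InstanceID);

ALTER TABLE dbo.Users
    ADD CONSTRAINT FK_Users_Tenants
    FOREIGN KEY (TenantID) REFERENCES dbo.Tenants (TenantID);
```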
In our case, when we added a new tenant we built the database dynamically, but not at the moment the admin user clicked OK in the UI. Instead, a background job pulled new databases off a queue every 5 minutes, set model to SINGLE_USER, and created each new database serially. We did this to (a) spare the admin user from waiting on database creation, and (b) avoid two admin users trying to create a database at the same time, or otherwise being denied the lock on model that SQL Server requires when creating a new database.
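A stripped-down version of that background job might look like the following. The queue table and its columns are assumptions for illustration; only the serial CREATE DATABASE loop reflects what the answer describes:

```sql
-- Hypothetical queue table, populated when an admin clicks OK in the UI:
-- CREATE TABLE dbo.TenantQueue (TenantID INT PRIMARY KEY, Processed BIT NOT NULL DEFAULT 0);

DECLARE @TenantID INT, @db SYSNAME, @sql NVARCHAR(MAX);

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT TenantID FROM dbo.TenantQueue WHERE Processed = 0 ORDER BY TenantID;

OPEN c;
FETCH NEXT FROM c INTO @TenantID;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build the Tenant000000xx name from the tenant's ID.
    SET @db = N'Tenant' + RIGHT(N'00000000' + CONVERT(NVARCHAR(10), @TenantID), 8);
    SET @sql = N'CREATE DATABASE ' + QUOTENAME(@db) + N';';
    EXEC sys.sp_executesql @sql;   -- databases are created serially, one at a time

    UPDATE dbo.TenantQueue SET Processed = 1 WHERE TenantID = @TenantID;
    FETCH NEXT FROM c INTO @TenantID;
END
CLOSE c;
DEALLOCATE c;
```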
Databases were created with the naming scheme Tenant000000xx, where xx represented Tenants.TenantID. This made maintenance jobs quite easy, instead of having all kinds of databases named BurgerKing, McDonalds, KFC, etc. Not that we were in fast food; I'm just using that as an example.
The reason we didn't pre-allocate thousands of databases, as the comment suggested, is that our admin users usually had some idea of how big the tenant would become, whether they were high priority, etc. So they had basic choices in the UI that dictated the initial size and autogrowth settings, which disk subsystem the data/log files would go to, the recovery settings, the backup schedule to hinge off of, and even smarts about which instance to deploy the database to in order to best balance usage (though our admins could override this). Once the database was created, the tenant table was updated with the chosen instance, an admin user was created for the tenant, and our admins were e-mailed the credentials to pass along to the new tenant.
If you're using a single point of entry, it is not feasible to allow multiple tenants to have users with the same username. We opted to use e-mail address, which - if all users work for the company and use their corporate e-mail address - should be fine. Though our solution eventually became more complex for two reasons:
- We had consultants who worked for more than one of our clients, and needed access to multiple tenants
- We had tenants who themselves were actually comprised of multiple tenants
So, we ended up with a TenantUsers table that allowed one user to be associated with multiple tenants.
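The answer doesn't show that table's definition, but a plausible shape, reusing the column names from the schema above, would be:

```sql
CREATE TABLE dbo.TenantUsers
(
    TenantID  INT NOT NULL,              -- references dbo.Tenants
    UserID    INT NOT NULL,              -- references dbo.Users
    IsDefault BIT NOT NULL DEFAULT 0,    -- supports the "default tenant" per user
    PRIMARY KEY (TenantID, UserID)
);
```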
Initially when a user logs in, the app will know the connection string for the control database only. When a login is successful, it can then build a connection string based on the information it found. E.g.
SELECT i.Connection
FROM dbo.Instances AS i
INNER JOIN dbo.Tenants AS t
ON i.InstanceID = t.InstanceID
INNER JOIN dbo.TenantUsers AS u
ON t.TenantID = u.TenantID
WHERE u.UserID = @UserID;
Now the app could connect to the user's default database (each user had a default tenant), or the user could select from any of the tenants they had access to. The app would then simply retrieve the new connection string and redirect to the home page for that tenant.
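When the user picks a different tenant, the same lookup can be scoped to that choice. This variant is my own extension of the query above, not something shown in the answer:

```sql
SELECT i.Connection
FROM dbo.Instances AS i
INNER JOIN dbo.Tenants AS t
    ON i.InstanceID = t.InstanceID
INNER JOIN dbo.TenantUsers AS u
    ON t.TenantID = u.TenantID
WHERE u.UserID = @UserID
  AND t.TenantID = @TenantID;  -- the tenant the user selected
```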
If you get into the 10MM-user range you propose, you'll definitely need to balance this better. You may want to federate the application so that different points of entry connect to different control databases. If you give each tenant a subdomain (e.g. TenantName.YourApplicationDomain.com), you can handle this behind the scenes with DNS/routing without interrupting them when you need to scale out further.
There is a lot more to this - like @Darin I am only scratching the surface here. Let me know if you need a non-free consult. :-)
You seem to use the word "database" when you mean "table." One database for all data is best. Most modern databases (PostgreSQL, MySQL, Oracle, DB2, SQL Server, etc.) can handle a large amount of data in one database. It sounds like you might want to access all customer and/or all invoice information together, so keep them together in one database.
Data should be split into separate tables to normalize it as needed. Fully normalized data (each piece of information stored only once) is recommended by every database textbook I have read. Read about (or at least Google) "database normalization" to understand that concept well.
One normalized approach would be to have one customer table indexed by a customer id with all fields common to every customer. The customer table would also have a field identifying the customer type: customer1, 2, ... 10. A separate table would hold customer-1-type fields, indexed by customer id. Another table would have customer-2-type fields, etc. The application using the database would be designed to look for data in the customer-1 table if the customer-type is customer1, look for data in customer-2 table if customer-type is customer2, etc.
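A minimal sketch of that layout, with hypothetical column names (only customer_id and customer_type come from the description above):

```sql
CREATE TABLE customer
(
    customer_id   INT PRIMARY KEY,
    customer_type INT NOT NULL,           -- 1..10, selects the extension table below
    name          VARCHAR(255) NOT NULL   -- ...plus other fields common to every customer
);

-- One extension table per customer type, keyed by the same customer id.
CREATE TABLE customer_type_1
(
    customer_id INT PRIMARY KEY REFERENCES customer (customer_id),
    type_1_field VARCHAR(255)             -- hypothetical type-1-specific column
);

CREATE TABLE customer_type_2
(
    customer_id INT PRIMARY KEY REFERENCES customer (customer_id),
    type_2_field VARCHAR(255)             -- hypothetical type-2-specific column
);
```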
Presumably, the relationship between customer and invoice is one to many. An invoice table could have a foreign key linking to the customer table. If invoice data varied by customer type, then that data could be placed in separate tables: invoice-customer-type-1, invoice-customer-type-2, etc. When gathering invoice data, find the customer id for the invoice from the invoice table, look up the customer type from the customer table, then get data from the appropriate invoice-customer-type table.
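Following that lookup order (invoice, then customer, then the type-specific invoice table), a hypothetical query for type-1 customers might look like this; all table and column names are assumptions:

```sql
SELECT i.invoice_id,
       c.customer_type,
       it.type_1_invoice_field          -- hypothetical type-specific column
FROM invoice AS i
JOIN customer AS c
    ON c.customer_id = i.customer_id
JOIN invoice_customer_type_1 AS it
    ON it.invoice_id = i.invoice_id
WHERE c.customer_type = 1;
```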
Also, the tables in the database should be structured how the data will be used. For data entry and updating, a normalized structure is good. For printing bills or other reporting, you might want to create temporary de-normalized tables to speed the process. Creation of a temporary table can be faster because you avoid making multiple table joins every time a bill is printed.
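For instance, a billing run could materialize the joined data once instead of re-joining for every bill. A sketch with assumed names, using SQL Server's temporary-table syntax:

```sql
-- Build a denormalized working table once for the whole billing run...
SELECT c.customer_id, c.name, i.invoice_id, i.amount
INTO #billing                        -- temporary table, dropped at end of session
FROM customer AS c
JOIN invoice AS i
    ON i.customer_id = c.customer_id;

-- ...then print each bill from #billing without repeating the joins.
```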
Get a textbook and study database design, but in the meantime, I hope this helps.
Best Answer
You should store all information that is common to all companies in one database and copy the data that is needed into each database that needs it. Adding a new customer would be done in the common database; after that, the information would be copied to company1, and if it's needed in company2 as well, copied to company2. Make sure you also add an update process to keep the copied information in sync. Alternatively, you could keep the customer information in the main database and link to it, but that would only be advisable if all databases are on the same SQL Server instance.
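That update process could be as simple as a periodic MERGE from the common database into each company database. The database, table, and column names here are assumptions for illustration:

```sql
-- Run periodically against each company database to refresh its copy.
MERGE Company1.dbo.Customers AS dst
USING ControlDB.dbo.Customers AS src
    ON dst.CustomerID = src.CustomerID
WHEN MATCHED AND dst.Name <> src.Name THEN
    UPDATE SET dst.Name = src.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, Name) VALUES (src.CustomerID, src.Name);
```

Cross-database MERGE like this only works when both databases are reachable from one connection; across instances you would need linked servers or an ETL job instead.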