What should be architecture of multi client web application

Architecturedatabase-design

My question is not related to a specific technology rather it's about design methodology that should be chosen. My company is about to create a web application whose users will be corporate companies. The application will store data of all the signed up/registered companies including financial data. We are unsure what should be the architecture of the application specifically from database point of view.

I can think of two approaches:

Single database for all the companies with same codebase. The company is identified using company table.
Separate Database for each company with same codebase or may be different codebase for each company/client.

Both of the approaches have their pros and cos. If single database is used then data size will increase very fast and performance issue can be encountered due to load by multiple company users hitting the same database. The benefit of this approach is that schema is very easy to manage schema. The separate database on the other hand has benefit of reduced performance overhead, privacy of company data. But the hard part is that a single change in schema needs to be replicated in all the company databases.

These are the things we are aware of. As part of our policy, we would not sell the code to the company/client. Only license will be bought by them.

We are only interested in relational databases and would not be going for NoSQL. Plus the number of client/companies will not be limited and can grow to any number.

What is the better way to design database architecture of this scenario. I know there are thousands of applications that might have faced this scenario before development, but this is my first time 🙂

So would really like your input on what is the better approach for that.

Thank you very much!

Best Answer

Let me start with: you miss 3: Mutliple databases each containing data for a number of clients - allows both scale out (important) as well as not having possibly hundreds of thousands of databases.

But even then, you really have some bad logic - basically it is a lot less a problem than you think:

If single database is used then data size will increase very fast and performance issue can be encountered due to load by multiple company users hitting the same database.

Ah, yes, and if you have a dozen databases and your server is too slow the exact same happens. Also data size increase is identical unless you have a lot of shared data that you only need to load once.

The benefit of this approach is that schema is very easy to manage schema.

No. You deal with a product, so schema management should be fully automated. After all, you will likely have to manage schemata of a dozen or two database copies ANYWAY - development, a separate schema possibly per developer, some for testing, then quality control. Unless you plan everything to happen manually db changes will use change scripts and be automated.

And once automated - managing 1000 copies of a database is not really much more a problem, or?

The separate database on the other hand has benefit of reduced performance overhead, privacy of company data.

No. I mean, yes, it has less performance overhead. But this is cent wise and irrelevant - I doubt the overhead is more than a couple of percent. Privacy CAN be important, it makes a lot of scenarios a lot easier to separate customers. But only if you have separate web servers for every customer (or a good maintenance window) - otherwise you have to update all databases at exactly the same moment as code. Have fun with that.

Privacy can be important. It also allows easy import and export of data. As well as easily transporting data to a separate instance in case of a bug so people can look at it.

It also will likely use a lot more resources on the web server and database server as connection management will use a lot more connections because they are not reusable. Unless you program partially around it.

But the hard part is that a single change in schema needs to be replicated in all the company databases

Only if your development process is really backward and 1990 or so. Agile and DevOps already demand this to be solved, so this is a long solved problem and you have it anyway. Not sure how your team works, but I am in a smallish team (around 6 developers) in a project at the moment. One production database. Total db copy count is around 20 - multiple test environments, every developer has a separate database for his work, quality control, hallway testing, all have their own copy of the database. As such, schema management is fully automated. I could not care less about having 1000 other copies to run the scripts on. It is a little more scripting (and the time and parallelism requirements are an issue) but the core problem is solved anyway. If you do not have this, I fear your product will have manually maintained databases and the result of that will be a horrifically botched development process.

Related Solutions

Separate Schemas by Industry or by Application functionality

I understand that you want to go with single database (as it is good from management & maintenance point of view), but maybe it's too much integration.

I am assuming that:

you will have separate application for each industry,
they have not very much in common (they really cover different business aspects)

I think that one of acceptable solutions in this case would be:

a separate database for each industry
in every database, single shared public schema for shared data, including common dictionaries, all non-customer-dependent data.
one separate database user account + private schema for each customer. You might use postgres feature of table inheritance to conform to some public table.

Issues that Simon puts in his comment are really important. So you will have to enforce a very strict policy regarding GRANTs in your database.

In PostgreSQL, it is possible to achieve (as well in many other RDBMS). The key to achieving multi-tenant solution would be intelligent usage of schemas, roles, search_path setting. See http://www.postgresql.org/docs/current/static/ddl-schemas.html. Actually what I propose will in a way emulate what Oracle does. If you need help on details please ask.

Sql-server – Handling growing number of Tenants in Multi-tenant Database Architecture

At the lower end (500 tenants / 10000 users) this is how I did it. First, you have a "control" database that is global, central and contains all of the information about tenants and users (I really don't think you want to manage these as SQL auth logins). So imagine a database called "Control" with the following tables:

CREATE TABLE dbo.Instances
(
  InstanceID INT PRIMARY KEY,
  Connection VARCHAR(255)
  --, ...
);

INSERT dbo.Instances SELECT 1, 'PROD1\Instance1';
INSERT dbo.Instances SELECT 1, 'PROD2\Instance1';
-- ...

CREATE TABLE dbo.Tenants
(
  TenantID INT PRIMARY KEY,
  Name NVARCHAR(255) NOT NULL UNIQUE,
  InstanceID INT -- Foreign key tells which instance this tenant's DB is on
  --, ...
);

INSERT dbo.Tenants SELECT 1, 'MyTenant', 1;
-- ...

CREATE TABLE dbo.Users
(
  UserID INT PRIMARY KEY,
  Username VARCHAR(320) NOT NULL UNIQUE,
  PasswordHash VARBINARY(64), -- because you never store plain text, right?
  TenantID INT -- foreign key
  --, ...
);

INSERT dbo.Users SELECT 1, 'foo@bar.com', 0x43..., 1;

In our case when we added a new tenant we would build the database dynamically, but not when the admin user clicked OK in the UI... we had a background job that pulled new databases off a queue every 5 minutes, set model to single_user, and then created each new database serially. We did this to (a) prevent the admin user from waiting for database creation and (b) to avoid two admin users trying to create a database at the same time or otherwise getting denied the ability to lock model (required when creating a new database).

Databases were created with the name scheme Tenant000000xx where xx represented Tenants.TenantID. This made maintenance jobs quite easy, instead of having all kinds of databases named BurgerKing, McDonalds, KFC etc. Not that we were in fast food, just using that as an example.

The reason we didn't pre-allocate thousands of databases as the comment suggested is that our admin users usually had some idea of how big the tenant would become, whether they were high priority, etc. So they had basic choices in the UI that would dictate their initial size and autogrowth settings, which disk subsystem their data/log files would go to, their recovery settings, backup schedule to hinge off of, and even smarts about which instance to deploy the database to in order to best balance usage (though our admins could override this). Once the database is created, the tenant table was updated with the chosen instance, an admin user was created for the tenant, and our admins were e-mailed the credentials to pass along to the new tenant.

If you're using a single point of entry, it is not feasible to allow multiple tenants to have users with the same username. We opted to use e-mail address, which - if all users work for the company and use their corporate e-mail address - should be fine. Though our solution eventually became more complex for two reasons:

We had consultants that worked for more than one of our clients, and needed access to multiple
We had tenants who themselves were actually comprised of multiple tenants

So, we ended up with a TenantUsers table that allowed one user to be associated with multiple tenants.

Initially when a user logs in, the app will know the connection string for the control database only. When a login is successful, it can then build a connection string based on the information it found. E.g.

SELECT i.Connection
  FROM dbo.Instances AS i
  INNER JOIN dbo.Tenants AS t
  ON i.InstanceID = t.InstanceID
  INNER JOIN dbo.TenantUsers AS u
  ON i.TenantID = u.TenantID
  WHERE u.UserID = @UserID;

Now the app could connect to the user's database (each user had a default tenant) or the user could select from any of the tenants they could access. The app would then simply retrieve the new connection string, and redirect to the home page for that tenant.

If you get into this 10MM user area you propose, you'll definitely need this to be balanced better. You may want to federate the application so that they have different points of entry connecting to different control databases. If you give each tenant a subdomain (e.g. TenantName.YourApplicationDomain.com) then you can do this behind the scenes with DNS/routing without interrupting them when you need to scale out further.

There is a lot more to this - like @Darin I am only scratching the surface here. Let me know if you need a non-free consult. :-)

Best Answer

Related Solutions

Separate Schemas by Industry or by Application functionality

Sql-server – Handling growing number of Tenants in Multi-tenant Database Architecture

Related Question