Mysql – Splitting large database into smaller ones

configurationdatabase-designMySQLstored-procedures

Our system currently stores all customer (merchant) accounts in one "flat" MySQL (5.6) DB namespace. We would like to scale out better – we are considering breaking our data up based on the merchant account ID. So instead of having:

use database single;
table sales (
  `account_id`
  ...
)

Break up merchants into separate namespaces:

use database <account_id>;
table sales (
  ... // stores data  for a single account
)

Above has several benefits:

  • Accounts can be placed on different servers and scale out much better. We can decide how many accounts to co-locate, and migrate as needed.
  • Individual account backup/restore times would be greatly reduced.
  • Ability to upgrade one account's database at a time (more granularity).
  • One bad user won't affect all accounts.
    etc.

Concern:
Today, all of our DB access is via stored procedures – both reads and mutations. We would like to keep it that way as it isolates the DB schema from the app code. If we were to break up the accounts into separate db-namespaces the dot-notation used inside of the stored procedures cannot be parameterized, as far as we can tell. We would have to duplicate the stored procedures (~150 count) for each of the account-namespaces. Consequently if we need to make a change to one SP, we need to apply it to all of the namespaces' stored procedures. In fact, any DB schema change needs to be applied (via automation) to all the DB namespaces. The administrative overhead is slightly scary, even after automating the deployment of changes.

Any alternative/better solutions for scaling out?

Best Answer

If we were to break up the accounts into separate db-namespaces the dot-notation used inside of the stored procedures cannot be parameterized

We would have to duplicate the stored procedures (~150 count) for each of the account-namespaces.

That should not be an issue: when calling a procedure in database1 it will have that context so you won't have to change the procedures. You don't even need to change the procedures and tables at all: keep the account_id column in and have it always a fixed value for each account/database. That way you can switch back to multiple tenants per DB.

If you always develop and test against databases with multiple tenants this should be safe (if you dev & test against single-tenant DBs you make it easy for cross-tenant data revealing bugs to be missed).

If you decide later that you will never need/want multi-tenant databases, you can at that stage decide whether it is worth the effort and regression testing required to remove account_id from everything.

In fact, any DB schema change needs to be applied (via automation) to all the DB namespaces. The administrative overhead is slightly scary, even after automating the deployment of changes.

There are many tools to help out here, which is best (or if rolling your own is better) depends very much on the application, scale, and deployment frequency, amongst other factors.

Do you agree that it's not possible to have one common namespace (database) hosting all the stored procedures, then be able to invoke it as shared.sp_get_info() in the context of another (per-merchant-account) database?

If mySQL allows cross-DB references like that (I've not used it much recently, I'm an MS SQL Server guy mainly, which would allow such things (though not in Azure SQL)) then that woul be possible and would work, but I wouldn't recommend it.

Your common DB becomes a single point of failure for all the supporting DBs. It tightly couples them, making it more work to scale by simply shifting some databases onto another server/service as you can't just simply move the desired client DBs and update connections settings in the apps (you have to create a new common DB and make sure each has the right families for procedures). You still have to maintain all the different copies of the same procedures. You still have the problem of keeping things appropriately up-to-date but it opens up a new class of possible errors: if due to a build process problem or human error the procedures for a given client DB get updated before the schema in that DB or don't get updated when the DB does, you have problems some of which might be hard to diagnose. Also if you have to restore a client DB from an old backup, you have extra work to make sure that the procedures are at the right version.

It might feel like a simplification, but you are really adding complication. Keeping the same code in procedures in each DB is cleaner. You'll have potential versioning issues but the common DB won't really remove them, in fact it will add more such concerns.