MySQL – Managing 100s of similar databases

MySQL, sharding

We are migrating to a model where every "customer" lives in its own dedicated schema (database per tenant). This will allow us to later move some accounts to different DB clusters or upgrade specific accounts to newer schemas, for example. However, from the get-go we need to automate how databases are managed in bulk, or the whole effort will go up in smoke.


Additional info: the split is dictated by regulatory reasons and to scale out.

The number of schemas would be about 1000 (today), hosted on ~10 MySQL instances, so roughly 100 schemas per DB instance. New accounts would grow the system while keeping that ratio of roughly 100 schemas per DB instance.

My goal is to apply DevOps practices to database change management, so the schema lives in a git repo. Next, we need to be able to upgrade specific accounts, selectively or in bulk, to bring them to a certain "schema version" that's in git.

The problem is figuring out which changes to apply to which databases: something like the diff and make CLI tools, but for database schemas.

I was thinking of creating one file per database object; fortunately, tables and stored procedures are the only two object types I need to track. Each object in the database has a corresponding file in git (a possible layout is sketched below). So if I change a definition in a file, I can run "db-diff" (a tool I'd create) to tell which objects in which databases are outdated.
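For illustration, a one-file-per-object repo under that convention might be laid out like this (the names are made up):

    schema/
        tables/
            customer.sql          -- one CREATE TABLE statement per file
            invoice.sql
        procedures/
            sp_close_period.sql   -- one CREATE PROCEDURE statement per file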

Bringing a DB object in sync with the git version of that object would work something like this:

  • for stored procedures that's easy: DROP/CREATE. I'd use the procedure's ROUTINE_COMMENT field in information_schema.ROUTINES (populated by the COMMENT clause of CREATE PROCEDURE) to store its current git tag. If the version there is outdated, replace the SP with the definition in the file in git (see the first sketch after this list).

  • for table objects, it is not clear to me how to compare what's in git vs. what's in the DB. One major annoyance of MySQL is that the SHOW CREATE TABLE result doesn't echo exactly the definition you gave in CREATE/ALTER. For example, if you declare a bigint column, you end up with the silly bigint(20). To a naive diff tool that's a different schema from the one in git, so both sides need to be normalized before comparing (see the second sketch below).
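As an illustration of the stored-procedure check, here is a minimal sketch in Python using mysql-connector-python; the function name, the tag format, and the idea of parsing expected tags out of the repo are my assumptions, not something from the question:

    # Sketch only: list procedures whose stored git tag differs from the repo's.
    # MySQL keeps the COMMENT characteristic of CREATE PROCEDURE in
    # information_schema.ROUTINES.ROUTINE_COMMENT, which is what gets compared.
    import mysql.connector

    def outdated_procedures(conn, schema, expected_tags):
        """expected_tags: {procedure_name: git_tag} parsed from the repo."""
        cur = conn.cursor()
        cur.execute(
            "SELECT ROUTINE_NAME, ROUTINE_COMMENT "
            "FROM information_schema.ROUTINES "
            "WHERE ROUTINE_SCHEMA = %s AND ROUTINE_TYPE = 'PROCEDURE'",
            (schema,),
        )
        return [name for name, comment in cur.fetchall()
                if expected_tags.get(name) != comment]

Each outdated procedure can then be dropped and re-created from its file, with the new tag attached via CREATE PROCEDURE ... COMMENT 'v1.4.2' so the version travels with the object.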
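For the table side, one workable approach (an assumption on my part, not something MySQL offers out of the box) is to canonicalize both the SHOW CREATE TABLE output and the DDL file from git before diffing, stripping the noise MySQL injects. A rough sketch:

    import re

    def normalize_ddl(ddl: str) -> str:
        """Canonicalize MySQL DDL so git and SHOW CREATE TABLE compare equal."""
        s = ddl.strip().lower()
        # Drop the integer display widths MySQL adds: bigint(20), int(11), ...
        s = re.sub(r"\b(tinyint|smallint|mediumint|int|bigint)\(\d+\)", r"\1", s)
        # Drop the AUTO_INCREMENT counter, which drifts with the data.
        s = re.sub(r"\s+auto_increment=\d+", "", s)
        # Strip backtick quoting and collapse whitespace.
        s = s.replace("`", "")
        s = re.sub(r"\s+", " ", s)
        return s

Two definitions are then considered in sync when normalize_ddl(git_version) == normalize_ddl(show_create_output); the exact set of rewrites you have to neutralize will grow as you hit more of MySQL's rewriting.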

Is this something I'd need to write myself from scratch, or is there anything already on the market that you'd recommend?

Best Answer

I have a client that has 11000 databases, with each DB instance holding 90-100 databases.

This causes several problems:

  1. It bloats the INFORMATION_SCHEMA badly. Memory takes a hit.

  2. If you ever have to back up a database, it takes 3+ hours for Xtrabackup to read the entire instance's table and column metadata before it actually starts copying data.

  3. Poor performance in one tenant database can hurt performance for the entire instance.

  4. Maintenance for one tenant database can hurt availability for the entire instance.

Please see my old post: One Big Database vs. Several Smaller Ones. It compares a multitenant database to a multitenant building. Ask any landlord of a multitenant building and you will get the same sob story.

BACK TO REALITY

Managing a building with three tenants is better than managing a complex with 100+ units (NYCHA).

In terms of databases, you should shard the tenants into several independent clusters so that each DB instance holds a decent amount of data without gorging on RAM. Just keep in mind that the more tenant databases you have, the more memory will be consumed. (BUYER BEWARE)
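To keep that sharding manageable, one common pattern (my suggestion, not part of the question) is a small central catalog that records which cluster hosts each tenant, so routing, backups, and the bulk schema upgrades can be driven shard by shard. A minimal sketch with hypothetical names:

    import mysql.connector

    # Hypothetical tenant directory living in one small "catalog" schema.
    CATALOG_DDL = """
    CREATE TABLE IF NOT EXISTS tenant_directory (
        tenant_id      VARCHAR(64)  NOT NULL PRIMARY KEY,
        schema_name    VARCHAR(64)  NOT NULL,
        cluster_host   VARCHAR(255) NOT NULL,
        schema_version VARCHAR(32)  NOT NULL  -- last git tag applied
    )
    """

    def host_for_tenant(catalog_conn, tenant_id):
        """Return (cluster_host, schema_name) for a tenant, or None."""
        cur = catalog_conn.cursor()
        cur.execute(
            "SELECT cluster_host, schema_name FROM tenant_directory "
            "WHERE tenant_id = %s",
            (tenant_id,),
        )
        return cur.fetchone()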