Should i use multiple databases

database-design

I have a situation, in which i need to design a database for some bakeries. I will need to design the same thing for many bakeries. For each bakery i will have for example, an Employee table, and a Product table (this will in reality much more complicated). The employees and the products do have relationships between them, but entities from one bakery never have any relationship with entities from another bakery. I was thinking, instead of making a database that has a Bakery entity as well, can't i just make several databases that only have Employees and Products in it, and then just pull data from the database i need based on which bakery is requesting it. Since every bakery will have the exact same tables, and the exact same attributes, and bakeries will never interact between each other, i was thinking this might be a good idea.

This is obviously a test example that i am working on for a web app class project, so I'm pretty new to this, and I'm not sure if something like this exist.

  • Do people do this (splitting up your data in multiple databases in
    situations like this)?
  • Is is too slow?
  • Will just making a Bakery entity and then indexing it be better, even
    though my database diagram will get MUCH more complicated that way?

Best Answer

In the real world, a lot will depend on how many customers you have, how paranoid those customers are/ how valuable the data is, and how skewed your customers are (i.e. Mom & Pop with 10 items and 100 sales a day vs Hostess Inc with millions of sales a day).

Separate databases create a variety of administrative problems. Every time you want to do an upgrade, for example, you have to ensure that the same script gets run in every database. If you support 10,000 bakeries, that's challenging. Particularly when the script works perfectly on 9,900 of them and generates slightly different errors in 100 databases because they happen to have a data condition that you didn't anticipate or because different background jobs were running at the time and now you have to manually look at 100 error logs to see how to fix each database (meanwhile your middle tier code got pushed successfully and assumes that the new objects are present which isn't correct for 100 customers). You can, of course, mitigate these problems by having fully automated deploys and fully tested rollback scripts and rolling back the 9900 databases where the upgrade succeeded because the 100 databases failed. But that requires a pretty sophisticated deployment pipeline that has, for example, all 10,000 databases available in lower environments to test deployment scripts against and that everyone scrupulously verifies their rollback scripts (which would be rare).

If you have relatively small numbers of clients that are paying relatively large amounts to use your service to handle their very sensitive information, it may be perfectly reasonable to do this. If you have 100 databases because you have 100 hospital chains as customers putting patient information in your system, this level effort is probably very reasonable. It's probably not reasonable if you're trying to provide a low-cost way for lots of small bakeries to manage a relatively small number of employees and products.

If you have a mix of very large and very small customers such that different customers will likely want the same query to use different plans or such that certain large customers want to be able to partition out their workload (i.e. Hostess wants to know that there are 4 app servers and a database server that is theirs alone and that when you add 4 cores to the database server that they're going to get all of that and you want to charge Hostess for the Enterprise Edition of your application to give them the privilege of using dedicated hardware), it may make sense to put your handful of giant customers in their own database and group all the small fry into their own shared database.

Practically, that means that you'd probably design it with a bakery entity in the knowledge that some day in the future you might actually choose to deploy to multiple databases some of which would only have 1 row in the table. Perhaps initially you put everything in one database, then move to a couple geographically separate databases (i.e. all the Aussie customers go in the database that you run out of the Aussie data center), then move the really large customers to their own database where the bakery entity only has a single row. Then when Hostess comes to you in a few years and wants to separate Hostess Europe from Hostess America with separate sets of employees and products, you can set those up as separate bakeries in the same database.