Mysql – Linking foreign keys across multiple databases: direct, or using an intermediary table

database-designforeign keymariadbmultiple-databaseMySQL

I want to make a part of my application reusable, and that warrants moving the corresponding tables into a separate database. So for the sake of an example, please consider the two imaginary databases in the list that follows. (More databases sharing the same logic may be added as the project grows.)

  • users containing tables related to user sign ups, login and e-mail history, password reset tokens etc., as well as the accounts themselves; and
  • blogs having tables for posts, media files, comments, etc.

Each table in the blogs database must obviously have an account_id column referring as a foreign key to users.accounts.id. (I do realise that to make it work both databases must use InnoDB and reside on the same server.)

My question is what would be a better practice:

  • direct reference to another database:
    • simply refer blogs.posts.account_id to users.accounts.id (repeat with all other blogs.* tables),
    • make each reference CASCADE ON DELETE; or
  • using an intermediary table:
    • create an intermediary table blogs.accounts having only one column called id; then
    • on one hand, refer every table inside the blogs database to that intermediary table (so blogs.posts.account_id to blogs.accounts.id, CASCADE ON DELETE); and
    • on the other hand, finish by referring this blogs.accounts.id to the 'upstream' users.accounts.id, make sure to CASCADE ON DELETE as well.

The latter seems like an unnecessary complication. But the only advantage I can think of is this can make the setup future proof in case we end up having to still migrate one (or some) of the databases to another server:

  • If we link the tables directly, after the migration the blogs database will have lots of disparate account_id columns that won't CASCADE ON DELETE
  • But if these intermediary tables get disconnected from the upstream users.accounts.id, their neighbouring tables in each respective database are still linked to them. This way we can continue benefitting from at least somewhat integrity and CASCADEs. In other words, if a user gets deleted, all we have to do is have a script go through each of these *.accounts connector tables and delete the id counterpart once, and CASCADE will take care of the rest of the tables inside of that database automatically.

Am I on the right track with this logic, or am I missing some other ways to handle this more effectively, and therefore reinventing the wheel?

Best Answer

This is bordering on an opinion-based question, because of course anyone can have their own justification for either choice.

I would not choose to add the intermediary table, for a few reasons.

One is that as you know, the intermediary table cannot have a foreign key reference to a table on another MySQL instance. Foreign keys can span schemas, but not instances.

If you need to run special scripts to clean up the intermediary table anyway, that special script should be able to encode clean up tasks for all the blog tables. I assume at least the number of blog tables remains relatively stable. You aren't going to be adding so many blog tables in custom ways that this becomes a burden to maintain the cleanup script.

The intermediary table therefore seems like an extra step that just complicates things, and takes extra storage.

I've worked at several companies that eschewed the use of foreign key constraints in any case, because they make some operation tasks harder, for example schema changes. Also because they incur locking behavior that you probably don't expect. In the absence of the constraints, you can't get the ON DELETE CASCADE feature.

Like I said above, this is subject to opinion. You could easily ask, "but what if the priorities are a bit different...?" Of course, you must make this decision based on your requirements and priorities.