I want to make a part of my application reusable, and that warrants moving the corresponding tables into a separate database. So for the sake of an example, please consider the two imaginary databases in the list that follows. (More databases sharing the same logic may be added as the project grows.)
users
containing tables related to user sign ups, login and e-mail history, password reset tokens etc., as well as the accounts themselves; andblogs
having tables for posts, media files, comments, etc.
Each table in the blogs
database must obviously have an account_id
column referring as a foreign key to users.accounts.id
. (I do realise that to make it work both databases must use InnoDB and reside on the same server.)
My question is what would be a better practice:
- direct reference to another database:
- simply refer
blogs.posts.account_id
tousers.accounts.id
(repeat with all otherblogs.*
tables), - make each reference CASCADE ON DELETE; or
- simply refer
- using an intermediary table:
- create an intermediary table
blogs.accounts
having only one column calledid
; then - on one hand, refer every table inside the
blogs
database to that intermediary table (soblogs.posts.account_id
toblogs.accounts.id
, CASCADE ON DELETE); and - on the other hand, finish by referring this
blogs.accounts.id
to the 'upstream'users.accounts.id
, make sure to CASCADE ON DELETE as well.
- create an intermediary table
The latter seems like an unnecessary complication. But the only advantage I can think of is this can make the setup future proof in case we end up having to still migrate one (or some) of the databases to another server:
- If we link the tables directly, after the migration the
blogs
database will have lots of disparateaccount_id
columns that won't CASCADE ON DELETE - But if these intermediary tables get disconnected from the upstream
users.accounts.id
, their neighbouring tables in each respective database are still linked to them. This way we can continue benefitting from at least somewhat integrity and CASCADEs. In other words, if a user gets deleted, all we have to do is have a script go through each of these*.accounts
connector tables and delete the id counterpart once, and CASCADE will take care of the rest of the tables inside of that database automatically.
Am I on the right track with this logic, or am I missing some other ways to handle this more effectively, and therefore reinventing the wheel?
Best Answer
This is bordering on an opinion-based question, because of course anyone can have their own justification for either choice.
I would not choose to add the intermediary table, for a few reasons.
One is that as you know, the intermediary table cannot have a foreign key reference to a table on another MySQL instance. Foreign keys can span schemas, but not instances.
If you need to run special scripts to clean up the intermediary table anyway, that special script should be able to encode clean up tasks for all the blog tables. I assume at least the number of blog tables remains relatively stable. You aren't going to be adding so many blog tables in custom ways that this becomes a burden to maintain the cleanup script.
The intermediary table therefore seems like an extra step that just complicates things, and takes extra storage.
I've worked at several companies that eschewed the use of foreign key constraints in any case, because they make some operation tasks harder, for example schema changes. Also because they incur locking behavior that you probably don't expect. In the absence of the constraints, you can't get the ON DELETE CASCADE feature.
Like I said above, this is subject to opinion. You could easily ask, "but what if the priorities are a bit different...?" Of course, you must make this decision based on your requirements and priorities.