Mysql – How to prove that the act of removing foreign keys doesn’t corrupt existing data

database-designforeign keyinnodbMySQLreferential-integrity

TL;DR: Prove that in practice, the execution of the alter table table_name drop foreign key constraint_name statement does not corrupt existing data. The important consideration is the execution of the statement itself; consider data changes after the fact irrelevant. (By analogy: opening the stable door is the harmful action, not the horse bolting.)

My boss and I are having a difference of opinion about whether or not to use foreign keys in a MySQL/InnoDB database. I'm for using them for enforcing RI and taking advantage of ON DELETE CASCADE and ON UPDATE RESTRICT/SET NULL, while he's against (confident that he can enforce RI at the application level).

His argument is chiefly:

  1. Drupal doesn't use them and gets along fine without them, so why should we?
  2. They're too inflexible when you need to change data and/or structure.
  3. He's removed them from existing tables to change things and it's caused data corruption that was only noticeable weeks or months later, on high-traffic/ high activity sites, so he'd rather not use them.

My arguments are chiefly:

  1. Not all of the databases/ DB engines supported by Drupal 5-7 (MyISAM, SQLite 3) enforce FKs by default, so Drupal leaves them as documented only, whereas this might be different in D8.
  2. Yes, they can be a pain to deal with, but perhaps the fault lies with poor planning/designing on the part of the developer, not the FKs.
  3. Surely not enforcing RI with FKs at the DBMS level is more likely to cause data corruption than otherwise. (Case in point is D6's user reference module not restricting changing user status when existing content references a user that must have a certain status).
  4. It's a waste of time and resources to make code do/attempt what the DBMS already does. How can he guarantee that his code will work as well as (or better than) the DBMS?

Essentially, he'll come over to my way of thinking if I can prove that the act of removing a foreign key itself (regardless of subsequent data inserts/updates) won't corrupt existing data. I can't see a way clear to prove their usefulness/advantage because I'm not sure how I would satisfactorily observe/document the effect a drop foreign key would have immediately after execution other than comparing data from before execution to data after execution.

So, how do I persuade my boss that we should use foreign keys by proving that the act of later removal/alteration of them won't corrupt existing data, whereas using them will not cause any issues? How would I best set up practical usage tests?

Note: I don't so much want to prove that I am right (from an egotistical point of view) to use FKs, but that it benefits the data and the application code more to use them at the DBMS level than at the application code level.

Best Answer

To counter the points directly:

Drupal doesn't use them and gets along fine without them, so why should we?

Drupal supports many database layers, perhaps at least one of those does not support FKs and they chose to stick with the lowest common feature set? A great many people do use them, the one data point where people aren't using them is relatively meaningless. I know people who don't wear a seatbelt in cars, but I do as do most others...

They're too inflexible when you need to change data and/or structure.

If you are changing your structure to the point where you need to change FK relationships then anything in your existing design can look inflexible (particularly if the existing design didn't properly match the system you were modelling to start with, which is probably why you are looking to alter the existing structure).

They are deliberately inflexible when changing data. They are designed to not let you change data in such a way that integrity could be compromised, even temporarily. Ask for examples where a foreign key constraint would stop a data modification and for each one we'll be able to explain why that is a good thing and how to work with the constraint to achieve your goal without breaking it (or conversely, we might instead tell you why that particular FK relationship is wrong - but that doesn't make other FKs wrong it just means there was a problem with that part of your model).

He's removed them from existing tables to change things and it's caused data corruption that was only noticeable weeks or months later, on high-traffic/ high activity sites, so he'd rather not use them.

Removing them did not cause corruption. Removing them allowed a bug in some other code to cause corruption over time and it wasn't noticed. If the keys had been in place that process would have raised an error and presumably the problem with it would have therefore been noticed and fixed instead of silently being allowed to break more data until noticed. All removing the constraint does is stop the database enforcing the constraint for future inserts/updates/deletes - it does not affect existing data.

I'm not sure how you'd go about conclusively proving to someone who is sure otherwise that this is the case.

The only proof I can think of is first principals: reasserting what foreign key constraints actually do thereby showing that they would not allow inconsistency to appear, and if there are specific examples of the corruption the person has in mind you could work through trying to create that problem with the keys in place (showing that the key would have caused an error instead of allowing the inconsistency to be created).

More specifically about the immediate effect of removing the constraints: show that dropping them does not alter existing data at all. Create a copy of the database, drop the keys in the copy, and run a full comparison of the data to show that nothing changed as a result of the constraints being lifted.

he's against (confident that he can enforce RI at the application level).

At best that is bad use of development/testing time, at worst it will create a nightmare for later. You are reinventing the wheel, probably inefficiently and potentially with bugs, you have to implement your new wheel everywhere code touches that data in every application now and going forward instead of simply letting the database handle it each time, and if one of those bits of code has a bug the data potentially loses integrity for all applications for all time.

There are of course complex business rules that a DBMS can't enforce for you, so you have to implement them in your BL layer, but for the fundamentals like this let the DB handle it.

An extra one not mentioned but I've heard several times: They are inefficient when we make changes to the parent objects

This is due to most DBMSs not automatically creating an index for each FK, many people assume that they do and are surprised that they don't see the performance metrics that they expect. DBs don't do this as it could be very wasteful when not needed (which is more often then you might expect). If you need an index on the columns of the FK, to make cascading updates/deletes (or just simple join queries in that direction) efficient, create one.