Postgresql – Is it a good or bad idea to use “ON UPDATE CASCADE ON DELETE CASCADE” for foreign keys? Why does this mechanism exist at all

postgresql

I understand what foreign keys are, and have made a point of including them wherever they make sense for all my database tables that I design.

However, something which has always confused me is whether or not I should be explicitly setting the "ON UPDATE" and "ON DELETE" features (in lack of a better term). Example:

CREATE TABLE "test1"
(
    id              serial,
    referenceid     integer,
    FOREIGN KEY     (referenceid) REFERENCES "othertable" (id) ON UPDATE CASCADE ON DELETE CASCADE
)

This code goes out of its way to explicitly add the technically "unnecessary" part: "ON UPDATE CASCADE ON DELETE CASCADE".

Since this is not done by default, there must be a reason for this! After all, the default behaviour is always (or at least should always be) the most commonly needed behaviour:

CREATE TABLE "test2"
(
    id              serial,
    referenceid     integer,
    FOREIGN KEY     (referenceid) REFERENCES "othertable" (id)
)

In the test1 table, as I understand it, if the "othertable" either changes the "id" column values, or deletes any record(s), that means that the referenced records in the test1 table will either be updated or deleted. This seems, on the surface, like what should be the default behaviour.

In the test2 table, again as I understand it, if the "othertable" either changes the "id" column values, or deletes any record(s), that means that PostgreSQL will refuse to perform the query if there are records in test2 which reference the ones being modified.

I'm basically confused about the entire concept of "ON UPDATE" and "ON DELETE". Why would one ever want a query to be refused like that? And "CASCADE" isn't even the only option (besides none); there are multiple other values you can use which cause various behaviour (which I don't understand).

Since there is a stated relationship between the tables (through the foreign keys), isn't the whole point that you want them to remain consistent? So why you not want it to "CASCADE" if there are changes made to the "master" table?

This might be similar to how I could never understand why object-oriented programming had "security measures" in the code, disabling you from directly changing or retrieving an object's properties and being forced to go through "getters" and "setters". I mean, if something can execute queries in your database, isn't "all lost" anyway? They can just do:

DELETE FROM table1/table2 CASCADE

… or something like that.

The ON UPDATE/ON DELETE mechanism seems almost like the database engineers could not decide on the best behaviour and instead put this on the user of the product instead. For me, it adds a lot of confusion and anxiety.

It should be noted that I have used the "test2" style code many times in the past, only to realize that I cannot update or delete records where it made sense. That's why I started using "ON UPDATE CASCADE ON DELETE CASCADE" in the first place, after asking and learning about it.

So why isn't this the default and even the only behaviour for a database? Why would you ever want a query to update/delete your "master records" to fail?

Best Answer

I'm not sure about ON UPDATE CASCADE. If you find yourself needing this sort of cascaded update then that is perhaps a "code smell" in your database design. In theory your primary key should be static so changes that need cascading shouldn't need to happen. Perhaps it was added as a logical step along from ON DELETE CASCADE. It is at least safer than cascading deletes.

The existence of ON DELETE CASCADE makes more sense: while PKs shouldn't really change things do often get deleted. The cascading is simply a convenience, it saves you from having to write code to drop child entities manually when getting rid of a parent. Further more it might be considered safer than implementing this in other logic because the database is taking care of transactional consistency, deadlocks, and so forth, so the operation should (bugs permitting) be guaranteed atomic. If you implement your own "find children, delete, then delete parent" which may have to be nested, you have to do some legwork[!] to ensure that if there is an error part way through there is no way that you delete the great-great-great-grand-children of a row but leave the rest standing (leaving a partly deleted entity which could cause difficult to diagnose issues later).

[!] Taking appropriate locks, preferably not by locking whole tables, ensuring transaction isolation settings are right, ... - it isn't as simple as it might first look.

Why Not Cascade?

As I said above, I consider a need to cascade updates routinely to be a bit of a design smell. You shouldn't need to change a primary key value during normal operations.

I'm very wary of cascaded deletes, despite the danger of bugs in more manually deleting complex structured entities. Too often you see inexperienced people perform UPSERT operations[*] using a DELETE-then-re-INSERT method, even when the DB supports single-statement upsert operations[^], which damages your data if cascaded deletes are enabled: the delete removes the children too, and they don't get put back by the subsequent insert.

Also, in a lot of cases with real data you don't actually want a cascaded delete. For example: if a manager leaves a company you don't want to delete their sub-ordinates because assigning a new manager first was forgotten, or prevented by a bug.

[*] in postgres via INSERT ... ON CONFLICT ... but this is not standard and quite different syntax[†] is used elsewhere
[^] either because they aren't aware of the available syntax, or are avoiding it in order to be cross-DB compatible
[†] MERGE can be used in Microsoft's TSQL for the same effect, mySQL[‡] supports INSERT ... ON DUPLICATE KEY ...
[‡] mySQL also supports REPLACE INTO, but IIRC that is just syntactic sugar for delete+insert so has the same dangers