Database Design – Is Marking a Row as Removed or Deleted a Good Practice?

database-design

This is a theoretical* design question about marking rows as removed or deleted. I assume rows should be marked as removed whenever there is a need to remove the pertinent data, but the row should stay in the relevant database table to maintain referential integrity.

* Theoretical, within the scope of this question, means that it should not be related to concrete example and also not related to concrete relational database.

For the sake of the question I have chosen a classical example: A Shop's database. And a situation is: shop's user wants to quit using the shop and wants his personal data purged.

We have an Orders table which references a Users table. If we try to DELETE a user row, we will lose all referenced orders (assuming we have strong referencing). Therefore instead of a DELETion of a user row we may mark such row as 'removed' (i.e., have a BIT (or boolean) column SET to '1', e.g., IsRemoved = 1) and clear his login, password, first name, last name columns to purge personal data. This is a first approach, which I believe is bad. While we get rid of personal data, the row still occupies space in database – just for the sake of referential integrity: no other useful meaning it has.

But there is an alternative, instead of having an Orders->Users relationship, we can have a different structure: Orders->Customers->Users, where the Customers table has no personal data. When a user wants to quit, we simply DELETE the pertinent User row, but Customer along with its related Orders stay stored in the database. No need to mark row as removed, no need for additional business rules to handle the removed rows behavior. I believe this approach is better and may be used for any situation where the IsRemoved row approach is used.

Question

Do you think second approach lacks real-life application in some situations? Or can it be used for any DB design?

P.S. Of course, it is possible to referential integrity with NULLs in FOREIGN KEYs, however this approach is out of scope. While it may be useful in the Shop's example, the question is hypothetical, and shop is used only as example, not as a real problem (which may be solved with NULLs).

Best Answer

The 2nd approach is interesting, but it seems like you could be optimizing the model for a scenario that is not that frequent (introducing a third table to be joined whenever you need to access the user data).

Also, your description of the first approach seem to imply that the user table has no other info, except the one that you need to purge, and it's not necessarily the case (e.g. you may want to track the insertion and purge date to have statistics for the leaving users).

Finally, as everyone else has already said, there is not a single "good" answer. For example, you might have some audit regulations that forbids you to delete the personal data for a period of time, so the "isActive" flag would be a better match.