Database design: how to handle the “archive” problem

database-design

I'm pretty sure a lot of applications, critical applications, banks and so on do this on a daily basis.

The idea behind all that is:

  • all the rows must have a history
  • all links must stay coherent
  • it should be easy to make requests to get "current" columns
  • clients who have bought obsolete things should still see what they've bought even though this product is not part of the catalogue anymore

and so on.

Here's what I want to do, and I'll explain the problems I'm facing.

All my tables will have those columns:

  • id
  • id_origin
  • date of creation
  • start date of validity
  • start end of validity

And here are the ideas for CRUD operations:

  • create = insert new row with id_origin = id, date of creation=now, start date of validity=now, end date of validity=null (= means it's the current active record)
  • update =
    • read = read all the records with end date of validity==null
    • update the "current" record end date of validity=null with end date of validity=now
    • create a new one with the new values, and end date of validity=null (= means it's the current active record)
  • delete = update the "current" record end date of validity=null with end date of validity=now

So here's my problem: with many-to-many associations. Let's take an example with values:

  • Table A (id = 1, id_origin = 1, start=now, end=null)
  • Table A_B (start=now, end=null, id_A = 1, id_B = 48)
  • Table B (id = 48, id_origin = 48, start=now, end=null)

Now I want to update table A, record id=1

  • I mark record id=1 with end=now
  • I insert a new value into table A and… damn I've lost my relation A_B unless I duplicate the relation, too… this would end to a table:

  • Table A (id = 1, id_origin = 1, start=now, end=now+8mn)

  • Table A (id = 2, id_origin = 1, start=now+8mn, end=null)
  • Table A_B (start=now, end=null, id_A = 1, id_B = 48)
  • Table A_B (start=now, end=null, id_A = 2, id_B = 48)
  • Table B (id = 48, id_origin = 48, start=now, end=null)

And… well I have another problem: the relation A_B: shall I mark (id_A = 1, id_B = 48) as obsolete or not (A – id=1 is obsolete, but not B – 48)?

How to deal with this?

I have to design this on a big scale: products, partners, and so on.

What is your experience on this? How would you do (how have you done)?


Edit

I've found this very interesting article, but it doesn't deal properly with "cascasding obsolescence" (= what I'm asking actually)

Best Answer

It's not clear to me if these requirements are for auditing purposes or just simple historical reference such as with CRM and shopping carts.

Either way, consider have an main and main_archive table for each major area where this is required. "Main" will only have current / active entries whereas "main_archive" will have a copy of everything that ever goes into main. Insert / update into main_archive can be a trigger from insert / update into main. Deletes against main_archive can then run across a longer period of time, if ever.

For the referential issues such as Cust X bought Product Y, the easiest way to solve your referential concern of cust_archive -> product_archive is to never delete entries from product_archive. Generally, churn should be much lower in that table so size shouldn't be too bad of a concern.

HTH.