Mysql – Storing revisions and relations between revised items in 1:N and N:N scenarios for audits

ArchitectureauditMySQLschema

I have several of the following situations in a MySQL schema that I design for a custom CRM/ERP solution.

  1. 1:N related tables SO and SO_ITEMS (SO is just one example I have several) that both need to have revisions. The data sets stored tend to be get pretty big for some of these tables and, according to the business rules I gathered, there will be many changes that need auditing so copying all the child items for each revision of the parent will not scale very well I predict.

  2. N:N related tables SO and PO tables (where I use a pivot table of course) that needs to be audited also so that changes from at least one way imply (or may imply) storing an updated revision of the data entry form the other table. (We need no worry about the child _items data entries for each of these tables because although they may be related, I don't require auditing for now).

My strategy for now is to use a separate table suffixed with _revision for each of the above tables and store the current revision number in the normal table and a revision number for each revision, at least for the parents.

So my question are:

  • is my schema flawed in some major way that I don't predict for this kind of scenario? (I suppose there are many other applications that have solved this problem out there).
  • should I make a direct relationship between x_revison and y_revision or use the revision number and query through the normal (current revision?) table? I want the variant that reduces complexity the most and I'm pretty sure querying through is simpler on most operations (I suspect there will not be a big disparity between SELECT/INSERT/UPDATE in normal operation, so I don't need to optimise for one type of DB operation). Am I missing something?
  • is it common practice to copy-on-write the revisions or should I just lazily insert entries in the revisions table when a change appears only? Any possible pros/cons to these two options.

Best Answer

You have tradeoffs to consider when deciding on the design. Which do you want to optimize for:

  • Insert new item
  • Revise item
  • Fetch "current" item
  • Fetch some old revision
  • Fetch all revisions

If one of those actions dominates, then design for it, and kludge for the rest.

Also, do revisions need extra columns, such as revisor, revision_date, revision_number, ...? This may 'force' you to have a separate table for revisions.

If not using a separate table, consider this design:

Have a column rev for revision number: 0 for "current"; 1...N for old versions (1 is oldest). Then it takes a couple of statements to move 0 to N+1 and insert a new 0. The Fetches are pretty obvious. Include an composite index (the PRIMARY KEY?) in this order: (rev, item_id).

Related Question