MySQL – How to Keep Updates (Diffs) of an Entity in the Database

MySQLpostgresqlrelational-theory

What is the best way to keep updates (diffs) of the some entity in the database? At StackOverflow we can edit questions and answers. And then we can look at any revision of the question or answer we want. For example: revisions of some random question. Maybe someone knows how it realized in StackOverflow?

To be clear in my case I have some entity (article) with some fields (name, description, content). Many users can edit the same article. I want to keep history of the article updates (something like version control) and I want to keep only diffs, not the whole content of the updated article. By the way I use PostgreSQL, but can migrate to any other database.

Here is some requirements. You don't need to fully satisfy them. But if you do it will be much better. Nevertheless any answer is much appreciated. So I want to have ability:

  1. to keep only diffs to not waste my space for no purpose.
  2. to fetch any revision (version) of some article. But fetching last revision of the article must be really quick. Fetching speed of other revisions is not so important.
  3. to fetch any diff (and list of diffs) of some article. Article can have changes in fields: header, description or content (like StackOverflow have changes in header and content), so it must be taken into account.

Best Answer

Check out Drupal and RDF. Check out what StackExhange (etc) uses. And study how GIT works.

I recommend keeping the current value in the primary table, and put the history in other table(s). That is, plan for efficient access to the current value, at the expense of history.

If you are talking about "articles" of, say, 1K characters, then compress them. Compress old copies; optionally compress the 'current' copy. But do the compression in the client, so as to offload the database server. This will save space. If you are talking about ancillary data (dates, names, etc), don't bother compressing.

Do not save the entire set each time there is a change. That is, if you change the date, don't store another, identical, copy of the article body. Too bulky.

You mentioned diffs -- Do you intend to do diffs on paragraphs? If so, do you have a good algorithms for such? (My previous comments do not assume "diff".)

Oh, well, I guess I am not adding much to your stated requirements.