MySQL – How to Store Text Differences Efficiently


I want to store text differences in MySQL. Let's say I have a text, 500 characters or so, and then someone edits the text. I want to be able to reconstruct both the old and new text.

From a database viewpoint, I thought it would be stupid to store both texts if I could just store the difference. That would save a lot of space, especially if the edits are minute and the texts huge. But how do I do that?

I had some ideas which were really complex, such as storing only the old text, and when an edit occurs, replace the changed parts with a certain variable which then refers to another table. That other table could simply store the new and old value of that particular section of the text.

Is there a simple way to do this?

Best Answer

If you store the old text plus the differences, you always have to apply every diff just to get the current text. So I suggest storing the current text plus reverse diffs (the changes needed to go back one version) to reconstruct previous versions. Keep a table for those diffs, something like (text_id, version_number, diff).
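A rough sketch of the reverse-diff idea, in Python using only the standard library. The function names, the JSON delta format, and the two dictionaries standing in for the MySQL tables are all assumptions made for illustration; in practice the `diffs` mapping would be the (text_id, version_number, diff) table.

```python
import difflib
import json

def make_reverse_diff(old, new):
    """Return a JSON delta that rebuilds `old` when applied to `new`."""
    ops = []
    # Opcodes describe how to turn `new` (a) into `old` (b).
    matcher = difflib.SequenceMatcher(None, new, old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])     # reuse new[i1:i2] unchanged
        elif tag in ("replace", "insert"):
            ops.append(["add", old[j1:j2]])  # text that only the old version has
        # "delete": new[i1:i2] does not exist in old, so nothing is stored
    return json.dumps(ops)

def apply_reverse_diff(new, diff):
    """Rebuild the previous version from the newer text and its delta."""
    parts = []
    for op in json.loads(diff):
        if op[0] == "copy":
            parts.append(new[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)

# Hypothetical in-memory stand-ins for the two tables.
current = {}  # text_id -> (version_number, current text)
diffs = {}    # (text_id, version_number) -> delta that rebuilds that version

def save_edit(text_id, new_text):
    version, old_text = current.get(text_id, (0, ""))
    if version:
        # Store how to get back to the version being replaced.
        diffs[(text_id, version)] = make_reverse_diff(old_text, new_text)
    current[text_id] = (version + 1, new_text)

def load_version(text_id, wanted):
    version, text = current[text_id]
    while version > wanted:
        version -= 1
        text = apply_reverse_diff(text, diffs[(text_id, version)])
    return text
```

With this layout, reading the current text costs nothing extra, and only a request for an older version walks back through the stored diffs.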

For the differences you can certainly use a standard diff tool. And if you want to save space, add some compression (either on the MySQL side or on the application side).
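As an illustration of the application-side option, here is a small Python sketch that produces a unified-format reverse diff with `difflib` and compresses it with `zlib` before it would be written to the diff column. The helper names are made up for this example. On the MySQL side, the built-in `COMPRESS()` and `UNCOMPRESS()` functions could play the same role, with the result stored in a binary column such as a BLOB.

```python
import difflib
import zlib

def unified_reverse_diff(old, new):
    """Unified diff that describes how to turn `new` back into `old`."""
    lines = difflib.unified_diff(
        new.splitlines(keepends=True),
        old.splitlines(keepends=True),
        fromfile="new",
        tofile="old",
    )
    return "".join(lines)

def pack(diff_text):
    """Compress a diff for storage in a binary column."""
    return zlib.compress(diff_text.encode("utf-8"), 9)

def unpack(blob):
    """Restore the diff text from its compressed form."""
    return zlib.decompress(blob).decode("utf-8")

old = "The quick brown fox\njumps over the lazy dog.\n"
new = "The quick brown fox\njumps over the sleepy dog.\n"

diff_text = unified_reverse_diff(old, new)
blob = pack(diff_text)
```

Note that for very short diffs the compressed form may not be smaller than the raw text, so it is worth measuring on real data before committing to it.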

Do you need to search in those texts? If so, only in the current version or in all versions? Searching all versions would be harder to do efficiently.
