How to model medium.com paragraph comments

database-design database-recommendation

If you have used medium.com, you know that readers can comment on individual paragraphs. I would like to know how to model such paragraph comments in a database.

Specifically, how are the connections between these comments and the paragraphs modeled?

My initial thought was (1) to use the position of each paragraph in the article, e.g. paragraph 1, 2, … But that won't work if the author deletes, adds, or moves paragraphs, because the order changes.

Then I thought about (2) assigning an id to each paragraph, e.g. using a SHA hash of the paragraph as its unique id. But that won't work if the author edits the paragraph, because the hash changes.

Could anyone help me?

Btw, could the solution extend to comments on sentences or phrases as well?

Best Answer

You can insert a comment marker into the text itself. The marker is not displayed. As the text is edited, the marker stays embedded in it and moves along with the surrounding text. If the paragraph is removed, the comment is orphaned and can be garbage collected later. Of course, this has nothing to do with databases; it is pure text processing.
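To make the marker idea concrete, here is a minimal sketch in Python. The marker syntax `[[c:<id>]]` and the helper names `render`, `anchored_ids`, and `orphaned` are assumptions made up for illustration; any token that cannot collide with normal text would serve the same purpose.

```python
import re

# Hypothetical marker syntax: "[[c:<comment id>]]" embedded in the stored text.
MARKER = re.compile(r"\[\[c:(\w+)\]\]")

def render(text):
    """Return the text as shown to readers, with all markers stripped."""
    return MARKER.sub("", text)

def anchored_ids(text):
    """Return the comment ids whose markers are still present in the text."""
    return set(MARKER.findall(text))

def orphaned(comments, text):
    """Return the comment ids whose marker was deleted along with its text."""
    return set(comments) - anchored_ids(text)

# Comments are stored separately, keyed by the id used in the marker.
comments = {"17": "Nice point", "42": "Source?"}

article = "First paragraph.[[c:17]]\n\nSecond paragraph.[[c:42]]"

# The author adds an intro, rewords the first paragraph, and deletes the second.
edited = "Intro added.\n\nFirst paragraph, reworded.[[c:17]]"

print(render(edited))
print(orphaned(comments, edited))  # candidates for garbage collection
```

Because the marker sits at a position inside the text rather than on a whole paragraph, the same sketch would also cover comments on sentences or phrases, for example by using a start marker and an end marker.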

If you want a DB-centric approach, then paragraphs are assigned primary keys at creation and comments reference this primary key. Being PKs, of course, they never change. The order of paragraphs within the text can be persisted as a previous-current-next relation (implemented as a separate table, or as next_id and prev_id fields). You can also throw in versioning (history). I'm not an expert in editable text storage, so I don't know what the state of the art is.
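A minimal sketch of the DB-centric variant, using Python's built-in `sqlite3` module and the `next_id`/`prev_id` fields option. The table names, the column names other than `next_id` and `prev_id`, and the helpers `insert_after` and `in_order` are assumptions for illustration, and the sketch handles a single article only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paragraph (
    id      INTEGER PRIMARY KEY,               -- surrogate key, never changes
    body    TEXT NOT NULL,
    prev_id INTEGER REFERENCES paragraph(id),  -- NULL for the first paragraph
    next_id INTEGER REFERENCES paragraph(id)   -- NULL for the last paragraph
);
CREATE TABLE comment (
    id           INTEGER PRIMARY KEY,
    paragraph_id INTEGER NOT NULL REFERENCES paragraph(id),
    body         TEXT NOT NULL
);
""")

def insert_after(prev_id, body):
    """Insert a paragraph after prev_id (None means at the head); return its id."""
    if prev_id is None:
        row = conn.execute("SELECT id FROM paragraph WHERE prev_id IS NULL").fetchone()
        next_id = row[0] if row else None
    else:
        next_id = conn.execute(
            "SELECT next_id FROM paragraph WHERE id = ?", (prev_id,)).fetchone()[0]
    new_id = conn.execute(
        "INSERT INTO paragraph (body, prev_id, next_id) VALUES (?, ?, ?)",
        (body, prev_id, next_id)).lastrowid
    # Relink the neighbours so the chain stays consistent.
    if prev_id is not None:
        conn.execute("UPDATE paragraph SET next_id = ? WHERE id = ?", (new_id, prev_id))
    if next_id is not None:
        conn.execute("UPDATE paragraph SET prev_id = ? WHERE id = ?", (new_id, next_id))
    return new_id

def in_order():
    """Walk the chain from the head and return (id, body) pairs in reading order."""
    out = []
    row = conn.execute(
        "SELECT id, body, next_id FROM paragraph WHERE prev_id IS NULL").fetchone()
    while row:
        out.append((row[0], row[1]))
        row = conn.execute(
            "SELECT id, body, next_id FROM paragraph WHERE id = ?", (row[2],)).fetchone()
    return out

p1 = insert_after(None, "Intro")
p2 = insert_after(p1, "Conclusion")
conn.execute("INSERT INTO comment (paragraph_id, body) VALUES (?, ?)", (p2, "Agreed"))

# The author inserts a paragraph in the middle and edits the commented one.
p3 = insert_after(p1, "New middle paragraph")
conn.execute("UPDATE paragraph SET body = ? WHERE id = ?", ("Conclusion, revised", p2))

commented = conn.execute(
    "SELECT p.body FROM comment c JOIN paragraph p ON p.id = c.paragraph_id").fetchone()[0]
print(in_order())
print(commented)
```

The comment keeps pointing at the same row even though the paragraph has moved from second to third position and its text has changed, which is the property that positions and hashes lack.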

Your problem, as you state it, is not a problem of text processing but a problem that arises from the use of a natural primary key (paragraph position or paragraph hash). Of course, we all know that natural keys are bad because they change, and in your example they change very frequently. Using surrogate keys solves the problem of PK volatility. The rest is just a modeling exercise on storing the relations (order of paragraphs in the text, paragraph changes as the text evolves, etc.). I'm leaving as an exercise to you the problem of a paragraph being split during an edit (which shard inherits which comments) and the problem of paragraph merges.
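The volatility argument can be shown in a few lines of Python. This is only an illustration: the dictionary layout and the helper `natural_key` are made up, and a random UUID stands in for whatever surrogate key the database would generate.

```python
import hashlib
import uuid

def natural_key(text):
    """Content hash, as in option (2) of the question; it changes with every edit."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The surrogate key is assigned once at creation and is independent of the content.
paragraph = {"id": str(uuid.uuid4()), "text": "Natural keys are volatile."}
comment = {"paragraph_id": paragraph["id"], "body": "Good point"}

hash_before = natural_key(paragraph["text"])
paragraph["text"] = "Natural keys are volatile, so avoid them here."
hash_after = natural_key(paragraph["text"])

# A comment keyed on the hash would now dangle; one keyed on the id still resolves.
print(hash_before == hash_after)
print(comment["paragraph_id"] == paragraph["id"])
```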