PostgreSQL – Using a JSONB Column or Another Table to Save Relationships

Tags: database-design, json, many-to-many, postgresql

I tried to search here thoroughly but did not find any answers.

I have a PostgreSQL database, which has two main tables:

  • documents
  • users

These two tables have different relations. A user can:

  • like
  • bookmark to read later
  • save

… a document.

The question is how should I save these relations?

In my experience with MySQL, the obvious way was to create junction tables for these many-to-many relations, each containing a user_id and a document_id.

But since we are using PostgreSQL, which has excellent JSON support, we are wondering whether the better approach is a single user_document table containing user_id, document_id and a JSONB column holding all the relations.

The JSON would be something like this:

{
   "follow"   : {"date": 1523517140, "doesFollow": true},
   "bookmark" : {"date": null, "doesBookmark": false},
   ....
}
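For context, querying such a JSONB column might look something like this (a sketch only; the column name `relations` and the table layout are assumptions, not part of the question):

```sql
-- Hypothetical query against the JSONB approach:
-- find all documents a given user follows.
SELECT document_id
FROM   user_document
WHERE  user_id = 123
AND    relations @> '{"follow": {"doesFollow": true}}';

-- A GIN index can support the containment operator @>:
CREATE INDEX user_document_relations_idx
    ON user_document USING gin (relations);
```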

I have little to no experience with PostgreSQL, and I don't know how querying JSONB columns performs, or whether this approach makes sense in PostgreSQL at all. But it seems workable, and if there is nothing wrong with it, maybe it is preferable to the first, normalized approach.

Best Answer

Document types like json or jsonb (or xml or hstore) are convenient for storing, well, documents: ideal for data with varying keys that is rarely updated and filtered with not-too-complex criteria in queries. Heavily manipulating big documents inside the database is an anti-pattern.

Structured data that is written often in small increments and probably searched a lot (most likely your case) is much more efficient with a normalized design - regarding storage, performance, concurrent write access and data integrity. Hence the advice in the manual:

JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Although storing large documents is practicable, keep in mind that any update acquires a row-level lock on the whole row. Consider limiting JSON documents to a manageable size in order to decrease lock contention among updating transactions. Ideally, JSON documents should each represent an atomic datum that business rules dictate cannot reasonably be further subdivided into smaller datums that could be modified independently.

Implement your n:m relationship with one or more junction tables. You can enforce data integrity with constraints (PK, FK, UNIQUE, CHECK, ..), none of which is easily possible with document types. I would lean towards a single table unless you have substantially differing requirements for "like", "bookmark", etc.

Example:

If you are unsure about the typical layout, here are the basics:

Assuming that each of your relation types can only be used once per user/document combination. For just a handful of values (3 in your case), I would use a 1-byte "char" column as the PK of a lookup table to optimize storage and performance in tables and indexes.

CREATE TABLE reltype (
   reltype "char" PRIMARY KEY
 , relation_type text UNIQUE NOT NULL
);

INSERT INTO reltype(reltype, relation_type) VALUES
   ('l', 'like')
 , ('b', 'bookmark')
 , ('s', 'save');

CREATE TABLE user_doc (
   user_doc_id int PRIMARY KEY GENERATED ALWAYS AS IDENTITY
 , user_id     int NOT NULL REFERENCES users     ON UPDATE CASCADE ON DELETE CASCADE
 , document_id int NOT NULL REFERENCES documents ON UPDATE CASCADE ON DELETE CASCADE
 , reltype     "char" NOT NULL DEFAULT 'l' REFERENCES reltype
 , CONSTRAINT user_doc_uni UNIQUE (user_id, document_id, reltype)
);
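Typical usage could then look like this (a sketch; the literal ids and the documents.document_id primary key are assumptions):

```sql
-- Record that user 123 bookmarked document 456.
-- ON CONFLICT makes the insert idempotent, relying on the
-- UNIQUE constraint on (user_id, document_id, reltype).
INSERT INTO user_doc (user_id, document_id, reltype)
VALUES (123, 456, 'b')
ON CONFLICT (user_id, document_id, reltype) DO NOTHING;

-- All documents user 123 has liked:
SELECT d.*
FROM   user_doc  u
JOIN   documents d ON d.document_id = u.document_id
WHERE  u.user_id = 123
AND    u.reltype = 'l';
```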

You can always export aggregated data as JSON documents, or even have a VIEW or MATERIALIZED VIEW to read it like a table. But don't manage relationships inside a single big JSON document.
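Such an export could be sketched like this, assuming the tables from the example above; it produces a JSON object per user/document pair, similar in shape to the one proposed in the question:

```sql
-- Aggregate the junction-table rows into one JSONB object
-- per (user_id, document_id), keyed by relation type.
SELECT u.user_id
     , u.document_id
     , jsonb_object_agg(r.relation_type, true) AS relations
FROM   user_doc u
JOIN   reltype  r USING (reltype)
GROUP  BY u.user_id, u.document_id;
```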