PostgreSQL – Postgres JSON field for many-to-many relationships

database-design, postgresql

I'm designing a schema that has a many-to-many relationship as such:

A lead can have several stages and each stage can have several leads. I currently have three tables, one of which is a junction table.

stages
----------
id | name | workflow_id | created_at | updated_at

leads
----------
id | name | created_at | updated_at

stage_leads
--------------
id | stage_id | lead_id | created_at | updated_at

(stages belong to a workflow)

I've got indices set up.
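For concreteness, the schema above might look like this in DDL. This is a sketch reconstructed from the column lists in the question; the types, defaults, and index definitions are assumptions, not the asker's actual code:

```sql
-- Hypothetical DDL matching the tables sketched above
CREATE TABLE stages (
    id          bigserial PRIMARY KEY,
    name        text NOT NULL,
    workflow_id bigint NOT NULL,          -- stages belong to a workflow
    created_at  timestamptz NOT NULL DEFAULT now(),
    updated_at  timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE leads (
    id         bigserial PRIMARY KEY,
    name       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE stage_leads (
    id         bigserial PRIMARY KEY,
    stage_id   bigint NOT NULL REFERENCES stages (id),
    lead_id    bigint NOT NULL REFERENCES leads (id),
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

-- One index per lookup direction
CREATE INDEX ON stage_leads (stage_id);
CREATE INDEX ON stage_leads (lead_id);
```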

**

When a lead moves from one stage to another, a new entry is created in the stage_leads table. My use cases seem simple:

a) find/count all leads on a particular stage, and

b) find/count all stages a particular lead has gone through
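Both use cases are single-join queries against the junction table. Assuming the schema above (the literal ids 42 and 7 are placeholders):

```sql
-- (a) count all leads that have been on stage 42
SELECT count(DISTINCT lead_id)
FROM stage_leads
WHERE stage_id = 42;

-- (b) list all stages lead 7 has gone through, in order
SELECT s.id, s.name, sl.created_at AS entered_at
FROM stage_leads sl
JOIN stages s ON s.id = sl.stage_id
WHERE sl.lead_id = 7
ORDER BY sl.created_at;
```

With the two indexes on `stage_leads`, each query is an index scan plus a join on primary keys.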

My concern is that every move creates a new entry in the junction table, and there are potentially hundreds of stages a lead can go through, so I may end up with hundreds of thousands of rows for very few leads, which could impact lookup time when I'm trying to fetch a specific set.

My question is this: would it be better to use Postgres' JSON functionality to store my data something like this:

stages
----------
id | name | workflow_id | created_at | updated_at

leads
----------
id | name | current_stage_id | stages | created_at | updated_at

… where the stages column on "leads" is a JSON object that logs the stage_ids that the lead has gone through, the timestamps, and a couple of other fields?

This seems to satisfy my use case of finding all leads on a particular stage (I can run a query against the current_stage_id of all leads) and it probably makes retrieving the stages for a particular lead easier. I'm not very confident about it though. I realize I'd be giving up on things like constraints, etc.
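For comparison, here is roughly what those lookups would look like under the JSON design. The layout of the `stages` column is an assumption (an array of objects with a `stage_id` key), since the question only describes it loosely:

```sql
-- Assumed jsonb layout:
--   stages = '[{"stage_id": 3, "entered_at": "2024-01-01T00:00:00Z"}, ...]'

-- Leads currently on stage 42: a plain b-tree lookup
SELECT id FROM leads WHERE current_stage_id = 42;

-- Leads that have EVER passed through stage 42: a jsonb containment query
SELECT id FROM leads WHERE stages @> '[{"stage_id": 42}]';

-- ...which only uses an index if you create a GIN index for it
CREATE INDEX ON leads USING gin (stages jsonb_path_ops);
```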

Are there any downsides to an approach like this?

Best Answer

would it be better to use Postgres' JSON functionality to store my data something like this:

No.

Are there any downsides to an approach like this?

GIN / GiST indexes on jsonb are bigger and slower than b-tree indexes. Additionally, whenever you update the document the whole jsonb value must be rewritten; you can't insert an extra element into the JSON document in place.
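To illustrate the update cost: even a one-element append rewrites the entire jsonb value (and the row that holds it), so a lead with hundreds of stage entries pays for all of them on every move. A sketch, using the same assumed array-of-objects layout as above:

```sql
-- Record that lead 7 entered stage 42 under the jsonb design.
-- The || operator builds a NEW array; the whole column value is rewritten.
UPDATE leads
SET stages = stages
             || '[{"stage_id": 42, "entered_at": "2024-01-01T00:00:00Z"}]'::jsonb,
    current_stage_id = 42
WHERE id = 7;
```

Under the junction-table design the same event is a single small `INSERT INTO stage_leads ...`, no matter how long the lead's history is.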

My concern is that since every move is creating a new entry into the junction table -- and there are potentially hundreds of stages a lead can go through -- it may lead to hundreds of thousands of rows for very few leads and that could probably impact the lookup time when I'm trying to fetch a specific set.

This is exactly what RDBMSes are good at. Hundreds of thousands of rows in a narrow, well-indexed junction table is a small workload for PostgreSQL.

Use the conventional relational modelling for this unless you have a really good reason not to.