PostgreSQL – How to Link Rows in Many-to-Many Relationship with Hierarchical Types

database-designpostgresqlrecursive

In PostgreSQL 10.4, I have a model/table category with a Foreign Key to itself. The hierarchy can extend to any number of levels: parent, grandparent etc.

I have a model/table product. Each product can have multiple categories and each category can have multiple products.

category table:

id | name  | parent_id
---+-------+-----------
1    name1   null
2    name2   1
3    name3   2 
4    name4   1
5    name5   4
6    name6   null

cat_prod relation table:

id | name | product_id | category_id
---+------+------------+-------------
1    name1  1            1
2    name1  1            2
3    name2  2            1
4    name3  3            1
5    name4  4            2
6    name5  5            2
7    name6  6            3

product table:

id | name 
---+-------
1    name1   
2    name2   
3    name3   
4    name4   
5    name5   

If I want to get all products that 'are' in category with id 1 (including subcategories) I need to get products from 1 and 2, 3, 4, 5 because they are children, grand_children of 1.

If I link/add a product to a category (ex:5), what is the best approach?

  1. Every time I add a product to a subcategory add it also to the
    parent, grandparent etc

    Ex- add it also in 4 and 1 – this adds a lot of rows in database for the product category relation, but needs one query to retrieve products.

  2. Add link just to 5 and use recursive

    Just one row in database, but many calls to a db at retrieval.

  3. Another approach

    I need also to take in consideration that a Product can be in multiple categories.

Best Answer

Option 2.

Only save a row for the assigned category, no extra rows for category parents. Don't store redundant information in your tables. That's bad for general performance and for maintenance. (Also don't save the name of the product in table prod_cat redundantly)

A recursive query on category is cheap and simple. You might encapsulate it in a function for convenience:

CREATE OR REPLACE FUNCTION f_cats_of_id(_id int)
  RETURNS TABLE (category_id int) AS
$func$
WITH RECURSIVE cte AS (
   SELECT $1 AS parent_id  -- not checking existence, cheapest

   UNION ALL
   SELECT c.id
   FROM   cte
   JOIN   category c USING (parent_id)
   )
TABLE cte
$func$  LANGUAGE SQL;

And this query to ...

... get all products that 'are' in category with id 1 (including subcategories)

SELECT DISTINCT p.*  -- assuming you want all columns of table product.
FROM   f_cats_of_id(1)
JOIN   cat_prod cp USING (category_id)
JOIN   product p ON p.id = cp.product_id;

DISTINCT because ...

... a Product can be in multiple categories

The same product might be listed more than once, via category and sub-category. DISTINCT folds duplicates.

db<>fiddle here