Postgresql – Recursivly get a Tree Via self joined table

many-to-manypostgresqlrecursiveself-join

Using other questions here & Postgresql documentation I've managed to build a many-to-many self joined table.

However adding a WHERE clause is giving me trouble.

Problem:

A Category can have many child categories, and many parent categories. Given a category.Id, I want to retrieve the category, the category children, childrens children and so on.

Example: given this structure:

child_1
    child_11
        child_111
        child_112
            child_1121
    child_21
child_2

Given: a clause of id = child_11

Expected results:
child_11, child_111, child_112, child_1121,

Actual results: child_11, child_111, child_112

Here is my attempt: http://sqlfiddle.com/#!17/3640f/2

In case Sqlfiddle is down: https://www.db-fiddle.com/#&togetherjs=LhDjxfPHo6

Note: I don't care about duplicating the where clause, my application can handle that

Table structure:

CREATE TABLE Category(id SERIAL PRIMARY KEY, name VARCHAR(255));
CREATE TABLE Categories(parent_id INTEGER, child_id INTEGER, PRIMARY KEY(parent_id, child_id));

ALTER TABLE Categories ADD FOREIGN KEY (parent_id) REFERENCES category (id);
ALTER TABLE Categories ADD FOREIGN KEY (child_id) REFERENCES category (id);

Table data:

INSERT INTO Category(id, name) VALUES (1, 'parent_1');
INSERT INTO Category(id, name) VALUES (2, 'child_1');
INSERT INTO Category(id, name) VALUES (3, 'child_2');
INSERT INTO Category(id, name) VALUES (4, 'child_3');
INSERT INTO Category(id, name) VALUES (5, 'child_1_1');
INSERT INTO Category(id, name) VALUES (6, 'child_1_2');
INSERT INTO Category(id, name) VALUES (7, 'child_1_1_1');
INSERT INTO Category(id, name) VALUES (10, 'child_of_many');
INSERT INTO Category(id, name) VALUES (11, 'parent_1');
INSERT INTO Category(id, name) VALUES (12, 'parent_2');

INSERT INTO Categories(parent_id, child_id) VALUES (1, 2);
INSERT INTO Categories(parent_id, child_id) VALUES (1, 3);
INSERT INTO Categories(parent_id, child_id) VALUES (1, 4);
INSERT INTO Categories(parent_id, child_id) VALUES (2, 5);
INSERT INTO Categories(parent_id, child_id) VALUES (2, 6);
INSERT INTO Categories(parent_id, child_id) VALUES (5, 7);

My query which is giving me the children, but not the childrens children etc.
If I remove the WHERE clause's I can get all rows:

WITH RECURSIVE categories_category AS (
  SELECT id, 'Category' AS COLUMN_TYPE, c1.name 
  FROM Category c1
  WHERE c1.id=2
UNION
  SELECT c2.id, 'Category' AS COLUMN_TYPE, c2.name
   FROM Category c1
   INNER JOIN categories cs1 ON c1.id = cs1.parent_id
   INNER JOIN Category c2 ON c2.id = cs1.child_id
   WHERE cs1.parent_id = 2
) SELECT * FROM categories_category

Edit: More Detailed example:

Given the following category row's, I'd like to be able to run a query given a WHERE clause the matches the id of warmStoreAlcohol and get a result of:

+---+------------------+
|id |name              |
+---+------------------+
|2  |warmStoredAlcohol |
|3  |vodka             |
|4  |beer              |
|5  |frozenBeer        |
+---+------------------+

coldStoredAlcohol would give a result of:

+---+------------------+
|id |name              |
+---+------------------+
|6  |coldStoredAlcohol |
|5  |frozenBeer        |
|7  |cooler            |
+---+------------------+

The database structure will not change often. In this example 'frozenBeer' has two parents, and should return for querying both warmStoredAlcohol and coldStoredAlcohol.

I am open to changing the table structure, adding new tables and even upgrading postgres version etc. The database will hold ~2,000 rows, therefore I value an easy-to-understand table structure over a super complicated optimal one. (But any solution is better than my broken one)
enter image description here

Best Answer

Recursive CTEs are notoriously hard to learn. Take a look at this fiddle to play with a working model.

The 2nd query in the CTE, or Common Table Expression, needs to reference the first query. This is where the "recursive" name comes from in CTE.

My fiddle has a single table, where each cat can only have a single parent, which is arguably a more common tree scenario. This makes it slightly easier to understand, in my opinion.

Anyway, the DDL consists of:

CREATE TABLE cats
(
    cat_id SERIAL
      PRIMARY KEY
  , cat_name VARCHAR(20) NOT NULL
  , parent_cat_id int NULL
);

ALTER TABLE cats ADD FOREIGN KEY (parent_cat_id) REFERENCES cats (cat_id);

INSERT INTO cats (cat_name, parent_cat_id)
VALUES ('garfield', NULL);

INSERT INTO cats (cat_name, parent_cat_id)
VALUES ('bertina', 1);

INSERT INTO cats (cat_name, parent_cat_id)
VALUES ('oletha', 2);

And the recursive CTE looks like:

;WITH RECURSIVE cats_hierarchy AS
(
  SELECT c1.cat_id
      , c1.cat_name
      , cast('' as varchar(20)) as parent_cat_name
      , c1.parent_cat_id
      , 1 as level
  FROM cats c1
  WHERE c1.parent_cat_id IS NULL

  UNION ALL

  SELECT c2.cat_id
      , c2.cat_name
      , c1.cat_name AS parent_cat_name
      , c2.parent_cat_id
      , c1.level + 1 AS level
  FROM cats c2
      INNER JOIN cats_hierarchy c1 ON c2.parent_cat_id = c1.cat_id
)
SELECT *
FROM cats_hierarchy;

Results look like:

╔═════════╦═══════════╦══════════════════╦════════════════╦═══════╗
║ cat_id  ║ cat_name  ║ parent_cat_name  ║ parent_cat_id  ║ level ║
╠═════════╬═══════════╬══════════════════╬════════════════╬═══════╣
║      1  ║ garfield  ║                  ║ (null)         ║     1 ║
║      2  ║ bertina   ║ garfield         ║ 1              ║     2 ║
║      3  ║ oletha    ║ bertina          ║ 2              ║     3 ║
╚═════════╩═══════════╩══════════════════╩════════════════╩═══════╝

In the CTE the first query (before the UNION ALL) shows cats that are at the root. The 2nd part of the CTE (after the UNION ALL) gets cats related to the parent cat. Those returned results are then queried again by the CTE, generating a tree.

Let me know if that helps, or if you need further clarification.