Postgresql – Recursively find the path to all leaves descending from a given parent in a tree

graphhierarchypostgresqlrecursivetree

I'm trying to write a query to identify the path to each furthest descendant of a particular parent node in a table of tree structures like:

    0     1
    |     |
    2     3
    |     |
    4     5
   / \    |
 *6*  8  *7*
      |
     *9*

There are many parents, all children have one parent, parents have 0-5 children (but the graph is very "long" – most parents only have one child). There are no cycles.

I'm trying to efficiently identify the path to the furthest descendants of a specific node (and not to any intermediate nodes). E.g. in the above:

  • get_leaf_paths(1) would return 1 row: {1, 3, 5, 7}
  • get_leaf_paths(2) would return 2 rows: {2, 4, 6} and {2, 4, 8, 9}

Sample table:

CREATE TABLE graph (
    id bigint PRIMARY KEY,
    parent_id bigint,
    FOREIGN KEY (parent_id) REFERENCES graph(id)
);
INSERT INTO graph (id, parent_id)
    VALUES (0, NULL),
           (1, NULL),
           (2, 0),
           (3, 1),
           (4, 2),
           (5, 3),
           (6, 4),
           (7, 5),
           (8, 4),
           (9, 8);

I'm hoping for a result that looks something like:

SELECT get_leaf_paths.* FROM get_leaf_paths(0);
path
-----
{0, 2, 4, 6}
{0, 2, 4, 8, 9}
(2 rows)

In my initial attempt at a function with a recursive query, I've had trouble selecting only the furthest leaves, especially since some branches are shorter than others (e.g. 6 and 9 above are at different depths). The paths can be very deep (thousands or millions of elements), so I would also like to avoid the excessive memory usage of generating paths for every intermediate node.

Any ideas would be greatly appreciated. Thanks!

Best Answer

WITH RECURSIVE
cte AS ( SELECT id, 
                parent_id, 
                id::TEXT path,
                NOT EXISTS ( SELECT NULL
                             FROM graph gr
                             WHERE graph.id = gr.parent_id ) is_leaf
                FROM graph 
                WHERE id = 2 /* initital node id */
         UNION ALL
         SELECT graph.id, 
                graph.parent_id, 
                cte.path || ',' || graph.id,
                NOT EXISTS ( SELECT NULL
                             FROM graph gr
                             WHERE graph.id = gr.parent_id ) 
         FROM cte JOIN graph ON cte.id = graph.parent_id)
SELECT path 
FROM cte
WHERE is_leaf

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=2e40ff454f302033bf5e8cba8b0b0d85

Multiple initial nodes may be applied too (WHERE id IN (0, 1)).