Postgresql – Get all descendants from 3 hierarchical tables

hierarchypostgresql

I have the following tables within a database:

+-------------+   +-------------+-------------+   +---------------------------+
|  project1   |   |          project2         +   |         project3          |
+-------------+   +-------------+-------------+   +---------------------------+
| project1_id |   | project2_id | project1_id |   | project3_id | project2_id |
+-------------+   +-------------+-------------+   +---------------------------+
|    hnc1     |   |    hnc4     |     hnc1    |   |     hnc7    |    hnc4     |
|    hnc2     |   |    hnc5     |     hnc1    |   |     hnc8    |    hnc4     |
|    hnc3     |   |    hnc6     |     hnc3    |   |     hnc9    |    hnc6     |
+-------------+   +-------------+-------------+   +---------------------------+

I need to retrieve all descendants of a given ID from all three tables – including the ID itself if it is found in any table.

For example, if I query using the ID 'hnc4' it should get:

+-------------+
| descendants |
+-------------+
|    hnc7     |
|    hnc8     |
|    hnc4     |
+-------------+

If I query using the ID 'hnc1':

+-------------+
| descendants |
+-------------+
|    hnc4     |
|    hnc5     |
|    hnc7     |
|    hnc8     |
|    hnc1     |
+-------------+

The query must work even if the ID is only in the last table project3.

I was searching and found that a recursive CTE could be useful but I don't know how to make it with 3 tables involved.

Any guidance on how to get with the right results?

Best Answer

Identifiers cannot be parameterized, so I can't think of a way to traverse multiple tables with an rCTE.

Assuming UNIQUE or PRIMARY KEY constraints on each table, so we don't have to worry about duplicates.

I wrapped the query in a SQL function to simplify passing the ID:

CREATE OR REPLACE FUNCTION f_all_descendants_of(_id text)
  RETURNS TABLE (descendants text) AS
$func$
   WITH p2 AS (
      SELECT project2_id
      FROM   project2
      WHERE  project1_id = $1
         OR  project2_id = $1
      )

   SELECT project3_id AS descendants
   FROM   project3
   WHERE  project2_id IN (TABLE p2)
      OR  project3_id = $1

   UNION ALL TABLE p2

   UNION ALL
   SELECT project1_id
   FROM   project1
   WHERE  project1_id = $1;
$func$  LANGUAGE sql;

The SELECT on project2 goes into CTE p2, so we can reuse the result for the SELECT on project3. The same is not needed for project1 and project3.

Call:

SELECT * FROM f_all_descendants_of('hnc4');

Produces your desired result exactly.

SQL Fiddle.

If the tables are big, be sure to have the right indexes. Related: