Postgresql – self-joins in recursive CTE: `recursive reference must not appear more than once`

postgresqlpostgresql-9.6recursiveself-join

This is tested with PostgreSQL 9.6, but it's a general SQL question.

Is it possible to use self-joins of the recursive table in a recursive CTE (rCTE)?

I tried the following rCTE containing a self-join of the recursive term x,

WITH RECURSIVE
x (id) AS (
  SELECT 1 id UNION ALL SELECT x1.id+x2.id FROM x x1, x x2
  WHERE x1.id < 5 AND x2.id < 5 AND x1.id = x2.id
  )
SELECT * FROM x;

, which hopefully should be equivalent to:

WITH RECURSIVE
x (id) AS (
  SELECT 1 id UNION ALL SELECT id+id FROM x WHERE id < 5
  )
SELECT * FROM x;

But the former rCTE generates an error:

ERROR:  recursive reference to query "x" must not appear more than once 
LINE 3:   SELECT 1 id UNION ALL SELECT x1.id+x2.id FROM x x1, x x2

Is there a fundamental reason why the recursive reference must not appear more than once? Or is this just a limitation of the PostgreSQL implementation?

Also, is there a work-around?

Best Answer

Indeed error like recursive reference to query "x" must not appear more than once is some strange restriction applied in postgres. And I made assumption it is because their parser just simple distinguish recursive and non-recursive part of query by present of that table. Meantime for that present nice workaround - you may use nested CTE (WITH statement), and give another name for such table. For your initial example it will look like:

WITH RECURSIVE
x (id) AS (
  SELECT 1 id
  UNION ALL
  SELECT * FROM (
    WITH x_inner AS ( -- Workaround of error: recursive reference to query "x" must not appear more than once
      SELECT * FROM x
    )
    SELECT x1.id+x2.id
    FROM x_inner x1, x_inner x2
    WHERE x1.id < 5 AND x2.id < 5 AND x1.id = x2.id
  )t
)
SELECT * FROM x;

You could try it in SQL fiddle.

Get qualifying rows only

One way ..

WITH x AS (
   SELECT *
         ,row_number() OVER (PARTITION BY reference ORDER BY date_out DESC) AS rn
   FROM   tbl
   )
, y AS (
   SELECT *
   FROM   x
   WHERE  x.rn = 1
   AND    date_out >= '2012-10-01'
   AND    date_out <  '2012-11-01'
   )
, z AS (
   SELECT x.*
   FROM   x
   JOIN   y USING (reference)
   WHERE  x.rn = 2
   AND    x.centreid = 1
   )
SELECT y.taskid, y.reference, y.centreid, y.date_out
FROM   y
JOIN   z USING (reference)

UNION  ALL
SELECT taskid, reference, centreid, date_out
FROM   z
ORDER  BY reference, date_out;

Another way:

WITH x AS (
   SELECT *
         ,row_number() OVER (PARTITION BY reference ORDER BY date_out DESC) AS rn
   FROM   tbl
   )
,y AS (
    SELECT x.*, y.taskid AS taskid2, y.centreid AS centreid2, y.date_out AS date_out2
    FROM   x
    JOIN   x y USING (reference)
    WHERE  x.rn = 1
    AND    x.date_out >= '2012-10-01'
    AND    x.date_out <  '2012-11-01'
    AND    y.rn = 2
    AND    y.centreid = 1
    )
SELECT y.taskid, y.reference, y.centreid, y.date_out
FROM   y

UNION  ALL
SELECT y.taskid2, y.reference, y.centreid2, y.date_out2
FROM   y
ORDER  BY reference, date_out;

I'd expect the second one to be faster. Depends on your data distribution. Test with EXPLAIN ANALYZE.

Get all rows for qualifying references

WITH x AS (
   SELECT *
         ,row_number() OVER (PARTITION BY reference ORDER BY date_out DESC) AS rn
   FROM   tbl
   )
,y AS (
    SELECT reference
    FROM   x
    JOIN   x y USING (reference)
    WHERE  x.rn = 1
    AND    x.date_out >= '2012-10-01'
    AND    x.date_out <  '2012-11-01'
    AND    y.rn = 2
    AND    y.centreid = 1
    )
SELECT *
FROM   tbl
JOIN   y USING (reference)
ORDER  BY reference, date_out;

-> sqlfiddle

Answer to follow-up in comment

Separate groups if more than 30 days between entries.

WITH a AS (
   SELECT *
         ,lag(date_out) OVER (PARTITION BY reference ORDER BY date_out DESC) AS last_date
         ,CASE WHEN date_out > 
                   (lag(date_out) OVER (PARTITION BY reference ORDER BY date_out DESC) - 30)
               THEN 0 ELSE 1
          END AS step
   FROM   tbl
   )
,b AS (
   SELECT *
         ,sum(step) OVER (PARTITION BY reference ORDER BY date_out DESC) AS grp
   FROM a
   )
 ,c AS (
   SELECT *
         ,row_number() OVER (PARTITION BY reference, grp ORDER BY date_out DESC) AS rn
   FROM b
   )
,d AS (
   SELECT reference, grp
   FROM   c
   JOIN   c d USING (reference, grp)
   WHERE  c.rn = 1
   AND    c.date_out >= '2012-10-01'
   AND    c.date_out <  '2012-11-01'
   AND    d.rn = 2
   AND    d.centreid = 1
   )
SELECT b.taskid, b.reference, b.centreid, b.date_out
FROM   b
JOIN   d USING (reference, grp)
ORDER  BY reference, date_out

-> sqlfiddle

But while pure SQL is a beaut .. I would solve this procedurally in a plpgsql function. Very similar to this recent answer on SO. Would be faster, because it can be done in a single table scan.

PostgreSQL ERROR: Recursive Reference to Query ‘najdi_tot’ Explained

The problem was caused by multiplication in the recursive calls. When I deleted one of them, the problem was solved:

  WITH RECURSIVE najdi_tot(sj) AS (
    SELECT nad FROM s_vztah WHERE pod=581
    UNION
    SELECT pod FROM s_vztah WHERE nad=581
    UNION
    SELECT sv.nad FROM najdi_tot n, s_vztah sv 
    WHERE sv.pod = n.sj
  )
  SELECT array_agg(sj) FROM najdi_tot

Then I realized that the relation is symmetrical - at the first call I get all the linked rows and recursion only adds the row I had in the beginning. So I removed the recursion at all and everything is OK:

  WITH najdi_tot(sj) AS (
    SELECT nad FROM s_vztah WHERE pod=581
    UNION
    SELECT pod FROM s_vztah WHERE nad=581
  )
  SELECT array_agg(sj) FROM najdi_tot

Best Answer

Related Solutions

Postgresql – Tricky selection of grouped rows, selecting based on values of two distinct but related rows

Get qualifying rows only

Get all rows for qualifying references

Answer to follow-up in comment

PostgreSQL ERROR: Recursive Reference to Query ‘najdi_tot’ Explained

Related Question