PostgreSQL Recursive Query – How to Break Out of a Recursive Query

postgresqlpostgresql-11

I'm having some trouble getting the correct data out of a tree structure, where I only want the first nodes in a branch matching criteria to be the result of the query.

I have this table of domains:

create table domains(
    id bigint,
    domain_name varchar,
    parent_id bigint,
    valid boolean
)

insert into domains values
    (1, 'example.com', null, false),
    (2, 'a.example.com', 1, true),
    (3, 'b.example.com', 1, false),
    (4, 'c.b.example.com', 3, true),
    (5, 'd.a.example.com', 2, true)

This gives this tree, with the valid domains in green:

domain graph

Based on this, I want a.example.com and c.b.example.com to be returned as valid domains. Had the top level domain example.com been valid, only that should be returned. ("valid" is just a flag determined elsewhere.)

I have a recursive query that fetches the whole domain tree:

WITH RECURSIVE valid_domains AS (
  SELECT id, domain_name, valid FROM domains
  WHERE parent_id IS NULL

  UNION ALL

  SELECT d.id, d.domain_name, d.valid FROM domains d
  JOIN valid_domains vd ON d.parent_id = vd.id
)
SELECT * FROM valid_domains

…but then my stumbling block is figuring out how to filter on valid domains and break that branch when I hit one that's valid, without returning any valid domains below it.

I've looked at window functions, but haven't been able to mesh that with the recursive query thus far. Am I taking the wrong approach by making this recursive in the first place?

Best Answer

WITH RECURSIVE cte AS (
  SELECT id, domain_name, valid 
  FROM domains
  WHERE parent_id IS NULL
UNION ALL
  SELECT domains.id, domains.domain_name, domains.valid 
  FROM domains
  JOIN cte ON domains.parent_id = cte.id
  WHERE NOT cte.valid -- stop recursion when valid node reached
)

SELECT id, domain_name
FROM cte
WHERE valid

fiddle