Postgresql – Removing duplicate-ish paths generated from recursive CTE

arrayduplicationpostgresqlrecursive

I have some multi-parented data, that I run a recursive CTE against to find common ancestors:

item_id |item_name |path_name    |id_path_array |
--------|----------|-------------|--------------|
1       |D         |B->Two->D    |{9,4,1}       |
2       |E         |B->Two->E    |{9,4,2}       |
13      |W         |B->Two->D->W |{9,4,1,13}    |
14      |Y         |B->Two->D->Y |{9,4,1,14}    |
15      |X         |B->Two->E->X |{9,4,2,15}    |
16      |Z         |B->Two->E->Z |{9,4,2,16}    |

However, what I really want here is just the first two rows; I want to remove the rows that represent paths that have already been covered by a shorter path. e.g. obviously if 'D' is a common ancestor then all of D's ancestors are too, so I don't want to include the paths for them.

I did think I had something with an array operator:

select * from common_ancestors q1
where exists (
    select 1 from common_ancestors q2
    where q1.id_path_array <@ q2.id_path_array and q1.item_id <> q2.item_id)

but even though it gives the right result here, I'm not convinced it's correct. I would be happy to hear opinions or alternative solutions .

Desired result:

item_id | item_name | path_name | id_path_array
------: | :-------- | :-------- | :------------
      1 | D         | B->Two->D | {9,4,1}      
      2 | E         | B->Two->E | {9,4,2}      

db-fiddle here

Best Answer

You need NOT EXISTS and the contains operator reversed:

select 
    q1.* 
from 
    common_ancestors q1
where not exists
    ( select 1 
      from   common_ancestors q2
      where  q1.id_path_array @> q2.id_path_array 
        and  q1.item_id <> q2.item_id
    ) ;

Test in dbfiddle.uk