Postgresql – Group by array overlapping

I have a PostgreSQL table with id and clusters like this:

CREATE TABLE w (id bigint, clst int);
INSERT INTO w (id,clst)
VALUES 
  (1,0),
  (1,4),
  (2,1),
  (2,2),
  (2,3),
  (3,2),
  (4,2),
  (5,4),
  (6,5);

If you aggregate clusters grouped by id, you can see that there are overlapping values in the cluster arrays:

select id, array_agg(clst) clst from w group by id order by id;
 id |  clst
----+---------
  1 | {0,4}
  2 | {1,2,3}
  3 | {2}
  4 | {2}
  5 | {4}
  6 | {5}

i.e. cluster 4 covers id 1 and 5, cluster 2 covers id 2, 3 and 4, whereas cluster 5 corresponds only to one id.

How can I now aggregate ids grouped by cluster arrays overlapping?
i.e. the expected result is:

 id      | clst
---------+-------
 {1,5}   | {0,4,4}
 {2,3,4} | {1,2,3,2,2}
 {6}     | {5}

I don't care much about the cluster column just need ids properly aggregated.

There is no restriction on number of possible overlappings. Number of clusters per id is not restricted either (it can be hundreds or even more). Clusters are assined to ids not sequenctially.

There are millions of rows in the table!!!

Using PostgreSQL 11.

with recursive a as ( select id, array_agg(distinct clst) clst from w group by id) , t(id,pid,clst) as ( select id,id,clst from a union all select t.id,a.id,t.clst|a.clst from t join a on a.id<>t.pid and t.clst&&a.clst and not t.clst@>a.clst) , d as ( select distinct on(id) id, clst from t order by id, cardinality(clst) desc) select array_agg(id), clst from d group by clst;

Postgresql – Group by array overlapping

Best Answer

Related Question

Best Answer

Related Solutions

Postgresql – Can this array group count query be improved

PostgreSQL return second ‘Group’ as an array

Related Question