Postgresql – Grouping by sequential relationships with Postgres

postgresql, postgresql-9.3

This is quite similar to this question about continuous ranges and this question about grouping by sequential numbers, but differs in that the sequences are not numeric. Given the following relationships as key pairs:

a -- b -- c      e -- f -- g
     |   /
     |  /
      d

This is the table with example data (also on SQLFiddle):

CREATE TABLE relationships (
   name varchar(1),
   related varchar(1)
);

INSERT INTO relationships (name, related) VALUES 
('a', 'a'),
('a', 'b'),
('b', 'b'),
('b', 'a'),
('b', 'c'),
('b', 'd'),
('c', 'c'),
('c', 'b'),
('c', 'd'),
('d', 'd'),
('d', 'c'),
('d', 'b'),

('e', 'e'),
('e', 'f'),
('f', 'f'),
('f', 'e'),
('f', 'g'),
('g', 'g');

What is the most efficient way to produce an output that looks like this:

| group |    members    |
------------------------|
|   1   |  {a, b, c, d} |
|   2   |  {e, f, g}    |

or this:

| name |  group |
-----------------
|  a   |   1    |
|  b   |   1    |
|  c   |   1    |
|  d   |   1    |
|  e   |   2    |
|  f   |   2    |
|  g   |   2    |

I have thought about doing this operation outside of Postgres, but it seems like there must be a way to achieve this result with either a window function or PL/pgSQL.

Best Answer

Ugly, but it works!

First, define array_agg_mult from this question:

CREATE AGGREGATE array_agg_mult (anyarray)  (
    SFUNC     = array_cat
   ,STYPE     = anyarray
   ,INITCOND  = '{}'
);

and then run the query:

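WITH summary AS (
  -- One row per name, with the array of names it is directly related to.
  SELECT name, array_agg(related) AS touches
  FROM relationships
  GROUP BY name
), 

grouped AS (
  -- For each name, merge the touches arrays of every row that overlaps its own,
  -- then de-duplicate and sort the merged members.
  SELECT name, (
    SELECT array_agg(uniques) FROM (
      SELECT DISTINCT unnest(array_agg_mult(sub.touches)) AS uniques 
      ORDER BY uniques
    ) x
  ) my_group

  FROM summary LEFT JOIN LATERAL (
    SELECT touches
    FROM summary r
    WHERE summary.touches && r.touches  -- && is true when the arrays share an element
    GROUP BY name, touches
  ) sub ON true
  GROUP BY summary.name
  ORDER BY summary.name
) 

-- Number each distinct member array.
SELECT DISTINCT my_group, row_number() OVER () AS group_id 
FROM grouped 
GROUP BY my_group;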

Which produces the following (the group_id numbering is arbitrary, because row_number() is used without an ORDER BY):

| my_group  |  group_id |
| {a,b,c,d} |      2    |
| {e,f,g}   |      1    |

SQLFiddle here - http://sqlfiddle.com/#!15/c8a5b/20. I'm a novice with this kind of query, so please let me know if there is a more efficient way to do this!
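
One caveat: the query above only merges rows whose touches arrays overlap directly, so a longer chain (for example a -- b -- c -- d -- e) may not end up in a single group. As a possible alternative, and not the method used in this answer, the groups can be computed as connected components with a recursive CTE. The sketch below assumes the relationships table from the question, treats every pair as undirected, and uses the smallest reachable name as the component label; the names edges, walk and reached are made up for illustration.

```sql
WITH RECURSIVE edges AS (
  -- Treat every pair as undirected: the sample data has ('f','g') but no ('g','f').
  SELECT name, related FROM relationships
  UNION
  SELECT related, name FROM relationships
),
walk (name, reached) AS (
  -- Start with the direct neighbours of each name ...
  SELECT name, related FROM edges
  UNION  -- UNION (not UNION ALL) discards repeated rows, so cycles terminate
  -- ... then keep following edges from whatever has been reached so far.
  SELECT w.name, e.related
  FROM walk w
  JOIN edges e ON e.name = w.reached
)
SELECT name,
       -- The smallest reachable name identifies the component.
       dense_rank() OVER (ORDER BY min(reached)) AS group_id
FROM walk
GROUP BY name
ORDER BY name;
```

On the sample data this should put a, b, c and d in group 1 and e, f and g in group 2. Note that walk materializes every reachable pair, so it can become expensive for very large components.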

Related Solutions

PostgreSQL Top-K with Ties – How to Implement

You can use a window function for this:

select a,b,c
from (
  select a,b,c,
         dense_rank() over (order by a,b) as rnk
  from dbTable
) t
where rnk = 1;

For the "first" rows, it doesn't matter whether you use rank() or dense_rank(). If you want, for example, the "second" ones, rank() and dense_rank() return different results when there are ties, because rank() leaves "gaps" in the numbering and dense_rank() does not.
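
To make the difference concrete, here is a small self-contained comparison. The VALUES rows are made-up sample data, not taken from the original table.

```sql
SELECT a, b,
       rank()       OVER (ORDER BY a, b) AS rnk,
       dense_rank() OVER (ORDER BY a, b) AS dense_rnk
FROM (VALUES (1, 1), (1, 1), (1, 2), (2, 1)) AS t (a, b)
ORDER BY a, b;

--  a | b | rnk | dense_rnk
--  1 | 1 |   1 |         1
--  1 | 1 |   1 |         1
--  1 | 2 |   3 |         2   <- rank() skips 2 after the tie, dense_rank() does not
--  2 | 1 |   4 |         3
```

Filtering on rnk = 2 would return nothing here, while dense_rnk = 2 returns the row (1, 2).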

A possible speedup might be achieved by doing this in two steps and, of course, having an index on (a,b):

with ranked as (
  select distinct a, b  -- one row per (a,b), so ties are not multiplied by the join
  from (
    select a,b,
           dense_rank() over (order by a,b) as rnk
    from dbTable
  ) t
  where t.rnk = 1  -- (or <= for "top-k")
)
select t.a, t.b, t.c
from dbTable t
   join ranked r on r.a = t.a and r.b = t.b;

The idea is to give Postgres the chance to do an index-only scan for the ranking part and then join only that result to the base table to get the remaining column(s). The filtering on the rank is done inside the CTE because Postgres (before version 12, where CTEs are always materialized) doesn't push down conditions from the outer query into the CTE itself; that's why the derived table is inside the CTE.

I'm not sure whether this really improves performance, but it would be worth trying and having a look at the execution plan with the real tables (and data).
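
One way to check is to create the index and inspect the plan with EXPLAIN. This is only a sketch: the index name dbtable_a_b_idx is hypothetical, and dbTable with columns a, b, c is taken from the query above.

```sql
-- Hypothetical index name; the table and column names come from the answer.
CREATE INDEX dbtable_a_b_idx ON dbTable (a, b);

-- ANALYZE actually executes the query and reports real timings;
-- BUFFERS adds page hit/read counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
WITH ranked AS (
  SELECT DISTINCT a, b
  FROM (
    SELECT a, b, dense_rank() OVER (ORDER BY a, b) AS rnk
    FROM dbTable
  ) t
  WHERE t.rnk = 1
)
SELECT t.a, t.b, t.c
FROM dbTable t
JOIN ranked r ON r.a = t.a AND r.b = t.b;
```

If the two-step approach helps, the plan for the ranking part would be expected to show an Index Only Scan node. Whether Postgres picks one also depends on the table having been vacuumed recently, since index-only scans rely on the visibility map.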

Postgresql – Postgres Grouping of Like Results

If I understand you correctly, you want a comma-separated list of triggering_id values for each source_id:

select source_id, 
       string_agg(triggering_id::text,',') as id_list
from notifications
group by source_id;
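
Without an ORDER BY inside the aggregate, the order of the ids in each list is not guaranteed. A minimal sketch with a deterministic order follows; the inline VALUES rows stand in for the notifications table and are made-up sample data.

```sql
SELECT source_id,
       -- ORDER BY inside the aggregate fixes the order of the concatenated ids.
       string_agg(triggering_id::text, ',' ORDER BY triggering_id) AS id_list
FROM (VALUES (1, 10), (1, 7), (2, 3)) AS notifications (source_id, triggering_id)
GROUP BY source_id
ORDER BY source_id;

--  source_id | id_list
--          1 | 7,10
--          2 | 3
```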
