Postgresql – Grouping by sequential relationships with Postgres

postgresqlpostgresql-9.3

This is quite similar to this question about continuous ranges, and this question about grouping by sequential numbers, but differs in that the sequences are not numeric. Given the following relationships as key pairs

a -- b -- c      e -- f -- g
     |   /
     |  /
      d

This is the table with example data (also on SQLFiddle):

CREATE TABLE relationships (
   name varchar(1),
   related varchar(1)
);

INSERT INTO relationships (name, related) VALUES 
('a', 'a'),
('a', 'b'),
('b', 'b'),
('b', 'a'),
('b', 'c'),
('b', 'd'),
('c', 'c'),
('c', 'b'),
('c', 'd'),
('d', 'd'),
('d', 'c'),
('d', 'b'),

('e', 'e'),
('e', 'f'),
('f', 'f'),
('f', 'e'),
('f', 'g'),
('g', 'g');

What is the most efficient way to produce an output that looks like this:

| group |    members    |
------------------------|
|   1   |  {a, b, c, d} |
|   2   |  {e, f, g}    |

or this:

| name |  group |
-----------------
|  a   |   1    |
|  b   |   1    |
|  c   |   1    |
|  d   |   1    |
|  e   |   2    |
|  f   |   2    |
|  g   |   2    |

I have thought about doing this operation outside of Postgres, but it seems like there must be a way to achieve this result with either a window function or PL/pgSQL.

Best Answer

Ugly, but it works!

First, define array_agg_mult from this question

CREATE AGGREGATE array_agg_mult (anyarray)  (
    SFUNC     = array_cat
   ,STYPE     = anyarray
   ,INITCOND  = '{}'
);

and then run the query

WITH summary AS (
  SELECT name, array_agg(related) AS touches
  FROM relationships
  GROUP BY name
), 

grouped AS (
  SELECT name, (
    SELECT array_agg(uniques) FROM (
      select distinct unnest(array_agg_mult(sub.touches)) AS uniques 
      ORDER BY uniques
    ) x
  ) my_group

  FROM summary LEFT JOIN LATERAL (
    SELECT touches
    FROM summary r
    WHERE summary.touches && r.touches
    GROUP BY name, touches
  ) sub ON true
  GROUP BY summary.name
  ORDER BY summary.name
) 

SELECT DISTINCT my_group, row_number() over() as group_id 
FROM grouped 
GROUP BY my_group;

Which produces the following:

| my_group  |  group_id |
| {a,b,c,d} |      2    |
| {e,f,g}   |      1    |

SQLFiddle here - http://sqlfiddle.com/#!15/c8a5b/20. I'm a novice with this kind of query, so please let me know if there is a more efficient way to do this!