Say I have two tables:
CREATE TABLE datum
(
datum_id SERIAL PRIMARY KEY,
datum_text TEXT NOT NULL
);
CREATE TABLE datum_pair
(
datum_id1 INTEGER NOT NULL REFERENCES datum (datum_id),
datum_id2 INTEGER NOT NULL REFERENCES datum (datum_id)
);
datum_pair
records related pairs of rows in datum
. What I want to do is form identify groups that are all related, including relationships that pass through another row. For example, if the IDs 1 and 3 are related, the IDs 3 and 5 are related, and the IDs 5 and 7 are related, then 1, 3, 5, and 7 would all be in the same group. The number of rows participating in one of these chains is arbitrary.
I will need to perform various aggregations grouped by each chain. I'm not set on how the output should look, but one idea I have is to associate each ID of a chain with a single ID. For example, if 1, 3, 5, and 7 are all in the same chain, then this output would work:
datum_id | min_id_in_chain
----------+-----------------
1 | 1
3 | 1
5 | 1
7 | 1
Solutions that offer a different output format but still allow for aggregation by chain are welcome.
There are two additional complications:
datum_pair
always contains relationships in both directions. So if there is a row for IDs 1 and 3, there is also a row for 3 and 1. (In my use case, this is actually the result of a selfJOIN
, but I don't think this is especially relevant to the question.) Eachdatum
should only appear once in the output.- Not all rows in
datum
participate in a chain; many are solitary. But the final result still needs to include them. I would expect them to be identified as a chain containing only that row, so that aggregation would not combine these together.
I've created a sample SQL Fiddle containing 100 rows. To simplify testing the output, this sample relates all the odd numbered IDs and all the even numbered IDs except for 1 and 99. 1 and 99 are not related to any of the other rows.
Using PostgreSQL 9.3. (I would appreciate mentioning any simplifications provided by features of 9.4.)
Best Answer
That's a complicated puzzle. I've found a solution that starts with a list of all identifiers. Then for each identifier, it creates an array of identifiers that it is paired with. It iteratively expands that array (called "members") until all pairs are included.
Example at SQL Fiddle. Hopefully someone else can contribute a simpler, more elegant, or better performing solution!