PostgreSQL – Find Groups with Same Members

postgresqlrelational-division

Given the following schema:

User
  id

Group
  id

Membership
  group_id
  user_id

(With a unique index on group_id, user_id)

How would I select the group(s) of which the same two users are a member?

What I have now:

SELECT   "Membership"."group_id" 
FROM     "Membership" 
WHERE    "Membership"."user_id" IN (1, 2) 
GROUP BY "Membership"."group_id" 
HAVING   COUNT("Membership"."group_id") >= 2

Which seems to work, (it only retuns the group ids of which both users are a member), but looks rather confusing.

Is there a better way to express this query?

EDIT

So given the following data:

Users
-----
id
1
2
3

Group
-----
id
1
2
3

Membership
----------
group_id user_id
1        1
1        2
2        1
2        3
3        2
3        3

I'd like to find groups of which both user 1 and 2 are a member.

So the result of the query with user_id 1 and 2 should be group 1.
The same query for user_id 1 and 3 should return group 2.
And the query for user_id 2 and 3 should return group 3.

Best Answer

This is called relational division and your query is pretty much what you can do with standard SQL.

A different way to write this in Postgres would be to use arrays:

select group_id
from membership
group by group_id
having array_agg(user_id order by user_id) = array[1,3];

It's important to also order the IDs inside the array[1,3] constant because array[1,3] is a different array than array[3,1];

The above returns those groups that consist exactly of those two members. Your query returns those that consists at least of those two members. That can be written with an array as well:

select group_id
from membership
group by group_id
having array_agg(user_id order by user_id) @> array[1,3];

That uses the "contains" operator for arrays which tests if the array on the left side contains all elements of the array on the right side. But the array on the left side is allowed to have more elements.


The check for an exact match is probably faster using the array (because it would require an additional sub-query with standard SQL), but I don't think that the "at least" query makes a big difference in performance.