Postgresql – Get results from “combined, either inner join”

database-designjoin;postgresqlunion

I have a feeling this must have been asked multiple times, but I just cannot find a solution, or even a questions probably because of difficulty in explaining.

We all know that inner join basically gives an "intersection" of two tables. And hence multiple inner joins give the intersection of all tables. What I -however- wish is to get an inner join that gets an intersection of table B with A or table C with A.

Like for a request that gets all organizations I'm either moderator or admininistrator.

SELECT *
FROM organization
INNER JOIN moderator_organization
  ON organization.id = moderator_organization.organization AND moderator_organization.user = 10
INNER JOIN admin_organization
  ON organization.id = admin_organization.organization AND admin_organization.user = 10

Above however would only select the organizations from which the user (with id 10) is both admin and organizer, not one where either is true. To visualize in a diagram I'd like:

enter image description here

Where the red region is what I'd like to request.

Best Answer

Venn diagrams are not ideal to visualize join operations like a_horse commented:

Venn diagrams do not visualize joins, but set operations like union, intersect or except.

And his link to illustrate:

The twist, I think, is this: the Venn diagram is the right tool to visualize your objective, but the SQL JOIN is the wrong approach. You need a UNION (or equivalent) at the core of your query.

Title and the example query are misleading like that. What's more, you start with:

SELECT * FROM ...

That gets all columns from all joined tables. Also not the expressed objective:

all organizations I'm either moderator or administrator.

UNION

The core query is:

SELECT organization FROM admin_organization     WHERE user = 10
UNION
SELECT organization FROM moderator_organization WHERE user = 10

UNION. Not JOIN. And not UNION ALL - we don't want duplicate organizations in the result.

The column name organization is also misleading (IMHO). Should really be something like organization_id for clarity.

The resulting set of unique IDs may already be all that's needed. To flesh it out with more (or all) attributes of the organization (columns of table organization), now we JOIN to the table:

SELECT *
FROM  (
   SELECT organization AS id FROM admin_organization WHERE user = 10
   UNION
   SELECT organization FROM moderator_organization WHERE user = 10
   ) x
JOIN   organization USING (id);

It's typically (substantially) cheaper to apply UNION on just the ID column, and then join. It may even be a necessity, if some of the columns have types have data types with no equality operator. More common than one might think. See:

Assuming referential integrity, enforced with FK constraints, nothing is lost in the join.

For convenience, it added a column alias in the subquery (organization AS id), so that we can use the simplified join condition with USING (id) which, in turn, allows us to use the simple (and now correct) SELECT * in the outer query to get all columns of the table organization without a duplicate ID column.

Alternative with EXISTS

Shorter equivalent that also avoids undue duplication or elimination of rows:

SELECT *
FROM   organization o
WHERE  EXISTS (SELECT FROM admin_organization     WHERE user = 10 AND organization = o.id)
   OR  EXISTS (SELECT FROM moderator_organization WHERE user = 10 AND organization = o.id);

Naming convention

A naming convention with descriptive names can avoid some of the confusion and noise. Use organization_id (or org_id if you prefer short names and there is no ambiguity) for all three: admin_organization.organization, moderator_organization.organization, and organization.id.