How to Replace Multiple Correlated Subqueries in SELECT Clause with LEFT JOINs

postgresql

A fiddle for my question can be found on https://dbfiddle.uk/?rdbms=postgres_10&fiddle=e387589d446d9c9a952294f8c7a98494.

I have simple table layout:

class
person: belongs to a class
room:   belongs to a class

The following query selects all classes with its persons embedded:

select    class.identifier, array(select person.identifier from person where person.class_identifier = class.identifier) as persons
from      class
order by  class.identifier;

While experimenting and learning more about correlated subqueries, I noticed that the query above could be rewritten by replacing the correlated subquery with a LEFT JOIN combined with a GROUP BY:

select     class.identifier, array_agg(person.identifier) as persons
from       class
left join  (
             select  person.identifier, person.class_identifier
             from    person
           ) as person
on         class.identifier = person.class_identifier
group by   class.identifier
order by   class.identifier;

Please note that I made the assumption that each class has at least one person. If not, I could add coalesce() around json_agg.

In my second case, I am going to select all classes with its persons and its rooms embedded. Let's first write in the same manner as the first query above:

select    class.identifier, array(select person.identifier from person where person.class_identifier = class.identifier) as persons,
                            array(select room.identifier from room where room.class_identifier = class.identifier) as rooms
from      class
order by  class.identifier;

This gives the expected results.

Now I want to repeat what I did before: introduce LEFT JOINs. My first attempt was as following:

select     class.identifier, array_agg(person.identifier) as persons
                           , array_agg(room.identifier) as rooms
from       class
left join  (
             select  person.identifier, person.class_identifier
             from    person
           ) as person
on         class.identifier = person.class_identifier
left join  (
             select  room.identifier, room.class_identifier
             from    room
           ) as room
on         class.identifier = room.class_identifier
group by   class.identifier
order by   class.identifier;

Now I am getting wrong results. Persons or rooms are being repeated in the output arrays. I understand why this happens (we are grouping on the cartesian product between class, person and room, hence every person is repeated for every room and vice versa), but I don't know how to proceed.

How can I continue here? Is replacing multiple correlated subqueries in a SELECT clause to LEFT JOIN + GROUP BY possible to start with, or need I other tricks?

Best Answer

You can aggregate over distinct values as:

array_agg(distinct person.identifier)

However, I'm not sure why you are joining against these sub-selects. You can join directly against the tables like:

select class.identifier, array_agg(distinct person.identifier) as persons
                       , array_agg(distinct room.identifier) as rooms
from class
left join  person
    on class.identifier = person.class_identifier
left join room
    on class.identifier = room.class_identifier
group by   class.identifier
order by   class.identifier;

I would also suggest that you use an alias for your tables:

select c.identifier, array_agg(distinct p.identifier) as persons
                   , array_agg(distinct r.identifier) as rooms
from class c
left join  person p
    on c.identifier = p.class_identifier
left join room r
    on c.identifier = r.class_identifier
group by   c.identifier
order by   c.identifier;

Oh, and welcome to the site. A nice first question with code and sample data, well done.