Postgresql – How to apply ORDER BY and LIMIT in combination with an aggregate function

aggregateorder-bypostgresqlsubquery

A fiddle for my question can be found on https://dbfiddle.uk/?rdbms=postgres_10&fiddle=3cd9335fa07565960c1837aa65143685.

I have a simple table layout:

class
person: belongs to a class

I want to select all classes, and for each class, I want the first two person identifiers of the belonging persons sorted by descending name.

I solved this with the following query:

select     c.identifier, array_agg(p.identifier order by p.name desc) as persons
from       class as c
left join lateral (
             select   p.identifier, p.name
             from     person as p
             where    p.class_identifier = c.identifier
             order by p.name desc
             limit    2
           ) as p
on         true
group by   c.identifier
order by   c.identifier

Note: I could have used a correlation subquery in the SELECT clause, but I am trying to avoid that as part of a learning process.

As you can see, I am applying order by p.name desc in two places:

  • in the subquery
  • in the aggregate function

Is there a way to avoid that? My train of tought:

  • First, obviously I cannot remove the order by in the subquery, as that would give a query which does not meet my requirement as stated above.

  • Second, I think that the order by in the aggregate function cannot be left out, as row order of the subquery is not necessarily preserved in the aggregate function?

Should I rewrite the query?

Best Answer

I am applying order by p.name desc in two places ... Is there a way to avoid that?

Yes. Aggregate with an ARRAY constructor in the lateral subquery directly:

SELECT c.identifier, p.persons
FROM   class c
CROSS  JOIN LATERAL (
   SELECT ARRAY (
      SELECT identifier
      FROM   person
      WHERE  class_identifier = c.identifier
      ORDER  BY name DESC
      LIMIT  2
      ) AS persons
   ) p
ORDER  BY c.identifier;

You also don't need GROUP BY in the outer SELECT this way. Shorter, cleaner, faster.

I replaced the LEFT JOIN with a plain CROSS JOIN since the ARRAY constructor always returns exactly 1 row. (Like you pointed out in a comment.)

db<>fiddle here.

Related:

Order of rows in subqueries

To address your comment:

I learned that order of rows in a subquery is never guaranteed to be preserved in the outer query.

Well, no. While the SQL standard does not offer any guarantees, there are limited guarantees in Postgres. The manual:

This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work. For example:

SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

Beware that this approach can fail if the outer query level contains additional processing, such as a join, because that might cause the subquery's output to be reordered before the aggregate is computed.

If all you do in the next level is to aggregate rows, the order is positively guaranteed. Any yes, what we feed to the ARRAY constructor is a subquery, too. That's not the point. It would work with array_agg() as well:

SELECT c.identifier, p.persons
FROM   class c
CROSS  JOIN LATERAL (
   SELECT array_agg(identifier) AS persons
   FROM  (
      SELECT identifier
      FROM   person
      WHERE  class_identifier = c.identifier
      ORDER  BY name DESC
      LIMIT  2
      ) sub
   ) p
ORDER  BY c.identifier;

But I expect the ARRAY constructor to be faster for the case. See: