Postgresql – How to apply ORDER BY and LIMIT in combination with an aggregate function

aggregateorder-bypostgresqlsubquery

A fiddle for my question can be found on https://dbfiddle.uk/?rdbms=postgres_10&fiddle=3cd9335fa07565960c1837aa65143685.

I have a simple table layout:

class
person: belongs to a class

I want to select all classes, and for each class, I want the first two person identifiers of the belonging persons sorted by descending name.

I solved this with the following query:

select     c.identifier, array_agg(p.identifier order by p.name desc) as persons
from       class as c
left join lateral (
             select   p.identifier, p.name
             from     person as p
             where    p.class_identifier = c.identifier
             order by p.name desc
             limit    2
           ) as p
on         true
group by   c.identifier
order by   c.identifier

Note: I could have used a correlation subquery in the SELECT clause, but I am trying to avoid that as part of a learning process.

As you can see, I am applying order by p.name desc in two places:

in the subquery
in the aggregate function

Is there a way to avoid that? My train of tought:

First, obviously I cannot remove the order by in the subquery, as that would give a query which does not meet my requirement as stated above.
Second, I think that the order by in the aggregate function cannot be left out, as row order of the subquery is not necessarily preserved in the aggregate function?

Should I rewrite the query?

Best Answer

I am applying order by p.name desc in two places ... Is there a way to avoid that?

Yes. Aggregate with an ARRAY constructor in the lateral subquery directly:

SELECT c.identifier, p.persons
FROM   class c
CROSS  JOIN LATERAL (
   SELECT ARRAY (
      SELECT identifier
      FROM   person
      WHERE  class_identifier = c.identifier
      ORDER  BY name DESC
      LIMIT  2
      ) AS persons
   ) p
ORDER  BY c.identifier;

You also don't need GROUP BY in the outer SELECT this way. Shorter, cleaner, faster.

I replaced the LEFT JOIN with a plain CROSS JOIN since the ARRAY constructor always returns exactly 1 row. (Like you pointed out in a comment.)

db<>fiddle here.

Preserve order of array elements after join

Order of rows in subqueries

To address your comment:

I learned that order of rows in a subquery is never guaranteed to be preserved in the outer query.

Well, no. While the SQL standard does not offer any guarantees, there are limited guarantees in Postgres. The manual:

This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work. For example:
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
Beware that this approach can fail if the outer query level contains additional processing, such as a join, because that might cause the subquery's output to be reordered before the aggregate is computed.

If all you do in the next level is to aggregate rows, the order is positively guaranteed. Any yes, what we feed to the ARRAY constructor is a subquery, too. That's not the point. It would work with array_agg() as well:

SELECT c.identifier, p.persons
FROM   class c
CROSS  JOIN LATERAL (
   SELECT array_agg(identifier) AS persons
   FROM  (
      SELECT identifier
      FROM   person
      WHERE  class_identifier = c.identifier
      ORDER  BY name DESC
      LIMIT  2
      ) sub
   ) p
ORDER  BY c.identifier;

But I expect the ARRAY constructor to be faster for the case. See:

Related Solutions

Postgresql – Getting around the constraint “column must appear in the GROUP BY clause or be used in an aggregate function”

seharusnya seperti ini(supposed to be like this):

SELECT person.name, person.id, license.expiry_date, COUNT(car) FROM person
  JOIN license ON license.person_id = person.id
  JOIN car ON car.owner_id = person.id
WHERE person.name = 'Charles Bannerman'
GROUP BY person.name, person.id, license.expiry_date, car.car;

Postgresql – Postgres Nested WHEN Aggregate Function

After some processing this boiled down to:

While your predicate d."SettlementPointName" = 'John' is filtering a single value for "SettlementPointName" anyway, simplify to:

SELECT count(                                     d."SettlementPointPrice" < 10.5 OR NULL) AS da_00_10
     , count(d."SettlementPointPrice" >= 10.5 AND d."SettlementPointPrice" < 20.5 OR NULL) AS da_11_20
     , count(d."SettlementPointPrice" >= 20.5 AND d."SettlementPointPrice" < 30.5 OR NULL) AS da_21_30
FROM   public.da d
JOIN   public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE  d."SettlementPointName" = 'John'
AND    d."DeliveryDate" >= '2015-02-01'
AND    d."DeliveryDate" <= '2015-02-20'
AND    r."DeliveryHour" = 14
AND    date_part('hour', d."DeliveryHour") = r."DeliveryHour";

About the counting technique:

For absolute performance, is SUM faster or COUNT?

Or better, yet, use the new aggregate filter technique in pg 9.4:

SELECT d."SettlementPointName"
     , count(*) FILTER (WHERE d."SettlementPointPrice" <  10.5) AS da_00_10
     , count(*) FILTER (WHERE d."SettlementPointPrice" >= 10.5
                        AND   d."SettlementPointPrice" <  20.5) AS da_11_20
     , count(*) FILTER (WHERE d."SettlementPointPrice" >= 20.5
                        AND   d."SettlementPointPrice" <  30.5) AS da_21_30
FROM   public.da d
JOIN   public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE  d."DeliveryDate" >= '2015-02-01'
AND    d."DeliveryDate" <= '2015-02-20'
AND    r."DeliveryHour" = 14
AND    date_part('hour', d."DeliveryHour") = r."DeliveryHour"
GROUP  BY 1;

This time, selecting all names and returning one row per name like you asked in the comment.

Details for FILTER:

Return counts for multiple ranges in a single SELECT statement

Best Answer

Order of rows in subqueries

Related Solutions

Postgresql – Getting around the constraint “column must appear in the GROUP BY clause or be used in an aggregate function”

Postgresql – Postgres Nested WHEN Aggregate Function

Related Question