PostgreSQL Order By – Ordering by Most Recent of a Particular Type

group byorder-byperformancepostgresqlquery-performance

Assume I have a PET table with three columns: Name, Type, and Date. I want to order the three columns by Date but only display the ONE most recent of a specific particular type ("Dog" for this example), all for the rest. For example, my raw list of data:

Name        Type        Date
Nemo        Fish        June 1, 2016
Fido        Dog         January 1, 2016
Felix       Dog         February 1, 2016
Whiskers    Cat         April 1, 2016
Marlin      Fish        August 1, 2016
Shifu       Cat         March 3, 2016

Would look like this when the query was applied.

Felix       Dog         February 1, 2016
Shifu       Cat         March 3, 2016
Whiskers    Cat         April 1, 2016
Nemo        Fish        June 1, 2016
Marlin      Fish        August 1, 2016

I'm sure there are multiple ways to do this, but only a few that are the most optimal from an efficiency perspective.

Best Answer

Solution 1

CREATE TABLE pet ( name text, type text, date date);

INSERT INTO pet VALUES
 ('Nemo', 'Fish', 'June 1, 2016'),
 ('Fido', 'Dog', 'January 1, 2016'),
 ('Felix', 'Dog', 'February 1, 2016'),
 ('Whiskers', 'Cat', 'April 1, 2016'),
 ('Marlin', 'Fish', 'August 1, 2016'),
 ('Shifu', 'Cat', 'March 3, 2016');

Using PostgreSQL-specific DISTINCT ON clause we can do this in a single table scan:

SELECT name, type, date
FROM (
  SELECT DISTINCT ON (
    type,
    CASE WHEN type<>'Dog' THEN date END
  )
  name, type, date
  FROM pet
  ORDER BY
    type,
    CASE WHEN type<>'Dog' THEN date END,
    date DESC
) AS subq1
ORDER BY date;

Solution 2

Using more ANSI-compatible SQL, like UNION and LIMIT. It will work on MySQL, DB2 and some others. Similar solution can be done on Oracle, just replace LIMIT with ROWNUM.

(
  SELECT * FROM pet
  WHERE type = 'Dog'
  ORDER BY date DESC
  LIMIT 1
) UNION ALL (
  SELECT * FROM pet
  WHERE type <> 'Dog'
)
ORDER BY date;

Notes on performance

With small tables (less than 1000000 rows), any solution will work. DISTINCT ON is slightly faster:

Iterations: 100000
Query:   /*q1*/ SELECT name, type, date FROM ( SELECT DISTINCT ON ( type, CASE WHEN type<>'Dog' THEN date END ) name, type, date FROM pet ORDER BY type, CASE WHEN type<>'Dog' THEN date END, date DESC ) AS subq1 ORDER BY date; 
Time:    24.834 s (31%)
Average: 0.248 ms
Rows:    500000
Winner:  80914 times (80%)
Query:   /*q2*/ ( SELECT * FROM pet WHERE type = 'Dog' ORDER BY date DESC LIMIT 1 ) UNION ALL ( SELECT * FROM pet WHERE type <> 'Dog' ) ORDER BY date; 
Time:    26.778 s (33%)
Average: 0.268 ms
Rows:    500000
Winner:  12881 times (12%)
Query:   /*q3*/ select name, type, date from ( select pet.*, case when type = 'Dog' then rank() over (partition by type order by date desc) else 1 end as rnk from pet ) as x where rnk = 1 order by date; 
Time:    27.490 s (34%)
Average: 0.275 ms
Rows:    500000
Winner:  6205 times (6%)

With "big data", make sure your query does only one, indexed, table scan.