Postgresql – Count rows with max per group and additional condition

greatest-n-per-groupgroup bypostgresqlwindow functions

I have the following table:

 horse_main_id | race_id |    horse_name     | rating100 | race_result 
---------------+---------+-------------------+-----------+-------------
            23 |     159 | Angelology        |       126 |           1
            24 |     159 | Arwoc             |       118 |            
            25 |     159 | Marlo Man         |       118 |            
            26 |     159 | Hucknall          |       113 |            
            27 |     159 | It's Electrifying |       110 |           2
            28 |     159 | Monte Carlo       |       107 |            
            29 |     159 | Cross Constance   |       103 |            
            30 |     160 | Auld Burns        |       119 |            
            31 |     160 | Diamond Jim       |       117 |            
            33 |     160 | Livery            |       115 |            
            34 |     160 | Paraggi           |       115 |            
            35 |     160 | I'll'ava'alf      |       110 |           2
            36 |     160 | Jacks 'n' Kings   |       108 |            
            32 |     160 | Fastnet Isle      |       115 |           1

I would like to operate on groups of race_id, find all groups that have the max(rating100) for their group and race_result = 1, and count all the times this happens.

So this example should return only 1, as for race_id 159 group the max rating100 of 126 also has race_result = 1, but for race_id 160 group, the max rating100 is 119, but does not have race_result = 1.

This is the closest I've got (was told I should use window functions):

SELECT * FROM (
    SELECT horse_name, race_id, race_result
         , max(rating100) OVER (partition by race_id) AS max_rating
    FROM horse_main) t
WHERE race_result = 1;

Best Answer

Since you need a subquery in either case, I would use a plain aggregate in the subquery (may be cheaper):

SELECT count(*)
FROM  (SELECT race_id, max(rating100) AS rating100
       FROM   horse_main
       GROUP  BY 1) x
JOIN   horse_main h USING (race_id, rating100)
WHERE  h.race_result = 1;

If there are many rows per race_id, it will be faster to get group-wise maxima with one of these techniques:


Or use the window function rank() - that's what you count effectively: all greatest rating100 per race_id with race_result = 1:

SELECT count(*)
FROM  (SELECT race_result, rank() OVER (PARTITION BY race_id ORDER BY rating100 DESC) AS rnk
       FROM   horse_main) x
WHERE  rnk = 1
AND    race_result = 1;

An index on (race_id, rating100 DESC, race_result DESC) should help performance with big tables.

Assuming rating100 is defined NOT NULL, else you need to add NULLS LAST in query and index.