Postgresql – Are ORDER BY clause & ROW_NUMBER duplicating functionality when I need top n queries

greatest-n-per-grouppostgresqlwindow functions

I have a table that tracks violations for a student. I want to count the number of violations and select top 2 violators from each class. The query would look like this

SELECT *
FROM
  ( SELECT "people"."id", "name", "class",
    ROW_NUMBER() OVER (PARTITION BY "class" ORDER BY COUNT("violation") DESC) AS "v"
    FROM "people"
    INNER JOIN "discipline" on ("discipline"."people_id" = "people"."id")
    GROUP BY "people"."id", "name", "class"
  ) AS "v_table"
WHERE v < 3

The query seems inefficient because it sorts the count first with ORDER BY and then assigns a ROW_NUMBER. If I already have it sorted over a partition, how can I skip the ROW_NUMBER assignment & get the top 2.

Update :
Adding data tables & SQL fiddle (actually violation column is redundant, simply an entry in discipline table means that there's been a violation. If I remove it, should the Count be done on "discipline"."people_id"? like this )

Discipline                     People       
-------------------------      --------------
id  people_id  violation       id  name  class                    
1   1          True            1   Rob   A                 
2   1          True            2   Jen   B                 
3   2          True            3   Tom   C                
4   3          True            4   Ted   A                 
5   4          True            5   Tim   A            
6   1          True            ...                               
7   4          True         
...                                   

Using PostgreSQL 9.3

Best Answer

First, to answer the questions implied in the comments, that the assignment of row numbers with the ROW_NUMBER() aggregate seems inefficient because we already have the COUNT(violation) numbers:

This is needed because the COUNT numbers can be different for each partition (class). Since we want the 2 top numbers (for each class), we can't find a useful condition for that. With the row numbers, we can use the WHERE v < 3 which gives us the top 2.

In version 9.3, the LATERAL joins were added in Postgres, which are similar to the CROSS and OUTER APPLY of SQL-Server. With this new kind of join, you can write a query that uses the COUNT numbers and a TOP 2 for each partition. Whether it is more or less efficient, you can test:

WITH classes AS
  ( SELECT DISTINCT class
    FROM people
  ) 
                             -- if you have a "classes" table, skip the above lines
SELECT 
    v.id, v.name, c.class,
    v.violations
FROM
    classes AS c,
  LATERAL 
    ( SELECT p.id, p.name,
             COUNT(d.violation) AS violations
      FROM people AS p
        INNER JOIN discipline AS d
          ON d.people_id = p.id
      WHERE p.class = c.class 
      GROUP BY p.id, p.name
      ORDER BY violations DESC
      LIMIT 2
  ) AS v
ORDER BY
    c.class, v.violations DESC ;

Tested at SQL-Fiddle