Postgresql – Apply limit and conditions per grouped data

group byorder-bypostgresql

I am building an api endpoint to send statistics data that then is going to be plotted on a graph. The hardest part is being the query I need to execute. To sum up I want the first n most sold products per day in a period. The period, for instance, could be any date range.

Suppose I have a sales table with the following data (this is not the real schema I have defined):

---------------------------------
date       | product_name | count
---------------------------------
2015-01-08 | A            | 10
2015-01-08 | B            | 5
2015-01-08 | C            | 1
2015-02-08 | A            | 5
2015-02-08 | B            | 3
2015-02-08 | C            | 100

And suppose I want the 2 most sold products per day. I want a query that
returns the following:

---------------------------------
date       | product_name | count
---------------------------------
2015-01-08 | A            | 10
2015-01-08 | B            | 5
2015-02-08 | C            | 100
2015-02-08 | A            | 5

Note that besides that C is the most sold product in the whole period, it does not appear in the top 2 of 2015-01-08

Is there a way to achieve what I want?

Thanks 🙂

Best Answer

One way to solve such problems, is using window functions:

WITH cte AS
    ( SELECT date, product_name, count,
             ROW_NUMBER() OVER (PARTITION BY date
                                ORDER BY count DESC) AS rn
      FROM sales
      WHERE date >= '2015-07-01'     -- example range
        AND date < '2015-08-01'      -- 2015, July
    )
SELECT date, product_name, count
FROM cte
WHERE rn <= 2           -- max of 2 products per date
ORDER BY                     -- this does not affect the result 
    date, count DESC ;       -- only its order

If there are ties, they will be resolved arbitrarily and only 2 rows per date will be returned. If you want to resolve ties with a specific preference, you can change the order inside the OVER clause. Say you want the products' names to be sorted alphabetically when they tie in top sales, use (... ORDER BY count DESC, product_name).

If you want all the (tied) products per date, just change ROW_NUMBER() to RANK().