Postgresql – Query returns all values instead of the row of the one with the max value


I have this database

I need the row of the product with the greatest unitprice. I did:

SELECT id, MAX(unitprice)
FROM product
GROUP BY id;

Instead, it returned every row in product. If I do this:

SELECT MAX(unitprice)
FROM product;

It returns the greatest price, as expected.

I looked into this (https://stackoverflow.com/questions/18957372/sql-max-product-price) and this (https://stackoverflow.com/questions/12366390/how-to-select-product-that-have-the-maximum-price-of-each-category) thread but I can't figure it out.

Help?

Best Answer

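The difference from the threads you linked is that those askers want the max value per group of a specific field (i.e. the aggregate of a GROUP BY), whereas you want the whole row that contains the max value of a specific field (i.e. the top 1 row when ordered by that field). Your query returned every row because id is presumably unique, so GROUP BY id puts each row in its own group and MAX(unitprice) is just that row's own price.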
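As a rough sketch of that difference, the two queries below contrast grouping by a shared field with grouping by a unique one. The categoryid column is hypothetical (it is not shown in the question), and id is assumed to be unique.

```sql
-- Max value per group: one output row per category, as in the linked threads.
-- categoryid is a hypothetical column used only for illustration.
SELECT categoryid, MAX(unitprice) AS max_price
FROM product
GROUP BY categoryid;

-- Grouping by a unique column (id is assumed unique here) creates one group
-- per row, so MAX(unitprice) is that row's own price and every row comes back.
SELECT id, MAX(unitprice) AS max_price
FROM product
GROUP BY id;
```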

To achieve what you want, you can use a window function to order the records by the unitprice field and then keep only the top row of that result set. Here's an example query that does this:

WITH CTE_Product_Sorted AS -- CTE of the product dataset with a SortId generated from the order of the unitprice field
(
   SELECT *, ROW_NUMBER() OVER (ORDER BY unitprice DESC) AS SortId -- Generates a unique ID for every record, in the order of unitprice, from greatest to least
   FROM product
)

SELECT *
FROM CTE_Product_Sorted
WHERE SortId = 1; -- Keep only the record with the greatest unitprice

Note that when you use the ROW_NUMBER() window function, if there's a tie in the ORDER BY clause, the ordering between the tied records is non-deterministic (arbitrary, and it may differ between runs). Depending on the case, this is OK; when it's not, a unique field needs to be added to the ORDER BY clause as a tiebreaker (e.g. ORDER BY unitprice DESC, id ASC makes the product with the lowest id, typically the one created first, win in the case of a tie on unitprice).
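The two ways of handling ties could be sketched as follows. The first query is a plain ORDER BY ... LIMIT 1, a simpler alternative to the CTE rather than the query above, and it assumes id is unique. The second swaps ROW_NUMBER() for RANK() to return every tied row; the CTE name and PriceRank alias are made up for illustration. NULLS LAST is included because PostgreSQL sorts NULLs first under DESC by default, which only matters if unitprice can be NULL.

```sql
-- Exactly one row, chosen deterministically: ties on price are broken by id
-- (assumes id is unique).
SELECT *
FROM product
ORDER BY unitprice DESC NULLS LAST, id ASC
LIMIT 1;

-- Every row tied for the greatest price: RANK() gives tied rows the same rank,
-- so more than one row can have PriceRank = 1.
WITH CTE_Product_Ranked AS
(
   SELECT *, RANK() OVER (ORDER BY unitprice DESC NULLS LAST) AS PriceRank
   FROM product
)
SELECT *
FROM CTE_Product_Ranked
WHERE PriceRank = 1;
```

Which one fits depends on whether a tie should produce a single winner or all of the tied products.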

Related Solutions

Postgresql – Storing data in PostgreSQL: One table or two

If you mix the history data in with the current data like that in order to speed up queries over a time period, you do so at the expense of slowing down queries for current data. You can add an extra column to explicitly mark the relevant rows as the current prices (and have id+flag as the PK), but that adds extra work to your business logic, which must both keep the flag maintained and respect it in all reports.

Of course, if you move the price completely out of the main table you have a similar problem: finding the latest price becomes more expensive unless you have a "latest" flag, or denormalise slightly and keep a copy of the current price in the main table as well as in the price history table. Personally I would do the latter, and use a trigger on the product table to automatically update the price history table when a new product is added or a price is updated. I'm assuming the product table does not see massive write activity most of the time, so the performance impact of a trigger should be minimal. This removes the auditing task from your other logic and avoids bugs caused by new code that forgets to update the history/audit table.

Caveat: I'm answering this from a general PoV as I've not used postgres much in anger, so do some benchmarks before taking anything I've said regarding performance as fact in that system!
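A minimal sketch of the trigger approach described above might look like this. Every name here is an assumption for illustration: a product table with a unique integer id and a unitprice column, plus a hypothetical product_price_history table, log_product_price function and product_price_audit trigger. EXECUTE FUNCTION needs PostgreSQL 11 or later (older versions spell it EXECUTE PROCEDURE), and the identity column needs PostgreSQL 10 or later.

```sql
-- Hypothetical history table; the current price stays on product itself.
CREATE TABLE product_price_history (
    history_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_id integer     NOT NULL REFERENCES product (id),  -- assumes product.id is unique
    unitprice  numeric,
    changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION log_product_price() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- A new product: record its starting price.
        INSERT INTO product_price_history (product_id, unitprice)
        VALUES (NEW.id, NEW.unitprice);
    ELSIF NEW.unitprice IS DISTINCT FROM OLD.unitprice THEN
        -- An update that actually changed the price (NULL-safe comparison).
        INSERT INTO product_price_history (product_id, unitprice)
        VALUES (NEW.id, NEW.unitprice);
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$;

CREATE TRIGGER product_price_audit
AFTER INSERT OR UPDATE OF unitprice ON product
FOR EACH ROW
EXECUTE FUNCTION log_product_price();
```

With something like this in place, application code only writes to product, and the history rows follow automatically. As the caveat says, benchmark it against your own write load before relying on it.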

Postgresql – Alternate per 2 postgresql rows, but start over if there are rows remaining

I think you need something using ROW_NUMBER() and (integer) division by 2, along the lines of this:

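WITH p AS
( -- rn is a 0-based row number within each supplier, in id order
  SELECT *, ROW_NUMBER() OVER (PARTITION BY supplier_id ORDER BY id) - 1 AS rn
  FROM products
)
SELECT *
FROM p
-- rn/2 is integer division, so rows 0-1, 2-3, ... of each supplier share a
-- value; sorting by it first alternates suppliers two rows at a time
ORDER BY rn/2, supplier_id, rn;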

Tested at SQLFiddle.
