I have a PostgreSQL database for a supermarket (it's a toy problem) and I need to find which outlet has sold the most copies of each product and display it in a query which also shows the product name, description, copies in stock across all stores, copies purchased across all stores.
I think I have the first few columns queried appropriately, as below:
SELECT a.ProductName, a.ProductDescription, a.StockSum, b.PurchaseSum, c.MaxSales
FROM (SELECT Product.Name AS ProductName, Product.Description AS ProductDescription
, SUM(Stock.copies) AS StockSum
FROM Product
INNER JOIN Stock
ON Stock.product_id = Product.product_id
GROUP BY Product.name, Product.description) AS a
FULL JOIN
(SELECT Product.name AS ProductName, Product.description AS ProductDescription
, SUM(PurchaseItem.copies) AS PurchaseSum
FROM Product
INNER JOIN PurchaseItem
ON PurchaseItem.product_id = Product.product_id
GROUP BY Product.name, Product.description) AS b;
But I cannot for the life of me work out how to pull the appropriate Outlet.name
associated with the MAX of the SUM of all PurchaseItem.copies
corresponding to a particular product_id
at a particular store. It seems like a very complex query, and it has flummoxed me!
The structure of the database is that Purchase
references Outlet
, while PurchaseItem
references Purchase
and a single Product
(with PurchaseItem.copies
recording the number of sales of that product in the purchase).
Best Answer
You could use a window function here, but I would think there's actually a better solution with
DISTINCT ON
.First I simplified what you had so far:
It should be considerably faster to aggregate counts before joining:
Also,
LEFT JOIN
retains products in the result that don't have any purchases, yet, or aren't in stock any more.Then add the missing parts:
About
DISTINCT ON
:You can run
DISTINCT
over the results of the aggregation. Consider the sequence of events in a query:Optimize performance
It's probably cheaper to scan
PurchaseItem
only once, using a CTE. But this also adds some overhead. You'll have to test which is faster:Test performance with
EXPLAIN ANALYZE
(a couple of times to exclude caching effects).