I am not entirely sure, but if the aggregated results are supposed to be per date, and the date is production.scheduled_pull_date, then perhaps this:
UPDATE
    pOut
SET
    pOut.lead_ppm = (
        SELECT
            MAX(p1.lead_ppm)
        FROM
            dbo.production AS p1
        WHERE
            (p1.tank = 'B' OR p1.tank = 'C')
            AND p1.scheduled_pull_date = pOut.scheduled_pull_date
    ),
    pOut.sulfur_ppm = (
        SELECT
            MAX(p2.sulfur_ppm)
        FROM
            dbo.production AS p2
        WHERE
            (p2.tank = 'B' OR p2.tank = 'C')
            AND p2.scheduled_pull_date = pOut.scheduled_pull_date
    )
FROM
    dbo.production AS pOut,
    dbo.assay_temp AS tOut
WHERE
    pOut.tank = 'S'
    AND pOut.cell_num = 1
    AND pOut.scheduled_pull_date = tOut.formatted_date
;
Although I would also rewrite the join using the contemporary syntax:
FROM
    dbo.production AS pOut
    INNER JOIN dbo.assay_temp AS tOut ON pOut.scheduled_pull_date = tOut.formatted_date
WHERE
    pOut.tank = 'S'
    AND pOut.cell_num = 1
The assay_temp table doesn't seem to be needed in the subqueries at all, and instead of matching scheduled_pull_date against formatted_date, the subqueries should correlate with the outer query. Since you want aggregations per date, it seemed to me the subqueries should match against pOut.scheduled_pull_date.
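As an alternative to the two correlated subqueries, the per-date maxima could be computed once in a derived table and joined in. This is only a sketch in T-SQL, under the same assumption that the aggregation is per scheduled_pull_date; the alias agg and the column names max_lead_ppm and max_sulfur_ppm are made up for illustration. The LEFT JOIN is deliberate: like the correlated subqueries, it sets the columns to NULL when no tank 'B' or 'C' rows exist for a date.

```sql
-- Sketch (T-SQL): aggregate once per date, then join the result to the rows being updated.
-- "agg", "max_lead_ppm" and "max_sulfur_ppm" are illustrative names, not part of the original schema.
UPDATE
    pOut
SET
    pOut.lead_ppm   = agg.max_lead_ppm,
    pOut.sulfur_ppm = agg.max_sulfur_ppm
FROM
    dbo.production AS pOut
    INNER JOIN dbo.assay_temp AS tOut
        ON pOut.scheduled_pull_date = tOut.formatted_date
    -- LEFT JOIN keeps rows without any B/C readings; they get NULL, as with the subqueries.
    LEFT JOIN (
        SELECT
            p.scheduled_pull_date,
            MAX(p.lead_ppm)   AS max_lead_ppm,
            MAX(p.sulfur_ppm) AS max_sulfur_ppm
        FROM
            dbo.production AS p
        WHERE
            p.tank IN ('B', 'C')
        GROUP BY
            p.scheduled_pull_date
    ) AS agg
        ON agg.scheduled_pull_date = pOut.scheduled_pull_date
WHERE
    pOut.tank = 'S'
    AND pOut.cell_num = 1
;
```

Whether this is any faster than the correlated version depends on the data and the optimizer, so it is worth comparing the execution plans.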
You could use a window function here, but I would think there's actually a better solution with DISTINCT ON.
First I simplified what you had so far:
SELECT p.name AS product_name, p.description AS product_description
     , a.stock_sum, b.purchase_sum
     , c.max_sales, o.outlet_name  -- still missing
FROM   Product p
LEFT   JOIN (
   SELECT product_id, sum(copies) AS stock_sum
   FROM   Stock
   GROUP  BY 1
   ) a USING (product_id)
LEFT   JOIN (
   SELECT product_id, sum(copies) AS purchase_sum
   FROM   PurchaseItem
   GROUP  BY 1
   ) b USING (product_id)
-- c, o still missing
It should be considerably faster to aggregate counts before joining.
Also, LEFT JOIN retains products in the result that don't have any purchases yet or aren't in stock any more.
Then add the missing parts:
LEFT   JOIN (
   SELECT DISTINCT ON (product_id)
          pi.product_id, pu.outlet_id, sum(copies) AS max_sales
   FROM   Purchase pu
   JOIN   PurchaseItem pi USING (purchase_id)
   GROUP  BY 1, 2
   ORDER  BY 1, sum(copies) DESC NULLS LAST
   ) c USING (product_id)
LEFT   JOIN Outlet o USING (outlet_id);
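About DISTINCT ON: it keeps only the first row of each set of rows with the same product_id, where "first" is decided by the ORDER BY clause (here, the outlet with the highest sum of copies).
You can run DISTINCT over the results of the aggregation. Consider the sequence of events in a query: GROUP BY and the aggregate functions are applied first, and DISTINCT or DISTINCT ON is applied to their result afterwards.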
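For comparison, the window function variant mentioned at the top might look roughly like the following. It is a sketch against the same Purchase and PurchaseItem tables; the subquery alias ranked and the column rn are names chosen here for illustration. PostgreSQL evaluates window functions after aggregation, so row_number() can order by sum(pi.copies) directly.

```sql
-- Sketch (PostgreSQL): pick the top-selling outlet per product with row_number()
-- instead of DISTINCT ON. "ranked" and "rn" are illustrative names.
SELECT product_id, outlet_id, max_sales
FROM  (
   SELECT pi.product_id, pu.outlet_id, sum(pi.copies) AS max_sales
        , row_number() OVER (PARTITION BY pi.product_id
                             ORDER BY sum(pi.copies) DESC NULLS LAST) AS rn
   FROM   Purchase pu
   JOIN   PurchaseItem pi USING (purchase_id)
   GROUP  BY pi.product_id, pu.outlet_id
   ) ranked
WHERE  rn = 1;   -- keep only the best-ranked outlet per product
```

Both forms return one row per product. When two outlets tie for the top spot, either form picks one of them arbitrarily unless a tie-breaker is added to the ORDER BY.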
Optimize performance
It's probably cheaper to scan PurchaseItem only once, using a CTE. But this also adds some overhead. You'll have to test which is faster:
WITH ct AS (
   SELECT pi.product_id, pu.outlet_id, sum(pi.copies) AS sales
   FROM   PurchaseItem pi
   JOIN   Purchase pu USING (purchase_id)
   GROUP  BY 1, 2
   )
SELECT p.name AS product_name, p.description AS product_description
     , a.stock_sum, b.purchase_sum
     , c.max_sales, o.outlet_name
FROM   Product p
LEFT   JOIN (
   SELECT product_id, sum(copies) AS stock_sum
   FROM   Stock
   GROUP  BY 1
   ) a USING (product_id)
LEFT   JOIN (
   SELECT product_id, sum(sales) AS purchase_sum
   FROM   ct
   GROUP  BY 1
   ) b USING (product_id)
LEFT   JOIN (
   SELECT DISTINCT ON (product_id)
          product_id, outlet_id, sales AS max_sales
   FROM   ct
   ORDER  BY product_id, sales DESC
   ) c USING (product_id)
LEFT   JOIN Outlet o USING (outlet_id);
Test performance with EXPLAIN ANALYZE (a couple of times to exclude caching effects).
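A minimal sketch of such a test in PostgreSQL, shown here on the aggregation step only; the same prefix can be put in front of either complete query. The BUFFERS option is an addition that reports how many blocks came from cache, which helps judge caching effects. Note that EXPLAIN ANALYZE actually executes the statement.

```sql
-- Sketch (PostgreSQL): run several times and compare the reported execution times.
EXPLAIN (ANALYZE, BUFFERS)
SELECT pi.product_id, pu.outlet_id, sum(pi.copies) AS sales
FROM   PurchaseItem pi
JOIN   Purchase pu USING (purchase_id)
GROUP  BY 1, 2;
```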
Best Answer
The function is called GREATEST().
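A minimal illustration with literal values, since the original question's tables are not shown here:

```sql
-- Returns the largest of the listed values: 7
SELECT GREATEST(3, 7, 5) AS largest;
```

NULL handling differs between database systems: PostgreSQL ignores NULL arguments, while MySQL and Oracle return NULL as soon as any argument is NULL, so check the documentation for the system you use.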