Oracle Function to Determine Maximum Value in a Series

Tags: date, max, oracle, update

Say I have a table with columns C and D (among others, but these are the relevant ones), both of type DATE.

I am instructed to update each value in C to either 90 days before the corresponding value in D or the current date, whichever is later.

I'm given the hint that it "will be useful to search for the Oracle function that determines the maximum value in a series. Hint: it is not max(), which is an aggregate function!"

I have no idea what this function is and would appreciate any help!

Best Answer

The function is called GREATEST(). From the Oracle documentation:

GREATEST returns the greatest of a list of one or more expressions. Oracle Database uses the first expr to determine the return type. If the first expr is numeric, then Oracle determines the argument with the highest numeric precedence, implicitly converts the remaining arguments to that data type before the comparison, and returns that data type. If the first expr is not numeric, then each expr after the first is implicitly converted to the data type of the first expr before the comparison.
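Applied to the question, a minimal sketch might look like the following. It assumes a table named my_table (hypothetical; the question doesn't name one) with the DATE columns C and D, and it assumes "current date" means today at midnight, hence TRUNC(SYSDATE); use plain SYSDATE if the time of day should be kept. The last two queries illustrate the type rule from the documentation quoted above.

```sql
-- Sketch for Oracle. my_table is a hypothetical table name; c and d are the DATE columns from the question.
-- d - 90 subtracts 90 days from a DATE; TRUNC(SYSDATE) is today's date with the time set to midnight.
-- GREATEST picks whichever of the two dates is later.
UPDATE my_table
SET    c = GREATEST(d - 90, TRUNC(SYSDATE));

-- GREATEST returns NULL if any argument is NULL, so a row with d = NULL would get c = NULL.
-- If such rows should fall back to today's date instead, wrap the result in NVL:
UPDATE my_table
SET    c = NVL(GREATEST(d - 90, TRUNC(SYSDATE)), TRUNC(SYSDATE));

-- Check with literals: DATE '2024-01-01' - 90 is 2023-10-03, so the later date, 2023-12-01, is returned
-- (how it is displayed depends on NLS_DATE_FORMAT).
SELECT GREATEST(DATE '2024-01-01' - 90, DATE '2023-12-01') AS later_date
FROM   dual;

-- The first argument decides how the comparison is done:
SELECT GREATEST(1, '3.925', '2.4') FROM dual;  -- numeric: the strings become numbers, returns 3.925
SELECT GREATEST('10', 9) FROM dual;            -- character: 9 becomes '9', and '9' > '10' as strings, returns '9'
```

The second form is only needed if D can be NULL; otherwise the first UPDATE is enough.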

Related Solutions

SQL Server – The most compact way to calculate and then update a maximum value for each day’s data

I am not entirely sure, but if the aggregated results are supposed to be per date, and the date is production.scheduled_pull_date, then perhaps this:

UPDATE
  pOut
SET
  pOut.lead_ppm = (
    SELECT
      MAX(p1.lead_ppm)
    FROM
      dbo.production AS p1
    WHERE
      (p1.tank = 'B' OR p1.tank = 'C')
      AND p1.scheduled_pull_date = pOut.scheduled_pull_date
  ),
  pOut.sulfur_ppm = (
    SELECT
      MAX(p2.sulfur_ppm)
    FROM
      dbo.production AS p2
    WHERE
      (p2.tank = 'B' OR p2.tank = 'C')
      AND p2.scheduled_pull_date = pOut.scheduled_pull_date
  )
FROM
  dbo.production AS pOut,
  dbo.assay_temp AS tOut
WHERE
  pOut.tank = 'S'
  AND pOut.cell_num = 1
  AND pOut.scheduled_pull_date = tOut.formatted_date
;

I would also rewrite the join using the more modern explicit JOIN syntax:

FROM
  dbo.production AS pOut
  INNER JOIN dbo.assay_temp AS tOut ON pOut.scheduled_pull_date = tOut.formatted_date
WHERE
  pOut.tank = 'S'
  AND pOut.cell_num = 1

The assay_temp table doesn't seem to be needed in the subqueries at all. Also, instead of matching scheduled_pull_date against formatted_date, the subqueries should be correlated with the outer query: since you want aggregations per date, it seemed to me they should match against pOut.scheduled_pull_date.
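If the goal is the most compact form, one alternative is sketched below. It is not what the answer above proposes; it assumes SQL Server and the table and column names from the question, and computes both per-date maxima in a single grouped derived table that the UPDATE joins to, so both values come from one grouped pass over the B and C rows instead of two correlated subqueries.

```sql
-- Sketch for SQL Server, using the table and column names from the question.
UPDATE pOut
SET    pOut.lead_ppm   = m.max_lead_ppm,
       pOut.sulfur_ppm = m.max_sulfur_ppm
FROM   dbo.production AS pOut
       INNER JOIN dbo.assay_temp AS tOut
               ON pOut.scheduled_pull_date = tOut.formatted_date
       -- One row per date with both maxima over tanks B and C.
       INNER JOIN (
           SELECT scheduled_pull_date,
                  MAX(lead_ppm)   AS max_lead_ppm,
                  MAX(sulfur_ppm) AS max_sulfur_ppm
           FROM   dbo.production
           WHERE  tank IN ('B', 'C')
           GROUP  BY scheduled_pull_date
       ) AS m
               ON m.scheduled_pull_date = pOut.scheduled_pull_date
WHERE  pOut.tank = 'S'
  AND  pOut.cell_num = 1;
```

One behavioral difference: because of the INNER JOIN to m, an S row whose date has no B or C rows is left unchanged, whereas the correlated-subquery version sets its lead_ppm and sulfur_ppm to NULL.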

PostgreSQL – Displaying a value associated with a maximum value

You could use a window function here, but I think DISTINCT ON offers a better solution.
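For comparison, a sketch of the window-function route, assuming PostgreSQL and the Purchase and PurchaseItem tables from the question. row_number() is evaluated after GROUP BY, so it can rank the aggregated sums, but its result can't be filtered in the WHERE clause of the same query level, which is why an extra subquery is needed.

```sql
-- Sketch: top-selling outlet per product with row_number() instead of DISTINCT ON.
SELECT product_id, outlet_id, sales AS max_sales
FROM  (
   SELECT pi.product_id, pu.outlet_id, sum(pi.copies) AS sales
        , row_number() OVER (PARTITION BY pi.product_id
                             ORDER BY sum(pi.copies) DESC NULLS LAST) AS rn
   FROM   Purchase      pu
   JOIN   PurchaseItem  pi USING (purchase_id)
   GROUP  BY 1, 2          -- one row per (product, outlet) before ranking
   ) sub
WHERE  rn = 1;             -- keep only the best outlet per product
```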

First, I simplified what you had so far:

SELECT p.name AS product_name, p.description AS product_description
     , a.stock_sum, b.purchase_sum
     , c.max_sales, o.outlet_name   -- still missing
FROM   Product p
LEFT   JOIN (
   SELECT product_id, SUM(copies) AS stock_sum
   FROM   Stock
   GROUP  BY 1
   ) a USING (product_id)
LEFT   JOIN (
   SELECT product_id, sum(copies) AS purchase_sum
   FROM   PurchaseItem
   GROUP  BY 1
   ) b USING (product_id)
-- c, o still missing

It should be considerably faster to aggregate counts before joining; see the related question "Why does the following join increase the query time significantly?"

Also, LEFT JOIN keeps products in the result that don't have any purchases yet or aren't in stock anymore.
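To illustrate the row multiplication behind aggregating first, here is a self-contained PostgreSQL sketch with made-up rows; the CTEs are hypothetical stand-ins for the real tables. When Stock and PurchaseItem are both joined to Product before aggregating, every stock row is paired with every purchase row of the same product, so the join produces more rows than either table and, in this case, also inflates both sums.

```sql
-- Made-up data: product 1 has 2 stock rows and 3 purchase rows.
WITH product (product_id) AS (VALUES (1))
   , stock (product_id, copies) AS (VALUES (1, 5), (1, 5))
   , purchaseitem (product_id, copies) AS (VALUES (1, 2), (1, 3), (1, 4))
-- Join first: 2 x 3 = 6 joined rows before aggregation.
SELECT 'join first' AS method
     , sum(s.copies)  AS stock_sum      -- 30: each stock row counted 3 times
     , sum(pi.copies) AS purchase_sum   -- 18: each purchase row counted 2 times
FROM   product p
LEFT   JOIN stock        s  USING (product_id)
LEFT   JOIN purchaseitem pi USING (product_id)
GROUP  BY p.product_id
UNION  ALL
-- Aggregate first: each side contributes one row per product.
SELECT 'aggregate first', a.stock_sum, b.purchase_sum   -- 10 and 9
FROM   product p
LEFT   JOIN (SELECT product_id, sum(copies) AS stock_sum
             FROM stock GROUP BY 1) a USING (product_id)
LEFT   JOIN (SELECT product_id, sum(copies) AS purchase_sum
             FROM purchaseitem GROUP BY 1) b USING (product_id);
```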

Then add the missing parts:

LEFT   JOIN (
   SELECT DISTINCT ON (product_id)
          pi.product_id, pu.outlet_id, sum(copies) AS max_sales
   FROM   Purchase      pu
   JOIN   PurchaseItem  pi USING (purchase_id)
   GROUP  BY 1, 2
   ORDER  BY 1, sum(copies) DESC NULLS LAST
   ) c  USING (product_id)
LEFT   JOIN Outlet o USING (outlet_id);

About DISTINCT ON, see "Select first row in each GROUP BY group?"

You can run DISTINCT over the results of the aggregation because, in the sequence of events in a query, DISTINCT is applied after GROUP BY (see "Best way to get result count before LIMIT was applied in PHP/PostgreSQL").
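A small self-contained check with made-up rows, assuming PostgreSQL; the CTEs are hypothetical stand-ins for the real Purchase and PurchaseItem tables. GROUP BY first produces one row per product and outlet, then DISTINCT ON keeps the first row per product in ORDER BY order, i.e. the outlet with the highest sum. The final sort key on outlet_id is an optional tie-breaker so the result is deterministic when two outlets have equal sales.

```sql
WITH purchase (purchase_id, outlet_id) AS (VALUES (1, 10), (2, 20), (3, 20))
   , purchaseitem (purchase_id, product_id, copies) AS (
        VALUES (1, 100, 5), (2, 100, 3), (3, 100, 4), (1, 200, 1))
SELECT DISTINCT ON (pi.product_id)
       pi.product_id, pu.outlet_id, sum(pi.copies) AS max_sales
FROM   purchase     pu
JOIN   purchaseitem pi USING (purchase_id)
GROUP  BY 1, 2                                     -- (100,10)=5, (100,20)=7, (200,10)=1
ORDER  BY 1, sum(pi.copies) DESC NULLS LAST, 2;   -- first row per product wins
```

With this data the query returns two rows: product 100 with outlet 20 and max_sales 7, and product 200 with outlet 10 and max_sales 1.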

Optimize performance

It's probably cheaper to scan PurchaseItem only once, using a CTE, but the CTE adds some overhead of its own. You'll have to test which is faster:

WITH ct AS (
   SELECT pi.product_id, pu.outlet_id, sum(pi.copies) AS sales
   FROM   PurchaseItem  pi 
   JOIN   Purchase      pu USING (purchase_id)
   GROUP  BY 1, 2
   )
SELECT p.name AS product_name, p.description AS product_description
     , a.stock_sum, b.purchase_sum
     , c.max_sales, o.outlet_name
FROM   Product p
LEFT   JOIN (
   SELECT product_id, SUM(copies) AS stock_sum
   FROM   Stock
   GROUP  BY 1
   ) a USING (product_id)
LEFT   JOIN (
   SELECT product_id, sum(sales) AS purchase_sum
   FROM   ct
   GROUP  BY 1
   ) b USING (product_id)
LEFT   JOIN (
   SELECT DISTINCT ON (product_id)
          product_id, outlet_id, sales AS max_sales
   FROM   ct
   ORDER  BY product_id, sales DESC NULLS LAST
   ) c  USING (product_id)
LEFT   JOIN Outlet o USING (outlet_id);

Test performance with EXPLAIN ANALYZE (a couple of times to exclude caching effects).
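A sketch of such a test run for the CTE part of the query, assuming PostgreSQL 12 or later, where the MATERIALIZED / NOT MATERIALIZED keywords for CTEs exist. In the full query ct is referenced twice, so PostgreSQL 12+ materializes it by default; here it is referenced only once, so MATERIALIZED is added to keep the same behavior. BUFFERS is optional and reports pages found in cache ("shared hit") versus read from disk, which helps tell caching effects apart from real differences.

```sql
-- Run this (and the competing query) several times and compare the
-- "Execution Time" lines once the cache is warm.
EXPLAIN (ANALYZE, BUFFERS)
WITH ct AS MATERIALIZED (          -- PostgreSQL 12+: compute the CTE once
   SELECT pi.product_id, pu.outlet_id, sum(pi.copies) AS sales
   FROM   PurchaseItem  pi
   JOIN   Purchase      pu USING (purchase_id)
   GROUP  BY 1, 2
   )
SELECT DISTINCT ON (product_id)
       product_id, outlet_id, sales AS max_sales
FROM   ct
ORDER  BY product_id, sales DESC NULLS LAST;
```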
