PostgreSQL – Multiply by Value from Related Row with Latest Date

greatest-n-per-group, postgresql

I have two tables, one with staff time entries, another with staff rates starting from a particular date:

time_table

id | staff_id | entry_date | hours
----------------------------------
1  | 1        | 15-01-2019 | 1
2  | 1        | 15-02-2019 | 2
3  | 1        | 15-03-2019 | 3
4  | 2        | 15-01-2019 | 4
5  | 2        | 15-02-2019 | 5
6  | 2        | 15-03-2019 | 6

rates_table

id | staff_id | start_date | rate
----------------------------------
1  | 1        | 01-01-2019 | 1
2  | 1        | 01-02-2019 | 2
3  | 1        | 01-03-2019 | 3
4  | 2        | 01-01-2019 | 4
5  | 2        | 01-02-2019 | 5
6  | 2        | 01-03-2019 | 6

I would like to multiply the hours of each time entry by that staff member's most recent rate, i.e. the rate whose start_date falls on or before the entry date.

I have this query but I have no idea how to select the most recent rate that occurs before the time entry:

select t.staff_id, t.entry_date, t.hours * r.rate as total_rate 
from time_table t
left join rates_table r on r.staff_id = t.staff_id and r.start_date < t.entry_date;

https://rextester.com/DDK49143

I would like a result like the following:

staff_id | entry_date | total_rate
----------------------------------
1        | 15-01-2019 | 1
1        | 15-02-2019 | 4
1        | 15-03-2019 | 9
2        | 15-01-2019 | 16
2        | 15-02-2019 | 25
2        | 15-03-2019 | 36

How could I do this in Postgres?

Best Answer

A LATERAL subquery would do the job:

SELECT t.staff_id, t.entry_date, t.hours * r.rate AS total_rate 
FROM   time_table t
LEFT   JOIN LATERAL (
   SELECT r.rate
   FROM   rates_table r
   WHERE  r.staff_id = t.staff_id
   AND    r.start_date <= t.entry_date -- "on or before that time entry"
   ORDER  BY r.start_date DESC NULLS LAST
   LIMIT  1
   ) r ON true;


Details depend on more information. The LEFT JOIN keeps all rows from time_table in the result, even if no rate is found. (total_rate is NULL in that case.)
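
To try the query against the sample data from the question, a minimal setup might look like the following. The column types are assumptions (the question does not show the table definitions), and the dates are written in ISO format rather than the DD-MM-YYYY display format used above.

```sql
-- Assumed definitions; adjust types to match the real tables.
CREATE TEMP TABLE time_table (
   id         int PRIMARY KEY,
   staff_id   int     NOT NULL,
   entry_date date    NOT NULL,
   hours      numeric NOT NULL
);

CREATE TEMP TABLE rates_table (
   id         int PRIMARY KEY,
   staff_id   int     NOT NULL,
   start_date date    NOT NULL,
   rate       numeric NOT NULL
);

INSERT INTO time_table (id, staff_id, entry_date, hours) VALUES
   (1, 1, '2019-01-15', 1), (2, 1, '2019-02-15', 2), (3, 1, '2019-03-15', 3),
   (4, 2, '2019-01-15', 4), (5, 2, '2019-02-15', 5), (6, 2, '2019-03-15', 6);

INSERT INTO rates_table (id, staff_id, start_date, rate) VALUES
   (1, 1, '2019-01-01', 1), (2, 1, '2019-02-01', 2), (3, 1, '2019-03-01', 3),
   (4, 2, '2019-01-01', 4), (5, 2, '2019-02-01', 5), (6, 2, '2019-03-01', 6);

-- Same LATERAL query as above, with an ORDER BY added for a stable row order.
SELECT t.staff_id, t.entry_date, t.hours * r.rate AS total_rate
FROM   time_table t
LEFT   JOIN LATERAL (
   SELECT r.rate
   FROM   rates_table r
   WHERE  r.staff_id = t.staff_id
   AND    r.start_date <= t.entry_date
   ORDER  BY r.start_date DESC NULLS LAST
   LIMIT  1
   ) r ON true
ORDER  BY t.staff_id, t.entry_date;
```

With this data the totals should match the result requested in the question (1, 4, 9, 16, 25, 36).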

This is typically efficient for many rate entries per staff_id - if you have an index on rates_table(staff_id, start_date DESC NULLS LAST) or similar.

If you can get index-only scans out of it, a covering index would be better still:

CREATE INDEX ON rates_table (staff_id, start_date DESC NULLS LAST) INCLUDE (rate);

This form of the index requires Postgres 11 or later. See:

Can Postgres use an index-only scan for this query with joined tables?

Depending on table definition, indexes, data distribution, etc., other query styles may be preferable. For querying the whole table with only a few rows per staff_id, DISTINCT ON might be faster ...
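
For comparison, a DISTINCT ON variant could be sketched roughly as follows. It assumes time_table.id is unique, so that it identifies one time entry; this is one possible formulation, not necessarily the one the answer had in mind.

```sql
-- Join every time entry to all rates that started on or before it,
-- then keep only the row with the latest start_date per time entry.
SELECT DISTINCT ON (t.id)
       t.staff_id, t.entry_date, t.hours * r.rate AS total_rate
FROM   time_table t
LEFT   JOIN rates_table r ON  r.staff_id   = t.staff_id
                          AND r.start_date <= t.entry_date
ORDER  BY t.id, r.start_date DESC NULLS LAST;  -- must start with the DISTINCT ON expression
```

The join produces one intermediate row per (time entry, earlier rate) pair, so this form tends to suit tables with few rates per staff_id, while the LATERAL form scales better when there are many.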

Postgresql – Joining timestamped records with the most-recent-prior records from another timestamped table

Guess I don't need to tell you that this is a seriously awkward data model. Anyway, I think this query would do what you're looking for:

SELECT subq2.sales_time, subq2.num_sold, subq2.effective_price,
   subq2.effective_price * subq2.num_sold AS total_sale_price
FROM (

  SELECT subq1.sales_time, subq1.num_sold, subq1.max_price_ts,
    (SELECT price FROM prices
     WHERE timestamp = subq1.max_price_ts) AS effective_price
  FROM
    (SELECT sales.timestamp AS sales_time, sales.amount AS num_sold,
      (SELECT MAX(prices.timestamp) FROM prices
       WHERE prices.timestamp <=  sales.timestamp) AS max_price_ts
    FROM sales
    ) AS subq1

  ) AS subq2

ORDER BY subq2.sales_time DESC;

There's probably a more concise way to write the above, perhaps using DISTINCT ON and ORDER BY to save having to fetch the price for the MAX time in a separate subquery, but I'll leave that as an exercise for the reader.

EDIT Alright, here's a simplified version which I think should work out to be equivalent but much faster.

SELECT DISTINCT ON (sales.timestamp)
       sales.timestamp AS sales_time,
       sales.amount AS num_sold,
       prices.price AS effective_price,
       prices.price * sales.amount AS total_sale_price

      FROM sales
INNER JOIN prices
        ON prices.timestamp <= sales.timestamp
        -- sort by prices.timestamp so the most recent prior price is kept,
        -- not the highest price
  ORDER BY sales.timestamp DESC, prices.timestamp DESC;

If that doesn't work for you, it would be helpful for you to post a lot more information such as a minimal testcase and EXPLAIN ANALYZE showing how slow the query is for you.
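
The same lookup can also be sketched with a LATERAL subquery, the technique from the first answer, using the sales and prices columns that appear in the queries above. The aliases s, pr and p are introduced here for illustration only.

```sql
SELECT s.timestamp        AS sales_time,
       s.amount           AS num_sold,
       p.price            AS effective_price,
       p.price * s.amount AS total_sale_price
FROM   sales s
LEFT   JOIN LATERAL (
   SELECT pr.price
   FROM   prices pr
   WHERE  pr.timestamp <= s.timestamp   -- prices in effect at or before the sale
   ORDER  BY pr.timestamp DESC          -- most recent first
   LIMIT  1
   ) p ON true
ORDER  BY s.timestamp DESC;
```

Unlike the INNER JOIN version, the LEFT JOIN keeps sales that have no earlier price; effective_price and total_sale_price are NULL for those rows.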

PostgreSQL – How to Select the First or Last Entry for a Specific Day

You can't use an aggregate function in a WHERE condition; you need to use a sub-select:

SELECT EntryID 
FROM AttendanceREcords 
WHERE StaffID = 'xxxxx' 
AND ArrivalTime = (select min(ArrivalTime) 
                   from AttendanceREcords
                   where StaffID = 'xxxxx');

But this can be done more efficiently by using a LIMIT clause:

SELECT EntryID 
FROM AttendanceREcords 
WHERE StaffID = 'xxxxx' 
ORDER BY arrivalTime 
LIMIT 1

There is a difference between the two statements: if more than one row has the same minimum arrival time, the first one will return all of them, the second one only one row.
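
If you want the short ORDER BY form but still need all tied rows, one option is the WITH TIES clause. This is a sketch that assumes PostgreSQL 13 or later, where the clause was introduced.

```sql
SELECT EntryID
FROM   AttendanceREcords
WHERE  StaffID = 'xxxxx'
ORDER  BY ArrivalTime
FETCH  FIRST 1 ROW WITH TIES;  -- also returns any rows tied on the earliest ArrivalTime
```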

Another alternative that is usually more efficient than a sub-select is using a window function:

SELECT EntryID
FROM (
  SELECT EntryID, 
         dense_rank() over (order by arrivalTime) as rnk
  FROM AttendanceREcords 
  WHERE StaffID = 'xxxxx' 
) t
where rnk = 1;

By changing the sort direction in order by arrivalTime (ascending or descending) you can select the first or the last.
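
As a sketch, the last entry for a single day might be selected like this. It assumes arrivalTime is a timestamp column, and the date is a made-up example.

```sql
SELECT EntryID
FROM (
  SELECT EntryID,
         dense_rank() OVER (ORDER BY arrivalTime DESC) AS rnk  -- DESC: latest entry gets rank 1
  FROM AttendanceREcords
  WHERE StaffID = 'xxxxx'
    AND arrivalTime >= DATE '2019-01-15'  -- hypothetical day, lower bound inclusive
    AND arrivalTime <  DATE '2019-01-16'  -- next day, upper bound exclusive
) t
WHERE rnk = 1;
```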

If you want the first and last in a single query you can do something like this:

SELECT EntryID,
       arrivalTime
FROM (
  SELECT EntryID,
         arrivalTime,
         -- window aggregates over all rows of this staff member
         min(arrivalTime) over () as min_time,
         max(arrivalTime) over () as max_time
  FROM AttendanceREcords 
  WHERE StaffID = 'xxxxx' 
) t
where arrivalTime = min_time
   or arrivalTime = max_time;
