PostgreSQL – Multiply by Value from Related Row with Latest Date

greatest-n-per-grouppostgresql

I have two tables, one with staff time entries, another with staff rates starting from a particular date:

time_table

id | staff_id | entry_date | hours
----------------------------------
1  | 1        | 15-01-2019 | 1
2  | 1        | 15-02-2019 | 2
3  | 1        | 15-03-2019 | 3
4  | 2        | 15-01-2019 | 4
5  | 2        | 15-02-2019 | 5
6  | 2        | 15-03-2019 | 6

rates_table

id | staff_id | start_date | rate
----------------------------------
1  | 1        | 01-01-2019 | 1
2  | 1        | 01-02-2019 | 2
3  | 1        | 01-03-2019 | 3
4  | 2        | 01-01-2019 | 4
5  | 2        | 01-02-2019 | 5
6  | 2        | 01-03-2019 | 6

I would like to multiply the time entries by the rate for that staff member that is the most recent, occurring on or before that time entry.

I have this query but I have no idea how to select the most recent rate that occurs before the time entry:

select t.staff_id, t.entry_date, t.hours * r.rate as total_rate 
from time_table t
left join rates_table r on r.staff_id = t.staff_id and r.start_date < t.entry_date;

https://rextester.com/DDK49143

I would like a result like the following:

staff_id | entry_date | total_rate
----------------------------------
1        | 15-01-2019 | 1
1        | 15-02-2019 | 4
1        | 15-03-2019 | 9
2        | 15-01-2019 | 16
2        | 15-02-2019 | 25
2        | 15-03-2019 | 36

How could I do this in Postgres?

Best Answer

A LATERAL subquery would do the job:

SELECT t.staff_id, t.entry_date, t.hours * r.rate AS total_rate 
FROM   time_table t
LEFT   JOIN LATERAL (
   SELECT r.rate
   FROM   rates_table r
   WHERE  r.staff_id = t.staff_id
   AND    r.start_date <= t.entry_date -- "on or before that time entry"
   ORDER  BY r.start_date DESC NULLS LAST
   LIMIT  1
   ) r ON true;

db<>fiddle here

Details depend on more information. The LEFT JOIN keeps all rows from time_table in the result, even if no rate is found. (total_rate is NULL in that case.)

This is typically efficient for many rate entries per staff_id - if you have an index on rates_table(staff_id, start_date DESC NULLS LAST) or similar.

If you can get index-only scans out of it, a covering index would be better, yet:

CREATE INDEX ON rates_table (staff_id, start_date DESC NULLS LAST) INCLUDE (rate);

This form of the index requires Postgres 11 or later. See:

Depending on table definition, indexes, data distribution etc. other query styles may be preferable. For querying the whole table and only few rows per staff_id, DISTINCT ON might be faster ...

Related: