PostgreSQL – Efficient SELECT for Array of Tuples

performancepostgresqlpostgresql-performance

Assume I have a table records with the following structure

  • id (unique int)
  • updated (timestamp)

I have an input_array with values [[id_1, timestamp_1], [id_4, timestamp_4], ...]. I'll refer to each element as tuple_1, tuple_4, etc.

I'm looking for the most efficient query (in PostgreSQL v11.2+) to select [id_1, id_4, ...] from records, but only where tuple_{n}.updated > row{n}.updated. Assume input_array may contains thousands of tuples, and records upwards of a million rows.

I don't even know where to begin with this. Lateral join comes to mind, as does unnest, and where in, but everything I've tried so far fails miserably

Update I'm open to input_array being in any format (tuples, two separate arrays, whatever), and updated being an int

Best Answer

If you aren't fixed on the array input, you can use a tuple comparison.

select *
from records
where (id, updated) in ( (1, timestamp '2019-01-01 00:00:00'),
                         (2, timestamp '2019-01-02 00:00:00') )

That can make use of a regular btree index on (id, updated)

Note that this uses = for both values and is equivalent to

where (id = 1 and updated = timestamp '2019-01-01 00:00:00')
   or (id = 2 and updated = timestamp '2019-01-02 00:00:00')

But you want to compare the timestamps using >. You can do that if you join against a values clause:

select r.*
from records r
  join (
    values 
       (1, timestamp '2019-01-01 00:00:00'),
       (2, timestamp '2019-01-02 00:00:00') 
  ) as t(id,upd) on  r.id = t.id
where r.updated > t.upd;

Online example: https://rextester.com/GGC83046