PostgreSQL Oldest Values – How to Get Efficiently Over Each ID

greatest-n-per-groupperformancepostgresqlpostgresql-11query-performance

How can PostgreSQL return a list of the oldest timestamp values over a table of sensor id measurements?

Let me explain the situation with a sample table:

CREATE TABLE sensor_data(
sensor_id INTEGER,
time TIMESTAMPTZ,
value NUMERIC,
PRIMARY KEY (sensor_id, time)
)

Populated table example:

+-----------+------------------+-------+
| sensor_id |       time       | value |
+-----------+------------------+-------+
|         1 | 2018-01-01 00:00 |     1 |
|         1 | 2018-01-01 01:00 |     2 |
|         3 | 2018-01-01 03:00 |     4 |
|         3 | 2018-01-01 04:00 |     3 |
|         4 | 2018-01-01 03:00 |     5 |
|         4 | 2018-01-01 04:00 |     6 |
+-----------+------------------+-------+

While using something like sensor_id (1,3) inside the query I want it to return something like this:

+-----------+------------------+-------+
| sensor_id |       time       | value |
+-----------+------------------+-------+
|         1 | 2018-01-01 01:00 |     2 |
|         3 | 2018-01-01 04:00 |     3 |
+-----------+------------------+-------+

How can I do that in a query using the PRIMARY KEY index for speeding it up?

Best Answer

There are many possible query styles, most will readily use your PK index on (sensor_id, time) as it fits the task. (Postgres can read indexes backwards practically as fast.) This should be near perfect:

SELECT s.sensor_id, sd.time, sd.value
FROM   unnest ('{1,3}'::int[]) s(sensor_id)
LEFT   JOIN LATERAL (
   SELECT *
   FROM   sensor_data sd
   WHERE  sd.sensor_id = s.sensor_id
   ORDER  BY time DESC
   LIMIT  1
   ) sd ON true;

db<>fiddle here

LEFT JOIN .. ON true keeps sensors without any data entries in the result - with NULL values in place of values.

Related:

Since you are on Postgres 11, a covering index might pay:

... PRIMARY KEY (sensor_id, time) INCLUDE (value) 

But it makes the index bigger and writes to the table more expensive, and your names indicate a write-heavy table. And while you only query for few rows at a time, queries don't get much faster anyway. So probably best the way you have it. Related: