PostgreSQL 9.6 Performance – Partition Constraint Not Used for Joins by Timestamp

partitioningperformancepostgresqlpostgresql-9.6query-performance

I have a partitioned table structure like:

CREATE TABLE measurements (
    sensor_id bigint,
    tx timestamp,
    measurement int
);

CREATE TABLE measurements_201201(
    CHECK (tx >= '2012-01-01 00:00:00'::timestamp without time zone 
       AND tx < ('2012-01-01 00:00:00'::timestamp without time zone + '1 mon'::interval))    
)INHERITS (measurements);
CREATE INDEX ON measurements_201201(sensor_id);
CREATE INDEX ON measurements_201201(tx);
CREATE INDEX ON measurements_201201(sensor_id, tx);
....

And so on. Each table has approximately 20M rows.

If I query for a sample of sensors and a sample of timestamps in the WHERE clause, the query plan shows the correct tables being selected and indexes being usedm e.g.:

SELECT *
FROM measurements
INNER JOIN sensors TABLESAMPLE BERNOULLI (0.01) USING (sensor_id)
WHERE tx BETWEEN '2015-01-04 05:00' AND '2015-01-04 06:00' 
    OR tx BETWEEN '2015-02-04 05:00' AND '2015-02-04 06:00' 
    OR tx BETWEEN '2014-03-05 05:00' AND '2014-04-07 06:00' ;

However, if I use a CTE, or put the timestamp values into a table (not shown, even with indexes on the temporary table).

WITH sensor_sample AS(
    SELECT sensor_id, start_ts, end_ts
    FROM sensors TABLESAMPLE BERNOULLI (0.01)
    CROSS JOIN (VALUES (TIMESTAMP '2015-01-04 05:00', TIMESTAMP '2015-01-04 06:00'),
        (TIMESTAMP '2015-02-04 05:00', TIMESTAMP '2015-02-04 06:00'),
        (TIMESTAMP  '2014-03-05 05:00', '2014-04-07 06:00') ) tstamps(start_ts, end_ts)
)

Something like the below

SET constraint_exclusion = on;
SELECT * FROM measurements
INNER JOIN sensor_sample USING (sensor_id)
WHERE tx BETWEEN start_ts AND end_ts

Performs an index scan on every table. Which is still relatively fast, but with increasing complexity of queries, this can turn into seq scans which will end up being very slow for retrieving ~40K rows from a limited subset of partitioned tables (4-5 of 50).

I'm concerned that something like this is the problem.

For non-trivial expressions you have to repeat the more or less verbatim condition in queries to make the Postgres query planner understand it can rely on the CHECK constraint. Even if it seems redundant!

How can I improve partitioning and query structure to reduce the likelihood of running seq scans on all my data?

Best Answer

Constraint-based exclusion [CBE] is performed on early stage of query planning, just after the query is parsed, mapped to actual relations and rewritten. (internals, Planner/Optimizer stage)

The planner cannot assume any contents of "sensor_sample" table.

So unless you have values hardcoded in the query, the planner will not exclude "partitions".

I guess what happens with the CTE variant... the planner is restricted because you use TABLESAMPLE and the whole subquery may be treated as volatile even if literals in the subquery are static. (that's just my guess, I'm not expert on planner code)

On the bright side, the index scan with negative result is blazingly fast. (single page scan at most!) so unless you have over 10000 partitions, I would not bother.

So, to answer your question directly:

  • You cannot improve this data structure much more.

  • Regardin index scans - they are cheap;

  • Regarding sequential scans - they are avoided when possible, as you see on your own examples.