I'm managing a table with several million records which are being inserted in real time. A part of my app needs to display the last N inserted rows, so at the beginning I just queried for:
```sql
select id, logdate, content from measurements order by logdate DESC limit 500;
```
A few days later, I found out it was faster to set `id` to (for this example) `10000000000 - extract(epoch from logdate)` and use it as the PRIMARY KEY, so that

```sql
select id, logdate, content from measurements limit 500;
```

would naturally order by `id`, therefore yielding the latest records.
As the table grew, it became unmanageable, so I resorted to partitioning. I did it just as the documentation says:
```sql
CREATE TABLE measurement_y2007m11 (
    CHECK ( logdate >= DATE '2007-11-01' AND logdate < DATE '2007-12-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2007m12 (
    CHECK ( logdate >= DATE '2007-12-01' AND logdate < DATE '2008-01-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2008m01 (
    CHECK ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
) INHERITS (measurement);
```
and I distributed the existing rows into their respective partitions.
The problem is, I'm partitioning on `logdate`, but when querying I don't know the timespan. When I query for the last 500 rows, I can't know whether they are from last week, last month, or last quarter. Therefore, the query planner always scans all the partitions.

I can't believe no one has dealt with this problem before; it sounds trivial, and yet it has me puzzled.
Best Answer
Misconception 1: "Natural order"
There is no natural order in a `SELECT` statement. Without `ORDER BY` you get rows in arbitrary order. Generally that will be the cheapest order in which Postgres can satisfy your query, i.e. the order in which tuples are stored physically or in which they are retrieved after an index look-up. But there is no guarantee whatsoever. If your statement seemed to work, this was pure luck / coincidence, and it can break at any time.

Use your first query instead. If `logdate` is, in fact, of type `date`, or if you need to be sure, you should add more `ORDER BY` items to break ties and get a stable sort order. If you don't care which row wins a tie, append your (new) primary key (see below).

If the latest row (biggest `measurement_id`) is guaranteed to have the latest `logdate`, you can just `ORDER BY measurement_id DESC`, but don't take this for granted. In a multi-user environment, a row with a later `logdate` can be written sooner than another row with an earlier `logdate`. This is one reason why your idea for the new primary key is not very useful.
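With a tie-breaker appended, the first query could look like this (a sketch; it assumes the serial primary key `measurement_id` recommended below):

```sql
-- stable sort order: ties on logdate are broken by the primary key
SELECT measurement_id, logdate, content
FROM   measurements
ORDER  BY logdate DESC, measurement_id DESC
LIMIT  500;
```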
The other reason: it is bound to fail sooner or later if `logdate` is not guaranteed to be unique - which it most probably isn't.

Use a `serial` column `measurement_id` as primary key instead, or `bigserial` if you expect more than 2147483647 rows over time.

Index
You claim you did it just as the documentation says - but the documentation also tells you to create an index on the key column of each partition. The only tiny difference: the example in the manual uses the more sensible singular form for the table name, `measurement` instead of `measurements`.
If you go with my advice and use `measurement_id` as the primary key, make that an index on `measurement_id` for each partition instead. More on why that would probably help below.
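With inheritance partitioning, indexes are not inherited from the parent, so each partition needs its own. A sketch (the index names are mine, assuming the serial column `measurement_id`):

```sql
-- hypothetical per-partition indexes on the serial column
CREATE INDEX measurement_y2007m11_id ON measurement_y2007m11 (measurement_id);
CREATE INDEX measurement_y2007m12_id ON measurement_y2007m12 (measurement_id);
CREATE INDEX measurement_y2008m01_id ON measurement_y2008m01 (measurement_id);
```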
Misconception 2: "scan all partitions"
The query planner will plan to check all partitions in sequence. But as soon as the query is satisfied (500 rows are retrieved), it will stop executing. Test with `EXPLAIN ANALYZE`; you will see the annotation `(never executed)` behind the remaining partitions.

If the planner isn't smart enough to derive the best sequence in which to scan the partitions from your setup (I can't test right now), you can give it a hand with a `UNION ALL` query over the partitions. But that may not be necessary.
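Such a `UNION ALL` query might look like this (a sketch, newest partition first, assuming the serial primary key `measurement_id`; each branch needs its own parentheses to carry its `ORDER BY` / `LIMIT`):

```sql
-- newest partition first; the outer LIMIT caps the result at 500 rows
(SELECT measurement_id, logdate, content FROM measurement_y2008m01 ORDER BY measurement_id DESC LIMIT 500)
UNION ALL
(SELECT measurement_id, logdate, content FROM measurement_y2007m12 ORDER BY measurement_id DESC LIMIT 500)
UNION ALL
(SELECT measurement_id, logdate, content FROM measurement_y2007m11 ORDER BY measurement_id DESC LIMIT 500)
ORDER BY measurement_id DESC
LIMIT  500;
```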