I'm managing a table with several million records which are being inserted in real time. A part of my app needs to display the last N inserted rows, so at the beginning I just queried for:
```sql
select id, logdate, content from measurements order by logdate DESC limit 500;
```
A few days later, I found out it was faster to set `id` to (for this example) `10000000000 - extract(epoch from logdate)` and use it as the PRIMARY KEY, so that

```sql
select id, logdate, content from measurements limit 500;
```

would naturally order by `id`, therefore yielding the latest records.
As the table grew, it became unmanageable, so I resorted to partitioning. I did it just as the documentation says:
```sql
CREATE TABLE measurement_y2007m11 (
    CHECK ( logdate >= DATE '2007-11-01' AND logdate < DATE '2007-12-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2007m12 (
    CHECK ( logdate >= DATE '2007-12-01' AND logdate < DATE '2008-01-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2008m01 (
    CHECK ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
) INHERITS (measurement);
```
and I distributed the existing rows into their respective partitions.
The problem is, I'm partitioning on `logdate`, but when querying I don't know the timespan. When I query for the last 500 rows, I can't know whether they are from last week, last month, or last quarter. Therefore, the query planner always scans all the partitions.

I can't believe no one has dealt with this problem before; it sounds trivial, and yet it has me puzzled.
Best Answer
Misconception 1: "Natural order"
There is no natural order in a `SELECT` statement. Without `ORDER BY` you get rows in arbitrary order. Generally that will be the cheapest order in which Postgres can satisfy your query, i.e. the order in which tuples are stored physically or in which they are retrieved after an index look-up. But there is no guarantee whatsoever. If your statement seemed to work, this was pure luck / coincidence, and it can break at any time.

Use your first query instead. If `logdate` is, in fact, of type `date`, or if you need to be sure, you should add more `ORDER BY` items to break ties and get a stable sort order. If you don't care which row wins a tie, append your (new) primary key (see below).

If the latest row (biggest `measurement_id`) is guaranteed to have the latest `logdate`, you can just `ORDER BY measurement_id DESC`, but don't take this for granted. In a multi-user environment, a row with a later `logdate` can be written sooner than another row with an earlier `logdate`. This is one reason why your idea for the new primary key is not very useful.
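With a tie-breaker appended, the first query could look like this (a sketch; it assumes the serial primary key `measurement_id` recommended below):

```sql
-- stable sort order: ties on logdate are broken by the primary key
SELECT measurement_id, logdate, content
FROM   measurements
ORDER  BY logdate DESC, measurement_id DESC
LIMIT  500;
```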
The other reason: it is bound to fail sooner or later if `logdate` is not guaranteed to be unique - which it most probably isn't.

Use a `serial` column `measurement_id` as primary key instead, or `bigserial` if you expect more than 2147483647 rows over time.

Index
You claim you did it just as the documentation says - but the documentation also tells you to create an index on the key column of each partition. The only tiny difference: the example in the manual uses the more sensible singular form for the table name, `measurement` instead of `measurements`.
If you go with my advice and use `measurement_id` as the primary key, make that an index on `measurement_id` for each partition instead. More on why that would probably help below.
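With inheritance partitioning, indexes are not inherited from the parent, so each partition needs its own. A sketch (the index names are mine, assuming the serial column `measurement_id`):

```sql
-- hypothetical per-partition indexes on the serial column
CREATE INDEX measurement_y2007m11_id ON measurement_y2007m11 (measurement_id);
CREATE INDEX measurement_y2007m12_id ON measurement_y2007m12 (measurement_id);
CREATE INDEX measurement_y2008m01_id ON measurement_y2008m01 (measurement_id);
```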
Misconception 2: "scan all partitions"
The query planner will plan to check all partitions in sequence. But as soon as the query is satisfied (500 rows are retrieved), it will stop executing. Test with `EXPLAIN ANALYZE`; you will see the annotation `(never executed)` behind the remaining partitions.

If the planner isn't smart enough to derive the best sequence in which to scan the partitions from your setup (I can't test right now), you can give it a hand with a `UNION ALL` query over the partitions. But that may not be necessary.
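Such a `UNION ALL` query might look like this (a sketch, newest partition first, assuming the serial primary key `measurement_id`; each branch needs its own parentheses to carry its `ORDER BY` / `LIMIT`):

```sql
-- newest partition first; the outer LIMIT caps the result at 500 rows
(SELECT measurement_id, logdate, content FROM measurement_y2008m01 ORDER BY measurement_id DESC LIMIT 500)
UNION ALL
(SELECT measurement_id, logdate, content FROM measurement_y2007m12 ORDER BY measurement_id DESC LIMIT 500)
UNION ALL
(SELECT measurement_id, logdate, content FROM measurement_y2007m11 ORDER BY measurement_id DESC LIMIT 500)
ORDER BY measurement_id DESC
LIMIT  500;
```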