How do you index a table in PostgreSQL so that min/max queries return as quickly as possible?
I have a large table with a couple hundred million rows. Each row has a source_id and a date recording when the row was last updated. I'd like to collect some statistics per source_id, specifically the minimum and maximum last_updated_date for each one.
So I created this index on my table:
CREATE INDEX CONCURRENTLY mydata_source_last_updated_date ON mydata (source_id, last_updated_date ASC);
However, when I try to query the min dates per source with:
SELECT source_id, MIN(last_updated_date) FROM mydata GROUP BY source_id;
the query takes about an hour to complete.
Is this normal performance for such a large table, even with an index? How can I reduce this query time?
Best Answer
With only a few dozen distinct values of source_id, you can get fast execution on the index you built, using a loose index scan, aka a "skip scan". Unfortunately PostgreSQL does not plan those automatically, so you have to force it into one, by using a recursive query.
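A sketch of that recursive skip scan, assuming the mydata table and columns from the question: the recursive CTE walks the distinct source_id values one index probe at a time, and the correlated subqueries then pick up the min and max dates for each value via the (source_id, last_updated_date) index.

```sql
WITH RECURSIVE sources AS (
    -- anchor: the smallest source_id in the table (one index probe)
    SELECT min(source_id) AS source_id FROM mydata
    UNION ALL
    -- recursive step: the next distinct source_id after the previous one
    SELECT (SELECT min(source_id) FROM mydata WHERE source_id > s.source_id)
    FROM sources s
    WHERE s.source_id IS NOT NULL
)
SELECT s.source_id,
       -- each subquery is satisfied by a single probe of the
       -- (source_id, last_updated_date) index; max() can use the
       -- same btree index scanned backward
       (SELECT min(last_updated_date) FROM mydata m
         WHERE m.source_id = s.source_id) AS min_date,
       (SELECT max(last_updated_date) FROM mydata m
         WHERE m.source_id = s.source_id) AS max_date
FROM sources s
WHERE s.source_id IS NOT NULL
ORDER BY s.source_id;
```

With a few dozen distinct source_id values, this does on the order of a few hundred index probes instead of scanning hundreds of millions of rows, so it should return in milliseconds.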
Even if you don't resort to this, the query as you originally wrote it should not take anywhere near an hour. But without seeing an EXPLAIN and an EXPLAIN ANALYZE, there isn't much more that can be said on that.
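If you do post the plan, capture it with timing and buffer statistics so the slow step is visible:

```sql
-- ANALYZE actually runs the query; BUFFERS shows how much data was read
EXPLAIN (ANALYZE, BUFFERS)
SELECT source_id, MIN(last_updated_date)
FROM mydata
GROUP BY source_id;
```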