Postgresql – Combine tsvector and timestamp in one index

postgresql

I have a table that has a text field, and a timestamp. I have various indexes, such as a btree on the timestamp (this works great for "get the most recent N") and a GIN on the text (for full text search, looks like CREATE INDEX foo ON bar USING GIN (to_tsvector('english', the_text)).

I need to support a query that looks like SELECT * FROM foo WHERE to_tsvector('english', the_text) @@ to_tsquery('english', ?) ORDER BY timestamp DESC LIMIT 1000. This works fine for when the text query has very few matches, but when the query has a lot of matches in the GIN, it takes an unbelievably long amount of time, since it seems to be grabbing the timestamp from the heap for each one, then sorting.

Is there any way to create a single index on both columns that supports this?

For example, if I had two normal columns a and b, I know that creating an index on (a, b) would speed up SELECT * FROM table WHERE a = ? ORDER BY b DESC LIMIT 1000. Is there an equivalent for when a is a full text search and b is a timestamp?

I've tried CREATE EXTENSION btree_gin then creating an index on (to_tsvector('english', the_text), timestamp) or (timestamp, to_tsvector('english', the_text)) using GIN or GiST. But none of those four indexes seem to change the query plan on a test table with dummy data. I could try them in production, but they would take a very long time to create (days).

Best Answer

You could create a RUM index like this:

CREATE INDEX ON bar USING rum (
   to_tsvector('english', the_text) rum_tsvector_hash_ops,
   timestamp
);

And query like

SELECT * FROM foo
WHERE to_tsvector('english', the_text) @@ to_tsquery('english', ?)
  AND timestamp < '2021-01-01 00:00:00'
ORDER BY timestamp DESC
LIMIT 1000;

You need a WHERE condition with timestamp for the RUM index.