Postgresql – Creating index on TIMESTAMP column to use with range operators

indexpostgresqlrange-typestimestamp

TLDR: Can I create an index that's used by the following WHERE clause:

WHERE foo_date <@ tsrange('2018-01-01', '2018-02-01')

Let's say I have a table like this:

CREATE TABLE foo
(
    foo_id INTEGER GENERATED BY DEFAULT AS IDENTITY,
    foo_date timestamp without time zone NOT NULL,
    CONSTRAINT foo_pkey PRIMARY KEY (foo_id)
);

This table contains 100,000 records with dates from 2009-01-01 to 2018-12-29. I'd like to be able to query for rows in a given date range (e.g. for rows in January 2018).

Option 1

One approach is to use the BETWEEN operator:

SELECT * FROM foo WHERE foo_date BETWEEN '2018-01-01' AND '2018-01-31';

The problem of this approach is that if foo_date occurred on 2018-01-31 after midnight, they wouldn't be included in this query. So I could change the query to BETWEEN '2018-01-01' AND '2018-02-01'. The problem then, however, is records that occur on 2018-02-01 00:00:00. These would be included, which I don't want.

Option 2

Another option, put forth by Aaron Bertrand, is to use this construct:

foo_date >= '2018-01-01' AND foo_date < '2018-02-01'

(Yes, this blog is for SQL Server, but seems to be applicable here).

While this form unequivocally gives me the results I want, it's cumbersome: I have to repeat the column name twice.

Option 3

Since Postgres gives us the range data type, I thought a clearer form might be:

foo_date <@ tsrange('2018-01-01', '2018-02-01')

So my next question is, if I use this form, can I use an index to speed up operation?

With Options 1 and 2 above, a normal b-tree index can be used:

CREATE INDEX idx_foo ON foo(foo_date);

A query using Options 1 or 2 will use the index:

EXPLAIN SELECT * FROM foo 
WHERE 
    foo_date >= '2018-01-01' 
    AND foo_date < '2018-02-01';

gives me this query plan:

Bitmap Heap Scan on foo  (cost=21.95..592.70 rows=942 width=12)
  Recheck Cond: ((foo_date >= '2018-01-01 00:00:00'::timestamp without time zone) AND (foo_date < '2018-02-01 00:00:00'::timestamp without time zone))
  ->  Bitmap Index Scan on idx_foo  (cost=0.00..21.71 rows=942 width=0)
        Index Cond: ((foo_date >= '2018-01-01 00:00:00'::timestamp without time zone) AND (foo_date < '2018-02-01 00:00:00'::timestamp without time zone))

However, if I use Option 3, the index isn't used:

EXPLAIN SELECT * FROM foo 
WHERE foo_date <@ tsrange('2018-01-01', '2018-02-01');

gives me:

Seq Scan on foo  (cost=0.00..1791.00 rows=500 width=12)
  Filter: (foo_date <@ '["2018-01-01 00:00:00","2018-02-01 00:00:00")'::tsrange)

If I try to create a gist index, I initially get an error message.

ERROR:  data type timestamp without time zone has no default operator class for access method "gist"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.
SQL state: 42704

After adding the btree_gist extension, I can create the index:

CREATE INDEX idx_foo ON foo USING gist (foo_date)

However, using the @> or <@ still doesn't make use of the index.

Is there something I'm missing? Or is it just not feasible to create an index that can be used by this construct?

Best Answer

The name "foo_date" indicates a date and is a bad choice for a timestamp column. Option 1 would work just fine with actual dates.
A plain btree index in combination with Option 2 is the unequivocally best solution. Look no further. Except maybe for the special case of a BRIN index for large tables with physically sorted data. See:
- Speed up creation of Postgres partial index

Proof of concept

That said, to make a GiST or SP-GiST index work, you could create an expression index on fake ranges. You do not need the module btree_gist for this. Demonstrating with SP-GiST since that is typically a bit faster here. See:

Perform this hours of operation query in PostgreSQL

CREATE INDEX foo_date_spgist_idx ON foo USING spgist(tsrange(foo_date, foo_date, '[]'));

SELECT * FROM foo
WHERE  tsrange(foo_date, foo_date, '[]') <@ tsrange('2018-01-01', '2018-02-01')

Or with range literal:

...
WHERE  tsrange(foo_date, foo_date, '[]') <@ '[2018-01-01,2018-02-01)'

But: bigger, more expensive to maintain, slower than the btree index. Not even less cumbersome to write. Pointless for your case.

Aside: technically, you could:

... WHERE foo_date BETWEEN '2018-01-01' AND '2018-01-31 23:59.999999';

The Postgres timestamp type is (currently) implemented with µs resolution, i.e. max. 6 fractional digits. Hence the expression does exactly what you want. But I strongly advice against building on this implementation detail. Option 2 is the way to go. Related:

Option 1

Option 2

Option 3

Best Answer

Proof of concept

Related Solutions

Postgresql – How to index WHERE (start_date >= ‘2013-12-15’)

Postgresql – How to speed up a Postgres query containing lots of Joins with an ILIKE condition

Related Question