Why PostgreSQL Performs Seq Scan When Comparing Numeric with Bigint

execution-planperformancepostgresqlpostgresql-9.3query-performance

Given a table vp with column timestamp type bigint, and a btree index on timestamp, why would Postgres ignore the index and run a seq scan on comparison of timestamp with a floating point value, when an index scan would produce identical results?

Integer comparison:

SELECT * FROM vp WHERE vp.timestamp > 1470752584 takes 48 ms:

 Index Scan using vp_ts_idx on vp  (cost=0.57..257.87 rows=2381 width=57) (actual time=0.014..38.669 rows=80323 loops=1)
   Index Cond: ("timestamp" > 1470752584)
 Total runtime: 48.322 ms

Numeric comparison:

SELECT * FROM vp WHERE vp.timestamp > 1470752584.1 takes 103 seconds because it ignores vp_ts_idx and performs a seq scan of the entire table:

  Seq Scan on vp  (cost=0.00..7378353.16 rows=95403915 width=57) (actual time=62625.420..103122.701 rows=98240 loops=1)
   Filter: (("timestamp")::numeric > 1470752584.1)
   Rows Removed by Filter: 285945491
 Total runtime: 103134.333 ms

Context: A query for recent vehicle positions compared timestamp with EXTRACT(EPOCH FROM NOW()) - %s, where %s was the desired number of seconds, without explicitly casting to a bigint. The workaround is to use CAST(EXTRACT(EPOCH FROM NOW()) - %s AS bigint).

Why wouldn't the query planner doesn't do this automatically when the column type is bigint? Is this a bug, or am I not considering some edge case where this behavior would be useful?

Best Answer

The key is that you don't compare the same types. When comparing a bigint to a numeric, the easier way is to 'expand' the bigint with the decimal places being 0 (like 1 -> 1.0), while the other way around it would mean rounding/truncation. (In this specific case it is easy to see that the both lead to the same result, but what if the values are negative?)

So, what you get in your comparison, is a numeric to numeric comparison, which is not something a bigint index could serve.

It's worth a look which casts are possible and which is performed in these cases. For this, here are the two rows from pg_cast:

SELECT castsource::regtype, casttarget::regtype, castcontext 
  FROM pg_cast 
 WHERE castsource::regtype = 'bigint'::regtype
   AND casttarget::regtype = 'numeric'::regtype;

 castsource │ casttarget │ castcontext 
────────────┼────────────┼─────────────
 bigint     │ numeric    │ i

SELECT castsource::regtype, casttarget::regtype, castcontext
  FROM pg_cast
 WHERE castsource::regtype = 'numeric'::regtype
   AND casttarget::regtype = 'bigint'::regtype;

 castsource │ casttarget │ castcontext 
────────────┼────────────┼─────────────
 numeric    │ bigint     │ a

According to the linked documentation page, castcontext

Indicates what contexts the cast can be invoked in. e means only as an explicit cast (using CAST or :: syntax). a means implicitly in assignment to a target column, as well as explicitly. i means implicitly in expressions, as well as the other cases.

So this means that the numeric -> bigint direction is happening 'by itself' (i. e. you not calling one of the casting operators explicitly) only if you assign the former to the latter. In expressions like your comparison, this is not the case, so the parser will only consider the other way (marked with an i above). That means, you get a numeric to numeric comparison, unless you force it otherwise.

Notes:

  • psql is a command line client to PostgreSQL, it does not do any sort of these things itself (I've edited the title accordingly)
  • using keywords as column names (like timestamp) is never a good idea - you'll possibly get unexpected parsing errors here and there, unless you are careful enough to double-quote them everywhere
  • using the Unix epoch as a timestamp might be cumbersome. There is a rich functionality for 'real' timestamps in PostgreSQL - in most cases it is much easier to use it.