PostgreSQL – Index for inequality on multiple columns (col1 >= 'const_1' AND col2 <= 'const_2')

index, index-tuning, postgresql

I have the following query:

SELECT * FROM plane_tracks 
    WHERE created_at >= '2016-12-29 08:00:00' AND updated_at <= '2016-12-30 00:00:00';

I've created the following indexes:

create index plane_tracks_on_created_at on plane_tracks(created_at);
create index plane_tracks_on_updated_at on plane_tracks(updated_at);
create index plane_tracks_on_created_at_updated_at on plane_tracks(created_at,updated_at);
create index plane_tracks_on_created_at_updated_at_desc on plane_tracks(created_at,updated_at DESC);

Yet when I run the query, it scans every row in plane_tracks.

# EXPLAIN ANALYZE SELECT * FROM plane_tracks WHERE created_at >= '2016-12-29 08:00:00' AND updated_at <= '2016-12-30 00:00:00';
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on plane_tracks  (cost=0.00..582.78 rows=5425 width=471) (actual time=1.507..3.435 rows=1978 loops=1)
   Filter: ((created_at >= '2016-12-29 08:00:00'::timestamp without time zone) AND (updated_at <= '2016-12-30 00:00:00'::timestamp without time zone))
   Rows Removed by Filter: 15941
 Planning time: 0.366 ms
 Execution time: 3.618 ms
(5 rows)

When I drop the updated_at part, like this:

SELECT * FROM plane_tracks WHERE created_at >= '2016-12-29 08:00:00';

Then I get this:

# EXPLAIN ANALYZE SELECT * FROM plane_tracks WHERE created_at >= '2016-12-29 08:00:00';
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on plane_tracks  (cost=0.00..537.99 rows=8595 width=471) (actual time=0.022..3.261 rows=8593 loops=1)
   Filter: (created_at >= '2016-12-29 08:00:00'::timestamp without time zone)
   Rows Removed by Filter: 9326
 Planning time: 0.093 ms
 Execution time: 3.901 ms
(5 rows)

At least this last part makes sense.

But I don't understand:

  1. why the first query isn't at least as efficient as the second one;
  2. why the first query isn't much better than the second one, given that there are also indexes on updated_at;
  3. what index to create to make the query really efficient.

Tom

Best Answer

  1. How are you defining efficient?
    • The first query takes 3.618ms
    • The second query takes 3.901ms
  2. If PostgreSQL doesn't use the combined index, the planner has to cost combining two separate index scans instead (a BitmapAnd of two bitmap index scans), which makes the seq scan all the more attractive.
  3. A few things to try.

    1. Drop these (unless other queries need them) and see if Pg takes the index. They're not needed for this query and only serve to confuse the planner and make UPDATE/INSERT slower.

      create index plane_tracks_on_created_at on plane_tracks(created_at);
      create index plane_tracks_on_updated_at on plane_tracks(updated_at);
      create index plane_tracks_on_created_at_updated_at on plane_tracks(created_at,updated_at);
      
    2. If Pg still doesn't take the index, try running SET enable_seqscan = off; in psql and then running the same query. Paste back the output. This will tell us whether Pg can do it at all without a seq scan. If PostgreSQL still uses a seq scan even then, it's a last resort: for some reason it feels it cannot use the compound index plane_tracks(created_at, updated_at DESC), and we have to look at rewriting it as another compound index.
    3. Also, as a side note: the first query's plan estimates 5425 rows, but only 1978 are actually returned. This can throw the planner off: PostgreSQL thinks it has to visit 5425 rows when it only needs to visit 1978.
      1. Try ANALYZE plane_tracks; and then EXPLAIN ANALYZE the query again to see whether fresh statistics fix the estimate. If not,
      2. Try SET default_statistics_target = 1000; and rerun ANALYZE plane_tracks. If that helps, consider persisting it per column with ALTER TABLE plane_tracks ALTER COLUMN created_at SET STATISTICS 1000 (or something higher than the current default_statistics_target). All three steps are sketched together in the psql snippet after this list.
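
Putting the three steps together, a minimal psql sketch. It reuses the table, column, and index names from your question; the statistics target of 1000 is just an example value.

-- Step 1: drop the redundant indexes, keeping the DESC compound index.
DROP INDEX plane_tracks_on_created_at;
DROP INDEX plane_tracks_on_updated_at;
DROP INDEX plane_tracks_on_created_at_updated_at;

-- Step 2: disable seq scans for this session only and look at the plan.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM plane_tracks
WHERE created_at >= '2016-12-29 08:00:00'
  AND updated_at <= '2016-12-30 00:00:00';
RESET enable_seqscan;

-- Step 3: refresh the planner statistics, raising the sample size if needed.
ANALYZE plane_tracks;

SET default_statistics_target = 1000;
ANALYZE plane_tracks;

-- If the estimate improves, persist the higher target per column
-- (SET STATISTICS is a column-level option of ALTER TABLE).
ALTER TABLE plane_tracks ALTER COLUMN created_at SET STATISTICS 1000;
ALTER TABLE plane_tracks ALTER COLUMN updated_at SET STATISTICS 1000;
ANALYZE plane_tracks;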

Proof that it's possible with test data

As proof that it's possible with a single index:

CREATE TEMP TABLE foo AS
SELECT
  floor(random()*10)::int AS lower,
  floor(random()*10)::int AS upper
FROM generate_series(1,1e4) AS t(gs);

CREATE INDEX foo_idx ON foo(lower, upper DESC);

VACUUM ANALYZE foo;

EXPLAIN ANALYZE SELECT *
FROM foo
WHERE lower >= 6 AND upper <= 8;

Query plan

 Index Only Scan using foo_idx on foo  (cost=0.29..123.86 rows=3584 width=8) (actual time=0.021..0.678 rows=3580 loops=1)
   Index Cond: ((lower >= 6) AND (upper <= 8))
   Heap Fetches: 0
 Planning time: 0.097 ms
 Execution time: 0.886 ms

The default statistics come close, estimating 3584 rows when 3580 are actually returned. I almost never need to tinker with default_statistics_target or per-table statistics.
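
Mapping this back to your table: the plane_tracks_on_created_at_updated_at_desc index from your question already has the same shape as foo_idx, so once the redundant indexes are dropped and the statistics are fresh, the check is simply the sketch below (names reused from the question). Because the query selects all columns, the best you can expect is a regular Index Scan or a bitmap scan on that index, not the Index Only Scan shown above.

-- Re-analyze, then confirm the compound index is used for the original query.
VACUUM ANALYZE plane_tracks;

EXPLAIN ANALYZE
SELECT * FROM plane_tracks
WHERE created_at >= '2016-12-29 08:00:00'
  AND updated_at <= '2016-12-30 00:00:00';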