Postgresql – Using an index on a Postgresql integer range causing trouble

memoryperformancepostgispostgresqlpostgresql-performance

Unfortunately I was not able to create a usable example / fiddle. But I am using a Postgresql 9.4 table that looks somewhat like this:

CREATE TABLE table1 (id INT PRIMARY KEY NOT NULL, dtype TEXT,
                     year_range INT4RANGE, geom GEOMETRY) 
CREATE INDEX year_rage_idx ON table2 USING BTREE (dtype);
CREATE INDEX year_rage_idx ON table2 USING BTREE (year_range);
CREATE INDEX table2_gix ON table2 USING GIST (geom);

table1 and table2 are basically identical. The query causing the trouble would be:

SELECT a.id, a.dtype, a.year_range, count(*)
FROM table1 AS a LEFT JOIN table2 AS b
    ON ST_DWithin(a.geom, b.geom, 500)
    AND a.dtype = b.dtype AND a.year_range = b.year_range
GROUP BY a.id, a.dtype, a.geom, a.year_range;

where geom is a PostGIS geometry, dtype a TEXT, and year_range an INT4RANGE. The above query will result in about 200 rows and the count from the LEFT JOIN will be between 0 and 200.

Recently I started using more indexes throughout my database and suddenly some parts of my program would not work properly anymore and memory use went through the roof. The above query typically took 40 seconds and the RAM use of postgresql was constant at ca. 2GB.

After adding a BTREE to the year_range column however, the query will take 2 minutes and RAM use will double to 4GB (which it usually does not for other queries).

Can you help me where to look for an issue or how to resolve this? Could it possibly be a bug?

EXPLAIN (ANALYZE, BUFFERS) for the query using CREATE INDEX ON table2 USING BTREE (year_range):

GroupAggregate  (cost=1989.80..1995.50 rows=228 width=54) (actual time=96278.148..96283.820 rows=228 loops=1)
  Group Key: a.id, a.dtype, a.geom, a.year_range
  Buffers: shared hit=16671469 read=6332935
  ->  Sort  (cost=1989.80..1990.37 rows=228 width=54) (actual time=96278.134..96278.896 rows=11281 loops=1)
        Sort Key: a.id, a.dtype, a.geom, a.year_range
        Sort Method: quicksort  Memory: 1971kB
        Buffers: shared hit=16671469 read=6332935
        ->  Nested Loop Left Join  (cost=0.43..1980.87 rows=228 width=54) (actual time=0.097..96243.523 rows=11281 loops=1)
              Buffers: shared hit=16671469 read=6332935
              ->  Seq Scan on table1 a (cost=0.00..9.28 rows=228 width=54) (actual time=0.014..0.344 rows=228 loops=1)
                    Buffers: shared read=7
              ->  Index Scan using year_range_idx on table2 b  (cost=0.43..8.64 rows=1 width=50) (actual time=25.213..422.075 rows=49 loops=228)
                    Index Cond: (a.year_range = year_range)
                    Filter: ((a.dtype = dtype) AND (a.geom && st_expand(geom, 500::double precision)) AND (geom && st_expand(a.geom, 500::double precision)) AND _st_dwithin(a.geom, geom, 500::double precision))
                    Rows Removed by Filter: 253251
                    Buffers: shared hit=16671469 read=6332928
Planning time: 0.343 ms
Execution time: 96684.861 ms

EXPLAIN (ANALYZE, BUFFERS) for the query using CREATE INDEX ON table2 USING GIST (year_range):

GroupAggregate  (cost=1994.38..2000.08 rows=228 width=54) (actual time=114066.421..114071.834 rows=228 loops=1)
  Group Key: a.id, a.dtype, a.geom, a.year_range
  Buffers: shared hit=41789804 read=5608602 written=9067
  ->  Sort  (cost=1994.38..1994.95 rows=228 width=54) (actual time=114066.412..114067.126 rows=11281 loops=1)
        Sort Key: a.id, a.dtype, a.geom, a.year_range
        Sort Method: quicksort  Memory: 1971kB
        Buffers: shared hit=41789804 read=5608602 written=9067
        ->  Nested Loop Left Join  (cost=0.41..1985.45 rows=228 width=54) (actual time=14.395..114033.344 rows=11281 loops=1)
              Buffers: shared hit=41789804 read=5608602 written=9067
              ->  Seq Scan on table1 a (cost=0.00..9.28 rows=228 width=54) (actual time=0.015..0.333 rows=228 loops=1)
                    Buffers: shared read=7
              ->  Index Scan using year_range_idx on table2 b  (cost=0.41..8.66 rows=1 width=50) (actual time=24.222..500.090 rows=49 loops=228)
                    Index Cond: (a.year_range = year_range)
                    Filter: ((a.dtype = dtype) AND (a.geom && st_expand(geom, 500::double precision)) AND (geom && st_expand(a.geom, 500::double precision)) AND _st_dwithin(a.geom, geom, 500::double precision))
                    Rows Removed by Filter: 253251
                    Buffers: shared hit=41789804 read=5608595 written=9067
Planning time: 0.343 ms
Execution time: 114073.068 ms

EXPLAIN (ANALYZE, BUFFERS) for the query using no INDEX on year range, i.e. DROP INDEX year_range_idx (check out the time: 186ms?!):

GroupAggregate  (cost=136824.31..136830.01 rows=228 width=54) (actual time=180.088..186.006 rows=228 loops=1)
  Group Key: a.id, a.dtype, a.geom, a.year_range
  Buffers: shared hit=71091 read=4763
  ->  Sort  (cost=136824.31..136824.88 rows=228 width=54) (actual time=180.074..180.962 rows=11281 loops=1)
        Sort Key: a.id, a.dtype, a.geom, a.year_range
        Sort Method: quicksort  Memory: 1971kB
        Buffers: shared hit=71091 read=4763
        ->  Nested Loop Left Join  (cost=29.15..136815.38 rows=228 width=54) (actual time=0.245..159.873 rows=11281 loops=1)
              Buffers: shared hit=71091 read=4763
              ->  Seq Scan on table1 a (cost=0.00..9.28 rows=228 width=54) (actual time=0.013..0.128 rows=228 loops=1)
                    Buffers: shared hit=1 read=6
              ->  Bitmap Heap Scan on table2 b  (cost=29.15..600.02 rows=1 width=50) (actual time=0.169..0.689 rows=49 loops=228)
                    Recheck Cond: (geom && st_expand(a.geom, 500::double precision))
                    Filter: ((a.dtype = dtype) AND (a.year_range = year_range) AND (a.geom && st_expand(geom, 500::double precision)) AND _st_dwithin(a.geom, geom, 500::double precision))
                    Rows Removed by Filter: 257
                    Heap Blocks: exact=69630
                    Buffers: shared hit=71090 read=4757
                    ->  Bitmap Index Scan on table2_gix  (cost=0.00..29.15 rows=140 width=0) (actual time=0.124..0.124 rows=307 loops=228)
                          Index Cond: (geom && st_expand(a.geom, 500::double precision))
                          Buffers: shared hit=4412 read=1805
Planning time: 0.301 ms
Execution time: 186.366 ms

Best Answer

I figured it out with some help from the IRC channel. Apparently the planner couldn't quite use the INDEX on year_range and as a result of that also on geom.

SELECT n_distinct FROM pg_stats WHERE tablename='table2' AND attname='year_range';

returned -1 although it should be only 10 distinct values. I was told that this might be due to a bug or non-implementation for integer ranges. So in my case I could solve the issue with a multi-column index:

CREATE INDEX ON table2 USING GIST (geom, year_range);

I was told that an alternative method would be to

ALTER TABLE table2 ALTER COLUMN year_range SET (n_distinct = 10);

to get the planner back on track.

Since I can't really explain well what is going on here, I'd be happy to accept the answer from someone who does a better job at explaining the issue.