PostgreSQL – Why GIN Index on JSONB Column Slows Down Query and How to Fix It

postgresqlpostgresql-9.4

Initialize test data:

CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE docs (data JSONB NOT NULL DEFAULT '{}');
-- generate 200k documents, ~half with type: "type1" and another half with type: "type2", unique incremented index and random uuid per each row
INSERT INTO docs (data)
SELECT json_build_object('id', gen_random_uuid(), 'type', (CASE WHEN random() > 0.5 THEN 'type1' ELSE 'type2' END) ,'index', n)::JSONB
FROM generate_series(1, 200000) n;
-- inset one more row with explicit uuid to query by it later
INSERT INTO docs (data) VALUES (json_build_object('id', '30e84646-c5c5-492d-b7f7-c884d77d1e0a', 'type', 'type1' ,'index', 200001)::JSONB);

First query – filter by data->type and limit:

-- FAST ~19ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
LIMIT 25;
/* "Limit  (cost=0.00..697.12 rows=25 width=90) (actual time=0.029..0.070 rows=25 loops=1)"
   "  ->  Seq Scan on docs  (cost=0.00..5577.00 rows=200 width=90) (actual time=0.028..0.061 rows=25 loops=1)"
   "        Filter: (data @> '{"type": "type1"}'::jsonb)"
   "        Rows Removed by Filter: 17"
   "Planning time: 0.069 ms"
   "Execution time: 0.098 ms" 
*/

Second query – filter by data->type, order by data->index and limit

-- SLOW ~250ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index' -- added ORDER BY
LIMIT 25;

/* "Limit  (cost=5583.14..5583.21 rows=25 width=90) (actual time=236.750..236.754 rows=25 loops=1)"
   "  ->  Sort  (cost=5583.14..5583.64 rows=200 width=90) (actual time=236.750..236.750 rows=25 loops=1)"
   "        Sort Key: ((data -> 'index'::text))"
   "        Sort Method: top-N heapsort  Memory: 28kB"
   "        ->  Seq Scan on docs  (cost=0.00..5577.50 rows=200 width=90) (actual time=0.020..170.797 rows=100158 loops=1)"
   "              Filter: (data @> '{"type": "type1"}'::jsonb)"
   "              Rows Removed by Filter: 99842"
   "Planning time: 0.075 ms"
   "Execution time: 236.785 ms"
*/

Third query – same as Second (previous) but with btree index on data->index:

CREATE INDEX docs_data_index_idx ON docs ((data->'index'));

-- FAST ~19ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index' -- added BTREE index on this field
LIMIT 25;
/* "Limit  (cost=0.42..2473.98 rows=25 width=90) (actual time=0.040..0.125 rows=25 loops=1)"
   "  ->  Index Scan using docs_data_index_idx on docs  (cost=0.42..19788.92 rows=200 width=90) (actual time=0.038..0.119 rows=25 loops=1)"
   "        Filter: (data @> '{"type": "type1"}'::jsonb)"
   "        Rows Removed by Filter: 17"
   "Planning time: 0.127 ms"
   "Execution time: 0.159 ms"
*/

Fourth query – now filter by data->id and limit = 1:

-- SLOW ~116ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> ('{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}')::JSONB -- querying by "id" field now
LIMIT 1;
/* "Limit  (cost=0.00..27.89 rows=1 width=90) (actual time=97.990..97.990 rows=1 loops=1)"
   "  ->  Seq Scan on docs  (cost=0.00..5577.00 rows=200 width=90) (actual time=97.989..97.989 rows=1 loops=1)"
   "        Filter: (data @> '{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}'::jsonb)"
   "        Rows Removed by Filter: 189999"
   "Planning time: 0.064 ms"
   "Execution time: 98.012 ms"
*/

Fifth query – same as Fourth but with gin (json_path_ops) index on data:

CREATE INDEX docs_data_idx ON docs USING GIN (data jsonb_path_ops);

-- FAST ~17ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}'::JSONB -- added gin index with json_path_ops
LIMIT 1;
/* "Limit  (cost=17.55..20.71 rows=1 width=90) (actual time=0.027..0.027 rows=1 loops=1)"
   "  ->  Bitmap Heap Scan on docs  (cost=17.55..649.91 rows=200 width=90) (actual time=0.026..0.026 rows=1 loops=1)"
   "        Recheck Cond: (data @> '{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}'::jsonb)"
   "        Heap Blocks: exact=1"
   "        ->  Bitmap Index Scan on docs_data_idx  (cost=0.00..17.50 rows=200 width=0) (actual time=0.016..0.016 rows=1 loops=1)"
   "              Index Cond: (data @> '{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}'::jsonb)"
   "Planning time: 0.095 ms"
   "Execution time: 0.055 ms"
*/

Sixth (and last) query – same as Third query (query by data->type, order by data->index, limit):

-- SLOW AGAIN! ~224ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index'
LIMIT 25;
/* "Limit  (cost=656.06..656.12 rows=25 width=90) (actual time=215.927..215.932 rows=25 loops=1)"
   "  ->  Sort  (cost=656.06..656.56 rows=200 width=90) (actual time=215.925..215.925 rows=25 loops=1)"
   "        Sort Key: ((data -> 'index'::text))"
   "        Sort Method: top-N heapsort  Memory: 28kB"
   "        ->  Bitmap Heap Scan on docs  (cost=17.55..650.41 rows=200 width=90) (actual time=33.134..152.618 rows=100158 loops=1)"
   "              Recheck Cond: (data @> '{"type": "type1"}'::jsonb)"
   "              Heap Blocks: exact=3077"
   "              ->  Bitmap Index Scan on docs_data_idx  (cost=0.00..17.50 rows=200 width=0) (actual time=32.468..32.468 rows=100158 loops=1)"
   "                    Index Cond: (data @> '{"type": "type1"}'::jsonb)"
   "Planning time: 0.157 ms"
   "Execution time: 215.992 ms"
*/

So it seems that Sixth (same as Third) query is much slower when there is gin index on data column. It's probably because there are not many distinct values for data->type field (only "type1" or "type2")? What I can do about it? I need gin index to make other queries that take benfit of it…

Best Answer

It looks like you've run into the issue that jsonb columns have a flat 1% statistics rate, as reported here Working around jsonb's lack of stats?. Looking at your query plans, the differences between the estimates and the actual executions are huge. The estimates say there are probably 200 rows, and the actual return 100158 rows, which causes the planner to favor certain strategies over others.

Since the choice in the sixth query seems to come down to favoring a bitmap index scan over an index scan, you can nudge the planner along with SET enable_bitmapscan=off to try and get it to revert to the behavior you had in your third example.

Here's how it worked for me:

postgres@[local]:5432:postgres:=# EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index'
LIMIT 25;
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=656.06..656.12 rows=25 width=90) (actual time=117.338..117.343 rows=25 loops=1)
   Buffers: shared hit=3096
   ->  Sort  (cost=656.06..656.56 rows=200 width=90) (actual time=117.336..117.338 rows=25 loops=1)
         Sort Key: ((data -> 'index'::text))
         Sort Method: top-N heapsort  Memory: 28kB
         Buffers: shared hit=3096
         ->  Bitmap Heap Scan on docs  (cost=17.55..650.41 rows=200 width=90) (actual time=12.838..80.584 rows=99973 loops=1)
               Recheck Cond: (data @> '{"type": "type1"}'::jsonb)
               Heap Blocks: exact=3077
               Buffers: shared hit=3096
               ->  Bitmap Index Scan on docs_data_idx  (cost=0.00..17.50 rows=200 width=0) (actual time=12.469..12.469 rows=99973 loops=1)
                     Index Cond: (data @> '{"type": "type1"}'::jsonb)
                     Buffers: shared hit=19
 Planning time: 0.088 ms
 Execution time: 117.405 ms
(15 rows)

Time: 117.813 ms
postgres@[local]:5432:postgres:=# SET enable_bitmapscan = off;
SET
Time: 0.130 ms
postgres@[local]:5432:postgres:=# EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index'
LIMIT 25;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1320.48 rows=25 width=90) (actual time=0.017..0.050 rows=25 loops=1)
   Buffers: shared hit=4
   ->  Index Scan using docs_data_index_idx on docs  (cost=0.42..10560.94 rows=200 width=90) (actual time=0.015..0.045 rows=25 loops=1)
         Filter: (data @> '{"type": "type1"}'::jsonb)
         Rows Removed by Filter: 27
         Buffers: shared hit=4
 Planning time: 0.083 ms
 Execution time: 0.071 ms
(8 rows)

Time: 0.402 ms
postgres@[local]:5432:postgres:=#

If you're looking to go this route, be sure to disable that scan only for queries that show behavior like this, otherwise, you'll get bad behavior on other query plans too. Doing something like this should work just fine:

BEGIN;
SET enable_bitmapscan=off;
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index'
LIMIT 25;
SET enable_bitmapscan=on;
COMMIT;

Hope that helps =)

Best Answer

Related Solutions

Postgresql – Optimizing ORDER BY in a full text search query

PostgreSQL Indexing – How to Index WHERE (start_date >= ‘2013-12-15’)

Related Question