I have a table table with a JSONB column data, which contains a variable-length array, e.g.

{"label": "xyz", "items": [ ... ]}

I created an index on the length of the "items" element:
CREATE INDEX n_items ON table ( JSONB_ARRAY_LENGTH(data->'items') )
but when I filter on that expression, I still get a sequential scan:
EXPLAIN ANALYZE SELECT COUNT(*) FROM table WHERE JSONB_ARRAY_LENGTH(table.data->'items') = 2;
QUERY PLAN
-----------------------------------------------------------------------------------------
Aggregate (cost=2565655.67..2565655.68 rows=1 width=8)
-> Seq Scan on table (cost=0.00..2535256.19 rows=12159794 width=8)
Filter: (jsonb_array_length((table.data -> 'items'::text)) = 2)
Planning time: 0.121 ms
Execution time: 482891.694 ms
That's about 8 minutes to filter! Did I do something wrong here, or is this a consequence of PostgreSQL not keeping statistics on JSON(B) values? I could flatten this data out into a separate column, but I'd like to be sure that's what I need to do before I start working on it.
Edit: these array lengths do not vary much. There are only 4 distinct values in the data currently, and I don't expect many more. Is the index just not very useful in this case, or can I improve filtering some other way?
Best Answer
Without knowing your data, I can only guess that the selectivity of your index is low (which happens if the length of the array does not vary much).
One trick to overcome this might be to change the query slightly and create a covering index. For this, choose a NOT NULL column (for example, the primary key of the table) to count, and include that column in the index. This will hopefully turn into an index-only scan (I tested this without the jsonb part, but you should be able to tell whether it works).
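As a sketch, assuming the primary key is a NOT NULL column named id (adjust to your actual schema and table name), the index and query might look like this:

```sql
-- Include the counted column in the index so the planner can answer
-- the query from the index alone (an index-only scan):
CREATE INDEX n_items ON table ( jsonb_array_length(data->'items'), id );

-- Count the indexed NOT NULL column instead of *; both the filter
-- expression and the counted column are now covered by the index:
EXPLAIN ANALYZE SELECT COUNT(id) FROM table
WHERE jsonb_array_length(data->'items') = 2;
```

Note that index-only scans also depend on the table's visibility map, so running VACUUM on the table first may help the planner choose one.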