I have a table table with a JSONB column data, which contains a variable-length array, e.g.

{"label": "xyz", "items": [ ... ]}

I created an index on the length of the "items" element:
CREATE INDEX n_items ON table ( JSONB_ARRAY_LENGTH(data->'items') )
but when I filter on that expression, I still get a sequential scan:
EXPLAIN ANALYZE SELECT COUNT(*) FROM table WHERE JSONB_ARRAY_LENGTH(table.data->'items') = 2;
QUERY PLAN
-----------------------------------------------------------------------------------------
Aggregate (cost=2565655.67..2565655.68 rows=1 width=8)
-> Seq Scan on table (cost=0.00..2535256.19 rows=12159794 width=8)
Filter: (jsonb_array_length((table.data -> 'items'::text)) = 2)
Planning time: 0.121 ms
Execution time: 482891.694 ms
That's about 8 minutes to filter! Did I do something wrong here, or is this a consequence of PostgreSQL not keeping statistics on JSON(B) values? I could flatten this data out into a separate column, but I'd like to be sure that's what I need to do before I start working on it.
Edit: these array lengths do not vary much. There are only 4 distinct values in the data currently, and I don't expect many more. Is the index just not very useful in this case, or can I improve filtering some other way?
Best Answer
Without knowing your data, I can only guess that the selectivity of your index is low (which happens if the length of the array does not vary much).
One trick to overcome this might be to change the query slightly and create a covering index. For this, choose a NOT NULL column (for example, the primary key of the table) to count, and include that column in the index. This will hopefully turn into an index-only scan (I tested this without the jsonb part, but you should be able to tell whether it works).
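As a sketch, assuming the primary key is a NOT NULL column named id (adjust to your actual schema and table name), the index and query might look like this:

```sql
-- Include the counted column in the index so the planner can answer
-- the query from the index alone (an index-only scan):
CREATE INDEX n_items ON table ( jsonb_array_length(data->'items'), id );

-- Count the indexed NOT NULL column instead of *; both the filter
-- expression and the counted column are now covered by the index:
EXPLAIN ANALYZE SELECT COUNT(id) FROM table
WHERE jsonb_array_length(data->'items') = 2;
```

Note that index-only scans also depend on the table's visibility map, so running VACUUM on the table first may help the planner choose one.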