Postgresql index for jsonb array field

indexperformancepostgresqlpostgresql-performance

I have a JSONB field called data with a few keys and there is a field like:

{school_id: [nil, 123456]}

I'm filtering all records where second element is not null using query:

"data -> 'school_id' ->> 1 IS NOT NULL"

How can I apply an index on that field ?
I've tried:

"CREATE INDEX data_school_id ON events ((data -> 'school_id'));"

but when I use EXPLAIN I do not see that this index is in use.

=> EXPLAIN for: SELECT "events".* FROM "events" WHERE (data -> 'school_id' ->> 1 IS NOT NULL)
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on events  (cost=0.00..211043.04 rows=2096733 width=555)
   Filter: (((data -> 'school_id'::text) ->> 1) IS NOT NULL)
(2 rows)

Best Answer

The index is only used when Postgres expects it to help performance - which is only the case if the filter is expected to be selective enough (~ 5 % or less of the rows qualify, percentage heavily depends on various details).

One problem with document types like jsonb: Postgres currently does not maintain statistics about value frequencies of embedded elements (still true in Postgres 10). So it has to base its decision whether or not to use an index on generic frequency estimations. Meaning, even if your particular filter data -> 'school_id' ->> 1 IS NOT NULL is very selective and using the index would pay, Postgres works with a generic average estimation and might miss the opportunity.

There are ways around this with expression or partial indexes, because Postgres collects separate statistics for index expressions. The best you could do for this particular query would be a partial index:

CREATE INDEX data_school_id ON events (event_id)  -- idx column largely irrelevant here
WHERE (data -> 'school_id' ->> 1) IS NOT NULL;

And VACUUM ANALYZE events; at least once (or wait until autovacuum kicks in).

Related: