Postgresql – Best index for jsonb in Postgres

btreegist-indexindexjsonpostgresql

We have a table containing around 500k rows. The database table is supposed to grow to million of records.

This is how the table looks like:

CREATE TABLE public.influencers
(
    id integer NOT NULL DEFAULT nextval('influencers_id_seq'::regclass),
    location jsonb,
    gender text COLLATE pg_catalog."default",
    birthdate timestamp without time zone,
    ig jsonb,
    contact_info jsonb,
    created_at timestamp without time zone DEFAULT now(),
    updated_at timestamp without time zone DEFAULT now(),
    categories text[] COLLATE pg_catalog."default",
    search_field text COLLATE pg_catalog."default",
    search_vector tsvector,
    ig_updated_at timestamp without time zone,
    CONSTRAINT influencers_pkey PRIMARY KEY (id),
    CONSTRAINT ig_id_must_exist CHECK (ig ? 'id'::text),
    CONSTRAINT ig_username_must_exist CHECK (ig ? 'username'::text)
)

And these are some of the queries we need to perform efficiently:

SELECT  "public"."influencers".*
FROM "public"."influencers"
WHERE (ig->'follower_count' IS NOT NULL)
ORDER BY (ig->'follower_count') DESC
LIMIT 9
OFFSET 0

SELECT *
FROM "public"."influencers"
WHERE (ig->'follower_count' >= '5000')
LIMIT 9 

SELECT SUM(CAST(ig ->> 'follower_count' AS integer))
FROM "public"."influencers"
WHERE (ig->'follower_count' >= '5000')
  AND (ig->'follower_count' <= '10000')
  AND (ig->'follower_count' IS NOT NULL)

ig -> follower_count are numeric values.

I read that GIN indexes are mainly intended for searching through composite items (text) so I'm guessing the best index to use would be a BTREE. Would that be correct?

Best Answer

GIN indexes do not work with any jsonb operators EXCEPT ? ?& ?| @>. Clearly, your use of comparison operators >= and <= are not on that list. And all of those operators that it could help are also not on the list (meaning the index won't do anything) afaik.

That means needing indexed comparison operators, you'll need a btree.

CREATE INDEX ON public.influencers ((ig->'follower_count'));

All of that said, I think this is silly. Just put follower_count as a proper column on the actual table and enjoy the performance benefits.