PostgreSQL – Finding Rows with a Key in a JSONB Array

arrayindexjsonjson-pathpostgresql

I'm trying to query for a key present in an array of objects. This structure:

column jdata
{"name": "Somedata",
 "array": [ {"name":"bla1", "attr": "somevalue"}, 
            {"name":"bla2", "otherdata": "somevalue2"},
            {"name":"bla3", "otherdata": "somevalue"}
           ],
"otherstuff": "stuff"
}

Now I do btree's on jdata->'name' or (jdata->'datetime')::cast and that works great.

I also do json_path_ops where jdata->'array' @> '[{"name":"bla3"}]' works a real charm.

My problem is the attr key can be in any of the objects in the array and I care about the record if the key is present, however the value can be nearly anything. Is there a way to query for this? Is there a way it can be indexed? I want to do jdata->'array' @> '[{"attr": ?}]' Or maybe the ? 'attr' can be used inside an array somehow?

Currently I am thinking a trigger that scans for the key and then moves it to a header with a true or false or whatever, and then a normal btree will work. Is there a better way? I need to edit around 500k records at the average site to add this values.

Please point me in a direction.

Best Answer

Postgres 12 or later: with SQL/JSON path expression

SELECT *
FROM   tbl
WHERE  jdata->'array' @? '$ ? (exists (@."attr"))';

You can look for keys or values, only in the outer nesting level or recurse, abstract arrays away (unnest them) in lax mode or not (strict mode). I added query variations in the fiddle to demonstrate:

db<>fiddle here

@? is jsonpath existence operator

jsonpath value '$ ? (exists (@."attr"))' explained:

$ ... look at each value found in left operand (in default "lax" mode)
? ... run the following test
(exists (@."attr")) ... Does a key with name 'attr' exist?

Can use an index, either with default jsonb_ops:

CREATE INDEX tbl_jdata_array_idx ON tbl USING GIN ((jdata->'array'));

Or with jsonb_path_ops:

CREATE INDEX tbl_jdata_array_path_ops_idx ON tbl USING GIN ((jdata->'array') jsonb_path_ops);

See:

Related:

Postgres 11 or older (no SQL/JSON, yet)

That particular use case is not covered by plain indexes for built-in operators.

Simple query without index support

SELECT *
FROM   tbl
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(jdata->'array') elem
   WHERE  elem ? 'attr' 
   );

EXISTS because we want each qualifying row once, even if multiple array elements can contain the key. And it's faster.
But this query cannot use an index.

Expression index

You can generate a text array of unique keys in the given jsonb array of records - and wrap the expression into a simple IMMUTABLE function:

CREATE OR REPLACE FUNCTION jsonb_arr_record_keys(jsonb)
  RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY (
   SELECT DISTINCT k
   FROM   jsonb_array_elements($1) elem, jsonb_object_keys(elem) k
   )';

COMMENT ON FUNCTION jsonb_arr_record_keys(jsonb) IS '
   Generates text array of unique keys in jsonb array of records.
   Fails if any array element is not a record!';

Then create a GIN expression index based on this function:

CREATE INDEX tbl_special_idx ON tbl USING gin (jsonb_arr_record_keys(jdata->'array'));

Query like this, using the generic array contains operator @>:

SELECT *
FROM   tbl
WHERE  jsonb_arr_record_keys(jdata->'array') @> '{attr}';

Now the index can be used efficiently.

Provide the key name nested in an array ('{attr}'). (You can conveniently check for multiple keys this way ('{attr1, attr2}') or similar ... )

dbfiddle here

Related: