Postgresql – Postgres: Filtering large jsonb arrays coming from a sub-query

arrayjsonpostgresql

I have a table called external_data in Postgres designed with two columns :

+----+---------------------------------------------+
| id |                    data                     |
+----+---------------------------------------------+
|  1 | [{"firstName":"John","lastName":"Doe"},...] |
|  2 | [{"productName":"Mouse","price":123},...]   |
+----+---------------------------------------------+
id: integer
data: jsonb

The purpose of this table is to save arrays of over a million objects.

So, in order to get a specified jsonb data, I'm just doing a simple query:

SELECT data
FROM external_data
WHERE id = 1
LIMIT 1

but I don't want to get all the entries in the json array, since there are too many of them.
Basically this is what I do to filter the entries:

SELECT json_agg(elem) FROM
(SELECT jsonb_array_elements(data) AS elem FROM external_data WHERE id = 1) x
WHERE elem @> '{"property":"value"}'

This works well, taking 1 to 1.5s to perform, but I wonder if i can optimize this request using indexes or something else ?

I've already read about GIN indexes using jsonb_path_ops, but since the jsonb data is from a subquery, creating an index with this command

CREATE INDEX idx_data on external_data using GIN(data jsonb_path_ops)

my query doesn't use the index to perform faster :

"Aggregate  (cost=2.77..2.78 rows=1 width=32)"
"  ->  Subquery Scan on x  (cost=0.00..2.77 rows=1 width=32)"
"        Filter: (x.elem @> '{"Property": "value"}'::jsonb)"
"        ->  ProjectSet  (cost=0.00..1.52 rows=100 width=32)"
"              ->  Seq Scan on external_data  (cost=0.00..1.01 rows=1 width=32)"
"                    Filter: ((id)::integer = 1)"

And I created some functions using plv8 language and tried to use expression index, and it didn't work. Also, using theses functions would increase the query duration by adding 8 to 9s.

Is there an efficient way to optimize these query or it is the optimal way ?

Best Answer

You're doing as well as can be managed with your data structure. The GIN indexes serve the same purpose as any other index - finding rows given a condition. They don't help once the row is located. If you want this to be improved, you'd need to unroll the array elements into separate rows.