I have a table called external_data in Postgres designed with two columns :
+----+---------------------------------------------+
| id | data |
+----+---------------------------------------------+
| 1 | [{"firstName":"John","lastName":"Doe"},...] |
| 2 | [{"productName":"Mouse","price":123},...] |
+----+---------------------------------------------+
id: integer
data: jsonb
The purpose of this table is to save arrays of over a million objects.
So, in order to get a specified jsonb data, I'm just doing a simple query:
SELECT data
FROM external_data
WHERE id = 1
LIMIT 1
but I don't want to get all the entries in the json array, since there are too many of them.
Basically this is what I do to filter the entries:
SELECT json_agg(elem) FROM
(SELECT jsonb_array_elements(data) AS elem FROM external_data WHERE id = 1) x
WHERE elem @> '{"property":"value"}'
This works well, taking 1 to 1.5s to perform, but I wonder if i can optimize this request using indexes or something else ?
I've already read about GIN indexes using jsonb_path_ops, but since the jsonb data is from a subquery, creating an index with this command
CREATE INDEX idx_data on external_data using GIN(data jsonb_path_ops)
my query doesn't use the index to perform faster :
"Aggregate (cost=2.77..2.78 rows=1 width=32)"
" -> Subquery Scan on x (cost=0.00..2.77 rows=1 width=32)"
" Filter: (x.elem @> '{"Property": "value"}'::jsonb)"
" -> ProjectSet (cost=0.00..1.52 rows=100 width=32)"
" -> Seq Scan on external_data (cost=0.00..1.01 rows=1 width=32)"
" Filter: ((id)::integer = 1)"
And I created some functions using plv8 language and tried to use expression index, and it didn't work. Also, using theses functions would increase the query duration by adding 8 to 9s.
Is there an efficient way to optimize these query or it is the optimal way ?
Best Answer
You're doing as well as can be managed with your data structure. The GIN indexes serve the same purpose as any other index - finding rows given a condition. They don't help once the row is located. If you want this to be improved, you'd need to unroll the array elements into separate rows.