Here is the table definition (simplified):
CREATE TABLE documents (
  document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
  data_block  jsonb NULL
);
Sample values:
INSERT INTO documents (document_id, data_block)
VALUES
(878979,
'{"COMMONS": {"DATE": {"value": "2017-03-11"}},
"PAYABLE_INVOICE_LINES": [
{"AMOUNT": {"value": 52408.53}},
{"AMOUNT": {"value": 654.23}}
]}')
, (977656,
'{"COMMONS": {"DATE": {"value": "2018-03-11"}},
"PAYABLE_INVOICE_LINES": [
{"AMOUNT": {"value": 555.10}}
]}');
I want to search all documents where at least one of the 'PAYABLE_INVOICE_LINES' elements contains a 'value' greater than 1000.00.
My query is:
select *
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil
where (pil->'AMOUNT'->>'value')::decimal > 1000
But since I want to limit the result to 50 documents, I have to group on document_id and apply the limit.
With millions of documents, this query is very expensive – about 10 seconds for 1 million rows. I tried to add a GIN index on the jsonb array, but it seems such an index is only used with jsonb operators like @>.
Do you have any ideas to get better performance?
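For reference, such a grouped query could look like this (a sketch against the table above; only document_id is selected for brevity):

```sql
SELECT d.document_id
FROM   documents d
CROSS  JOIN LATERAL jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') AS pil
WHERE  (pil -> 'AMOUNT' ->> 'value')::numeric > 1000
GROUP  BY d.document_id   -- collapse multiple matching array elements per document
LIMIT  50;
```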
Best Answer
This is generally hard to optimize: there is no direct operator or index support for jsonb for this kind of test.
An EXISTS expression should at least be faster than what you have, while also avoiding duplicate rows (where multiple array elements match) and the additional (redundant) column pil in the result.
Related:
Indexed range comparison of array key's value in row inside JSONB
Speed up range test for key values nested in jsonb array of objects
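A sketch of such an EXISTS query, assuming the table definition above; no GROUP BY is needed because EXISTS returns each document at most once:

```sql
SELECT *
FROM   documents d
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') pil
   WHERE  (pil -> 'AMOUNT' ->> 'value')::numeric > 1000
   )
LIMIT  50;
```

This still has to unnest the array for every row, so it cannot use an index on its own, but it avoids the duplicate rows and the grouping step of the lateral-join variant.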
To make this faster by orders of magnitude, either extract the maximum value per row and store it redundantly, or use an IMMUTABLE function in a very small and fast (but also specialized) expression index. The query must then match the index expression exactly.
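A sketch of the function-based approach; the function name f_doc_max_amount is hypothetical. Declaring the function IMMUTABLE is legitimate here because jsonb_array_elements and the numeric cast are themselves immutable:

```sql
-- Extract the greatest AMOUNT.value from the PAYABLE_INVOICE_LINES array.
CREATE OR REPLACE FUNCTION f_doc_max_amount(jsonb)
  RETURNS numeric
  LANGUAGE sql IMMUTABLE AS
$$
SELECT max((elem -> 'AMOUNT' ->> 'value')::numeric)
FROM   jsonb_array_elements($1 -> 'PAYABLE_INVOICE_LINES') elem
$$;

-- Small, specialized expression index on the extracted maximum.
CREATE INDEX documents_max_amount_idx
  ON documents (f_doc_max_amount(data_block));

-- Query; the WHERE expression must match the index expression exactly:
SELECT *
FROM   documents
WHERE  f_doc_max_amount(data_block) > 1000
LIMIT  50;
```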
dbfiddle here