PostgreSQL – Search for Nested Values in jsonb Array with Greater Operator

index-tuningjsonperformancepostgresqlpostgresql-10postgresql-performance

Here is the table definition (simplified):

CREATE TABLE documents (
    document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    data_block jsonb NULL
);

Sample values:

INSERT INTO documents (document_id, data_block)
VALUES
   (878979, 
    '{"COMMONS": {"DATE": {"value": "2017-03-11"}},
     "PAYABLE_INVOICE_LINES": [
         {"AMOUNT": {"value": 52408.53}}, 
         {"AMOUNT": {"value": 654.23}}
     ]}')
 , (977656, 
    '{"COMMONS": {"DATE": {"value": "2018-03-11"}},
     "PAYABLE_INVOICE_LINES": [
         {"AMOUNT": {"value": 555.10}}
     ]}');

I want to search all documents where one of the 'PAYABLE_INVOICE_LINES' elements contains a 'value' greater than 1000.00.

My query is

select *
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil 
where (pil->'AMOUNT'->>'value')::decimal > 1000

But, as I want to limit to 50 documents, I have to group on the document_id and limit the result to 50.

With millions of documents, this query is very expensive – 10 seconds with 1 million.

I try to add a GIN index on the array of the jsonb object. But it seems it's applied only while using a jsonb operator like @>.

Do you have some ideas to have better performance?

Best Answer

This is generally hard to optimize: no direct operator or index support for jsonb for this kind of test.

EXISTS should at least be faster than what you have, while also avoiding duplicate rows (where multiple array elements match) and the additional (redundant) column pil in the result:

SELECT *
FROM   documents d
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') pil 
   WHERE (pil->'AMOUNT'->>'value')::decimal > 1000
   );

Related:

To make this faster by orders of magnitude, extract the maximum value per row and save it redundantly or use an IMMUTABLE function in a very small and fast (but also specialized) expression index:

CREATE OR REPLACE FUNCTION f_doc_max_amout(jsonb)
  RETURNS numeric AS
$func$
   SELECT max((a->'AMOUNT'->>'value')::numeric)
   FROM   jsonb_array_elements($1) a
$func$ LANGUAGE sql IMMUTABLE;

CREATE INDEX documents_max_amount_idx
ON documents (f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES')); 

Query (must match index expression):

SELECT *
FROM   documents d
WHERE  f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES') > 1000;

dbfiddle here