Postgresql – Indexed range comparison of array key’s value in row inside JSONB

indexjsonpostgresql

This is for tests with results. The order and if the test was done can be vary per row.

column jdata
{"name": "Somedata",
 "array": [ {"name":"test0","value":4.6}, 
            {"name":"test2", "value": 6.7},
            {"name":"test3", "value": 7.5},
            {"name":"test1","value":4.6}
           ],
"otherstuff": "stuff"
}

I want to get all occurrences of test1 where value is gt 4.5. test1 can be in any position in the array.

The best I can come up with is doing an indexed scan for @> {"name": "test1"} to get the IDs then on that to somehow construct a path getting (jdata->'array'->(gettest1indexsubqueryoffsorts)->'value')::float > 4.5

An index could maybe be constructed that summarizes all the tests into {"test1": 4.6, "test2": xx.xx} but again range checking cannot be done on that or the data structure can be changed. Pointers would be great.

Best Answer

You want to test a given key for a range of numeric values (not just for equality), but existing jsonb operators do not provide such functionality, even less with index support.

Makes the task difficult to speed up, but there are still options. The best solution depends on data distribution, value frequencies, typical queries, avg. column size and other undisclosed details.

Assuming the general case that you want to test for arbitrary names and values and return the whole row.

SELECT *
FROM   tbl t
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(t.jdata->'array') j
   WHERE  j->>'name' = 'test1'                           -- !
   AND   (j->>'value')::numeric > 4.5
   )
AND t.jdata->'array' @> '[{"name": "test1"}]';

The outer predicate AND t.jdata->'array' @> '[{"name": "test1"}]' is logically redundant and only makes sense if you expect a small percentage of rows to even have 'test1', in which case the query can make good use of this jsonb_path_ops index:

CREATE INDEX tbl_jdata_arr_test1_idx ON tbl USING gin ((jdata->'array') jsonb_path_ops);

Don't index the whole jsonb column for this purpose, would just makes the index bigger and slower. Only put the nested array into the index and make sure the WHERE condition of the query matches.

If you're only ever interested in 'test1', make it a partial index instead:

CREATE INDEX ...
WHERE  jdata->'array' @> '[{"name": "test1"}]';

We must repeat the test j->>'name' = 'test1' in the EXISTS expression to assert the object with "name":"test1" provides the value > 4.5 - and not just any object in the array.

Also, I would rather cast to numeric, not float. The implicit rounding of floating point numbers might make your test fail in extreme cases.

Related:

More specialized, smaller and faster indexes are possible, for instance by extracting a Postgres array of names and only indexing that. Related: