Postgres btree partial index on jsonb array -> array expression seems to get corrupted for larger tables in version 9.5.x

Tags: index, json, postgresql

I am looking into indexing jsonb attributes, and I am seeing something suspicious on Postgres 9.5.x but not on higher versions. Below is what I did that triggered the strange query errors. It could be that I am doing something wrong, but seeing this work on newer Postgres versions makes me think it is a bug in 9.5.x (I have tried up to version 9.5.21).

I consistently see this with tables of about 1 million rows and larger.

The json in the jsonb column contains attributes of different json types, both single-valued and arrays: string, boolean, integer number, float number, and date-formatted string. The error I see is with the < array operator on an integer array (I have not tried them all). From the error, it looks as if the column -> 'attribute' part of the expression fails to retrieve the correct part of the jsonb value and, say for an int array, gets a nearby string array instead. The exact failure changes across runs, since the data is random.

The structure of the json in the properties column is fixed (deterministic) for each value of the type column. So every row with type = 8 has an integer array at properties -> 'r', type = 7 has the array at properties -> 'q', type = 9 has it at properties -> 's', and so on. In other words, type is a logical type describing the structure (or "schema") of the json in properties: all rows with the same type value have a homogeneous json structure as far as node names and value types go (the values themselves are random). Also, for now, the arrays always have length 3.

Is this a bug? Or am I doing something wrong?

CREATE TABLE test1 (
  id SERIAL PRIMARY KEY,
  type INTEGER NOT NULL,
  properties jsonb
);

-- generates test data wherein the json structure of "properties" column varies by "type" column
INSERT INTO test1 (type, properties)
SELECT
  s.type AS type,
  json_build_object(CHR(s.type + 100), md5(random() :: TEXT),
                    CHR(s.type + 101), (random() * 100)::INTEGER,
                    CHR(s.type + 102), (random() * 10)::DOUBLE PRECISION,
                    CHR(s.type + 103), random()::INTEGER::BOOLEAN ,
                    CHR(s.type + 104), to_char(to_timestamp((random() * 1500000000)::DOUBLE PRECISION), 'YYYY-MM-DD"T"HH24:MI:SS.MS"Z"'),
                    CHR(s.type + 105), ARRAY[md5(random() :: TEXT), md5(random() :: TEXT), md5(random() :: TEXT)],
                    CHR(s.type + 106), ARRAY[(random() * 100)::INTEGER, (random() * 100)::INTEGER, (random() * 100)::INTEGER],
                    CHR(s.type + 107), ARRAY[(random() * 10)::DOUBLE PRECISION, (random() * 10)::DOUBLE PRECISION, (random() * 10)::DOUBLE PRECISION],
                    CHR(s.type + 108), ARRAY[random()::INTEGER::BOOLEAN, random()::INTEGER::BOOLEAN, random()::INTEGER::BOOLEAN],
                    CHR(s.type + 109), ARRAY[
                      to_char(to_timestamp((random() * 1500000000)::DOUBLE PRECISION), 'YYYY-MM-DD"T"HH24:MI:SS.MS"Z"'),
                      to_char(to_timestamp((random() * 1500000000)::DOUBLE PRECISION), 'YYYY-MM-DD"T"HH24:MI:SS.MS"Z"'),
                      to_char(to_timestamp((random() * 1500000000)::DOUBLE PRECISION), 'YYYY-MM-DD"T"HH24:MI:SS.MS"Z"')
                    ]
  ) AS properties
FROM (SELECT (random() * 10) :: INT AS type
      FROM generate_series(1, 1000000)) s;
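
To make the type-to-structure mapping concrete: the keys for a given type value run from CHR(type + 100) through CHR(type + 109), so for type = 8 the integer array lands at key 'r' (CHR(8 + 106) = CHR(114)). A quick sanity check along these lines (illustrative):

-- every type = 8 row should carry a json array at key 'r'
SELECT jsonb_typeof(properties -> 'r') AS r_kind, properties -> 'r' AS r_value
FROM test1
WHERE type = 8
LIMIT 3;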

CREATE OR REPLACE FUNCTION jsonb_array_int_array(JSONB)
  RETURNS INTEGER[] AS
$$
DECLARE
  result INTEGER[];
BEGIN
  IF $1 ISNULL
  THEN
    result := NULL;
  ELSEIF jsonb_array_length($1) = 0
  THEN
    result := ARRAY [] :: INTEGER[];
  ELSE
    SELECT array_agg(x::INTEGER) FROM jsonb_array_elements_text($1) t(x) INTO result;
  END IF;
  RETURN result;
END;
$$
  LANGUAGE plpgsql
  IMMUTABLE;
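
For reference, on the input it was written for, the function behaves like this (illustrative):

SELECT jsonb_array_int_array('[1, 2, 3]'::jsonb);  -- {1,2,3}
SELECT jsonb_array_int_array('[]'::jsonb);         -- {}
SELECT jsonb_array_int_array(NULL::jsonb);         -- NULL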

--  properties -> 'r' field of type 8 is always an array of integers
CREATE INDEX test1_properties_r_int_array_index ON test1 USING btree (jsonb_array_int_array(properties -> 'r')) WHERE type = 8;

-- this works
SELECT count(*) FROM test1 WHERE type = 8 AND jsonb_array_int_array(properties -> 'r') < ARRAY[50];

-- this fails
SELECT count(*) FROM test1 WHERE type = 8 AND jsonb_array_int_array(properties -> 'r') < ARRAY[100];

-- but
DROP INDEX test1_properties_r_int_array_index;
-- now it works
SELECT count(*) FROM test1 WHERE type = 8 AND jsonb_array_int_array(properties -> 'r') < ARRAY[100];

-- also
CREATE INDEX test1_properties_r_int_array_index ON test1 USING gin (jsonb_array_int_array(properties -> 'r')) WHERE type = 8;

-- works here too
SELECT count(*) FROM test1 WHERE type = 8 AND jsonb_array_int_array(properties -> 'r') < ARRAY[100];

Thank you for your help.

Edit:

Here is some clarification on how it fails. I just re-executed the above, and the query fails as follows:

sql> SELECT count(*) FROM test1 WHERE type = 8 AND jsonb_array_int_array(properties -> 'r') < ARRAY[100]
[2020-03-04 00:46:20] [22P02] ERROR: invalid input syntax for integer: "1.73782130237668753"
[2020-03-04 00:46:20] Where: SQL statement "SELECT array_agg(x::INTEGER) FROM jsonb_array_elements_text($1) t(x)"
[2020-03-04 00:46:20] PL/pgSQL function jsonb_array_int_array(jsonb) line 12 at SQL statement

I scanned for the random value from the error message:

SELECT id AS txt FROM test1 WHERE position('1.73782130237668753' IN properties::text) > 0;

and found that the row which caused the error actually has type equal to 7, not 8 as in the WHERE clause of the query. So it seems the index condition is not satisfied by the row that is being processed.
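
This is also consistent with the data generation above: for type = 7 the key 'r' is CHR(7 + 107), i.e. the double precision array, which would explain the float in the error message. A follow-up query along these lines (illustrative) confirms it:

SELECT id, type, properties -> 'r' AS r_value
FROM test1
WHERE position('1.73782130237668753' IN properties::text) > 0;
-- the offending row has type = 7, so its 'r' holds the float array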

Here is the plan for the failing query:

Aggregate  (cost=69293.65..69293.66 rows=1 width=0)
  ->  Bitmap Heap Scan on test1  (cost=1228.78..69208.38 rows=34111 width=0)
        Recheck Cond: ((jsonb_array_int_array((properties -> 'r'::text)) < '{100}'::integer[]) AND (type = 8))
        ->  Bitmap Index Scan on test1_properties_r_int_array_index  (cost=0.00..1220.25 rows=34111 width=0)
              Index Cond: (jsonb_array_int_array((properties -> 'r'::text)) < '{100}'::integer[])

Edit 2:

Following Laurenz Albe's reply, I performed the following test. I defined a new function:

CREATE OR REPLACE FUNCTION jsonb_array_int_array2(json_value JSONB, actual_type INTEGER, expected_type INTEGER)
  RETURNS INTEGER[] AS
$$
DECLARE
  result INTEGER[];
BEGIN
  IF actual_type <> expected_type THEN
    RAISE EXCEPTION 'unexpected type % instead of %', actual_type, expected_type;
  END IF;

  IF $1 ISNULL OR actual_type <> expected_type
  THEN
    result := NULL;
  ELSEIF jsonb_array_length(json_value) = 0
  THEN
    result := ARRAY [] :: INTEGER[];
  ELSE
    SELECT array_agg(x::INTEGER) FROM jsonb_array_elements_text(json_value) t(x) INTO result;
  END IF;
  RETURN result;
END;
$$
  LANGUAGE plpgsql
  IMMUTABLE;

I redefined the index and restructured the query as follows:

CREATE INDEX test1_properties_r_int_array_index ON test1 USING btree (jsonb_array_int_array2(properties -> 'r', type, 8)) WHERE type = 8;

SELECT count(*) FROM test1 WHERE type = 8 AND jsonb_array_int_array2(properties -> 'r', type, 8) < ARRAY[100];

And now I get:

[2020-03-04 09:47:34] [P0001] ERROR: unexpected type 7 instead of 8

This indicates that the function is evaluated on more rows than just the ones where type = 8. Is it maybe this part of the plan?

Recheck Cond: ((jsonb_array_int_array((properties -> 'r'::text)) < '{50}'::integer[]) AND (type = 8))

If this is the order of evaluation, is it possible to reverse it and check type = 8 before jsonb_array_int_array((properties -> 'r'::text))?

Also, judging by the performance (once I remove the exception check and rerun), it seems the whole table is scanned.

Is this expected?

Edit 3:

I have realized that this has now become a different question: Laurenz Albe's excellent and detailed response addresses the original issue of "why does it not work", and the question now is how best to make the original scheme work. I guess I will have to distill that into a separate question.

Thank you!

Btw, as Laurenz predicted, I was able to reproduce the issue on Postgres 10.x with more data.

Edit 4:

For the record, this is not specific to arrays. Any cast of values in this scenario will eventually fail on large tables. So even though properties ->> 'm' is always an integer when type = 8, this is not safe either:

CREATE INDEX test1_properties_m_int_index ON test1 (((properties ->> 'm')::INTEGER)) WHERE type = 8;

and the query

SELECT count(*) FROM test1 WHERE type = 8 AND (properties ->> 'm')::INTEGER < 50;

fails with

[2020-03-05 09:35:24] [22P02] ERROR: invalid input syntax for integer: "["a1c815126aa058706476b21f37f60038", "450513bd0f25abf8bd39b1b4645a1427", "e51acc579414985eaa59d9bdc3dc8187"]"

The lesson here is: if the json schema is not fixed across the whole column, whatever cast is applied has to anticipate arbitrary jsonb input, because portions of the table may be scanned indiscriminately.
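
In that spirit, here is a sketch of a scalar index expression that tolerates arbitrary jsonb; the jsonb_typeof guard and the detour through NUMERIC (so that float values round instead of erroring) are my assumptions about the desired behavior:

DROP INDEX test1_properties_m_int_index;

CREATE INDEX test1_properties_m_int_index ON test1
  ((CASE WHEN jsonb_typeof(properties -> 'm') = 'number'
         THEN (properties ->> 'm')::NUMERIC::INTEGER
    END))
  WHERE type = 8;

-- the query must repeat the exact expression to be able to use the index
SELECT count(*)
FROM test1
WHERE type = 8
  AND CASE WHEN jsonb_typeof(properties -> 'm') = 'number'
           THEN (properties ->> 'm')::NUMERIC::INTEGER
      END < 50;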

Best Answer

That is an interesting question, so I'll try to give a good answer.

In short, the problem is your function definition, which makes unfounded assumptions about the kind of JSON object it has to deal with.

Explanation of the error:

The error one gets when running your example is not deterministic; it depends on the random numbers. I get this, for example:

ERROR:  cannot get array length of a non-array
CONTEXT:  PL/pgSQL function jsonb_array_int_array(jsonb) line 6 at IF

But the cause is the same.

Observe that your execution plan uses a Bitmap Index Scan. That is, PostgreSQL builds a bitmap in memory that indicates which rows in the table satisfy the index condition. The second step, the Bitmap Heap Scan, then accesses the actual table rows.

You can imagine that such a bitmap consumes memory. The amount of memory available for a bitmap is limited by the configuration parameter work_mem. If work_mem is too small to hold a bitmap with one bit per table row, PostgreSQL partly degrades to a "lossy" bitmap that contains only one bit per 8KB block, indicating whether or not the block contains a matching row. You can see that in EXPLAIN (ANALYZE) output, but not in your case, because the query fails.

If you have a lossy bitmap, all the rows in a block indicated by the bitmap have to be rechecked to filter out the false positives, so your function is called for arguments that are not in the index at all.

The error is caused by

ELSEIF jsonb_array_length($1) = 0

or

SELECT array_agg(x::INTEGER) FROM jsonb_array_elements_text($1) t(x) INTO result;

Both lines assume that the argument is a jsonb array, and the second tries to cast the array elements to integer. The actual error you get depends on the row that happens to be processed by jsonb_array_int_array.
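
Both failure modes can be reproduced in isolation by calling the function directly (illustrative; the exact messages vary by version):

SELECT jsonb_array_int_array('{"a": 1}'::jsonb);
-- ERROR:  cannot get array length of a non-array

SELECT jsonb_array_int_array('[1.5, 2.5, 3.5]'::jsonb);
-- ERROR:  invalid input syntax for integer: "1.5"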

The problem as such is not tied to a particular PostgreSQL version, and it is a coincidence that you see it only on 9.5. Perhaps something changed in how the work_mem limit is handled or when a bitmap becomes lossy, or perhaps the random numbers simply happened to differ.

Proof for my theory:

Increase work_mem, and you will see that the error magically disappears, because the resulting bitmap is no longer lossy.
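
For example (a sketch; the actual value needed depends on the table and block counts, so 256MB is just a guess):

SET work_mem = '256MB';

EXPLAIN (ANALYZE)
SELECT count(*) FROM test1
WHERE type = 8 AND jsonb_array_int_array(properties -> 'r') < ARRAY[100];
-- the Bitmap Heap Scan node should now report something like
--   Heap Blocks: exact=NNNN
-- with no "lossy" part, and the query completes without the error

RESET work_mem;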

Solution:

Change your function so that it does not fail for JSON values that are not arrays of integers.
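
A minimal sketch of such a rewrite, assuming that anything other than an array of JSON numbers should map to NULL, and that float elements should round via NUMERIC rather than raise an error:

CREATE OR REPLACE FUNCTION jsonb_array_int_array(val JSONB)
  RETURNS INTEGER[] AS
$$
DECLARE
  result INTEGER[];
BEGIN
  -- tolerate SQL NULL and any jsonb that is not an array
  IF val IS NULL OR jsonb_typeof(val) <> 'array' THEN
    RETURN NULL;
  END IF;
  -- tolerate arrays containing anything other than JSON numbers
  IF EXISTS (SELECT 1
             FROM jsonb_array_elements(val) t(x)
             WHERE jsonb_typeof(x) <> 'number') THEN
    RETURN NULL;
  END IF;
  -- go through NUMERIC so that float elements round instead of erroring
  SELECT array_agg((x #>> '{}')::NUMERIC::INTEGER)
  INTO result
  FROM jsonb_array_elements(val) t(x);
  RETURN coalesce(result, ARRAY[]::INTEGER[]);  -- keep the empty-array case
END;
$$
  LANGUAGE plpgsql
  IMMUTABLE;

Since the index stores the function's results, it has to be rebuilt (DROP INDEX and CREATE INDEX, or REINDEX) after the function is redefined.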