Postgresql – Proper way to create an integer jsonb nested key index

castindexjsonpostgresql

I have the following index I'm trying to modify by declaring the poLineNumberInteger key as numeric/integer instead of text. This is valid syntax:

CREATE INDEX ubdrecord_jsonb_extract_path_text_idx
  ON public.ubdrecord
  USING btree
  (jsonb_extract_path_text(metadata, VARIADIC ARRAY['lineInfo'::text, 'poLineNumberInteger'::text]));

However, if I try to modify the "text" to "numeric" in the above statement, I get this error:

ERROR:  invalid input syntax for type numeric: "poLineNumberInteger"
LINE 4: ...h_text(metadata, VARIADIC ARRAY['lineInfo'::text, 'poLineNum...
                                                             ^

********** Error **********

ERROR: invalid input syntax for type numeric: "poLineNumberInteger"
SQL state: 22P02
Character: 158

The question is, how to modify this create statement to declare the key as numeric without having to cast? Also, we have to use the jsonb_extract_path_text wrapper. Even without it, not sure how to appropriately modify the syntax.

Table Structure:

CREATE TABLE ubdrecord
(
  id uuid NOT NULL,
  senderduns character varying(256),
  sendername character varying(256),
  receiverduns character varying(256),
  receivername character varying(256),
  messagetype character varying(100),
  metadata jsonb,
  sourceid uuid NOT NULL,
  sourcetype character varying(10),
  originaltime timestamp without time zone NOT NULL,
  createdtime timestamp without time zone NOT NULL,
  processingmode integer,
  ubdversion character varying(20),
  unitofmeasure character varying(256),
  quantity numeric,
  active boolean,
  documentid character varying(256),
  CONSTRAINT pk_ubdrecord1 PRIMARY KEY (id)
)

* Query *

select "receivername", "unitofmeasure", "ubdversion", "processingmode", "metadata", "senderduns", "active", "createdtime", "sourcetype",
"documentid", "id", "sendername", "messagetype", "quantity", "sourceid", "originaltime", "receiverduns"
from "ubdrecord"
where
(jsonb_extract_path_text("metadata",'lineInfo','poLineNumberInteger') = '11123')
order by "createdtime"
desc limit 1

* Query Explain Plan Results *

'Limit  (cost=0.43..311.36 rows=1 width=4046) (actual time=206530.018..206530.018 rows=0 loops=1)'
'  ->  Index Scan Backward using createdtimeindex on ubdrecord  (cost=0.43..5593950.48 rows=17991 width=4046) (actual time=206530.015..206530.015 rows=0 loops=1)'
'        Filter: (jsonb_extract_path_text(metadata, VARIADIC '{lineInfo,poLineNumberInteger}'::text[]) = '11123'::text)'
'        Rows Removed by Filter: 3598344'
'Planning time: 22.892 ms'
'Execution time: 206530.082 ms'

* Created Time Index *

CREATE INDEX createdtimeindex
  ON public.ubdrecord
  USING btree
  (createdtime);

The explain plan looks like maybe the issue in with the sort column. However, the sort comes back fast (~100-200 ms) for querying on other jsonb values within that metadata block.

* \d ubdrecord *

Time: 0.113s
summarist> \d ub
Did not find any relation named ub.
Time: 0.145s
summarist> \d ubdrecord
+----------------+-----------------------------+-------------+
| Column         | Type                        | Modifiers   |
|----------------+-----------------------------+-------------|
| id             | uuid                        |  not null   |
| senderduns     | character varying(256)      |             |
| sendername     | character varying(256)      |             |
| receiverduns   | character varying(256)      |             |
| receivername   | character varying(256)      |             |
| messagetype    | character varying(100)      |             |
| metadata       | jsonb                       |             |
| sourceid       | uuid                        |  not null   |
| sourcetype     | character varying(50)       |             |
| originaltime   | timestamp without time zone |  not null   |
| createdtime    | timestamp without time zone |  not null   |
| processingmode | integer                     |             |
| ubdversion     | character varying(20)       |             |
| unitofmeasure  | character varying(256)      |             |
| quantity       | numeric                     |             |
| active         | boolean                     |             |
| documentid     | character varying(256)      |             |
+----------------+-----------------------------+-------------+
Indexes:
    "pk_ubdrecord" PRIMARY KEY, btree (id)
    "createdtimeindex" btree (createdtime)
    "deliverynotelinenumberintegerindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['deliveryNoteInfo'::text, 'deliveryNoteLineNumberInteger'::text]))
    "deliverynotenumberindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['deliveryNoteInfo'::text, 'deliveryNoteNumber'::text]))
    "documentidindex" btree (documentid)
    "invoicenumberindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['invoiceInfo'::text, 'invoiceNumber'::text]))
    "messagetypeindex" btree (messagetype)
    "polinenumberindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['lineInfo'::text, 'poLineNumber'::text]))
    "polinenumberintegerindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['lineInfo'::text, 'poLineNumberInteger'::text]))
    "ponumberindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['lineInfo'::text, 'poNumber'::text]))
    "processingmodeindex" btree (processingmode)
    "receiverindex" btree (receiverduns)
    "salesorderlinenumberindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['salesOrderInfo'::text, 'salesOrderLineNumber'::text]))
    "salesorderlinenumberintegerindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['salesOrderInfo'::text, 'salesOrderLineNumberInteger'::text]))
    "salesordernumberindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['salesOrderInfo'::text, 'salesOrderNumber'::text]))
    "senderindex" btree (senderduns)
    "shipmentnumberindex" btree (jsonb_extract_path_text(metadata, VARIADIC ARRAY['shipmentInfo'::text, 'shipmentNumber'::text])) 

* Similar query *

select "receivername", "unitofmeasure", "ubdversion", "processingmode", "metadata", "senderduns", "active", "createdtime", "sourcetype",
"documentid", "id", "sendername", "messagetype", "quantity", "sourceid", "originaltime", "receiverduns" from "ubdrecord"
where
(jsonb_extract_path_text("metadata",'lineInfo','poLineNumber') = '11124124124124')
order by "createdtime" desc limit 1;

* Similar query plan *

'Limit  (cost=3776.77..3776.77 rows=1 width=4046) (actual time=0.053..0.053 rows=0 loops=1)'
'  ->  Sort  (cost=3776.77..3779.20 rows=970 width=4046) (actual time=0.050..0.050 rows=0 loops=1)'
'        Sort Key: createdtime DESC'
'        Sort Method: quicksort  Memory: 25kB'
'        ->  Bitmap Heap Scan on ubdrecord  (cost=19.95..3771.92 rows=970 width=4046) (actual time=0.043..0.043 rows=0 loops=1)'
'              Recheck Cond: (jsonb_extract_path_text(metadata, VARIADIC '{lineInfo,poLineNumber}'::text[]) = '11124124124124'::text)'
'              ->  Bitmap Index Scan on polinenumberindex  (cost=0.00..19.70 rows=970 width=0) (actual time=0.039..0.039 rows=0 loops=1)'
'                    Index Cond: (jsonb_extract_path_text(metadata, VARIADIC '{lineInfo,poLineNumber}'::text[]) = '11124124124124'::text)'
'Planning time: 0.205 ms'
'Execution time: 0.091 ms'

The real head scratcher for me is why the first query doesn't seem to use the index/sortkey as this other one does. The index for both fields look the same to me.

Best Answer

The question is, how to modify this create statement to declare the key as numeric without having to cast?

You can not. From the docs

enter image description here

Addressing

The argument type is text[]. SQL Arrays, like path_elems, are strictly typed and SQL does not have type unions so there is neither a text|numeric type-union, nor a TextOrNumeric type. If you're addressing a deep heterogeneous structure of arrays and objects you need to use -> and ->>.

Return type

You can see above, the return type of jsonb_extract_path_text is simply text. You must cast. There is also jsonb_extract_path but it's return type is fixed to jsonb. What you would need is jsonb_extract_path_numeric and you're not likely to get that. Even if someone implemented it, you'd still have to cast if you wanted to store it as an interger.

Keep in mind, for the purposes of indexing, this casting is only done when the row is written or the column is updated and it's very fast.

See also,