Postgresql – Join table from values nested deeply in a jsonb field

jsonpostgresql

In Postgres 12.3 I am having trouble building the correct SQL query that can join a table from a deeply nested array in a jsonb object with rows from another table.

The deeply nested array contains strings and optionally an array of strings. These arrays should be flattened and then used to join in the other table.

Here is a DB Fiddle with the below data ready to test:
https://www.db-fiddle.com/f/6tMMNn3JsgkbehCptw4xEx/0

Details

I need a SQL query given the below tables that will return from order_batches the batches which contain orders where state does not equal 'picked' or 'packed'. With the given table data below only order_batches with ID 1 should be returned.

Essentially I need the query to look at each row in order_batches, for all the tags->'bins' arrays, find and flatten any order_numbers, and join (or sub-select?) the orders table on those order_numbers. The where clause should retrieve only those batches that include an order with a state that does not match 'picked' or 'packed'.

Expressing this in pseudo sql might look like:

SELECT order_batches.*
FROM order_batches
JOIN orders ON order_batches.tags->'bins'->__all array elements__->'order_numbers' = orders.number
WHERE orders.state IS NOT IN ('picked', 'packed')

Desired Output

The output that I am looking for is equal to the result of this query:

SELECT * from order_numbers WHERE id = 1
╔════╦══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ id ║                                                             tags                                                             ║
╠════╬══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║  1 ║ {"bins":[{"order_numbers":["555111111","555111112",["555111114","555111115"]]},{"order_numbers":["555111116","555111117"]}]} ║
╚════╩══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

Resources

I have tried searching for an answer, but did not find one exactly like mine. These results have helped, but I'm still not sure how to get it right:

Test Data

Here is the db schema and test data:

create table "order_batches" (id bigint primary key, tags jsonb);
insert into "order_batches" 
  (id, tags)
values 
  (1, '{
  "bins": [
    {
      "order_numbers": [
        "555111111",
        "555111112",
        [
          "555111114",
          "555111115"
        ]
      ]
    },
    {
      "order_numbers": [
        "555111116",
        "555111117"
      ]
    }
  ]
}'),
(2, '{
  "bins": [
    {
      "order_numbers": [
        "555111118",
        "111111111"
            ]
        }
  ]
}');

create table "orders" (id bigint primary key, "number" varchar(255), state varchar(255));
INSERT INTO "orders" (id, "number", state) VALUES
(1, '555111111', 'picked'),
(2, '555111112', 'picked'),
(3, '555111114', 'picked'),
(4, '555111115', 'ready'),
(5, '555111116', 'ready'),
(6, '555111117', 'ready'),
(7, '555111118', 'picked'),
(8, '111111111', 'packed');

Attempts

Here are some queries I have tried in my attempt to get it working and to understand the problem better:

SELECT ob.id, j.nums->'order_numbers'
FROM order_batches ob
CROSS JOIN lateral jsonb_array_elements(tags->'bins') j(nums)
JOIN orders o on o.number = j.nums->'order_numbers'
WITH numbers AS(
    SELECT jsonb_array_elements("tags"->'bins') AS numbers FROM order_batches
)
SELECT * FROM numbers JOIN orders o ON o.number = numbers.numbers
SELECT 
case jsonb_typeof(numbers)
    when 'string' then numbers->>0
    when 'array' then jsonb_array_elements_text(numbers)
end as number
FROM order_batches, jsonb_array_elements(order_batches.tags->'bins') as numbers

Thanks in advance for any help with this problem. At this point I do not think it would be worth revising the data structure. I'm hoping a SQL expert can whip up some magic to help me (and others) out here.

Best Answer

This is relatively easy to solve in Postgres 12, because the new SQL JSON/Path feature does the heavy lifting of iterating through (nested) arrays.

SELECT ob.*
FROM  order_batches ob
where exists (select *
              from orders ord
              where jsonb_path_exists(ob.tags, '$.bins[*].order_numbers[*] ? (@ == $nr)', jsonb_build_object('nr', ord.number))
              and ord.state NOT IN ('picked', 'packed'));

The JSON path expression '$.bins[*].order_numbers[*] iterates through all array elements in the order_numbers array - including nested arrays. The jsonb_path_exists() then checks if there is a match with the number from the orders table.

The number is passed as a "json parameter" to the jsonb_path_exists() function and the $nr placeholder references that. The name of that parameter must match the key of the JSON value passed as the third parameter of the jsonb_path_exists function.

I don't think this is going to be very fast though as the way the jsonb_path_exists() function is used, can't make use of an index as far as I know.

Online example