PostgreSQL – How to Select Sub-Object with Given Keys from JSONB

functionsjsonpostgresql

How to extract / get / select a "sub object" from a json(b) object in Postgres?

There seems to be a lot of info that get me almost all the way there but not quite. Lots of stuff about converting to records, filtering and then building a new object out of it. TBH, I'm really surprised that this is not a builtin capability. Perhaps there's a simple way to achieve this by composing builtin fns? What I'm looking for is essentially the equivalent of select-keys; a function (say, jsonb_select_keys) that given this:

SELECT jsonb_select_keys('{"a":42,"b":43,"c":44,"d":97}', '{a,d,e}');

would return this:

{"a":42,"d":97}

Kind of like jsonb_path_query_array but for k-v pairs instead of values.

Best Answer

You can remove given keys with the - operator (the inverse of what you have in mind):

jsonb - text[] → jsonb

Deletes all matching keys or array elements from the left operand.

'{"a": "b", "c": "d"}'::jsonb - '{a,c}'::text[] → {}

But I wouldn't know of a built-in function or operator doing what you ask for in the current Postgres 13.

Remarkably, it's the other way round with SQL SELECT statements. There you get a positive list with SELECT a,d,e FROM tbl, but cannot simply get "all columns except [a,d,e]" like you can get "all keys except [a,d,e]" from a JSON object with the above operator. I would love to have the complementary feature for each.

Also can't think of an easy way with the SQL/JSON path language (Postgres 12+).

Workaround for `jsonb`

You can create a simple function like:

CREATE OR REPLACE FUNCTION f_jsonb_select_keys(_js jsonb, _keys text[])
  RETURNS jsonb
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT jsonb_object_agg(t.key, t.value)
FROM   jsonb_each(_js) t
WHERE  t.key = ANY (_keys);
$func$;

db<>fiddle here

Call:

SELECT f_jsonb_select_keys('{"a":42,"b":43,"c":44,"d":97}', '{a,d,e}');

Returns:

{"a":42,"d":97}

You get NULL for NULL or empty input for either argument (as demonstrated in the fiddle). So I declared the function STRICT (a.k.a. RETURNS NULL ON NULL INPUT), even though the nested function jsonb_object_agg() is not STRICT. That could mess with function inlining, but since this one cannot be inlined anyway (containing an aggregate function), we might as well. See:

IMMUTABLE is only kind of true. Keep reading.

Order of keys?

jsonb does not preserve the order of keys (keys are ordered in deterministic fashion internally), so we need not bother about the order of input rows in the aggregation. Except when there can be duplicates, then the latest copy from the input prevails. The manual:

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

Meaning, if there can be duplicate keys in the input, the order of rows is significant. My function does not explicitly order rows, so the result would not strictly be IMMUTABLE. But the input comes directly from jsonb_each() and there cannot be duplicate keys in jsonb (unlike json). Possible duplicates in the input array of keys are irrelevant for the = ANY construct. So duplicates can never happen. IMMUTABLE after all.

Again, this would prohibit function inlining - if it could be inlined to begin with.

Equivalent for `json`

For completeness: the same for json instead of jsonb:

The first variant does not try to preserve the given order of keys, and it eliminates possible duplicates in the input array of keys (but not possible duplicates from the json input!):

CREATE OR REPLACE FUNCTION f_json_select_keys(_js json, _keys text[])
  RETURNS json
  LANGUAGE sql STABLE STRICT PARALLEL SAFE AS
$func$
SELECT json_object_agg (t.key, t.value)
FROM   json_each(_js) t
WHERE  key = ANY (_keys);
$func$;

The second variant preserves the given order of keys and keeps all possible duplicates. (If the same key is 2x in the json input, and 3x in the input array of keys, you get it 6x in the result.):

CREATE OR REPLACE FUNCTION f_json_select_keys(_js json, _keys text[])
  RETURNS json
  LANGUAGE sql STABLE STRICT PARALLEL SAFE AS
$func$
SELECT json_object_agg (t.key, t.value ORDER BY ord)
FROM   unnest(_keys) WITH ORDINALITY k(key, ord)
JOIN   json_each(_js) t USING (key);
$func$;

About WITH ORDINALITY:

How to preserve the original order of elements in an unnested array?

I made the json functions STABLE because json_object_agg() is only STABLE as opposed to jsonb_object_agg(), which is IMMUTABLE.

You could use json_build_object ('a', js->'a', 'd', js->'d', 'e', js->'e') but that would include all keys, with a NULL value if not found. Not exactly your request. And you couldn't tell the difference between a missing key and the same key with an actual NULL value.

Related Solutions

PostgreSQL – Group By Transformed Column

You can simplify further:

SELECT date_trunc('month', c.created_at) AS year_month
     , p.reference,
     , count (distinct c.id) as nb_crea,
     , (sum(o.amount_ati_cents) / 100) as ca_euros
from   creations   c
join   products    p on p.id = c.product_id 
join   order_items i on i.creation_id = c.id
join   bundles     b on b.id = i.bundle_id
join   orders      o on o.id = b.order_id
where  o.status NOT IN ('canceled', 'new')
and    c.name like '% - Premium(%'
and    p.reference like 'FLA_%'
GROUP BY year_month, reference
ORDER BY year_month DESC;

Or even:

SELECT to_char(c.created_at, 'YYYY-MM') AS year_month
...
GROUP  BY 1, reference
ORDER  BY 1 DESC;

date_trunc() and to_char() in the manual.

SQLite Trigger – How to Insert Multiple Rows from JSON Data

This is not valid JSON; strings require double quotes.

Anyway, the table-valued function json_each() allows to enumerate a dynamically-sized array:

SELECT * FROM json_each('[["c2","c1","clone"],["c1","n0","clone"],["n0","p0","offspring"],["p0","a2","offspring"],["a2","a1","offspring"],["a1",null,null]]');
key         value                type        atom        id          parent      fullkey     path
----------  -------------------  ----------  ----------  ----------  ----------  ----------  ----------
0           ["c2","c1","clone"]  array                   1                       $[0]        $
1           ["c1","n0","clone"]  array                   5                       $[1]        $
2           ["n0","p0","offspri  array                   9                       $[2]        $
3           ["p0","a2","offspri  array                   13                      $[3]        $
4           ["a2","a1","offspri  array                   17                      $[4]        $
5           ["a1",null,null]     array                   21                      $[5]        $

For extracting the values from an array with a known size, we can use json_extract():

SELECT json_extract(value, '$[0]'), json_extract(value, '$[1]'), json_extract(value, '$[2]')
FROM json_each('[["c2","c1","clone"],["c1","n0","clone"],["n0","p0","offspring"],["p0","a2","offspring"],["a2","a1","offspring"],["a1",null,null]]');
c2          c1          clone
c1          n0          clone
n0          p0          offspring
p0          a2          offspring
a2          a1          offspring
a1

This can then simply be plugged into an INSERT … SELECT … statement.