How to Unnest and GROUP BY Elements of a JSON Array in PostgreSQL

arraygroup byjsonpostgresql

Given the band table, with a json column holding an array :

id | people
---+-------------
1  | ['John', 'Thomas']
2  | ['John', 'James']
3  | ['James', 'George']

How to list the number of bands each name is part of?
Desired output:

name   | count
-------+------------
John   | 2
James  | 2
Thomas | 1
George | 1

Best Answer

The data type of the column people is json, as is the result of json_array_elements(people). And there is no equality operator (=) for the data type json. So you also cannot run GROUP BY on it. More:

How to remove known elements from a JSON[] array in PostgreSQL?

jsonb has an equality operator, so the "workaround" in your answer is to cast to jsonb and use the equivalent jsonb_array_elements(). The cast adds cost:

jsonb_array_elements(people::jsonb)

Since Postgres 9.4 we also have json_array_elements_text(json) returning array elements as text. Related:

How to turn JSON array into Postgres array?

So:

SELECT p.name, count(*) AS c
FROM   band b, json_array_elements_text(b.people) p(name)
GROUP  BY p.name;

It seems more convenient to get names as text instead of jsonb objects (double-quoted in text representation) and your "desired output" indicates you want / need text in the result to begin with.

GROUP BY on text data is also cheaper than on jsonb, so this alternative "workaround" should be faster for two reasons. (Test with EXPLAIN (ANALYZE, TIMING OFF).)

For the record, there was nothing wrong with your original answer. The comma (,) is just as "correct" as CROSS JOIN LATERAL. Having been defined earlier in standard SQL does no make it inferior. See:

What does [FROM x, y] mean in Postgres?

Neither is it more portable to other RDBMS, and since jsonb_array_elements() or json_array_elements_text() aren't portable to other RDBMS to begin with, that's also irrelevant. The short query does not get any clearer with CROSS JOIN LATERAL IMO, but the last bit is just my personal opinion.

I used the more explicit table and column alias p(name) and the table-qualified reference p.name to defend against possible duplicate names. name is such a common word, it might also pop up as column name in the underlying table band, in which case it would silently resolve to band.name. The simple form json_array_elements_text(people) name only attaches a table alias, the column name is still value, as returned from the function. But name resolves to it's single column value when used in the SELECT list. It happens to work as expected. But a true column name name (if band.name should exist) would bind first. While that won't bite in the given example, it can be a a loaded foot gun in other cases.

Don't use the generic "name" as identifier to begin with. Maybe that was just for the simple test case.

If the column people can hold anything but a plain JSON array, either query would trigger an exception. If you cannot guarantee data integrity, you might want to defend with json_typeof():

SELECT p.name, count(*) AS c
FROM   band b, json_array_elements_text(b.people) p(name)
WHERE  json_typeof(b.people) = 'array'
GROUP  BY 1; -- optional short syntax since you seem to prefer short syntax

Excludes violating rows from the query.

When can we use an identifier number instead of its name in PostgreSQL?

`jsonb` in Postgres 9.4 or later

Consider the jsonb data type in Postgres 9.4 or later. The 'b' at the end stands for 'binary'. Among other things, there is an equality operator (=) for jsonb. Most people will want to switch.

Depesz blog about jsonb.

`json`

There is no = operator defined for the data type json, because there is no well defined method to establish equality for whole json values. But see below.

You could cast to text and then use the = operator. This is short, but only works if your text representation happens to match. Inherently unreliable, except for corner cases. See:

How to query a json column for empty objects?

Or you can unnest the array and use the ->> operator to .. get JSON object field as text and compare individual fields.

Test table

2 rows: first one like in the question, second one with simple values.

CREATE TABLE tbl (
   tbl_id int PRIMARY KEY
 , jar    json[]
);

INSERT INTO t VALUES
   (1, '{"{\"value\" : \"03334/254146\", \"typeId\" : \"ea4e7d7e-7b87-4628-ba50-f5\"}"
        ,"{\"value\" : \"03334/254147\", \"typeId\" : \"ea4e7d7e-7b87-4628-ba50-f6\"}"
        ,"{\"value\" : \"03334/254148\", \"typeId\" : \"ea4e7d7e-7b87-4628-ba50-f7\"}"}')

 , (2, '{"{\"value\" : \"a\", \"typeId\" : \"x\"}"
        ,"{\"value\" : \"b\", \"typeId\" : \"y\"}"
        ,"{\"value\" : \"c\", \"typeId\" : \"z\"}"}');

Demos

Demo 1

You could use array_remove() with text representations (unreliable).

SELECT tbl_id
     , jar, array_length(jar, 1) AS jar_len
     , jar::text[] AS t, array_length(jar::text[], 1) AS t_len
     , array_remove(jar::text[], '{"value" : "03334/254147", "typeId" : "ea4e7d7e-7b87-4628-ba50-f6"}'::text) AS t_result
     , array_remove(jar::text[], '{"value" : "03334/254147", "typeId" : "ea4e7d7e-7b87-4628-ba50-f6"}'::text)::json[] AS j_result
FROM   tbl;

Demo 2

Unnest the array and test fields of individual elements.

SELECT tbl_id, array_agg(j) AS j_new
FROM   tbl, unnest(jar) AS j   -- LATERAL JOIN
WHERE  j->>'value' <> '03334/254146'
AND    j->>'typeId' <> 'ea4e7d7e-7b87-4628-ba50-6a5f6e63dbf5'
GROUP  BY 1;

Demo 3

Alternative test with row type.

SELECT tbl_id, array_agg(j) AS j_new
FROM   tbl, unnest(jar) AS j   -- LATERAL JOIN
WHERE  (j->>'value', j->>'typeId') NOT IN (
         ('03334/254146', 'ea4e7d7e-7b87-4628-ba50-6a5f6e63dbf5')
        ,('a', 'x')
       )
GROUP  BY 1;

`UPDATE` as requested

Finally, this is how you could implement your UPDATE:

UPDATE tbl t
SET    jar = j.jar
FROM   tbl t1
CROSS  JOIN LATERAL (
   SELECT ARRAY(
      SELECT j
      FROM   unnest(t1.jar) AS j  -- LATERAL JOIN
      WHERE  j->>'value'  <> 'a'
      AND    j->>'typeId' <> 'x'
      ) AS jar
   ) j
WHERE  t1.tbl_id = 2              -- only relevant rows
AND    t1.tbl_id = t.tbl_id;

db<>fiddle here

About the implicit LATERAL JOIN:

About unnesting arrays:

How to preserve the original order of elements in an unnested array?

DB design

To simplify your situation consider an normalized schema: a separate table for the json values (instead of the array column), joined in a n:1 relationship to the main table.

PostgreSQL – Update Table with Array of JSON

If you know the desired column names (be it the same or different from the keys in the JSON structure, you can use json[b]_to_recordset():

SELECT * FROM jsonb_to_recordset('[
    { "col1": "a", "col2": 1, "col3": 1, "col4": "one"},
    { "col1": "b", "col2": 2, "col3": 2, "col4": "two"},
    { "col1": "c", "col2": 3, "col3": 3, "col4": "three"},
    { "col1": "d", "col2": 4, "col3": 4, "col4": "four"}
]'::jsonb) AS t (col1 text, col2 integer, col3 integer, col4 text);

 col1 │ col2 │ col3 │ col4  
──────┼──────┼──────┼───────
 a    │    1 │    1 │ one
 b    │    2 │    2 │ two
 c    │    3 │    3 │ three
 d    │    4 │    4 │ four

As the documentation tells us,

Note: In json_populate_record, json_populate_recordset, json_to_record and json_to_recordset, type coercion from the JSON is "best effort" and may not result in desired values for some types. JSON keys are matched to identical column names in the target row type. JSON fields that do not appear in the target row type will be omitted from the output, and target columns that do not match any JSON field will simply be NULL.

If you already have a table to work with, json_populate_recordset() is an even better solution:

CREATE TABLE inputtable (col1 text, col2 integer, col3 integer, col4 text);

SELECT * FROM jsonb_populate_recordset(NULL::yourtable, '[
    { "col1": "a", "col2": 1, "col3": 1, "col4": "one"},
    { "col1": "b", "col2": 2, "col3": 2, "col4": "two"},
    { "col1": "c", "col2": 3, "col3": 3, "col4": "three"},
    { "col1": "d", "col2": 4, "col3": 4, "col4": "four"}
]'::jsonb);

 col1 │ col2 │ col3 │ col4  
──────┼──────┼──────┼───────
 a    │    1 │    1 │ one
 b    │    2 │    2 │ two
 c    │    3 │    3 │ three
 d    │    4 │    4 │ four

Now updating the table itself may be done like this:

WITH source AS (SELECT * FROM jsonb_populate_recordset [...])
UPDATE yourtable
   SET col1 = s.col1, col2 = s.col2
  FROM source AS s
 WHERE col3 = s.col3;

In case it seems slow, it might make sense to not use the CTE, but a subquery in the FROM clause instead.

Best Answer

Related Solutions

PostgreSQL – How to Remove Known Elements from JSON[] Array

jsonb in Postgres 9.4 or later

json