PostgreSQL – Flatten JSON Array into Comma Delimited List

array, json, postgresql

I am trying to construct a SELECT statement which will take a JSONB column in the following format:

{
  "Cities": [
    {
      "Name": "Atlanta"
    },
    {
      "Name": "London"
    },
    {
      "Name": "New York"
    }
  ]
}

The output of the column result set needs to be in the following format:

Atlanta, London, New York

UPDATE

@a_horse_with_no_name's answer below is correct, but my requirements are a little more complicated than originally posted. I need to fit this select into a larger (join) query as follows:

select eo.citydata.cities.names <-- Flattened Comma delimited JSON Array
from orderline o 
join eventorders eo on eo.orderlineid = o.id
join events e on e.id = eo.eventid
where e.id = '123'

Clearly the answer provided will need to be modified in order for this to work and I'm struggling to figure out how to do it.

Best Answer

Unnest the array, then aggregate back:

select string_agg(city, ', ')
from (
  select x.val ->> 'Name' as city
  from the_table t
     cross join jsonb_array_elements(t.the_column -> 'Cities') as x(val)
) t;
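
As a quick check against the sample document from the question, the following sketch inlines the data in a CTE. The names the_table and the_column are the placeholders used above; the alias idx is introduced here for illustration. It also assumes that element order matters: WITH ORDINALITY numbers the array elements so that string_agg() can sort by position explicitly, and the ', ' separator matches the spacing in the requested output.

```sql
-- Stand-in for the real table: one row holding the sample document.
with the_table(the_column) as (
  values ('{"Cities": [{"Name": "Atlanta"}, {"Name": "London"}, {"Name": "New York"}]}'::jsonb)
)
select string_agg(x.val ->> 'Name', ', ' order by x.idx) as cities
from the_table t
  -- idx is the 1-based position of each element within the array
  cross join jsonb_array_elements(t.the_column -> 'Cities') with ordinality as x(val, idx);
-- cities: Atlanta, London, New York
```

Without the ORDER BY inside the aggregate, the result normally comes out in array order as well, but that is not something the aggregate itself promises.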

If you have a bigger query, use that in a derived table:

select string_agg(t2.city, ', ')
from (
  select x.val ->> 'Name' as city
  from (
    select eo.citydata
    from orderline o 
      join eventorders eo on eo.orderlineid = o.id
      join events e on e.id = eo.eventid
    where e.id = 123
  ) t1
     cross join jsonb_array_elements(t1.citydata -> 'Cities') as x(val)
) t2;
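
Note that the query above folds the cities of all matching rows into a single string. If one list per order line is wanted instead, a LATERAL subquery is one possible way to do it. This is only a sketch: the CTEs stand in for the real tables and contain made-up rows, while the table and column names are taken from the question.

```sql
-- Made-up sample rows standing in for the real tables.
with orderline(id) as (values (1), (2)),
     events(id)    as (values (123)),
     eventorders(orderlineid, eventid, citydata) as (
       values (1, 123, '{"Cities": [{"Name": "Atlanta"}, {"Name": "London"}]}'::jsonb),
              (2, 123, '{"Cities": [{"Name": "New York"}]}'::jsonb)
     )
select o.id, c.cities
from orderline o
  join eventorders eo on eo.orderlineid = o.id
  join events e on e.id = eo.eventid
  -- The aggregate always yields exactly one row, so no order line is lost,
  -- even when the "Cities" key is missing (cities is then NULL).
  cross join lateral (
    select string_agg(x.val ->> 'Name', ', ') as cities
    from jsonb_array_elements(eo.citydata -> 'Cities') as x(val)
  ) c
where e.id = 123
order by o.id;
```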

Alternatively - if you need that very often - you can create a function that does this:

create function get_element_list(p_value jsonb, p_keyname text)
  returns text
as 
$$ 
   select string_agg(x.val ->> p_keyname, ', ')
   from jsonb_array_elements(p_value) as x(val);
$$
language sql;

Then you can use it like this:

select get_element_list(eo.citydata -> 'Cities', 'Name')
from orderline o 
  join eventorders eo on eo.orderlineid = o.id
  join events e on e.id = eo.eventid
where e.id = 123;

Related Solutions

PostgreSQL – How to Turn JSON Array into Postgres Array?

Postgres 9.4 or newer

Obviously inspired by this post, Postgres 9.4 added the missing function(s), json_array_elements_text(json) and jsonb_array_elements_text(jsonb), to unnest the JSON array. (Thanks to Laurence Rowe for the patch and Andrew Dunstan for committing!)

Then use array_agg() or an ARRAY constructor to build a Postgres array from the unnested elements, or string_agg() to build a text string.

Aggregate unnested elements per row in a LATERAL or correlated subquery. That way the original order is preserved and we don't need ORDER BY, GROUP BY or even a unique key in the outer query. See:

How to apply ORDER BY and LIMIT in combination with an aggregate function?

Replace 'json' with 'jsonb' for jsonb in all following SQL code.

SELECT t.tbl_id, d.list
FROM   tbl t
CROSS  JOIN LATERAL (
   SELECT string_agg(d.elem::text, ', ') AS list
   FROM   json_array_elements_text(t.data->'tags') AS d(elem)
   ) d;

Short syntax:

SELECT t.tbl_id, d.list
FROM   tbl t, LATERAL (
   SELECT string_agg(value::text, ', ') AS list
   FROM   json_array_elements_text(t.data->'tags')  -- col name default: "value"
   ) d;

Related: What is the difference between LATERAL and a subquery in PostgreSQL?

ARRAY constructor in correlated subquery:

SELECT tbl_id, ARRAY(SELECT json_array_elements_text(t.data->'tags')) AS txt_arr
FROM   tbl t;

Subtle difference: null elements are preserved in actual arrays. This is not possible in the above queries producing a text string, which cannot contain null values. The true representation is an array.
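
To make the difference concrete, here is a small sketch with a JSON null among the tags. The one-row CTE is a hypothetical stand-in for tbl; the expected values in the comments follow from string_agg() skipping NULL input while the ARRAY constructor keeps it.

```sql
-- Hypothetical one-row stand-in for tbl, with a JSON null among the tags.
WITH tbl(tbl_id, data) AS (
   VALUES (1, '{"tags": ["a", null, "b"]}'::json)
   )
SELECT t.tbl_id
     , ARRAY(SELECT json_array_elements_text(t.data->'tags')) AS txt_arr  -- {a,NULL,b}
     , (SELECT string_agg(value, ', ')
        FROM   json_array_elements_text(t.data->'tags'))      AS list     -- a, b
FROM   tbl t;
```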

Function wrapper

For repeated use, to make this even simpler, encapsulate the logic in a function:

CREATE OR REPLACE FUNCTION json_arr2text_arr(_js json)
  RETURNS text[] LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT ARRAY(SELECT json_array_elements_text(_js))';

Make it an SQL function, so it can be inlined in bigger queries.
Make it IMMUTABLE (because it is) to avoid repeated evaluation in bigger queries and allow it in index expressions.
Make it PARALLEL SAFE (in Postgres 9.6 or later!) to not stand in the way of parallelism. See:

When to mark functions as PARALLEL RESTRICTED vs PARALLEL SAFE?

Call:

SELECT tbl_id, json_arr2text_arr(data->'tags')
FROM   tbl;
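
As an illustration of the index-expression point, the following sketch builds a GIN index on the function result. The table tbl_demo, its rows and the index name are made up for the example, and the function is repeated only so the block runs on its own (PARALLEL SAFE requires Postgres 9.6 or later).

```sql
-- Same function as above, repeated so the sketch is self-contained.
CREATE OR REPLACE FUNCTION json_arr2text_arr(_js json)
  RETURNS text[] LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT ARRAY(SELECT json_array_elements_text(_js))';

-- Hypothetical demo table; substitute your own.
CREATE TEMP TABLE tbl_demo (tbl_id int PRIMARY KEY, data json);
INSERT INTO tbl_demo VALUES (1, '{"tags": ["foo", "bar"]}'), (2, '{"tags": ["baz"]}');

-- Expression index: only allowed because the function is declared IMMUTABLE.
CREATE INDEX tbl_demo_tags_idx ON tbl_demo USING gin (json_arr2text_arr(data->'tags'));

-- The predicate has to repeat the indexed expression for the index to be considered.
SELECT tbl_id FROM tbl_demo WHERE json_arr2text_arr(data->'tags') @> ARRAY['foo'];
```

On a table this small the planner will most likely prefer a sequential scan; the index only pays off with more data.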

Postgres 9.3 or older

Use the function json_array_elements(). But we get double quoted strings from it.

Alternative query with aggregation in the outer query. CROSS JOIN removes rows with missing or empty arrays. May also be useful for processing elements. We need a unique key to aggregate:

SELECT t.tbl_id, string_agg(d.elem::text, ', ') AS list
FROM   tbl t
CROSS  JOIN LATERAL json_array_elements(t.data->'tags') AS d(elem)
GROUP  BY t.tbl_id;

ARRAY constructor, still with quoted strings:

SELECT tbl_id, ARRAY(SELECT json_array_elements(t.data->'tags')) AS quoted_txt_arr
FROM   tbl t;

Note that null is converted to the text value "null", unlike above. Incorrect, strictly speaking, and potentially ambiguous.

Poor man's unquoting with trim():

SELECT t.tbl_id, string_agg(trim(d.elem::text, '"'), ', ') AS list
FROM   tbl t, json_array_elements(t.data->'tags') d(elem)
GROUP  BY 1;

Retrieve a single row from tbl:

SELECT string_agg(trim(d.elem::text, '"'), ', ') AS list
FROM   tbl t, json_array_elements(t.data->'tags') d(elem)
WHERE  t.tbl_id = 1;

Strings from correlated subquery:

SELECT tbl_id, (SELECT string_agg(trim(value::text, '"'), ', ')
                FROM   json_array_elements(t.data->'tags')) AS list
FROM   tbl t;

ARRAY constructor:

SELECT tbl_id, ARRAY(SELECT trim(value::text, '"')
                     FROM   json_array_elements(t.data->'tags')) AS txt_arr
FROM   tbl t;


Related: Need to select a JSON array element dynamically from a postgresql table

Notes (outdated since pg 9.4)

We would need a json_array_elements_text(json), the twin of json_array_elements(json) to return proper text values from a JSON array. But that seems to be missing from the provided arsenal of JSON functions. Or some other function to extract a text value from a scalar JSON value. I seem to be missing that one, too.
So I improvised with trim(), but that will fail for non-trivial cases ...
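
One such non-trivial case is a string containing an escaped double quote. The sketch below needs Postgres 9.4 or later (so that both functions are available for comparison) and assumes the default setting standard_conforming_strings = on; the CTE name j and the sample value are made up.

```sql
-- One array element containing an escaped double quote.
WITH j(data) AS (
   VALUES ('["5\" disk"]'::json)
   )
SELECT (SELECT trim(value::text, '"')
        FROM   json_array_elements(j.data))      AS trimmed    -- 5\" disk
     , (SELECT value
        FROM   json_array_elements_text(j.data)) AS unescaped  -- 5" disk
FROM   j;
```

trim() only strips the outer quotes and leaves the JSON escape sequence in place, whereas json_array_elements_text() returns the actual string value.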

PostgreSQL – List of Integers Separated by Comma vs Integer Array for Performance

It's highly likely that the best approach will be a side-table of sometable(main_id, value) where you have a composite index on (main_id, value). This allows very fast lookups to see "for this main_id, does this value exist". It also lets you enforce foreign key relationships. Unless you have a good reason not to, use this conventional relational approach.
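
A minimal sketch of such a side-table follows. The parent tables main_tbl and value_tbl are hypothetical and exist only so the foreign keys have something to point at; temporary tables keep the example from touching a real schema.

```sql
-- Hypothetical parent tables, only here so the foreign keys have targets.
CREATE TEMP TABLE main_tbl  (main_id int PRIMARY KEY);
CREATE TEMP TABLE value_tbl (value   int PRIMARY KEY);

CREATE TEMP TABLE sometable (
   main_id int NOT NULL REFERENCES main_tbl
 , value   int NOT NULL REFERENCES value_tbl
 , PRIMARY KEY (main_id, value)   -- doubles as the composite index on (main_id, value)
);

-- "For this main_id, does this value exist?"
SELECT EXISTS (SELECT 1 FROM sometable WHERE main_id = 1 AND value = 42);
```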

Failing that, you can and should use an array field instead of a comma-separated list. Using a comma-separated list is just downright horrid from a design point of view. It makes queries harder to write and more error-prone, forces you to do lots of slow and inefficient string manipulation and number parsing just for simple operations, and prevents any kind of integrity checking without very inefficient CHECK constraints or triggers. I think Bill nailed it with:

1,2,3,banana,5

With an array you can use the intarray extension to provide a GiST index that lets you quickly test if the array contains a given value using an indexable @> or <@ operation. You may want to add the btree_gist extension too, so you can create a composite GiST index of main_id, the_values_array, in case your queries are usually of the form:

WHERE main_id = blah AND the_values_array @> ARRAY[42]

(or have two separate indexes and see if it'll do a bitmap index scan).
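
The composite GiST index could be set up roughly like this. It is a sketch under assumptions: the table sometable_arr and the index name are hypothetical, and both extensions must be available on the server and installable by the current role.

```sql
-- Both extensions are part of the standard contrib modules.
CREATE EXTENSION IF NOT EXISTS intarray;
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical table holding the values as an integer array.
CREATE TEMP TABLE sometable_arr (
   main_id          int   NOT NULL
 , the_values_array int[] NOT NULL
);

-- Composite GiST index: btree_gist covers the int column, intarray the int[] column.
CREATE INDEX sometable_arr_gist_idx ON sometable_arr
   USING gist (main_id, the_values_array gist__int_ops);

SELECT * FROM sometable_arr WHERE main_id = 1 AND the_values_array @> ARRAY[42];
```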

You can't enforce a foreign key relationship into an array in PostgreSQL yet, though the feature seems to be on the way. You'd need to do it with somewhat complicated triggers in the meantime. Still, it's a lot better than a comma-separated list.
