PostgreSQL – Create Array from Multiple Columns Without NULLs

arrayjsonpostgresqlpostgresql-9.4

I'm trying to build a query to aggregate together multiple columns in a legacy table stored in a similar structure as below:

CREATE TEMPORARY TABLE foo AS
SELECT * FROM ( VALUES
  (1,'Router','Networking','Sale',NULL),
  (2,NULL,'Router','Networking','Sale'),
  (3,NULL,NULL,'Networking','Sale'),
  (4,NULL,NULL,NULL,NULL)
) AS t(id,tag_1,tag_2,tag_3,tag_4);

An example of NOT WHAT I WANT

This is an example of the query I want to build:

SELECT ID, json_build_array(Tag_1, Tag_2, Tag_3, Tag_4) AS tags
FROM table

The problem is that the above query adds the NULL values from the rows to the array:

ID  Tags
--------------------------------------------------
1   ['Router', 'Networking', 'Sale', null]
2   [null, 'Router', 'Networking', 'Sale']
3   [null, null, 'Networking', 'Sale']
4   [null, null, null, null]

I want to avoid having to write an overly complicated CASE WHEN statement to filter out the NULLs and I'm still new to working PostgreSQL's JSON datatypes. Is there anyway I can avoid including NULLs when building a JSON array in Postgres?

Best Answer

I would suggest not using a JSON array, and instead using the native SQL array syntax which is likely much faster and more efficiently stored. It's also stronger typed. The JSON array is "possibly-heterogeneously-typed" per the docs.

I also wouldn't do this routinely. I would alter the table's schema to have an ARRAY (preferably SQL) on the table itself to store the tags without ever storing null in columns. This can put you down the path of correcting the schema.

Building arrays

Strictly-typed PostgreSQL array

Just use the ARRAY literal constructor.

 SELECT id, ARRAY[tag_1,tag_2,tag_3,tag_4] FROM foo;
 id |             array             
----+-------------------------------
  1 | {Router,Networking,Sale,NULL}
  2 | {NULL,Router,Networking,Sale}
  3 | {NULL,NULL,Networking,Sale}
  4 | {NULL,NULL,NULL,NULL}

A JSON array

SELECT id, json_build_array(tag_1,tag_2,tag_3,tag_4) FROM foo;
 id |            json_build_array            
----+----------------------------------------
  1 | ["Router", "Networking", "Sale", null]
  2 | [null, "Router", "Networking", "Sale"]
  3 | [null, null, "Networking", "Sale"]
  4 | [null, null, null, null]
(4 rows)

Filtering Nulls without manual coalesce

Strictly-typed PostgreSQL array

You can easily filter nulls in a single pass by wrapping the above in array_remove.

SELECT id, array_remove(ARRAY[tag_1,tag_2,tag_3,tag_4], null)
FROM foo;

 id |       array_remove       
----+--------------------------
  1 | {Router,Networking,Sale}
  2 | {Router,Networking,Sale}
  3 | {Networking,Sale}
  4 | {}

JSON array

SELECT id,jsonb_agg(elem)
FROM (SELECT id, ARRAY[tag_1,tag_2,tag_3,tag_4] FROM foo) AS g
CROSS JOIN LATERAL unnest(g.array)
  WITH ORDINALITY AS t(elem,ord)
WHERE elem IS NOT NULL
GROUP BY id
ORDER BY id;

 id |            jsonb_agg             
----+----------------------------------
  1 | ["Router", "Networking", "Sale"]
  2 | ["Router", "Networking", "Sale"]
  3 | ["Networking", "Sale"]