JSONB may be easy to read, but it's complicated and inefficient to write into.
See for example this question on what that looks like: PostgreSQL update and delete property from JSONB column. It's an order of magnitude harder than an update or delete with classic EAV tables.
Chances are that once you've written the parts that append, merge, and delete key/value pairs, the elegance or simplicity of that JSON-based design will seem much less obvious. As for the performance, I'm betting on it being much worse.
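To make the comparison concrete, here is a sketch of the EAV side (the entity_attributes table and its columns are invented for illustration):
-- with a classic EAV layout, removing one property is a single targeted statement:
DELETE FROM entity_attributes
WHERE entity_id = 1 AND attr_key = 'color';
With JSONB on 9.4, the equivalent means rebuilding the whole document server- or client-side.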
At the storage level, any write to a single property inside a JSON structure requires a rewrite of the entire column (and in fact of the containing row), at the same cost as if every property had changed.
This is not optimal in terms of I/O volume or the pressure it puts on vacuum.
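As an illustration (jsonb_set only landed in core with 9.5, and the docs table here is hypothetical):
-- only the key "a" changes, but the entire jsonb datum is rewritten,
-- along with a new version of the containing row:
UPDATE docs
SET data = jsonb_set(data, '{a}', '42'::jsonb)
WHERE id = 1;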
This problem is mentioned in another Stack Overflow question, with some more links and references:
How to perform update operations on columns of type JSONB in Postgres 9.4
There's ongoing work to ease JSONB updates for the programmer. The jsonbx extension provides functions and operators that can help on 9.4; presumably these will be integrated into PostgreSQL core in future versions. But the large I/O cost of a small update inside a large JSON object will remain.
jsonbx was demonstrated at a recent PostgreSQL conference (YouTube links):
Update and Delete operations for jsonb (part 1 of 2)
Update and Delete operations for jsonb (part 2 of 2)
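For a taste of what those operations look like, here are the operators that eventually landed in core PostgreSQL 9.5; jsonbx offered equivalents on 9.4, though the exact names there may have differed:
SELECT '{"a": 1, "b": 2}'::jsonb || '{"b": 3}'::jsonb;  -- merge/overwrite keys: {"a": 1, "b": 3}
SELECT '{"a": 1, "b": 2}'::jsonb - 'b';                 -- delete a key: {"a": 1}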
Just a slight variation on Chris's answer:
SELECT a, translate(b, '[]', '{}')::text[] AS b, d
FROM json_to_record('{"a": 1, "b": ["hello", "There"], "c": "bar"}')
AS x(a int, b text, d text);
The idea is the same: massage the JSON array into a Postgres array - in this case, through an array literal. In addition to somewhat cleaner-looking code (much as I love regex, it usually doesn't help in that regard :), it seems slightly faster, too:
CREATE TABLE jsonb_test (
    id serial,
    data jsonb
);
INSERT INTO jsonb_test (id, data)
SELECT i, format('{"a": %s, "b": ["foo", "bar"], "c": "baz"}', i::text)::jsonb
FROM generate_series(1,10000) t(i);
SELECT a, string_to_array(regexp_replace(b, '\[*\"*\s*\]*','','g'),',') AS b, d
FROM jsonb_test AS j,
LATERAL json_to_record(j.data::json) AS r(a int, b text, d text);
-- versus
SELECT a, translate(b, '[]', '{}')::text[] AS b, d
FROM jsonb_test AS j,
LATERAL json_to_record(j.data::json) AS r(a int, b text, d text);
On this dataset and on my test box, the regex version shows an average execution time of 300 ms, while my version shows 210 ms.
Best Answer
I think this can be done using a custom aggregate and existing jsonb functionality:
The following aggregate will "merge" two jsonb values into one. It is essentially the same as a || b (where a and b are jsonb values), but as an aggregate over multiple rows.
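A minimal sketch of such an aggregate (the names jsonb_merge_state and jsonb_merge are mine; note that || on jsonb is a 9.5 operator, so on 9.4 you would need jsonbx or a hand-written merge function):
-- state function: fold one jsonb object into the accumulator
CREATE OR REPLACE FUNCTION jsonb_merge_state(acc jsonb, val jsonb)
RETURNS jsonb LANGUAGE sql IMMUTABLE AS $$
    SELECT acc || val;  -- concatenates objects; the right side overwrites duplicate keys
$$;

CREATE AGGREGATE jsonb_merge(jsonb) (
    SFUNC    = jsonb_merge_state,
    STYPE    = jsonb,
    INITCOND = '{}'
);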
With that aggregate it's possible to write a recursive function that removes keys with null values: an inner SELECT turns a jsonb value into a set of key/value pairs and filters out the null values, recursing into nested objects, and the resulting values are then aggregated back into a single jsonb object.
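A sketch of what that function could look like, assuming the jsonb_merge aggregate above (the name strip_nulls is hypothetical; jsonb_build_object shipped with 9.5, on 9.4 json_build_object(...)::jsonb would serve instead):
-- recursively drop keys whose value is JSON null
CREATE OR REPLACE FUNCTION strip_nulls(value jsonb)
RETURNS jsonb LANGUAGE plpgsql AS $$
BEGIN
    RETURN (
        SELECT jsonb_merge(jsonb_build_object(t.key,
                   CASE jsonb_typeof(t.value)
                       WHEN 'object' THEN strip_nulls(t.value)  -- recurse into nested objects
                       ELSE t.value
                   END))
        FROM jsonb_each(value) AS t(key, value)
        WHERE jsonb_typeof(t.value) <> 'null'  -- the filter on null values
    );
END;
$$;

SELECT strip_nulls('{"a": 1, "b": null, "c": {"d": null, "e": 2}}'::jsonb);
-- expected: {"a": 1, "c": {"e": 2}}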
The result can be cast back to json if needed.
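For example (again using the hypothetical strip_nulls):
SELECT strip_nulls('{"a": null, "b": 1}'::jsonb)::json;
-- expected: {"b": 1}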
I don't have a Postgres 9.4 installation around for testing, but I think I only used functions available in 9.4.