Postgresql – How to select subset JSON with PostgresQL

jsonpostgresqlpostgresql-performancesubquery

I'm trying to deal with a JSON column that does not have an explicit JSON schema or convention. There are varying keys within the JSON. Some keys do not exist, some keys do exist but empty string values.

For example:

CREATE TABLE json_table (
    id         int,
    json_data   json
);

INSERT INTO json_table (id, json_data) VALUES
  (1, '{"foo" : "bar", "baz": "biz"}');
  (2, '{"foo" : "", "baz": "biz"}');
  (3, '{"hello" : "world"}');
  (4, '{"hello" : "world2", "foo" : "bar2", "baz" : "" }');
  1. Is it possible to query this table and select/yield a subset of json_data with only key-value pairs that are non-empty strings?

  2. Is it possible to query this table and select records that contain JSON keys that are within a given list? (I essentially want to find records with json_data that have the key "foo")

For #2 I do have a solution that selects only records where json_data keys are within a regex pattern:

SELECT tmp.* 
FROM (
    SELECT id, 
           ARRAY(SELECT json_object_keys(json_data))::text AS keys
    FROM json_table
) tmp 
WHERE tmp.keys LIKE ANY(ARRAY['%foo%', '%bar%']);

This yields:

"id","keys"
1,"{foo,baz}"
2,"{foo,baz}"
4,"{hello,foo,baz}"

But there's a caveat! I'm returning records that DO have the keys, but empty string values! Is there a way to incorporate a solution to #1 into this and pre-filter the JSON to only "valid" key-value pairs?

Best Answer

You can remove the empty values during aggregation. And use an array operator in the final select to get only rows that contain a specific key:

SELECT tmp.* 
FROM (
  SELECT id, 
         ARRAY(SELECT t.k
               from json_each_text(json_data) as t(k,v)
               where nullif(trim(t.v),'') is not null) AS keys
  FROM json_table
) tmp 
WHERE keys && array['foo','bar']
;

If you want the complete JSON values, rather than just the keys, you can use something like this:

SELECT tmp.* 
FROM (
  SELECT id, 
         (SELECT jsonb_object_agg(t.k, t.v)
               from json_each_text(json_data) as t(k,v)
               where nullif(trim(t.v),'') is not null) AS keys
  FROM json_table
) tmp 
WHERE keys ?| array['foo','bar']
;

(Note that I am using jsonb for the result in this case as that is much more flexible to handle. In general it's recommended to prefer jsonb over json these days)


If you need to cleanup the json frequently, you might want to consider writing a function for that:

create function non_empty_values(p_input jsonb)
  returns jsonb
as
$$
  SELECT jsonb_object_agg(t.k, t.v)
  from jsonb_each(jsonb_strip_nulls(p_input)) as t(k,v)
  where trim(t.v::text) <> '';
$$
language sql
immutable;