PostgreSQL 9.4 deep merge jsonb values

jsonpostgresqlpostgresql-9.4

Requirements and situation

We're currently using a JSONB column to speed up somewhat arbitrary searches in our database and this is working fine so far. When updating the data the requirement is as follows:

  • Any new value will overwrite any existing value. This includes nulls and arrays.
  • Any (nested) objects will be merged.
  • Null values as well as empty arrays and objects will be removed (should they exist at all).

To illustrate that, consider this example:

Existing (contains a null value for illustration purposes):

{
  "a":null, 
  "b":1, 
  "c":1,
  "f":1, 
  "g": { 
     "nested": 1 
  }
}

This should be merged into the existing object:

{
  "b":2, 
  "d":null, 
  "e":2, 
  "f":null, 
  "g":{
    "nested": 2
  }
}

As you can see we're overriding a few fields and remove f. So the expected output would be:

{
  "b": 2, //overridden
  "c": 1, //kept
  "e": 2, //added
  "g": {
    "nested": 2 //overridden
  }
}

To achieve this we're using the following function:

CREATE OR REPLACE FUNCTION jsonb_merge(jsonb1 JSONB, jsonb2 JSONB) 
RETURNS JSONB LANGUAGE sql IMMUTABLE
AS $$
  SELECT  
    CASE    
      WHEN jsonb_typeof($1) = 'object' AND jsonb_typeof($2) = 'object' THEN
        (
          SELECT jsonb_object_agg(merged.key, merged.value) FROM 
          (
            SELECT
              COALESCE( p1.key, p2.key ) as key,          
              CASE 
                WHEN p1.key IS NULL then p2.value
                WHEN p2.key IS NULL THEN  p1.value
                ELSE jsonb_merge(  p1.value, p2.value ) 
              END AS value 
            FROM  jsonb_each($1) p1 
            FULL OUTER JOIN jsonb_each($2) p2 ON p1.key = p2.key                 
          ) AS merged
          -- Removing this condition reduces runtime by 70%
          WHERE NOT (merged.value IS NULL OR merged.value in ( '[]', 'null', '{}') )
        ) 
      WHEN jsonb_typeof($2) = 'null' OR (jsonb_typeof($2) = 'array' AND jsonb_array_length($2) < 1) OR $2 = '{}' THEN
        NULL
      ELSE    
        $2    
    END     
$$;

Problem and question

As I said this works quite well from a functional point of view. However, that function is very slow.

One finding was that the condition on merged.value slows the query down. Removing it results in about 70% lower execution time but obviously the result is not as required.

So how could we achieve a fast deep merge of jsonb objects?

Please note that the Postgres 9.5 || operator doesn't work as intended, i.e. it keeps the unwanted elements. While we could use the special remove operations that would complicate our queries and I'm not sure that would be faster.

Considered options

So far we considered the following (unsatisfying) options:

  • Upgrade our 9.4 server to 9.5 or 9.6. Problem: the new operator doesn't work the way we need it to, so we'd still have to use a function or heavily refactor our queries. Additionally upgrading or even restarting our production servers is something we're meant to avoid as much as possible.
  • Use some scripting language like Python etc. Again there's the problem of having to avoid server restarts. Additionally we'd have to do a full security review first.

That being said we'd like to solve the problem on Postgres 9.4 and using SQL or PL/pgSQL if possible.

Update

I experimented a little further and found the following function pass my tests and be way faster than my previous (10x). I'm pretty sure this works as intended but since I'm no database expert any second look is welcome:

CREATE OR REPLACE FUNCTION jsonb_update(jsonb1 JSONB, jsonb2 JSONB)
RETURNS JSONB  LANGUAGE sql IMMUTABLE
AS $$
  SELECT json_object_agg(merged.key, merged.value)::jsonb FROM 
  (
    WITH existing_object AS (
      SELECT key, value FROM jsonb_each($1) 
        WHERE NOT (value IS NULL OR value in ('[]', 'null', '{}') ) 
    ),
    new_object AS (
      SELECT key, value FROM jsonb_each($2)
    ),
    deep_merge AS (
      SELECT lft.key as key, jsonb_merge( lft.value, rgt.value ) as value
        FROM existing_object lft
        JOIN new_object rgt ON ( lft.key = rgt.key) 
          AND jsonb_typeof( lft.value ) = 'object' 
          AND jsonb_typeof( rgt.value ) = 'object' 
    )

    -- Any non-empty element of jsonb1 that's not in jsonb2 (the keepers)
    SELECT key, value FROM existing_object 
      WHERE key NOT IN (SELECT key FROM new_object )
    UNION
    -- Any non-empty element from jsonb2 that's not to be deep merged (the simple updates and new elements)
    SELECT key, value FROM new_object 
      WHERE key NOT IN (SELECT key FROM deep_merge )                 
        AND NOT (value IS NULL OR value in ('[]', 'null', '{}') ) 
    UNION
    -- All elements that need a deep merge
    SELECT key, value FROM deep_merge
  ) AS merged 
$$;  

Best Answer

I am with a_horse on this: Upgrade to Postgres 9.6 to have new options at your disposal (and for other reasons).

While stuck with 9.4, it might help to simplify like this:

CREATE OR REPLACE FUNCTION jsonb_merge2(jsonb1 JSONB, jsonb2 JSONB) 
  RETURNS JSONB LANGUAGE sql IMMUTABLE AS
$func$
SELECT
CASE    
   WHEN jsonb_typeof($1) = 'object' AND jsonb_typeof($2) = 'object' THEN
     (
       SELECT jsonb_object_agg(merged.key, merged.value)
       FROM  (
         SELECT key
              , CASE WHEN p1.value <> p2.value          -- implies both are NOT NULL
                     THEN jsonb_merge2(p1.value, p2.value) 
                     ELSE COALESCE(p2.value, p1.value)  -- p2 trumps p1
                END AS value 
         FROM   jsonb_each($1) p1 
         FULL   JOIN jsonb_each($2) p2 USING (key)      -- USING helps to simplify
         ) AS merged
       WHERE  merged.value IS NOT NULL                  -- simpler, might help query planner
       AND    merged.value NOT IN ( '[]', 'null', '{}' )
     ) 
   WHEN $2 IN ( '[]', 'null', '{}' ) THEN               -- just as simple as above
     NULL
   ELSE    
     $2    
 END
$func$;