After some processing, this boiled down to the following.
Since your predicate d."SettlementPointName" = 'John' filters a single value for "SettlementPointName" anyway, simplify to:
SELECT count( d."SettlementPointPrice" < 10.5 OR NULL) AS da_00_10
, count(d."SettlementPointPrice" >= 10.5 AND d."SettlementPointPrice" < 20.5 OR NULL) AS da_11_20
, count(d."SettlementPointPrice" >= 20.5 AND d."SettlementPointPrice" < 30.5 OR NULL) AS da_21_30
FROM public.da d
JOIN public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE d."SettlementPointName" = 'John'
AND d."DeliveryDate" >= '2015-02-01'
AND d."DeliveryDate" <= '2015-02-20'
AND r."DeliveryHour" = 14
AND date_part('hour', d."DeliveryHour") = r."DeliveryHour";
About the counting technique: count(expression OR NULL) only counts rows where the expression is true, because false OR NULL evaluates to NULL and count() ignores NULL values.
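A minimal standalone demonstration (using generate_series() just to produce test rows):
SELECT count(i < 3 OR NULL) AS below_3  -- returns 2
FROM   generate_series(1, 5) i;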
Or better yet, use the new aggregate FILTER clause in pg 9.4:
SELECT d."SettlementPointName"
, count(*) FILTER (WHERE d."SettlementPointPrice" < 10.5) AS da_00_10
, count(*) FILTER (WHERE d."SettlementPointPrice" >= 10.5
AND d."SettlementPointPrice" < 20.5) AS da_11_20
, count(*) FILTER (WHERE d."SettlementPointPrice" >= 20.5
AND d."SettlementPointPrice" < 30.5) AS da_21_30
FROM public.da d
JOIN public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE d."DeliveryDate" >= '2015-02-01'
AND d."DeliveryDate" <= '2015-02-20'
AND r."DeliveryHour" = 14
AND date_part('hour', d."DeliveryHour") = r."DeliveryHour"
GROUP BY 1;
This time, selecting all names and returning one row per name like you asked in the comment.
Details for the FILTER clause are in the manual.
Actually, this is all you need:
NEW := jsonb_populate_record(NEW, NEW.json);
Per documentation:
jsonb_populate_record(base anyelement, from_json jsonb)
Expands the object in from_json to a row whose columns match the record type defined by base (see note below).
What's not documented: The row provided as first argument retains all values that are not overwritten (no matching key in the json value). I see no reason why this should change, but you cannot fully rely on it unless it's documented.
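A minimal sketch of that behavior, using a throwaway composite type (type name and values made up for illustration):
CREATE TYPE t_demo AS (a int, b text);
SELECT * FROM jsonb_populate_record(ROW(1, 'keep me')::t_demo, '{"a": 2}');
-- a = 2, b = 'keep me'  -- "b" is retained because the JSON has no "b" key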
One thing to note: you wrote:
Setting the column to NULL if the corresponding json field does not exist is fine.
But this solution retains the existing values of columns with no matching key in the JSON value, which should be even better.
If "undocumented" is too uncertain for you, use the hstore
operator #=
doing exactly the same.
NEW := (NEW #= hstore(jsonb_populate_record(NEW, NEW.json)));
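The operator's effect in isolation, reusing the throwaway type from the sketch above (again, just an illustration):
SELECT ROW(1, 'keep me')::t_demo #= hstore('a', '2');
-- (2,"keep me")  -- only the field named in the hstore is replaced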
The hstore module should be installed in most systems anyway; if it isn't, run CREATE EXTENSION hstore once per database.
Both solutions can be derived from my answer that Daniel already referenced.
Function code
CREATE OR REPLACE FUNCTION json_fn()
RETURNS TRIGGER AS
$func$
BEGIN
NEW := jsonb_populate_record(NEW, NEW.json); -- or hstore alternative
RETURN NEW;
END
$func$ LANGUAGE plpgsql;
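For completeness, the trigger wiring it up would look like this (a sketch; the trigger name is made up, and your setup presumably has an equivalent already):
CREATE TRIGGER testy_json_populate
BEFORE INSERT OR UPDATE ON testy
FOR EACH ROW EXECUTE PROCEDURE json_fn();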
Everything else in your setup looks right; just add a PK to testy:
CREATE TABLE testy (
id int PRIMARY KEY REFERENCES testy_index
, data jsonb NOT NULL
);
Tested in pg 9.4 and it works for me as advertised. I doubt that the PLv8 function can rival this for performance and simplicity.
Set other columns to NULL
As per comment:
CREATE OR REPLACE FUNCTION json_fn()
RETURNS TRIGGER AS
$func$
DECLARE
_j jsonb := NEW.json; -- remember the json value
BEGIN
NEW := jsonb_populate_record(NULL::testy, _j);
NEW.json := _j; -- reassign
RETURN NEW;
END
$func$ LANGUAGE plpgsql;
Obviously, you need to make sure that the column name of your jsonb column does not appear as a key name in the JSON value. And I wouldn't use json as a column name, since it's a data type name and that can get confusing.
Best Answer
You first need to normalize the JSON data into a result that can be aggregated: one row per source row and key.
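A sketch of this step, assuming a table tbl (id int, data jsonb) holding flat JSON objects with numeric values (table, columns and data are made up for illustration):
SELECT t.id, j.key, j.value
FROM   tbl t, jsonb_each_text(t.data) j;  -- implicit LATERAL join, one row per key
-- e.g. for a row (1, '{"a": 1, "b": 2}') this returns:
-- id | key | value
--  1 | a   | 1
--  1 | b   | 2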
This can now be aggregated per key:
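Continuing the sketch, e.g. summing values per key (which aggregate you use depends on what you actually need):
SELECT j.key, sum(j.value::numeric) AS total
FROM   tbl t, jsonb_each_text(t.data) j
GROUP  BY j.key;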
This can now be converted back and aggregated into a single JSON value:
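Completing the sketch: jsonb_object_agg() (Postgres 9.5 or later) folds the key/total pairs back into one jsonb value; on 9.4, json_object_agg() does the same for json:
SELECT jsonb_object_agg(key, total) AS result
FROM  (
   SELECT j.key, sum(j.value::numeric) AS total
   FROM   tbl t, jsonb_each_text(t.data) j
   GROUP  BY j.key
   ) sub;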