PostgreSQL – Aggregate a JSON field by averaging values without defining keys

aggregate json postgresql

I would like to know whether PostgreSQL offers a way, or an existing function, to aggregate a JSON field by averaging the values of any key that appears more than once. The keys of the resulting JSON are unknown in advance: each key has to be added to the result when it is encountered during aggregation. The keys can't be defined in the query.

datetime             data
2018-05-06 18:17:00  {"speed":23.3,"orientation":1.3,"o2":75.2,"pm25":12.1}
2018-05-06 19:17:00  {"speed":20.3,"pm25":13.1}
2018-05-07 15:02:00  {"speed":21.3,"orientation":1.3,"pm10":72.2}

Suppose we want the result aggregated by day.

The desired result:

day         data
2018-05-06  {"speed":21.8,"orientation":1.3,"o2":75.2,"pm25":12.6}
2018-05-07  {"speed":21.3,"orientation":1.3,"pm10":72.2}

The first two rows have been aggregated by averaging the keys they share, and every key that appears in either row has been kept.

Best Answer

I don't know how to do this, so I'll learn as we go!

Test table and data:

postgres=# CREATE TABLE testtable (
postgres(#     dt date,
postgres(#     data jsonb
postgres(# );
CREATE TABLE
postgres=# 
postgres=# INSERT INTO testtable 
postgres-# VALUES 
postgres-# ('2018-05-06 18:17:00','{"speed":23.3,"orientation":1.3,"o2":75.2,"pm25":12.1}'),
postgres-# ('2018-05-06 19:17:00','{"speed":20.3,"pm25":13.1}'),
postgres-# ('2018-05-07 15:02:00','{"speed":21.3,"orientation":1.3,"pm10":72.2}');
INSERT 0 3
postgres=#

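First, we don't need the time component of the data, so we cast to date. (The test table above already declares dt as date, so the cast is a no-op here; it matters when the column is a timestamp, as in the question.)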

postgres=# select dt::date from testtable ;
     dt     
------------
 2018-05-06
 2018-05-06
 2018-05-07
(3 rows)

postgres=# 

Next, we want to decompose the JSON object into its individual elements:

postgres=# SELECT dt::date, jsonb_each_text(data)
FROM testtable;
     dt     |  jsonb_each_text  
------------+-------------------
 2018-05-06 | (o2,75.2)
 2018-05-06 | (pm25,12.1)
 2018-05-06 | (speed,23.3)
 2018-05-06 | (orientation,1.3)
 2018-05-06 | (pm25,13.1)
 2018-05-06 | (speed,20.3)
 2018-05-07 | (pm10,72.2)
 2018-05-07 | (speed,21.3)
 2018-05-07 | (orientation,1.3)
(9 rows)

postgres=# 
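
One caveat, since the keys are not known in advance: the numeric casts used in the later steps will fail if any value is a string, boolean, or nested object. The sample data only contains numbers, so the following is a sketch under the assumption that mixed types could appear. It uses jsonb_each (which keeps each value as jsonb) together with jsonb_typeof to keep only the numeric values:

```sql
-- Keep only the key/value pairs whose value is a JSON number.
-- jsonb_each returns value as jsonb, so jsonb_typeof can inspect it.
SELECT dt::date AS day,
       j.key,
       (j.value #>> '{}')::numeric AS value  -- extract the scalar as text, then cast
FROM testtable, jsonb_each(data) AS j
WHERE jsonb_typeof(j.value) = 'number';
```

Whether silently skipping non-numeric values is the right behaviour depends on your data; raising an error may be preferable.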

Now split the key/value pairs above into separate columns by calling jsonb_each_text in the FROM clause:

postgres=# SELECT dt::date, jsondata.key, jsondata.value 
postgres-# FROM testtable, jsonb_each_text(data) as jsondata; 
     dt     |     key     | value 
------------+-------------+-------
 2018-05-06 | o2          | 75.2
 2018-05-06 | pm25        | 12.1
 2018-05-06 | speed       | 23.3
 2018-05-06 | orientation | 1.3
 2018-05-06 | pm25        | 13.1
 2018-05-06 | speed       | 20.3
 2018-05-07 | pm10        | 72.2
 2018-05-07 | speed       | 21.3
 2018-05-07 | orientation | 1.3
(9 rows)

postgres=#
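
The comma between testtable and jsonb_each_text(data) above is an implicit lateral join: the function is evaluated once for each row of the table. As a sketch, the same query can be spelled out explicitly; the aliases t and j and the column name day are only illustrative:

```sql
-- Equivalent to the comma-join form: jsonb_each_text is evaluated
-- once per row of testtable and may reference that row's columns.
SELECT t.dt::date AS day, j.key, j.value
FROM testtable AS t
CROSS JOIN LATERAL jsonb_each_text(t.data) AS j(key, value);
```

Both forms should return the same nine rows; the explicit version just makes the per-row dependency visible.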

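Aggregate the above data per day and key. This session uses sum(), so the figures below are totals; replacing sum() with avg() gives the averages the question asks for: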

postgres=# SELECT dt::date, jsondata.key, sum(jsondata.value::float) as total
postgres-# FROM testtable, jsonb_each_text(data) as jsondata
postgres-# Group by dt::date, jsondata.key; 
     dt     |     key     | total 
------------+-------------+-------
 2018-05-06 | o2          |  75.2
 2018-05-06 | speed       |  43.6
 2018-05-07 | orientation |   1.3
 2018-05-07 | speed       |  21.3
 2018-05-06 | orientation |   1.3
 2018-05-07 | pm10        |  72.2
 2018-05-06 | pm25        |  25.2
(7 rows)

postgres=# 
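
Since the question asks for averages rather than totals, here is a minimal sketch of the same step with avg(). Casting to numeric instead of float is my own choice, made to avoid binary floating-point rounding in the result. The ORDER BY is also an addition, because GROUP BY alone does not guarantee row order (as the unordered output above shows):

```sql
-- Average each key per day instead of summing it.
SELECT dt::date AS day,
       j.key,
       avg(j.value::numeric) AS average
FROM testtable, jsonb_each_text(data) AS j
GROUP BY dt::date, j.key
ORDER BY day, j.key;
```

Keys that occur only once in a day (such as o2 on 2018-05-06) simply average to their single value.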

Now turn the above result set into JSON with jsonb_object_agg (the values are still the sums from the previous step, not averages):

postgres=# With sourcedata as ( 
postgres(# SELECT dt::date as dt, jsondata.key as key, sum(jsondata.value::float) as total
postgres(# FROM testtable, jsonb_each_text(data) as jsondata
postgres(# Group by dt::date, jsondata.key
postgres(# )
postgres-# Select dt, jsonb_object_agg(key, total)
postgres-# From sourcedata
postgres-# Group by dt; 
     dt     |                       jsonb_object_agg                        
------------+---------------------------------------------------------------
 2018-05-07 | {"pm10": 72.2, "speed": 21.3, "orientation": 1.3}
 2018-05-06 | {"o2": 75.2, "pm25": 25.2, "speed": 43.6, "orientation": 1.3}
(2 rows)

postgres=#
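
Putting the steps together with avg() in place of sum() should give the per-day averages the question asks for. This is a sketch, not a tested production query: the CTE name per_key, the output column names, and the rounding to two decimals are all assumptions of mine.

```sql
-- Step 1: explode each JSON object into (day, key, value) rows and
--         average the values per day and key.
-- Step 2: fold the per-key averages back into one JSON object per day.
WITH per_key AS (
    SELECT dt::date AS day,
           j.key,
           round(avg(j.value::numeric), 2) AS average  -- 2 decimals is arbitrary
    FROM testtable, jsonb_each_text(data) AS j
    GROUP BY dt::date, j.key
)
SELECT day, jsonb_object_agg(key, average) AS data
FROM per_key
GROUP BY day
ORDER BY day;
```

No key names appear anywhere in the query, so any new key that turns up in the data is picked up automatically.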