PostgreSQL – Aggregate a JSON column by averaging values without defining the keys

aggregate, json, postgresql

I would like to know whether PostgreSQL has a way, or an existing function, to aggregate a JSON field by averaging the values of any key that appears more than once. The keys of the resulting JSON are unknown in advance: each key must be added to the result when it is encountered during aggregation, so the keys cannot be listed in the query.

datetime             data
2018-05-06 18:17:00  {"speed":23.3,"orientation":1.3,"o2":75.2,"pm25":12.1}
2018-05-06 19:17:00  {"speed":20.3,"pm25":13.1}
2018-05-07 15:02:00  {"speed":21.3,"orientation":1.3,"pm10":72.2}

Suppose we want the result aggregated by day.

The desired result:

day         data
2018-05-06  {"speed":21.8,"orientation":1.3,"o2":75.2,"pm25":12.6}
2018-05-07  {"speed":21.3,"orientation":1.3,"pm10":72.2}

The first two rows have been aggregated by averaging the keys they share, while every key that appears in either row is kept.

Best Answer

I don't know how to do this, so I'll learn as we go!

Test table and data:

postgres=# CREATE TABLE testtable (
postgres(#     dt timestamp,
postgres(#     data jsonb
postgres(# );
CREATE TABLE
postgres=# 
postgres=# INSERT INTO testtable 
postgres-# VALUES 
postgres-# ('2018-05-06 18:17:00','{"speed":23.3,"orientation":1.3,"o2":75.2,"pm25":12.1}'),
postgres-# ('2018-05-06 19:17:00','{"speed":20.3,"pm25":13.1}'),
postgres-# ('2018-05-07 15:02:00','{"speed":21.3,"orientation":1.3,"pm10":72.2}');
INSERT 0 3
postgres=#

First, we don't need the time component of the data, so we cast dt to a date:

postgres=# select dt::date from testtable ;
     dt     
------------
 2018-05-06
 2018-05-06
 2018-05-07
(3 rows)

postgres=#

Next, we want to decompose the JSON object to its individual elements:

postgres=# SELECT dt::date, jsonb_each_text(data)
postgres-# FROM testtable;
     dt     |  jsonb_each_text  
------------+-------------------
 2018-05-06 | (o2,75.2)
 2018-05-06 | (pm25,12.1)
 2018-05-06 | (speed,23.3)
 2018-05-06 | (orientation,1.3)
 2018-05-06 | (pm25,13.1)
 2018-05-06 | (speed,20.3)
 2018-05-07 | (pm10,72.2)
 2018-05-07 | (speed,21.3)
 2018-05-07 | (orientation,1.3)
(9 rows)

postgres=#

Now split the key/value pairs from the above into separate columns:

postgres=# SELECT dt::date, jsondata.key, jsondata.value 
postgres-# FROM testtable, jsonb_each_text(data) as jsondata; 
     dt     |     key     | value 
------------+-------------+-------
 2018-05-06 | o2          | 75.2
 2018-05-06 | pm25        | 12.1
 2018-05-06 | speed       | 23.3
 2018-05-06 | orientation | 1.3
 2018-05-06 | pm25        | 13.1
 2018-05-06 | speed       | 20.3
 2018-05-07 | pm10        | 72.2
 2018-05-07 | speed       | 21.3
 2018-05-07 | orientation | 1.3
(9 rows)

postgres=#
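
The comma in the FROM clause above is an implicit lateral join: jsonb_each_text(data) is evaluated once for each row of testtable. A minimal sketch of the same query with the join spelled out, assuming the testtable defined earlier (the aliases t and j and the ORDER BY are additions made here for readability):

```sql
-- Same rows as the comma form: the set-returning function runs once per row of testtable.
SELECT t.dt::date AS day, j.key, j.value
FROM testtable AS t
CROSS JOIN LATERAL jsonb_each_text(t.data) AS j(key, value)
ORDER BY day, j.key;   -- without ORDER BY the row order is not guaranteed
```

Writing the join explicitly changes nothing about the result; it only makes it clearer where the key and value columns come from.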

Aggregate the above data, averaging the values for each day and key as the question asks:

postgres=# SELECT dt::date, jsondata.key, avg(jsondata.value::float) as average
postgres-# FROM testtable, jsonb_each_text(data) as jsondata
postgres-# Group by dt::date, jsondata.key;
     dt     |     key     | average 
------------+-------------+---------
 2018-05-06 | o2          |    75.2
 2018-05-06 | speed       |    21.8
 2018-05-07 | orientation |     1.3
 2018-05-07 | speed       |    21.3
 2018-05-06 | orientation |     1.3
 2018-05-07 | pm10        |    72.2
 2018-05-06 | pm25        |    12.6
(7 rows)

postgres=#

Now turn the above result set into JSON:

postgres=# With sourcedata as (
postgres(# SELECT dt::date as dt, jsondata.key as key, avg(jsondata.value::float) as average
postgres(# FROM testtable, jsonb_each_text(data) as jsondata
postgres(# Group by dt::date, jsondata.key
postgres(# )
postgres-# Select dt, jsonb_object_agg(key, average)
postgres-# From sourcedata
postgres-# Group by dt;
     dt     |                       jsonb_object_agg                        
------------+---------------------------------------------------------------
 2018-05-07 | {"pm10": 72.2, "speed": 21.3, "orientation": 1.3}
 2018-05-06 | {"o2": 75.2, "pm25": 12.6, "speed": 21.8, "orientation": 1.3}
(2 rows)

postgres=#
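
The cast to float in the queries above raises an error if any key holds a non-numeric value such as a string or a nested object. The following is a sketch, not a tested drop-in, of a single statement that averages only the numeric values per day and key, assuming the testtable defined earlier. The aliases t, j and per_key and the column name average are illustrative choices made here.

```sql
SELECT day, jsonb_object_agg(key, average) AS data
FROM (
    SELECT t.dt::date AS day,
           j.key,
           -- #>> '{}' extracts the jsonb scalar as text so it can be cast to float
           avg((j.value #>> '{}')::float) AS average
    FROM testtable AS t
    CROSS JOIN LATERAL jsonb_each(t.data) AS j(key, value)
    -- keep numbers only; strings, booleans, nulls, arrays and objects are skipped
    WHERE jsonb_typeof(j.value) = 'number'
    GROUP BY t.dt::date, j.key
) AS per_key
GROUP BY day
ORDER BY day;
```

jsonb_each is used instead of jsonb_each_text so that jsonb_typeof can inspect each value before it is cast. A key whose values are all non-numeric on a given day simply does not appear in that day's object.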

Related Solutions

MySQL – group by clause without aggregate function

This:

SELECT users.* FROM users
INNER JOIN timesheets ON timesheets.user_id = users.id
WHERE (timesheets.submitted_at <= '2010-07-06 15:27:05.117700')
GROUP BY users.id

Finds all users who have a timesheet submitted on or before the given date. It's equivalent to:

SELECT DISTINCT users.* FROM users
INNER JOIN timesheets ON timesheets.user_id = users.id
WHERE (timesheets.submitted_at <= '2010-07-06 15:27:05.117700');

or:

SELECT  users.*
FROM users
WHERE EXISTS (
    SELECT 1
    FROM timesheets 
    WHERE timesheets.user_id = users.id
    AND timesheets.submitted_at <= '2010-07-06 15:27:05.117700'
);

It works because users.id is the primary key, so all other fields of users are functionally dependent on it. PostgreSQL knows that you don't have to use an aggregate to guarantee a single unambiguous result for each field in a row because there can only be one candidate users.name or whatever for any given users.id row.

(Older PostgreSQL versions didn't know how to identify functional dependencies on the primary key and would throw an ERROR about needing to use an aggregate or include the field in the GROUP BY here.)
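
To make the functional-dependency rule concrete, here is a small sketch. The schema is hypothetical and reduced to the columns the queries above mention; the real users and timesheets tables are not shown in the source. As far as I know, this behaviour is available from PostgreSQL 9.1 onwards.

```sql
-- Hypothetical minimal schema, for illustration only.
CREATE TABLE users (
    id   integer PRIMARY KEY,
    name text
);
CREATE TABLE timesheets (
    user_id      integer REFERENCES users (id),
    submitted_at timestamp
);

-- Accepted: users.name is functionally dependent on the grouped primary key.
SELECT users.id, users.name
FROM users
INNER JOIN timesheets ON timesheets.user_id = users.id
GROUP BY users.id;

-- Rejected: timesheets.submitted_at is not determined by users.id,
-- so it must be aggregated or added to the GROUP BY.
-- SELECT users.id, timesheets.submitted_at
-- FROM users
-- INNER JOIN timesheets ON timesheets.user_id = users.id
-- GROUP BY users.id;
```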

PostgreSQL – Aggregate Objects into JSON Array

If you are on 9.4, something like this might be what you are after:

select json_object(array_agg(id)::text[],array_agg(rw)::text[])
from( select id
           , ( select to_json(array_agg(row_to_json(t)))
               from (select typ,prop from bgb where id=b.id) t ) rw
      from bgb b
      group by id ) z;
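
Note that json_object(text[], text[]) emits every value as a JSON string, so the nested arrays end up quoted. If nested JSON is wanted instead, one possible alternative on 9.4 is json_object_agg together with json_agg and json_build_object. This is only a sketch, assuming the same bgb table with columns id, typ and prop as in the query above:

```sql
-- One JSON object keyed by id; each value is an array of {"typ": ..., "prop": ...} objects.
SELECT json_object_agg(id, rw) AS result
FROM (
    SELECT id,
           json_agg(json_build_object('typ', typ, 'prop', prop)) AS rw
    FROM bgb
    GROUP BY id
) AS z;
```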
