Postgresql – Merge jsonb objects and sum their values over multiple rows

postgresql

I save visits to my database. In my database I have a table with aggregations of visits called visits_agg. This table has one row per hour per hostname. Both are fields in the table which will be used to query the rows needed for viewing statistics.

For example I will query for the visits in the last 24 hours:

SELECT * FROM visits_agg WHERE hostname = 'example.com' AND hour > NOW() - INTERVAL '24 hours'

To create the structure I used this query:

CREATE TABLE IF NOT EXISTS visits_agg (
  id          serial primary key,
  hour        timestamp without time zone default (now() at time zone 'utc'),
  hostname    text not null,
  visits      integer default 0,
  referrers   jsonb not null default '{}'::jsonb,
  paths       jsonb not null default '{}'::jsonb,
  widths      jsonb not null default '{}'::jsonb
);

With some dummy data:

INSERT INTO "visits_agg" ("hour", "hostname", "visits", "referrers", "paths", "widths") VALUES
('2019-01-16 09:00:00', 'example.com', '6', '{"example2.com": 4}', '{"/": 5, "/about": 1}', '{"1280": 5, "1283": 1}'),
('2019-01-16 14:00:00', 'example.com', '7', '{"google": 6}', '{"/": 1, "/about": 3, "/article/title1": 1, "/article/title2": 2}', '{"360": 3, "1398": 1, "1680": 1, "2560": 1}'),
('2019-01-16 15:00:00', 'example.com', '6', '{"google": 6}', '{"/about": 1, "/article/title1": 1, "/article/title2": 3, "/": 1}', '{"360": 3, "1280": 2, "1920": 1}'),
('2019-01-16 16:00:00', 'example.com', '6', '{"facebook": 1, "example2.com": 1}', '{"/": 4, "/about": 1, "/article/title1": 1}', '{"360": 1, "1440": 2, "1680": 2, "1920": 1}');

I would like to get the rows from a certain period and sum all referrers, paths, and widths.

So the result of the above records should be something like this

'example.com',
'25',
'{"example2.com": 4, "google": 12, "facebook": 1}',
'{"/": 11, "/about": 6, "/article/title1": 3, "/article/title2": 5}',
'{"360": 7, "1398": 1, "1440": 2, "1680": 3, "2560": 1, "1280": 7, "1283": 1, "1920": 2}'),

The keys in the jsonb objects can be any string and the values are always numbers. I tried to merge the objects, but can't figure out how to sum the values inside the jsonb objects.

Best Answer

As I pointed out in comments I think this job would be easy to accomplish working with a disaggregated data, because now IMHO you must disaggregate it again:

For example to get aggregated values for referrers:

SELECT 
    hostname,
    r.key as ref_name,
    r.value::int as ref_value
FROM
    visits_agg
CROSS JOIN LATERAL
    jsonb_each(referrers) r;
hostname    | ref_name     | ref_value
:---------- | :----------- | --------:
example.com | example2.com |         4
example.com | google       |         6
example.com | google       |         6
example.com | facebook     |         1
example.com | example2.com |         1

Now you can get acumulated values grouped by ref_name, and then you must build the jsonb array again.

SELECT   hostname, jsonb_agg(jsonb_build_object(ref_name, ref_value)) as referrers
FROM     (SELECT hostname, r.key as ref_name, SUM(r.value::int) as ref_value
          FROM   visits_agg
          CROSS JOIN LATERAL jsonb_each(referrers) r
          GROUP BY hostname, r.key) t1
GROUP BY hostname;

This is my approach, perhaps some Postgres guru can give you a better answer.

SELECT row_to_json(t0, true)
FROM
    (
    SELECT t1.hostname, t1.visits, t2.referrers, t3.paths, t4.widths
    FROM   (SELECT   hostname, SUM(visits) as visits
            FROM     visits_agg
            GROUP BY hostname) t1
    JOIN   (SELECT   hostname, jsonb_agg(jsonb_build_object(ref_name, ref_value)) as referrers
            FROM     (SELECT hostname, r.key as ref_name, SUM(r.value::int) as ref_value
                      FROM   visits_agg
                      CROSS JOIN LATERAL jsonb_each(referrers) r
                      GROUP BY hostname, r.key) t1
           GROUP BY hostname) t2
           ON t1.hostname = t2.hostname
    JOIN   (SELECT   hostname, jsonb_agg(jsonb_build_object(path_name, path_value)) as paths
            FROM     (SELECT hostname, p.key as path_name, SUM(p.value::int) as path_value
                      FROM   visits_agg
                      CROSS JOIN LATERAL jsonb_each(paths) p
                      GROUP BY hostname, p.key) t1
            GROUP BY hostname) t3
           ON t1.hostname = t3.hostname
    JOIN   (SELECT   hostname, jsonb_agg(jsonb_build_object(width_name, width_value)) as widths
            FROM     (SELECT hostname, w.key as width_name, SUM(w.value::int) as width_value
                      FROM   visits_agg
                      CROSS JOIN LATERAL jsonb_each(widths) w
                      GROUP BY hostname, w.key) t1
            GROUP BY hostname) t4
           ON t1.hostname = t4.hostname
    ) t0;

And this is the result:

{
 "hostname":"example.com",
 "visits":25,
 "referrers":[{"google": 12}, {"facebook": 1}, {"example2.com": 5}],
 "paths":[{"/about": 6}, {"/article/title2": 5}, {"/article/title1": 3}, {"/": 11}],
 "widths":[{"1680": 3}, {"1398": 1}, {"2560": 1}, {"360": 7}, {"1280": 7}, {"1440": 2}, {"1920": 2}, {"1283": 1}]
}

db<>fiddle here