Introduction
I created a toy example to reduce the problem to a smaller, reproducible case.
Here is the schema definition with two tables, users and events; a user can have multiple events:
create table users (
id int, name text, history jsonb
);
insert into users(id, name) values (1, 'Mike'), (2, 'Jake'), (3, 'Toots');
create table events (
id serial, user_id int, external_id int, type text, timestamp timestamp
);
-- Generate test data (random offset of up to 100 days per event)
insert into events (external_id, user_id, type, timestamp)
select x % 5 + 1, x % 3 + 1, 'random', NOW() - '1 day'::interval * (random() * 100)::int
from generate_series(1, 3000000) as x;
With the actual data, the users table would be fairly small (fewer than ~1M records), while the events table would be quite large (~25M records).
Problem
Update the history column of the users table with an aggregated result of the events table as jsonb. The events should be grouped by external_id and user_id, and for each group an object with min(timestamp) and max(timestamp) should be produced.
History example:
history: [{start: '2018-12-12', end: '2018-12-20', external_id: 1}, {start: '2018-11-12', end: '2018-11-20', external_id: 2}]
In my example, the grouping part does not seem correct, since only a single object ends up in the subquery result per user. Performance-wise, this is also probably not the best solution.
update users
set history = e.history
from (
select
user_id,
json_build_array(
json_build_object(
'start', MIN(timestamp),
'end', MAX(timestamp),
'external_id', external_id,
'is_fetched', true
)::jsonb) history
from events
group by external_id, user_id
) e
where users.id = e.user_id
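One way to see why each user ends up with a single-element array is to run the inner aggregation on its own (a simplified diagnostic query, shown with the raw aggregates instead of the JSON wrapping). It returns one row per (user_id, external_id) pair, and update ... from applies only one arbitrary matching row per target user, so all but one of the per-pair arrays are discarded.
select
    user_id,
    external_id,
    min(timestamp) as start_ts,
    max(timestamp) as end_ts
from events
group by external_id, user_id
order by user_id, external_id;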
Best Answer
I'm not sure if this is the solution you're looking for, but IMHO you should get the aggregated values and then build the array.
db<>fiddle here
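Since the fiddle itself is not reproduced here, the following is a sketch of that two-step approach, under the assumption that the desired result is one array element per (user_id, external_id) pair: first aggregate min/max per pair, then collect those objects into a single jsonb array per user with jsonb_agg. The is_fetched flag is carried over from the question; the alias names (agg, min_ts, max_ts) are illustrative.
update users u
set history = e.history
from (
    -- second step: build one jsonb array per user from the per-pair objects
    select
        user_id,
        jsonb_agg(
            jsonb_build_object(
                'start', min_ts,
                'end', max_ts,
                'external_id', external_id,
                'is_fetched', true
            )
        ) as history
    from (
        -- first step: aggregate min/max timestamp per (user_id, external_id)
        select
            user_id,
            external_id,
            min(timestamp) as min_ts,
            max(timestamp) as max_ts
        from events
        group by user_id, external_id
    ) agg
    group by user_id
) e
where u.id = e.user_id;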