Introduction
I created a toy example to reduce the problem to a smaller, reproducible case.
Here is the schema definition with two tables, users and events; a user can have multiple events:
create table users (
id int, name text, history jsonb
);
insert into users(id, name) values (1, 'Mike'), (2, 'Jake'), (3, 'Toots');
create table events (
id serial, user_id int, external_id int, type text, timestamp timestamp
);
-- Generate test data (random offset of up to 100 days per event)
insert into events (external_id, user_id, type, timestamp)
select x % 5 + 1, x % 3 + 1, 'random', NOW() - '1 day'::interval * (random() * 100)::int
from generate_series(1, 3000000) as x;
With the actual data, the users table would be fairly small (fewer than ~1M records), while the events table would be quite large (~25M records).
Problem
Update the history column of the users table with an aggregated result of the events table as jsonb. The events should be grouped by external_id and user_id, and for each group an object with min(timestamp) and max(timestamp) should be produced.
History example:
history: [{start: '2018-12-12', end: '2018-12-20', external_id: 1}, {start: '2018-11-12', end: '2018-11-20', external_id: 2}]
In my example, the grouping part does not seem correct, since only a single object ends up in the subquery result per user. Performance-wise, this is also probably not the best solution.
update users
set history = e.history
from (
select
user_id,
json_build_array(
json_build_object(
'start', MIN(timestamp),
'end', MAX(timestamp),
'external_id', external_id,
'is_fetched', true
)::jsonb) history
from events
group by external_id, user_id
) e
where users.id = e.user_id
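One way to see why each user ends up with a single-element array is to run the inner aggregation on its own (a simplified diagnostic query, shown with the raw aggregates instead of the JSON wrapping). It returns one row per (user_id, external_id) pair, and update ... from applies only one arbitrary matching row per target user, so all but one of the per-pair arrays are discarded.
select
    user_id,
    external_id,
    min(timestamp) as start_ts,
    max(timestamp) as end_ts
from events
group by external_id, user_id
order by user_id, external_id;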
Best Answer
I'm not sure if this is the solution you're looking for, but IMHO you should get the aggregated values and then build the array.
db<>fiddle here
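Since the fiddle itself is not reproduced here, the following is a sketch of that two-step approach, under the assumption that the desired result is one array element per (user_id, external_id) pair: first aggregate min/max per pair, then collect those objects into a single jsonb array per user with jsonb_agg. The is_fetched flag is carried over from the question; the alias names (agg, min_ts, max_ts) are illustrative.
update users u
set history = e.history
from (
    -- second step: build one jsonb array per user from the per-pair objects
    select
        user_id,
        jsonb_agg(
            jsonb_build_object(
                'start', min_ts,
                'end', max_ts,
                'external_id', external_id,
                'is_fetched', true
            )
        ) as history
    from (
        -- first step: aggregate min/max timestamp per (user_id, external_id)
        select
            user_id,
            external_id,
            min(timestamp) as min_ts,
            max(timestamp) as max_ts
        from events
        group by user_id, external_id
    ) agg
    group by user_id
) e
where u.id = e.user_id;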