Every once in a while we need to backfill a bunch of data based on some complex calculations. Because we have a lot of data, we need to make sure that we are not locking a table for a long time and do the updates in chunks. For that we have this type of script:
DO $$
DECLARE
affected_count integer;
start_time timestamp;
end_time timestamp;
elapsed_interval interval;
BEGIN
LOOP
select now() into start_time;
-- Add code here, make sure it limit it, below is an example
-- Make sure that the query eventually updates 0 rows,
-- otherwise this script will run forever
update checklists
set last_active_date = updated_date
where id in (
select id
from checklists
where last_active_date is null
limit 10000
);
-- End of code
COMMIT;
GET DIAGNOSTICS affected_count = ROW_COUNT;
select now() into end_time;
elapsed_interval := end_time - start_time;
RAISE NOTICE 'Updated % row(s) in %.', affected_count, elapsed_interval;
IF affected_count = 0 THEN
EXIT;
END IF;
END LOOP;
END; $$
Note the COMMIT
inside the script.
This approach works great! Usually.
Lately, we had a bit more complex logic that drives this kind of update. The later one includes the usage of crosstab. And looks something like this:
DO $$
DECLARE
affected_count integer;
start_time timestamp;
end_time timestamp;
elapsed_interval interval;
BEGIN
LOOP
select now() into start_time;
insert into permissions_stats(
id,
organization_id,
reference_type,
folder_id,
template_id,
all_members_groups_count,
members_count,
groups_count,
guests_count,
inherited_all_members_groups_count,
inherited_members_count,
inherited_groups_count,
inherited_guests_count,
template_shared)
select
uuid_generate_v4() as id,
organization_id,
'Folder' as reference_type,
folder_id,
null as template_id,
sum(all_members_groups_count) as all_members_groups_count,
sum(members_count) as members_count,
sum(groups_count) as groups_count,
sum(guests_count) as guests_count,
sum(inherited_all_members_groups_count) as inherited_all_members_groups_count,
sum(inherited_members_count) as inherited_members_count,
sum(inherited_groups_count) as inherited_groups_count,
sum(inherited_guests_count) as inherited_guests_count,
false as template_shared
from (
select
folder_id,
organization_id,
coalesce(all_members_groups_count, 0) as all_members_groups_count,
coalesce(members_count, 0) as members_count,
coalesce(groups_count, 0) as groups_count,
coalesce(guests_count, 0) as guests_count,
coalesce(inherited_all_members_groups_count, 0) as inherited_all_members_groups_count,
coalesce(inherited_members_count, 0) as inherited_members_count,
coalesce(inherited_groups_count, 0) as inherited_groups_count,
coalesce(inherited_guests_count, 0) as inherited_guests_count
from crosstab('
with recursive folders_path_to_root(start_id, id, parent_folder_id, organization_id) as (
select
f.id as start_id,
f.id,
f.parent_folder_id,
f.organization_id
from folders f
where deleted_by_id is null
union all
select
pf.start_id,
f.id,
f.parent_folder_id,
f.organization_id
from folders_path_to_root pf
inner join folders f on (f.id = pf.parent_folder_id and f.parent_folder_id is not null)
),
folder_ids as (
select f.id as id from folders f
where not exists (select * from permissions_stats where reference_type = ''Folder'' and folder_id = f.id)
order by f.id asc
limit 1000
),
counts_cte as (
select
rp.start_id as folder_id,
rp.organization_id as organization_id,
case
when u.user_type = ''Standard'' and om.guest then ''inherited_guests_count''
when u.user_type = ''Group'' and g.group_type = ''Standard'' then ''inherited_groups_count''
when u.user_type = ''Group'' and g.group_type = ''AllMembers'' then ''inherited_all_members_groups_count''
else ''inherited_members_count''
end as permit_type,
count (u) as count
from folder_permits fp
inner join folders_path_to_root rp on (rp.id = fp.folder_id and rp.organization_id = fp.organization_id)
inner join organization_memberships om on (fp.organization_membership_id = om.id)
inner join users u on (u.id = om.user_id)
left join groups g on (g.user_id = u.id and u.user_type = ''Group'')
where rp.start_id in (select id from folder_ids)
and rp.id not in (select id from folder_ids)
group by permit_type, rp.start_id, rp.organization_id
)
select
folder_id as folder_id,
organization_id as organization_id,
permit_type,
count
from counts_cte
union all
select
f.id as folder_id,
f.organization_id as organization_id,
case
when u.user_type = ''Standard'' and om.guest then ''guests_count''
when u.user_type = ''Group'' and g.group_type = ''Standard'' then ''groups_count''
when u.user_type = ''Group'' and g.group_type = ''AllMembers'' then ''all_members_groups_count''
else ''members_count''
end as permit_type,
count (u) as count
from folders f
left join folder_permits fp on (fp.folder_id = f.id)
left join organization_memberships om on (fp.organization_membership_id = om.id)
left join users u on (u.id = om.user_id and u.user_type in (''Standard'', ''Group''))
left join groups g on (g.user_id = u.id and u.user_type = ''Group'')
where f.id in (select id from folder_ids)
group by permit_type, f.id, f.organization_id
',
'
values
(''all_members_groups_count''),
(''members_count''),
(''guests_count''),
(''groups_count''),
(''inherited_all_members_groups_count''),
(''inherited_members_count''),
(''inherited_groups_count''),
(''inherited_guests_count'')
'
) as grouped_counts (
"folder_id" uuid,
"organization_id" uuid,
"all_members_groups_count" int,
"members_count" int,
"guests_count" int,
"groups_count" int,
"inherited_all_members_groups_count" int,
"inherited_members_count" int,
"inherited_groups_count" int,
"inherited_guests_count" int
)
) as pivot
group by pivot.folder_id, pivot.organization_id
on conflict (reference_type, folder_id) where reference_type = 'Folder'
do update set
all_members_groups_count = excluded.all_members_groups_count,
members_count = excluded.members_count,
groups_count = excluded.groups_count,
guests_count = excluded.guests_count,
inherited_all_members_groups_count = excluded.inherited_all_members_groups_count,
inherited_members_count = excluded.inherited_members_count,
inherited_groups_count = excluded.inherited_groups_count,
inherited_guests_count = excluded.inherited_guests_count;
-- End of code
COMMIT;
GET DIAGNOSTICS affected_count = ROW_COUNT;
select now() into end_time;
elapsed_interval := end_time - start_time;
RAISE NOTICE 'Updated % row(s) in %.', affected_count, elapsed_interval;
IF affected_count = 0 THEN
EXIT;
END IF;
END LOOP;
END; $$
The internal query works pefertly fine. But, when I execute the entire loop, I am getting the following error:
ERROR: cannot begin/end transactions in PL/pgSQL
HINT: Use a BEGIN block with an EXCEPTION clause instead.
CONTEXT: PL/pgSQL function inline_code_block line 168 at SQL statement
Now, digging a bit more into the docs and a bunch of other posts, I found that commit is not supported in plpgsql anonymous functions, as the function will execute with its own transaction.
It kind of makes sense. However, how come the first query works without any issues.
The main target of this process is to allow inserting/updating a lot of data in chunks with an option to stop in a middle and restart from the same point. Without the commit
it will always start from the very beginning every time.
What am I missing?
Best Answer
Well. Your finding is historically correct. (But the terminology is not.) A
DO
command executes ...Note the term "block", not "function". That's key here because a
FUNCTION
still cannot execute transaction control statements likeCOMMIT
in Postgres 13 - and probably never will. But aPROCEDURE
can, which was added with Postgres 11. And starting with the same version, anonymous blocks executed with aDO
command can do that, too.Related:
Did you run both tests with different versions of Postgres?