Postgresql – Using a commit inside plpgsql block works in one case not another

functionsplpgsqlpostgresql

Every once in a while we need to backfill a bunch of data based on some complex calculations. Because we have a lot of data, we need to make sure that we are not locking a table for a long time and do the updates in chunks. For that we have this type of script:

DO $$
DECLARE
    affected_count integer;
    start_time timestamp;
    end_time timestamp;
    elapsed_interval interval;
BEGIN
  LOOP    
    select now() into start_time;

    -- Add code here, make sure it limit it, below is an example
    -- Make sure that the query eventually updates 0 rows, 
    -- otherwise this script will run forever
    
    update checklists
    set last_active_date = updated_date
    where id in ( 
      select id 
      from checklists 
      where last_active_date is null
      limit 10000
    );
    
    -- End of code
    
    COMMIT;

    GET DIAGNOSTICS affected_count = ROW_COUNT;
    
    select now() into end_time;
    elapsed_interval := end_time - start_time;
    
    RAISE NOTICE 'Updated % row(s) in %.', affected_count, elapsed_interval;
    IF affected_count = 0 THEN
      EXIT;
    END IF;
  END LOOP;
END; $$

Note the COMMIT inside the script.

This approach works great! Usually.

Lately, we had a bit more complex logic that drives this kind of update. The later one includes the usage of crosstab. And looks something like this:

DO $$
DECLARE
    affected_count integer;
    start_time timestamp;
    end_time timestamp;
    elapsed_interval interval;
BEGIN
  LOOP  
    select now() into start_time;
    
insert into permissions_stats(
  id,
  organization_id,
  reference_type,
  folder_id,
  template_id,
  all_members_groups_count,
  members_count,
  groups_count,
  guests_count,
  inherited_all_members_groups_count,
  inherited_members_count,
  inherited_groups_count,
  inherited_guests_count,
  template_shared)
select
  uuid_generate_v4() as id,
  organization_id,
  'Folder' as reference_type,
  folder_id,
  null as template_id,
  sum(all_members_groups_count) as all_members_groups_count,
  sum(members_count) as members_count,
  sum(groups_count) as groups_count,
  sum(guests_count) as guests_count,
  sum(inherited_all_members_groups_count) as inherited_all_members_groups_count,
  sum(inherited_members_count) as inherited_members_count,
  sum(inherited_groups_count) as inherited_groups_count,
  sum(inherited_guests_count) as inherited_guests_count,
  false as template_shared
from (
  select
    folder_id,
    organization_id,
    coalesce(all_members_groups_count, 0) as all_members_groups_count,
    coalesce(members_count, 0) as members_count,
    coalesce(groups_count, 0) as groups_count,
    coalesce(guests_count, 0) as guests_count,
    coalesce(inherited_all_members_groups_count, 0) as inherited_all_members_groups_count,
    coalesce(inherited_members_count, 0) as inherited_members_count,
    coalesce(inherited_groups_count, 0) as inherited_groups_count,
    coalesce(inherited_guests_count, 0) as inherited_guests_count
  from crosstab('
    
with recursive folders_path_to_root(start_id, id, parent_folder_id, organization_id) as (
  select
    f.id as start_id,
    f.id,
    f.parent_folder_id,
    f.organization_id
  from folders f
  where deleted_by_id is null

  union all

  select
    pf.start_id,
    f.id,
    f.parent_folder_id,
    f.organization_id
  from folders_path_to_root pf
    inner join folders f on (f.id = pf.parent_folder_id and f.parent_folder_id is not null)
),
folder_ids as (
    select f.id as id from folders f
    where not exists (select * from permissions_stats where reference_type = ''Folder'' and folder_id = f.id)
    order by f.id asc
    limit 1000
),
counts_cte as (
  select
    rp.start_id as folder_id,
    rp.organization_id as organization_id,
    case
      when u.user_type = ''Standard'' and om.guest then ''inherited_guests_count''
      when u.user_type = ''Group'' and g.group_type = ''Standard'' then ''inherited_groups_count''
      when u.user_type = ''Group'' and g.group_type = ''AllMembers'' then ''inherited_all_members_groups_count''
      else ''inherited_members_count''
    end as permit_type,
    count (u) as count
  from folder_permits fp
    inner join folders_path_to_root rp on (rp.id = fp.folder_id and rp.organization_id = fp.organization_id)
    inner join organization_memberships om on (fp.organization_membership_id = om.id)
    inner join users u on (u.id = om.user_id)
    left join groups g on (g.user_id = u.id and u.user_type = ''Group'')
  where rp.start_id in (select id from folder_ids)
    and rp.id not in (select id from folder_ids)
  group by permit_type, rp.start_id, rp.organization_id
)
select
  folder_id as folder_id,
  organization_id as organization_id,
  permit_type,
  count
from counts_cte


    union all

    
select
  f.id as folder_id,
  f.organization_id as organization_id,
  case
    when u.user_type = ''Standard'' and om.guest then ''guests_count''
    when u.user_type = ''Group'' and g.group_type = ''Standard'' then ''groups_count''
    when u.user_type = ''Group'' and g.group_type = ''AllMembers'' then ''all_members_groups_count''
    else ''members_count''
  end as permit_type,
  count (u) as count
from folders f
  left join folder_permits fp on (fp.folder_id = f.id)
  left join organization_memberships om on (fp.organization_membership_id = om.id)
  left join users u on (u.id = om.user_id and u.user_type in (''Standard'', ''Group''))
  left join groups g on (g.user_id = u.id and u.user_type = ''Group'')
where f.id in (select id from folder_ids)
group by permit_type, f.id, f.organization_id

  ',
  '
    values
      (''all_members_groups_count''),
      (''members_count''),
      (''guests_count''),
      (''groups_count''),
      (''inherited_all_members_groups_count''),
      (''inherited_members_count''),
      (''inherited_groups_count''),
      (''inherited_guests_count'')
  '
  ) as grouped_counts (
    "folder_id" uuid,
    "organization_id" uuid,
    "all_members_groups_count" int,
    "members_count" int,
    "guests_count" int,
    "groups_count" int,
    "inherited_all_members_groups_count" int,
    "inherited_members_count" int,
    "inherited_groups_count" int,
    "inherited_guests_count" int
  )
) as pivot
group by pivot.folder_id, pivot.organization_id
on conflict (reference_type, folder_id) where reference_type = 'Folder'
do update set
  all_members_groups_count = excluded.all_members_groups_count,
  members_count = excluded.members_count,
  groups_count = excluded.groups_count,
  guests_count = excluded.guests_count,
  inherited_all_members_groups_count = excluded.inherited_all_members_groups_count,
  inherited_members_count = excluded.inherited_members_count,
  inherited_groups_count = excluded.inherited_groups_count,
  inherited_guests_count = excluded.inherited_guests_count;

-- End of code
    
    COMMIT;

    GET DIAGNOSTICS affected_count = ROW_COUNT;
    
    select now() into end_time;
    elapsed_interval := end_time - start_time;
    
    RAISE NOTICE 'Updated % row(s) in %.', affected_count, elapsed_interval;
    IF affected_count = 0 THEN
      EXIT;
    END IF;
  END LOOP;
END; $$ 

The internal query works pefertly fine. But, when I execute the entire loop, I am getting the following error:

ERROR:  cannot begin/end transactions in PL/pgSQL
HINT:  Use a BEGIN block with an EXCEPTION clause instead.
CONTEXT:  PL/pgSQL function inline_code_block line 168 at SQL statement

Now, digging a bit more into the docs and a bunch of other posts, I found that commit is not supported in plpgsql anonymous functions, as the function will execute with its own transaction.
It kind of makes sense. However, how come the first query works without any issues.

The main target of this process is to allow inserting/updating a lot of data in chunks with an option to stop in a middle and restart from the same point. Without the commit it will always start from the very beginning every time.

What am I missing?

Best Answer

I found that commit is not supported in plpgsql anonymous functions

Well. Your finding is historically correct. (But the terminology is not.) A DO command executes ...

an anonymous code block

Note the term "block", not "function". That's key here because a FUNCTION still cannot execute transaction control statements like COMMIT in Postgres 13 - and probably never will. But a PROCEDURE can, which was added with Postgres 11. And starting with the same version, anonymous blocks executed with a DO command can do that, too.

Related:

Did you run both tests with different versions of Postgres?