Postgresql – how to call lo_unlink(lo) in batches

batch-processing, hibernate, postgresql-9.5, postgresql-performance

In Postgres 9.5 I am trying to reproduce the behavior of the vacuumlo tool to get rid of unreferenced large objects and free disk space.
References are stored in text-type columns by Hibernate, so using the tool as-is is not an option.
Looking at the vacuumlo sources (vacuumlo.c), I created a vacuum_l table like so:

CREATE TABLE vacuum_l AS SELECT oid AS lo, false as deleted FROM pg_largeobject_metadata;

and proceeded to delete all rows from the vacuum_l table whose OIDs are referenced by user tables in my database.
Now it is safe to call lo_unlink(lo) for all items remaining in the vacuum_l table.
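The delete step amounts to one DELETE per referencing column. A minimal sketch, assuming a hypothetical Hibernate entity table document with a text column content_oid that stores the OID as text (adapt to your actual schema):

```
-- Hypothetical example: remove from vacuum_l every OID still referenced
-- by a user table (here: document.content_oid, stored as text by Hibernate),
-- so that only orphaned large objects remain in vacuum_l
DELETE FROM vacuum_l v
WHERE EXISTS (
    SELECT 1 FROM document d
    WHERE d.content_oid = v.lo::text
);
```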

Unfortunately the table is very big and, as reported in the vacuumlo.c source code:

We don't want to run each delete as an individual transaction, because the commit overhead would be high. However, since 9.0 the backend will acquire a lock per deleted LO, so deleting too many LOs per transaction risks running out of room in the shared-memory lock table. Accordingly, we delete up to transaction_limit LOs per transaction.
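For scale: per the PostgreSQL documentation, the shared lock table has room for roughly max_locks_per_transaction * (max_connections + max_prepared_transactions) lockable objects. A quick back-of-the-envelope check with the stock default settings (adjust the values to your postgresql.conf) shows how small that budget is:

```python
# Approximate capacity of the shared-memory lock table:
#   max_locks_per_transaction * (max_connections + max_prepared_transactions)
# The values below are PostgreSQL's stock defaults.
max_locks_per_transaction = 64
max_connections = 100
max_prepared_transactions = 0

lock_slots = max_locks_per_transaction * (max_connections + max_prepared_transactions)
print(lock_slots)  # 6400
```

Unlinking hundreds of thousands of large objects in one transaction therefore blows past the lock table long before the loop finishes.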

So far I have tried to unlink all OIDs with this PL/pgSQL function:

CREATE OR REPLACE FUNCTION unlinkOrphanedLobs() RETURNS void AS
$BODY$
DECLARE
    count_variable int;
BEGIN
    LOOP
        SELECT COUNT(*) INTO count_variable FROM vacuum_l WHERE deleted = false;
        EXIT WHEN count_variable <= 0;
        UPDATE vacuum_l SET deleted = true
            WHERE lo IN (SELECT lo FROM vacuum_l WHERE deleted = false LIMIT 500);
        PERFORM lo_unlink(lo) FROM vacuum_l WHERE deleted = true;
        DELETE FROM vacuum_l WHERE deleted = true;
    END LOOP;
END
$BODY$
LANGUAGE plpgsql;

however, this ends up with the error:

ERROR: out of shared memory

HINT: You might need to increase max_locks_per_transaction.

Why is clearly explained by the code comments quoted above: a PL/pgSQL function runs entirely within a single transaction, so this function cannot commit at the end of each loop iteration, and the per-LO locks accumulate until the shared lock table is exhausted.

What would be the correct way to do it?

Thanks for any help.

Best Answer

The only solution so far is via a bash script:

#!/bin/bash
user=myuser
host=127.0.0.1
database=mydb
batch_size=10000

elements=$( psql -X -A -t -U "$user" -h "$host" -d "$database" \
    -c "SELECT COUNT(*) FROM vacuum_l WHERE deleted = false" )
echo "LOs to unlink: $elements"
echo

# Ceiling division, so an exact multiple of batch_size
# does not produce an extra empty batch
num_batches=$(( (elements + batch_size - 1) / batch_size ))
echo "Batches: $num_batches"
echo

for ((i = 1; i <= num_batches; i++)); do
    echo "Batch $i"
    # Each psql -c call runs in its own transaction, so at most
    # batch_size large-object locks are held at any one time
    psql -U "$user" -h "$host" -d "$database" -c "UPDATE vacuum_l SET deleted = true, lo = lo_unlink(lo) WHERE lo IN (SELECT lo FROM vacuum_l WHERE deleted = false LIMIT $batch_size)"
done