Link to the long story that happened before (and the reason to have) this question.
Does AWS ever vacuumlo PostgreSQL RDS instances?
Tags: amazon-rds, blob, postgresql, vacuum
You can find here (in the GitHub mirror of the official PostgreSQL repository) the sources of vacuumlo for the PostgreSQL database version I was using at the time. You can imagine the rest: I simply mimicked what the program does, which is also briefly described here.
First, prepare the temporary table of large object references (OIDs):
=> SET search_path = pg_catalog;
[...]
=> CREATE TABLE vacuum_lo_removeme AS
   SELECT oid AS lo FROM pg_largeobject_metadata;
[...]
=> ANALYZE vacuum_lo_removeme;
[...]
=> _
Perform the query that returns all your database columns of type OID:
=> SELECT
s.nspname, c.relname, a.attname FROM pg_class c, pg_attribute a
, pg_namespace s, pg_type t
WHERE
a.attnum > 0 AND NOT a.attisdropped AND a.attrelid = c.oid
AND a.atttypid = t.oid AND c.relnamespace = s.oid
AND t.typname in ('oid', 'lo') AND c.relkind in ('r','m')
AND s.nspname !~ '^pg_';
Next, execute this query for each row returned by the previous query (note that ${VARIABLE} placeholders are something you must substitute yourself according to your mileage), in order to remove from the temporary table all OIDs of objects actually in use:
=> DELETE FROM vacuum_lo_removeme
   WHERE lo IN (SELECT ${COLUMN} FROM ${SCHEMA}.${TABLE});
In my case there were only two tables totaling five columns, and both tables were actually empty, so naturally the five DELETE queries did nothing. If you have a bigger OID-enabled subschema, you may need to automate this somehow.
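If you do need to automate it, the per-column DELETEs can be generated from the same catalog query. A minimal sketch, where run_psql is an assumed placeholder wrapping your own psql invocation (connection parameters and all), not a real command:

```shell
#!/bin/sh
# Sketch: feed the schema/table/column triples from the catalog query
# back into one DELETE per column. run_psql is an assumed wrapper
# around your own psql invocation (host, user, database, etc.).
run_psql() { psql -At -F ' ' "$@"; }    # unaligned, tuples-only output

run_psql -c "SELECT s.nspname, c.relname, a.attname
             FROM pg_class c, pg_attribute a, pg_namespace s, pg_type t
             WHERE a.attnum > 0 AND NOT a.attisdropped
               AND a.attrelid = c.oid AND a.atttypid = t.oid
               AND c.relnamespace = s.oid
               AND t.typname IN ('oid', 'lo') AND c.relkind IN ('r', 'm')
               AND s.nspname !~ '^pg_'" |
while read -r schema table column; do   # assumes no spaces in identifiers
  run_psql -c "DELETE FROM vacuum_lo_removeme
               WHERE lo IN (SELECT \"$column\" FROM \"$schema\".\"$table\")"
done
```

The identifiers are double-quoted in the generated DELETE so mixed-case names survive the round trip.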
Finally, the program declares a cursor that iterates over the remaining lo cells of the temporary table, purging each one with a lo_unlink function call.
I should have automated that with a PL/pgSQL stored procedure, but since I am not good at that kind of task, I just issued this one-liner:
$ echo 'SELECT COUNT(*) FROM vacuum_lo_removeme;' | \
$MY_AUTOMATABLE_PSQL_MILEAGE
count
---------
1117233
(1 row)
Then this other one, which iterates: it selects the first orphan OID from the temporary table of orphan OIDs, unlinks it, and removes it from the table:
$ for i in {1..1117000}; do \
export oid=$(echo 'SELECT * FROM vacuum_lo_removeme LIMIT 1' | \
$MY_AUTOMATABLE_PSQL_MILEAGE | grep -v 'lo\|\-\-\-\-\|row\|^$' | \
sed s/\ //g) && \
echo "SELECT lo_unlink($oid); \
DELETE FROM vacuum_lo_removeme WHERE lo = $oid" | \
$MY_AUTOMATABLE_PSQL_MILEAGE; \
done
lo_unlink
-----------
1
(1 row)
DELETE 1
lo_unlink
-----------
1
(1 row)
DELETE 1
lo_unlink
-----------
1
(1 row)
DELETE 1
lo_unlink
-----------
1
(1 row)
DELETE 1
ERROR: must be owner of large object 18448
DELETE 1
ERROR: must be owner of large object 18449
DELETE 1
ERROR: must be owner of large object 18450
DELETE 1
ERROR: must be owner of large object 18451
[...]
lo_unlink
-----------
1
(1 row)
DELETE 1
[...]
I knew it was suboptimized as hell, but I let it slowly remove those orphan records. When you are not a DBA at all, one-liners like these can be easier to forge than some meaningful, idiomatic PL/pgSQL.
But this was too slow to leave it like this.
You can speed up large object unlinking with something as simple as:
=> CREATE OR REPLACE FUNCTION unlink_orphan_los() RETURNS VOID AS $$
DECLARE
    iterator integer := 0;
    largeoid OID;
    myportal CURSOR FOR SELECT lo FROM vacuum_lo_removeme;
BEGIN
    OPEN myportal;
    LOOP
        FETCH myportal INTO largeoid;
        EXIT WHEN NOT FOUND;
        PERFORM lo_unlink(largeoid);
        DELETE FROM vacuum_lo_removeme WHERE lo = largeoid;
        iterator := iterator + 1;
        RAISE NOTICE '(%) removed lo %', iterator, largeoid;
        IF iterator = 100 THEN EXIT; END IF;
    END LOOP;
    CLOSE myportal;
END;
$$ LANGUAGE plpgsql;
NOTE it is not required to unlink large objects 100 at a time, nor any particular number x of them at a time, but unlinking 100 at a time is a safe base applicable to the default memory configuration of all AWS instance sizes. If you use an excessively large number, you risk the function failing because of insufficient memory; how far you can go depends on the number of objects to unlink and their size, on the instance type and size, and on the degree of further manual configuration applied.
This is somewhat easy to forge for a non-DBA person. Then call it with something like:
$ for i in $(seq 1 $WHATEVER); do \
    echo 'SELECT unlink_orphan_los()' | \
    $YOUR_AUTOMATABLE_PSQL_MILEAGE; \
  done
(seq is needed here: a bash brace expansion such as {0..$WHATEVER} does not expand variables.)
Where ${WHATEVER} is a constant you construct depending on the size of the temporary table of large objects and the number of locks per transaction your configuration allows. (I am using RDS defaults, but since I am iterating from bash, I guess I don't even need to find out the largest number of lo_unlink calls the RDBMS would allow with the current max_locks_per_transaction.)
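Rather than guessing, that constant can be derived from the orphan count. A sketch, again with run_psql as an assumed stand-in for your own psql invocation:

```shell
#!/bin/sh
# Sketch: derive the iteration count from the orphan count and the
# batch size hard-coded in unlink_orphan_los() (100 unlinks per call).
# run_psql is an assumed wrapper around your own psql invocation.
run_psql() { psql -At "$@"; }

BATCH=100
COUNT=$(run_psql -c 'SELECT count(*) FROM vacuum_lo_removeme')
ITERATIONS=$(( (COUNT + BATCH - 1) / BATCH ))   # ceiling division
for i in $(seq 1 "$ITERATIONS"); do             # {1..$VAR} does not brace-expand
  run_psql -c 'SELECT unlink_orphan_los()'
done
```

With the 1117233 orphans counted above, that works out to 11173 calls of the function.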
VACUUM large object tables.
Hinted by this thread in the postgres mailing list, I understood I should VACUUM pg_largeobject after unlinking the objects.
I am not sure which of these form the minimum required set, or what the proper time for their execution is, but some of them might help and none should cause any damage. I ran

VACUUM ANALYZE VERBOSE pg_largeobject_metadata;
VACUUM ANALYZE VERBOSE pg_largeobject;
VACUUM FULL ANALYZE VERBOSE pg_largeobject_metadata;
VACUUM FULL ANALYZE VERBOSE pg_largeobject;

several times while the micro instance was unlinking the objects (which took a really long time, alas): first when about 1/4 of the objects had been unlinked, and some little storage was given back to the OS; second when about 1/3 of the objects had been unlinked, and another little bit of storage was given back to the OS; third when about 3/5 of the objects had been unlinked, and the instance underwent a massive storage giveback to the OS:
That massive storage giveback was what I was looking for. After running the query for largest tables found at the official home page, with less than 3/4 of the total objects unlinked, the objects table had shrunk to less than 3 GiB, far from the initial (and even more bloated) 20 GiB.
NOTE automating fast VACUUM ANALYZE iterations on a table can have the same long-term effect (freeing the excess disk storage the table uses) as executing a single VACUUM FULL, but without acquiring an exclusive lock. Bear in mind that plain VACUUM can only return storage to the OS when the free pages sit at the end of the table, so it may take many iterations interleaved with the deletions to get there.
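That repeated plain VACUUM ANALYZE can itself be left running alongside the unlinking loop. A sketch under the same assumption that run_psql wraps your own psql invocation:

```shell
#!/bin/sh
# Sketch: keep re-running plain VACUUM ANALYZE on the large object
# catalogs while the unlinking loop works, avoiding the exclusive
# lock a VACUUM FULL would take. run_psql is an assumed wrapper
# around your own psql invocation.
run_psql() { psql -At "$@"; }

while true; do
  run_psql -c 'VACUUM ANALYZE pg_largeobject_metadata'
  run_psql -c 'VACUUM ANALYZE pg_largeobject'
  LEFT=$(run_psql -c 'SELECT count(*) FROM vacuum_lo_removeme')
  [ "$LEFT" -eq 0 ] && break   # stop once every orphan is unlinked
  sleep 300                    # let the unlinking loop free more pages
done
```

The five-minute sleep is an arbitrary pacing choice; anything that keeps the vacuums from monopolizing the instance's I/O will do.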
You can safely replicate the whole instance using DMS, then remove the unnecessary databases on both resulting clusters. After this, you will most probably have excess storage on the new instance; I think (but never tried) you can just do another such migration to a fitting (smaller) instance type.
Best Answer
Probably not, because blindly executing vacuumlo on random databases would be foolish.

Unlike vacuum, which is pretty much unavoidable on a normal live database, vacuumlo fixes an oversight, a situation that should not arise in the first place, since applications should unlink the large objects that they no longer use at about the same time that they delete the references to these large objects.

In a database with a large number of large objects that are properly handled, vacuumlo would burn CPU cycles and I/O at each invocation, just to compute every single time that there is nothing to remove.

Moreover, vacuumlo must make the debatable assumption that every column of type oid is a candidate large object reference, whereas oid values are also used for other purposes, especially to refer to objects in the catalog.