Free Space in pg_largeobject Not Being Reused

postgresql

postgres noobie, PG 9.4 (please don't say I need to update PG, thx)

Database has a huge pg_largeobject table (~450GB, 169.5M rows), with lots of DML leading to ~95GB (~22%) free space – all per pgstattuple.

I would expect ALL new inserts and updates to pg_largeobject to add ZERO new space to the table, given the massive amount of free space.

Yet the table size continues to grow, including space added to the last physical file in the chain for the table. File /data/base/31130/302605.429 increased in size from 574,005,248 to 625,180,672 bytes (~51MB) over 4 hours yesterday.

What am I missing? Is there something I need to do to make the vacuumed free space in the table be reused? We are struggling with free disk space, so this is quite important to sort out. Note that VACUUM FULL is not an option either.

Thanks for the help!

UPDATE 1:
pg_freespace('pg_largeobject') returns 56.3M rows. I aggregated by avail with count(*) and found avail values all over the place, but the vast majority of rows had avail < 2000. The 3 BIG count values were:

avail   count
1856    48,799,038
1888     6,601,315
3968       141,702

UPDATE 2:
I sadly have very little info (nor does anyone at the client) regarding usage. I "think" blob documents are placed in here. I have correlated the vast majority of pg_largeobject loid values to INTEGER values in rows in the documents table and claimnotes table.

The research I did all seemed to mention OIDs being the linkage from pg_largeobject back to actual rows in tables (including tools like vacuumlo, the lo maintenance trigger, etc.). But that didn't seem to be the way the application developers handled their blobs. I would love to look for "completely unreferenced" rows so I could forcibly delete them (via vacuumlo?), but being a noob I can't figure out how to exhaustively look for parent rows.

FOLLOW UP QUESTION 1:
Can someone help me with a query that can exhaustively search every table in the database for rows that might be associated with the pg_largeobject table rows?

FOLLOW UP QUESTION 2:

I do know that around 100K to 150K rows per day become dead, and vacuum will then mark their space as free for reuse. Is that correct?!?

FOLLOW UP QUESTION 3:

a) Blobs are sharded into 2K chunks and then inserted into pg_largeobject. Can those 2K chunks be placed in any EXISTING 8K page in pg_largeobject that has at least 2K of free space (which about 150K of the pages do)?

b) And given the 2K chunk size, how did pg_largeobject wind up with 95ish% of the table having just 1856 or 1888 bytes free? Is that due to row/page overhead taking away some of the 8192 bytes on the page?

Best Answer

This might be due to the inefficient packing of large objects.

Large objects are split into slices of 2048 bytes (in fact pagesize/4, but normally pagesize=8192). The slices are compressed individually (if applicable) and stored in pg_largeobject.data.

In an 8192-byte page, only 3 slices of 2048 bytes can fit. The rest of the space is occupied by the other columns loid and pageno, plus the page header (24 bytes), plus the tuple headers (27 bytes each); see Database Page Layout for the details. What remains on the page is free space, about 1900 bytes if I'm counting right.
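That arithmetic can be sketched out as follows. This is a back-of-the-envelope model, not measured from the database: the 8-byte MAXALIGN padding, the 24-byte padded tuple header, and the uncompressed 4-byte varlena header are assumptions based on the standard page layout.

```python
# Toy accounting of one 8 KB heap page holding pg_largeobject tuples.
# Assumptions: 24-byte page header, 4-byte line pointer per tuple,
# 23-byte heap tuple header padded to MAXALIGN, and each slice stored
# uncompressed as a 4-byte varlena header + 2048 data bytes.

MAXALIGN = 8

def maxalign(n):
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN

PAGE_SIZE = 8192
PAGE_HEADER = 24
LINE_POINTER = 4
TUPLE_HEADER = maxalign(23)            # 24 after alignment

# Tuple payload: loid (4-byte oid) + pageno (4-byte int) + varlena data
payload = maxalign(4 + 4 + 4 + 2048)   # 2064 after alignment
per_tuple = LINE_POINTER + TUPLE_HEADER + payload   # 2092 bytes on disk

slices_per_page = (PAGE_SIZE - PAGE_HEADER) // per_tuple
free = PAGE_SIZE - PAGE_HEADER - slices_per_page * per_tuple

print(slices_per_page, free)  # 3 1892
```

Under these assumptions a full page holds 3 slices and leaves 1892 bytes free, which is in the same ballpark as the 1856/1888 avail values that pg_freespace reported for the vast majority of pages.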

22% free space therefore seems about right, even without any MVCC bloat that a VACUUM FULL could reclaim.

When is that free space going to be used? If you rarely insert large object slices that are smaller than these 1900 bytes, it's quite possible that there is no opportunity to use it.


As for the space that is getting freed and gets potentially reusable when large objects are removed (through lo_unlink), it works for pg_largeobject as with any other table. The MVCC lifecycle of the tuples is essentially:

  1. the tuples with a given loid are live
  2. when a lo_unlink of the large object occurs: an implicit delete from pg_largeobject where loid=<large object id> causes the tuples to be marked as "old".
  3. later, VACUUM (automatic or manual) sees that the "old" tuples are no longer visible to any live transaction, and marks their space as free space.
  4. later, when new tuples are inserted, they may use that space if they fit.
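The consequence of step 4 can be illustrated with a toy model (hypothetical sizes; the real free space map and page-selection logic are more involved): a new tuple only reuses a page when that page's free space covers the tuple's full on-disk size, overhead included, which a full 2048-byte slice never achieves against ~1900 free bytes.

```python
# Toy free-space model: each page is tracked by its free bytes.
# A new tuple goes into the first page it fits in, else a new page
# is appended at the end of the table.

PAGE_CAPACITY = 8168  # 8192 minus the 24-byte page header

def insert(pages, tuple_size):
    """pages: list of free-byte counts per page. Returns the index used."""
    for i, free in enumerate(pages):
        if free >= tuple_size:
            pages[i] = free - tuple_size
            return i
    pages.append(PAGE_CAPACITY - tuple_size)  # table grows
    return len(pages) - 1

# Pages vacuumed down to ~1892 free bytes each
pages = [1892, 1892, 1892]

full_slice = 2092   # 2048-byte slice plus tuple overhead: does not fit
small_slice = 500   # e.g. a compressed or final partial slice: fits

print(insert(pages, full_slice))   # appends a new page -> index 3
print(insert(pages, small_slice))  # reuses page 0
```

This is why the table can keep growing at the end even with tens of gigabytes of free space: the free space exists, but in fragments too small for a full-size slice.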