Postgresql – A ‘deeper’ PostgreSQL autovacuum

postgresqlpostgresql-8.3

We have a PostgreSQL 8.3.7 database suffering severe bloat after an algorithmic change. Unfortunately upgrading isn't an option at this time.

For a particular group of partitioned count roll-up tables, we used to update them by selecting then either inserting for new counts or updating existing counts. To avoid network saturation we switched instead to updating, checking for failed updates and then inserting.

I've read this is a bad scenario for PostgreSQL (at least circa 8.3.7) where the dead tuples from the updates are in the middle of the table rather than the end (as previously) and so are not being reclaimed by the autovacuum which works from the back of the table.

It seems to me that autovacuum_vacuum_cost_limit is the most likely setting I should change. Currently it is set to the default of 200 – perhaps I should start at 2,000 and go up from there?

I have a small window to make production changes to use trial and error, and no test database of equivalent size.

Best Answer

You have a couple options. The basic thing is that dead tuples in the middle of the table, as you have guessed are not reclaimed for the filesystem with autovacuum. Rather they are marked as free so that future inserts in the same table can use that space. The space isn't bloated permanently. It is available for re-use within the same table regarding future inserts or updates.

If you want to reclaim that space for the filesystem you have two options:

1) vacuum full. This locks things. It is usually not ideal but sometimes it is the best option. Also it takes a while.

2) cluster on an index. This rewrites the table into a new file. It requires a lot more disk space, but is also a lot faster. It does fully return free space back to the filesystem at the end however.

Both these require table locks. They are not concurrent safe. You don't want autovacuum deciding that now is a good time to lock tables for you so this is just the way it is.

Edit: This wasn't obvious from the question initially but from the comment reply it looks like the problem here is that the free space map settings may be insufficient. These settings were removed in 9.1 iirc but basically set the size of the free space map per table. What happens in this case is that the free space map is insufficient to hold all dead tuples. In this case tuples get deleted but the space can't be re-used.

When this happens bloat is the result and vacuum full or cluster can be necessary to recover space. As an intermediate solution barring these, you can significantly increase your free space map settings and re-run vacuum (regular, not full) to update the fsm. This won't recover space, but it will make more of the space available for re-use.