PostgreSQL – A ‘deeper’ PostgreSQL autovacuum

We have a PostgreSQL 8.3.7 database suffering severe bloat after an algorithmic change. Unfortunately upgrading isn't an option at this time.

For a particular group of partitioned count roll-up tables, we used to update them by selecting, then either inserting new counts or updating existing ones. To avoid network saturation we switched to updating first, checking for failed updates, and then inserting.

I've read that this is a bad scenario for PostgreSQL (at least circa 8.3.7): the dead tuples from the updates now sit in the middle of the table rather than at the end (as previously), so they are not being reclaimed by autovacuum, which works from the back of the table.

It seems to me that autovacuum_vacuum_cost_limit is the most likely setting I should change. Currently it is set to the default of 200 – perhaps I should start at 2,000 and go up from there?

I have only a small window in which to make production changes by trial and error, and no test database of equivalent size.

Best Answer

You have a couple of options. The basic thing is that dead tuples in the middle of the table, as you have guessed, are not returned to the filesystem by autovacuum. Rather, they are marked as free so that future inserts or updates in the same table can use that space. The space isn't bloated permanently; it remains available for re-use within the same table.

If you want to reclaim that space for the filesystem you have two options:

1) VACUUM FULL. This takes an exclusive lock on the table. It is usually not ideal, but sometimes it is the best option. It also takes a while.

2) CLUSTER on an index. This rewrites the table into a new file. It requires a lot more disk space but is also a lot faster, and it fully returns the free space to the filesystem when it finishes.

Both of these require table locks, so neither is safe to run alongside normal activity on the table. You don't want autovacuum deciding that now is a good time to lock tables for you, so this is just the way it is.
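For partitioned tables, either option has to be run once per partition. A minimal sketch of generating the statements, assuming hypothetical partition and index names (counts_2009_06 and so on are made up for illustration) and the 8.3 form CLUSTER table USING index:

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def vacuum_full_sql(table):
    """Option 1: compact the table in place (exclusive lock, slow)."""
    return "VACUUM FULL " + quote_ident(table) + ";"


def cluster_sql(table, index):
    """Option 2: rewrite the table in index order (exclusive lock, needs spare disk)."""
    return "CLUSTER " + quote_ident(table) + " USING " + quote_ident(index) + ";"


# Hypothetical partitions and their primary-key indexes.
partitions = [
    ("counts_2009_06", "counts_2009_06_pkey"),
    ("counts_2009_07", "counts_2009_07_pkey"),
]

for table, index in partitions:
    print(cluster_sql(table, index))
```

Running one partition at a time keeps each lock short and limits the extra disk space CLUSTER needs to roughly one partition's worth.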

Edit: This wasn't obvious from the question initially, but from the comment reply it looks like the problem here is that the free space map settings may be insufficient. These settings (max_fsm_pages and max_fsm_relations) were removed in 8.4, when the free space map became self-managing, but in 8.3 they set the size of the shared free space map for the whole cluster. What happens in this case is that the map is too small to track all the pages containing dead tuples, so tuples get deleted but the space can't be re-used.

When this happens, bloat is the result, and VACUUM FULL or CLUSTER can be necessary to recover space. As an intermediate solution short of these, you can significantly increase your free space map settings and re-run VACUUM (regular, not full) to update the FSM. This won't return space to the filesystem, but it will make more of the space available for re-use.
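To decide how far to raise the setting, as far as I recall a database-wide VACUUM VERBOSE on 8.3 ends with a summary of how many page slots are required versus the current limit. A rough sketch of the comparison, where the numbers and the 1.5x headroom factor are assumptions for illustration, not recommendations:

```python
def fsm_status(slots_required, max_fsm_pages, headroom=1.5):
    """Return (overflowing, suggested_max_fsm_pages).

    slots_required: page slots VACUUM VERBOSE says are needed (read off by hand).
    max_fsm_pages:  the current setting in postgresql.conf.
    headroom:       assumed safety factor so the map does not overflow again soon.
    """
    overflowing = slots_required > max_fsm_pages
    suggested = max(int(slots_required * headroom), max_fsm_pages)
    return overflowing, suggested


# Hypothetical figures: 3.2M slots needed against a limit of 204,800.
print(fsm_status(3_200_000, 204_800))
```

If the first value is true, free space is being forgotten between vacuums, and the second value is a starting point for the new limit.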

Related Solutions

PostgreSQL 8.3 – issues with autovacuum

I run some extremely busy 8.3 db servers. When I first started working on them, they were blowing out their free space map settings and going off the rails on a semi-weekly basis. The solution was to crank up the fsm settings, AND to make autovacuum far more aggressive.

autovacuum_vacuum_cost_delay was dropped to 0 or 1ms
autovacuum_vacuum_cost_limit was raised to 5000
max_fsm_pages was raised to 2M to 10M depending on the machine
max_fsm_relations was raised to 10k to 100k depending on the machine
autovacuum_max_workers was raised to 5 or 10 depending on the machine

These machines all have fairly powerful IO subsystems (8 to 32 15K SAS drives with various HW RAID cards or SANs).
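The FSM values above are not free: the map lives in shared memory. A back-of-the-envelope sketch, assuming the figures I remember from the 8.3 documentation (six bytes per page slot, roughly seventy bytes per relation slot):

```python
BYTES_PER_PAGE_SLOT = 6        # per max_fsm_pages entry
BYTES_PER_RELATION_SLOT = 70   # approximate, per max_fsm_relations entry


def fsm_shared_memory_bytes(max_fsm_pages, max_fsm_relations):
    """Estimate the shared memory consumed by the free space map."""
    return (max_fsm_pages * BYTES_PER_PAGE_SLOT
            + max_fsm_relations * BYTES_PER_RELATION_SLOT)


for pages, relations in [(2_000_000, 10_000), (10_000_000, 100_000)]:
    mib = fsm_shared_memory_bytes(pages, relations) / (1024.0 * 1024.0)
    print("max_fsm_pages=%d max_fsm_relations=%d -> about %.1f MiB" % (pages, relations, mib))
```

Even the largest combination comes to well under 100 MiB, but both settings only take effect at server start, and the kernel's shared memory limit may need raising to match.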

In short if someone thinks autovac in 8.3 is buggy and won't use it, they likely don't really understand it very well, and are behaving in a particular way based on superstition, not science.

PostgreSQL – Aggressive Autovacuum on PostgreSQL

Eelke is almost certainly correct that your locking is blocking autovacuum. Autovacuum is designed to give way to user activity, deliberately. If those tables are locked, autovacuum cannot vacuum them.

For posterity, however, I wanted to give an example set of settings for hyper-aggressive autovacuum, since the settings you gave don't quite do it. Note that making autovacuum more aggressive is unlikely to solve your problem, however. Also note that the default autovacuum settings are based on running over 200 test runs using DBT2 seeking an optimal combination of settings, so the defaults should be assumed to be good unless you have a solid reason to think otherwise, or unless your database is significantly outside the mainstream for OLTP databases (e.g. a tiny database which gets 10K updates per second, or a 3TB data warehouse).

First, turn on logging so you can check up on whether autovacuum is doing what you think it is:

log_autovacuum_min_duration = 0

Then let's make more autovac workers and have them check tables more often:

autovacuum_max_workers = 6
autovacuum_naptime = 15s

Let's lower the thresholds for auto-vacuum and auto-analyze to trigger sooner:

autovacuum_vacuum_threshold = 25
autovacuum_vacuum_scale_factor = 0.1

autovacuum_analyze_threshold = 10
autovacuum_analyze_scale_factor = 0.05
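To see what these thresholds mean in practice: autovacuum triggers on a table once the number of dead tuples exceeds threshold + scale_factor * (rows in the table). A small sketch comparing the values above with the stock defaults (50 and 0.2 for vacuum), using a made-up table of one million rows:

```python
def trigger_point(threshold, scale_factor, reltuples):
    """Number of changed tuples after which autovacuum acts on a table."""
    return threshold + scale_factor * reltuples


rows = 1_000_000  # hypothetical table size

print("default vacuum trigger:    %.0f dead tuples" % trigger_point(50, 0.2, rows))
print("aggressive vacuum trigger: %.0f dead tuples" % trigger_point(25, 0.1, rows))
print("aggressive analyze trigger: %.0f changed tuples" % trigger_point(10, 0.05, rows))
```

On large tables the scale factor dominates, so halving it roughly halves the amount of dead space that accumulates between vacuums.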

Then let's make autovacuum less interruptible, so it completes faster, but at the cost of having a greater impact on concurrent user activity:

autovacuum_vacuum_cost_delay = 10ms
autovacuum_vacuum_cost_limit = 1000
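To get a feel for what those two numbers do: vacuum accumulates a cost for every page it touches and sleeps for the delay each time the total reaches the limit. A rough upper bound on read throughput, assuming the long-standing default page costs (hit 1, miss 10, dirty 20), 8 kB pages, and ignoring the time spent doing the actual work:

```python
PAGE_SIZE = 8192       # bytes, default block size
COST_PAGE_HIT = 1      # page found in shared buffers
COST_PAGE_MISS = 10    # page read from disk
COST_PAGE_DIRTY = 20   # page dirtied by vacuum


def max_pages_per_second(cost_limit, cost_delay_ms, cost_per_page=COST_PAGE_MISS):
    """Upper bound on pages processed per second under cost-based delay."""
    if cost_delay_ms <= 0:
        return float("inf")  # a delay of 0 disables throttling
    rounds_per_second = 1000.0 / cost_delay_ms
    return cost_limit * rounds_per_second / cost_per_page


def max_mib_per_second(cost_limit, cost_delay_ms, cost_per_page=COST_PAGE_MISS):
    """Same bound expressed in MiB/s."""
    return max_pages_per_second(cost_limit, cost_delay_ms, cost_per_page) * PAGE_SIZE / (1024.0 * 1024.0)


print("limit 200, delay 20ms:  %.1f MiB/s" % max_mib_per_second(200, 20))
print("limit 1000, delay 10ms: %.1f MiB/s" % max_mib_per_second(1000, 10))
```

With every page a cache miss, a limit of 200 and a 20ms delay caps vacuum at under 8 MiB/s, while the settings above allow about ten times that. As far as I know the budget is also balanced across the active autovacuum workers, so each worker may see less.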

There's your full program for generically aggressive autovacuum, which might be appropriate for a small database getting a very high rate of updates, but might have too great an impact on concurrent user activity.

Also, note that autovacuum parameters can be adjusted per table, which is almost always a better answer for needing to adjust autovacuum's behavior.
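As a sketch of the per-table route: on 8.4 and later these are storage parameters set with ALTER TABLE ... SET (...), whereas on 8.3 the equivalent lives in the pg_autovacuum system catalog. The helper below only builds the 8.4+ statement, and the table name and values are hypothetical:

```python
ALLOWED = {
    "autovacuum_enabled",
    "autovacuum_vacuum_threshold",
    "autovacuum_vacuum_scale_factor",
    "autovacuum_analyze_threshold",
    "autovacuum_analyze_scale_factor",
    "autovacuum_vacuum_cost_delay",
    "autovacuum_vacuum_cost_limit",
}


def per_table_autovacuum_sql(table, **params):
    """Build an ALTER TABLE statement overriding autovacuum settings for one table."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError("unsupported parameter(s): %s" % ", ".join(sorted(unknown)))
    settings = ", ".join("%s = %s" % (k, params[k]) for k in sorted(params))
    quoted = '"' + table.replace('"', '""') + '"'
    return "ALTER TABLE %s SET (%s);" % (quoted, settings)


# Hypothetical hot table that should be vacuumed after ~2% of rows change.
print(per_table_autovacuum_sql(
    "counts_2009_06",
    autovacuum_vacuum_scale_factor=0.02,
    autovacuum_vacuum_threshold=100,
))
```

This keeps the global defaults intact and targets only the tables that churn heavily.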

Again, though, it's unlikely to address your real problem.
