Eelke is almost certainly correct that your locking is blocking autovacuum. Autovacuum is deliberately designed to give way to user activity; if those tables are locked, autovacuum cannot vacuum them.
For posterity, however, I wanted to give an example set of settings for hyper-aggressive autovacuum, since the settings you gave don't quite do it. Note that making autovacuum more aggressive is unlikely to solve your problem. Also note that the default autovacuum settings are based on over 200 test runs using DBT2, seeking an optimal combination of settings, so the defaults should be assumed to be good unless you have a solid reason to think otherwise, or unless your database is significantly outside the mainstream for OLTP databases (e.g. a tiny database which gets 10K updates per second, or a 3TB data warehouse).
First, turn on logging so you can check up on whether autovacuum is doing what you think it is:
log_autovacuum_min_duration = 0
Then let's make more autovac workers and have them check tables more often:
autovacuum_max_workers = 6
autovacuum_naptime = 15s
Let's lower the thresholds for auto-vacuum and auto-analyze to trigger sooner:
autovacuum_vacuum_threshold = 25
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_threshold = 10
autovacuum_analyze_scale_factor = 0.05
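To put rough numbers on that: autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of rows), and analyzes it once the changed tuples exceed the analogous analyze settings. With the values above, a 1,000,000-row table would be vacuumed after about 25 + 0.1 * 1,000,000 = 100,025 dead tuples (versus 50 + 0.2 * 1,000,000 = 200,050 with the defaults) and analyzed after about 10 + 0.05 * 1,000,000 = 50,010 changed rows.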
Then let's make autovacuum less interruptible, so it completes faster, but at the cost of having a greater impact on concurrent user activity:
autovacuum_vacuum_cost_delay = 10ms
autovacuum_vacuum_cost_limit = 1000
There's your full program for generically aggressive autovacuum, which might be appropriate for a small database getting a very high rate of updates, but might have too great an impact on concurrent user activity.
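If it helps, here's a sketch of applying a couple of these via ALTER SYSTEM rather than editing postgresql.conf by hand (either works). Most autovacuum settings take effect on a configuration reload, but autovacuum_max_workers has traditionally required a full server restart:

ALTER SYSTEM SET autovacuum_naptime = '15s';
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.1;
SELECT pg_reload_conf();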
Also, note that autovacuum parameters can be adjusted per table, which is almost always the better answer when you need to adjust autovacuum's behavior.
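For example (the table name here is just a placeholder), you could make autovacuum far more eager on one hot table while leaving the global defaults alone:

ALTER TABLE your_hot_table SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 100);

ALTER TABLE your_hot_table RESET (autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold) puts it back on the global settings later.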
Again, though, it's unlikely to address your real problem.
VACUUM can only remove dead tuples which are long-dead, that is, dead to all possible uses. If you have long-lived transactions, they may prevent the recently-dead tuples from being removed.
This is an example of a situation where a long-lived transaction prevented removal:
INFO: "pgbench_accounts": found 0 removable, 2999042 nonremovable row versions in 49181 out of 163935 pages
DETAIL: 2999000 dead row versions cannot be removed yet.
It is not really long-lived transactions, but long-lived snapshots. Certainly a long-running select or insert statement will do that. For isolation levels higher than read committed, the whole transaction retains its snapshot until it is done, so if someone opens a repeatable read transaction and then goes on vacation without committing it, that would be a problem. Hung-up prepared transactions will do it as well (if you don't know what a prepared transaction is, then you probably aren't using them).
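If you want to hunt for those, a couple of illustrative queries (exact column lists vary a little between versions): pg_stat_activity shows sessions with an open transaction and how old it is, and pg_prepared_xacts shows prepared transactions that were never committed or rolled back:

SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

SELECT gid, prepared, owner, database FROM pg_prepared_xacts;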
The examples you show don't indicate a problem, but you also say the problem had resolved by then. If this is a recurring problem, you should probably start logging the output of your VACUUM VERBOSE statements, so that you can find the information that covers the period during which the problem exists.
The multiple passes over the index are because of your maintenance_work_mem setting. On each pass over the index, VACUUM can only remove one dead tuple for every 6 bytes of that memory, so if there are more dead tuples than that, it has to make multiple passes. Increasing maintenance_work_mem will therefore help.
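As a rough illustration of the arithmetic: at the default maintenance_work_mem of 64MB, one index pass can cover roughly 64MB / 6 bytes, about 11 million dead tuples; a setting like

maintenance_work_mem = 512MB

raises that to roughly 90 million. Note that autovacuum workers use autovacuum_work_mem instead if it is set, and that (at least in older releases) VACUUM will not use more than 1GB for its dead-tuple list anyway.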
This nomenclature all comes down to knowing about MVCC, or Multi-Version Concurrency Control, which is how PostgreSQL handles transactions and concurrency.
When you do an UPDATE or DELETE, the row isn't actually physically deleted. For a DELETE, the database simply marks the row as unavailable to future transactions, and for an UPDATE, under the hood it's a combined INSERT then DELETE, where the previous version of the row is marked unavailable. The current versions of rows are generally referred to as the "live" rows, and the older versions are referred to as "dead" rows.

The statistics that you are looking at, where it shows 213,324,422 inserted tuples, give the number of new rows that have been inserted into that table. The 124,510,280 live tuples are the tuples that are up to date in the table and available to be read or modified by transactions. The 3,087,919 dead tuples are the tuples that have been changed and are unavailable to future transactions. The autovacuum daemon, or a manual VACUUM, will eventually come along and mark the space of those dead tuples as available for future use, which means that new INSERTs can overwrite the data in them.

Side note: since these are just statistics gathered by the statistics collector, they aren't going to be exact, but they're good enough to give you an idea of how active the table is, how much bloat there is (live vs. dead tuples), and how well the autovacuum daemon is keeping up with your workload.
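If it's handy, those same numbers come straight from pg_stat_user_tables, which also records when the table was last vacuumed and analyzed (the table name below is just a placeholder):

SELECT relname, n_tup_ins, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'your_table';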
Hopefully this makes it clearer. You can read more about this in detail in the PostgreSQL Manual, in the chapter on Concurrency Control.