PostgreSQL – How Much Time Will a Vacuum/Autovacuum Operation Take?


I manage a big (several hundred gigabytes) database containing tables with various roles, some of them holding millions of records. Some tables receive only large numbers of inserts and deletes; others receive few inserts and a large number of updates.

The database runs on PostgreSQL 8.4 on a Debian 6.0 amd64 system with 16 GB of RAM.

The problem is that autovacuum sometimes takes a very long time (days) to finish on a table. I want to be able to tell roughly how long a particular vacuum command will take, so I can decide whether to cancel it. A progress indicator for PostgreSQL vacuum operations would also be really helpful.

Edit:

I'm not looking for a bullet-proof solution. A rough hint of the number of dead tuples or the required I/O bytes is enough to decide. It is really annoying to have no clue whatsoever when VACUUM will finish.

I've seen that pg_catalog.pg_stat_all_tables has a column for the number of dead tuples (n_dead_tup). So an estimate is possible, even if it means running ANALYZE on the table first. Moreover, the very existence of the autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor settings shows that PostgreSQL itself knows something about the amount of change in the tables, and probably makes that information available to the DBA too.
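For reference, the PostgreSQL documentation describes how autovacuum combines these two settings: a table is vacuumed once its dead-tuple count exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples, where reltuples comes from pg_class. The Python sketch below applies that rule to numbers read from pg_class.reltuples and pg_stat_all_tables.n_dead_tup. The function names are hypothetical, and the defaults (50 and 0.2) assume the 8.4 global settings with no per-table overrides.

```python
# Sketch of autovacuum's documented trigger rule (hypothetical helper names).
# Defaults assume PostgreSQL 8.4 globals: autovacuum_vacuum_threshold = 50,
# autovacuum_vacuum_scale_factor = 0.2. Per-table settings may override them.


def vacuum_trigger_point(reltuples: float,
                         threshold: int = 50,
                         scale_factor: float = 0.2) -> float:
    """Dead-tuple count that must be exceeded before autovacuum runs VACUUM."""
    return threshold + scale_factor * reltuples


def needs_autovacuum(n_dead_tup: int,
                     reltuples: float,
                     threshold: int = 50,
                     scale_factor: float = 0.2) -> bool:
    """True if n_dead_tup (from pg_stat_all_tables) is above the trigger point."""
    return n_dead_tup > vacuum_trigger_point(reltuples, threshold, scale_factor)
```

For a table with 10 million rows, the default trigger point is about 2,000,050 dead tuples, so needs_autovacuum(1_500_000, 10_000_000) returns False.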

I'm not sure what query to run, because when I run VACUUM VERBOSE I see that not only the tables but also their indexes are being processed.

Best Answer

On my PostgreSQL 8.3 server I use this trick:

1. I get the table's disk size using pg_total_relation_size(). This includes the indexes and TOAST data, which is what VACUUM processes, so it tells me roughly how many bytes VACUUM has to read.
2. I run VACUUM on the table.
3. I find the pid of the VACUUM process in pg_catalog.pg_stat_activity (in 8.x the column is named procpid).
4. In a Linux shell I run while true; do grep read_bytes /proc/123/io; sleep 60; done (where 123 is the pid). This shows the bytes the process has read from disk so far.
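The shell polling loop above can also be written as a small Python sketch, which makes the readings easier to post-process. It assumes Linux (where /proc/<pid>/io exists) and enough privileges to read another process's io file, typically root or the postgres OS user; the helper names are hypothetical.

```python
import re
import time
from typing import Iterator, Tuple

# Matches the "read_bytes: N" line of Linux's /proc/<pid>/io.
_READ_BYTES_RE = re.compile(r"^read_bytes:\s*(\d+)", re.MULTILINE)


def parse_read_bytes(io_text: str) -> int:
    """Return the read_bytes counter from the text of /proc/<pid>/io."""
    match = _READ_BYTES_RE.search(io_text)
    if match is None:
        raise ValueError("no read_bytes line in /proc/<pid>/io output")
    return int(match.group(1))


def sample_read_bytes(pid: int,
                      interval_s: float = 60.0) -> Iterator[Tuple[float, int]]:
    """Yield (unix_time, read_bytes) every interval_s seconds until pid exits."""
    path = f"/proc/{pid}/io"
    while True:
        try:
            with open(path) as f:
                text = f.read()
        except FileNotFoundError:
            return  # the VACUUM backend is gone, so stop sampling
        yield time.time(), parse_read_bytes(text)
        time.sleep(interval_s)
```

For example, looping over sample_read_bytes(123) and printing each pair gives one reading per minute for as long as the backend with pid 123 is alive.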

The read_bytes readings give me a rough idea of how many bytes VACUUM processes (reads) every minute. I assume that VACUUM must read through the whole table (including indexes and TOAST), whose disk size I know from step 1.
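The extrapolation itself is simple arithmetic. Here is a minimal Python sketch, assuming (as the answer does) that VACUUM reads roughly pg_total_relation_size() bytes and keeps reading at the rate seen between two read_bytes samples. The function name and the read_start parameter are hypothetical; read_start is the counter value when the VACUUM began, because read_bytes covers the whole life of the backend process.

```python
def estimate_remaining_seconds(total_bytes: int,
                               t0: float, read0: int,
                               t1: float, read1: int,
                               read_start: int = 0) -> float:
    """Linearly extrapolate how long VACUUM still needs, in seconds.

    total_bytes: pg_total_relation_size() of the table (heap + indexes + TOAST).
    (t0, read0), (t1, read1): two read_bytes samples, times in seconds.
    read_start: read_bytes when the VACUUM began (the counter is per process).
    """
    if t1 <= t0:
        raise ValueError("samples must be in chronological order")
    rate = (read1 - read0) / (t1 - t0)  # bytes per second
    if rate <= 0:
        return float("inf")  # no disk reads observed between the samples
    remaining = max(total_bytes - (read1 - read_start), 0)
    return remaining / rate


GiB = 1024 ** 3
MiB = 1024 ** 2
# 100 GiB relation, 30 GiB read so far, 600 MiB read in the last minute:
# 70 GiB left at 10 MiB/s -> 7168 seconds (about 2 hours).
print(estimate_remaining_seconds(100 * GiB, 0, 30 * GiB - 600 * MiB, 60, 30 * GiB))
```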

I assume the table is large enough that most of its pages must be read from disk (they are not present in PostgreSQL's shared buffers or the OS page cache), so the read_bytes field is good enough to serve as a progress counter.

Every time I did this, the total bytes read by the process was within 5% of the total relation size, so I think this approach may be good enough for you.

Related Solutions

Postgresql – Aggressive Autovacuum on PostgreSQL

Eelke is almost certainly correct that your locking is blocking autovacuum. Autovacuum is deliberately designed to give way to user activity. If those tables are locked, autovacuum cannot vacuum them.

For posterity, however, I wanted to give an example set of settings for hyper-aggressive autovacuum, since the settings you gave don't quite do it. Note, though, that making autovacuum more aggressive is unlikely to solve your problem. Also note that the default autovacuum settings are based on over 200 DBT2 test runs searching for an optimal combination of settings, so the defaults should be assumed to be good unless you have a solid reason to think otherwise, or unless your database is significantly outside the mainstream for OLTP databases (e.g. a tiny database that gets 10K updates per second, or a 3TB data warehouse).

First, turn on logging so you can check up on whether autovacuum is doing what you think it is:

log_autovacuum_min_duration = 0

Then let's make more autovac workers and have them check tables more often:

autovacuum_max_workers = 6
autovacuum_naptime = 15s

Let's lower the thresholds for auto-vacuum and auto-analyze to trigger sooner:

autovacuum_vacuum_threshold = 25
autovacuum_vacuum_scale_factor = 0.1

autovacuum_analyze_threshold = 10
autovacuum_analyze_scale_factor = 0.05

Then let's make autovacuum less interruptable, so it completes faster, but at the cost of having a greater impact on concurrent user activity:

autovacuum_vacuum_cost_delay = 10ms
autovacuum_vacuum_cost_limit = 1000
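These two settings cap how fast a throttled VACUUM can go. Under PostgreSQL's cost-based vacuum delay, each page read from outside shared buffers costs vacuum_cost_page_miss, and once the accumulated cost reaches the cost limit the process sleeps for the cost delay. The rough Python sketch below turns that into an upper bound on the read rate. It assumes 8 kB pages, a page-miss cost of 10, and the 8.4 autovacuum defaults (a 20ms delay and a limit of -1, meaning vacuum_cost_limit = 200); it ignores the time spent on the I/O itself, the cheaper cost of buffer hits, and the extra vacuum_cost_page_dirty charge for pages VACUUM modifies, so real throughput will be lower. The function name is hypothetical.

```python
PAGE_SIZE = 8192  # default PostgreSQL block size (BLCKSZ) in bytes


def max_vacuum_read_rate(cost_limit: int, cost_delay_ms: float,
                         page_miss_cost: int = 10) -> float:
    """Upper bound on bytes/second read by a cost-throttled VACUUM.

    Simplified model: every page read costs page_miss_cost, and VACUUM
    sleeps cost_delay_ms each time it accumulates cost_limit. I/O time,
    buffer hits and dirty-page charges are ignored.
    """
    if cost_delay_ms <= 0:
        return float("inf")  # a delay of 0 disables cost-based throttling
    pages_per_nap = cost_limit / page_miss_cost
    naps_per_second = 1000.0 / cost_delay_ms
    return pages_per_nap * naps_per_second * PAGE_SIZE


MiB = 1024 ** 2
default_84 = max_vacuum_read_rate(cost_limit=200, cost_delay_ms=20)
aggressive = max_vacuum_read_rate(cost_limit=1000, cost_delay_ms=10)
print(f"8.4 autovacuum defaults: {default_84 / MiB:.1f} MiB/s")  # about 7.8 MiB/s
print(f"settings above:          {aggressive / MiB:.1f} MiB/s")  # about 78.1 MiB/s
```

Under this model, merely reading 500 GiB once at about 7.8 MiB/s takes roughly 18 hours, which helps explain how a default-throttled autovacuum of a large table can run for a very long time.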

There's your full program for generically aggressive autovacuum, which might be appropriate for a small database getting a very high rate of updates, but might have too great an impact on concurrent user activity.

Also, note that autovacuum parameters can be adjusted per table, which is almost always a better way to change autovacuum's behavior.

Again, though, it's unlikely to address your real problem.

PostgreSQL – Why VACUUM ANALYZE Doesn’t Clear All Dead Tuples

VACUUM can only remove dead tuples that are long dead, that is, dead to all possible uses. If you have long-lived transactions, they may prevent recently dead tuples from being removed.

This is an example of a situation where a long-lived transaction prevented removal:

INFO:  "pgbench_accounts": found 0 removable, 2999042 nonremovable row versions in 49181 out of 163935 pages
DETAIL:  2999000 dead row versions cannot be removed yet.

Strictly speaking, it is not long-lived transactions but long-lived snapshots that cause this. A long-running SELECT or INSERT statement will certainly hold one. At isolation levels higher than read committed, the whole transaction retains its snapshot until it is done, so if someone opens a repeatable read transaction and then goes on vacation without committing it, that would be a problem. Hung prepared transactions will do it as well (if you don't know what a prepared transaction is, you probably aren't using them).

The examples you show don't indicate a problem, but you also say the problem had been resolved by then. If this is a recurring problem, you should probably start logging the output of your VACUUM VERBOSE statements, so that you can find the information covering the period during which the problem exists.

The multiple passes over the indexes are caused by your maintenance_work_mem setting. VACUUM needs 6 bytes of maintenance_work_mem for each dead tuple it remembers, so each pass over the indexes can remove at most maintenance_work_mem / 6 tuples, and it has to make multiple passes if there are more dead tuples than that. So increasing maintenance_work_mem will help.
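The arithmetic behind the multiple passes can be sketched in Python: each dead tuple is remembered as a 6-byte item pointer, so one pass handles at most maintenance_work_mem / 6 dead tuples. This is a simplified model with hypothetical names; it ignores the internal caps PostgreSQL places on the dead-tuple array (for example, the 1 GB allocation limit).

```python
TID_BYTES = 6  # size of one heap item pointer (ItemPointerData)


def index_passes(dead_tuples: int, maintenance_work_mem_bytes: int) -> int:
    """Number of index-cleanup passes one VACUUM needs for dead_tuples.

    Each pass can remember maintenance_work_mem_bytes // TID_BYTES dead
    tuples; any more force another round of scans over every index.
    """
    per_pass = maintenance_work_mem_bytes // TID_BYTES
    if per_pass <= 0:
        raise ValueError("maintenance_work_mem is too small")
    return -(-dead_tuples // per_pass)  # ceiling division, 0 if no dead tuples


MiB = 1024 ** 2
print(index_passes(2_999_000, 16 * MiB))   # 16 MB: 2 passes
print(index_passes(2_999_000, 256 * MiB))  # 256 MB: 1 pass
```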
