I manage a large database (a few hundred gigabytes) containing tables in various roles, some of them holding millions of records. Some tables receive only a large number of inserts and deletes, others few inserts and a large number of updates.
The database runs on PostgreSQL 8.4 on a Debian 6.0 amd64 system with 16 GB of RAM.
The problem is that the autovacuum process on a table sometimes takes a very long time (days) to complete. I want to be able to roughly tell how long a particular vacuum command will take, so I can decide whether or not to cancel it. A progress indicator for Postgres vacuum operations would also be really helpful.
Edit:
I'm not looking for a bullet-proof solution. A rough hint on the number of dead tuples or the I/O bytes required is enough to decide. It is really annoying to have no clue whatsoever when VACUUM will finish.
I've seen that pg_catalog.pg_stat_all_tables has a column for the number of dead tuples, so an estimate is possible, even if it means one has to ANALYZE the table first. On the other hand, the autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor settings alone prove that Postgres itself knows something about the amount of change in the tables, and probably puts that information in the hands of the DBA too.
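For reference, a catalog query along these lines (my sketch, not part of the original question) shows the dead-tuple counts alongside the point at which autovacuum kicks in; per the autovacuum documentation that trigger is autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, shown here with the default values of 50 and 0.2 hard-coded:

```sql
-- Sketch: tables with the most dead tuples, and the (default-settings)
-- dead-tuple count at which autovacuum would start a VACUUM on each.
SELECT s.relname,
       s.n_dead_tup,
       50 + 0.2 * c.reltuples AS autovacuum_trigger  -- default threshold/scale factor
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 10;
```

Note that n_dead_tup is itself an estimate maintained by the statistics collector, and per-table storage parameters can override the global threshold and scale factor.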
I'm not sure what query to run, because when I run VACUUM VERBOSE, I see that not only the tables but also their indexes are being processed.
Best Answer
On my PostgreSQL (8.3) I use this trick:

1. Take the disk size of the table with pg_total_relation_size() - this includes indexes and TOAST size, which is what VACUUM processes. This gives me an idea of how many bytes VACUUM has to read.
2. Run VACUUM on the table.
3. Find the pid of the VACUUM process (in pg_catalog.pg_stat_activity).
4. Run while true; do cat /proc/123/io | grep read_bytes; sleep 60; done (where 123 is the pid) - this shows me the bytes the process has read from disk so far.

This gives me a rough idea of how many bytes are processed (read) every minute by the VACUUM. I presume that the VACUUM must read through the whole table (including indexes and TOAST), whose disk size I know from step 1.

I presume that the table is large enough that the majority of its pages must be read from disk (they are not present in Postgres shared memory), so the read_bytes field is good enough to be used as a progress counter.

Every time I did this, the total bytes read by the process differed by no more than 5% from the total relation size, so I guess this approach may be good enough for you.