Postgresql – Are regular VACUUM ANALYZE still recommended under 9.1

etlpostgresqlvacuum

I'm using PostgreSQL 9.1 on Ubuntu. Are scheduled VACUUM ANALYZE still recommended, or is autovacuum enough to take care of all needs?

If the answer is "it depends", then:

  • I have a largish database (30 GiB compressed dump size, 200 GiB data directory)
  • I do ETL into the database, importing close to 3 million rows per week
  • The tables with the most frequent changes are all inherited from a master table, with no data in the master table (data is partitioned by week)
  • I create hourly rollups, and from there, daily, weekly and monthly reports

I'm asking because the scheduled VACUUM ANALYZE is impacting my reporting. It runs for more than 5 hours, and I've had to kill it twice this week, because it was impacting regular database imports. check_postgres doesn't report any significant bloat on the database, so that's not really an issue.

From the docs, autovacuum should take care of transaction ID wrap around as well. The question stands: do I still need a VACUUM ANALYZE?

Best Answer

VACUUM is only needed on updated or deleted rows in non-temporary tables. Obviously you're doing lots of INSERTs but it's not obvious from the description that you're also doing lots of UPDATEs or DELETEs.

These operations can be tracked with the pg_stat_all_tables view, specifically the n_tup_upd and n_tup_del columns. Also, even more to the point, there is a n_dead_tup column that tells, per table, how much rows need to be vacuumed. (see Monitoring statistics in the doc for functions and views related to statistics gathering).

A possible strategy in your case would be to suppress the scheduled VACUUM, keeping an eye on this view and checking on which tables the n_dead_tup is going up significantly. Then apply the aggressive VACUUM to these tables only. This will be a win if there are large tables whose rows never get deleted nor updated and the aggressive VACUUM is really necessary only on smaller tables.

But keep running the ANALYZE for the optimizer to always have fresh statistics.