Eelke is almost certainly correct that your locking is blocking autovacuum. Autovacuum deliberately gives way to user activity; if those tables are locked, autovacuum cannot vacuum them.
For posterity, however, I wanted to give an example set of settings for hyper-aggressive autovacuum, since the settings you gave don't quite do it. Note, though, that making autovacuum more aggressive is unlikely to solve your problem. Also note that the default autovacuum settings were derived from over 200 test runs using DBT2, seeking an optimal combination of settings, so the defaults should be assumed to be good unless you have a solid reason to think otherwise, or unless your database is significantly outside the mainstream for OLTP databases (e.g. a tiny database that gets 10K updates per second, or a 3TB data warehouse).
First, turn on logging so you can check up on whether autovacuum is doing what you think it is:
log_autovacuum_min_duration = 0
Then let's add more autovacuum workers and have them check tables more often:
autovacuum_max_workers = 6
autovacuum_naptime = 15s
Let's lower the thresholds so that auto-vacuum and auto-analyze trigger sooner:
autovacuum_vacuum_threshold = 25
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_threshold = 10
autovacuum_analyze_scale_factor = 0.05
Then let's make autovacuum less interruptible, so it completes faster, at the cost of a greater impact on concurrent user activity:
autovacuum_vacuum_cost_delay = 10ms
autovacuum_vacuum_cost_limit = 1000
There's your full program for generically aggressive autovacuum, which might be appropriate for a small database getting a very high rate of updates, but might have too great an impact on concurrent user activity.
Also, note that autovacuum parameters can be adjusted per table, which is almost always the better answer when you need to change autovacuum's behavior.
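For example, a minimal sketch of per-table tuning (the table name is hypothetical; these storage parameters override the global settings for that table only):

ALTER TABLE hot_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 25,
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_analyze_threshold = 10
);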
Again, though, it's unlikely to address your real problem.
It is really the freeze map (introduced in version 9.6), not the related visibility map, that will help you here. The table counters will still increase and the tables will still get vacuumed for wraparound, but that vacuuming will now be very fast, as it will just skip to the end and be done.
For your weekly maintenance task, you could do a VACUUM FREEZE rather than an ordinary VACUUM on the to-be-retired database. That way its tables won't need to be visited again in the future, except through the perfunctory vacuums facilitated by the freeze map.
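A sketch of that weekly step, run while connected to the retiring database:

VACUUM (FREEZE, VERBOSE);

or from the shell, with a hypothetical database name:

vacuumdb --freeze --dbname retiring_db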
If you need to buy yourself some time for that change to propagate through the system, you could increase autovacuum_freeze_max_age. That should make the autovac a bit more docile while you slowly do VACUUM FREEZE on those old databases at off-peak times of your choosing. But don't just increase autovacuum_freeze_max_age and then forget about it, or when the new age is met you will be in an even deeper hole than you are now.
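For instance, in postgresql.conf (the value shown is illustrative, not a recommendation; changing this system-wide setting requires a server restart):

autovacuum_freeze_max_age = 500000000    # default is 200000000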
If you need to figure out which databases (or tables within them) still need manual freezing, you can use the pg_visibility_map_summary() function from the pg_visibility extension (9.6+).
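A sketch, assuming the pg_visibility extension is available and a hypothetical table name:

CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT * FROM pg_visibility_map_summary('old_table');

Pages counted as all_frozen need no further freezing. To rank databases by wraparound exposure instead:

SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;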
Best Answer
That's to be expected; they have to do more work. These are essentially a vacuum freeze run by autovacuum.

Freezing tuples means marking them as visible to all current and future transactions by replacing their transaction ID with a special fixed value, FrozenTransactionId (xid 2). Since the xmin for a tuple is stored in the tuple on disk, each tuple that is not already frozen and is a candidate for freezing must be modified. This often triggers a full-page write, since the frozen tuple is generally the first modification on the page since the last checkpoint, resulting in an 8k write for the first frozen tuple on each page. (This also bloats WAL, and therefore PITR backups, replication streams, etc.)

Freeze vacuums before PostgreSQL 9.6 cannot skip parts of the table; they must scan the whole table each time, even if only a little of it is a candidate for freezing.
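To see how close each table is to triggering one of these forced freeze vacuums, you can check the age of relfrozenxid, for example:

SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 20;

Tables whose age approaches autovacuum_freeze_max_age are next in line for a forced freeze.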
Unless you tune autovacuum to freeze more aggressively, it tends to leave forced freezes quite late and then do a LOT of work at once. This is efficient since we don't scan the table lots of times, but expensive since we do a lot of work each time we do process it.
When vacuum isn't freezing tuples it's still doing much of the same work, but it usually touches far fewer pages on each pass, since it only has to care about tuples that have a not-for-locking-only xmax set by a since-committed transaction. The pages holding these tuples are also MUCH more likely to be in cache, since they were recently deleted, and any multixact data for them is more likely to be in cache too.

You can make freeze vacuums faster by:
- Lowering autovacuum_vacuum_cost_delay. This will make autovacuum use more resources and put more load on the system, but run faster.
- Lowering autovacuum_freeze_max_age so freeze vacuums run more often and do less work each time. This is overall less efficient, since the whole table must be scanned each time, evicting pages from the OS buffer cache as we read the table (Pg uses a ring buffer for seq scans, but the OS generally won't), but it means we do fewer writes for each pass and each pass takes less time.
- Doing COPY with the FREEZE option to bulk-load a table inside the same transaction that TRUNCATEd it or CREATEd it. Tuples loaded this way start off pre-frozen. (See the manual on COPY for details; a sketch follows this list.)
- Manually freezing tables after bulk-loading them, so you pay the freeze cost at a time you control.
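A minimal sketch of the COPY ... FREEZE approach (table and file names are hypothetical; FREEZE only works if the table was created or truncated in the current transaction):

BEGIN;
TRUNCATE events;
COPY events FROM '/path/to/events.csv' WITH (FORMAT csv, FREEZE);
COMMIT;

The manual-freeze alternative is simply a VACUUM FREEZE events; run at a quiet time right after the load completes.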
Personally I'd like normal VACUUMs (and autovacuums) to do more opportunistic freezing; in particular, if we've just touched the item pointer array on a page to mark some tuples as free space, we should probably scan for freeze-able tuples while the page is dirty. This would play very well with the new freeze map code.

There are some big improvements in freezing in PostgreSQL 9.6 that reduce the impact of repeated vacuums on tables and make freezing much cheaper. Thanks to the introduction of the freeze map functionality in the visibility map, we can avoid scanning the whole table, skipping over frozen pages. See commits a892234 and fd31cd265.
I think there is also work in progress for PostgreSQL 10 to entirely avoid the need to modify tuples on disk to mark them as frozen, but I can't find the relevant mailing list thread at the moment.