As already stated in the comments above, some details are hidden. I understand from your question that the query plan changes after an ANALYZE. This may indicate that the statistical data used by the query planner do not reflect the real distribution of the data.
ANALYZE in any case takes only a sample; it does not investigate the whole table. This means that tweaking autovacuum_analyze_threshold only makes sense to me if the new rows would change the distribution of the whole table dramatically. This depends on your use case.
Much more important, it seems to me, is to adjust the size of the sample taken by ANALYZE. You can influence the sample size for your table by setting the statistics target (unfortunately it is not mentioned in the question). This blog post shows how the statistics target influences the validity of the sample taken by ANALYZE.
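As a minimal sketch, raising the per-column statistics target looks like this (the table and column names are made up):

```sql
-- Raise the statistics target for one column; the default is 100,
-- controlled by default_statistics_target. ANALYZE samples roughly
-- 300 * statistics_target rows, so a larger target means a larger sample.
ALTER TABLE my_table ALTER COLUMN my_column SET STATISTICS 1000;

-- Re-collect statistics with the new target.
ANALYZE my_table;
```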
In monitoring the autovacuum activity on my database, I've noticed that autovacuums tagged with (to prevent wraparound) seem to take longer than "regular" autovacuums.
That's to be expected. They have to do more work: these are essentially a vacuum freeze run by autovacuum.
Freezing tuples means marking them as visible to all current and future transactions by replacing their transaction ID with the special fixed value FrozenTransactionId (xid 2). Since the xmin for a tuple is stored in the tuple on disk, each tuple that's not already frozen and is a candidate for freezing needs to be modified. This often triggers a full-page write, since the tuple is generally the first modification on the page since the last checkpoint, resulting in an 8 kB write for the first frozen tuple on each page. (This also bloats the WAL and therefore PITR backups, replication streams, etc.)
Freeze vacuums in PostgreSQL 9.5 and below cannot skip parts of the table; they must scan the whole table each time, even if only a little of it is a candidate for freezing.
Unless you tune autovacuum to freeze more aggressively, it tends to leave forced freezes quite late and then do a LOT of work at once. This is efficient since we don't scan the table lots of times, but expensive since we do a lot of work each time we do process it.
When vacuum isn't freezing tuples it's still doing much of the same work. But it's usually touching much smaller numbers of pages on each pass since it only has to care about tuples that have a not-for-locking-only xmax
set by a since-committed transaction. The pages with these tuples are also MUCH more likely to be in cache since they're recently deleted, and any multixact data for them is more likely to be in cache too.
You can make freeze vacuums faster by:
Lowering autovacuum_vacuum_cost_delay - this will make autovacuum use more resources and put more load on the system, but run faster.
Lowering autovacuum_freeze_max_age so they run more often and do less work each time. This is overall less efficient, since the whole table must be scanned each time, evicting pages from the OS buffer cache as we read the table. (Pg uses a ring-buffer for seq scans, but the OS generally won't). But it means we do fewer writes for each pass and each pass takes less time.
Doing COPY with the FREEZE option to bulk-load a table inside the same transaction that TRUNCATEd it or CREATEd it. Tuples loaded this way start off pre-frozen. (See the manual on COPY for details.)
Manually freezing tables after bulk-loading them, so you pay the freeze cost at a time you control.
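For the first two options, a sketch of per-table tuning (the table name and the values are made up; both settings can also be changed globally in postgresql.conf):

```sql
-- Let autovacuum work on this table without throttling pauses...
ALTER TABLE my_table SET (autovacuum_vacuum_cost_delay = 0);

-- ...and start anti-wraparound freezing earlier than the default
-- of 200 million transactions, so each freeze pass does less work.
ALTER TABLE my_table SET (autovacuum_freeze_max_age = 100000000);
```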
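The bulk-load options can be sketched like this (table and file names are made up):

```sql
-- COPY ... FREEZE loads pre-frozen tuples, but is only accepted if
-- the table was created or truncated in the same transaction.
BEGIN;
TRUNCATE my_table;
COPY my_table FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE);
COMMIT;

-- Alternatively, load normally and then freeze manually, paying the
-- freeze cost at a time you control:
VACUUM (FREEZE, VERBOSE) my_table;
```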
Personally I'd like normal VACUUMs (and autovacuums) to do more opportunistic freezing; in particular, if we've just touched the item pointer array on a page to mark some tuples as free space, we should probably scan for freeze-able tuples while the page is dirty. This would play very well with the new freeze map code in PostgreSQL 9.6.
There are some big improvements in freezing in PostgreSQL 9.6 that reduce the impact of repeated vacuums on tables and make freezing much cheaper. Thanks to the introduction of freeze map functionality in the visibility map, we can avoid scanning the whole table by skipping over all-frozen pages. See commits a892234 and fd31cd265.
commit fd31cd265138019dcccc9b5fe53043670898bc9f
Author: Robert Haas <rhaas@postgresql.org>
Date: Thu Mar 10 16:12:10 2016 -0500
Don't vacuum all-frozen pages.
Commit a892234f830e832110f63fc0a2afce2fb21d1584 gave us enough
infrastructure to avoid vacuuming pages where every tuple on the
page is already frozen. So, replace the notion of a scan_all or
whole-table vacuum with the less onerous notion of an "aggressive"
vacuum, which will scan pages that are all-visible, but still skip those
that are all-frozen.
This should greatly reduce the cost of anti-wraparound vacuuming
on large clusters where the majority of data is never touched
between one cycle and the next, because we'll no longer have to
read all of those pages only to find out that we don't need to
do anything with them.
Patch by me, reviewed by Masahiko Sawada.
I think there is also work in progress for PostgreSQL 10 to entirely avoid the need to modify tuples on disk to mark them as frozen, but I can't find the relevant mailing list thread at the moment.
Best Answer
The formula is correct.

The number of live rows is taken from the reltuples column in pg_class, and the result of the formula is compared to n_mod_since_analyze from pg_stat_all_tables.

Note that autovacuum_analyze_scale_factor and autovacuum_analyze_threshold can be overridden by storage parameters on the table.

If all available autovacuum workers are running (autovacuum_max_workers), you may have to wait.

ANALYZE only takes an ACCESS SHARE lock on the table (it only reads), so it can run concurrently with almost anything. Like a SELECT statement, it will see the rows that are visible to it.

If you need a table to be analyzed between an INSERT and an UPDATE, don't wait for autoanalyze to run. Explicitly start ANALYZE on the table.
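As a sketch, the trigger condition can be checked directly against the catalogs mentioned above (this reads the global settings only; per-table storage parameters, if set, would override them):

```sql
-- Tables whose modification count already exceeds the autoanalyze
-- threshold:
--   n_mod_since_analyze >
--     autovacuum_analyze_threshold
--     + autovacuum_analyze_scale_factor * reltuples
SELECT s.schemaname,
       s.relname,
       s.n_mod_since_analyze,
       current_setting('autovacuum_analyze_threshold')::float8
         + current_setting('autovacuum_analyze_scale_factor')::float8
           * c.reltuples AS analyze_threshold
FROM pg_stat_all_tables AS s
JOIN pg_class AS c ON c.oid = s.relid
WHERE s.n_mod_since_analyze >
      current_setting('autovacuum_analyze_threshold')::float8
      + current_setting('autovacuum_analyze_scale_factor')::float8
        * c.reltuples;
```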