Vacuum on large table sometimes takes very long

Tags: postgresql, postgresql-performance, vacuum

On a PostgreSQL 11 database server I have a large table (932 million rows, 2150 GB, not partitioned) into which I insert about 2-3 million rows every night with a batch job. The table was created with autovacuum_enabled=FALSE. After the insert I perform a VACUUM (ANALYZE) on the table, which usually takes about 30-40 seconds. However, once every few weeks this vacuum takes very long, such as today, when it took 2 hours and 11 minutes.
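For reference, here is a stripped-down sketch of the setup and the nightly step. The real table is analytics.a_box_events, as seen in the log output further down; the columns shown here are made up purely for illustration.

-- Hypothetical, heavily simplified version of the real table:
CREATE TABLE analytics.a_box_events (
    event_id   bigint      NOT NULL,
    node_id    bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) WITH (autovacuum_enabled = false);

-- Nightly batch job: bulk insert of 2-3 million rows, then:
VACUUM (ANALYZE) analytics.a_box_events;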

I am now considering running a vacuum ahead of my nightly job so as to avoid these long run times during the job itself. I had a look at the options described in the PostgreSQL 11 vacuum documentation page. I know that a VACUUM (FULL) is very expensive in terms of runtime and is therefore probably out of the question. Would a VACUUM (FREEZE) be useful in my case? The other option, VACUUM (DISABLE_PAGE_SKIPPING), does not seem relevant to my needs.
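What I have in mind is something along these lines, run in the evening before the nightly job (just a sketch, not in production yet):

-- Candidate for a pre-emptive run ahead of the nightly batch job:
VACUUM (FREEZE) analytics.a_box_events;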


Edit (2021-09-27): Looking back at the past 270 days (2021-01-01 to 2021-09-27), the vacuum took 16-60 seconds on most days. On 9 days, however, it exceeded 1000 seconds, and on 4 of those it even exceeded 3500 seconds (see graph below). What is interesting is the gap between these outliers, which lies between 25 and 35 days. It looks as if something builds up over those 25-35 days and then necessitates a longer vacuum. Quite possibly this is the transaction ID wraparound mentioned in the comments below.

VACUUM performance graph
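One way to test this wraparound theory would be to watch the age of the table's relfrozenxid: a plain VACUUM switches to an aggressive, whole-table run once that age exceeds vacuum_freeze_table_age. A catalog query along these lines (nothing specific to our setup) shows how close the table is to the thresholds:

SELECT c.oid::regclass                                        AS table_name,
       age(c.relfrozenxid)                                    AS xid_age,
       current_setting('vacuum_freeze_table_age')::bigint     AS freeze_table_age,
       current_setting('autovacuum_freeze_max_age')::bigint   AS freeze_max_age
FROM   pg_class c
WHERE  c.oid = 'analytics.a_box_events'::regclass;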


Edit (2022-02-08): In September 2021 we modified our script by changing:

VACUUM (ANALYZE)

to:

ANALYZE (VERBOSE)
VACUUM (FREEZE, VERBOSE)

Unfortunately, the roughly once-monthly long runs of this job step still occur. However, by splitting the work into separate analyze and vacuum steps, and thanks to the verbose output, we are now much better able to tell what is going on.

On a fast day the output of the analyze and vacuum steps looks like this:

INFO:  analyzing "analytics.a_box_events"
INFO:  "a_box_events": scanned 30000 of 152881380 pages, containing 264294 live rows and 0 dead rows; 30000 rows in sample, 1346854382 estimated total rows
ANALYZE
INFO:  aggressively vacuuming "analytics.a_box_events"
INFO:  "a_box_events": found 0 removable, 3407161 nonremovable row versions in 406302 out of 152881380 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 108425108
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 152475078 frozen pages.
0 pages are entirely empty.
CPU: user: 4.53 s, system: 3.76 s, elapsed: 25.77 s.
INFO:  aggressively vacuuming "pg_toast.pg_toast_1361926"
INFO:  "pg_toast_1361926": found 0 removable, 3700 nonremovable row versions in 680 out of 465143 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 108425110
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 464463 frozen pages.
0 pages are entirely empty.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.05 s.
VACUUM

The analyze has no timing output, and the vacuum on the main table takes around 20-40 seconds. Analyze and vacuum together usually finish in under a minute.

Once every 35-45 days, the vacuum takes much longer, as several minutes are spent vacuuming each of the table's 39 indexes. There are two variations to this:

  1. No index row versions to be removed

Example output:

INFO:  index "idx_a_box_events_node_id" now contains 1347181817 row versions in 4038010 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 5.44 s, system: 13.45 s, elapsed: 230.59 s.

The entire vacuum's final output then looks like this:

INFO:  "a_box_events": found 0 removable, 2837554 nonremovable row versions in 340887 out of 153111143 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 108429069
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 152770256 frozen pages.
0 pages are entirely empty.
CPU: user: 669.94 s, system: 870.59 s, elapsed: 10501.17 s.
  2. Index row versions to be removed

Example output:

INFO:  scanned index "idx_a_box_events_node_id" to remove 2524 row versions
DETAIL:  CPU: user: 49.34 s, system: 11.42 s, elapsed: 198.63 s

and:

INFO:  index "idx_a_box_events_node_id" now contains 1228052362 row versions in 3478524 pages
DETAIL:  2524 index row versions were removed.
6 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

The entire vacuum's final output then looks like this:

INFO:  "a_box_events": found 56 removable, 3851482 nonremovable row versions in 461834 out of 139126225 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 97583006
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 138664391 frozen pages.
0 pages are entirely empty.
CPU: user: 2367.93 s, system: 761.34 s, elapsed: 8138.76 s.
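With 39 indexes on a table this size, most of the elapsed time in these slow runs goes into the per-index scans. Their number and sizes can be listed with a standard catalog query, e.g.:

SELECT i.indexrelid::regclass                         AS index_name,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size
FROM   pg_index i
WHERE  i.indrelid = 'analytics.a_box_events'::regclass
ORDER  BY pg_relation_size(i.indexrelid) DESC;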

Such long run times are currently more of a nuisance than a show-stopper. An upgrade to the latest Postgres server version is planned for late 2023. At that point we will try the INDEX_CLEANUP option of VACUUM, doing the full index cleanup only once every weekend in order to avoid the costly cleanup of the indexes in the daily VACUUM. Until then we apparently have no choice but to be patient.
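The rough idea for after the upgrade, sketched here for completeness (untested so far; INDEX_CLEANUP is only available from PostgreSQL 12 on):

-- Nightly, after the load: skip the per-index scans entirely.
VACUUM (FREEZE, VERBOSE, INDEX_CLEANUP OFF) analytics.a_box_events;

-- Once every weekend: allow the full index cleanup.
VACUUM (FREEZE, VERBOSE, INDEX_CLEANUP ON) analytics.a_box_events;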

Edit (2022-08-24): We have, in the meantime, partitioned our large table into monthly slices, currently giving us 56 partitions with about 75-90 GB of table data and 45-55 GB of indexes per partition. Once a night new data is added to the current month's partition, followed by an ANALYZE (VERBOSE); we no longer do a daily VACUUM. On the first Sunday of each month we additionally do a VACUUM (FULL, ANALYZE) of the previous month's partition.
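For the record, the partitioned layout looks roughly like this (column names, the partition key and the partition names are illustrative only):

CREATE TABLE analytics.a_box_events (
    event_id   bigint      NOT NULL,
    node_id    bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE analytics.a_box_events_2022_08
    PARTITION OF analytics.a_box_events
    FOR VALUES FROM ('2022-08-01') TO ('2022-09-01');

-- Nightly: INSERT into the current month's partition, then:
ANALYZE (VERBOSE) analytics.a_box_events_2022_08;

-- First Sunday of each month, on the previous month's partition:
VACUUM (FULL, ANALYZE) analytics.a_box_events_2022_07;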

As expected, the daily ANALYZE (VERBOSE) on a single partition is quite fast, averaging around 16-18 seconds per run. It starts at nearly 0 seconds on the first day of a month and grows to a maximum toward the end of the month, after which the nightly INSERT and ANALYZE move on to a new partition and become fast again:
ANALYZE performance graph

Best Answer

My theory on what is happening here is that for an INSERT-only table, vacuum only needs to visit the parts of the table dirtied by the inserts, and can skip the indexes altogether.

But if it finds even one dead tuple (in v11), then it needs to scan the entirety of all the indexes, and that can take a long time. I'm surprised it takes over 2 hours, but that is a mighty big table, so the indexes will be mighty big as well. Successful INSERTs and COPYs don't generate any dead tuples, but even one UPDATE or DELETE does, and so does any INSERT or COPY that rolled back. Does this theory seem compatible with what you are seeing? Do you ever DELETE or UPDATE even one tuple, or ever have an INSERT that fails?
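You can check whether anything has left dead tuples behind before the vacuum runs, for example via pg_stat_user_tables (the counts are estimates from the statistics collector, but good enough to spot the pattern):

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM   pg_stat_user_tables
WHERE  schemaname = 'analytics'
  AND  relname = 'a_box_events';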

You could run a VACUUM at 9pm; then, if it does have to scan all the indexes, the one you run after the bulk load won't have to do it again--unless the bulk load itself is the thing which generated the dead tuples.

But what if you instead just run ANALYZE, then VACUUM, separately and in that order, after the bulk load? Is the problem that a long VACUUM uses too much IO when people start work and slows things down, or that a long VACUUM means the ANALYZE has not yet finished, so you get bad plans at the start of the work day?

Another problem is that if you are not doing anything to freeze parts of the table, then once an anti-wraparound VACUUM does kick in, you could be in for a very bad time. The first VACUUM FREEZE you do will generate an enormous amount of IO and take a long time, but after that it will only have to freeze small parts of the table at a time, which should be manageable. The nice thing about VACUUM FREEZE, unlike the index-scanning part of a VACUUM, is that if it is interrupted it doesn't lose all the work done so far, because pages recorded as frozen in the visibility map can be skipped next time. So you can start a VACUUM FREEZE and just cancel it if it seems to be causing problems, or if you are leaving your desk and won't be around to monitor it for a while. Then, once you are caught up, you can change the morning post-bulk-load VACUUM to a VACUUM FREEZE.
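If you want to see how much of the table is already frozen (and therefore skippable by the next freeze run), the pg_visibility contrib extension can summarize the visibility map, e.g.:

-- Requires: CREATE EXTENSION pg_visibility;
SELECT all_visible, all_frozen
FROM   pg_visibility_map_summary('analytics.a_box_events');
-- Compare all_frozen against relpages in pg_class to get the frozen fraction.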