Postgresql – VACUUM VERBOSE outputs, nonremovable “dead row versions cannot be removed yet”

postgresqlvacuum

I have a Postgres 9.2 DB where a certain table has lots of nonremovable dead rows:

# SELECT * FROM public.pgstattuple('mytable');
 table_len  | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent 
------------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
 2850512896 |      283439 | 100900882 |          3.54 |          2537195 |     2666909495 |              93.56 |   50480156 |         1.77
(1 row)

Normal vacuuming also shows lots of nonremovable dead rows:

# VACUUM VERBOSE mytable;
[...]
INFO:  "mytable": found 0 removable, 2404332 nonremovable row versions in 309938 out of 316307 pages
DETAIL:  2298005 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 1.90s/2.05u sec elapsed 16.79 sec.
[...]

The table only has around 300.000 actual data rows, but 2.3 million dead rows (and this appears to make certain queries very slow).

According to SELECT * FROM pg_stat_activity where xact_start is not null and datname = 'mydb' order by xact_start; there is no old transaction accessing the database. The oldest transactions are some minutes old and haven't modified anything on the table yet.

I've also checked select * from pg_prepared_xacts (to check for prepared transactions) and select * from pg_stat_replication (to check for pending replications), both of which are empty.

There are lots of inserts, updates and deletes performed on that table, so I can understand that lots of dead rows are being created. But why aren't they removed by the VACUUM command?

Best Answer

The oldest transactions are some minutes old and haven't modified anything on the table yet.

That's not sufficient. I think what is required to mark these rows as dead is that, when these transactions were started, there was no other transaction that had touched these rows (doing an UPDATE or DELETE on them).

Updating or deleting a row will keep the previous version of the row physically where it was, and set its xmax field to the TXID of the current transaction. From the point of view of other transactions, this old version of the row is still visible if it is part of their snapshot. Each snapshot has an xmin and xmax to which the xmin and xmax of the row versions can be compared. The point is that VACUUM must compare row versions against the combined visibility of all live snapshots, as opposed to simply checking if a row change is definitely committed. The latter is necessary but not sufficient to recycle the space used by the old version.

For example, here's a sequence of events such that VACUUM can't clean up dead rows even though the transaction that modified them has finished:

t0: Long running transaction TX1 starts
t0+30mn: TX2 starts and sets itself in REPEATABLE READ mode.
t0+35mn: TX1 finishes.
t0+40mn: pg_stat_activity shows only the 10-mn old TX2
t0+45mn: VACUUM runs but won't eliminate the old versions of the rows modified by TX1 (because TX2 might need them).

Related Solutions

Postgresql – Partial vacuum and whole table vacuum

I don't think that you can tell from the "verbose" listing whether any pages were skipped because they were known to contain only visible tuples.

I think that it will only vacuum pages flagged as "all visible" if it decides to freeze tuples in the table, either because of approaching transaction wraparound or because you explicitly ran VACUUM with the FREEZE option. (If you use the vacuumdb command-line executable, this is the -F option.)

In our shop we supplement the autovacuum activity with a scheduled VACUUM FREEZE ANALYZE on the database during non-peak hours. That minimizes the overhead of autovacuum during peak hours, and slightly improves query performance, because visibility checks are simpler.

Postgresql – TOAST Table Growth Out of Control – FULLVAC Does Nothing

This:

INFO: "pg_toast_16874": found 22483 removable, 10475318 nonremovable row versions in 10448587 pages 22483 removable, 10475318 nonremovable row versions in 10448587 pages

suggests that the underlying issue is that something can still "see" those rows so they can't be removed.

The candidates for that are:

Lost prepared transactions. Check pg_catalog.pg_prepared_xacts; it should be empty. Also run SHOW max_prepared_transactions; it should report zero.
Long-running sessions with an open, idle transaction. In PostgreSQL 8.4 and above this should only be an issue for SERIALIZABLE transactions. Check pg_catalog.pg_stat_activity for <IDLE> in transaction sessions.

Most likely you have a client that's failing to commit or rollback transactions during long idle periods.

If this doesn't turn out to be it, the next thing I'd check would be to do a sum of the octet_size of each column of the table of interest. Compare that to the pg_relation_size of the table and its TOAST side-table. If there's a big difference then the space consumed is likely by no longer visible rows and you probably do have table bloat issues. If they're quite similar, you can start narrowing down where the space use is by summing up the octet sizes per column, getting the top 'n' values, etc.

Best Answer

Related Solutions

Postgresql – Partial vacuum and whole table vacuum

Postgresql – TOAST Table Growth Out of Control – FULLVAC Does Nothing

Related Question