Postgresql – VACUUM VERBOSE outputs, nonremovable “dead row versions cannot be removed yet”

postgresqlvacuum

I have a Postgres 9.2 DB where a certain table has lots of nonremovable dead rows:

# SELECT * FROM public.pgstattuple('mytable');
 table_len  | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent 
------------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
 2850512896 |      283439 | 100900882 |          3.54 |          2537195 |     2666909495 |              93.56 |   50480156 |         1.77
(1 row)

Normal vacuuming also shows lots of nonremovable dead rows:

# VACUUM VERBOSE mytable;
[...]
INFO:  "mytable": found 0 removable, 2404332 nonremovable row versions in 309938 out of 316307 pages
DETAIL:  2298005 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 1.90s/2.05u sec elapsed 16.79 sec.
[...]

The table only has around 300.000 actual data rows, but 2.3 million dead rows (and this appears to make certain queries very slow).

According to SELECT * FROM pg_stat_activity where xact_start is not null and datname = 'mydb' order by xact_start; there is no old transaction accessing the database. The oldest transactions are some minutes old and haven't modified anything on the table yet.

I've also checked select * from pg_prepared_xacts (to check for prepared transactions) and select * from pg_stat_replication (to check for pending replications), both of which are empty.

There are lots of inserts, updates and deletes performed on that table, so I can understand that lots of dead rows are being created. But why aren't they removed by the VACUUM command?

Best Answer

The oldest transactions are some minutes old and haven't modified anything on the table yet.

That's not sufficient. I think what is required to mark these rows as dead is that, when these transactions were started, there was no other transaction that had touched these rows (doing an UPDATE or DELETE on them).

Updating or deleting a row will keep the previous version of the row physically where it was, and set its xmax field to the TXID of the current transaction. From the point of view of other transactions, this old version of the row is still visible if it is part of their snapshot. Each snapshot has an xmin and xmax to which the xmin and xmax of the row versions can be compared. The point is that VACUUM must compare row versions against the combined visibility of all live snapshots, as opposed to simply checking if a row change is definitely committed. The latter is necessary but not sufficient to recycle the space used by the old version.

For example, here's a sequence of events such that VACUUM can't clean up dead rows even though the transaction that modified them has finished:

  • t0: Long running transaction TX1 starts
  • t0+30mn: TX2 starts and sets itself in REPEATABLE READ mode.
  • t0+35mn: TX1 finishes.
  • t0+40mn: pg_stat_activity shows only the 10-mn old TX2
  • t0+45mn: VACUUM runs but won't eliminate the old versions of the rows modified by TX1 (because TX2 might need them).