PostgreSQL – Why wouldn't VACUUM ANALYZE clear all dead tuples

postgresql, postgresql-9.3, vacuum

We run a "manual" VACUUM ANALYZE VERBOSE on some of our larger tables after we do major DELETE/INSERT changes to them. This seems to work without issue although sometimes a table's VACUUM job will run for hours (see this post for similar issues and reasoning).

On doing more research I found that we have large tables with a large number of dead tuples even after running VACUUM. For example, here are some of the stats produced from the query in this response.

-[ RECORD 50 ]--+---------------------------
relname         | example_a
last_vacuum     | 2014-09-23 01:43
last_autovacuum | 2014-08-01 01:19
n_tup           |    199,169,568
dead_tup        |    111,048,906
av_threshold    |     39,833,964
expect_av       | *
-[ RECORD 51 ]--+---------------------------
relname         | example_b
last_vacuum     | 2014-09-23 01:48
last_autovacuum | 2014-08-30 12:40
n_tup           |    216,596,624
dead_tup        |    117,224,220
av_threshold    |     43,319,375
expect_av       | *
-[ RECORD 52 ]--+---------------------------
relname         | example_c
last_vacuum     | 2014-09-23 01:55
last_autovacuum | 2014-09-23 18:25
n_tup           |    309,831,136
dead_tup        |    125,047,233
av_threshold    |     61,966,277
expect_av       | *
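
For reference, a report like the one above can be produced with a query along these lines. This is only a sketch against pg_stat_user_tables, not the exact query from the linked answer, and the threshold column assumes the default autovacuum settings (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2):

SELECT relname,
       last_vacuum,
       last_autovacuum,
       n_live_tup                          AS n_tup,
       n_dead_tup                          AS dead_tup,
       round(50 + 0.2 * n_live_tup)        AS av_threshold,
       CASE WHEN n_dead_tup > 50 + 0.2 * n_live_tup
            THEN '*' ELSE '' END           AS expect_av
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC;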

The last field states that these (and most tables) would meet the threshold for autovacuum. However, having just run VACUUM ANALYZE VERBOSE on each of those tables, shouldn't the dead tuple count be 0 (or close to 0, not 125M out of 300M)?

The documentation states:

VACUUM reclaims storage occupied by dead tuples.

Does this mean our VACUUM is not working?


UPDATE

Per the request in the response, here are some logs from the VERBOSE jobs:

INFO:  vacuuming "public.example_1"
INFO:  scanned index "idx_example_1_on_gp_id_and_dd_id" to remove 378386 row versions
DETAIL:  CPU 1.83s/3.42u sec elapsed 23.01 sec.
INFO:  scanned index "index_example_1_on_q_id" to remove 378386 row versions
DETAIL:  CPU 2.10s/3.91u sec elapsed 18.92 sec.
INFO:  "example_1": removed 378386 row versions in 7085 pages
DETAIL:  CPU 0.09s/0.05u sec elapsed 0.19 sec.
INFO:  index "idx_example_1_on_gp_id_and_dd_id" now contains 30347438 row versions in 291065 pages
DETAIL:  378386 index row versions were removed.
165587 index pages have been deleted, 164287 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_example_1_on_q_id" now contains 30347438 row versions in 333287 pages
DETAIL:  378386 index row versions were removed.
152773 index pages have been deleted, 152757 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "example_1": found 1773 removable, 401984 nonremovable row versions in 14438 out of 1493006 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 10567 unused item pointers.
0 pages are entirely empty.
CPU 4.26s/7.51u sec elapsed 46.10 sec.
INFO:  vacuuming "pg_toast.pg_toast_17917"
INFO:  index "pg_toast_17917_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_17917": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.example_1"
INFO:  "example_1": scanned 30000 of 1493006 pages, containing 611502 live rows and 0 dead rows; 30000 rows in sample, 40563141 estimated total rows

This table now shows 0 dead tuples in the stats. Most of the tables show far fewer dead tuples this morning, so either our VACUUM or autovacuum is working.

We do have a handful of tables whose VACUUM VERBOSE output reports nothing removed and yet still show dead tuples in the stats:

-[ RECORD 49 ]--+---------------------------
relname         | example_2
last_vacuum     | 2014-09-23 02:23
last_autovacuum | 2014-09-02 14:30
n_tup           |    117,914,944
dead_tup        |     34,507,388
av_threshold    |     23,583,039
expect_av       | *

A couple of times I have seen in the logs that the indexes get scanned over and over again. This seems to correspond to long-running VACUUM jobs. Any idea why? Is this just working around record locking? (I don't think any writes were happening while this job ran.)

INFO:  vacuuming "public.example_2"
...
INFO:  scanned index "index_example_2_on_gsg_id_and_dd_id" to remove 2795959 row versions
DETAIL:  CPU 3.88s/16.54u sec elapsed 23.09 sec.
INFO:  scanned index "index_example_2_on_q_id" to remove 2795959 row versions
DETAIL:  CPU 6.74s/21.13u sec elapsed 84.64 sec.
INFO:  "example_2": removed 2795959 row versions in 48214 pages
DETAIL:  CPU 0.71s/0.32u sec elapsed 33.65 sec.
INFO:  scanned index "index_example_2_on_gsg_id_and_dd_id" to remove 2591011 row versions
DETAIL:  CPU 2.84s/16.11u sec elapsed 19.28 sec.
INFO:  scanned index "index_example_2_on_q_id" to remove 2591011 row versions
DETAIL:  CPU 5.46s/22.70u sec elapsed 130.57 sec.
INFO:  "example_2": removed 2591011 row versions in 45539 pages
DETAIL:  CPU 0.67s/0.38u sec elapsed 15.16 sec.
INFO:  index "index_example_2_on_gsg_id_and_dd_id" now contains 123807784 row versions in 1560915 pages
DETAIL:  108836958 index row versions were removed.
1100790 index pages have been deleted, 718471 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.25 sec.
INFO:  index "index_example_2_on_q_id" now contains 123807784 row versions in 1886087 pages
DETAIL:  110336259 index row versions were removed.
1058063 index pages have been deleted, 266983 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.07 sec.
INFO:  "example_2": found 124808 removable, 1355901 nonremovable row versions in 2086343 out of 6966379 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 7858495 unused item pointers.
0 pages are entirely empty.
CPU 595.49s/2130.13u sec elapsed 5656.34 sec.
INFO:  vacuuming "pg_toast.pg_toast_18079"
INFO:  index "pg_toast_18079_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_18079": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.example_2"
INFO:  "example_2": scanned 30000 of 6966379 pages, containing 528443 live rows and 522 dead rows; 30000 rows in sample, 152953760 estimated total rows

Best Answer

VACUUM can only remove dead tuples which are long-dead, that is, dead to all possible uses. If you have long-lived transactions, they may prevent the recently-dead tuples from being removed.

This is an example of a situation where a long-lived transaction prevented removal:

INFO:  "pgbench_accounts": found 0 removable, 2999042 nonremovable row versions in 49181 out of 163935 pages
DETAIL:  2999000 dead row versions cannot be removed yet.
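
A quick way to check whether something is holding a transaction open is a query along these lines (just a sketch against pg_stat_activity; all of these columns exist in 9.3):

SELECT pid,
       state,
       xact_start,
       now() - xact_start AS xact_age,
       query
FROM   pg_stat_activity
WHERE  xact_start IS NOT NULL
ORDER  BY xact_start;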

Strictly speaking, it is not long-lived transactions but long-lived snapshots that cause this. A long-running SELECT or INSERT statement will certainly do it. For isolation levels higher than READ COMMITTED, the whole transaction retains its snapshot until it is done, so if someone opens a REPEATABLE READ transaction and then goes on vacation without committing it, that would be a problem. Hung prepared transactions will do so as well (if you don't know what a prepared transaction is, you probably aren't using them).
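
Forgotten prepared transactions are visible in the pg_prepared_xacts view; if it returns no rows, you can rule them out:

SELECT gid, prepared, owner, database
FROM   pg_prepared_xacts
ORDER  BY prepared;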

The examples you show don't indicate a problem, but you also say the problem had resolved itself by then. If this is a recurring problem, you should probably start logging the output of your VACUUM VERBOSE statements so that you can find the output that covers the period during which the problem exists.

The multiple passes over the indexes are because of your maintenance_work_mem setting. VACUUM can only remember one dead tuple for every 6 bytes of that memory; once the list fills, it must scan all the indexes to remove those entries before resuming the heap scan, so it makes additional index passes if there are more dead tuples than fit. Increasing maintenance_work_mem will help.
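
As a rough sketch of the sizing, using numbers from the example_2 log above: each index pass removed about 2.8 million tuples, which at 6 bytes per tuple is roughly 16 MB, i.e. the default maintenance_work_mem. The whole run removed on the order of 110 million index entries, which would need around 650 MB to fit in a single pass, so something like the following before the manual VACUUM should avoid the repeated index scans (the dead-tuple list is capped at 1GB regardless of a higher setting):

SET maintenance_work_mem = '1GB';           -- session-local; does not change the server-wide setting
VACUUM (ANALYZE, VERBOSE) public.example_2;
RESET maintenance_work_mem;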