Postgresql – Why is VACUUM FULL necessary here

postgresqlvacuum

Queries are running very slowly on a table with 80k rows.

SELECT count(*) FROM task takes <100ms. SELECT type, count(*) FROM task GROUP BY type takes 30,000ms.

Basically, any query that is not index-only takes excruciatingly long.


I kill the connections and vacuum repeatedly.

rivet=> VACUUM VERBOSE task;
INFO:  vacuuming "public.task"
INFO:  scanned index "task_id_idx" to remove 1830 row versions
DETAIL:  CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.36 s
INFO:  scanned index "task_type_queue_idx" to remove 1830 row versions
DETAIL:  CPU: user: 0.06 s, system: 0.00 s, elapsed: 0.25 s
INFO:  "task": removed 1830 row versions in 41 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  index "task_id_idx" now contains 87135 row versions in 23995 pages
DETAIL:  1830 index row versions were removed.
23488 index pages have been deleted, 23428 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "task_type_queue_idx" now contains 8218 row versions in 28362 pages
DETAIL:  915 index row versions were removed.
28219 index pages have been deleted, 28140 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "task": found 666 removable, 3874 nonremovable row versions in 98 out of 1005331 pages
DETAIL:  3174 dead row versions cannot be removed yet, oldest xmin: 70916502
There were 16134 unused item pointers.
Skipped 0 pages due to buffer pins, 1003551 frozen pages.
0 pages are entirely empty.
CPU: user: 0.15 s, system: 0.00 s, elapsed: 0.83 s.
INFO:  vacuuming "pg_toast.pg_toast_133792"
INFO:  index "pg_toast_133792_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_toast_133792": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 70916502
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

But with no success.

Then I vacuum "full."

rivet=> VACUUM (VERBOSE, FULL) task;
INFO:  vacuuming "public.task"
INFO:  "task": found 931 removable, 83522 nonremovable row versions in 1005331 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 3.30 s, system: 0.28 s, elapsed: 29.96 s.
VACUUM

Now my table scan query over 80k rows takes <200ms.


Why was VACUUM FULL necessary?

I notice that before full vacuum there were 1M pages, and now there are <400.

Best Answer

Your table clearly has had a troubled history and became extremely bloated. However, nothing in the log files you showed us indicate what the nature of that history is.

It could be that autovacuum was turned off for a time, or misconfigured and so ran too slow to keep up, or that there was a transaction held open for a very long time which prevented vacuum from being effective, or maybe someone was constantly locking the table in a mode stronger than ROW EXCLUSIVE, causing autovacuum to yield before finishing everytime it tried to run.