We are running a PostgreSQL database on Windows Server. The database version is 9.4.5 (it's been running stable for a long time now so we aren't updating it).
Some of our tables have millions of rows, and they are running fine.
What causes problems is a tiny little table that is only ~2000 rows.
This table is used to store current status reported from remote devices. These devices are sending their updated status every few seconds, so there are quite a few updates in that database each second. Nothing is being removed though, the rows are kept intact they are just being updated every time a device reports new status.
Some time ago I have noticed huge performance drop on our Apache server that uses this database. What I saw was that the status
table was >3GB in size while having only 2000 rows. I read a lot on the internet, tried normal VACUUMs but they didn't do anything.
I finally went for VACUUM FULL
as I saw that it is safe to use it on versions 9.0 and above. That really helped, the table size dropped just to a few KBs!
However, the problem is happening periodically every few days, and now when I try to run VACUUM FULL
it takes a very long time, and our whole service is unusable during that time.
What could I do to prevent this from happening? Autovacuum is turned on, but it seems to do nothing in this particular case.
Output of VACUUM VERBOSE
:
username=# vacuum verbose status;
INFO: vacuuming "public.status"
INFO: index "status_pkey" now contains 3295731 row versions in 11704 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.07s/0.01u sec elapsed 0.14 sec.
INFO: "status": found 0 removable, 4472204 nonremovable row versions in 400557 out of 400558 pages
DETAIL: 4470176 dead row versions cannot be removed yet.
There were 23081 unused item pointers.
0 pages are entirely empty.
CPU 2.87s/0.70u sec elapsed 3.64 sec.
INFO: vacuuming "pg_toast.pg_toast_16753"
INFO: index "pg_toast_16753_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_16753": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
As you can see, there are 0 removable row versions, despite the table has only 2039 physical rows.
SELECT n_live_tup, n_dead_tup
from pg_stat_user_tables
where relname = 'status';
n_live_tup | n_dead_tup
------------+------------
2039 | 4473416
At that moment,
SELECT pg_size_pretty(pg_total_relation_size('status'))
reported a size of 3221 MB, for only 2039 rows in the table. That is hilarious.
Best Answer
Something is keeping autovacuum from removing those dead tuples.
These can be only three things that keep
VACUUM
from doing its job:A long running concurrent transaction. Check with
An abandoned replication slot. Check with
An orphaned prepared transaction. Check with
But all that would also block
VACUUM (FULL)
from shrinking the table. So there must be something extra you are doing when you runVACUUM (FULL)
.You have autovacuum enabled, right?
This is what you should do:
Remove whatever is blocking
VACUUM
.Make sure autovacuum runs fast enough on that table:
If the data modifications are caused by
UPDATE
you could also try to get HOT updates. These remove the need forVACUUM
. A prerequisite is that the updated column or columns are not indexed.