Postgresql – Postgres 9.4.5 small table grows into huge size – periodic VACUUM FULL needed

postgresqlpostgresql-9.4vacuum

We are running a PostgreSQL database on Windows Server. The database version is 9.4.5 (it's been running stable for a long time now so we aren't updating it).

Some of our tables have millions of rows, and they are running fine.

What causes problems is a tiny little table that is only ~2000 rows.
This table is used to store current status reported from remote devices. These devices are sending their updated status every few seconds, so there are quite a few updates in that database each second. Nothing is being removed though, the rows are kept intact they are just being updated every time a device reports new status.

Some time ago I have noticed huge performance drop on our Apache server that uses this database. What I saw was that the status table was >3GB in size while having only 2000 rows. I read a lot on the internet, tried normal VACUUMs but they didn't do anything.

I finally went for VACUUM FULL as I saw that it is safe to use it on versions 9.0 and above. That really helped, the table size dropped just to a few KBs!

However, the problem is happening periodically every few days, and now when I try to run VACUUM FULL it takes a very long time, and our whole service is unusable during that time.

What could I do to prevent this from happening? Autovacuum is turned on, but it seems to do nothing in this particular case.

Output of VACUUM VERBOSE:

username=# vacuum verbose status;
INFO:  vacuuming "public.status"
INFO:  index "status_pkey" now contains 3295731 row versions in 11704 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.07s/0.01u sec elapsed 0.14 sec.
INFO:  "status": found 0 removable, 4472204 nonremovable row versions in 400557 out of 400558 pages
DETAIL:  4470176 dead row versions cannot be removed yet.
There were 23081 unused item pointers.
0 pages are entirely empty.
CPU 2.87s/0.70u sec elapsed 3.64 sec.
INFO:  vacuuming "pg_toast.pg_toast_16753"
INFO:  index "pg_toast_16753_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16753": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

As you can see, there are 0 removable row versions, despite the table has only 2039 physical rows.

SELECT n_live_tup, n_dead_tup
from pg_stat_user_tables
where relname = 'status';

 n_live_tup | n_dead_tup
------------+------------
       2039 |    4473416

At that moment,

SELECT pg_size_pretty(pg_total_relation_size('status'))

reported a size of 3221 MB, for only 2039 rows in the table. That is hilarious.

Best Answer

Something is keeping autovacuum from removing those dead tuples.

These can be only three things that keep VACUUM from doing its job:

A long running concurrent transaction. Check with

SELECT pid, datname, usename, state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

An abandoned replication slot. Check with

SELECT slot_name, slot_type, database, xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC;

An orphaned prepared transaction. Check with

SELECT gid, prepared, owner, database, transaction AS xmin
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

But all that would also block VACUUM (FULL) from shrinking the table. So there must be something extra you are doing when you run VACUUM (FULL).

You have autovacuum enabled, right?

This is what you should do:

Remove whatever is blocking VACUUM.

Make sure autovacuum runs fast enough on that table:

ALTER TABLE status SET (autovacuum_vacuum_cost_delay = 0);

If the data modifications are caused by UPDATE you could also try to get HOT updates. These remove the need for VACUUM. A prerequisite is that the updated column or columns are not indexed.

Related Solutions

Postgresql – Partial vacuum and whole table vacuum

I don't think that you can tell from the "verbose" listing whether any pages were skipped because they were known to contain only visible tuples.

I think that it will only vacuum pages flagged as "all visible" if it decides to freeze tuples in the table, either because of approaching transaction wraparound or because you explicitly ran VACUUM with the FREEZE option. (If you use the vacuumdb command-line executable, this is the -F option.)

In our shop we supplement the autovacuum activity with a scheduled VACUUM FREEZE ANALYZE on the database during non-peak hours. That minimizes the overhead of autovacuum during peak hours, and slightly improves query performance, because visibility checks are simpler.

Postgresql – TOAST Table Growth Out of Control – FULLVAC Does Nothing

This:

INFO: "pg_toast_16874": found 22483 removable, 10475318 nonremovable row versions in 10448587 pages 22483 removable, 10475318 nonremovable row versions in 10448587 pages

suggests that the underlying issue is that something can still "see" those rows so they can't be removed.

The candidates for that are:

Lost prepared transactions. Check pg_catalog.pg_prepared_xacts; it should be empty. Also run SHOW max_prepared_transactions; it should report zero.
Long-running sessions with an open, idle transaction. In PostgreSQL 8.4 and above this should only be an issue for SERIALIZABLE transactions. Check pg_catalog.pg_stat_activity for <IDLE> in transaction sessions.

Most likely you have a client that's failing to commit or rollback transactions during long idle periods.

If this doesn't turn out to be it, the next thing I'd check would be to do a sum of the octet_size of each column of the table of interest. Compare that to the pg_relation_size of the table and its TOAST side-table. If there's a big difference then the space consumed is likely by no longer visible rows and you probably do have table bloat issues. If they're quite similar, you can start narrowing down where the space use is by summing up the octet sizes per column, getting the top 'n' values, etc.

Best Answer

Related Solutions

Postgresql – Partial vacuum and whole table vacuum

Postgresql – TOAST Table Growth Out of Control – FULLVAC Does Nothing

Related Question