Postgresql – TOAST Table Growth Out of Control – FULLVAC Does Nothing

postgresqlpostgresql-8.4

Recently, I've had a PostgreSQL 8.2.11 server upgraded to 8.4 in order to take advantage of autovacuum features and be in line with 30ish other PGSQL servers. This was done by a separate IT group who administrates the hardware, so we don't have much choice on any other upgrades (won't see 9+ for a while). The server exists in a very closed environment (isolated network, limited root privileges) and runs on RHEL5.5 (i686). After the upgrade, the database has constantly been growing to the tune of 5-6 GB a day. Normally, the database, as a whole, is ~20GB; currently, it is ~89GB. We have a couple other servers which run equivalent databases and actually synchronize the records to each other via a 3rd party application (one I do not have access to the inner workings). The other databases are ~20GB as they should be.

Running the following SQL, it's fairly obvious there's an issue with a particular table, and, more specifically, its TOAST table.

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_relation_size(C.oid)) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY pg_relation_size(C.oid) DESC
  LIMIT 20;

Which produces:

              relation              |  size  
------------------------------------+---------  
  pg_toast.pg_toast_16874           | 89 GB  
  fews00.warmstates                 | 1095 MB  
  ...  
(20 rows)

This TOAST table is for a table called "timeseries" which saves large records of blobbed data. A SUM(LENGTH(blob)/1024./1024.) of all the records in timeseries yields ~16GB for that column. There should be no reason this table's TOAST table should be as large as it is.

I've performed a VACUUM FULL VERBOSE ANALYZE timeseries, and the vacuum runs to completion with no errors.

INFO: vacuuming "pg_toast.pg_toast_16874"
INFO: "pg_toast_16874": found 22483 removable, 10475318 nonremovable row versions in 10448587 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 37 to 2036 bytes long.
There were 20121422 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
4944885 pages are or will become empty, including 0 at the end of the table.
4944885 pages containing 0 free bytes are potential move destinations.
CPU 75.31s/29.59u sec elapsed 877.79 sec.
INFO: index "pg_toast_16874_index" now contains 10475318 row versions in 179931 pages
DETAIL: 23884 index row versions were removed.
101623 index pages have been deleted, 101623 are currently reusable.
CPU 1.35s/2.46u sec elapsed 21.07 sec.

REINDEXed the table which freed some space (~1GB). I can't CLUSTER the table as there isn't enough space on disk for the process, and I'm waiting to rebuild the table entirely as I'd like to find out why it is so much bigger than equivalent databases we have.

Ran a query from the PostgreSQL wiki here – "Show Database Bloat", and this is what I get:

current_database | schemaname |           tablename            | tbloat | wastedbytes |              iname              | ibloat | wastedibytes  
-----------------+------------+--------------------------------+--------+-------------+---------------------------------+--------+--------------  
ptrdb04          | fews00     | timeseries                     |    1.0 |           0 | idx_timeseries_synchlevel       |    0.0 |            0  
ptrdb04          | fews00     | timeseries                     |    1.0 |           0 | idx_timeseries_localavail       |    0.0 |            0  
ptrdb04          | fews00     | timeseries                     |    1.0 |           0 | idx_timeseries_expirytime       |    0.0 |            0  
ptrdb04          | fews00     | timeseries                     |    1.0 |           0 | idx_timeseries_expiry_null      |    0.0 |            0  
ptrdb04          | fews00     | timeseries                     |    1.0 |           0 | uniq_localintid                 |    0.0 |            0  
ptrdb04          | fews00     | timeseries                     |    1.0 |           0 | pk_timeseries                   |    0.1 |            0  
ptrdb04          | fews00     | idx_timeseries_expiry_null     |    0.6 |           0 | ?                               |    0.0 |            0

It looks like the database doesn't consider this space as "empty," at all, but I just don't see where all the disk space is coming from!

I suspect that this database server is deciding to use 4-5x as much disk space to save the same records pulled from the other data servers. My question is this: Is there a way I can verify the physical disk size of a row? I'd like to compare the size of one row on this database to another "healthy" database.

Thanks for any help you can provide!

UPDATE 1

I ended up rebuilding the table from a dumped schema due to its size (couldn't leave it alone for another day). After synchronizing the data, via the software synch process, the TOAST table was ~35GB; however, I could only account for ~9GB of it from that blob column which should be the longest in terms of values. Not sure where the other 26GB is coming from. CLUSTERed, VACUUM FULLed, and REINDEXed to no avail. The postgresql.conf files between the local and remote data servers are exactly the same. Is there any reason this database might be trying to store each record with a larger space on disk?

UPDATE 2 – Fixed

I finally decided to just completely rebuild the database from the ground up- even going as far as to reinstall the PostgreSQL84 packages on the system. The database path was reinitialized and tablespaces wiped clean. The 3rd party software synchronization process repopulated the tables, and the final size came out to be ~12GB! Unfortunately, this, in no way, helps to solve what the exact source of the issue was here. I'm going to watch it for a day or two and see if there are any major differences with how the revitalized database is handling the TOAST table and post those results here.

Relation Size


ptrdb04=> SELECT nspname || '.' || relname AS "relation",
ptrdb04->     pg_size_pretty(pg_relation_size(C.oid)) AS "size"
ptrdb04->   FROM pg_class C
ptrdb04->   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
ptrdb04->   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ptrdb04->   ORDER BY pg_relation_size(C.oid) DESC
ptrdb04->   LIMIT 2;

        relation         |  size   
-------------------------+---------
 pg_toast.pg_toast_17269 | 18 GB
 fews00.warmstates       | 1224 MB
(2 rows)

VACUUM VERBOSE ANALYZE timeseries;

INFO:  "timeseries": found 12699 removable, 681961 nonremovable row versions in 58130 out of 68382 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 105847 unused item pointers.
0 pages are entirely empty.
CPU 0.83s/2.08u sec elapsed 33.36 sec.
INFO:  vacuuming "pg_toast.pg_toast_17269"
INFO:  scanned index "pg_toast_17269_index" to remove 2055849 row versions
DETAIL:  CPU 0.37s/2.92u sec elapsed 13.29 sec.
INFO:  "pg_toast_17269": removed 2055849 row versions in 518543 pages
DETAIL:  CPU 8.60s/3.21u sec elapsed 358.42 sec.
INFO:  index "pg_toast_17269_index" now contains 7346902 row versions in 36786 pages
DETAIL:  2055849 index row versions were removed.
10410 index pages have been deleted, 5124 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  "pg_toast_17269": found 1286128 removable, 2993389 nonremovable row versions in 1257871 out of 2328079 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 18847 unused item pointers.
0 pages are entirely empty.
CPU 26.56s/13.04u sec elapsed 714.97 sec.
INFO:  analyzing "fews00.timeseries"
INFO:  "timeseries": scanned 30000 of 68382 pages, containing 360192 live rows and 0 dead rows; 30000 rows in sample, 821022 estimated total rows

The only noticeable difference after the rebuild (other than disk usage) is

INFO:  "pg_toast_17269": found 1286128 removable, 2993389 nonremovable row versions

as @CraigRinger mentioned in a comment. The nonremovable row count is much smaller than before.

New question: Can other tables affect the size of another table? (via foreign keys and such) Rebuilding the table did nothing, yet rebuilding the whole database proved to fix the problem.

Best Answer

This:

INFO: "pg_toast_16874": found 22483 removable, 10475318 nonremovable row versions in 10448587 pages 22483 removable, 10475318 nonremovable row versions in 10448587 pages

suggests that the underlying issue is that something can still "see" those rows so they can't be removed.

The candidates for that are:

  • Lost prepared transactions. Check pg_catalog.pg_prepared_xacts; it should be empty. Also run SHOW max_prepared_transactions; it should report zero.

  • Long-running sessions with an open, idle transaction. In PostgreSQL 8.4 and above this should only be an issue for SERIALIZABLE transactions. Check pg_catalog.pg_stat_activity for <IDLE> in transaction sessions.

Most likely you have a client that's failing to commit or rollback transactions during long idle periods.

If this doesn't turn out to be it, the next thing I'd check would be to do a sum of the octet_size of each column of the table of interest. Compare that to the pg_relation_size of the table and its TOAST side-table. If there's a big difference then the space consumed is likely by no longer visible rows and you probably do have table bloat issues. If they're quite similar, you can start narrowing down where the space use is by summing up the octet sizes per column, getting the top 'n' values, etc.