Postgresql – pg_dump Error while running backup

postgresqlpostgresql-8.4

I encountered an error on PostgresSQL. Here is details for the error which i have seen during the backup.

 pg_dump: SQL command failed
 pg_dump: Error message from server: ERROR: found toasted toast chunk for toast value 16431418 in pg_toast_16431418
 pg_dump: The command was: COPY.XXXXXXXXXXX 

So what I did:

 VACUUM ANALYZE <table_name>;

it showed me the same errors, then I ran:

 SET zero_damaged_pages = on;
 VACUUM FULL <table_name>;

it zeroed out the bad page header, then I ran:

 REINDEX table pg_toast.pg_toast_16306418;

After running this command, I checked the other process which could be waiting but nothing was in the waiting except this query taking up 99% of CPU cycles. I waited for an hour and seems nothing was happening, I terminated the query.

Then I ran full vacuum and reindexing of database but vacuum again failed

 VACUUM
 RESET

vacuum did not complete

I thought zero-ing out the invalid page header would allow me to back up the database but it did not do anything. I have checked the memory, RAM, hardware etc. and that all looks pretty good. Can someone suggest something please?

I am fairly new to postgres so please be specific, if you can.

Best Answer

What the error message suggests is a bit flipped in this pg_toast_16431418 relation on the exact place where the size is stored (the data size has the information that it's toasted in its upper bits). See TOAST in the 8.4 doc for details.

This does not suggest a problem with the page header, or with an index, or with the catalog.

BUG #5929 indicates it may be caused by a hardware problem, especially if the server doesn't use ECC memory.

You might want to try pageinspect to peek at the faulty data. If the damage is just a flipped bit, in theory it may be flipped back by writing directly at the data page as the last resort.

There are many steps involved:

  1. do a file-level copy with the db shut down and work on the copy. Turn off autovacuum or anything that may touch the data.
  2. find the main table corresponding to the pg_toast table. See Postgres pg_toast in autovacuum - which table?
  3. Select each row of this table one by one until the failure occurs. Note the ctid of the row. It contains the offset of the page containing the row, and the offset of the row withing the page.
  4. Dump the page with get_raw_page(relname text, blkno int) and examine the row. Use postgres doc to parse the data. You want to extract the pointer into the pg_toast_NNN relation.
  5. Convert the pointer into a page (I don't know how to do that. I'd need to dive into the doc). Dump the corresponding page of pg_toast_NNN with get_raw_page() again.

It's no small feat if you don't have any prior knowledge of any of that stuff. All the information is in the documentation, but in a compact form.