PostgreSQL 9.2: FATAL: XX000: relation mapping file "global/pg_filenode.map" contains incorrect checksum

corruption postgresql-9.2

We have a ~400 GB Postgres database (v9.2.6, running on CentOS 6.5) whose server ran into a free-space problem, after which every command (ls, pwd, etc.) failed with a segmentation fault. We were still able to read the disk, so we copied the pg_data directory and rebuilt the server. When we put pg_data back and tried to start the Postgres service, we saw this error:

postgres - FATAL:  XX000: relation mapping file "global/pg_filenode.map" 
contains incorrect checksum

I know the easiest answer would be to restore the latest backup. Unfortunately, the last pg_basebackup was taken just over 3 weeks ago and we only retain one week's worth of WAL, so I do not have enough WAL to bring the backup up to date. I know this is a flaw in our setup, and I have learned some lessons.

I tried using the pg_filenode.map file from the 2-week-old backup, but that produced the same checksum error. I will note that a VACUUM FULL was run over this past weekend on one of the larger tables, but it completed before the failure.

Is there any chance I can recover from this error? Failing that, are there any ideas for reading some of the smaller tables' data, or the schema, functions, and views, to help me rebuild a new, clean database?

Best Answer

Before doing anything else, I would make a copy of the cluster as it is and keep it safe, following the directions here:

https://wiki.postgresql.org/wiki/Corruption
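
As a rough illustration of "copy it and keep it safe", here is a minimal Python sketch that copies a data directory and then compares file hashes between the original and the copy. It assumes the server is stopped, and the function names (`file_digest`, `tree_digests`, `safe_copy`) are my own for illustration, not part of any PostgreSQL tooling.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict:
    """Map each regular file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def safe_copy(src: Path, dst: Path) -> bool:
    """Copy src to a new directory dst and confirm the contents match."""
    # symlinks=True keeps symbolic links as links instead of following them;
    # copytree raises an error if dst already exists, so nothing is overwritten.
    shutil.copytree(src, dst, symlinks=True)
    return tree_digests(src) == tree_digests(dst)
```

Something like `safe_copy(Path("/path/to/pg_data"), Path("/path/to/pg_data.master"))` (both paths are placeholders) would then return True only if every regular file in the copy hashes the same as its original. All further experiments should happen on yet another copy, never on the master copy.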

Once you have your master copy safe, you might try running a fresh initdb in another directory and copying its pg_filenode.map file into a working copy of the damaged cluster, as suggested here, to see if that helps start your database:

http://tapoueh.org/blog/2013/09/16-PostgreSQL-data-recovery
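
A minimal sketch of that swap, under some assumptions: the fresh cluster was created by initdb from the same major version (9.2), you are working on a copy of the damaged cluster rather than the master copy, and the mapped system catalogs still have the filenodes a fresh cluster would give them (if they do not, the fresh map file may point at the wrong files). The helper name `replace_map_file` and the `.damaged` suffix are my own choices.

```python
import shutil
from pathlib import Path

# Location of the shared relation map inside a data directory.
MAP_REL = Path("global") / "pg_filenode.map"


def replace_map_file(work_copy: Path, fresh_cluster: Path) -> Path:
    """Swap in the fresh cluster's map file; return where the old one was saved."""
    damaged = work_copy / MAP_REL
    fresh = fresh_cluster / MAP_REL
    if not fresh.is_file():
        raise FileNotFoundError(f"no map file in fresh cluster: {fresh}")
    saved = damaged.with_name(damaged.name + ".damaged")
    if saved.exists():
        raise FileExistsError(f"refusing to overwrite earlier backup: {saved}")
    if damaged.exists():
        shutil.copy2(damaged, saved)  # keep the damaged file for later study
    shutil.copy2(fresh, damaged)      # install the fresh map in the working copy
    return saved
```

The function refuses to run twice on the same working copy, so the original damaged file is never lost to a second attempt.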

If that doesn't work, or you encounter more errors, I would ask on the pgsql-general@postgresql.org mailing list for more suggestions, or engage a PostgreSQL consultancy to try to recover your data.

If it does work, I would run pg_dump immediately, then initdb a new cluster on the new machine and reload the dump, just to make sure nothing else is lurking that could cause problems.
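
If the server does come up, the dump step could be scripted roughly as follows. This is only a sketch: the database names, output directory, and port are placeholders you would supply, and the helper names `dump_commands` and `run_all` are hypothetical. It uses pg_dumpall with --globals-only for roles and tablespaces, and pg_dump in custom format (-Fc) for each database.

```python
import subprocess
from pathlib import Path


def dump_commands(dbnames, out_dir: Path, port: int = 5432):
    """Build pg_dumpall/pg_dump argument lists without running them."""
    cmds = [
        # Roles and tablespaces live outside any single database.
        ["pg_dumpall", "-p", str(port), "--globals-only",
         "-f", str(out_dir / "globals.sql")],
    ]
    for db in dbnames:
        # -Fc writes pg_dump's custom format, which pg_restore can read.
        cmds.append(["pg_dump", "-p", str(port), "-Fc",
                     "-f", str(out_dir / f"{db}.dump"), db])
    return cmds


def run_all(cmds):
    """Run each command in order; stop at the first one that fails."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them lets you review exactly what will be executed before touching the recovered cluster. A dump that fails partway through is itself useful information, since it tells you which database or table still has damage.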

My sympathies on your issue, and best of luck getting it resolved.