Without going into the details of how it happened: it all started with a corrupt DB. It was OK on the file-system level, but trying to:
select count(*) from mytable
resulted in the following:
ERROR: could not read block 257798 in file "pg_tblspc/16386/PG_9.6_201608131/16385/16506.1": read only 0 of 8192 bytes
The table is very large (~200GB) and contains some bytea
columns with binary data. It is never updated, only inserted into and selected from.
I ran pg_dump on it; it worked for a long time and produced a ~200GB file, but failed at the end with a similar error:
pg_dump -Z9 -Fc -d mydatabase -t mytable -v -f /datadir/mytable.backup
pg_dump: Dumping the contents of table "mytable" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: could not read block 257798 in file "pg_tblspc/16386/PG_9.6_201608131/16385/16506.1": read only 0 of 8192 bytes
pg_dump: The command was: COPY public.mytable (id, account_id, fetched, col1, col2, col3, col4, col5) TO stdout;
Then I tried to restore it. Again it ran for a long time, and in the end it took ~200GB of space on that tablespace; however, select count(*) from mytable returns 0.
pg_restore -Fc -d mydatabase -v /datadir/mytable.backup
pg_restore: processing data for table "public.mytable"
and after a long time
pg_restore: could not read from input file: end of file
On server side this resulted in:
ERROR: canceling statement due to user request
CONTEXT: COPY mytable, line 14497030
STATEMENT: COPY mytable (id, account_id, fetched, col1, col2, col3, col4, col5) FROM stdin
LOG: could not send data to client: Connection reset by peer
FATAL: connection to client lost
It looks like pg_restore requested that the whole operation be aborted.
I would like to recover as many rows as possible. I suspect that all the data should still be there, since the table is basically immutable, and that just the last record is corrupted.
Is there a way to force Postgres to keep the records which were restored up till the failed one?
Best Answer
I would suggest you try something along the lines of:
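The query that belongs here was presumably a probing SELECT constrained on the primary key. A minimal sketch, assuming id is an indexed, roughly sequential primary key (both assumptions, since the schema isn't shown); the boundary values are placeholders:

```sql
-- Probe ranges of the table and narrow the range until you find the query
-- that first fails.  Counting a real column (rather than *) forces heap
-- reads, so an index-only scan on id cannot hide the corrupt page.
SELECT count(col1) FROM mytable WHERE id < 10000000;   -- succeeds?
SELECT count(col1) FROM mytable WHERE id < 15000000;   -- fails?
-- ...continue between the last good and the first bad boundary.
```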
... and find at which point it fails. [This may be tedious, but you can just cut in half repeatedly, until you find the value that first fails.]
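The repeated halving is an ordinary binary search over the id range. A sketch of the bookkeeping, with a stand-in probe() (in practice it would run the SELECT above through a driver and report False on the read error; the numbers below are arbitrary mock values):

```python
def find_first_bad(lo, hi, probe):
    """Return the smallest boundary x in [lo, hi] for which probe(x) fails.

    Assumes probe(x) reports whether 'SELECT ... WHERE id < x' succeeds,
    and that every boundary past the corrupt row fails (monotone probe).
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid):        # everything below mid is readable
            lo = mid + 1
        else:                 # the corruption sits at or below mid
            hi = mid
    return lo

# Stand-in for the real database probe: pretend row 1234567 is the last
# readable one, so every boundary above it fails.
print(find_first_bad(1, 2_000_000, lambda x: x <= 1_234_567))  # → 1234568
```

Each probe costs one (possibly slow) query, but only about log2(rows) probes are needed, so even a 200GB table narrows down in a few dozen steps.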
You will most probably need to alter the various settings that affect index usage, because you want to force the database to use the index even if it must scan 99% of the table: the point is to keep it from reading the page that blows up.
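These can be disabled per session. A sketch of the knobs I mean (enable_seqscan and enable_bitmapscan are standard planner settings; which ones matter depends on the plan you actually get, so check with EXPLAIN first):

```sql
-- Per session: discourage plans that walk the whole heap, so the WHERE
-- clause on id is satisfied through the index instead.
SET enable_seqscan = off;
SET enable_bitmapscan = off;

-- Confirm the plan really uses the index before running the big query:
EXPLAIN SELECT count(col1) FROM mytable WHERE id < 10000000;
```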
If you can get SELECT to give you most of the data, you can then do something such as: ... and later on: ...
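The two statements that belong here were presumably along these lines; mytable_salvaged and the boundary ids are illustrative placeholders, to be replaced with the values found by the search above:

```sql
-- First: copy everything readable below the corrupt row into a new table.
CREATE TABLE mytable_salvaged AS
    SELECT * FROM mytable WHERE id < 12345678;   -- last known-good boundary

-- And later on: append whatever is readable above the bad block.
INSERT INTO mytable_salvaged
    SELECT * FROM mytable WHERE id > 12345699;   -- first boundary past it
```

If more than one block is damaged, the same probe-and-copy step just repeats for each gap.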
... and you'll have all the data that could be retrieved. I wouldn't drop the old table, in case someone later finds a better method of retrieving the missing information.
Best of luck.