PostgreSQL – Fixing pg_restore Error: Invalid Byte Sequence for UTF8

migrationpostgresql

Lucky me, I get to play DBA today!

I am migrating a Postgres 8.4 DB from one server to another (both RHEL6). I backed up the schema from the old server and restored it to the new server. Then I did a full database backup and restored it to the new server, but I had two tables that were skipped during the restore. In both cases I used the format=c option with pg_dump and pg_restore.

I am now focusing on those two tables, and i need help! I do a backup on the old server:

pg_dump --format=c --file=mytable.sqlc -a --table public.mytable mydatabase

Now I go to the new server and attempt a restore:

pg_restore -d mydatabase -t mytable --format=c -a -c mytable.sqlc

I'm getting the following errors:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2241; 0 23420338 TABLE DATA mytable mydatabase_script
pg_restore: [archiver (db)] COPY failed for table "mytable": ERROR:  invalid byte sequence for encoding "UTF8": 0xe05c62
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT:  COPY mytable, line 231993
WARNING: errors ignored on restore: 1

The error seems to imply that I have an encoding difference between the old db server and the new one (even though I backed up/restored the schema). I did a:

select * from pg_database where datname = "mydatabase"

And it shows that both databases are UTF-8 encoded. What do I need to check next? Thanks.

Best Answer

Your options are pretty simple, and the error message is explicit. This is the problem

ERROR: invalid byte sequence for encoding "UTF8": 0xe05c62

And this hint pretty much tells you what we would tell you,

HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

So you have two options,

  1. Remove the byte sequence (assumes you may not need it), there is no indicator that you know what it's actually supposed to be anyway.
  2. Set the client_encoding to whatever encoding supports that sequence, as UTF-8 doesn't.

You may want to confirm the encoding of your database

SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = 'yourdb';

And confirm that the output encoding (--encoding to pg_dump) supports all of those characters (is the same or a superset). And, that also the new database is either the same encoding or a superset.

Try not to use --format=c, it only adds complexity. If you're having problems use plain, the default.