Postgresql – Remove characters from PostgreSQL database causing encoding errors

encodingpostgresqlpostgresql-10unicode

I have a PostgreSQL 10 database that uses WIN1252 encoding.

One of my columns has values that cause conversion errors when running a select from pgAdmin 4:

SELECT myfield FROM mydb.myschema."MYTABLE"

Returns:

ERROR:  character with byte sequence 0x9d in encoding "WIN1252" has no 
equivalent in encoding "UTF8"
SQL state: 22P05

Setting the client encoding to WIN1252 in pgAdmin 4 causes it to drop the connection to the database (I remember reading somewhere that pgAdmin 4 has issues with setting the client encoding, but track down the link).

Using psql and setting the client encoding to WIN1252, selecting one of the offending features:

SELECT myfield
FROM mydb.myschema."MYTABLE"
WHERE oid = 12345

Returns:

“A sample comment

Looking at the data that was originally imported into the DB, it appears that “curly quotes” were incorrectly imported.

Is there a way to remove these characters from the column – either replacing them with regular quotation marks, or simply deleting them?

Best Answer

Looking at the WIN1252 character set: https://en.wikipedia.org/wiki/Windows-1252 , there is no character corresponding to that hexadecimal code 9D. The same goes for 81, 8D, 8F, 90.

That's why it "has not equivalent" in UTF8: that character doesn't exist in the first place in the source map.

It doesn't error out when displayed or input in a session configured in WIN1252 encoding. When using the proper code page (chcp 1252) with psql.exe on top of cmd.exe, these characters are displayed in my environment as a question mark inside a rectangular box.

Anyway, you might want to remove these with statements like (when using the WIN1252 client encoding):

UPDATE tablename SET field=replace(field, chr(x'9D'::int), '')
  WHERE strpos(field,chr(x'9D'::int))>0;

Use psql if WIN1252 is unusable with PgAdmin. Once these characters are removed, you'll be able to switch later to UTF8 client encoding.