Postgresql – Postgres client unable to detect UTF-8 database

character-setencodingpostgresqlpostgresql-9.1

Context

I am deploying a Vagrant box for my development team, and have encountered a problem restoring UTF-8 collated dumps. I don't believe it is related to the now-infamous "UTF8" has no equivalent in "LATIN1" problem, as my issue seems to lie in the client. Additionally, I have successfully restored exactly the same dump files on other Precise boxes (though precise64) with the same client and server packages (9.1). However, I am uncertain if other supporting libraries may not be present in my Vagrant box.

I can not alter the way in which the dump files are created, but I don't believe the problem lies there, anyway.

Problem

I successfully programatically create and restore a UTF-8 (specifically, en_CA.UTF-8) dump by creating a table with:

createdb --lc-collate=en_CA.UTF-8 --lc-ctype=en_CA.UTF-8 -E UTF8 -T template0 ${SCHEMA}

(If relevant, I'll add that my other successful restores (on the precise64 system) did not require any additional locale parameters to createdb.)

Then restore,

pg_restore -d ${SCHEMA} --single-transaction /tmp/${SCHEMA}_-_latest.backup

I have attempted with both postgresql.conf's client_encoding commented out and manually set to client_encoding = 'UTF8'; both behave the same.

My /etc/default/locale contains

LC_ALL=en_CA.UTF-8
LANG=en_CA.UTF-8

And yet, at the psql prompt:

schema=# SELECT * FROM table;
ERROR:  character 0xe28099 of encoding "UTF8" has no equivalent in "LATIN1"

schema=# show server_encoding; show client_encoding;
 server_encoding 
-----------------
 UTF8
(1 row)

 client_encoding 
-----------------
 LATIN1
(1 row)

The schema is UTF-8, right (so the commented-out client_encoding should use this value)?

schema=# SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = ...
pg_encoding_to_char 
---------------------
 UTF8
(1 row)

Setting it manually at the command prompt works:

schema=# set client_encoding to UTF8
SET
schema=# show client_encoding;
 client_encoding 
-----------------
 UTF8
(1 row)

schema=# SELECT * FROM table;
<EXPECTED DATA>

What have I done wrong?

EDIT for Craig:

vagrant@precise32:~$ locale
LANG=en_CA.UTF-8
LANGUAGE=
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=

Best Answer

psql detects the client_encoding from the LC_CTYPE variable in the environment; this falls back to LC_ALL and then LANG if unset.

In the terminal you're launching psql from, run locale. e.g.

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Paste the result as an edit to your question. I strongly suspect your terminal locale will be en_CA, not en_CA.UTF-8.