Context
I am deploying a Vagrant box for my development team, and have encountered a problem restoring UTF-8 collated dumps. I don't believe it is related to the now-infamous "UTF8" has no equivalent in "LATIN1" problem, as my issue seems to lie in the client. Additionally, I have successfully restored exactly the same dump files on other Precise boxes (though precise64) with the same client and server packages (9.1). However, I am uncertain if other supporting libraries may not be present in my Vagrant box.
I can not alter the way in which the dump files are created, but I don't believe the problem lies there, anyway.
Problem
I successfully programatically create and restore a UTF-8 (specifically, en_CA.UTF-8
) dump by creating a table with:
createdb --lc-collate=en_CA.UTF-8 --lc-ctype=en_CA.UTF-8 -E UTF8 -T template0 ${SCHEMA}
(If relevant, I'll add that my other successful restores (on the precise64 system) did not require any additional locale parameters to createdb
.)
Then restore,
pg_restore -d ${SCHEMA} --single-transaction /tmp/${SCHEMA}_-_latest.backup
I have attempted with both postgresql.conf's client_encoding
commented out and manually set to client_encoding = 'UTF8'
; both behave the same.
My /etc/default/locale contains
LC_ALL=en_CA.UTF-8
LANG=en_CA.UTF-8
And yet, at the psql
prompt:
schema=# SELECT * FROM table;
ERROR: character 0xe28099 of encoding "UTF8" has no equivalent in "LATIN1"
schema=# show server_encoding; show client_encoding;
server_encoding
-----------------
UTF8
(1 row)
client_encoding
-----------------
LATIN1
(1 row)
The schema is UTF-8, right (so the commented-out client_encoding
should use this value)?
schema=# SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = ...
pg_encoding_to_char
---------------------
UTF8
(1 row)
Setting it manually at the command prompt works:
schema=# set client_encoding to UTF8
SET
schema=# show client_encoding;
client_encoding
-----------------
UTF8
(1 row)
schema=# SELECT * FROM table;
<EXPECTED DATA>
What have I done wrong?
EDIT for Craig:
vagrant@precise32:~$ locale
LANG=en_CA.UTF-8
LANGUAGE=
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=
Best Answer
psql
detects theclient_encoding
from theLC_CTYPE
variable in the environment; this falls back toLC_ALL
and thenLANG
if unset.In the terminal you're launching
psql
from, runlocale
. e.g.Paste the result as an edit to your question. I strongly suspect your terminal locale will be
en_CA
, noten_CA.UTF-8
.