SSH and character encoding

Tags: character encoding, locale, ssh

When I ssh into my VPS, I have irssi running in screen. When someone sends a Unicode character (such as © or €), irssi displays garbage when I view it through screen in an ssh session. If I instead connect to that irssi through its proxy module, from irssi running on my local computer, the character shows up correctly.

Likewise, if I run ghci on my VPS (outside screen) and type one of those characters, it crashes.

So, obviously, there is a character encoding issue of some sort with my connection to my VPS, either in ssh or the system setup.

How can I find out what is causing this, and solve it?

Details:

Client system

  • Arch Linux x64
  • UTF-8 encoding

VPS system

  • Ubuntu Server 10.04
  • Unknown encoding. How do I find this out? (On Arch I just have to look in /etc/rc.conf.)

Best Answer

Running the locale command will give you information about your locale settings; the character encoding is given by the LC_CTYPE setting.
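As a quick check on the VPS, something like the following sketch reports the encoding of the current session. The helper names session_charmap and is_utf8_session are made up for illustration; the only real commands used are locale and locale charmap. On glibc systems a plain C/POSIX locale typically reports ANSI_X3.4-1968 (ASCII), which would explain garbled or rejected non-ASCII characters.

```shell
# Hypothetical helper: print the character encoding of the current session.
session_charmap() {
    locale charmap 2>/dev/null
}

# Hypothetical helper: succeed only if the session encoding is UTF-8.
is_utf8_session() {
    [ "$(session_charmap)" = "UTF-8" ]
}

if is_utf8_session; then
    echo "session encoding is UTF-8"
else
    echo "session encoding is '$(session_charmap)', not UTF-8"
    # Show every locale category so the variable responsible is visible.
    locale 2>/dev/null || true
fi
```

Running this both directly after logging in over ssh and inside screen can show whether the two environments disagree.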

Under Ubuntu, the default locale settings are given in /etc/default/locale. You can change the character encoding by setting LC_CTYPE in your ~/.profile on the VPS, e.g.

export LC_CTYPE=en_US.UTF-8
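Note that this export only has an effect if LC_ALL is unset or empty: LC_ALL overrides every individual LC_* category, and LANG is only used as a fallback. The sketch below spells out that precedence. effective_ctype is a hypothetical helper, and the final fallback to "C" is an assumption that matches glibc's behaviour when nothing is set.

```shell
# Hypothetical helper: print the locale name that LC_CTYPE resolves to.
# Precedence: LC_ALL, then LC_CTYPE, then LANG; empty values count as unset.
effective_ctype() {
    # The "C" fallback is an assumption (it is what glibc uses by default).
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

echo "LC_CTYPE resolves to: $(effective_ctype)"
```

If this prints something other than en_US.UTF-8 after you log in again, check whether LC_ALL is being set somewhere, for example in /etc/default/locale.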

You'll have to make sure that the en_US.UTF-8 locale is available. Ubuntu only generates locale data for requested locales. All English locales should be available if you have the package language-pack-en-base installed. You can manually request their generation with

sudo locale-gen en

You can also add entries to /var/lib/locales/supported.d/local to make sure a particular locale is installed (e.g., add the line en_US.UTF-8 UTF-8).
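To see whether the locale has actually been generated, you can search the output of locale -a. The sketch below uses a hypothetical helper, lists_locale, which reads a locale list on standard input and ignores case and hyphens when comparing, since glibc usually lists en_US.UTF-8 as en_US.utf8. It only reports what it finds and does not modify the system.

```shell
# Hypothetical helper: succeed if the locale named in $1 appears in the
# list read from stdin. Case and hyphens are ignored, so en_US.UTF-8
# matches en_US.utf8.
lists_locale() {
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -qxF "$want"
}

if locale -a 2>/dev/null | lists_locale en_US.UTF-8; then
    echo "en_US.UTF-8 is already generated"
else
    echo "en_US.UTF-8 is missing; generate it with locale-gen as described above"
fi
```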
