SSH – Working with Filenames in Different Encoding

altlinuxcharacter encodingfilenamessshterminal

I'm ssh'ing to a remote system where a different encoding for the filenames (and for the users' locales) has been used. And this causes some problems.

Problems solved by matching the locale settings

Before I move to the problems with filenames, I want to say that some encoding problems with such an ssh session are solved by setting the remote locale so that it matches the local locale, namely,

  • the problems with editing the command line (I pressed Backspace trice, but since on my host the encoding is UTF-8, and on the remote end — KOI8-R, or perhaps CP1251, some 8-bit Cyrillic encodings, this didn't affect my Cyrillic string correctly):

 

[imz@localhost ~]$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
[imz@localhost ~]$ echo привет
привет
[imz@localhost ~]$ echo при
при
[imz@localhost ~]$ ssh -vv ivan@example.com
Last login: Fri Nov 25 13:44:56 2011 from NN.NN.NN.NN
[ivan@dell ~]$ locale
LANG=ru_RU.KOI8-R
LC_CTYPE="ru_RU.KOI8-R"
LC_NUMERIC="ru_RU.KOI8-R"
LC_TIME="ru_RU.KOI8-R"
LC_COLLATE="ru_RU.KOI8-R"
LC_MONETARY="ru_RU.KOI8-R"
LC_MESSAGES=POSIX
LC_PAPER="ru_RU.KOI8-R"
LC_NAME="ru_RU.KOI8-R"
LC_ADDRESS="ru_RU.KOI8-R"
LC_TELEPHONE="ru_RU.KOI8-R"
LC_MEASUREMENT="ru_RU.KOI8-R"
LC_IDENTIFICATION="ru_RU.KOI8-R"
LC_ALL=
[ivan@dell ~]$ echo привет
привет
[ivan@dell ~]$ echo при   
привÐ
[ivan@dell ~]$ export LANG=ru_RU.UTF-8
[ivan@dell ~]$ echo привет
привет
[ivan@dell ~]$ echo при
при
[ivan@dell ~]$ 
  • the problem with correct understanding of case-insensitivity for the strings processed; now it would work, after I set the locale:

 

[ivan@dell ~]$ echo привет | fgrep -i ВЕТ
привет
[ivan@dell ~]$ 

but this wouldn't work before.

Minor problems with filenames

The utilities that print out filenames (which, as you remember, are stored remotely in a different encoding) wouldn't print them verbatim, but they susbstitute question marks for the foreign characters:

[ivan@dell ~]$ find ~mama/Desktop/ -iname '*.xls'
/home/mama/Desktop/????????? ????????.xls
/home/mama/Desktop/???????? ??? ???????????? (1).xls
/home/mama/Desktop/???????? ??? ???????????? (2).xls
/home/mama/Desktop/???????? ??? ???????????? (3).xls
/home/mama/Desktop/???????? ??? ????????????.xls
[ivan@dell ~]$ find ~mama/Desktop/ -iname '*.xls' -print
/home/mama/Desktop/????????? ????????.xls
/home/mama/Desktop/???????? ??? ???????????? (1).xls
/home/mama/Desktop/???????? ??? ???????????? (2).xls
/home/mama/Desktop/???????? ??? ???????????? (3).xls
/home/mama/Desktop/???????? ??? ????????????.xls
[ivan@dell ~]$ 

The same problem is exhibited by ls, and so on. But this can be easily overcome by passing them as strings to printing commands (that are not aware of the issue with non-matching encodings of the filenames and of the terminal–or for whatever reason, but it works):

[ivan@dell ~]$ find ~mama/Desktop/ -iname '*.xls' -print0 | xargs -0 -n 1 echo 
/home/mama/Desktop/Êðåäèòíûé ïîðòôåëü.xls
/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (1).xls
/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (2).xls
/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (3).xls
/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé.xls
[ivan@dell ~]$ 

Also, the fact that they are unreadable wasn't very annoying, because I could always append an | recode -f cp1251..utf-8 at the end of the command.

The annoying problem

The essential problem is that selecting (with a mouse) the filenames in the terminal and pasting them doesn't work:

[ivan@dell ~]$ diff '/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (1).xls' '/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (3).xls'
diff: /home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (1).xls: No such file or directory
diff: /home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (3).xls: No such file or directory
[ivan@dell ~]$ 

I've noticed an escaped representation of the filenames in the output of stat, so I was able to select and paste it (inside $'' in bash):

[ivan@dell ~]$ diff '/home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (1).xls' '/home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (3).xls'
diff: /home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (1).xls: No such file or directory
diff: /home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (3).xls: No such file or directory
[ivan@dell ~]$ diff $'/home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (1).xls' $'/home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (3).xls'
Files /home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (1).xls and /home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (3).xls differ
[ivan@dell ~]$ 

So, the question is:

How to conveniently work with remote filenames (over ssh), which are
in a different encoding?

It would be nice if they were readable, and selectable and pastable (and also typable by me from the keyboard and then completable by Tab in bash; to be typable by me conveniently, they must be readable, of course).

I'm working in urxvt in X.org on Linux on the local host, and it's bash on Linux at the remote end.

Best Answer

Inside a terminal emulator that supports UTF-8, you can use the luit command to run a subshell (or other program) in a different locale. The locale setting that indicates character sets is LC_CTYPE.

LC_CTYPE=ru_RU.KOI8-R luit ls   # run one command
LC_CTYPE=ru_RU.KOI8-R luit      # start a shell (type Ctrl+D or exit to return to the parent shell)

If you have a whole tree of files in a different encoding, I recommend (if possible) mounting it through convmvfs.

mkdir ~/net/ivan@example.com.KOI8-R ~/net/ivan@example.com.UTF-8
sshfs ivan@example.com: ~/net/ivan@example.com.KOI8-R
convmvfs -o srcdir=~/net/ivan@example.com.KOI8-R,icharset=KOI8-R,ocharset=UTF-8 ~/net/ivan@example.com.UTF-8
ls ~/net/ivan@example.com.UTF-8