I think all you need to do is run reset. If that doesn't help, check whether you changed any files in /etc recently (e.g. find /etc -mtime -1) and read the unicode_start or consolechars man pages.
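If you would rather script the "what changed recently in /etc" check, here is a rough Python analogue of find /etc -mtime -1, offered as a sketch. It assumes "recently" means less than 24 hours ago, which is what -mtime -1 selects. The function name recently_modified is made up for this example, and unlike find it does not report the starting directory itself.

```python
import os
import time


def recently_modified(root, max_age_seconds=24 * 60 * 60):
    """Return sorted paths under `root` modified less than `max_age_seconds` ago.

    Hypothetical helper: a rough analogue of `find root -mtime -1`.
    Unreadable directories are skipped silently.
    """
    cutoff = time.time() - max_age_seconds
    hits = []
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: judge a symlink by its own mtime, as find does by default
                if os.lstat(path).st_mtime > cutoff:
                    hits.append(path)
            except OSError:
                continue  # vanished or unreadable entry
    return sorted(hits)


if __name__ == "__main__":
    for path in recently_modified("/etc"):
        print(path)
```

Running it as root sees more of /etc, just as find would.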
Short answer: the restrictions are imposed in the Unix/Linux/BSD kernel, in the namei() function. Encoding is handled by user-level programs like xterm, firefox or ls.
I think you're starting from incorrect premises. A file name in Unix is a string of bytes with arbitrary values. Only two values, 0x00 (ASCII NUL) and 0x2f (ASCII '/'), are not allowed: not as part of a multi-byte character encoding, not as anything. A byte can contain a number representing a character (in ASCII and some other encodings), but a character can require more than one byte (for example, code points above 0x7f in the UTF-8 representation of Unicode).
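To make the "string of bytes" view concrete, here is a small Python sketch that checks whether a byte string could be a single path component under exactly the rule above. The function name is hypothetical, and it deliberately ignores length limits and any extra restrictions a particular filesystem may add.

```python
def is_valid_name_component(name: bytes) -> bool:
    """Return True if `name` could be one component of a Unix path.

    Only 0x00 (NUL) and 0x2f ('/') are forbidden; every other byte value,
    including bytes that are not valid UTF-8, is accepted. An empty name is
    rejected because it cannot name anything.
    """
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


if __name__ == "__main__":
    # One character, but two bytes once encoded as UTF-8.
    print(len("\u00e9"), len("\u00e9".encode("utf-8")))  # 1 2
    print(is_valid_name_component(b"caf\xc3\xa9"))  # UTF-8 "café": True
    print(is_valid_name_component(b"\xff\xfe"))     # not UTF-8, still legal: True
    print(is_valid_name_component(b"a/b"))          # contains '/': False
    print(is_valid_name_component(b"a\x00b"))       # contains NUL: False
```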
These restrictions arise from path-name conventions and the ASCII character set. The original Unixes used bytes with the value of ASCII '/' (numerically 0x2f) to separate the pieces of a partially- or fully-qualified path (for example, '/usr/bin/cat' has the pieces "usr", "bin" and "cat"). The original Unixes also used ASCII NUL to terminate strings. Other than those two values, bytes in file names may take any value. You can see an echo of this in the UTF-8 encoding of Unicode. Printable ASCII characters, including '/', take only one byte in UTF-8, and the encoding of every code point above 0x7f uses only bytes in the range 0x80 to 0xff, so it never contains a zero-valued byte or a 0x2f byte. Only the NUL control character itself encodes as a zero byte. UTF-8 was invented for Plan 9, the pretender to the throne of Unix.
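The UTF-8 property can be checked by brute force. This Python sketch (function name made up for the example) encodes every code point above 0x7f and confirms that no resulting byte falls below 0x80, which rules out both 0x00 and 0x2f. Surrogates are skipped because UTF-8 cannot encode them. It takes a second or so to run over the whole code space.

```python
import sys


def multibyte_utf8_avoids_ascii_bytes() -> bool:
    """Check that every code point above 0x7f encodes to bytes >= 0x80 only.

    0x00 (NUL) and 0x2f ('/') are both below 0x80, so if this holds, no
    multi-byte UTF-8 sequence can contain a byte that byte-oriented code
    would treat as a terminator or a path separator.
    """
    for cp in range(0x80, sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # UTF-16 surrogates have no UTF-8 encoding
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True


if __name__ == "__main__":
    print("/".encode("utf-8"), "\x00".encode("utf-8"))  # b'/' b'\x00'
    print(multibyte_utf8_avoids_ascii_bytes())         # True
```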
Older Unixes (and, it looks like, Linux) had a namei() function that just looks at paths a byte at a time, breaking them into pieces at 0x2f-valued bytes and stopping at a zero-valued byte. namei() is part of the Unix/Linux/BSD kernel, so that's where the exceptional byte values get enforced.
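The byte-at-a-time splitting can be sketched in user-space Python. This is a hypothetical illustration of the behaviour described above, not kernel code: it stops at the first NUL and cuts at every '/', dropping the empty pieces that leading or repeated slashes produce (so it does not distinguish absolute from relative paths). The real kernel routine also looks each piece up in the directory tree, which this sketch does not attempt.

```python
from typing import List


def split_path(path: bytes) -> List[bytes]:
    """Split a raw path into components, namei()-style (simplified sketch)."""
    components = []
    current = bytearray()
    for byte in path:          # iterating bytes yields ints
        if byte == 0x00:       # NUL terminates the string
            break
        if byte == 0x2F:       # '/' separates components
            if current:
                components.append(bytes(current))
                current.clear()
        else:
            current.append(byte)  # any other value is just part of the name
    if current:
        components.append(bytes(current))
    return components


if __name__ == "__main__":
    print(split_path(b"/usr/bin/cat"))         # [b'usr', b'bin', b'cat']
    print(split_path(b"a//b\x00/ignored"))     # [b'a', b'b']
```

Note that the UTF-8 bytes of a name such as b"/caf\xc3\xa9" pass through untouched, since no character semantics are applied.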
Notice that so far I've talked about byte values, not characters. namei() does not enforce any character semantics on the bytes. That's up to user-level programs like ls, which might sort file names based on byte values or on character values. xterm decides which pixels to light up for file names based on the character encoding: if you don't tell xterm you've got UTF-8 encoded file names, you'll see a lot of gibberish in it. Likewise, if vim isn't compiled to detect UTF-8 (or UTF-16, UTF-32, or whatever) encodings, you'll see a lot of gibberish when you open a "text file" containing UTF-8 encoded characters.
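Here is a small Python illustration of where the gibberish comes from: the same stored bytes rendered under two different encoding assumptions. Latin-1 stands in here for "a program not told the names are UTF-8"; that choice is just for the example.

```python
# The same bytes, as stored in a directory entry: UTF-8 for "café".
raw_name = b"caf\xc3\xa9"

# A program that knows the names are UTF-8 shows the intended text.
as_utf8 = raw_name.decode("utf-8")

# A program that assumes ISO-8859-1 (Latin-1) maps each byte to one
# character, so the two-byte sequence for "é" becomes two characters.
as_latin1 = raw_name.decode("latin-1")

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```

The kernel stored the same bytes both times; only the program's interpretation changed.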
It is all you need. You have the command and its arguments separated by null bytes (\0). The encoding of the characters is based on the locale, but it should not really matter. Do you have a specific example where you need help?
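As a sketch, here is how NUL-separated arguments can be split and then decoded in Python. It assumes the data looks like Linux's /proc/<pid>/cmdline (an assumption, since the original question isn't shown here). The helper names are hypothetical. Splitting works on raw bytes, and decoding is a separate, locale-dependent step.

```python
import locale
from typing import List


def split_nul_separated(data: bytes) -> List[bytes]:
    """Split a NUL-separated buffer into arguments, keeping them as raw bytes."""
    if data.endswith(b"\x00"):
        data = data[:-1]  # the last argument is NUL-terminated as well
    return data.split(b"\x00") if data else []


def decode_args(args: List[bytes]) -> List[str]:
    # Decode with the locale's preferred encoding; surrogateescape keeps
    # undecodable bytes round-trippable instead of raising an error.
    encoding = locale.getpreferredencoding(False)
    return [a.decode(encoding, errors="surrogateescape") for a in args]


if __name__ == "__main__":
    try:
        with open("/proc/self/cmdline", "rb") as f:  # Linux-specific
            print(decode_args(split_nul_separated(f.read())))
    except FileNotFoundError:
        print("/proc/self/cmdline is not available on this system")
```

Empty arguments survive the split (b"a\x00\x00b\x00" gives [b"a", b"", b"b"]), which is why the sketch splits on NUL rather than on whitespace.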