Why is the unit separator (ASCII 31) invisible in terminal output?

ascii terminal

The unit separator character (ASCII 31, octal 37) is visible in Vim as ^_. But if I print the same file to the terminal, the character is invisible, so the fields on a line run together:

# In Vim and less:

first field^_second field^_last field

# cat the same file to terminal:
cat delim.txt
first fieldsecond fieldlast field

# print 2nd field with awk
awk 'BEGIN {FS = "\037"} {print $2}' delim.txt
second field

I suppose I can make the unit separator visible with cat -v:

cat -v delim.txt
first field^_second field^_last field

But this is rather cumbersome. Why doesn't the unit separator have a visible representation when printed to stdout in the Bash shell? I can't even copy and paste the shell output correctly; the unit separator gets lost in the process.

Best Answer

The unit separator (US) character, also known as IS1, is in the cntrl character class and not in the print character class. It is a control character intended for organizing text into groups, for programs designed to make use of that information. In general, different programs and environments interpret and render non-printable characters differently.
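The class membership can be checked from the shell. This is a minimal sketch, assuming a POSIX tr and an ASCII sample line; the variable names are only illustrative:

```shell
# Build a sample line with US (octal 037) between the fields.
line=$(printf 'first field\037second field\037last field')

# Keep only the control characters and count them.
cntrl_count=$(printf '%s' "$line" | tr -cd '[:cntrl:]' | wc -c | tr -d ' ')

# Keep only the printable characters; the separators drop out.
printable=$(printf '%s' "$line" | tr -cd '[:print:]')

echo "control characters: $cntrl_count"
echo "printable part: $printable"
```

The two separators are counted as control characters, and the printable part is the same run-together text that xterm shows.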

You see it as ^_ in Vim because Vim is an interactive editor. It can render non-printable characters however it wants, as long as the correct byte is written to disk.

You do not get the same behavior from cat because Unix command-line programs are written to operate on plain text and pass it to each other unmodified. When you cat a file, the bytes written to the terminal must be exactly the bytes in the file.
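To confirm that the separator is really in the output even when nothing is drawn for it, the bytes can be dumped with od. This is a sketch that writes a sample to a temporary file (the file and variable names are assumptions, not part of the question):

```shell
# Create a throwaway sample file with US (octal 037) between the fields.
sample=$(mktemp)
printf 'first field\037second field\037last field\n' > "$sample"

# od -c prints every byte; non-printing bytes appear as three-digit octal.
dump=$(cat "$sample" | od -An -c)
echo "$dump"

rm -f "$sample"
```

In the dump, each separator shows up as 037 between the letters of the fields, which shows that cat passed it through untouched.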

That leaves it to the terminal to interpret the character, and terminal emulators differ in how they render US. In gnome-terminal (or any VTE-based terminal), the character is rendered as a box containing the hex code 001F. In xterm or rxvt, the character is indeed invisible.
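Because the rendering depends on the terminal, one workaround for display and copy-paste is to translate the separator into a printable character before it reaches the terminal. A minimal sketch, where show_us is a hypothetical helper name and | is an arbitrary stand-in that is assumed not to occur in the data:

```shell
# Hypothetical helper: replace every US byte (octal 037) with a visible "|".
show_us() {
  tr '\037' '|'
}

printf 'first field\037second field\037last field\n' | show_us
# prints: first field|second field|last field
```

This changes only what is displayed. The file itself keeps the real separator, so awk with FS = "\037" still works on it.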
