Bash Terminal – Why \x0d\x20 Erases the Line

Tags: bash, cat, grep, terminal

This is the view from the gedit editor:
[screenshot not preserved]

and the view from the vim editor:
[screenshot not preserved]

I then tried to grep it. The search succeeds if I look for Log instead of Tog, but the output is corrupted:

[xiaobai@xiaobai grep]$ grep  Tog test
[xiaobai@xiaobai grep]$ grep  Log test
                               Dtring.valueOf
[xiaobai@xiaobai grep]$ 

Then I cat the file, and the output is corrupted too:

[xiaobai@xiaobai grep]$ cat test 
                               Dtring.valueOf
[xiaobai@xiaobai grep]$ 

So I used hexdump:

[xiaobai@xiaobai grep]$ hexdump -C test 
00000000  4c 6f 67 2e 64 28 22 6d  75 73 69 63 22 2c 20 22  |Log.d("music", "|
00000010  4e 41 56 49 47 41 54 4f  52 3a 20 22 20 2b 20 53  |NAVIGATOR: " + S|
00000020  74 72 69 6e 67 2e 76 61  6c 75 65 4f 66 0d 20 20  |tring.valueOf.  |
00000030  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000040  20 20 20 20 20 20 20 20  20 20 20 20 20 44 0d 0a  |             D..|
00000050
[xiaobai@xiaobai grep]$ 

I narrowed it down:

[xiaobai@xiaobai grep]$ cat test3
                               D
[xiaobai@xiaobai grep]$ hexdump -C test3
00000000  61 0d 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |a.              |
00000010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000020  20 44 0d 0a                                       | D..|
00000024
[xiaobai@xiaobai grep]$ echo -e '\x61'
a
[xiaobai@xiaobai grep]$ echo -e '\x61\x0d'
a
[xiaobai@xiaobai grep]$ echo -e '\x61\x0d\x20'

[xiaobai@xiaobai grep]$ echo -e '\x61\x0d\x20\x62'
 b

As you can see, the 'a' is erased as soon as I append one \x20 byte.

So my question is: why does this happen, and how can I avoid it when I don't know in advance which files contain \x0d\x20, e.g. with grep -r?

Best Answer

ASCII characters with codes 0 to 31 are control characters. When sent to a terminal, they trigger special actions. For instance, \a (BEL, 0x7) rings the terminal's bell, \b (BS, 0x8) moves the cursor one column back, \n (LF, 0xa) moves the cursor one row down, \t (TAB, 0x9) moves the cursor to the next tab stop, and so on.

\r (CR, 0xd) moves the cursor to the first column.

When you run at a shell prompt in a terminal:

printf 'foo\nbar\n'

printf writes foo\nbar\n to /dev/tty<something>, and the tty line discipline of that device translates it to foo\r\nbar\r\n. That is why bar appears at the start of the next line rather than one row down at the column where foo ended.

printf 'foo\rbar\n'

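would have the terminal overwrite foo with bar: \r moves the cursor back to the first column without erasing anything, and bar is then written over foo. The same thing happens with your file: the spaces that follow the CR are written over the text that precedes it.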
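The overwriting can be imitated without a terminal. The following is only a rough sketch: render_cr is a hypothetical helper name, it handles CR and nothing else (no BS, TAB or escape sequences), and it assumes one character per column.

```shell
# render_cr: imitate how a terminal lays out each input line that contains CRs.
# Hypothetical helper for illustration; only CR is interpreted.
render_cr() {
  awk '{
    n = split($0, seg, "\r")       # every CR sends the cursor back to column 1
    line = ""
    for (i = 1; i <= n; i++) {
      s = seg[i]
      # the new segment replaces the start of what is already on screen,
      # anything beyond its length stays visible
      line = s substr(line, length(s) + 1)
    }
    print line
  }'
}

printf 'a\r\n'      | render_cr    # prints "a"   (nothing written after the CR)
printf 'a\r \n'     | render_cr    # prints " "   (the space replaces the a)
printf 'a\r b\n'    | render_cr    # prints " b"
printf 'foo\rbar\n' | render_cr    # prints "bar"
```

The four calls mirror the echo -e experiments in the question: the a is never deleted from the data, it is only painted over on screen.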

If your file contains control characters, you could either remove them, or give them a textual representation (for instance ^M or \r for the CR 0xd character) if you want to check for their presence.

You may not want to do that for the LF and TAB characters though. So:

LC_ALL=C tr -d '\0-\10\13-\37\177' < file # to remove them

cat -v < file # to display as ^M

sed -n l < file # to display as \r (also converts TAB to \t)
                # and marks the end of lines with $
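As a quick check, here are the three commands applied to a short sample modelled on the question (a, CR, space, b, LF). The sample function is an illustrative stand-in for a real file, and cat -v is assumed to be available (it is in GNU and BSD cat, but is not required by POSIX).

```shell
# sample: hypothetical stand-in for a file that contains a CR in mid-line
sample() { printf 'a\r b\n'; }

sample | LC_ALL=C tr -d '\0-\10\13-\37\177'   # prints: a b
sample | cat -v                               # prints: a^M b
sample | sed -n l                             # prints: a\r b$
```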

Note that the cat -v and sed -n l commands also transform non-ASCII characters. You could instead do:

LC_ALL=C sed "$(printf 's/[^\t -\176\200-\377]/^&/g')" < file |
  LC_ALL=C tr '\0-\10\13-\37\177' '@-HK-_?'

This converts only the ASCII control characters (except TAB and LF) to their ^X visual form. Note, though, that not all sed implementations support input containing NUL characters.
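For the grep -r case in the question, one option is to pipe grep's output through the same filters, so that no prior knowledge of the files is needed. This is a minimal sketch: strip_ctrl and show_ctrl are hypothetical wrapper names, the demo directory and its contents are made up for illustration, and mktemp -d is assumed to be available.

```shell
# strip_ctrl: delete ASCII control characters except TAB and LF
strip_ctrl() { LC_ALL=C tr -d '\0-\10\13-\37\177'; }

# show_ctrl: make control characters visible in ^X form
# (cat -v also rewrites non-ASCII bytes, as noted above)
show_ctrl() { cat -v; }

# Throwaway directory with one file shaped like the one in the question
demo_dir=$(mktemp -d)
printf 'Log.d("music")\r      D\r\n' > "$demo_dir/test"

grep -r Log "$demo_dir" | strip_ctrl   # match printed with the CRs removed
grep -r Log "$demo_dir" | show_ctrl    # match printed with each CR shown as ^M

rm -rf "$demo_dir"
```

Filtering the output rather than the files leaves the files themselves untouched.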