This is the view from gedit editor:
and the view from vim editor:
I then tried to grep it. grep does match if I search for Log instead of Tog, but the output is corrupted:
[xiaobai@xiaobai grep]$ grep Tog test
[xiaobai@xiaobai grep]$ grep Log test
Dtring.valueOf
[xiaobai@xiaobai grep]$
Then I cat the file, and the output is corrupted as well:
[xiaobai@xiaobai grep]$ cat test
Dtring.valueOf
[xiaobai@xiaobai grep]$
So I used hexdump:
[xiaobai@xiaobai grep]$ hexdump -C test
00000000  4c 6f 67 2e 64 28 22 6d  75 73 69 63 22 2c 20 22  |Log.d("music", "|
00000010  4e 41 56 49 47 41 54 4f  52 3a 20 22 20 2b 20 53  |NAVIGATOR: " + S|
00000020  74 72 69 6e 67 2e 76 61  6c 75 65 4f 66 0d 20 20  |tring.valueOf.  |
00000030  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000040  20 20 20 20 20 20 20 20  20 20 20 20 20 44 0d 0a  |             D..|
00000050
[xiaobai@xiaobai grep]$
I narrowed it down:
[xiaobai@xiaobai grep]$ cat test3
D
[xiaobai@xiaobai grep]$ hexdump -C test3
00000000  61 0d 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |a.              |
00000010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000020  20 44 0d 0a                                       | D..|
00000024
[xiaobai@xiaobai grep]$ echo -e '\x61'
a
[xiaobai@xiaobai grep]$ echo -e '\x61\x0d'
a
[xiaobai@xiaobai grep]$ echo -e '\x61\x0d\x20'
[xiaobai@xiaobai grep]$ echo -e '\x61\x0d\x20\x62'
b
As you can see, the 'a' is erased as soon as I append one \x20 byte.
So my question is: why is this happening, and how can I avoid it when I don't know in advance which files contain \x0d\x20, e.g. with grep -r?
Best Answer
Characters with codes 0 to 31 in ASCII are control characters. When sent to a terminal, they're used to do special things. For instance:
\a (BEL, 0x7) rings the terminal's bell.
\b (BS, 0x8) moves the cursor backward.
\n (LF, 0xa) moves the cursor one row down.
\t (TAB, 0x9) moves the cursor to the next tabulation...
\r (CR, 0xd) moves the cursor to the first column.
When you run at a shell prompt in a terminal:
printf 'foo\nbar\n'
printf writes foo\nbar\n to /dev/tty<something>, and the tty line discipline of that device translates that to foo\r\nbar\r\n, which is why you see bar on the next line after foo.
printf 'foo\rbar\n'
would instead have the terminal overwrite foo with bar.
If your file contains control characters, you could either remove them, or give them a textual representation (for instance ^M or \r for the CR 0xd character) if you want to check for their presence.
You may not want to do that for the LF and TAB characters though. So:
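A few filters along those lines, as a sketch rather than a definitive recipe. It assumes a cat that supports -v (GNU and BSD cat do, although POSIX does not require it); the sample bytes and the Log search pattern are only illustrative:

```shell
# Sample input shaped like the question's file: "a", CR, spaces, "D", CR, LF.
sample() { printf 'a\r   D\r\n'; }

# cat -v shows CR as ^M and leaves TAB and LF alone.
sample | cat -v        # a^M   D^M

# sed -n l is specified by POSIX; it shows CR as \r and marks end of line with $.
sample | sed -n l      # a\r   D\r$

# Or delete every CR before it reaches the terminal.
sample | tr -d '\r'    # a   D

# The same filters can sit after a recursive grep.
grep -r Log . | cat -v
```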
Note that those sed and cat ones would also transform non-ASCII characters. You could do instead:
To only convert the ASCII control characters (except TAB and LF) to their ^X visual form (note though that not all sed implementations support input files with NUL characters in them).
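One way to get that ^X conversion while leaving non-ASCII bytes untouched is sketched below. It uses awk rather than sed, which is a substitution made here for readability, and ctrl_to_caret is a hypothetical helper name. NUL bytes are deliberately not handled, since awk implementations differ on them, and DEL (0x7f) is shown as ^? in the style of cat -v.

```shell
# ctrl_to_caret: hypothetical helper that rewrites ASCII control characters
# (except TAB and LF) in caret notation, e.g. CR becomes ^M and ESC becomes ^[.
# Bytes above 0x7f pass through unchanged, so UTF-8 text stays readable.
ctrl_to_caret() {
  LC_ALL=C awk '
    BEGIN {
      # Build a lookup table: control code i maps to "^" plus the character i + 64.
      for (i = 1; i < 32; i++)
        if (i != 9 && i != 10)
          map[sprintf("%c", i)] = "^" sprintf("%c", i + 64)
      map[sprintf("%c", 127)] = "^?"
    }
    {
      # Walk the line one byte at a time and substitute mapped characters.
      out = ""
      n = length($0)
      for (j = 1; j <= n; j++) {
        c = substr($0, j, 1)
        out = out ((c in map) ? map[c] : c)
      }
      print out
    }
  ' "$@"
}

# Usage: filter a file directly, or sit at the end of a pipeline.
# ctrl_to_caret test
# grep -r Log . | ctrl_to_caret
```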