Issue #1: grepping "Flyers: Video Center" returns no result:
In the hexadecimal dump of the file, notice the two bytes C2A0 between the words Flyers: and Video. This is the UTF-8 encoding of a non-breaking space (NBSP). Grepping for NBSP with a plain space in the pattern is known to fail. For more information, read "How to remove special 'M-BM-' character with sed" and "use sed to replace ...Hex c2a0". The short answer is:
sed -i.bak -e 's/\xc2\xa0/ /g' /path/to/file
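To see the effect without touching a real file, here is a minimal sketch. The temporary file and the variable names (sample, nbsp, plain_hits, fixed_hits) are made up for illustration. It writes the NBSP with printf octal escapes rather than relying on sed understanding \x escapes, which I am only sure of for GNU sed.

```shell
# Hypothetical sample file holding "Flyers:<NBSP>Video Center".
sample=$(mktemp)
printf 'Flyers:\302\240Video Center\n' > "$sample"

# A pattern with a plain space finds nothing: the file has C2 A0 there.
plain_hits=$(grep -c 'Flyers: Video' "$sample" || true)

# Build the NBSP byte pair once, then replace every occurrence with a space.
nbsp=$(printf '\302\240')
fixed=$(sed "s/$nbsp/ /g" "$sample")

# The same pattern now matches the cleaned text.
fixed_hits=$(printf '%s\n' "$fixed" | grep -c 'Flyers: Video' || true)
echo "matches before: $plain_hits, after: $fixed_hits"

rm -f "$sample"
```

The g flag matters when a line contains more than one NBSP; without it only the first one on each line is replaced.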
Issue #2: 'America’s' shows as 'Americaâs' (??):
Here, the dump contains the three bytes e28099, the UTF-8 encoding of RIGHT SINGLE QUOTATION MARK (’). Actually, there should be no problem here! You probably got distracted by the problem above (could you confirm?)
If you use grep, sed and other tools with expressions that respect your locale (UTF-8!), then it will work:
printf 'America\xe2\x80\x99s\n' | grep --only-matching "[[:punct:]]"
printf 'America\xe2\x80\x99s\n' | sed -e "s/[[:punct:]]/?/"
If you want to get rid of all those UTF-8 "special" characters, you can use the tips above or iconv (but nowadays, there are few excuses not to support UTF-8).
Drop all non-ASCII chars (//TRANSLIT substitutes a close ASCII equivalent where one exists):
type a.txt | iconv -f utf8 -t ASCII//TRANSLIT
Or to preserve chars from one locale:
type a.txt | iconv -f utf8 -t iso8859-15//TRANSLIT | iconv -f iso8859-15 -t utf8
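For a quick check of what the transliteration does to the quotation mark from Issue #2, something like the following may help. It is only a sketch: the variable names are illustrative, and it assumes an iconv that accepts the //TRANSLIT suffix (glibc's does). The exact replacement character can differ between iconv implementations, so inspect the output rather than assume it.

```shell
# "America’s" with U+2019 written as its three UTF-8 bytes (e2 80 99).
input=$(printf 'America\342\200\231s')

# Convert to ASCII, letting iconv pick a stand-in for the quotation mark.
output=$(printf '%s\n' "$input" | iconv -f UTF-8 -t ASCII//TRANSLIT)
printf '%s\n' "$output"

# Count any bytes left outside the printable ASCII range (expected: none).
leftover=$(printf '%s' "$output" | LC_ALL=C tr -d ' -~' | wc -c | tr -d ' ')
echo "non-ASCII bytes left: $leftover"
```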
ls will print non-ASCII characters (or rather, characters not supported in the current locale) as ?. This is one of the reasons why parsing the output of ls is a bad thing to do. The output from ls is meant to be looked at. In some cases, like this, those are not the actual names that exist in the filesystem.
Try instead something like the following (these will delete all files and directories, including /path/to/dir):
rm -rf /path/to/dir
or
find /path/to/dir -delete
or
find /path/to/dir -exec rm -rf {} +
or
find /path/to/dir -print0 | xargs -0 rm -rf
Modify to fit your needs. For example, to delete only files, add -type f after the path in the find examples.
Doing just rm -rf * inside that directory (that's important: the current working directory must be the directory whose files and directories you want to delete) may also be enough.
See also "Why not parse ls?"
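As a rough illustration of working with the real names instead of what ls displays, the sketch below builds a throwaway directory (the directory and file names are invented for the example), walks it with a shell glob, and then deletes only the regular files with find.

```shell
# Scratch directory with an ASCII name, a non-ASCII name and a name with a space.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/$(printf 'caf\303\251.txt')" "$dir/two words.txt"

# Let the shell expand the names; each one arrives intact, whatever bytes it holds.
count=0
for f in "$dir"/*; do
    [ -e "$f" ] || continue   # skips the unexpanded pattern if the directory is empty
    count=$((count + 1))
done
echo "glob saw $count entries"

# Delete regular files only; the directory itself stays.
find "$dir" -type f -exec rm -f {} +
remaining=$(find "$dir" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"

rmdir "$dir"
```

Neither the glob nor find depends on how the terminal renders the names, so a file that ls shows with a ? is still matched and removed.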
Best Answer
It is known as carriage return.
If you're using vim, you can enter insert mode and type CTRL-v CTRL-m. That ^M is the keyboard equivalent to \r.
Inserting 0x0D in a hex editor will do the task.
How do I remove it?
You can remove it using the command
As the OP suggested in the comments of this answer here, you can even try a `
and see if that fixes it.
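One generic way to strip carriage returns, not necessarily the command this answer had in mind, is tr -d '\r'. The sketch below runs it on a made-up two-line ksh script with CRLF line endings. The file and variable names are hypothetical.

```shell
# Hypothetical script saved with DOS (CRLF) line endings.
script=$(mktemp)
printf '#!/bin/ksh\r\necho hello\r\n' > "$script"

# Count the lines that end in a carriage return.
cr=$(printf '\r')
before=$(grep -c "$cr\$" "$script" || true)

# tr cannot edit in place, so write a cleaned copy.
clean=$(mktemp)
tr -d '\r' < "$script" > "$clean"
after=$(grep -c "$cr\$" "$clean" || true)

echo "CR-terminated lines before: $before, after: $after"
rm -f "$script" "$clean"
```

On a real file you would move the cleaned copy over the original once you have checked it.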
As @steeldriver suggests in the comments, after opening the vim editor, press the Esc key and type :set ff=unix.
References
https://stackoverflow.com/questions/1585449/insert-the-carriage-return-character-in-vim
https://stackoverflow.com/a/7742437/1742825
-ksh: revenue_ext.ksh: not found [No such file or directory]