To remove multiple (non-space) consecutive delimiter chars, two (string/array) parameter expansions can be used. The trick is to set the IFS
variable to the empty string for the array parameter expansion.
This is documented in man bash
under Word Splitting:
Unquoted implicit null arguments, resulting from the expansion of parameters that have no values, are removed.
(
set -f
str=':abc::def:::ghi::::'
IFS=':'
arr=(${str})
IFS=""
arr=(${arr[@]})
echo ${!arr[*]}
for ((i=0; i < ${#arr[@]}; i++)); do
echo "${i}: '${arr[${i}]}'"
done
)
Converting a standalone file
If you run the following command:
$ dos2unix <file>
The <file>
will have all the ^M characters stripped. If you want to leave <file>
intact, then simply run dos2unix
like this:
$ dos2unix -n <file> <newfile>
Parsing output from a command
If you need to do them as part of a chain of commands via a pipe, you can use any number of tools such as tr
, sed
, awk
, or perl
to do this.
tr
$ java -jar test.jar | tr -d '^M' >> test.log
sed
$ java -jar test.jar | sed 's/^M//g' >> test.log
awk
$ java -jar test.jar | awk 'sub(/^M/,"")' >> test.log
perl
$ java -jar test.jar | perl -p -e 's/^M//g' >> test.log
Typing ^M
When entering the ^M
be sure to enter it in one of the following ways:
- As Control + v + M and not Shift + 6 + M.
- As a backslash r, i.e. (
\r
).
- As an octal number (
\015
).
- As a hexidecimal number (
\x0D
).
Why is this necessary?
The ^M
is part of how end of lines are terminated on the Windows platform. Each end of line is terminated with a carriage return character followed by a linefeed character.
On Unix systems the end of line is terminated by just a linefeed character.
- linefeed character =
0x0A
in hex, also written as \n
.
- carriage return character =
0x0D
in hex, also written as \r
.
Examples
You can see these if you pipe the output to a tool such as od
or hexdump
. Here's a sample file with the line terminating carriage returns + linefeed characters.
$ cat sample.txt
hi there
bye there
You can see them with hexdump
as \r
+ \n
:
$ hexdump -c sample.txt
0000000 h i t h e r e \r \n b y e t h
0000010 e r e \r \n
0000015
Or as their hexidecimal 0d
+ 0a
:
$ hexdump -C sample.txt
00000000 68 69 20 74 68 65 72 65 0d 0a 62 79 65 20 74 68 |hi there..bye th|
00000010 65 72 65 0d 0a |ere..|
00000015
Running this through sed 's/\r//g'
:
$ sed 's/\r//g' sample.txt |hexdump -C
00000000 68 69 20 74 68 65 72 65 0a 62 79 65 20 74 68 65 |hi there.bye the|
00000010 72 65 0a |re.|
00000013
You can see that sed
has removed the 0d
character.
Viewing files with ^M without converting?
Yes you can use vim
to do this. You can either set the fileformat
setting in vim, which will have the effect of converting the file like we were doing above, or you can change the fileformat in the vim
view.
changing a file's format
:set fileformat=dos
:set fileformat=unix
You can use the shorthand notation too:
:set ff=dos
:set ff=unix
Alternatively you can just change the fileformat of the view. This approach is nondestructive:
:e ++ff=dos
:e ++ff=unix
Here you can see me opening our ^M
file, sample.txt
in vim
:
Now I'm converting the fileformat in the view:
Here's what it looks like when converted to the unix
fileformat:
References
Best Answer
Not really.
One solution is to reserve a character as the field separator. Obviously it will not be possible to include that character, whatever it is, in an option. Tab and newline are obvious candidates, if the source language makes it easy to insert them. I would avoid multibyte characters if you want portability (e.g. dash and BusyBox don't support multibyte characters).
If you rely on IFS splitting, don't forget to turn off wildcard expansion with
set -f
.Another approach is to introduce a quoting syntax. A very common quoting syntax is that a backslash protects the next character. The downside of using backslashes is that so many different tools use it as a quoting characters that it can sometimes be difficult to figure out how many backslashes you need.