Shell – How to Protect an IFS Character from Field Splitting

Tags: bourne-shell, posix, shell

In a POSIX sh, or in the Bourne shell (as in Solaris 10's /bin/sh), is it possible to have something like:

a='some var with spaces and a special space'
printf "%s\n" $a

And, with the default IFS, get:

some
var
with
spaces
and
a
special space

That is, protect the space between special and space by some combination of quoting or escaping?

The number of words in $a isn't known beforehand; otherwise I'd try something like:

a='some var with spaces and a special\ space'
printf "%s\n" "$a" | while read field1 field2 ...

The context is this bug reported in Cassandra, where OP tried to set an environment variable specifying options for the JVM:

export JVM_EXTRA_OPTS='-XX:OnOutOfMemoryError="echo oh_no"'

In the script executing Cassandra, which has to support POSIX sh and Solaris sh:

JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS"
#...
exec $NUMACTL "$JAVA" $JVM_OPTS $cassandra_parms -cp "$CLASSPATH" $props "$class"
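To make the failure concrete, here is a minimal sketch of what the unquoted expansion does to that value. The show_args helper is hypothetical and exists only for this demonstration; it is not part of the Cassandra script.

```shell
#!/bin/sh
# Same value as in the bug report.
JVM_EXTRA_OPTS='-XX:OnOutOfMemoryError="echo oh_no"'

# Hypothetical helper for the demonstration: print one argument per line.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Unquoted expansion, as in the launcher script. The double quotes stored in
# the variable are ordinary characters here, so they do not stop the split.
unquoted_output=$(show_args $JVM_EXTRA_OPTS)
printf '%s\n' "$unquoted_output"
```

With the default IFS this prints `<-XX:OnOutOfMemoryError="echo>` and `<oh_no">` on separate lines: the value is split at the space, and the quote characters are passed through literally.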

IMO the only way out here is to use a script wrapping the echo oh_no command. Is there another way?

Best Answer

Not really.

One solution is to reserve a character as the field separator. Obviously it will not be possible to include that character, whatever it is, in an option. Tab and newline are obvious candidates, if the source language makes it easy to insert them. I would avoid multibyte characters if you want portability (e.g. dash and BusyBox don't support multibyte characters).

If you rely on IFS splitting, don't forget to turn off wildcard expansion with set -f.

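tab=$(printf '\t')   # a literal tab character
IFS=$tab             # split unquoted expansions on tabs only
set -f               # disable pathname expansion (globbing)
exec java $JVM_EXTRA_OPTS …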
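As a rough, self-contained illustration of the reserved-separator idea, the sketch below splits a tab-separated option string and shows that the embedded space survives. The option values and the show_args helper are assumptions made up for the demonstration, and java is replaced by the helper so the script can run anywhere.

```shell
#!/bin/sh
# Hypothetical option string: the two options are separated by a literal tab,
# and the second one contains a space that must survive splitting.
tab=$(printf '\t')
JVM_EXTRA_OPTS="-Xmx1g${tab}-XX:OnOutOfMemoryError=echo oh_no"

# Hypothetical helper for the demonstration: report the argument count and
# print each argument in angle brackets.
show_args() {
  echo "count=$#"
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

old_ifs=$IFS        # remember the current separators
IFS=$tab            # split on tabs only
set -f              # no pathname expansion on the unquoted expansion
split_output=$(show_args $JVM_EXTRA_OPTS)
set +f
IFS=$old_ifs

printf '%s\n' "$split_output"
```

This prints `count=2`, then `<-Xmx1g>` and `<-XX:OnOutOfMemoryError=echo oh_no>`. Restoring IFS and re-enabling globbing afterwards matters only if the script keeps running; before an exec it is unnecessary.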

Another approach is to introduce a quoting syntax. A very common convention is that a backslash protects the next character. The downside of backslashes is that so many different tools use them as a quoting character that it can be difficult to work out how many backslashes you need.

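set -- java
# Backslash-escape every non-space character, then collapse doubled backslashes
# so that a backslash in the input quotes the character after it.
eval 'set -- "$@"' $(printf '%s\n' "$JVM_EXTRA_OPTS" | sed -e 's/[^ ]/\\&/g' -e 's/\\\\/\\/g') …
exec "$@"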
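The following sketch walks through the backslash convention on a sample value. It is a variation on the snippet above, not the same code: the sed output is stored in a variable (named escaped here, an arbitrary choice) and expanded inside double quotes, so the shell does not apply pathname expansion to it before eval sees it. The option values are made up, and the final exec is replaced by printing the argument list.

```shell
#!/bin/sh
# Hypothetical value: the backslash marks the space that should be kept.
JVM_EXTRA_OPTS='-Xmx1g -XX:OnOutOfMemoryError=echo\ oh_no'

# Step 1: put a backslash before every non-space character.
# Step 2: collapse each doubled backslash, so that a backslash in the input
#         ends up quoting the character that follows it.
escaped=$(printf '%s\n' "$JVM_EXTRA_OPTS" |
  sed -e 's/[^ ]/\\&/g' -e 's/\\\\/\\/g')

# Let eval re-parse the escaped text: unescaped spaces separate words,
# backslash-escaped characters are taken literally.
set -- java
eval "set -- \"\$@\" $escaped"

# Show the resulting argument list instead of running: exec "$@"
result=$(printf '<%s>\n' "$@")
printf '%s\n' "$result"
```

This prints `<java>`, `<-Xmx1g>` and `<-XX:OnOutOfMemoryError=echo oh_no>`, one per line. Because every non-space character is escaped before eval runs, characters such as `$`, `;` or quotes in the value are treated as literal text rather than shell syntax.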