Consider the following example:
```shell
IFS=:
x="a   :b"   # three spaces
echo ["$x"]  # no word splitting
# [a   :b]   # as is
echo [$x]    # word splitting
# [a    b]   # four spaces
```
Word splitting identifies the words `a   ` (three spaces) and `b`, separated by the colon; `echo` then joins the words with a single space in between.
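Those two fields can be made visible directly with `printf`, which repeats its format for each remaining argument (a small sketch, assuming bash or any POSIX shell):

```shell
# Show the fields that word splitting produces from $x (IFS contains only ":")
IFS=:
x="a   :b"        # "a", three spaces, ":b"
printf '<%s>' $x; echo
# <a   ><b>  -- two fields: "a" plus its three trailing spaces, and "b"
```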
However, when using the value of `$x` as a function argument, I find it difficult to interpret the results:
```shell
args(){ echo ["$*"];}
args a   :b  # three spaces
# [a::b]
```
and:
```shell
args(){ echo [$*];}
args a   :b  # three spaces
# [a  b]     # two spaces
```
`$*` expands to the value of all the positional parameters combined. Also, `"$*"` is equivalent to `"$1c$2"`, where `c` is the first character of the value of the `IFS` variable.
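That joining rule can be checked directly; in this sketch (bash or any POSIX shell), changing the first character of `IFS` changes the joiner:

```shell
# "$*" joins the positional parameters with the FIRST character of IFS
set -- a :b
IFS=:
echo "$*"    # a::b
IFS=',:'
echo "$*"    # a,:b -- only the leading "," of IFS is used as the joiner
```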
```shell
args(){ echo ["$1"]["$2"]; }
args a   :b  # three spaces
# [a][:b]
```
and:
```shell
args(){ echo [$1][$2]; }
args a   :b  # three spaces
# [a][ b]
```
Word splitting should always occur when there are unquoted expansions. Here `"$1"` and `$1` give the same result, and in both cases the `:` delimiter is not involved. `[$2]` -> `[ b]` is also unclear.

Probably other tokenization rules are applied before the IFS splitting, but I was unable to find them.
Best Answer
Word splitting only applies to unquoted expansions (parameter expansion, arithmetic expansion and command substitution) in modern Bourne-like shells (in `zsh`, only command substitution unless you use an emulation mode). When you do:

```shell
args a   :b
```
Word splitting is not involved at all.
It's the shell parsing that tokenises those, finds the first one is not one of its keywords, and so it's a simple command with 3 words: `args`, `a` and `:b`. The amount of space won't make any difference there. Note that it's not only spaces: tabs also delimit tokens, and in some shells (like `yash` or `bash`) so does any character considered as blank in your locale (though in the case of `bash`, not the multibyte ones)¹.

Even in the Bourne shell, where word splitting also applied to unquoted arguments of commands regardless of whether they were the result of expansions or not, that would be done on top of (long after) the tokenising and syntax parsing.
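In a modern shell you can check that `$IFS` plays no part at tokenisation time, only at expansion time (a sketch for bash or any POSIX shell):

```shell
IFS=e
echo hello   # prints "hello": the literal word is not split
x=hello
echo $x      # prints "h llo": the expansion is split on "e" into "h" and
             # "llo", and echo joins its two arguments back with a space
```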
In the Bourne shell, in:

```shell
IFS=i
while edit foo; do bid=did; done
```

That would not be parsed as:

```shell
wh le ed t foo; do b d=d d; done
```

But first as a `while` loop with a simple command, and the `edit` word of that simple command (as it's an argument, but not the `bid=did` word, which is an assignment) would be further split into `ed` and `t`, so that the `ed` command with the 3 arguments `ed`, `t` and `foo` would be run as the condition of that `while` loop.

Word splitting is not part of the syntax parsing. It's like an operator that is applied implicitly to arguments (also to `for` loop words, arrays, and with some shells the target of redirections and a few other contexts) for the parts of them that are not quoted. What's confusing is that it's done implicitly. You don't do `cmd split($x)`, you do `cmd $x` and the `split()` (actually `glob(split())`) is implied. In `zsh`, you have to request it explicitly for parameter expansions (`split($x)` is `$=x` there, with `$=` looking like a pair of scissors).

So, now, for your examples:
```shell
args(){ echo ["$*"];}
args a   :b
# [a::b]
```

The `a` and `:b` arguments of `args` are joined with the first character of `$IFS`, which gives `a::b` (note that it's a bad idea to use `[...]` here as it's a globbing operator).

```shell
args(){ echo [$*];}
args a   :b
# [a  b]
```

`$*` (which contains `a::b`) is split into `a`, the empty string and `b`. So it's:

```shell
echo '[a' '' 'b]'
# [a  b]
```

```shell
args(){ echo ["$1"]["$2"]; }
args a   :b
# [a][:b]
```

No surprise, as there is no word splitting.

```shell
args(){ echo [$1][$2]; }
args a   :b
# [a][ b]
```

That's like:

```shell
echo '[a][' 'b]'
# [a][ b]
```

as `$2` (`:b`) would be split into the empty string and `b`.
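To see those intermediate fields, including the empty one, you can re-split them into the positional parameters (a sketch for bash or any POSIX shell):

```shell
# Re-split "a" and ":b" with IFS=: and count the resulting fields
IFS=:
set -- a :b
set -- $*                  # unquoted: each parameter is IFS-split again
echo "$#"                  # 3 -- the fields are "a", "" and "b"
printf '<%s>' "$@"; echo   # <a><><b>
```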
One case where you will see variations between implementations is when `$IFS` is empty. In:

```shell
set a b
IFS=
printf '<%s>\n' $*
```

In some shells (most nowadays), you see:

```shell
<a>
<b>
```

And not `<ab>`, even though `"$*"` would expand to `ab`. Those shells still keep those `a` and `b` positional parameters separate, and that has now been made a POSIX requirement in the latest version of the standard.

If you did:

```shell
var=$*
printf '<%s>\n' $var
```

you'd see `<ab>`, as the information that `a` and `b` were 2 separate arguments was lost when assigned to `$var`.
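Conversely, if the argument boundaries must be preserved, quoted `"$@"` keeps each positional parameter as a separate word whatever the value of `$IFS` (standard POSIX behaviour; a small sketch):

```shell
# "$@" in double quotes preserves one word per positional parameter
set -- a b
IFS=
printf '<%s>\n' "$@"
# <a>
# <b>
```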
¹ Of course, it's not only blanks that delimit words. Special tokens in the shell syntax do as well, the list of which depends on the context. In most contexts, `|`, `||`, `&`, `;`, newline, `<`, `>`, `>>`... delimit words. In `ksh93` for instance, you can write a blank-less command like:

```shell
ls|wc</dev/null>&2&  # one possible blank-less command
```