Consider the following example:
```shell
IFS=:
x="a   :b"   # three spaces
echo ["$x"]  # no word splitting
# [a   :b]   # as is
echo [$x]    # word splitting
# [a    b]   # four spaces
```
Word splitting identifies the words `a   ` (three spaces) and `b`, separated by the colon; `echo` then joins the words with a single space in between.
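Those two fields can be made visible directly with `printf`, which repeats its format for each remaining argument (a small sketch, assuming bash or any POSIX shell):

```shell
# Show the fields that word splitting produces from $x (IFS contains only ":")
IFS=:
x="a   :b"        # "a", three spaces, ":b"
printf '<%s>' $x; echo
# <a   ><b>  -- two fields: "a" plus its three trailing spaces, and "b"
```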
However, when using the value of `$x` as a function argument, I find it difficult to interpret the results:
```shell
args(){ echo ["$*"];}
args a   :b  # three spaces
# [a::b]
```
and:
```shell
args(){ echo [$*];}
args a   :b  # three spaces
# [a  b]     # two spaces
```
`$*` expands to the value of all the positional parameters combined. Also, `"$*"` is equivalent to `"$1c$2"`, where `c` is the first character of the value of the `IFS` variable.
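That joining rule can be checked directly; in this sketch (bash or any POSIX shell), changing the first character of `IFS` changes the joiner:

```shell
# "$*" joins the positional parameters with the FIRST character of IFS
set -- a :b
IFS=:
echo "$*"    # a::b
IFS=',:'
echo "$*"    # a,:b -- only the leading "," of IFS is used as the joiner
```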
```shell
args(){ echo ["$1"]["$2"]; }
args a   :b  # three spaces
# [a][:b]
```
and:
```shell
args(){ echo [$1][$2]; }
args a   :b  # three spaces
# [a][ b]
```
Word splitting should always occur when there are unquoted expansions. Here `"$1"` and `$1` give the same result, and in both cases the `:` delimiter is not involved. `[$2]` -> `[ b]` is also unclear.

Probably other tokenization rules are applied before the IFS splitting, but I was unable to find them.
Best Answer
Word splitting only applies to unquoted expansions (parameter expansion, arithmetic expansion and command substitution) in modern Bourne-like shells (in `zsh`, only command substitution unless you use an emulation mode). When you do:

```shell
args a   :b
```
Word splitting is not involved at all.
It's the shell parsing that tokenises those, finds the first one is not one of its keywords, and so it's a simple command with 3 words: `args`, `a` and `:b`. The amount of space won't make any difference there. Note that it's not only spaces: tabs also delimit tokens, and in some shells (like `yash` or `bash`) so does any character considered as blank in your locale (though in the case of `bash`, not the multibyte ones)¹.

Even in the Bourne shell, where word splitting also applied to unquoted arguments of commands regardless of whether they were the result of expansions or not, that would be done on top of (long after) the tokenising and syntax parsing.
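In a modern shell you can check that `$IFS` plays no part at tokenisation time, only at expansion time (a sketch for bash or any POSIX shell):

```shell
IFS=e
echo hello   # prints "hello": the literal word is not split
x=hello
echo $x      # prints "h llo": the expansion is split on "e" into "h" and
             # "llo", and echo joins its two arguments back with a space
```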
In the Bourne shell, in:

```shell
IFS=i
while edit foo; do bid=did; done
```

That would not be parsed as:

```shell
wh le ed t foo; do b d=d d; done
```

But first as a `while` loop with a simple command, and the `edit` word of that simple command (as it's an argument, but not the `bid=did` word, which is an assignment) would be further split into `ed` and `t`, so that the `ed` command with the 3 arguments `ed`, `t` and `foo` would be run as the condition of that `while` loop.

Word splitting is not part of the syntax parsing. It's like an operator that is applied implicitly to arguments (also to `for` loop words, arrays, and with some shells the target of redirections and a few other contexts) for the parts of them that are not quoted. What's confusing is that it's done implicitly. You don't do `cmd split($x)`, you do `cmd $x` and the `split()` (actually `glob(split())`) is implied. In `zsh`, you have to request it explicitly for parameter expansions (`split($x)` is `$=x` there, with `$=` looking like a pair of scissors).

So, now, for your examples:
```shell
args(){ echo ["$*"];}
args a   :b
# [a::b]
```

The `a` and `:b` arguments of `args` are joined with the first character of `$IFS`, which gives `a::b` (note that it's a bad idea to use `[...]` here as it's a globbing operator).

```shell
args(){ echo [$*];}
args a   :b
# [a  b]
```

`$*` (which contains `a::b`) is split into `a`, the empty string and `b`. So it's:

```shell
echo '[a' '' 'b]'
# [a  b]
```

```shell
args(){ echo ["$1"]["$2"]; }
args a   :b
# [a][:b]
```

No surprise, as there is no word splitting.

```shell
args(){ echo [$1][$2]; }
args a   :b
# [a][ b]
```

That's like:

```shell
echo '[a][' 'b]'
# [a][ b]
```

as `$2` (`:b`) would be split into the empty string and `b`.
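To see those intermediate fields, including the empty one, you can re-split them into the positional parameters (a sketch for bash or any POSIX shell):

```shell
# Re-split "a" and ":b" with IFS=: and count the resulting fields
IFS=:
set -- a :b
set -- $*                  # unquoted: each parameter is IFS-split again
echo "$#"                  # 3 -- the fields are "a", "" and "b"
printf '<%s>' "$@"; echo   # <a><><b>
```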
One case where you will see variations between implementations is when `$IFS` is empty. In:

```shell
set a b
IFS=
printf '<%s>\n' $*
```

In some shells (most nowadays), you see:

```shell
<a>
<b>
```

And not `<ab>`, even though `"$*"` would expand to `ab`. Those shells still keep those `a` and `b` positional parameters separate, and that has now been made a POSIX requirement in the latest version of the standard.

If you did:

```shell
var=$*
printf '<%s>\n' $var
```

you'd see `<ab>`, as the information that `a` and `b` were 2 separate arguments was lost when assigned to `$var`.
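Conversely, if the argument boundaries must be preserved, quoted `"$@"` keeps each positional parameter as a separate word whatever the value of `$IFS` (standard POSIX behaviour; a small sketch):

```shell
# "$@" in double quotes preserves one word per positional parameter
set -- a b
IFS=
printf '<%s>\n' "$@"
# <a>
# <b>
```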
¹ Of course, it's not only blanks that delimit words. Special tokens in the shell syntax do as well, the list of which depends on the context. In most contexts, `|`, `||`, `&`, `;`, newline, `<`, `>`, `>>`... delimit words. In `ksh93` for instance, you can write a blank-less command like:

```shell
ls|wc</dev/null>&2&  # one possible blank-less command
```