Bash – How does the shell filename expansion delimit items within a ( * ) list


I don't understand shell expansion fully yet (hopefully, one day soon I will)…
I saw this comment on a Super User question, but I think I'm still parked at the kerb…

Using Linux without the shell is like driving a Ferrari at 50 km/h through city traffic. All fun will just go away …

I don't understand the following example. What hierarchy, or whatever, is causing the 2nd example "array item count:" to be different to the 1st example?

What happened to the space the shell introduced? Or is it echo that introduces the space, and the shell is (perhaps) using \0?

#!/bin/bash
# Make a couple of files whose names contain a space.
junkd=$HOME/junkd
mkdir -p "$junkd" || exit 1
cd "$junkd" || exit 1
touch f\ {1..2}
#
echo -n * |xxd         # This shows a space between the two names.
names=$(echo -n * )
echo -n "$names" |xxd  # This shows a space between the two names.
#
# So far, it seems that the shell is inserting a space between each filename.
#
array=( $names )
echo "array item count: ${#array[@]}" 
# 4 items... This shows that a space is the delimiter char ....
#
array=( * )
echo "array item count: ${#array[@]}" 
# 2 items... What happened to the shell introduced space?
#

Best Answer

From man bash

EXPANSION Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion. The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.
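To see that this order really matters, here is a small bash sketch (the variable names n, x, seq_out and tilde_out are just illustrative). Because brace expansion runs before parameter expansion, a variable inside a brace sequence is not yet expanded when the braces are processed; likewise, a tilde that comes out of a variable is never tilde-expanded.

```shell
# Brace expansion runs first. At that point $n is still literal text, so
# {1..$n} is not a valid sequence expression and is left unchanged;
# parameter expansion then replaces $n, giving the string {1..3}.
n=3
seq_out=$(echo {1..$n})
echo "$seq_out"      # prints {1..3}, not 1 2 3

# Tilde expansion also runs before parameter expansion, so a ~ that comes
# out of a variable is never turned into a home directory path.
x='~'
tilde_out=$(echo $x)
echo "$tilde_out"    # prints ~
```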

array=( $names )

This gives you 4 entries because the unquoted $names is subject to word splitting on the characters in IFS (the internal field separator), which by default are <space><tab><newline>. If you quote "$names" to inhibit word splitting, you get a single array element with the value f 1 f 2, which again is not what you want.
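A minimal bash sketch of both cases. It assumes the string that $(echo -n *) produced for the two files, and it runs `unset IFS` first so that the default splitting behaviour applies; the array names split and whole are illustrative only.

```shell
# Make sure the default splitting rules (space, tab, newline) are in effect.
unset IFS

# The string that names=$(echo -n *) produced for the files "f 1" and "f 2".
names='f 1 f 2'

# Unquoted: the expansion result is split on IFS into separate words.
split=( $names )
echo "${#split[@]}"    # 4: f, 1, f, 2

# Quoted: no word splitting, so the whole string becomes one element.
whole=( "$names" )
echo "${#whole[@]}"    # 1: "f 1 f 2"

# declare -p shows exactly which elements each array holds.
declare -p split whole
```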

array=( * )

The above, on the other hand, is subject only to pathname expansion, which is the last expansion performed. Its results are not word-split afterwards, so each filename, space included, becomes exactly one word and you get the desired 2 elements.
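A self-contained bash sketch of the glob case. It works in a throwaway directory created with mktemp -d (an assumption, so the glob sees only the two test files) and returns to the starting directory afterwards.

```shell
start_dir=$PWD
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch 'f 1' 'f 2'

# Pathname expansion happens after word splitting, so each matching
# filename, including its embedded space, becomes one array element.
array=( * )
echo "${#array[@]}"              # 2
printf '[%s]\n' "${array[@]}"    # one bracketed name per line

# Clean up the temporary directory.
cd "$start_dir" && rm -r "$tmpdir"
```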

If you want array=( $names ) to work, you need to separate the filenames with a character that is neither a space nor present in any of the names, and then set IFS to that character.

$ names=$(echo f* | sed "s/ /#/2")
$ echo $names
f 1#f 2
$ IFS='#' array=( $names )
$ echo ${#array[@]}
2
$ echo ${array[0]}
f 1
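One side effect of the transcript above: in IFS='#' array=( $names ) both words are plain assignments with no command, so IFS stays set to # for the rest of the shell session. If that is unwanted, one option (a sketch, not the only way) is to let the read builtin do the splitting, because an assignment prefixed to a command such as read applies only to that command.

```shell
names='f 1#f 2'

# IFS='#' is a prefix to the read builtin, so it applies only to this
# command; -a stores the '#'-separated fields in the array "array".
IFS='#' read -r -a array <<< "$names"

echo "${#array[@]}"   # 2
echo "${array[0]}"    # f 1
```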

A more elegant way to do this is to use the NUL byte \0 as the filename delimiter, because NUL can never be part of a filename. To accomplish this, we use the find command with its -print0 flag together with the read builtin delimited on NUL. We also set IFS to the empty string for read, so that it does not strip leading or trailing whitespace from the names.

#!/bin/bash

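# Discard any previous contents of array.
unset array

# -d $'\0' makes read stop at each NUL that find -print0 writes;
# IFS= preserves leading/trailing whitespace, -r keeps backslashes literal.
while IFS= read -r -d $'\0' name; do
  array+=( "$name" )
done < <(find . -type f -name "f*" -print0 )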
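On bash 4.4 or newer (an assumption worth checking with bash --version), the mapfile builtin can read NUL-delimited records directly, which replaces the while loop. This sketch uses a temporary directory from mktemp -d so that it is self-contained; note that find does not guarantee any particular order of results.

```shell
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/f 1" "$tmpdir/f 2"

# -d '' makes NUL the record delimiter (matching find -print0), and
# -t removes that delimiter from the end of each stored element.
mapfile -d '' -t array < <(find "$tmpdir" -type f -name 'f*' -print0)

printf '[%s]\n' "${array[@]}"
rm -r "$tmpdir"
```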

Update

Expansion is performed on the command line after it has been split into words.

I can see how the quote above could be confusing, since the same passage goes on to list word splitting as the second-to-last expansion.

A better way to word that quote in my opinion would be:

Expansion is performed on the command line after it has been split into arguments.

The shell always splits the command line into arguments at unquoted blanks (and at other metacharacters such as ;, | and &), and it is those arguments that are then subject to expansion. If you want white space inside an argument, you must quote or escape it. IFS plays no part in this argument splitting; it only affects word splitting of expansion results.
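A short bash sketch of quoting versus escaping. The function count_args is a hypothetical helper that simply prints how many arguments it received.

```shell
# Hypothetical helper: print the number of arguments passed in.
count_args() { echo "$#"; }

n_plain=$(count_args f 1 f 2)        # unquoted blanks separate arguments
n_quoted=$(count_args "f 1" 'f 2')   # quoting keeps the blank inside one argument
n_escaped=$(count_args f\ 1 f\ 2)    # a backslash escape does the same
echo "$n_plain $n_quoted $n_escaped" # 4 2 2
```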

Consider this example:

$ touch f{1,2}; IFS="#"; rm f1#f2
rm: cannot remove `f1#f2': No such file or directory

Notice that setting IFS to # did not change the fact that the shell saw only one argument, f1#f2, which is then subject to the various expansions.
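To contrast the two kinds of splitting directly, here is a bash sketch, again using a hypothetical count_args helper. With IFS set to #, a literal # typed on the command line does not split anything, while the same text coming out of an unquoted variable expansion is split.

```shell
# Hypothetical helper: print the number of arguments passed in.
count_args() { echo "$#"; }

IFS='#'
literal=$(count_args f1#f2)   # 1: IFS never splits literal command-line text
v='f1#f2'
expanded=$(count_args $v)     # 2: the unquoted expansion result is split on #
quoted=$(count_args "$v")     # 1: quoting suppresses word splitting
unset IFS                     # go back to the default splitting behaviour

echo "$literal $expanded $quoted"   # 1 2 1
```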

I would highly recommend that you acquaint yourself with the BashFAQ if you haven't already. In particular, I strongly suggest reading the following two supplemental entries:

  1. Arguments
  2. Word Splitting