Bash – Expansion Doubt

bash shell

If I'm in a folder and I want to find the jpg files, I can run

find -type f -name "*jpg"

but I don't understand why I have to use double quotes. I know that without them pathname expansion will break the find command (if there is more than one jpg file in the folder), but I don't know exactly why. I also don't get why the quotes here do not seem to stop the wildcard from being expanded.

Best Answer

Both double and single quotes prevent filename expansion in the shell.

Find is somewhat special because it recurses through every level of the directory tree.

If you do not quote the -name option in a find command, the shell expands the name expression immediately, in the directory where you run find. That may match the files zero times, one time, or many times.

If the shell finds no match, the unexpanded *jpg is left on the command line and passed to find as the argument of -name. (This is bash's default behaviour; the nullglob and failglob options change it.)

For one match, the actual name (e.g. K3256.jpg) is passed to find, which will therefore only report files with that exact name, in the current directory and in any directory below it.

For multiple matches, several names will be put into the find command line, and find will refuse to run because the syntax of the arguments will be wrong.
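The three cases above can be reproduced without running find at all. The following is a minimal sketch, assuming bash with its default globbing options (nullglob and failglob off) and an empty scratch directory. show_args is a hypothetical helper, not a standard command: it simply prints each argument it receives, so you can see what find would have been handed.

```shell
#!/bin/bash
# show_args is a hypothetical helper: it prints every argument it receives,
# each wrapped in <>, so we can see what find would have been given.
show_args() { printf '<%s>' "$@"; printf '\n'; }

start_dir=$PWD
demo=$(mktemp -d)                 # scratch directory, starts out empty
cd "$demo" || exit 1

none=$(show_args -name *jpg)      # no file matches: the pattern is left as-is
touch a.jpg
one=$(show_args -name *jpg)       # one match: replaced by that file name
touch b.jpg
many=$(show_args -name *jpg)      # two matches: two separate arguments
quoted=$(show_args -name "*jpg")  # quoted: the pattern itself is passed

printf '%s\n' "$none" "$one" "$many" "$quoted"

cd "$start_dir" || exit 1
rm -rf "$demo"
```

This prints `<-name><*jpg>`, `<-name><a.jpg>`, `<-name><a.jpg><b.jpg>` and `<-name><*jpg>`. The third line is the case find rejects, because -name takes exactly one argument.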

Find itself takes on the responsibility of expanding the wildcard within each directory it descends into. It does not want any misplaced help from the shell.

The shell removes the quotes before invoking find as a new process. This saves every program from having to deal with quotes, which are strictly part of shell syntax. By the time the child process sees its arguments, they have been converted to an array of null-terminated strings and need no further adornment.
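A small sketch of quote removal, again using a hypothetical show_args helper that prints the arguments it receives: double quotes, single quotes and a backslash all deliver the same string to the command, and none of the quoting characters survive.

```shell
#!/bin/bash
# show_args (hypothetical helper) prints each argument it receives inside <>.
show_args() { printf '<%s>' "$@"; printf '\n'; }

# Three spellings of the same argument. The quoting characters are consumed
# by the shell and never reach the command.
seen=$(show_args "*jpg" '*jpg' \*jpg)
echo "$seen"
```

The output is `<*jpg><*jpg><*jpg>`, which is exactly what find would see in each case.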

The find command performs the pattern matching itself, in much the same way the shell does. The difference is that find descends through all the levels of the directory tree, and in each directory it reads the list of names there and matches every name at that level against the -name pattern.

Note that the -type f test also does its work within each sub-directory: the directory entries contain that information too, so find has to deal with both the file type and the name match all over again at every branch of the directory tree.
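To see the difference in practice, here is a rough sketch that builds a small throwaway tree (the file and directory names are made up for illustration) and runs find with and without the quotes.

```shell
#!/bin/bash
start_dir=$PWD
demo=$(mktemp -d)
mkdir -p "$demo/x/y"
touch "$demo/top.jpg" "$demo/x/mid.jpg" "$demo/x/y/deep.jpg" "$demo/x/notes.txt"
mkdir "$demo/x/dir.jpg"     # a directory whose name matches; -type f excludes it
cd "$demo" || exit 1

# Quoted: find receives the pattern and tests it against every name at every level.
quoted_hits=$(find . -type f -name "*jpg" | LC_ALL=C sort)

# Unquoted: the shell sees exactly one match here (top.jpg), so find is asked
# for that literal name and misses the files further down.
unquoted_hits=$(find . -type f -name *jpg | LC_ALL=C sort)

printf 'quoted:\n%s\nunquoted:\n%s\n' "$quoted_hits" "$unquoted_hits"

cd "$start_dir" || exit 1
rm -rf "$demo"
```

The quoted run lists ./top.jpg, ./x/mid.jpg and ./x/y/deep.jpg; the unquoted run lists only ./top.jpg.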


Related Solutions

Bash – using brace expansion to change filenames, not extensions

I found this just after I posted the question:

for f in file1.*; do mv "$f" "${f/file1/newfilename}"; done

Works like a charm.
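Strictly speaking, the loop relies on bash's ${var/pattern/replacement} parameter expansion rather than on brace expansion. A quick way to try it safely is in a scratch directory; the file names below are made up for the demonstration.

```shell
#!/bin/bash
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch file1.txt file1.jpg file1.tar.gz

for f in file1.*; do
    # ${f/file1/newfilename} replaces the first occurrence of "file1" in $f;
    # -- stops mv from treating an odd file name as an option.
    mv -- "$f" "${f/file1/newfilename}"
done

renamed=$(printf '%s\n' *)
echo "$renamed"

cd "$start_dir" || exit 1
rm -rf "$demo"
```

Everything after the first dot is preserved, so file1.tar.gz becomes newfilename.tar.gz.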

Bash – How does the shell filename expansion delimit items within a ( * ) list

From man bash

EXPANSION Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion. The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.

array=( $names )

This gives you 4 entries because an unquoted $names parameter is further subject to word splitting based on the internal field separator IFS, which by default is <space><tab><newline>. If you were to quote "$names" to inhibit word splitting, you would get only one array element with the value f 1 f 2, which again is not what you want.

array=( * )

The above, on the other hand, is only subject to pathname expansion, which happens to be the last expansion performed. Its results are not subject to word splitting, so you get the desired 2 elements.
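The three variants can be compared side by side. This is a small sketch assuming bash, the default IFS, and a scratch directory containing two files named "f 1" and "f 2"; the array names split, whole and globbed are made up for the demonstration.

```shell
#!/bin/bash
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch "f 1" "f 2"

names=$(echo f*)        # a single string: "f 1 f 2"

split=( $names )        # unquoted: word splitting on IFS
whole=( "$names" )      # quoted: no splitting at all
globbed=( f* )          # pathname expansion only, one element per file

echo "${#split[@]} ${#whole[@]} ${#globbed[@]}"

cd "$start_dir" || exit 1
rm -rf "$demo"
```

This prints `4 1 2`: only the glob keeps each file name together as one element.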

If you want to make array=( $names ) work then you'll need to somehow separate the file names by a non-space character which also is not contained in the file names. You'll then need to set IFS to this character.

$ names=$(echo f* | sed "s/ /#/2")
$ echo $names
f 1#f 2
$ IFS='#' array=( $names )
$ echo ${#array[@]}
2
$ echo ${array[0]}
f 1
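One caveat about the session above: the line IFS='#' array=( $names ) consists only of assignments, so IFS stays set to # for the rest of the shell session. A possible alternative, sketched here under the assumption that the names contain no newline, is to let read do the splitting, since an assignment placed in front of a command applies to that command only.

```shell
#!/bin/bash
names='f 1#f 2'     # the same string the sed command above produces

old_ifs=$IFS        # kept only so we can check that IFS is left untouched

# IFS='#' applies to this read command alone; -a stores the fields in an array,
# and -r keeps backslashes literal.
IFS='#' read -r -a array <<< "$names"

echo "${#array[@]}"
echo "${array[0]}"
```

This prints 2 and then f 1, and IFS still has its previous value afterwards.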

A more elegant way to do this is to use the NUL byte \0 as the filename delimiter, as that is guaranteed never to be a part of a filename. To accomplish this we need to use the find command with its -print0 flag, as well as the read builtin delimited on NUL. We will also need to clear IFS so that no word splitting on spaces is performed.

#!/bin/bash

unset array

while IFS= read -r -d $'\0' name; do
  array+=( "$name" )
done < <(find . -type f -name "f*" -print0 )
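If your bash is version 4.4 or later (an assumption; older versions lack the -d option of mapfile), the loop can be condensed into a single mapfile call. The scratch directory and file names below are made up for the demonstration.

```shell
#!/bin/bash
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir sub
touch "f 1" "sub/f 2" other

# -d '' makes mapfile split its input on NUL bytes;
# -t strips that delimiter from each stored element.
mapfile -d '' -t array < <(find . -type f -name "f*" -print0)

echo "${#array[@]}"
printf '%s\n' "${array[@]}"

cd "$start_dir" || exit 1
rm -rf "$demo"
```

The array ends up with two elements, ./f 1 and ./sub/f 2, in whatever order find happens to return them.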

Update

Expansion is performed on the command line after it has been split into words.

I can see how the quote above could be confusing, given that the man page then goes on to list word splitting as the second-to-last expansion to occur.

A better way to word that quote in my opinion would be:

Expansion is performed on the command line after it has been split into arguments.

The splitting of a command line into arguments is always done on whitespace, and it is those arguments which are further subject to expansion. If you want whitespace in an argument, you must use either quoting or escaping. IFS does not affect argument splitting, only word splitting.

Consider this example:

$ touch f{1,2}; IFS="#"; rm f1#f2
rm: cannot remove `f1#f2': No such file or directory

Notice how setting IFS to # did not alter the fact that the shell still saw only one argument, f1#f2, which, by the way, is then further subject to the various expansions.
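The distinction can be made visible with a hypothetical show_args helper that prints the arguments it receives. In this sketch the variable name value is also made up; the point is that IFS matters only when the result of an unquoted expansion is split, not when the command line itself is parsed.

```shell
#!/bin/bash
# show_args (hypothetical helper) prints each argument it receives inside <>.
show_args() { printf '<%s>' "$@"; printf '\n'; }

old_ifs=$IFS
IFS='#'

literal=$(show_args f1#f2)      # typed on the command line: one argument
value='f1#f2'
expanded=$(show_args $value)    # unquoted expansion: split on IFS into two
kept=$(show_args "$value")      # quoted expansion: one argument again

IFS=$old_ifs                    # restore IFS for the rest of the script
printf '%s\n' "$literal" "$expanded" "$kept"
```

This prints `<f1#f2>`, `<f1><f2>` and `<f1#f2>`.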

I would highly recommend you acquaint yourself with the BashFAQ if you haven't already. In particular, I would strongly suggest you read the following two supplemental entries:
