In the shell, you need to distinguish filename generation/expansion (aka globbing: a pattern that expands to a list of files) from pattern matching. Globbing uses pattern matching internally, but it is first and foremost an operator that generates a list of files based on a pattern.
*/*.txt is a pattern which matches a sequence of zero or more characters, followed by /, followed by a sequence of zero or more characters, followed by .txt. When used as a shell pattern as in:
case $file in
*/*.txt) echo match
esac
It will match when file=.foo/bar/baz.txt.
However, */*.txt as a glob is something related but more complex.
In expanding */*.txt into a list of files, the shell will open the current directory, list its content, find the non-hidden files of type directory (or symlink to directory) that match *, sort that list, open each of those, list their content, and find the non-hidden ones that match *.txt.
It will never expand to .foo/bar/baz.txt even though that matches the pattern, because that's not how globbing works. On the other hand, the file paths generated by a glob will all match that pattern.
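A minimal sketch of that difference, assuming a POSIX-like shell and a scratch directory from mktemp (the names .foo/bar/baz.txt and a/b.txt are only illustrative): the case statement matches the string, while the glob only generates the existing, non-hidden path that is exactly two levels deep.

```bash
# Scratch tree: a hidden three-level path and a visible two-level one.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir -p .foo/bar a
touch .foo/bar/baz.txt a/b.txt

# Pattern matching: here * may match "/" and a leading ".".
file=.foo/bar/baz.txt
case $file in
  */*.txt) matched=yes ;;
  *) matched=no ;;
esac
echo "case: $matched"     # case: yes

# Globbing: * skips the hidden .foo, so only a/b.txt is generated.
set -- */*.txt
echo "glob: $*"           # glob: a/b.txt

cd / && rm -rf -- "$dir"
```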
Similarly, a glob like foo[a/b]baz* will find all the files whose name starts with b]baz in the foo[a directory.
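This can be checked with a small sketch, assuming a POSIX-like shell; the directory foo[a and the file b]bazinga are hypothetical names created just for the test.

```bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir 'foo[a'                 # a directory literally named "foo[a"
touch 'foo[a/b]bazinga'       # a file "b]bazinga" inside it

# The glob is split on "/" first, so "[a/b]" is never a bracket expression:
# "foo[a" is taken literally and "b]baz*" is matched inside that directory.
set -- foo[a/b]baz*
echo "$1"                     # foo[a/b]bazinga

cd / && rm -rf -- "$dir"
```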
So we've already seen that, for globbing but not for pattern matching, / is special (globs are split on / and each part is treated separately), and dot-files are treated specially.
Shell globbing and pattern matching are part of the shell syntax. They are intertwined with quoting and other forms of expansion.
Quoting that ] removes its special meaning (of closing the previous [):
$ bash -c 'case "]" in [x"]"]) echo true; esac'
true
It can be even more confusing when you mix everything:
$ ls
* \* \a x
$ p='\*' ksh -xc 'ls $p'
+ ls '\*' '\a'
\* \a
OK, \* is all the files starting with \.
$ p='\*' bash -xc 'ls $p'
+ ls '\*'
\*
It's not all the files starting with \. So, somehow, \ must have escaped the *, but then again it's not matching * either...
For find, it's a lot simpler. find descends the directory tree at each of the file arguments it receives and then performs the tests as instructed on each file it encounters.
For -type f, that's true if the file is a regular file, false otherwise; for -name <some-pattern>, that's true if the name of the currently considered file matches the pattern, false otherwise. There's no concept of hidden file, / handling or shell quoting here; that's just matching a string (the name of the file) against a pattern.
So for instance, -name '*foo[a/b]ar' (which passes -name and *foo[a/b]ar arguments to find) will match foobar and .fooaar. It will never match foo/ar, but that's because -name matches on the file name; it would with -path instead.
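A sketch of both points, assuming a find that supports -path (GNU and BSD find do); the tree below is hypothetical. Unlike the */*.txt glob, -name finds the hidden, deeply nested file, and -path lets the bracket expression match the "/".

```bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir -p .foo/bar foo
touch .foo/bar/baz.txt foo/ar

# -name tests only the file's own name: no hidden-file rule, no "/" handling.
by_name=$(find . -type f -name '*.txt')
echo "$by_name"    # ./.foo/bar/baz.txt

# -path tests the whole path, so [a/b] can match the "/" in ./foo/ar.
by_path=$(find . -path '*foo[a/b]ar')
echo "$by_path"    # ./foo/ar

cd / && rm -rf -- "$dir"
```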
Now, there is one form of quoting/escaping that find recognises here, and that's backslash only. It allows escaping operators. For the shell, that's done as part of the usual shell quoting (\ is one of the shell's quoting mechanisms). For find (fnmatch()), it's part of the pattern syntax.
For instance, -name '\**' would match files whose name starts with *. -name '*[\^x]*' would match files whose name contains ^ or x...
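A minimal sketch of those two patterns, assuming a find whose -name uses fnmatch()-style backslash escaping (as GNU find does); the file names are hypothetical. The single quotes only protect the patterns from the shell, so find sees the backslashes.

```bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch '*star' 'a^b' box plain

# \* is a literal "*"; the trailing * is a wildcard.
starts_with_star=$(find . -type f -name '\**')
echo "$starts_with_star"    # ./*star

# Inside the brackets, \^ is a literal "^" rather than a negation.
has_caret_or_x=$(find . -type f -name '*[\^x]*' | sort)
echo "$has_caret_or_x"      # ./a^b and ./box, one per line

cd / && rm -rf -- "$dir"
```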
Now, as for the different operators recognised by find, fnmatch(), bash and various other shells, they should all agree at least on a common subset: *, ? and [...].
Whether a particular shell or find implementation uses the system's fnmatch() function or its own is up to the implementation. GNU find does, at least on GNU systems. Shells are very unlikely to use it, as it would make things complicated for them and not be worth the effort; bash certainly doesn't. Modern shells like ksh, bash and zsh also have extensions over *, ? and [...], and a number of options and special parameters (GLOBIGNORE/FIGNORE) that affect their globbing behaviour.
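Two of those bash extensions, sketched under the assumption that bash is installed (it is invoked through bash -c so the outer shell can be any sh); the file names are hypothetical. extglob has to be on before the line using it is parsed, hence -O rather than shopt inside the -c string.

```bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch a.txt b.log c.sh

# ksh-style extended glob: !(pattern) matches anything except pattern.
not_txt=$(bash -O extglob -c 'echo !(*.txt)')
echo "$not_txt"      # b.log c.sh

# GLOBIGNORE: colon-separated patterns removed from the results of a glob.
no_logs=$(bash -c 'GLOBIGNORE="*.log"; echo *')
echo "$no_logs"      # a.txt c.sh

cd / && rm -rf -- "$dir"
```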
Also note that besides fnmatch(), which implements shell pattern matching, there's also the glob() function, which implements something similar to shell globbing.
Now, there can be subtle differences between the pattern matching operators in those various implementations.
For instance, with GNU fnmatch(), ?, * or [!x] would not match a byte or sequence of bytes that doesn't form a valid character, while bash (and most other shells) would. So on a GNU system, find . -name '*' may fail to match files whose name contains invalid characters, while bash -c 'echo *' will list them (as long as they don't start with .).
We've already mentioned the confusion that quoting can cause.
They are called brace expansion. It is one of several expansions done by bash, zsh and ksh, filename expansion (*.txt) being another one of them. Brace expansion is not covered by the POSIX standard and is thus not portable.
You can read about it in the bash manual.
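A small sketch of how brace expansion differs from filename expansion, assuming bash is installed and run in an empty scratch directory: braces generate words whether or not any such file exists, while a glob only yields existing paths.

```bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1   # an empty directory

# Brace expansion is purely textual.
braces=$(bash -c 'echo file{1,2}.txt')
echo "$braces"        # file1.txt file2.txt

# Filename expansion finds no match here, so (without nullglob or failglob)
# bash leaves the pattern as is.
glob=$(bash -c 'echo file[12].txt')
echo "$glob"          # file[12].txt

cd / && rm -rf -- "$dir"
```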
On @Arrow's suggestion: in order to get cat test.pdf test.pdf test.pdf with brace expansion alone, you would have to use this "hack":
#cat test.pdf test.pdf
cat test.pdf{,}
#cat test.pdf test.pdf test.pdf
cat test.pdf{,,}
#cat test.pdf test.pdf test.pdf test.pdf
cat test.pdf{,,,}
Some common uses:
for index in {1..10}; do
echo "$index"
done
touch test_file_{a..e}.txt
Or another "hack" to print a string 10 times:
printf -- "mystring\n%0.s" {1..10}
Be aware that brace expansion in bash is done before parameter expansion; therefore a common mistake is:
num=10
for index in {1..$num}; do
echo "$index"
done
(the ksh93 shell copes with this, though)
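A sketch of the pitfall and one portable way around it, assuming bash is available for the first line; count_to is a hypothetical helper name, not something from the original answer.

```bash
# The pitfall: brace expansion is already done when $num is expanded,
# so the loop sees the literal word {1..10} and runs once.
bash -c 'num=10; for index in {1..$num}; do echo "$index"; done'   # {1..10}

# A portable alternative (POSIX sh): count with arithmetic expansion.
count_to() {
  index=1
  while [ "$index" -le "$1" ]; do
    echo "$index"
    index=$((index + 1))
  done
}
num=10
count_to "$num"    # 1 to 10, one per line
```

In bash itself, for ((index = 1; index <= num; index++)) also works; the while loop is shown because it runs in plain sh too.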
From man bash
The reason this gives you 4 entries is that an unquoted $names parameter is further subject to word splitting based on the internal field separator IFS, which is by default <space><tab><newline>. If you were to quote "$names" to inhibit word splitting, then you'd only get one array element with value f 1 f 2, again not what you want. The above, on the other hand, is only subject to pathname expansion, which happens to be the last expansion performed. The results are not subject to word splitting, so you get the desired 2 elements.
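The three cases can be reproduced with a sketch like this one; since the question itself isn't shown here, the files "f 1" and "f 2" and the way $names is built are assumptions, and set -- stands in for the array assignment so it also runs in plain sh.

```bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'f 1' 'f 2'            # hypothetical names containing a space

names='f 1 f 2'
set -- $names;   unquoted=$#   # word splitting on IFS: f, 1, f, 2
set -- "$names"; quoted=$#     # one word: "f 1 f 2"
set -- f*;       globbed=$#    # pathname expansion, not word-split
echo "$unquoted $quoted $globbed"   # 4 1 2

cd / && rm -rf -- "$dir"
```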
If you want to make
array=( $names )
work, then you'll need to somehow separate the file names by a non-space character which is also not contained in the file names. You'll then need to set IFS to this character. A more elegant way to do this would be to use the NUL byte \0 as the filename delimiter, as that is guaranteed never to be part of a filename. To accomplish this we will need to use the find command with its -print0 flag as well as the read builtin delimited on NUL. We will also need to clear IFS so no word splitting on spaces is performed.
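A minimal sketch of that NUL-delimited approach, assuming bash (for arrays, read -d '' and process substitution) and a find that supports -print0, such as GNU or BSD find; the file names are hypothetical.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'f 1' 'f 2'            # hypothetical names containing a space

array=()
# -print0 ends each path with a NUL byte; read -d '' reads up to that NUL.
# IFS= keeps leading/trailing blanks and -r keeps backslashes literal.
while IFS= read -r -d '' file; do
  array+=("$file")
done < <(find . -type f -name 'f*' -print0)

echo "${#array[@]}"          # 2
printf '<%s>\n' "${array[@]}"

cd / && rm -rf -- "$dir"
```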
I can see how one would be confused by the quote above, only to have it further state that word splitting is the second-to-last expansion to occur.
A better way to word that quote, in my opinion, would be:
The splitting of arguments by the shell is always done on white space, and it's those arguments which are further subject to expansion. If you want to have white space in your argument, you must use either quoting or escaping.
IFS does not augment argument splitting, only word splitting. Consider this example:
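A minimal sketch of that distinction in POSIX sh (the variable names are illustrative): with IFS set to #, a word written literally is left alone, while the same text coming from an unquoted expansion is split.

```bash
IFS='#'
set -- f1#f2      # literal word: IFS plays no part, one argument
literal=$#
var='f1#f2'
set -- $var       # unquoted expansion: word splitting on IFS gives f1, f2
split=$#
unset IFS         # back to the default <space><tab><newline> splitting
echo "$literal $split"   # 1 2
```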
Notice how setting IFS to # did not alter the fact that the shell still only saw one argument, f1#f2, which by the way is then further subject to the various expansions. I would highly recommend you acquaint yourself with the BashFAQ if you haven't already. In particular, I would strongly suggest you read the following two supplemental entries: