Shell Wildcards – Why Nullglob is Not Default


In most shells nullglob isn't the default. That means, for example, if you run this command

ls *

in an empty directory, the shell leaves the * glob as a literal * instead of expanding it to an empty list of arguments. There are ways to change that behaviour, so that * in an empty directory expands to an empty list of arguments, which would seem more intuitive.

So, is there a reason why nullglob is disabled by default? If so, what is that reason?

Best Answer

The nullglob option (which BTW is a zsh invention, only added years later to bash (2.0)) would not be ideal in a number of cases. And ls is a good example:

ls *.txt

Or its more correct equivalent:

ls -- *.txt

With nullglob on, that would run ls with no argument if no file matches, which is treated as ls -- . (list the current directory). That is probably worse than calling ls with a literal *.txt as argument.

You'd have similar problems with most text utilities:

grep foo *.txt

With nullglob, that would look for foo on stdin if there's no txt file.

A more sensible default, and the one used by csh, tcsh, zsh and fish 2.3+ (and by early Unix shells), is to cancel the command altogether if the glob doesn't match.

bash (since version 3) has a failglob option for that. One point relevant to this discussion: contrary to ash, AT&T ksh or zsh, bash doesn't support local scopes for options (though that's to change in 4.4), so enabling that option globally does break a few things, like the bash-completion functions.
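The three behaviours can be compared by asking bash to expand a non-matching glob under each setting. This is only a sketch: it assumes bash 3 or later and mktemp -d are available, and the scratch directory and variable names are purely illustrative.

```shell
# Work in an empty scratch directory so that *.txt matches nothing.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Default: the unmatched pattern is passed through literally.
default_out=$(bash -c 'set -- *.txt; echo "$# $1"')

# nullglob: the pattern disappears, leaving no arguments at all.
nullglob_out=$(bash -O nullglob -c 'set -- *.txt; echo "$#"')

# failglob: bash reports "no match" and the command is not run.
failglob_out=$(bash -O failglob -c 'set -- *.txt; echo "$#"' 2>/dev/null)
failglob_status=$?

echo "default:  $default_out"
echo "nullglob: $nullglob_out"
echo "failglob: output='$failglob_out' status=$failglob_status"
```

With the default, the command receives one argument (the literal pattern); with nullglob it receives none; with failglob nothing is printed and the status is non-zero.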

Note that csh and tcsh are slightly different from zsh, fish or bash -O failglob in cases like:

ls -- *.txt *.html

There, all the globs must fail to match for the command to be cancelled. For instance, if there's one txt file and no html file, that becomes:

ls -- file.txt

You can get that behaviour in zsh with setopt cshnullglob, though a more sensible way to do it in zsh would be to use a glob like:

ls -- *.(txt|html)

In zsh and ksh93, you can also apply nullglob on a per-glob basis, which is a much saner approach than modifying a global setting:

files=(*.txt(N))  # zsh
files=(~(N)*.txt) # ksh93

would create an empty array if there's no txt file, instead of failing the command with an error (or, in other shells, creating an array with a single literal *.txt element).

Versions of fish prior to 2.3 would work like bash -O nullglob but, when interactive, give a warning if a glob has no match. Since 2.3, it works like zsh except for globs used in for, set or count.

Now, on a historical note, the behaviour was actually broken by the Bourne shell. In prior versions of Unix, globbing was done via the /etc/glob helper, and that helper behaved like csh: it would fail the command if none of the globs matched any file, and remove the globs with no match otherwise.

So the situation we're in today is due to a bad decision made in the Bourne shell.

Note that the Bourne shell (and the C shell) came with another new Unix feature: the environment. That meant variable expansion (its predecessor only had the $1, $2... positional parameters). The Bourne shell also introduced command substitution.

Another poor design decision of the Bourne shell was to perform globbing (and splitting) upon the expansion of variables and command substitution. That was possibly for backward compatibility with the Thompson shell, where echo $1 would still invoke /etc/glob if $1 contained wildcards (it was more like pre-processor macro expansion there, as in the expanded value was parsed again as shell code).

Failing globs that don't match would mean for instance that:

pattern='a.*b'
grep $pattern file

would fail the command (unless there are some a.whateverb files in the current directory). csh (which also performs globbing upon variable expansion) does fail the command in that case (and I'd argue it's better than leaving a dormant bug there, even if it's not as good as not doing globbing at all like in zsh).
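The dormant bug can be made visible in a scratch directory. The sketch below assumes a POSIX-style shell with default IFS and mktemp -d; the file names a.xb and file are made up for the demonstration.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'a--b\n' > file
touch a.xb                 # a file name that happens to match the glob a.*b

pattern='a.*b'

set -- $pattern            # unquoted: the value is treated as a glob
unquoted=$1                # becomes the file name a.xb

set -- "$pattern"          # quoted: the value is passed unchanged
quoted=$1                  # stays a.*b

# Quoting makes grep receive the intended regular expression.
matches=$(grep -c -- "$pattern" file)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
echo "matching lines: $matches"
```

As long as no a.whateverb file exists, the unquoted form appears to work, which is exactly why the problem tends to go unnoticed.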

Related Solutions

Word Splitting in Shell – What is Word Splitting and Its Importance in Shell Programming

Early shells had only a single data type: strings. But it is common to manipulate lists of strings, typically when passing multiple file names as arguments to a program. Another common use case for splitting is when a command outputs a list of results: the command's output is a string, but the desired data is a list of strings. To store a list of file names in a variable, you would put spaces between them. Then a shell script like this

files="foo bar qux"
myprogram $files

called myprogram with three arguments, as the shell split the string $files into words. At the time, spaces in file names were either forbidden or widely considered Not Done.

The Korn shell introduced arrays: you could store a list of strings in a variable. The Korn shell remained compatible with the then-established Bourne shell, so bare variable expansions kept undergoing word splitting, and using arrays required some syntactic overhead. You would write the snippet above as

files=(foo bar qux)
myprogram "${files[@]}"

Zsh had arrays from the start, and its author opted for a saner language design at the expense of backward compatibility. In zsh (under the default expansion rules) $var does not perform word splitting; if you want to store a list of words in a variable, you are meant to use an array; and if you really want word splitting, you can write $=var.

files=(foo bar qux)
myprogram $files

These days, spaces in file names are something you need to cope with, both because many users expect them to work and because many scripts are executed in security-sensitive contexts where an attacker may be in control of file names. So automatic word splitting is often a nuisance; hence my general advice to always use double quotes, i.e. write "$foo", unless you understand why you need word splitting in a particular use case. (Note that bare variable expansions undergo globbing as well.)
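A quick way to see the effect of the double quotes is to count the arguments a command would receive. This is a minimal sketch for a POSIX-style shell with the default IFS; the file name is invented for the example.

```shell
file='my report.txt'       # hypothetical file name containing a space

set -- $file               # unquoted: split at the space
unquoted_count=$#

set -- "$file"             # quoted: kept as a single word
quoted_count=$#

echo "unquoted: $unquoted_count argument(s)"
echo "quoted:   $quoted_count argument(s)"
```

The unquoted expansion yields two arguments (my and report.txt), while the quoted one yields the single intended file name.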

Shell – Filename Pattern That Expands to Dot Files but Not ‘..’

Bash, ksh and zsh have better solutions, but in this answer I assume a POSIX shell.

The pattern .[!.]* matches all files that begin with a dot followed by a non-dot character. (Note that [^.] is supported by some shells but not all; the portable syntax for character set complement in wildcard patterns is [!.].) It therefore excludes . and .., but also files that begin with two dots. The pattern ..?* handles files that begin with two dots and aren't just ...

chown -R root .[!.]* ..?*

This is the classical pattern set to match all files:

* .[!.]* ..?*
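To check what the three patterns pick up, the following sketch creates one file of each kind in a scratch directory and counts the matches. It assumes a POSIX-style shell and mktemp -d; the file names are arbitrary.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
touch visible .hidden ..double ...triple

set -- * .[!.]* ..?*       # each file is matched by exactly one pattern
count=$#

# Confirm that neither . nor .. slipped into the list.
dots=0
for x in "$@"; do
  case $x in .|..) dots=$((dots + 1));; esac
done

echo "entries matched: $count"
echo ". or .. matched: $dots"
```

Here * matches visible, .[!.]* matches .hidden, and ..?* matches both ..double and ...triple.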

A limitation of this approach is that if one of the patterns matches nothing, it's passed to the command. In a script, when you want to match all files in a directory except . and .., there are several solutions, all of them cumbersome:

Use * .* to enumerate all entries, and exclude . and .. in a loop. One or both of the patterns may match nothing, so the loop needs to check for the existence of each file. You have the opportunity to filter on other criteria; for example, remove the -h test if you want to skip dangling symbolic links.
```
for x in * .*; do
  case $x in .|..) continue;; esac
  [ -e "$x" ] || [ -h "$x" ] || continue
  somecommand "$x"
done
```
A more complex variant where the command is run only once. Note that the positional parameters are clobbered (POSIX shells don't have arrays); put this in a separate function if this is an issue.
```
set --
for x in * .[!.]* ..?*; do
  case $x in .|..) continue;; esac
  [ -e "$x" ] || [ -h "$x" ] || continue
  set -- "$@" "$x"
done
somecommand "$@"
```
Use the * .[!.]* ..?* triptych. Again, one or more of the patterns may match nothing, so we need to check for existing files (including dangling symbolic links).
```
for x in * .[!.]* ..?*; do
  [ -e "$x" ] || [ -h "$x" ] || continue
  somecommand "$x"
done
```
Use the * .[!.]* ..?* triptych, test each pattern in turn and drop it if it matched nothing, then run the command a single time on whatever is left. Note that the positional parameters are clobbered (POSIX shells don't have arrays); put this in a separate function if this is an issue.
```
set -- *
[ -e "$1" ] || [ -h "$1" ] || shift
set -- .[!.]* "$@"
[ -e "$1" ] || [ -h "$1" ] || shift
set -- ..?* "$@"
[ -e "$1" ] || [ -h "$1" ] || shift
somecommand "$@"
```
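One way to keep the caller's positional parameters intact is to wrap the steps in a function, since a function has its own positional parameters. The sketch below is only an illustration: for_all_entries and count_args are hypothetical names, and the scratch directory assumes mktemp -d is available.

```shell
# Hypothetical helper: run the command named by $1 once, with every
# entry of the current directory (except . and ..) as arguments.
for_all_entries() {
  _cmd=$1                                  # save it before set -- overwrites $1
  set -- *
  [ -e "$1" ] || [ -h "$1" ] || shift      # drop the pattern if unmatched
  set -- .[!.]* "$@"
  [ -e "$1" ] || [ -h "$1" ] || shift
  set -- ..?* "$@"
  [ -e "$1" ] || [ -h "$1" ] || shift
  "$_cmd" "$@"
}

# Usage example in a scratch directory.
count_args() { echo "$#"; }
dir=$(mktemp -d) && cd "$dir" || exit 1
touch a .b
result=$(for_all_entries count_args)
echo "entries passed: $result"
```

Note that if the directory is empty, the command is still run, just with no arguments.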
Use find. With GNU or BSD find, avoiding recursion is easy with the options -mindepth and -maxdepth. With POSIX find, it's a little trickier, but can be done. This form has the advantage of making it easy to run the command a single time instead of once per file (but this is not guaranteed: if the resulting command line is too long, the command will be run in several batches).
```
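# Prune everything except the starting point so find does not descend
# into subdirectories. (-exec ... {} + always evaluates as true, so a
# -prune placed after it in an -o chain would never be reached.)
find . ! -name . -prune -exec somecommand {} +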
```
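For the GNU or BSD case, a non-recursive invocation might look like the sketch below. It assumes a find that supports -mindepth and -maxdepth (they are not in POSIX), and it uses sh -c 'echo "$#"' as a stand-in for somecommand so that the number of arguments passed is visible.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
mkdir sub
touch top .hidden sub/nested

# -mindepth 1 skips . itself; -maxdepth 1 stops find from entering sub.
found=$(find . -mindepth 1 -maxdepth 1 -exec sh -c 'echo "$#"' sh {} +)

echo "entries passed: $found"
```

The stand-in command receives ./top, ./.hidden and ./sub, but not ./sub/nested.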
