Bash Expansion Asymmetry – Opening and Creating Files

bash, brace-expansion, files, wildcards

I have come upon the following example of asymmetry in a Bash regex that is confusing me. I would like to know what I am doing which is non-standard and causes this behaviour, or else what is the logic behind this behaviour which I am missing.

Opening files

Suppose I have a directory containing the files file1.txt to file20.txt, and I would like to open all of them in my favourite text editor. To do this, in some sense Bash has to "read" the contents of the directory and pass them to Vim. I can do this with the following regex:

vim file{[1-9],1[0-9],20}.txt

This works. After executing this command, Vim opens and in the buffer list I can see all the files file1.txt to file20.txt.

Creating files

Now suppose we are in a different scenario: we start from an empty directory, and we want to create the files file1.txt to file20.txt. To do this, in some sense Bash has to "write" the names of the files to the directory. Unfortunately, in this scenario the previous command does not work. Instead of creating the desired 20 files, I end up with the following files in the buffer list:

file[1-9].txt
file1[0-9].txt
file20.txt

So instead of interpreting the square brackets [] as part of the regex, they have been incorporated into the name.

Why does this asymmetry take place when reading vs writing and how can I avoid it in the future?

Best Answer

What you are using is not a regular expression, but a combination of brace expansion and filename expansion (a.k.a. globbing). That matters because brace expansion simply expands the string containing the { ... } construct into several different strings, whether or not any files exist, while globbing tries to match existing files against the pattern. This is where the problem lies (by the way, even regular expressions are used for matching an existing string against a pattern, not for generating strings from the pattern).

In particular, note that brace expansion is performed before filename expansion.

So

file{[1-9],1[0-9],20}.txt

is expanded by the shell into the three space-separated tokens

file[1-9].txt file1[0-9].txt file20.txt

which are then subject to filename expansion, in which the shell checks which existing files match each glob pattern. The important part is that, by default, a pattern that matches no file is left in place as a literal string.

So in the case of your opening, what happens is that

  1. vim file{[1-9],1[0-9],20}.txt is expanded to vim file[1-9].txt file1[0-9].txt file20.txt
  2. vim file[1-9].txt file1[0-9].txt file20.txt is expanded to vim file1.txt file2.txt ... file20.txt because all of these files exist (a glob only ever expands to files that exist, so any missing number in that range would simply be left out)
  3. vim opens all these files.

However, when using e.g. touch with the same arguments to create non-existing files, what happens is that

  1. touch file{[1-9],1[0-9],20}.txt is expanded to touch file[1-9].txt file1[0-9].txt file20.txt
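  2. Since no file matches either glob pattern, file[1-9].txt and file1[0-9].txt are left as literal strings (file20.txt contains no glob characters, so it is passed through unchanged in any case)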
  3. touch creates these three files with the names taken literally.
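
The two sequences above can be reproduced safely in a throwaway directory. The following is a minimal sketch, assuming Bash with its default globbing options (nullglob and failglob unset); the names scratch, before and after are only illustrative and do not come from the question.

```bash
#!/usr/bin/env bash
# Work in a temporary directory so that no real files are touched.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Empty directory: brace expansion yields three words, and the two glob
# patterns match nothing, so they are kept as literal strings.
before=(file{[1-9],1[0-9],20}.txt)
printf '%s\n' "${before[@]}"
# file[1-9].txt
# file1[0-9].txt
# file20.txt

# Create the 20 files with brace expansion only, then expand again:
# this time each glob pattern matches existing files.
touch file{1..20}.txt
after=(file{[1-9],1[0-9],20}.txt)
echo "${#after[@]} words, from ${after[0]} to ${after[19]}"
# 20 words, from file1.txt to file20.txt

# Clean up the temporary directory.
cd / && rm -rf "$scratch"
```

Collecting the expanded words in an array instead of passing them to vim or touch makes it easy to inspect what the command would actually receive.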

If you want to avoid that, and since you want to create all files in that range, you could simply limit your command-line to the brace expansion, i.e.

touch file{1..20}.txt

(as also noted in a comment by pLumo)
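
To see why this form works in an empty directory, it helps to look at the words it produces. This is a small sketch, assuming Bash; the array name names is only for illustration. A sequence expression such as {1..20} generates strings without ever looking at the file system.

```bash
#!/usr/bin/env bash
# Pure brace expansion: no glob characters are involved, so the result
# does not depend on which files exist in the current directory.
names=(file{1..20}.txt)

echo "${#names[@]}"                    # 20
echo "${names[0]} ... ${names[19]}"    # file1.txt ... file20.txt
```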


As a sidenote (suggested by @Quasimodo), in bash and many other shells the globbing behavior can be adjusted via shell options, in bash specifically using shopt -s option.

Here, the nullglob option is particularly interesting: it makes the shell remove a globbing pattern that does not match any filenames (so that it expands to nothing) instead of keeping the pattern as a literal string. This is especially useful if you want to iterate over all files that match a pattern using a for loop:

  • Without the nullglob option, a loop of the form
    for f in *.txt
    
    would execute exactly once with $f set to the literal *.txt if no .txt files were present in the current directory, which can lead to unexpected behavior (e.g. code trying to operate on a nonexistent file)
  • With the nullglob option, the shell would not enter the loop body at all.
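
The difference between the two cases can be checked by counting loop iterations in an empty directory. This is a minimal sketch, assuming Bash; the helper count_iterations and the variable names are hypothetical and only serve the demonstration.

```bash
#!/usr/bin/env bash
# Use an empty temporary directory so that *.txt matches nothing.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Hypothetical helper: counts how often the loop body runs.
# The glob is expanded when the function is called, not when it is defined.
count_iterations() {
    local n=0 f
    for f in *.txt; do
        n=$((n + 1))
    done
    echo "$n"
}

shopt -u nullglob
without=$(count_iterations)   # body runs once, with f set to the literal *.txt

shopt -s nullglob
with=$(count_iterations)      # pattern is removed, body never runs
shopt -u nullglob

echo "without nullglob: $without, with nullglob: $with"
# without nullglob: 1, with nullglob: 0

cd / && rm -rf "$scratch"
```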

On the other hand (as noted correctly by @Barmar), many programs that operate on files will silently try to read from stdin if you supply them with a glob pattern that evaluates to "nothing" because no filenames match, so using this option can have strange side-effects if you are not careful.
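
The stdin side effect can be observed with cat. The sketch below assumes Bash and an empty scratch directory; the extension .no-such-extension is made up so that the pattern is guaranteed not to match.

```bash
#!/usr/bin/env bash
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

shopt -s nullglob
# The pattern matches nothing and is removed, so cat is started without
# any file arguments and falls back to reading standard input.
output=$(printf 'from stdin\n' | cat ./*.no-such-extension)
shopt -u nullglob

echo "$output"   # from stdin

cd / && rm -rf "$scratch"
```

In an interactive terminal without the pipe, the same cat invocation would appear to hang while it waits for input.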

In addition to nullglob, Bash has the failglob option, which reports an error instead of running the command if a glob does not match anything.
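
A hedged sketch of the failglob behavior, assuming Bash: the option is enabled only inside a subshell so that it cannot affect the rest of the script, and the error message is discarded because its exact wording is not relied on here. The variable names are illustrative.

```bash
#!/usr/bin/env bash
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

status=0
# With failglob set, the unmatched pattern is an error and touch never runs.
( shopt -s failglob; touch ./*.no-such-extension ) 2>/dev/null || status=$?

# Check whether a file with the literal pattern as its name was created.
created=no
[ -e './*.no-such-extension' ] && created=yes

echo "exit status: $status, literal file created: $created"

cd / && rm -rf "$scratch"
```

The non-zero status shows that the command failed, and the missing file shows that touch was not executed with the literal pattern.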
