Bash Wildcards – Extended Glob: Difference in Syntax Between ?(list), *(list), +(list) and @(list)


I have a question after reading about extended glob.

After enabling extended globs with shopt -s extglob, what is the difference between the following?

?(list): Matches zero or one occurrence of the given patterns.

*(list): Matches zero or more occurrences of the given patterns.

+(list): Matches one or more occurrences of the given patterns.

@(list): Matches one of the given patterns.

Yes, I have read the descriptions that accompany them, but for practical purposes I can't see a situation where someone would prefer ?(list) over *(list). That is, I don't see any difference.

I've tried the following:

$ ls
test1.in test2.in test1.out test2.out

$ echo *(*.in)
test1.in test2.in

$ echo ?(*.in)
test1.in test2.in

From the description, I'd expect echo ?(*.in) to output test1.in only, but that does not appear to be the case. Could anyone give an example where the type of extended glob used makes a difference?

Source: http://mywiki.wooledge.org/BashGuide/Patterns#Extended_Globs

Best Answer

$ shopt -s extglob
$ ls
abbc  abc  ac
$ echo a*(b)c
abbc abc ac
$ echo a+(b)c
abbc abc
$ echo a?(b)c
abc ac
$ echo a@(b)c
abc
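
The listing above comes without commentary, so here is a hedged reading of it. Each operator counts repetitions of the sub-pattern inside a single name, not the number of files returned. That is why ?(*.in) in the question prints both files: each name, taken on its own, is exactly one occurrence of *.in. The sketch below is not part of the original answer; it tests names as plain strings with [[ … ]] so the result does not depend on the current directory, and the matches helper is a hypothetical name introduced only for illustration.

```bash
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: print the candidate names that match an extglob pattern.
matches() {
  local pattern=$1 name out=()
  shift
  for name in "$@"; do
    # Unquoted right-hand side: $pattern is used as a pattern, not a literal.
    [[ $name == $pattern ]] && out+=("$name")
  done
  echo "${out[*]}"
}

names=(ac abc abbc)
star=$(matches 'a*(b)c' "${names[@]}")      # zero or more b
plus=$(matches 'a+(b)c' "${names[@]}")      # one or more b
optional=$(matches 'a?(b)c' "${names[@]}")  # zero or one b
exactly=$(matches 'a@(b)c' "${names[@]}")   # exactly one b

# The question's case: every name is tested separately against ?(*.in).
in_files=$(matches '?(*.in)' test1.in test2.in test1.out)

printf '%s\n' "*: $star" "+: $plus" "?: $optional" "@: $exactly" "?(*.in): $in_files"
```

The four operators only diverge when the sub-pattern can repeat within one name, as b does in abbc.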

Related Solutions

Bash Wildcards – Difference Between [[ $a == z* ]] and [ $a == z* ]?

The difference between [[ … ]] and [ … ] is mostly covered in Why does parameter expansion with spaces without quotes work inside double brackets "[[" but not inside single brackets "["?. Crucially, [[ … ]] is special syntax, whereas [ is a funny-looking name for a command. [[ … ]] has special syntax rules for what's inside, [ … ] doesn't.

With the added wrinkle of a wildcard, here's how [[ $a == z* ]] is evaluated:

1. Parse the command: this is the [[ … ]] conditional construct around the conditional expression $a == z*.
2. Parse the conditional expression: this is the == binary operator, with the operands $a and z*.
3. Expand the first operand into the value of the variable a.
4. Evaluate the == operator: test if the value of the variable a matches the pattern z*.
5. Evaluate the conditional expression: its result is the result of the conditional operator.
6. The command is now evaluated, its status is 0 if the conditional expression was true and 1 if it was false.

Here's how [ $a == z* ] is evaluated:

1. Parse the command: this is the [ command with the arguments formed by evaluating the words $a, ==, z*, ].
2. Expand $a into the value of the variable a.
3. Perform word splitting and filename generation on the parameters of the command.
   - For example, if the value of a is the 6-character string foo b* (obtained by e.g. a='foo b*') and the list of files in the current directory is (bar, baz, qux, zim, zum), then the result of the expansion is the following list of words: [, foo, bar, baz, ==, zim, zum, ].
4. Run the command [ with the parameters obtained in the previous step.
   - With the example values above, the [ command complains of a syntax error and returns the status 2.
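
The example values above can be reproduced in a scratch directory. This is a minimal sketch, assuming mktemp -d is available and the default glob settings (no nullglob or failglob); the variable names words and status are introduced here for illustration.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)                      # scratch directory, assumed writable
touch "$tmpdir"/{bar,baz,qux,zim,zum}
a='foo b*'

# Show the words the shell produces after word splitting and filename generation.
words=$(cd "$tmpdir" && echo [ $a == z* ])

# Run the real [ command; its error message is discarded, only the status is kept.
status=0
( cd "$tmpdir" && [ $a == z* ] ) 2>/dev/null || status=$?

echo "words:  $words"
echo "status: $status"
rm -rf "$tmpdir"
```

Quoting the left operand ([ "$a" == z* ]) would stop the splitting of $a, but z* would still be expanded to file names before [ ever sees it.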

Note: In [[ $a == z* ]], at step 3, the value of a does not undergo word splitting and filename generation, because it's in a context where a single word is expected (the left-hand argument of the conditional operator ==). In most cases, if a single word makes sense at that position then variable expansion behaves like it does in double quotes. However, there's an exception to that rule: in [[ abc == $a ]], if the value of a contains wildcards, then abc is matched against the wildcard pattern. For example, if the value of a is a* then [[ abc == $a ]] is true (because the wildcard * coming from the unquoted expansion of $a matches bc) whereas [[ abc == "$a" ]] is false (because the ordinary character * coming from the quoted expansion of $a does not match bc). Inside [[ … ]], double quotes do not make a difference, except on the right-hand side of the string matching operators (=, ==, != and =~).
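
A short sketch of the two behaviours described in the note: no word splitting on the left-hand side, and the effect of quotes on the right-hand side. The variable names left, unquoted and quoted are only illustrative.

```bash
#!/usr/bin/env bash
a='zoo keeper'   # the value contains a space

# No word splitting on the left-hand side, so $a is a single operand.
if [[ $a == z* ]]; then left=match; else left=nomatch; fi

pat='a*'
# Unquoted right-hand side: the * coming from $pat acts as a wildcard.
if [[ abc == $pat ]]; then unquoted=match; else unquoted=nomatch; fi
# Quoted right-hand side: the * is an ordinary character.
if [[ abc == "$pat" ]]; then quoted=match; else quoted=nomatch; fi

echo "left=$left unquoted=$unquoted quoted=$quoted"
```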

Bash – Difference Between ‘du -sh *’ and ‘du -sh ./*’

$ touch ./-c $'a\n12\tb' foo
$ du -hs *
0       a
12      b
0       foo
0       total

As you can see, the -c file was taken as an option to du and is not reported (and you see the total line because of du -c). Also, the file called a\n12\tb is making us think that there are files called a and b.

$ du -hs -- *
0       a
12      b
0       -c
0       foo

That's better. At least this time -c is not taken as an option.

$ du -hs ./*
0       ./a
12      b
0       ./-c
0       ./foo

That's even better. The ./ prefix prevents -c from being taken as an option and the absence of ./ before b in the output indicates that there's no b file in there, but there's a file with a newline character (but see below¹ for further digressions on that).
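
The three invocations can be compared side by side in a scratch directory. This is a sketch assuming a du that accepts -c, -h and -s (GNU and BSD du do); reported sizes depend on the filesystem, so only the names in the output are worth comparing.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
touch "$tmpdir/-c" "$tmpdir/foo"

plain=$(cd "$tmpdir" && du -hs *)        # the file -c is parsed as an option
guarded=$(cd "$tmpdir" && du -hs -- *)   # -- ends option parsing
prefixed=$(cd "$tmpdir" && du -hs ./*)   # no argument starts with -

printf 'du -hs *:\n%s\n\n' "$plain"
printf 'du -hs -- *:\n%s\n\n' "$guarded"
printf 'du -hs ./*:\n%s\n' "$prefixed"
rm -rf "$tmpdir"
```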

It's good practice to use the ./ prefix when possible. When that isn't possible, and for arbitrary data in general, you should always use:

cmd -- "$var"

or:

cmd -- $patterns

If cmd doesn't support -- to mark the end of options, you should report it as a bug to its author (except when it's by choice and documented like for echo).

There are cases where ./* solves problems that -- doesn't. For instance:

awk -f file.awk -- *

fails if there is a file called a=b.txt in the current directory (sets the awk variable a to b.txt instead of telling it to process the file).

awk -f file.awk ./*

doesn't have the problem because ./a is not a valid awk variable name, so ./a=b.txt is not taken as a variable assignment.

cat -- * | wc -l

fails if there is a file called - in the current directory, as that tells cat to read from its stdin (- is special to most text processing utilities and to cd/pushd).

cat ./* | wc -l

is OK because ./- is not special to cat.
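
The same kind of check works for the - case. In this sketch stdin is redirected from /dev/null so that cat returns immediately when it treats - as standard input; the file x is an extra file added for illustration.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/-"
printf 'three\n'    > "$tmpdir/x"

# "-" is read as stdin (empty here), so only the line from x is counted.
with_dashes=$(cd "$tmpdir" && cat -- * </dev/null | wc -l)
# "./-" is an ordinary file name, so all three lines are counted.
with_prefix=$(cd "$tmpdir" && cat ./* | wc -l)

# Arithmetic expansion strips the padding some wc implementations add.
echo "cat -- *: $((with_dashes)) line(s)"
echo "cat ./*:  $((with_prefix)) line(s)"
rm -rf "$tmpdir"
```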

Things like:

grep -l -- foo *.txt | wc -l

to count the number of files that contain foo are wrong because they assume file names don't contain newline characters (wc -l counts the newline characters, both those output by grep after each file name and those in the file names themselves). You should use instead:

grep -l foo ./*.txt | grep -c /

(counting the number of / characters is more reliable as there can only be one per filename).
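
A small sketch of the miscount, assuming a filesystem that allows newlines in file names (most Unix filesystems do). Two files contain foo, but one of them has a newline in its name.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
echo foo > "$tmpdir/a.txt"
echo foo > "$tmpdir/"$'b\nc.txt'     # one file whose name contains a newline

# Counts newline characters: two from grep plus one inside the file name.
by_lines=$(cd "$tmpdir" && grep -l -- foo *.txt | wc -l)
# Counts lines containing a slash: exactly one per reported file.
by_slashes=$(cd "$tmpdir" && grep -l foo ./*.txt | grep -c /)

echo "wc -l says $((by_lines)), grep -c / says $((by_slashes))"
rm -rf "$tmpdir"
```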

For recursive grep, the equivalent trick is to use:

grep -rl foo .//. | grep -c //

./* may have some unwanted side effects though.

cat ./*

adds two more characters per file, so it would make you reach the limit on the maximum size of arguments+environment sooner. And sometimes you don't want that ./ to be reported in the output. Like:

grep foo ./*

Would output:

./a.txt: foobar

instead of:

a.txt: foobar

Further digressions

¹. I feel like I have to expand on that here, following the discussion in comments.

$ du -hs ./*
0       ./a
12      b
0       ./-c
0       ./foo

Above, that ./ marking the beginning of each file means we can clearly identify where each filename starts (at ./) and where it ends (at the newline before the next ./ or the end of the output).

What that means is that the output of du ./* (contrary to that of du -- *) can be parsed reliably, albeit not that easily in a script.

When the output goes to a terminal though, there are plenty more ways a filename may fool you:

- Control characters and escape sequences can affect the way things are displayed. For instance, \r moves the cursor to the beginning of the line, \b moves the cursor back, \e[C forward (in most terminals)...
- Many characters are invisible on a terminal, starting with the most obvious one: the space character.
- There are Unicode characters that look just the same as the slash in most fonts:
```
 $ printf '\u002f \u2044 \u2215 \u2571 \u29F8\n'
 / ⁄ ∕ ╱ ⧸
```

(see how it goes in your browser).

An example:

$ touch x 'x ' $'y\bx' $'x\n0\t.\u2215x' $'y\r0\t.\e[Cx'
$ ln x y
$ du -hs ./*
0       ./x
0       ./x
0       ./x
0       .∕x
0       ./x
0       ./x

Lots of x's but y is missing.

Some tools like GNU ls would replace the non-printable characters with a question mark (note that ∕ (U+2215) is printable though) when the output goes to a terminal. GNU du does not.

There are ways to make them reveal themselves:

$ ls
x  x   x?0?.∕x  y  y?0?.?[Cx  y?x
$ LC_ALL=C ls
x  x?0?.???x  x   y  y?x  y?0?.?[Cx

See how ∕ turned to ??? after we told ls that our character set was ASCII.

$ du -hs ./* | LC_ALL=C sed -n l
0\t./x$
0\t./x $
0\t./x$
0\t.\342\210\225x$
0\t./y\r0\t.\033[Cx$
0\t./y\bx$

$ marks the end of the line, so we can tell "x" from "x ". All non-printable and non-ASCII characters are represented by a backslash sequence (a backslash itself would be represented by two backslashes), which makes the output unambiguous. That was GNU sed; it should be the same in all POSIX-compliant sed implementations, but note that some old sed implementations are not nearly as helpful.
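
The effect of sed -n l can be tried without creating any files. This is a minimal sketch that feeds three strings through it, assuming a POSIX-compliant sed; the variable names are illustrative.

```bash
#!/usr/bin/env bash
plain=$(printf 'x\n'     | LC_ALL=C sed -n l)   # nothing hidden
spaced=$(printf 'x \n'   | LC_ALL=C sed -n l)   # trailing space
hidden=$(printf 'y\bx\n' | LC_ALL=C sed -n l)   # backspace between y and x

# On a terminal the raw strings would all look like "x";
# after sed -n l they are distinct.
printf '%s\n' "$plain" "$spaced" "$hidden"
```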

$ du -hs ./* | cat -vte
0^I./x$
0^I./x $
0^I./x$
0^I.M-bM-^HM-^Ux$

(not standard but pretty common, also cat -A with some implementations). That one is helpful and uses a different representation but is ambiguous ("^I" and <TAB> are displayed the same for instance).

$ du -hs ./* | od -vtc
0000000   0  \t   .   /   x  \n   0  \t   .   /   x      \n   0  \t   .
0000020   /   x  \n   0  \t   . 342 210 225   x  \n   0  \t   .   /   y
0000040  \r   0  \t   . 033   [   C   x  \n   0  \t   .   /   y  \b   x
0000060  \n
0000061

That one is standard and unambiguous (and consistent from implementation to implementation) but not as easy to read.

You'll notice that y never showed up above. That's a completely unrelated issue with du -hs * that has nothing to do with file names but should be noted: because du reports disk usage, it doesn't report other links to a file already listed (not all du implementations behave like that though when the hard links are listed on the command line).
