I'd like to ask:
Why is echo {1,2,3} expanded to 1 2 3 which is an expected behavior, while echo [[:digit:]] returns [[:digit:]] while I expected it to print all digits from 0 to 9?
shell wildcards
There are several things to consider here.
i=`cat input`
can be expensive and there's a lot of variation between shells.
That's a feature called command substitution. The idea is to store the whole output of the command minus the trailing newline characters into the i variable in memory.
To do that, shells fork the command in a subshell and read its output through a pipe or socketpair. You see a lot of variation here. On a 50MiB file here, I can see for instance bash being 6 times as slow as ksh93 but slightly faster than zsh and twice as fast as yash.
The main reason for bash being slow is that it reads from the pipe 128 bytes at a time (while other shells read 4KiB or 8KiB at a time) and is penalised by the system call overhead.
zsh needs to do some post-processing to escape NUL bytes (other shells break on NUL bytes), and yash does even more heavy-duty processing by parsing multi-byte characters.
All shells need to strip the trailing newline characters which they may be doing more or less efficiently.
Some may want to handle NUL bytes more gracefully than others and check for their presence.
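The newline stripping is easy to observe. The following is a minimal sketch, assuming a POSIX-style shell; the variable names raw_bytes and stored are made up for the illustration.

```shell
# printf emits "line" followed by three newline characters (7 bytes in total).
raw_bytes=$(( $(printf 'line\n\n\n' | wc -c) ))

# Command substitution drops every trailing newline, so only "line" is stored.
stored=$(printf 'line\n\n\n')

printf 'bytes produced: %d, characters stored: %d\n' "$raw_bytes" "${#stored}"
# prints: bytes produced: 7, characters stored: 4
```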
Then once you have that big variable in memory, any manipulation on it generally involves allocating more memory and copying data across.
Here, you're passing (were intending to pass) the content of the variable to echo.
Luckily, echo is built-in in your shell, otherwise the execution would have likely failed with an arg list too long error. Even then, building the argument list array will possibly involve copying the content of the variable.
The other main problem in your command substitution approach is that you're invoking the split+glob operator (by forgetting to quote the variable).
For that, shells need to treat the string as a string of characters (though some shells don't and are buggy in that regard) so in UTF-8 locales, that means parsing UTF-8 sequences (if not done already like yash does) and looking for $IFS characters in the string. If $IFS contains space, tab or newline (which is the case by default), the algorithm is even more complex and expensive. Then, the words resulting from that splitting need to be allocated and copied.
The glob part will be even more expensive. If any of those words contain glob characters (*, ?, [), then the shell will have to read the content of some directories and do some expensive pattern matching (bash's implementation for instance is notoriously very bad at that).
If the input contains something like /*/*/*/../../../*/*/*/../../../*/*/*, that will be extremely expensive as that means listing thousands of directories and that can expand to several hundred MiB.
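To see split+glob at work on a small scale, here is a sketch that assumes a POSIX-style shell such as bash or dash with the default $IFS (zsh does not apply split+glob to unquoted variables by default). The scratch directory and the file names a.txt and b.txt are invented for the example.

```shell
# Work in a scratch directory so the glob has known files to match.
dir=$(mktemp -d) || exit
touch "$dir/a.txt" "$dir/b.txt"

var='one   *   two'

# Unquoted: the value is split on $IFS and each resulting word is globbed.
unquoted=$(cd "$dir" && printf '<%s>' $var)

# Quoted: the value is passed through as a single, unmodified argument.
quoted=$(cd "$dir" && printf '<%s>' "$var")

printf '%s\n%s\n' "$unquoted" "$quoted"
# prints: <one><a.txt><b.txt><two>
#         <one   *   two>
rm -rf "$dir"
```

Every extra word in the first line is one more allocation and copy, and the * is one more directory listing, which is where the cost described above comes from.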
Then echo will typically do some extra processing. Some implementations expand \x sequences in the arguments they receive, which means parsing the content and probably another allocation and copy of the data.
On the other hand, OK, in most shells cat is not built-in, so that means forking a process and executing it (so loading the code and the libraries), but after the first invocation, that code and the content of the input file will be cached in memory. On the other hand, there will be no intermediary. cat will read large amounts at a time and write it straight away without processing, and it doesn't need to allocate a huge amount of memory, just that one buffer that it reuses.
It also means that it's a lot more reliable as it doesn't choke on NUL bytes and doesn't trim trailing newline characters (and doesn't do split+glob, though you can avoid that by quoting the variable, and doesn't expand escape sequences, though you can avoid that by using printf instead of echo).
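Because echo's treatment of backslashes differs from shell to shell, the sketch below uses printf to show both behaviours in a predictable way: %s prints the argument as is, while %b expands backslash escapes the way some echo implementations do. The variable names are only for illustration.

```shell
data='a\tb'   # a literal backslash followed by t, not a tab

# %s prints the argument byte for byte.
verbatim=$(printf '%s' "$data")

# %b expands backslash escapes, so \t becomes a single tab character.
expanded=$(printf '%b' "$data")

printf 'verbatim: %d characters, expanded: %d characters\n' "${#verbatim}" "${#expanded}"
# prints: verbatim: 4 characters, expanded: 3 characters
```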
If you want to optimise it further, instead of invoking cat several times, just pass input several times to cat.
yes input | head -n 100 | xargs cat
That will run 3 commands instead of 100.
To make the variable version more reliable, you'd need to use zsh (other shells can't cope with NUL bytes) and do:
zmodload zsh/mapfile
var=$mapfile[input]
repeat 10 print -rn -- "$var"
If you know the input doesn't contain NUL bytes, then you can reliably do it POSIXly (though it may not work where printf is not builtin) with:
i=$(cat input && echo .) || exit # add an extra .\n to avoid trimming newlines
i=${i%.} # remove that trailing dot (the \n was removed by cmdsubst)
n=10
while [ "$n" -gt 0 ]; do # loop until the counter reaches 0
  printf %s "$i"
  n=$((n - 1))
done
But that is never going to be more efficient than using cat in the loop (unless the input is very small).
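One way to check that the trailing-dot approach really preserves the file is to write the variable back out and compare. This is only a sketch: the scratch directory, the sample content and the names naive, exact, naive_ok and exact_ok are assumptions made for the test, and the input must not contain NUL bytes.

```shell
tmp=$(mktemp -d) || exit
printf 'first\nsecond\n\n\n' > "$tmp/input"   # note the extra trailing newlines

# Plain substitution: the trailing newlines are lost.
naive=$(cat "$tmp/input")
printf '%s' "$naive" > "$tmp/naive.out"

# Trailing-dot variant: append ".", then strip it again.
exact=$(cat "$tmp/input" && echo .) || exit
exact=${exact%.}
printf '%s' "$exact" > "$tmp/exact.out"

# cmp -s is silent and only reports through its exit status.
if cmp -s "$tmp/input" "$tmp/exact.out"; then exact_ok=yes; else exact_ok=no; fi
if cmp -s "$tmp/input" "$tmp/naive.out"; then naive_ok=yes; else naive_ok=no; fi
printf 'trailing-dot copy identical: %s, plain copy identical: %s\n' "$exact_ok" "$naive_ok"
# prints: trailing-dot copy identical: yes, plain copy identical: no
rm -rf "$tmp"
```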
The open bracket [ is a special character to the shell; it opens up a pattern matching algorithm that says "match any of the characters inside the brackets". Because you have 4 files named as: 1, 4, 5, and 6 in your current directory, when the characters inside the brackets contain any of those digits, your shell replaces the pattern-match with those filenames. When you instead use echo [ 9876543210 ] you are calling echo with 3 parameters: [, 9876543210, and ].
You should quote your echo statement's parameters to prevent the shell from seeing it as a pattern matching request.
$ echo '[9876543210]'
[9876543210]
(or remove the files named 1, 4, 5, and 6 -- but that's a workaround to demonstrate the behavior, not a fix).
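The three cases can be reproduced in a scratch directory. The sketch below assumes a POSIX-style shell and recreates the four file names mentioned above; the variable names are illustrative.

```shell
dir=$(mktemp -d) || exit
touch "$dir/1" "$dir/4" "$dir/5" "$dir/6"

# Unquoted: the bracket expression is replaced by the matching file names.
unquoted=$(cd "$dir" && echo [9876543210])

# Quoted: the shell passes the text through untouched.
quoted=$(cd "$dir" && echo '[9876543210]')

# With spaces, "[" and "]" are separate words and no pattern is formed.
spaced=$(cd "$dir" && echo [ 9876543210 ])

printf '%s\n%s\n%s\n' "$unquoted" "$quoted" "$spaced"
# prints: 1 4 5 6
#         [9876543210]
#         [ 9876543210 ]
rm -rf "$dir"
```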
Best Answer
Because they are two different things. The {1,2,3} is an example of brace expansion. The {1,2,3} construct is expanded by the shell, before echo even sees it. You can see what happens if you use set -x: the trace shows that the command echo {1,2,3} is expanded to echo 1 2 3.

However, [[:digit:]] is a POSIX character class. When you give it to echo, the shell also processes it first, but this time it is being processed as a shell glob. It works the same way as if you run echo *, which will print all files in the current directory. But [[:digit:]] is a shell glob that will match any digit. Now, in bash, if a shell glob doesn't match anything, it will be expanded to itself. If the glob does match something, the matches will be printed instead.
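Both expansions can be observed side by side. The sketch below assumes bash is installed and runs each command in a child bash inside an empty scratch directory, so the glob has nothing to match; bash -x is used here as the one-shot equivalent of set -x.

```shell
dir=$(mktemp -d) || exit

# Brace expansion generates words without looking at the file system.
braces=$(cd "$dir" && bash -c 'echo {1,2,3}')

# A glob is matched against file names; the directory is empty, so bash
# leaves the pattern as it is.
glob=$(cd "$dir" && bash -c 'echo [[:digit:]]')

# bash -x prints each command after expansion (on stderr), prefixed with PS4.
trace=$(cd "$dir" && PS4='+ ' bash -xc 'echo {1,2,3}' 2>&1)

printf '%s\n%s\n%s\n' "$braces" "$glob" "$trace"
rm -rf "$dir"
```

The first two lines of output are 1 2 3 and [[:digit:]]; the trace then shows the already expanded command, echo 1 2 3, followed by its output.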
In both cases, echo just prints whatever the shell tells it to print, but in the second case, since the glob matches something (/etc), it is told to print that something.

So, since you don't have any files or directories whose name consists of exactly one digit (which is what [[:digit:]] would match), the glob is expanded to itself and you get [[:digit:]] back. Now, try creating a file called 5 and running the same command: this time echo prints 5. And if there is more than one matching file, all of them are printed.
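The progression from no match to one match to several can be tried safely in a scratch directory. This is a sketch assuming a shell with POSIX character class support in globs (bash, or a recent dash); the extra file names 3, 7 and 42 are made up for the illustration.

```shell
dir=$(mktemp -d) || exit

# No single-digit file names yet: the pattern is left as is.
none=$(cd "$dir" && echo [[:digit:]])

# Exactly one match.
touch "$dir/5"
one=$(cd "$dir" && echo [[:digit:]])

# Several matches; 42 has two characters, so it cannot match.
touch "$dir/3" "$dir/7" "$dir/42"
several=$(cd "$dir" && echo [[:digit:]])

printf '%s\n%s\n%s\n' "$none" "$one" "$several"
# prints: [[:digit:]]
#         5
#         3 5 7
rm -rf "$dir"
```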
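This is (sort of) documented in man bash in the explanation of the nullglob option, which turns this behavior off. If you set this option, a glob that matches nothing is removed instead of being left as is, so echo [[:digit:]] prints only an empty line.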
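The effect of nullglob can be compared with the default in an empty scratch directory. The sketch assumes bash is installed, since shopt is a bash builtin; the variable names are illustrative.

```shell
dir=$(mktemp -d) || exit

# Default: an unmatched glob is passed to echo unchanged.
default=$(cd "$dir" && bash -c 'echo [[:digit:]]')

# nullglob: an unmatched glob expands to nothing, so echo receives no
# arguments and prints only a newline (captured here as an empty string).
with_nullglob=$(cd "$dir" && bash -c 'shopt -s nullglob; echo [[:digit:]]')

printf 'default: <%s>\nnullglob: <%s>\n' "$default" "$with_nullglob"
# prints: default: <[[:digit:]]>
#         nullglob: <>
rm -rf "$dir"
```

In a script, shopt -u nullglob switches the option back off once it is no longer needed.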