So, per the POSIX specification, we have the following definition for *:
Expands to the positional parameters, starting from one, initially
producing one field for each positional parameter that is set. When
the expansion occurs in a context where field splitting will be
performed, any empty fields may be discarded and each of the non-empty
fields shall be further split as described in Field Splitting. When
the expansion occurs in a context where field splitting will not be
performed, the initial fields shall be joined to form a single field
with the value of each parameter separated by the first character of
the IFS variable if IFS contains at least one character, or separated
by a <space> if IFS is unset, or with no separation if IFS is set to a
null string.
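The joining rule in the last sentence can be sketched in a few lines of shell. This is only an illustration: the three parameter values and the variable names joined_comma, joined_null and joined_unset are made up for the example.

```shell
#!/bin/sh
# Illustration only: set three positional parameters by hand.
set -- alpha beta gamma

# A quoted "$*" is not subject to field splitting, so the parameters
# are joined into a single field.
IFS=','
joined_comma="$*"    # separated by the first character of IFS

IFS=''
joined_null="$*"     # IFS set to a null string: no separator at all

unset IFS
joined_unset="$*"    # IFS unset: separated by a space

printf '%s\n' "$joined_comma" "$joined_null" "$joined_unset"
# prints:
#   alpha,beta,gamma
#   alphabetagamma
#   alpha beta gamma
```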
Most of us are aware of the famous ARG_MAX limitation:
$ getconf ARG_MAX
2621440
which may lead to:
$ cat * | sort -u > /tmp/bla.txt
-bash: /bin/cat: Argument list too long
Thankfully, the good people behind bash (and the other POSIX-like shells) provided us with printf as a built-in, so we can simply do:
printf '%s\0' * | sort -u --files0-from=- > /tmp/bla.txt
And everything is transparent for the user.
Could someone please explain why it is so trivial to bypass the ARG_MAX limitation using a built-in command, and why it is so damn hard to provide a conforming POSIX shell interpreter that would gracefully handle passing the * special parameter to a standalone executable:
$ cat *
Would that break something? I am not asking the bash people to provide cat as a built-in; I am solely interested in the order of operations, and in why * is expanded differently depending on whether the command is a built-in or a standalone executable.
Best Answer
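The limitation is not in the shell but in the exec() family of functions. The POSIX standard says, in relation to this, that the number of bytes available for the new process's combined argument and environment lists is {ARG_MAX}.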
To run utilities that are built into the shell, the shell does not need to call exec(), so it is unaffected by this limitation. Notice, too, that it's not simply the length of the command line that is limited, but the combination of the length of the command, its arguments, and the current environment variables and their values.
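To get a feel for how much of that combined budget the environment already uses, something like the following rough sketch can help. The variable names are made up for the example, and the figure is only an estimate: how pointers, terminators and alignment are counted is implementation-defined.

```shell
#!/bin/sh
# Upper bound on the combined size of arguments and environment
# that exec() will accept.
arg_max=$(getconf ARG_MAX)

# Approximate size of the current environment in bytes.
# tr strips the padding that some wc implementations add.
env_bytes=$(env | wc -c | tr -d ' ')

echo "ARG_MAX:             $arg_max"
echo "environment (about): $env_bytes"
echo "left for arguments:  $((arg_max - env_bytes)) (rough estimate)"
```

Exporting a large variable before running this shrinks the last number, which is why the same command line can work in one session and fail in another.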
Also notice that printf is not a built-in utility in e.g. pdksh (which happens to act as sh and ksh on OpenBSD). Relying on it being a built-in means taking into account the specific shell that is being used.
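One way to check this from a script is command -v, which prints a bare name for a built-in and an absolute path for a utility found through PATH. A minimal sketch, where printf_kind is just a name chosen for the example:

```shell
#!/bin/sh
# Classify printf as seen by the shell running this script.
case $(command -v printf) in
    /*)     printf_kind=external ;;   # resolved to a path: exec() is involved
    printf) printf_kind=builtin ;;    # bare name: handled inside the shell
    *)      printf_kind=unknown ;;    # e.g. an alias, or not found at all
esac

echo "printf is $printf_kind in this shell"
```

A function named printf would also be reported as a bare name, so this is a hint rather than a guarantee.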