Early shells had only a single data type: strings. But it is common to manipulate lists of strings, typically when passing multiple file names as arguments to a program. Another common use case for splitting is when a command outputs a list of results: the command's output is a string, but the desired data is a list of strings. To store a list of file names in a variable, you would put spaces between them. Then a shell script like this
files="foo bar qux"
myprogram $files
called myprogram with three arguments, as the shell split the string $files into words. At the time, spaces in file names were either forbidden or widely considered Not Done.
The Korn shell introduced arrays: you could store a list of strings in a variable. The Korn shell remained compatible with the then-established Bourne shell, so bare variable expansions kept undergoing word splitting, and using arrays required some syntactic overhead. In ksh, you would write the snippet above as:
files=(foo bar qux)
myprogram "${files[@]}"
Zsh had arrays from the start, and its author opted for a saner language design at the expense of backward compatibility. In zsh (under the default expansion rules), $var does not perform word splitting; if you want to store a list of words in a variable, you are meant to use an array; and if you really want word splitting, you can write $=var. The same snippet in zsh:
files=(foo bar qux)
myprogram $files
These days, spaces in file names are something you need to cope with, both because many users expect them to work and because many scripts are executed in security-sensitive contexts where an attacker may be in control of file names. So automatic word splitting is often a nuisance; hence my general advice to always use double quotes, i.e. write "$foo", unless you understand why you need word splitting in a particular use case. (Note that bare variable expansions undergo globbing as well.)
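A minimal sketch of the difference, which should run in any POSIX-style shell with the default IFS. The helper count_args and the sample file names are made up for illustration, and mktemp -d is assumed to be available (it is common but not required by POSIX).

```shell
# count_args is a hypothetical helper that prints how many arguments it received.
count_args() { echo "$#"; }

name='my report.txt'
count_args $name      # unquoted: split at the space, so 2 arguments
count_args "$name"    # quoted: passed through intact, so 1 argument

# Unquoted expansions are also subject to globbing.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"
pattern='*.txt'
(cd "$dir" && count_args $pattern)     # the pattern matches both files: 2 arguments
(cd "$dir" && count_args "$pattern")   # the literal string *.txt: 1 argument
rm -r "$dir"
```

The subshells keep the cd from affecting the rest of the script.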
There is no standard way to retrieve the list of configuration variables that are supported on a system. If you program for a given POSIX version, the list in that version of the POSIX specification is your reference list. On Linux, getconf -a lists all available variables.
fpathconf isn't specific to PATH. It's about variables that are related to files, which are the ones that may vary from file to file.
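As a rough illustration of that distinction from the command line, the getconf utility takes a path operand for the file-related variables and none for the system-wide ones. The paths / and /tmp are arbitrary examples, and the values printed depend on the system and on the file system holding each path.

```shell
# System-wide variable: no path operand.
getconf ARG_MAX

# File-related variables: the answer may differ from one path to another,
# so getconf needs a path operand for them.
getconf NAME_MAX /
getconf NAME_MAX /tmp
getconf PATH_MAX /
```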
Regarding ARG_MAX on Linux, the rationale for depending on the stack size is that the arguments end up on the stack, so there had better be enough room for them plus everything else that must fit. Most other implementations (including older versions of Linux) have a fixed size.
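The dependency can be observed with a sketch like the following. It assumes Linux with a C library whose getconf derives ARG_MAX from the current stack limit (glibc does; others may report a fixed value), and the 1024 KiB limit is an arbitrary choice for the demonstration.

```shell
# Value reported under the current stack size limit.
getconf ARG_MAX

# Lower the soft stack limit inside a subshell, so the change does not
# affect the rest of the session, then ask again.
( ulimit -s 1024 && getconf ARG_MAX )
```

On systems where ARG_MAX is fixed, both commands print the same number.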
Most limits go together with resource availability, with different resources depending on the limit. For example, a process may be unable to open a file even if it has fewer than OPEN_MAX files open, if the system is out of memory that can be used for the file-related data.
Linux is POSIX-compliant on this point by default, so I don't know what you're getting at.
If you use ulimit -s to restrict the stack size to less than ARG_MAX, you're making the system no longer compliant. A POSIX system can typically be made non-compliant in any number of ways, including PATH=/nowhere (making all standard utilities unavailable) or rm -rf /.
The value of ARG_MAX in limits.h provides a minimum that applications can rely on. A POSIX-compliant system is allowed to let execve succeed even if the arguments exceed that size. The guarantee related to ARG_MAX is that if the arguments fit in that size then execve will not fail due to E2BIG.
Tilde expansion, parameter expansion, command substitution and arithmetic expansion are listed in the same step. That means that they are performed at the same time. The result of tilde expansion does not undergo parameter expansion, the result of parameter expansion does not undergo tilde expansion, and so on. For example, if the value of foo is $(bar) qux, then the word $foo expands to $(bar) qux at step 1; the text resulting from parameter expansion is not subject to any further transformation at step 1, but it then gets split by step 2.
“Beginning to end” means left-to-right processing, which matters e.g. when assignments occur:
a=1; echo $a$((a=2))$a
prints 122, because arithmetic expansion of $((a=2)) is performed, setting a to 2, between the parameter expansion of the first $a and the parameter expansion of the second $a.
The reason for the order is historical usage. POSIX usually follows existing implementations; it rarely specifies new behavior. There are multiple shells around; for the most part, POSIX follows the Korn shell but omits most features that are not present in the Bourne shell (as the Bourne shell is largely abandoned, the next version of POSIX is likely to include new ksh features though).
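The single-pass behavior of step 1 described earlier can be checked with a short sketch. It uses echo bar in place of the unspecified bar command from the example, and assumes the default IFS.

```shell
foo='$(echo bar) qux'

# Parameter expansion yields the text "$(echo bar) qux". That text is not
# rescanned for command substitution, so echo never runs; it is only
# field-split (and, in general, globbed) afterwards.
printf '<%s>\n' $foo
# prints three lines: <$(echo>  <bar)>  <qux>

# With double quotes, field splitting is suppressed as well.
printf '<%s>\n' "$foo"
# prints one line: <$(echo bar) qux>
```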
The reason why the Bourne shell performed parameter expansion then field splitting then globbing is that it allowed a glob to be stored in a variable: you can set a to *.txt *.pdf and then use $a to stand for the list of names of files matching *.txt followed by the list of names matching *.pdf (assuming both patterns match). (I'm not saying this is the best design possible, just that it was designed this way.) It's less clear to me why one would want command substitution to be placed at a particular step in the Bourne shell; in the Korn shell, its syntax $(…) is close to parameter expansion ${…} so it makes sense to perform them together.
The placement of tilde expansion is a historical oddity. It would have made more sense to place it later, so that you could write ~$some_user and have it expand to the home directory of the user whose name is the value of the variable some_user. I don't know why it wasn't done this way. This order even requires a special statement that the result of tilde expansion does not undergo other expansions (going by the passage you quoted, if HOME is /foo bar then ~ would expand to the two words /foo and bar due to field splitting, but no shell does that and POSIX.2008 explicitly states that “the pathname resulting from tilde expansion shall be treated as if quoted”).
There is no brace expansion in POSIX, otherwise the specification would state it.
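Both tilde observations can be tried out with a small sketch. The HOME value is the one from the example above; the user name root is an arbitrary choice, picked only because that account exists on most systems.

```shell
# Subshell, so that the modified HOME does not leak into the session.
(
  HOME='/foo bar'
  set -- ~
  echo "$#"    # 1: the result of tilde expansion is not field-split
  echo "$1"    # /foo bar
)

# Tilde expansion sees the literal text "$some_user", which is not a login
# name, so the tilde is left alone; parameter expansion then fills in the name.
some_user=root
echo ~$some_user    # prints ~root, not root's home directory
```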
Word expansion is only performed on WORDs, and with caveats mentioned in the following sections (e.g. field splitting and pathname generation are only performed in contexts that allow multiple words, not e.g. between double quotes). NAMEs, NEWLINEs, IO_NUMBERs and so on don't contain anything that could be expanded anyway.