Shell – Understand the Order Between Expansions

posix shell

From POSIX 7:

The order of word expansion shall be as follows:

  1. Tilde expansion (see Section 2.6.1), parameter expansion (see Section 2.6.2), command substitution (see Section 2.6.3),
    and arithmetic expansion (see Section 2.6.4) shall be performed,
    beginning to end. See item 5 in Section 2.3.

  2. Field splitting (see Section 2.6.5) shall be performed on the portions of the fields generated by step 1, unless IFS is
    null.

  3. Pathname expansion (see Section 2.6.6) shall be performed, unless set -f is in effect.

  4. Quote removal (see Section 2.6.7) shall always be performed last.

  1. Are tilde expansion, parameter expansion, command substitution,
    and arithmetic expansion performed in the order listed?

    Does the order between them matter? If yes, how shall we understand why the order is as specified?

  2. Why does pathname expansion happen after field splitting, while the other expansions happen before it?

    In particular, both tilde expansion and pathname expansion are about pathnames and filenames, so why are they placed differently with respect to field splitting?

  3. Is there no brace expansion in POSIX?

  4. I notice "word expansion". Do expansions apply only to tokens with token identifier WORD, and not to tokens with other token identifiers (e.g. NAME, specific operator, NEWLINE, IO_NUMBER, ASSIGNMENT)?

Best Answer

Tilde expansion, parameter expansion, command substitution and arithmetic expansion are listed in the same step. That means that they are performed at the same time. The result of tilde expansion does not undergo parameter expansion, the result of parameter expansion does not undergo tilde expansion, and so on. For example, if the value of foo is $(bar) qux, then the word $foo expands to $(bar) qux at step 1; the text resulting from parameter expansion is not subject to any further transformation at step 1, but it then gets split by step 2.
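The foo example can be checked with a short sketch. The variable names count, first and second are only illustrative, and the default IFS (space, tab, newline) is assumed. The command bar never runs, because the text produced by parameter expansion is not rescanned for command substitution.

```sh
# Value taken from the example above; 'bar' is a hypothetical command
# that is never executed here.
foo='$(bar) qux'

# Unquoted on purpose: step 1 expands $foo, step 2 splits the result.
set -- $foo

count=$#     # number of fields after splitting
first=$1     # the literal text $(bar), not the output of bar
second=$2

printf '%s\n' "$count" "$first" "$second"
```

With a default IFS this should report two fields, the literal string $(bar) and qux.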

“Beginning to end” means left-to-right processing, which matters e.g. when assignments occur: a=1; echo $a$((a=2))$a prints 122, because arithmetic expansion of $((a=2)) is performed, setting a to 2, between the parameter expansion of the first $a and the parameter expansion of the second $a.
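The same left-to-right behaviour can be seen in a plain assignment, which avoids any interference from field splitting. The variable name result is an illustrative choice, not something from the question.

```sh
a=1

# Expanded left to right: the first $a yields 1, then $((a=2)) assigns 2
# to a and yields 2, then the last $a sees the new value.
result=$a$((a=2))$a

echo "$result"   # 122
echo "$a"        # 2
```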

The reason for the order is historical usage. POSIX usually follows existing implementations; it rarely specifies new behavior. There are multiple shells around. For the most part, POSIX follows the Korn shell but omits most features that are not present in the Bourne shell. (As the Bourne shell is largely abandoned, the next version of POSIX is likely to include new ksh features, though.)

The reason why the Bourne shell performed parameter expansion then field splitting then globbing is that it allowed a glob to be stored in a variable: you can set a to *.txt *.pdf and then use $a to stand for the list of names of files matching *.txt followed by the list of names matching *.pdf (assuming both patterns match). (I'm not saying this is the best design possible, just that it was designed this way.) It's less clear to me why one would want command substitution to be placed at a particular step in the Bourne shell; in the Korn shell, its syntax $(…) is close to parameter expansion ${…} so it makes sense to perform them together.
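Here is a minimal sketch of storing globs in a variable. The file names and the scratch directory are made up for the demonstration, and mktemp -d is assumed to be available (it is common, but not guaranteed on every system).

```sh
# Work in a scratch directory so the globs match only known files.
dir=$(mktemp -d) || exit 1
start=$PWD
cd "$dir" || exit 1
touch one.txt two.txt three.pdf   # hypothetical file names

a='*.txt *.pdf'

# Unquoted on purpose: $a is expanded, split into two patterns,
# and each pattern then undergoes pathname expansion.
set -- $a
matches="$*"

echo "$matches"   # one.txt two.txt three.pdf

cd "$start" && rm -rf "$dir"
```

The names matching *.txt come first, followed by those matching *.pdf, because each field produced by splitting is globbed separately.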

The placement of tilde expansion is a historical oddity. It would have made more sense to place it later, so that you could write ~$some_user and have it expand to the home directory of the user whose name is the value of the variable some_user. I don't know why it wasn't done this way. This order even requires a special statement that the result of tilde expansion does not undergo other expansions. Going by the passage you quoted, if HOME is /foo bar then ~ would expand to the two words /foo and bar due to field splitting, but no shell does that, and POSIX.1-2008 explicitly states that “the pathname resulting from tilde expansion shall be treated as if quoted”.
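The "treated as if quoted" rule can be observed with a HOME value that contains a space. This is only a sketch: the names tilde_fields and tilde_value are illustrative, and HOME is changed inside a command substitution so the surrounding shell is unaffected.

```sh
# Count the fields that an unquoted ~ produces when HOME contains a space.
tilde_fields=$(HOME='/foo bar'; set -- ~; echo "$#")

# Capture the single field itself.
tilde_value=$(HOME='/foo bar'; set -- ~; printf '%s' "$1")

echo "$tilde_fields"   # 1
echo "$tilde_value"    # /foo bar
```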

There is no brace expansion in POSIX; otherwise the specification would say so.

Word expansion is only performed on WORDs, and with caveats mentioned in the following sections (e.g. field splitting and pathname expansion are only performed in contexts that allow multiple words, not e.g. between double quotes). NAMEs, NEWLINEs, IO_NUMBERs and so on don't contain anything that could be expanded anyway.
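The double-quote caveat can be illustrated as follows. The variable names are illustrative, a default IFS is assumed, and set -f is used for the unquoted case so that the result does not depend on the files in the current directory.

```sh
value='a  b *'

# Quoted: no field splitting and no pathname expansion, so one field.
set -- "$value"
quoted_fields=$#
quoted_first=$1

# Unquoted, with pathname expansion disabled: only splitting applies.
set -f
set -- $value
unquoted_fields=$#
set +f

echo "$quoted_fields"     # 1
echo "$unquoted_fields"   # 3
```

Without set -f, the third field * would additionally be replaced by the matching file names, if there are any.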