Bash – the data structure of $@ in shell

bash, shell

We usually use $@ to represent all the arguments except $0. However, I don't know what kind of data structure $@ is.

Why does it behave differently from $* when enclosed in double quotes? Could anyone give me an interpreter-level explanation?

It can be iterated over in a for loop, so it seems to be an array.
However, it can also be echoed in its entirety with a simple echo $@; if it were an array, only the first element would be shown. Due to the limitations of the shell, I cannot write more experimental code to find out.

Difference from this post: that post shows how $@ behaves differently from $*. But I am wondering about the data type of $@. The shell, as an interpreted language like Python, should represent data according to a set of fundamental types. In other words, I want to know how $@ is stored in computer memory.

Is it a string, a multi-line string or an array?

If it is a unique data type, is it possible to define a custom variable as an instance of this type?

Best Answer

That started as a hack in the Bourne shell. In the Bourne shell, IFS word splitting was done (after tokenisation) on all words in list context (command line arguments or the words the for loops loop on). If you had:

IFS=i var=file2.txt
edit file.txt $var

That second line would be tokenised into three words, $var would be expanded, and split+glob would be done on all three words (edit itself being split into ed and t), so you would end up running ed with t, f, le.txt, f, le2.txt as arguments.

Quoting parts of that would prevent the split+glob. The Bourne shell initially remembered which characters were quoted by setting the 8th bit on them internally (that changed later when Unix became 8-bit clean, but the shell still did something similar to remember which bytes were quoted).

Both $* and $@ were the concatenation of the positional parameters with space in-between. But there was a special processing of $@ when inside double-quotes. If $1 contained foo bar and $2 contained baz, "$@" would expand to:

foo bar baz
^^^^^^^ ^^^

(with the ^s above indicating which of the characters have the 8th bit set): the first space was quoted (had the 8th bit set) but not the second one (the one added in-between words).

And it's the IFS splitting that takes care of separating the arguments (assuming the space character is in $IFS as it is by default). That's similar to how $* was expanded in the Bourne shell's predecessor, the Mashey shell (itself based on the Thompson shell, while the Bourne shell was written from scratch).
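The visible effect of that quoted/unquoted distinction can still be checked in a current shell. Here is a minimal sketch with the same $1 and $2 as above, assuming the default $IFS; count_args is just a helper name made up for the illustration:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Same positional parameters as in the text: $1 is 'foo bar', $2 is 'baz'.
set -- 'foo bar' 'baz'

quoted_at=$(count_args "$@")     # each parameter stays one argument
quoted_star=$(count_args "$*")   # everything joined into a single argument
unquoted_at=$(count_args $@)     # split on $IFS: foo, bar, baz

printf '"$@" -> %s, "$*" -> %s, $@ -> %s\n' \
  "$quoted_at" "$quoted_star" "$unquoted_at"
# prints: "$@" -> 2, "$*" -> 1, $@ -> 3
```

Only "$@" hands the two original arguments over unchanged, which is what the special processing was meant to achieve.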

That explains why in the Bourne shell initially "$@" would expand to the empty string instead of nothing at all when the list of positional parameters was empty (you had to work around it with ${1+"$@"}), why it didn't keep the empty positional parameters and why "$@" didn't work when $IFS didn't contain the space character.

The intention was to be able to pass the list of arguments verbatim to another command, but that didn't work properly for the empty list, for empty elements or when $IFS didn't contain space (the first two issues were eventually fixed in later versions).
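For reference, this is roughly how the ${1+"$@"} workaround reads. It is only a sketch run in a current shell, where the plain "$@" already gives the same answer; on the old Bourne shell versions described above the first count would reportedly have been 1. count_args is again a made-up helper:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

set --                                # empty list of positional parameters
empty_plain=$(count_args "$@")        # 0 in current shells
empty_guard=$(count_args ${1+"$@"})   # 0: $1 is unset, so nothing is expanded

set -- 'a b' ''                       # second parameter is an empty string
kept=$(count_args ${1+"$@"})          # 2: $1 is set, so "$@" is used as-is

echo "$empty_plain $empty_guard $kept"
# prints: 0 0 2
```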

The Korn shell (on which the POSIX spec is based) changed that behaviour in a few ways:

  • IFS splitting is only done on the result of unquoted expansions (not on literal words like edit or file.txt in the example above)
  • $* and $@ are joined with the first character of $IFS, or with space when $IFS is empty, except that for a quoted "$@" that joiner is unquoted like in the Bourne shell, and for a quoted "$*" when IFS is empty the positional parameters are appended without separator.
  • it added support for arrays, with ${array[@]} and ${array[*]} reminiscent of Bourne's $@ and $*, but starting at index 0 instead of 1, and sparse (more like associative arrays), which means $@ cannot really be treated as a ksh array (compare with csh/rc/zsh/fish/yash where $argv/$* are normal arrays).
  • The empty elements are preserved.
  • "$@" when $# is 0 now expands to nothing instead of the empty string, "$@" works when $IFS doesn't contain spaces except when IFS is empty. An unquoted $* without wildcards expands to one argument (where the positional parameters are joined with space) when $IFS is empty.
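Several of the points above are easy to observe in a current POSIX-style shell such as bash. The following is a sketch rather than a reproduction of historical ksh behaviour; count_args and old_ifs are names introduced only for the example:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }
old_ifs=$IFS                     # remember the default so it can be restored

IFS=i
var=file2.txt
literal=$(count_args file.txt)   # 1: literal words are no longer split
expanded=$(count_args $var)      # 2: the unquoted expansion splits into f, le2.txt

set -- 'a b' '' c
IFS=:
kept=$(count_args "$@")          # 3: empty element kept, no space needed in $IFS
joined="$*"                      # a b::c  (joined with first character of $IFS)
IFS=
squashed="$*"                    # a bc    (no separator when $IFS is empty)

set --
none=$(count_args "$@")          # 0: "$@" expands to nothing when $# is 0

IFS=$old_ifs
echo "$literal $expanded $kept $joined $squashed $none"
# prints: 1 2 3 a b::c a bc 0
```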

ksh93 fixed the remaining few problems above. In ksh93, $* and $@ expand to the list of positional parameters, separated regardless of the value of $IFS, and then further split+globbed+brace-expanded in list contexts. $* is joined with the first byte (not character) of $IFS. "$@" in list contexts expands to the list of positional parameters, regardless of the value of $IFS. In non-list contexts, like in var=$@, $@ is joined with space regardless of the value of $IFS.
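The non-list (scalar assignment) case can be seen in bash as well, which documents the same space-joining for $@. This sketch uses the quoted forms and assumes bash; other shells may differ here since the standard leaves "$@" in this context unspecified:

```shell
# bash
set -- a b c
old_ifs=$IFS        # hypothetical variable name, used only to restore $IFS
IFS=:

at_joined="$@"      # a b c   ($@ joined with space, whatever $IFS is)
star_joined="$*"    # a:b:c   ($* joined with the first character of $IFS)

IFS=$old_ifs
echo "$at_joined | $star_joined"
# prints: a b c | a:b:c
```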

bash's arrays are designed after the ksh ones. The differences are:

  • no brace-expand upon unquoted expansion
  • first character of $IFS instead of the first byte
  • some corner case differences like the expansion of $* when non-quoted in non-list context when $IFS is empty.
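The first difference can be checked with a short sketch in bash (count_args is a made-up helper). Brace expansion happens before parameter expansion there, so braces that come out of an expansion are left alone:

```shell
# bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

set -- '{a,b}'
from_literal=$(count_args {a,b})   # 2: a literal brace pattern is expanded
from_param=$(count_args $1)        # 1: the result of $1 is not brace-expanded

echo "$from_literal $from_param"
# prints: 2 1
```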

While the POSIX spec used to be pretty vague, it now more or less specifies the bash behaviour.

It's different from normal arrays in ksh or bash in that:

  • Indices start at 1 instead of 0 (except in "${@:0}" which includes $0 (not a positional parameter, and in functions gives you the name of the function or not depending on the shell and how the function was defined)).
  • You can't assign elements individually
  • it's not sparse, you can't unset elements individually
  • shift can be used.
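These differences also suggest a practical answer to the "custom variable" part of the question: there is no separate type to instantiate, but in bash the list can be copied into an ordinary array, edited there, and written back with set. A minimal sketch, assuming bash and the default $IFS (arr and the other variable names are made up for the example):

```shell
# bash
set -- a b c
first=$1                   # a: positional parameters are numbered from 1
arr=("$@")                 # copy the list into an ordinary array
arr_first=${arr[0]}        # a: array indices start at 0

unset 'arr[1]'             # arrays can be sparse...
arr_indices="${!arr[*]}"   # 0 2

arr[1]=B                   # ...and elements can be assigned individually
set -- "${arr[@]}"         # write the edited list back: a B c
shift                      # shift only works on the positional parameters
after_shift="$*"           # B c

echo "$first $arr_first | $arr_indices | $after_shift"
# prints: a a | 0 2 | B c
```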

In zsh or yash, where arrays are normal arrays (not sparse, indices start at one like in all other shells but ksh/bash), $* is treated as a normal array. zsh has $argv as an alias for it (for compatibility with csh). $* is the same as $argv or ${argv[*]} (arguments joined with the first character of $IFS but still separated out in list contexts). "$@", like "${argv[@]}" or "${*[@]}", undergoes the Korn-style special processing.
