Shell – What is List Context and String Context?


I have seen the terms "list context" and "string context" used several times.

I know and understand the use of such descriptions in Perl. They apply to $ and @.

However, when used in descriptions of the shell, the terms seem vague: they are either not defined anywhere or, at best, poorly documented.

According to Google, there is no definition for them in POSIX.

Is this (quoted from elsewhere) the gist of it?

In a nutshell, double quotes are necessary wherever a list of words or a pattern is expected. They are optional in contexts where a raw string is expected by the parser.

But it seems like a difficult term to use. How can we find "what the result should be" when "the result is needed" to know whether it is a string or a list context?

Or could it be precisely and correctly defined?

Best Answer

There is no such concept in the standard shell language. There are no "contexts", only expansion steps.

Quotes are first identified in the tokenization which produces words. They glue words together so that abc"spaces here"xyz is one "word".
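The gluing effect is easy to check. This is a minimal sketch, assuming a POSIX-style sh; the variable names count and first are only for illustration:

```shell
# One token with a quoted region in the middle becomes a single word.
set -- abc"spaces here"xyz
count=$#      # number of words the shell produced
first=$1      # content of the first word, after quote removal

printf '%s word(s): <%s>\n' "$count" "$first"
```

It reports one word whose content is abcspaces herexyz: the quotes joined the pieces and were then removed.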

The important thing to understand is that quotes are preserved through the subsequent expansion steps, and the original quotes are distinguished from quotes that might arise out of expansions.
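To illustrate the distinction, here is a small sketch (assuming a POSIX-style sh with the default IFS; the variable names are made up for the example). Quote characters that come out of an expansion are ordinary data, so they neither prevent splitting nor get removed:

```shell
v='"a b"'                # the value itself contains double-quote characters

set -- $v                # quotes produced by the expansion are plain text
expanded_count=$#        # the value is split at the space
expanded_first=$1        # first field still carries its quote character

set -- "a b"             # original (syntactic) quotes
literal_count=$#         # no splitting inside them

printf 'from expansion: %s field(s), first is <%s>\n' "$expanded_count" "$expanded_first"
printf 'original quotes: %s field(s)\n' "$literal_count"
```

The expanded value yields two fields, the first being "a with the quote character still attached, while the originally quoted text stays one field.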

Parameters are expanded without regard for double quotes. Later, though, a field splitting process takes place which harkens back to the first tokenization. Once again, quotes prevent splitting and, once again, are preserved.

Pathname expansion ("globbing") takes place after this splitting. The preserved quotes prevent it: globbing operators are not recognized inside quotes.

Finally the quotes are removed by a late stage called "quote removal". Of course, only the original quotes!
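The steps described so far can be observed together in a short sketch. It assumes a POSIX-style sh with default IFS and no nullglob-like options; mktemp -d is widely available but not required by POSIX, and the scratch directory and variable names are only illustrative:

```shell
# Work in a scratch directory so the glob results are predictable.
orig_dir=$PWD
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch bar baz

v='b* z'

set -- $v          # unquoted: split into `b*` and `z`, then `b*` is globbed
unquoted_count=$#
unquoted_args="$*"

set -- "$v"        # quoted: no splitting, no globbing; quotes removed last
quoted_count=$#
quoted_arg=$1

printf 'unquoted: %s field(s): %s\n' "$unquoted_count" "$unquoted_args"
printf 'quoted:   %s field(s): %s\n' "$quoted_count" "$quoted_arg"

cd "$orig_dir" && rm -rf "$scratch"
```

Unquoted, the expansion ends up as the three fields bar, baz and z; quoted, it remains the single field b* z.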

POSIX does a good job of presenting the process in a way that is understandable; attempts to demystify it with extraneous concepts (that may be misleading) are only going to muddle the understanding.

People throwing around ad hoc concepts like "list context" should formalize their thinking to the point that it can provide a complete alternative specification for all of the processing, which is equivalent (produces the same results). And then, avoid mixing concepts between the parallel designs: use one explanation or the other. A "list context" or "string context" makes sense in a theory of shell expansion in which these are well defined, and the processing steps are organized around these concepts.

If I were to guess, then "list context" refers to the idea that the shell is working with a list of tokenized words such as the two-word list {foo} {abc" x "def}. The quotes are not part of the second word: its content is actually abc x def; they are semantic quotes which prevent the splitting on whitespace. Inside these quotes, we have "string context".

However, a possible implementation of these expansion steps is not to actually have quotes which are identified as the original quotes, but some sort of list data structure, so that {foo} {abc" x "def} is, say, a list of lists in which the quoted parts are identified as different kinds of nodes (and the quotes are gone). Using Lisp notation it could be:

(("foo") ;; one-element word
 ("abc" (:dq-str " x ") "def")) ;; three-element word

The nodes without a label are literal text, :dq-str is a double-quote region. Another type could be :sq-str for a single quoted item.

The expansion can walk this structure, and then do different things based on whether it's looking at a string object, a :dq-str expression or whatever. File expansion and field splitting would be suppressed within both :dq-str and :sq-str. But parameter expansion does take place within :dq-str. "Quote removal" would then correspond to a final pass which takes the pieces and catenates the strings, flattening the interior list structure and losing the type-indicating symbols, resulting in:

("foo"
 "abc x def") ;; plain string list, usable as command arguments

Now here, note how in the second item we have ("abc" (:dq-str " x ") "def"). The first and last items are unwrapped: they are direct elements of the list and so we can say these are in the "list context". Whereas, the middle " x " is wrapped in a :dq-str expression, so that is "(double quoted) string context".

What "list" refers to in "list context" is anyone's guess without a clearly defined model such as this. Is it the master word list? Or a list of chunks representing one word?

Related Solutions

Shell – Expansion of a shell variable and effect of glob and split on it

Variable expansion (the standard term is parameter expansion, and it's also sometimes called variable substitution) basically means replacing the variable by its value. More precisely, it means replacing the $VARIABLE construct (or ${VARIABLE} or ${VARIABLE#TEXT} or other constructs) by some other text which is built from the value of the variable. This other text is the expansion of the variable.

The expansion process goes as follows. (I only discuss the common case; some shell settings and extensions modify the behavior.)

1. Take the value of the variable, which is a string. If the variable is not defined, use the empty string.
2. If the construct includes a transformation, apply it. For example, if the construct is ${VARIABLE#TEXT}, and the value of the variable begins with TEXT, remove TEXT from the beginning of the value.
3. If the context calls for a single word (for example within double quotes, or in the right-hand side of an assignment, or inside a here document), stop here. Otherwise continue with the next steps.
4. Split the value into separate words at each sequence of whitespace. (The variable IFS can be changed to split at characters other than whitespace.) The result is thus no longer a string, but a list of strings. This list can be empty if the value contained only whitespace.
5. Treat each element of the list as a file name wildcard pattern, i.e. a glob. If the pattern matches some files, it is replaced by the list of matching file names, otherwise it is left alone.
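Two details of the splitting step can be checked directly: a whitespace-only value yields an empty list, and IFS changes the separator. This is a hedged sketch for a POSIX-style sh; the variable names are invented for the example:

```shell
blank='   '
set -- $blank            # whitespace-only value: splitting yields no fields
empty_count=$#

path_like='a:b:c'
IFS=:                    # split at colons instead of whitespace
set -- $path_like
colon_count=$#
second=$2
unset IFS                # an unset IFS gives back the default splitting

set -- "$path_like"      # single-word context: the process stops before splitting
single_count=$#

printf 'blank: %s, colon-split: %s (second is %s), quoted: %s\n' \
    "$empty_count" "$colon_count" "$second" "$single_count"
```

The counts come out as 0, 3 and 1 respectively.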

For example, suppose that the variable foo contains a* b* c* and the current directory contains the files bar, baz and paz. Then ${foo#??} is expanded as follows:

1. The value of the variable is the 8-character string a* b* c*.
2. #?? means strip off the first two characters, resulting in the 6-character string b* c* (with an initial space).
3. If the expansion is in a list context (i.e. not in double quotes or other similar context), continue.
4. Split the string into whitespace-delimited words, resulting in a list of two strings: b* and c*.
5. The string b*, interpreted as a pattern, matches two files: bar and baz. The string c* matches no file so it is left alone. The result is a list of three strings: bar, baz, c*.

For example, echo ${foo#??} prints bar baz c* (the command echo joins its arguments with a space in between).
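The walk-through above can be reproduced as follows. This is a sketch assuming a POSIX-style sh with default IFS and default globbing options; the scratch directory created with mktemp -d (common, though not mandated by POSIX) and the names unquoted and quoted are only for illustration:

```shell
# Recreate the directory contents from the example.
orig_dir=$PWD
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch bar baz paz

foo='a* b* c*'

unquoted=$(echo ${foo#??})     # strip 2 chars, split, then glob each word
quoted=$(echo "${foo#??}")     # strip 2 chars, then stop: one word, no glob

printf 'unquoted: <%s>\n' "$unquoted"
printf 'quoted:   <%s>\n' "$quoted"

cd "$orig_dir" && rm -rf "$scratch"
```

The unquoted form gives bar baz c*, while the double-quoted form keeps the 6-character string with its initial space.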

For more details, see:

Parameter expansion in the POSIX standard, followed by field splitting and pathname expansion
Shell parameter expansion in the bash manual, followed by word splitting and filename expansion
$VAR vs ${VAR} and to quote or not to quote
When is double-quoting necessary?

Bash – Why Does Bash Add Single Quotes to Unquoted Failed Pathname Expansions?

When instructed to echo commands as they are executed ("execution trace"), both bash and ksh add single quotes around any word with meta-characters (*, ?, ;, etc.) in it.

The meta-characters could have gotten into the word in a variety of ways. The word (or part of it) could have been quoted with single or double quotes, the characters could have been escaped with a \, or they remained as the result of a failed filename matching attempt. In all cases, the execution trace will contain single-quoted words, for example:

$ set -x
$ echo foo\;bar
+ echo 'foo;bar'

This is just an artifact of the way the shells implement the execution trace; it doesn't alter the way the arguments are ultimately passed to the command. The quotes are added, printed, and discarded. Here is the relevant part of the bash source code, print_cmd.c:

/* A function to print the words of a simple command when set -x is on. */
void
xtrace_print_word_list (list, xtflags)
...
{
  ...
  for (w = list; w; w = w->next)
    {
      t = w->word->word;
      ...
      else if (sh_contains_shell_metas (t))
        {
          x = sh_single_quote (t);
          fprintf (xtrace_fp, "%s%s", x, w->next ? " " : "");
          free (x);
        }

As to why the authors chose to do this, the code there doesn't say. But here's some similar code in variables.c, and it comes with a comment:

/* Print the value cell of VAR, a shell variable.  Do not print
   the name, nor leading/trailing newline.  If QUOTE is non-zero,
   and the value contains shell metacharacters, quote the value
   in such a way that it can be read back in. */
void
print_var_value (var, quote)
...
{
  ...
  else if (quote && sh_contains_shell_metas (value_cell (var)))
    {
      t = sh_single_quote (value_cell (var));
      printf ("%s", t);
      free (t);
    }

So possibly it's done so that it's easier to copy the command lines from the output of the execution trace and run them again.
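That the quoting is cosmetic can be checked by comparing the trace with what the command actually receives. This is only a sketch: the exact trace format depends on the shell (the session above shows bash; other shells may print the trace line without quotes), and the variable names trace and received are made up for the example:

```shell
# Capture only the trace (stderr) of a traced command run in a subshell.
trace=$( (set -x; echo foo\;bar) 2>&1 >/dev/null )

# Capture what the command itself printed, i.e. the argument it was given.
received=$(echo foo\;bar)

printf 'trace:    %s\n' "$trace"
printf 'received: %s\n' "$received"
```

Whatever the trace line looks like, the argument that reaches echo is foo;bar, with no quote characters in it.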
