Bash IFS – When to Use Temporary IFS for Field Splitting

In bash, say you have var=a.b.c (no trailing dot). Then a temporary IFS does not affect how $var is split:

$ IFS=. printf "%s\n" $var
a.b.c

However, such a usage of IFS does take effect while creating an array:

$ IFS=. arr=($var)
$ printf "%s\n" "${arr[@]}"
a
b
c

This is very convenient, sure, but where is this documented? A quick reading of the sections on Arrays or Word Splitting in the Bash documentation does not give any indication either way. A search for IFS through the single-page documentation doesn't provide any hints about this effect either.

I'm not sure when I can reliably do:

IFS=x do something

And expect that IFS will affect field splitting.

Best Answer

The basic idea is that VAR=VALUE some-command sets VAR to VALUE for the execution of some-command when some-command is an external command, and it doesn't get more fancy than that. If you combine this intuition with some knowledge of how a shell works, you should come up with the right answer in most cases. The POSIX reference is “Simple Commands” in the chapter “Shell Command Language”.

If some-command is an external command, VAR=VALUE some-command is equivalent to env VAR=VALUE some-command. VAR is exported in the environment of some-command, and its value (or lack of a value) in the shell doesn't change.
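
A minimal sketch of the external-command case. The variable name GREETING is made up for illustration, and sh stands in for any external program:

```shell
# GREETING is a hypothetical variable; make sure it starts out unset.
unset GREETING

# The prefix assignment is placed in the environment of the external
# command (here: a child sh process) only.
child=$(GREETING=hello sh -c 'printf "%s" "$GREETING"')
echo "child saw: $child"

# The calling shell's own variable is untouched (still unset).
echo "parent has: ${GREETING-<unset>}"
```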

If some-command is a function, then VAR=VALUE some-command is equivalent to VAR=VALUE; some-command, i.e. the assignment remains in place after the function has returned, and the variable is not exported into the environment. The reason has to do with the design of the Bourne shell (and subsequently with backward compatibility): it had no facility to save and restore variable values around the execution of a function. Not exporting the variable makes sense since a function executes in the shell itself. However, ksh (including both AT&T ksh93 and pdksh/mksh), bash and zsh implement the more useful behavior where VAR is set only during the execution of the function (it's also exported). In ksh, this is done if the function is defined with the ksh syntax function NAME …, not if it's defined with the standard syntax NAME (). In bash, this is done only in bash mode, not in POSIX mode (when run with POSIXLY_CORRECT=1). In zsh, this is done if the posix_builtins option is not set; this option is not set by default but is turned on by emulate sh or emulate ksh.
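
The function case can be checked with a small sketch, assuming bash running in its default (non-POSIX) mode. The names show and color are made up for illustration:

```shell
# Hypothetical function that records what it sees, both directly and
# through a child process (which only sees exported variables).
show() {
  seen_in_function=$color
  seen_by_child=$(sh -c 'printf "%s" "$color"')
}

color=red          # ordinary, non-exported shell variable
color=blue show    # prefix assignment applies for the duration of the call

echo "in function: $seen_in_function"
echo "in child:    $seen_by_child"
echo "afterwards:  $color"
```

In bash's native mode, the function and its child process both see blue, and color is back to red once the function returns.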

If some-command is a builtin, the behavior depends on the type of builtin. Special builtins behave like functions. Special builtins are a fixed set of commands that have to be implemented inside the shell because they affect the shell's state (e.g. break affects control flow, set affects positional parameters and options, export changes which variables are passed to child processes…). Other builtins are built in mostly for performance and convenience (though some, such as cd or the bash feature printf -v, can only be implemented by a builtin), and they behave like an external command.
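
One way to observe the difference is to compare bash's native mode with its POSIX mode, using : as a special builtin and true as a regular one. This is only a sketch: it assumes a bash executable on the PATH, and the variable name v is arbitrary:

```shell
# Each check runs in a fresh bash so the modes do not interfere.
script_special='unset v; v=1 :; printf "%s" "${v-unset}"'
script_regular='unset v; v=1 true; printf "%s" "${v-unset}"'

native_special=$(bash -c "$script_special")          # bash mode, special builtin
posix_special=$(bash --posix -c "$script_special")   # POSIX mode, special builtin
posix_regular=$(bash --posix -c "$script_regular")   # POSIX mode, regular builtin

echo "native mode, special builtin: $native_special"
echo "POSIX mode,  special builtin: $posix_special"
echo "POSIX mode,  regular builtin: $posix_regular"
```

Only the POSIX-mode special builtin leaves v set afterwards; in the other two cases the assignment is gone once the builtin completes.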

The assignment takes place after alias expansion, so if some-command is an alias, expand it first to find what happens.

Note that in all cases, the assignment is performed after the command line is parsed, including any variable substitution on the command line itself. So var=a; var=b echo $var prints a, because $var is evaluated before the assignment takes place. And thus IFS=. printf "%s\n" $var uses the old IFS value to split $var.
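
Both examples from the paragraph above, written out as a short bash sketch (the variable names split_attempt, x and result are only for illustration):

```shell
var=a.b.c

# $var is expanded and split using the current (default) IFS first;
# IFS=. is applied only afterwards, too late to affect the splitting.
split_attempt=$(IFS=. printf '%s\n' $var)
echo "$split_attempt"

x=a
# $x is expanded to "a" before the temporary assignment x=b is made.
result=$(x=b echo $x)
echo "$result"
```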

I've covered all the types of commands, but there's one more case: when there is no command to execute, i.e. when the command consists only of assignments (and possibly redirections). In that case, the assignments remain in place. VAR=VALUE OTHERVAR=OTHERVALUE is equivalent to VAR=VALUE; OTHERVAR=OTHERVALUE. So after IFS=. arr=($var), IFS remains set to a dot. Since you could use $IFS in the assignment to arr with the expectation that it already has its new value, it makes sense that the new value of IFS is used for the expansion of $var.
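
Because the assignment persists here, a script that relies on this form usually has to restore IFS itself. A minimal bash sketch, assuming IFS was set (not unset) beforehand; saved_ifs and ifs_after are illustrative names:

```shell
var=a.b.c

saved_ifs=$IFS        # remember the current value (assumes IFS is set)
IFS=. arr=($var)      # assignments only: both persist, and $var splits on "."
ifs_after=$IFS        # still "." at this point
IFS=$saved_ifs        # restore by hand

printf '%s\n' "${arr[@]}"
echo "IFS right after the assignment was: $ifs_after"
```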

In summary, you can use IFS for temporary field splitting only:

  • by starting a new shell or a subshell (e.g. third=$(IFS=.; set -f; set -- $var; echo "$3") is a complicated way of doing third=${var#*.*.} except that they behave differently when the value of var contains fewer than two . characters);
  • in ksh, with IFS=. some-function where some-function is defined with the ksh syntax function some-function …;
  • in bash and zsh, with IFS=. some-function as long as they are operating in native mode as opposed to compatibility mode.
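
The subshell and function approaches from the list above can be sketched as follows, assuming bash in its native mode. The function name first_field is made up for illustration:

```shell
var=a.b.c
ifs_before=$IFS

# Subshell: the IFS change and set -f disappear when the subshell exits.
third=$(IFS=.; set -f; set -- $var; echo "$3")
echo "third field: $third"

# Function: the IFS=. prefix is in effect only while the function runs.
# The splitting has to happen inside the function, because the arguments
# on the calling line are expanded before the assignment takes effect.
first_field() {
  set -f            # avoid globbing on the unquoted expansion
  set -- $1         # split the argument using the temporary IFS
  set +f
  first=$1
}
IFS=. first_field "$var"
echo "first field: $first"
```

After both calls, IFS in the calling shell still has the value saved in ifs_before.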