Shell – Why does `env var=value` allow arbitrary names in var

environment-variables shell

Some have suggested that env is redundant since the same effect is
achieved by:

name=value … utility [ argument … ]

The example is equivalent to
env when an environment variable is being added to the environment of
the command, but not when the environment is being set to the given
value. The env utility also writes out the current environment if
invoked without arguments. There is sufficient functionality beyond
what the example provides to justify inclusion of env.

AFAICT, the above statement means that var=value command is equivalent to env var="value" command, but not to env -i var="value" command.
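A minimal sketch of that reading, assuming /bin/sh exists and that env supports -i (the variable `marker` and the helper string `child` are names made up for this illustration). The three forms all hand var to the child, but only env -i discards the rest of the environment:

```shell
# Exported marker so we can tell whether the rest of the environment survives.
marker=present
export marker

# Child command: report "var" and "marker" as the child process sees them.
child='printf "%s %s\n" "${var-unset}" "${marker-unset}"'

prefix_form=$(var=value /bin/sh -c "$child")       # shell prefix assignment
env_form=$(env var=value /bin/sh -c "$child")      # env adds to the environment
env_i_form=$(env -i var=value /bin/sh -c "$child") # env -i replaces the environment

printf '%s\n' "prefix: $prefix_form" "env:    $env_form" "env -i: $env_i_form"
```

The first two lines of output should match; the third should report marker as unset.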

Now, at least with the env implementations on GNU systems, FreeBSD and Solaris 11, I realize that they're not equivalent, because env allows any character except = and \0 in the variable name:

$ env 'BASH_FUNC_foo%%=() { echo foo; }' bash -c foo

prints foo, while you can't use BASH_FUNC_foo%%='() { echo foo; }' in any shell, because BASH_FUNC_foo%% is clearly not a valid variable name.

In POSIX shells other than bash, this leaves a variable named BASH_FUNC_foo%% in the environment, which the shell cannot access.

So, what is the purpose of allowing arbitrary names in the form env var=value, and is it allowed by POSIX?

Best Answer

So, what is the purpose of allowing arbitrary names in the form env var=value, and is it allowed by POSIX?

Quoting from POSIX: Environment Variables:

Environment variable names used by the utilities in the Shell and Utilities volume of POSIX.1-2008 consist solely of uppercase letters, digits, and the <underscore> ( '_' ) from the characters defined in Portable Character Set and do not begin with a digit. Other characters may be permitted by an implementation; applications shall tolerate the presence of such names.

Note: Other applications may have difficulty dealing with environment variable names that start with a digit. For this reason, use of such names is not recommended anywhere.

So implementations of env may permit arbitrary environment variable names - and most, if not all, implementations do so, accepting every non-NUL character to the left of an '=' - and implementations of other utilities (such as the shell) may or may not permit arbitrary names.
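To see the difference in practice, here is a small sketch. The name odd%%name is invented for the example, and it assumes an env that accepts such names (as the GNU and BSD ones mentioned in the question do). The second env simply lists the environment it was given, so no shell sits between the two:

```shell
# Hypothetical variable name that is not a valid shell identifier.
odd_name='odd%%name'

# env accepts the name and passes it on to the program it runs
# (here a second env, which prints its environment).
seen_by_child=$(env "$odd_name=hello" env | grep -F "$odd_name=")

# The shell does not parse the same word as an assignment; it treats
# "odd%%name=hello" as a command name and fails to find it.
shell_status=0
/bin/sh -c 'odd%%name=hello true' 2>/dev/null || shell_status=$?

printf '%s\n' "via env:   $seen_by_child" "via shell: exit status $shell_status"
```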

The statement that name=value ... utility is equivalent to env name="value" utility is only true if the implementations of env and of the shell both permit name as an environment variable name.
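If a script needs to decide which form it can use, one possible check is sketched below. The function name is_shell_name is made up for this example, and the bracket ranges assume a locale where A-Z, a-z and 0-9 match only ASCII characters (running under LC_ALL=C is the safe choice):

```shell
# Succeeds only if $1 is a name the shell itself can assign to:
# letters, digits and underscores, not starting with a digit.
is_shell_name() {
    case $1 in
        '' | [0-9]* | *[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in foo_1 1abc 'BASH_FUNC_foo%%'; do
    if is_shell_name "$name"; then
        echo "$name: usable as name=value utility and with env"
    else
        echo "$name: only reachable through env (if env permits it)"
    fi
done
```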

There's an interesting Austin Group thread about this issue here: Invalid shell assignments in environment. One point mentioned is that shells generally only allow environment variables whose names can be represented as shell variables. Several participants in that thread participate in unix.stackexchange.com and can hopefully add some more info about the issue.

Related Solutions

Exception of inheritance of environment variables

In the Unix model, starting another program involves two primitives:

fork() creates an (almost) identical copy of the calling process. The new process is called the child process and the original process is called the parent process. The child process runs the same code as the original, has the same permissions, has the same environment, and receives a copy of the mutable data memory of the parent process. The most visible difference between the two processes is that they have different process IDs and different parent process IDs (the child's PPID is the parent's PID).
execve() replaces the code and data of the current process by code and data loaded from an executable file. This system call takes the new environment of the process as an argument.

Most high-level functions built around fork() and execve() pass the process's current environment to execve(). Thus, unless the process changes its own environment or calls execve() directly, the called program will inherit the calling program's environment.
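A shell script cannot call fork() and execve() directly, but the effects described above are easy to observe from one. The following is only a sketch: GREETING is an invented variable, /bin/sh is assumed to exist, and env -i stands in for "passing a different environment to execve()":

```shell
GREETING=hello
export GREETING

# A child started the usual way inherits the parent's environment.
inherited=$(/bin/sh -c 'printf "%s" "${GREETING-unset}"')

# A change made in the child affects the child's copy only.
/bin/sh -c 'GREETING=changed'
after_child=$GREETING

# env -i builds a fresh environment for the program it executes.
replaced=$(env -i /bin/sh -c 'printf "%s" "${GREETING-unset}"')

printf '%s\n' "inherited: $inherited" "parent after child: $after_child" "with env -i: $replaced"
```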

Shells normally pass their environment to the programs they call. You can change the environment of a shell at any time by assigning a value to an environment variable (foo="some value"; you must call export foo if the variable isn't in the environment already), or remove a variable from the environment by unsetting it (unset foo). If you want to start an external program with different or additional environment variables, there is a shortcut syntax:

foo="some value" mycommand

is roughly equivalent to

(foo="some value"; export foo; exec mycommand)

(where the parentheses restrict the scope of the setting of foo).
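A quick way to check that neither form leaks into the calling shell, using /bin/sh -c as a stand-in for mycommand (an assumption made only so the example is runnable):

```shell
unset foo

# Shortcut syntax: foo exists only in the environment of the command.
seen_prefix=$(foo="some value" /bin/sh -c 'printf "%s" "$foo"')
after_prefix=${foo-unset}

# Roughly equivalent long form, scoped by the subshell parentheses.
seen_subshell=$( (foo="some value"; export foo; exec /bin/sh -c 'printf "%s" "$foo"') )
after_subshell=${foo-unset}

printf '%s\n' "child saw: $seen_prefix / $seen_subshell" \
              "parent afterwards: $after_prefix / $after_subshell"
```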

Shell – Variable assignments affect present running shell

As you've found, that's spec'd behavior. But it also makes sense.

The value is retained in the shell's environment for the same reason the values of other environment variables are retained by other commands when you prefix definitions to their command lines: you're setting the variables in their environment.

The special builtins are generally the most intrinsic variety in any shell - eval is essentially an accessible name for the shell's parser, set tracks and configures shell options and shell parameters, return/break/continue trigger loop control flow, trap handles signals, exec opens/closes files. These are all fundamental utilities - and are typically implemented with barely-there wrappers over the meat and potatoes of your shell.

Executing most commands involves some layered environment - a subshell environment (which needn't necessarily be a separate process) - which you don't get when calling the special builtins. So when you set environment for one of these commands you set environment for your shell. Because they basically represent your shell.
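The contrast can be sketched like this. It assumes /bin/sh is a POSIX-conforming shell such as dash, or bash running in POSIX mode; bash invoked as bash does not keep the assignment by default. The variable v is just an illustrative name:

```shell
# Script run by /bin/sh so the result does not depend on the calling shell.
script='
unset v
v=kept true        # regular command: the assignment is temporary
printf "%s " "${v-unset}"
v=kept :           # special built-in: the assignment stays in the shell
printf "%s\n" "${v-unset}"
'

result=$(/bin/sh -c "$script")
echo "$result"
```

With such a shell the first field reports unset and the second reports kept.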

But they're not the only commands that retain environment that way - functions do the same as well. And errors behave differently for special built-ins - try cat <doesntexist and then try exec <doesntexist or even just : <doesntexist and while the cat command will complain, the exec or : will kill a POSIX shell. The same is true of expansion errors on their command line. They're the main loop, basically.
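The redirection case can be tried without risking the current shell by running it in a child. This is a sketch under the assumptions that /bin/sh is a POSIX-conforming non-interactive shell and that /nonexistent/doesntexist (a path chosen for the example) really does not exist:

```shell
# Regular command: the redirection fails, cat is not run, the script goes on.
regular=$(/bin/sh -c 'cat </nonexistent/doesntexist; echo survived' 2>/dev/null) || true

# Special built-in: the same redirection error makes the shell itself exit.
special=$(/bin/sh -c ': </nonexistent/doesntexist; echo survived' 2>/dev/null) || true

printf '%s\n' "after cat: '$regular'" "after ':': '$special'"
```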

These commands don't have to retain environment - some shells wrap their internals up more tightly than others, expose less of the core functionality and add more buffer between the programmer and the interface. These same shells might also tend to be a bit slower than others. They definitely require a lot of non-standard adjustments to make them conform to spec. And anyway, it's not as if this is a bad thing:

fn(){ bad_command || return=$some_value return; }

That stuff is easy. How else would you preserve the return of bad_command so simply without having to set a bunch of extra environment and yet still do assignments conditionally?

arg=$1 shift; x=$y unset y

That kind of stuff works too. In-place swaps are simpler.

IFS=+  set -- "$IFS" x y z
x="$*" IFS=$1 shift
echo "${x#"$IFS"}" "$*"

+x+y+z x y z

...or...

expand(){
    PS4="$*" set -x "" "$PS4" 
    { $1; }  2>&1
    PS4=$2   set +x
}   2>/dev/null

x='echo kill my computer; $y'
y='haha! just kidding!' expand "${x##*[\`\(]*}"

...is another one I like to use...

echo kill my computer; haha! just kidding!
