Shell – Strange behavior of `/proc/self/environ` in some shells; what is going on

environment-variables io-redirection proc shell

I'm on Debian GNU/Linux 9. I know /proc is special, I know what /proc/self is.

This command

sh -c '/bin/cat /proc/self/comm - </proc/self/comm'

yields

cat
sh

The pattern will be similar if I use dash instead of sh. But with bash, ksh or zsh the result is

cat
cat

Taking /proc/self/stat instead of /proc/self/comm, I can confirm the two cat-s are in fact the same single process. Apparently shells differ under the hood, which is fine. Now let's take

sh -c '/bin/cat /proc/self/environ - </proc/self/environ'

Having observed the above, with sh or dash I expect to see the environment of the cat first, the environment of the shell later. It seems to work (both environments are most likely identical anyway so it's hard to tell if everything works as expected, but my point is: neither environ is empty).

With bash, ksh or zsh I expect to see the environment of the cat twice, but it's only printed once. Splitting into two separate cases:

  • bash -c '/bin/cat - </proc/self/environ' prints nothing, as if environ was empty;
  • bash -c '/bin/cat /proc/self/environ' prints something as expected.
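
Since environ entries are NUL-separated and hard to compare by eye, the two cases can be made easier to check by counting a marker variable in what cat prints. This is only a sketch: DEMO_MARKER and marker_count_for are hypothetical names introduced for the illustration, not part of the original commands.

```shell
# Run a command string under bash with DEMO_MARKER set, then count how many
# environ entries in the output start with "DEMO_MARKER=".
# DEMO_MARKER is a hypothetical variable used only for this illustration.
marker_count_for() {
    DEMO_MARKER=1 bash -c "$1" \
        | tr '\0' '\n' \
        | awk '/^DEMO_MARKER=/ { n++ } END { print n + 0 }'
}

marker_count_for '/bin/cat - </proc/self/environ'   # environ opened by the redirection
marker_count_for '/bin/cat /proc/self/environ'      # environ opened by cat itself
```

On the system described here, the first call would be expected to report no match and the second exactly one.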

What is going on? This is not the case with comm or stat. Why is environ different?

$ uname -a
Linux barbaz 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1 (2018-04-29) x86_64 GNU/Linux

Best Answer

The differences between shells are due to differences in process setup. dash sets redirections up before forking, so /proc/self points at the shell; bash and zsh set them up after forking, so /proc/self points at the new process. You can see this happen with strace -f:

  • strace -f dash -c '/bin/cat /proc/self/comm - </proc/self/comm' shows (among many other things)

    open("/proc/self/comm", O_RDONLY)       = 3
    fcntl(0, F_DUPFD, 10)                   = 10
    close(0)                                = 0
    fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
    dup2(3, 0)                              = 0
    close(3)                                = 0
    clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f12581299d0) = 7743
    strace: Process 7743 attached
    [pid  7742] wait4(-1,  <unfinished ...>
    [pid  7743] execve("/bin/cat", ["/bin/cat", "/proc/self/comm", "-"], [/* 43 vars */]) = 0
    

    (/proc/self/comm is opened before the clone system call, which is where the process forks);

  • strace -f bash -c '/bin/cat /proc/self/comm - </proc/self/comm' shows (among many other things)

    clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb506bdee10) = 8106
    strace: Process 8106 attached
    [... snip a ton of signal-handling setup ...]
    [pid  8106] open("/proc/self/comm", O_RDONLY) = 3
    [pid  8106] dup2(3, 0)                  = 0
    [pid  8106] close(3)                    = 0
    [pid  8106] execve("/bin/cat", ["/bin/cat", "/proc/self/comm", "-"], [/* 43 vars */]) = 0
    

    (/proc/self/comm is opened after the clone call, in the child process, 8106).
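
To compare the two traces without reading the full output, the relevant lines can be filtered out. This is a rough sketch: trace_redirect is a hypothetical helper name, and the pattern is only a guess at which lines matter. It is deliberately loose because newer systems may show openat, clone3, vfork or dup3 instead of the calls in the excerpts above.

```shell
# Trace the given shell running the test command and keep only the lines that
# show where /proc/self/comm is opened relative to the fork and the execve.
# strace writes its trace to stderr, so stderr goes to the pipe and the
# command's own stdout is discarded.
trace_redirect() {
    strace -f "$1" -c '/bin/cat /proc/self/comm - </proc/self/comm' 2>&1 >/dev/null \
        | awk '/\/proc\/self\/comm|clone|fork|dup|execve/'
}

# Only run the comparison when strace is available.
if command -v strace >/dev/null 2>&1; then
    for shell in dash bash; do
        if command -v "$shell" >/dev/null 2>&1; then
            echo "== $shell =="
            trace_redirect "$shell"
        fi
    done
fi
```

What to look for is whether the open of /proc/self/comm appears before or after the line where the child process is created.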

Understanding why environ shows up empty requires a bit more explanation. When /proc/<pid>/environ is opened, the kernel saves a pointer to the task’s mm_struct, which in turn holds the pointers to the environment. But execve, which is used to start the cat process, creates a new mm_struct for the process. The redirection was opened before the execve, so it still refers to the old, now obsolete mm_struct, and when cat reads its input it doesn’t see its real environment. The environment it does see should be a copy of its parent’s, but the shells involved clean it up before forking and setting up the new environment (which is set up by execve).
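
The role of execve can be illustrated by opening environ on a spare file descriptor and varying only whether the process that opened it calls execve afterwards. This is a sketch under the assumptions above, not a definitive test: DEMO_MARKER and the function names are hypothetical, and the result of the second case depends on the kernel behaving as described.

```shell
# Count environ entries on stdin that start with "DEMO_MARKER=".
# DEMO_MARKER is a hypothetical variable used only to make the result easy to check.
count_marker() {
    tr '\0' '\n' | awk '/^DEMO_MARKER=/ { n++ } END { print n + 0 }'
}

# (a) bash opens fd 3 itself and stays alive while cat, a forked child, reads it.
#     The mm_struct saved at open time is still the shell's live one.
fd_from_live_shell() {
    DEMO_MARKER=1 bash -c 'exec 3</proc/self/environ; /bin/cat <&3; :' | count_marker
}

# (b) the process that opened fd 3 replaces itself with cat via execve.
#     The mm_struct saved at open time is the one execve discards.
fd_across_execve() {
    DEMO_MARKER=1 bash -c 'exec 3</proc/self/environ; exec /bin/cat <&3' | count_marker
}

fd_from_live_shell
fd_across_execve
```

Case (a) should find the marker, because cat is reading the environment of the shell that is still running. Case (b) corresponds to the empty output in the question and would be expected to find nothing.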