Why doesn’t unsetenv() modify /proc/pid/environ?

environment-variables glibc

I was just looking at this question and wrote a noddy program to demonstrate unsetenv() modifying /proc/pid/environ. To my surprise it has no effect!

Here's what I did:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main(void)
{
  printf("pid=%d\n", getpid());
  printf("sleeping 10...\n");
  sleep(10);
  printf("unsetenv result: %d\n", unsetenv("WIBBLE"));
  printf("unset; sleeping 10 more...\n");
  sleep(10);

  return 0;
}

However, when I run

WIBBLE=hello ./test_program

then I see WIBBLE in the environment both before and after the unsetenv() runs:

# before the unsetenv()
$ tr '\0' '\n' < /proc/498/environ | grep WIBBLE
WIBBLE=hello
# after the unsetenv()
$ tr '\0' '\n' < /proc/498/environ | grep WIBBLE
WIBBLE=hello

Why doesn't unsetenv() modify /proc/pid/environ?

Best Answer

When a program starts, it receives its environment as an array of pointers to strings of the form var=value. On Linux, these live at the bottom of the stack. At the very bottom, all the strings are packed one after the other (that is what /proc/pid/environ shows). Above them is a NULL-terminated array of pointers to those strings (that is what ends up in char *envp[] in your int main(int argc, char* argv[], char* envp[]), and what the libc generally initialises environ to).

putenv(), setenv() and unsetenv() do not modify those strings; generally they don't even modify the pointers. On some systems, both the strings and the pointers are read-only.

While the libc will generally initialise char **environ to the address of the first pointer above, any modification of the environment (which is meant for future execs) will generally cause a new array of pointers to be created and assigned to environ.

If environ is initially [a,b,c,d,NULL], where a is a pointer to x=1, b to y=2, c to z=3 and d to q=5, then after unsetenv("y"), environ has to become [a,c,d,NULL]. On systems where the initial pointer list is read-only, a new list has to be allocated, filled with [a,c,d,NULL] and assigned to environ. Upon the next unsetenv(), that list can be modified in place. Only if you did unsetenv("x") above could reallocation be avoided (environ could just be incremented to point to &envp[1]; I don't know whether some libc implementations actually perform that optimisation).

In any case, there is no reason for the strings stored at the bottom of the stack to be modified in any way. Even if an unsetenv() implementation did modify the data initially received on the stack in place, it would only modify the pointers; it wouldn't go to all the trouble of also erasing the strings they point to. That seems to be what the GNU libc does on Linux systems (with ELF executables at least): it modifies the list of pointers at envp in place as long as the number of environment variables doesn't increase.

You can observe the behaviour using a program like:

#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
extern char **environ;
int main(int argc, char* argv[], char* envp[]) {
  char cmd[128];
  int i;

  printf("envp: %p environ: %p\n", envp, environ);
  for (i = 0; envp[i]; i++)
    printf("  envp[%d]: %p (%s)\n", i, envp[i], envp[i]);

#define DO(x) x; puts("\nAfter " #x "\n"); \
  printf("envp: %p environ: %p\n", envp, environ); \
  for (i = 0; environ[i]; i++) \
    printf("  environ[%d]: %p (%s)\n", i, environ[i], environ[i])

  DO(unsetenv("a"));
  DO(setenv("b", "xxx", 1));
  DO(setenv("c", "xxx", 1));

  puts("\nAddress of heap and stack:");
  snprintf(cmd, sizeof cmd, "grep -e stack -e heap /proc/%ld/maps", (long)getpid());

  fflush(stdout);
  system(cmd);
}

On Linux with the GNU libc (same with klibc, musl libc or dietlibc except for the fact that they use mmapped anonymous memory instead of the heap for allocated memory), when run as env -i a=1 x=3 ./e, that gives (comments inline):

envp: 0x7ffc2e7b3238 environ: 0x7ffc2e7b3238
  envp[0]: 0x7ffc2e7b4fec (a=1)
  envp[1]: 0x7ffc2e7b4ff0 (x=3)
   # envp[1] is almost at the bottom of the stack. I lied above in that
   # there are more things like the path of the executable
   # environ initially points to the same pointer list as envp

After unsetenv("a")

envp: 0x7ffc2e7b3238 environ: 0x7ffc2e7b3238
  environ[0]: 0x7ffc2e7b4ff0 (x=3)
   # here, unsetenv has reused the envp[] list and has not allocated a new
   # list. It has shifted the pointers though and not done the optimisation
   # I mention above

After setenv("b", "xxx", 1)

envp: 0x7ffc2e7b3238 environ: 0x1bb3420
  environ[0]: 0x7ffc2e7b4ff0 (x=3)
  environ[1]: 0x1bb3440 (b=xxx)
   # a new list has been allocated on the heap. (it could have reused the
   # slot freed by unsetenv() above but didn't; Solaris' version does).
   # the "b=xxx" string is also allocated on the heap.

After setenv("c", "xxx", 1)

envp: 0x7ffc2e7b3238 environ: 0x1bb3490
  environ[0]: 0x7ffc2e7b4ff0 (x=3)
  environ[1]: 0x1bb3440 (b=xxx)
  environ[2]: 0x1bb3420 (c=xxx)

Address of heap and stack:
01bb3000-01bd4000 rw-p 00000000 00:00 0                            [heap]
7ffc2e794000-7ffc2e7b5000 rw-p 00000000 00:00 0                    [stack]

On FreeBSD (11-rc1 here), a new list is already allocated upon unsetenv(). Not only that, but the strings themselves are copied onto the heap as well, so after the first modification of the environment, environ is completely disconnected from the envp[] that the program received on start-up:

envp: 0x7fffffffedd8 environ: 0x7fffffffedd8
  envp[0]: 0x7fffffffef74 (x=2)
  envp[1]: 0x7fffffffef78 (a=1)

After unsetenv("a")

envp: 0x7fffffffedd8 environ: 0x800e24000
  environ[0]: 0x800e15008 (x=2)

After setenv("b", "xxx", 1)

envp: 0x7fffffffedd8 environ: 0x800e24000
  environ[0]: 0x800e15018 (b=xxx)
  environ[1]: 0x800e15008 (x=2)

After setenv("c", "xxx", 1)

envp: 0x7fffffffedd8 environ: 0x800e24000
  environ[0]: 0x800e15020 (c=xxx)
  environ[1]: 0x800e15018 (b=xxx)
  environ[2]: 0x800e15008 (x=2)

On Solaris (11 here), we see the optimisation mentioned above (where unsetenv("a") ends up being done with an environ++) and the slot freed by unsetenv() being reused for b. Of course, a new list of pointers still has to be allocated upon the insertion of a new environment variable (c):

envp: 0xfeffef6c environ: 0xfeffef6c
  envp[0]: 0xfeffefec (a=1)
  envp[1]: 0xfeffeff0 (x=2)

After unsetenv("a")

envp: 0xfeffef6c environ: 0xfeffef70
  environ[0]: 0xfeffeff0 (x=2)

After setenv("b", "xxx", 1)

envp: 0xfeffef6c environ: 0xfeffef6c
  environ[0]: 0x806145c (b=xxx)
  environ[1]: 0xfeffeff0 (x=2)

After setenv("c", "xxx", 1)

envp: 0xfeffef6c environ: 0x8061c48
  environ[0]: 0x8061474 (c=xxx)
  environ[1]: 0x806145c (b=xxx)
  environ[2]: 0xfeffeff0 (x=2)