Linux – How does the /proc/<pid>/exe symlink differ from ordinary symlinks


If I start a process and then delete its binary, I can still recover the binary from /proc/<pid>/exe:

$ cp `which sleep` .
$ ./sleep 10m &
[1] 13728
$ rm sleep
$ readlink /proc/13728/exe                           
/tmp/sleep (deleted)
$ cp /proc/13728/exe ./sleep-copy
$ diff sleep-copy `which sleep` && echo not different
not different
$ stat /proc/13728/exe 
  File: ‘/proc/13728/exe’ -> ‘/tmp/sleep (deleted)’
  Size: 0           Blocks: 0          IO Block: 1024   symbolic link

On the other hand, if I make a symbolic link myself, delete the target, and then attempt to copy through the link, cp fails:

cp: cannot stat ‘sleep’: No such file or directory

/proc is an interface to the kernel. So does this symbolic link actually point to the copy loaded in memory, but with a more useful name? How does the exe link work, exactly?

Best Answer

/proc/<pid>/exe does not follow the normal semantics for symbolic links. Technically this might count as a violation of POSIX, but /proc is a special filesystem after all.

/proc/<pid>/exe appears to be a symlink when you stat it. This is a convenient way for the kernel to export the pathname it knows for the process's executable. But when you actually open that "file", none of the normal procedure of reading and following the contents of a symlink takes place. Instead, the kernel gives you direct access to the open file entry.

Notice that when you ls -l a /proc/<pid>/exe pseudofile for a process whose executable has been deleted, the symlink target has the string " (deleted)" at the end of it. This would normally be nonsensical in a symlink: there definitely isn't a file living at the target path with a name that ends in " (deleted)".

tl;dr: The proc filesystem implementation just does its own magic thing with pathname resolution.

Related Solutions

Does /proc/[pid]/status Always Use kB?

Yes, it's always in kB; KiB (1024 bytes, not 1000), to be exact.

At least in Linux 4.0 (and this code has been largely unchanged since at least April 2005, when Linus switched to git; I don't care to check back further), that output comes from task_mem in fs/proc/task_mmu.c. Excerpting a few lines:

seq_printf(m,
    "VmPeak:\t%8lu kB\n"
    "VmSize:\t%8lu kB\n"
    "VmLck:\t%8lu kB\n"
    "VmPin:\t%8lu kB\n"
    "VmHWM:\t%8lu kB\n"
    "VmRSS:\t%8lu kB\n"
    "VmData:\t%8lu kB\n"
    "VmStk:\t%8lu kB\n"
    "VmExe:\t%8lu kB\n"
    "VmLib:\t%8lu kB\n"
    "VmPTE:\t%8lu kB\n"
    "VmPMD:\t%8lu kB\n"
    "VmSwap:\t%8lu kB\n",
    hiwater_vm << (PAGE_SHIFT-10),
    ⋮
);

Not sure if you can read C, but that "kB" is hardcoded there. There is no logic to output any other unit.

Linux – How does Linux decide the /proc/PID/stat “name” of a process

This is down to Linux:

When a program starts another, it should pass the name of the executable file as the first command-line argument (argv[0], or $0 in shell terms), but it may choose to do otherwise. The Name field of /proc/PID/status, however, is always set by the kernel to the name of the executable, truncated to 15 characters; the same name appears in parentheses as the second field of /proc/PID/stat.

The application itself can change this name later. You can get the longer name from /proc/PID/cmdline (read up to the first null byte).
