Proc Character Encoding – Unexpected Non-Null Encoding of /proc/<pid>/cmdline


I am parsing the /proc/pid/cmdline value for a number of processes on my Linux system (Ubuntu 16.04) and have found that while most of the entries are null-encoded, as expected, at least one uses spaces for delimiters which I find unexpected.

From the documentation for proc(5) I don't see any indication that this should be happening. Are there any cases where I should expect spaces as delimiters instead of null values? If so, where can I find documentation that describes the behavior?


Behavior

This is what I see when I try to cat the cmdline for one of the chromium-browser processes (note the space character is used to delimit the values):

user@host:~$ cat /proc/2721/cmdline
/usr/lib/chromium-browser/chromium-browser --type=gpu-process --field-trial-handle=2073283832741738928,4790986738309707242,131072 --gpu-preferences=GAAAAAAAAAAAAQAAAQAAAAAAAAAAAGAA --gpu-vendor-id=0x15ad --gpu-device-id=0x0405 --gpu-driver-vendor=Mesa --gpu-driver-version=17.2.8 --gpu-driver-date --service-request-channel-token=3778166CAD6E96F44A7268DF1AB1DD53

I would expect to see something like this (null values as delimiter), which is what I do see from other processes on the system:

user@host:~$ cat /proc/354/cmdline
vmware-vmblock-fuse/run/vmblock-fuse-orw,subtype=vmware-vmblock,default_permissions,allow_other,dev,suid

Best Answer

at least one uses spaces for delimiters

Incorrect.

If you look at the end of the pseudo-file on FreeBSD/TrueOS, where you can encounter exactly the same behaviour with Chromium, you will find a ␀. This is ␀-terminated. It is all one single argument.
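For anyone parsing these pseudo-files, the check can be sketched as below. This is a minimal illustration, assuming the conventional layout in which each argument is terminated by a ␀. The names split_cmdline() and read_cmdline() are hypothetical, made up for this example, and are not part of any library.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read the raw bytes of a cmdline pseudo-file, embedded NULs included.
// (Hypothetical helper; the path would be e.g. "/proc/2721/cmdline".)
std::string read_cmdline(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Split raw cmdline bytes into arguments on NUL bytes.
// A trailing fragment with no terminating NUL is kept as a final argument.
std::vector<std::string> split_cmdline(const std::string &raw) {
    std::vector<std::string> args;
    std::string::size_type start = 0;
    while (start < raw.size()) {
        std::string::size_type end = raw.find('\0', start);
        if (end == std::string::npos) end = raw.size();
        args.push_back(raw.substr(start, end - start));
        start = end + 1;  // Skip past the NUL terminator.
    }
    return args;
}
```

Applied to the Chromium process in the question, split_cmdline() would be expected to return a single element holding the whole space-separated string, whereas the other process should yield several elements. Spaces inside an element are just data, not delimiters.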

Chromium is overwriting its arguments after a fork(), to give you something interesting to look at in the output of ps. It is using the setproctitle() library function. This is part of the BSD C libraries. It is not part of the GNU C library. On GNU C platforms, Chromium uses a setproctitle() of its own that overwrites the argv data directly.

setproctitle() is not in fact the right tool for this job, because it does not allow for setting more than one argument string. It sets the formatted "title" as the 0th argument and sets the argument count to 1. Everything is marshalled through the library function as one single argument.

This is not the only problem with setproctitle(). The FreeBSD/OpenBSD/NetBSD C library version also has an arbitrary 2KiB limit, inherited straight from the old BSD sendmail program (from which the library function was originally lifted in the FreeBSD case), which is far too short for the command lines that Chromium often sets. And both Chromium's own version and the FreeBSD/OpenBSD/NetBSD C library version have an extra feature, special handling when the format string is a null pointer, that Chromium does not use (but, ironically, has to deal with in its own setproctitle() implementation nonetheless).

One can do a lot better with less code. On FreeBSD/TrueOS, the underlying system call that the library function makes to do the work, once it has constructed the argument data, is sysctl(), with CTL_KERN, KERN_PROC, KERN_PROC_ARGS, and a process ID as the address. This can accept multiple ␀-terminated strings. I wrote a fairly simple setprocargv() function for my toolsets that employs this.

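// Replace this process's argument vector, as seen by ps and procfs, with the
// given strings, each kept as a separate ␀-terminated argument.
// (On FreeBSD this needs <sys/types.h>, <sys/sysctl.h>, <unistd.h>, and <string>.)
extern
void
setprocargv (
    size_t argc,
    const char * argv[]
) {
#if defined(__FreeBSD__) || defined(__DragonFly__)
    // Marshal the arguments into one buffer of consecutive ␀-terminated strings.
    std::string s;
    for (size_t c(0); c < argc; ++c) {
        if (!argv[c]) break;    // Stop early at a null pointer.
        s += argv[c];
        s += '\0';
    }
    // Hand the buffer to the kernel as the new argument data for this process.
    const int oid[4] = { CTL_KERN, KERN_PROC, KERN_PROC_ARGS, getpid() };
    sysctl(oid, sizeof oid/sizeof *oid, 0, 0, s.data(), s.length());
#elif defined(__OpenBSD__) …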

(OpenBSD/NetBSD do things the old way that FreeBSD/TrueOS used to, with a ps_strings structure in application memory, but it is still sysctl() that is the underlying system call used, to find the location of that structure.)

% /package/admin/nosh/command/exec foreground pause \; true &
[1] 30318
% hexdump -C /proc/30318/cmdline
00000000  66 6f 72 65 67 72 6f 75  6e 64 00 70 61 75 73 65  |foreground.pause|
00000010  00 3b 00 74 72 75 65 00                           |.;.true.|
00000018
% hexdump -C /proc/30319/cmdline
00000000  70 61 75 73 65 00                                 |pause.|
00000006
%

Because setproctitle() is the wrong tool for the job, Chromium is taking the new argv members and constructing a single long ␠-delimited string of them, to be passed as a single argument to setproctitle().

  for (size_t i = 1; i < command_line->argv().size(); ++i) {
    if (!title.empty())
      title += " ";
    title += command_line->argv()[i];
  }
  // Disable prepending argv[0] with '-' if we prepended it ourselves above.
  setproctitle(have_argv0 ? "-%s" : "%s", title.c_str());

As you can see, Chromium itself already has the new argument vector as a series of ␀-terminated strings. It is passing it through an intermediate library layer that needs them all bunched up into one string, where the actual system call level nonetheless operates in terms of an argument vector of ␀-terminated strings.

Hence the behaviour that you are witnessing, where Chromium is presenting its altered argument vectors to the system as one single argument.
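The difference between the two marshalling styles can be sketched in a few lines. This is an illustration only: it is neither Chromium's code nor the setprocargv() shown earlier, and join_as_title() and marshal_as_argv() are hypothetical names. Both functions simply build the byte sequence that would end up being reported as the process's argument data.

```cpp
#include <string>
#include <vector>

// Title style: bunch everything into one space-delimited string with a
// single NUL terminator. Argument boundaries are lost.
std::string join_as_title(const std::vector<std::string> &argv) {
    std::string title;
    for (const std::string &arg : argv) {
        if (!title.empty()) title += ' ';
        title += arg;
    }
    title += '\0';
    return title;
}

// Argument-vector style: NUL-terminate every argument individually.
// Argument boundaries are preserved.
std::string marshal_as_argv(const std::vector<std::string> &argv) {
    std::string block;
    for (const std::string &arg : argv) {
        block += arg;
        block += '\0';
    }
    return block;
}
```

For the same two arguments, both results have the same length, but the first contains one ␀ and the second contains two. A reader that splits on ␀ therefore sees one argument in the first case and two in the second.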

Perhaps you could persuade the writers of Chromium to adopt something like setprocargv(). ☺

Further reading

  • Peter Wemm (1995-12-16). setproctitle. FreeBSD Library Functions Manual. FreeBSD.