How to Unambiguously Parse /proc/<pid>/stat in Linux


In Linux procfs, /proc/<pid>/stat includes as its second field the name of the process in parentheses. As far as I can tell (by experimentation), the name is not escaped. For example, I have been able to create the following:

$ gcc test.c -o 'hello) (world'
...
$ cat /proc/9115/stat
9115 (hello) (world) S 8282 9115 ...

(Similarly, gcc test.c -o 'name) S 42 23' lets a process accidentally or deliberately create what look like extra fields, which will probably mislead naive parsers.)
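To make the failure concrete, here is a minimal Python sketch of a parser that simply splits on whitespace. The sample line is made up for illustration: it combines the process name from the gcc command above with the PID and the two visible fields from the earlier output, and omits the elided fields.

```python
# Hypothetical stat line for a process named "name) S 42 23".
# Only the first few fields are shown; the rest are omitted.
line = "9115 (name) S 42 23) S 8282 9115"

# Naive approach: treat every whitespace-separated token as a field.
fields = line.split()

naive_comm = fields[1]   # "(name)": the name is cut short
naive_state = fields[2]  # "S": taken from inside the process name
naive_ppid = fields[3]   # "42": also from the name, not the real value

print(naive_comm, naive_state, naive_ppid)
```

The parser reports a parent PID of 42, although the value that actually follows the name in this sample is 8282.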

I need to "get at" one of the later fields, so I need a correct way of skipping this field. I've searched for quite a while for a reliable way of parsing this line, but have failed to find a canonical question or example.

However, from what I can tell ) is not valid in any field to the right of this field, so a scan from right to left to find the rightmost ) should correctly delimit this second field. Is this correct? This seems a little flaky to me (what if some new field allows ) at a later date)? Is there a better way to parse this file that I've overlooked?

Best Answer

The format of /proc/<pid>/stat is documented in the proc(5) manpage.

There cannot be another (...) field, nor could one be added in the future, because that would make the format ambiguous. That's quite easy to see.
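The right-to-left scan proposed in the question could be sketched in Python roughly as follows. The function name split_stat and the sample line are assumptions made for illustration. The sketch relies on the name being the only parenthesized field, so the first ( and the last ) delimit it.

```python
def split_stat(text):
    """Split the contents of a stat file into (pid, comm, remaining fields).

    Assumes the process name is the only parenthesized field, so it
    runs from the first "(" to the last ")" in the text.
    """
    open_paren = text.index("(")     # first "(" opens the name field
    close_paren = text.rindex(")")   # last ")" closes it
    pid = int(text[:open_paren])     # int() tolerates the trailing space
    comm = text[open_paren + 1:close_paren]
    rest = text[close_paren + 1:].split()  # fields 3 onwards, as strings
    return pid, comm, rest


# Hypothetical stat line for a process named "name) S 42 23".
sample = "9115 (name) S 42 23) S 8282 9115"
pid, comm, rest = split_stat(sample)
print(pid, repr(comm), rest)
```

With this split, rest[0] is the third field of the file, rest[1] the fourth, and so on, regardless of what the process name contains. To use it on a live process, pass in the whole contents of /proc/<pid>/stat read as a single string.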

The kernel code which formats the /proc/<pid>/stat file is in fs/proc/array.c.

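The OP doesn't say which language they're using. In Perl, something like this could be used (readfile here stands for a helper that returns the whole file as one string; it is not a Perl builtin):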

my @s = readfile("/proc/$pid/stat") =~ /(?<=\().*(?=\))|[^\s()]+/gs;

Note the s flag: it makes . match newlines too, which matters because the "command" field can also contain newlines.
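For readers not using Perl, roughly the same pattern can be tried in Python, where re.DOTALL plays the role of the s flag. This is a sketch, not a tested drop-in: the names STAT_TOKEN and tokenize_stat and the sample text (a name containing a newline) are made up for illustration.

```python
import re

# Same pattern as the Perl one-liner: either everything between the
# first "(" and the last ")", or a run of characters that are neither
# whitespace nor parentheses. DOTALL lets "." cross newlines.
STAT_TOKEN = re.compile(r"(?<=\().*(?=\))|[^\s()]+", re.DOTALL)


def tokenize_stat(text):
    """Return the fields of a stat file as a list of strings."""
    return STAT_TOKEN.findall(text)


# Hypothetical stat contents for a process named "hello)\n(world".
sample = "9115 (hello)\n(world) S 8282 9115\n"
tokens = tokenize_stat(sample)
print(tokens)
```

Without re.DOTALL the greedy .* stops at the newline, and the name is split into two tokens. One caveat worth checking before relying on this pattern: a name that ends in ( seems to produce an extra empty token, so scanning for the last ) directly may be the more robust choice.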
