How to `cat /proc/$pid/cmdline` take several seconds

procpsswap

I encountered this strange behavior yesterday on one of our servers. ps, pgrep and htop (on startup) were very slow. strace ps showed that read('/proc/$pid/cmdline) took several seconds on some processes. Why did this happen?

Some observations:

  • The processes executable was on NFS
  • The processes (about 20+) were doing unlink and symlink operations on files also on NFS, in parallel
  • They're forked from the same parent process
  • There're 80GB of RAM available (mostly cached), but swap (only 4GB) is in full use
  • I run while true; do cat /proc/$pid/status; sleep .1; done, cat returned immediately if State is S or R, but took several seconds when State is D

I did some Google'ing and found some SO answers suggesting that when State is D, reading /proc/$pid/cmdline would stall. Is that true? And how does that work? Why was /proc/$pid/cmdline, which was set before the program started, affected by what it was doing after that?

Best Answer

Same here, reading /proc/$pid/cmdline for a special $pid was very slow, even when State is R. And thanks to the links above which point out it might related to NUMA, I found out it caused by numad moving processes from nodes to nodes, this is from /var/log/numad.log :

Thu Jul 18 20:06:41 2019: Advising pid 9565 ($name) move from nodes (0-1) to nodes (1)
Thu Jul 18 20:06:45 2019: PID 9565 moved to node(s) 1 in 3.91 seconds
Thu Jul 18 20:11:50 2019: Advising pid 9565 ($name) move from nodes (1) to nodes (1)
Thu Jul 18 20:12:00 2019: PID 9565 moved to node(s) 1 in 9.72 seconds
Thu Jul 18 20:17:05 2019: Advising pid 9565 ($name) move from nodes (1) to nodes (1)
Thu Jul 18 20:17:23 2019: PID 9565 moved to node(s) 1 in 17.85 seconds
Thu Jul 18 20:22:28 2019: Advising pid 9565 ($name) move from nodes (1) to nodes (1)
Thu Jul 18 20:22:51 2019: PID 9565 moved to node(s) 1 in 22.73 seconds
Thu Jul 18 20:27:56 2019: Advising pid 9565 ($name) move from nodes (1) to nodes (1)
Thu Jul 18 20:28:23 2019: PID 9565 moved to node(s) 1 in 26.88 seconds
Thu Jul 18 20:33:28 2019: Advising pid 9565 ($name) move from nodes (1) to nodes (1)
Thu Jul 18 20:33:44 2019: PID 9565 moved to node(s) 1 in 15.49 seconds

When moving processes, read cmdline is slow, because cmdline is from user space and kernel need to lock(?) the page and read.

I guess the later move from the same node1 to node1 is needed because the process 9565 was on node1, but it might use remote memory.

% numactl -s
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cpubind: 0 1
nodebind: 0 1
membind: 0 1

Thanks.