Python – Custom lsof output

I need a list of the files, ports and so on opened by a process. I can run lsof -p <PID> and parse its output in a Python script, but some columns are sometimes empty, so my parsing produces wrong results.
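To make the problem concrete, here is a minimal sketch using made-up lines shaped like lsof's default output (the sample values and the naive_name helper are hypothetical, not captured from a real system). Splitting on whitespace and indexing by column position breaks as soon as one column is blank, and it also breaks when a name contains spaces:

```python
# Column layout of lsof's default output.
HEADER = ["COMMAND", "PID", "USER", "FD", "TYPE", "DEVICE", "SIZE/OFF", "NODE", "NAME"]

# Made-up sample lines (hypothetical values).
full_line = "cat     4242 alice  3r  REG  8,1  1024  131  /tmp/notes.txt"
blank_line = "cat     4242 alice  4u  REG  8,1        131  /tmp/data.bin"   # SIZE/OFF is empty
spaced_line = "cat     4242 alice  5r  REG  8,1  10    132  /tmp/my notes.txt"


def naive_name(line):
    """Take the field at the NAME column's index after splitting on whitespace."""
    fields = line.split()
    idx = HEADER.index("NAME")
    return fields[idx] if idx < len(fields) else None


print(naive_name(full_line))    # /tmp/notes.txt
print(naive_name(blank_line))   # None: the empty column left the line one field short
print(naive_name(spaced_line))  # /tmp/my: the name was cut at the space
```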

I know that I could manually look up the FDs in /proc for each process, but this has to work on any POSIX system. So my question is: is there any way to make lsof print just the list of open files and nothing else?
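For comparison, the /proc approach could look like the minimal sketch below. It assumes a Linux-style /proc/<pid>/fd directory, which is exactly why it does not meet the POSIX requirement; proc_open_files is a hypothetical helper name:

```python
import os


def proc_open_files(pid):
    """Map each fd number to its /proc/<pid>/fd symlink target (Linux-specific, not POSIX)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for fd in os.listdir(fd_dir):
        try:
            targets[int(fd)] = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            # The descriptor may have been closed between listdir() and readlink().
            continue
    return targets


if os.path.isdir("/proc/self/fd"):
    for fd, target in sorted(proc_open_files(os.getpid()).items()):
        print(fd, target)
```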

I am thinking of something like ps with a user-defined output format (ps -eopid,user,comm,command), where you can choose which columns appear in the output. In this case, I want only the NAME column of the lsof -p <PID> output.

Best Answer

lsof has a "post-processable" output format with the -F option (see the OUTPUT FOR OTHER PROGRAMS section in the manual).

lsof -nPMp "$pid" -Fn | sed '
  \|^n/|!d
  s/ type=STREAM$//; t end
  s/ type=DGRAM$//; t end
  s/ type=SEQPACKET$//
  : end
  s|^n||'

This lists the open files that resolve to a path on the file system.

  • -nPM disables some of the processing that lsof does by default and that we don't need here: resolving IP addresses to host names (-n), port numbers to port names (-P) and reporting portmapper (RPC) registrations (-M).
  • -p "$pid": select the process whose open files to list.
  • -Fn: use the field output format and ask for the name field.
  • | sed: post-process with sed to keep only the part we're interested in:
  • \|^n/|!d: delete any line that doesn't start with n/, that is, anything that is not a name field holding an absolute path.
  • s/ type=...$//; t end: remove one of those strings from the end of the line (lsof appends them to Unix domain socket names) and, if the substitution succeeded, jump to the end label.
  • : end: the end label.
  • s|^n||: remove the leading n character that lsof inserts to identify the field being output.
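If you would rather do the filtering in the Python script than in sed, the minimal sketch below mirrors the pipeline: it runs lsof with the same options and then applies the three sed steps to the field output. It assumes Python 3.7+ and lsof on the PATH; run_lsof_fields and path_names are hypothetical names, and the caveats that follow apply to it just as much as to the sed version:

```python
import os
import shutil
import subprocess

# Suffixes the sed script strips; lsof appends them to Unix domain socket names.
SOCKET_SUFFIXES = (" type=STREAM", " type=DGRAM", " type=SEQPACKET")


def run_lsof_fields(pid):
    """Run lsof with the same options as the shell pipeline and return its -F output."""
    result = subprocess.run(
        ["lsof", "-nPMp", str(pid), "-Fn"],
        capture_output=True,
        text=True,
        errors="surrogateescape",  # don't crash on bytes invalid in the locale's encoding
        check=False,               # don't raise on a non-zero exit; use whatever was printed
    )
    return result.stdout


def path_names(field_output):
    """Apply the sed script's steps to lsof -Fn output."""
    names = []
    for line in field_output.split("\n"):
        if not line.startswith("n/"):   # \|^n/|!d
            continue
        for suffix in SOCKET_SUFFIXES:  # s/ type=...$//; t end
            if line.endswith(suffix):
                line = line[: -len(suffix)]
                break
        names.append(line[1:])          # s|^n||
    return names


if __name__ == "__main__" and shutil.which("lsof"):
    for name in path_names(run_lsof_fields(os.getpid())):
        print(name)
```

As in the sed script, at most one socket suffix is removed per line, and names that are not absolute paths (such as network endpoints) are dropped.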

However, note that non-printable characters in file names are encoded (like \n for newline, ^[ for ESC, ...) in an ambiguous way: ^[ could mean either a literal ^[ or ESC.
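A toy model shows why this cannot be undone. The function below is a hypothetical simplification that applies only the ESC substitution mentioned above (lsof's real rules cover more characters), but it is enough to show that two different names can produce the same output:

```python
def toy_encode(name):
    """Hypothetical, simplified model of lsof's escaping: ESC becomes caret + '['."""
    return name.replace("\x1b", "^[")


esc_name = "/tmp/x\x1b"   # ends with a real ESC character
caret_name = "/tmp/x^["   # ends with the two characters '^' and '['

# Both names are printed identically, so the original cannot be recovered.
print(toy_encode(esc_name) == toy_encode(caret_name))  # True
```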

Also, for deleted files, on Linux at least, you still get a file path, but with (deleted) appended. Again, there is no way to distinguish a deleted file from a file whose name ends in (deleted). Looking at the link count does not necessarily help, as the deleted file could still be linked elsewhere.

The same applies to the type=* suffixes we remove for Unix domain sockets: those strings may actually have been part of the file name.
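Since these suffixes cannot be resolved from lsof's output alone, one option is to keep the ambiguity visible instead of silently stripping it. A minimal sketch, assuming the Linux marker appears as " (deleted)" with a leading space; classify and NameInfo are hypothetical names:

```python
from typing import NamedTuple

SOCKET_SUFFIXES = (" type=STREAM", " type=DGRAM", " type=SEQPACKET")
DELETED_MARKER = " (deleted)"  # assumed form of the Linux marker


class NameInfo(NamedTuple):
    name: str              # name with a socket suffix removed, if any
    suffix_stripped: bool  # a " type=..." suffix was removed (socket, or part of a real name)
    maybe_deleted: bool    # name ends in " (deleted)" (deleted file, or a real name)


def classify(field_line):
    """Interpret one n/... line of lsof -Fn output, flagging the ambiguous cases."""
    name = field_line[1:]  # drop the leading 'n' field identifier
    stripped = False
    for suffix in SOCKET_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            stripped = True
            break
    # The " (deleted)" marker is left in place because it may be part of the real name.
    return NameInfo(name, stripped, name.endswith(DELETED_MARKER))


for line in ("n/etc/hosts", "n/tmp/sock type=STREAM", "n/tmp/log.txt (deleted)"):
    print(classify(line))
```

The caller can then decide how to treat flagged entries, for example by reporting them separately rather than trusting the name.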

This means that, although it works in most cases, you can't reliably post-process that output in the general case.

Not to mention that lsof itself may fail to correctly parse the information returned by the kernel, or that the kernel may fail to provide that information in a reliably parseable format.
