Understanding lsof output during a long operation on a big file

Tags: io, lsof, ntfs

Background

I'm running MusicBrainz Picard to update a ~11M ogg file on a 500GB NTFS disk (Transcend StoreJet) connected via USB and mounted using autofs. The connection is through a docking station. I can't be sure I always unmounted it properly…

What concerns me is that the operation is taking extremely long; I expected the whole folder to be processed in under a minute, but it has been taking maybe a few hours already. When I run iotop(1), it reports ~25 K/s disk write, with ~99% of it attributed to the picard process. (Picard is not totally hung; the GUI does refresh/respond once every few minutes.)
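
For reference, I'm watching it with something like this (the exact options may differ):

$ sudo iotop -o    # only show processes/threads actually doing I/O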

Hoping to see some progress, I keep checking lsof; the whole output looks like this:

$ lsof /mnt/greeno-ntfs
COMMAND  PID    USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
picard  2885 amahdal  mem    REG   8,17 11121609 44661 /mnt/greeno-ntfs/path/to.ogg
picard  2885 amahdal   14u   REG   8,17 11121609 44661 /mnt/greeno-ntfs/path/to.ogg
picard  2885 amahdal   16u   REG   8,17 11121609 44661 /mnt/greeno-ntfs/path/to.ogg

but I can't really make sense of all these observations; I can only make some assumptions. So I figured I'd ask here.

Questions

  • Is it normal that there are 3 FDs for the file? One 'mem' and two
    "regular" ones?

    I tried to create a trivial script that opens and updates a file
    (sleeping in between so it takes long), and no, there was just one
    regular FD (5u), so apparently a normal open doesn't behave like
    this. (See the sketch after this list.)

    Assumption: It's the result of a (maybe generic) technique for
    dealing with potentially long file I/O that Picard (or its
    library) deliberately employs. If so, can somebody shed some light
    on it? (E.g. why 2+1?)

  • I have noticed that the SIZE/OFF column is actually shrinking
    over time.

    Assumption: This corresponds to Picard actually seeking inside the
    file and updating it, right?
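
For reference, the test script was essentially this (a minimal sketch with a hypothetical path; the point is that only one FD is held open):

#!/bin/sh
# Hold one FD open on a file, update the file in place, then sleep
# long enough to inspect it with lsof from another terminal.
exec 5<>/tmp/test.ogg          # open FD 5 read/write on the file
printf x | dd of=/tmp/test.ogg bs=1 seek=100 conv=notrunc 2>/dev/null
sleep 300 5>&-                 # pause; FD 5 stays open in this shell only
exec 5>&-                      # close the FD again

In another terminal, "lsof /tmp/test.ogg" then shows a single regular "5u" entry for this shell and no 'mem' line.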

Random assumptions – at this point, do any of them make sense as a possible cause?

  • Picard is buggy (extremely inefficient in updating/shrinking),

  • disk is failing (it is 5+ years old…),

  • the filesystem is badly mounted (who knows how to mount ntfs
    optimally),

  • the filesystem is broken from un/docking (can't check, as I don't
    have chkdsk)…

So what next? What can I check next to learn more about what's happening?

Best Answer

Your guesses are easy to test, except for Picard being buggy. If the file is really only 11 MB, just copy it to /dev/shm (RAM) or a native filesystem and test it there.
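
For example, something along these lines (reusing the placeholder path from your question; you can also just open the copy from Picard's GUI instead of passing it on the command line):

$ cp /mnt/greeno-ntfs/path/to.ogg /dev/shm/
$ picard /dev/shm/to.ogg    # retag the copy; if it finishes in seconds, NTFS/USB is the bottleneck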

And I would certainly expect NTFS to be much slower and use far more CPU, but that would not normally turn minutes into hours. Taking NTFS out of the picture should be the first thing to test.

And lsof output can be very weird... sometimes even "lsof -p $pid" and "lsof /path" give different output, even when the second one ends up listing only that one PID. So I wouldn't try to figure out whether that's "normal" or not.
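
That is, these two can disagree, even when the second one ends up listing only that same PID:

$ lsof -p 2885             # everything PID 2885 has open
$ lsof /mnt/greeno-ntfs    # everything open under that mount point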

If you want to know about the seeking and writing on the file, you should use strace, not lsof.
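
For example (the PID comes from your lsof output; adjust the syscall filter as needed):

# -tt: timestamps, -T: time spent in each call, -e: only these syscalls
$ strace -p 2885 -tt -T -e trace=read,write,lseek,ftruncate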