Understanding lsof output during a long operation on a big file

Tags: io, lsof, ntfs

Background

I'm running MusicBrainz Picard to update a ~11M ogg file on a 500GB NTFS disk (Transcend StoreJet) connected via USB and mounted using autofs. The connection is through a docking station. I can't be sure I always unmounted it properly…

What concerns me is that the operation is taking extremely long; I expected the whole folder to be processed in under a minute, but it has been taking maybe a few hours already. When I run iotop(1), it reports ~25 K/s disk write, with ~99% of it attributed to the picard process. (Picard is not totally hung; the GUI does refresh/respond once every few minutes.)
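
For reference, I'm watching it with something like this (the exact options may differ):

$ sudo iotop -o    # only show processes/threads actually doing I/O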

Hoping to see some progress, I keep checking lsof; the whole output looks like this:

$ lsof /mnt/greeno-ntfs
COMMAND  PID    USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
picard  2885 amahdal  mem    REG   8,17 11121609 44661 /mnt/greeno-ntfs/path/to.ogg
picard  2885 amahdal   14u   REG   8,17 11121609 44661 /mnt/greeno-ntfs/path/to.ogg
picard  2885 amahdal   16u   REG   8,17 11121609 44661 /mnt/greeno-ntfs/path/to.ogg

but I can't really make sense of all these observations; I can only make some assumptions. So I figured I'd ask here.

Questions

  • Is it normal that there are 3 FDs for the file? One 'mem' and two
    "regular" ones?

    I tried to create a trivial script that opens and updates a file
    (sleeping in between so it takes long), and no, there was just one
    regular FD (5u), so apparently a normal open doesn't behave like
    this. (See the sketch after this list.)

    Assumption: It's the result of a (maybe generic) technique for
    dealing with potentially long file I/O that Picard (or its
    library) deliberately employs. If so, can somebody shed some light
    on it? (E.g. why 2+1?)

  • I have noticed that the SIZE/OFF column is actually shrinking
    over time.

    Assumption: This corresponds to Picard actually seeking inside the
    file and updating it, right?
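
For reference, the test script was essentially this (a minimal sketch with a hypothetical path; the point is that only one FD is held open):

#!/bin/sh
# Hold one FD open on a file, update the file in place, then sleep
# long enough to inspect it with lsof from another terminal.
exec 5<>/tmp/test.ogg          # open FD 5 read/write on the file
printf x | dd of=/tmp/test.ogg bs=1 seek=100 conv=notrunc 2>/dev/null
sleep 300 5>&-                 # pause; FD 5 stays open in this shell only
exec 5>&-                      # close the FD again

In another terminal, "lsof /tmp/test.ogg" then shows a single regular "5u" entry for this shell and no 'mem' line.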

Random assumptions – at this point, do any of them make sense as a possible cause?

  • Picard is buggy (extremely inefficient in updating/shrinking),

  • disk is failing (it is 5+ years old…),

  • the filesystem is badly mounted (who knows how to mount ntfs
    optimally),

  • the filesystem is broken from un/docking (can't check, as I don't
    have chkdsk)…

So what next? What can I check next to learn more about what's happening?

Best Answer

Your guesses are easy to test, except for Picard being buggy. If the file is really only 11 MB, just copy it to /dev/shm (RAM) or a native filesystem and test it there.
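
For example, something along these lines (reusing the placeholder path from your question; you can also just open the copy from Picard's GUI instead of passing it on the command line):

$ cp /mnt/greeno-ntfs/path/to.ogg /dev/shm/
$ picard /dev/shm/to.ogg    # retag the copy; if it finishes in seconds, NTFS/USB is the bottleneck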

And I would certainly expect NTFS to be much slower and use far more CPU, but that would not normally turn minutes into hours. Taking NTFS out of the picture should be the first thing to test.

And lsof output can be very weird... sometimes even "lsof -p $pid" and "lsof /path" give different output, even when the second one ends up listing only that one PID. So I wouldn't try to figure out whether that's "normal" or not.
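
That is, these two can disagree, even when the second one ends up listing only that same PID:

$ lsof -p 2885             # everything PID 2885 has open
$ lsof /mnt/greeno-ntfs    # everything open under that mount point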

If you want to know about the seeking and writing on the file, you should use strace, not lsof.
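
For example (the PID comes from your lsof output; adjust the syscall filter as needed):

# -tt: timestamps, -T: time spent in each call, -e: only these syscalls
$ strace -p 2885 -tt -T -e trace=read,write,lseek,ftruncate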