Bash – How to copy all files in a folder excluding files which are being written

bash, file-copy, files, find, shell-script

I download multiple files into a folder named downloading using HTTPie. A bash script is meant to process the downloaded files, so I tried to move them to another folder with

find /folder/downloading -type f -exec mv '{}' /folder/downloaded \;

but this also moves files that have not finished downloading. I tried to limit the transfer to older files by adding -mmin +5 to the command. What is an efficient command that skips files still being written and transfers only the fully downloaded ones?
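For reference, the -mmin +5 attempt mentioned above would look roughly like the sketch below. It runs in a scratch directory created with mktemp rather than in /folder, and it assumes GNU touch (for -d '10 minutes ago') and a find that supports -mmin; the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: move only files whose last modification is more than 5 minutes old.
# Assumes GNU touch and a find with -mmin; paths and names are hypothetical.
work=$(mktemp -d)
mkdir "$work/downloading" "$work/downloaded"

touch -d '10 minutes ago' "$work/downloading/old.bin"   # looks finished
touch "$work/downloading/fresh.bin"                      # modified just now

find "$work/downloading" -type f -mmin +5 \
  -exec mv {} "$work/downloaded" \;

moved=$(ls "$work/downloaded")
left=$(ls "$work/downloading")
printf 'moved: %s\nleft: %s\n' "$moved" "$left"
rm -rf "$work"
```

This is only a heuristic: a download that stalls for more than five minutes would be treated as finished.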

Best Answer

Not very efficient, but you could do:

find /folder/downloading -type f -exec sh -c '
  for file do
    # "a" is the access-mode field: w = write, u = read and write
    lsof -F a "$file" | grep -q "^a[wu]" || mv "$file" /folder/downloaded
  done' sh {} +

That is, check that the file is not listed with a write access mode among the open files before moving it.
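To illustrate what the grep is looking at: lsof -F a prints one field per line, each prefixed with a letter (p for the process ID, f for the file descriptor, a for the access mode). The sketch below uses a hypothetical helper, has_writer, that anchors the match to the a field and also accepts u (open for both reading and writing). The sample input is written by hand to show the shape of the fields; it is not captured lsof output.

```shell
#!/bin/sh
# has_writer: hypothetical helper. Reads `lsof -F a` style output on stdin and
# succeeds if any access-mode field is w (write) or u (read and write).
has_writer() {
  grep -q '^a[wu]'
}

# Hand-written sample input, not real lsof output.
if printf 'p1234\nf3\naw\n' | has_writer; then
  writer_seen=yes
else
  writer_seen=no
fi
echo "writer_seen=$writer_seen"
```

In the find loop it would be used as lsof -F a "$file" | has_writer || mv "$file" /folder/downloaded, provided the function is defined inside the sh -c script.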

The psmisc implementation of fuser, as typically found on Linux-based operating systems, has a -w option (to check for files open for writing), but unfortunately it only works together with -k, which kills the corresponding processes. However, it seems you can still use it by sending the pseudo-signal 0, which does nothing:

find /folder/downloading -type f -exec sh -c '
  for file do
    fuser -s -w -k -0 "$file" || mv "$file" /folder/downloaded
  done' sh {} +

Remove the -s (or even replace it with -v) if you want to see which processes are preventing the move.

Note that if you're not running those commands as super-user, you will only get information about your processes. If the processes downloading the files are running as a different user, they will remain undetected.
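A minimal sketch of a guard for that situation, assuming you would rather be warned than silently miss writers owned by other users. The helper name check_privileges is made up for illustration; it takes the numeric user ID as an argument.

```shell
#!/bin/sh
# check_privileges: hypothetical helper. Takes a numeric user ID, prints a
# warning on stderr and returns 1 unless the ID is 0 (root).
check_privileges() {
  if [ "$1" -ne 0 ]; then
    echo "warning: not root; writers owned by other users will not be detected" >&2
    return 1
  fi
  return 0
}

check_privileges "$(id -u)" || echo "continuing with limited visibility"
```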

Also note that unless you're moving the files to a different file system, moving the files will not prevent whatever process is currently writing to the file from finishing writing to it.
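This can be seen with a small experiment. The sketch below plays both roles in one script: file descriptor 3 stands in for the downloader, and mv for the mover. Both directories are created under the same mktemp scratch directory, so they are on the same file system; the paths and file name are made up.

```shell
#!/bin/sh
# Sketch: a rename within one file system does not detach an open writer.
# Scratch layout under mktemp; names are hypothetical.
work=$(mktemp -d)
mkdir "$work/downloading" "$work/downloaded"

exec 3>"$work/downloading/part.bin"   # "downloader" opens the file for writing
printf 'first half, ' >&3

# The mover runs while the writer still holds file descriptor 3.
mv "$work/downloading/part.bin" "$work/downloaded/part.bin"

printf 'second half\n' >&3            # the writer carries on after the move
exec 3>&-

result=$(cat "$work/downloaded/part.bin")
printf '%s\n' "$result"
rm -rf "$work"
```

Both halves end up in the moved file, because the descriptor refers to the file itself and not to its name.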

However, depending on what they have been designed to do afterwards, they might be confused if, after they finish writing, the file is no longer there (for instance if they want to change some attributes of the file after downloading it and do so by path rather than via the file descriptor, like chmod() rather than fchmod(), or utimes() rather than futimens()).
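A sketch of that failure mode, again with file descriptor 3 standing in for the downloader and a mktemp scratch directory in place of /folder (names are made up). The shell's chmod works on a path, so it plays the part of a program calling chmod() after the download.

```shell
#!/bin/sh
# Sketch: a path-based operation fails once the file has been moved away.
# Scratch layout under mktemp; names are hypothetical.
work=$(mktemp -d)
mkdir "$work/downloading" "$work/downloaded"
path="$work/downloading/part.bin"

exec 3>"$path"                    # "downloader" opens the file
printf 'data\n' >&3
mv "$path" "$work/downloaded/"    # mover runs before the downloader is done
exec 3>&-                         # "downloader" finishes writing

# Path-based follow-up, as a program calling chmod() on the old name would do.
if chmod 644 "$path" 2>/dev/null; then
  chmod_status=ok
else
  chmod_status=failed
fi

if [ -f "$work/downloaded/part.bin" ]; then moved_exists=yes; else moved_exists=no; fi
echo "path-based chmod: $chmod_status, moved file present: $moved_exists"
rm -rf "$work"
```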
