Linux – Using Inotifywait for Large Number of Files in a Directory


I want to monitor a single directory (not recursively) for newly created files and append the contents of those files to one big file as they are being written.

The number of files being written is huge; it could reach as many as 50,000.

I am monitoring the directory with inotifywait like this:

inotifywait -m -e create ~/folder | awk '($2=="CREATE"){print $3}' > ~/output.file

This stores the names of the newly created files in ~/output.file, which I then process with a for loop:

for FILE in `cat ~/output.file` 
do
    cat $FILE >> ~/test.out
done

It works fine if files are created in ~/folder at a rate of about 1 file per second.

But the real workload is much heavier: files are created at a very high rate, around 500 files per minute or even more.

After the process completed, I counted the files in ~/folder, and the count does not match the inotifywait output. The difference varies but is around 10–15 files.

Also, the loop

for FILE in `cat ~/output.file`
do
    cat $FILE >> ~/test.out
done

doesn't process all the files in ~/output.file as they are being written.

Can anyone suggest an elegant solution to this problem?

Best Answer

Is there a particular reason you are using:

 | awk '($2=="CREATE"){print $3}' > ~/output.file

instead of inotifywait options like --format and --outfile?

If I run:

inotifywait -m --format '%f' -e create /home/don/folder/ --outfile /home/don/output.file

then open another tab, cd to ~/folder and run:

time seq -w 00001 50000 | parallel touch {}

real    1m44.841s
user    3m22.042s
sys     1m34.001s

(so I get far more than 500 files per minute, roughly 28,600), everything works fine and output.file contains all 50,000 file names that I just created.
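
To see which names are missing, rather than only comparing counts, the log can be compared against the directory listing. The following is a minimal sketch and not part of the original test: the function name missing_from_log is hypothetical, and it assumes that file names contain no newlines and that mktemp is available.

```shell
#!/bin/sh
# Print the names present in a directory but absent from the name log.
# Hypothetical helper; arguments: <watched dir> <log written by inotifywait>.
missing_from_log() {
    dir=$1
    log=$2
    sorted=$(mktemp) || return 1
    sort -- "$log" > "$sorted"
    # comm -23 keeps lines found only in the first input (the directory listing).
    ls -A -- "$dir" | sort | comm -23 - "$sorted"
    rm -f -- "$sorted"
}

# Example: missing_from_log ~/folder ~/output.file
```

Empty output means every entry in the directory was logged. Anything printed is a name that inotifywait did not record.
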
Once the process has finished writing the files to disk, you can append them to your test.out (assuming you are always in ~/folder):

xargs < /home/don/output.file cat >> ~/test.out
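
By default xargs splits its input on whitespace and interprets quotes and backslashes, so the command above is only safe for simple names like the ones generated by seq. A sketch of a variant that treats each line as exactly one file name is shown below. The function name append_listed is hypothetical, and the -0 option is assumed to be available (it is in GNU and BSD xargs, but it is not required by POSIX).

```shell
#!/bin/sh
# Append every file named in a list (one name per line) to one output file.
# Hypothetical helper; arguments: <list of names> <directory> <output file>.
append_listed() {
    list=$1
    dir=$2
    out=$3
    # Convert newlines to NUL bytes so xargs -0 passes each line through as
    # exactly one argument, with no whitespace or quote processing.
    tr '\n' '\0' < "$list" | (cd -- "$dir" && xargs -0 cat --) >> "$out"
}

# Example: append_listed ~/output.file ~/folder ~/test.out
```

Because xargs batches many names into each cat invocation, this should stay fast even for tens of thousands of files.
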

Or use read if you want to process files as they are created. So, while in ~/folder you could run:

inotifywait -m --format '%f' -e create ~/folder | while IFS= read -r file; do cat -- "$file" >> ~/test.out; done
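
One caveat with the create event: it fires as soon as the file exists, which may be before the writer has finished, so cat can pick up a partial file. A possible alternative, sketched here as an assumption about the workload rather than something tested in this answer, is to react to the close_write event instead. The helper name append_stream is hypothetical.

```shell
#!/bin/sh
# Read file names from standard input and append each file to an output file.
# Hypothetical helper; arguments: <directory> <output file>.
append_stream() {
    dir=$1
    out=$2
    while IFS= read -r name; do
        # Skip names whose file has already been removed or renamed.
        [ -f "$dir/$name" ] || continue
        cat -- "$dir/$name" >> "$out"
    done
}

# Intended use (not run here): react when a writer closes the file,
# rather than when the file is first created.
# inotifywait -m --format '%f' -e close_write ~/folder | append_stream ~/folder ~/test.out
```

The output file has to live outside the watched directory, otherwise each append would itself trigger a close_write event. A file that is opened and closed for writing several times would also be appended more than once.
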

Note that in the stable release of inotifywait, -m and -t cannot be used together. Support for combining both switches was added recently, so if you build inotify-tools from git you should be able to use monitor mode with a timeout (which specifies how long to wait for a matching event before exiting). I tested the git version on my system (exit if no create events occur within 2 seconds) and it works fine:

inotifywait -m -t 2 --format '%f' -e create ~/folder | while IFS= read -r file; do cat -- "$file" >> ~/test.out; done