Efficiency of lots of inotify watches or stat calls

efficiency, inotify, stat

I am developing software that will use inotify to track changes on a large number of files (tens to hundreds of thousands). I have come up with these ideas:

one watch per file
one watch per parent directory
avoid inotify and periodically scan the fs for changes (not preferred)

I will have a database of all of the files I am watching, along with some basic stat information (like mtime and size). However, with one watch per parent directory, I would have to stat each file in the directory that triggered the event until I found the one that changed.
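For reference, the mtime/size comparison described above might look roughly like this in C. It is only a sketch: struct file_record, record_file and file_changed are made-up names standing in for whatever the database actually stores, and mtime is compared at whole-second resolution.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical database row: metadata recorded the last time the file was seen. */
struct file_record {
    time_t mtime;  /* last modification time, whole seconds */
    off_t  size;   /* size in bytes */
};

/* Fill rec from the file's current metadata. Returns 0 on success, -1 on error. */
static int record_file(const char *path, struct file_record *rec)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    rec->mtime = st.st_mtime;
    rec->size = st.st_size;
    return 0;
}

/* True if the file cannot be stat'ed or its mtime or size differs from rec. */
static bool file_changed(const char *path, const struct file_record *rec)
{
    struct file_record now;
    if (record_file(path, &now) != 0)
        return true;  /* deleted or unreadable counts as a change */
    return now.mtime != rec->mtime || now.size != rec->size;
}
```

Scanning a directory this way costs one stat call per file, which is exactly the cost I am trying to avoid. Note that a same-size rewrite within the same second would slip past this check.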

Which would be faster, tons (100,000+) of inotify watches or tons of stat calls?

I'm thinking that reducing the number of stat calls would be better, but I don't know enough about inotify.

Note:

This will be running on a workstation, not a server. Its main purpose is to synchronize changes (potentially to an entire filesystem) between a client and a remote server.

Best Answer

When you read() an inotify fd, the name field of each returned struct gives the name of the affected file, relative to the directory being watched, so you shouldn't have to stat every file in a directory after the event.

See http://linux.die.net/man/7/inotify

Specifically:

struct inotify_event {
    int      wd;       /* Watch descriptor */
    uint32_t mask;     /* Mask of events */
    uint32_t cookie;   /* Unique cookie associating related
                          events (for rename(2)) */
    uint32_t len;      /* Size of 'name' field */
    char     name[];   /* Optional null-terminated name */
};
The name field is only present when an event is returned for a file inside a watched directory; it identifies the file pathname relative to the watched directory. This pathname is null-terminated, and may include further null bytes to align subsequent reads to a suitable address boundary.
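
To make the variable-length layout concrete, here is a rough sketch of walking a buffer that read() has filled from an inotify fd. The names event_cb, for_each_event and remember_name are invented for this example; only struct inotify_event and its fields come from the man page.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/inotify.h>

/* Hypothetical callback type: invoked once per event found in the buffer. */
typedef void (*event_cb)(const struct inotify_event *ev, void *ctx);

/*
 * Walk a buffer filled by read() on an inotify fd.
 * Each record is a fixed header followed by ev->len bytes of name
 * (null-terminated, possibly padded), so the stride varies per event.
 * Returns the number of complete events visited.
 */
static int for_each_event(const char *buf, size_t len, event_cb cb, void *ctx)
{
    int count = 0;
    size_t off = 0;

    while (off + sizeof(struct inotify_event) <= len) {
        const struct inotify_event *ev =
            (const struct inotify_event *)(buf + off);

        if (off + sizeof(*ev) + ev->len > len)
            break;  /* truncated record: stop rather than read past the end */

        cb(ev, ctx);
        off += sizeof(*ev) + ev->len;
        count++;
    }
    return count;
}

/* Example callback: copy the affected file's name into ctx (a char[256]). */
static void remember_name(const struct inotify_event *ev, void *ctx)
{
    if (ev->len > 0)  /* len is 0 when the event is for the watched object itself */
        snprintf((char *)ctx, 256, "%s", ev->name);
}
```

In a real program buf would come from something like read(fd, buf, sizeof buf), with buf aligned for struct inotify_event as the man page example does. With one watch per directory, ev->wd identifies the directory and ev->name the file, so a single stat on that one file is enough to refresh the database entry.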

Related Solutions

How does inotify work

Inotify is an internal kernel facility. There is no “inotify file”. There are dedicated system calls inotify_init, inotify_add_watch and inotify_rm_watch that allow processes to register themselves to be notified when certain filesystem events happen. When the event happens, the process receives a description of the event through the file descriptor returned by inotify_init.
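
As a minimal sketch of how those three calls fit together (watch_lifecycle is a made-up name, and the event mask is just one plausible choice):

```c
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Open an inotify instance, register a watch on `path`, then tear it down.
 * Returns 0 on success, -1 if any step fails.
 */
static int watch_lifecycle(const char *path)
{
    /* One instance (one fd) can hold many watches. */
    int fd = inotify_init();
    if (fd < 0) {
        perror("inotify_init");
        return -1;
    }

    /* Illustrative mask: changes to, creation in, and deletion from `path`. */
    int wd = inotify_add_watch(fd, path, IN_MODIFY | IN_CREATE | IN_DELETE);
    if (wd < 0) {
        perror("inotify_add_watch");
        close(fd);
        return -1;
    }

    /* A real program would now read() event descriptions from fd. */

    int rc = inotify_rm_watch(fd, wd);
    if (rc < 0)
        perror("inotify_rm_watch");

    close(fd);  /* closing the fd also drops any remaining watches */
    return rc < 0 ? -1 : 0;
}
```

The watch descriptor returned by inotify_add_watch is what later shows up in the wd field of each event, which is how the process tells its watches apart.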

The OS isn't “told” that a file has been changed: it knows, because it's doing the changing. It's the application that's told that a file has been changed instead of having to go looking.

The program inotifywait provides a simple way to use inotify from the command line.

Linux – List current inotify watches (pathname, PID)

Maybe the fdinfo for the fd of the watch can be useful:

$ readlink /proc/$(pgrep inotify)/fd/3
anon_inode:inotify
$ cat /proc/$(pgrep inotify)/fdinfo/3
pos:    0
flags:  00
mnt_id: 11
inotify wd:1 ino:357a sdev:700000 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7a35000000000000

The sdev seems to be the major:minor device number combination, as seen in the output of lsblk, for example:

$ lsblk | grep 7
loop0    7:0    0  80.5M  1 loop /snap/core/2462

(I was indeed monitoring /snap/core/2462.)

For my /dev/sda1 which is 8:1, the output looked like so:

pos:    0
flags:  00
mnt_id: 11
inotify wd:1 ino:aae1b sdev:800001 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:1bae0a0038e16969

This should be sufficient to find out what's blocking unmounting, even though the specific directories or files being watched aren't listed.
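
The decoding above can be sketched in C as follows. This assumes the sdev value packs the minor number into the low 20 bits and the major number above it, which matches both examples here (700000 gives 7:0, 800001 gives 8:1). struct watch_target, hex_field and parse_fdinfo_line are names made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the fields pulled out of one "inotify ..." fdinfo line. */
struct watch_target {
    unsigned long ino;    /* inode number of the watched object */
    unsigned int  major;  /* device major number */
    unsigned int  minor;  /* device minor number */
};

/* Parse the hex value that follows `key` in `line`. Returns 0 on success, -1 otherwise. */
static int hex_field(const char *line, const char *key, unsigned long *out)
{
    const char *p = strstr(line, key);
    char *end;

    if (p == NULL)
        return -1;
    p += strlen(key);
    *out = strtoul(p, &end, 16);
    return end == p ? -1 : 0;
}

/* Extract ino and sdev from an fdinfo line and split sdev into major:minor. */
static int parse_fdinfo_line(const char *line, struct watch_target *t)
{
    unsigned long ino, sdev;

    if (strncmp(line, "inotify ", 8) != 0)
        return -1;  /* not a watch line (e.g. "pos:" or "flags:") */
    if (hex_field(line, " ino:", &ino) != 0 ||
        hex_field(line, " sdev:", &sdev) != 0)
        return -1;

    t->ino = ino;
    t->major = (unsigned int)(sdev >> 20);      /* assumption: 20-bit minor field */
    t->minor = (unsigned int)(sdev & 0xfffff);
    return 0;
}
```

Fed the two watch lines shown above, this yields 7:0 with inode 0x357a and 8:1 with inode 0xaae1b. The major:minor pair can then be matched against lsblk output.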
