Efficiency of lots of inotify watches or stat calls

Tags: efficiency, inotify, stat

I am developing software that will use inotify to track changes to a large number of files (tens to hundreds of thousands). I have come up with these ideas:

  • one watch per file
  • one watch per parent directory
  • avoid inotify and periodically scan the fs for changes (not preferred)

I will have a database of all of the files I am watching, along with some basic stat information (like mtime and size). With one watch per parent directory, however, I would have to stat each file in that directory until I found the one that changed.

Which would be faster, tons (100,000+) of inotify watches or tons of stat calls?

I'm thinking that reducing the number of stat calls would be better, but I don't know enough about inotify.

Note:

This will be running on a workstation, not a server. Its main purpose is to synchronize changes (potentially to an entire filesystem) between a client and a remote server.

Best Answer

When you read() an inotify fd, the name field of the returned struct tells you which file was modified relative to the directory being watched, so you shouldn't have to stat every file in a directory after the event.

See http://linux.die.net/man/7/inotify

Specifically:

struct inotify_event {
    int      wd;       /* Watch descriptor */
    uint32_t mask;     /* Mask of events */
    uint32_t cookie;   /* Unique cookie associating related
                          events (for rename(2)) */
    uint32_t len;      /* Size of 'name' field */
    char     name[];   /* Optional null-terminated name */
};

The name field is only present when an event is returned for a file inside a watched directory; it identifies the file pathname relative to the watched directory. This pathname is null-terminated, and may include further null bytes to align subsequent reads to a suitable address boundary.
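To make that concrete, here is a minimal sketch of reading events from a single directory watch and recovering the changed file's name from each one. The helper names next_event and report_changes_once, the 4096-byte buffer, and the choice of event mask are my own assumptions for illustration; only inotify_init1(), inotify_add_watch(), read() and struct inotify_event come from the inotify API itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: decode the event that starts at byte offset `off` in a
 * buffer filled by read() on an inotify fd. Copies the fixed-size header into
 * *hdr, points *name at the file name (or NULL when the event carries none),
 * and returns the offset of the following event, or 0 when no complete event
 * remains in the buffer. */
size_t next_event(const char *buf, size_t nbytes, size_t off,
                  struct inotify_event *hdr, const char **name)
{
    assert(buf != NULL && hdr != NULL && name != NULL);
    if (off > nbytes || nbytes - off < sizeof *hdr)
        return 0;
    memcpy(hdr, buf + off, sizeof *hdr);    /* header only; name[] follows it */
    if (nbytes - off - sizeof *hdr < hdr->len)
        return 0;                           /* truncated record */
    *name = hdr->len > 0 ? buf + off + sizeof *hdr : NULL;
    return off + sizeof *hdr + hdr->len;    /* len already includes padding */
}

/* Sketch: watch one directory, block for one batch of events, and print which
 * entry each event refers to. Returns the number of events, or -1 on error. */
int report_changes_once(const char *dir)
{
    char buf[4096];                         /* assumed size; holds several events */
    int count = 0;

    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE | IN_CLOSE_WRITE |
                                   IN_MOVED_FROM | IN_MOVED_TO) < 0) {
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, buf, sizeof buf);  /* blocks until something happens */
    if (n < 0) {
        close(fd);
        return -1;
    }

    struct inotify_event ev;
    const char *name;
    size_t off = 0, next;
    while ((next = next_event(buf, (size_t)n, off, &ev, &name)) != 0) {
        /* name is relative to the watched directory, so no stat scan is needed */
        printf("%s/%s: mask=0x%x\n", dir, name ? name : "", (unsigned)ev.mask);
        off = next;
        count++;
    }
    close(fd);
    return count;
}
```

A real synchronizer would keep the fd open and map each wd back to its directory path, since one inotify fd can carry many watches. Calling report_changes_once("/some/dir") and then touching a file in that directory from another shell should print one line per event, with the file name taken straight from the event rather than from stat calls.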
