Linux – Does inotify fire a notification when a write is started or when it is completed

files, inotify, linux, stat

Imagine two processes, a reader and a writer, communicating via a regular file on an ext3 fs. Reader has an inotify IN_MODIFY watch on the file. Writer writes 1000 bytes to the file, in a single write() call. Reader gets the inotify event, and calls fstat on the file. What does Reader see?

  1. Is there any guarantee that Reader will get back at least 1000 for st_size on the file? From my experiments, it seems not.

  2. Is there any guarantee that Reader can actually read() 1000 bytes?

This is happening on a seriously I/O-bound box. For example, sar shows await times of about 1 second. In my case the Reader actually waits 10 seconds AFTER getting the inotify event before calling stat, and still gets too-small results.

What I had hoped was that the inotify event would not be delivered until the file was ready. What I suspect is actually happening is that the inotify event fires DURING the write() call in the Writer, and the data is actually available to other processes on the system whenever it happens to be ready. In this case, 10s is not enough time.

I guess I am just looking for confirmation that the kernel actually implements inotify the way I am guessing. Also, are there any options to alter this behavior?

Finally, what is the point of inotify, given this behavior? You're reduced to polling the file/directory anyway, after you get the event, until the data is actually available. You might as well do that all along and forget about inotify.

*** EDIT ***
Okay, as often happens, the behavior I am seeing actually makes sense, now that I understand what I am really doing. ^_^

I am actually responding to an IN_CREATE event on the directory the file lives in. So I am actually stat()'ing the file in response to the creation of the file, not necessarily the IN_MODIFY event, which may be arriving later.

I am going to change my code so that, once I get the IN_CREATE event, I will subscribe to IN_MODIFY on the file itself, and I won't actually attempt to read the file until I get the IN_MODIFY event. I realize that there is a small window there in which I may miss a write to the file, but this is acceptable for my application, because in the worst case, the file will be closed after a maximum number of seconds.
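The plan above could be sketched roughly as follows. This is a minimal illustration, assuming Linux and Python, calling libc's inotify_init and inotify_add_watch through ctypes. The helper names parse_events and watch_new_files are hypothetical, and the mask values are copied from <sys/inotify.h>. The window between IN_CREATE and the new IN_MODIFY watch is still there, as noted.

```python
import ctypes
import os
import struct

# Mask values from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100

# Header of struct inotify_event: int wd; uint32_t mask, cookie, len;
EVENT_HEADER = struct.Struct("iIII")


def parse_events(buf):
    """Split bytes read from an inotify fd into (wd, mask, name) tuples."""
    events = []
    offset = 0
    while offset + EVENT_HEADER.size <= len(buf):
        wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(buf, offset)
        offset += EVENT_HEADER.size
        # The name field is NUL-padded up to name_len bytes.
        raw_name = buf[offset:offset + name_len].split(b"\0", 1)[0]
        offset += name_len
        events.append((wd, mask, os.fsdecode(raw_name)))
    return events


def watch_new_files(directory, on_modify):
    """Hypothetical driver: on IN_CREATE, start watching the new file for
    IN_MODIFY; call on_modify(path) per modification until it returns True."""
    libc = ctypes.CDLL(None, use_errno=True)  # assumption: glibc on Linux
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        dir_wd = libc.inotify_add_watch(fd, os.fsencode(directory), IN_CREATE)
        if dir_wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        file_watches = {}  # watch descriptor -> file path
        while True:
            for wd, mask, name in parse_events(os.read(fd, 4096)):
                if wd == dir_wd and mask & IN_CREATE:
                    path = os.path.join(directory, name)
                    file_wd = libc.inotify_add_watch(
                        fd, os.fsencode(path), IN_MODIFY)
                    if file_wd >= 0:
                        file_watches[file_wd] = path
                elif wd in file_watches and mask & IN_MODIFY:
                    if on_modify(file_watches[wd]):
                        return
    finally:
        os.close(fd)
```

A caller would pass a callback that reads the file, for example watch_new_files("/some/dir", handle_file), where handle_file is whatever the application does with the data.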

Best Answer

From what I see in the kernel source, inotify fires only after a write has completed (i.e. your guess is wrong). After the notification is triggered, only two more things happen in sys_write, the function that implements the write syscall: setting some scheduler parameters, and updating the position on the file descriptor. This code has been similar as far back as 2.6.14. By the time the notification fires, the file already has its new size.
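As a rough sanity check of the "file already has its new size" part, here is a small single-process sketch in Python. It does not involve inotify and proves nothing about kernel ordering. It only illustrates that once write() has returned, the new size is visible through a second descriptor without waiting for the data to reach the disk. The function name size_after_write is made up for this example.

```python
import os
import tempfile


def size_after_write(payload):
    """Write payload with one os.write() call, then return
    (bytes_written, st_size as seen through a separate descriptor)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data")
        writer = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        reader = os.open(path, os.O_RDONLY)
        try:
            written = os.write(writer, payload)
            # No fsync here: the size comes from the kernel's view of the
            # file, not from what has been flushed to disk.
            size = os.fstat(reader).st_size
        finally:
            os.close(reader)
            os.close(writer)
    return written, size
```

If size ever came back smaller than written here, that would contradict the reading of the kernel source above.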

Things to check that may be going wrong:

  • Maybe the reader is getting old notifications, from the previous write.
  • If the reader calls stat and then calls read or vice versa, something might happen in between. If you keep appending to the file, calling stat first guarantees that you'll be able to read that far, but it's possible that more data has been written by the time the reader calls read, even if it hasn't yet received the inotify notification.
  • Just because the writer calls write doesn't mean that the kernel will write the requested number of bytes. There are very few circumstances where a write of an arbitrary size is guaranteed to go through in full. Each write call is atomic in what it does write, however: at some point the data isn't written yet, and then suddenly n bytes have been written, where n is the return value of the write call. If you observe a partially-written file, it means that write returned less than its size argument.
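The last two points can be sketched in Python as follows. This is an illustration under assumptions, not the asker's code: write_all and read_new_data are hypothetical helpers. The writer loops until every byte has been accepted, because a single write may return less than it was given. The reader reads only up to the size fstat reported, and more data may have been appended by the time it finishes.

```python
import os


def write_all(fd, data):
    """Keep calling os.write() until all of data is written.

    A single write may be short, so its return value must be checked."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        total += os.write(fd, view[total:])
    return total


def read_new_data(fd, offset):
    """Read from offset up to the size fstat reports right now.

    Returns (data, new_offset). Data appended after the fstat call is
    left for the next call."""
    size = os.fstat(fd).st_size
    os.lseek(fd, offset, os.SEEK_SET)
    chunks = []
    while offset < size:
        chunk = os.read(fd, size - offset)
        if not chunk:  # file was truncated underneath us
            break
        chunks.append(chunk)
        offset += len(chunk)
    return b"".join(chunks), offset
```

The reader would call read_new_data once per notification, passing the offset returned by the previous call, so stale or coalesced notifications simply yield an empty result.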

Useful tools to investigate what's going on include:

  • strace -tt
  • the auditd subsystem