Linux – Are pipe reads not greater than PIPE_BUF atomic?

concurrency, linux, pipe

The GNU C library manual briefly mentions that both reads and writes on a pipe are atomic:

Reading or writing pipe data is atomic if the size of data written is not greater than PIPE_BUF.

However, the Linux manual pages, such as man 7 pipe, do not mention that reads are atomic, and man 2 read explicitly states that read may return less than the requested amount if it was interrupted by a signal.

So are read calls on a pipe with a read length not greater than PIPE_BUF truly atomic on Linux?

In particular, if a single writer always writes to the pipe in, for example, 12-byte chunks and there are 2 concurrent readers that each read 12 bytes at a time, does each read either return exactly 12 bytes or fail with an error like EAGAIN, with no possibility of a partial read?

Also, what about the case where the writer writes in 12-byte chunks but the concurrent readers try to read up to PIPE_BUF/12 chunks at once? Does a successful read then always return an exact multiple of 12 bytes, or can it return any number of bytes?

Best Answer

Looking at the source code, the implementation of pipe_read in fs/pipe.c has changed quite a bit across Linux kernel versions, but from a quick reading of the code in 2.0.40, 2.4.37, 2.6.32, 3.11 and 4.9, it seems to me that whenever there has been (or is, while read is blocking) a write of size w and a read of size r with r ≥ w, read will return at least w bytes. So if you have fixed-size chunks (no larger than PIPE_BUF) and always make reads of that same size, then you are in practice guaranteed to always read a whole chunk.
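To make the fixed-size case concrete, here is a minimal sketch in Python (which calls read(2) and write(2) directly through os.read and os.write). One writer emits 12-byte records and two reader threads each request 12 bytes at a time. The names run_demo, CHUNK, n_chunks and n_readers are made up for this illustration, and the result reflects behaviour observed on Linux rather than a documented guarantee.

```python
import os
import threading

CHUNK = 12  # record size from the question; far below PIPE_BUF (4096 on Linux)


def run_demo(n_chunks=1000, n_readers=2):
    """One writer, several readers, all using CHUNK-sized operations."""
    r, w = os.pipe()
    results = [[] for _ in range(n_readers)]

    def reader(slot):
        while True:
            data = os.read(r, CHUNK)
            if not data:  # b"" means the write end is closed and the pipe is empty
                break
            results[slot].append(data)

    threads = [threading.Thread(target=reader, args=(i,)) for i in range(n_readers)]
    for t in threads:
        t.start()

    for i in range(n_chunks):
        os.write(w, b"%012d" % i)  # exactly 12 bytes per write
    os.close(w)  # lets the readers see EOF once the pipe drains

    for t in threads:
        t.join()
    os.close(r)
    return results


if __name__ == "__main__":
    chunks = [c for per_reader in run_demo() for c in per_reader]
    print("chunks read:", len(chunks))
    print("all whole:", all(len(c) == CHUNK for c in chunks))
```

Which reader receives which record is not deterministic, only that every successful read returns a whole record.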

On the other hand, if you have variable-sized chunks, then you have no such guarantee. There is a guarantee of atomicity only on the write side: a write of at most PIPE_BUF bytes will not be interleaved with data from another writer. But on the reader side, if there has been, for example, a write of 10 bytes followed by a write of 20 bytes, and you later try to read 15 bytes, then you'll get the complete first write and the first 5 bytes of the second write. The read call doesn't stop reading data until it would have to block or its output buffer is full.
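The 10-byte/20-byte scenario is easy to reproduce in a single process. The sketch below uses Python's os.pipe, os.write and os.read; the function name read_across_writes is hypothetical and exists only for this example.

```python
import os


def read_across_writes():
    """Write 10 bytes, then 20 bytes, then read 15 bytes at a time."""
    r, w = os.pipe()
    try:
        os.write(w, b"a" * 10)  # first write
        os.write(w, b"b" * 20)  # second write
        first = os.read(r, 15)  # spans the boundary between the two writes
        rest = os.read(r, 15)   # remainder of the second write
        return first, rest
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    first, rest = read_across_writes()
    print(first)
    print(rest)
```

The first read returns all 10 bytes of the first write plus the first 5 bytes of the second, so the pipe itself carries no record boundaries.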

If you want to transmit data in chunks, use a datagram socket instead of a pipe.
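As a rough sketch of that alternative, assuming a pair of connected Unix-domain datagram sockets is an acceptable replacement for the pipe, the same two writes arrive as two separate messages. The function name datagram_boundaries and the 4096-byte receive buffer are arbitrary choices for this example.

```python
import socket


def datagram_boundaries():
    """Send a 10-byte and a 20-byte message; each recv returns one whole message."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        a.send(b"a" * 10)
        a.send(b"b" * 20)
        first = b.recv(4096)   # buffer larger than any message we send
        second = b.recv(4096)
        return first, second
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    first, second = datagram_boundaries()
    print(len(first), len(second))
```

Note that a receive buffer smaller than the message truncates that datagram instead of leaving the remainder for the next call, so the buffer should be sized for the largest expected message.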
