From reading the man pages on the `read()` and `write()` calls, it appears that these calls get interrupted by signals regardless of whether they have to block or not.
In particular, assume that:

- a process establishes a handler for some signal;
- a device (say, a terminal) is opened with `O_NONBLOCK` not set (i.e. operating in blocking mode);
- the process then makes a `read()` system call to read from the device, and as a result executes a kernel control path in kernel space;
- while the process is executing its `read()` in kernel space, the signal for which the handler was installed earlier is delivered to that process and its signal handler is invoked.
Reading the man pages and the appropriate sections in SUSv3 'System Interfaces volume (XSH)', one finds that:

i. If a `read()` is interrupted by a signal before it reads any data (i.e. it had to block because no data was available), it returns -1 with `errno` set to [EINTR].

ii. If a `read()` is interrupted by a signal after it has successfully read some data (i.e. it was possible to start servicing the request immediately), it returns the number of bytes read.
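Both cases can be demonstrated with a small test program; here is a minimal sketch, assuming `stdin` is a terminal (note that a canonical-mode terminal will typically only show case i, since typed input that hasn't yet been terminated by a newline isn't readable):

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void handler(int sig)
{
    (void)sig;                  /* exists only to interrupt the syscall */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;    /* note: SA_RESTART deliberately not set */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);

    char buf[4096];
    ssize_t n = read(STDIN_FILENO, buf, sizeof buf);  /* blocking read */

    if (n == -1 && errno == EINTR)
        printf("case i: interrupted before any data was read\n");
    else if (n >= 0)
        printf("read %zd bytes (case ii if a signal arrived mid-read)\n", n);
    else
        perror("read");
    return 0;
}
```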
Question A):
Am I correct to assume that in either case (block/no block) the delivery and handling of the signal is not entirely transparent to the `read()`?
Case i. seems understandable, since the blocking `read()` would normally place the process in the `TASK_INTERRUPTIBLE` state, so that when a signal is delivered, the kernel places the process into the `TASK_RUNNING` state.
However, when the `read()` doesn't need to block (case ii.) and is processing the request in kernel space, I would have thought that the arrival of a signal and its handling would be transparent, much like the arrival and proper handling of a HW interrupt would be. In particular, I would have assumed that upon delivery of the signal, the process would temporarily be placed into user mode to execute its signal handler, from which it would eventually return to finish off processing the interrupted `read()` (in kernel space), so that the `read()` runs its course to completion, after which the process returns to the point just after the call to `read()` (in user space), with all of the available bytes read as a result.
But ii. seems to imply that the `read()` is interrupted, since data is available immediately, yet it returns only some of the data (instead of all of it).
This brings me to my second (and final) question:
Question B):
If my assumption under A) is correct, why does the `read()` get interrupted, even though it does not need to block because there is data available to satisfy the request immediately? In other words, why is the `read()` not resumed after the signal handler executes, eventually resulting in all of the available data (which was, after all, available) being returned?
Best Answer
Summary: you're correct that receiving a signal is not transparent, neither in case i (interrupted without having read anything) nor in case ii (interrupted after a partial read). To do otherwise in case i would require making fundamental changes both to the architecture of the operating system and the architecture of applications.
The OS implementation view
Consider what happens if a system call is interrupted by a signal. The signal handler will execute user-mode code. But the syscall handler is kernel code and does not trust any user-mode code. So let's explore the choices for the syscall handler:

- Terminate the system call and report how much was done so far, leaving it to the application to restart the call if desired. This is the Unix way: the call returns the partial count, or -1 with `errno` set to `EINTR` if nothing had been transferred yet (and, with `SA_RESTART`, the kernel restarts the call automatically, from the beginning, rather than resuming it mid-flight).
- Save the state of the system call and let the application resume it later. This is problematic: while the signal handler runs, arbitrary user code could invalidate the saved state, and the kernel would have to keep holding whatever locks and resources the syscall had acquired, with no guarantee that the untrusted user code would ever resume or cancel the call.
The main difference with an interrupt is that the interrupt code is trusted, and highly constrained. It's usually not allowed to allocate resources, or run forever, or take locks and not release them, or do any other kind of nasty things; since the interrupt handler is written by the OS implementer himself, he knows that it won't do anything bad. On the other hand, application code can do anything.
The application design view
When an application is interrupted in the middle of a system call, should the syscall continue to completion? Not always. For example, consider a program like a shell that's reading a line from the terminal, and the user presses Ctrl+C, triggering SIGINT. The read must not complete, that's what the signal is all about. Note that this example shows that the `read` syscall must be interruptible even if no byte has been read yet. So there must be a way for the application to tell the kernel to cancel the system call. Under the unix design, that happens automatically: the signal makes the syscall return. Other designs would require a way for the application to resume or cancel the syscall at its leisure.
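In fact, whether an interrupted call fails with `EINTR` or is automatically restarted is under the application's control, via the `SA_RESTART` flag to `sigaction()`. A minimal sketch (the `install` helper is just for illustration):

```c
#include <signal.h>
#include <string.h>

static void on_signal(int sig) { (void)sig; }

/* Hypothetical helper: install a handler for sig with the given flags. */
static int install(int sig, int flags)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;
    return sigaction(sig, &sa, NULL);
}

int main(void)
{
    install(SIGINT, 0);           /* read() aborted: -1 with errno == EINTR */
    install(SIGUSR1, SA_RESTART); /* kernel restarts read() after the handler;
                                     note a partial read (case ii) still
                                     returns its byte count either way */
    /* ... perform a blocking read() here, as in the scenario above ... */
    return 0;
}
```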
The `read` system call is the way it is because it's the primitive that makes sense, given the general design of the operating system. What it means is, roughly, "read as much as you can, up to a limit (the buffer size), but stop if something else happens". To actually read a full buffer involves running `read` in a loop until as many bytes as possible have been read; this is a higher-level function, `fread(3)`. Unlike `read(2)`, which is a system call, `fread` is a library function, implemented in user space on top of `read`. It's suitable for an application that reads from a file or dies trying; it's not suitable for a command line interpreter, nor for a networked program that must throttle connections cleanly, nor for a networked program that has concurrent connections and doesn't use threads.

The example of read in a loop is provided in Robert Love's Linux System Programming:
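Something in the spirit of that loop (a sketch, not the book's verbatim listing; the name `read_all` is illustrative):

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling read() until the buffer is full, EOF, or a real error. */
ssize_t read_all(int fd, char *buf, size_t len)
{
    size_t total = 0;

    while (len != 0) {
        ssize_t ret = read(fd, buf, len);
        if (ret == -1) {
            if (errno == EINTR)     /* case i: nothing read, just retry */
                continue;
            return -1;              /* a genuine error */
        }
        if (ret == 0)               /* end of file */
            break;
        buf   += ret;               /* case ii: partial read, keep going */
        len   -= (size_t)ret;
        total += (size_t)ret;
    }
    return (ssize_t)total;
}
```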
It takes care of case i and case ii and a few more.