Interrupted system call

interruptsignalssystem-calls

I am reading APUE and the Interrupted System Calls chapter confuses me.

I would like to write down my understanding based on the book, please correct me.

A characteristic of earlier UNIX systems was that if a process caught
a signal while the process was blocked in a ‘‘slow’’ system call, the
system call was interrupted. The system call returned an error and
errno was set to EINTR. This was done under the assumption that since
a signal occurred and the process caught it, there is a good chance
that something has happened that should wake up the blocked system
call.

So it's saying that the earlier UNIX systems has a feature: if my program uses a system call, it would be interrupted/stopped, if at any time the program catches a signal. (Does default handler also count as a catch?)

For example, if I have a read system call, which reads 10GB data, when it's reading, I send any one of signals(e.g.kill -SIGUSR1 pid), then read would fail and return.

To prevent applications from having to handle interrupted system
calls, 4.2BSD introduced the automatic restarting of certain
interrupted system calls. The system calls that were automatically
restarted are ioctl, read, readv, write, writev, wait, and waitpid. As
we’ve mentioned, the first five of these functions are interrupted by
a signal only if they are operating on a slow device; wait and waitpid
are always interrupted when a signal is caught. Since this caused a
problem for some applications that didn’t want the operation restarted
if it was interrupted, 4.3BSD allowed the process to disable this
feature on a per-signal basis.

So before automatic restarting was introduced, I had to handle interrupted system call on my own. I need write code like:

The problem with interrupted system calls is that we now have to
handle the error return explicitly. The typical code sequence
(assuming a read operation and assuming that we want to restart the
read even if it’s interrupted) would be:

again:
    if ((n = read(fd, buf, BUFFSIZE)) < 0) {
        if (errno == EINTR)
        goto again; /* just an interrupted system call */
        /* handle other errors */
}

But nowadays I don't have to write this kind of code, beacause of the automatic restarting mechanism.

So if I my understanding are all correct, what/why should I care about interrupted system call now..? It seems the system/OS handles it automatically.

Best Answer

Interruption of a system call by a signal handler occurs only in the case of various blocking system calls, and happens when the system call is interrupted by a signal handler that was explicitly established by the programmer.

Furthermore, in the case where a blocking system call is interrupted by a signal handler, automatic system call restarting is an optional feature. You elect to automatically restart system calls by specifying the SA_RESTART flag when establishing the signal handler. As stated in (for example) the Linux signal(7) manual page:

   If  a  signal  handler  is  invoked while a system call or library
   function call is blocked, then either:

   * the call is automatically restarted  after  the  signal  handler
     returns; or

   * the call fails with the error EINTR.

   Which  of  these two behaviors occurs depends on the interface and
   whether or not  the  signal  handler  was  established  using  the
   SA_RESTART  flag (see sigaction(2)).

As hinted by the last sentence quoted above, even when you elect to use this feature, it does not work for all system calls, and the set of system calls for which it does work varies across UNIX implementations. The Linux signal(7) manual page notes a number of system calls that are automatically restarted when using the SA_RESTART flag, but also goes on to note various system calls that are never restarted, even if you specify that flag when establishing a handler, including:

   * "Input" socket interfaces, when a timeout (SO_RCVTIMEO) has been
     set  on  the  socket  using  setsockopt(2):  accept(2), recv(2),
     recvfrom(2), recvmmsg(2) (also with  a  non-NULL  timeout  argu‐
     ment), and recvmsg(2).

   * "Output"  socket  interfaces,  when  a timeout (SO_RCVTIMEO) has
     been set on the socket using setsockopt(2): connect(2), send(2),
     sendto(2), and sendmsg(2).

   * File   descriptor   multiplexing   interfaces:    epoll_wait(2),
     epoll_pwait(2), poll(2), ppoll(2), select(2), and pselect(2).

   * System  V  IPC  interfaces:  msgrcv(2), msgsnd(2), semop(2), and
     semtimedop(2).

For these system calls, manual restarting using a loop of the form described in APUE is essential, something like:

while ((ret = some_syscall(...)) == -1 && errno == EINTR)
    continue;
if (ret == -1)
    /* Handle error */ ;

Related Solutions

Difference between slow system calls and fast system calls

There are in fact three gradations in system calls.

Some system calls return immediately. “Immediately” means that the only thing they need is a little processor time. There's no hard limit to how long they can take (except in real-time systems), but these calls return as soon as they've been scheduled for long enough.
These calls are usually called non-blocking. Examples of non-blocking calls are calls that just read a bit of system state, or make a simple change to system state, such as getpid, gettimeofday, getuid or setuid. Some system calls can be blocking or non-blocking depending on the circumstances; for example read never blocks if the file is a pipe or other type that supports non-blocking reads and the O_NONBLOCK flag is set.
A few system calls can take a while to complete, but not forever. A typical example is sleep.
Some system calls will not return until some external event happens. These calls are said to be blocking. For example, read called on a blocking file descriptor is blocking, and so is wait.

The distinction between “fast” and “slow” system calls is close to non-blocking vs. blocking, but this time from the point of view of the kernel implementer. A fast syscall is one that is known to be able to complete without blocking or waiting. When the kernel encounters a fast syscall, it knows it can execute the syscall immediately and keep the same process scheduled. (In some operating systems with non-preemptive multitasking, fast syscalls may be non-preemptive; this is not the case in normal unix systems.) On the other hand, a slow syscall potentially requires waiting for another task to complete, so the kernel must prepare to pause the calling process and run another task.

Some cases are a bit of a gray area. For example a disk read (read from a regular file) is normally considered non-blocking, because it's not waiting for another process; it's only waiting for the disk, which normally takes only a little time to answer, but won't take forever (so that's case 2 above). But from the kernel's perspective, the process has to wait for the disk driver to complete, so it's definitely a slow syscall.

Interruption of system calls when a signal is caught

Summary: you're correct that receiving a signal is not transparent, neither in case i (interrupted without having read anything) nor in case ii (interrupted after a partial read). To do otherwise in case i would require making fundamental changes both to the architecture of the operating system and the architecture of applications.

The OS implementation view

Consider what happens if a system call is interrupted by a signal. The signal handler will execute user-mode code. But the syscall handler is kernel code and does not trust any user-mode code. So let's explore the choices for the syscall handler:

Terminate the system call; report how much was done to the user code. It's up to the application code to restart the system call in some way, if desired. That's how unix works.
Save the state of the system call, and allow the user code to resume the call. This is problematic for several reasons:
- While the user code is running, something could happen to invalidate the saved state. For example, if reading from a file, the file might be truncated. So the kernel code would need a lot of logic to handle these cases.
- The saved state can't be allowed to keep any lock, because there's no guarantee that the user code will ever resume the syscall, and then the lock would be held forever.
- The kernel must expose new interfaces to resume or cancel ongoing syscalls, in addition to the normal interface to start a syscall. This is a lot of complication for a rare case.
- The saved state would need to use resources (memory, at least); those resources would need to be allocated and held by the kernel but be counted against the process's allotment. This isn't insurmountable, but it is a complication.
  - Note that the signal handler might make system calls that themselves get interrupted; so you can't just have a static resource allotment that covers all possible syscalls.
  - And what if the resources cannot be allocated? Then the syscall would have to fail anyway. Which means the application would need to have code to handle this case, so this design would not simplify the application code.
Remain in progress (but suspended), create a new thread for the signal handler. This, again, is problematic:
- Early unix implementations had a single thread per process.
- The signal handler would risk overstepping on the syscall's shoes. This is an issue anyway, but in the current unix design, it's contained.
- Resources would need to be allocated for the new thread; see above.

The main difference with an interrupt is that the interrupt code is trusted, and highly constrained. It's usually not allowed to allocate resources, or run forever, or take locks and not release them, or do any other kind of nasty things; since the interrupt handler is written by the OS implementer himself, he knows that it won't do anything bad. On the other hand, application code can do anything.

The application design view

When an application is interrupted in the middle of a system call, should the syscall continue to completion? Not always. For example, consider a program like a shell that's reading a line from the terminal, and the user presses Ctrl+C, triggering SIGINT. The read must not complete, that's what the signal is all about. Note that this example shows that the read syscall must be interruptible even if no byte has been read yet.

So there must be a way for the application to tell the kernel to cancel the system call. Under the unix design, that happens automatically: the signal makes the syscall return. Other designs would require a way for the application to resume or cancel the syscall at its leasure.

The read system call is the way it is because it's the primitive that makes sense, given the general design of the operating system. What it means is, roughly, “read as much as you can, up to a limit (the buffer size), but stop if something else happens”. To actually read a full buffer involves running read in a loop until as many bytes as possible have been read; this is a higher-level function, fread(3). Unlike read(2) which is a system call, fread is a library function, implemented in user space on top of read. It's suitable for an application that reads for a file or dies trying; it's not suitable for a command line interpreter or for a networked program that must throttle connections cleanly, nor for a networked program that has concurrent connections and doesn't use threads.

The example of read in a loop is provided in Robert Love's Linux System Programming:

ssize_t ret;
while (len != 0 && (ret = read (fd, buf, len)) != 0) {
  if (ret == -1) {
    if (errno == EINTR)
      continue;
    perror ("read");
    break;
  }
  len -= ret;
  buf += ret;
}

It takes care of case i and case ii and few more.