Summary: you're correct that receiving a signal is not transparent, neither in case i (interrupted without having read anything) nor in case ii (interrupted after a partial read). To do otherwise in case i would require making fundamental changes both to the architecture of the operating system and the architecture of applications.
The OS implementation view
Consider what happens if a system call is interrupted by a signal. The signal handler will execute user-mode code. But the syscall handler is kernel code and does not trust any user-mode code. So let's explore the choices for the syscall handler:
- Terminate the system call; report how much was done to the user code. It's up to the application code to restart the system call in some way, if desired. That's how unix works.
- Save the state of the system call, and allow the user code to resume the call. This is problematic for several reasons:
- While the user code is running, something could happen to invalidate the saved state. For example, if reading from a file, the file might be truncated. So the kernel code would need a lot of logic to handle these cases.
- The saved state can't be allowed to keep any lock, because there's no guarantee that the user code will ever resume the syscall, and then the lock would be held forever.
- The kernel must expose new interfaces to resume or cancel ongoing syscalls, in addition to the normal interface to start a syscall. This is a lot of complication for a rare case.
- The saved state would need to use resources (memory, at least); those resources would need to be allocated and held by the kernel but be counted against the process's allotment. This isn't insurmountable, but it is a complication.
- Note that the signal handler might make system calls that themselves get interrupted; so you can't just have a static resource allotment that covers all possible syscalls.
- And what if the resources cannot be allocated? Then the syscall would have to fail anyway. Which means the application would need to have code to handle this case, so this design would not simplify the application code.
- Remain in progress (but suspended), create a new thread for the signal handler. This, again, is problematic:
- Early unix implementations had a single thread per process.
- The signal handler would risk overstepping on the syscall's shoes. This is an issue anyway, but in the current unix design, it's contained.
- Resources would need to be allocated for the new thread; see above.
The main difference with an interrupt is that the interrupt code is trusted, and highly constrained. It's usually not allowed to allocate resources, or run forever, or take locks and not release them, or do any other kind of nasty things; since the interrupt handler is written by the OS implementer himself, he knows that it won't do anything bad. On the other hand, application code can do anything.
The application design view
When an application is interrupted in the middle of a system call, should the syscall continue to completion? Not always. For example, consider a program like a shell that's reading a line from the terminal, and the user presses Ctrl+C
, triggering SIGINT. The read must not complete, that's what the signal is all about. Note that this example shows that the read
syscall must be interruptible even if no byte has been read yet.
So there must be a way for the application to tell the kernel to cancel the system call. Under the unix design, that happens automatically: the signal makes the syscall return. Other designs would require a way for the application to resume or cancel the syscall at its leasure.
The read
system call is the way it is because it's the primitive that makes sense, given the general design of the operating system. What it means is, roughly, “read as much as you can, up to a limit (the buffer size), but stop if something else happens”. To actually read a full buffer involves running read
in a loop until as many bytes as possible have been read; this is a higher-level function, fread(3)
. Unlike read(2)
which is a system call, fread
is a library function, implemented in user space on top of read
. It's suitable for an application that reads for a file or dies trying; it's not suitable for a command line interpreter or for a networked program that must throttle connections cleanly, nor for a networked program that has concurrent connections and doesn't use threads.
The example of read in a loop is provided in Robert Love's Linux System Programming:
ssize_t ret;
while (len != 0 && (ret = read (fd, buf, len)) != 0) {
if (ret == -1) {
if (errno == EINTR)
continue;
perror ("read");
break;
}
len -= ret;
buf += ret;
}
It takes care of case i
and case ii
and few more.
Recent versions of the nVidia proprietary drivers (possibly combined with other recent versions of libraries) have a bug which causes them to corrupt the signal mask.
You can look at signal masks like this:
anthony@Zia:~$ ps -eo blocked,pid,cmd | egrep -v '^0+ '
BLOCKED PID CMD
fffffffe7ffbfeff 605 udevd --daemon
0000000000000002 4052 /usr/lib/policykit-1/polkitd --no-debug
0000000000087007 4646 /usr/sbin/mysqld --basedir=/usr […]
0000000000010000 15508 bash
That's about what it should look like. If you run that on a system with the proprietary nVidia drivers, you'll see all kinds of crazy values for BLOCKED
, for many of your programs—including, likely, all the misbehaving ones.
Note that signal masks are passed from parent to child through fork
/exec
, so once a parent process has a corrupt one, all the children it spawns from that point forward will, too.
See also my question After upgrade, X button in titlebar no longer closes xterm and various distro bugs you'll be able to find now, knowing which package to look at. You can modify the code in my answer to that question to reset the signal mask to none blocked (Elide sigaddset
and change SIG_UNBLOCK
to SIG_SETMASK
).
Best Answer
There doesn't seem to be a pure bash solution.
Ksh (both ksh93 and mksh) unblock all signals (tested on Debian wheezy), so if you can use ksh instead of bash, it will solve your problem.
If you can't change the fact that bash is invoked, you might be able to make bash execute ksh and make ksh execute the child process: replace
by
Beware of quoting issues!
Ksh is fast and easy to use, but often not part of the default installation. If that is an issue, you can use Perl instead, which is part of the default installation in most non-embedded Linux systems.