Normally, tr shouldn't be able to write that error message: it should have been killed by a SIGPIPE signal as soon as it tried to write after the other end of the pipe was closed when head terminated.
You get that error message because, somehow, the process running tr has been configured to ignore SIGPIPE. I suspect that is done by the popen() implementation in your language.
You can reproduce it by doing:
sh -c 'trap "" PIPE; tr -dc "[:alpha:]" < /dev/urandom | head -c 8'
You can confirm that's what is happening by doing:
strace -fe signal sh your-program
(or the equivalent on your system if not using Linux). You'll then see something like:
rt_sigaction(SIGPIPE, {SIG_IGN, ~[RTMIN RT_1], SA_RESTORER, 0x37cfc324f0}, NULL, 8) = 0
or
signal(SIGPIPE, SIG_IGN)
done in one process before that same process or one of its descendants executes the /bin/sh that interprets that command line and starts tr and head.
If you do a strace -fe write, you'll see something like:
write(1, "AJiYTlFFjjVIzkhCAhccuZddwcydwIIw"..., 4096) = -1 EPIPE (Broken pipe)
The write system call fails with an EPIPE error instead of triggering a SIGPIPE.
In either case tr will exit. When SIGPIPE is ignored, it exits because of that write error (which is also what triggers the error message). When SIGPIPE is not ignored, tr is killed by the signal. You do want it to exit, since you don't want it to carry on reading /dev/urandom after head has read those 8 bytes.
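The two outcomes can be compared side by side. The following is a minimal sketch, not part of the original answer: the helper name tr_status is hypothetical, tr is wrapped in a brace group only so that its exit status can be reported on stderr, and LC_ALL=C is added so that tr treats /dev/urandom as plain bytes. It assumes the environment running it has not already ignored SIGPIPE; if it has, both runs behave like the first one.

```shell
# Hypothetical helper: run the pipeline and report tr's stderr and exit status.
# $1 is a shell snippet executed first, e.g. 'trap "" PIPE' or ':' (do nothing).
tr_status() {
    sh -c "$1"'
        {
            LC_ALL=C tr -dc "[:alpha:]" < /dev/urandom
            echo "tr exit status: $?" >&2
        } | head -c 8 > /dev/null
    ' 2>&1
}

# SIGPIPE ignored: write() fails with EPIPE, tr prints an error and exits non-zero.
ignored=$(tr_status 'trap "" PIPE')

# SIGPIPE at its default: tr is killed by the signal and prints nothing itself;
# the shell then reports the status as 128 + the signal number.
default=$(tr_status ':')

echo "SIGPIPE ignored: $ignored"
echo "SIGPIPE default: $default"
```

In the first run the status comes from tr's own error path, in the second from the shell's convention for processes killed by a signal.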
To avoid that error message, you can restore the default handler for SIGPIPE with:
trap - PIPE
prior to calling tr:
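popen("trap - PIPE; { tr ... | head -c 8; } 2>&1", ...)
Be aware that POSIX specifies that signals ignored on entry to a non-interactive shell cannot be trapped or reset, so depending on which shell popen() runs, this trap - PIPE may silently have no effect.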
Newer derivatives of the OpenBSD netcat, including FreeBSD[1] and Debian[2], support a -d flag which prevents reading from stdin and fixes the problem you described.
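As a rough illustration of using that flag, here is a sketch under stated assumptions: the function name listen_detached and the socket path are hypothetical, and the -l, -k and -U options are inferred from the trace in this answer (a listening AF_LOCAL socket that is expected to keep accepting clients), not taken from the original command. It only applies to an nc built from the OpenBSD code base with -d support.

```shell
# Hypothetical wrapper: listen on a Unix-domain socket without ever reading stdin.
# Assumes an OpenBSD-derived nc that supports -d; other netcat variants differ.
listen_detached() {
    socket_path=$1
    # -d: do not attempt to read from stdin
    # -l: listen, -k: keep accepting after a client disconnects, -U: Unix-domain socket
    nc -d -l -k -U "$socket_path"
}

# Example (not run here; the path is made up):
#   listen_detached /tmp/example.sock > received.log &
```

With -d, stdin is left out of the poll set, so it no longer matters that a backgrounded job's stdin comes from /dev/null.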
The problem is that netcat is polling stdin as well as its "network" fd, and stdin is reopened from /dev/null in the second case above, where the shell function is run in the background before the pipeline is created. That means an immediate EOF on the first read from stdin (fd 0), but netcat will continue to poll(2) on the now-closed stdin, creating an endless loop.
Here is the redirection of stdin before the pipeline creation:
249 [pid 23186] open("/dev/null", O_RDONLY <unfinished ...>
251 [pid 23186] <... open resumed> ) = 3
253 [pid 23186] dup2(3, 0) = 0
254 [pid 23186] close(3) = 0
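That redirection matches what POSIX shells do for a command started with & when job control is off: unless redirected explicitly, its stdin behaves like /dev/null. A small sketch, using cat as a stand-in for netcat (an assumption made purely for illustration), shows the effect:

```shell
# Foreground: cat inherits the shell's stdin and copies the input through.
foreground=$(printf 'hello\n' | sh -c 'cat')

# Background in a non-interactive shell (no job control): cat's stdin is
# /dev/null, so it sees EOF immediately and copies nothing.
background=$(printf 'hello\n' | sh -c 'cat & wait')

printf 'foreground=[%s] background=[%s]\n' "$foreground" "$background"
```

Any program that polls stdin, as netcat does without -d, therefore gets an immediate EOF on fd 0 when started this way.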
Now when netcat (pid 23187) calls its first poll(2), it reads EOF from stdin and closes fd 0:
444 [pid 23187] poll([{fd=4, events=POLLIN}, {fd=0, events=POLLIN}], 2, 4294967295) = 2 ([{fd=4, revents=POLLIN|POLLHUP}, {fd=0, revents=POLLIN}])
448 [pid 23187] read(0, <unfinished ...>
450 [pid 23187] <... read resumed> "", 2048) = 0
456 [pid 23187] close(0 <unfinished ...>
458 [pid 23187] <... close resumed> ) = 0
The next call to accept(2) yields a client on fd 0, which is now the lowest-numbered free fd:
476 [pid 23187] accept(3, <unfinished ...>
929 [pid 23187] <... accept resumed> {sa_family=AF_LOCAL, NULL}, [2]) = 0
Note here that netcat is now including fd 0 in the args to poll(2) twice: once for STDIN_FILENO, which is always included in the absence of the -d command-line parameter, and once for the newly-connected client:
930 [pid 23187] poll([{fd=0, events=POLLIN}, {fd=0, events=POLLIN}], 2, 4294967295) = 2 ([{fd=0, revents=POLLIN|POLLHUP}, {fd=0, revents=POLLIN|POLLHUP}])
The client sends EOF and netcat disconnects:
936 [pid 23187] read(0, <unfinished ...>
938 [pid 23187] <... read resumed> "", 2048) = 0
940 [pid 23187] shutdown(0, SHUT_WR <unfinished ...>
942 [pid 23187] <... shutdown resumed> ) = 0
944 [pid 23187] close(0 <unfinished ...>
947 [pid 23187] <... close resumed> ) = 0
But now it's in trouble because it will continue to poll on fd 0, which is now closed. The netcat code does not handle the case of POLLNVAL being set in the .revents member of struct pollfd, so it gets into an endless loop, never to call accept(2) again:
949 [pid 23187] poll([{fd=0, events=POLLIN}, {fd=-1}], 2, 4294967295 <unfinished ...>
951 [pid 23187] <... poll resumed> ) = 1 ([{fd=0, revents=POLLNVAL}])
953 [pid 23187] poll([{fd=0, events=POLLIN}, {fd=-1}], 2, 4294967295 <unfinished ...>
955 [pid 23187] <... poll resumed> ) = 1 ([{fd=0, revents=POLLNVAL}])
...
In the first command, where the pipeline is backgrounded but is not run in a shell function, stdin is left open, so this case doesn't arise.
Code references (see the readwrite function):
[1] http://svnweb.freebsd.org/base/head/contrib/netcat/
[2] https://sources.debian.net/src/netcat-openbsd/1.105-7/
A 141 exit code indicates that the process was killed by SIGPIPE (141 is 128 + 13, SIGPIPE being signal 13); this happens to yes when the pipe closes. To mask this for your CI, you need to mask the error, using something like yes phrase || : in place of the bare yes phrase. This will run yes phrase and, if it fails, run : which exits with code 0. This is safe enough since yes doesn’t have much cause to fail apart from being unable to write.
To debug pipe issues such as these, the best approach is to look at PIPESTATUS, for example by echoing "${PIPESTATUS[@]}" when the pipeline fails. This will show the exit codes for all parts of the pipe on failure. Those which fail with exit code 141 can then be handled appropriately. The generic handling pattern for a specific error code (thanks Hauke Laging) runs command, and exits with code 0 if command succeeds or if it exits with code 141. Other exit codes are reflected as-is.
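One way to write such a handler is sketched below. It is an illustration, not the code from the original answer: the helper name ignore_sigpipe is hypothetical, and the PIPESTATUS part is run through bash -c because PIPESTATUS is a bash feature that plain sh does not provide.

```shell
# Hypothetical helper: run a command and treat exit status 141 (killed by
# SIGPIPE) as success; every other status is passed through unchanged.
ignore_sigpipe() {
    _status=0
    "$@" || _status=$?
    if [ "$_status" -eq 141 ]; then
        return 0
    fi
    return "$_status"
}

# Producer side of a pipeline: yes normally dies from SIGPIPE once head exits,
# and the helper turns that into a successful status.
first_lines=$(ignore_sigpipe yes phrase | head -n 3)
printf '%s\n' "$first_lines"

# bash only: PIPESTATUS holds one exit status per element of the last pipeline.
if command -v bash >/dev/null 2>&1; then
    bash -c 'yes phrase | head -n 1 > /dev/null; echo "pipeline statuses: ${PIPESTATUS[*]}"'
fi
```

Wrapping only the producer keeps genuine failures visible: a command that exits with any status other than 141 still fails the pipeline when pipefail is in effect.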