Shell – How to forward between processes with named pipes

Tags: buffer, fifo, pipe, shell

/tmp/in, /tmp/out and /tmp/err are named pipes, already created and opened by some other process (for reading, writing and writing, respectively).

I would like to create a new process that pipes its stdin into /tmp/in, writes the contents of /tmp/out to its stdout, and writes the contents of /tmp/err to its stderr, as they become available. Everything should work in a line-buffered fashion. The process should exit when the other process (the one that created /tmp/in) stops reading and closes /tmp/in. The solution should work on Ubuntu, preferably without installing any extra packages. I would like to solve it in a bash script.
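
To make the requirements concrete, the rough shape I have in mind is something like this (not a correct solution yet; the exit and buffering requirements are exactly what this question is about):

#!/bin/bash
cat /tmp/out &       # copy the other process's output to my stdout
cat /tmp/err >&2 &   # copy its error output to my stderr
cat >/tmp/in         # copy my stdin into /tmp/in
wait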


mikeserv pointed out that without an SSCCE (a Short, Self-Contained, Correct Example), it is hard to understand what I want. So, below is an SSCCE, but keep in mind that it is a minimal example, so it is pretty silly.

The original setup

A parent process launches a child process and communicates with it through the child's stdin and stdout line-by-line. If I run it, I get:

$ python parent.py 
Parent writes to child:  a
Response from the child: A

Parent writes to child:  b
Response from the child: B

Parent writes to child:  c
Response from the child: C

Parent writes to child:  d
Response from the child: D

Parent writes to child:  e
Response from the child: E

Waiting for the child to terminate...
Done!
$ 

parent.py

from __future__ import print_function
from subprocess import Popen, PIPE
import os

child = Popen('./child.py', stdin=PIPE, stdout=PIPE)
child_stdin  = os.fdopen(os.dup(child.stdin.fileno()), 'w')
child_stdout = os.fdopen(os.dup(child.stdout.fileno()))

for letter in 'abcde':
    print('Parent writes to child: ', letter)
    child_stdin.write(letter+'\n')
    child_stdin.flush()
    response = child_stdout.readline()
    print('Response from the child:', response)
    assert response.rstrip() == letter.upper(), 'Wrong response'

child_stdin.write('quit\n')
child_stdin.flush()
print('Waiting for the child to terminate...')
child.wait()
print('Done!')

child.py, must be executable!

#!/usr/bin/env python
from __future__ import print_function
from sys import stdin, stdout

while True:
    line = stdin.readline()
    if line == 'quit\n':
        quit()
    stdout.write(line.upper())
    stdout.flush()

The desired setup and a hackish solution

Neither the parent's source file nor the child's source file can be edited; it is not allowed.

I rename child.py to child_original.py (and make it executable). Then I put a bash script (a proxy, or a middle man if you wish) in its place, called child.py, start child_original.py myself before running python parent.py, and have parent.py call the fake child.py, which is now my bash script forwarding between parent.py and child_original.py.

The fake child.py

#!/bin/bash
parent=$$                                  # remember our own PID
cat std_out &                              # forward child_original.py's output to our stdout
(head -n 1 shutdown; kill -9 $parent) &    # once start_child.sh signals shutdown, kill ourselves
cat >>std_in                               # forward our stdin to child_original.py

The start_child.sh to start child_original.py before executing the parent:

#!/bin/bash
rm -f  std_in std_out shutdown
mkfifo std_in std_out shutdown             # the two data fifos plus the shutdown fifo
./child_original.py <std_in >std_out       # runs until it reads "quit"
echo >shutdown                             # tell the fake child.py it can stop
sleep 1s
rm -f  std_in std_out shutdown

The way of executing them:

$ ./start_child.sh & 
[1] 7503
$ python parent.py 
Parent writes to child:  a
Response from the child: A

Parent writes to child:  b
Response from the child: B

Parent writes to child:  c
Response from the child: C

Parent writes to child:  d
Response from the child: D

Parent writes to child:  e
Response from the child: E

Waiting for the child to terminate...
Done!
$ echo 

[1]+  Done                    ./start_child.sh
$ 

This hackish solution works. As far as I know, it does not meet the line-buffered requirement, and there is an extra shutdown fifo to inform start_child.sh that child_original.py has closed the pipes and start_child.sh can safely exit.


The question asks for an improved fake child.py bash script that meets the requirements: line-buffered, exits when child_original.py closes any of the pipes, and does not require an extra shutdown pipe.



Stuff I wish I had known:

  • If a high-level API is used to open a fifo as a file, it must be opened for both reading and writing, otherwise the call to open itself already blocks. This is incredibly counter-intuitive (see the bash illustration after this list). See also Why does a read-only open of a named pipe block?
  • In reality, my parent process is a Java application. If you work with an external process from Java, read the stdout and stderr of the external process from daemon threads (call setDaemon(true) on those threads before starting them). Otherwise, the JVM will hang forever, even if everybody is done. Although unrelated to the question, other pitfalls include: Navigate yourself around pitfalls related to the Runtime.exec() method.
  • Apparently, "unbuffered" still means a buffer is used; the difference is that we don't wait until the buffer gets full, but flush it as soon as we can.
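
For illustration, here is what the read-write trick looks like in bash (a minimal sketch; /tmp/demo is an arbitrary path):

mkfifo /tmp/demo
# exec 3</tmp/demo     # read-only: open blocks until a writer shows up
exec 3<>/tmp/demo      # read-write: open returns immediately
echo hello >&3
read -r line <&3       # reads back "hello"
exec 3>&-              # close the descriptor
rm /tmp/demo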

Best Answer

If you get rid of the killing and shutdown stuff (which is unsafe: in an extreme but not unfathomable case where child.py dies before the (head -n 1 shutdown; kill -9 $parent) & subshell does, you may end up kill -9ing some innocent process), then child.py won't terminate, because your parent.py isn't behaving like a good UNIX citizen.

The cat std_out & subprocess will have finished by the time you send the quit message: the writer to std_out is child_original.py, which finishes upon receiving quit, at which moment it closes its stdout (the std_out pipe), and that close makes the cat subprocess finish.

The cat > std_in isn't finishing, because it's reading from a pipe originating in the parent.py process, and parent.py didn't bother to close that pipe. If it did, cat > std_in, and consequently the whole child.py, would finish by itself, and you wouldn't need the shutdown pipe or the killing part (killing a process that isn't your child is always a potential security hole on UNIX, should a race condition due to rapid PID recycling occur).

Processes at the right end of a pipeline generally only finish once they're done reading their stdin. Since you're not closing that pipe (child.stdin), you're implicitly telling the child process "wait, I have more input for you", and then you go and kill it for waiting on that input, which is exactly what it should be doing.
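
The role of that close is easy to watch in isolation (a minimal sketch; /tmp/p is an arbitrary fifo path):

mkfifo /tmp/p
cat /tmp/p &     # the "right end": finishes only on EOF
exec 3>/tmp/p    # hold a write end open
echo hi >&3      # cat prints "hi" but keeps waiting for more input
exec 3>&-        # closing the last write end is what delivers EOF
wait             # now cat has finished
rm /tmp/p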

In short, make parent.py behave reasonably:

from __future__ import print_function
from subprocess import Popen, PIPE
import os

child = Popen('./child.py', stdin=PIPE, stdout=PIPE)

for letter in 'abcde':
    print('Parent writes to child: ', letter)
    child.stdin.write(letter+'\n')
    child.stdin.flush()
    response = child.stdout.readline()
    print('Response from the child:', response)
    assert response.rstrip() == letter.upper(), 'Wrong response'

child.stdin.write('quit\n')
child.stdin.flush()
child.stdin.close()
print('Waiting for the child to terminate...')
child.wait()
print('Done!')

And your child.py can be as simple as

#!/bin/sh
cat std_out &    # forward child_original.py's stdout to our stdout
cat > std_in     # forward our stdin to child_original.py; finishes on EOF
wait             # basically to assert that cat std_out has finished at this point

(Note that I got rid of those fd dup calls, because otherwise you'd need to close both child.stdin and the child_stdin duplicate.)
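
The question also asked for stderr forwarding (the /tmp/err pipe in the original statement). With this approach that is just one more background cat; a sketch, assuming a hypothetical std_err fifo created alongside the other two:

#!/bin/sh
cat std_out &        # forward the child's stdout
cat std_err >&2 &    # hypothetical std_err fifo, forwarded to our stderr
cat > std_in
wait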

Since parent.py operates in a line-oriented fashion, GNU cat is unbuffered (as mikeserv pointed out), and child_original.py also operates in a line-oriented fashion, you've effectively got the whole thing line-buffered.


Note on cat: "unbuffered" might not be the luckiest term, as GNU cat does use a buffer. What it doesn't do is try to fill the whole buffer before writing things out (unlike stdio). Basically, it makes read requests to the OS for a specific size (its buffer size) and writes whatever it receives without waiting to accumulate a whole line or a whole buffer. (read(2) is allowed to be lazy and give you only what it can at the moment, rather than the whole buffer you asked for.)

(You can inspect the source code at http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat.c ; safe_read (used instead of plain read) is in the gnulib submodule and it's a very simple wrapper around read(2) that abstracts away EINTR (see the man page)).
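
This pass-through behavior is easy to observe (a small sketch; /tmp/f is an arbitrary fifo path):

mkfifo /tmp/f
cat /tmp/f &                               # prints whatever arrives, as soon as it arrives
{ echo one; sleep 1; echo two; } >/tmp/f   # "one" shows up a second before "two"
wait
rm /tmp/f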
