Bash – How Does Pipe Work in Bash Commands?

bash, pipe, shell-script

Does anything symbolic happen in chaining bash commands via a pipe or is it all compute-pass-compute-pass?

For example, in head t.txt -n 5 | tail -n 2, is head t.txt -n 5 computed first, and does tail -n 2 then execute over its output? Or is there first some abstraction that tells the shell that only lines 4 and 5 are to be read? It might not make a difference in this example, but I guess it can in other scenarios.

Best Answer

The shell uses the pipe(2) system call to create a bounded buffer in the kernel with two file descriptors, one to enable processes to write to the buffer, and another to enable processes to read from the buffer.

Consider a simple case:

$ p1 | p2

In this case, conceptually, the shell creates the above-mentioned pipe and fork()s; the child connects its standard output stream to the write-end of the pipe and then exec()s p1. Next, the shell fork()s again; this child connects its standard input stream to the read-end of the pipe and then exec()s p2. (I say conceptually because shells might do things in different orders, but the idea is the same.)
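The same plumbing can be imitated by hand with a named pipe (FIFO), which makes the two ends visible. This is only an analogy, assuming mkfifo and mktemp are available: for p1 | p2 the shell uses an anonymous pipe from pipe(2), not a file on disk, and the function name by_hand is made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run the equivalent of "printf ... | tail -n 2"
# without using the | operator.
by_hand() {
    local dir
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/pipe"

    # "p1": a child process whose stdout is the write end of the pipe.
    printf 'one\ntwo\nthree\n' > "$dir/pipe" &

    # "p2": a process whose stdin is the read end of the pipe.
    tail -n 2 < "$dir/pipe"

    wait            # reap the background writer
    rm -r "$dir"
}

by_hand
```

Here the writer runs in the background and the reader in the foreground, so both are alive at the same time, just as with a real pipeline.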

At that point, p1 and p2 are running concurrently. p1 will write to the pipe, and the kernel will copy the written data to the buffer. p2 will read from the pipe, and the kernel will copy the read data from the buffer. If the pipe gets full, then the kernel will block p1 in its call to write() until p2 reads something from the pipe, freeing up some space. If the pipe is empty, then the kernel will block p2 in its call to read() until p1 writes more data to the pipe.
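One way to see that nothing is computed "in full" before being passed on is a producer that never finishes on its own. The following is a minimal sketch, assuming the usual yes, head, tail and seq utilities are installed; the function name first_lines is invented for this example, and seq stands in for the t.txt file from the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: take the first N lines from an endless producer.
# If the shell ran `yes` to completion before starting `head`, this would
# never return. It returns because both run at once: `head` exits after N
# lines, the read end of the pipe closes, and the next write by `yes`
# fails (SIGPIPE, or an EPIPE error if SIGPIPE is being ignored).
first_lines() {
    yes "line" | head -n "$1"
}

first_lines 3

# The pipeline from the question. Neither command knows about the other:
# tail simply receives five lines on its stdin and keeps the last two.
seq 1 10 | head -n 5 | tail -n 2
```

So there is no symbolic step in which the shell works out "lines 4 and 5" ahead of time; each program just reads and writes bytes, and the kernel handles the blocking.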

Related Solutions

Shell – How to concatenate two files on the fly and reference result as new file

You could possibly write a script that sits behind a named pipe and dumps the contents of both staticEntries.dic and dynamicEntries.dic whenever the pipe is opened and read from. The script would have to notice when the pipe is closed and stop writing until it is opened again.

But you'd have to leave that script running in the background, and remember to start it up again after logout/login or reboot.

More importantly, it is not a novice shell programming task.
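For a rough idea of what such a script involves, here is a deliberately naive sketch. The helper name serve_once is hypothetical, and the sketch ignores the hard parts: if a reader is slow to close the pipe, the loop shown in the comment can hand it the content twice, and nothing restarts the script after a reboot.

```shell
#!/usr/bin/env bash

# Hypothetical helper: write both files into the FIFO once.
# Opening the FIFO for writing blocks until some reader opens it,
# so nothing is produced until the "file" is actually read.
serve_once() {
    local fifo=$1 static=$2 dynamic=$3
    cat "$static" "$dynamic" > "$fifo"
}

# Serving repeatedly means looping for as long as the script runs, e.g.:
#   mkfifo mydict.dic
#   while true; do
#       serve_once mydict.dic staticEntries.dic dynamicEntries.dic
#   done
```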

Sometimes (usually), the simplest solution is best.

It is far simpler to just create a Makefile that defines mydict.dic as being dependent on the other two files, and to remember to run make to update it when you need it. A plain shell script would also do, but the advantage of a Makefile is that you could run it from cron and it would only update the target file (mydict.dic) if either of the source files had changed.

For example:

#!/usr/bin/make -f

all: mydict.dic

mydict.dic: staticEntries.dic dynamicEntries.dic
	cat staticEntries.dic dynamicEntries.dic > mydict.dic.tmp
	mv mydict.dic.tmp mydict.dic

The lines with cat and mv must start with a tab character, not spaces.

The concatenated file is created as a temporary file first and then moved into place, so that replacing the old file with the new one is an atomic operation. This ensures that whenever you use the file, you have either the complete old version or the complete new version, but never a partial version of the new one.

If either of the source .dic files is in a different directory, you'll need to specify the full pathname to that file.
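The plain shell script alternative mentioned above could look roughly like this. It is a sketch under assumptions: the function name rebuild_dict is made up, and it relies on the -nt ("newer than") file test as provided by bash to approximate make's "only if a source changed" behaviour.

```shell
#!/usr/bin/env bash

# Hypothetical helper: rebuild OUT from STATIC and DYNAMIC, but only if
# OUT is missing or one of the sources is newer than OUT.
rebuild_dict() {
    local out=$1 static=$2 dynamic=$3
    if [ ! -e "$out" ] || [ "$static" -nt "$out" ] || [ "$dynamic" -nt "$out" ]; then
        # Write to a temporary file, then rename it into place so that
        # readers never see a half-written dictionary.
        cat "$static" "$dynamic" > "$out.tmp" && mv "$out.tmp" "$out"
    fi
}

# Example call, using the file names from the answer:
#   rebuild_dict mydict.dic staticEntries.dic dynamicEntries.dic
```

The temporary file lives next to the target, so the mv is a rename within one filesystem, which is what keeps the replacement atomic.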

Bash – Using “ifne” in a pipeline – running multiple commands

ifne doesn't set an exit code based on whether the input is empty or not, so && and || aren't going to work as hoped. An alternative to Babyy's answer is to use pee from the same package:

printf "asdf\n" | pee 'ifne cat -' 'ifne echo "stream not empty"'

This works like tee, but duplicates the input stream into a number of pipes, treating each argument as a command to run. (tpipe is a similar command, but behaves slightly differently.)

A possible issue, though, is that the commands may all be writing to stdout in parallel. Depending on buffering and on the length of the input and output, the output may be interleaved or may vary from run to run (effectively a race). This can probably be eliminated by using sponge (from the same package) instead of cat, and/or by other buffering or unbuffering solutions. It affects the example you gave, but may not affect your real use case.
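If deterministic ordering matters more than streaming, a different technique avoids the race altogether: spool the input to a temporary file, test whether it is empty, and then run the commands one after another. This is not pee or ifne, just a plain bash sketch; the function name run_if_input is hypothetical and mktemp is assumed to be available.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run several commands in sequence, but only if
# stdin contained at least one byte.
run_if_input() {
    local tmp
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # spool all of stdin to the temp file
    if [ -s "$tmp" ]; then            # -s: file exists and is not empty
        cat "$tmp"                    # first command: pass the data through
        echo "stream not empty"       # second command: runs after the first
    fi
    rm -f "$tmp"
}

printf 'asdf\n' | run_if_input
```

The trade-off is that nothing is printed until the upstream command has finished, since the whole input has to be read before the emptiness test.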
