Bash – How Does Pipe Work in Bash Commands?

bash, pipe, shell-script

Does anything symbolic happen in chaining bash commands via a pipe or is it all compute-pass-compute-pass?

For example, in head t.txt -n 5 | tail -n 2, is head t.txt -n 5 computed first, with tail -n 2 then executing over its output? Or is there first some abstraction that tells the shell that only lines 4 and 5 are to be read? It might not make a difference in this example, but I guess it can in other scenarios.

Best Answer

The shell uses the pipe(2) system call to create a bounded buffer in the kernel with two file descriptors, one to enable processes to write to the buffer, and another to enable processes to read from the buffer.

Consider a simple case:

$ p1 | p2

In this case, conceptually, the shell creates the above-mentioned pipe, fork()s, the child connects its standard output stream to the write-end of the pipe, then the child exec()s p1. Next, the shell fork()s again, the child connects its standard input stream to the read-end of the pipe, then the child exec()s p2. (I say conceptually because shells might do things in different orders, but the idea is the same.)
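The sequence described above can be sketched with Python's os module, which wraps the same system calls. This is only an illustration under stated assumptions, not how bash itself is implemented: the function name run_pipeline is hypothetical, and the second pipe (out_read, out_write) is an addition that lets the parent capture p2's output so the result can be inspected.

```python
import os

def run_pipeline(p1_argv, p2_argv):
    """Rough equivalent of `p1 | p2`; returns p2's output as bytes.

    Hypothetical helper for illustration only.
    """
    read_end, write_end = os.pipe()    # pipe(2): kernel buffer plus two descriptors
    out_read, out_write = os.pipe()    # extra pipe (an assumption) to capture p2's output
    all_fds = (read_end, write_end, out_read, out_write)

    pid1 = os.fork()
    if pid1 == 0:                      # first child becomes p1
        try:
            os.dup2(write_end, 1)      # stdout now refers to the pipe's write end
            for fd in all_fds:
                os.close(fd)           # drop the originals; fd 1 is the only copy left
            os.execvp(p1_argv[0], p1_argv)
        finally:
            os._exit(127)              # reached only if exec failed

    pid2 = os.fork()
    if pid2 == 0:                      # second child becomes p2
        try:
            os.dup2(read_end, 0)       # stdin now refers to the pipe's read end
            os.dup2(out_write, 1)      # stdout goes to the capture pipe
            for fd in all_fds:
                os.close(fd)
            os.execvp(p2_argv[0], p2_argv)
        finally:
            os._exit(127)

    # The parent must close its own copies, otherwise p2 never sees end-of-file.
    for fd in (read_end, write_end, out_write):
        os.close(fd)
    with os.fdopen(out_read, "rb") as captured:
        output = captured.read()
    os.waitpid(pid1, 0)
    os.waitpid(pid2, 0)
    return output

if __name__ == "__main__":
    # Roughly `seq 1 5 | tail -n 2`
    print(run_pipeline(["seq", "1", "5"], ["tail", "-n", "2"]))
```

Note that neither child is told anything about the other: each simply inherits a descriptor and reads or writes it as usual.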

At that point, p1 and p2 are running concurrently. p1 will write to the pipe, and the kernel will copy the written data to the buffer. p2 will read from the pipe, and the kernel will copy the read data from the buffer. If the pipe gets full, then the kernel will block p1 in its call to write() until p2 reads something from the pipe, freeing up some space. If the pipe is empty, then the kernel will block p2 in its call to read() until p1 writes more data to the pipe.
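The bounded buffer can be observed directly. The sketch below uses a hypothetical helper, probe_pipe, which puts the write end in non-blocking mode so that a full pipe raises an error instead of putting the writer to sleep. It then drains the pipe and checks that writing succeeds again. The 4096-byte chunk size is an arbitrary choice, and the measured capacity depends on the operating system and its configuration.

```python
import os

def probe_pipe(chunk_size=4096):
    """Fill a pipe until the kernel refuses more data, drain it, write again.

    Returns (bytes accepted before the pipe was full,
             bytes accepted by one write after draining).
    Hypothetical helper for illustration only.
    """
    read_end, write_end = os.pipe()
    os.set_blocking(write_end, False)  # a full pipe now raises instead of blocking
    chunk = b"x" * chunk_size

    capacity = 0
    try:
        while True:
            capacity += os.write(write_end, chunk)
    except BlockingIOError:
        pass                           # buffer full: a blocking writer would sleep here

    # Play the role of p2: reading removes data and frees space in the buffer.
    drained = 0
    while drained < capacity:
        drained += len(os.read(read_end, capacity - drained))

    accepted = os.write(write_end, chunk)  # succeeds now that there is room
    os.close(read_end)
    os.close(write_end)
    return capacity, accepted

if __name__ == "__main__":
    capacity, accepted = probe_pipe()
    print("pipe held", capacity, "bytes before filling up")
    print("after draining, a write of", accepted, "bytes succeeded")
```

With ordinary blocking descriptors, as in a shell pipeline, the point where this sketch catches BlockingIOError is where the kernel would suspend p1 until p2 reads.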
