Piped commands run concurrently. When you run ps | grep …, it's the luck of the draw (or a matter of details of the workings of the shell combined with scheduler fine-tuning deep in the bowels of the kernel) as to whether ps or grep starts first, and in any case they continue to execute concurrently.
This is very commonly used to allow the second program to process data as it comes out of the first program, before the first program has completed its operation. For example,
grep pattern very-large-file | tr a-z A-Z
begins to display the matching lines in uppercase even before grep has finished traversing the large file.
grep pattern very-large-file | head -n 1
displays the first matching line, and may stop processing well before grep has finished reading its input file.
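You can watch that early termination directly: head exits after printing one line, the read side of the pipe closes, and the producer is killed by SIGPIPE instead of running to completion.

```shell
# head exits after the first line; seq is then killed by SIGPIPE on a
# subsequent write, long before it counts anywhere near 100000000
seq 1 100000000 | head -n 1   # prints: 1
```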
If you read somewhere that piped programs run in sequence, flee this document. Piped programs run concurrently and always have.
The easiest way would be to pipe through some program which sets non-blocking output. Here is a simple perl one-liner (which you can save as leakybuffer) which does so:
so your a | b becomes:
a | perl -MFcntl -e \
'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (<STDIN>) { print }' | b
What it does is read the input and write it to the output (same as cat(1)), but the output is non-blocking - meaning that if a write fails, it returns an error and loses data, but the process continues with the next line of input, as we conveniently ignore the error. The process is kind-of line-buffered as you wanted, but see the caveat below.
You can test it with, for example:
seq 1 500000 | perl -w -MFcntl -e \
'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (<STDIN>) { print }' | \
while read a; do echo $a; done > output
You will get an output file with lost lines (the exact output depends on the speed of your shell etc.), like this:
12768
12769
12770
12771
12772
12773
127775610
75611
75612
75613
You see where the shell lost lines after 12773, but also an anomaly - perl didn't have enough buffer space for 12774\n but did for 1277, so it wrote just that -- and so the next number 75610 does not start at the beginning of the line, making it a little ugly.
That could be improved upon by having perl detect when a write did not succeed completely, and then later try to flush the remainder of the line while ignoring new lines coming in, but that would complicate the perl script much more, so it is left as an exercise for the interested reader :)
Update (for binary files):
If you are not processing newline-terminated lines (like log files or similar), you need to change the command slightly, or perl will consume large amounts of memory (depending on how often newline characters appear in your input):
perl -w -MFcntl -e 'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (read STDIN, $_, 4096) { print }'
It will work correctly for binary files too (without consuming extra memory).
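A quick sanity check (a sketch; the /tmp path is arbitrary) that the read-based variant passes arbitrary bytes through unchanged when nothing actually blocks:

```shell
# feed 64 KiB of random bytes through the binary-safe variant and compare
head -c 65536 /dev/urandom > /tmp/blob
perl -w -MFcntl -e 'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (read STDIN, $_, 4096) { print }' \
    < /tmp/blob | cmp - /tmp/blob && echo identical   # prints: identical
```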
Update2 - nicer text file output:
Avoiding output buffers (syswrite instead of print):
seq 1 500000 | perl -w -MFcntl -e \
'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (<STDIN>) { syswrite STDOUT,$_ }' | \
while read a; do echo $a; done > output
This seems to fix the problems with "merged lines" for me:
12766
12767
12768
16384
16385
16386
(Note: one can verify on which lines the output was cut with the perl -ne '$c++; next if $c==$_; print "$c $_"; $c=$_' output one-liner.)
Best Answer
The only thing about your question that stands out as wrong is that you say
In fact, both programs would be started at pretty much the same time. If there's no input for B when it tries to read, it will block until there is input to read. Likewise, if there's nobody reading the output from A, its writes will block until its output is read (some of it will be buffered by the pipe).

The only thing synchronising the processes that take part in a pipeline is the I/O, i.e. the reading and writing across the pipe. If no writing or reading happens, then the two processes will run totally independently of each other. If one ignores the reading or writing of the other, the ignored process will block and eventually be killed by a SIGPIPE signal (if writing) or get an end-of-file condition on its standard input stream (if reading) when the other process terminates.

The idiomatic way to describe A | B is that it's a pipeline containing two programs. The output produced on standard output from the first program is available to be read on the standard input by the second ("[the output of] A is piped into [the input of] B"). The shell does the required plumbing to allow this to happen.

If you want to use the words "consumer" and "producer", I suppose that's ok too.
The fact that these are programs written in C is not relevant. The fact that this is Linux, macOS, OpenBSD or AIX is not relevant.