Bash – Can’t process stdout with pipe as it comes

bashfifopipeshelltshark

I'm running tshark on a fifo, and the following is a bare example of a loop that prints the output of tshark as it comes:

tshark -i $fifo | while read line; do
    echo $line
done

The problem appears when I add filters to tshark. This example prints all $lines only after tshark exits (IP address is hidden):

tshark -i $fifo -T fields -e text -R '
    ip.src == **.**.***.**          &&
    http.response.code == 200       &&
    http.content_encoding == "gzip" &&
    http.content_type contains "text/html"
' | while read line; do
    echo $line
done

I have tried in other forms with no luck:

while read line; do
    echo $line
done < <(tshark ...)

while read line; do
    echo $line
done <<<"$(tshark ...)"

Even grep prints the lines only after tshark ends:

tshark ... | grep .

I have tried running tshark without a pipe and the lines are printed correctly as they come. Why is the command after the pipe being feeded only after tshark exits?

Additional details: | tee works, but I get everything printed again when tshark exits, so it is not a good deal.

Best Answer

Check if your tshark version has the -l option for (nearly) line-buffered output.

Related Solutions

Shell – How to forward between processes with named pipes

If you get rid of the killing and shutdown stuff (which is unsafe and you may, in an extreme, but not unfathomable case when child.py dies before the (head -n 1 shutdown; kill -9 $parent) & subshell does end up kill -9ing some innocent process), then child.py won't be terminating because your parent.py isn't behaving like a good UNIX citizen.

The cat std_out & subprocess will have finished by the time you send the quit message, because the writer to std_out is child_original.py, which finishes upon receiving quit at which moment it closes its stdout, which is the std_out pipe and that close will make the cat subprocess finish.

The cat > std_in isn't finishing because it's reading from a pipe originating in the parent.py process and the parent.py process didn't bother to close that pipe. If it did, cat > stdin_in and consequently the whole child.py would finish by itself and you wouldn't need the shutdown pipe or the killing part (killing a process that isn't your child on UNIX is always a potential security hole if a race condition caused due to rapid PID recycling should occur).

Processes at the right end of a pipeline generally only finish once they're done reading their stdin, but since you're not closing that (child.stdin), you're implicitly telling the child process "wait, I have more input for you" and then you go kill it because it does wait for more input from you as it should.

In short, make parent.py behave reasonably:

from __future__ import print_function
from subprocess import Popen, PIPE
import os

child = Popen('./child.py', stdin=PIPE, stdout=PIPE)

for letter in 'abcde':
    print('Parent writes to child: ', letter)
    child.stdin.write(letter+'\n')
    child.stdin.flush()
    response = child.stdout.readline()
    print('Response from the child:', response)
    assert response.rstrip() == letter.upper(), 'Wrong response'

child.stdin.write('quit\n')
child.stdin.flush()
child.stdin.close()
print('Waiting for the child to terminate...')
child.wait()
print('Done!')

And your child.py can be as simple as

#!/bin/sh
cat std_out &
cat > std_in
wait #basically to assert that cat std_out has finished at this point

(Note that I got rid of that fd dup calls because otherwise you'd need to close both child.stdin and the child_stdin duplicate).

Since parent.py operates in line-oriented fashion, gnu cat is unbuffered (as mikeserv pointed out) and child_original.py operates in a line oriented fashion, you've effectively got the whole thing line-buffered.

Note on Cat: Unbufferred might not be the luckiest term, as gnu cat does use a buffer. What it doesn't do is try to get the whole buffer full before writing things out (unlike stdio). Basically it makes read requests to the os for a specific size (its buffer size), and writes whatever it receives without waiting to get a whole line or the whole buffer. (read(2) can be lazy and give you only what it can give you at the moment rather than the whole buffer you've asked for.)

(You can inspect the source code at http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat.c ; safe_read (used instead of plain read) is in the gnulib submodule and it's a very simple wrapper around read(2) that abstracts away EINTR (see the man page)).

Bash – How to Time a Pipe Command

It is working.

The different parts of a pipeline are executed concurrently. The only thing that synchronises/serialises the processes in the pipeline is IO, i.e. one process writing to the next process in the pipeline and the next process reading what the first one writes. Apart from that, they are executing independently of each other.

Since there is no reading or writing happening between the processes in your pipeline, the time take to execute the pipeline is that of the longest sleep call.

You might as well have written

time ( foo.sh & bar.sh &; wait )

Terdon posted a couple of slightly modified example scripts in the chat:

#!/bin/sh
# This is "foo.sh"
echo 1; sleep 1
echo 2; sleep 1
echo 3; sleep 1
echo 4

and

#!/bin/sh
# This is "bar.sh"
sleep 2
while read line; do
  echo "LL $line"
done
sleep 1

The query was "why does time ( sh foo.sh | sh bar.sh ) return 4 seconds rather than 3+3 = 6 seconds?"

To see what's happening, including the approximate time each command is executed, one may do this (the output contains my annotations):

$ time ( env PS4='$SECONDS foo: ' sh -x foo.sh | PS4='$SECONDS bar: ' sh -x bar.sh )
0 bar: sleep 2
0 foo: echo 1     ; The output is buffered
0 foo: sleep 1
1 foo: echo 2     ; The output is buffered
1 foo: sleep 1
2 bar: read line  ; "bar" wakes up and reads the two first echoes
2 bar: echo LL 1
LL 1
2 bar: read line
2 bar: echo LL 2
LL 2
2 bar: read line  ; "bar" waits for more
2 foo: echo 3     ; "foo" wakes up from its second sleep
2 bar: echo LL 3
LL 3
2 bar: read line
2 foo: sleep 1
3 foo: echo 4     ; "foo" does the last echo and exits
3 bar: echo LL 4
LL 4
3 bar: read line  ; "bar" fails to read more
3 bar: sleep 1    ; ... and goes to sleep for one second

real    0m4.14s
user    0m0.00s
sys     0m0.10s

So, to conclude, the pipeline takes 4 seconds, not 6, due to the buffering of the output of the first two calls to echo in foo.sh.

Best Answer

Related Solutions

Shell – How to forward between processes with named pipes

Bash – How to Time a Pipe Command

Related Question