Bash IO Redirection – Can’t Redirect Cut Output


When I try to redirect the output of cut, it always seems to be empty. If I don't redirect it, the output shows up in the terminal as expected. This is true for OS X 10.10 and Linux 4.1.6.

This works:

root@karla:~# nc 10.0.2.56 30003
[...] lots of lines [...]

This works:

root@karla:~# nc 10.0.2.56 30003 | cat
[...] lots of lines [...]

This works:

root@karla:~# nc 10.0.2.56 30003 | cut -d, -f 15,16
[...] lots of lines [...]

This doesn't:

root@karla:~# nc 10.0.2.56 30003 | cut -d, -f 15,16 | cat
[nothing]

This again DOES:

root@karla:~# cat messung1 | cut -d, -f15,16 | cat
[...] lots of lines [...]

This is not limited to cat after cut. grep, tee and standard redirection using > don't work either.

What's wrong there?

Best Answer

It's not so much that there's no output as that it's coming in chunks.

Like many programs, when its output is no longer a terminal, cut buffers its output. That is, it only writes data when it has accumulated a buffer-full of it. Typically, something like 4 or 8 KiB though YMMV.

You can easily verify it by comparing:

(echo foo; sleep 1; echo bar) | cut -c2-

With:

(echo foo; sleep 1; echo bar) | cut -c2- | cat

In the first case, cut outputs oo\n and then ar\n one second later, while in the second case, cut outputs oo\nar\n after 1 second, that is, when it sees the end of its input and flushes its output upon exit.

In your case, since stdin is nc, it would only see the end of its input when the connection is closed, so it would only start outputting anything after it has accumulated 4KiB worth of data to write.
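
The switch in behaviour comes from a check on whether stdout is a terminal. As a rough illustration only, the shell's `[ -t 1 ]` test performs the same kind of check; `stdout_kind` is a hypothetical helper name, not something cut or nc provides:

```sh
# Hypothetical helper: report whether this process's stdout is a terminal.
# Programs that use C stdio typically make a similar check to pick their buffering mode.
stdout_kind() {
    if [ -t 1 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

stdout_kind          # run directly, stdout is whatever the shell's stdout is
stdout_kind | cat    # here stdout is a pipe
```

Whenever the helper reports "not a terminal", a command such as cut in the same position would normally be block-buffering its output.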

To work around that, several approaches are possible.

On GNU or FreeBSD systems, you can use the stdbuf utility, which can tweak the buffering behaviour of some commands (it doesn't work for all of them, as it uses an LD_PRELOAD hack to pre-configure the stdio buffering behaviour).
```
... | stdbuf -oL cut -d, -f15,16 | cat
```
would tell cut to line-buffer its stdout.

Some commands, like GNU grep, have options to affect their buffering (--line-buffered in the case of GNU grep).

You can use a pseudo-tty wrapper to force the stdout of a command to be a terminal. However, most of those solutions have drawbacks and limitations. The unbuffer expect script, for instance, which is often mentioned for this kind of problem, has a number of bugs.

One that doesn't work too badly is socat, used as:
```
... | socat -u 'exec:"cut -d, -f15,16",pty,raw' -
```
You can replace your text utility with a higher-level text processing tool that has support for unbuffered output.
- GNU awk for instance has a fflush() function to flush its output. So your cut -d, -f15,16 could be written:
```
awk -F, -v OFS=, '{print $15,$16; fflush()}'
```
- if your awk lacks the fflush() function, you can use system("") instead. system() is normally the function that executes a command. Here the command is empty, and the call is used only because awk flushes its stdout before running it.
- or you can use perl:
```
perl -F, -lane 'BEGIN{$,=",";$|=1} print @F[14..15]'
```
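
The system("") fallback mentioned above could look roughly like this. It is a sketch: `cut_15_16_flushed` is a made-up wrapper name, and it relies on awk flushing before system(), which common implementations do but which is worth checking on yours:

```sh
# Hypothetical wrapper: print fields 15 and 16 of comma-separated input,
# pushing each line out immediately.
cut_15_16_flushed() {
    # system("") runs an empty command; awk flushes its stdout before doing so.
    awk -F, -v OFS=, '{ print $15, $16; system("") }'
}

# Usage, with the host and port from the question:
#   nc 10.0.2.56 30003 | cut_15_16_flushed | cat
```

Spawning an empty command for every line costs more than fflush(), so this is mainly useful when fflush() is unavailable.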

Related Solutions

Bash – How to Use tee to Redirect to grep

$ ps aux | tee >(head -n1) | grep syslog
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND 
syslog     806  0.0  0.0  34600   824 ?        Sl   Sep07   0:00 rsyslogd -c4

The grep and head commands start at about the same time, and both receive the same input data at their own leisure, but generally, as data becomes available. There are some things that can introduce the 'unsynchronized' output which flips lines; for example:

The multiplexed data from tee actually gets sent to one process before the other, depending primarily on the implementation of tee. A simple tee implementation will read some amount of input, and then write it twice: Once to stdout and once to its argument. This means that one of those destinations will get the data first.

However, pipes are all buffered. The kernel's pipe buffer is usually far larger than one line (64 KiB by default on Linux), and the programs may add their own stdio buffering on top, which can cause one of the receiving commands to see everything it needs for output (i.e. the grepped line) before the other command (head) has received any data at all.

Notwithstanding the above, it's also possible that one of these commands receives the data but is unable to do anything with it in time, and then the other command receives more data and processes it quickly.

For example, even if head and grep are sent the data one line at a time, if head doesn't know how to deal with it (or gets delayed by kernel scheduling), grep can show its results before head even gets a chance to. To demonstrate, try adding a delay:

ps aux | tee >(sleep 1; head -n1) | grep syslog

This will almost certainly output the grep output first.

$ ps aux | tee >(grep syslog) | head -n1
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

I believe you often only get one line here, because head receives the first line of input and then closes its stdin and exits. When tee sees that its stdout has been closed, it then closes its own stdin (output from ps) and exits. This could be implementation-dependent.

Effectively, the only data that ps gets to send is the first line (definitely, because head is controlling this), and maybe some other lines before head & tee close their stdin descriptors.

The inconsistency with whether the second line appears is introduced by timing: head closes stdin, but ps is still sending data. These two events are not well-synchronized, so the line containing syslog still has a chance of making it to tee's argument (the grep command). This is similar to the explanations above.

You can avoid this problem altogether by using commands that wait for all input before closing stdin/exiting. For example, use awk instead of head, which will read and process all its lines (even if they cause no output):

ps aux | tee >(grep syslog) | awk 'NR == 1'

But note that the lines can still appear out-of-order, as above, which can be demonstrated by:

ps aux | tee >(grep syslog) | (sleep 1; awk 'NR == 1')

Hope this wasn't too much detail, but there are a lot of simultaneous things interacting with each other. Separate processes run simultaneously without any synchronization, so their actions on any particular run can vary; sometimes it helps to dig deep into the underlying processes to explain why.
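
If the goal is only the header line plus the matching lines, in a stable order, one way to avoid the race is to let a single process produce both. This is a different technique from the tee pipelines above, shown as a sketch; `header_and_matches` is a hypothetical name:

```sh
# Hypothetical helper: print the first line of stdin, plus every later line
# matching the given pattern, from a single awk process.
header_and_matches() {
    # pat is passed in as an awk variable and is treated as a regular expression.
    awk -v pat="$1" 'NR == 1 || $0 ~ pat'
}

# Usage:
#   ps aux | header_and_matches syslog
```

Because only one process writes to the terminal, the lines come out in input order and the header always comes first.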

How to grep netcat output

You could use the read command (a bash builtin) to force characters to be read one by one:

netcat localhost 9090 | (
    cnt=0
    line=
    # -r keeps backslashes literal; IFS= guards against any whitespace trimming
    while IFS= read -r -N 1 c; do
        line="$line$c"
        if [ "$c" = "{" ]; then
            cnt=$((cnt+1))
        elif [ "$c" = "}" ]; then
            cnt=$((cnt-1))
            if [ $cnt -eq 0 ]; then
                printf "%s\n" "$line"
                line=
            fi
        fi
    done
) | grep sender

This script should print every complete chunk of output with balanced { and }, but you can change the script to do whatever you want. It would NOT do well in a benchmark compared to pretty much anything, but it's pretty simple and seems to work for me...

Note that your test sample didn't have matching { and }, so if that is also true of the real input, you might want to change the criterion for printing the line.
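
If the real input turns out to be newline-delimited rather than brace-delimited, a simpler criterion is to handle one line at a time. A minimal sketch, assuming each message ends with a newline; `filter_lines` is a hypothetical name, and its argument is matched as a plain substring:

```sh
# Hypothetical helper: print each stdin line containing the given substring.
# The shell's printf builtin writes as soon as it runs, so matches are not
# held back in a large output buffer.
filter_lines() {
    while IFS= read -r line; do
        case $line in
            *"$1"*) printf '%s\n' "$line" ;;
        esac
    done
}

# Usage, with the host and port from the example above:
#   netcat localhost 9090 | filter_lines sender
```

Like the character-by-character loop, this is slow compared to grep, but it avoids the buffering issue without extra tools.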
