The data doesn’t need to be stored in RAM. Pipes block their writers if the readers aren’t there or can’t keep up; under Linux (and most other implementations, I imagine) there’s some buffering but that’s not required.

As mentioned by mtraceur and JdeBP (see the latter’s answer), early versions of Unix buffered pipes to disk, and this is how they helped limit memory usage: a processing pipeline could be split up into small programs, each of which would process some data, within the limits of the disk buffers. Small programs take less memory, and the use of pipes meant that processing could be serialised: the first program would run, fill its output buffer, be suspended, then the second program would be scheduled, process the buffer, etc.

Modern systems are orders of magnitude larger than the early Unix systems, and can run many pipes in parallel; but for huge amounts of data you’d still see a similar effect (and variants of this kind of technique are used for “big data” processing).
In your example,

sed 'simplesubstitution' file | sort | uniq > file2

`sed` reads data from `file` as necessary, then writes it as long as `sort` is ready to read it; if `sort` isn't ready, the write blocks. The data does indeed live in memory eventually, but that's specific to `sort`, and `sort` is prepared to deal with any issues (it will use temporary files if the amount of data to sort is too large).
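You can provoke that spill-to-disk behaviour on purpose; a small sketch, assuming GNU coreutils (`shuf` and sort's `-S` buffer-size option are GNU extensions):

```shell
# cap GNU sort's in-memory buffer at 1 MiB; sorting ~7 MB of shuffled
# numbers then forces it to write intermediate runs to temporary files
seq 1000000 | shuf | sort -n -S 1M | tail -n 1
# -> 1000000
```

The result is identical to an all-in-memory sort; only the working-set size changes.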
You can see the blocking behaviour by running

strace seq 1000000 -1 1 | (sleep 120; sort -n)

This produces a fair amount of data and pipes it to a process which isn't ready to read anything for the first two minutes. You'll see a number of `write` operations go through, but very quickly `seq` will stop and wait for the two minutes to elapse, blocked by the kernel (the `write` system call waits).
Since the accepted answer uses `perl`, you can just as well do the whole thing in `perl`, without other non-standard tools and non-standard shell features, and without loading unpredictably long chunks of data into memory, or other such horrible misfeatures.
The `ytee` script from the end of this answer, when used in this manner:

ytee command filter1 filter2 filter3 ...

will work just like

command <(filter1) <(filter2) <(filter3) ...

with its standard input piped to `filter1`, `filter2`, `filter3`, ... in parallel, as if it were with

tee >(filter1) >(filter2) >(filter3) ...
Example:
echo 'Line 1
Line B
Line iii' | ytee 'paste' 'sed s/B/b/g | nl' 'sed s/iii/III/ | nl'
1 Line 1 1 Line 1
2 Line b 2 Line B
3 Line iii 3 Line III
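For comparison, a sketch of doing the same without `ytee` in bash: each process substitution reads the stream independently, so the input has to be duplicated by hand for every filter (which is part of what the script saves you from):

```shell
input='Line 1
Line B
Line iii'
# each filter gets its own copy of the input; paste waits for both,
# producing the same two-column table as the ytee example above
paste <(printf '%s\n' "$input" | sed s/B/b/g | nl) \
      <(printf '%s\n' "$input" | sed s/iii/III/ | nl)
```

With `ytee`, the input is read once and fanned out to the filters instead.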
This is also an answer for the two very similar questions: here and here.
ytee:
#! /usr/bin/perl
# usage: ytee [-r irs] { command | - } [filter ..]
use strict;
use IPC::Open2;
use Fcntl;

# -r sets the input record separator (evaluated as perl code);
# otherwise, when stdin is not a terminal, read 32 KB blocks
if($ARGV[0] =~ /^-r(.+)?/){ shift; $/ = eval($1 // shift); die $@ if $@ }
elsif(! -t STDIN){ $/ = \0x8000 }

my $cmd = shift;
my @cl;    # one [read-handle, write-handle, pid] triple per filter
for(@ARGV){
    my $pid = open2 my $from, my $to, $_;
    push @cl, [$from, $to, $pid];
}

defined(my $pid = fork) or die "fork: $!";
if($pid){
    # parent: feed stdin to every filter still accepting input
    delete $$_[0] for @cl;    # close the read ends
    $SIG{PIPE} = 'IGNORE';    # a dead filter is a failed syswrite, not a fatal signal
    my $s = 0;
    while(<STDIN>){
        my $n = 0;            # filters successfully written to in this round
        for my $c (@cl){
            next unless exists $$c[1];
            syswrite($$c[1], $_) ? $n++ : delete $$c[1]
        }
        last unless $n;       # every filter is gone: stop reading
    }
    delete $$_[1] for @cl;    # close the write ends so the filters see EOF
    # collect exit statuses: the command contributes 1, each failed filter 2
    while((my $p = wait) > 0){ $s += !!$? << ($p != $pid) }
    exit $s;
}

# child: run the command with the filters' outputs as arguments
delete $$_[1] for @cl;
if($cmd eq '-'){
    # no command: interleave the filters' outputs on stdout, record by record
    my $n; do {
        $n = 0; for my $c (@cl){
            next unless exists $$c[0];
            if(defined(my $d = readline $$c[0])){ print $d; $n++ }
            else{ delete $$c[0] }
        }
    } while $n;
}else{
    exec join ' ', $cmd, map {
        # clear close-on-exec so the descriptor survives the exec
        fcntl $$_[0], F_SETFD, fcntl($$_[0], F_GETFD, 0) & ~FD_CLOEXEC;
        '/dev/fd/'.fileno $$_[0]
    } @cl;
    die "exec $cmd: $!";
}
notes:

- code like `delete $$_[1] for @cl` will not only remove the file handles from the array, but will also close them immediately, because there's no other reference pointing to them; this is different from (properly) garbage-collected languages like JavaScript.
- the exit status of `ytee` will reflect the exit statuses of the command and filters; this could be changed/simplified.
Best Answer
I'm going to walk you through a somewhat complex example, based on a real life scenario.
Problem
Let's say the command `conky` stopped responding on my desktop, and I want to kill it manually. I know a little bit of Unix, so I know that what I need to do is execute the command `kill <PID>`. In order to retrieve the PID, I can use `ps` or `top` or whatever tool my Unix distribution has given me. But how can I do this in one command?

Answer
DISCLAIMER: This command only works in certain cases. Don't copy/paste it into your terminal and start using it; it could kill processes you didn't intend to. Rather, learn how to build it.
How it works
1-
ps aux
This command will output the list of running processes and some info about them. The interesting info is that it'll output the PID of each process in its 2nd column. Here's an extract from the output of the command on my box:
2-

grep conky

I'm only interested in one process, so I use `grep` to find the entry corresponding to my program `conky`.

3-

grep -v grep
As you can see in step 2, the command `ps` outputs the `grep conky` process in its list (it's a running process after all). In order to filter it out, I can run `grep -v grep`. The option `-v` tells `grep` to match all the lines excluding the ones containing the pattern.

NB: I would love to know a way to do steps 2 and 3 in a single `grep` call.

4-
awk '{print $2}'
Now that I have isolated my target process, I want to retrieve its PID. In other words, I want to retrieve the 2nd word of the output. Lucky for me, most (all?) modern unices provide some version of `awk`, a scripting language that does wonders with tabular data. Our task becomes as easy as `print $2`.
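On a made-up ps-style line (the user name and PID here are hypothetical), the field extraction looks like this:

```shell
# awk splits each line on runs of whitespace; $2 is the second field,
# which is where ps aux puts the PID
printf 'myuser  1948  0.0  0.1  conky\n' | awk '{print $2}'
# -> 1948
```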
5-
xargs kill
I have the PID. All I need is to pass it to `kill`. To do this, I will use `xargs`. `xargs kill` will read from its input (in our case, from the pipe), form a command consisting of `kill <items>` (where `<items>` is whatever it read from the input), and then execute the command it created. In our case it will execute `kill 1948`.
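Assembled, the whole pipeline is `ps aux | grep conky | grep -v grep | awk '{print $2}' | xargs kill`. You can rehearse the extraction part safely on canned, hypothetical ps-style extracts (no `kill` involved); as a bonus, the bracket trick `[c]onky` answers the NB from step 3 by folding steps 2 and 3 into one `grep`:

```shell
# canned, hypothetical extract standing in for `ps aux`
ps_out='myuser   1948  0.0  0.1 conky
myuser   2001  0.0  0.0 grep conky'

# steps 2-4: isolate the conky line, drop the grep line, print the PID
printf '%s\n' "$ps_out" | grep conky | grep -v grep | awk '{print $2}'
# -> 1948

# the bracket trick folds steps 2 and 3 into one grep: while it runs,
# ps shows its command line as "grep [c]onky", and the regex [c]onky
# matches "conky" but not "[c]onky" (c followed by ] is not "conky")
ps_out2='myuser   1948  0.0  0.1 conky
myuser   2001  0.0  0.0 grep [c]onky'
printf '%s\n' "$ps_out2" | grep '[c]onky' | awk '{print $2}'
# -> 1948
```

Pipe either result into `xargs kill` only once you have verified it prints exactly the PID you expect.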
Mission accomplished.

Final words
Note that depending on which version of Unix you're using, certain programs may behave a little differently (for example, `ps` might output the PID in column $3). If something seems wrong or different, read your vendor's documentation (or better, the `man` pages). Also be careful, as long pipes can be dangerous. Don't make any assumptions, especially when using commands like `kill` or `rm`. For example, if there were another user named 'conky' (or 'Aconkyous'), my command might kill all their running processes too! What I'm saying is: be careful, especially with long pipes. It's always better to build them interactively, as we did here, than to make assumptions and feel sorry later.