I'm going to walk you through a somewhat complex example, based on a real-life scenario.
Problem
Let's say the command conky stopped responding on my desktop, and I want to kill it manually. I know a little bit of Unix, so I know that what I need to do is execute the command kill <PID>. In order to retrieve the PID, I can use ps or top or whatever tool my Unix distribution has given me. But how can I do this in one command?
Answer
$ ps aux | grep conky | grep -v grep | awk '{print $2}' | xargs kill
DISCLAIMER: This command only works in certain cases. Don't copy and paste it into your terminal and start using it; it could kill processes you never meant to target. Learn how to build it instead.
How it works
1- ps aux
This command will output the list of running processes and some info about them. The interesting info is that it'll output the PID of each process in its 2nd column. Here's an extract from the output of the command on my box:
$ ps aux
rahmu 1925 0.0 0.1 129328 6112 ? S 11:55 0:06 tint2
rahmu 1931 0.0 0.3 154992 12108 ? S 11:55 0:00 volumeicon
rahmu 1933 0.1 0.2 134716 9460 ? S 11:55 0:24 parcellite
rahmu 1940 0.0 0.0 30416 3008 ? S 11:55 0:10 xcompmgr -cC -t-5 -l-5 -r4.2 -o.55 -D6
rahmu 1941 0.0 0.2 160336 8928 ? Ss 11:55 0:00 xfce4-power-manager
rahmu 1943 0.0 0.0 32792 1964 ? S 11:55 0:00 /usr/lib/xfconf/xfconfd
rahmu 1945 0.0 0.0 17584 1292 ? S 11:55 0:00 /usr/lib/gamin/gam_server
rahmu 1946 0.0 0.5 203016 19552 ? S 11:55 0:00 python /usr/bin/system-config-printer-applet
rahmu 1947 0.0 0.3 171840 12872 ? S 11:55 0:00 nm-applet --sm-disable
rahmu 1948 0.2 0.0 276000 3564 ? Sl 11:55 0:38 conky -q
2- grep conky
I'm only interested in one process, so I use grep to find the entry corresponding to my program conky.
$ ps aux | grep conky
rahmu 1948 0.2 0.0 276000 3564 ? Sl 11:55 0:39 conky -q
rahmu 3233 0.0 0.0 7592 840 pts/1 S+ 16:55 0:00 grep conky
3- grep -v grep
As you can see in step 2, the command ps outputs the grep conky process in its list (it's a running process after all). In order to filter it out, I can run grep -v grep. The option -v tells grep to match all the lines excluding the ones containing the pattern.
$ ps aux | grep conky | grep -v grep
rahmu 1948 0.2 0.0 276000 3564 ? Sl 11:55 0:39 conky -q
NB: I would love to know a way to do steps 2 and 3 in a single grep call.
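One commonly used trick for this is to wrap one character of the pattern in brackets. The regular expression [c]onky still matches the text conky, but the grep process's own command line now contains the literal string [c]onky, which that expression does not match. The sketch below runs on canned input rather than a live ps listing; the sample lines are made up for illustration, modeled on the output in step 2.

```shell
# Canned stand-in for `ps aux` output (hypothetical sample lines).
sample_ps() {
    printf '%s\n' \
        'rahmu 1948 0.2 0.0 276000 3564 ? Sl 11:55 0:39 conky -q' \
        'rahmu 3233 0.0 0.0 7592 840 pts/1 S+ 16:55 0:00 grep [c]onky'
}

# [c]onky matches "conky" but not the literal text "[c]onky",
# so the line describing grep itself is filtered out.
matches=$(sample_ps | grep '[c]onky')
echo "$matches"
```

On a live system the equivalent would be ps aux | grep '[c]onky', which should make the separate grep -v grep step unnecessary.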
4- awk '{print $2}'
Now that I have isolated my target process, I want to retrieve its PID. In other words, I want to retrieve the 2nd word of the output. Lucky for me, most (all?) modern unices will provide some version of awk, a scripting language that does wonders with tabular data. Our task becomes as easy as print $2.
$ ps aux | grep conky | grep -v grep | awk '{print $2}'
1948
5- xargs kill
I have the PID. All I need is to pass it to kill. To do this, I will use xargs.
xargs kill will read from the input (in our case from the pipe), form a command consisting of kill <items> (<items> are whatever it read from the input), and then execute the command created. In our case it will execute kill 1948. Mission accomplished.
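A cautious way to check what xargs is about to run is to put echo in front of the command, so the command line is printed instead of executed. This is a small sketch with a hard-coded PID standing in for the output of the earlier steps; the helper name pid_list is made up for the example.

```shell
# Hard-coded PID standing in for the output of the awk step.
pid_list() {
    printf '%s\n' 1948
}

# xargs appends what it reads to "echo kill", so this prints the command
# that would be run instead of running it.
preview=$(pid_list | xargs echo kill)
echo "$preview"
```

Once the printed command looks right, dropping echo runs it for real.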
Final words
Note that depending on what version of unix you're using, certain programs may behave a little differently (for example, ps might output the PID in column $3). If something seems wrong or different, read your vendor's documentation (or better, the man pages). Also be careful as long pipes can be dangerous. Don't make any assumptions especially when using commands like kill or rm. For example, if there was another user named 'conky' (or 'Aconkyous') my command may kill all his running processes too!
What I'm saying is be careful, especially for long pipes. It's always better to build it interactively as we did here, than make assumptions and feel sorry later.
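One way to narrow the match, as a sketch: let awk do the filtering and compare only the command column, so a user name containing conky no longer counts. This assumes the ps aux layout shown in step 1, where the command starts in column 11. The sample lines (including the user named conky) are invented for illustration.

```shell
# Invented sample in the `ps aux` layout from step 1 (command in column 11).
sample_ps() {
    printf '%s\n' \
        'rahmu 1948 0.2 0.0 276000 3564 ? Sl 11:55 0:39 conky -q' \
        'conky 2050 0.0 0.1 129328 6112 ? S 11:55 0:06 tint2' \
        'rahmu 3233 0.0 0.0 7592 840 pts/1 S+ 16:55 0:00 grep conky'
}

# Print the PID only when the command name itself is exactly "conky".
pids=$(sample_ps | awk '$11 == "conky" {print $2}')
echo "$pids"
```

The exact comparison has its own blind spot: a process started as /usr/bin/conky would not match, so it is still worth checking the output before piping it into kill.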
Best Answer
When the data producer (tar) tries to write to the pipe too quickly for the consumer (lzip) to have time to read all of it, it will block until lzip has had time to read what tar is writing. There is a small buffer associated with the pipe, but its size is likely to be smaller than the size of most tar archives. There is no risk of filling up your system's RAM with your pipeline.
"Blocking" simply means that when tar does a call to the write() library function (or equivalent), the call won't return until the data has been delivered to the pipe buffer, which could take a bit of time if lzip is slow to read from that same buffer. You should be able to see this in top where tar would slow down and sleep a lot compared to lzip (assuming tar is in fact quicker than lzip).
You would therefore not fill up a significant amount of RAM with your pipeline. To do that (if you wanted to), you could use something like pv in the middle, with some large buffer (here, a gigabyte):
This would still block tar whenever pv blocks. pv would block when its buffer is full and it can't write to lzip.
The reverse situation works in a similar way, i.e. if you have a slow left-hand side of a pipe writing to a fast right-hand side, the consumer on the right would block on read() until there is data to be read from the pipe.
This (data I/O) is the only thing that synchronises the processes taking part in a pipeline. Apart from reading and writing (and occasionally blocking while waiting for someone else to read or write), they would run independently of each other.
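The blocking behaviour can be observed with a small experiment, sketched here with dd and wc standing in for tar and lzip. The function names produce and consume are made up for the example, and it assumes the pipe buffer is well under 1 MiB (the Linux default is 64 KiB). The producer can only finish once the consumer starts reading, so its reported time should be close to the consumer's two-second delay.

```shell
# Producer: writes 1 MiB, far more than the pipe buffer is assumed to hold,
# then reports on stderr how long its writes took.
produce() {
    start=$(date +%s)
    dd if=/dev/zero bs=1024 count=1024 2>/dev/null
    end=$(date +%s)
    echo "producer finished after $((end - start))s" >&2
}

# Consumer: waits two seconds before reading anything, then counts the bytes.
# tr strips the padding that some wc implementations put around the count.
consume() {
    sleep 2
    wc -c | tr -d ' '
}

bytes=$(produce | consume)
echo "consumer read $bytes bytes"
```

Nothing is lost while the producer waits: the consumer still receives every byte, and memory use stays bounded by the pipe buffer.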