You can use tee and process substitution for this:
cat file.txt | tee >(pbcopy) | grep errors
This will send all the output of cat file.txt to pbcopy, and you'll only get the result of grep on your console.
You can put multiple processes in the tee part:
cat file.txt | tee >(pbcopy) >(do_stuff) >(do_more_stuff) | grep errors
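Since pbcopy only exists on macOS, here is a minimal sketch of the same pattern with portable commands standing in for it. The function name, the sample input and the two side-output files are made up for illustration, and it assumes bash (or another shell that supports process substitution).

```shell
#!/usr/bin/env bash
# Sketch only: fan_out, full_copy.txt and line_count.txt are hypothetical names.
fan_out() {
    # $1 = input file, $2 = directory that receives the side outputs
    tee >(cat > "$2/full_copy.txt") >(wc -l > "$2/line_count.txt") < "$1" | grep errors
}

workdir=$(mktemp -d)
printf '%s\n' 'all good' 'errors: disk full' 'still good' 'errors: timeout' > "$workdir/file.txt"

# Only the grep result comes back on STDOUT; the full stream goes to the
# two side processes. They run asynchronously, so their files may be
# completed slightly after the pipeline itself has returned.
matches=$(fan_out "$workdir/file.txt" "$workdir")
printf '%s\n' "$matches"
```

Each >(...) gets its own complete copy of the input, independent of what grep filters out further down the pipe.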
The second example:
find . -name '*.txt' -print0 | xargs -0 cat > out.txt
is completely legal and will recreate the file out.txt each time it's run, while the first will append to out.txt if it runs. But both commands are doing essentially the same thing.
What's confusing the issue is the xargs -0 cat. People think that the redirect to out.txt is part of each cat command when it isn't. The redirect applies to xargs -0 cat as a whole: xargs takes the file names in via STDIN, runs cat on them, and everything cat writes goes out to STDOUT as a single stream. The xargs is optimizing the cat'ing of the files, not their output.
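A quick way to check this is to force xargs to start one cat per file with -n 1 and see that the file still ends up with everything. This is a minimal sketch in a throwaway directory; the file names and the combined.out target are made up, and the output deliberately does not end in .txt so that find does not pick it up as an input.

```shell
# Sketch only: demo_dir and its contents are hypothetical.
demo_dir=$(mktemp -d)
out_file="$demo_dir/combined.out"

printf 'one\n'   > "$demo_dir/a.txt"
printf 'two\n'   > "$demo_dir/b.txt"
printf 'three\n' > "$demo_dir/c.txt"

# -n 1 makes xargs run a separate cat for every file. The shell opens
# (and truncates) combined.out once, for xargs, before any cat starts;
# every cat then inherits that same open STDOUT.
find "$demo_dir" -name '*.txt' -print0 | xargs -0 -n 1 cat > "$out_file"
first_run=$(wc -l < "$out_file" | tr -d ' ')

# Running it again recreates the file instead of growing it.
find "$demo_dir" -name '*.txt' -print0 | xargs -0 -n 1 cat > "$out_file"
second_run=$(wc -l < "$out_file" | tr -d ' ')

echo "lines after first run: $first_run, after second run: $second_run"
```

If the redirect belonged to each individual cat, the file would only hold whatever the last cat wrote.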
Here's an example that shows what I'm saying. If we insert a pv -l in between the xargs -0 cat and the output to the file out.txt, we can see how many lines cat has written.
Example
To show this I created a directory with 10,000 files in it.
for i in `seq -w 1 10000`;do echo "contents of file$i.txt" > file$i.txt;done
Each file looks similar to this:
$ more file00001.txt
contents of file00001.txt
The output from pv:
$ find . -name '*.txt' -print0 | xargs -0 cat | pv -l > singlefile.rpt
10k 0:00:00 [31.1k/s] [ <=>
As we can see, 10k lines were written out to my singlefile.rpt file. If xargs were passing us chunks of output, then we'd see that as a reduction in the number of lines presented to pv.
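If pv isn't installed, wc -l gives the same kind of check. The sketch below is a scaled-down version of the test above, with -n 7 added so that xargs is certain to split the work across several cat invocations; the temporary directory and the batch size are arbitrary choices for illustration.

```shell
# Sketch only: 100 small files instead of 10,000, in a temporary directory.
test_dir=$(mktemp -d)
for i in $(seq -w 1 100); do
    echo "contents of file$i.txt" > "$test_dir/file$i.txt"
done

# -n 7 caps each cat at 7 files, so xargs has to run cat 15 times.
# wc -l stands in for pv -l and counts what arrives on the single stream.
line_count=$(find "$test_dir" -name '*.txt' -print0 | xargs -0 -n 7 cat | wc -l | tr -d ' ')
echo "$line_count lines"
```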
Best Answer
The cat foo bar example is not what I meant. Here cat only has one input and one output at a time.
tee is an example: it outputs to all the arguments, plus its standard output, at the same time. Using the same kind of ASCII art diagram as in my previous answer, here's what tee foo bar looks like when it's operating in a terminal.
In this example, tee is sending “useful” output to three channels: to the terminal (because that's where its standard output is connected), and to two files. In addition, tee has one more output channel for errors.
A program normally has three input/output channels, identified by their file descriptor number:
0: standard input
1: standard output
2: standard error
The purpose of file descriptors 0, 1 and 2 is only a matter of convention — nothing enforces that a program cannot attempt to write to file descriptor 0 or read from descriptors 1 and 2 — but this is a convention that is pretty much universally followed.
If you run a program from a terminal, file descriptors 0, 1 and 2 start out connected to that terminal, unless they have been redirected. Other file descriptors start out closed, and will be used if the program opens other files.
In particular, all commands have two outputs: standard output (for the command's payload, the “useful” output), and standard error (for error or informational messages).
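The split between the two outputs is easy to see by capturing each one separately. A minimal sketch, with a made-up function standing in for a real command:

```shell
# Sketch only: report is a hypothetical stand-in for any command.
report() {
    echo "payload line"                # standard output (fd 1)
    echo "something went wrong" >&2    # standard error (fd 2)
}

# Capture fd 1 only; fd 2 is discarded.
only_stdout=$(report 2>/dev/null)

# Capture fd 2 only: first point fd 2 at the capture pipe, then send fd 1 away.
only_stderr=$(report 2>&1 >/dev/null)

echo "stdout carried: $only_stdout"
echo "stderr carried: $only_stderr"
```

The order of the two redirections in the second capture matters: 2>&1 copies wherever fd 1 points at that moment, before fd 1 is moved to /dev/null.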
A pipeline in the shell (command1 | command2 | command3 | …) connects each command's standard output to the next command's standard input. All commands' standard error goes to the terminal (unless redirected).
Shells provide ways to redirect other file descriptors. You've probably encountered 2>&1 or 2>file to redirect standard error. See "When would you use an additional file descriptor?" and the other posts it links to for examples of manipulations of other file descriptors.
Feature-rich shells also offer process substitution to generalize file redirection to piped commands, so that you aren't limited to a linear pipe with each command having a single input and a single output.
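The redirections mentioned above can be sketched as follows. The made-up function below writes to descriptors 1, 2 and 3, and the caller decides where each one goes. Descriptor 3 has no conventional meaning; using it for status messages here is just an assumption for the example.

```shell
# Sketch only: worker and the log file names are hypothetical.
worker() {
    echo "result"           # fd 1: payload
    echo "warning" >&2      # fd 2: errors
    echo "50% done" >&3     # fd 3: status (must be opened by the caller)
}

logs=$(mktemp -d)

# Route each descriptor to its own file.
worker > "$logs/out.log" 2> "$logs/err.log" 3> "$logs/status.log"

# A pipe carries only fd 1 unless fd 2 is merged into it with 2>&1.
piped_plain=$(worker 3>/dev/null 2>/dev/null | wc -l | tr -d ' ')
piped_merged=$(worker 3>/dev/null 2>&1 | wc -l | tr -d ' ')

echo "lines through the pipe: $piped_plain without 2>&1, $piped_merged with it"
```

Calling worker without opening descriptor 3 would make the last echo fail with a "bad file descriptor" error, which is why every call above redirects it somewhere.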
Very few commands attempt to access file descriptors above 2, except after they've opened a file (opening a file chooses a free file descriptor and returns its number to the application). One example is GnuPG, which expects to read the data to encrypt/decrypt/sign/verify on its standard input and to write the result to standard output. It can be told to read a passphrase on a different file descriptor with the --passphrase-fd option. GnuPG also has options to report status data on other file descriptors, so you can have the payload output on stdout, error messages on stderr, and status information on another file descriptor. Here's an example where the output of a piped command is used as a passphrase: