How a Command Can Have Multiple Outputs

io-redirection stdin stdout

In this answer, at the very bottom, Gilles mentions that a command can have more than one output or input.

Yes, there's cat foo bar | something, for having both foo and bar as inputs, and there's tee for outputs; but this doesn't really seem to be what he's talking about.

How can a program have more than one input or output?

Best Answer

The cat foo bar example is not what I meant. Here cat only has one input and one output at a time.

tee is an example: it writes to every file named in its arguments and to its standard output at the same time. Using the same kind of ASCII art diagram as in my previous answer, here's what tee foo bar looks like when it's operating in a terminal.

   +------------------+    
   |       tee        |    
===|<stdin            |         +------------+
→  |                  |         |  terminal  |
   |           stdout>|=========|<input      |
   |                  |   → ##==|<           |
   |                  |     ||  +------------+
   |           stderr>|=====##
   |                  |   →
   |                  |       +-------------+
   |                3>|=======|> file "foo" |
   |                  |   →   +-------------+
   |                  |       +-------------+
   |                4>|=======|> file "bar" |
   |                  |   →   +-------------+
   |                  |    
   +------------------+

In this example, tee is sending “useful” output to three channels: to the terminal (because that's where its standard output is connected), and to two files. In addition, tee has one more output channel for errors.
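To see the three “useful” outputs of tee for yourself, here is a minimal sketch. The temporary directory and the variable name seen_on_stdout are only there for illustration; the command substitution stands in for the terminal so that the copy arriving on standard output can be inspected.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)

# tee copies its stdin to stdout and to each file named as an argument.
# The command substitution captures what would otherwise reach the terminal.
seen_on_stdout=$(printf 'hello\n' | tee "$tmpdir/foo" "$tmpdir/bar")

printf 'stdout: %s\n' "$seen_on_stdout"
printf 'foo:    %s\n' "$(cat "$tmpdir/foo")"
printf 'bar:    %s\n' "$(cat "$tmpdir/bar")"
```

All three lines should show the same text, one copy per output channel.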

A program normally has three input/output channels, identified by their file descriptor number:

standard input (stdin for short, file descriptor number 0);
standard output (stdout for short, file descriptor number 1);
standard error (stderr for short, file descriptor number 2).

The purpose of file descriptors 0, 1 and 2 is only a matter of convention — nothing enforces that a program cannot attempt to write to file descriptor 0 or read from descriptors 1 and 2 — but this is a convention that is pretty much universally followed.

If you run a program from a terminal, file descriptors 0, 1 and 2 start out connected to that terminal, unless they have been redirected. Other file descriptors start out closed, and will be used if the program opens other files.

In particular, all commands have two outputs: standard output (for the command's payload, the “useful” output), and standard error (for error or informational messages).
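A small sketch of those two outputs being separated. The function report is a hypothetical stand-in for any command that prints both a payload and a diagnostic; the file names are arbitrary.

```shell
# Hypothetical command: payload on stdout (fd 1), diagnostic on stderr (fd 2).
report() {
    echo "payload line"
    echo "warning: something minor" >&2
}

tmpdir=$(mktemp -d)

# Send each output channel to its own file.
report >"$tmpdir/out.txt" 2>"$tmpdir/err.txt"

printf 'stdout got: %s\n' "$(cat "$tmpdir/out.txt")"
printf 'stderr got: %s\n' "$(cat "$tmpdir/err.txt")"
```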

A pipeline in the shell (command1 | command2 | command3 | …) connects each command's standard output to the next command's standard input. All commands' standard error goes to the terminal (unless redirected).
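The following sketch illustrates that only standard output travels through the pipe. The function noisy is made up for the demonstration, and standard error is pointed at a file rather than the terminal only so that it can be inspected afterwards.

```shell
# Hypothetical command that writes one line to stdout and one to stderr.
noisy() {
    echo "data"
    echo "diagnostic" >&2
}

tmpdir=$(mktemp -d)

# Only stdout enters the pipe; stderr bypasses tr and lands in err.txt unchanged.
piped=$(noisy 2>"$tmpdir/err.txt" | tr '[:lower:]' '[:upper:]')
bypassed=$(cat "$tmpdir/err.txt")

# 2>&1 points stderr at the pipe as well, so both lines reach tr.
merged=$(noisy 2>&1 | tr '[:lower:]' '[:upper:]')

printf 'piped:    %s\n' "$piped"
printf 'bypassed: %s\n' "$bypassed"
printf 'merged:   %s\n' "$merged"
```

In the first pipeline the diagnostic keeps its lower case because tr never sees it; in the second, both lines come out upper-cased.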

Shells provide ways to redirect other file descriptors. You've probably encountered 2>&1 or 2>file to redirect standard error. See When would you use an additional file descriptor? and the other posts it links to for examples of manipulations of other file descriptors.
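As a rough illustration of using a descriptor above 2, the sketch below opens file descriptor 3 on a log file, writes to it alongside the normal output, and closes it again. The file name status.log is an arbitrary choice.

```shell
tmpdir=$(mktemp -d)

# Open file descriptor 3 for writing; it stays open for the rest of the script.
exec 3>"$tmpdir/status.log"

echo "payload"             # goes to stdout (fd 1) as usual
echo "step 1 done" >&3     # goes to the log through fd 3
echo "step 2 done" >&3

# Close fd 3 once it is no longer needed.
exec 3>&-

cat "$tmpdir/status.log"
```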

Feature-rich shells also offer process substitution to generalize file redirection to piped commands, so that you aren't limited to a linear pipe with each command having a single input and a single output.
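For instance, here is a sketch in which one command reads from two other commands at once. Process substitution is not part of POSIX sh, so the example assumes bash is installed and invokes it explicitly; the printf commands are placeholders for any two producers.

```shell
# Each <(...) runs a command and hands paste a file name connected to its output,
# so paste has two inputs fed by two different commands.
merged=$(bash -c 'paste -d: <(printf "a\nb\n") <(printf "1\n2\n")')

printf '%s\n' "$merged"
```

Here paste joins the first line of each producer, then the second, which a linear pipe could not express directly.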

Very few commands attempt to access file descriptors above 2, except after they've opened a file (opening a file chooses a free file descriptor and returns its number to the application). One example is GnuPG, which expects to read the data to encrypt/decrypt/sign/verify on its standard input and to write the result to standard output. It can be told to read a passphrase on a different file descriptor with the --passphrase-fd option. GnuPG also has options to report status data on other file descriptors, so you can have the payload output on stdout, error messages on stderr, and status information on another file descriptor. Here's an example where the output of a piped command is used as a passphrase:

echo fjbeqsvfu | rot13 | gpg -d --passphrase-fd=3 3<&0 <file.encrypted >file.plaintext
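Redirections are processed from left to right, so in the command above 3<&0 first makes descriptor 3 a copy of the pipe, and only then is standard input re-pointed at file.encrypted. The same plumbing can be tried without GnuPG. In this sketch, unlock is a hypothetical stand-in that reads a secret from descriptor 3 and its payload from standard input; it is not how gpg is implemented.

```shell
# Hypothetical stand-in for a tool with a --passphrase-fd=3 style option.
unlock() {
    IFS= read -r secret <&3    # first line arriving on fd 3 is the secret
    payload=$(cat)             # everything on stdin is the payload
    printf 'payload=%s secret=%s\n' "$payload" "$secret"
}

tmpdir=$(mktemp -d)
printf 'message\n' >"$tmpdir/input"

# 3<&0 duplicates the pipe onto fd 3, then stdin is redirected to the file.
result=$(printf 'hunter2\n' | unlock 3<&0 <"$tmpdir/input")

printf '%s\n' "$result"
```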

Related Solutions

shell – How to Send stdout to Multiple Commands

You can use tee and process substitution for this:

cat file.txt | tee >(pbcopy) | grep errors

This will send all the output of cat file.txt to pbcopy, and you'll only get the result of grep on your console.

You can put multiple processes in the tee part:

cat file.txt | tee >(pbcopy) >(do_stuff) >(do_more_stuff) | grep errors

Shell – Concatenating Thousands of Files: > vs >>

The second example:

find . -name '*.txt' -print0 | xargs -0 cat > out.txt

is completely legal and will recreate the file out.txt each time it's run, while the first (the one using >>) will append to out.txt each time it runs. Apart from that, both commands are doing essentially the same thing.
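The difference between the two redirection operators can be checked with a quick sketch. The file names are arbitrary, and the loop simply simulates running the same command twice.

```shell
tmpdir=$(mktemp -d)

# Run the same write twice, once with each operator.
for run in 1 2; do
    echo "line" >  "$tmpdir/truncate.txt"   # > recreates the file on every run
    echo "line" >> "$tmpdir/append.txt"     # >> adds to whatever is already there
done

# tr strips the padding that some wc implementations add.
truncate_lines=$(wc -l <"$tmpdir/truncate.txt" | tr -d ' ')
append_lines=$(wc -l <"$tmpdir/append.txt" | tr -d ' ')

printf 'with >:  %s line(s)\n' "$truncate_lines"
printf 'with >>: %s line(s)\n' "$append_lines"
```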

What's confusing the issue is the xargs -0 cat. People think that the redirect to out.txt is part of that command when it isn't. The shell sets up the redirect once for the whole xargs -0 cat: xargs takes the file names in via STDIN, runs cat on them, and everything cat prints goes out as a single stream to STDOUT, which is connected to out.txt. The xargs is optimizing how cat is invoked on the files, not their output.

Here's an example that kind of shows what I'm saying. If we insert a pv -l in between the xargs -0 cat and the output to the file out.txt we can see how many lines cat has written.

Example

To show this I created a directory with 10,000 files in it.

for i in $(seq -w 1 10000); do echo "contents of file$i.txt" > "file$i.txt"; done

Each file looks similar to this:

$ more file00001.txt 
contents of file00001.txt

The output from pv:

$ find . -name '*.txt' -print0 | xargs -0 cat | pv -l > singlefile.rpt
  10k 0:00:00 [31.1k/s] [  <=>

As we can see, 10k lines were written out to my singlefile.rpt file. If xargs were passing us chunks of output, then we'd see that by a reduction in the number of lines that were being presented to pv.
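The same point can be checked on a smaller scale by forcing xargs to split the work. In this sketch, -n 2 is an assumption added for the demonstration so that cat is run several times; the five files and the name combined.out are made up.

```shell
tmpdir=$(mktemp -d)

# Create five one-line files.
for i in 1 2 3 4 5; do
    echo "contents of file$i.txt" >"$tmpdir/file$i.txt"
done

# -n 2 makes xargs run cat three times (2 + 2 + 1 files), yet the redirect is
# opened once by the shell, so every cat writes into the same stream.
find "$tmpdir" -name 'file*.txt' -print0 | xargs -0 -n 2 cat >"$tmpdir/combined.out"

line_count=$(wc -l <"$tmpdir/combined.out" | tr -d ' ')
printf 'lines collected: %s\n' "$line_count"
```

Nothing is lost between the separate cat invocations: each one inherits the already-open output file and continues where the previous one stopped.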
