A pipe is used to pass output to another program or utility.
A redirect is used to pass output to a file or stream.
Example: `thing1 > thing2` vs. `thing1 | thing2`
`thing1 > thing2`

- Your shell will run the program named `thing1`.
- Everything that `thing1` outputs will be placed in a file called `thing2`. (Note: if `thing2` already exists, it will be overwritten.)
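A quick sketch of that overwrite behavior, and its counterpart `>>` which appends instead (the file name `demo.txt` is just for the demo):

```shell
echo "first"  > demo.txt    # create demo.txt containing "first"
echo "second" > demo.txt    # `>` truncates: demo.txt now holds only "second"
echo "third" >> demo.txt    # `>>` appends instead of overwriting
contents=$(cat demo.txt)    # "second" followed by "third" - "first" is gone
rm demo.txt
```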
If you want to pass the output from program `thing1` to a program called `thing2`, you could do the following:

thing1 > temp_file && thing2 < temp_file
which would:

- run the program named `thing1`
- save the output into a file named `temp_file`
- run the program named `thing2`, pretending that the person at the keyboard typed the contents of `temp_file` as the input.
However, that's clunky, so they made pipes as a simpler way to do that. `thing1 | thing2` does the same thing as `thing1 > temp_file && thing2 < temp_file`.
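The equivalence can be checked with real commands; here `tr` stands in for `thing2` (the variable and file names are just for the demo):

```shell
# The clunky version: save output to a temporary file, then read it back in.
echo "hello" > temp_file
with_file=$(tr 'a-z' 'A-Z' < temp_file)
rm temp_file

# The pipe version: same result, no temporary file to clean up.
with_pipe=$(echo "hello" | tr 'a-z' 'A-Z')
```

Both produce `HELLO`; the pipe simply skips the intermediate file.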
EDIT to provide more details to the question in the comments:
If `>` tried to be both "pass to program" and "write to file", it could cause problems in both directions.
First example: you are trying to write to a file, and a file with that name already exists that you wish to overwrite. However, the file is executable. Presumably, the shell would try to execute this file, passing it the output. You'd have to do something like write the output to a new filename, then rename the file.
Second example: as Florian Diesch pointed out, what if there's another command elsewhere on the system with the same name (that is in the execution path)? If you intended to make a file with that name in your current folder, you'd be stuck.
Third example: if you mistype a command, it wouldn't warn you that the command doesn't exist. Right now, if you type `ls | gerp log.txt`, it will tell you `bash: gerp: command not found`. If `>` meant both, it would simply create a new file for you (then warn that it doesn't know what to do with `log.txt`).
The way the question reads, it sounds like you want one stdin redirected to two different commands. If that's the case, take advantage of `tee` plus process substitution:
some-expensive-command | tee >(grep 'pattern' > output.txt) >(grep -v 'pattern' | another-command)
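A concrete, runnable variant of that pattern (bash-specific, since process substitution is a bash feature; `wc -l` stands in for `another-command` and the file names are just for the demo):

```shell
#!/bin/bash
# Lines matching 'err' are captured into errors.txt by the process
# substitution, while the non-matching lines continue down the main pipe.
kept=$(printf 'err one\nok two\nerr three\nok four\n' \
  | tee >(grep 'err' > errors.txt) \
  | grep -v 'err' | wc -l)
sleep 1   # process substitutions run asynchronously; give grep time to finish
```

After this, `errors.txt` holds the two `err` lines and `$kept` is the count of the others.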
Process substitutions are in fact anonymous pipes implemented within bash itself (at the subprocess level). We can also make use of a named pipe plus `tee`. For instance, in terminal A do
$ mkfifo named.fifo
$ cat /etc/passwd | tee named.fifo | grep 'root'
And in another terminal B do
$ grep -v 'root' named.fifo
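The same two-terminal demo can be run in a single shell by backgrounding the reader so it opens the fifo first (synthetic lines replace `/etc/passwd`, and the output file names are just for the demo):

```shell
#!/bin/bash
mkfifo named.fifo
grep -v 'root' named.fifo > others.txt &    # terminal B's role, backgrounded
printf 'root:x:0\ndaemon:x:1\nrootless:x:2\n' \
  | tee named.fifo | grep 'root' > roots.txt
wait                                        # let the background grep finish
rm named.fifo
```

Opening a fifo for writing blocks until a reader opens it, which is why the reader has to be started first.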
Another way to look at this is to recognize that `grep` is a line-oriented pattern-matching tool, so by reading a line at a time and using that same line in multiple commands we can achieve exactly the same effect:
rm output.txt # get rid of the file so that we don't mix old and new output
some-expensive-command | while IFS= read -r line || [ -n "$line" ]; do
    printf "%s\n" "$line" | grep 'pattern' >> output.txt
    printf "%s\n" "$line" | grep -v 'pattern' | another-command
done
# or if another-command needs all of the output,
# place `| another-command` after the `done` clause
Yet another way is to abandon `grep` and use something more powerful, like `awk`:

some-expensive-command | awk '/pattern/{print >> "output.txt"}; !/pattern/{print}' | another-command
Practically speaking, don't worry about using temporary files, so long as you clean them up after use. If it works, it works.
Best Answer
The key point to remember is that a pipe is an inter-process communication device that allows two processes (and that's what commands really are) to exchange data, while redirection operators manipulate where a particular process reads from or writes to.
In the video Unix Pipeline, Brian Kernighan, the creator of the `awk` language and one of the original people who worked on AT&T Unix, explains:

As you can see, within the context in which pipelines were created, they were not just a communication device; they also saved storage space and simplified development. Sure, we could use output/input redirection for everything (especially nowadays, with storage capacity in the range of terabytes), but that would be inefficient from the storage point of view, and also in processing speed - remember that with `|` you're feeding output from one command directly to another.

Consider something like `command1 | grep 'something'`. If you write the output of `command1` to a file first, it takes time to write everything out, and then `grep` has to go through the whole file. With a pipeline, and the fact that the output is buffered (meaning the left-side process pauses until the right-side process is ready to read again), the output goes directly from one command to the other, saving time.

It is worth noting that for inter-process communication there is a use case for named pipes, where you can use the `>` operator to write from one command and `<` to let another command read from it. That's the case where you do want a particular destination on the filesystem that multiple scripts/commands can agree on and write to. But when that's unnecessary, the anonymous pipe `|` is all you really need.
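That named-pipe use of `>` and `<` can be sketched like this: two commands that know nothing about each other agree on a fifo path, one writing into it and one reading from it (the fifo and file names here are just for the demo):

```shell
#!/bin/bash
mkfifo agreed.fifo
sort < agreed.fifo > sorted.txt &                # reader uses `<`, started first
printf 'banana\napple\ncherry\n' > agreed.fifo   # writer uses a plain `>`
wait                                             # let the background sort finish
rm agreed.fifo
```

The writer's `>` blocks until the reader has opened the fifo, so the two processes rendezvous on the agreed filesystem path.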