A pipe is used to pass output to another program or utility.
A redirect is used to pass output to a file or stream.
Example: `thing1 > thing2` vs. `thing1 | thing2`
`thing1 > thing2`

- Your shell will run the program named `thing1`.
- Everything that `thing1` outputs will be placed in a file called `thing2`. (Note: if `thing2` already exists, it will be overwritten.)
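A quick sketch of that overwrite behavior, and its counterpart `>>` which appends instead (the file name `demo.txt` is just for the demo):

```shell
echo "first"  > demo.txt    # create demo.txt containing "first"
echo "second" > demo.txt    # `>` truncates: demo.txt now holds only "second"
echo "third" >> demo.txt    # `>>` appends instead of overwriting
contents=$(cat demo.txt)    # "second" followed by "third" - "first" is gone
rm demo.txt
```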
If you want to pass the output from program `thing1` to a program called `thing2`, you could do the following:

thing1 > temp_file && thing2 < temp_file
which would:

- run the program named `thing1`
- save the output into a file named `temp_file`
- run the program named `thing2`, pretending that the person at the keyboard typed the contents of `temp_file` as the input.
However, that's clunky, so they made pipes as a simpler way to do that. `thing1 | thing2` does the same thing as `thing1 > temp_file && thing2 < temp_file`.
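The equivalence can be checked with real commands; here `tr` stands in for `thing2` (the variable and file names are just for the demo):

```shell
# The clunky version: save output to a temporary file, then read it back in.
echo "hello" > temp_file
with_file=$(tr 'a-z' 'A-Z' < temp_file)
rm temp_file

# The pipe version: same result, no temporary file to clean up.
with_pipe=$(echo "hello" | tr 'a-z' 'A-Z')
```

Both produce `HELLO`; the pipe simply skips the intermediate file.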
EDIT to provide more details to the question in the comments:
If `>` tried to be both "pass to program" and "write to file", it could cause problems in both directions.
First example: you are trying to write to a file, and a file with that name already exists that you wish to overwrite. However, the file is executable. Presumably, the shell would try to execute this file, passing it the output. You'd have to do something like write the output to a new filename, then rename the file.
Second example: as Florian Diesch pointed out, what if there's another command elsewhere on the system with the same name (that is in the execution path)? If you intended to make a file with that name in your current folder, you'd be stuck.
Third example: if you mistype a command, it wouldn't warn you that the command doesn't exist. Right now, if you type `ls | gerp log.txt`, it will tell you `bash: gerp: command not found`. If `>` meant both, it would simply create a new file for you (then warn that it doesn't know what to do with `log.txt`).
The way the question reads, it sounds like you want one stdin redirected to two different commands. If that's the case, take advantage of `tee` plus process substitution:
some-expensive-command | tee >(grep 'pattern' > output.txt) >(grep -v 'pattern' | another-command)
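A concrete, runnable variant of that pattern (bash-specific, since process substitution is a bash feature; `wc -l` stands in for `another-command` and the file names are just for the demo):

```shell
#!/bin/bash
# Lines matching 'err' are captured into errors.txt by the process
# substitution, while the non-matching lines continue down the main pipe.
kept=$(printf 'err one\nok two\nerr three\nok four\n' \
  | tee >(grep 'err' > errors.txt) \
  | grep -v 'err' | wc -l)
sleep 1   # process substitutions run asynchronously; give grep time to finish
```

After this, `errors.txt` holds the two `err` lines and `$kept` is the count of the others.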
Process substitutions are in fact anonymous pipes implemented within bash itself (at the subprocess level). We can also make use of a named pipe plus `tee`. For instance, in terminal A do
$ mkfifo named.fifo
$ cat /etc/passwd | tee named.fifo | grep 'root'
And in another terminal B do
$ grep -v 'root' named.fifo
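The same two-terminal demo can be run in a single shell by backgrounding the reader so it opens the fifo first (synthetic lines replace `/etc/passwd`, and the output file names are just for the demo):

```shell
#!/bin/bash
mkfifo named.fifo
grep -v 'root' named.fifo > others.txt &    # terminal B's role, backgrounded
printf 'root:x:0\ndaemon:x:1\nrootless:x:2\n' \
  | tee named.fifo | grep 'root' > roots.txt
wait                                        # let the background grep finish
rm named.fifo
```

Opening a fifo for writing blocks until a reader opens it, which is why the reader has to be started first.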
Another way to look at this is to recognize that `grep` is a line-oriented pattern-matching tool, so by reading a line at a time and using that same line in multiple commands we can achieve exactly the same effect:
rm output.txt # get rid of the file so that we don't mix old and new output
some-expensive-command | while IFS= read -r line || [ -n "$line" ]; do
    printf "%s\n" "$line" | grep 'pattern' >> output.txt
    printf "%s\n" "$line" | grep -v 'pattern' | another-command
done
# or if another-command needs all of the output,
# place `| another-command` after the `done` clause
Yet another way is to abandon `grep` and use something more powerful, like `awk`:

some-expensive-command | awk '/pattern/{print >> "output.txt"}; !/pattern/{print}' | another-command
Practically speaking, don't worry about using temporary files, so long as you clean them up after use. If it works, it works.
Best Answer
The key point to remember is that a pipe is an inter-process communication device that allows two processes (and that's what commands really are) to exchange data, while redirection operators manipulate where a particular process reads from or writes to.
In the video Unix Pipeline, Brian Kernighan, the creator of the `awk` language and one of the original people who worked on AT&T Unix, explains:

As you can see, within the context in which pipelines were created, they were not just a communication device; they also saved storage space and simplified development. Sure, we could use output/input redirection for everything (especially nowadays, with storage capacity in the range of terabytes), but that would be inefficient from the storage point of view, and also in processing speed - remember that with `|` you're feeding output from one command directly to another.

Consider something like `command1 | grep 'something'`. If you write the output of `command1` to a file first, it takes time to write everything out, and then `grep` has to go through the whole file. With a pipeline, and the fact that the output is buffered (meaning the left-side process pauses until the right-side process is ready to read again), the output goes directly from one command to the other, saving time.

It is worth noting that for inter-process communication there is a use case for named pipes, where you can use the `>` operator to write from one command and `<` to let another command read from it. That's the case where you do want a particular destination on the filesystem that multiple scripts/commands can agree on and write to. But when that's unnecessary, the anonymous pipe `|` is all you really need.
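That named-pipe use of `>` and `<` can be sketched like this: two commands that know nothing about each other agree on a fifo path, one writing into it and one reading from it (the fifo and file names here are just for the demo):

```shell
#!/bin/bash
mkfifo agreed.fifo
sort < agreed.fifo > sorted.txt &                # reader uses `<`, started first
printf 'banana\napple\ncherry\n' > agreed.fifo   # writer uses a plain `>`
wait                                             # let the background sort finish
rm agreed.fifo
```

The writer's `>` blocks until the reader has opened the fifo, so the two processes rendezvous on the agreed filesystem path.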