A lot of command-line utilities can take their input either from a pipe or as a filename argument. For long shell scripts, I find starting the chain off with a cat makes it more readable, especially if the first command would need multi-line arguments.
Compare
sed s/bla/blaha/ data \
| grep blah \
| grep -n babla
and
cat data \
| sed s/bla/blaha/ \
| grep blah \
| grep -n babla
Is the latter method less efficient? If so, is the difference enough to care about if the script is run, say, once a second? The difference in readability is not huge.
Best Answer
The "definitive" answer is of course brought to you by The Useless Use of cat Award.
Instantiating cat just so your code reads differently makes for just one more process and one more set of input/output streams that are not needed. Typically the real hold-up in your scripts is going to be inefficient loops and the actual processing. On most modern systems, one extra cat is not going to kill your performance, but there is almost always another way to write your code.
Most programs, as you note, are able to accept an argument for the input file. However, there is always the shell's < redirection operator, which can be used wherever a STDIN stream is expected and which saves you one process by doing the work in the shell process that is already running.
You can even get creative with WHERE you write it. Normally it would be placed at the end of a command, before you specify any output redirects or pipes, like this:
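A minimal sketch of that usual placement, reusing the question's first sed command. The scratch directory and the three sample lines written to data are assumptions, added only so the snippet runs on its own.

```shell
# Assumption: work in a scratch directory with a small sample file named data.
cd "$(mktemp -d)" || exit 1
printf 'bla\nblah\nfoo\n' > data

# The redirect sits at the end of the command, just before the pipe.
# The shell opens data itself, so no extra cat process is started.
result=$(sed s/bla/blaha/ < data | grep blah)
printf '%s\n' "$result"
```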
But it doesn't have to be that way. It can even come first. For instance your example code could be written like this:
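One way the question's pipeline might look with the redirect written first. This is a sketch rather than the only possible layout; the scratch directory and the sample contents of data are assumptions, added so the snippet is self-contained.

```shell
# Assumption: work in a scratch directory with a small sample file named data.
cd "$(mktemp -d)" || exit 1
printf 'bla\nbabla\nfoo\n' > data

# The redirect comes first, so the input file leads the chain the way
# "cat data" did, but the shell opens the file and no cat process runs.
result=$(
  < data \
  sed s/bla/blaha/ \
  | grep blah \
  | grep -n babla
)
printf '%s\n' "$result"
```

The layout keeps one command per line, as in the cat version, with the data source still on the first line.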
If script readability is your concern and your code is messy enough that adding a line for cat is expected to make it easier to follow, there are other ways to clean up your code. One that I use a lot, and that helps make scripts easy to figure out later, is breaking up pipes into logical sets and saving them in functions. The script code then becomes very natural, and any one part of the pipeline is easier to debug. You could then continue with
fix_blahs < data | fix_frogs | reorder | format_for_sql
A pipeline that reads like that is really easy to follow, and the individual components can be debugged easily in their respective functions.
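The function approach could look like the sketch below. Only fix_blahs is defined, with the question's pipeline assumed as its body; fix_frogs, reorder and format_for_sql are left out because nothing here says what they would do. The scratch directory and the sample contents of data are likewise assumptions so the snippet runs on its own.

```shell
# Assumption: work in a scratch directory with a small sample file named data.
cd "$(mktemp -d)" || exit 1
printf 'bla\nbabla\nfoo\n' > data

# fix_blahs wraps one logical group of pipe stages.
# It reads stdin and writes stdout, so it can sit anywhere in a pipeline.
fix_blahs() {
  sed s/bla/blaha/ \
  | grep blah \
  | grep -n babla
}

# The function takes its input through a redirect, just like any other command.
result=$(fix_blahs < data)
printf '%s\n' "$result"
```

Because each function only filters stdin to stdout, it can be tested in isolation by feeding it a small file or a printf, without running the rest of the script.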