The "definitive" answer is of course brought to you by The Useless Use of cat Award.
The purpose of cat is to concatenate (or "catenate") files. If it's only one file, concatenating it with nothing at all is a waste of time, and costs you a process.
Instantiating cat just so your code reads differently costs one more process and one more set of input/output streams that are not needed. Typically the real hold-up in your scripts is going to be inefficient loops and the actual processing. On most modern systems, one extra cat is not going to kill your performance, but there is almost always another way to write your code.
Most programs, as you note, are able to accept an argument for the input file. However, there is always the shell's input redirection operator <, which can be used wherever a STDIN stream is expected. It saves you one process, because the shell process that is already running opens the file itself.
You can even get creative with WHERE you write it. Normally it would be placed at the end of a command before you specify any output redirects or pipes like this:
sed s/blah/blaha/ < data | pipe
But it doesn't have to be that way. It can even come first. For instance your example code could be written like this:
< data \
sed s/bla/blaha/ |
grep blah |
grep -n babla
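As a quick check that these spellings are interchangeable, here is a minimal sketch that runs the same sed substitution three ways and keeps the results for comparison. The sample file contents and the variable names are made up for illustration.

```shell
# Hypothetical sample data, written to a temporary file.
data=$(mktemp)
printf 'bla one\nfoo two\n' > "$data"

# 1. Useless use of cat: an extra process and an extra pipe.
with_cat=$(cat "$data" | sed 's/bla/blaha/')

# 2. Redirect at the end of the command.
redirect_last=$(sed 's/bla/blaha/' < "$data")

# 3. Redirect at the start of the command.
redirect_first=$(< "$data" sed 's/bla/blaha/')

printf '%s\n' "$redirect_first"
rm -f "$data"
```

All three variables should end up holding the same text; only the first form starts a cat process to get there.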
If script readability is your concern and your code is messy enough that adding a line for cat is expected to make it easier to follow, there are other ways to clean up your code. One that I use a lot, and that makes scripts easy to figure out later, is breaking up pipes into logical sets and saving them in functions. The script code then becomes very natural, and any one part of the pipeline is easier to debug.
function fix_blahs () {
sed s/bla/blaha/ |
grep blah |
grep -n babla
}
fix_blahs < data
You could then continue with fix_blahs < data | fix_frogs | reorder | format_for_sql. A pipeline that reads like that is really easy to follow, and the individual components can be debugged easily in their respective functions.
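To make the function approach concrete, here is a small sketch in the same spirit. The stage names (drop_comments, to_upper, sort_unique) and the sample data are hypothetical stand-ins, not the functions named above. Each stage simply reads standard input and writes standard output, and the definitions use the portable name() form.

```shell
# Each stage is a filter: stdin in, stdout out.
drop_comments() { grep -v '^#'; }
to_upper()      { tr '[:lower:]' '[:upper:]'; }
sort_unique()   { sort -u; }

# Hypothetical input file.
data=$(mktemp)
printf '# header\nfrog\nblah\nfrog\n' > "$data"

# The whole pipeline reads as a sentence, with no cat needed.
result=$(drop_comments < "$data" | to_upper | sort_unique)

# Any single stage can be tested in isolation.
stage_check=$(printf 'frog\n' | to_upper)

printf '%s\n' "$result"
rm -f "$data"
```

Because every stage has the same stdin-to-stdout shape, a misbehaving step can be fed a line or two by hand without running the rest of the pipeline.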
I recommend reading a book on unix or Linux shell and command line usage, in order to learn basic usage and get a feeling for some advanced features. Then you can turn to reference documentation.
The usage of specific commands is described in their manual. man cat will show the manual of the cat command on your system. Manual pages are usually references, not tutorials, though they often contain examples. On Linux, cat --help shows a terse usage message (meant for quick perusal when you already know the fundamentals and want to find an option for a specific task).
The POSIX standard specifies a minimum set of commands, options and shell features that every Unix system is supposed to support. Most current systems by and large support POSIX:2004 (also known as Single UNIX Specification version 3 and the Open Group Base Specifications Issue 6). GNU software (the utilities found on Linux) often has many extensions to this minimum set.
There are common conventions for command-line arguments. POSIX specifies utility conventions that most utilities follow, in particular:
- Options consist of - followed by a single letter; -ab is shorthand for -a -b.
- -- signifies the end of options. For example, in rm -- -a, -a is not an option but an operand, i.e. a file to act upon, so this command removes the file called -a.
- A lone - stands for standard input where an input file is expected. It stands for standard output where an output file is expected.
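The three conventions above can be tried out with a short sketch like the following. The temporary directory, the file names and the choice of cat and sort as test utilities are assumptions made for illustration only.

```shell
dir=$(mktemp -d)

# A file literally named "-a" (created via a path so the name is unambiguous).
printf 'hello\n' > "$dir/-a"

# After --, -a is an operand (a file name), not an option.
dash_content=$(cd "$dir" && cat -- -a)
(cd "$dir" && rm -- -a)

# A lone - means standard input where an input file is expected.
printf 'first\n' > "$dir/head.txt"
printf 'last\n' > "$dir/tail.txt"
combined=$(printf 'middle\n' | cat "$dir/head.txt" - "$dir/tail.txt")

# Bundled single-letter options: -ru is shorthand for -r -u.
bundled=$(printf 'b\na\nb\n' | sort -ru)
separate=$(printf 'b\na\nb\n' | sort -r -u)

rm -rf "$dir"
```

Here cat splices its standard input between the two named files, and the bundled and separate spellings of the sort options should give identical output.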
GNU utilities and others also support “long options” of the form --name. Some utilities go against the general convention and take multi-letter options with a single leading dash: -name.
Redirection is a shell feature, so you'll find it in your shell's manual. The <<< operator, which uses a string as standard input, is a ksh extension, also supported by bash and zsh. As long as the shell supports it, it can be used on any command.
Best Answer
The cat file | command syntax is considered a Useless Use of Cat. Of all your options, it takes a performance hit because it has to spawn another process. However insignificant this may turn out to be in the big picture, it's overhead the other forms don't have. This has been covered on questions such as: Should I care about unnecessary cats?
Between the other two forms there are virtually no performance differences. With a redirect, the shell opens the file and the process reads it through STDIN just as it would read any other file; passing a file name instead just makes the process open the file itself.
The difference would be in what features / flexibility you are looking for.
- sed -i for in-place editing, which needs the file name as an argument. (Note: since this has to create a new file behind the scenes it's not a performance gain over other redirects but it is a convenience step.)
- sed [exp] < file1 file2 or even sed [exp] < <(grep command). Details of this use case can be found on this question: Process substitution and pipe
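One visible difference between the argument and redirect forms can be sketched with a hypothetical temporary file. When the file name is passed as an argument the program knows the name, whereas with a redirect it only sees an anonymous stream on standard input. wc makes this easy to observe, because it prints a name only when it was given one.

```shell
# Hypothetical three-line sample file.
data=$(mktemp)
printf 'one\ntwo\nthree\n' > "$data"

# File name as an argument: wc opens the file itself and reports its name.
by_name=$(wc -l "$data")

# Redirect: the shell opens the file; wc just reads standard input.
by_stdin=$(wc -l < "$data")

printf 'argument: %s\nredirect: %s\n' "$by_name" "$by_stdin"
rm -f "$data"
```

The same distinction is why features that need to reopen or rewrite the input, such as in-place editing, only work with the argument form.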