Shell – Why do some commands not read from their standard input

I wonder when we should use a pipeline and when we shouldn't.

Say, for instance, that I want to kill the processes that are handling PDF files. The following pipeline does not work:

ps aux | grep pdf | awk '{print $2}' | kill

Instead, we can only do it in one of the following ways:

kill $(ps aux | grep pdf | awk '{print $2}')

or

ps aux | grep pdf | awk '{print $2}' | xargs kill

According to man bash (version 4.1.2):

The standard output of command is connected via a pipe to the standard input of command2.

For the above scenario:

  • the stdin of grep is the stdout of ps. That works.
  • the stdin of awk is the stdout of grep. That works.
  • the stdin of kill is the stdout of awk. That doesn't work.

So the stdin of each following command always gets its input from the previous command's stdout.
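You can check this directly: the pipe really is connected to kill's stdin, but kill simply never reads from it. In the sketch below, 12345 is just a made-up PID.

echo 12345 | kill          # prints a usage error; the piped PID is never read
echo 12345 | xargs kill    # xargs turns the PID into an argument, so kill at least tries to use it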

  • Why doesn't it work with kill or rm?
  • What's the difference between the input of kill and rm and the input of grep and awk?
  • Are there any rules?

Best Answer

This is an interesting question, and it deals with a part of the Unix/Linux philosophy.

So, what is the difference between programs like grep, sed, sort on the one hand and kill, rm, ls on the other hand? I see two aspects.

The filter aspect

  • The first kind of program is also known as a filter. Filters take input, either from a file or from STDIN, modify it, and produce output, mostly to STDOUT. They are meant to be used in a pipe, with other programs as sources and destinations.

  • The second kind of program acts on an input, but the output it produces is often not related to that input. kill has no output when it works normally, and neither does rm. They just have an exit status to indicate success. They do not normally take input from STDIN; whatever output they do produce goes to STDOUT.

For programs like ls, the filter aspect does not fit that well. ls can certainly take an input (but does not need one), and its output is closely related to that input, but it does not work as a filter. However, for that kind of program, the other aspect still applies:
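To make the distinction concrete, here is a minimal shell sketch (the function names are made up for illustration). One function behaves as a filter, the other acts only on its arguments:

to_upper() { tr '[:lower:]' '[:upper:]'; }    # a filter: reads STDIN, writes STDOUT
remove_files() { rm -- "$@"; }                # not a filter: acts only on its arguments

printf 'foo\nbar\n' | to_upper        # works: the data flows through the pipe
printf 'foo\nbar\n' | remove_files    # fails with a missing-operand error: the names on stdin are never read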

The semantic aspect

  • For filters, their input has no semantic meaning. They just read data, modify data, output data. It doesn't matter whether this is a list of numeric values, some filenames or HTML source code. The meaning of this data is only given by the code you provide to the filter: the regex for grep, the rules for awk or the Perl program.

  • For other programs, like kill or ls, their input has a meaning, a denotation: kill expects process IDs; ls expects file or path names. They cannot handle arbitrary data and are not meant to. Many of them do not even need any input or parameters, like ps. They do not normally read from STDIN.

One could probably combine these two aspects: A filter is a program whose input does not have a semantic meaning for the program.
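That is also why the two working variants from the question succeed: command substitution lets the shell paste the pipeline's output into kill's argument list before kill even starts, and xargs reads the data from STDIN and hands it to kill as arguments, where it finally gains its meaning as process IDs. A quick way to watch xargs do that conversion (the PIDs below are made up):

printf '123\n456\n' | xargs echo kill    # prints: kill 123 456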

I'm sure I have read about this philosophy somewhere, but I don't remember any sources at the moment, sorry. If someone has sources at hand, please feel free to edit.
