In what order do piped commands run

pipeps

I've never really thought about how the shell actually executes piped commands. I've always been told that the "stdout of one program gets piped into the stdin of another," as a way of thinking about pipes. So naturally, I thought that in the case of say, A | B, A would run first, then B gets the stdout of A, and uses the stdout of A as its input.

But I've noticed that when people search for a particular process in ps, they'd include grep -v "grep" at the end of the command to make sure that grep doesn't appear in the final output.
This means that in the command ps aux | grep "bash" | grep -v "grep" it is implied that ps knew that grep was running and therefore is in the output of ps. But if ps finishes running before its output gets piped to grep, how did it know that grep was running?

flamingtoast@FTOAST-UBUNTU: ~$ ps | grep ".*"
PID TTY          TIME CMD
3773 pts/0    00:00:00 bash
3784 pts/0    00:00:00 ps
3785 pts/0    00:00:00 grep

Best Answer

Piped commands run concurrently. When you run ps | grep …, it's the luck of the draw (or a matter of details of the workings of the shell combined with scheduler fine-tuning deep in the bowels of the kernel) as to whether ps or grep starts first, and in any case they continue to execute concurrently.

This is very commonly used to allow the second program to process data as it comes out from the first program, before the first program has completed its operation. For example

grep pattern very-large-file | tr a-z A-Z

begins to display the matching lines in uppercase even before grep has finished traversing the large file.

grep pattern very-large-file | head -n 1

displays the first matching line, and may stop processing well before grep has finished reading its input file.

If you read somewhere that piped programs run in sequence, flee this document. Piped programs run concurrently and always have.

Related Solutions

Pass the output of previous command to next as an argument

You are confusing two very different types of inputs.

Standard input (stdin)
Command line arguments

These are different, and are useful for different purposes. Some commands can take input in both ways, but they typically use them differently. Take for example the wc command:

Passing input by stdin:
```
ls | wc -l
```
This will count the lines in the output of ls
Passing input by command line arguments:
```
wc -l $(ls)
```
This will count lines in the list of files printed by ls

Completely different things.

To answer your question, it sounds like you want to capture the rate from the output of the first command, and then use the rate as a command line argument for the second command. Here's one way to do that:

rate=$(command1 | sed -ne 's/^rate..\([0-9]*\)%.*/\1/p')
command2 -t "rate was $rate"

Explanation of the sed:

The s/pattern/replacement/ command is to replace some pattern
The pattern means: the line must start with "rate" (^rate) followed by any two character (..), followed by 0 or more digits, followed by a %, followed by the rest of the text (.*)
\1 in the replacement means the content of the first expression captured within $...$, so in this case the digits before the % sign
The -n flag of the sed command means to not print lines by default. The p at the end of the s/// command means to print the line if there was a replacement. In short, the command will print something only if there was a match.

Bash – eval limitation with piped commands

The problem here is actually an issue with the bash parser. There is no workaround other than editing and recompiling bash, and the 3333 limit is likely to be the same on all platforms.

The bash parser is generated with yacc (or, typically, with bison but in yacc mode). yacc parsers are bottom-up parsers, using the LALR(1) algorithm which builds a finite state machine with a pushdown stack. Loosely speaking, the stack contains all not-yet-reduced symbols, along with enough information to decide which productions to use to reduce the symbols.

Such parsers are optimized for left-recursive grammar rules. In the context of an expression grammar, a left-recursive rule applies to a left-associative operator, such as a−b in ordinary mathematics. That's left associative because the expression a−b−c groups ("associates") to the left, making it equal to (a−b)−c rather than a−(b−c). By contrast, exponentiation is right-associative, so that a^{b^c} is by convention evaluated as a^{(b^c)} rather than (a^b)^{^c}.

bash operators are process operators, rather than arithmetic operators; these include short-circuit booleans (&& and ||) and pipes (| and |&), as well as sequencing operators ; and &. Like mathematical operators, most of these associate to the left, but the pipe operators associate to the right, so that cmd1 | cmd2 | cmd3 is parsed as though it were cmd1 | { cmd2 | cmd3 ; } as opposed to { cmd1 | cmd2 ; } | cmd3. (Most of the time the difference is not important, but it is observable. [See Note 1])

To parse an expression which is a sequence of left associative operators, you only need a small parser stack. Every time you hit an operator, you can reduce (parenthesize, if you like) the expression to the left of it. By contrast, parsing an expression which is a sequence of right associative operators requires that you put all of the symbols onto the parser stack until you reach the end of the expression, because only then can you start reducing (inserting parentheses). (That explanation involves quite a bit of hand-waving, since it was intended to be non-technical, but it is based on the working of the real algorithm.)

Yacc parsers will resize their parser stack at runtime, but there is a compile-time maximum stack size, which by default is 10000 slots. If the stack reaches the maximum size, any attempt to expand it will trigger an out-of-memory error. Because | is right associative, an expression of the form:

statement | statement | ... | statement

will eventually trigger this error. If it were parsed in the obvious way, that would happen after 5,000 pipe symbols (with 5,000 statements). But because of the way the bash parser handles newlines, the actual grammar used is (roughly):

pipeline: command '|' optional_newlines pipeline

with the consequence that there is an optional_newlines grammar symbol after every |, so each pipe occupies three stack slots. Hence, the out-of-memory error is generated after 3,333 pipe symbols.

The yacc parser detects and signals the stack overflow, which it signals by calling yyerror("memory exhausted"). However, the bash implementation of yyerror tosses away the provided error message, and substitutes a message like "syntax error detected near unexpected token...". That's a bit confusing in this case.

Notes

The difference in associativity is most easily observed using the |& operator, which pipes both stderr and stdout. (Or, more accurately, duplicates stdout into stderr after establishing the pipe.) For a simple example, suppose that the file foo does not exist in the current directory. Then

# There is a race condition in this example. But it's not relevant.
$ ls foo | ls foo |& tr n-za-m a-z
ls: cannot access foo: No such file or directory
yf: pnaabg npprff sbb: Nb fhpu svyr be qverpgbel
# Associated to the left:
$ { ls foo | ls foo ; } |& tr n-za-m a-z
yf: pnaabg npprff sbb: Nb fhpu svyr be qverpgbel
yf: pnaabg npprff sbb: Nb fhpu svyr be qverpgbel
# Associated to the right:
$ ls foo | { ls foo |& tr n-za-m a-z ; }
ls: cannot access foo: No such file or directory
yf: pnaabg npprff sbb: Nb fhpu svyr be qverpgbel

Best Answer

Related Solutions

Pass the output of previous command to next as an argument

Bash – eval limitation with piped commands

Notes

Related Question