Your redirections have a race condition. This:
>(wc -l | awk '{print $1}' > n.txt)
runs in parallel with:
awk 'BEGIN{getline n < "n.txt"}...'
later in the pipeline. Sometimes, n.txt is still empty when the awk program starts running.
This is (obliquely) documented in the Bash Reference Manual. In a pipeline:
The output of each command in the pipeline is connected via a pipe to the input of the next command. That is, each command reads the previous command’s output. This connection is performed before any redirections specified by the command.
and then:
Each command in a pipeline is executed in its own subshell
(emphasis added). All the processes in the pipeline are started, with their input and output connected together, without waiting for any of the earlier programs to finish or even start doing anything. Before that, process substitution with >(...) is:
performed simultaneously with parameter and variable expansion, command substitution, and arithmetic expansion.
What that means is that the subprocess running the wc -l | awk ... command starts early on, and the redirection empties n.txt just before that, but the awk process that causes the error is started shortly after. Both of those commands execute in parallel, so you'll have several processes going at once here.
The error occurs when awk runs its BEGIN block before the wc command's output has been written into n.txt. In that case, the n variable is empty, and so is zero when used as a number. If the BEGIN block runs after the file is filled in, everything works.
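The "empty means zero" behaviour can be reproduced without any race at all. This is a small sketch, not the original command: the temporary file is a hypothetical stand-in for n.txt, first in its just-truncated state and then after the count has been written.

```shell
#!/bin/sh
# Hypothetical stand-in for n.txt right after the redirection has emptied it.
count_file=$(mktemp)

# getline finds no record in the empty file, so n stays unset and n + 0 is 0.
early=$(awk -v f="$count_file" 'BEGIN { getline n < f; print n + 0 }')

# Once the count has been written, the same awk program sees it.
echo 42 > "$count_file"
late=$(awk -v f="$count_file" 'BEGIN { getline n < f; print n + 0 }')

echo "early=$early late=$late"   # prints: early=0 late=42
rm -f "$count_file"
```

Which of the two cases you hit in the real pipeline is decided by the scheduler, not by anything in the command itself.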
When that happens depends on the operating system scheduler and on which process gets a slot first, which is essentially random from the user's perspective. If the final awk gets to run early, or the wc pipeline gets scheduled a little later, the file will still be empty when awk starts doing its work and the whole thing will break. In all likelihood the processes will run truly simultaneously on different cores, and it's down to which one gets to the point of contention first. The effect you'll get is probably of the command working more often than not, but sometimes failing with the error you posted.
In general, pipelines are only safe in so far as they're just pipelines. Standard output into standard input is fine, but because the processes execute in parallel you can't rely on the sequencing of any other communication channel, like files, or on any part of one process executing before or after any part of another, unless they're locked together by reading standard input.
The workaround here is probably to do all your file writing before the files are needed: at the end of a line, it's guaranteed that an entire pipeline and all of its redirections have completed before the next command runs. This command will never be reliable as written, but if you really do need it to work in this sort of structure you can insert a delay (sleep) or loop until n.txt is non-empty before running the final awk command to increase the chances of things working how you want.
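As a rough sketch of the "write the file first" approach: the input file, its contents and the percentage calculation below are made up for illustration, since the full original command isn't shown here. The only point is that n.txt is written by one command that finishes before the awk program that reads it is started.

```shell
#!/bin/sh
# Hypothetical input; it stands in for whatever the original pipeline read.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\nd\n' > input.txt

# Step 1: write the line count and let this command finish completely.
wc -l < input.txt | awk '{print $1}' > n.txt

# Step 2: only now start the awk program that reads n.txt in its BEGIN block.
result=$(awk 'BEGIN { getline n < "n.txt" }
              { printf "%s %.0f%%\n", $0, 100 * NR / n }' input.txt)
printf '%s\n' "$result"
```

The cost is that the input is read twice, which is fine for a file but means a stream would have to be saved somewhere first.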
Okay, let's break this down. A subshell executes its contents in a chain (i.e., it groups them). This actually makes intuitive sense, as a subshell is created simply by surrounding the chain of commands with (). But, aside from the contents of the subshell being grouped together in execution, you can still use a subshell as if it were a single command. That is, a subshell still has a stdin, stdout and stderr, so you can pipe things to and from a subshell.
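A short sketch of piping to and from a subshell; the words used are arbitrary placeholders.

```shell
#!/bin/sh
# Piping FROM a subshell: both echo commands share the subshell's stdout,
# so sort receives their combined output.
sorted=$( (echo banana; echo apple) | sort )
printf '%s\n' "$sorted"

# Piping TO a subshell: read consumes the subshell's stdin.
piped=$(echo hello | (read -r word; echo "got $word"))
printf '%s\n' "$piped"
```

The first command prints apple and then banana; the second prints "got hello".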
On the other hand, command substitution is not the same thing as simply chaining commands together. Rather, command substitution is meant to act a bit like a variable access but with a function call. Variables, unlike commands, do not have the standard file descriptors so you cannot pipe anything to or from a variable (generally speaking), and the same is true of command substitutions.
To try to make this clearer, what follows is a set of maybe-unclear (but accurate) examples, and then a set of what I think may be more easily understood examples.
Let's say the date -u command gives the following:
Thu Jul 2 13:42:27 UTC 2015
But, we want to manipulate the output of this command. So, let's pipe it into something like sed:
user@host~> date -u | sed -e 's/ / /g'
Thu Jul 2 13:42:27 UTC 2015
Wow, that was fun! The following is completely equivalent to above (barring some environment differences that you can read about in the man pages about your shell):
user@host~> (date -u) | sed -e 's/ / /g'
Thu Jul 2 13:42:27 UTC 2015
That should be no surprise since all we did was group date -u. However, if we do the following, we are going to get something that may seem a bit odd at first:
user@host~> $(date -u) | sed -e 's/ / /g'
command not found: Thu
This is because $(date -u) is equivalent to typing out exactly what date -u outputs. So the above is equivalent to the following:
user@host~> Thu Jul 2 13:42:27 UTC 2015 | sed -e 's/ / /g'
Which will, of course, error out because Thu is not a command (at least not one I know of); and it certainly doesn't pipe anything to stdout (so sed will never get any input).
But, since we know that command substitutions act like variables, we can easily fix this problem because we know how to pipe the value of a variable into another command:
user@host~> echo $(date -u) | sed -e 's/ / /g'
Thu Jul 2 13:42:27 UTC 2015
But, as with any variable in bash, you should probably quote command substitutions with "".
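To see why the quotes matter, here is a small sketch with a made-up string containing a run of spaces (much like the padding date puts in front of a single-digit day):

```shell
#!/bin/sh
# Unquoted: the substitution result is split into words, so echo receives
# two arguments and the run of spaces collapses to a single space.
unquoted=$(echo $(printf 'a   b'))

# Quoted: the result is passed to echo as one argument, spacing intact.
quoted=$(echo "$(printf 'a   b')")

printf '[%s]\n[%s]\n' "$unquoted" "$quoted"
```

The first line printed is [a b] and the second is [a   b].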
Now, for the perhaps-simpler example; consider the following:
user@host~> pwd
/home/hypothetical
user@host~> echo pwd
pwd
user@host~> echo "$(pwd)"
/home/hypothetical
user@host~> echo "$HOME"
/home/hypothetical
user@host~> echo (pwd)
error: your shell will tell you something weird that roughly means “Whoa! you tried to have me echo something that isn't text!”
user@host~> (pwd)
/home/hypothetical
I am not sure how to describe it any more simply than that. The command substitution works just like a variable access, whereas the subshell still operates like a command.
Best Answer
The | will take the output of the command on the left and give it to the input of the command on the right. The > operator will take the output of the command and put it into a file. That means, in your example, by the time it gets to the | there is no output left; it's all gone into a.txt. So the sort on the right operates on an empty string and saves that to b.txt.
What you would probably like is to use the tee command, which will both write to a file and to stdout.
Though I'm really curious what you're trying to do, since ls can/will sort things for you as well.
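A minimal sketch of the tee approach, assuming the goal is an unsorted copy in a.txt and a sorted copy in b.txt. The printf below is a placeholder for ls so that the output is predictable; it is not the command from the question.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# tee writes its input to a.txt AND passes it along on stdout,
# so sort still has something to read.
printf 'pear\napple\nfig\n' | tee a.txt | sort > b.txt

cat a.txt   # original order: pear, apple, fig
cat b.txt   # sorted order: apple, fig, pear
```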