Someone asked how to pass the output of two commands as files to another command and they got the answer below.
( cmd1 | ( cmd2 | ( main_command /dev/fd/3 /dev/fd/4 ) 4<&0 ) 3<&0 )
I need to unpack this.
Say I have a text file some_file and I wish to pass it as input to main_command. main_command takes two files as input. If I want to use main_command with some_file and with the output of the command cmd2, one way to do that is
( cmd2 | ( main_command some_file /dev/fd/4 ) 4<&0 )
- The "deepest" part of this (i.e. where it all culminates) is main_command some_file /dev/fd/4. This is simply passing the files some_file and /dev/fd/4 as arguments to main_command.
- The 4<&0 part says that stdin will point to file descriptor 4.
- cmd2 | connects the output of cmd2 with the input of whatever follows.
- I don't really know what the function of the parentheses is. Do they exist merely for parsing purposes or do they do something more?
My questions are:
- How do I unpack the command at the beginning of the question?
- What do the parentheses do?
- Is my explanation of the simpler command correct?
Edit: I should have said if my logic is correct, then there's no need to answer 1.
Best Answer
This is a pretty complex command. I've answered your questions directly right at the end, but all of this until then is unpacking the command itself. I've tried to be comprehensive so there may be a bit more detail than you need in places.
- The parentheses create a subshell: ( x y z ) means to fork a new shell from the current one, to execute x y z in (and then return to the current shell). The subshell inherits everything about the current one, but is a separate process: that means it can have input piped into it, and can have its own environmental changes inside that don't affect the parent.
- Every open file has a numeric "file descriptor" associated with it. "File" in this context includes any sort of input or output stream, including real files, sockets, and standard I/O streams. The numbers are handles that can be used directly with the C read function to identify which stream you're talking about, and with the corresponding system call provided by the operating system, along with all the other IO functions.
- 4<&0 performs a redirection cloning the standard input file descriptor (0) as file descriptor 4. That means FD 0 is copied to 4, not the other way around. In this case, it's modifying the open files for the subshell that precedes the redirection. For the moment, that is just creating another "name" for the input stream. A key part though is that the two names are independent of each other thereafter: FD 4 will always refer to the same stream, even if FD 0 is changed to refer to something else and the two diverge.
- /dev/fd/4 is a (non-standard) way for a program to access its own open file descriptors. On Linux, /dev/fd is a symlink to /proc/self/fd, which reifies the file descriptor table of the current process. A program can open("/dev/fd/4", O_RDONLY) and get a file handle that refers to the stream that this program has on FD 4 (such as 4 itself). As far as the program is concerned, this is just a regular file that can be opened, closed, and read like any other. Because open file descriptors are inherited by subprocesses, main_command has the same file descriptor 4 as the subshell it's inside, and so /dev/fd/4 works there too.
- cmd2 | x runs cmd2, and connects its standard output to the standard input - or FD 0 - of x. In your command, x is the subshell expression.
Our overall command then has three main parts:
- Run cmd2 and pipe its output into ( main_command /dev/fd/4 ) 4<&0.
- Make 4 another name for the stream identified by 0 (standard input) of ( main_command /dev/fd/4 ).
- Run main_command with /dev/fd/4 as an argument, which it will (presumably) open as a file and read from, getting the output of cmd2.
The final effect is that main_command gets a pathname argument it can open and read the output of cmd2 from, exactly as would happen for Bash process substitution main_command <(cmd2): in fact, that would likely give /dev/fd/63 as the argument and otherwise proceed very similarly on the inside.
For the complete command
( cmd1 | ( cmd2 | ( main_command /dev/fd/3 /dev/fd/4 ) 4<&0 ) 3<&0 )
we have nested subshells: that's because we want to make two copies of standard input, but it's two different standard inputs: one is the output of cmd1, which is put into FD 3 after being piped into the larger subshell, and the other is the output of cmd2, which is put into FD 4 after being piped into the innermost subshell. The two 0s both refer to standard input, but each command's standard input is distinct because we have something different piped into it.
That is the most confusing part of the issue, I think. Each command - here, each subshell - has its own standard input, piped in from cmd1 or cmd2, and that unique standard input stream gets aliased to 3 or 4. Those open file descriptors are inherited by the next layer of subshell and child commands, so /dev/fd/3 in the innermost command refers to the same thing it did outside, even though standard input now points to something else.
The outer parentheses are not strictly required, though they make it slightly more robust for some commands and are probably a good practice. The inner ones are: those are used to create a new subprocess that can have its own set of redirections inside it, and its own standard input stream piped in.
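The nesting can be tried out with harmless stand-ins. This is only a minimal sketch, assuming a system that provides /dev/fd (Linux does): printf plays the part of cmd1 and cmd2, and cat plays main_command. Those stand-ins and the variable name are my own choices for illustration, not part of the original command. The outermost parentheses are dropped here because the command substitution already runs its contents in a subshell.

```shell
# Stand-ins (assumptions for illustration): printf acts as cmd1 and cmd2,
# cat acts as main_command and simply prints both "files" in order.
combined=$(
  printf 'from cmd1\n' | (
    # Here standard input is the pipe from cmd1; 3<&0 below saves it as FD 3.
    printf 'from cmd2\n' | (
      # Here standard input is the pipe from cmd2; 4<&0 below saves it as FD 4.
      # FD 3 is inherited from the enclosing subshell and is still cmd1's pipe.
      cat /dev/fd/3 /dev/fd/4
    ) 4<&0
  ) 3<&0
)
printf '%s\n' "$combined"
```

With these stand-ins, the cmd1 line should come out first and the cmd2 line second, matching the order of the two pathname arguments.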
The innermost redirection is actually redundant:
cmd2 | main_command /dev/fd/3 /dev/stdin
would also work, since there's no further change to standard input made.
To address your questions directly:
The unpacking is the entire post to this point.
The parentheses create a subshell, an independent shell process that can be used like any other command, including having input piped into it, but can perform ordinary shell operations inside, such as redirections.
Partially. 4<&0 says that file descriptor 4 will point to stdin, and importantly to what is called stdin right now - not to the concept of standard input. /dev/fd/4 is a "file" in the "everything is a file" sense, but more specifically it's a pathname that, when opened, hands you back your FD 4.
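The "what is called stdin right now" point can be seen in isolation with a small sketch. Here printf is a stand-in for cmd2, and the exec line exists only to demonstrate the behaviour (both are assumptions for illustration): standard input is repointed inside the subshell, yet FD 4 should still deliver the piped data.

```shell
# printf is a stand-in for cmd2 (assumption for illustration).
snapshot=$(
  printf 'piped data\n' | (
    exec 0</dev/null   # repoint standard input inside the subshell
    cat <&4            # FD 4 still refers to the pipe captured by 4<&0
  ) 4<&0
)
printf '%s\n' "$snapshot"
```

If FD 4 merely tracked "whatever standard input is", the cat would read from /dev/null and print nothing; because the copy was made when the redirection was performed, it reads the pipe instead.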