Dash – Understanding Pipes and Redirections

file-descriptors io-redirection pipe posix shell

Someone asked how to pass the output of two commands as files to another command and they got the answer below.

( cmd1 | ( cmd2 | ( main_command /dev/fd/3 /dev/fd/4 ) 4<&0 ) 3<&0 )

I need to unpack this.

Say I have a text file some_file and I wish to pass it as input to main_command. main_command takes two files as input. If I want to use main_command with some_file and with the output of the command cmd2, one way to do that is

( cmd2 | ( main_command some_file /dev/fd/4 ) 4<&0 )

The "deepest" part of this (i.e. where it all culminates) is main_command some_file /dev/fd/4. This is simply passing the files some_file and /dev/fd/4 as arguments to main_command.
The 4<&0 part says that stdin will point to file descriptor 4.
cmd2 | connects the output of cmd2 with the input of whatever follows.
I don't really know what the function of the parentheses is. Do they exist merely for parsing purposes, or do they do something more?

My questions are:

1. How do I unpack the command at the beginning of the question?
2. What do the parentheses do?
3. Is my explanation of the simpler command correct?

Edit: I should have said that if my logic is correct, then there's no need to answer question 1.

Best Answer

This is a pretty complex command. I've answered your questions directly right at the end, but all of this until then is unpacking the command itself. I've tried to be comprehensive so there may be a bit more detail than you need in places.

The parentheses create a subshell:

( x y z )

means to fork a new shell from the current one, execute x y z in it, and then return to the current shell. The subshell inherits everything about the current one but is a separate process: it can have input piped into it, and any changes it makes to its own environment (variables, working directory, redirections) don't affect the parent.
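As a small illustration of that isolation (a sketch only; the variable names x and start_dir are made up for the example), changes made inside the parentheses are gone once the subshell exits:

```shell
x=outer
start_dir=$PWD

(
  # Everything here happens in a forked copy of the shell.
  x=inner
  cd /
  echo "inside:  x=$x"
)

# The parent shell still has its own variable and working directory.
echo "outside: x=$x"
```

The second echo prints x=outer, and the parent's working directory is still whatever it was before the parentheses.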

Every open file has a numeric "file descriptor" associated with it. "File" in this context includes any sort of input or output stream, including real files, sockets, and the standard I/O streams. The numbers are handles that identify which stream you mean when you call the C read function (and the corresponding system call provided by the operating system), along with all the other I/O functions.

4<&0 performs a redirection cloning the standard input file descriptor (0) as file descriptor 4. That means FD 0 is copied to 4, not the other way around. In this case, it's modifying the open files for the subshell that precedes the redirection. For the moment, that is just creating another "name" for the input stream. A key part though is that the two names are independent of each other thereafter: FD 4 will always refer to the same stream, even if FD 0 is changed to refer to something else and the two diverge.
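A minimal sketch of that independence, assuming a POSIX sh. Here printf stands in for whatever feeds the pipe, and result is just a name chosen for the example:

```shell
result=$(
  printf 'from the pipe\n' | (
    # When the subshell starts, FD 4 already duplicates the pipe on FD 0.
    exec 0</dev/null   # point FD 0 somewhere else
    cat <&4            # FD 4 still reads the original pipe
  ) 4<&0
)
echo "$result"
```

Even though standard input was replaced with /dev/null inside the subshell, reading from FD 4 still yields the piped text.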

/dev/fd/4 is a (non-standard) way for a program to access its own open file descriptors. On Linux, /dev/fd is a symlink to /proc/self/fd, which reifies the file descriptor table of the current process. A program can open("/dev/fd/4", O_RDONLY) and get a file handle that refers to the same stream that this program has on FD 4. As far as the program is concerned, this is just a regular file that can be opened, closed, and read like any other. Because open file descriptors are inherited by subprocesses, main_command has the same file descriptor 4 as the subshell it's inside, and so /dev/fd/4 works there too.

cmd2 | x runs cmd2, and connects its standard output to the standard input - or FD 0 - of x. In your command, x is the subshell expression.

Our overall command

cmd2 | ( main_command /dev/fd/4 ) 4<&0

then has three main parts:

1. Run cmd2 and pipe its output into ( main_command /dev/fd/4 ) 4<&0.
2. Make 4 another name for the stream identified by 0 (standard input) of ( main_command /dev/fd/4 ).
3. Run main_command with /dev/fd/4 as an argument, which it will (presumably) open as a file and read from, getting the output of cmd2.

The final effect is that main_command gets a pathname argument it can open and read the output of cmd2 from, exactly as would happen for Bash process substitution main_command <(cmd2): in fact, that would likely give /dev/fd/63 as the argument and otherwise proceed very similarly on the inside.
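To try this out with concrete commands, here is a sketch that assumes a system providing /dev/fd (Linux and most modern Unixes do). printf plays the role of cmd2 and sort plays the role of main_command; neither is from the original question:

```shell
out=$(
  # cmd2 -> printf, main_command -> sort (illustrative stand-ins)
  printf 'b\na\n' | ( sort /dev/fd/4 ) 4<&0
)
echo "$out"
```

sort is given only a pathname, yet what it reads from that path is the output of the printf on the left of the pipe.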

For the complete command

( cmd1 | ( cmd2 | ( main_command /dev/fd/3 /dev/fd/4 ) 4<&0 ) 3<&0 )

we have nested subshells because we want to make two copies of standard input, and they are copies of two different standard inputs: one is the output of cmd1, which is put into FD 3 after being piped into the larger subshell, and the other is the output of cmd2, which is put into FD 4 after being piped into the innermost subshell. The two 0s both refer to standard input, but each command's standard input is distinct because we have something different piped into it.

That is the most confusing part of the issue, I think. Each command - here, each subshell - has its own standard input, piped in from cmd1 or cmd2, and that unique standard input stream gets aliased to 3 or 4. Those open file descriptors are inherited by the next layer of subshell and child commands, so /dev/fd/3 in the innermost command refers to the same thing it did outside, even though standard input now points to something else.
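Here is the full nested form with stand-in commands, as a sketch that again assumes /dev/fd is available. The two printf calls play cmd1 and cmd2, and paste plays main_command; these choices are only for illustration:

```shell
out=$(
  printf '1\n2\n' | (            # cmd1: becomes FD 0, then FD 3, of the outer subshell
    printf 'a\nb\n' | (          # cmd2: becomes FD 0, then FD 4, of the inner subshell
      paste /dev/fd/3 /dev/fd/4  # main_command sees both streams as "files"
    ) 4<&0
  ) 3<&0
)
echo "$out"
```

paste joins the lines of its two arguments with a tab, so each output line pairs one line from cmd1 (via FD 3) with one line from cmd2 (via FD 4).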

The outermost parentheses are not strictly required, though they make it slightly more robust for some commands and are probably a good practice. The inner ones are required: they create a new subprocess that can have its own set of redirections inside it, and its own standard input stream piped in.

The innermost redirection is actually redundant: cmd2 | main_command /dev/fd/3 /dev/stdin would also work, since there's no further change to standard input made.
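A sketch of that shorter variant, assuming the system provides both /dev/fd and /dev/stdin, with printf and paste as illustrative stand-ins for cmd1, cmd2 and main_command:

```shell
out=$(
  printf '1\n2\n' | (
    # No inner subshell or 4<&0: main_command reads cmd2 straight from stdin.
    printf 'a\nb\n' | paste /dev/fd/3 /dev/stdin
  ) 3<&0
)
echo "$out"
```

Only the 3<&0 duplication is still needed, because FD 0 of the outer subshell is replaced by the second pipe before main_command runs.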

To address your questions directly:

How do I unpack the command at the beginning of the question?

The unpacking is the entire post to this point.

What do the parentheses do?

The parentheses create a subshell, an independent shell process that can be used like any other command, including having input piped into it, but can perform ordinary shell operations inside, such as redirections.

Is my explanation of the simpler command correct?

Partially. It's the other way around: 4<&0 says that file descriptor 4 will point to stdin, and importantly to whatever stdin is right now, not to the concept of standard input. /dev/fd/4 is a "file" in the "everything is a file" sense, but more specifically it's a pathname that, when opened, hands you back your FD 4.
