First, note that the syntax for closing is 5>&- or 6<&-, depending on whether the file descriptor was opened for writing or for reading. There seems to be a typo or formatting glitch in that blog post.
Here's the commented script.
exec 5>/tmp/foo # open /tmp/foo for writing, on fd 5
exec 6</tmp/bar # open /tmp/bar for reading, on fd 6
cat <&6 | # call cat, with its standard input connected to
# what is currently fd 6, i.e., /tmp/bar
while read a; do #
echo $a >&5 # write to fd 5, i.e., /tmp/foo
done #
There's no closing here. Because all the inputs and outputs are going to the same place in this simple example, the use of extra file descriptors is not necessary. You could write
cat </tmp/bar |
while read a; do
echo $a
done >/tmp/foo
Using explicit file descriptors becomes useful when you want to write to multiple files in turn. For example, consider a script that writes data to an output file, logging information to a log file, and possibly error messages as well. That means three output channels: one for data, one for logs and one for errors. Since there are only two standard descriptors for output, a third is needed. You can call exec to open the output files:
exec >data-file
exec 3>log-file
echo "first line of data"
echo "this is a log line" >&3
…
if something_bad_happens; then echo error message >&2; fi
exec >&- # close the data output file
echo "output file closed" >&3
The remark about efficiency comes in when you have a redirection in a loop, like this (assume the file is empty to begin with):
while …; do echo $a >>/tmp/bar; done
At each iteration, the program opens /tmp/bar, seeks to the end of the file, appends some data and closes the file. It is more efficient to open the file once and for all:
while …; do echo $a; done >/tmp/bar
When there are multiple redirections happening at different times, calling exec to perform redirections rather than wrapping a block in a redirection becomes useful.
exec >/tmp/bar
while …; do echo $a; done
You'll find several other examples of redirection by browsing the io-redirection tag on this site.
3>&4- is a ksh93 extension, also supported by bash, that is short for 3>&4 4>&-; that is, 3 now points to where 4 used to, and 4 is now closed, so what was pointed to by 4 has now moved to 3.
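As a sketch of how the move operator behaves (in ksh93 or a recent bash; the scratch file is just for illustration):

```shell
tmp=$(mktemp)               # illustrative scratch file
exec 4>"$tmp"               # open it for writing on fd 4
exec 3>&4-                  # move fd 4 to fd 3 (same as: exec 3>&4 4>&-)
echo "via fd 3" >&3         # works: fd 3 now points to the file
echo "via fd 4" 2>/dev/null >&4 || true   # fails: fd 4 is now closed (EBADF)
exec 3>&-                   # done with fd 3
content=$(cat "$tmp"); rm -f "$tmp"
printf '%s\n' "$content"    # only the fd 3 line made it into the file
```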
Typical usage would be in cases where you've duplicated stdin or stdout to save a copy of it and want to restore it later, as in the following.
Suppose you want to capture the stderr of a command (and stderr only) in a variable, while leaving its stdout alone.
Command substitution, as in var=$(cmd), creates a pipe. The writing end of the pipe becomes cmd's stdout (file descriptor 1) and the other end is read by the shell to fill up the variable.
Now, if you want stderr to go to the variable, you could do var=$(cmd 2>&1). Now both fd 1 (stdout) and fd 2 (stderr) go to the pipe (and eventually to the variable), which is only half of what we want.
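A quick sketch of that, using sh -c as a toy command that writes one line to each stream:

```shell
# with 2>&1, both streams feed the command-substitution pipe
var=$(sh -c 'echo out; echo err >&2' 2>&1)
printf '%s\n' "$var"    # both the "out" and "err" lines ended up in $var
```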
If we do var=$(cmd 2>&1-) (short for var=$(cmd 2>&1 >&-)), now only cmd's stderr goes to the pipe, but fd 1 is closed. If cmd tries to write any output, the write will fail with an EBADF error, and if it opens a file, that file will get the first free fd, that is fd 1, and so become its stdout, unless the command guards against that! Not what we want either.
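Sketching that failure mode with the same toy command:

```shell
# fd 2 goes to the pipe, then fd 1 is closed
var=$(sh -c 'echo out; echo err >&2' 2>&1 >&-)
# "echo out" fails with EBADF, so "out" never reaches the pipe;
# $var holds the "err" line (plus the shell's write-error message,
# which also goes to stderr and hence to the pipe)
printf '%s\n' "$var"
```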
If we want the stdout of cmd to be left alone, that is, to point to the same resource it pointed to outside the command substitution, then we somehow need to bring that resource inside the command substitution. For that, we can make a copy of stdout outside the command substitution and take it inside:
{
var=$(cmd)
} 3>&1
Which is a cleaner way to write:
exec 3>&1
var=$(cmd)
exec 3>&-
(the braced form also has the benefit of restoring fd 3 at the end instead of closing it).
From the { (or the exec 3>&1) up to the }, both fd 1 and fd 3 point to the resource fd 1 pointed to initially. fd 3 will also point to that resource inside the command substitution (a command substitution only redirects fd 1, stdout). So above, for cmd, we've got for fds 1, 2 and 3:
- fd 1: the pipe to var
- fd 2: untouched
- fd 3: same as what fd 1 points to outside the command substitution
If we change it to:
{
var=$(cmd 2>&1 >&3)
} 3>&1-
Then it becomes:
- fd 1: same as what fd 1 points to outside the command substitution
- fd 2: the pipe to var
- fd 3: same as what fd 1 points to outside the command substitution
Now, we've got what we wanted: stderr goes to the pipe and stdout is left untouched. However, we're leaking that fd 3 to cmd
.
While commands (by convention) assume fds 0 to 2 to be open and be standard input, output and error, they don't assume anything of other fds. Most likely they will leave that fd 3 untouched. If they need another file descriptor, they'll just do an open()/dup()/socket()...
which will return the first available file descriptor. If (like a shell script that does exec 3>&1
) they need to use that fd
specifically, they will first assign it to something (and in that process, the resource held by our fd 3 will be released by that process).
It's good practice to close that fd 3 since cmd doesn't make use of it, but it's no big deal if we leave it assigned when we call cmd. The problems may be that cmd (and potentially other processes that it spawns) has one fewer fd available to it. A potentially more serious problem is if the resource that fd points to ends up held by a process spawned by cmd in the background. That can be a concern if the resource is a pipe or other inter-process communication channel (like when your script is being run as script_output=$(your-script)), as it means the process reading from the other end will never see end-of-file until that background process terminates.
So here, it's better to write:
{
var=$(cmd 2>&1 >&3 3>&-)
} 3>&1
Which, with bash, can be shortened to:
{
var=$(cmd 2>&1 >&3-)
} 3>&1
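Putting it together, here is a runnable sketch of the pattern (using the portable >&3 3>&- form; the inner sh -c command is just for illustration):

```shell
{
  err=$(sh -c 'echo to-stdout; echo to-stderr >&2' 2>&1 >&3 3>&-)
} 3>&1
# "to-stdout" went to the script's normal stdout;
# only the stderr line was captured in $err
printf 'captured: %s\n' "$err"
```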
To sum up the reasons why it's rarely used:
- it's non-standard and just syntactic sugar. You've got to balance saving a few keystrokes with making your script less portable and less obvious to people not used to that uncommon feature.
- The need to close the original fd after duplicating it is often overlooked because most of the time we don't suffer from the consequences, so we just do >&3 instead of >&3- or >&3 3>&-.
Proof that it's rarely used is that, as you found out, it is buggy in bash. In bash, compound-command 3>&4- or any-builtin 3>&4- leaves fd 4 closed even after compound-command or any-builtin has returned. A patch to fix the issue is now (2013-02-19) available.
Best Answer
It doesn't matter, because both 4>&1 and 4<&1 do the same thing: dup2(1, 4), which is the system call to duplicate a fd onto another. The duplicated fd automatically inherits the I/O direction of the original fd. (Same for 4>&- vs 4<&-, which both resolve to close(4), and for 4>&1-, which is the dup2(1, 4) followed by close(1).)
However, the 4<&1 syntax is confusing unless for some reason fd 1 was explicitly open for reading (which would be even more confusing), so in my mind it should be avoided.
The duplicated fd shares the same open file description, which means both fds share the same offset in the file (for those file types where it makes sense) and the same associated flags (I/O redirection/opening mode, O_APPEND and so on).
On Linux, there's another way to duplicate a fd (which is not really a duplication) and create a new open file description for the same resource but with possibly different flags: 3> /dev/fd/4. While on Solaris and probably most other Unices that is more or less equivalent to dup2(4, 3), on Linux it opens the resource pointed to by fd 4 from scratch.
That is an important difference because, for instance, for a regular file the offset of fd 3 will be 0 (the beginning of the file) and the file will be truncated (which is why on Linux you need to write tee -a /dev/stderr instead of tee /dev/stderr). And the I/O mode can be different.
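A minimal sketch of that truncation effect on Linux (assumes /dev/fd is available, as it is via /proc; the temporary file is just for illustration):

```shell
tmp=$(mktemp)
echo "some existing data" >"$tmp"
exec 4>>"$tmp"          # fd 4: append mode, offset at end of the file
exec 3>"/dev/fd/4"      # Linux: reopens the underlying file with O_TRUNC
size=$(wc -c <"$tmp")   # the file has been truncated by the fresh open
exec 3>&- 4>&-
rm -f "$tmp"
echo "$size"
```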
Interestingly, if fd 4 pointed to the reading end of a pipe, then fd 3 now points to the writing end (/dev/fd/3 behaves like a named pipe).