In:

    ./binary < file

`binary`'s stdin is the file, open in read-only mode. Note that bash doesn't read the file at all; it just opens it for reading on file descriptor 0 (stdin) of the process it executes `binary` in.
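A quick way to see this on Linux (assuming a `/proc` filesystem; `/tmp/demo.txt` is just a throwaway example file):

```shell
# fd 0 of the command is the file itself, as /proc shows:
printf 'hello\n' > /tmp/demo.txt
readlink /proc/self/fd/0 < /tmp/demo.txt   # prints: /tmp/demo.txt
```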
In:

    ./binary << EOF
    test
    EOF

Depending on the shell, `binary`'s stdin will be either a deleted temporary file (AT&T ksh, zsh, bash...) that contains `test\n` as put there by the shell, or the reading end of a pipe (dash, yash; the shell writes `test\n` in parallel at the other end of the pipe). In your case, if you're using bash, it would be a temp file.
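You can check what your own shell does (Linux assumed; note that bash 5.1 and later may use a pipe instead of a deleted temp file for short here-documents):

```shell
# Show what kind of object the heredoc stdin is; output varies by shell
# and shell version (a deleted temp file path, or pipe:[NNN]):
readlink /proc/self/fd/0 << EOF
test
EOF
```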
In:

    cat file | ./binary

Depending on the shell, `binary`'s stdin will be either the reading end of a pipe, or one end of a socket pair where the writing direction has been shut down (ksh93), and `cat` writes the content of `file` at the other end.
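The same check on Linux shows an anonymous pipe:

```shell
# fd 0 of the right-hand command is the reading end of a pipe:
printf 'test\n' | readlink /proc/self/fd/0   # prints something like: pipe:[123456]
```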
When stdin is a regular file (temporary or not), it is seekable. `binary` may go to the beginning or end, rewind, etc. It can also mmap it, or do some `ioctl()`s like FIEMAP/FIBMAP (and if using `<>` instead of `<`, it could truncate it, punch holes in it, etc.).
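For instance, on a regular-file stdin the file offset is shared between processes inheriting the fd and can be moved with lseek(), so `dd` can skip forward and a following `cat` continues from there (GNU `dd` assumed, for `status=none`):

```shell
printf 'abcdef' > /tmp/seekdemo
# dd seeks the shared offset 3 bytes forward without copying anything
# (count=0); cat then reads from that offset:
{ dd bs=1 skip=3 count=0 status=none; cat; } < /tmp/seekdemo   # prints: def
```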
Pipes and socket pairs, on the other hand, are an inter-process communication means; there's not much `binary` can do besides `read`ing the data (though there are also some operations, like some pipe-specific `ioctl()`s, that it could do on them and not on regular files).
Most of the time, it's the missing ability to `seek` that causes applications to fail or complain when working with pipes, but it could be any of the other system calls that are valid on regular files but not on other types of files (like `mmap()`, `ftruncate()`, `fallocate()`). On Linux, there's also a big difference in behaviour when you open `/dev/stdin` while fd 0 is on a pipe versus on a regular file.
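A small illustration of that Linux difference (`/tmp/f` is a throwaway file): opening `/dev/stdin` on a regular file re-opens it from offset 0 each time, while on a pipe you get the same pipe and reads consume data:

```shell
printf 'abc' > /tmp/f
# Regular file: each open of /dev/stdin starts at the beginning:
{ head -c1 /dev/stdin; head -c1 /dev/stdin; } < /tmp/f           # prints: aa
# Pipe: both opens refer to the same pipe, so reads consume bytes:
printf 'abc' | { head -c1 /dev/stdin; head -c1 /dev/stdin; }     # prints: ab
```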
There are many commands out there that can only deal with seekable files, but when that's the case, that's generally not for the files open on their stdin.
    $ unzip -l file.zip
    Archive:  file.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
           11  2016-12-21 14:43   file
    ---------                     -------
           11                     1 file
    $ unzip -l <(cat file.zip)
    # more or less the same as cat file.zip | unzip -l /dev/stdin
    Archive:  /proc/self/fd/11
      End-of-central-directory signature not found.  Either this file is not
      a zipfile, or it constitutes one disk of a multi-part archive.  In the
      latter case the central directory and zipfile comment will be found on
      the last disk(s) of this archive.
    unzip:  cannot find zipfile directory in one of /proc/self/fd/11 or
            /proc/self/fd/11.zip, and cannot find /proc/self/fd/11.ZIP, period.
`unzip` needs to read the index stored at the end of the file, and then seek within the file to read the archive members. But here, the file (regular in the first case, pipe in the second) is given as a path argument to `unzip`, and `unzip` opens it itself (typically on a fd other than 0) instead of inheriting a fd already opened by the caller. It doesn't read zip files from its stdin; stdin is mostly used for user interaction.
If you run that `binary` of yours without redirection at the prompt of an interactive shell running in a terminal emulator, then `binary`'s stdin will be inherited from its caller the shell, which itself inherited it from its caller the terminal emulator, and it will be a pty device open in read+write mode (something like `/dev/pts/n`).
Those devices are not seekable either. So if `binary` works OK when taking input from the terminal, the issue is probably not about seeking.
If that 14 is meant to be an errno (an error code set by failing system calls), then on most systems that would be `EFAULT` (*Bad address*). The `read()` system call would fail with that error if asked to read into a memory address that is not writable. That would be independent of whether the fd to read the data from points to a pipe or a regular file, and would generally indicate a bug¹.
`binary` possibly determines the type of file open on its stdin (with `fstat()`) and runs into a bug when it's neither a regular file nor a tty device.
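A script can do a similar classification itself; `test`'s file operators perform an fstat()-style check on their operand (Linux assumed for the `/dev/stdin` checks; `stdin_type` is a hypothetical helper):

```shell
# Classify what kind of object is open on fd 0:
stdin_type() {
  if   [ -t 0 ];          then echo tty
  elif [ -p /dev/stdin ]; then echo pipe
  elif [ -f /dev/stdin ]; then echo 'regular file'
  else echo other
  fi
}
printf 'x' > /tmp/f2
echo test | stdin_type      # prints: pipe
stdin_type < /tmp/f2        # prints: regular file
```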
Hard to tell without knowing more about the application. Running it under `strace` (or the `truss`/`tusc` equivalent on your system) could help us see which system call, if any, is failing here.
¹ The scenario envisaged by Matthew Ife in a comment to your question sounds quite plausible here. Quoting him:

> I suspect it is seeking to the end of file to get a buffer size for reading the data, badly handling the fact that seek doesn't work and attempting to allocate a negative size (not handling a bad malloc). Passing the buffer to read which faults given the buffer is not valid.
Simply expanding on your approach:

    exec 2> >(tee -a stderr stdall) 1> >(tee -a stdout stdall)
Standard error will be written to the file named `stderr`, standard output to `stdout`, and both standard error and standard output will also be written to the console (or whatever the two file descriptors are pointing at at the time `exec` is run) and to `stdall`.
`tee -a` (append) is required to prevent `stdall` from being overwritten by the second `tee` that starts writing to it.
Note that the order in which the redirections are performed is relevant: the second process substitution is affected by the first redirection, i.e. the errors it emits are sent to `>(tee -a stderr stdall)`. You can, of course, redirect the second process substitution's standard error to `/dev/null` to avoid this side effect. Redirecting standard output before standard error would instead send every error to `stdout` and `stdall` too.
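The same ordering principle in its simplest form, with `demo` as a stand-in command:

```shell
demo() { echo out; echo err >&2; }
# fd 1 is pointed at the file first, then fd 2 copies fd 1's new value:
demo > /tmp/log1 2>&1       # /tmp/log1 contains both "out" and "err"
# fd 2 copies fd 1's *old* value (the terminal) before fd 1 is changed:
demo 2>&1 > /tmp/log2       # "err" still goes to the terminal; /tmp/log2 contains only "out"
```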
Since the commands in Bash's process substitutions are executed asynchronously, there is no way to guarantee that their output will be displayed in the order it was generated. Worse, fragments from standard output and standard error are likely to end up appearing on the same line.
Best Answer
It helps a bit if you think of the file descriptors as variables that accept a file as a value (or call it an I/O stream), and of the order in which they appear as the order of their evaluation.

What happens in the above example is:
1) The script starts (as per default, and unless otherwise inherited) with the usual three file descriptors open: fd/0 (stdin), fd/1 (stdout) and fd/2 (stderr).

2) The `exec 3>&1` command translates to declaring a new variable and assigning it a value: fd/3 gets a copy of fd/1's current value, stdout. So now two file descriptors have the value stdout, i.e. both can be used to print to the screen.
3) Before `ls` is executed and inherits all open file descriptors, the following setup happens: `2>&1` points ls.fd/2 at fd/1's current value (`grep`'s stdin, because of the pipe), then `>&3` assigns the saved stdout value back to fd/1, and `3>&-` closes fd/3. fd/3 has served the purpose of keeping the stdout value long enough to return it to fd/1. So now everything that `ls` sends to fd/1 goes to stdout and not to `grep`'s stdin. The order is important: e.g. if we'd run `ls -l >&3 2>&1 3>&-`, ls.fd/2 would write to stdout instead of `grep`'s stdin.

4) fd/3 for `grep` is closed and not inherited. It would be unused anyway: `grep` can only filter the error messages from `ls`.
The example provided in the ABSG is probably not the most helpful, and the comment "Close fd 3 for 'grep' (but not 'ls')" is a bit misleading. You can interpret it as: "for ls, pass the value of ls.fd/3 to ls.fd/1 before unsetting it, so it won't get closed".
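For reference, the ABSG example under discussion is essentially the following (using `/nonexistent` as a stand-in path so that `ls` actually produces an error for `grep` to filter):

```shell
exec 3>&1                              # fd/3 := the current stdout (save it)
ls -l /nonexistent 2>&1 >&3 3>&- |     # for ls: fd/2 := the pipe, fd/1 := saved stdout, close fd/3
  grep nonexistent 3>&-                # grep sees only ls's error messages
exec 3>&-                              # close fd/3 for the remainder of the script
```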