Shell – Buffering stdout separately from stderr

I am running a utility, which emits the following:

  • Progress to its Standard Error
  • Data/Yield/Output to its Standard Output

I did not build the utility, nor can I easily modify it.

I wish to do the following:

  • Send its Standard Error directly to Standard Output
  • Buffer its Output, and flush it to Standard Output once the command exits
    (There is unlikely to be more than 10 KiB of data here, so RAM is not an issue.)

Can this be done in POSIX sh (calling only utilities common to both Linux and OpenBSD), without the nondeterminism, potential race conditions, etc. that come with using a named pipe or temporary file?

Best Answer

You should be able to do something like:

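{
  # Inside the braces, fd 3 is a copy of the script's stdout (3>&1 below).
  # cmd's stderr (progress) goes straight to fd 3; its stdout goes into the
  # pipe, where awk accumulates it and prints it only at end of input.
  cmd 2>&3 3>&- |
    awk '    {saved = saved $0 ORS}
         END {printf "%s", saved}' 3>&-
} 3>&1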

Here, awk holds all of cmd's standard output in memory while cmd writes its progress (stderr) directly to the script's stdout; awk prints the saved output only once it reaches end of input.
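
A minimal runnable sketch of this, assuming a hypothetical shell function named cmd as a stand-in for the real utility (it interleaves progress on stderr with data on stdout); the wrapper name run_buffered is likewise only illustrative:

```shell
#!/bin/sh
# Hypothetical stand-in for the real utility: progress on stderr,
# data on stdout, interleaved.
cmd() {
  echo 'progress 1/2' >&2
  echo 'data line 1'
  echo 'progress 2/2' >&2
  echo 'data line 2'
}

# Same construct as in the answer, wrapped in a (hypothetical) function.
run_buffered() {
  {
    cmd 2>&3 3>&- |
      awk '    {saved = saved $0 ORS}
           END {printf "%s", saved}' 3>&-
  } 3>&1
}

run_buffered
```

Both progress lines come out first, followed by both data lines, because awk prints nothing until cmd has exited and closed its end of the pipe.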

awk reads until the writing end of the pipe is closed. Normally, that only happens when cmd (and any processes it forks that still hold a file descriptor to the pipe) finishes. If, for some reason, cmd explicitly closes its stdout and later writes some more progress to stderr, that extra progress could end up after the normal output. You could work around that by replacing cmd with (cmd; exit): awk would then also wait for that subshell, which also has its stdout open on the pipe, and the subshell in turn waits for cmd to finish (and reports its exit status with exit).

But that should not be necessary with a well-behaved cmd. It would also not address the case where cmd forks (and doesn't wait for) a child process whose stdout is redirected away from the pipe: that child could write to its stderr long after awk, or even the script, has finished (probably a more likely scenario than a command that explicitly closes its stdout).
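
The (cmd; exit) workaround can be sketched as follows. The misbehaving utility is simulated with a hypothetical sh -c one-liner that writes its data, closes its stdout, and only then, a second later, reports more progress on stderr; the variable and function names are illustrative only:

```shell
#!/bin/sh
# Hypothetical misbehaving utility: data first, then it closes its stdout,
# then it writes more progress to stderr.
misbehaving='echo data; exec >&-; sleep 1; echo "late progress" >&2'

buffered_with_wait() {
  {
    # The ( ...; exit ) subshell keeps its own copy of the pipe open while
    # it waits for sh -c, so awk only sees end of input once sh -c is done.
    (sh -c "$misbehaving"; exit) 2>&3 3>&- |
      awk '    {saved = saved $0 ORS}
           END {printf "%s", saved}' 3>&-
  } 3>&1
}

buffered_with_wait
```

This prints "late progress" before "data". With a plain sh -c "$misbehaving" in place of the subshell, awk would typically see end of input as soon as the data is written, so "data" would come out first.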

If cmd's output is not text, note that not all awk implementations can cope with NUL bytes or very long lines, and that a newline character is added at the end if the input did not already end with one.
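
The added newline is easy to observe. A quick check, using a hypothetical helper buffer_with_awk that runs the same awk program on input whose last line is unterminated:

```shell
#!/bin/sh
# Same awk program as above, wrapped in a (hypothetical) helper.
buffer_with_awk() {
  awk '    {saved = saved $0 ORS}
       END {printf "%s", saved}'
}

# 13 bytes in: "line 2" has no terminating newline.
printf 'line 1\nline 2' | wc -c
# 14 bytes out: awk appended ORS ("\n") after the last record too.
printf 'line 1\nline 2' | buffer_with_awk | wc -c
```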

The POSIX toolchest doesn't have any command that can store arbitrary amounts of binary data in memory and display it later.

If perl is available, you can replace the awk command with just perl -0777 -pe ''.

Alternatively, instead of holding it in memory, you could store the output in a temporary file, which would address the binary-output issue and would likely scale better to larger outputs.

Unfortunately, the only POSIX way to create a temporary file reliably is the m4 utility's mkstemp() macro, but m4 (even though POSIX mandates it) is not always installed on production systems these days. You're probably more likely to find perl than m4.

In any case, that could be:

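die() {
  # Print any arguments to stderr as an error message, then fail.
  [ "$#" -eq 0 ] || printf >&2 '%s\n' "$@"
  exit 1
}

# m4's mkstemp() creates a new, uniquely named file from the template
# and expands to its name (or to nothing on failure).
tmpdir=${TMPDIR:-/tmp}
tmpfile=$(
  echo 'mkstemp(TEMPLATE)' |
    m4 -D "TEMPLATE=${tmpdir%/}/XXXXXXX"
) && [ -n "$tmpfile" ] || die 'Cannot get a temp file'

{
  # The file is already open on fd 3 (for writing) and fd 4 (for reading),
  # so its name can be removed now.
  rm -f -- "$tmpfile" || die "Cannot remove $tmpfile"
  # Progress (stderr) goes to the script's stdout; data goes to the file.
  cmd 2>&1 >&3 3>&- 4<&-
  # Replay the buffered data from the start of the file.
  cat <&4
} 3> "$tmpfile" 4< "$tmpfile"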

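Here, the temporary file is unlinked after it has been opened but before cmd runs, which is a neat way to handle clean-up: the name disappears from the filesystem straight away, and the data remains accessible through the open file descriptors until they are closed.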
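
The unlink-after-open trick can be seen in isolation in the sketch below. It uses a hypothetical, predictable file name purely for demonstration; real temporary files should be created safely, as with m4's mkstemp() above:

```shell
#!/bin/sh
# Hypothetical, predictable name: for this demonstration only.
demo_file=${TMPDIR:-/tmp}/unlink-demo.$$

unlink_demo() {
  {
    rm -f -- "$demo_file"             # the name is gone...
    echo 'written after unlink' >&3   # ...but fd 3 still writes to the file
    [ -e "$demo_file" ] || echo 'name no longer exists'
    cat <&4                           # and fd 4 reads it back from offset 0
  } 3> "$demo_file" 4< "$demo_file"
}

unlink_demo
```

fd 3 and fd 4 are separate open file descriptions, so writing through fd 3 does not move fd 4's offset, and cat reads the data from the beginning.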

If you're targeting only GNU (remember "Linux" is not an OS, just a kernel found on a great variety of OSes, some of which don't even have a sh) and OpenBSD systems, then you should be able to use mktemp instead of m4 to create the temporary file.
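
On such systems, a sketch using mktemp in place of m4 could look like this. It assumes mktemp accepts an explicit template ending in X characters (GNU coreutils and OpenBSD mktemp both do), and again uses a hypothetical cmd function as a stand-in for the real utility; buffered_mktemp is an illustrative name:

```shell
#!/bin/sh
die() {
  [ "$#" -eq 0 ] || printf >&2 '%s\n' "$@"
  exit 1
}

# Hypothetical stand-in for the real utility.
cmd() {
  echo 'progress 1/2' >&2
  echo 'data line 1'
  echo 'progress 2/2' >&2
  echo 'data line 2'
}

buffered_mktemp() {
  tmpdir=${TMPDIR:-/tmp}
  # mktemp creates the file safely and prints its name.
  tmpfile=$(mktemp "${tmpdir%/}/XXXXXXXXXX") || die 'Cannot get a temp file'

  {
    rm -f -- "$tmpfile" || die "Cannot remove $tmpfile"
    cmd 2>&1 >&3 3>&- 4<&-   # progress to stdout, data into the file
    cat <&4                  # then replay the data
  } 3> "$tmpfile" 4< "$tmpfile"
}

buffered_mktemp
```

As with the awk version, both progress lines are printed first and the data follows once cmd has exited.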