Bash – How to combine bash command grouping and pipe status

How do I combine bash command grouping and pipe status?

This is an example command group:

{ tar -cf - my_folder 2>&1 1>&3 | grep -v "Removing leading" 1>&2; } 3>&1 | gzip --rsyncable > my_file.tar.gz

This is an example pipe status readout to go with the above:

[[ ${PIPESTATUS[*]} =~ [1-9] ]] && rm my_file.tar.gz

In this example, the command group keeps the mail spool free of tar's "Removing leading /" warnings, which cron would otherwise mail out because they land on stderr (Unix has no stdwarn, and tar lacks a quiet option), while still letting real errors pass through.

The pipe status readout makes sure that corrupt backup files are immediately removed, to prevent a later cleanup using a standard FIFO algorithm from removing older valid files.

But this example does not work: the pipe status contains [1 0], that is, the exit codes of grep and gzip, but not that of tar.
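A reduced illustration of the problem, as a sketch only: `false` and `true` are stand-ins (assumptions, not the real backup commands) for a failing tar, the grep filter and gzip. The outer PIPESTATUS has one entry per element of the outer pipeline, and the group's entry is the status of its last inner command.

```bash
# Stand-ins: `false` plays a failing tar, `true` plays grep and gzip.
{ false | true; } | true
outer=${PIPESTATUS[*]}              # statuses of the outer pipeline only
echo "tar failed:    $outer"

# The group reports the status of its last inner command (the filter).
{ true | false; } | true
outer_filter_fail=${PIPESTATUS[*]}
echo "filter failed: $outer_filter_fail"
```

In the first case the failure of the first inner command is not visible anywhere in the outer PIPESTATUS.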

One attempt I tried was this:

{ tar -cf - my_folder 2>&1 1>&3 | grep -v "Removing leading" 1>&2; GROUPSTATUS=${PIPESTATUS[*]}; } 3>&1 | gzip --rsyncable > my_file.tar.gz
[[ $GROUPSTATUS =~ [1-9] ]] && rm my_file.tar.gz

But GROUPSTATUS is empty upon leaving the group.

(Note that setting GROUPSTATUS to anything other than the pipe's status, e.g. a literal of some sort, verifies that the variable does in fact escape the command grouping scope under normal circumstances.)

I've also tried using return inside the group to deliver the first pipe component's exit code to the outside, but return inside a command group just yields an error message from bash.

Best Answer

When you execute a pipeline, each pipe-separated element is executed in its own process. Variable assignments only take effect in their own process. Under ksh and zsh, the last element of the pipeline is executed in the original shell; under other shells such as bash, each pipeline element is executed in its own subshell and the original shell just waits for them all to end. This is why GROUPSTATUS, assigned in the left-hand element of your pipeline, is empty once the pipeline has finished.

$ bash -c 'GROUPSTATUS=foo; echo GROUPSTATUS is $GROUPSTATUS'
GROUPSTATUS is foo
$ bash -c 'GROUPSTATUS=foo | :; echo GROUPSTATUS is $GROUPSTATUS'
GROUPSTATUS is 

In your case, since you only care about all the commands succeeding, you can make the status code flow up.

{ tar -cf - my_folder 2>&1 1>&3 | grep -v "Removing leading" 1>&2;
  ! ((PIPESTATUS[0])); } 3>&1 |
gzip --rsyncable > my_file.tar.gz;
if ((PIPESTATUS[0] || PIPESTATUS[1])); then rm my_file.tar.gz; fi
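To try the same pattern without touching real files, here is a minimal sketch. `fake_tar` and `run_pipeline` are hypothetical helpers introduced only for this illustration, and `cat > /dev/null` stands in for gzip.

```bash
# Hypothetical stand-in for tar: data on stdout, an ignorable warning on
# stderr, and an exit status chosen by the caller.
fake_tar() {
  echo "archive data"
  echo "Removing leading / from member names" >&2
  return "$1"
}

# Runs the pipeline and prints whether the output file should be kept.
run_pipeline() {
  { fake_tar "$1" 2>&1 1>&3 | grep -v "Removing leading" 1>&2
    ! ((PIPESTATUS[0])); } 3>&1 | cat > /dev/null
  if ((PIPESTATUS[0] || PIPESTATUS[1])); then echo "discard"; else echo "keep"; fi
}

ok=$(run_pipeline 0)
bad=$(run_pipeline 2)
echo "status 0: $ok, status 2: $bad"
```

The group's last command, `! ((PIPESTATUS[0]))`, succeeds exactly when the first inner command returned 0, so the status of grep no longer decides the group's status.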

If you want to get more than 8 bits of information out of the left side of a pipe, you can write to yet another file descriptor. Here's a proof-of-principle example:

! { { { tar …; echo $? >&4; } | …; } | { gzip …; echo $? >&4; }; } \
  4>&1 | grep -vxc '0'
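A runnable sketch of this idea, under stated assumptions: `producer` and `consumer` are hypothetical stand-ins for tar and gzip, `cat` plays the elided filter, and `all_ok` is a helper name made up for the example. In this sketch the `4>&1` redirection sits on a group enclosing both sides so that both `echo $? >&4` calls have fd 4 open, and `!` leads the whole pipeline because bash only accepts it there.

```bash
# Hypothetical stand-ins; each returns the exit status it is given.
producer() { echo "payload"; return "$1"; }
consumer() { cat > /dev/null; return "$1"; }

# Succeeds only if every status written to fd 4 is exactly "0".
all_ok() {
  ! { { { producer "$1"; echo $? >&4; } | cat; } |
      { consumer "$2"; echo $? >&4; }; } 4>&1 | grep -vxc '0' > /dev/null
}

all_ok 0 0 && first=pass  || first=fail
all_ok 3 0 && second=pass || second=fail
all_ok 0 1 && third=pass  || third=fail
echo "$first $second $third"
```

`grep -vxc '0'` selects every line that is not exactly `0`, so it succeeds when some status is non-zero; the leading `!` turns that into "all statuses were zero".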

Once you get data on standard output, you can feed it into a shell variable using command substitution, i.e. $(…). Command substitution reads from the command's standard output, so if you also meant to print things to the script's standard output, they need to temporarily go through another file descriptor. The following snippet uses fd 3 for things that eventually go to the script's stdout and fd 4 for things that are captured into $statuses.

{ statuses=$({ { { tar -v … >&3; echo tar $? >&4; } | …; } |
               { gzip …; echo gzip $? >&4; }; } 4>&1); } 3>&1
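A self-contained sketch of the capture, assuming hypothetical stand-ins `producer` (for `tar -v`) and `consumer` (for gzip), with `cat` as the elided filter. Here the assignment is wrapped in braces so that fd 3 is already open when the command substitution runs, and `4>&1` is placed on a group around both sides so that both status lines are captured.

```bash
# Hypothetical stand-ins: producer prints a listing and fails with status 2,
# consumer discards its input and succeeds.
producer() { echo "listing line"; return 2; }
consumer() { cat > /dev/null; return 0; }

# fd 3 carries output meant for the script's stdout, fd 4 carries statuses.
{ statuses=$({ { { producer >&3; echo "tar $?" >&4; } | cat; } |
               { consumer; echo "gzip $?" >&4; }; } 4>&1); } 3>&1

# Sort so the result does not depend on which side finishes first.
sorted=$(sort <<<"$statuses")
echo "$sorted"
```

The listing line reaches the script's stdout directly, while `$statuses` holds one labelled line per command, which can then be parsed at the top level.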

If you need to capture the output from different commands into different variables, I think there is no direct way even in “advanced” shells such as bash, ksh or zsh. Here are some workarounds:

  • Use temporary files.
  • Use a single output stream, with e.g. a prefix on each line to indicate its origin, and filter at the top level.
  • Use a more advanced language such as Perl or Python.
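
As a rough sketch of the temporary-file workaround: `stage_one` and `stage_two` are hypothetical pipeline stages invented for the example, and the file names are arbitrary. Each stage's stderr goes to its own file, which is read back into a separate variable once the pipeline has ended.

```bash
# Hypothetical stages whose diagnostic output should be captured separately.
stage_one() { echo "data"; echo "note from stage one" >&2; }
stage_two() { cat > /dev/null; echo "note from stage two" >&2; }

tmpdir=$(mktemp -d) || exit 1

# One file per stage; the pipeline itself is unchanged otherwise.
stage_one 2> "$tmpdir/one.err" | stage_two 2> "$tmpdir/two.err"

err_one=$(< "$tmpdir/one.err")
err_two=$(< "$tmpdir/two.err")
rm -rf "$tmpdir"

echo "one: $err_one"
echo "two: $err_two"
```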