SSH Shell Script – Why SSH Causes while Loop to Stop


I have finally managed to boil down a problem I have been struggling with for a few weeks. I use SSH with "authorized keys" to run commands remotely. All is fine except when I do it in a while loop. The loop terminates after completing any iteration with an ssh command.

For a long time I thought this was some kind of ksh weirdness, but I have now discovered that bash behaves identically.

Here is a small sample program that reproduces the problem. It is distilled from a larger implementation which takes snapshots and replicates them amongst the nodes in a cluster.

#!/bin/bash

set -x

IDTAG=".*zone"
MARKER="mark-$(date +%Y.%m.%d.%H.%M.%S)"
REMOTE_HOST=sol10-target
ZFSPARENT=rpool

ssh $REMOTE_HOST zfs list -t filesystem -rHo name,mounted $ZFSPARENT | grep "/$IDTAG    " > /tmp/actionlist

#for RMT_FILESYSTEM in $(cat /tmp/actionlist)
cat /tmp/actionlist | while read RMT_FILESYSTEM ISMOUNTED
do
   echo ${RMT_FILESYSTEM}@${MARKER}
   [ "$ISMOUNTED" = "yes" ] && ssh $REMOTE_HOST zfs snapshot -r ${RMT_FILESYSTEM}@${MARKER}
   echo Remote Command Return Code: $?
done

(Note there is a TAB character in the grep search expression as per the definition of the behaviour of the zfs list "-H" option.)

My sample setup has some ZFS filesystems where each of the "zones" has its root file system on a dataset named something like

POOL/zones/app1zone
POOL/zones/group2/app2zone

etc.

The above loop should create a snapshot for each of the selected datasets, but instead it operates only on the first one and then exits.

That the program finds the right number of datasets can easily be confirmed by checking the "/tmp/actionlist" file after the script exits.

If the ssh command is replaced by, for example, an echo command, then the loop iterates through all the input lines. Or my favourite – prepend "echo" to the offending command.

If I use a for loop instead then it also works, but due to the potential size of the list of datasets this could cause problems with the maximum expanded command line length.

I am now 99.999% sure that only those loops with ssh commands in them give me problems!

Note that the iteration in which the ssh command runs completes! It is as if the data piped into the while loop is suddenly lost… If the first few input lines don't trigger an ssh command, then the loop carries on until it actually runs the ssh command.

On my laptop where I am testing this I have two Solaris 10 VMs with only about two or three sample datasets, but the same is happening on the large SPARC systems where this is meant to go live, and there are many datasets.

Best Answer

ssh reads from standard input (it forwards it to the remote command), so it is most likely eating up the rest of your actionlist, leaving nothing for the next read. Try redirecting ssh's standard input from /dev/null:

ssh $REMOTE_HOST zfs snapshot -r ${RMT_FILESYSTEM}@${MARKER} </dev/null
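
The effect can be reproduced without any remote host. The following is a minimal sketch, assuming that `cat >/dev/null` is an acceptable stand-in for ssh (both read standard input until end of file). The names `greedy`, `broken_loop` and `fixed_loop` are made up for this illustration and are not part of the original script.

```shell
#!/bin/bash

# Hypothetical stand-in for ssh: reads standard input until EOF and discards it.
greedy() { cat >/dev/null; }

# The loop body shares the pipe with "read", so greedy swallows
# the remaining lines during the first iteration.
broken_loop() {
    printf 'one\ntwo\nthree\n' | while read -r item; do
        echo "saw $item"
        greedy
    done
}

# Same loop, but greedy gets /dev/null as its standard input,
# so the pipe is left alone for the next "read".
fixed_loop() {
    printf 'one\ntwo\nthree\n' | while read -r item; do
        echo "saw $item"
        greedy </dev/null
    done
}

echo "without redirection:"
broken_loop
echo "with </dev/null:"
fixed_loop
```

The first loop prints only "saw one", while the second prints all three items. For ssh specifically, the -n option also redirects its standard input from /dev/null and should have the same effect.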

As a general rule, when running commands that may interfere with standard input inside a while read-style loop, I like to wrap the whole loop body in braces and redirect its standard input from /dev/null:

cat /tmp/uuoc | while read RMT_FILESYSTEM ISMOUNTED
do {
    echo ${RMT_FILESYSTEM}@${MARKER}
    [ "$ISMOUNTED" = "yes" ] && ssh $REMOTE_HOST zfs snapshot -r ${RMT_FILESYSTEM}@${MARKER}
    echo Remote Command Return Code: $?
} < /dev/null; done
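
A different approach, not taken from the answer above, is to feed the list to the loop on a separate file descriptor so that commands in the body never see it. The sketch below assumes descriptor 3 is free. The function names `greedy` and `snapshot_loop` and the dataset names are hypothetical, and `greedy` again stands in for ssh.

```shell
#!/bin/bash

# Hypothetical stand-in for ssh: reads standard input until EOF and discards it.
greedy() { cat >/dev/null; }

# Read "name mounted" pairs from the file given as $1 via descriptor 3.
# Commands in the body keep the caller's standard input and cannot
# consume the list.
snapshot_loop() {
    while read -r fs mounted <&3; do
        [ "$mounted" = "yes" ] && greedy
        echo "processed $fs"
    done 3< "$1"
}

# Sample data in the same two-column shape as /tmp/actionlist.
list=$(mktemp)
printf 'rpool/zones/app1zone yes\nrpool/zones/app2zone no\nrpool/zones/app3zone yes\n' > "$list"

# Unrelated input on standard input is eaten by greedy, yet the loop
# still visits every line of the list.
printf 'unrelated input\n' | snapshot_loop "$list"

rm -f "$list"
```

Unlike the /dev/null redirection, this leaves standard input available to the loop body, which may or may not be what you want. When the script is run interactively, a command such as ssh would then read from the terminal.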