Bash – Cron only occasionally sends e-mail on output and errors

bashcrondebianlinuxshell

On Debian 8.1, I'm using a Bash feature to detect whether the stackoverflow.com website is reachable:

(echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable"

This is Bash-specific and will not work in sh, the default shell of cron.

If we, on purpose, try the script in sh, we get:

$ /bin/sh: 1: cannot create /dev/tcp/stackoverflow.com/80: Directory nonexistent

Hence, if I only put the following in my personal crontab (without setting SHELL to /bin/bash) via crontab -e, I expect that once per minute, the script will be executed, and I therefore expect to also get the above error sent per mail once per minute:

* * * * * (echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable"

And indeed, exactly as expected, we see from /var/log/syslog that the entry is executed once per minute:

# sudo grep stackoverflow /var/log/syslog
Aug 24 18:58:01 localhost CRON[13719]: (mat) CMD ((echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable")
Aug 24 18:59:01 localhost CRON[13723]: (mat) CMD ((echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable")
Aug 24 19:00:01 localhost CRON[13727]: (mat) CMD ((echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable")
...

During the last ~2 hours, this was executed more than 120 times already, as I can verify with piping the output to wc -l.

However, from these >120 times the shell command (to repeat: the shell command is invalid for /bin/sh) has been executed, I only got three e-mails:

The first one at 19:10:01, the second at 20:15:01, and the third at 20:57:01.

The content of all three mails reads exactly as expected and contains exactly the error message that is to be expected from running the script in an incompatible shell (on purpose). For example, the second mail I received reads (and the other two are virtually identical):

From mat@myhost.com  Mon Aug 24 20:15:01 2015
From: root@myhost.com (Cron Daemon)
To: mat@myhost.com
Subject: Cron  (echo >/dev/tcp/stackoverflow.com/80)&>/dev/null || echo "stackoverflow unreachable"
...

/bin/sh: 1: cannot create /dev/tcp/stackoverflow.com/80: Directory nonexistent`

From /var/log/mail.log, I see that these three mails were the only mails sent and received in the last hours.

Thus, where are the >100 additional mails we would expect to receive from cron due to the above output that is created by the erroneous script?

To summarize:

  1. Mail is configured correctly on this system, I can send and receive mails without problem with /usr/bin/sendmail.
  2. Cron is set up correctly, notices the task as expected and executes it precisely at the configured times. I have tried many other tasks and scheduling options, and cron executed them all exactly as expected.
  3. The script always writes output (see below) and we thus expect cron to send the output to me via mail for each invocation.
  4. The output is mailed to me only occasionally, and apparently ignored in most cases.

There are many ways to work around the obvious mistake that led to the above observations:

  1. I can set SHELL=/bin/bash in my crontab.
  2. I can create a heartbeat.sh with #!/bin/bash, and invoke that.
  3. I can invoke the script with /bin/bash -c ... within crontab.
  4. etc., all fixing the mistake of using a Bash-specific feature within sh.

However, all of this does not address the core issue of this question, which is that in this case, cron does not reliably send mails even though the script always creates output.

I have verified that the script always creates output by creating wrong.sh (which again on purpose uses the unsuitable /bin/sh shell, to produce the same error that cron should see):

#!/bin/sh
(echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable"

Now I can invoke the script in a loop and see if there ever is a case where it finishes without creating output. Using Bash:

$ while true; do [[ -n $(./wrong.sh 2>&1 ) ]]; echo  $?; done | grep -v 0

Even in thousands of invocations, I could not reproduce a case where the script finishes without creating output.

What may be the cause of this unpredictable behaviour? Can anyone reproduce this? To me, it looks like there may be a race condition where cron can miss a script's output, possibly primarily involving cases where the error stems from the shell itself. Thank you!

Best Answer

Upon further testing, I suspect the & is messing with your results. As you point out, &>/dev/null is bash syntax, not sh syntax. As a result, sh is creating a subshell and backgrounding it. Sure, the subshell's echo creates stderr, but my theory is that:

  1. cron is not catching the subshell's stderr, and
  2. the backgrounding of the subshell always completes successfully, thus bypassing your || echo ....

... causing the cron job to have no output and thus no mail. Based on my reading of the vixie-cron source, it would seem that the job's stderr and stdout would be captured by cron, but it must be getting lost by the subshell.

Test it yourself in a /bin/sh environment (assuming you do not have a file named 'bar' here):

(grep foo bar) &
echo $?
Related Question