On Debian 8.1, I'm using a Bash feature to detect whether the stackoverflow.com website is reachable:
(echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable"
This is Bash-specific and will not work in sh
, the default shell of cron
.
If we, on purpose, try the script in sh
, we get:
$ /bin/sh: 1: cannot create /dev/tcp/stackoverflow.com/80: Directory nonexistent
Hence, if I only put the following in my personal crontab (without setting SHELL
to /bin/bash
) via crontab -e
, I expect that once per minute, the script will be executed, and I therefore expect to also get the above error sent per mail once per minute:
* * * * * (echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable"
And indeed, exactly as expected, we see from /var/log/syslog
that the entry is executed once per minute:
# sudo grep stackoverflow /var/log/syslog Aug 24 18:58:01 localhost CRON[13719]: (mat) CMD ((echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable") Aug 24 18:59:01 localhost CRON[13723]: (mat) CMD ((echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable") Aug 24 19:00:01 localhost CRON[13727]: (mat) CMD ((echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable") ...
During the last ~2 hours, this was executed more than 120 times already, as I can verify with piping the output to wc -l
.
However, from these >120 times the shell command (to repeat: the shell command is invalid for /bin/sh
) has been executed, I only got three e-mails:
The first one at 19:10:01, the second at 20:15:01, and the third at 20:57:01.
The content of all three mails reads exactly as expected and contains exactly the error message that is to be expected from running the script in an incompatible shell (on purpose). For example, the second mail I received reads (and the other two are virtually identical):
From mat@myhost.com Mon Aug 24 20:15:01 2015 From: root@myhost.com (Cron Daemon) To: mat@myhost.com Subject: Cron (echo >/dev/tcp/stackoverflow.com/80)&>/dev/null || echo "stackoverflow unreachable" ... /bin/sh: 1: cannot create /dev/tcp/stackoverflow.com/80: Directory nonexistent`
From /var/log/mail.log
, I see that these three mails were the only mails sent and received in the last hours.
Thus, where are the >100 additional mails we would expect to receive from cron due to the above output that is created by the erroneous script?
To summarize:
- Mail is configured correctly on this system, I can send and receive mails without problem with
/usr/bin/sendmail
. - Cron is set up correctly, notices the task as expected and executes it precisely at the configured times. I have tried many other tasks and scheduling options, and cron executed them all exactly as expected.
- The script always writes output (see below) and we thus expect cron to send the output to me via mail for each invocation.
- The output is mailed to me only occasionally, and apparently ignored in most cases.
There are many ways to work around the obvious mistake that led to the above observations:
- I can set
SHELL=/bin/bash
in mycrontab
. - I can create a
heartbeat.sh
with#!/bin/bash
, and invoke that. - I can invoke the script with
/bin/bash -c ...
withincrontab
. - etc., all fixing the mistake of using a Bash-specific feature within
sh
.
However, all of this does not address the core issue of this question, which is that in this case, cron
does not reliably send mails even though the script always creates output.
I have verified that the script always creates output by creating wrong.sh
(which again on purpose uses the unsuitable /bin/sh
shell, to produce the same error that cron
should see):
#!/bin/sh (echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable"
Now I can invoke the script in a loop and see if there ever is a case where it finishes without creating output. Using Bash:
$ while true; do [[ -n $(./wrong.sh 2>&1 ) ]]; echo $?; done | grep -v 0
Even in thousands of invocations, I could not reproduce a case where the script finishes without creating output.
What may be the cause of this unpredictable behaviour? Can anyone reproduce this? To me, it looks like there may be a race condition where cron can miss a script's output, possibly primarily involving cases where the error stems from the shell itself. Thank you!
Best Answer
Upon further testing, I suspect the
&
is messing with your results. As you point out,&>/dev/null
is bash syntax, not sh syntax. As a result,sh
is creating a subshell and backgrounding it. Sure, the subshell'secho
creates stderr, but my theory is that:|| echo ...
.... causing the cron job to have no output and thus no mail. Based on my reading of the vixie-cron source, it would seem that the job's stderr and stdout would be captured by cron, but it must be getting lost by the subshell.
Test it yourself in a /bin/sh environment (assuming you do not have a file named 'bar' here):