How to know whether a cron job was killed or crashed


Usually when a cron job crashes, it leaves some error messages in the log.

We run shell scripts and some Java programs as cron jobs. Recently we found something odd in the log. The program evidently either crashed or was killed, because a lock that it sets when it starts up was never released. We suspect it was killed, because its log does not show the finish message.

Who could have killed the job, and how can I get notified by email when a cron job dies?

EDIT: I don't want to rely on crontab's built-in email, because it simply mails all standard output. In my case that includes a lot of unrelated output from different programs, since some of them don't use log4j and others print messages with echo in shell scripts. There are many users on the system, so we can't require all of them to manage their programs' standard output.

Best Answer

To debug this you can put

set -e -u

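at the top of your shell script. The script then exits with a non-zero status as soon as a command fails (-e) or an unset variable is referenced (-u). Note that -e does not apply to commands whose status is being tested, for example inside an if condition or on the left side of && or ||.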
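A minimal sketch of what set -e -u does. Each ( ... ) subshell stands in for a script that starts with set -e -u; the function names errexit_demo and nounset_demo are hypothetical and only used for illustration. The exact non-zero status for the unset-variable case differs between shells (bash and dash do not agree), so only "non-zero" should be relied on.

```shell
#!/bin/sh
# Each ( ... ) subshell plays the role of a script beginning with `set -e -u`.
# Call these functions directly, not inside if, && or ||, because -e is
# ignored in those contexts.

# -e: the first failing command ends the script with that command's status.
errexit_demo() {
  ( set -e -u
    echo "step 1"
    false                      # fails with status 1, so the subshell stops here
    echo "never printed" )
}

# -u: expanding an unset variable is an error and ends the script.
nounset_demo() {
  ( set -e -u
    unset NOT_SET_ANYWHERE
    echo "step 1"
    echo "$NOT_SET_ANYWHERE"   # error: variable is not set, the subshell exits
    echo "never printed" ) 2>/dev/null   # hide the shell's error message
}

errexit_demo
echo "errexit_demo exited with status $?"
nounset_demo
echo "nounset_demo exited with status $?"
```

In both cases "never printed" never appears, which is exactly the behaviour you want from a cron script: stop at the first problem instead of carrying on in a broken state.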

Then have the cron job call a wrapper script that runs the main script like this:

sh -x main_script.sh || echo Failed with exit status: $?

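With -x, each command is printed to standard error before it is executed. The cron daemon mails the job's output (standard output and standard error) to you, or to the address in MAILTO if that is set in the crontab.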
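The wrapper line above can be turned into a small reusable function. This is a sketch: run_traced is a hypothetical name, and it assumes the path of the main script is passed as its only argument. It relies on the fact that immediately after ||, $? still holds the exit status of the command on the left, and it returns that same status so the wrapper does not hide the failure.

```shell
#!/bin/sh
# run_traced (hypothetical helper): run a script with tracing and report
# a non-zero exit status. Everything printed here ends up in cron's mail.
run_traced() {
  # -x makes sh print each command to stderr before running it.
  # Right after ||, $? still holds the exit status of sh -x "$1".
  sh -x "$1" || {
    status=$?
    echo "Failed with exit status: $status"
    return "$status"
  }
}
```

In the wrapper that cron calls, you would write something like run_traced /path/to/main_script.sh followed by exit $? so the wrapper itself exits with the main script's status.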

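You can also write the output to a temporary file when it is too big to mail. Save the exit status immediately, because every command, including the [ test, overwrites $?:

TEMPFILE=$(mktemp)
sh -x main_script.sh > "$TEMPFILE" 2>&1
status=$?
if [ "$status" -ne 0 ]; then echo "Failed with exit status $status - see $TEMPFILE"; fi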
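Since the question asks for mail only when something goes wrong, the temp-file approach can be made completely silent on success. Most cron implementations send mail only when a job produces output, so a wrapper that prints nothing on success mails you only about failures, regardless of how noisy the main script is. This is a sketch: run_quiet_unless_failed is a hypothetical name, the script path is assumed to be passed as the argument, and the choice to include the last 20 lines of the trace is arbitrary.

```shell
#!/bin/sh
# run_quiet_unless_failed (hypothetical helper): keep the trace in a temp file
# and print something only when the script fails.
run_quiet_unless_failed() {
  log=$(mktemp) || return 1
  sh -x "$1" >"$log" 2>&1
  status=$?                  # save it now: the [ ... ] below overwrites $?
  if [ "$status" -ne 0 ]; then
    echo "Failed with exit status $status - see $log"
    tail -n 20 "$log"        # put the end of the trace into the mail
  else
    rm -f "$log"             # success: nothing to report, discard the trace
  fi
  return "$status"
}
```

All of the main script's own output, including echo output from shell scripts and anything not going through log4j, is captured in the log file, so it only reaches your mailbox when the run actually failed.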

If the exit status is greater than 128, the command was terminated by a signal, and the signal number is the exit status minus 128. For example, someone killed it (kill -9 gives 137), it hit a segmentation fault (139), or the kernel's out-of-memory killer terminated it (which also sends SIGKILL, so 137).
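A sketch of turning the exit status into a readable reason, for example in the wrapper's failure message. describe_exit_status is a hypothetical helper name; it uses the POSIX kill -l option, which prints the name of a signal given its number. Keep in mind that a program can also exit with a value above 128 deliberately (exit 130, say), so this is a strong hint rather than proof that a signal was involved.

```shell
#!/bin/sh
# describe_exit_status (hypothetical helper): explain a shell exit status.
# By shell convention, a status above 128 means "terminated by signal (status - 128)".
describe_exit_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "succeeded"
  elif [ "$status" -gt 128 ]; then
    signo=$((status - 128))
    # kill -l NUMBER prints the signal's name, e.g. KILL for 9
    echo "killed by signal $signo ($(kill -l "$signo"))"
  else
    echo "failed with exit status $status"
  fi
}

# Demo: a child shell sends SIGKILL to itself, as kill -9 or the OOM killer would.
sh -c 'kill -9 $$'
describe_exit_status $?    # prints "killed by signal 9 (KILL)"
```

A SIGKILL can't be trapped by the killed process, which is why a Java program killed this way never gets to release its lock or write its finish message; the parent (the wrapper) is the only place that can still observe and report it.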
