Postgresql – What to look for in bad pg_dump log

dumpmonitoringpg-dumppostgresql

We want to programmatically detect errors in cron scheduled pg_dumps.

Apart from cheking whether or not the log file ends in "pg_dump: saving database definition":

What other tell-tale string can I grep for in order to programatically check a dump is ok.

Best Answer

The main indicator is the exit status of the pg_dump command. If it's non-zero, then something went wrong otherwise it worked all along. This is the implicit contract between any command and the shell, and breaching it would be a bug. And it's also testable when the sub-command could't be launched at all.

Here's a basic skeleton in Unix shell that tests the exit code and acts accordingly:

d=`date +%Y%m%d`
dump=/path/to/backups/daily-backup-"$d"
error=/path/to/backups/errors-backup-"$d".txt

pg_dump -U username dbname >$dump 2>$error
code=$?
if [ $code -ne 0 ]; then
  echo 1>&2 "The backup failed (exit code $code), check for errors in $error"
fi

It is necessary to keep the error messages in durable storage so that they can be analyzed later. When no error at all occurs, the file for errors should be there but empty, unless the -v option is used but it's intentionally not in the above script.

About the message: saving database definition

It is displayed along with lots of other informational messages only when using the verbose (-v) option. The problem is that in case of error, these messages will be mixed with error messages, and it requires some level of expertise to distinguish what's normal from what's not.