Postgresql – Am I supposed to let PG know in advance before I back up its datadir

backuppostgresql

I run PostgreSQL on Windows. Its data dir is located in "My documents". I often back up that entire dir tree (among others) with ROBOCOPY, which "robustly" copies the dir tree to local and external disks.

Since PG is running at the same time as my ROBOCOPY script, I just had the idea that maybe this is not safe? Should I "let PG know" that I'm about to back up/copy its data dir, and then let it know when I'm done? Or is that unnecessary? Maybe even not a "thing" at all?

It strikes me that it could theoretically be bad if PG is writing to its database files while ROBOCOPY is also copying the same files. But maybe this has been solved fully 30 years ago and I'm just still stuck in early 1990s computing thinking.

I also (separately) use pg_dump regularly. The idea of shutting down PG entirely many times a day is not an attractive one.

Best Answer

In order to create a usable online file system backup, you have to do one of the following:

  • Run pg_basebackup with -X stream (the default in later versions). That will do everything for you.

  • If you want to use this Robocopy, you have to use the low-level API:

    • enable WAL archiving by setting archive_mode and archive_command to archive WAL segments (you might use Robocopy there)

    • run SELECT pg_start_backup('label'); to get the database into backup mode

    • backup the data directory with Robocopy

    • end backup mode with SELECT pg_stop_backup();

    To recover, you wilk have to use restore_command

    Note that I showed you the comparatively simple deprecated "exclusive backup", the "non-exclusive backup" is somewhat more complicated.

Consult the documentation for details.