PostgreSQL pg_dump – What Data Gets Backed Up on a Live Server?

pg-dumppostgresql

I'm wondering: If I start doing a pg_dump of a very large database (it would take hours), that is still running actively receiving writes, what is then the last data that goes into the backup? Is it:

  1. The data as-it-was, at the point in time where the pg_dump command was initiated.
  2. The last changes that it encountered at some point where it was dumping that individual record.
  3. Something else.

Bonus question: If I'm trying to dump a database as part of trying to rescue it from corrupt data, will it then make any difference whether I use the directory format or the custom format?

Best Answer

It's the data at the start of the command for an entire database. According to the manpage:

It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users accessing the database (readers or writers).

and in SQL Dump:

Dumps created by pg_dump are internally consistent, meaning, the dump represents a snapshot of the database at the time pg_dump began running

Dumping in parallel (--jobs) may be problematic with changing data, but only when targeting less recent versions:

For a consistent backup, the database server needs to support synchronized snapshots, a feature that was introduced in PostgreSQL 9.2

I don't think the output format makes any difference in the rescue operation. Note that for a parallel dump, directory is the only possible format.