PostgreSQL pg_dump – What Data Gets Backed Up on a Live Server?

pg-dump postgresql

If I start a pg_dump of a very large database (one that takes hours to dump) while it is still actively receiving writes, what is the last data that ends up in the backup? Is it:

1. The data as it was at the point in time when the pg_dump command was started.
2. The state of each record at whatever moment pg_dump happened to dump that individual record.
3. Something else.

Bonus question: If I'm dumping a database in order to rescue it from corrupt data, does it make any difference whether I use the directory format or the custom format?

Best Answer

It's the data as it was when the command started, for the entire database. According to the man page:

It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users accessing the database (readers or writers).

and in the SQL Dump chapter of the documentation:

Dumps created by pg_dump are internally consistent, meaning, the dump represents a snapshot of the database at the time pg_dump began running
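
To get a feel for what "snapshot" means here, the sketch below opens a REPEATABLE READ transaction by hand, which is roughly the mechanism pg_dump relies on. The names mydb and some_table are placeholders, and RUN_DEMO is a hypothetical switch added for this example; by default the script only prints the SQL.

```shell
# SQL for a hand-made snapshot. "some_table" is a placeholder table name.
SNAPSHOT_SQL=$(cat <<'SQL'
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;
SELECT count(*) FROM some_table;  -- the first query fixes the snapshot
SELECT pg_sleep(30);              -- other sessions may write during this pause
SELECT count(*) FROM some_table;  -- still reports the first count
COMMIT;
SQL
)

# Only talk to a server when RUN_DEMO is set (hypothetical switch) and psql exists.
if [ -n "${RUN_DEMO:-}" ] && command -v psql >/dev/null 2>&1; then
    printf '%s\n' "$SNAPSHOT_SQL" | psql mydb   # "mydb" is a placeholder
else
    printf '%s\n' "$SNAPSHOT_SQL"
fi
```

If rows are inserted into some_table from another session during the pause, both counts inside the transaction should still match, which is the behaviour a long-running pg_dump depends on.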

A parallel dump (--jobs) can be inconsistent if the data changes while it runs, but only against older server versions:

For a consistent backup, the database server needs to support synchronized snapshots, a feature that was introduced in PostgreSQL 9.2

I don't think the output format makes any difference in the rescue operation. Note that for a parallel dump, directory is the only possible format.
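
A minimal sketch of a parallel dump in the directory format follows. The database name, output directory and job count are placeholders, and run is a hypothetical dry-run helper that only prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

DB_NAME="mydb"          # placeholder database name
OUT_DIR="mydb.dump.d"   # placeholder output directory (created by pg_dump)

# --format=directory is required for --jobs; 4 workers is an arbitrary choice.
run pg_dump --format=directory --jobs=4 --file="$OUT_DIR" "$DB_NAME"
```

Run it with DRY_RUN=0 to execute the dump for real.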

Related Solutions

Postgresql – pg_dump custom format between postgres versions

In my experience, you can use pg_dump's default plain-text format to back up and restore databases between versions without any issues. In this mode pg_dump simply generates a SQL file.

Example:

$ pg_dump mydb > db.sql

However, when using the custom format (-Fc):

$ pg_dump -Fc mydb > db.dump

You can only restore the dump file with a version of PostgreSQL that is the same or more recent. So if you create the dump file with PostgreSQL 8.1, you can only restore that file with pg_restore from version 8.1 or a more recent version of PostgreSQL.

Other components that the source database uses, such as plpgsql or postgis, must also exist on the target server.
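
As a sketch of the restore side, the commands below inspect and then load the custom-format file from the example above. The target database name is a placeholder, and run is a hypothetical dry-run helper that only prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

DUMP_FILE="db.dump"         # file written by: pg_dump -Fc mydb > db.dump
TARGET_DB="mydb_restored"   # placeholder name for the new database

# List the archive's table of contents to see which objects it contains.
run pg_restore --list "$DUMP_FILE"

# Create an empty target database, then load the archive into it.
run createdb "$TARGET_DB"
run pg_restore --dbname="$TARGET_DB" "$DUMP_FILE"
```

The listing step is a cheap way to spot dependencies on components such as postgis before attempting the restore on another server.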

Here is the documentation from Postgres on pg_dump.

Here is the relevant paragraph:

Because pg_dump is used to transfer data to newer versions of PostgreSQL, the output of pg_dump can be expected to load into PostgreSQL server versions newer than pg_dump's version. pg_dump can also dump from PostgreSQL servers older than its own version. (Currently, servers back to version 7.0 are supported.) However, pg_dump cannot dump from PostgreSQL servers newer than its own major version; it will refuse to even try, rather than risk making an invalid dump. Also, it is not guaranteed that pg_dump's output can be loaded into a server of an older major version — not even if the dump was taken from a server of that version. Loading a dump file into an older server may require manual editing of the dump file to remove syntax not understood by the older server.
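
The rule that pg_dump refuses servers newer than its own major version can be checked up front. The helper names major_key and can_dump below are hypothetical, and the sketch assumes plain numeric version strings such as 16.2 or 9.6.24 (no beta or distribution suffixes).

```shell
# Turn a version string into a comparable integer for its major version.
# From version 10 on, the major version is the first number; before that,
# it is the first two numbers (for example 9.6).
major_key() {
    first=${1%%.*}
    rest=${1#*.}
    if [ "$first" -ge 10 ]; then
        echo $((first * 100))
    else
        second=${rest%%.*}
        echo $((first * 100 + second))
    fi
}

# can_dump <pg_dump version> <server version>
# Succeeds when the server's major version is not newer than pg_dump's.
can_dump() {
    [ "$(major_key "$2")" -le "$(major_key "$1")" ]
}

# One possible way to obtain the real values (output formats vary by platform):
#   pg_dump --version | awk '{print $3}'
#   psql -At -c 'SHOW server_version'
if can_dump "16.2" "14.9"; then
    echo "pg_dump 16.2 can dump a 14.9 server"
fi
```

This only mirrors the dump-side check; whether the resulting file loads into an older server is a separate question, as the quoted paragraph notes.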

PostgreSQL Backup – Fix pg_dump Memory Allocation Error

After a lot of help from the PostgreSQL IRC channel, it was determined that the database was corrupted. The most likely cause was a hardware error on my ageing PC; a PostgreSQL bug is possible but much less likely. I'm going to have to start again.
