PostgreSQL – Running pg_dump on a Hot Standby Server

backuppostgresqlreplication

Disclaimer: I admittedly haven't tried this yet, but I'm not sure I would know if it wasn't working correctly, so I wanted to ask.

I would like to run a nightly backup job (via pg_dumpall) from a hot standby server running streaming replication, to avoid putting that load on the primary. I've only seen mention of some gotchas people have run into, e.g. here and here, but very little guidance. It's okay if the backup lags behind the primary slightly, as long as it's consistent (which it should be).

My questions are:

Do I really want to do this, or should the backup be done on the primary server? Why?
When doing a dump on the standby, what settings do I need and procedure should I use to do that correctly? e.g. must I stop replication for the duration of the backup?

Best Answer

AFAIK, running pg_dump on a hot standby is one of the major things standbys are useful for. It's perfectly safe, though it isn't perfectly reliable - the dumps can fail if the standby aborts the transaction when it's falling too far behind the master.

The only thing you really need to watch is to make sure the standby is current and is keeping up. If the standby lost its connection to master and fell too far behind, you don't want to be merrily backing up a three week out-of-date standby.

You will need to allow the standby to fall quite far behind the master during the backup, since it'll otherwise have to cancel your pg_dump transaction in order to continue replaying WAL. See the documentation on hot standby, particularly the "handling query conflicts" section, and the max_standby_archive_delay and max_standby_streaming_delay parameters.

Note that the master must be willing to keep enough WAL archives to allow the slave to catch up again.

Related Solutions

Postgresql 9.0 adding new database to streaming replication

Yes, you can restore the backupfile to your hot-standby. But there are some detail you should notice. The following are steps I suggest

create your database user on primary node
create tablespace directory both on the primary node and standby node. (If you don't create the same directory on standby node, then after you create a database on primary node, standby PostgreSQL server will down. So, take care!)
create tablespace on primary node
create database on primary node. If a same name was automatlly created on the standby node then all is okey.
restore the backup file to your primary database. then the standby database will replicate the data from the primary db.

those steps will have an impact on performace, so take care.

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

After the rsync, did you really run what you show?:

pgsql -c "select pg_stop_backup();";

Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:

Log: consistent recovery state reached at 0/BF0000B0

and:

Log: streaming replication successfully connected to primary

Did you try connecting to the slave at this point? What happened?

The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:

The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands. After I receive this error, I kill all processes and again try to restart the database...

Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:

log_line_prefix = '[%m] %p %q<%u %d %r> '

The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?

Best Answer

Related Solutions

Postgresql 9.0 adding new database to streaming replication

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

Related Question