PostgreSQL – How to Perform Live Backup

Tags: backup, index-tuning, postgresql, postgresql-9.3, replication

Goal: Maintain a constant live backup of a Postgres cluster. If the cluster fails, the backup (Secondary) will hold the most up-to-date view of the Primary cluster. This is not a hot standby scenario. Once the Primary Postgres machine has been fixed or replaced, the new machine's Postgres cluster will be restored from the backup.

Current direction: Using Postgres 9.3 streaming replication to a remote Postgres server. (Not yet decided whether it will be synchronous or asynchronous; that depends on the consistency requirements of the system.)

Question: Is this a reasonable solution?

Given this solution, is there any way to tune the Secondary/Slave postgres cluster? For instance, since the backup server won't be responding to queries it doesn't need indices. Can I somehow disable these for the Secondary cluster?

In what ways can I save on disk space and memory on my Secondary/Slave cluster, given that it's only serving as a 'dumb' backup?

Best Answer

Given this solution, is there any way to tune the Secondary/Slave postgres cluster? For instance, since the backup server won't be responding to queries it doesn't need indices. Can I somehow disable these for the Secondary cluster?

In what ways can I save on disk space and memory on my Secondary/Slave cluster, given that it's only serving as a 'dumb' backup?

If you do only logical backups (pg_dump), you'll save lots of space - but you'll have much larger data loss windows.

Currently your only other options are WAL archiving and streaming replication, and you're right that they have some big overheads.

You cannot exclude indexes or unwanted tables from the replication stream. Nor can you select only some databases to replicate. It's an all-or-nothing prospect. The only exception is unlogged tables - their contents don't get replicated, and neither do the contents of their indexes. So you can use unlogged tables for transient data on the primary to save some I/O, WAL space, and disk space.

There are some logical streaming systems like Londiste and Slony-I, but they're only asynchronous. So you can have a potentially unbounded data loss window if the replica isn't keeping up with the primary.

There's ongoing work on adding synchronous streaming logical replication to PostgreSQL via the BDR project, but it won't be in 9.4 and might not be in 9.5 either, so it's a long-term thing.

Thankfully, a physical replica server has quite low CPU overheads and its disk I/O overhead is usually only moderate. So long as you throw enough spare disk space at it, it won't unduly burden your other application. There isn't really much you can do to tune it, though spreading checkpoints further apart on the master can help reduce writes on the replica.

One option you may wish to consider is to sacrifice recovery time for lower overheads. Use WAL archiving with an archive_timeout on the master and monitor log shipping closely. Set up alerts to warn you urgently if logs stop arriving at the destination storage. Then just take a pg_basebackup and rely on WAL archiving to keep what is effectively an asynchronous replica.

It'll take a lot longer to start up if you ever need to restore it, and it needs a ton of disk space, but it has near-zero CPU and RAM overhead and does very efficient sequential disk writes that the I/O subsystem barely notices. You can also have it write to any storage you feel like, not necessarily the spare server.

The key thing with this is that you must test restores carefully: make sure you can deal with the long recovery times (and measure them for your data), ensure your base backup procedures are correct, rotate your base backups regularly, etc. I recommend PgBarman, which automates a lot of this for you.

You must understand, though, that WAL archiving is asynchronous. There's always a data loss window of up to archive_timeout at any time, and it can get a lot bigger if your WAL archiving fails due to (e.g.) running out of disk space, as the master server will not stop or pause if the replication stops.
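
The "alert if logs stop arriving" check could be sketched roughly as follows. This is only an illustration in Python, not part of PostgreSQL or PgBarman: the function names, the slack factor, and the idea of judging freshness by file modification time are all assumptions. Note also that an idle master may legitimately archive nothing for a while, so a stale result is a hint to investigate rather than proof of failure.

```python
import os
import time


def newest_wal_age_seconds(archive_dir, now=None):
    """Return the age in seconds of the most recently modified file in
    archive_dir, or None if the directory holds no files."""
    now = time.time() if now is None else now
    mtimes = [
        entry.stat().st_mtime
        for entry in os.scandir(archive_dir)
        if entry.is_file()
    ]
    if not mtimes:
        return None
    return max(0.0, now - max(mtimes))


def archive_is_stale(archive_dir, archive_timeout, slack=2.0, now=None):
    """Return True when no file arrived within slack * archive_timeout seconds.

    archive_timeout is the value (in seconds) configured on the master;
    slack is a hypothetical tolerance factor for transfer delays.
    An empty archive counts as stale, since nothing has arrived at all.
    """
    age = newest_wal_age_seconds(archive_dir, now=now)
    return age is None or age > slack * archive_timeout
```

A cron job or monitoring agent on the destination storage could call archive_is_stale() with the archive directory and the master's archive_timeout, and raise an alert whenever it returns True.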

So for safety, stick with synchronous streaming replication. If you need to, buy a better server.
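
Whichever mode you pick, it helps to know how far the replica is behind. WAL locations are reported as text of the form high/low in hexadecimal (for example 0/BF0000B0), and the byte distance between two of them is a simple subtraction. The sketch below is illustrative only: the function names are made up here, and it assumes you have already fetched the two locations yourself, for instance from pg_current_xlog_location() and the pg_stat_replication view on a 9.3 master.

```python
def lsn_to_int(lsn):
    """Convert a textual WAL location such as '0/BF0000B0' to a byte position.

    The part before the slash is the high 32 bits, the part after it the
    low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn, replica_lsn):
    """Return how many bytes of WAL the replica is behind the primary.

    Clamped at zero in case the two values were sampled at slightly
    different moments.
    """
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn))
```

For example, lag_bytes("1/00000000", "0/FFFFFF00") gives 256. The server can do the same arithmetic for you with pg_xlog_location_diff(), available since 9.2; the Python version is only useful when the comparison happens in an external monitoring script.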

Related Solutions

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

After the rsync, did you really run exactly what you show?

pgsql -c "select pg_stop_backup();";

Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:

Log: consistent recovery state reached at 0/BF0000B0

and:

Log: streaming replication successfully connected to primary

Did you try connecting to the slave at this point? What happened?

The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:

The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands. After I receive this error, I kill all processes and again try to restart the database...

Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:

log_line_prefix = '[%m] %p %q<%u %d %r> '
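
With that prefix each line starts with a bracketed timestamp and the process ID, and session processes additionally log user, database and remote host inside angle brackets (everything after %q is omitted for background processes). As a rough illustration of why this helps when reading logs, here is a small Python sketch that splits such lines into fields. The regular expression, the function name and the sample lines are assumptions made for this example, not something taken from the logs in the question.

```python
import re

# Matches lines written with: log_line_prefix = '[%m] %p %q<%u %d %r> '
# The <user database remote> part is optional because of %q.
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"(?P<pid>\d+) "
    r"(?:<(?P<user>\S+) (?P<database>\S+) (?P<remote>\S+)> )?"
    r"(?P<severity>[A-Z0-9]+):\s+(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match
    (for example, a continuation line of a multi-line message)."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


if __name__ == "__main__":
    # Hypothetical sample line, shaped like the prefix above would produce.
    sample = ("[2013-10-01 12:34:56.789 UTC] 4242 "
              "<alice appdb 10.0.0.5(54321)> "
              "FATAL:  the database system is starting up")
    print(parse_log_line(sample))
```

Grouping the parsed lines by pid makes it much easier to see which process reported which message during startup and recovery.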

The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?

Postgresql – Streaming Replication Failover – how to point second slave at new master

In PostgreSQL 9.2, you should be able to just repoint server C to B by changing its recovery.conf file, provided you are also using log archiving (and have a restore_command defined in your recovery.conf). It should work automatically, but it does require the log archive to work: machine B will send critical information about timeline switching to the log archive, which machine C will then replay.

PostgreSQL 9.3 will be able to deal with this repointing of the slave over pure streaming replication.
