PostgreSQL – Shipping Big WAL Archives to Hot Standby

postgresqlpostgresql-9.4write-ahead-logging

I'm looking to extend my PostgreSQL 9.4 master with a few slaves in hot standby read-only for load balancing.

The idea would be to update the slaves only at defined times (once every 24/48 hours) to avoid migration issues with the application server code and also because the "freshness" of the slaves is not so important.

What would be the effect of suddenly introducing a 1-2 GB of WAL archives to the WAL restore folder on the slave? Would there be a big performance effect on the incoming queries to the slave? Would the slave be available for queries while the WAL logs are restored into the DB?

Alternatively, I've been thinking about copying the whole data folder over from the master with rsync. The negatives I see here are that the required disk space on the slave is 2 x the data folder (extra non-running copy needed to do rsync on), and that the DB server on the slaves has to be restarted to switch to the new data folder.

Best Answer

First, in case there's an implicit misconception here: Note that the restore is not atomic. Your queries on the replica will see the intermediate states between the prior and new master state. It'll happen faster than it did on the master - assuming the replica replays reasonably fast - but clients will still see it and there's no way around that.

If you're trying to give clients an apparently atomic switch between old and new states you can't do it just by copying a bunch of WAL over.

What would be the effect of suddenly introducing a 1-2 GB of WAL archives to the WAL restore folder on the slave?

Minimal. The only significant effect will be the disk I/O of writing it as you copy it.

Would there be a big performance effect on the incoming queries to the slave?

Mild to moderate, probably mild. It'll have more impact than streaming them continuously would since the replica will be running restore flat-out so it'll churn the disk, kernel buffer cache and shared_buffers more.

Hard to say in detail, depends on workload, RAM, disk subsystem, etc.

Would the slave be available for queries while the WAL logs are restored into the DB?

Yes.

Another alternative you may not have considered. It still needs 2x the disk, but:

Run pgbouncer in front of the replica.
pg_basebackup the replica locally to a temporary datadir. Yes, you can take a pg_basebackup from a replica.
Copy the WAL to the copied replica.
Start it, wait for it to catch up.
When it's caught up, change pgbouncer's configuration to point to the new DB and sighup it to get it to reload its configuration.
Shut down the old replica

... and repeat, switching between the two datadirs.

Related Solutions

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

After the rsync, did you really run what you show?:

pgsql -c "select pg_stop_backup();";

Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:

Log: consistent recovery state reached at 0/BF0000B0

and:

Log: streaming replication successfully connected to primary

Did you try connecting to the slave at this point? What happened?

The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:

The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands. After I receive this error, I kill all processes and again try to restart the database...

Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:

log_line_prefix = '[%m] %p %q<%u %d %r> '

The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?

Postgresql streaming replication and wal archive

Further testing revealed that the different WAL files were mostly those that had been rotated but not yet written to. The new slave would try to archive these (possibly fixed in 9.2) but they wouldn't be used for recovery.

The other problem is the overwriting. We disabled the overwrite (instead storing with a new filename so we at least keep the WAL just in case) and everything works... both failover and later restore.

NOTE: my earlier answer here that was completely wrong. See the edit log if you need to see how wrong.

Best Answer

Related Solutions

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

Postgresql streaming replication and wal archive

Related Question