PostgreSQL hot standby promotion, multiple hot standbys

failover postgresql replication

I'm trying to work out failover and disaster recovery procedures for a
cluster of three servers. Streaming replication is in use with a high
wal_keep_segments; there is no log shipping. I need to avoid the
several hours it takes to rebuild a hot standby from scratch.

ServerA is the master.
ServerB is a streaming hot standby and preferred failover server.
ServerC is a streaming hot standby.

For a planned failover, maintenance on ServerA:

  1. Shut down ServerB and ServerC
  2. Shut down ServerA
  3. Copy pg_xlog from ServerA to ServerB and ServerC
  4. Reconfigure ServerB as master, start it up.
  5. Reconfigure ServerC as streaming hot standby of ServerB. Start it.
  6. After maintenance, reconfigure ServerA as streaming hot standby of
    ServerB. Start it.

For an unplanned failover, ServerA has exploded:

  1. Run 'SELECT pg_last_xlog_receive_location()' on ServerB and ServerC, determining which is most up to date.
  2. Shut down ServerB and ServerC
  3. If ServerC is more up to date, copy pg_xlog from ServerC to ServerB.
  4. Reconfigure ServerB as master, start it up.
  5. Reconfigure ServerC as streaming hot standby of ServerB, start it up.
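For step 1, the values returned by pg_last_xlog_receive_location() are text of the form 'hi/lo' with both halves in hexadecimal, so comparing them as plain strings can give the wrong answer. A minimal sketch of the comparison, assuming the two values have already been fetched from each standby (the helper names parse_lsn and most_advanced are made up for illustration):

```python
def parse_lsn(text):
    """Turn an xlog location such as '0/A000000' into a comparable tuple.

    The text form is two hexadecimal numbers separated by '/'.
    """
    hi, lo = text.strip().split("/")
    return (int(hi, 16), int(lo, 16))


def most_advanced(locations):
    """Return the server name whose receive location is furthest ahead.

    `locations` maps a server name to the text returned by
    SELECT pg_last_xlog_receive_location() on that server.
    On a tie, the first name in the mapping wins.
    """
    return max(locations, key=lambda name: parse_lsn(locations[name]))


if __name__ == "__main__":
    # Hypothetical values, as if read from the two standbys.
    seen = {"ServerB": "0/A000000", "ServerC": "0/10000000"}
    print(most_advanced(seen))
```

With these sample values a string comparison would pick ServerB, because 'A' sorts after '1', while the numeric comparison picks ServerC.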

Does this look correct to people?

Am I going to end up in trouble copying files into pg_xlog like this on a
busy system? With further reading, I suspect I need a recovery.conf on the new master with a restore_command pointing to an archive of the old master's WAL files (if available).

Is it overengineered? E.g., will a master ensure everything is streamed to
connected hot standbys before a graceful shutdown?

Best Answer

I don't think copying pg_xlog after you've lost the master is going to allow you to make ServerB or ServerC the new master. If you might need to remaster, you really need to add log-shipping archives as well (this is a current limitation in Postgres, at least through 9.2).

Also, if the master has been shut down cleanly, you don't need to copy anything.
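As a rough illustration of what adding a log-shipping archive might look like on a standby, here is a small sketch that renders a recovery.conf combining streaming replication with a restore_command. The host name, archive path and replication user are placeholders, not values from the question, and the archive is assumed to be a directory reachable from the standby with plain cp; adjust to however your archive is actually stored.

```python
def render_recovery_conf(primary_host, archive_dir, user="replicator", port=5432):
    """Build recovery.conf text for a hot standby (9.x-style settings).

    Assumes none of the arguments contain single quotes, which would
    need to be escaped inside the quoted setting values.
    """
    lines = [
        # Stay in recovery and keep following the primary.
        "standby_mode = 'on'",
        # Streaming connection to whichever server is currently the master.
        "primary_conninfo = 'host=%s port=%d user=%s'" % (primary_host, port, user),
        # Fall back to archived WAL files when streaming cannot supply them.
        "restore_command = 'cp %s/%%f %%p'" % archive_dir,
        # Follow the new timeline created when a standby is promoted.
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: ServerC following ServerB after a failover.
    print(render_recovery_conf("serverb", "/mnt/wal_archive"))
```

The recovery_target_timeline setting is included because a promoted standby starts a new timeline, and the remaining standby has to be told to follow it; whether that works without a rebuild depends on the archive containing the timeline history file.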