PostgreSQL – Postgres automatic failover, multiple clients to multiple masters

Tags: failover, postgresql

I'm looking into a way to provide multiple hot-spare PostgreSQL instances for our website.

The majority of implementations I have found introduce a single point of failure: a single Pgpool-II or PgBouncer instance proxying the connections from every app server to the correct database.

Example:

App Server 1 \              Spare
App Server 2 -  Pgpool-II - Master
App Server 3 /              Spare
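
For reference, that centralized layout amounts to a single Pgpool-II instance with one backend entry per database node, roughly like this (a sketch only; hostnames are placeholders, not a real config):

# pgpool.conf on the single shared proxy
listen_addresses = '*'
port = 5432

backend_hostname0 = 'db-master.example'   # current master
backend_port0 = 5432
backend_weight0 = 1

backend_hostname1 = 'db-spare1.example'   # hot spare
backend_port1 = 5432
backend_weight1 = 1

backend_hostname2 = 'db-spare2.example'   # hot spare
backend_port2 = 5432
backend_weight2 = 1

Every app server then points its connection string at this one proxy, which is exactly the single point of failure described above.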

I'm looking to try a method of automatic failover, where each app server fails over to a new master DB quickly and correctly. It can still use Pgpool-II or PgBouncer, but each app server should be separate and autonomous, with no message/state passing between app servers.

Example:

App Server 1 -  Pgpool-II \ Master 1
App Server 2 -  Pgpool-II - Master 2
App Server 3 -  Pgpool-II / Master 3
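
In this layout, each app server would carry its own Pgpool-II configuration with health checking enabled, so it can detect a dead master and run a failover command on its own, roughly like this (the promote script is hypothetical):

# pgpool.conf fragment, one independent copy per app server
backend_hostname0 = 'db1.example'
backend_port0 = 5432
backend_hostname1 = 'db2.example'
backend_port1 = 5432

health_check_period = 1   # probe the backends every second
health_check_user = 'pgpool'

# %d = failed node id, %h = failed host, %m = new master node id, %H = new master host
failover_command = '/usr/local/bin/promote_standby.sh %d %h %m %H'

One open problem with this design: since each app server decides independently, nothing prevents two of them from disagreeing about which node is the master.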

The maximum switchover time should be 1-2 seconds.
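
If detection is done with Pgpool-II health checks, that budget maps onto the health check parameters, since the worst-case detection time is roughly the probe interval plus the probe timeout (retry parameters, where available in your Pgpool-II version, add to this):

health_check_period = 1        # probe every second
health_check_timeout = 1       # an unanswered probe counts as failed after 1 second
health_check_max_retries = 0   # no retries, so worst case is ~2 seconds (period + timeout)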

  1. Does this make sense and is it possible?
  2. Is this easy to implement via some tool/configuration? If so, how? If not, why?

Best Answer

From my own experience with it, I can recommend Slony for PostgreSQL replication.

I can't respond better than the Slony docs' section on failover, so I'll quote it:

If some more serious problem occurs on the "origin" server, it may be necessary to SLONIK FAILOVER to a backup server. This is a highly undesirable circumstance, as transactions "committed" on the origin, but not applied to the subscribers, will be lost. You may have reported these transactions as "successful" to outside users. As a result, failover should be considered a last resort. If the "injured" origin server can be brought up to the point where it can limp along long enough to do a controlled switchover, that is greatly preferable.

Slony-I does not provide any automatic detection for failed systems. Abandoning committed transactions is a business decision that cannot be made by a database system. If someone wants to put the commands below into a script executed automatically from the network monitoring system, well ... it's your data, and it's your failover policy.
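
For completeness, the "commands below" that the docs mention come down to a short slonik script along these lines (cluster name, node ids, and conninfo strings here are made up):

cluster name = mycluster;
node 1 admin conninfo = 'host=db1 dbname=app user=slony';
node 2 admin conninfo = 'host=db2 dbname=app user=slony';

# promote node 2; anything committed on node 1 but not yet replicated is lost
failover (id = 1, backup node = 2);

# the failed origin cannot rejoin as-is; drop it and rebuild it as a fresh node
drop node (id = 1, event node = 2);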

To your question "does it make sense?": personally, I would say no. I agree with the Slony docs that it's better to have a human understand why there was a failure and then fail over manually (and quickly).

Having managed PostgreSQL for more than a decade, I've found that unless there's a hardware failure, it's difficult to make it seriously fail. I recommend involving a human to confirm the hardware failure and check the state of the system before failing over.

If you are getting started with high-availability PostgreSQL, I would first look at the hardware, tuning, and RAID configuration behind your database before considering automatic failover.