PostgreSQL – Postgres automatic failover, multiple clients to multiple masters

Tags: failover, postgresql

I'm looking into a way to provide multiple hot-spare PostgreSQL instances for our website.

The majority of implementations I have found introduce a single point of failure: a single Pgpool-II or PgBouncer instance proxying the connections from every app server to the correct database.

Example:

App Server 1 \              Spare
App Server 2 -  Pgpool-II - Master
App Server 3 /              Spare
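
For reference, that centralized layout amounts to a single Pgpool-II instance with one backend entry per database node, roughly like this (a sketch only; hostnames are placeholders, not a real config):

# pgpool.conf on the single shared proxy
listen_addresses = '*'
port = 5432

backend_hostname0 = 'db-master.example'   # current master
backend_port0 = 5432
backend_weight0 = 1

backend_hostname1 = 'db-spare1.example'   # hot spare
backend_port1 = 5432
backend_weight1 = 1

backend_hostname2 = 'db-spare2.example'   # hot spare
backend_port2 = 5432
backend_weight2 = 1

Every app server then points its connection string at this one proxy, which is exactly the single point of failure described above.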

I'm looking to try a method of automatic failover, where each app server fails over to a new master DB quickly and correctly. It can still use Pgpool-II or PgBouncer, but each app server should be separate and autonomous, with no message/state passing between app servers.

Example:

App Server 1 -  Pgpool-II \ Master 1
App Server 2 -  Pgpool-II - Master 2
App Server 3 -  Pgpool-II / Master 3
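
In this layout, each app server would carry its own Pgpool-II configuration with health checking enabled, so it can detect a dead master and run a failover command on its own, roughly like this (the promote script is hypothetical):

# pgpool.conf fragment, one independent copy per app server
backend_hostname0 = 'db1.example'
backend_port0 = 5432
backend_hostname1 = 'db2.example'
backend_port1 = 5432

health_check_period = 1   # probe the backends every second
health_check_user = 'pgpool'

# %d = failed node id, %h = failed host, %m = new master node id, %H = new master host
failover_command = '/usr/local/bin/promote_standby.sh %d %h %m %H'

One open problem with this design: since each app server decides independently, nothing prevents two of them from disagreeing about which node is the master.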

The maximum switchover time should be 1-2 seconds.
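
If detection is done with Pgpool-II health checks, that budget maps onto the health check parameters, since the worst-case detection time is roughly the probe interval plus the probe timeout (retry parameters, where available in your Pgpool-II version, add to this):

health_check_period = 1        # probe every second
health_check_timeout = 1       # an unanswered probe counts as failed after 1 second
health_check_max_retries = 0   # no retries, so worst case is ~2 seconds (period + timeout)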

  1. Does this make sense and is it possible?
  2. Is this easy to implement via some tool/configuration? If so, how? If not, why?

Best Answer

From my own experience with it, I can recommend Slony for PostgreSQL replication.

I can't respond better than the Slony docs' section on failover, so I'll quote it:

If some more serious problem occurs on the "origin" server, it may be necessary to SLONIK FAILOVER to a backup server. This is a highly undesirable circumstance, as transactions "committed" on the origin, but not applied to the subscribers, will be lost. You may have reported these transactions as "successful" to outside users. As a result, failover should be considered a last resort. If the "injured" origin server can be brought up to the point where it can limp along long enough to do a controlled switchover, that is greatly preferable.

Slony-I does not provide any automatic detection for failed systems. Abandoning committed transactions is a business decision that cannot be made by a database system. If someone wants to put the commands below into a script executed automatically from the network monitoring system, well ... it's your data, and it's your failover policy.
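
For completeness, the "commands below" that the docs mention come down to a short slonik script along these lines (cluster name, node ids, and conninfo strings here are made up):

cluster name = mycluster;
node 1 admin conninfo = 'host=db1 dbname=app user=slony';
node 2 admin conninfo = 'host=db2 dbname=app user=slony';

# promote node 2; anything committed on node 1 but not yet replicated is lost
failover (id = 1, backup node = 2);

# the failed origin cannot rejoin as-is; drop it and rebuild it as a fresh node
drop node (id = 1, event node = 2);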

To your question "does it make sense?": personally, I would say no. I agree with the Slony docs that it's better to have a human understand why there was a failure and then fail over manually (and quickly).

Having managed PostgreSQL for more than a decade, I've found that unless there's a hardware failure, it's difficult to make it seriously fail. I recommend involving a human to confirm the hardware failure and check the state of the system before failing over.

If you are getting started with high-availability PostgreSQL, I would first look at the hardware, tuning, and RAID configuration behind your database before considering automatic failover.