PostgreSQL failover and replication

failoverpostgresqlreplication

I'm evaluating PostgreSQL 9.1 and have few questions related to failover and replication details.

I have few test scenarios. First one with a Master server and few Slaves. In case Master crashes I want one of the Slaves to become a Master. After Master get back to normal state it should sync with other servers in cluster (apply all changes done while it was down) and claim back Master role or become a Slave.

The problems I see with PostgreSQL and current scenario are the following.

1) I don't see built-in tools for detecting Master server outage. I read that pgpool can handle it and create trigger file, I also read that people use Linux heartbeat or similiar tools for this. Okay, I can detect failover and assign a new Master in cluster. Will the other Slaves understand that there is a new Master and they should backup it now?

2) I don't understand the failback procedure. Master and Slave host configurations are different. So will I have two Masters after crashed Master failback? How will servers get back in sync? I saw only manual solutions like "transfer data folder to server and restart it". So what is the solution or best practice or at least key principal here?

3) How should I handle server outage on client side? When I create connection I explicitly specify server IP. Should I develop some kind of ConnectionManager which will know my Master-Slave structure, send requests to Master only and in case of connection loss will switch to backup servers and so on? I read that pgpool can be an entry point for applications and manage connections in correct way. Is pgpool the only solution here? Does it handle failover and failback well?

4) Are there any solutions (commercial also) so I could avoid manually copying the data, reconfiguring PostgreSQL instances and other stuff which should be done by hands? So kind of cluster configuration when everyone in sync, it's clear who is Master and everything switches automatically without operator attention?

According to these threads and articles

Streaming replication and failover on PostgreSQL

Automating failover in PostgreSQL 9.1

http://denishjpatel.blogspot.com/2010/11/possibility-of-graceful-switchover.html

there is no single fully automatic solution to solve these questions. Am I right?

Thanks!

Best Answer

  1. slaves wont understand new master. you should manually do that.
  2. yes they are different and you should create new ones for old master.however old standby will continue to work as a master but you should set max_wal_senders on that node. you should also set pg_hba.conf of the new master after failover. after failover (when nodes changes roles master->slave slave->master), you should transfer new wal files to new standby folders data directory which you set in recovery.conf file. or simply you can use rsync.

  3. may be you can use pgbouncer. this way you will just change the pgbouncer server adres to new master.

  4. EnterpriseDB has some commercial tools. may be you can check them.

and finally yes you are right. there is no single fully automatic solution to solve these questions.