Check out repmrg:
repmgr is a set of open source tools that helps DBAs and System
administrators manage a cluster of PostgreSQL databases..
By taking advantage of the Hot Standby capability introduced in
PostgreSQL 9, repmgr greatly simplifies the process of setting up and
managing database with high availability and scalability requirements.
repmgr simplifies administration and daily management, enhances
productivity and reduces the overall costs of a PostgreSQL cluster by:
- monitoring the replication process; allowing DBAs to issue high
- availability operations such as switch-overs and fail-overs.
It does two things:
- repmgr: command program that performs tasks on your cluster and then exits
- repmgrd: management and monitoring daemon that watches the cluster and can automate remote actions.
For automatic failover, repmgrd does the trick and is not a SPOF in your network, like pgPool. However, it is still important to monitor all deamons and bring them back up after failure.
Version 2.0 is about to be released, including RPM's.
If you are interested in synchronous DB Replication and Failover, I have a suggestion. You could look into using two tools:
DRBD (Disk Replicated Block Device) provides Disk Level Replication (synchronous) between disks on two different servers. The disk changes are replicated over the network. It is best to use a crossover cable on eth2 for the NICs using the 192.168.x.x netblock.
ucarp allows for DBVIP management and failover between two different servers.
You can use them in conjunction as follows:
- DBServer1 has IP 10.1.2.30
- DBServer2 has IP 10.1.2.40
- DB VIP is 10.1.2.70
Setup ucarp on two different servers in such a way that each ucarp instance communicates with the other along a virtual router ID
DBServer1 will have
/usr/local/sbin/ucarp -v 200 -p sometagname --interface=eth2 -a 10.1.2.70 -s 10.1.2.30 -b 3 -r 5 --upscript=/usr/local/sbin/vip-up.sh --downscript=/usr/local/sbin/vip-down.sh --neutral -z -B
DBServer2 will have
/usr/local/sbin/ucarp -v 200 -p sometagname --interface=eth2 -a 10.1.2.70 -s 10.1.2.40 -b 4 -r 5 --upscript=/usr/local/sbin/vip-up.sh --downscript=/usr/local/sbin/vip-down.sh --neutral -z -B
What's the reason -b (broadcasts) is 3 on one server and 4 on the other? In case both servers are restarted, the server with the lowest -b will bring ucarp first. As for -r, that is dead ratio. Thus, DBServer1 will come up in 15 seconds (3X5) and DBServer2 comes up in 20 seconds (4X5).
Let's say DBServer1 crashes. ucarp on DBServer2 will check for 20 seconds for a handshaking fro ucarp on DBServer1. If nothing is detected, the vip-up.sh script on DBServer2 will take control of the DBVIP.
OK all of this is just for failover and DBVIP management. What about the PostgreSQL database? Surprisingly, PostgreSQL is only running on one server at a time. Whoever has the DBVIP should have DRBD in Primary state. This ties back in to the vip-up script. How do you script it?
Here is /usr/local/sbin/vip-down.sh script (conceptual)
#! /bin/sh
exec 2> /dev/null
/sbin/drbdadm disconnect drbd0
/sbin/drbdadm primary drbd0 &&
/bin/mount postgres-data-folder /mnt/drbd0
/sbin/ip addr add 10.1.2.70/24 dev eth2
/sbin/service postgres start &&
/usr/local/sbin/vipmon.sh &
touch /tmp/vip-up
Here is /usr/local/sbin/vip-down.sh script (conceptual)
#! /bin/sh
exec 2> /dev/null
/sbin/service postgres stop
/bin/umount -l /dev/drbd0
/sbin/drbdadm secondary drbd0
/sbin/ip addr del 10.1.2.70/24 dev eth2
/bin/rm /tmp/vip-up
/usr/bin/killall -9 ucarp
/usr/local/sbin/vip-down.sh
kill `ps auxww | grep vipmon | awk '{print $2}'`
All you need to do is setup pg_hba.conf to make sure all users come via 10.1.2.70
Essentially, vip-up performs this sequence
- Have DRBD go Primary
- Mount postgres data folder on /dev/drbd0
- Startup postgres
- start vipmon process
Contrawise, vip-down performs this sequence
- Shutdown postgres
- unount /dev/drbd0
- Have DRBD go Secondary
What about vipmon.sh? You could script that to check, in an indefinite loop, the state of postgres (it is running), check the DRBD device (can you still write to the posts data folder)
With this setup, you have postgres on the DRBD Primary and a disk level copy of the postgres data folder on the other DBServer (the DRBD Secondary). postgres is not running on the DRBD Secondary.
When a failover happens, you just need time for ucarp to detect is is safe to mount the postgres data, startup postgres, and acivate the vipmon script.
What great about this setup is that in the event of a failover, the DBServer that becomes DRBD Primary should have a 100% disk level copy of the server you failed away from. Thus starting postgres should level you at a consistent state. The transaction logs (in pg_xlog) should be in tact (less the intermittency due to failover)
I hope these suggestions help. Although I am a MySQL DBA, I use MySQL and DRBD at my employer's web hosting company on a regular basis. I installed MySQL/DRBD in the aforementioned manner. I did this once for a client using PostgreSQL/DRBD. It works great. It's Open Source. You just need to perform due diligence in learning DRBD and ucarp. This would include reconnecting DRBD after a failover, handling split brain scenario where both DB Servers go Primary, and things like these.
Best Answer
May I suggest upgrading to PostgreSQL 9.3 (freshly out!)? Some write that the new Streaming-Only Remastering capability turns a replica in a set of replicas into the new master for all of the other replicas. Apparently, it's even more useful in combination with the cascading replication you are using in 9.2
In PostgreSQL 9.2, cascading replication really requires file-based WAL archiving to work in case of disaster recovery. PostgreSQL 9.3 will not require that: you can set up large replication clusters of whatever length you want without any WAL archiving.
More information: