Postgresql – Automating failover in PostgreSQL 9.1

How does one setup two identical servers for automatic failover in PostgreSQL 9.1.

Centos 5
PostgreSQL 9.1 compiled from source
The postgres user account exists on both machines and has a ssh passwordless key to connect to both machines.

My Current Setup :

Master server configuration:

postgresql.conf:

listen_address = '*'
wal_level = hot_standby
max_wal_senders = 3
checkpoint_segments = 16    
wal_keep_segments = 8 
archive_mode = on    
archive_command = 'cp "%p" /opt/pgsql91/archive/"%f"'

pg_hba.conf:

 host  replication   all   10.0.66.1/32      trust
 host  replication   all   10.0.66.2/32      trust

Standby Server

postgresql.conf and pg_hba.conf are identical to what is configured on the master server.

recovery.conf:

 standby_mode = 'on'
 primary_conninfo = 'host=10.0.66.1'
 trigger_file = '/opt/pgsql91/data/trigger.txt'

Thanks to hzRoot, I now understand how to switch the server from standby to master.

Using the following commands, I can synchronize the new slave with the new master and then get replication backup and running.

On the new master (10.0.66.2)

su – postgres
touch trigger.txt in /opt/pgsql91/data/
recovery.conf becomes recovery.done
psql -c ";SELECT pg_start_backup('backup', true)";
rsync -a -v -e ssh /opt/pgsql91/data/ 10.0.66.1:/opt/pgsql91/data/ –exclude postmaster.pid
psql -c ";SELECT pg_stop_backup()";

On the new slave (10.0.66.1)

create the recovery.conf : cp recovery.done to recovery.conf
vi recovery.conf change ip address : primary_conninfo = 'host=10.0.66.2'
start postgresql

So my questions are now :

Is this the correct way to switch roles?
Has anyone automated this process, if so what did you do?
If synchronous replication is enabled, I noticed the new master server won't commit any transactions because it is waiting for the slave to respond. There is no slave however because the other server, the old master is down. Is this correct or do I need to temporarily disable synchronous replication while the new slave is down?

Best Answer

Check out repmrg:

repmgr is a set of open source tools that helps DBAs and System administrators manage a cluster of PostgreSQL databases..

By taking advantage of the Hot Standby capability introduced in PostgreSQL 9, repmgr greatly simplifies the process of setting up and managing database with high availability and scalability requirements.

repmgr simplifies administration and daily management, enhances productivity and reduces the overall costs of a PostgreSQL cluster by:

monitoring the replication process; allowing DBAs to issue high

availability operations such as switch-overs and fail-overs.

It does two things:

repmgr: command program that performs tasks on your cluster and then exits
repmgrd: management and monitoring daemon that watches the cluster and can automate remote actions.

For automatic failover, repmgrd does the trick and is not a SPOF in your network, like pgPool. However, it is still important to monitor all deamons and bring them back up after failure.

Version 2.0 is about to be released, including RPM's.

Best Answer

Related Solutions

Postgresql 9.1 replication delay

Postgresql – Does PostgreSQL 9.1 Streaming Replication catch up after a lag without WAL archiving

Related Question