Check out repmrg:
repmgr is a set of open source tools that helps DBAs and System
administrators manage a cluster of PostgreSQL databases..
By taking advantage of the Hot Standby capability introduced in
PostgreSQL 9, repmgr greatly simplifies the process of setting up and
managing database with high availability and scalability requirements.
repmgr simplifies administration and daily management, enhances
productivity and reduces the overall costs of a PostgreSQL cluster by:
- monitoring the replication process; allowing DBAs to issue high
- availability operations such as switch-overs and fail-overs.
It does two things:
- repmgr: command program that performs tasks on your cluster and then exits
- repmgrd: management and monitoring daemon that watches the cluster and can automate remote actions.
For automatic failover, repmgrd does the trick and is not a SPOF in your network, like pgPool. However, it is still important to monitor all deamons and bring them back up after failure.
Version 2.0 is about to be released, including RPM's.
The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages
:
http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN
After the rsync, did you really run what you show?:
pgsql -c "select pg_stop_backup();";
Since there is, so far as I know, no pgsql
executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql
, because otherwise I don't see how the slave would have logged such success messages as:
Log: consistent recovery state reached at 0/BF0000B0
and:
Log: streaming replication successfully connected to primary
Did you try connecting to the slave at this point? What happened?
The "Success. You can now start..." message you mention is generated by initdb
, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:
The only ways I have restarted Postgres is through the service
postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands.
After I receive this error, I kill all processes and again try to
restart the database...
Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:
log_line_prefix = '[%m] %p %q<%u %d %r> '
The recovery.conf
script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?
Best Answer
Well, according to the instructions you should use the postgres 12 tools against the 9.1 database when dumping for an upgrade
so your first step it to upgrade your backup procedure to use the pg_dump from postgres 12
If your network between the two VPSs has low tatency you can do that over the network, but if not you'll need to find a different way.
Possibly copy the 9.1 data files onto the 12 server using
rsync
and the runpg_upgrade --link
over those files to upgrade the 9.1 files to 12you can do an initial pass with rsync while the database service is running and then shut it down for the final pass.