The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:
http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN
After the rsync, did you really run what you show?:
pgsql -c "select pg_stop_backup();";
Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:
Log: consistent recovery state reached at 0/BF0000B0
and:
Log: streaming replication successfully connected to primary
Did you try connecting to the slave at this point? What happened?
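For reference, here is a minimal sketch of what completing the backup and then testing a connection to the slave might look like with the psql client. The function names and the host argument are hypothetical, and it assumes psql is on the PATH and that the usual connection settings (user, database) come from the environment.

```shell
# Hypothetical helper: finish the base backup on the master.
# Assumes the connecting role is allowed to call pg_stop_backup().
finish_backup() {
    psql -c "SELECT pg_stop_backup();"
}

# Hypothetical helper: try a trivial query against the slave.
# "$1" is the slave's host name (an assumption, not from the original post).
# On a 9.1 slave this only succeeds if hot_standby is enabled there.
try_slave() {
    psql -h "$1" -c "SELECT 1;"
}
```

Running finish_backup on the master and then try_slave against the slave's host name should show whether the slave accepts connections at all.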
The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:
The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands.
After I receive this error, I kill all processes and again try to restart the database...
Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:
log_line_prefix = '[%m] %p %q<%u %d %r> '
The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?
PostgreSQL replicas never finish recovering. This is by design. Basically, a replica is always in "recovering from disaster" mode, except that it receives its WAL segments from the master rather than reading them from disk.
So what you are seeing is not cause for concern. If it is not working yet, then you will need to provide a more detailed description of what you are trying to do and what is not working. But from what you have posted so far, it looks normal.
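One way to confirm that a replica is healthy even though it reports being in recovery is to ask it directly. The sketch below assumes psql is on the PATH and that the replica accepts read-only connections; the function names and the host argument are hypothetical. The xlog function names are the ones used by 9.x releases (they were renamed in PostgreSQL 10).

```shell
# Hypothetical helper: prints "t" while the server is in recovery,
# which is the normal, permanent state of a running replica.
# -A (unaligned) and -t (tuples only) strip headers and padding.
replica_in_recovery() {
    psql -h "$1" -At -c "SELECT pg_is_in_recovery();"
}

# Hypothetical helper: shows the last WAL location received and replayed.
# If these values keep advancing between calls, replication is working.
replica_wal_positions() {
    psql -h "$1" -At -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"
}
```

A result of "t" from the first function together with advancing positions from the second suggests the replica is doing exactly what it should.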
Best Answer
First, never, ever, ever delete anything from pg_xlog, ever. You will corrupt your database.
Now, on to what you've misunderstood about all this: PITR doesn't work like that. It doesn't revert a running database to an older point in time.
How it works is that you set it up first, before you need to restore:
Set up WAL archiving, so you record WAL to a safe location.
Before you need to revert, you make a base backup. You can use pg_start_backup() and rsync for this, but it's easiest and safest to use the pg_basebackup command-line tool. See the documentation for details on creating base backups. You must do this after setting up WAL archiving, so the archive contains all WAL from the moment the base backup is taken onward.
Continue to run with WAL archiving, periodically taking a new base backup to reduce the amount of old WAL you have to keep and the amount of recovery time you need. Automated tools like Barman (pgbarman) help with this.
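The setup steps above could be sketched roughly as follows for a 9.1-era server. The archive directory, the snippet file, and the function name are all hypothetical. The archive_command follows the pattern shown in the PostgreSQL documentation, and a real setup should archive to storage that survives loss of the database host.

```shell
# Hypothetical archive location; use a safe, separate location in practice.
ARCHIVE_DIR="/mnt/server/archivedir"

# Write the archiving settings to a scratch file for review; they would
# then be merged into postgresql.conf by hand (changing wal_level or
# archive_mode requires a server restart).
CONF_SNIPPET="$(mktemp)"
cat > "$CONF_SNIPPET" <<EOF
wal_level = archive
archive_mode = on
archive_command = 'test ! -f $ARCHIVE_DIR/%f && cp %p $ARCHIVE_DIR/%f'
EOF

# Hypothetical helper: take a compressed tar-format base backup into "$1".
# Assumes pg_basebackup can open a replication connection to the server
# (max_wal_senders > 0 and a replication entry in pg_hba.conf).
take_base_backup() {
    pg_basebackup -D "$1" -Ft -z -P
}
```

Archiving has to be active before take_base_backup is run, so that the archive holds every WAL segment from the start of the backup onward.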
When you need to create a historical copy of the database state to recover from something, you:
Make a copy of the base backup to writeable storage
Set the copy up with a recovery.conf file that specifies an appropriate restore_command for fetching WAL and the recovery_target_time you want to replay up to. Setting primary_conninfo makes no sense, since you won't be using this as a warm or hot standby streaming replica.
Start the copy, and allow it to replay WAL until it reports that recovery is complete.
Connect to the copy, and do whatever you need to grab the data you wish to recover out of it. You may use pg_dump to extract a database or parts of it, use COPY to extract contents of tables, etc.
If you wish to completely revert a database you can delete the datadir, replace it with the copy, and follow the procedure above to replay the copy up to a point in time.
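The recovery.conf described in the steps above might look something like the sketch below. The directory, archive path, target timestamp, and function name are all hypothetical, and the temporary directory only stands in for a real writeable copy of the base backup. This applies to releases that still use recovery.conf, such as the 9.x series discussed here.

```shell
# Stand-in for the writeable copy of the base backup (hypothetical).
COPY_DIR="$(mktemp -d)"
# Hypothetical WAL archive location and recovery target.
ARCHIVE_DIR="/mnt/server/archivedir"
TARGET_TIME="2013-01-01 12:00:00"

# restore_command fetches archived WAL; recovery_target_time says where
# replay should stop. primary_conninfo is deliberately left out, since
# this copy is not a streaming replica.
cat > "$COPY_DIR/recovery.conf" <<EOF
restore_command = 'cp $ARCHIVE_DIR/%f %p'
recovery_target_time = '$TARGET_TIME'
EOF

# Hypothetical helper: start the copy so it replays WAL up to the target.
start_copy() {
    pg_ctl -D "$COPY_DIR" start
}
```

After start_copy, the server log of the copy should eventually report that recovery has completed, at which point it accepts normal connections.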
What you cannot do is take an existing database and roll it back to some point in the past. WAL replay only goes forward in time. So you must restore an old base backup then allow it to replay WAL up to the point in time you want.
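Once the restored copy has finished replaying, pulling data out of it could be sketched like this. The function names, port, database, table, and file arguments are all hypothetical. It assumes the copy runs on its own port so it does not clash with the live server.

```shell
# Hypothetical helper: dump a single table from the restored copy.
# $1 = port of the copy, $2 = database, $3 = table, $4 = output file
dump_table() {
    pg_dump -p "$1" -t "$3" -f "$4" "$2"
}

# Hypothetical helper: export a table's contents as CSV using COPY.
# $1 = port of the copy, $2 = database, $3 = table, $4 = output file
copy_table_csv() {
    psql -p "$1" -d "$2" -c "COPY $3 TO STDOUT WITH CSV HEADER" > "$4"
}
```

The resulting dump or CSV file can then be loaded into the live database with psql or COPY, which is how individual rows or tables get back without reverting everything.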