The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:
http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN
After the rsync, did you really run exactly what you show?
pgsql -c "select pg_stop_backup();";
Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:
Log: consistent recovery state reached at 0/BF0000B0
and:
Log: streaming replication successfully connected to primary
Did you try connecting to the slave at this point? What happened?
The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:
The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands.
After I receive this error, I kill all processes and again try to restart the database...
Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:
log_line_prefix = '[%m] %p %q<%u %d %r> '
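With a prefix like that, each log line can be split into fields mechanically. The following is a minimal Python sketch, assuming exactly the log_line_prefix shown above; the helper name parse_log_line and the sample lines are made up for illustration. Continuation lines of multi-line messages do not carry the prefix, so the helper returns None for them.

```python
import re

# Regex mirroring log_line_prefix = '[%m] %p %q<%u %d %r> '
# %m is the timestamp with milliseconds and %p the process ID. Everything
# after %q (user, database, remote host) is only emitted by session
# processes, so that part is optional here.
PREFIX_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"(?P<pid>\d+) "
    r"(?:<(?P<user>\S+) (?P<database>\S+) (?P<remote>\S+)> )?"
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of prefix fields, or None if the line has no prefix."""
    match = PREFIX_RE.match(line)
    return match.groupdict() if match else None


# Made-up sample lines, for illustration only.
session = parse_log_line(
    "[2012-05-01 12:34:56.789 UTC] 4242 <app mydb 10.0.0.5(51234)> "
    "FATAL:  the database system is starting up"
)
background = parse_log_line(
    "[2012-05-01 12:34:57.001 UTC] 4100 "
    "LOG:  consistent recovery state reached at 0/BF0000B0"
)

print(session["pid"], session["user"], session["remote"])
print(background["user"], background["message"])
```

Grouping the parsed lines by pid makes it easier to see which backend or background process produced each message.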
The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?
how is it possible for the Slave to have more pg_xlog/ log files than the Master?
The whole point of archiving WAL on the master to some external location is to let the master then delete it to free space in its pg_xlog, while replicas might still need it.
A replica can have more archives in pg_xlog than the master, and older ones, if it's lagging behind the master due to failure to keep up with replay. However, with pg_standby that shouldn't happen - the archive might contain more xlogs, but the replica should only be reading them on-demand.
It's hard to be specific, because you've given a broad description of the issue rather than actual directory listings, and you haven't explained the exact steps you followed to set up the replica or shown the exact log file output from the replica. So the best I can do is "it sounds like the replica setup is broken somehow".
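To make such a report concrete, the two pg_xlog listings can be compared directly. This is a rough Python sketch, not a diagnostic tool: the function names and the sample file names are invented for illustration, and the only assumption about PostgreSQL is that WAL segment files are named with 24 hexadecimal digits.

```python
import re

# WAL segment file names consist of 24 hex digits
# (timeline, log file, segment: 8 digits each).
WAL_NAME_RE = re.compile(r"^[0-9A-F]{24}$")


def wal_segments(filenames):
    """Keep only WAL segment names; skips archive_status, .backup, .history."""
    return {name for name in filenames if WAL_NAME_RE.match(name)}


def compare_wal(master_files, replica_files):
    """Report which segments exist on only one side."""
    master = wal_segments(master_files)
    replica = wal_segments(replica_files)
    return {
        "only_on_replica": sorted(replica - master),
        "only_on_master": sorted(master - replica),
    }


# Made-up listings; real ones would come from os.listdir() on each pg_xlog.
master_listing = [
    "0000000100000000000000BE",
    "0000000100000000000000BF",
    "archive_status",
]
replica_listing = [
    "0000000100000000000000BC",
    "0000000100000000000000BD",
    "0000000100000000000000BE",
    "0000000100000000000000BF",
]

report = compare_wal(master_listing, replica_listing)
print(report["only_on_replica"])
print(report["only_on_master"])
```

Segments that exist only on the replica and sort before everything on the master are the "older ones" described above.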
to resync the servers in warm standby mode: do I have to do pg_basebackup again (to essentially copy Master's /data and /pg_xlog directory) to the Slave?
Assuming that here /data is the main datadir, containing global, base, pg_clog, etc, and that pg_xlog is the transaction logs from a different disk: Yes, that's right.
You must use the pg_basebackup command, though, or follow the instructions in the manual for correct file system level copies using pg_start_backup() and rsync/cp.
You also have to make sure you've stopped the replica first. Overwriting its datadir while it's running will make it quite upset.
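The order of operations can be sketched as follows. This is only an outline in Python, assuming pg_ctl and pg_basebackup are on the PATH; the data directory, host name and user are placeholders, and on a packaged install the pg_ctl calls would be replaced by the service script. By default it only prints the commands.

```python
import subprocess


def resync_commands(datadir, master_host, repl_user):
    """Build the command sequence; all three arguments are placeholders."""
    return [
        # 1. Stop the replica before touching its data directory.
        ["pg_ctl", "-D", datadir, "stop", "-m", "fast"],
        # 2. Take a fresh base backup from the master. pg_basebackup wants
        #    an empty or nonexistent target directory, so the old contents
        #    have to be moved aside first (not shown here).
        ["pg_basebackup", "-h", master_host, "-U", repl_user,
         "-D", datadir, "-P"],
        # 3. Start the replica again, once recovery.conf is back in place.
        ["pg_ctl", "-D", datadir, "start"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.check_call(command)


commands = resync_commands(
    "/var/lib/pgsql/9.1/data", "master.example.com", "replicator"
)
run(commands)  # dry run: prints the three commands without executing them
```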
Streaming replication vs warm standby
Hot vs warm standby is orthogonal to streaming vs log shipping replication.
What you're trying to do is use log shipping instead of streaming replication. It doesn't matter for this purpose if the replica is a hot standby or a warm standby, i.e. whether or not it's accepting queries.
Personally I recommend using both methods - use streaming, and fall back to log shipping if there's a problem with streaming. PostgreSQL does this automatically if both are configured.
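As an illustration of configuring both, here is a small Python sketch that renders a recovery.conf containing a streaming connection and a restore_command side by side. The parameter names (standby_mode, primary_conninfo, restore_command) are the 9.1 recovery.conf settings; the helper name, host, user and archive path are assumptions invented for the example.

```python
def quote(value):
    """Quote a recovery.conf value: single quotes, embedded quotes doubled."""
    return "'" + value.replace("'", "''") + "'"


def render_recovery_conf(primary_conninfo, restore_command):
    """Render a recovery.conf that enables streaming plus archive fallback."""
    settings = [
        ("standby_mode", "on"),
        # Used for streaming replication.
        ("primary_conninfo", primary_conninfo),
        # Used to fetch archived WAL when streaming is unavailable.
        ("restore_command", restore_command),
    ]
    return "".join(
        "{} = {}\n".format(name, quote(value)) for name, value in settings
    )


# Placeholder values, not taken from the question.
conf_text = render_recovery_conf(
    "host=master.example.com port=5432 user=replicator",
    "cp /mnt/wal_archive/%f %p",
)
print(conf_text)
```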
Best Answer
WAL files are archived when they are no longer needed in the pg_xlog directory (usually when there are 16 segments and another one is needed; the segment that is rolled over and reused is archived first). To verify archiving manually, you can force the system to archive some files by running the following:
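One way to do this is a pg_start_backup() / pg_stop_backup() pair issued on the master. A minimal Python sketch, assuming a DB-API cursor such as one obtained from psycopg2 and connected as a superuser; the function name, the label and the RecordingCursor stand-in are invented for illustration, and the stand-in only records the statements so the example runs without a server.

```python
def force_archive(cursor, label="archive_test"):
    """Run a start/stop backup pair so the current WAL segment is archived."""
    cursor.execute("SELECT pg_start_backup(%s)", (label,))
    cursor.execute("SELECT pg_stop_backup()")


class RecordingCursor:
    """Stand-in for a real DB-API cursor: records statements, runs nothing."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=()):
        self.statements.append((sql, tuple(params)))


cursor = RecordingCursor()
force_archive(cursor)
for sql, params in cursor.statements:
    print(sql, params)
```

Against a real server, the cursor would come from a connection to the master, and archive_mode must already be enabled there.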
Once pg_stop_backup() is finished, you should see a few files in the archive directory.
Postgres uses the archive_status folder inside pg_xlog to record "notes" about archiving attempts for the WAL files.