The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:
http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN
After the rsync, did you really run what you show?
pgsql -c "select pg_stop_backup();";
Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:
Log: consistent recovery state reached at 0/BF0000B0
and:
Log: streaming replication successfully connected to primary
Did you try connecting to the slave at this point? What happened?
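One way to answer that question from the command line is sketched below. This is only a rough helper, not something from the original post: the function name check_standby is made up for illustration, and it assumes psql is on the PATH and that any connection options you need are passed as arguments. It relies on pg_is_in_recovery(), which returns true while a server is still replaying WAL.

```shell
# Hypothetical helper (name is an assumption): report whether the server
# accepts connections and whether it is still in recovery.
check_standby() {
    # -A (unaligned) and -t (tuples only) reduce the output to a bare "t" or "f".
    # Extra arguments such as -h/-p/-U are passed straight through to psql.
    state=$(psql -At "$@" -c "select pg_is_in_recovery();" 2>&1) || {
        # Connection or query failed; show whatever psql reported.
        echo "cannot connect: $state"
        return 1
    }
    case "$state" in
        t) echo "accepting connections, still in recovery (standby)" ;;
        f) echo "accepting connections, not in recovery" ;;
        *) echo "unexpected output: $state"; return 1 ;;
    esac
}
```

You would call it with something like check_standby -h standby.example.com -p 5432 (the host name is a placeholder). Note that on 9.1 a standby only accepts connections if hot_standby = on is set in its postgresql.conf; without it, connection attempts keep getting "the database system is starting up" even though replication is working.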
The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:
The only ways I have restarted Postgres is through the service
postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands.
After I receive this error, I kill all processes and again try to
restart the database...
Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:
log_line_prefix = '[%m] %p %q<%u %d %r> '
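For anyone not familiar with the escapes in that prefix, here is the same setting as an annotated postgresql.conf fragment. The comments are my reading of the log_line_prefix documentation; the setting itself is unchanged from the line above.

```
# postgresql.conf
# %m  timestamp with milliseconds
# %p  process ID
# %q  produces no output; non-session (background) processes stop at this point
# %u  user name
# %d  database name
# %r  remote host name or IP address, and remote port
log_line_prefix = '[%m] %p %q<%u %d %r> '
```

The point of %q is that background processes log only the timestamp and PID, while client sessions also show who connected, to which database, and from where. A changed log_line_prefix takes effect on a reload; a full restart should not be needed.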
The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?
UPDATE: it looks like this is a bug in the Debian/Ubuntu packaging of PostgreSQL, where the init scripts, extremely unsafely, kill -9 the postmaster and remove postmaster.pid. See this post on pgsql-general.
Personally, I've gone and edited my init scripts to get rid of this rather hairy and dangerous code.
The original answer
Please go back in the logs to before the restart and see if you can find any errors. WAL corruption absolutely should not happen, so if it has, it's important to look into why. If you can upload a copy of the whole log to a pastebin or something, that'd be really handy.
The only time WAL corruption is an accepted possibility with PostgreSQL is if you are running with fsync=off set in postgresql.conf and your system crashes or unexpectedly loses power. If that's not the cause, it'd be really good to look into what happened.
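If you are not sure how fsync is set, a quick look at the config file can tell you. The following is a rough sketch only: the function name fsync_setting is made up, it reads a single file and ignores include directives and command-line overrides, and it assumes the parameter name is written in lower case. On a running server, SHOW fsync; in psql is the authoritative answer.

```shell
# Hypothetical helper (name is an assumption): print the effective fsync
# value found in the given postgresql.conf.
fsync_setting() {
    # Match uncommented "fsync = value" lines (the "=" is optional in
    # postgresql.conf), keep the last one since later entries override
    # earlier ones, and strip optional single quotes around the value.
    value=$(sed -n 's/^[[:space:]]*fsync[[:space:]]*=\{0,1\}[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" \
        | tail -n 1 | tr -d "'")
    # fsync defaults to on when the file does not set it.
    echo "${value:-on}"
}
```

Run it as fsync_setting /path/to/postgresql.conf, substituting the path to your own data directory. Anything other than "on" (or an equivalent such as "true") means the crash-safety guarantee described above does not apply.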
Please do not use pg_resetxlog without some idea why your xlogs are damaged. If the transaction logs become damaged something is badly wrong and you need to find out what. If you band-aid it now, you might be bitten by it later when you care about the data.
The transaction logs exist for a reason and just removing them can leave your tables and indexes in an inconsistent, damaged state. After a pg_resetxlog it's a very good idea to pg_dumpall, drop your cluster, re-initdb, and reload the DB. As I said, though, this should not happen and you should look back in the logs for clues about what could've happened.
Now read the comments
Best Answer
Locate your postgresql.conf in the datadir. Find the section that looks like this:
Simply uncomment and use these parameters. Then restart the postgres service.
Since you are running PostgreSQL on Windows, it is possible that you may not be allowed to edit postgresql.conf while the service is up. If that is the case, stop the service, edit postgresql.conf, and then start the service again.
Give it a try!