The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages
:
http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN
After the rsync, did you really run what you show?:
pgsql -c "select pg_stop_backup();";
Since there is, so far as I know, no pgsql
executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql
, because otherwise I don't see how the slave would have logged such success messages as:
Log: consistent recovery state reached at 0/BF0000B0
and:
Log: streaming replication successfully connected to primary
Did you try connecting to the slave at this point? What happened?
The "Success. You can now start..." message you mention is generated by initdb
, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:
The only ways I have restarted Postgres is through the service
postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands.
After I receive this error, I kill all processes and again try to
restart the database...
Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:
log_line_prefix = '[%m] %p %q<%u %d %r> '
The recovery.conf
script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?
PostgreSQL replicas never finish recovering. This is by design. Basically a replica is always in "recovering from disaster" mode except that it is using receiving the WAL segments from the master rather than on disk.
So what you are seeing is not cause for concern. If it is not working yet, then you will need to provide a more detailed description of what you are trying to do and what is not working. But as far as you are posting it seems normal.
Best Answer
To generate 10k WAL files overnight, your database is obviously very busy. It is not surprising that the non-WAL of such a busy database might also grow rapidly. Indeed, why wouldn't it? If your database were that busy but not growing, that is the weird thing which would be in need of explanation, based on what you know about how your database is used.
Do you have hot_standby_feedback turned on?
If your database is still up, you can use these queries to figure out where all the space is going, although I usually just start with '\l+' and '\dit+' in psql (those ones don't sort by size, so if you have a lot of objects to scan through, the other ones may be better). If your database is not up, you can use OS tools like 'du' and 'ls' to to figure out where the space is going. But translating file names back to table names without a running database will not be easy.