You asked:
how postgreSQL will handle the recovery with a pg_data content containing some files which are inconsistent.
pg_start_backup() forces a checkpoint, which ensures that every data file in the backup is at least as new as that checkpoint. On recovery, the logs are applied on top of the files.
If the data is old, the log will update it.
If the data is new, the log holds the same content, and there is no harm in writing it again.
The data is never newer than the log, because the log is written ahead of the data (that is what WAL, write-ahead logging, means).
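To make the "old data gets updated, new data is rewritten harmlessly" argument concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL's actual redo code: the Page class, the replay function and the three-record log are all invented for illustration, and it assumes each page remembers the position (LSN) of the last log record applied to it.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int = 0     # position of the last log record applied to this page
    value: str = ""  # stand-in for the page contents


def replay(page, wal):
    """Apply (lsn, value) log records in order to a copied page."""
    for lsn, value in wal:
        # A record the page already contains is skipped; rewriting it
        # would produce the same contents anyway.
        if lsn > page.lsn:
            page.value = value
            page.lsn = lsn
    return page


# Hypothetical log written after the checkpoint.
wal = [(1, "a"), (2, "b"), (3, "c")]

stale = Page(lsn=1, value="a")  # file copied early in the backup
fresh = Page(lsn=3, value="c")  # file copied late in the backup

# Both copies converge on the same state once the log has been replayed.
assert replay(stale, wal) == replay(fresh, wal) == Page(lsn=3, value="c")
```

The copy can be taken at any moment between the checkpoint and the end of the log; as long as the log is complete, the replayed result is the same.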
You asked:
... xfs-freeze
...
xfs-freeze is similar to pg_start_backup() in that it doesn't take a snapshot by itself; it only suspends writes to the filesystem. You need a volume manager to take the snapshot.
You asked:
... why do create tablespace & create database statements are unsupported if the WAL can replay everything?
They are supported; there are just a few gotchas. See http://www.postgresql.org/docs/8.1/static/backup-online.html :
23.3.5. Caveats
CREATE TABLESPACE commands are WAL-logged with the literal absolute
path, and will therefore be replayed as tablespace creations with the
same absolute path. This might be undesirable if the log is being
replayed on a different machine. It can be dangerous even if the log
is being replayed on the same machine, but into a new data directory:
the replay will still overwrite the contents of the original
tablespace. To avoid potential gotchas of this sort, the best practice
is to take a new base backup after creating or dropping tablespaces.
The message "The database system is starting up." does not indicate an error. It is logged at the FATAL level so that it always makes it to the log, regardless of the setting of log_min_messages:
http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN
After the rsync, did you really run this, exactly as you show it?
pgsql -c "select pg_stop_backup();";
Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:
Log: consistent recovery state reached at 0/BF0000B0
and:
Log: streaming replication successfully connected to primary
Did you try connecting to the slave at this point? What happened?
The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:
The only ways I have restarted Postgres is through the service
postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands.
After I receive this error, I kill all processes and again try to
restart the database...
Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:
log_line_prefix = '[%m] %p %q<%u %d %r> '
The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?
Best Answer
In order to restore a backup, you need the base archive of all the data files, plus a sequence of xlogs. An "incremental backup" can be made of just the additional xlogs in the sequence. Note that if any xlogs are missing, recovery will stop early, at the first gap.
So it's not clear here exactly what you've done, unless you changed the level of detail you're mentioning part way through your list. When you make a copy of more segments that have been put into the archive directory after adding more data, you need to ensure that all the data has been archived: using pg_start_backup and pg_stop_backup usually does this for you, but you don't mention them the second time. You need to at least do a pg_switch_xlog to have the current xlog segment immediately archived.
If you think that recovery is not consuming enough xlog segments, look at the recovery log to see if it tried to take them all. And have your recovery command make some sort of mark showing which xlog files were taken.
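One way to have the recovery command leave such a mark is to wrap the copy in a small script that records every segment it is asked for. The following is a minimal sketch in Python, not something taken from your setup: the archive directory, the mark file location and the script name are all assumptions that you would need to adjust.

```python
import shutil
import sys
import time
from pathlib import Path

# Hypothetical locations; adjust to your own archive layout.
ARCHIVE_DIR = Path("/mnt/server/archivedir")
MARK_FILE = Path("/tmp/restored_segments.log")


def restore_segment(name, dest, archive_dir=ARCHIVE_DIR, mark_file=MARK_FILE):
    """Copy one archived segment to dest and record the request.

    Returns 0 on success and 1 if the segment is not in the archive,
    since the restore command must report a missing file as a failure.
    """
    src = Path(archive_dir) / name
    found = src.is_file()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    status = "restored" if found else "missing"
    # Append one line per request, so the file shows what recovery asked for.
    with open(mark_file, "a") as log:
        log.write(f"{stamp} {name} {status}\n")
    if not found:
        return 1
    shutil.copy2(src, dest)
    return 0


# Only act when invoked with both arguments (file name and destination).
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(restore_segment(sys.argv[1], sys.argv[2]))
```

Assuming the script is saved as /path/to/restore_wal.py (a made-up path), recovery.conf could call it with restore_command = 'python3 /path/to/restore_wal.py %f "%p"'. After recovery, the mark file lists every segment that was requested, and the first "missing" line shows where the sequence ran out.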