PostgreSQL 9.4 – Understanding Restore Behavior

backuppg-basebackuppostgresqlpostgresql-9.4restore

I have following: 2 nodes one of which is current production db and another is for fail recovery. I made base backup on first node with following command pg_basebackup -h 127.0.0.1 -U postgres -D bckp -Ft -z -P -R -x and got bacup file base.tar.gz. Now i'm trying to recover it on failover node. I cleared data directory and unzipped there contents of the backup. As I do not need wal logs from master node to stream to this server i should create recovery.conf with at least restore_command specifying command that will copy wal logs from archive directory to xp_log. But it was already there in xp_log. There was file 000000010000000000000017. As i'm new to postgres and linux I decided to move this file to archive directory and set recovery.conf contents to restore_command = 'cp /var/lib/postgresql/archive/%f /var/lib/postgresql/data/%p' then started postgres. It printed ot logs

2015-10-26 14:06:16 UTC [16-3] LOG:  starting archive recovery
2015-10-26 14:06:16 UTC [16-4] LOG:  restored log file "000000010000000000000017" from archive
2015-10-26 14:06:16 UTC [16-5] LOG:  redo starts at 0/17000060
2015-10-26 14:06:16 UTC [16-6] LOG:  consistent recovery state reached at 0/17000160
cp: cannot stat ‘/var/lib/postgresql/backup/000000010000000000000018’: No such file or directory
2015-10-26 14:06:16 UTC [16-7] LOG:  redo done at 0/17000160
2015-10-26 14:06:16 UTC [16-8] LOG:  restored log file "000000010000000000000017" from archive
cp: cannot stat ‘/var/lib/postgresql/backup/00000002.history’: No such file or directory
2015-10-26 14:06:16 UTC [16-9] LOG:  selected new timeline ID: 2
cp: cannot stat ‘/var/lib/postgresql/backup/00000001.history’: No such file or directory
2015-10-26 14:06:16 UTC [16-10] LOG:  archive recovery complete
2015-10-26 14:06:17 UTC [16-11] LOG:  MultiXact member wraparound protections are now enabled
2015-10-26 14:06:17 UTC [30-1] LOG:  autovacuum launcher started
2015-10-26 14:06:17 UTC [1-2] LOG:  database system is ready to accept connections

Finally it restores and works fine but there are unclear moments for me:
With restoring of 000000010000000000000017 i understand, but why it tries to recover 000000010000000000000018 as it's not there? And what are 00000002.history and 00000001.history files and why it tries to find them?

Best Answer

With regard to your questions, please read the documentation completely, as it addresses all of your concerns.

In terms of looking for 000000010000000000000018 when it doesn't exist .. the recovery process just parses all WAL files until it encounters "EOF" .. which usually happens when it tries to read from the next WAL file and it doesn't exist.. So it uses that case as a termination point for the recovery process.. The important bit here is that it reached a checkpoint in the recovery process as stated with consistent recovery state reached at 0/17000160. This means that your recovery is reliable.

Continuous Archiving Documentation Section 24.3.5. - Timelines

Timelines are a way for Postgres to handle "alternate universes" so to speak... For example, if you have a situation where your replication gets messed up in some way and suddenly you have slave promoted into master status (while your master never went down) ... some machines are writing to the master still while some are writing to the slave.. But it's ok, because the writes to the current master will all happen on timeline 1 and the writes to the slave will all happen on timeline 2 .. so you can, theoretically, still merge all the changes together (although I'm sure it wouldn't be easy).. I imagine it's similar to a branch in a version control system. Except there's no real "trunk".

For the postgres gurus out there - if I have anything incorrect, please correct me. :)

Related Solutions

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

After the rsync, did you really run what you show?:

pgsql -c "select pg_stop_backup();";

Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:

Log: consistent recovery state reached at 0/BF0000B0

and:

Log: streaming replication successfully connected to primary

Did you try connecting to the slave at this point? What happened?

The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:

The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands. After I receive this error, I kill all processes and again try to restart the database...

Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:

log_line_prefix = '[%m] %p %q<%u %d %r> '

The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?

Postgresql – Streaming Replication in PostgreSQL

PostgreSQL replicas never finish recovering. This is by design. Basically a replica is always in "recovering from disaster" mode except that it is using receiving the WAL segments from the master rather than on disk.

So what you are seeing is not cause for concern. If it is not working yet, then you will need to provide a more detailed description of what you are trying to do and what is not working. But as far as you are posting it seems normal.

Best Answer

Related Solutions

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

Postgresql – Streaming Replication in PostgreSQL

Related Question