PostgreSQL 9.4 – Understanding Restore Behavior

backuppg-basebackuppostgresqlpostgresql-9.4restore

I have following: 2 nodes one of which is current production db and another is for fail recovery. I made base backup on first node with following command pg_basebackup -h 127.0.0.1 -U postgres -D bckp -Ft -z -P -R -x and got bacup file base.tar.gz. Now i'm trying to recover it on failover node. I cleared data directory and unzipped there contents of the backup. As I do not need wal logs from master node to stream to this server i should create recovery.conf with at least restore_command specifying command that will copy wal logs from archive directory to xp_log. But it was already there in xp_log. There was file 000000010000000000000017. As i'm new to postgres and linux I decided to move this file to archive directory and set recovery.conf contents to restore_command = 'cp /var/lib/postgresql/archive/%f /var/lib/postgresql/data/%p' then started postgres. It printed ot logs

2015-10-26 14:06:16 UTC [16-3] LOG:  starting archive recovery
2015-10-26 14:06:16 UTC [16-4] LOG:  restored log file "000000010000000000000017" from archive
2015-10-26 14:06:16 UTC [16-5] LOG:  redo starts at 0/17000060
2015-10-26 14:06:16 UTC [16-6] LOG:  consistent recovery state reached at 0/17000160
cp: cannot stat ‘/var/lib/postgresql/backup/000000010000000000000018’: No such file or directory
2015-10-26 14:06:16 UTC [16-7] LOG:  redo done at 0/17000160
2015-10-26 14:06:16 UTC [16-8] LOG:  restored log file "000000010000000000000017" from archive
cp: cannot stat ‘/var/lib/postgresql/backup/00000002.history’: No such file or directory
2015-10-26 14:06:16 UTC [16-9] LOG:  selected new timeline ID: 2
cp: cannot stat ‘/var/lib/postgresql/backup/00000001.history’: No such file or directory
2015-10-26 14:06:16 UTC [16-10] LOG:  archive recovery complete
2015-10-26 14:06:17 UTC [16-11] LOG:  MultiXact member wraparound protections are now enabled
2015-10-26 14:06:17 UTC [30-1] LOG:  autovacuum launcher started
2015-10-26 14:06:17 UTC [1-2] LOG:  database system is ready to accept connections

Finally it restores and works fine but there are unclear moments for me:
With restoring of 000000010000000000000017 i understand, but why it tries to recover 000000010000000000000018 as it's not there? And what are 00000002.history and 00000001.history files and why it tries to find them?

Best Answer

With regard to your questions, please read the documentation completely, as it addresses all of your concerns.

In terms of looking for 000000010000000000000018 when it doesn't exist .. the recovery process just parses all WAL files until it encounters "EOF" .. which usually happens when it tries to read from the next WAL file and it doesn't exist.. So it uses that case as a termination point for the recovery process.. The important bit here is that it reached a checkpoint in the recovery process as stated with consistent recovery state reached at 0/17000160. This means that your recovery is reliable.

Continuous Archiving Documentation Section 24.3.5. - Timelines

Timelines are a way for Postgres to handle "alternate universes" so to speak... For example, if you have a situation where your replication gets messed up in some way and suddenly you have slave promoted into master status (while your master never went down) ... some machines are writing to the master still while some are writing to the slave.. But it's ok, because the writes to the current master will all happen on timeline 1 and the writes to the slave will all happen on timeline 2 .. so you can, theoretically, still merge all the changes together (although I'm sure it wouldn't be easy).. I imagine it's similar to a branch in a version control system. Except there's no real "trunk".

For the postgres gurus out there - if I have anything incorrect, please correct me. :)