Postgresql – How does the recovery strategy of PostgreSQL exactly work

postgresqlpostgresql-11recovery

I'm testing and trying to figure out how the backup and recovery strategy in PostgreSQL 11 works but it does not work as expected. When I recover the backup, I don't get the correct state. I get any state after a given timestamp. In detail:

I have this setup in database:

wal_level = replica 
max_wal_senders = 10
archive_mode = on

I have executed the following commands:

postgres=# select pg_create_physical_replication_slot('slot1');
pg_basebackup -D /media/extern/postgresql_basebackup/
pg_receivewal -D /media/extern/postgresql_archive -S slot1 -v

pg_receivewal is writing into postgresql_archive, this seems to work.

then I insert something into the database, let's say at 16:21.
I wait some minutes, let's say until 16:27 and delete all the insert.

Then I want to recover the inserts by doing…

service postgresql stop

(do I need to stop pg_receivewal as well)?

# mv main main.before_recovery
# cp -rp /media/extern/postgresql_basebackup main

nano /var/lib/postgresql/11/main/recovery.conf

I add the following content

restore_command = 'cp -r /media/extern/postgresql_archive/%f %p'
recovery_target_time='2018-01-06 16:22:00'

Then I execute

# chown postgres:postgres recovery.conf 
# chmod 700 recovery.conf 

service postgresql start

After startup, the file changes to recovery.done, but the state in database is wrong. It didn't restore the data. The database is still empty, so the PITR didn't work, why?

What should I do In case if everything was successful after recovery?. I'm unsure about these questions:

Creating a new slot2 by executing pg_create_physical_replication_slot('slot2'); and streaming from this slot by pg_receivewal -D /media/extern/postgresql_archive -S slot2 -v?
Doing a pg_receivewal -D /media/extern/postgresql_archive -S slot1 -v again?
Do I have to delete anywhen any data in postgresql_archive? (-> Probably not)
When do I have to create another base backup?
When and why do I execute select pg_switch_wal(); in this recovery concept?
How do I deal with new 'timelines' after recovery? I understand the concept of the timelines, but I'm unsure how I should proceed with it after recovery.

[UPDATE]
jjanes pointed out, that the date is wrong. I corrected it, but I still get an empty database after recovery. I tried these steps after I recovered the database, which is now empty:

2019-01-08, 18:50: create a new base backup from this empty database
to have a good starting point.

2019-01-08, 18:55 created slot1

pg_receivewal -D /media/extern/postgresql_archive -S slot1 -v

2019-01-08, 18:57 insert

2019-01-08, 19:38 delete

service postgresql stop

# mv main main.before_recovery
# cp -rp /media/extern/postgresql_basebackup main
# chmod 700 main

# nano /var/lib/postgresql/11/main/recovery.conf

restore_command = 'cp -r /media/extern/postgresql_archive/%f %p'
recovery_target_time='2019-01-08 19:20:00'

# chown postgres:postgres recovery.conf 
# chmod 700 recovery.conf 

# service postgresql start

(btw: I saw that it starts by itself new timelimes after recovery.)

[UPDATE]
I activated in postgresql.conf

checkpoint_flush_after = 256kB 
checkpoint_timeout = 30s

I tried to restore the database by using the previous method but without recovery_target_time and without any DELETE statement. I just wanted to look if the current full state can be rebuilt at all. Additionally I renamed the current wal file 00000007000000000000002D.partial to 00000007000000000000002D after stopping the database and before recovering the data.

I'm currently not sure which of these things worked, but at least, the non-PITR recovery seem to work, because I was able to restore the latest data by installing the (empty) base backup and the wal files. At least a step into the right direction, but this only works when you don't miss any shipped wal files. Here, question 2 comes into play: What is the correct workflow to handle the "next shipping iteration".
At first, you can't start pg_receivewal -D /media/extern/postgresql_archive -S slot1 -v again without deleting the shipped WALs in the external folder. Otherwise it would not start shipping the logs and it would abort with connection errors. This means that you can throw away your old base backup, because when you try the recover again with the old base backup, you would miss all the wal logs between base backup and last recovery point (because you deleted them). That also means that you have to create a new base backup immediately from the recovery point state. When you start log shipping again, you have no "gap" then between the new base backup and the wal files.

[UPDATE]
I got PITR running with the same approach, so I was able to "undo" a delete now. After recovery and service postgresql start, the database remains in read only mode. I had to execute select pg_wal_replay_resume(); to make it RW again. As already mentioned, I can't do WAL shipping again. When I don't modify the archive folder and try to restart the WAL to ship inside the same folder, I get the following exception:

pg_receivewal: could not send replication command "START_REPLICATION":
ERROR: requested starting point 0/34000000 on timeline 11 is not in
this server's history DETAIL: This server's history forked from
timeline 11 at 0/3310ADA8.

Sure, I could do log shipping to a new archive folder or something, but I guess this is not the professional way to do that. What is the common used strategy to resume log shipping after recovery?

Best Answer

I've done a test following these guidelines:

1) create a new base backup  
2) create slot  
3) launch pg_receivewal -D PATH -S slotname -v
4) delete records
5) postgresql stop / delete PGDATA
6) cp -rp postgresql_basebackup PGDATA
7) edit recovery.conf
   restore_command = 'cp -ir /PATH/%f %p'
   recovery_target_time=timestamp
8) postgresql start

and PITR worked as aspected. Following what I did with comments. I have a fresh postgres installation where I created a db (test) and one table t1 whith 200 rows created with a simple loop. Contents:

postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# \d t1
                    Table "public.t1"
 Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
 col1   | character varying(20) |           |          |

test=# select * from t1 limit 5;
 col1
-------
 1ABCD
 2ABCD
 3ABCD
 4ABCD
 5ABCD
(5 rows)

test=# select count(1) from t1;
count
-------
   200
(1 row)

STEP 1), 2), 3)

First start from a clean situation:

    postgres=# select now();
              now
-------------------------------
 2019-01-11 15:19:19.441022+01
(1 row)

postgres=# select * from pg_catalog.pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)

postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/B000190
(1 row)

$ cd /PGDATA-11/pg_wal/
$ ls -lrt
total 32768
drwx------ 2 postgres postgres        6 Jan 11 15:15 archive_status
-rw------- 1 postgres postgres 16777216 Jan 11 15:20 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Jan 11 15:20 000000010000000000000002

Create a backup:

$ /usr/pgsql-11/bin/pg_basebackup -D /home/postgres/HOT-BACKUP/  

postgres=# select pg_create_physical_replication_slot('test1');  
pg_create_physical_replication_slot
-------------------------------------
 (test1,)
(1 row)  

$ /usr/pgsql-11/bin/pg_receivewal -D /home/postgres/BCK-PITR-TEST/ -S test1 -v -n &
[1] 23350
$ pg_receivewal: starting log streaming at 0/4000000 (timeline 1)

I started pg_receivewall with "-n" to make it stop when postgres server goes down, and put it in background.

$ ls -rlt /home/postgres/BCK-PITR-TEST/
total 16384
-rw------- 1 postgres postgres 16777216 Jan 11 15:29 000000010000000000000004.partial

STEP 4)

test=# select now();
          now
-------------------------------
 2019-01-11 15:31:24.201177+01
(1 row)

test=# select count(1) from t1 where col1 like ('1%');
 count
-------
   111
(1 row)

SLEEP two minutes.........

test=# select now();
              now
-------------------------------
 2019-01-11 15:33:24.886394+01
(1 row)

test=# delete from t1 where col1 like '1%';
DELETE 111
test=#

STEP 5)

$ /usr/pgsql-11/bin/pg_ctl stop -D /PGDATA-11/  
waiting for server to shut down....pg_receivewal: not renaming "000000010000000000000004.partial", segment is not complete
pg_receivewal: replication stream was terminated before stop point
pg_receivewal: disconnected
 done
server stopped
[1]+  Exit 1                  /usr/pgsql-11/bin/pg_receivewal -D /home/postgres/BCK-PITR-TEST/ -S test1 -v -n

Note how pg_receivewal stops itself because of parameter "-n" during start.
Let's check the contents in the most recent WAL (XXX4):

 $ /usr/pgsql-11/bin/pg_waldump 000000010000000000000004 000000010000000000000004
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/04000028, prev 0/030000F8, desc: RUNNING_XACTS nextXid 572 latestCompletedXid 571 oldestRunningXid 572
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/04000060, prev 0/04000028, desc: RUNNING_XACTS nextXid 572 latestCompletedXid 571 oldestRunningXid 572
rmgr: XLOG        len (rec/tot):    106/   106, tx:          0, lsn: 0/04000098, prev 0/04000060, desc: CHECKPOINT_ONLINE redo 0/4000060; tli 1; prev tli 1; fpw true; xid 0:572; oid 24576; multi 1; offset 0; oldest xid 561 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 572; online
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/04000108, prev 0/04000098, desc: RUNNING_XACTS nextXid 572 latestCompletedXid 571 oldestRunningXid 572
rmgr: Heap        len (rec/tot):     59/  7283, tx:        572, lsn: 0/04000140, prev 0/04000108, desc: DELETE off 1 KEYS_UPDATED , blkref #0: rel 1663/16384/16385 blk 0 FPW
rmgr: Heap        len (rec/tot):     54/    54, tx:        572, lsn: 0/04001DB8, prev 0/04000140, desc: DELETE off 10 KEYS_UPDATED , blkref #0: rel 1663/16384/16385 blk 0
rmgr: Heap        len (rec/tot):     54/    54, tx:        572, lsn: 0/04001DF0, prev 0/04001DB8, desc: DELETE off 11 KEYS_UPDATED , blkref #0: rel 1663/16384/16385 blk 0
...........................................................................................................................................................................
rmgr: Transaction len (rec/tot):     34/    34, tx:        572, lsn: 0/040035E0, prev 0/040035A8, desc: COMMIT 2019-01-11 15:33:31.342763 CET

In the last line, you can see the commit timestamp for the delete: COMMIT 2019-01-11 15:33:31.342763 CET. This is the timestamp we need to avoid to reach, in order to reject the delete statement. So the recovery.conf will be edited with a timestamp sometimes before this one.
After deleting PGDATA and restoring backup files from HOT-backup path into PGDATA, I edited recovery.conf and restarted postgres.

STEPS 6), 7), 8):

$ cp -rp /home/postgres/HOT-BACKUP/* /PGDATA-11/
$ vi recovery.conf

$ cat recovery.conf
restore_command = 'cp -ir /home/postgres/BCK-PITR-TEST/%f %p'
recovery_target_time='2019-01-11 15:31:24.201177+01'

$ /usr/pgsql-11/bin/pg_ctl start -D /PGDATA-11/ &

Look at the server log, I left only important lines:

2019-01-11 15:56:46.873 CET [28768] LOG:  database system was interrupted; last known up at 2019-01-11 15:27:52 CET
2019-01-11 15:56:46.885 CET [28768] LOG:  starting point-in-time recovery to 2019-01-11 15:31:24.201177+01
2019-01-11 15:56:46.911 CET [28768] LOG:  redo starts at 0/3000028
2019-01-11 15:56:46.913 CET [28768] LOG:  consistent recovery state reached at 0/30000F8
2019-01-11 15:56:46.914 CET [28766] LOG:  database system is ready to accept read only connections
2019-01-11 15:56:46.931 CET [28768] LOG:  restored log file "000000010000000000000004" from archive
2019-01-11 15:56:46.951 CET [28768] LOG:  recovery stopping before commit of transaction 572, time 2019-01-11 15:33:31.342763+01
2019-01-11 15:56:46.951 CET [28768] LOG:  recovery has paused
2019-01-11 15:56:46.951 CET [28768] HINT:  Execute pg_wal_replay_resume() to continue.

We can see that the recovery process reached a consistent state at lsn 0/30000F8, which is in WAl file XXX3: following WAL XXX4 has been copied but not applied because of the timestamp I configured in recovery.conf.

Of course the rows deleted are now present into the db, so I executed pg_wal_replay_resume() to bring the db RW again:

test=# select count(1) from t1;
 count
-------
    200
(1 row)

test=# select pg_wal_replay_resume();
 pg_wal_replay_resume
----------------------

(1 row)

Now wal files moved to timeline 2:

$  ls -lrt
total 49156
-rw------- 1 postgres postgres 16777216 Jan 11 15:54 000000020000000000000005
-rw------- 1 postgres postgres 16777216 Jan 11 15:56 000000010000000000000004
drwx------ 2 postgres postgres       42 Jan 11 15:56 archive_status
-rw------- 1 postgres postgres       50 Jan 11 15:59 00000002.history
-rw------- 1 postgres postgres 16777216 Jan 11 15:59 000000020000000000000004

I did a similar test using an archive_command insted of pg_receivewal and obtained identical results. I suggest you to put the db log in debug, and to check with pg_waldump what is inside WAL files. Something strange is happening to you, and it could be interesting catch out WHAT is happening.

Hope this help. MarcoP

Some answers added on 15 Jan.

What should I do In case if everything was successful after recovery?. I'm unsure about these questions:

Creating a new slot2 by executing pg_create_physical_replication_slot('slot2'); and streaming from this slot by pg_receivewal -D /media/extern/postgresql_archive -S slot2 -v?

This could be a choice. It's up to you to decide which techology best fit your requirements. We do not user pg_receivewal at all, we prefer to use an archive_command.

Doing a pg_receivewal -D /media/extern/postgresql_archive -S slot1 -v again?

No, you cant! replication slost are not included in backup by pg_basebackup. If you look at serverlog during basebackup you will see:

2019-01-14 16:32:02.886 CET sending backup "pg_basebackup base backup"     5c3cab72.27fc-12 4/0-0 [10236] DEBUG:  contents of directory "pg_replslot"   excluded from backup

so, when you restore the PGDATA from hot-backup you will not restore any replication slot.

When do I have to create another base backup?

It really depends on your requirements. What are your constraints about Recovery Time Objective and Recovery Point Objective?

How many Wal files your db switches in a single day?

From the doc: Since you have to keep around all the archived WAL files back to your last base backup, the interval between base backups should usually be chosen based on how much storage you want to expend on archived WAL files. You should also consider how long you are prepared to spend recovering, if recovery should be necessary — the system will have to replay all those WAL segments, and that could take awhile if it has been a long time since the last base backup.

When and why do I execute select pg_switch_wal(); in this recovery concept?

This command basically force wal file to shitch to the following one. This grants that the last completed wal is moved to archive dir. In case of disaster where the live PGDATA is lost (and so the pg_wal subdir) this can be important. Wal file are 16MB by default. archive_command copies the wal to the archive destination when switch occurs.

If a disaster happens, let say a single byte before the switch, you lose 16MB of transaction activity. This is why someone prefer pg_receivewal which does not wait for wal completition. BTW you can user archive_timeout to set a value which is compatible with your requirements.

Related Solutions

Postgresql – Recovery using recovery.conf

DISCLAIMER : I am not a PostgreSQL DBA, though I dabble a lot with it.

You probably need to check your timezone in the OS and in psql.

In the OS run this:

[postgres@radarPG-db1 ~]$ date
Fri Jun 17 12:55:37 EDT 2011

In psql, run the following:

postgres=# SELECT EXTRACT(timezone_hour FROM now()),EXTRACT(timezone_minute FROM now());
 date_part | date_part
-----------+-----------
        -4 |         0

postgres=# select now();
              now
-------------------------------
 2011-06-17 12:54:39.195291-04

Many times, some forget to have postgres default the timezone to that of the OS. If the WAL files also contain the timezone in its internal timestamps, you must align the timezone set in postgres with that of the WAL files.

Something else to consider is the following:

The default behavior of recovery is to recover along the same timeline that was current when the base backup was taken. If you want to recover into some child timeline (that is, you want to return to some state that was itself generated after a recovery attempt), you need to specify the target timeline ID in 'recovery.conf'. You cannot recover into timelines that branched off earlier than the base backup.

UPDATE 2011-06-20 15:08 EDT

This is right from the online documentation. Please look this over and see if you missed any WAL files along the way. If you waited too long to setup the WAL files on the new server, postgres may have deleted them on startup. You may need to find those WAL files. See if they reside back in pg_xlog folder from the old server.

Recovering using a Continuous Archive Backup

Stop the server, if it's running.
If you have the space to do so, copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later. Note that this precaution will require that you have enough free space on your system to hold two copies of your existing database. If you do not have enough space, you need at the least to copy the contents of the pg_xlog subdirectory of the cluster data directory, as it might contain logs which were not archived before the system went down.
Clean out all existing files and subdirectories under the cluster data directory and under the root directories of any tablespaces you are using.
Restore the database files from your base backup. Be careful that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.
Remove any files present in pg_xlog/; these came from the backup dump and are therefore probably obsolete rather than current. If you didn't archive pg_xlog/ at all, then recreate it, being careful to ensure that you re-establish it as a symbolic link if you had it set up that way before.
If you had unarchived WAL segment files that you saved in step 2, copy them into pg_xlog/. (It is best to copy them, not move them, so that you still have the unmodified files if a problem occurs and you have to start over.)
Create a recovery command file recovery.conf in the cluster data directory (see Recovery Settings). You might also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery has worked.
Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode in case of a crash later) and then commence normal database operations.
Inspect the contents of the database to ensure you have recovered to where you want to be. If not, return to step 1. If all is well, let in your users by restoring pg_hba.conf to normal.

PostgreSQL on Ubuntu – Using conf files for recovery

The recovery.conf file needs to go into the data directory (/var/lib/postgresql/...). The other configuration files can stay where they are (/etc/postgresql/...). Then a normal restart should do what you want.

service postgresql stop

Best Answer

Related Solutions

Postgresql – Recovery using recovery.conf

PostgreSQL on Ubuntu – Using conf files for recovery

Related Question