Further testing revealed that the different WAL files were mostly those that had been rotated but not yet written to. The new slave would try to archive these (possibly fixed in 9.2) but they wouldn't be used for recovery.
The other problem is the overwriting. We disabled the overwrite (instead storing with a new filename so we at least keep the WAL just in case) and everything works... both failover and later restore.
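A hypothetical sketch of that non-overwriting archive step (the function name, suffix scheme, and paths are my own, not the exact script we used; adapt to your archive_command):

```shell
#!/bin/sh
# Sketch: archive a WAL file, but never overwrite an existing copy.
# If the target name already exists, keep the incoming file under a
# new name so no WAL is ever lost.
set -eu

archive_wal() {
    src="$1"; destdir="$2"
    name=$(basename "$src")
    if [ -e "$destdir/$name" ]; then
        # never overwrite: store alongside with a timestamp suffix
        cp "$src" "$destdir/$name.duplicate.$(date +%s)"
    else
        cp "$src" "$destdir/$name"
    fi
}

# demo with temporary directories standing in for pg_xlog and the archive
tmp=$(mktemp -d)
mkdir -p "$tmp/wal" "$tmp/archive"
echo "first"  > "$tmp/wal/000000010000000000000001"
archive_wal "$tmp/wal/000000010000000000000001" "$tmp/archive"
echo "second" > "$tmp/wal/000000010000000000000001"
archive_wal "$tmp/wal/000000010000000000000001" "$tmp/archive"
ls "$tmp/archive"   # the original plus one .duplicate.* copy
```

The key property is that the first archived copy survives a second archive attempt for the same segment name.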
NOTE: my earlier answer here was completely wrong. See the edit log if you need to see how wrong.
As of Barman >= 1.3.1, Barman supports backup from a standby replica (concurrent_backup). The Barman config, e.g. /etc/barman.d/standby.conf, looks like this:
[standby]
description = "Replica of main PostgreSQL DB"
ssh_command = ssh postgres@db02
conninfo = host=db02 user=postgres
backup_options = concurrent_backup
streaming_conninfo = host=db02 user=postgres
streaming_archiver = on
If your master is running PostgreSQL <= 9.5 you will have to install the pgespresso extension (there are binary packages, e.g. for Debian from the PGDG APT repos). PostgreSQL 9.6 introduced a native streaming API, so there is no need for a special extension.
On the standby server, make sure to configure archive_command:
wal_level = hot_standby
archive_mode = on
archive_command = 'rsync -a %p barman@backup:/var/lib/barman/standby/incoming/%f'
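Before relying on archive_command, it can be worth testing the transport by hand as the postgres user on the standby. A hypothetical manual check (host and paths as in the config above; the test filename is my own):

```shell
# simulate what archive_command will do for a single file
touch /tmp/archive_test
rsync -a /tmp/archive_test barman@backup:/var/lib/barman/standby/incoming/
# then confirm on the Barman host that the file arrived in incoming/
```

If this fails (key auth, permissions, wrong incoming path), archive_command will fail the same way and WAL will pile up on the standby.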
The incoming directory should match the one reported by:
barman:~$ barman diagnose | grep incoming_wals_directory
Also on the standby server, update pg_hba.conf (where 10.0.0.3 is the IP address of the Barman server):
host all postgres 10.0.0.3/32 trust
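A pg_hba.conf change only takes effect after PostgreSQL reloads its configuration. A minimal sketch, assuming you are the postgres user on the standby (the data directory path is illustrative):

```shell
# reload pg_hba.conf without restarting the server
psql -c "SELECT pg_reload_conf();"
# or equivalently:
# pg_ctl reload -D /var/lib/postgresql/9.6/main
```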
And enable WAL files streaming:
barman:~$ barman receive-wal standby
You can check your configuration using:
barman:~$ barman check standby
Server standby:
PostgreSQL: OK
wal_level: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 1 backups, expected at least 0)
ssh: OK (PostgreSQL server)
pgespresso extension: OK
archive_mode: OK
archive_command: OK
continuous archiving: OK
archiver errors: OK
Then you should be ready to run full backup:
barman:~$ barman backup standby
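Once the backup completes, you can verify it from the Barman side with Barman's standard listing commands (output will vary with your setup):

```shell
# list all backups taken for the "standby" server
barman list-backup standby
# show details (size, begin/end WAL, timeline) of the most recent one
barman show-backup standby latest
```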
Best Answer
You might want to try setting max_wal_senders on your replica, and using --xlog-method=stream instead of --xlog-method=fetch, which lets the basebackup grab the individual WAL records as they are shipped, rather than trying to grab the WAL segments at the end of the basebackup. If you have to use --xlog-method=fetch, then you should set wal_keep_segments high enough to let you take a reasonable basebackup; checkpoint_segments x 2 should be reasonable as a starting point.
As a reference, the documentation for the 9.3 version of pg_basebackup is here.
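Putting those flags together, a pg_basebackup invocation along these lines would stream WAL alongside the base backup (the host name and target directory are illustrative):

```shell
# take a base backup from the replica, streaming WAL as it is generated
pg_basebackup -h db02 -U postgres -D /var/lib/postgresql/basebackup \
    --xlog-method=stream --format=plain --progress
```

With stream mode, a second replication connection ships WAL concurrently, so the backup does not depend on old segments still existing at the end.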
I highly recommend that you test out the backup by restoring it on another server, and running a few tests against it to make sure it comes up in a consistent state, and has a reasonable snapshot of the data that you wanted backed up.
One additional note: if you set up an archive_command under 9.3, and a restore_command in your recovery.conf, your streaming replica will be able to recover itself and continue if network conditions cause streaming replication to lag. An example and discussion of this problem and a solution is talked about here: http://evol-monkey.blogspot.com/2014/09/offsite-replication-problems-and-how-to.html
I hope this helps. =)
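For reference, a recovery.conf pairing streaming replication with an archive fallback might look like this. The host names and the archive path are illustrative; restore_command must point at wherever your archive_command actually ships WAL:

```
standby_mode = 'on'
primary_conninfo = 'host=db01 user=postgres'
# fallback: fetch archived segments when streaming falls behind
restore_command = 'rsync -a barman@backup:/path/to/wal_archive/%f %p'
```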