how is it possible for the Slave to have more pg_xlog/ log files than the Master?
The whole point of archiving WAL on the master to some external location is to let the master then delete it to free space in its pg_xlog
, while replicas might still need it.
A replica can have more archives in pg_xlog
than the master, and older ones, if it's lagging behind the master due to failure to keep up with replay. However, with pg_standby
that shouldn't happen - the archive might contain more xlogs, but the replica should only be reading them on-demand.
It's hard to be specific, because you've given a broad description of the issue rather than actual directory listings, and haven't explained the exact steps you followed to set up the replica. Or shown the exact log file output from the replica. So the best I can do is "it sounds like the replica setup is broken somehw".
to resync the servers in warm standby mode: do I have to do pg_basebackup again (to essentially copy Master's /data
and /pg_xlog
directory) to the Slave?
Assuming that here /data
is the main datadir, containing global
, base
, pg_clog
, etc, and that pg_xlog
is the transaction logs from a different disk: Yes, that's right.
You must use the pg_basebackup
command, though, or follow the instructions in the manual for correct file system level copies using pg_start_backup()
and rsync/cp.
You also have to make sure you've stopped the replica first. Overwriting its datadir while it's running will make it quite upset.
Streaming replication vs warm standby
Hot vs warm standby is orthogonal to streaming vs log shipping replication.
What you're trying to do is use log shipping instead of streaming replication. It doesn't matter for this purpose if the replica is a hot standby or a warm standby, i.e. whether or not it's accepting queries.
Personally I recommend using both methods - use streaming, and fall back to log shipping if there's a problem with streaming. PostgreSQL does this automatically if both are configured.
I would suggest having the same wal_level
on all servers, as as it defines the amount of information you will have when WAL is created (on the primary, these files are then streamed to the standbys).
The available wal_level
settings (with each setting writing more information to each WAL, as well as all that of lower levels) from low to high are:
minimal
replica
(older values archive
and hot_standby
are mapped to this)
logical
In this case I would set wal_level = logical
for all. As WAL is only created in one place this won't hurt, and it has the added benefit of keeping working if you ever promote a standby to a master.
You mention hot_standby
in the diagram, this has now been renamed replica
(although both still work).
Other than that this topology should be fine, we use something like this - although you have logical replication from the standby, you'll need to move this to coming from the master. There was a rejected patch to allow this to come from a standby in PostgreSQL 10, hopefully this comes through in PostgreSQL 11.
Best Answer
At least in Postgres 11 you can restart, but
wal_level
does not change if you have slots. You should drop the slots first to do the change (not only subscriptions and publications), losing the replicas.So if you want to change the
wal_level
, first you could list the slots:and then, drop the slots:
for each slot.
Once you drop all slots and change the the
wal_level
toreplica
, you can't make replicas. Evenpg_wal
is flushed when data gets old, maybe a couple of minutes after the drop, it doesn't even wait for the reboot.