I think this is normal and expected if your restore_command
is set to something like this example:
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
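For context, a minimal `recovery.conf` for such a standby might look like the sketch below. The hostname, user, and trigger file path are placeholders, not taken from your setup:

```
standby_mode = 'on'
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
trigger_file = '/var/lib/postgresql/failover.trigger'
```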
The manual says that:
At startup, the standby begins by restoring all WAL available in the
archive location, calling restore_command. Once it reaches the end of
WAL available there and restore_command fails, it tries to restore any
WAL available in the pg_xlog directory. If that fails, and streaming
replication has been configured, the standby tries to connect to the
primary server and start streaming WAL from the last valid record
found in archive or pg_xlog. If that fails or streaming replication is
not configured, or if the connection is later disconnected, the
standby goes back to step 1 and tries to restore the file from the
archive again. This loop of retries from the archive, pg_xlog, and via
streaming replication goes on until the server is stopped or failover
is triggered by a trigger file.
So you can expect to see exactly one restore_command failure when you start your standby: PostgreSQL keeps calling it with incrementing WAL segment file names until the copy fails once.
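You can simulate that startup loop with a toy "archive" to see why one failure is expected: the copy succeeds while segments exist, then fails as soon as the archive is exhausted. All paths and segment names below are made up for illustration:

```shell
#!/bin/sh
# Simulate the standby's startup loop against a toy WAL archive.
archive=$(mktemp -d)   # stands in for /mnt/server/archivedir
dest=$(mktemp -d)      # stands in for the standby's pg_xlog

# Pretend two WAL segments were archived before the standby started.
touch "$archive/000000010000000000000001" \
      "$archive/000000010000000000000002"

restored=0
failed=""
# The standby requests segments in order until the copy fails once.
for seg in 000000010000000000000001 \
           000000010000000000000002 \
           000000010000000000000003; do
    if cp "$archive/$seg" "$dest/$seg" 2>/dev/null; then
        echo "restored $seg"
        restored=$((restored + 1))
    else
        echo "restore failed for $seg - end of archive, switching to streaming"
        failed=$seg
        break
    fi
done

rm -rf "$archive" "$dest"
```

After the two archived segments are restored, the third copy fails, which is the single failure you see in the log before streaming takes over.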
Then it will connect to the primary and start streaming as described above, and as you saw in your logs:
LOG: streaming replication successfully connected to primary
The slave is not guaranteed to be exactly up-to-date with the master; for example, it could have been disconnected from the master for a while. In particular, this line:
LOG: consistent recovery state reached at 31/B73624A0
does not mean that "the hot standby contains all the data of the master". However, if you see it followed by this line, as you did:
LOG: database system is ready to accept read only connections
then the database is "ready enough" to start functioning as a read-only standby, as the manual says:
It may take some time for Hot Standby connections to be allowed,
because the server will not accept connections until it has completed
sufficient recovery to provide a consistent state against which
queries can run. During this period, clients that attempt to connect
will be refused with an error message.
In my case, I saw consistent recovery state reached not followed by database system is ready to accept read only connections. This turned out to be a problem with an embedded scripting language plugin (plpython2) that had a system-wide startup script (sitecustomize.py) which did bad things to the PostgreSQL process (enabling faulthandler and installing a signal handler for SIGUSR2), causing it to never enter hot standby mode.
Best Answer
Yes, that advice remains valid.
A low-level snapshot of a volume that takes atomic snapshots is much like a plug-pull or a server crash: when restored from the snapshot, PostgreSQL simply performs normal crash recovery, replaying the transaction logs.
It's a perfectly sensible way to take a backup, though I recommend also taking periodic dumps. Snapshot backups won't help you in the face of undetected filesystem corruption etc.
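A periodic logical dump alongside the snapshots can be as simple as a cron entry. This sketch assumes an `/etc/cron.d`-style file (with a user field); the database name, user, and backup path are placeholders:

```
# Nightly at 02:00: compressed logical dump of "mydb" as user "postgres".
# The % in the date format must be escaped in crontab entries.
0 2 * * * postgres pg_dump --format=custom --file=/backups/mydb-$(date +\%F).dump mydb
```

The custom format (`--format=custom`) is compressed and lets you restore selectively with pg_restore.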