PostgreSQL – How to Determine if Hot Standby is Fully Mirrored

postgresql, postgresql-9.2, replication

I have set up a hot standby of a PostgreSQL server. It all seems to be working, but I just want to be sure that I'm not missing something. In /var/lib/pgsql/9.2/data/pg_log/postgresql-Wed.log I have the following:

LOG:  creating missing WAL directory "pg_xlog/archive_status"
cp: cannot stat `/var/lib/pgsql/9.2/wal/00000002.history': No such file or directory
LOG:  entering standby mode
cp: cannot stat `/var/lib/pgsql/9.2/wal/0000000200000031000000B4': No such file or directory
LOG:  streaming replication successfully connected to primary
LOG:  redo starts at 31/B47BFAC0
LOG:  consistent recovery state reached at 31/B73624A0
LOG:  database system is ready to accept read only connections

I am concerned about the missing WAL files. Can anyone confirm that, as long as it reaches a consistent state, the hot standby contains all the data of the master?

Everything else I check indicates that it's ok; for example, running psql -x -c "select * from pg_stat_replication;" on the master looks good, and adding a new record on the master replicates. I just want to be sure that there won't be anything missing from the slave.

Best Answer

I think this is normal and expected if your restore_command is set to something like this example:

restore_command = 'cp /mnt/server/archivedir/%f "%p"'

The manual says that:

At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_xlog directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_xlog. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_xlog, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file.

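So you can expect to see restore_command failures when you start your standby, because PostgreSQL will keep calling it (with incrementing log file names/numbers) until it fails. As your log shows, it also asks the archive for a timeline history file (00000002.history), and that request fails too if the file is not there.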

Then it will connect to the primary and start streaming as described above, and as you saw in your logs:

LOG:  streaming replication successfully connected to primary
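To see why the failed cp is harmless, it can help to relate the missing file name to the location where redo starts. The sketch below is only an illustration: the helper name wal_segment_name is made up, and it assumes the default 16 MB WAL segment size and the 9.2-style file naming (timeline, log id, segment, each as 8 hex digits).

```python
def wal_segment_name(timeline, lsn, segment_size=16 * 1024 * 1024):
    """Return the WAL segment file name that contains the given location.

    `lsn` is a location as printed in the server log, e.g. "31/B47BFAC0".
    Assumes the default 16 MB segment size (hypothetical helper, not a
    PostgreSQL API).
    """
    log_id_hex, offset_hex = lsn.split("/")
    log_id = int(log_id_hex, 16)
    offset = int(offset_hex, 16)
    segment = offset // segment_size  # which 16 MB slice of this log id
    return "%08X%08X%08X" % (timeline, log_id, segment)


# "redo starts at 31/B47BFAC0" on timeline 2, taken from the log in the question
print(wal_segment_name(2, "31/B47BFAC0"))  # prints 0000000200000031000000B4
```

The result is the same file that cp could not find in the archive directory, which suggests the standby simply fetched that segment from the primary over streaming replication instead.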

The slave is not guaranteed to be exactly up to date with the master: streaming replication is asynchronous by default, so the slave can lag behind, and it may also be disconnected from the master for a while. In particular, this line:

LOG:  consistent recovery state reached at 31/B73624A0

does not mean that "the hot standby contains all the data of the master". However, if you see it followed by this line, as you did:

LOG:  database system is ready to accept read only connections

then the database is "ready enough" to start functioning as a read-only standby, as the manual says:

It may take some time for Hot Standby connections to be allowed, because the server will not accept connections until it has completed sufficient recovery to provide a consistent state against which queries can run. During this period, clients that attempt to connect will be refused with an error message.
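If you want to check how far the standby has actually got, one option is to compare WAL locations. A minimal sketch, assuming you have already fetched the values yourself, for example with select pg_current_xlog_location(); on the master and select pg_last_xlog_replay_location(); on the standby. The function names parse_lsn and standby_has_replayed are made up for this example.

```python
def parse_lsn(text):
    """Turn a location such as "31/B73624A0" into a comparable (high, low) pair."""
    high_hex, low_hex = text.strip().split("/")
    return (int(high_hex, 16), int(low_hex, 16))


def standby_has_replayed(master_lsn, standby_replay_lsn):
    """True if the standby has replayed at least up to master_lsn.

    Both arguments are strings in the "X/Y" hex format printed by PostgreSQL.
    Comparing the raw strings would be wrong (e.g. "9/0" vs "10/0"), so the
    two halves are parsed as hexadecimal integers first.
    """
    return parse_lsn(standby_replay_lsn) >= parse_lsn(master_lsn)


# Locations taken from the log in the question, used here only as sample input
print(standby_has_replayed("31/B73624A0", "31/B73624A0"))  # True
print(standby_has_replayed("31/B73624A0", "31/B47BFAC0"))  # False
```

Keep in mind that the master keeps writing WAL, so this only tells you whether the standby has caught up to the location you sampled, not that it is identical to the master right now.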

In my case, I saw "consistent recovery state reached" that was not followed by "database system is ready to accept read only connections". The cause turned out to be an embedded scripting language plugin (plpython2) with a system-wide startup script (sitecustomize.py) that did bad things to the PostgreSQL process: it enabled faulthandler and installed a signal handler for SIGUSR2, which caused the server to never enter hot standby mode.
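A quick way to spot this situation is to scan the server log for the two messages discussed above. This is only a rough sketch: the function name hot_standby_status and the status strings are made up, and it assumes the message texts match the 9.2 log lines quoted in the question.

```python
STANDBY_START = "entering standby mode"
CONSISTENT = "consistent recovery state reached"
READY = "database system is ready to accept read only connections"


def hot_standby_status(log_lines):
    """Report how far the most recent standby startup got, based on log text."""
    status = "not consistent yet"
    for line in log_lines:
        if STANDBY_START in line:
            # A new startup begins, so forget what earlier startups reached.
            status = "not consistent yet"
        elif CONSISTENT in line:
            status = "consistent, not accepting connections"
        elif READY in line:
            status = "accepting read-only connections"
    return status


sample = [
    "LOG:  entering standby mode",
    "LOG:  streaming replication successfully connected to primary",
    "LOG:  redo starts at 31/B47BFAC0",
    "LOG:  consistent recovery state reached at 31/B73624A0",
    "LOG:  database system is ready to accept read only connections",
]
print(hot_standby_status(sample))       # accepting read-only connections
print(hot_standby_status(sample[:-1]))  # consistent, not accepting connections
```

If the second status persists long after startup, the standby reached consistency but never opened for connections, which is the symptom described above.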

Related Solutions

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

After the rsync, did you really run what you show?

pgsql -c "select pg_stop_backup();";

Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:

Log: consistent recovery state reached at 0/BF0000B0

and:

Log: streaming replication successfully connected to primary

Did you try connecting to the slave at this point? What happened?

The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:

The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands. After I receive this error, I kill all processes and again try to restart the database...

Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:

log_line_prefix = '[%m] %p %q<%u %d %r> '

The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?

Postgresql – Streaming Replication in PostgreSQL

PostgreSQL replicas never finish recovering. This is by design. Basically, a replica is always in "recovering from disaster" mode, except that it is receiving the WAL segments from the master rather than reading them from disk.

So what you are seeing is not cause for concern. If it is not working yet, then you will need to provide a more detailed description of what you are trying to do and what is not working. But from what you have posted, it seems normal.
