Postgresql – unable to start the standby postgres database

postgresqlreplicationrestorestandbywrite-ahead-logging

Situation

I have 2 separate instances. Both running on Ubuntu 14.04 server.

Both have installed postgres 9.5 the following way:

sudo apt-get install software-properties-common python-software-properties
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
apt-get -y install postgresql-9.5

Version is 9.5.10

Then I named the instance representing master database flanders and the instance representing standby willie

I tested both instances on postgres user with psql All good.

I downloaded the tar file of omnipitr-1.3.3 on both instances because version 2.0 is meant for postgres 10.

in the /etc/postgresql/9.5/main/postgresql.conf of flanders I set the following:

wal_level = hot_standby
archive_mode = on
archive_command = '/var/lib/postgresql/omnipitr-1.3.3/bin/omnipitr-archive -D /var/lib/postgresql/9.5/main -l /var/log/omnipitr/archive-^Y-^m-^d.log -dr willie-postgres:/var/lib/postgresql/master_wal_archive -s /var/lib/postgresql/.omnipitr/ "%p"'
archive_timeout = 120
max_wal_senders = 2

The omnipitr-archive script successfully sends the WAL file every 2 mins to willie using the postgres and leaves them inside /var/lib/postgresql/master_wal_archive

Now I then use pg_basebackup to backup the master database at flanders then I stopped the postgres to stop the continuous archiving every 2 mins

in flanders I run

postgres@flanders:~$ pg_basebackup -D backup -Ft -z -P -x

This creates a base.tar.gz inside /var/lib/postgresql/backup

I then use rsync to send the base file across to willie.

I stopped the postgres in willie as root using service postgresql stop. Then I run the following as postgres user:

rm -rf /var/lib/postgresql/9.5/main/*

tar -xvC /var/lib/postgresql/9.5/main -f /var/lib/postgresql/base.tar.gz

went back as root user and turned on postgres using service postgresql start

Went as postgres user and psql and yes the database data is now an exact mirror as the master database.

So far so good.

Proceeding with restore

Now here's when I start to have issues.

I stopped the postgres and went to /var/lib/postgresql/9.5/main and created a recovery.conf

Inside this recovery.conf I have the following only.

standby_mode = on
restore_command='/var/lib/postgresql/omnipitr-1.3.3/bin/omnipitr-restore -l /var/log/omnipitr/restore.log -s /var/lib/postgresql/master_wal_archive/ %f %p'

which i take from https://github.com/omniti-labs/omnipitr/blob/master/doc/omnipitr-restore.pod#minimal-setup

Now I go to root user and start postgres up. It took longer than usual but I get an OK sign.

Then I tried to go to postgres and psql. I now have fatal error about the database starting up.

I then checked the postgres log file I get the following:

2018-02-10 17:34:53.185 CST [3896] LOG:  could not bind IPv6 socket: Cannot assign requested address
2018-02-10 17:34:53.185 CST [3896] HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2018-02-10 17:34:53.197 CST [3897] LOG:  database system was shut down at 2018-02-10 17:30:10 CST
2018-02-10 17:34:53.198 CST [3897] LOG:  entering standby mode
2018-02-10 17:34:53.679 CST [3900] [unknown]@[unknown] LOG:  incomplete startup packet
2018-02-10 17:34:54.184 CST [3903] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:54.690 CST [3906] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:55.197 CST [3909] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:55.703 CST [3912] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:56.209 CST [3915] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:56.716 CST [3918] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:57.222 CST [3921] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:57.728 CST [3924] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:58.235 CST [3927] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:58.741 CST [3930] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:59.248 CST [3933] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:59.250 CST [3934] [unknown]@[unknown] LOG:  incomplete startup packet

So now I realise I need to turn on hot_standby inside the /etc/postgresql/9.5/main/postgresql.conf

 hot_standby = on

Then when I now run service postgresql start as root, I didn't get an OK sign but I did get

2018-02-10 18:05:15.024 CST [4127] LOG:  could not bind IPv6 socket: Cannot assign requested address
2018-02-10 18:05:15.024 CST [4127] HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2018-02-10 18:05:15.037 CST [4128] LOG:  database system was interrupted while in recovery at 2018-02-10 18:03:55 CST
2018-02-10 18:05:15.037 CST [4128] HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
2018-02-10 18:05:15.518 CST [4129] [unknown]@[unknown] LOG:  incomplete startup packet
2018-02-10 18:05:15.924 CST [4128] LOG:  entering standby mode
2018-02-10 18:05:15.924 CST [4128] LOG:  database system was not properly shut down; automatic recovery in progress
2018-02-10 18:05:15.926 CST [4128] WARNING:  WAL was generated with wal_level=minimal, data may be missing
2018-02-10 18:05:15.926 CST [4128] HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup.
2018-02-10 18:05:15.926 CST [4128] FATAL:  hot standby is not possible because wal_level was not set to "hot_standby" or higher on the master server
2018-02-10 18:05:15.926 CST [4128] HINT:  Either set wal_level to "hot_standby" on the master, or turn off hot_standby here.
2018-02-10 18:05:15.927 CST [4127] LOG:  startup process (PID 4128) exited with exit code 1
2018-02-10 18:05:15.927 CST [4127] LOG:  aborting startup due to startup process failure

So where did I go wrong with getting the standby server to be up?

UPDATE

As suggested by one of the answers, I have tried adding a recovery.conf into /var/lib/postgresql/9.5/main and then starting the postgres as user root.

Same result.

Best Answer

You should not have started the database and then stopped it before creating the recovery.conf file. Once you start the database without a recovery.conf, it opens for write usage, or tries to, and then is no longer eligible to replay more log files. (Although I don't understand why it led to the exact error message about wal_level=minimal).

Create the recovery.conf immediately after the tar -x.