PostgreSQL – unable to start the standby database


Situation

I have two separate instances, both running Ubuntu 14.04 Server.

On both, I installed PostgreSQL 9.5 as follows:

sudo apt-get install software-properties-common python-software-properties
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install postgresql-9.5

Version is 9.5.10

I named the instance hosting the master database flanders and the instance hosting the standby willie.

I tested both instances as the postgres user with psql. All good.

I downloaded the omnipitr-1.3.3 tarball onto both instances, because version 2.0 is meant for PostgreSQL 10.

In /etc/postgresql/9.5/main/postgresql.conf on flanders, I set the following:

wal_level = hot_standby
archive_mode = on
archive_command = '/var/lib/postgresql/omnipitr-1.3.3/bin/omnipitr-archive -D /var/lib/postgresql/9.5/main -l /var/log/omnipitr/archive-^Y-^m-^d.log -dr willie-postgres:/var/lib/postgresql/master_wal_archive -s /var/lib/postgresql/.omnipitr/ "%p"'
archive_timeout = 120
max_wal_senders = 2
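Since the standby failure further down ultimately comes down to wal_level, it can be worth checking what a postgresql.conf actually resolves to: when a parameter appears more than once, the last entry wins. Below is a minimal sketch under simplifying assumptions (one setting per line, no '#' inside values); conf_value and check_master_conf are hypothetical helper names, not part of PostgreSQL or OmniPITR. On a running server, SHOW wal_level; in psql remains the authoritative check.

```shell
# Hypothetical helpers. Simplification: one "name = value" (or "name value") per line,
# and no '#' characters inside values.

# conf_value FILE NAME: print the last uncommented value of NAME (later entries win).
conf_value() {
  awk -v key="$2" '
    {
      line = $0
      sub(/#.*/, "", line)                                  # drop comments
      if (match(line, "^[ \t]*" key "([ \t]*=[ \t]*|[ \t]+)")) {
        val = substr(line, RSTART + RLENGTH)
        gsub(/^[ \t]+|[ \t]+$/, "", val)                    # trim whitespace
        gsub(/^'\''|'\''$/, "", val)                          # strip surrounding single quotes
        found = val
      }
    }
    END { if (found != "") print found }
  ' "$1"
}

# check_master_conf FILE: report settings a 9.5 master needs for archiving, pg_basebackup
# and a hot standby; returns non-zero if anything is missing.
check_master_conf() {
  status=0
  case "$(conf_value "$1" wal_level)" in
    hot_standby|logical) ;;                                 # 9.5 levels that allow hot standby
    *) echo "wal_level must be hot_standby or logical" >&2; status=1 ;;
  esac
  [ "$(conf_value "$1" archive_mode)" = on ] || { echo "archive_mode is not on" >&2; status=1; }
  # pg_basebackup connects through a WAL sender, so this must be at least 1.
  [ "$(conf_value "$1" max_wal_senders)" -gt 0 ] 2>/dev/null ||
    { echo "max_wal_senders must be > 0" >&2; status=1; }
  return $status
}
```

For example, running check_master_conf /etc/postgresql/9.5/main/postgresql.conf on flanders prints one warning per missing setting and returns non-zero if any are found.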

The omnipitr-archive script successfully ships a WAL file to willie every 2 minutes (as the postgres user) and leaves the files in /var/lib/postgresql/master_wal_archive.

Next, I used pg_basebackup to back up the master database on flanders, and then stopped PostgreSQL on flanders so that the continuous archiving every 2 minutes would stop.

On flanders I ran:

postgres@flanders:~$ pg_basebackup -D backup -Ft -z -P -x

This creates base.tar.gz inside /var/lib/postgresql/backup.
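Before copying base.tar.gz to willie, a quick sanity check can confirm that the archive looks like a tar-format base backup: pg_basebackup writes PG_VERSION and backup_label at the top level of base.tar. This is a hedged sketch; looks_like_base_backup is a hypothetical name, and it only inspects the file list, not the data itself.

```shell
# Hypothetical sanity check: does TARBALL look like a pg_basebackup tar-format base backup?
# Expects PG_VERSION and backup_label at the top level (with or without a leading "./").
looks_like_base_backup() {
  listing=$(tar -tzf "$1") || return 1
  for f in PG_VERSION backup_label; do
    printf '%s\n' "$listing" | grep -Eq "^(\./)?$f\$" ||
      { echo "missing $f in $1" >&2; return 1; }
  done
}
```

For example, looks_like_base_backup /var/lib/postgresql/backup/base.tar.gz on flanders should succeed silently before the rsync.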

I then used rsync to copy base.tar.gz across to willie.

On willie, I stopped PostgreSQL as root using service postgresql stop. Then I ran the following as the postgres user:

rm -rf /var/lib/postgresql/9.5/main/*

tar -xvC /var/lib/postgresql/9.5/main -f /var/lib/postgresql/base.tar.gz

Back as root, I started PostgreSQL with service postgresql start.

As the postgres user, I connected with psql and, yes, the data is now an exact mirror of the master database.

So far so good.

Proceeding with restore

This is where the problems start.

I stopped PostgreSQL, went to /var/lib/postgresql/9.5/main, and created a recovery.conf file.

This recovery.conf contains only the following:

standby_mode = on
restore_command='/var/lib/postgresql/omnipitr-1.3.3/bin/omnipitr-restore -l /var/log/omnipitr/restore.log -s /var/lib/postgresql/master_wal_archive/ %f %p'

which I took from https://github.com/omniti-labs/omnipitr/blob/master/doc/omnipitr-restore.pod#minimal-setup

Then, as root, I started PostgreSQL. It took longer than usual, but I got an OK.

Then I switched to the postgres user and ran psql. This time I got a FATAL error saying that the database system is starting up.

I then checked the PostgreSQL log file and found the following:

2018-02-10 17:34:53.185 CST [3896] LOG:  could not bind IPv6 socket: Cannot assign requested address
2018-02-10 17:34:53.185 CST [3896] HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2018-02-10 17:34:53.197 CST [3897] LOG:  database system was shut down at 2018-02-10 17:30:10 CST
2018-02-10 17:34:53.198 CST [3897] LOG:  entering standby mode
2018-02-10 17:34:53.679 CST [3900] [unknown]@[unknown] LOG:  incomplete startup packet
2018-02-10 17:34:54.184 CST [3903] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:54.690 CST [3906] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:55.197 CST [3909] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:55.703 CST [3912] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:56.209 CST [3915] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:56.716 CST [3918] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:57.222 CST [3921] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:57.728 CST [3924] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:58.235 CST [3927] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:58.741 CST [3930] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:59.248 CST [3933] postgres@postgres FATAL:  the database system is starting up
2018-02-10 17:34:59.250 CST [3934] [unknown]@[unknown] LOG:  incomplete startup packet

So I realised I needed to turn on hot_standby in /etc/postgresql/9.5/main/postgresql.conf on willie:

hot_standby = on

This time, when I ran service postgresql start as root, I didn't get an OK, and the log showed:

2018-02-10 18:05:15.024 CST [4127] LOG:  could not bind IPv6 socket: Cannot assign requested address
2018-02-10 18:05:15.024 CST [4127] HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2018-02-10 18:05:15.037 CST [4128] LOG:  database system was interrupted while in recovery at 2018-02-10 18:03:55 CST
2018-02-10 18:05:15.037 CST [4128] HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
2018-02-10 18:05:15.518 CST [4129] [unknown]@[unknown] LOG:  incomplete startup packet
2018-02-10 18:05:15.924 CST [4128] LOG:  entering standby mode
2018-02-10 18:05:15.924 CST [4128] LOG:  database system was not properly shut down; automatic recovery in progress
2018-02-10 18:05:15.926 CST [4128] WARNING:  WAL was generated with wal_level=minimal, data may be missing
2018-02-10 18:05:15.926 CST [4128] HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup.
2018-02-10 18:05:15.926 CST [4128] FATAL:  hot standby is not possible because wal_level was not set to "hot_standby" or higher on the master server
2018-02-10 18:05:15.926 CST [4128] HINT:  Either set wal_level to "hot_standby" on the master, or turn off hot_standby here.
2018-02-10 18:05:15.927 CST [4127] LOG:  startup process (PID 4128) exited with exit code 1
2018-02-10 18:05:15.927 CST [4127] LOG:  aborting startup due to startup process failure

So where did I go wrong in getting the standby server up?

UPDATE

As suggested in one of the answers, I tried adding recovery.conf to /var/lib/postgresql/9.5/main and then starting PostgreSQL as root.

Same result.

Best Answer

You should not have started the database and then stopped it before creating the recovery.conf file. Once you start the database without a recovery.conf, it opens for write usage, or tries to, and is then no longer eligible to replay more log files. (Although I don't understand why it led to that exact error message about wal_level=minimal.)
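One way to investigate that wal_level=minimal message: the control file in the data directory records the wal_level the cluster was last run with, and, as far as I understand, hot standby refuses to start when that recorded value is below hot_standby. A plausible but unconfirmed explanation is that willie's own postgresql.conf still had the 9.5 default, wal_level = minimal, when it was started without recovery.conf, so its control file now says minimal. pg_controldata prints this value; the label reads "Current wal_level setting:" in some releases and "wal_level setting:" in others, so the hypothetical helper below matches both.

```shell
# Hypothetical helper: extract the wal_level recorded in the control file
# from pg_controldata output read on stdin. Matches both label variants.
recorded_wal_level() {
  sed -n 's/^.*wal_level setting:[[:space:]]*//p'
}
```

On willie, as the postgres user, piping /usr/lib/postgresql/9.5/bin/pg_controldata /var/lib/postgresql/9.5/main into recorded_wal_level should print the recorded level (the binary path assumes Ubuntu's postgresql-9.5 package layout).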

Create the recovery.conf immediately after the tar -x.
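The ordering this answer recommends can be sketched as one hypothetical shell function (rebuild_standby is not an existing tool): empty the data directory, unpack base.tar.gz, and write recovery.conf before PostgreSQL is ever started. Starting the service is deliberately left to the caller. It assumes the restore command contains no single quotes and that it is run as the postgres user, so file ownership stays correct.

```shell
# Hypothetical helper: rebuild a standby data directory from a pg_basebackup tarball
# and write recovery.conf BEFORE the server is started for the first time.
# Assumption: RESTORE_CMD contains no single quotes (it is written inside '...').
rebuild_standby() {
  [ $# -eq 3 ] && [ -n "$1" ] ||
    { echo "usage: rebuild_standby DATADIR TARBALL RESTORE_CMD" >&2; return 2; }
  datadir="$1" tarball="$2" restore_cmd="$3"

  rm -rf -- "$datadir"/*                          # empty the data directory, as in the question
  tar -xzf "$tarball" -C "$datadir" || return 1   # unpack base.tar.gz (pg_basebackup -Ft -z)

  # recovery.conf lives in the data directory and is written right after extraction.
  {
    echo "standby_mode = on"
    printf "restore_command = '%s'\n" "$restore_cmd"
  } > "$datadir/recovery.conf" || return 1
  chmod 600 "$datadir/recovery.conf"
}
```

On willie this would be called with /var/lib/postgresql/9.5/main, /var/lib/postgresql/base.tar.gz and the omnipitr-restore command from the question, followed by service postgresql start as root.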

Related Solutions

PostgreSQL: Unable to run repmgr cloned database

I searched around and realized that the problem was that hot_standby was not enabled on the standby server. I wrote a corrected, updated and simplified version of the above article as an ODT document for future personal reference. Here is the text in its entirety (converted from ODT to HTML using OpenOffice.org) for anyone interested in setting up a single read-only clone of a database using repmgr:

Single Slave Streaming Replication with PostgreSQL

Introduction

This guide aims to quickly help you configure a PostgreSQL 9.1 server with a database, and have it replicated to a slave that can be used for read-only queries. There is no concept of failover involved here, and the slave will only have a read-only copy of the master's data.

1. Tools Needed

A stable GNU/Linux Distribution (Recommended OS: CentOS 6.2 x86-64)
PostgreSQL 9.1 – you may install it from http://yum.postgresql.org/ (also make sure that the following packages, or their equivalents, are installed):
postgresql
postgresql-client
postgresql-contrib
postgresql-server
postgresql-server-dev
2 workstations – a master that runs the primary database and a slave that runs the replicated read-only database. In this document their IP addresses are replaced by the hostnames pgmaster and pgslave, so add those entries to /etc/hosts on both machines if you want to follow the instructions word for word.
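The /etc/hosts step can be scripted with a small hypothetical helper that appends a mapping only if the hostname is not already present, so re-running it is harmless. The addresses in the usage note are placeholders chosen to fit the 192.168.5.0/24 subnet used in pg_hba.conf below, not values from the guide.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already mapped.
# Comment lines are ignored. Run as root when FILE is /etc/hosts.
add_host_entry() {
  file="$1" ip="$2" name="$3"
  if awk -v n="$name" '
       /^[ \t]*#/ { next }                                # skip comment lines
       { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file"; then
    return 0                                              # already present: leave the file alone
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

For example, as root on both machines: add_host_entry /etc/hosts 192.168.5.10 pgmaster and add_host_entry /etc/hosts 192.168.5.11 pgslave (placeholder addresses).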

2. Installation Check and Password Creation

Run /etc/init.d/postgresql start on both systems to check if PostgreSQL is functional or not.
Run /etc/init.d/postgresql stop on both systems to stop PostgreSQL. We will not use PostgreSQL until we finish some configuration tasks.
Set a password for the postgres user on both systems. This user has no password by default, but we need one to copy SSH keys between pgmaster and pgslave (ssh-copy-id asks for it).
Run sudo passwd postgres on both systems and type a new Unix password for each.

3. SSH Tunnel Creation

On pgmaster, do the following:
su postgres
ssh-keygen -t rsa (press Enter at every prompt)
ssh-copy-id -i ~/.ssh/id_rsa.pub pgslave (you need to enter pgslave's postgres password)
ssh pgslave and check that you can log in without a password
Repeat the above steps on pgslave:
su postgres
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub pgmaster
ssh pgmaster and check that you can log in without a password
Make sure you log out of the remote machine after you finish checking connectivity
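The "log in without a password" check can be made non-interactive: with BatchMode=yes, ssh fails instead of prompting for a password, so a zero exit status means key-based login works. A minimal sketch; ssh_key_login_ok is a hypothetical name and the 5-second timeout is an arbitrary choice.

```shell
# Hypothetical check: succeeds only if key-based login to HOST works.
# BatchMode=yes makes ssh fail rather than ask for a password.
ssh_key_login_ok() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, as postgres on pgmaster: ssh_key_login_ok pgslave && echo "key login OK", and the same with pgmaster on pgslave.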

4. Editing postgresql.conf on pgmaster

You need to make the following changes in the file postgresql.conf that resides in the configuration directory inside /etc/postgresql/ on your machine pgmaster:

listen_addresses = '*'
wal_level = hot_standby
checkpoint_segments=30
archive_mode=on
archive_command='cd .'
max_wal_senders=2
wal_keep_segments=5000
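wal_keep_segments counts WAL segment files, which are 16 MB each with the default build, so the value above deserves a quick disk-space check: 5000 segments pin at least 5000 × 16 MB = 80,000 MB (about 78 GiB) in pg_xlog on pgmaster. A tiny sketch of that arithmetic; wal_keep_mb is a hypothetical name, and the default segment size is assumed unless a second argument overrides it.

```shell
# Hypothetical helper: disk space (MB) that wal_keep_segments alone keeps in pg_xlog.
# Assumes the default 16 MB WAL segment size unless SEG_MB is given.
wal_keep_mb() {
  segments="$1"
  seg_mb="${2:-16}"
  echo $((segments * seg_mb))
}
```

wal_keep_mb 5000 prints 80000.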

5. Editing postgresql.conf on pgslave

You need to make the following change in the file postgresql.conf that resides in the configuration directory inside /etc/postgresql/ on machine pgslave:

hot_standby=on
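Editing postgresql.conf by hand works fine. If you script it instead, note that PostgreSQL uses the last occurrence of a parameter in the file, so appending a line is enough to override an earlier one. A hypothetical sketch (set_conf is not a PostgreSQL tool) that also comments out existing active lines for readability and keeps a .bak copy of the file:

```shell
# Hypothetical helper: make "NAME = VALUE" the effective setting in a postgresql.conf-style file.
# Existing active NAME lines are commented out; the new line is appended (last entry wins).
set_conf() {
  file="$1" name="$2" value="$3"
  sed -i.bak "s/^[[:space:]]*$name[[:space:]]*=/#&/" "$file"   # keeps FILE.bak as a backup
  printf '%s = %s\n' "$name" "$value" >> "$file"
}
```

For example, on pgslave: set_conf /etc/postgresql/9.1/main/postgresql.conf hot_standby on (adjust the path to your installation).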

6. Editing pg_hba.conf on pgmaster

You need to make the following changes in the file pg_hba.conf that resides in the configuration directory inside /etc/postgresql/ on machine pgmaster:

host all all 192.168.5.0/24 trust
host replication all 192.168.5.0/24 trust

7. Adding PostgreSQL bin folder to PATH

There are a bunch of handy PostgreSQL utilities we will be using here, so let's set the PATH variable so that the shell knows where to find them:

Execute locate pgbench
The output will be something like /usr/lib/postgresql/9.1/bin/pgbench
Drop the trailing pgbench, then copy the remaining directory path and add it to your PATH variable.
Execute nano ~/.bashrc
Add the line export PATH+=:/usr/lib/postgresql/9.1/bin/ (or whatever location the locate command revealed)
Save the changes and close the file
You may need to log out and login (or open a new shell) for changes to take effect.
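Because ~/.bashrc is read by every new interactive shell, a guarded append avoids piling up duplicate PATH entries if the line gets added twice or the file is sourced repeatedly. A minimal POSIX-shell sketch; path_append is a hypothetical name, and the directory is the example location from the locate step above.

```shell
# Hypothetical helper for ~/.bashrc: append DIR to PATH only if it is not already there.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                     # already on PATH: nothing to do
    *) PATH="${PATH:+$PATH:}$1" ;;   # append, avoiding a leading ':' when PATH is empty
  esac
  export PATH
}

path_append /usr/lib/postgresql/9.1/bin
```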

8. Loading pgmaster's PostgreSQL server with dummy data

On pgmaster start the PostgreSQL server first: /etc/init.d/postgresql start
We create a test database and load it with some dummy data with the following commands:
1. su postgres
2. createdb pgbench
3. pgbench -i -s 10 pgbench
Alternatively open the database pgbench yourself, create a sample table and insert sample data into it.

9. Erasing pgslave's data and checking pgmaster connectivity

We are going to erase the data directory of pgslave, so execute the following on that machine:
First stop PostgreSQL server /etc/init.d/postgresql stop
Move into the PostgreSQL default data directory folder: cd /var/lib/pgsql/data (or to the data directory that is default for your installation)
Empty the directory completely with rm -rf *
Now execute psql -h pgmaster -d pgbench and see if you are able to access the database on pgmaster through pgslave.
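Running cd followed by rm -rf * as two separate steps is risky: if the cd fails, the rm runs in whatever directory you happen to be in. A hedged sketch of a safer variant; empty_pgdata is a hypothetical name. It refuses to run unless the target looks like a PostgreSQL data directory (contains PG_VERSION), and it also removes hidden files while keeping the directory itself with its ownership and permissions.

```shell
# Hypothetical helper: empty a PostgreSQL data directory, but only if it looks like one.
empty_pgdata() {
  dir="$1"
  [ -n "$dir" ] && [ -f "$dir/PG_VERSION" ] ||
    { echo "refusing: '$dir' has no PG_VERSION" >&2; return 1; }
  # Remove everything inside DIR (including dotfiles) but keep DIR itself.
  find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf -- {} +
}
```

For example, as postgres on pgslave after stopping the server: empty_pgdata /var/lib/pgsql/data.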

10. Installing repmgr

Though single-slave replication is fairly straightforward, we are using a tool called repmgr to make the process even simpler. Here is how you install it:

Grab repmgr source code from http://projects.2ndquadrant.it/sites/default/files/repmgr-1.1.0.tar.gz and copy it to /tmp for installation.
Have the following packages installed to ensure that compiling repmgr is possible:
make
gcc
postgresql-devel
libxslt-devel
pam-devel
libopenssl-devel
krb5-devel

Extract the downloaded archive and enter the source code folder.

Compile with make USE_PGXS=1

Install with make USE_PGXS=1 install

Execute below two commands to check if repmgr is installed correctly:

repmgr --version

repmgrd --version

11. Cloning pgmaster onto pgslave using repmgr

Execute su postgres and login as the postgres user on pgslave
Run this to clone database: repmgr -D /var/lib/pgsql/data -d pgbench -p 5432 -R postgres --verbose standby clone pgmaster
Note that the -D parameter /var/lib/pgsql/data should be replaced with the appropriate location of the data folder on pgslave
Also note that 5432 is the default port PostgreSQL runs on.
When the command finishes executing (it is going to take several seconds to finish if you had used pgbench to insert random data) you may start PostgreSQL on pgslave with /etc/init.d/postgresql start

12. Testing streaming replication

Congratulations! You have successfully configured pgslave to copy pgmaster's database via streaming replication. Note that due to being in continuous recovery mode, pgslave can only be used to execute read-only queries and no insertions or modifications are possible.

Insert some values into sample_table on pgmaster (the sample table created in step 8).
Execute select * from sample_table; at pgslave's psql prompt after connecting to the pgbench database.
If everything worked properly, you should see the newly added tuples in the output on pgslave.

Conclusion

We just learned how to quickly configure a PostgreSQL database to act as a read-only mirror of another database. Note that if the slave goes offline, it will automatically catch up from the master when it comes back, provided the WAL it needs is still kept on the master (see wal_keep_segments). Streaming replication works on the whole database cluster rather than on individual databases, so every database on pgmaster, including ones created later, is replicated to pgslave; the -d pgbench option in part 11 only tells repmgr which database to connect to.

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

After the rsync, did you really run what you show?

pgsql -c "select pg_stop_backup();";

Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:
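For reference, a sketch of the 9.1-era exclusive base-backup sequence discussed here, using psql (not pgsql) and ending backup mode with pg_stop_backup() even if rsync fails. The function name, backup label and paths are hypothetical; pg_start_backup(label, fast) and pg_stop_backup() are the standard functions in 9.1, and the exclusions follow the usual advice to skip postmaster.pid and the contents of pg_xlog.

```shell
# Hypothetical helper: copy a running 9.1 master's data directory using the exclusive
# pg_start_backup()/pg_stop_backup() API. Run as postgres on the master.
clone_with_exclusive_backup() {
  src="$1" dest="$2"
  # Second argument true requests an immediate (fast) checkpoint.
  psql -Atc "SELECT pg_start_backup('standby_clone', true);" || return 1

  rsync -a --delete --exclude=postmaster.pid --exclude='pg_xlog/*' "$src/" "$dest/"
  status=$?

  # Always leave backup mode, even if rsync failed; otherwise the backup stays in progress.
  psql -Atc "SELECT pg_stop_backup();" || status=1
  return $status
}
```

For example: clone_with_exclusive_backup /var/lib/pgsql/data pgslave:/var/lib/pgsql/data, with the slave's PostgreSQL stopped and recovery.conf written afterwards.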

Log: consistent recovery state reached at 0/BF0000B0

and:

Log: streaming replication successfully connected to primary

Did you try connecting to the slave at this point? What happened?

The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:

The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands. After I receive this error, I kill all processes and again try to restart the database...

Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:

log_line_prefix = '[%m] %p %q<%u %d %r> '

The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?