PostgreSQL 9.6 – Troubleshooting Master-Slave Replication Issues

master-slave-replicationpostgresqlpostgresql-9.6replication

I tried to create a master/slave configuration with two servers and Postgres 9.6

I modified master and slave postgresql.conf in /etc/postgresql/9.6/main
Same for master and slave:

data_directory = '/srv/postgresql' # Because we wanted to store data elsewhere
wal_level = replica
synchronous_commit = local
archive_mode = on   
archive_command = 'cd .'
max_wal_senders = 2
wal_keep_segments = 10
synchronous_standby_names = 'pgslave001'

master-specific :

listen_addresses = ' <slave_ip>, localhost'

slave-specific :

listen_addresses = ' <master_ip>, localhost'

I then added a line on pg_hba.conf on master :

host    replication     rep     <slave_ip>/32  md5

And same on slave :

host    replication     rep     <master_ip>/32  md5

And then I restarted each postgresql services

I created a role rep on master as you can see below :

Created role
I removed existing /var/lib/postgresql/9.6/main repository on slave and then executed those commands :

service postgresql stop
pg_basebackup -D /var/lib/postgresql/9.6/main -h <master_ip> -U rep

And got this answer:

NOTICE:  pg_stop_backup complete, all required WAL segments have been archived

I then created a recovery.conf file on /var/lib/postgresql/9.6/main on slave

standby_mode = 'on'
primary_conninfo = 'host=<master_ip> port=5432 user=rep   
password=<rep_password> application_name=pgslave001'
restore_command = 'cp /var/lib/postgresql/9.6/main/archive/%f %p'
trigger_file = '/tmp/postgresql.trigger.5432'

I then restarted slave postgresql and got those logs:

LOG:  MultiXact member wraparound protections are now enabled
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
[unknown]@[unknown] LOG:  incomplete startup packet

Last one is a bit strange but it seems to be 'normal'

but replication seems not to work, as if I create a table on master, it's not replicated

And command :
psql -x -c "select * from pg_stat_replication;"

Gives me (0 rows)

I followed many tutorials but every time it's the same result.
My master db got some data (around 10 Go), maybe it's due to that?
What step did I miss? Is it due to an odd interaction with the fact that my data_dir is on /srv ?

My servers are on debian 9

Thanks for help, and don't hesitate to ask me for more details if needed! As I'm a beginner with Postgres, maybe I missed important informations

Best Answer

You told it look for the data in '/srv/postgresql'. But then you told pg_basebackup to clone the data into '/var/lib/postgresql/9.6/main', not '/srv/postgresql'.

When you started postgresql up again, you (apparently) started up the database instance whose data is still in '/srv/postgresql', which is not the clone you had just made.

Also, turning "archive_mode" on but then providing it with a dummy "archive_command" is not helpful and can only cause confusion and mis-reported errors. And "pg_basebackup" has a -R option for writing out the recovery.conf itself, it would probably be best to use that.