Postgresql – Streaming Replication failing from standby with connection issue

postgresqlpostgresql-9.5replication

I'm setting up streaming replication between a primary and standby server.

On the standby server, when I start up the instance I see that replication isn't taking place. Looking at the log, I see the following:

2017-02-02 20:37:03 UTC [45618-1] LOG:  database system was shut dow
n in recovery at 2017-02-02 20:36:50 UTC
2017-02-02 20:37:03 UTC [45618-2] LOG:  entering standby mode
cp: cannot stat '/etc/replarchive/000000010000000000000005': No such file or directory
2017-02-02 20:37:03 UTC [45618-3] LOG:  invalid record length at 0/5000098
2017-02-02 20:37:03 UTC [45623-1] FATAL:  could not connect to the primary server: FATAL:  no pg_hba.conf entry for replication connection from host "10.0.0.5", user "replication", SSL on
        FATAL:  no pg_hba.conf entry for replication connection from host "10.0.0.5", user "replication", SSL off

That's a pretty obvious error message, could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "10.0.0.5", user "replication", SSL on.

But on the primary server, pg_hba.conf indeed has the following entry:

host    all     replication 10.0.0.5/32     md5

I can't seem to wrap my head around why the standby server is failing a connection with that in place.

So I tried to connect from the standby server to the primary server via psql:

# this is on the standby server (10.0.0.5) connecting to the primary server (10.0.0.4)
psql myrepl replication -h10.0.0.4

Putting in my password when psql prompts me to, I successfully connect to the myrepl database on the primary server.

What am I missing here? Why is recovery on the standby not able to connect to the primary server when it appears obvious that the connection is indeed possible?

Thank you in advance!

Best Answer

On the primary server, in pg_hba.conf replace your entry with the below:

host    all  all  10.0.0.5/32  trust

Now reload the conf (for example, doing pg_ctl reload or doing a SELECT pg_reload_conf(); in the DB itself).

On the standby, check recovery.conf: it should contain the following:

standby_mode='on'
primary_conninfo=' host=10.0.0.5'
restore_command = 'cp /var/lib/postgresql/9.1/main/archive/%f %p'
recovery_target_timeline = 'latest'

Now restart the slave machine.

And if you are still unable to connect, check replication user attributes - replication should be set there. Use \du to check this when using psql.