I configured repmgr replication on node1 (primary) and node3 (standby). The setup worked successfully, and new records and objects were created on the standby as expected.
But after a few weeks I noticed that replication wasn't working anymore, even though some repmgr commands still return results as if replication were working.
I tried restarting the standby node and registering it again, but that didn't work.
How can I get replication working again?
Here's the status of the nodes:
-bash-4.2$ psql -V
psql (PostgreSQL) 10.3
NODE1 – PRIMARY
-bash-4.2$ repmgr node check
Node "node1":
Server role: OK (node is primary)
Replication lag: OK (N/A - node is primary)
WAL archiving: OK (0 pending archive ready files)
Downstream servers: OK (this node has no downstream nodes)
Replication slots: OK (node has no replication slots)
-bash-4.2$
NODE3 – STANDBY
-bash-4.2$ repmgr -f /etc/repmgr/10/repmgr.conf node check
Node "node3":
Server role: OK (node is standby)
Replication lag: OK (0 seconds)
WAL archiving: OK (0 pending archive ready files)
Downstream servers: CRITICAL (1 of 1 downstream nodes not attached; missing: node3 (ID: 3))
Replication slots: OK (node has no replication slots)
-bash-4.2$ repmgr node status
Node "node3":
PostgreSQL version: 10.3
Total data size: 2393 MB
Conninfo: host=node3 user=repmgr dbname=repmgr connect_timeout=2
Role: standby
WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)
Archive command: /bin/true
WALs pending archiving: 0 pending files
Replication connections: 0 (of maximal 10)
Replication slots: 0 (of maximal 10)
Upstream node: node3 (ID: 3)
Replication lag: 0 seconds
Last received LSN: 4/AC000000
Last replayed LSN: 4/AC000140
Best Answer
You should probably raise your WAL retention limits so that more WAL files are kept around. It is also a good idea to set them aside using archive_command.
Raise the limit high enough for your use case (a value such as 256 is just an example), and adjust any archive paths to match your installation.
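The settings described above could look roughly like the sketch below. It is not the answer's original snippet: it assumes the limit meant is wal_keep_segments (the PostgreSQL 10 setting), and both ARCHIVE_DIR and FRAGMENT are placeholder paths invented for illustration. The script only writes the suggested lines to a separate file so they can be reviewed before being merged into postgresql.conf on the primary.

```shell
# Sketch only. ARCHIVE_DIR and FRAGMENT are hypothetical placeholders,
# not paths taken from the question; adjust them to your installation.
ARCHIVE_DIR="${ARCHIVE_DIR:-/path/to/wal_archive}"
FRAGMENT="${FRAGMENT:-./wal_retention.conf}"

# Write the suggested settings to a fragment file for review.
cat > "$FRAGMENT" <<EOF
# Minimum number of past WAL segments kept in pg_wal for standbys
# (256 is only an example value)
wal_keep_segments = 256
# Changing archive_mode requires a server restart
archive_mode = on
# Copy each completed segment aside, refusing to overwrite an existing file
archive_command = 'test ! -f ${ARCHIVE_DIR}/%f && cp %p ${ARCHIVE_DIR}/%f'
EOF
```

With the default 16 MB segment size, 256 segments amount to about 4 GB of retained WAL, so check the available disk space before choosing a value.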
Secondly, use
repmgr cluster show
to verify that the cluster is healthy; it is clearer than checking a single node.
Lastly: did you register the standby after cloning? You don't show this in your command list. After cloning, you need to start the standby and then register it.
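A minimal sketch of the start-then-register sequence, to be run as the postgres user on node3. The repmgr.conf path is the one used in the question; the data directory is an assumption (a common default for PGDG packages) and the function name register_standby is made up for this example. If PostgreSQL is managed by a service manager on your system, start it that way instead of with pg_ctl.

```shell
# Path taken from the question.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr/10/repmgr.conf}"
# Assumed data directory; adjust to your installation.
PGDATA="${PGDATA:-/var/lib/pgsql/10/data}"

register_standby() {
    # 1. Start PostgreSQL on the cloned standby and wait until it is up.
    pg_ctl -D "$PGDATA" -w start || return 1
    # 2. Register the running standby in the repmgr metadata.
    repmgr -f "$REPMGR_CONF" standby register || return 1
    # 3. List all nodes with their role, status and upstream.
    repmgr -f "$REPMGR_CONF" cluster show
}
```

Calling register_standby stops at the first failing step, so a failed start is not followed by a registration attempt.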
If the node already exists in the repmgr.nodes table, add
--force
to the register command.
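As a rough illustration, the check and the forced registration could be combined as below. The host node1, the repmgr user and database, and the node name node3 come from the question's output; the function name register_standby_again is hypothetical, and the query assumes the repmgr 4 metadata layout with a node_name column in repmgr.nodes.

```shell
# Values taken from the question; adjust as needed.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr/10/repmgr.conf}"
PRIMARY_HOST="${PRIMARY_HOST:-node1}"
NODE_NAME="${NODE_NAME:-node3}"

register_standby_again() {
    # Ask the primary whether this node already has a metadata record.
    # -A/-t make psql print just the bare count.
    existing=$(psql -h "$PRIMARY_HOST" -U repmgr -d repmgr -Atc \
        "SELECT count(*) FROM repmgr.nodes WHERE node_name = '$NODE_NAME'") || return 1
    if [ "$existing" -gt 0 ]; then
        # Record exists: overwrite it.
        repmgr -f "$REPMGR_CONF" standby register --force
    else
        # No record yet: plain registration is enough.
        repmgr -f "$REPMGR_CONF" standby register
    fi
}
```

The query goes to the primary rather than the standby because the standby's copy of the metadata may be stale while replication is broken.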