PostgreSQL 9.4 – Resolving Timeline Issues


We've had to move the master around a bit. It started on server 01 and was moved to 02 (which was a slave). We need to move it again, so we built 04 and are trying to slave it off 02, but we get the following errors:

2018-02-25 17:00:08 UTC FATAL:  highest timeline 3 of the primary is behind recovery timeline 4
2018-02-25 17:00:13 UTC FATAL:  highest timeline 3 of the primary is behind recovery timeline 4
2018-02-25 17:00:18 UTC FATAL:  highest timeline 3 of the primary is behind recovery timeline 4

The initial base backup was taken like this:

pg_basebackup --verbose --progress -d "host=10.132.x.x user=backup password=...." -D /var/lib/postgresql/9.4/main/ -l 'instance restore' --xlog-method=stream

The recovery file (recovery.conf) looks like this:

restore_command = 'if [ -f /srv/postgresql/archive/${DATASET}/%f ]; then cp /srv/postgresql/archive/${DATASET}/%f %p; else aws s3 cp --quiet s3://company-backups/postgresql/${DATASET}/archive/%f %p; fi'
standby_mode = 'on'
primary_conninfo = 'host=10.132.x.x user=backup password=....'
recovery_target_timeline = 'latest'
trigger_file = '/var/lib/postgresql/9.4/main/failover'

Best Answer

From the sound of it, you are in a split-brain situation: the original master (01) was never stopped from acting as a master, so after 02 was promoted, 01 simply remained a second master.
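To see which timelines are actually in play, one option is to compare what each server reports with what the archive contains. With recovery_target_timeline = 'latest', recovery follows the newest timeline it can find, so a history file for timeline 4 in the archive would be consistent with the error in the question. The sketch below is only an illustration: list_history_timelines is a hypothetical helper name (not a PostgreSQL tool), and the paths in the comments are taken from the question.

```shell
# Print the timeline IDs (decimal) for which a history file exists in a directory.
# $1: directory holding archived WAL (assumed to be a plain local directory)
list_history_timelines() {
    for history_file in "$1"/*.history; do
        [ -f "$history_file" ] || continue
        tli_hex=$(basename "$history_file" .history)
        # History files are named with exactly 8 hex digits, e.g. 00000004.history
        case "$tli_hex" in
            *[!0-9A-Fa-f]*) continue ;;
        esac
        [ "${#tli_hex}" -eq 8 ] || continue
        printf '%d\n' "0x$tli_hex"
    done
}

# Possible usage (not run here):
#   list_history_timelines "/srv/postgresql/archive/${DATASET}"
# On each server, the current timeline can be read with:
#   pg_controldata /var/lib/postgresql/9.4/main | grep TimeLineID
```

If the archive lists a timeline higher than the one 02 reports, that would explain why the new standby asks for a timeline the primary does not have.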

Fixing such issues before 9.5 is not easy (pg_rewind became part of the PostgreSQL ecosystem in that version), so you will most probably need some manual cleanup. What is certain is that any writes 01 received after the promotion of 02 will be lost (or the writes on 02, depending on which server you choose to keep).

I'd start by taking a logical dump from both 01 and 02 (to check whether anything has to be replayed manually from 01 to 02), then stop 01 altogether, remove the older timelines' WAL segments from the archive (you can move them somewhere else just in case), and then try to build a slave from 02 again.
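The "move them somewhere else" step could be sketched roughly as follows. This assumes the archive is a local directory and relies on the fact that archived WAL file names start with the timeline ID as 8 hex digits. quarantine_timeline is a hypothetical helper name, and which timeline to move is something you have to decide from your own setup. Note that the pattern also catches that timeline's .history file, which is my assumption about what you want.

```shell
# Move every archived file belonging to one timeline into a holding directory.
# $1: archive directory, $2: timeline ID (decimal), $3: holding directory
quarantine_timeline() {
    archive_dir=$1
    # WAL segment and history file names begin with the timeline ID in hex
    tli_prefix=$(printf '%08X' "$2") || return 1
    hold_dir=$3
    mkdir -p "$hold_dir" || return 1
    for wal_file in "$archive_dir"/"$tli_prefix"*; do
        [ -f "$wal_file" ] || continue
        mv -- "$wal_file" "$hold_dir"/ || return 1
    done
}

# Possible usage (not run here; the holding path is only an example):
#   quarantine_timeline "/srv/postgresql/archive/${DATASET}" 4 /srv/postgresql/archive-hold
```

The restore_command in the question also falls back to S3, so the same files would presumably need to be moved out of the bucket as well, otherwise the standby can still fetch them from there.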

You can also use pg_xlogdump to see which relations (tables, indexes, etc.) received writes since the split brain started. (Note that from version 10 onwards the utility is named pg_waldump.)
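A rough way to turn the pg_xlogdump output into a per-relation summary is sketched below. It assumes each record description contains a "rel tablespace/database/relfilenode" triple, which is how I remember the 9.4 output looking; check a few lines of real output first. summarize_relations is a hypothetical helper name, and the segment names in the usage comment are placeholders.

```shell
# Read pg_xlogdump output on stdin and count WAL records per relation.
# Output: one line per "rel spc/db/relfilenode" triple, most-written first.
summarize_relations() {
    grep -oE 'rel [0-9]+/[0-9]+/[0-9]+' | sort | uniq -c | sort -rn
}

# Possible usage (not run here; STARTSEG and ENDSEG are placeholders):
#   pg_xlogdump -p /var/lib/postgresql/9.4/main/pg_xlog -t 3 STARTSEG ENDSEG | summarize_relations
```

The third number in each triple is the relfilenode. Connected to the right database, something like SELECT pg_filenode_relation(0, 16397); should map it back to a relation name (0 stands for the database's default tablespace, and 16397 is just an example value).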