PostgreSQL – repmgr: a new master is promoted automatically, but the other standby stops

Tags: clustering, failover, high-availability, postgresql, repmgr

I'm trying to set up PostgreSQL with repmgr, and here is the result:

[screenshot of the result, not included]

The scenario is:

  1. I spun up 1 primary and 2 standbys.
  2. I then stopped the primary, so postgres-2 got promoted.
  3. Unfortunately, postgres-3 got disconnected for some reason. Here is the error log:
    [screenshot of the error log, not included]
    It looks like it was able to connect, but PostgreSQL restarted and didn't come back.
  4. I spun up another standby, but the master it points to is the old one, postgres-1.
    That might be why it says !running and still primary, even though the actual primary is postgres-2.

My question is: how can I keep the other standbys from being disconnected every time a new primary is promoted (automatically, due to a failure, etc.)?

Here is how I generate my repmgr.conf:

NET_IF=`netstat -rn | awk '/^0.0.0.0/ {thif=substr($0,74,10); print thif;} /^default.*UG/ {thif=substr($0,65,10); print thif;}'`
NET_IP=`ifconfig ${NET_IF} | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1'` 

HOSTNAME='postgres-'${my_node}

cat<<EOF > /etc/repmgr.conf
    node_id=${my_node}
    node_name=$HOSTNAME
    conninfo='host=${NET_IP} user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
    data_directory='${PGDATA}'

    log_level=INFO
    log_facility=STDERR
    log_status_interval=300
    
    pg_bindir='/usr/lib/postgresql/10/bin'
    use_replication_slots=1
    
    failover=automatic
    promote_command='repmgr standby promote'
    follow_command='repmgr standby follow -W'
EOF

Also, I'm running this in Docker, extending the official postgres image:

FROM postgres:10

RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main 10" \
          >> /etc/apt/sources.list.d/pgdg.list

# RUN ln -s /home/postgres/repmgr.conf /etc/repmgr.conf

RUN apt-get update && apt-get install wget -y
RUN apt-get install net-tools -y
RUN apt-get update; apt-get install -y git make postgresql-server-dev-10 libpq-dev postgresql-10-repmgr repmgr-common

#RUN wget -c https://repmgr.org/download/repmgr-5.1.tar.gz -O - | tar -xz

RUN touch /etc/repmgr.conf; \
    chown postgres:postgres /etc/repmgr.conf

ENV PRIMARY_NAME=localhost
ENV REPMGR_USER=repmgr
ENV REPMGR_DB=repmgr
ENV REPMGR_PASSWORD=repmgr

COPY postgresql.replication.conf /tmp/postgresql.replication.conf

COPY scripts/*.sh /docker-entrypoint-initdb.d/

Lastly, here is how I check whether a node should be registered as primary or standby:

PGHOST=${PRIMARY_NAME}

installed=$(psql -qAt -h ${PGHOST} repmgr -c "SELECT 1 FROM pg_tables WHERE tablename='nodes'")

if [ "${installed}" != "1" ]; then
    echo "Registering as PRIMARY SERVER"
    repmgr primary register
else
    my_node=$(grep node_id /etc/repmgr.conf | cut -d= -f 2)
    is_reg=$(psql -qAt -h ${PGHOST} repmgr -c "SELECT 1 FROM repmgr.nodes WHERE node_id=${my_node}")

    if [ "${is_reg}" != "1" ] && [ ${my_node} -gt 1 ]; then
        echo "Registering as STANDBY SERVER"

        pg_ctl -D ${PGDATA} stop -m fast
        rm -Rf ${PGDATA}/*
        repmgr -h ${PRIMARY_NAME} -d repmgr standby clone --fast-checkpoint
        pg_ctl -D ${PGDATA} start &
        sleep 1
        repmgr -h ${PRIMARY_NAME} -d repmgr standby register    
    fi
fi
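
One fragile spot in the script above is the fixed `sleep 1` between starting the cloned standby in the background and registering it: if PostgreSQL needs longer than a second to accept connections, the registration step may run too early. A minimal sketch of a retry loop that could replace the sleep is shown below. `wait_until` is a hypothetical helper name, and the 30-attempt limit, host and port in the usage comment are assumptions; `pg_isready` is the standard PostgreSQL client utility.

```shell
#!/bin/sh
# wait_until MAX_TRIES CMD [ARGS...]
# Runs CMD until it succeeds, pausing one second between attempts, and gives
# up after MAX_TRIES failed attempts.
# (Hypothetical helper, not part of repmgr or PostgreSQL.)
wait_until() {
    max_tries=$1
    shift
    tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "${tries}" -ge "${max_tries}" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Intended use inside the registration script (commented out here because it
# needs a running PostgreSQL instance; limit, host and port are assumptions):
#   pg_ctl -D "${PGDATA}" start &
#   wait_until 30 pg_isready -q -h localhost -p 5432 || exit 1
#   repmgr -h "${PRIMARY_NAME}" -d repmgr standby register
```

Whether this timing gap plays any part in the behaviour described here is not established; the loop only removes one possible source of flakiness.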

Here is how I update postgresql.conf:


sed -i "s/#*\(shared_preload_libraries\).*/\1='repmgr'/;" ${PGDATA}/postgresql.conf
sed -i "s/#port = 5432/port = 5432/g" ${PGDATA}/postgresql.conf
sed -i "s/#max_wal_senders/max_wal_senders/g"  ${PGDATA}/postgresql.conf
sed -i "s/#wal_level/wal_level/g"  ${PGDATA}/postgresql.conf
sed -i "s/#max_replication_slots/max_replication_slots/g"  ${PGDATA}/postgresql.conf
sed -i "s/#hot_standby/hot_standby/g"  ${PGDATA}/postgresql.conf

sed -i "s/#archive_mode = off/archive_mode = on/g"  ${PGDATA}/postgresql.conf

echo "archive_command = '/bin/true'" >>  ${PGDATA}/postgresql.conf

I'm using postgres:10 and repmgr 5.0.

Hope someone can help me with this.
Thanks,

Best Answer

I've posted this response on GitHub:

At this point we haven't made any particular provision for repmgr to run in Docker, so it's possible there may be issues of one kind or another.

You wrote:

    I also tried adding this

    # service_start_command='pg_ctl -D ${PGDATA} start'
    # service_stop_command='pg_ctl -D ${PGDATA} stop -m fast'
    # service_reload_command='pg_ctl -D ${PGDATA} reload'
    #service_restart_command='pg_ctl -D ${PGDATA} restart -m fast'

    but same result.

Did you try adding these items without the leading #? I.e.

service_start_command='pg_ctl -D ${PGDATA} start'
service_stop_command='pg_ctl -D ${PGDATA} stop -m fast'
service_reload_command='pg_ctl -D ${PGDATA} reload'
service_restart_command='pg_ctl -D ${PGDATA} restart -m fast'

By default, when restarting a node for a standby follow operation, repmgr will stop and then start the server using pg_ctl, as pg_ctl restart has proven to be problematic in some environments. However, the opposite might be the case here. Either way, we strongly recommend using the OS-level service commands where available to avoid issues like this (not sure whether those would be available here).
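
Since the question writes repmgr.conf from an unquoted heredoc, the uncommented settings above could be added the same way. The sketch below is only an illustration under stated assumptions: the `CONF` temp file and the `PGDATA` default path are placeholders, not values from the original setup. Because the heredoc delimiter is unquoted, the shell substitutes `${PGDATA}` when the file is generated, so the resulting config contains the literal data directory path.

```shell
#!/bin/sh
# Sketch: append uncommented service_*_command settings to a generated config.
# PGDATA default and CONF location are illustrative assumptions.
PGDATA="${PGDATA:-/var/lib/postgresql/data}"
CONF="${REPMGR_CONF:-$(mktemp)}"

# Unquoted EOF: ${PGDATA} is expanded by the shell right now, so the file
# ends up with the real path rather than a variable reference.
cat <<EOF >> "${CONF}"
service_start_command='pg_ctl -D ${PGDATA} start'
service_stop_command='pg_ctl -D ${PGDATA} stop -m fast'
service_reload_command='pg_ctl -D ${PGDATA} reload'
service_restart_command='pg_ctl -D ${PGDATA} restart -m fast'
EOF

# Print the active (uncommented) service settings for a quick visual check.
grep '^service_' "${CONF}"
```

Checking the generated file with `grep '^service_'` confirms that none of the four lines still carries a leading `#`.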