I'm trying to set up PostgreSQL with repmgr, and here is the result.
The scenario is:
1. I spun up one primary and two standbys.
2. I then stopped the primary, so postgres-2 got promoted.
3. Unfortunately, postgres-3 got disconnected for some reason; here is the error log. It looks like it was able to connect, but PostgreSQL restarted and didn't come back.
4. I spun up another standby, but the primary it pointed to was the old one, postgres-1. That might be why it shows !running and still primary, even though the actual primary is postgres-2.
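repmgr cluster show gives repmgr's view of the cluster; to see which upstream each standby's WAL receiver is actually connected to (the situation in step 4), a minimal sketch that queries pg_stat_wal_receiver on each host might look like this. The show_upstreams helper, the host names and the PSQL override (used only to swap in a stub) are hypothetical, and the conninfo column may be hidden unless the connecting role is a superuser.

```bash
#!/usr/bin/env bash
# Sketch: report which upstream each host's WAL receiver is streaming from.
# Set PSQL to override the client command (e.g. a stub for testing).
show_upstreams() {
  local psql_cmd="${PSQL:-psql}" host info
  for host in "$@"; do
    # pg_stat_wal_receiver has one row on a streaming standby, none on a primary.
    info=$("$psql_cmd" -qAt -h "$host" repmgr \
      -c "SELECT status || ' ' || conninfo FROM pg_stat_wal_receiver" 2>/dev/null) \
      || { echo "${host}: unreachable"; continue; }
    echo "${host}: ${info:-no WAL receiver (primary or not streaming)}"
  done
}

# Example (host names as in the question):
# show_upstreams postgres-1 postgres-2 postgres-3
```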
My question: how can I keep the other standbys from disconnecting every time a new primary is promoted (automatically, due to a failure, etc.)?
Here is how I generate my repmgr.conf:
NET_IF=`netstat -rn | awk '/^0.0.0.0/ {thif=substr($0,74,10); print thif;} /^default.*UG/ {thif=substr($0,65,10); print thif;}'`
NET_IP=`ifconfig ${NET_IF} | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1'`
HOSTNAME='postgres-'${my_node}
cat<<EOF > /etc/repmgr.conf
node_id=${my_node}
node_name=$HOSTNAME
conninfo='host=${NET_IP} user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='${PGDATA}'
log_level=INFO
log_facility=STDERR
log_status_interval=300
pg_bindir='/usr/lib/postgresql/10/bin'
use_replication_slots=1
failover=automatic
promote_command='repmgr standby promote'
follow_command='repmgr standby follow -W'
EOF
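For comparison, a minimal sketch of the same generator as a reusable function. Two things differ from the config above, and both are assumptions rather than a confirmed fix: the config path is passed explicitly with -f, and follow_command adds --upstream-node-id=%n, which repmgrd replaces with the node ID of the newly promoted primary so the standby is told explicitly which node to follow. Note that failover=automatic only takes effect if repmgrd is running on each node (e.g. repmgrd -f /etc/repmgr.conf), which the scripts here don't show. write_repmgr_conf and its argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: render repmgr.conf for one node (hypothetical helper).
# Args: node id, node IP, data directory, output path (default /etc/repmgr.conf).
write_repmgr_conf() {
  local node_id="$1" node_ip="$2" data_dir="$3" conf="${4:-/etc/repmgr.conf}"
  cat > "$conf" <<EOF
node_id=${node_id}
node_name='postgres-${node_id}'
conninfo='host=${node_ip} user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='${data_dir}'
log_level=INFO
log_facility=STDERR
log_status_interval=300
pg_bindir='/usr/lib/postgresql/10/bin'
use_replication_slots=1
failover=automatic
# repmgrd runs these commands itself, so pass the config file explicitly.
promote_command='repmgr standby promote -f ${conf}'
# %n is substituted by repmgrd with the node ID of the new primary.
follow_command='repmgr standby follow -f ${conf} -W --upstream-node-id=%n'
EOF
}

# Example:
# write_repmgr_conf "${my_node}" "${NET_IP}" "${PGDATA}" /etc/repmgr.conf
```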
Also, I'm running this in Docker, extending the official postgres Docker image:
FROM postgres:10
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main 10" \
>> /etc/apt/sources.list.d/pgdg.list
# RUN ln -s /home/postgres/repmgr.conf /etc/repmgr.conf
RUN apt-get update && apt-get install -y wget net-tools git make postgresql-server-dev-10 libpq-dev postgresql-10-repmgr repmgr-common
#RUN wget -c https://repmgr.org/download/repmgr-5.1.tar.gz -O - | tar -xz
RUN touch /etc/repmgr.conf; \
chown postgres:postgres /etc/repmgr.conf
ENV PRIMARY_NAME=localhost
ENV REPMGR_USER=repmgr
ENV REPMGR_DB=repmgr
ENV REPMGR_PASSWORD=repmgr
COPY postgresql.replication.conf /tmp/postgresql.replication.conf
COPY scripts/*.sh /docker-entrypoint-initdb.d/
Lastly, here is how I check whether a node should be registered as primary or standby:
PGHOST=${PRIMARY_NAME}
installed=$(psql -qAt -h ${PGHOST} repmgr -c "SELECT 1 FROM pg_tables WHERE tablename='nodes'")
if [ "${installed}" != "1" ]; then
echo "Registering as PRIMARY SERVER"
repmgr primary register
else
my_node=$(grep node_id /etc/repmgr.conf | cut -d= -f 2)
is_reg=$(psql -qAt -h ${PGHOST} repmgr -c "SELECT 1 FROM repmgr.nodes WHERE node_id=${my_node}")
if [ "${is_reg}" != "1" ] && [ ${my_node} -gt 1 ]; then
echo "Registering as STANDBY SERVER"
pg_ctl -D ${PGDATA} stop -m fast
rm -Rf ${PGDATA}/*
repmgr -h ${PRIMARY_NAME} -d repmgr standby clone --fast-checkpoint
pg_ctl -D ${PGDATA} start &
sleep 1
repmgr -h ${PRIMARY_NAME} -d repmgr standby register
fi
fi
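Because PRIMARY_NAME is fixed, a standby started after a failover may still clone from and register against the old primary (step 4 in the scenario). One possible workaround, sketched here assuming every node is reachable by hostname with the same credentials, is to probe the candidate hosts and pick the one where pg_is_in_recovery() returns false. find_primary and the PSQL override are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: print the first host that answers and is not in recovery (a primary).
# Set PSQL to override the client command (e.g. a stub for testing).
find_primary() {
  local psql_cmd="${PSQL:-psql}" host in_recovery
  for host in "$@"; do
    # -qAt prints just "t" (standby) or "f" (primary); skip unreachable hosts.
    in_recovery=$("$psql_cmd" -qAt -h "$host" repmgr \
      -c "SELECT pg_is_in_recovery()" 2>/dev/null) || continue
    if [ "$in_recovery" = "f" ]; then
      echo "$host"
      return 0
    fi
  done
  return 1   # no primary found among the given hosts
}

# Example (host names as in the question):
# PRIMARY_NAME=$(find_primary postgres-1 postgres-2 postgres-3) || exit 1
```

If a failed primary comes back up without being fenced, two hosts can both report f; in that case repmgr's own metadata (repmgr cluster show) is the better tie-breaker.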
Here are my changes to postgresql.conf:
sed -i "s/#*\(shared_preload_libraries\).*/\1='repmgr'/;" ${PGDATA}/postgresql.conf
sed -i "s/#port = 5432/port = 5432/g" ${PGDATA}/postgresql.conf
sed -i "s/#max_wal_senders/max_wal_senders/g" ${PGDATA}/postgresql.conf
sed -i "s/#wal_level/wal_level/g" ${PGDATA}/postgresql.conf
sed -i "s/#max_replication_slots/max_replication_slots/g" ${PGDATA}/postgresql.conf
sed -i "s/#hot_standby/hot_standby/g" ${PGDATA}/postgresql.conf
sed -i "s/#archive_mode = off/archive_mode = on/g" ${PGDATA}/postgresql.conf
echo "archive_command = '/bin/true'" >> ${PGDATA}/postgresql.conf
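The sed lines above only uncomment the defaults that ship in postgresql.conf, and the archive_command line would be appended again if the script were ever re-run. A minimal sketch of an idempotent setter, assuming GNU sed (as in the Debian-based postgres image); set_pg_conf is a hypothetical helper, and the value is written verbatim, so string values need their own single quotes.

```bash
#!/usr/bin/env bash
# Sketch: set "key = value" in postgresql.conf idempotently (GNU sed assumed).
# Any existing line for the key, commented out or not, is rewritten in place;
# otherwise the setting is appended. Avoid '|', '&' and '\' in the value,
# since they are special in the sed replacement used here.
set_pg_conf() {
  local conf="$1" key="$2" value="$3"
  local pattern="^[[:space:]]*#?[[:space:]]*${key}[[:space:]]*="
  if grep -Eq "$pattern" "$conf"; then
    # Rewrite every matching line so no later duplicate overrides the new value.
    sed -i -E "s|${pattern}.*|${key} = ${value}|" "$conf"
  else
    echo "${key} = ${value}" >> "$conf"
  fi
}

# Example usage mirroring the edits above:
# set_pg_conf "${PGDATA}/postgresql.conf" shared_preload_libraries "'repmgr'"
# set_pg_conf "${PGDATA}/postgresql.conf" archive_mode on
# set_pg_conf "${PGDATA}/postgresql.conf" archive_command "'/bin/true'"
```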
I'm using postgres:10 and repmgr 5.0.
I hope someone can help me with this.
Thanks,
Best Answer
I've posted this response on GitHub:
At this point we haven't made any particular provision for repmgr to run in Docker, so it's possible there may be issues of one kind or another.

Did you try adding these items without the leading #? I.e.

By default, when restarting a node for a standby follow operation, repmgr will stop then start the server using pg_ctl, as pg_ctl restart has proven to be problematic in some environments. However, the opposite might be the case here. Either way, we strongly recommend using the OS-level service commands where available to avoid issues like this (not sure if those would be available here).