Current PostgreSQL releases (true at least up to and including PostgreSQL 9.4) have single-threaded WAL recovery.
This means that replay of the write-ahead log happens in a single recovery process, which therefore benefits less from I/O concurrency than a normally running master. This can result in WAL replay lagging behind on a replica with hardware similar to the master's, even though you'd expect the master (which is generally also under more load) to be the slower server.
(It'd be great to improve this, but nobody has wanted it enough to do the work yet, and nobody seems keen to fund it at the moment.)
That said, your case of very high CPU use doesn't seem to fit with issues of low I/O concurrency. In your situation I'd be attaching `gdb` to see what the replay process was doing, or using `perf top` to examine what the system as a whole is up to, then digging deeper with `perf` once I had some clues. I'd also be looking closely at `iotop`, `vmstat`, `iostat`, the PostgreSQL logs, `dmesg`, etc.
After investigation
Profiles revealed that most time was being spent in `DropRelFileNodeBuffers`.
That does a linear scan through `shared_buffers` whenever a relfilenode is deleted - caused by `truncate`, `drop table`, `cluster`, `drop index`, etc. This must be done during WAL replay, as well as on the main node.
So this suggests that:
- Your `shared_buffers` is probably very big; and
- You're probably doing lots of operations that delete relfilenodes.

Reducing `shared_buffers` on the replica may well help.
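To get a feel for why a big `shared_buffers` makes this expensive, here's a rough back-of-the-envelope sketch. The 8 kB block size is the PostgreSQL default; the pool size and drop rate are made-up numbers, not measurements from any real system:

```python
# Rough cost model for DropRelFileNodeBuffers: it walks every buffer header
# in shared_buffers once per dropped relfilenode.

BUFFER_SIZE = 8 * 1024                  # default PostgreSQL block size (8 kB)

shared_buffers_bytes = 64 * 1024**3     # hypothetical 64 GB shared_buffers
drops_per_second = 50                   # hypothetical rate of truncate/drop during replay

buffers = shared_buffers_bytes // BUFFER_SIZE
print(f"buffer headers scanned per drop:   {buffers:,}")                      # ~8.4 million
print(f"buffer headers scanned per second: {buffers * drops_per_second:,}")
```

With numbers like these the single replay process spends most of its time walking buffer headers, which matches the profile above.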
The problem we are facing is that whenever there is a long-running query (about 30-40 mins) on the Read Replica, it doesn't complete, failing with a conflict error.
This behavior is controlled by the parameters `max_standby_streaming_delay` / `max_standby_archive_delay`. You can fiddle with these parameters in the RDS Parameter Group used by your Read Replica instance to allow more time for queries against your Read Replica to complete.
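If you'd rather script that change than click through the console, a minimal boto3 sketch might look like the following. The parameter group name and the one-hour value are placeholders, not recommendations:

```python
import boto3

rds = boto3.client("rds")

# Allow standby queries up to one hour before replay is allowed to cancel them.
# 3600000 ms = 1 hour; -1 would disable cancellation entirely, at the cost of
# unbounded replication lag.
rds.modify_db_parameter_group(
    DBParameterGroupName="my-replica-params",   # hypothetical parameter group name
    Parameters=[
        {
            "ParameterName": "max_standby_streaming_delay",
            "ParameterValue": "3600000",
            "ApplyMethod": "immediate",
        },
        {
            "ParameterName": "max_standby_archive_delay",
            "ParameterValue": "3600000",
            "ApplyMethod": "immediate",
        },
    ],
)
```

Bear in mind that raising these delays trades query cancellations for replication lag: replay pauses while the conflicting query runs, so the replica falls further behind the master.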
For this very reason, and also due to a new requirement, we need to make this replication process daily, i.e. instead of continuous updates we need a trigger-based approach (something like a daily backup approach).
If you'd like a snapshot of your primary database refreshed nightly, you could do this with a cron job restoring RDS snapshots every night. I don't think RDS has a button to do this automatically for you, but it shouldn't be too hard to script up a nightly create-db-snapshot + restore-db-instance-from-db-snapshot using the AWS CLI, or boto, or whatever interface to AWS you like. You could even maintain a Route53 entry which would always point to the most-recent instance, and leave the old instances lingering for a day or so before being killed off, so that sessions running against existing instances overnight wouldn't be interrupted.
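As a rough illustration, a nightly job along those lines using boto3 might look like the sketch below. Every identifier is a placeholder, and deleting yesterday's instance plus error handling are left out:

```python
"""Nightly refresh of a reporting copy from RDS snapshots (sketch only)."""
import datetime

import boto3

rds = boto3.client("rds")
r53 = boto3.client("route53")

today = datetime.date.today().strftime("%Y-%m-%d")
snapshot_id = f"nightly-{today}"
new_instance_id = f"reporting-{today}"

# 1. Snapshot the primary.
rds.create_db_snapshot(
    DBInstanceIdentifier="my-primary",          # hypothetical primary instance
    DBSnapshotIdentifier=snapshot_id,
)
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)

# 2. Restore the snapshot into a fresh instance.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier=new_instance_id,
    DBSnapshotIdentifier=snapshot_id,
)
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=new_instance_id)

# 3. Point a stable DNS name at the new instance; old instances keep serving
#    existing sessions until you delete them a day or so later.
endpoint = rds.describe_db_instances(DBInstanceIdentifier=new_instance_id)[
    "DBInstances"][0]["Endpoint"]["Address"]
r53.change_resource_record_sets(
    HostedZoneId="Z0000000000000",              # hypothetical hosted zone
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "reporting.example.internal.",
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [{"Value": endpoint}],
            },
        }]
    },
)
```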
I saw in the AWS console that there are various parameters we can set. Is there some parameter I can set to achieve my requirement? Or is the only way out for me to delete the read replica instance and use a trigger-based replication tool like Bucardo on an EC2 instance?
Supposedly it is possible to hook up Bucardo to RDS now that RDS Postgres supports `session_replication_role`, but if you want a nightly snapshot I think you'll be much better off using RDS instance snapshots.
Best Answer
Each standby names a single replication slot via `primary_slot_name` in its `recovery.conf`, and a slot can feed only one standby at a time, so for four standbys you will need four slots, each with its own streaming replication connection.
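As an illustrative sketch (slot names and the connection string are placeholders), the four slots could be created on the primary with a few lines of psycopg2, and each standby's `recovery.conf` then points at its own slot:

```python
import psycopg2

# Create one physical replication slot per standby on the primary (PostgreSQL 9.4+).
# Each standby's recovery.conf then references its own slot, e.g.:
#
#   standby_mode = 'on'
#   primary_conninfo = 'host=primary user=replicator ...'
#   primary_slot_name = 'standby1_slot'
conn = psycopg2.connect("host=primary dbname=postgres user=postgres")  # placeholder DSN
conn.autocommit = True
with conn.cursor() as cur:
    for name in ("standby1_slot", "standby2_slot", "standby3_slot", "standby4_slot"):
        cur.execute("SELECT pg_create_physical_replication_slot(%s)", (name,))
conn.close()
```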