Current PostgreSQL releases (true at least up to and including PostgreSQL 9.4) have single-threaded WAL recovery.
This means that replay of the write-ahead log happens in a single recovery process, which therefore benefits less from I/O concurrency than a normally running master. This can result in WAL replay lagging behind on a replica with hardware similar to the master's, even though you'd expect the master (which is generally also under more load) to be the slower server.
(It'd be great to improve this, but nobody has wanted it enough to do the work yet, and nobody seems keen to fund it at the moment.)
That said, your case of very high CPU use doesn't seem to fit with issues of low I/O concurrency. In your situation I'd be attaching `gdb` to see what the replay process was doing, or using `perf top` to examine what the system as a whole is up to, then digging deeper with `perf` once I had some clues. I'd also be looking closely at `iotop`, `vmstat`, `iostat`, the PostgreSQL logs, `dmesg`, etc.
After investigation
Profiles revealed that most time was being spent in `DropRelFileNodeBuffers`.
That does a linear scan through `shared_buffers` whenever a relfilenode is deleted - caused by `truncate`, `drop table`, `cluster`, `drop index`, etc. This must be done during WAL replay, as well as on the main node.
So this suggests that:
- Your `shared_buffers` is probably very big; and
- You're probably doing lots of operations that delete relfilenodes.

Reducing `shared_buffers` on the replica may well help.
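To get a feel for why a big `shared_buffers` makes this expensive, here's a rough back-of-the-envelope sketch. The 8 kB block size is the PostgreSQL default; the pool size and drop rate are made-up numbers, not measurements from any real system:

```python
# Rough cost model for DropRelFileNodeBuffers: it walks every buffer header
# in shared_buffers once per dropped relfilenode.

BUFFER_SIZE = 8 * 1024                  # default PostgreSQL block size (8 kB)

shared_buffers_bytes = 64 * 1024**3     # hypothetical 64 GB shared_buffers
drops_per_second = 50                   # hypothetical rate of truncate/drop during replay

buffers = shared_buffers_bytes // BUFFER_SIZE
print(f"buffer headers scanned per drop:   {buffers:,}")                      # ~8.4 million
print(f"buffer headers scanned per second: {buffers * drops_per_second:,}")
```

With numbers like these the single replay process spends most of its time walking buffer headers, which matches the profile above.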
The problem we are facing is that whenever there is a long-running query (about 30-40 mins) on the Read Replica, it doesn't complete, failing with a conflict error.
This behavior is controlled by the parameters `max_standby_streaming_delay` / `max_standby_archive_delay`. You can fiddle with these parameters in the RDS Parameter Group used by your Read Replica instance to allow more time for queries against your Read Replica to complete.
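If you'd rather script that change than click through the console, a minimal boto3 sketch might look like the following. The parameter group name and the one-hour value are placeholders, not recommendations:

```python
import boto3

rds = boto3.client("rds")

# Allow standby queries up to one hour before replay is allowed to cancel them.
# 3600000 ms = 1 hour; -1 would disable cancellation entirely, at the cost of
# unbounded replication lag.
rds.modify_db_parameter_group(
    DBParameterGroupName="my-replica-params",   # hypothetical parameter group name
    Parameters=[
        {
            "ParameterName": "max_standby_streaming_delay",
            "ParameterValue": "3600000",
            "ApplyMethod": "immediate",
        },
        {
            "ParameterName": "max_standby_archive_delay",
            "ParameterValue": "3600000",
            "ApplyMethod": "immediate",
        },
    ],
)
```

Bear in mind that raising these delays trades query cancellations for replication lag: replay pauses while the conflicting query runs, so the replica falls further behind the master.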
For this very reason, and also due to a new requirement, we need to make this replication process daily, i.e. instead of continuous updates we need a trigger-based approach (something like a daily backup approach).
If you'd like a snapshot of your primary database refreshed nightly, you could do this with a cron job restoring RDS snapshots every night. I don't think RDS has a button to do this automatically for you, but it shouldn't be too hard to script up a nightly create-db-snapshot + restore-db-instance-from-db-snapshot using the AWS CLI, or boto, or whatever interface to AWS you like. You could even maintain a Route53 entry which would always point to the most-recent instance, and leave the old instances lingering for a day or so before being killed off, so that sessions running against existing instances overnight wouldn't be interrupted.
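As a rough illustration, a nightly job along those lines using boto3 might look like the sketch below. Every identifier is a placeholder, and deleting yesterday's instance plus error handling are left out:

```python
"""Nightly refresh of a reporting copy from RDS snapshots (sketch only)."""
import datetime

import boto3

rds = boto3.client("rds")
r53 = boto3.client("route53")

today = datetime.date.today().strftime("%Y-%m-%d")
snapshot_id = f"nightly-{today}"
new_instance_id = f"reporting-{today}"

# 1. Snapshot the primary.
rds.create_db_snapshot(
    DBInstanceIdentifier="my-primary",          # hypothetical primary instance
    DBSnapshotIdentifier=snapshot_id,
)
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)

# 2. Restore the snapshot into a fresh instance.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier=new_instance_id,
    DBSnapshotIdentifier=snapshot_id,
)
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=new_instance_id)

# 3. Point a stable DNS name at the new instance; old instances keep serving
#    existing sessions until you delete them a day or so later.
endpoint = rds.describe_db_instances(DBInstanceIdentifier=new_instance_id)[
    "DBInstances"][0]["Endpoint"]["Address"]
r53.change_resource_record_sets(
    HostedZoneId="Z0000000000000",              # hypothetical hosted zone
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "reporting.example.internal.",
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [{"Value": endpoint}],
            },
        }]
    },
)
```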
I saw in the AWS console that there are various parameters we can set. Is there some parameter I can set to achieve my requirement? Or is the only way out for me to delete the read replica instance and use a trigger-based replication tool like Bucardo on an EC2 instance?
Supposedly it is possible to hook up Bucardo to RDS now that RDS Postgres supports `session_replication_role`, but if you want a nightly snapshot I think you'll be much better off using RDS instance snapshots.
Best Answer
Each standby names a single replication slot via `primary_slot_name` in its `recovery.conf`, and a slot can feed only one standby at a time, so for four standbys you will need four slots, each with its own streaming replication connection.
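As an illustrative sketch (slot names and the connection string are placeholders), the four slots could be created on the primary with a few lines of psycopg2, and each standby's `recovery.conf` then points at its own slot:

```python
import psycopg2

# Create one physical replication slot per standby on the primary (PostgreSQL 9.4+).
# Each standby's recovery.conf then references its own slot, e.g.:
#
#   standby_mode = 'on'
#   primary_conninfo = 'host=primary user=replicator ...'
#   primary_slot_name = 'standby1_slot'
conn = psycopg2.connect("host=primary dbname=postgres user=postgres")  # placeholder DSN
conn.autocommit = True
with conn.cursor() as cur:
    for name in ("standby1_slot", "standby2_slot", "standby3_slot", "standby4_slot"):
        cur.execute("SELECT pg_create_physical_replication_slot(%s)", (name,))
conn.close()
```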