Postgresql – Replication Lag on Postgres AWS RDS Service

awspostgresqlpostgresql-9.3replication

We have a single master/single streaming replica Postgres 9.3 db on AWS. The load is not terribly high – this is a development/staging environment. (The production shows similar metrics). Point is the "ReplicaLag" shown in Cloudwatch oscillates wildly during the day between 0 and 200 seconds. I've changed max_wal_senders from 5 to 10 with no change.

Any suggestions for diagnosing this?

Replica Lag

(This is a t2.small master and t2.small replica, however the production instance is large, and exhibits the same issue. CPU is < 2%, connections < 60, iops only about 5-15 count/second).

Best Answer

I had the same issue with our staging RDS. After quite a while I found a blog post that says:

In a very busy database, with many writes per second, this number will remain fairly accurate. However, in a system where there are few writes, the "replication_delay" will continually grow because the last replayed transaction timestamp isn't increasing.

To test this I wrote a simply script which updates a random record every 0.5s and the lag went to 0.

From the AWS docs under Read Replica Limitations with PostgreSQL -

A PostgreSQL Read Replica reports a replication lag of up to five minutes if there are no user transactions occurring on the source DB instance