PostgreSQL – How to prevent read replica restarts during high replication lag

aws-aurora, postgresql, replication

We're running an Aurora PostgreSQL cluster with a master node and a single read replica.

Periodically there is very heavy write load, which causes high replication lag. This can cause the read replica to restart, which is undesirable for us in a high-availability environment. When this happens, clients connected to the cluster via the read-only endpoint get this JDBC error: org.postgresql.util.PSQLException: FATAL: the database system is starting up. Additionally, the AWS console shows messages like these peppered throughout the logs:

Read replica has fallen behind the master too much. Restarting postgres.

followed by

DB instance restarted

We can tolerate the read replica being behind by several minutes, but we can't tolerate the read replica restarting to catch up.

Is there a way to prevent the read replica from restarting during these periods?

Alternatively, are there any recommended tweaks for reducing replication lag during periods of heavy write load?

Best Answer

I think this is what they would call "working as designed". It is stated in the documentation for Aurora MySQL:

The tradeoff with having multiple Aurora Replicas is that replicas become unavailable for brief periods when the underlying database instances are restarted. These restarts can happen during maintenance operations, or when a replica begins to lag too far behind the master. Restarting a replica interrupts existing connections to the corresponding database instance.

I expect the same holds true for Aurora PostgreSQL, because the replication implementation is likely very similar, if not identical.

I believe the reason for this behaviour lies in the way changes are propagated from the writer instance to the read replicas: redo records are sent, and the replicas are expected to apply them to the appropriate locally cached pages in the right sequence -- see slide 27 of this presentation.

While the cache update processing should be fast, it can still overwhelm the replica, particularly if it's running on a less capable AWS instance class. Once the replica loses track of which cached pages are stale and which aren't, it has no choice but to start from scratch. I presume the original designers elected to reuse the existing instance startup process for that, rather than developing a separate cache invalidation and reload mechanism, especially given that the restart completes relatively quickly.

In other words, the answer to "is there a way to prevent the read replica from restarting?" seems to be "no".

As to "are there any recommended tweaks for reducing replication lag?": since the lag depends entirely on the replica's CPU and memory capacity (no I/O is involved), try scaling up your replica, either permanently or at least during these heavy write periods.
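
If you want to script that resize ahead of a known heavy write window, something along these lines should work with the AWS SDK for Java v2. This is only a sketch: the instance identifier and target instance class are placeholders, default credential and region resolution are assumed, and the resize itself takes the replica offline briefly.

    import software.amazon.awssdk.services.rds.RdsClient;
    import software.amazon.awssdk.services.rds.model.ModifyDbInstanceRequest;

    public class ScaleUpReplica {
        public static void main(String[] args) {
            try (RdsClient rds = RdsClient.create()) {
                // Request a larger instance class for the read replica ahead of a
                // heavy write window. applyImmediately(true) skips the maintenance
                // window; note the resize itself takes the replica offline briefly.
                rds.modifyDBInstance(ModifyDbInstanceRequest.builder()
                        .dbInstanceIdentifier("my-aurora-replica") // placeholder name
                        .dbInstanceClass("db.r5.2xlarge")          // placeholder target size
                        .applyImmediately(true)
                        .build());
            }
        }
    }

Running the same call with the smaller instance class afterwards would scale the replica back down once the heavy write period is over.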

If you have control over the client application code, you could modify it to handle broken connections more gracefully and retry when needed. If you don't, you could try setting up a proxy (e.g. pgpool) between your clients and the read replica; it might relieve the pain to some extent by proactively testing and reëstablishing connections.
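
For the retry option, here is a minimal JDBC sketch of what that could look like; the endpoint, credentials, attempt count and backoff are made-up illustrations. PostgreSQL signals "the database system is starting up" with SQLSTATE 57P03, and a connection dropped by the restart typically surfaces as an 08xxx connection error, so only those states are retried.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class ReadReplicaConnector {

        // Hypothetical read-only endpoint and credentials, for illustration only.
        private static final String READ_ONLY_URL =
                "jdbc:postgresql://my-cluster.cluster-ro-example.us-east-1.rds.amazonaws.com:5432/mydb";

        static Connection connectWithRetry(int maxAttempts, long backoffMillis)
                throws SQLException, InterruptedException {
            SQLException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return DriverManager.getConnection(READ_ONLY_URL, "app_user", "app_password");
                } catch (SQLException e) {
                    String state = e.getSQLState();
                    // 57P03 = cannot_connect_now ("the database system is starting up"),
                    // 08xxx = connection exceptions after the restart dropped the socket.
                    if (state != null && (state.equals("57P03") || state.startsWith("08"))) {
                        last = e;
                        Thread.sleep(backoffMillis * attempt); // simple linear backoff
                    } else {
                        throw e; // some other error: don't retry blindly
                    }
                }
            }
            throw last;
        }
    }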

If you find neither of these suggestions workable, you can also contact AWS Support and see if they have any better ideas.