Galera (as MySQL with the Galera plugin, Percona XtraDB Cluster, or MariaDB Galera Cluster; they are essentially the same technology from different vendors, built on different base MySQL versions) can work perfectly with 2 machines, and it is a very common setup for replacing standard MySQL replication.
Three nodes is not a requirement of Galera itself, but of any cluster that prioritizes data consistency over availability. In other words, a third node is needed to preserve availability and data integrity when a single node suffers a network partition (which would otherwise cause a "split brain" in the cluster). We need an odd number of nodes, and 1 node is not a cluster :-). You probably knew that, but I wanted to clarify this for anyone else reading this answer.
Given that the only reason to have 3 nodes is availability in case of a node or network failure, the recommended way to set up a 2-node cluster with split-brain protection is to add a Galera Arbitrator (garbd), which is essentially an emulation of a Galera node that does not store any local data. You can install it anywhere, although you must be careful: it may still hurt the performance of the cluster if it has network latency problems.
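A minimal garbd configuration sketch; the cluster name and hostnames below are placeholders, and the value of group must match wsrep_cluster_name on the two data nodes:

```ini
# garbd configuration sketch (e.g. /etc/default/garbd)
# "my_cluster", node1 and node2 are placeholders -- adjust to your setup.
group   = my_cluster
address = gcomm://node1:4567,node2:4567
log     = /var/log/garbd.log
```

With the arbitrator running on a third machine (even a tiny one), a failure of either data node still leaves 2 of 3 votes, so the surviving node keeps quorum and stays writable.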
Running several instances on the same physical node is useless (and Docker is not needed for that, by the way; you can do it just by changing the local configuration). If you absolutely can only run things on 2 machines, you are better off disabling the automatic shutdown on one of the nodes and managing failover manually. The 3-node requirement exists only for data consistency; nothing physical or in the protocol demands a particular number of nodes.
You can read more information about it on the official documentation.
You should put everything on a level playing field. How?
Without proper tuning, it is possible for older versions of MySQL to outrun and outgun newer versions.
Before running SysBench on the three environments:
- Make sure all InnoDB settings are identical for all DB Servers
- For the Master/Slave, run STOP SLAVE; on the Slave
- For PXC (Percona XtraDB Cluster), shut down two of the three Masters
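To verify the playing field really is level, you can dump the InnoDB settings on each server and diff the output between machines; a sketch:

```sql
-- Run this on each of the three servers, save the output,
-- and diff the files; any difference skews the benchmark.
SHOW GLOBAL VARIABLES LIKE 'innodb%';

-- Also worth comparing, since it affects write throughput:
SHOW GLOBAL VARIABLES LIKE 'sync_binlog';
```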
Compare the speeds of just standalone MySQL, Percona, and MariaDB.
ANALYSIS
If MySQL is best (Percona people, please don't throw rotten vegetables at me just yet; this is just conjecture), run START SLAVE; and run SysBench on the Master/Slave. If the performance is significantly slower, you may have to implement semisynchronous replication.
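If you do end up needing semisynchronous replication, the setup is roughly as follows (plugin names are from the stock MySQL 5.5+ distribution; the timeout value is only an illustrative starting point):

```sql
-- On the Master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- ms to wait before falling back to async

-- On the Slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;  -- restart the IO thread so it registers as semisync
```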
If PXC is best, you may need to tune the wsrep settings or the network itself.
If MariaDB is best, you could switch to MariaDB Galera Cluster (if you have the money) or set up Master/Slave with MariaDB. Run SysBench. If the performance is significantly slower, you may need to tune the wsrep settings or the network itself.
Why tune wsrep settings? Keep in mind that Galera wsrep (WriteSet Replication) uses virtually synchronous commits and rollbacks. In other words, either all nodes commit or all nodes roll back. In this instance, the weakest link would have to be
- how fast the communication between Nodes happens (especially true if the Nodes are in different data centers)
- if any one node has underconfigured hardware settings
- if any one node communicates more slowly than the others
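As a starting point for wsrep tuning, these are the knobs most often adjusted; the values below are illustrative assumptions, not recommendations:

```sql
-- Relax flow control so a briefly lagging node does not throttle
-- the whole cluster (defaults are conservative; values are examples).
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit = 256; gcs.fc_factor = 0.99';

-- More parallel applier threads help when replicated writesets
-- touch independent rows (example value for a multi-core box).
SET GLOBAL wsrep_slave_threads = 8;
```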
Side Note: You should also make sure to tune MySQL for multiple CPUs.
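For the multi-CPU side note, a my.cnf sketch; the values assume roughly a 16-core machine and should be adjusted to your hardware:

```ini
[mysqld]
# Assuming ~16 cores; scale to your hardware.
innodb_read_io_threads    = 8
innodb_write_io_threads   = 8
innodb_thread_concurrency = 0   # 0 = let InnoDB self-govern (sensible on 5.5+)
```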
UPDATE 2014-11-04 21:06 EST
Please keep in mind that Percona XtraDB Cluster does not scale writes very well to begin with. Note what the documentation says under its drawbacks (second drawback):
This can’t be used as an effective write scaling solution. There might be some improvements in write throughput when you run write traffic to 2 nodes vs all traffic to 1 node, but you can’t expect a lot. All writes still have to go on all nodes.
SUGGESTION #1
For PXC, turn off one node. Run SysBench against the two-node cluster. If the write performance is better than with a three-node cluster, then it is obvious that the communication between the nodes is the bottleneck.
SUGGESTION #2
I noticed you have a 42GB Buffer Pool, which is more than half the server's RAM. You need to partition the buffer pool by setting innodb_buffer_pool_instances to 2 or more. Otherwise, you can expect some swapping.
SUGGESTION #3
Your innodb_log_buffer_size is 8M by default. Try making it 256M to increase log write performance.
SUGGESTION #4
Your innodb_log_file_size is 512M. Try making it 2G to increase log write performance. If you apply this setting, then set innodb_log_buffer_size to 512M.
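Suggestions #2 through #4 combined into a my.cnf fragment; the 42G figure comes from the question, and on older MySQL versions changing innodb_log_file_size requires a clean shutdown and removal of the old ib_logfile* files before restart:

```ini
[mysqld]
innodb_buffer_pool_size      = 42G
innodb_buffer_pool_instances = 8     # partition the 42G pool (Suggestion #2)
innodb_log_buffer_size       = 512M  # sized to match the 2G log files (Suggestions #3/#4)
innodb_log_file_size         = 2G
```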
Best Answer
Galera-based clusters like Percona XtraDB Cluster support true active/active multi-master, so failover is seamless because you can actually be writing to any node at any time (though it is recommended to write to only a single node, to avoid performance issues from lock conflicts caused by the optimistic locking strategy Galera uses).
This type of cluster can also serve distributed, fully synchronous reads, so you can scale heavy read loads without problems. One shining example is Magento, which is read-heavy and has many reads that are critical (think money) and traditionally needed to be done on the master to guarantee consistency; here they can be distributed.
Keep in mind that, as with any distributed CP system, a Galera cluster adds latency to writes because it has to verify that each write is valid on the other nodes, and hence it is not designed to scale writes. That said, it can usually withstand the same write load as a regular master, as long as you keep transactions small (in terms of rows affected) and short (in terms of time spent holding locks).
Also make sure to read the list of limitations here: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/limitation.html
To decrease downtime you can migrate by setting up the GCP cluster as an asynchronous slave of the current cluster (yes, Galera cluster nodes can act as a master and/or slave in traditional, binlog-based, asynchronous replication). To prime the slave cluster you can use XtraBackup, which can take fully lock-less backups from the current cluster (make sure to use the latest 2.3 XtraBackup); you can then restore the backup on one of the nodes and allow the other two nodes to perform State Snapshot Transfer (SST). Then simply designate one of the nodes to become the slave, and start replication using the binlog coordinates from xtrabackup_binlog_info.
Overview of steps would be:
1. Enable log-bin, log-slave-updates, and server_id on one of the nodes in the current cluster.
2. Take a backup of that node using xtrabackup (make sure to desync the node first: https://www.percona.com/blog/2013/10/08/taking-backups-percona-xtradb-cluster-without-stalls-flow-control/).
3. Bring up one of the nodes in the GCP cluster, restore the backup into it, and bootstrap the cluster with this node. Most likely use cloud storage to feed the backup to the cluster.
4. Bring up the other two nodes in GCP, one at a time, and allow them to perform SST.
5. Set up one of the GCP cluster nodes as an async slave of the current cluster (this node also requires log-bin, log-slave-updates, and server-id); this node will feed the other nodes with updates.
6. Fail over to the new cluster once the slave node has caught up.
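The async-slave step above can be sketched as follows; the host, user, and the binlog file/position (which you read from xtrabackup_binlog_info) are placeholders:

```sql
-- On the designated GCP node, using the coordinates recorded
-- in xtrabackup_binlog_info (all values below are placeholders):
CHANGE MASTER TO
  MASTER_HOST     = 'current-cluster-node',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = 'replpass',
  MASTER_LOG_FILE = 'mysql-bin.000123',
  MASTER_LOG_POS  = 4567;
START SLAVE;

-- Watch Seconds_Behind_Master until it reaches 0 before failing over:
SHOW SLAVE STATUS\G
```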
Hope that helps! :)
Full disclosure: I am a member of the Percona Support team.