MySQL High Availability – Failover and Replication with Latency

ArchitectureclusteringMySQLreplication

We are in the process of implementing a new CMS (Drupal 6.x) that runs on MySQL. We have two data centers — primary and secondary — with known latency between them. We're unsure which version of MySQL we will run … either Community or Enterprise, but that is a TBD. Looks like we will be running the InnoDB engine, OS is going to be RedHat EL 5.5 The primary servers are going to be active, while the secondary will be passive or hot stand-bys.

I would like to implement replication, high availability and automated failover in MySQL across the two data centers.

After a failover to the secondary servers, when we fail-back to the primary servers, we'd like to have data synched from the secondary DB to the primary DB quickly and completely so that we can continue to serve content from the primary servers.

I am interested in knowing what technologies/tools/best practices can be used to solve/address these issues. Also, any gotchas or ah-ha moments would be much appreciated as well. I've read up on MySQL replication, clustering, and at some 3rd party tools like Tungsten and Dolphinics, but I a unsure what the is best course of action.

Thank you for your time!

KM

Best Answer

For the sake of simplicity, I recommend MySQL Circular Replication only. Here is why:

There are many technologies and topologies that are far superior to MySQL Circular Replication. My favorite, hands down, is DRBD (Distributed Replicated Block Device). However, DRBD works great when the Server Pair is in the same bulding, data center, and rack. It's even better when using a crossover cable in the 192.168.x.x. subnet between the DRBD Primary and DRBD Secondary. Unfortunately, DRBD has horrible performance over a distance between two locations, although DRBD can still work. There are no network topologies around to give you the satisfactory DRBD performance needed between two datacenters.

Once you setup MySQL Circular Replication between the two DB servers in two different data centers, the only tuning needed is for the network. In essence, the replication performance is a function of network settings (speed/latency of binary log transmission in MySQL Replication Setup) and disk I/O (DRBD).

An alternative you may want for better redundancy is the following for the sake of example:

Setup a DRBD Pair in both locations
DRBD Pair in Site #1 with VIP 111.111.111.111
DRBD Pair in Site #2 with VIP 222.222.222.222

Setup MySQL Circular Replication between the DRBD Primary Servers under these conditions:
For site #1, use 222.222.222.222 as the Master_Host in MySQL
For site #2, use 111.111.111.111 as the Master_Host in MySQL

Although introducing a level of complexity, now you have two levels of redundancy: DRBD within each site and MySQL Circular Replication between sites. You have the additional benefits of running backups via mysqldump on the DRBD Primary of the hot standby server.

As for failover, DRBD provides automatic failover at any one site.

Only in the event that a datacenter is totally unavailble would you using the DB VIP at the hot standby site.

UPDATE

I just did a double take and noticed that you are using Drupal6. I am glad you will be converting all the drupal tables to InnoDB. This will remove any chance of MyISAM table updates causing table locks to freeze DB Connections that are simply reading MyISAM tables. Any DML update (INSERTs, UPDATEs, DELETEs) against a MyISAM table WILL ALWAYS DO A FULL TABLE LOCK !!! Using InnoDB will introduce row-level locking, which eliminates full table locks.

In addition, DRBD becomes your friend when everything is InnoDB because crash recovery will be consistent between the DRBD Pair. Contrawise, DRBD with MyISAM buys you nothing because a crashed MyISAM table on the DRBD Primary is simply duplicated to the DRBD Secondary as, you guessed it, a crashed MyISAM table.

UPDATE #2

You should use two levels of redundancy

Level 1 : At each database center, use DRBD.
http://dev.mysql.com/doc/refman/5.1/en/ha-drbd.html

Set up a pair of DB Servers
Startup DRBD
Startup MySQL on the DRBD Primary

This creates redundant data at the disk level.

Level 2 : You should setup MySQL Circular Replication between
the DRBD Primary of DataCenter#1 and the DRBD Primary of DataCenter#2

Each DRBD Primary will be running MySQL and will act
as both Master and Slave to Each Other

I have setup for clients topologies like this and I consider it quite stable.