I wrote up an interesting layout last year which features DRBD pairs in two data centers (DC1, DC2), as follows:
- DRBD Pair in DC1 (db1 and db2)
- DBVIP for Primary of DRBD Pair1 is 10.1.2.30
- DRBD Pair in DC2 (db3 and db4)
- DBVIP for Primary of DRBD Pair2 is 10.1.2.40
- Have MySQL Circular Replication Between DRBD Primaries
- Have 10.1.2.40 as the Master_Host for DRBD Pair 1
- Have 10.1.2.30 as the Master_Host for DRBD Pair 2
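A minimal sketch of wiring up the circular replication between the two DRBD primaries. The replication user, password, and binlog coordinates below are placeholders; take the real coordinates from SHOW MASTER STATUS on the other pair's primary:

```sql
-- Run on the DRBD Pair 1 Primary (behind 10.1.2.30).
-- 'repluser'/'replpass' and the binlog coordinates are hypothetical.
CHANGE MASTER TO
    MASTER_HOST     = '10.1.2.40',
    MASTER_USER     = 'repluser',
    MASTER_PASSWORD = 'replpass',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 4;
START SLAVE;

-- Run the mirror image on the DRBD Pair 2 Primary, pointing at 10.1.2.30.
```

Since both primaries accept writes, it is also worth setting auto_increment_increment=2 with a distinct auto_increment_offset on each pair so AUTO_INCREMENT keys never collide.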
MySQL high availability, failover and replication with Latency
MySQL Replication : 1 Slave / Multiple Masters
Here is why I suggested this: using two data centers, you set up automatic failover for the DRBD pair in one data center. Let the DRBD pair in the second data center be the DB disaster site, with its own local redundancy and failover. Should you ever lose one data center, the other data center is fully ready with its own local failover setup. Your app would just have to use the DBVIP of the other data center in such a catastrophic case.
Please keep in mind that using DRBD in conjunction with MySQL is only beneficial if all of your data uses the InnoDB Storage Engine. Hard failovers in DRBD could easily result in crashed MyISAM tables.
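A quick way to verify that nothing is still on MyISAM before committing to DRBD (a sketch; adjust the excluded system schemas to taste):

```sql
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE engine <> 'InnoDB'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');
```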
Here is another setup to consider:
As with any DRBD setup, a DRBD Secondary provides just a disk-level copy of your MySQL data folder. It is available as a warm standby; MySQL is not running on the DRBD Secondary. If you want the third DB server to be a hot standby, ditch DRBD altogether and use pure MySQL Replication. With three DB servers, using db3 at a remote site, simply set up the following:
+--> db1 --> db2 --> db3 -->+
^                           |
|                           v
+<--------------------------+
Using your rudimentary failover, you now have two hot standby servers. You just have to make sure each DB server has a unique server_id value. I also recommend MySQL 5.5 because its semisynchronous replication is more sensitive to communication failures and stops replication more cleanly. You will have to set up the appropriate heartbeats and timeouts.
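A sketch of the my.cnf settings each node in the ring would need. The server_id values are arbitrary examples; log_slave_updates is what lets each server relay the changes it receives on to the next server in the ring:

```ini
# my.cnf on db1 (use server_id 2 and 3 on db2 and db3 respectively)
[mysqld]
server_id         = 1
log_bin           = mysql-bin
log_slave_updates = 1
```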
Semisync Replication will switch to async on the fly if needed.
I mentioned this before in my old post MySQL Replication : 1 Slave / Multiple Masters
In that post I said
Having this setup gives you block-level replication within both data centers. If you are using MySQL 5.5, you can have semisynchronous replication send SQL to the other data center without waiting for the SQL to be executed, only acknowledged. This keeps to a minimum the intermittency that standard MySQL asynchronous replication would normally inflict on a MySQL Master. In the event of any sudden network latency, MySQL replication between data centers would switch from semisync to async. Once conditions improve, MySQL replication would switch back to semisync.
In another old post of mine, With MySQL Replication, what level of resilience is possible?, I mentioned the option showing that semisync can switch to async and back:
rpl_semi_sync_master_wait_no_slave : With semisynchronous replication, for each transaction, the master waits until timeout for acknowledgment of receipt from some semisynchronous slave. If no response occurs during this period, the master reverts to normal replication. This variable controls whether the master waits for the timeout to expire before reverting to normal replication even if the slave count drops to zero during the timeout period.
Therefore, you cannot set up semisync and async replication on the same instance of MySQL. Then again, you don't need to, since it is designed to toggle between them depending on response time, or the lack thereof.
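You can watch that toggle happen from the master once the semisync master plugin is loaded (both status variables below exist in MySQL 5.5):

```sql
SHOW STATUS LIKE 'Rpl_semi_sync_master_status';  -- ON = semisync, OFF = reverted to async
SHOW STATUS LIKE 'Rpl_semi_sync_master_no_tx';   -- transactions not acknowledged by any slave
```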
BTW, to set up semisync replication, the semisync plugin must be activated on both Master and Slave. See my post Is MySQL Replication Affected by a High-Latency Interconnect? for how to install semisync replication.
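For reference, the bare-bones installation looks like this (the 1000 ms timeout is an example value; tune it to your interconnect):

```sql
-- On the Master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- ms to wait before reverting to async

-- On the Slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;  -- restart the IO thread so it negotiates semisync
```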
Best Answer
Check out Basic Availability Groups. These are available in Standard Edition (and Developer) and offer exactly the functionality you're after.
You have two replicas, a primary and a secondary. The client application connects to a shared listener name, and the read and write workload between the client application and the database occurs on the primary replica through this listener. Data changes are synchronised to the secondary replica behind the scenes.
When the primary replica fails, if you have configured automatic failover then the availability group moves to the secondary replica, and the read/write workload begins working off the secondary replica. As this still occurs through the listener name, no application change is required.
After failover, the secondary replica becomes the new primary and once the old primary is online again, it becomes the new secondary. Data synchronisation resumes, but in the other direction to ensure that you can failover back to the original primary at some point if needed.
Check out this Docs article for an overview of Availability Groups. Note that Basic AGs function just like normal AGs but for Standard Edition only, so they have some limitations.
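A sketch of what creating a Basic AG looks like in T-SQL. The server, database, listener names, and IP address are made up; run this from the intended primary once the database is in FULL recovery and has been backed up:

```sql
CREATE AVAILABILITY GROUP [BAG_Sales]
    WITH (BASIC)                     -- Basic AG: one database, two replicas
    FOR DATABASE [SalesDb]
    REPLICA ON
        N'SQLNODE1' WITH (
            ENDPOINT_URL      = N'TCP://SQLNODE1.example.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = AUTOMATIC),
        N'SQLNODE2' WITH (
            ENDPOINT_URL      = N'TCP://SQLNODE2.example.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = AUTOMATIC);

-- The shared listener name the application connects to:
ALTER AVAILABILITY GROUP [BAG_Sales]
    ADD LISTENER N'SalesListener' (WITH IP ((N'10.1.2.50', N'255.255.255.0')), PORT = 1433);
```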