MySQL Replication and High Availability Guide

high-availabilityMySQLreplication

I know this might not be a very good question to be asked in this site, but I really need some helps/hints to be directed to the right path.

I want to have failover strategy for 2 MySQL servers which are in Master-Master Replication mode. I have done the replication, and it's working just fine. Yet, when in comes to the failover strategy, I'm kind of clueless…

I've been reading on the internet over the past couple of days (I know this is a very complex procedure, and I could not master it in a week or so), but the more I read the more confuse I get. Basically, as I understood, I need to have a software to monitor the network and report any failure and another one to switch between two systems. Heartbeat apparently is the best in doing so, but in all tutorials I read, they are using many other software ( like MON for monitoring, or other software for dumping or getting backups) as well. Is it necessary to have them? Can one have fail-over strategy only using MySQL and HA tool like Heartbeat?

Each website is introducing different tool and ways, which they have their own pros and cons, and of course it depends on how I want my "cluster" to work, and how much high availability is important to my system. I am just considering pointing it out to my customers, and it's not going to the production phase any soon, since my customers running small businesses and don't care about loosing data that much, however, I wanted to dig into the concept by setting up a simple master-master replication where in case of failure of any servers, clients automatically commit changes to the standby server.

Best Answer

Given the fact that you mentioned you have Master-Master replication mode, I would not recommend any automatic failover unless you properly account for Replication Lag. After all, MySQL Replication is asynchronous.

It is theoretically possible to have the following:

  • DBServer1 as Master to DBServer2
  • DBServer2 as Master to DBServer1
  • DBVIP pointing at DBServer1
  • DBServer2 is 180 seconds behind
  • DBServer1 goes down
  • Automatic Failover moves DBVIP to DBServer2

With this scenario, DBServer2 could have auto increment keys that do not exist yet. Upon failover, the DBVIP will allow WebServers to connect to DBServer2 and ask for data that does not exist yet.

This would therefore require background processes running on each DBServer.

For the above scenario:

  • DBVIP is on DBServer1
  • DBServer1 runs HeartBeat
  • DBServer2 runs HeartBeat
  • Background Process on DBServer1 to monitor
    • a) Data Mount Availability
    • b) Data Mount Writeability
    • c) MySQL Connectivity
    • Once a,b, or c fail, kill HeartBeat

Background Process on DBServer2 to make sure DBVIP is pingable

What should killing HeartBeat do? Trigger the startup script defined for it.

What should the startup script on DBServer2 look for?

  • Loop until DBVIP is unreachable via ping
  • Connect to MySQL and
    • Run SHOW SLAVE STATUS\G in a Loop until Seconds_Behind_Master is NULL
    • RUn SHOW SLAVE STATUS\G in a Loop until `Exec_Master_Log_Pos stops changing
  • Assign DBVIP to DBServer2 via ip addr add

This is essentailly the algorithm for failing over safely to a Passive Master in a Master/Master Replication Cluster.

ALTERNATIVE

If ALL your data is InnoDB, I recommend something with less rigor. Perhaps you should look into using DRBD and HeartBeat. Here is why:

DRBD provides network RAID-1 for a Block Device on two servers.

You would essentially do this:

  • Have DBServer1's DRBD Block Device as Primary
  • Have DBServer2's DRBD Block Device as Secondary
  • Mount DBServer1's DRBD Device on /var/lib/mysql
  • Startup MySQL on DBServer1
  • Have HeartBeat Monitor Ping Activity Between Servers

What would startup script look like in a DRBD scenario?

  • Loop until DBVIP is unreachable via ping
  • Kill HeartBeart
  • Disconnect DRBD
  • Promote DRBD to Primary
  • Mount DRBD on /var/lib/mysql
  • Start MySQL (InnoDB Crash Recovery Fills in Missing Data)
  • Assign DBVIP via ip addr add

This is a lot more straightforward because only one side is Active. The Passive side (DRBD Secondary) is a Synchronous Disk Copy of the Active Side (DRBD Primary).

CAVEAT

If all or most of the working set data is MyISAM, do not touch DRBD. Crash scnearios quickly result in MyISAM tables being marked crashed and need auto-repair (which can be paintfully slow to wait for).

UPDATE 2012-12-29 08:00 EDT

Here are my past posts on using DRBD with MySQL