I wrote up an interesting layout last year which features DRBD pairs in two data centers (DC1, DC2), as follows:
- DRBD Pair in DC1 (db1 and db2)
- DBVIP for Primary of DRBD Pair1 is 10.1.2.30
- DRBD Pair in DC2 (db3 and db4)
- DBVIP for Primary of DRBD Pair2 is 10.1.2.40
- Have MySQL Circular Replication Between DRBD Primaries
- Have 10.1.2.40 as the Master_Host for DRBD Pair 1
- Have 10.1.2.30 as the Master_Host for DRBD Pair 2
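The circular link between the two DRBD primaries could be wired up roughly like this. This is only a sketch: the VIPs come from the list above, while the replication user, password, and binlog coordinates are placeholders you would substitute with your own values.

```sql
-- On the DRBD Pair 1 primary (behind 10.1.2.30):
CHANGE MASTER TO
    MASTER_HOST = '10.1.2.40',        -- DBVIP of DRBD Pair 2
    MASTER_USER = 'repl',             -- placeholder replication user
    MASTER_PASSWORD = 'secret',       -- placeholder
    MASTER_LOG_FILE = 'mysql-bin.000001',  -- placeholder coordinates
    MASTER_LOG_POS  = 107;
START SLAVE;

-- On the DRBD Pair 2 primary (behind 10.1.2.40), the mirror image:
CHANGE MASTER TO
    MASTER_HOST = '10.1.2.30',        -- DBVIP of DRBD Pair 1
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 107;
START SLAVE;
```

Because each Master_Host is a VIP rather than a physical host, a DRBD failover within either pair leaves the circular replication pointing at whichever node currently holds the VIP.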
Here is why I suggested this: using two data centers, you set up automatic failover for the DRBD pair in one data center. Let the other DRBD pair, in the second data center, serve as the DB disaster-recovery site with its own local redundancy and failover. Should you ever lose one data center, the other data center is fully ready with its own local failover setup. Your app would just have to use the DBVIP of the other data center in such a catastrophic case.
Please keep in mind that using DRBD in conjunction with MySQL is only beneficial if all of your data uses the InnoDB Storage Engine. Hard failovers in DRBD could easily result in crashed MyISAM tables.
Here is another setup to consider:
As with any DRBD setup, a DRBD Secondary provides just a disk-level copy of your MySQL data directory. It is available as a warm standby; MySQL is not running on the DRBD Secondary. If you want the third DB server to be a hot standby, ditch DRBD altogether and use pure MySQL Replication. With three DB servers, using db3 at a remote site, simply set up the following:
+--> db1 --> db2 --> db3 -->+
^                           |
|                           v
+<--------------------------+
With your rudimentary failover in place, you now have two hot standby servers. You just have to make sure each DB server has a unique server_id value. I also recommend using MySQL 5.5 because it supports semisynchronous replication, which is more sensitive to communication failures and stops replication more cleanly. You will have to set up the appropriate heartbeats and timeouts.
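As a sketch, the per-server settings and the MySQL 5.5 semisynchronous plugin setup might look like the following. The plugin and variable names are the stock MySQL 5.5 ones; the timeout value is just an example you would tune.

```sql
-- my.cnf on each server: a unique server_id, e.g.
--   db1: server_id = 1,  db2: server_id = 2,  db3: server_id = 3

-- In a circular topology each server is both a master and a slave,
-- so load both halves of the semisync plugin on every node:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
INSTALL PLUGIN rpl_semi_sync_slave  SONAME 'semisync_slave.so';

SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_slave_enabled  = 1;

-- Fall back to asynchronous replication if no slave acknowledges
-- within 1000 ms (example value; tune to your network latency):
SET GLOBAL rpl_semi_sync_master_timeout = 1000;
```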
You're correct. If the slaves are in sync to the same point on Alpha, START SLAVE UNTIL would serve no purpose.
Effectively, what they're having you do is stop the slave on Gamma so that it is at the same point in Alpha's binlog as Beta. In a live, high-traffic situation, that can be slightly trickier to do, so I assume that's their motivation for suggesting that approach.
If Gamma and Beta have identical Relay_Master_Log_File and Exec_Master_Log_Pos values, then they are at the same point relative to Alpha, and you can safely switch Gamma to Beta at Beta's current master log file and position.
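Concretely, the comparison looks like this (the coordinates shown are purely illustrative):

```sql
-- Run on both Beta and Gamma:
SHOW SLAVE STATUS\G

-- Both servers must show identical values for these two fields:
--   Relay_Master_Log_File: mysql-bin.000042   -- illustrative
--     Exec_Master_Log_Pos: 73519              -- illustrative
--
-- When both fields match, the two slaves have executed exactly the
-- same set of events from Alpha's binlog, so either can be re-homed
-- against the other's current SHOW MASTER STATUS coordinates.
```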
However... to state the (presumably) obvious, you can't be doing any writing on Beta (other than via replication) around the time you make the cut, because if you're writing to Beta, then Gamma would miss any events that hadn't replicated from Beta to Gamma via Alpha... but the START SLAVE UNTIL wouldn't help you with that, either.
This means you need to actually verify that all three servers are in agreement in all possible directions, so there's nothing outstanding from Beta back to Alpha. Alpha's Exec_Master_Log_Pos (from SHOW SLAVE STATUS) needs to match Beta's Position (from SHOW MASTER STATUS), and both Gamma and Beta need to match Alpha and each other in the opposite direction (and of course the actual names of the binlogs need to match, not just the positions).
If you want to be super safe and paranoid, verify that everything is in agreement, then issue FLUSH TABLES WITH READ LOCK; on both Alpha and Beta (leaving the connections open to hold that global lock, which is yours once the prompt returns), then verify that everything remains in agreement at the same binlog positions, with nothing written to either master server's binlog that the other hasn't already played back. Stop, re-home, and restart replication on Gamma, verifying that it is connected and waiting for incoming events. Finally, verify that agreement has persisted in each direction between Alpha and Beta once again, then UNLOCK TABLES; on Alpha and Beta.
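That paranoid procedure, as a sketch — the hostname is a placeholder, and the binlog coordinates come from the SHOW statements at each step, not from this listing:

```sql
-- 1. On Alpha AND on Beta, in sessions you keep open:
FLUSH TABLES WITH READ LOCK;

-- 2. Re-verify agreement while everything is frozen:
--      Alpha: SHOW SLAVE STATUS\G   vs.  Beta:  SHOW MASTER STATUS;
--      Beta:  SHOW SLAVE STATUS\G   vs.  Alpha: SHOW MASTER STATUS;
--    Relay_Master_Log_File/Exec_Master_Log_Pos on the slave side must
--    match File/Position on the master side, in both directions.

-- 3. On Gamma, re-home to Beta:
STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST = 'beta.example.com',  -- placeholder address for Beta
    MASTER_LOG_FILE = 'mysql-bin.000042',  -- from Beta's SHOW MASTER STATUS
    MASTER_LOG_POS  = 73519;               -- from Beta's SHOW MASTER STATUS
START SLAVE;
-- Confirm Slave_IO_Running and Slave_SQL_Running both say Yes.

-- 4. Back on Alpha and Beta, still in the open sessions:
UNLOCK TABLES;
```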
Note that above, I never stopped the slave on Beta. It's not really necessary if you have a global read lock, since the slave is stalled and binlog coordinates can't advance under that condition, but you could stop it if desired, so long as the coordinates continue to match Alpha.
Best Answer
Set up slave
You can set it up from the nightly backups. It's actually common practice to provision slaves from backups, which also gets the backups tested. You can go back as far as you have binlogs stored on the master (the expire_logs_days variable).
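A sketch of that provisioning flow — the hostname and coordinates are placeholders; in practice the file and position come from whatever your backup recorded (mysqldump's --master-data option, for example, writes the CHANGE MASTER TO line into the dump for you):

```sql
-- On the master: check how far back the binlogs reach
SHOW VARIABLES LIKE 'expire_logs_days';
SHOW BINARY LOGS;

-- On the new slave, after restoring the nightly backup, point it at
-- the binlog coordinates captured at backup time:
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',   -- placeholder
    MASTER_USER = 'repl',                 -- placeholder
    MASTER_LOG_FILE = 'mysql-bin.000101', -- placeholder, from the backup
    MASTER_LOG_POS  = 4;                  -- placeholder, from the backup
START SLAVE;

-- Watch Seconds_Behind_Master fall as the slave catches up:
SHOW SLAVE STATUS\G
```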
The impact of starting a slave from an older backup is that it has to fetch the logs from the master, but that shouldn't be a heavy operation. Unless your master is already on the edge, you shouldn't really notice it. The other impact is that it will take longer for the slave to catch up, which is of course expected.
Switchover
Switchover can have some impact if you use auto-increment IDs and your passive master is not fully caught up with replication. It's best to make the old active master read-only for a second or two while checking the passive's master position. Once there is no more change, you can make the passive the active and make it writeable.
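A minimal sketch of that switchover check, assuming you run the statements by hand on each server:

```sql
-- On the old active master: stop accepting new writes
-- (note: read_only does not block users with the SUPER privilege)
SET GLOBAL read_only = ON;

-- On the passive master: wait until replication has fully drained.
SHOW SLAVE STATUS\G
--   Seconds_Behind_Master should be 0, and Exec_Master_Log_Pos should
--   stop moving and equal the old active's SHOW MASTER STATUS position.

-- Then promote the passive:
SET GLOBAL read_only = OFF;
-- ...and repoint the application (DNS/VIP) at the new active.
```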
Depending on how you point to the active master (a DNS switch, for example), the cutover might take some time, during which clients will send writes to both masters, and the old active, being read-only, will report errors.
I hope this helped.