You're correct. If the slaves are in sync to the same point on Alpha, START SLAVE UNTIL would serve no purpose.

Effectively, what they're having you do is stop the slave on Gamma so that it is at the same point in Alpha's binlog as Beta. In a live, high-traffic environment, that can be slightly trickier to do, so I assume that's their motivation for suggesting that approach.
If Gamma and Beta have identical Relay_Master_Log_File and Exec_Master_Log_Pos values, then they are at the same point relative to Alpha, and you can safely switch Gamma to Beta at Beta's current master log file and position.
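A minimal sketch of that switch; the host, account, and coordinates are placeholders for Beta's actual values:

```
-- On Gamma, once its executed position matches Beta's:
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'beta.example.com',    -- hypothetical host for Beta
  MASTER_USER = 'repl',                -- hypothetical replication account
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'beta-bin.000042', -- Beta's File from SHOW MASTER STATUS
  MASTER_LOG_POS  = 107;               -- Beta's Position
START SLAVE;
```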
However... to state the (presumably) obvious, you can't be doing any writing on Beta (other than via replication) around the time you make the cut, because if you're writing to Beta, then Gamma would miss any events that hadn't replicated from Beta to Gamma via Alpha... but the START SLAVE UNTIL wouldn't help you with that, either.
This means you need to actually verify that all three servers are in agreement in all possible directions, so there's nothing outstanding from Beta back to Alpha. Alpha's Exec_Master_Log_Pos (SHOW SLAVE STATUS) needs to match Beta's Position (SHOW MASTER STATUS), and both Gamma and Beta need to match Alpha and each other in the opposite direction (and of course the actual names of the binlogs need to match, not just the positions).
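Concretely, the cross-checks are just these statements, run on each server and compared by hand:

```
-- On Beta and on Gamma (both slaves of Alpha):
SHOW SLAVE STATUS\G
--   note Relay_Master_Log_File and Exec_Master_Log_Pos

-- On Alpha:
SHOW MASTER STATUS;
--   File and Position must match the slaves' values above

-- And in the reverse direction (Alpha as a slave of Beta):
-- Alpha's SHOW SLAVE STATUS\G must match Beta's SHOW MASTER STATUS.
```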
If you want to be super safe and paranoid, verify that everything is in agreement, then issue FLUSH TABLES WITH READ LOCK; on both Alpha and Beta (leaving the connections open to hold that global lock, which is yours once the prompt returns), then verify that everything remains in agreement at the same binlog positions, with nothing written to either master's binlog that the other hasn't already played back. Stop, re-home, and restart replication on Gamma, verifying that it is connected and waiting for incoming events. Finally, verify that agreement has persisted in each direction between Alpha and Beta once again, then UNLOCK TABLES; on Alpha and Beta.
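Putting the paranoid version together, the sequence is roughly:

```
-- 1. In one session on Alpha AND one session on Beta, keeping both open:
FLUSH TABLES WITH READ LOCK;

-- 2. Re-verify all the coordinate checks, then on Gamma:
STOP SLAVE;
--    (the same CHANGE MASTER TO ... statement as in the sketch above)
START SLAVE;
SHOW SLAVE STATUS\G   -- connected, no errors, waiting for events

-- 3. Re-verify Alpha/Beta agreement one last time, then, still in the
--    sessions holding the global read locks:
UNLOCK TABLES;
```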
Note that above, I never stopped the slave on Beta. It's not really necessary if you have a global read lock, since the slave is stalled and binlog coordinates can't advance under that condition, but you could stop it if desired, so long as the coordinates continue to match Alpha.
I would say that you benefit from future-proofing your environment, because inevitably you will want to add more slaves.
I also don't like having to remember the specific coordinates of which statements have and haven't been applied when doing a restore. Having the server know the current set of applied statements is very useful to prevent duplicate processing/rogue corruption. In 5.7, mysqlbinlog even has an --idempotent mode.
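For illustration (the binlog file name is hypothetical), replaying a binlog in 5.7's idempotent mode suppresses duplicate-key and row-not-found errors from row events:

```
mysqlbinlog --idempotent mysql-bin.000123 | mysql -u root -p
```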
Operationally GTIDs are much easier to manage, and are the future :)
I recommend row-based replication as well - I think it's a great option:
http://www.tocker.ca/2013/09/04/row-based-replication.html
Best Answer
I think your project is realistic.
I suggest a migration path below which can minimize database downtime and make things go smoothly.
For this explanation, I named the current database server PrdOld, the next database master (RW) server Prd01, and the next database slave (RO) Prd02.

As replication from 5.6 to 5.7 is supported, set up your current MySQL 5.6 database (PrdOld) as a replication master:
- look at the log_bin, relay_log, sync_binlog, server_id, expire_logs_days, binlog_cache_size and max_binlog_size my.cnf variables (a sketch follows the chain description below)
- don't try to enable GTID at this point
- binlog_format=ROW and innodb_file_per_table are advisable

Prepare Prd01 as slave of PrdOld and master of Prd02
Prepare Prd02 as slave of Prd01
So we'll have a replication chain: PrdOld -> Prd01 -> Prd02.
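As a rough sketch of the my.cnf side of this chain (the values are illustrative, not tuned recommendations):

```
# PrdOld (5.6, source master) -- illustrative values only
[mysqld]
server_id         = 1
log_bin           = mysql-bin
binlog_format     = ROW
sync_binlog       = 1
binlog_cache_size = 1M
max_binlog_size   = 512M
expire_logs_days  = 7

# Prd01 (5.7, slave of PrdOld and master of Prd02) needs its own
# server_id, a relay_log setting, and log_slave_updates = 1 so that
# events replicated from PrdOld reach Prd01's binlog for Prd02.
```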
Do a full mysqldump (all replicated databases) of PrdOld
Write down the File and Position values from the SHOW MASTER STATUS; command.

For a simpler process I choose to enable replication right away from Prd01 to Prd02, but it's not mandatory; it depends on your database (size is a factor, but not the only one) and activity (load, ...).
Restore your dump on Prd01, which is propagated to Prd02 (it may generate some slave lag during this step).
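A sketch of the dump and restore steps, assuming InnoDB tables; the database names are examples, and --master-data=2 is an optional convenience (not in the original steps) that records the master coordinates inside the dump file:

```
# On PrdOld: a consistent dump of the replicated databases.
mysqldump --databases app_db1 app_db2 \
  --single-transaction --master-data=2 \
  -u root -p > prdold_full.sql

# On Prd01: restore; rows flow on to Prd02 if that link is already up.
mysql -u root -p < prdold_full.sql
```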
Enable replication between PrdOld and Prd01
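A minimal sketch of that step; the host, account, and coordinates are placeholders for the values you wrote down earlier:

```
-- On Prd01:
CHANGE MASTER TO
  MASTER_HOST = 'prdold.example.com',   -- hypothetical host
  MASTER_USER = 'repl',                 -- hypothetical replication account
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000007', -- File from SHOW MASTER STATUS
  MASTER_LOG_POS  = 154;                -- Position from SHOW MASTER STATUS
START SLAVE;
SHOW SLAVE STATUS\G   -- Slave_IO_Running and Slave_SQL_Running should be Yes
```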
You have now three live and synced databases.
Observe for a while how things are working, and decide quietly when you want to switch the application's db connection context to Prd01. Then cut the PrdOld to Prd01 replication and shut down PrdOld for good.
For the db HA aspect between Prd01 and Prd02 (in case of failure), either assign a virtual IP to each db server (an IPmaster and an IPslave) or use a load balancer in front of them (HAProxy, NGinx, ...).
Enable master-master replication (also known as circular replication) between Prd01 and Prd02.
Enable the GTID mechanism: see the log_slave_updates, gtid_mode and enforce_gtid_consistency variables. A hedged rollout sketch follows the notice below.

Notice: you have to watch the Seconds_Behind_Master indicator carefully (during the dump restore, then later in production) in SHOW SLAVE STATUS\G. A big number is not necessarily a problem if it doesn't persist for long...
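As a hedged sketch of the online GTID rollout documented for 5.7, run on every server one step at a time; once GTID is on, the circular Prd01/Prd02 link can be re-pointed with MASTER_AUTO_POSITION = 1:

```
SET @@GLOBAL.enforce_gtid_consistency = WARN;  -- watch the error log
SET @@GLOBAL.enforce_gtid_consistency = ON;
SET @@GLOBAL.gtid_mode = OFF_PERMISSIVE;
SET @@GLOBAL.gtid_mode = ON_PERMISSIVE;
-- wait until this reaches 0 on every server:
SHOW STATUS LIKE 'Ongoing_anonymous_transaction_count';
SET @@GLOBAL.gtid_mode = ON;
-- then persist gtid_mode, enforce_gtid_consistency and log_slave_updates
-- in my.cnf so the settings survive a restart
```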