Before you perform any mysqldump to fully restore a Slave, you should consult the output of SHOW SLAVE STATUS\G
. Let's start with a sample SHOW SLAVE STATUS\G
:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.48.20.253
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000254
Read_Master_Log_Pos: 858190247
Relay_Log_File: relay-bin.066069
Relay_Log_Pos: 873918
Relay_Master_Log_File: mysql-bin.000254
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 858190247
Relay_Log_Space: 873772
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
Please notice that there are two sets of replication coordinates from the Master
- (Master_Log_File,Read_Master_Log_Pos)
- (Relay_Master_Log_File,Exec_Master_Log_Pos)
There is a major difference between them
(Master_Log_File,Read_Master_Log_Pos)
tells you the last binlog statement from the Master's log file and log position that the Slave read from the Master and placed in its Relay Logs.
(Relay_Master_Log_File,Exec_Master_Log_Pos)
tells you the last binlog statement from the Master's log file and log position that the Slave read from the Master and placed in its Relay Logs THAT IS NEXT TO BE EXECUTED ON THE SLAVE.
The timestamps from these two coordinates helps you figure out Seconds_Behind_Master
.
Knowing these things, here is what you can do:
- Step 01) Run
SHOW SLAVE STATUS\G
- Step 02) Get Relay_Master_Log_File,Exec_Master_Log_Pos from
SHOW SLAVE STATUS\G
(In the sample, that would be (mysql-bin.000254,858190247)
- Step 03)
STOP SLAVE;
- Step 04)
CHANGE MASTER TO master_log_file='mysql-bin.000254',master_log_pos=858190247;
- Step 05)
START SLAVE;
- Step 06) Wait 10 seconds
- Step 07) Run
SHOW SLAVE STATUS\G
and check Seconds_Behind_Master
If the Seconds_Behind_Master
is a number and eventaully drops to zero, replication is fully reesatablished.
After doing all this, if replication breaks because of a corrupt binary log from the master, then you do the last resort:
- Steo 01) On the Master,
RESET MASTER;
to erase all binary logs and start with a new one
- Step 02) On the Master, run this
This create proper dump for the slave
echo "STOP SLAVE;" > /root/MySQLData.sql
mysqldump --all-databases --routines --triggers --flush-privileges --master-data=1 >> /root/MySQLData.sql
echo "START SLAVE;" >> /root/MySQLData.sql
- Step 03) scp /root/MySQLData.sql over to the Slave and Load it in MySQL on the Slave
Give it a Try !!!
Look at the Slave Status
Master_Log_File: .001746 <<---------------
Read_Master_Log_Pos: 123599834 <<-------------
Relay_Log_File: XXXXXXX.000014
Relay_Log_Pos: 290086986
Relay_Master_Log_File: .001733 <<---------------
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 740137529 <<-------------
Relay_Log_Space: 13632228696 <<-----------
Until_Condition: None
You have 13GB worth of transactions yet to execute
- Since each binlog is 1GB,
1746
- 1733
is 13
* 1G = 13G
Relay_Log_Space
is also 13G
(sum of all relay log filesizes)
If replication never broke over the past 23 hours, you need to find out if any large transactions were executed in the last 24 hours.
If you performed a LOAD DATA INFILE
on the Master within the last 24 hours, it replicated to the Slave. In fact, the entire file in represented in the relay logs. My guess is that the file must be about 13G.
This is most likely what happened: Once you run LOAD DATA INFILE
and it was being replicated, Seconds_Behind_Master
remained 0 until the entire input file was represented as binlog event chunks spread throughout the relay logs. Then, suddenly, without warning, Seconds_Behind_Master
shot right up to thousands of seconds. That time displacement would have been the time it took to completely replicate the file from the Master's Binary Logs to the Slave's Relay Logs, extract it from the Relay Logs, and store it as a CSV on disk on the Slave. That time does not even take into account the time it takes for LOAD DATA INFILE
to be executed on the Slave.
I wrote about this scenario before
UPDATE 2014-08-07 16:36 EDT
Based on your comments, if you want to following the steps for setting up sync_binlog
, please proceed as follows:
STEP 01) On the Slave, run STOP SLAVE;
STEP 02) On the Master, add sync_binlog=1 to /etc/my.cnf:
[mysqld]
sync-binlog=1
STEP 03) Run one of the following on the Master:
SET GLOBAL sync_binlog = 1;
STEP 04) On the Slave, run START SLAVE;
No need to restart MySQL. Thus, no downtime.
Best Answer
ASPECT #1
Please make sure ServerA and ServerB have log_slave_updates enabled.
ASPECT #2
This setup will not break replication because every time a transaction is logged in the binary logs, the server_id of the transaction is written as well.
Let's say server_id=10 for ServerA and server_id=20 for ServerB
So, when you run this on ServerA
The transaction will include server_id 10. When ServerB looks into its relay logs and sees server_id 10, it will compare that with its own server_id (20). Since there is no match, it is OK to execute.
ServerB records that transaction in its binary logs. ServerA will pick that transaction up from its relay logs. When ServerA looks into its relay logs and sees server_id 10, it will compare that with its own server_id (10). Since there is a match, it refuses to execute.
I discussed this before in my past answers
Mar 27, 2012
: Master-Master Replication in MySQLOct 15, 2012
: Will changing server_id on master break replication?