There is a more stable approach you can try.
Here is something to remember: whenever you run CHANGE MASTER TO, it erases every relay log you have. That is fine in this situation, because there is no point keeping relay log entries the SQL thread has not yet executed when those relay logs may be corrupt.
The following is an excerpt taken from a post I made back on Feb 03, 2012: How to resolve the master server shut down/unavailability in mysql with master - slave replication:
Please notice that there are two sets of replication coordinates from the Master:
- (Master_Log_File, Read_Master_Log_Pos)
- (Relay_Master_Log_File, Exec_Master_Log_Pos)
There is a major difference between them.
(Master_Log_File, Read_Master_Log_Pos) tells you the last binlog statement, by the Master's log file and log position, that the Slave read from the Master and placed in its Relay Logs.
(Relay_Master_Log_File, Exec_Master_Log_Pos) tells you the last binlog statement, by the Master's log file and log position, that the Slave read from the Master and placed in its Relay Logs THAT IS NEXT TO BE EXECUTED ON THE SLAVE.
What you want are two things:
- Erase Every Relay Log You Have
- Start Collecting Binary Log Entries From the Last SQL You Successfully Executed
In your case, you must use the second set of Replication Coordinates: Relay_Master_Log_File and Exec_Master_Log_Pos.
It is easy to distrust a corrupt relay log like the one shown in the error message. What hurts the most is a corrupt Master log; you will have to jump through hoops if that is the case. If, on the other hand, one of the other situations caused the corrupt relay log, the simplest and most concise approach is the one I stated.
To make sure, take whatever is reported for Relay_Master_Log_File and, if that particular binary log still exists on the Master, run mysqlbinlog against it. If it dumps in its entirety without corrupt characters, go ahead and use the second set of replication coordinates.
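As a sanity check, a small shell wrapper around mysqlbinlog can record whether that binary log dumps cleanly; the path below is an assumption, substitute whatever Relay_Master_Log_File reports on your Slave:

```shell
# Hypothetical binlog path on the Master; substitute the file named
# in Relay_Master_Log_File from SHOW SLAVE STATUS\G on the Slave.
BINLOG=/var/lib/mysql/mysql-bin.000254

if ! command -v mysqlbinlog >/dev/null 2>&1; then
    verdict="no-mysqlbinlog"   # mysqlbinlog not installed; run this on the Master
elif mysqlbinlog "$BINLOG" >/dev/null 2>&1; then
    verdict="clean"            # dumped in its entirety; coordinates are safe to use
else
    verdict="suspect"          # mysqlbinlog reported a problem; treat the Master log as corrupt
fi
echo "verdict: $verdict"
```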
From my same earlier post
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.48.20.253
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000254
Read_Master_Log_Pos: 858190247
Relay_Log_File: relay-bin.066069
Relay_Log_Pos: 873918
Relay_Master_Log_File: mysql-bin.000254
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 858190247
Relay_Log_Space: 873772
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
Notice that the Replication Coordinates from SHOW SLAVE STATUS\G for what was last executed are (mysql-bin.000254, 858190247). The CHANGE MASTER TO command in this case would be:
CHANGE MASTER TO master_log_file='mysql-bin.000254',master_log_pos=858190247;
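You would run that statement between STOP SLAVE; and START SLAVE;. As a sketch, the statement can even be assembled straight from the SHOW SLAVE STATUS\G output; the status text is inlined below so the parsing is visible, but in practice you would pipe `mysql -e 'SHOW SLAVE STATUS\G'` into it:

```shell
# Sample SHOW SLAVE STATUS\G fields, inlined here for illustration.
status="Relay_Master_Log_File: mysql-bin.000254
Exec_Master_Log_Pos: 858190247"

# Pick out the second set of replication coordinates.
log_file=$(printf '%s\n' "$status" | awk '/Relay_Master_Log_File/ {print $2}')
log_pos=$(printf '%s\n' "$status" | awk '/Exec_Master_Log_Pos/ {print $2}')

# The statement to run on the Slave, between STOP SLAVE; and START SLAVE;
cmd="CHANGE MASTER TO master_log_file='${log_file}',master_log_pos=${log_pos};"
echo "$cmd"
```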
Give it a Try !!!
UPDATE 2012-09-14 16:38 EDT
If you are worried about relay logs stockpiling, just throttle them. In SHOW SLAVE STATUS\G there is a field called Relay_Log_Space, which gives you the sum of all relay log sizes in bytes. Did you know you could put a cap on that number?
The option is called relay_log_space_limit.
For example, if you want to cap the total number of bytes to 10G, do the following
STEP 01) Add this to /etc/my.cnf on the Slave
[mysqld]
relay_log_space_limit = 10G
STEP 02) Run service mysql restart
on the Slave
and that's it !!!
When the oldest relay log has had all its entries processed, it is deleted and a new relay log is created. Relay logs keep being written until they again add up to 10G in total. That is the only way to control runaway relay log space issues.
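Since Relay_Log_Space is in bytes, the cap is easy to monitor against. A quick sketch, using the Relay_Log_Space value from the sample status above and an assumed 10G cap:

```shell
relay_log_space=873772                 # bytes, from SHOW SLAVE STATUS\G above
limit=$((10 * 1024 * 1024 * 1024))     # relay_log_space_limit = 10G

# Integer percentage of the cap currently used by relay logs.
pct=$((relay_log_space * 100 / limit))
echo "relay logs use ${pct}% of the ${limit}-byte cap"
```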
UPDATE 2012-09-14 18:10 EDT
SUGGESTION: If you make mysqldump backups of the data on the Slave every midnight, you could set up the following to keep from accumulating 1TB of binary logs:
STEP 01) Add this to /etc/my.cnf on the Master
[mysqld]
expire_logs_days = 14
STEP 02) Run this query on the Master
mysql> PURGE BINARY LOGS BEFORE DATE(NOW()) - INTERVAL 14 DAY;
STEP 03) service mysql restart
on the Master
STEP 04) Add a mysqldump backup script to a crontab on the Slave
This will make the Slave more useful and keep excess binary logs from piling up.
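A minimal crontab sketch for STEP 04; the paths, dump options, and 14-day retention here are assumptions, not from the original setup:

```
# Hypothetical crontab entries on the Slave (edit with: crontab -e)
# Midnight full dump, compressed; then clean up dumps older than 14 days.
0  0 * * * mysqldump --all-databases --single-transaction | gzip > /backups/slave-$(date +\%F).sql.gz
30 0 * * * find /backups -name 'slave-*.sql.gz' -mtime +14 -delete
```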
If you are running LOAD DATA LOCAL INFILE on a Master every 5 minutes, how does that replicate?
Back on Jan 22, 2012, I wrote an answer to this post: MySql shell command not replicated to slave
In my answer, I explained how LOAD DATA LOCAL INFILE gets replicated:
- Master
  - Executes LOAD DATA LOCAL INFILE
  - Copies the contents of the entire text file used into the binary logs
  - Appends the SQL command LOAD DATA LOCAL INFILE to the latest binary log
- Replication ships all of this from the Master's Binary Logs to the Slave's Relay Logs
- Slave
  - Sees the text file in the Relay Logs
  - Reads all the blocks from multiple Relay Logs
  - Materializes the text file in the /tmp folder
  - Reads the LOAD DATA LOCAL INFILE command from the Relay Log
  - Executes LOAD DATA LOCAL INFILE in the SQL Thread
During Steps 1-4 on the Slave, the IO Thread has to stay in the "Reading event from the relay log" state, constantly extracting the CSV needed for the next LOAD DATA LOCAL INFILE command. Sometimes this causes Seconds_Behind_Master to leap hundreds or even thousands of seconds at a time without warning after staying relatively idle.
Running STOP SLAVE; does not help at all. Why does STOP SLAVE; hang? Steps 1-4 on the Slave lock the IO Thread until the CSV file has been completely extracted. Even once the extraction has completed, there is still the LOAD DATA LOCAL INFILE itself: running STOP SLAVE; will block on the SQL Thread executing LOAD DATA LOCAL INFILE.
In this crazy paradigm, Replication Lag has to increase steadily. Just look at your Relay_Log_Space: it is 5708184440 (about 5.3 GB). There are multiple LOAD DATA LOCAL INFILE commands just waiting to execute.
Look at how Seconds_Behind_Master increases:
- LOAD DATA LOCAL INFILE is executed on the Master
- The CSV needed is loaded into the Binary Logs
- The CSV needed is unloaded from the Relay Logs
- LOAD DATA LOCAL INFILE is executed on the Slave
If LOAD DATA LOCAL INFILE takes 2 minutes, double that number and add the time taken to ship the CSV file through the MySQL Replication process. You may need to come up with a different method of loading data that does not use LOAD DATA LOCAL INFILE.
UPDATE 2013-04-05 15:00 EDT
If MySQL Replication keeps falling behind (Seconds_Behind_Master keeps increasing) while the Master keeps getting log-jammed with small LOAD DATA LOCAL INFILE commands, there is only one more thing I can suggest: most people never touch sync_binlog, which is normally zero. What effect can this have on replicating LOAD DATA LOCAL INFILE?
According to the documentation on sync_binlog, this setting controls flushing binlog changes to disk. Since it is 0 by default, your Master DB Server is at the mercy of the OS, because the OS is responsible for flushing binlog changes. When you set sync_binlog to 1, things may actually change for the better. How?
Here is what is probably happening when sync_binlog is 0 on a Master:
- You run LOAD DATA LOCAL INFILE
- mysqld on the Master writes the command to the binlog
- mysqld on the Master writes the entire CSV file into the binlogs
- mysqld on the Master leaves it to the OS to flush binlog changes
- The Slave reads all binlog info from the Master except the last binlog that the Master's OS did not flush
- Slave status shows it is still trying to retrieve the remaining info
This is how sync_binlog can hopefully improve things:
- You run SET GLOBAL sync_binlog = 1;
- You run LOAD DATA LOCAL INFILE
- You run SET GLOBAL sync_binlog = 0;
- mysqld on the Master writes the command to the binlog
- mysqld on the Master writes the entire CSV file into the binlogs
- mysqld on the Master flushes every write to the binlogs because sync_binlog = 1
- The Slave reads all binlog info from the Master
- Slave status should show it has read every needed binlog
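The three SET/LOAD/SET statements above could be wrapped into the 5-minute load job itself. In this sketch the client command defaults to echo so the sequence can be inspected without a live server; the table name, CSV path, and connection settings are all assumptions:

```shell
# Replace with e.g.: MYSQL="mysql -u root mydb -e" to run for real.
MYSQL=${MYSQL:-echo}
CSV=/data/feed.csv    # hypothetical CSV produced every 5 minutes

# Force binlog flushing only while the load is in flight.
stmts=(
  "SET GLOBAL sync_binlog = 1;"
  "LOAD DATA LOCAL INFILE '$CSV' INTO TABLE feed;"
  "SET GLOBAL sync_binlog = 0;"
)
for s in "${stmts[@]}"; do
  $MYSQL "$s"
done
```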
Give it a Try !!!
UPDATE 2013-04-09 11:23 EDT
If you have a low-to-moderate amount of writes (INSERT, UPDATE, DELETE, and ALTER TABLE) on the Master DB Server, leaving sync_binlog at 1 may not be a bad idea. You would then need to do the following:
STEP 01) On the Slave, run STOP SLAVE;
STEP 02) On the Master, add sync_binlog=1 to /etc/my.cnf:
[mysqld]
sync_binlog=1
STEP 03) Run one of the following on the Master:
SET GLOBAL sync_binlog = 1;
or
service mysql restart
STEP 04) On the Slave, run START SLAVE;
Give it a Try !!!
Best Answer
After searching for some time, I found the reason and a solution.
By default, relay_log_purge was ON and relay_log_recovery was OFF in the MySQL configuration; both are the defaults. Per the MySQL documentation, the --relay-log-recovery option must be enabled on the slave to guarantee resilience in the event of an unexpected server halt. In my case, the server restarted because it ran out of disk space, and that is when the relay-log-corruption error message appeared.
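Based on that finding, a preventive my.cnf sketch for the slave; enabling relay_log_recovery (alongside relay_log_purge, which the documentation pairs it with) makes the slave discard unprocessed relay logs after an unexpected halt and refetch them from the master. Note that relay_log_recovery requires a server restart to take effect:

```
# /etc/my.cnf on the slave
[mysqld]
relay_log_purge    = ON
relay_log_recovery = ON
```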
SOLUTION:
First, remove the node from the group. Then, on the removed slave node, run:
STOP GROUP_REPLICATION;
RESET SLAVE;
Rejoin the node to the cluster. It will work.
Explanation: RESET SLAVE makes the slave forget its replication position in the master's binary log, deletes all the relay log files, and starts a new relay log file.
To use RESET SLAVE on a Group Replication group member, the member status must be OFFLINE, meaning the plugin is loaded but the member does not currently belong to any group. A group member can be taken offline with a STOP GROUP_REPLICATION statement.
For a server where GTIDs are in use (gtid_mode is ON), issuing RESET SLAVE has no effect on the GTID execution history. The statement does not change the values of gtid_executed or gtid_purged.