MySQL – Looking for an efficient way to fix “Could not parse relay log event entry…” error

MySQL, replication

I have a corrupt relay log in MySQL 5.0 and am looking for the steps to fix replication without having to reprocess all of the binary logs from the master again.

Here is the full error message if you are curious:
Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

The normal fix is quite easy: obtain the correct binary log name and position from SHOW SLAVE STATUS and run the CHANGE MASTER TO command. But this is not what I would like to do.

I would like to reprocess only one or a few binary logs from the master to recover the corrupted relay log, and then continue using the relay logs that were already created. I am looking for someone with past experience doing this, as there is potential for failure when doing the recovery this way.

Conceptually, I am thinking the following steps might be needed (maybe some of them are optional):

  1. Stop replication on the slave.
  2. Shutdown slave instance.
  3. Move relay logs to a safe location.
  4. See if relay-log.info or master.info files need to be updated manually.
  5. Start slave instance.
  6. Run change master command and reprocess binary logs from the failed position.
  7. Wait until next relay log is created.
  8. Stop replication on the slave.
  9. Shutdown slave instance.
  10. Figure out how to replace the second relay log file with the relay log files from the safe location, and move them in.
  11. See if relay-log.info or master.info files need to be updated manually.
  12. Start slave instance.
  13. Hope for the best.

I am thinking of an alternate method to do it, which seems much simpler:

  1. Stop replication on the slave.
  2. Shutdown slave instance.
  3. Move relay logs to a safe location.
  4. Start slave instance.
  5. Run change master command and reprocess binary logs from the failed position.
  6. Wait until next relay log is created.
  7. Stop replication on the slave.
  8. Use mysqlbinlog to process the relay logs in the safe location, piping the output to the mysql client. Identify which file and position to start at. (Potential for issues, unless processing aborts at the first error.)
  9. Set the slave to replicate from master at the read master log file/position.

Best Answer

There is a more stable approach you can try.

Here is something to remember

Whenever you run CHANGE MASTER TO, it erases every relay log you have. You do not want to keep relay logs whose entries have not yet been executed as SQL anyway; you will re-fetch those entries from the Master.

The following is an excerpt taken from a post I made back on Feb 03, 2012: How to resolve the master server shut down/unavailability in mysql with master - slave replication:

Please notice that there are two sets of replication coordinates from the Master

  • (Master_Log_File,Read_Master_Log_Pos)
  • (Relay_Master_Log_File,Exec_Master_Log_Pos)

There is a major difference between them

  • (Master_Log_File,Read_Master_Log_Pos) tells you the last binlog statement from the Master's log file and log position that the Slave read from the Master and placed in its Relay Logs.
  • (Relay_Master_Log_File,Exec_Master_Log_Pos) tells you the Master's log file and log position of the relayed event THAT IS NEXT TO BE EXECUTED ON THE SLAVE; everything before that point has already been executed.

What you want are two things:

  1. Erase Every Relay Log You Have
  2. Start Collecting Binary Log Entries From the Last SQL You Successfully Executed.

In your case, you must use the second set of Replication Coordinates

  • Relay_Master_Log_File
  • Exec_Master_Log_Pos

It is easy to suspect a corrupt relay log, as the error message shows. The case that hurts the most is a corrupt Master binary log: you will have to jump through hoops if that is what happened. On the other hand, if one of the other situations caused the corrupt relay log, the simplest and most concise approach is the one I stated.

To make sure, take whatever is reported for Relay_Master_Log_File and, if that particular binary log still exists on the Master, run mysqlbinlog against it. If it dumps in its entirety without corrupt characters, go ahead and use the second set of replication coordinates.
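As a sketch (not from the original post), the second coordinate pair can be pulled out of the \G output with awk. The sample below hard-codes two lines from the status output shown further down; against a live slave you would pipe in mysql -e 'SHOW SLAVE STATUS\G' instead:

```shell
#!/bin/sh
# Extract the second set of replication coordinates from SHOW SLAVE STATUS\G.
# Hard-coded sample lines here; live, pipe in:  mysql -e 'SHOW SLAVE STATUS\G'
status='      Relay_Master_Log_File: mysql-bin.000254
        Exec_Master_Log_Pos: 858190247'

FILE=$(printf '%s\n' "$status" | awk -F': ' '/Relay_Master_Log_File/ {print $2}')
POS=$(printf '%s\n' "$status" | awk -F': ' '/Exec_Master_Log_Pos/ {print $2}')

# The CHANGE MASTER TO statement built from those coordinates:
echo "CHANGE MASTER TO master_log_file='$FILE', master_log_pos=$POS;"
```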

From the same earlier post:

mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 10.48.20.253
                Master_User: replicant
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000254
        Read_Master_Log_Pos: 858190247
             Relay_Log_File: relay-bin.066069
              Relay_Log_Pos: 873918
      Relay_Master_Log_File: mysql-bin.000254
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 858190247
            Relay_Log_Space: 873772
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

Notice that the replication coordinates from SHOW SLAVE STATUS\G for what was last executed are (mysql-bin.000254, 858190247). The CHANGE MASTER TO command in this case would be:

CHANGE MASTER TO master_log_file='mysql-bin.000254',master_log_pos=858190247;
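One detail worth spelling out: CHANGE MASTER TO has to be issued while the slave threads are stopped. A sketch of the full statement sequence, assembled and printed here rather than executed (feed the output to the mysql client on the slave to apply it):

```shell
#!/bin/sh
# Build the full recovery sequence; CHANGE MASTER TO must run while the
# slave threads are stopped, hence the surrounding STOP SLAVE / START SLAVE.
SEQ="STOP SLAVE;
CHANGE MASTER TO master_log_file='mysql-bin.000254', master_log_pos=858190247;
START SLAVE;"

# Print it rather than executing; pipe into the mysql client to apply.
printf '%s\n' "$SEQ"
```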

Give it a Try !!!

UPDATE 2012-09-14 16:38 EDT

If you are worried about relay logs stockpiling, just throttle them. In SHOW SLAVE STATUS\G there is a field called Relay_Log_Space, which gives you the total size of all relay logs in bytes. Did you know you can put a cap on that number?

The option is called relay_log_space_limit.

For example, if you want to cap the total number of bytes to 10G, do the following

STEP 01) Add this to /etc/my.cnf on the Slave

[mysqld]
relay_log_space_limit = 10G

STEP 02) Run service mysql restart on the Slave

and that's it !!!

When the oldest relay log has all its entries processed, it is deleted and a new relay log is created. That one fills up until all relay logs together again add up to 10G. That's the only way to control runaway relay log space issues.
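As a quick arithmetic check (my own illustration, not from the original post), here is the Relay_Log_Space figure from the sample status above measured against a hypothetical 10G cap:

```shell
#!/bin/sh
# Compare relay log usage (Relay_Log_Space, in bytes) to relay_log_space_limit.
USED=873772                           # Relay_Log_Space from the sample status
LIMIT=$((10 * 1024 * 1024 * 1024))    # a 10G cap, expressed in bytes

PCT=$((USED * 100 / LIMIT))
echo "relay logs use $USED of $LIMIT bytes (${PCT}%)"
```

With under a megabyte of relay logs in the sample, the slave is nowhere near the cap.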

UPDATE 2012-09-14 18:10 EDT

SUGGESTION: If you make mysqldump backups of the data on the Slave every midnight, you could set up the following to keep from accumulating something like 1TB of binary logs:

STEP 01) Add this to /etc/my.cnf on the Master

[mysqld]
expire_logs_days = 14

STEP 02) Run this query on the Master

mysql> PURGE BINARY LOGS BEFORE DATE(NOW()) - INTERVAL 14 DAY;

STEP 03) service mysql restart on the Master

STEP 04) Add a mysqldump backup script to a crontab on the Slave
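For STEP 04, a crontab entry along these lines would do; the backup path and mysqldump options are placeholders of mine, not from the original post:

```
# Run on the Slave at midnight; note that % must be escaped inside a crontab.
0 0 * * * mysqldump --all-databases --single-transaction > /backups/slave-$(date +\%F).sql
```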

This will make the Slave more useful and keep excess binary logs from becoming something to worry about.