MySQL – Relay log read failure: relay or master log corrupt. How to repair?

group-replication mysql-8.0 mysql-innodb-cluster replication

I am running MySQL InnoDB Cluster 8.0.17 (Group Replication).
During replication the slave's disk filled up completely, and the applier thread waited for disk space to be freed. While this was happening the server restarted. Once it came back, the MySQL error log showed that the relay log is corrupt, and the slave cannot rejoin the group.
My question: how can I repair the slave?

error log
Disk is full writing ‘./ic-1-relay-bin-group_replication_applier.000039’ (OS errno 28 – No space left on device). Waiting for someone to free space… Retry in 60 secs. Message reprinted in 600 secs.
2019-08-21T19:02:37.095609Z 33 [ERROR] [MY-010584] [Repl] Slave SQL for channel ‘group_replication_applier’: Could not execute Write_rows event on table sbtest.sbtest3; Error writing file ‘/tmp/MLfd=28’ (OS errno 2
2019-08-22T06:55:36.766622Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.17) starting as process 910
2019-08-22T06:55:43.510632Z 1 [Warning] [MY-012637] [InnoDB] 81920 bytes should have been written. Only 77824 bytes written. Retrying for the remaining bytes.
2019-08-22T06:55:43.510669Z 1 [Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
2019-08-22T06:55:43.510683Z 1 [ERROR] [MY-012639] [InnoDB] Write to file ./#innodb_temp/temp_6.ibt failed at offset 0, 81920 bytes should have been written, only 77824 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
….
Error number 28 means ‘No space left on device’
2019-08-22T07:02:37.278732Z 1 [ERROR] [MY-012267] [InnoDB] Could not set the file size of ‘./ibtmp1’. Probably out of disk space
2019-08-22T07:02:37.278746Z 1 [ERROR] [MY-012926] [InnoDB] Unable to create the shared innodb_temporary.
2019-08-22T07:02:37.278764Z 1 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Generic error.
2019-08-22T07:02:37.776328Z 1 [ERROR] [MY-010334] [Server] Failed to initialize DD Storage Engine
2019-08-22T07:02:37.776504Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2019-08-22T07:02:37.779913Z 0 [ERROR] [MY-010119] [Server] Aborting

[MY-010818] [Server] Error reading GTIDs from relaylog: -1
Slave SQL for channel ‘group_replication_applier’: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master’s binary log is corrupted (you can check this by running ‘mysqlbinlog’ on the binary log), the slave’s relay log is corrupted (you can check this by running ‘mysqlbinlog’ on the relay log), a network problem, the server was unable to fetch a keyring key required to open an encrypted relay log file, or a bug in the master’s or slave’s MySQL code. If you want to check the master’s binary log or slave’s relay log, you will be able to know their names by issuing ‘SHOW SLAVE STATUS’ on this slave. Error_code: MY-013121

Best Answer

After searching for some time, I found the cause and a solution.

By default, relay_log_purge is ON and relay_log_recovery is OFF, and my MySQL configuration used both defaults. According to the MySQL documentation, the --relay-log-recovery option must be enabled on the slave to guarantee resilience in the event of an unexpected server halt. In my case the disk filled up, the server restarted, and with relay log recovery disabled the slave came back with the relay log corruption error.
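The current values of the two variables can be checked on the slave before changing anything. This is a minimal sketch. The option-file lines in the trailing comment are an assumption about how one would enable relay log recovery, since relay_log_recovery cannot be changed at runtime and only takes effect at server startup.

```sql
-- Show the two settings discussed above
-- (defaults: relay_log_purge = 1, relay_log_recovery = 0).
SELECT @@GLOBAL.relay_log_purge    AS relay_log_purge,
       @@GLOBAL.relay_log_recovery AS relay_log_recovery;

-- relay_log_recovery is read-only at runtime, so it has to be set at startup,
-- for example in the option file (assumed: the [mysqld] section of my.cnf):
--   [mysqld]
--   relay_log_recovery = ON
```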

SOLUTION:

First, remove the node from the group.

Then run these statements on the removed slave node:

STOP GROUP_REPLICATION;

RESET SLAVE;

Finally, rejoin the node to the cluster. It should now join successfully.
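Put together, the steps above might look like the following when run on the affected node. This is a sketch, not a verified procedure. It assumes disk space has already been freed. START GROUP_REPLICATION and the membership query are one assumed way to perform and check the rejoin step; in an InnoDB Cluster the rejoin is often done from MySQL Shell instead (for example with cluster.rejoinInstance()).

```sql
-- Take the member offline; RESET SLAVE requires the member to be OFFLINE.
STOP GROUP_REPLICATION;

-- Discard the corrupt relay logs and start a new relay log file.
RESET SLAVE;

-- Assumed rejoin step: start Group Replication again so the member
-- goes through distributed recovery and catches up with the group.
START GROUP_REPLICATION;

-- Check the member state; it should move from RECOVERING to ONLINE.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members;
```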

Explanation: RESET SLAVE makes the slave forget its replication position in the master's binary log, deletes all the relay log files, and starts a new relay log file.

To use RESET SLAVE on a Group Replication group member, the member status must be OFFLINE, meaning that the plugin is loaded but the member does not currently belong to any group. A group member can be taken offline with a STOP GROUP_REPLICATION statement.

For a server where GTIDs are in use (gtid_mode is ON), issuing RESET SLAVE has no effect on the GTID execution history. The statement does not change the values of gtid_executed or gtid_purged.
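To confirm that the GTID execution history survives the reset, the two variables can be compared before and after running RESET SLAVE. A small sketch using only the variables named above:

```sql
-- Run once before and once after RESET SLAVE; both values should be unchanged.
SELECT @@GLOBAL.gtid_executed AS gtid_executed,
       @@GLOBAL.gtid_purged   AS gtid_purged;
```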