MySQL slave stuck in “Reading event from the relay log”

linuxMySQLmysql-5.5replication

  • 5.5.28-log MySQL Community Server (GPL) by Remi
  • binlog-format=MIXED

My problem is similar to this question.

*************************** 2. row ***************************
     Id: 973415
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 29062
  State: Reading event from the relay log
   Info: NULL

It looks like a bug, except for no one mentioned that verion 5.5.28 is affected.

Here is the additional information:

mysql> show slave status\G

          Master_Log_File: mysql-bin.006413
      Read_Master_Log_Pos: 40371120
           Relay_Log_File: mysqld-relay-bin.011003
            Relay_Log_Pos: 226592090
    Relay_Master_Log_File: mysql-bin.006392
         Slave_IO_Running: Yes
        Slave_SQL_Running: Yes
          Replicate_Do_DB:  
      Replicate_Ignore_DB:  
       Replicate_Do_Table:  
   Replicate_Ignore_Table:  
  Replicate_Wild_Do_Table:  
Replicate_Wild_Ignore_Table: 
               Last_Errno: 0
               Last_Error: 
             Skip_Counter: 0
      Exec_Master_Log_Pos: 226591944
          Relay_Log_Space: 5708184440
          Until_Condition: None
           Until_Log_File: 
            Until_Log_Pos: 0
       Master_SSL_Allowed: No
       Master_SSL_CA_File: 
       Master_SSL_CA_Path: 
          Master_SSL_Cert: 
        Master_SSL_Cipher: 
           Master_SSL_Key: 
    Seconds_Behind_Master: 29531

mysql> show engine innodb status;

---TRANSACTION 1709EE56, ACTIVE 29524 sec fetching rows
mysql tables in use 8, locked 8
94647 lock struct(s), heap size 14121400, 45755176 row lock(s), undo log entries 7547
MySQL thread id 973415, OS thread handle 0x40d90940, query id 1773214543 Reading event from the relay log

mysqlbinlog –start-position=226591944 mysql-bin.006392

#130401 15:18:23 server id 248  end_log_pos 226592078   Table_map: `reportingdb`.`v3_cam_ip` mapped to number 4973102
#130401 15:18:23 server id 248  end_log_pos 226593102   Delete_rows: table id 4973102
#130401 15:18:23 server id 248  end_log_pos 226594131   Delete_rows: table id 4973102
#130401 15:18:23 server id 248  end_log_pos 226595169   Delete_rows: table id 4973102

The command that the user run on the master:

DELETE FROM v3_cam_ip WHERE dt ='2013-03-15' LIMIT 100000

mysql> show create table v3_cam_ip\G

*************************** 1. row ***************************
       Table: v3_cam_ip
Create Table: CREATE TABLE `v3_cam_ip` (
  `campaignid` mediumint(9) unsigned DEFAULT NULL,
  `ip` varchar(20) COLLATE latin1_bin NOT NULL DEFAULT '',
  `dt` date NOT NULL,
  KEY `ix_campaignid` (`campaignid`),
  KEY `ix_dt` (`dt`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin
/*!50100 PARTITION BY RANGE (TO_DAYS(dt))
(PARTITION p00 VALUES LESS THAN (0) ENGINE = InnoDB,
 PARTITION p01 VALUES LESS THAN (734653) ENGINE = InnoDB,
 PARTITION p02 VALUES LESS THAN (734745) ENGINE = InnoDB,
 PARTITION p05 VALUES LESS THAN (734965) ENGINE = InnoDB,
 PARTITION p06 VALUES LESS THAN (735160) ENGINE = InnoDB,
 PARTITION p07 VALUES LESS THAN (735210) ENGINE = InnoDB,
 PARTITION MERGER_201341 VALUES LESS THAN (735324) ENGINE = InnoDB,
 PARTITION pcurrent_201341 VALUES LESS THAN (735325) ENGINE = InnoDB) */

mysql> show keys from v3_cam_ip\G

*************************** 1. row ***************************
        Table: v3_cam_ip
   Non_unique: 1
     Key_name: ix_campaignid
 Seq_in_index: 1
  Column_name: campaignid
    Collation: A
  Cardinality: 20
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: BTREE
      Comment: 
Index_comment: 
*************************** 2. row ***************************
        Table: v3_cam_ip
   Non_unique: 1
     Key_name: ix_dt
 Seq_in_index: 1
  Column_name: dt
    Collation: A
  Cardinality: 20
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
2 rows in set (0.95 sec)

What I have done on one of two Slaves:

  • stop slave; hangs so I have to press CtrlC
  • then I tried to restart the MySQL (using init script) but it failed
  • after that I did a kill -9 (silly?) and start again
  • now the slave is started but it doesn't get update from the master

What should I do on the remaining slave?


gdb –batch –quiet -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' -p $(pidof mysqld)

#4  0x0000000000926b99 in ha_partition::handle_unordered_next(unsigned char*, bool) ()
No symbol table info available.
#5  0x0000000000733de9 in Rows_log_event::find_row(Relay_log_info const*) ()
No symbol table info available.
#6  0x0000000000734079 in Delete_rows_log_event::do_exec_row(Relay_log_info const*) ()
No symbol table info available.
#7  0x0000000000734a92 in Rows_log_event::do_apply_event(Relay_log_info const*) ()
No symbol table info available.
#8  0x000000000051774e in apply_event_and_update_pos(Log_event*, THD*, Relay_log_info*) ()

The full backtrace: http://fpaste.org/pXvT/

Best Answer

If you are running LOAD DATA LOCAL INFILE on a Master every 5 minutes, how does that replicate ?

Back on Jan 22, 2012, I wrote an answer to this post : MySql shell command not replicated to slave

In my answer, I explained how LOAD DATA LOCAL INFILE gets replicated:

  • Master
    1. Executes LOAD DATA LOCAL INFILE
    2. Copies the contents of entire text file used into the binary logs
    3. The SQL command LOAD DATA LOCAL INFILE is appended to the latest binary log
  • Replication ships all of this from the Master's Binary Logs to the Slave's Relay Logs
  • Slave
    1. Sees the Text File in the Relay Logs
    2. Reads all the blocks from multiple Relay Logs
    3. The text file get materialized in the /tmp folder
    4. Reads LOAD DATA LOCAL INFILE command from Relay Log
    5. Executes LOAD DATA LOCAL INFILE in the SQL Thread

During Steps 1-4 on the Slave, the IO Thread would have to be in the Reading event from the relay log state, constantly extracting the CSV needed for the next LOAD DATA LOCAL INFILE command. Sometimes, this causes Seconds_Behind_Master to leap hundreds or even thousands of seconds at a time without warning after staying relatively idle.

Running STOP SLAVE; does not help at all. The reason STOP SLAVE; hangs? Steps 1-4 of the Slave will lock the IO Thread until the complete extraction of the CSV file. Even when the extraction has completed, there is the LOAD DATA LOCAL INFILE itself. Running STOP SLAVE; will lock on the SQL Thread running LOAD DATA LOCAL INFILE.

In this crazy paradigm, Replication Lag has to increase steadily. Just look at your Relay_Log_Space. It is 5708184440 (like 5.3 GB). There are multiple LOAD DATA LOCAL INFILE commands just waiting to execute.

Look at the how Seconds_Behind Master increases

  • LOAD DATA LOCAL INFILE is executed on the Master
  • The CSV needed in loaded into Binary Logs
  • The CSV needed in unloaded from Relay Logs
  • LOAD DATA LOCAL INFILE is executed on the Slave

If LOAD DATA LOCAL INFILE takes 2 minutes, double that number and add the taken to ship the CSV file through the MySQL Replication process. You may need to come up with a different method of loading data that does not use LOAD DATA LOCAL INFILE.

UPDATE 2013-04-05 15:00 EDT

If MySQL Replication continues falling behind (Seconds_Behind_Master keeps increasing) while the Master keeps getting log-jammed with small LOAD DATA LOCAL INFILE commands, there is only one more thing I could suggest: Most people never touch sync_binlog, which is normally zero. What effect can this have on replicating LOAD DATA LOCAL INFILE ?

According to the Documentation on sync_binlog, this setting can be used to flush binlog changes to disk. Since it is 0 by default, your Master DB Server is at the mercy of the OS because OS is responsible for flusihng binlog changes. When you set sync_binlog to 1, everything may actually change for better. How?

Here is what is probably happening when sync_binlog is 0 on a Master:

  • You run LOAD DATA LOCAL INFILE
  • mysqld on Master writes the command to the binlog
  • mysqld on Master writes the entire CSV file into the binlogs
  • mysqld on Master leaves it to the OS to flush binlogs changes
  • Slave reads all binlogs info from Master except the last binlog that the Master's OS did not flush
  • Slave status shows it is trying to retrieve the remaining info

This is how sync_binlog can hopefully improve things:

  • You run SET GLOBAL sync_binlog = 1;
  • You run LOAD DATA LOCAL INFILE
  • You run SET GLOBAL sync_binlog = 0;
  • mysqld on Master writes the command to the binlog
  • mysqld on Master writes the entire CSV file into the binlogs
  • mysqld on Master flushes every write to the binlogs because sync_binlog = 1
  • Slave reads all binlogs info from Master
  • Slave status should show it has read every needed binlog

Give it a Try !!!

UPDATE 2013-04-09 11:23 EDT

If you have a low-to-moderate amount of writes (INSERT, UPDATE, DELETE, and ALTER TABLE) in the Master DB Server, leaving sync_binlog at 1 may not be a bad idea. You would then need to do the following:

STEP 01) On the Slave, run STOP SLAVE;

STEP 02) On the Master, add sync_binlog=1 to /etc/my.cnf:

[mysqld]
sync-binlog=1

STEP 03) Run one of the following on the Master:

SET GLOBAL sync_binlog = 1;

or

service mysql restart

STEP 04) On the Slave, run START SLAVE;

Give it a Try !!!