Mysql – How to resolve the master server shut down/unavailability in thesql with master – slave replication

MySQLreplication

Let us suppose that we have a master/slave setup and the application is running fine for the time being. All of a sudden the master server suddenly shuts down due to some problem.

Now how can I address this situation? What am I thinking is that:

  1. stop the slave on slave
  2. get the show master status on slave.
  3. get the additional trasactions that were not replicated from the masters binary log
  4. execute them on the slave
  5. point the master to the applicatin and start as master.
  6. recover the original master and make this as slave.

Is there anything else that adds more consistency?

Best Answer

Before you perform any mysqldump to fully restore a Slave, you should consult the output of SHOW SLAVE STATUS\G. Let's start with a sample SHOW SLAVE STATUS\G:

mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 10.48.20.253
                Master_User: replicant
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000254
        Read_Master_Log_Pos: 858190247
             Relay_Log_File: relay-bin.066069
              Relay_Log_Pos: 873918
      Relay_Master_Log_File: mysql-bin.000254
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 858190247
            Relay_Log_Space: 873772
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

Please notice that there are two sets of replication coordinates from the Master

  • (Master_Log_File,Read_Master_Log_Pos)
  • (Relay_Master_Log_File,Exec_Master_Log_Pos)

There is a major difference between them

  • (Master_Log_File,Read_Master_Log_Pos) tells you the last binlog statement from the Master's log file and log position that the Slave read from the Master and placed in its Relay Logs.
  • (Relay_Master_Log_File,Exec_Master_Log_Pos) tells you the last binlog statement from the Master's log file and log position that the Slave read from the Master and placed in its Relay Logs THAT IS NEXT TO BE EXECUTED ON THE SLAVE.

The timestamps from these two coordinates helps you figure out Seconds_Behind_Master.

Knowing these things, here is what you can do:

  • Step 01) Run SHOW SLAVE STATUS\G
  • Step 02) Get Relay_Master_Log_File,Exec_Master_Log_Pos from SHOW SLAVE STATUS\G (In the sample, that would be (mysql-bin.000254,858190247)
  • Step 03) STOP SLAVE;
  • Step 04) CHANGE MASTER TO master_log_file='mysql-bin.000254',master_log_pos=858190247;
  • Step 05) START SLAVE;
  • Step 06) Wait 10 seconds
  • Step 07) Run SHOW SLAVE STATUS\G and check Seconds_Behind_Master

If the Seconds_Behind_Master is a number and eventaully drops to zero, replication is fully reesatablished.

After doing all this, if replication breaks because of a corrupt binary log from the master, then you do the last resort:

  • Steo 01) On the Master, RESET MASTER; to erase all binary logs and start with a new one
  • Step 02) On the Master, run this

This create proper dump for the slave

echo "STOP SLAVE;" > /root/MySQLData.sql
mysqldump --all-databases --routines --triggers --flush-privileges --master-data=1 >> /root/MySQLData.sql
echo "START SLAVE;" >> /root/MySQLData.sql
  • Step 03) scp /root/MySQLData.sql over to the Slave and Load it in MySQL on the Slave

Give it a Try !!!