Mysql – How to avoid MySQL slave to have errors after crash

MySQLreplication

I have a MySQL master/slave configuration (MariaDB 10.0.17). Sometimes, the slave or the master server crashes. After the crash and restart, I get some errors of duplicated keys on slave. The server get stuck into these errors and I need to take action by deleting the duplicated key entries so it goes on.

How to avoid this or, if not possible, do this "slave recover" automatically?

Best Answer

Regarding de-synchronization of a single or master server: Mosty, the cause of these problems come from the fact that the master is not in full ACID mode or it is using MyISAM tables. In order to avoid problems with a server, only use InnoDB as the engine of your tables and set innodb_flush_log_at_trx commit to 1. This is not efficient -you will lose a lot of performance in a busy server except if you can get advantage of the group commit, but it is the only way to guarantee data consistency with disk (no transaction loss).

Many people, that do not have appropriate hardware (hardware cache), are forced to relax the value of this variable to 0 or 2, but in that case, you (or your hardware) is responsible to assure the availability of the disk/service in case something goes wrong (crash).
The second reason where disk can be out of sync with a master or single server is not using sync_binlog=1. By default, this is set to 0, and that means that you can write data internally, but that doesn't assure that it is written to the binary log. This shouldn't be an issue if you assume the master is always going to be back online, but you must understand that may be an issue. Activating it almost doubles your IOPS requirements, so be careful. It is not usually necessary and I do not recommend you to blindly activate it.
Third, the most common cause of desynchronization is the storage of the replication position in the non-transactional master.info file in the slave. This is the default configuration. I highly recommend changing the configuration of the variables master_info_repository and relay_log_info_repository to use TABLE (InnoDB transactional tables) instead of FILE. This will get rid of the desynchronization problems when the slave crashes.
Additionally, in order to make sure that the master and slave are always in sync, and that no transactions are skipped, I recommend you to use the GTID replication, available in MySQL 5.6/MariaDB 10. While it does not prevent the desync of the master and slave, it makes easier to identify it and resync master and slave. This is a tutorial on how to enable it on MariaDB 10. Please note that GTID implementations between MySQL and MariaDB are not compatible. Once GTID is running, things like failover are greatly simplified.
There could be other reasons for desynchronization, like undeterministic updates, or replication filters. Avoid the later, and you can mitigate the former by using row-based replication binlog_format = ROW.
On a different note, it is important to remark that MySQL default replication protocol is asynchronous, so one can never guarantee the same state on the master and the slave. If slave drift and other associated issues are a no-go, you can try alternatives like the semi-sync plugin for replication or Galera cluster, which both make tighter links between nodes.

Alternatively, I recommend you to both constantly monitor and repair replication problems with tools like pt-heartbeat, pt-table-checksum and pt-table-sync, which will avoid the need to manually provision the slave for every incident you encounter.

In order to have a more accurate answer, I recommend you to hire a consultant so that you can apply the best strategy in your case (some of the above may have an important impact on your performance while not being your specific problem).

Related Solutions

MySQL Slave replication: Need to determine where replication left off to start slave in correct location

The Primary thing you need to do was left in a comment:

Follow the suggestion in Duplicate entry in MySQL slave:
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter=1;
START SLAVE;

However, here is how you can start off at the better position with the CHANGE MASTER TO command.

Let's take a look at a sample SHOW SLAVE STATUS\G

               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.4.16.245
                  Master_User: replicant
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.001527
          Read_Master_Log_Pos: 554619670
               Relay_Log_File: relay-bin.004561
                Relay_Log_Pos: 554619815
        Relay_Master_Log_File: mysql-bin.001527
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB: phpmyadmin
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 554619670
              Relay_Log_Space: 554620007
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:

Please take note of Relay_Master_Log_File and Exec_Master_Log_Pos. These represent the log file and the position of the last SQL statement that was completed on the Master that is next in line to be executed on the Slave.

So, the CHANGE MASTER TO in this case would be:

CHANGE MASTER TO master_log_file='mysql-bin.001527',master_log_pos=554619670;

For your particular case, here is what you need to do on the Slave Server:

Step 01) Start mysql up with replication disabled at startup

$ service mysql restart --skip-slave-start

Step 02) Login to mysql and show the slave status:

mysql> SHOW SLAVE STATUS\G

Step 03) Get Relay_Master_Log_File and Exec_Master_Log_Pos from SHOW SLAVE STATUS\G

Step 04) Run CHANGE MASTER TO command using Relay_Master_Log_File and Exec_Master_Log_Pos

This will erase any relay logs collected and start collecting with a fresh, empty relay log.

Step 05) mysql> START SLAVE;

Step 06) mysql> SHOW SLAVE STATUS\G repeatedly to watch Seconds_Behind_Master go to 0

Step 07) If replication breaks due to Error 1062 (Duplicate Key), now you can implement @DTest's suggestion. Then, goto Step 06. Repeat this until Seconds_Behind_Master go to 0.

We stopped at log 'mysql1-bin.000362' position 3384732.

That's what you use:

CHANGE MASTER TO master_log_file='mysql1-bin.000362'. master_log_pos=3384732;

If you ran RESET SLAVE on the Slave, do not despair. Just run the full syntax version of the CHANGE MASTER TO command:

CHANGE MASTER TO
MASTER_HOST='IP Address of Master',
MASTER_PORT=3306,
MASTER_USER='Replication Username',
MASTER_PASSWORD='Replication Password',
MASTER_LOG_FILE='mysql1-bin.000362',
MASTER_LOG_POS=3384732;

MySQL replication fails due to duplicate keys after master is restarted

Don't stop a master while any slaves are connected to it. If you need to stop it, then stop the slaves first and the master at the end. This way, you won't have any errors like the one you described.

Best Answer

Related Solutions

MySQL Slave replication: Need to determine where replication left off to start slave in correct location

MySQL replication fails due to duplicate keys after master is restarted

Related Question