MySql replication Failure

MySQLmysql-5.7replication

I have a mysql (Ver 14.14 Distrib 5.7.29, for Linux (x86_64)) primary -> secondary replication setup. Initially the replication was working successfully until we had to restart the primary database. Since then replication is failing. We have tried to re-enable replication a few times but after a few minutes replication fails again.

my.cnf file -> primary

# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
log-bin=mysql-bin
server-id=1
datadir=/mysqldb/datadir/mysql
socket=/mysqldb/datadir/mysql/mysql.sock
log-error=/var/log/mysql/mysql_error.log
log-syslog=ON
max_connections=1000

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[client]
port=3306
socket=/mysqldb/datadir/mysql/mysql.sock

my.cnf file -> secondary

# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
server-id=2
datadir=/mysqldb/datadir/mysql
socket=/mysqldb/datadir/mysql/mysql.sock
log-error=/var/log/mysql/mysql_error.log
log-syslog=ON
max_connections=1000

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[client]
port=3306
socket=/mysqldb/datadir/mysql/mysql.sock

show status slave\G -> secondary

Slave_IO_State: Waiting for master to send event
                  Master_Host: <ip-address>
                  Master_User: <username>
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 336473455
               Relay_Log_File: avt-mysql-sc-prod-2-relay-bin.000002
                Relay_Log_Pos: 187164
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table store_metadata.access_token; Can't find record in 'access_token', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000003, end_log_pos 336136737
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 336136333
              Relay_Log_Space: 524507
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Delete_rows event on table store_metadata.access_token; Can't find record in 'access_token', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000003, end_log_pos 336136737
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 7c367a1b-de48-11e9-a49d-42010a164351
             Master_Info_File: /mysqldb/datadir/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 200122 17:37:02
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

We then try to re-enable replication again by:

stop slave;

CHANGE MASTER TO
       MASTER_HOST='<ip-address>',
       MASTER_USER='<username>',
       MASTER_PASSWORD='<password>',
       MASTER_LOG_FILE='mysql-bin.000003',
       MASTER_LOG_POS=336562088;

start slave;

where

MASTER_LOG_POS=;

comes from

show master status\G
File: mysql-bin.000003
         Position: 336562088

after a few minutes replication fails
error_log -> secondary

2020-01-22T17:32:45.562685Z 1524 [Note] Slave I/O thread for channel '': connected to master 'replica1@<ip-address>:3306',replication started in log 'mysql-bin.000003' at position 335949489
2020-01-22T17:37:02.294105Z 1525 [ERROR] Slave SQL for channel '': Could not execute Delete_rows event on table store_metadata.access_token; Can't find record in 'access_token', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000003, end_log_pos 336136737, Error_code: 1032
2020-01-22T17:37:02.294180Z 1525 [Warning] Slave: Can't find record in 'access_token' Error_code: 1032
2020-01-22T17:37:02.294193Z 1525 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000003' position 336136333.

I have also noticed the following connection error on the primary database. However, this one occurs once the replication fails. Or could it be that the connection terminates and therefore the replication fails.

mysqld.log -> primary

2020-01-22T21:10:13.065441Z 697 [Note] Start binlog_dump to master_thread_id(697) slave_server(2), pos(mysql-bin.000003, 347432570)
2020-01-22T21:53:31.113366Z 678 [Note] Aborted connection 678 to db: 'store_metadata' user: 'avtadmin' host: '<hostname>-f6297368e00118f33f8e74e464.interna' (Got an error reading communication packets)

Additional testing information

On Slave

stop slave;
drop database;
create database;

On Master

flush tables with read lock;
show master status\G
mysqldump -h <ip-address> --single-transaction --max_allowed_packet=1024m --databases store_metadata> /tmp/mysqlsnapshot-replication.sql
unlock tables;

On Slave

mysql -h<ip-address> -u <username> -p  store_metadata < "/tmp/mysqlsnapshot-replication.sql"
show databases;
use store_metadata;
show tables;

Enable Replication

CHANGE MASTER TO
       MASTER_HOST=‘<ip-address>’,
       MASTER_USER=‘<username>’,
       MASTER_PASSWORD=‘<password>’,
       MASTER_LOG_FILE='mysql-bin.000003',
       MASTER_LOG_POS=364656552;

start slave;
show slave status\G
tail -f /var/log/mysqld.log

On Master

show master status\G
tail -f /var/log/mysqld.log

Initially all was working fine. I then added a new database to master but the database never got replicated to the secondary DB. I then deleted the new database on master and that broke replication again with the error that new database didn’t exist on the secondary DB.
Any thoughts on this?
Should I restart the primary database so that a new binary log gets created?

Thank you.

Best Answer

Basically, this means your slave is now out of date with the master, and it's going to keep being out of data until the slave is fully synced. You might have to start a new slave from scratch, and get that synced with the current state of the primary.

This has been answered here before:

https://forums.mysql.com/read.php?26,367180,367785#msg-367785

Related Question