MySQL – Error 1236 ‘Found old binary log without GTIDs’ after master restart

MySQL, replication

So after quite a few attempts to build a slave and getting the error above, I decided to start from scratch with no binlogs: on the latest slave rebuild I issued PURGE BINARY LOGS BEFORE NOW() on the master to get rid of any binlogs without GTIDs, then made a full dump with innobackupex and moved it to the slave.
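Roughly, the rebuild looked like this (a sketch only: host name, credentials, paths and the GTID set are placeholders, not the real values):

-- on the master: throw away every binlog written before GTID mode was enabled
PURGE BINARY LOGS BEFORE NOW();

# on the master: take a full physical backup and prepare it
innobackupex /backups/
innobackupex --apply-log /backups/<timestamp>/

# copy the prepared backup into the slave's datadir, fix ownership, start mysqld on the slave

-- on the slave: seed the GTID state recorded by the backup in xtrabackup_binlog_info,
-- then let replication auto-position itself
RESET MASTER;
SET GLOBAL gtid_purged = '<gtid set from xtrabackup_binlog_info>';
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_AUTO_POSITION = 1;
START SLAVE;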
I started the slave up and it worked fine…
Right up to the point where I always get this exact error: when I restart the master.
So after a mysql restart on the master, the slave of course lost its connection, and after reconnecting I was greeted with the usual:

"Got fatal error 1236 from master when reading data from binary log: 'Found old binary log without GTIDs while looking for the oldest binary log that contains any GTID that is not in the given gtid set'"

This is getting ridiculous: I have rebuilt the slave 4 times in 3 days.

Here's my my.cnf:

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

[mysqld_safe]
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
nice            = 0
open_files_limit        = 65535

[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /srv/mysql
tmpdir          = /tmp
lc-messages-dir = /usr/share/mysql
explicit_defaults_for_timestamp

bind-address    = xxxxxxxxx

key_buffer              = 64M
max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 16
myisam-recover         = BACKUP
table_open_cache                = 512
open_files_limit        = 65535
interactive_timeout=180
wait_timeout=180

query_cache_limit       = 1M
query_cache_size        = 256M

slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 10
max_binlog_size         = 100M
expire_logs_days        = 10

log-error       = /var/log/mysql/error.log
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

symbolic-links=0

innodb_file_per_table
innodb_additional_mem_pool_size = 20M
innodb_buffer_pool_size         = 24G
innodb_file_format              = "Barracuda"

binlog-format=MIXED
log-slave-updates=true
log-bin
gtid-mode=on
server-id=1
enforce-gtid-consistency=true

[mysqldump]
quick
quote-names
max_allowed_packet      = 32M

[mysql]

[isamchk]
key_buffer              = 16M

At the time of the master restart, the slave was already about 4000 seconds behind (that's what I was trying to solve when I got this error, again).

In case you are wondering, the last time I rebuilt the whole thing was last night, so it is not a case of binlogs going missing due to expiry.

Does anyone have any idea?

Thanks

Best Answer

Something is not right about your process.

Usually, when I see error 1236, it looks like the example below, which I used in my old post How can you monitor if MySQL binlog files get corrupted?

[ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236).
[ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position', Error_code: 1236
111014 20:25:48 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.001067', position 183468345.

Here was the situation: when doing MySQL Replication without GTID, the IO Thread requests a position in the latest Master binlog. If Read_Master_Log_Pos is greater than the actual file size of that binlog, you get error 1236.
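You can see that mismatch for yourself with something like this:

-- on the Slave: which Master binlog and position the IO Thread is reading from
SHOW SLAVE STATUS\G    -- note Master_Log_File and Read_Master_Log_Pos

-- on the Master: the real size of each binlog
SHOW BINARY LOGS;      -- a File_size smaller than Read_Master_Log_Pos means error 1236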

When doing MySQL Replication with GTID, the situation is somewhat similar: the IO Thread is looking for closure with regard to the last GTID it was using. When you restarted MySQL on the Master, you closed the last binlog on the Master and opened a new one on startup, while the IO Thread on the Slave was still active. That is why the same error number comes up.
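If you want to see where the two sides disagree, compare the GTID sets directly (a quick sketch; your actual sets will differ):

-- on the Master: everything it has executed, and what is no longer in its binlogs
SELECT @@GLOBAL.gtid_executed;
SELECT @@GLOBAL.gtid_purged;

-- on the Slave: what the IO Thread has fetched vs. what the SQL Thread has applied
SHOW SLAVE STATUS\G    -- compare Retrieved_Gtid_Set and Executed_Gtid_Set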

The next time you restart a Master, remember the Slaves are active.

The slave should have reconnected after a minute, but that is not happening for you.

To play it safe, you should do the following (spelled out as commands right after the list):

  • On the Slave, STOP SLAVE;
  • On the Master, service mysql restart
  • On the Slave, START SLAVE;
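In terms of actual commands, that is simply:

-- on the Slave
STOP SLAVE;

# on the Master
service mysql restart

-- back on the Slave
START SLAVE;
SHOW SLAVE STATUS\G    -- check that Slave_IO_Running and Slave_SQL_Running are both Yes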

You should not have to do this. As an alternative, try setting up replication with heartbeat set at one tenth of a second:

CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD = 0.1;

This should make the IO Thread on the Slave a little more sensitive.
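Afterwards you can confirm on the Slave that the heartbeat is in effect (5.6 exposes these as status variables):

-- on the Slave, after START SLAVE
SHOW GLOBAL STATUS LIKE 'Slave_heartbeat_period';    -- should reflect the 0.1 you set
SHOW GLOBAL STATUS LIKE 'Slave_received_heartbeats'; -- should keep climbing while the Master is idle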