Mysql – Replication stops with GTID_NEXT error after creation/drop of memory table in thesql5.6

gtidmemory-optimized-tablesmysql-5.6replicationupgrade

We have recently upgraded to mysql5.6.25 from mysql5.5.x/mysql5.1.x on our mysql-cluster.
Below is a brief snapshot of our architecture.
enter image description here

Since we have upgraded and enabled gtid-mode we have been intermittently getting slave errors similar to :

Last_SQL_Error: Error 'When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'd7e8990d-3a9e-11e5-8bc7-22000aa63d47:1466'.' on query. Default database: 'adplatform'. Query: 'create table X_new like X'

Our observations are as below..

  • These slave errors are resolved simply by restarting the slave.
  • Such errors are always with Create/Drop of tables which have Memory Storage Engine.
  • Errors on Complete-Slave(B) show up continuously at a fixed minute (39th) of the hour and have been repeating since we have upgraded, almost a week.
  • Errors on Complete-Slave as well as Partial slave are observed whenever its master is restarted.
  • Cluster-1 and Cluster-2 have centos machines and Cluster-3 have ubuntu-machines. Slaves on centos machines also fail with the same error whenever its master(C/D) is restarted, but slave on ubuntu machines do not fail!!.

We have temporarily been able to live with this issue by setting up an action-script on our monitoring system which fires on slave error alert on any machine.

A look into gtid_next section in replication-options doc of mysql tells following

Prior to MySQL 5.6.20, when GTIDs were enabled but gtid_next was not
AUTOMATIC, DROP TABLE did not work correctly when used on a
combination of nontemporary tables with temporary tables, or of
temporary tables using transactional storage engines with temporary
tables using nontransactional storage engines. In MySQL 5.6.20 and
later, DROP TABLE or DROP TEMPORARY TABLE fails with an explicit error
when used with either of these combinations of tables. (Bug #17620053)

This seems related to my issue but still doesn't not explain my scenario.
Any hints/direction to solve the issue would be greatly appreciated…

EDIT :
I managed to find a similar recently reported bug in mysql(#77729), description of which is as follows :

https://bugs.mysql.com/bug.php?id=77729

When you have table with Engine MEMORY working on replication master,
mysqld injects "DELETE" statement in binary logs on first access query
to this table. This insures consistency of data on replicating slaves.

If replication is GTID ROW based, this inserted "DELETE" breaks
replication. Logged event is in STATEMENT format and do not generate
correct SET GTID_NEXT statements in binary log.

Unfortunately, the status of this bug is marked as Can't Repeat

Best Answer

I encountered a similar problem that generated the same error. On a master instance I dropped a database.

On a slave there was a file left over <tablename>.exp (I forget what purpose this serves).

The slave replication died with

last_Error: Error 'DROP DATABASE failed; some tables may have been dropped but the database directory remains. The GTID has not been added to GTID_EXECUTED and the statement was not written to the binary log. Fix this as follows: (1) remove all files from the database directory ./db_to_drop/; (2) SET GTID_NEXT='c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066506'; (3) DROP DATABASE db_to_drop.' on query. Default database: 'db_to_drop'. Query: 'drop database db_to_drop'

I removed the file, ran the SET and restarted replication:

SET GTID_NEXT='c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066506'; start slave; ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066506'.

And this, from following the suggestion. Since it's only a development cluster, I wasn't worried about losing it entirely, so I punted and tried setting it to the next GTID.

SET GTID_NEXT='c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066507';

(i.e. + 1). Restarted replication and things got back to normal.