MySQL – Replication stops with GTID_NEXT error after creation/drop of memory table in MySQL 5.6

gtid, memory-optimized-tables, mysql-5.6, replication, upgrade

We recently upgraded our MySQL cluster from MySQL 5.5.x/5.1.x to MySQL 5.6.25.
Below is a brief snapshot of our architecture.

Since upgrading and enabling gtid-mode, we have intermittently been getting slave errors similar to:

Last_SQL_Error: Error 'When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'd7e8990d-3a9e-11e5-8bc7-22000aa63d47:1466'.' on query. Default database: 'adplatform'. Query: 'create table X_new like X'

Our observations are as follows:

These slave errors are resolved simply by restarting the slave.
Such errors always involve CREATE/DROP of tables that use the MEMORY storage engine.
Errors on Complete-Slave (B) show up consistently at a fixed minute (the 39th) of every hour and have recurred ever since the upgrade, almost a week ago.
Errors on both the Complete-Slave and the Partial-Slave are observed whenever their master is restarted.
Cluster-1 and Cluster-2 run on CentOS machines and Cluster-3 runs on Ubuntu machines. Slaves on the CentOS machines also fail with the same error whenever their master (C/D) is restarted, but slaves on the Ubuntu machines do not fail!

For now we have been able to live with this issue by setting up an action script on our monitoring system that fires on a slave-error alert from any machine.
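A minimal sketch of the decision logic such an action script might use, assuming the script already fetches one `SHOW SLAVE STATUS` row as a dict keyed by column name. The function name `restart_statements` is hypothetical, and treating `Last_SQL_Errno` 1837 as the signature of this failure is an assumption based on the same message appearing as ERROR 1837 in the answer below.

```python
from __future__ import annotations

# Error number MySQL reports together with the
# "When @@SESSION.GTID_NEXT is set to a GTID, ..." message (assumed here).
GTID_NEXT_ERRNO = 1837


def restart_statements(status: dict[str, str]) -> list[str]:
    """Return statements the action script should run, or [] to leave it alone.

    `status` is one row of SHOW SLAVE STATUS keyed by column name.
    """
    sql_thread_stopped = status.get("Slave_SQL_Running") == "No"
    errno = int(status.get("Last_SQL_Errno") or 0)
    if sql_thread_stopped and errno == GTID_NEXT_ERRNO:
        # Restarting the slave is what clears this error in practice.
        return ["STOP SLAVE", "START SLAVE"]
    return []


if __name__ == "__main__":
    row = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "No",
           "Last_SQL_Errno": "1837"}
    print(restart_statements(row))  # prints ['STOP SLAVE', 'START SLAVE']
```

Keying on the error number, rather than restarting on any SQL-thread error, keeps the script from papering over unrelated failures such as duplicate-key errors (1062).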

The gtid_next section of the MySQL replication-options documentation says the following:

Prior to MySQL 5.6.20, when GTIDs were enabled but gtid_next was not
AUTOMATIC, DROP TABLE did not work correctly when used on a
combination of nontemporary tables with temporary tables, or of
temporary tables using transactional storage engines with temporary
tables using nontransactional storage engines. In MySQL 5.6.20 and
later, DROP TABLE or DROP TEMPORARY TABLE fails with an explicit error
when used with either of these combinations of tables. (Bug #17620053)
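The rule quoted above can be written as a small predicate. This is a simplified model for reasoning about which DROP statements are affected, not MySQL's implementation; the `Table` type and `drop_is_rejected` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    temporary: bool
    transactional: bool  # e.g. InnoDB -> True, MEMORY/MyISAM -> False


def drop_is_rejected(tables: list[Table], gtid_next: str) -> bool:
    """Model of the 5.6.20+ rule: reject mixed DROPs when gtid_next is not AUTOMATIC."""
    if gtid_next.strip().upper() == "AUTOMATIC":
        return False
    temp = [t for t in tables if t.temporary]
    # Combination 1: temporary tables mixed with nontemporary tables.
    mixes_temp_and_regular = 0 < len(temp) < len(tables)
    # Combination 2: transactional and nontransactional temporary tables.
    mixes_temp_engines = len({t.transactional for t in temp}) > 1
    return mixes_temp_and_regular or mixes_temp_engines
```

A single-table DROP never trips either condition, which fits the observation that this rule alone does not explain the failures seen here.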

This seems related to my issue but still doesn't explain my scenario.
Any hints or direction toward solving the issue would be greatly appreciated…

EDIT:
I found a similar, recently reported MySQL bug (#77729), described as follows:

https://bugs.mysql.com/bug.php?id=77729

When you have table with Engine MEMORY working on replication master,
mysqld injects "DELETE" statement in binary logs on first access query
to this table. This insures consistency of data on replicating slaves.

If replication is GTID ROW based, this inserted "DELETE" breaks
replication. Logged event is in STATEMENT format and do not generate
correct SET GTID_NEXT statements in binary log.
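One way to look for the symptom the bug report describes is to scan decoded binlog text for statements that are not covered by a preceding `SET @@SESSION.GTID_NEXT`. This is a rough sketch over a simplified, line-per-statement view of mysqlbinlog output (an assumption; real output has many more event types), and `statements_without_gtid` is a hypothetical helper. The sample `DELETE` in the usage below is illustrative, not real output from this system.

```python
from __future__ import annotations

import re

# Simplified view of mysqlbinlog text output (an assumption, not the full format).
GTID_NEXT_RE = re.compile(r"^SET @@SESSION\.GTID_NEXT=\s*'([^']+)'", re.IGNORECASE)
DELIMITER = "/*!*/;"
STATEMENT_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "CREATE", "DROP", "ALTER"}


def statements_without_gtid(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line index, statement) for statements logged with no GTID."""
    flagged: list[tuple[int, str]] = []
    assigned = False  # a specific GTID is waiting to be used
    in_txn = False    # between BEGIN and COMMIT
    for i, raw in enumerate(lines):
        stmt = raw.strip()
        if stmt.endswith(DELIMITER):
            stmt = stmt[: -len(DELIMITER)]
        stmt = stmt.rstrip(";").strip()
        match = GTID_NEXT_RE.match(stmt)
        if match:
            assigned = match.group(1).upper() != "AUTOMATIC"
            in_txn = False
            continue
        keyword = stmt.split(None, 1)[0].upper() if stmt else ""
        if keyword == "BEGIN":
            in_txn = True
        elif keyword == "COMMIT":
            assigned = in_txn = False
        elif keyword in STATEMENT_KEYWORDS:
            if not assigned:
                flagged.append((i, stmt))
            elif not in_txn:
                # An autocommitted statement (such as DDL) uses up the GTID.
                assigned = False
    return flagged
```

Fed a GTID-tagged `create table X_new like X` followed by an untagged `DELETE FROM` on the MEMORY table, it flags only the `DELETE`.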

Unfortunately, the status of this bug is marked as Can't Repeat…

Best Answer

I encountered a similar problem that generated the same error. On a master instance I dropped a database.

On the slave, a leftover file, <tablename>.exp, remained in the database directory (I forget what purpose it serves).

Replication on the slave died with:

last_Error: Error 'DROP DATABASE failed; some tables may have been dropped but the database directory remains. The GTID has not been added to GTID_EXECUTED and the statement was not written to the binary log. Fix this as follows: (1) remove all files from the database directory ./db_to_drop/; (2) SET GTID_NEXT='c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066506'; (3) DROP DATABASE db_to_drop.' on query. Default database: 'db_to_drop'. Query: 'drop database db_to_drop'
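The recovery procedure spelled out in that error message can be sketched as an ordered list of statements. The helper `drop_database_fix` is hypothetical; step (1) is a filesystem action and is left out, and the closing `SET GTID_NEXT='AUTOMATIC'` is added on the assumption, taken from the ERROR 1837 text, that the session's GTID_NEXT must be changed once the GTID has been used.

```python
from __future__ import annotations

import re

# A single GTID: 36-character source UUID, a colon, a transaction number.
GTID_PATTERN = re.compile(r"^[0-9a-fA-F-]{36}:\d+$")


def drop_database_fix(gtid: str, database: str) -> list[str]:
    """Statements for steps (2) and (3) of the error message, then cleanup.

    Step (1), removing leftover files from ./<database>/, happens on the
    filesystem and is not represented here.
    """
    if not GTID_PATTERN.match(gtid):
        raise ValueError(f"not a single GTID: {gtid!r}")
    if not re.fullmatch(r"\w+", database):
        raise ValueError(f"unexpected database name: {database!r}")
    return [
        f"SET GTID_NEXT='{gtid}'",      # step (2)
        f"DROP DATABASE `{database}`",  # step (3) uses up that GTID
        "SET GTID_NEXT='AUTOMATIC'",    # change GTID_NEXT after it is used
        "START SLAVE",
    ]
```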

I removed the file, ran the SET and restarted replication:

SET GTID_NEXT='c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066506';
start slave;
ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066506'.

That error is what I got from following the suggestion. Since it's only a development cluster, I wasn't worried about losing it entirely, so I punted and tried setting GTID_NEXT to the next GTID.

SET GTID_NEXT='c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066507';

(i.e., +1). I restarted replication and things got back to normal.
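The "+1" step is arithmetic on the transaction number after the last colon of a GTID (`source_id:transaction_id`). A minimal sketch; `next_gtid` is a hypothetical helper that handles a single GTID only, not GTID sets with ranges.

```python
def next_gtid(gtid: str) -> str:
    """Return source_id:(transaction_id + 1) for a single GTID."""
    source_id, sep, transaction_id = gtid.strip().rpartition(":")
    if not sep or not source_id or not transaction_id.isdigit():
        raise ValueError(f"expected 'source_id:transaction_id', got {gtid!r}")
    return f"{source_id}:{int(transaction_id) + 1}"


print(next_gtid("c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066506"))
# prints c62b6474-84e6-11e8-bf49-00164ef9ea6c:73066507
```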

I have very bad news for you.

You should not have deleted the ibdata1 file. Here is why:

ibdata1 contains four types of information:

table metadata
MVCC data
data pages (with innodb_file_per_table disabled)
index pages (with innodb_file_per_table disabled)

Each InnoDB table created gets a numerical id, assigned to its .ibd file through an auto-increment-like metadata mechanism. That internal tablespace id (ITSID) is embedded in the .ibd file, and that number is checked against the list of ITSIDs maintained in, guess where... ibdata1.
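To see which ITSID a given .ibd file carries, one can read it from the file's first page. This sketch assumes the classic InnoDB page-0 layout, where the space id is stored big-endian at byte offset 34 (FIL page header) and again at offset 38 (start of the FSP header); `ibd_space_id` and `ibd_space_id_from_file` are hypothetical helpers, and they only read the .ibd side, not the list kept in ibdata1.

```python
from __future__ import annotations

import struct

# Offsets within page 0 of an InnoDB tablespace (assumed classic layout).
FIL_PAGE_SPACE_ID = 34  # space id in the FIL page header
FSP_SPACE_ID = 38       # space id at the start of the FSP header


def ibd_space_id(header: bytes) -> int:
    """Return the tablespace id stored in the first page of an .ibd file."""
    if len(header) < FSP_SPACE_ID + 4:
        raise ValueError("need at least the first 42 bytes of page 0")
    (fil_id,) = struct.unpack_from(">I", header, FIL_PAGE_SPACE_ID)
    (fsp_id,) = struct.unpack_from(">I", header, FSP_SPACE_ID)
    if fil_id != fsp_id:
        raise ValueError(f"inconsistent page 0: {fil_id} != {fsp_id}")
    return fil_id


def ibd_space_id_from_file(path: str) -> int:
    # Read only as much of the file as the header check needs.
    with open(path, "rb") as f:
        return ibd_space_id(f.read(FSP_SPACE_ID + 4))
```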

I also have very good news for you along with some bad news.

It is possible to reconstruct ibdata1 with the correct ITSIDs, but it takes work. While I have not done this procedure alone, I assisted a client of my employer's web hosting service in doing it. We figured it out together, but since the client hosed ibdata1, I let him do most of the work (30 InnoDB tables).

Anyway, here is a past post I made on DBA StackExchange, answering another question whose root cause was mixed-up ITSIDs.

To cut right to the chase, here is the article explaining what to do with reference to ITSID and how to massage ibdata1 into acknowledging the presence of the ITSID contained within the .ibd file.

I am sorry there is no quick-and-dirty method for recovering the .ibd file other than playing games with ITSIDs.

UPDATE 2011-10-17 06:19 EDT

Here is your original innodb configuration from your question:

innodb_file_per_table=1
innodb_flush_method=O_DIRECT
innodb_log_file_size=1G
innodb_buffer_pool_size=4G
innodb_data_file_path=ibdata1:10M:autoextend
innodb_buffer_pool_size = 384M
innodb_log_file_size=5M
innodb_lock_wait_timeout = 18000

Please notice that innodb_log_file_size is there twice. Look carefully...

innodb_file_per_table=1
innodb_flush_method=O_DIRECT
innodb_log_file_size=1G <----
innodb_buffer_pool_size=4G
innodb_data_file_path=ibdata1:10M:autoextend
innodb_buffer_pool_size = 384M
innodb_log_file_size=5M <----
innodb_lock_wait_timeout = 18000

The last setting of innodb_log_file_size takes precedence, so MySQL expected to start up with 5M log files. Your ib_logfile0 and ib_logfile1 were 1G when you tried to start mysqld. It saw the size conflict and took the path of least resistance, which was to disable InnoDB. That's why InnoDB was missing from SHOW ENGINES;. Mystery solved!!!
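Duplicate settings like these are easy to catch mechanically. A rough sketch, assuming a flat option file; `option_values` and `duplicated` are hypothetical helpers, section headers are simply skipped, and dashes and underscores in option names are treated as equivalent, as mysqld does.

```python
from __future__ import annotations


def option_values(cnf_text: str) -> dict[str, list[str]]:
    """Collect every value assigned to each option, in file order."""
    values: dict[str, list[str]] = {}
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and [section] headers.
        if not line or line[0] in "#;[":
            continue
        name, sep, value = line.partition("=")
        key = name.strip().replace("-", "_")
        values.setdefault(key, []).append(value.strip() if sep else "")
    return values


def duplicated(values: dict[str, list[str]]) -> dict[str, str]:
    """Map each option set more than once to its effective (last) value."""
    return {key: vals[-1] for key, vals in values.items() if len(vals) > 1}
```

Run against the configuration above, it reports `innodb_log_file_size` with effective value 5M and also `innodb_buffer_pool_size` with effective value 384M, since the buffer pool size is set twice as well (4G, then 384M).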

UPDATE 2011-10-17 11:07 EDT

The error message was deceptive: innodb_log_file_size (5M) was smaller than the existing log files (ib_logfile0 and ib_logfile1), which were 1G at the time. What's interesting is this: corruption was reported because the files were expected to be 5M and were bigger. If the situation were reversed, with the InnoDB log files smaller than the size declared in my.cnf, you should get something like this in the error log:

110216 9:48:41 InnoDB: Initializing buffer pool, size = 128.0M
110216 9:48:41 InnoDB: Completed initialization of buffer pool
InnoDB: Error: log file ./ib_logfile0 is of different size 0 5242880 bytes
InnoDB: than specified in the .cnf file 0 33554432 bytes!
110216 9:48:41 [ERROR] Plugin 'InnoDB' init function returned error.
110216 9:48:41 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

In this example, the log files were already existing as 5M and the setting for innodb_log_file_size was bigger (in this case, 32M).
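The byte counts in that log excerpt follow from the K/M/G suffixes being powers of 1024: 5M = 5 × 1024 × 1024 = 5242880 and 32M = 32 × 1024 × 1024 = 33554432. A minimal sketch of that size check, assuming only plain integers with an optional K/M/G suffix; `parse_size` and `log_size_mismatch` are hypothetical helpers, not part of MySQL.

```python
from __future__ import annotations

_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a my.cnf size such as '5M' or '1G' to bytes (1K = 1024)."""
    text = value.strip().upper()
    suffix = text[-1] if text and text[-1] in "KMG" else ""
    number = text[: len(text) - len(suffix)]
    if not number.isdigit():
        raise ValueError(f"unsupported size: {value!r}")
    return int(number) * _UNITS[suffix]


def log_size_mismatch(configured: str, actual_bytes: int) -> str | None:
    """Describe a mismatch between innodb_log_file_size and a log file's size."""
    expected = parse_size(configured)
    if expected == actual_bytes:
        return None
    return (f"log file is {actual_bytes} bytes but "
            f"innodb_log_file_size={configured} means {expected} bytes")
```

For the example above, `log_size_mismatch("32M", 5242880)` returns a message, while `log_size_mismatch("5M", 5242880)` returns `None`.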

For this particular question, I blame MySQL (eh Oracle [still hate saying it]) for the inconsistent error message protocol.


Related Solutions

MySQL – Is MySQL Replication appropriate for keeping a laptop in sync

MySQL – Error 'Unknown table engine 'InnoDB'' on query after restarting MySQL
