MySQL – Does this make the XtraDB Cluster hang?

Tags: mysql, percona, replication, xtradb, xtradb-cluster

After seeing the following in the error log on the second node:

121003  7:16:06 [Note] WSREP: Member 0 (joiner) synced with group.
121003  7:16:06 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
121003  7:16:06 [Note] WSREP: Synchronized with group, ready for connections
121003  7:16:06 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121003  7:17:08 [Note] WSREP: Skipping empty log_xid: COMMIT
121003  7:17:08 [Note] WSREP: ignoring DDL failure: 0 ALTER TABLE bigdata_queue_campaigns DISABLE KEYS
121003  7:17:08 [Note] WSREP: Skipping empty log_xid: COMMIT
121003  7:17:08 [Note] WSREP: ignoring DDL failure: 0 ALTER TABLE bigdata_queue_campaigns ENABLE KEYS

...

121003 7:31:33 [ERROR] Slave SQL: Error 'Table 'reportingdb.norep_zonebannertmp_bk' doesn't exist' on query. Default database: 'reportingdb'. Query: 'TRUNCATE TABLE norep_zonebannertmp_bk', Error_code: 1146

121003  7:31:33 [Warning] WSREP: RBR event 1 Query apply warning: 1, 1141

121003 7:31:33 [Warning] WSREP: Ignoring error for TO isolated action: source: 84dcb35c-0ce4-11e2-0800-4568aec9a7f3 version: 2 local: 0 state: APPLYING flags: 65 conn_id: 1106 trx_id: -1 seqnos (l: 1196, g: 1141, s: 1140, d: 1140, ts: 1349224295344525000)

I cannot log in to the first node: mysql -u root -p hangs. I don't see anything interesting in the error log on this node.

I'm using Percona-XtraDB-Cluster-server-5.5.27-23.6.356.rhel5.

Let me know if you need further information.

Best Answer

If the table reportingdb.norep_zonebannertmp_bk does not exist on one of the nodes in the cluster, that could be the source of your problem. Why? TRUNCATE TABLE is DDL, and DDL causes an implicit COMMIT (see my StackOverflow post from May 12, 2011). That closes the current transaction PXC has open during write-set operations, and it cannot be rolled back, so I would expect a PXC communication problem at COMMIT time. Had the table existed on every node in the cluster, the TRUNCATE TABLE would have executed without an error. You should check for any bug reports about this, and you should probably submit it as a question to Percona.
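To confirm which nodes are actually missing the table, something along these lines may help. It is only a sketch: the node names in NODES are hypothetical placeholders, and the helper just builds the information_schema query so it can be reused for other tables. A node that hangs on login will also hang here.

```shell
#!/bin/sh
# Hypothetical node names; replace with the real cluster members.
NODES="node1 node2 node3"

# Print a query that returns 1 if schema.table exists on a server, else 0.
table_exists_sql() {
    printf "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='%s' AND table_name='%s';\n" "$1" "$2"
}

# Intended use (needs a reachable server on each node, so it is left commented out):
#   for n in $NODES; do
#       printf '%s: ' "$n"
#       mysql -h "$n" -u root -p -N -e "$(table_exists_sql reportingdb norep_zonebannertmp_bk)"
#   done
```

A node that prints 0 is a candidate for one of the two methods below.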

There are two methods you may want to try to clear this up:

METHOD #1

You could recreate the table on the PXC (Percona XtraDB Cluster) node that is missing it.

Step 01) Run SHOW CREATE TABLE reportingdb.norep_zonebannertmp_bk\G on a node that still has the table
Step 02) Copy and paste the resulting CREATE TABLE statement into /root/Fix.sql
Step 03) Change CREATE TABLE to CREATE TABLE IF NOT EXISTS in /root/Fix.sql
Step 04) Execute /root/Fix.sql on the PXC node that does not have the table
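The four steps can be scripted roughly as follows. This is a sketch under assumptions: GOOD_NODE and BAD_NODE are hypothetical host names, and the helper expects the two-column output (table name, TAB, statement) that the mysql client prints for SHOW CREATE TABLE when run with -N -r rather than with \G.

```shell
#!/bin/sh
# Turn "tablename<TAB>CREATE TABLE ..." into an idempotent statement:
# drop the first column, add IF NOT EXISTS, and terminate with a semicolon.
make_fix_sql() {
    cut -f2- | sed -e '1s/^CREATE TABLE /CREATE TABLE IF NOT EXISTS /' -e '$s/$/;/'
}

# Intended use (GOOD_NODE / BAD_NODE are placeholders; not run here):
#   mysql -h "$GOOD_NODE" -u root -p -N -r \
#         -e 'SHOW CREATE TABLE reportingdb.norep_zonebannertmp_bk' | make_fix_sql > /root/Fix.sql
#   mysql -h "$BAD_NODE" -u root -p reportingdb < /root/Fix.sql
```

Review /root/Fix.sql by hand before running it; the text rewrite assumes the statement starts with CREATE TABLE on its first line.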

METHOD #2

You may have to perform a full SST (State Snapshot Transfer) in the event other tables are out of sync. The most aggressive way to do this is the following:

cd /var/lib/mysql
service mysql stop
rm -f /var/lib/mysql/galera.cache /var/lib/mysql/grastate.dat
service mysql start

Since /var/lib/mysql/galera.cache no longer exists, IST (Incremental State Transfer) cannot be done, so an SST would be initiated when the node starts.
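If deleting the files outright feels too drastic, a slightly more cautious variant is to rename them so they can be inspected later. This is a sketch, not part of the original procedure: the function name and the .bak suffix are made up here, and it assumes Galera only looks for the files under their original names.

```shell
#!/bin/sh
# Rename the Galera state files in the given data directory instead of deleting them.
set_aside_galera_state() {
    datadir=${1:-/var/lib/mysql}
    stamp=$(date +%Y%m%d%H%M%S)
    for f in galera.cache grastate.dat; do
        if [ -e "$datadir/$f" ]; then
            mv "$datadir/$f" "$datadir/$f.$stamp.bak"
        fi
    done
}

# Intended use on the out-of-sync node, as root (not run here):
#   service mysql stop
#   set_aside_galera_state /var/lib/mysql
#   service mysql start
```

Remove the renamed copies once the node is back in the SYNCED state, since galera.cache can be large.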

Related Solutions

MySQL – Percona XtraDB Cluster: How to skip SST when starting

I think you want wsrep_sst_method=skip, not wsrep_sst_mode=skip. There is no wsrep_sst_mode among the wsrep_sst variables:

mysql> show variables like 'wsrep_sst%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| wsrep_sst_auth            |            |
| wsrep_sst_donor           |            |
| wsrep_sst_method          | xtrabackup |
| wsrep_sst_receive_address | AUTO       |
+---------------------------+------------+
4 rows in set (0.00 sec)

mysql>

Percona XtraDB Cluster vs MySQL Replication – Key Differences

"it seems the binlog doesn't include the new inserts"

I'm not sure whether you're saying the binlog actually doesn't include them, and you have confirmed this with mysqlbinlog, or that it "seems" like it doesn't, because they don't replicate.
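One way to settle that question is to decode the binary log and count the inserts in it. The sketch below assumes a hypothetical binlog file name and simply counts lines that look like inserts, either as decoded row events (prefixed with ###) or as plain statements.

```shell
#!/bin/sh
# Count INSERT events in decoded mysqlbinlog output read from stdin.
# Matches both "### INSERT INTO ..." (row format, --verbose) and plain statements.
count_inserts() {
    grep -Eic '^(### )?INSERT INTO'
}

# Intended use (the file name is a placeholder; not run here):
#   mysqlbinlog --base64-output=DECODE-ROWS --verbose /var/lib/mysql/mysql-bin.000001 | count_inserts
```

A count of 0 after running known inserts on another PXC node would suggest that the events really are missing from this node's binlog.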

PXC needs log_slave_updates turned on at the node serving as master to the asynchronous slave; otherwise, not everything will be written to that node's binary log. This is very different from an ordinary MySQL server acting as master, where log_slave_updates does nothing at all (unless the master is itself a slave to another master).
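A quick check of the two relevant settings on the PXC node might look like this. It is a sketch: the function name is made up, and it only parses the tab-separated name/value lines that the mysql client prints with -N.

```shell
#!/bin/sh
# Succeeds only if both log_bin and log_slave_updates are reported as ON.
# Reads "name<TAB>value" lines from stdin.
binlog_ready_for_async_slave() {
    awk -F'\t' '
        $1 == "log_bin"           { bin = $2 }
        $1 == "log_slave_updates" { lsu = $2 }
        END { exit !(bin == "ON" && lsu == "ON") }
    '
}

# Intended use on the PXC node that feeds the asynchronous slave (not run here):
#   mysql -u root -p -N \
#         -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('log_bin','log_slave_updates')" \
#     | binlog_ready_for_async_slave && echo "binlog settings look fine"
```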

If that's not it, remove replicate_do_db and binlog_do_db and all of their related options from your configuration and then remove them from your brain. They should never be added unless you know exactly how they work, in your sleep. The simplest and by far most reliable replication configuration is, and will always be, replicate everything, which is the default.
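To find any such filters before removing them, a simple search of the configuration file is usually enough. The sketch below assumes the options live in a single file such as /etc/my.cnf; included files would need the same check.

```shell
#!/bin/sh
# List active (uncommented) replication and binlog filter options in a config file,
# with line numbers. Accepts both dash and underscore spellings.
list_replication_filters() {
    grep -En '^[[:space:]]*(replicate|binlog)[-_](wild[-_])?(do|ignore)[-_](db|table)' "$1"
}

# Intended use (the path is an assumption; not run here):
#   list_replication_filters /etc/my.cnf
```

No output means no filter of this kind is set in that file.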

Forget about binlog_format on the slave. It makes absolutely no difference unless the slave, itself, has other, subtended slaves... and if the master is using ROW format, the slave will still log in ROW format if you do indeed have it configured with subtended slaves. Also, the slave's binlogs (not to be confused with the relay logs) will not log statements received from an upstream master unless log_slave_updates is enabled on the slave.

The same thing goes for innodb_flush_log_at_trx_commit. It does not impact actual replication. It's a setting that determines a tradeoff between ACID compliance and performance.
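As a reminder of what that tradeoff is, here is a small lookup of the three values as they are commonly documented for InnoDB. The function name is made up for illustration.

```shell
#!/bin/sh
# Describe the durability behaviour of each innodb_flush_log_at_trx_commit value.
describe_flush_setting() {
    case "$1" in
        0) echo "log written and flushed to disk about once per second; nothing is done at commit" ;;
        1) echo "log written and flushed to disk at every commit (default, needed for full ACID compliance)" ;;
        2) echo "log written at every commit, flushed to disk about once per second" ;;
        *) echo "unknown value: $1" >&2; return 1 ;;
    esac
}

# Example: describe_flush_setting 2
```

None of the three values changes what is written to the binary log, which is why the setting is irrelevant to the replication problem.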