MySQL – pt-table-sync error: Called not_in_left in state 0


I have set up MySQL replication between two servers using Percona XtraBackup:

Master is a MySQL 5.0.91 Community Edition (CentOS 4.8)

Slave is a MySQL 5.1.68 Community Edition (CentOS 6.4)

When I started the slave, some replicated queries were blocked because of unknown "temp" tables.
I used SQL_SLAVE_SKIP_COUNTER a few times to skip over the problem, and now that replication has caught up, I am trying to resync the tables.
Two tables are out of sync, and I am using pt-table-sync to resync them.

The first table was resynced without any problems (a few UPDATEs to replay).

But the second table, a huge one (57 GB), gives me this error after a while (anywhere from a few minutes to a few hours):

pt-table-sync --verbose --execute -uroot -p h=10.2.0.1,D=MYDB,t=MyTable h=10.2.0.2

# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
Called not_in_left in state 0 at /usr/bin/pt-table-sync line 5500.  while doing MYDB.MyTable on 10.2.0.2
#      0       0      0      0 0         16:38:24 19:08:17 1    MYDB.MyTable

Note that I launch pt-table-sync from a third server on the local network.

I can't find much information about this error.
What would you recommend to help me solve this problem?

Best Answer

You could try:

  • Checking the MySQL error log after you attempt the sync. It may reveal some consistency issues with the table that you weren't aware of
  • Working around this table for now (using --ignore-tables), and coming back to it later
  • Trying different combinations of pt-table-sync options (--algorithms, --chunk-size) for the problem table; this might fix the problem, or shed more light on the underlying issue
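As a rough sketch of the first workaround, the Python snippet below assembles a pt-table-sync command that compares the whole MYDB database between the two hosts while skipping the problem table via pt-table-sync's --ignore-tables filter. It assumes pt-table-sync is on the PATH and that credentials are supplied separately (for example through a MySQL option file); the helper name build_skip_command is hypothetical. By default it uses --print, which shows the SQL without running it, so the output can be reviewed before switching to --execute.

```python
import shlex


def build_skip_command(source, replica, database, skip_table, execute=False):
    """Return argv for syncing every table of `database` from `source` to
    `replica`, except `skip_table` (given as database.table).

    Hypothetical helper; credentials are assumed to come from elsewhere.
    """
    return [
        "pt-table-sync",
        "--verbose",
        # --print only shows the statements; --execute applies them.
        "--execute" if execute else "--print",
        "--databases", database,
        "--ignore-tables", skip_table,
        f"h={source}",
        f"h={replica}",
    ]


if __name__ == "__main__":
    argv = build_skip_command("10.2.0.1", "10.2.0.2", "MYDB", "MYDB.MyTable")
    # Print the shell-quoted command so it can be inspected before running.
    print(shlex.join(argv))
```

Once the printed statements look right, the same helper with execute=True produces the command that actually applies them (for instance via subprocess.run(argv, check=True)).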

I received this error recently, and the underlying issue turned out to be a corrupt table in need of repair on the master server.

I discovered the problem by forcing pt-table-sync to use the "nibble" algorithm for my problem table (--algorithms nibble). Instead of the cryptic error, I got a specific error along with the SQL statement that caused it; when I then executed that statement directly on the server, it led me straight to my problem.
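One way to reproduce that workflow is to run the nibble pass with --print instead of --execute, save the output to a file, and replay the statements one at a time so that the failing one surfaces with MySQL's own error message. The sketch below covers only the extraction step: it pulls the data-changing statements out of the saved --print output. It assumes each statement is printed on a single line beginning with DELETE, REPLACE, INSERT or UPDATE; extract_statements and the file name nibble.sql are hypothetical.

```python
import sys

# Statement types pt-table-sync uses to resolve differences
# (the same four columns as its --verbose summary).
CHANGE_VERBS = ("DELETE", "REPLACE", "INSERT", "UPDATE")


def extract_statements(lines):
    """Yield SQL statements from saved `pt-table-sync --print` output.

    Assumes one statement per line; '#' summary lines and anything that
    does not start with a change verb are skipped.
    """
    for line in lines:
        stripped = line.strip()
        if stripped.upper().startswith(CHANGE_VERBS):
            yield stripped


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   pt-table-sync --print --algorithms Nibble \
    #       h=10.2.0.1,D=MYDB,t=MyTable h=10.2.0.2 > nibble.sql
    #   python extract.py nibble.sql
    with open(sys.argv[1]) as fh:
        for number, statement in enumerate(extract_statements(fh), start=1):
            print(f"-- statement {number}")
            print(statement)
```

Each numbered statement can then be run by hand in the mysql client on the target server. Keep in mind that replaying a statement applies that change, just as --execute would.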

I could have also checked the error log and discovered the same thing.
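The error-log check can also be scripted. The log location varies by server; running SHOW VARIABLES LIKE 'log_error'; on each server reports it. The sketch below returns every log line that mentions the table, either as MYDB.MyTable or in the ./MYDB/MyTable path form that MySQL often uses when reporting a damaged table. The function name lines_mentioning_table and the example log path are assumptions.

```python
import sys


def lines_mentioning_table(log_lines, database, table):
    """Return error-log lines that refer to database.table, in either the
    dotted form or the path-like ./database/table form."""
    needles = (f"{database}.{table}", f"{database}/{table}")
    return [line.rstrip("\n") for line in log_lines
            if any(needle in line for needle in needles)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python scan_log.py /var/log/mysqld.log  (path is an assumption)
    with open(sys.argv[1], errors="replace") as fh:
        for hit in lines_mentioning_table(fh, "MYDB", "MyTable"):
            print(hit)
```

This is a plain substring search, so it will also match tables whose names start with MyTable. That is usually acceptable when the goal is only to skim for corruption reports.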

Alternatively, you could try changing algorithms or reducing the chunk size from its default of 1,000 rows (e.g. --chunk-size 100). This may require a bit of trial and error, and it's likely to slow down the checksums significantly, which is why I suggest skipping your largest tables to start with.
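The trial and error can be automated with a small driver that tries progressively smaller chunk sizes for each chunking algorithm and stops at the first run that completes. This is a sketch under several assumptions. It assumes pt-table-sync is on the PATH and that credentials are supplied separately. It relies on the tool's documented exit statuses: 0 means no differences, 2 means differences were found, and 1 is an internal error, which matches the EXIT 1 in the failing run above. It also assumes that, of the algorithms --algorithms accepts, only Chunk and Nibble split the table into chunks, so --chunk-size matters only for those two. The driver uses --print, so nothing is changed. The names plan_attempts and first_working_attempt are hypothetical.

```python
import subprocess

# Algorithms that split the table into chunks and therefore honour
# --chunk-size (assumption; GroupBy and Stream do not chunk).
CHUNKING_ALGORITHMS = ("Chunk", "Nibble")


def plan_attempts(source_dsn, dest_dsn, start_chunk=1000, min_chunk=100):
    """Return pt-table-sync argv lists, one per (algorithm, chunk size),
    halving the chunk size from start_chunk while it stays >= min_chunk."""
    attempts = []
    for algorithm in CHUNKING_ALGORITHMS:
        size = start_chunk
        while size >= min_chunk:
            attempts.append([
                "pt-table-sync", "--verbose", "--print",
                "--algorithms", algorithm,
                "--chunk-size", str(size),
                source_dsn, dest_dsn,
            ])
            size //= 2
    return attempts


def first_working_attempt(attempts):
    """Run each attempt in order and return the first argv whose exit
    status is 0 or 2 (the run completed), or None if all of them fail."""
    for argv in attempts:
        # The --print SQL can be huge for a large table, so discard stdout
        # and keep stderr for the error message.
        result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                                stderr=subprocess.PIPE, text=True)
        if result.returncode in (0, 2):
            return argv
        print(f"failed ({result.returncode}): {' '.join(argv)}")
        print(result.stderr.strip())
    return None
```

Usage would look like first_working_attempt(plan_attempts("h=10.2.0.1,D=MYDB,t=MyTable", "h=10.2.0.2")). The returned command can then be rerun with --execute in place of --print. Every attempt rescans the whole table, so on a 57 GB table this loop can take a very long time.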

For a sufficiently large table (and 57 GB certainly qualifies) with a poorly distributed primary key, you might also run into query or wait timeouts, depending on your MySQL configuration; I'm not sure whether this would produce the same error. The pt-table-sync documentation offers a little more information on this.
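To see which timeouts apply on each server, you can list the relevant variables with the mysql client in batch mode: mysql -N -B -e "SHOW GLOBAL VARIABLES LIKE '%timeout%'" prints one tab-separated name/value pair per line. The sketch below parses that output and picks out the variables most likely to affect long-running sync queries. The choice of variables is an assumption; the answer does not establish that any of them causes this particular error. It also assumes the mysql client reads credentials from an option file. The names parse_variables and fetch_timeouts are hypothetical.

```python
import subprocess

# Timeouts that plausibly cut off a long-running comparison (assumption).
INTERESTING = ("wait_timeout", "net_read_timeout",
               "net_write_timeout", "innodb_lock_wait_timeout")


def parse_variables(batch_output):
    """Parse `mysql -N -B` output of SHOW VARIABLES into a dict."""
    result = {}
    for line in batch_output.splitlines():
        name, _, value = line.partition("\t")
        if name:
            result[name] = value
    return result


def fetch_timeouts(host):
    """Query one server with the mysql client and return the interesting
    timeout values (None for any variable the server does not report)."""
    output = subprocess.run(
        ["mysql", "-h", host, "-N", "-B", "-e",
         "SHOW GLOBAL VARIABLES LIKE '%timeout%'"],
        capture_output=True, text=True, check=True,
    ).stdout
    variables = parse_variables(output)
    return {name: variables.get(name) for name in INTERESTING}
```

For example, looping over ("10.2.0.1", "10.2.0.2") and printing fetch_timeouts(host) puts the master and slave settings side by side.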