MySQL Replication Error 1146 – Table Doesn’t Exist

Tags: errors, myisam, mysql, replication, table

I'm getting a very weird MySQL replication error No. 1146 for a REPLACE INTO query on a slave host replicating all tables in all databases from a master, and I'm having a bit of a hard time understanding why.

Here's my scenario:

  1. New data is generated solely on the master server, MySQL 5.5.40.
  2. Slave A, MySQL 5.5.38, has been replicating all tables in all databases from this master just fine for a long while, producing no errors of any kind.
  3. IO_THREAD was paused on slave A. The value of `Relay_Master_Log_File` was confirmed to match that of `Master_Log_File`, and the value of `Exec_Master_Log_Pos` was confirmed to match that of `Read_Master_Log_Pos`.
  4. A `FLUSH TABLES WITH READ LOCK` was issued on slave A, and a dump of all db's was subsequently produced from it with `mysqldump -v -h localhost -u root -p --all-databases --opt --single-transaction --hex-blob --no-autocommit > dump.sql`. The lock was released only after the dump completed, and then the slave IO_THREAD was restarted; replication from master resumed without any problems and continues to run smoothly to this day on slave A.
  5. The dump was transferred to slave B, MySQL 5.5.34, and loaded successfully into it with a simple `mysql -h localhost -u root -p < dump.sql` command, after confirming that none of the target databases existed on that host (in fact, no db's at all other than the mysql db and the information & performance schemas). I can also confirm the integrity of the dump file after the transfer to slave B, thanks to matching RMD160 checksums for it on both hosts.
  6. Slave B was pointed to the master server for replication, with the `MASTER_LOG_FILE` and `MASTER_LOG_POS` coordinates set to the values of `Relay_Master_Log_File` and `Exec_Master_Log_Pos`, respectively, recorded on slave A in step 3 above.
  7. Replication was started on slave B, and data started flowing in just fine.
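For reference, steps 3 and 6 above can be sketched roughly as follows (host name, user, log file, and position are placeholders, not my actual values):

```sql
-- On slave A: pause the IO thread and record coordinates (step 3)
STOP SLAVE IO_THREAD;
SHOW SLAVE STATUS\G
-- confirm Relay_Master_Log_File matches Master_Log_File and
-- Exec_Master_Log_Pos matches Read_Master_Log_Pos, then note both values

-- On slave B: point at the master with the recorded coordinates (step 6)
CHANGE MASTER TO
  MASTER_HOST     = 'master.example.com',   -- placeholder
  MASTER_USER     = 'repl',                 -- placeholder
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000123',     -- Relay_Master_Log_File from slave A
  MASTER_LOG_POS  = 4567890;                -- Exec_Master_Log_Pos from slave A
START SLAVE;
```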

However, after about a day of smooth operation, slave B produced the following error in its SQL_THREAD:

Error 'Table 'knet.course_location_tracks' doesn't exist' on query. Default database: 'knet'. Query: 'REPLACE INTO `course_location_tracks` (`userid`,`courseid`,`lesson_location`,`datestamp`) VALUES (val1,val2,val3,val4)'

(actual row values have been redacted out)

I can't make much sense of this error because I can confirm not only that the knet.course_location_tracks table exists on slave B, but also that its definition is identical to that of the corresponding table on slave A. Slave A, as I pointed out above, continues to replicate from master just fine to this day, without any problems of any kind.
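This is roughly how I compared the table across hosts (a sketch; run on each slave and diff the output):

```sql
-- Confirm the table exists and capture its definition
USE knet;
SHOW TABLES LIKE 'course_location_tracks';
SHOW CREATE TABLE course_location_tracks\G

-- Worth checking too: a mismatch in this setting between hosts can make a
-- table "not exist" on one server even though an identically named one does
SHOW VARIABLES LIKE 'lower_case_table_names';
```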

If replication started fine on slave B, which leads me to believe the initial replication coordinates were calculated correctly from the state of slave A at the point of the dump, why am I then getting this error for a table that does exist on the host? Moreover, why am I getting the error on slave B while slave A is still replicating smoothly?

Other than the mismatched MySQL versions across my three hosts (something I only noticed recently while trying to debug the problem, and which I will correct ASAP), what could I be doing wrong?

And, finally, once the problem is determined and corrected, how could I get slave B to resume replication at the correct point so that it is again fully in sync with master?
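For what it's worth, the two resume strategies I'm aware of look like this (a sketch; the second skips an event and should only be used when that event is known to be safe to drop, though a `REPLACE INTO` is idempotent if it is later reapplied):

```sql
-- Option 1: once the underlying cause is fixed, restart the SQL thread;
-- the failing event is retried from the recorded relay-log position
START SLAVE SQL_THREAD;

-- Option 2: skip a single problematic event (use with care)
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
```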

Thanks in advance for any help!

PS: If it matters, replication on slave A was initially set up by transferring all db's & tables from master to it in a similar fashion, i.e. by first flushing & locking all tables on master with the read lock, dumping the data with mysqldump (same flags), and finally loading it into the slave with the same mysql command-line client call.

Best Answer

I'm still not sure why, but the problem disappeared after some of the troubleshooting steps I took to hunt down the culprit. Maybe the open-files limit caused the errors while opening the MyISAM tables, perhaps because of the high activity of other services running on the slave B server, such as Rackspace's cloud backup. Whatever the case, replication has been running smoothly on slave B off of my master server for a number of weeks now, after loading it up once more with a fresh data snapshot taken from slave A in the same fashion as described in my original question.
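If the open-files limit is indeed the suspect, it can be inspected from within MySQL like this (a sketch; the effective limit also depends on the OS-level `ulimit -n` for the mysqld process):

```sql
-- Effective per-process file-descriptor limit as MySQL sees it
SHOW GLOBAL VARIABLES LIKE 'open_files_limit';
-- How many table descriptors MySQL will keep open at once
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
-- If this counter grows rapidly, the cache is thrashing and MyISAM
-- tables are being opened and closed constantly
SHOW GLOBAL STATUS LIKE 'Opened_tables';
```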

So, although I unfortunately can't provide a definitive explanation, I can say the problem looks to have been solved, and I'm closing this post for the time being.
