We were able to setup mysql master-slave replication with the following version on both master/slave:
mysqld Ver 5.5.28-29.1-log for Linux on x86_64 (Percona Server (GPL), Release 29.1)
One day, we noticed that replication has stopped, we tried skipping over the entries that caused the replication errors. The errors persisted so we decided to skip replication for the 4 problematic tables. The slave has now caught up with the master except for the 4 tables.
What is the best way to enable replication again for the 4 tables?
This is what I have in mind but I don't know if it will work:
1) Modify slave config to enable replication again for the 4 tables
2) stop slave replication
3) for each of the 4 tables, use pt-table-sync –execute –verbose –print –sync-to-master h=localhost,D=mydb,t=mytable
4) restart slave database to reload replication configuration
5) start slave replication
Best Answer
I don't use
pt-table-sync
because I don't agree with the approach it uses:My philosophy is "no, that's okay, you guys can go ahead and stay away from the master."
While it's true that a common cause of replication errors are related to things being changed directly on the slave, it's also true that if two servers have an identical set of data at an identical set of binlog coordinates, that's a valid place to begin replication. (I would also suggest that the other common cause of replication errors is people trying to replicate only some of their tables and not all, which is almost never a good idea, unless you really know what you're doing, in which case, you'll know better than to try it.)
If you do use it, and I'm sure a lot of people do, then I don't think you actually want to stop the slave. I think you need to keep the slave running to it can make those "no-op changes to the master" and watch them replicate over to fix the inconsistent tables. I don't know what it does with errors, perhaps it skips over them.
Here is one of the alternative approaches that I use.
You need three console windows for this. Do not disconnect your session from the master or you will lose the global lock. You need a console connection to the master and one to the slave, and you need to have your commands to do
mysqldump
set up and ready to go in a third window.First, check the master's binlog coordinates and the slave's binlog coordinates to make sure the slave is not lagging.
There are more values included with
SHOW SLAVE STATUS
but these are the important ones. TheExec_Master_Log_Pos
needs to be close to or identical to the master, and the log file the same.Next, obtain a brief read lock on the master. This locks all of the tables for write allows the master to settle down long enough that we know the slave and master are at the same place in the binlog at this point in time.
Note that this sometimes takes a few seconds before your prompt returns.
Do not disconnect this session. You need this session active because it is holding the lock on the master. You can use it for the following, but don't close it.
Now, repeat the
SHOW MASTER STATUS;
on the master andSHOW SLAVE STATUS\G
on the slave. You should find that they have settled down to the same set of binlog coordinates. Wait until they do. If they don't settle down quickly, you might want to drop the read lock on the master to figure out why not.Next:
This will stop the slave from executing any more events from the master, which is still locked.
In the third window, start a backup of the tables from the master.
As soon as you see "Retrieving table structure..." you can release the global read lock on your master, so your application can start writing to it again.
If your tables are InnoDB, the
--single-transaction
option should get a consistent backup from the master at a point in time during the few seconds that you had the global read lock on the master. Since you also stopped the slave at that point, the slave should be positioned at precisely the appropriate point in time for applying that backup.When the backup of those tables is complete, apply it to the slave:
At this point, you have restored the tables onto the slave in a form that should be identical to the way they appeared on the master at the time when the slave was stopped, and replication should resume when you remove the configuration not to replicate those tables, and then:
Be sure your backups are solid before you begin.