We are facing the above mention issue. In this case if we try to STOP SLAVE then it hangs and we even can to restart DB.
As I search in google for the same it looks like a bug. Id 38205 in V5.1.
My question is if we upgrade to 5.6, is it fixed and if yes what should be the exact version of 5.6.
Thanks
Shyama Prasad
Best Answer
ALTER TABLE operations before 5.6 always required a full write table lock (which means the whole table could be read, but not written). Bulk updates depend on the kind of operations and the ENGINE used (for example, LOAD DATA will lock the whole table on MyISAM, but it will only create row-based locks on InnoDB tables.
If you are in a replication environment, executing an alter table before 5.6 will:
This 5th step is problematic (and probably the reason why you think that the server is hanged, because in the master, other actions could be applied in parallel to other objects, but not on the slave.
In case of bulk updates, a lock may or not be involved, but you have an additional problem: logging the changes though the binary log is usually very slow (slower than its direct application).
Then there is the issue with stopping a slave: when you try to
stop slave
or restart the server, any other ongoing operation would be killed, but application of slaves is allowed to finish to avoid replication issues (the ones you mention on your bug report). You can actually kill the threads, but it may only make things worse, breaking your replication on non-transactional tables (or non-transactional replication control files) or falling back to a rollback, that could take even more time that a commit.Does 5.6 solve anything of this? Well, in some cases, it may help reducing the problems:
So I would say that it will not provide you 100% solution to your problem.
However, what I would recommend you is to perform schema changes and data loads in a slave -> master order (assuming that they are changes that are backwards compatible) and editing directly every node instead of using the binary log.
For example, you would execute this:
On the slave first, and then on the master, so that you write those independently. To minimize downtime, you can do a switchover between servers so that at any time, the active node is not blocked for reading nor writing.
Of course, in order for this to be consistent, you need to make sure the changes are compatible (adding a column is ok, deleting one is not), and that your application does not use the new version until both servers have been migrated. Basically, as locking mechanisms are avoided, you are in charge of making sure that your changes are consistent.