MySQL replication – slave hangs during ALTER table or BULK update – v5.1.71

MySQLreplication

We are facing the above mention issue. In this case if we try to STOP SLAVE then it hangs and we even can to restart DB.

As I search in google for the same it looks like a bug. Id 38205 in V5.1.

My question is if we upgrade to 5.6, is it fixed and if yes what should be the exact version of 5.6.

Thanks
Shyama Prasad

Best Answer

ALTER TABLE operations before 5.6 always required a full write table lock (which means the whole table could be read, but not written). Bulk updates depend on the kind of operations and the ENGINE used (for example, LOAD DATA will lock the whole table on MyISAM, but it will only create row-based locks on InnoDB tables.

If you are in a replication environment, executing an alter table before 5.6 will:

  1. Lock the table on the master
  2. Alter the structure and insert the original data of the table (on the master) using a temporary table, which may take a long time. This is not true for all operations- some name-only changes or secondary indexes manipulation on the InnoDB plugin may be faster and may not require a full table copy. While this is happening, replication is up and running.
  3. Unlock the tables and log the statement (using STATEMENT binary log format) to the binary log on the master
  4. The log is transmitted to the slave
  5. The slave will apply the changes as in the same way it is done on the steps 1-3, with a big difference: the slave will not apply other changes while this is happening because replication is single threaded (so it does apply the changes in the same order than originally)

This 5th step is problematic (and probably the reason why you think that the server is hanged, because in the master, other actions could be applied in parallel to other objects, but not on the slave.

In case of bulk updates, a lock may or not be involved, but you have an additional problem: logging the changes though the binary log is usually very slow (slower than its direct application).

Then there is the issue with stopping a slave: when you try to stop slave or restart the server, any other ongoing operation would be killed, but application of slaves is allowed to finish to avoid replication issues (the ones you mention on your bug report). You can actually kill the threads, but it may only make things worse, breaking your replication on non-transactional tables (or non-transactional replication control files) or falling back to a rollback, that could take even more time that a commit.

Does 5.6 solve anything of this? Well, in some cases, it may help reducing the problems:

  • it introduces transactional on-table replication control to minimize problems when the slave crashes.
  • it allows certain ALTER TABLEs on InnoDB tables to run in a hot way, allowing concurrent writes
  • It has multithreaded replication, but it is only useful for multi-tenant databases (one thread per schema). Real multithreading will be available on 5.7.

So I would say that it will not provide you 100% solution to your problem.

However, what I would recommend you is to perform schema changes and data loads in a slave -> master order (assuming that they are changes that are backwards compatible) and editing directly every node instead of using the binary log.

For example, you would execute this:

SET SQL_LOG_BIN = 0
-- Blocking operation, like ALTER or LOAD DATA, etc.
SET SQL_LOG_BIN = 1

On the slave first, and then on the master, so that you write those independently. To minimize downtime, you can do a switchover between servers so that at any time, the active node is not blocked for reading nor writing.

Of course, in order for this to be consistent, you need to make sure the changes are compatible (adding a column is ok, deleting one is not), and that your application does not use the new version until both servers have been migrated. Basically, as locking mechanisms are avoided, you are in charge of making sure that your changes are consistent.