MySQL replication – slave hangs during ALTER table or BULK update – v5.1.71

MySQLreplication

We are facing the above mention issue. In this case if we try to STOP SLAVE then it hangs and we even can to restart DB.

As I search in google for the same it looks like a bug. Id 38205 in V5.1.

My question is if we upgrade to 5.6, is it fixed and if yes what should be the exact version of 5.6.

Thanks
Shyama Prasad

Best Answer

ALTER TABLE operations before 5.6 always required a full write table lock (which means the whole table could be read, but not written). Bulk updates depend on the kind of operations and the ENGINE used (for example, LOAD DATA will lock the whole table on MyISAM, but it will only create row-based locks on InnoDB tables.

If you are in a replication environment, executing an alter table before 5.6 will:

Lock the table on the master
Alter the structure and insert the original data of the table (on the master) using a temporary table, which may take a long time. This is not true for all operations- some name-only changes or secondary indexes manipulation on the InnoDB plugin may be faster and may not require a full table copy. While this is happening, replication is up and running.
Unlock the tables and log the statement (using STATEMENT binary log format) to the binary log on the master
The log is transmitted to the slave
The slave will apply the changes as in the same way it is done on the steps 1-3, with a big difference: the slave will not apply other changes while this is happening because replication is single threaded (so it does apply the changes in the same order than originally)

This 5th step is problematic (and probably the reason why you think that the server is hanged, because in the master, other actions could be applied in parallel to other objects, but not on the slave.

In case of bulk updates, a lock may or not be involved, but you have an additional problem: logging the changes though the binary log is usually very slow (slower than its direct application).

Then there is the issue with stopping a slave: when you try to stop slave or restart the server, any other ongoing operation would be killed, but application of slaves is allowed to finish to avoid replication issues (the ones you mention on your bug report). You can actually kill the threads, but it may only make things worse, breaking your replication on non-transactional tables (or non-transactional replication control files) or falling back to a rollback, that could take even more time that a commit.

Does 5.6 solve anything of this? Well, in some cases, it may help reducing the problems:

it introduces transactional on-table replication control to minimize problems when the slave crashes.
it allows certain ALTER TABLEs on InnoDB tables to run in a hot way, allowing concurrent writes
It has multithreaded replication, but it is only useful for multi-tenant databases (one thread per schema). Real multithreading will be available on 5.7.

So I would say that it will not provide you 100% solution to your problem.

However, what I would recommend you is to perform schema changes and data loads in a slave -> master order (assuming that they are changes that are backwards compatible) and editing directly every node instead of using the binary log.

For example, you would execute this:

SET SQL_LOG_BIN = 0
-- Blocking operation, like ALTER or LOAD DATA, etc.
SET SQL_LOG_BIN = 1

On the slave first, and then on the master, so that you write those independently. To minimize downtime, you can do a switchover between servers so that at any time, the active node is not blocked for reading nor writing.

Of course, in order for this to be consistent, you need to make sure the changes are compatible (adding a column is ok, deleting one is not), and that your application does not use the new version until both servers have been migrated. Basically, as locking mechanisms are avoided, you are in charge of making sure that your changes are consistent.

ASPECT #1

I would be concerned with InnoDB settings because there are new features in InnoDB (starting with the plugin as of MySQL 5.1.38) that allow for accessing multiple CPUs. If you do not set the parameters, InnoDB will behave like it did before the enhancements. In fact, there are cases where InnoDB in older versions of MySQL will operate faster than newer version:

ASPECT #2

I would be very carful when upgrading MySQL 5.1 to another version of MySQL 5.1 because MySQL 5.1 introduced the concept of installing plugins because you now have the table mysql.plugin. If you install a newer version of MySQL, make sure you do not hose mysql.plugin. You have to just reinstall the plugin. This, in itself, is not the cause of your problem. Nevertheless, you need to be aware of this in the event that this is a cause.

ASPECT #3

If you have not already done this, I would raise innodb_log_buffer_size to 32M. The default is 8M. Increasing it will reduce disk I/O. Please do not go beyond this, since large COMMITs can and will form I/O-based bottlenecks.

RECOMMENDATION

I would upgrade completely to MySQL 5.5 because all of the InnoDB Plugin is natively installed in MySQL 5.5. As highlighted in my previous links, you must configure and tune the settings needed for InnoDB to use multiple CPUs. Increase your log buffer

UPDATE 2012-06-03 18:30 EDT

Since you cannot install MySQL 5.5, you still have aspects 2 and 3 you could implement.

Install the InnoDB Plugin
Increase the innodb_log_buffer_size to 32M

Mysql – Some MySQL slave errors

It's better not to skip by hundreds. Just loop one-by-one:

until mysql -e "show slave status\G;" | grep -i "Slave_SQL_Running: Yes";do
  mysql -e "stop slave; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; start slave;";
  sleep 1;
done