MySQL – Increasing Column from VARCHAR(255) in Master/Slave Setup


We have replication set up on some MyISAM servers in a master/slave scenario, running MySQL v5.1.

One of the columns is currently declared as VARCHAR(255). We want to increase this to VARCHAR(512).

We have one master and 4 slaves. What is the best way to make the change?

If we do this:

ALTER TABLE item MODIFY url VARCHAR(512)

should that be ok, as long as we update the slaves first and then the master, or may it complain and it's best to do them all at once? Should I stop the slaves?

I've tried the change on a dev machine, where it takes about 10 minutes, but that machine doesn't have replication set up, so I'm looking for any flaws in my thinking.

Or looking further, does ALTER TABLE get replicated, and that's all I would need to do on the master?

Best Answer

In a replication environment, for an ALTER that takes a long time, I recommend this:

On one slave: take it out of rotation (keep clients from using it); perform the ALTER; put it back into rotation.
Repeat for the other slaves.
On the master: SET @@session.sql_log_bin = 0 so the ALTER won't replicate; do the ALTER; set it back to 1.

That way, Slave users will not be impacted; Master users may be impacted.
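
A minimal sketch of that sequence, using the item table and url column from the question. It assumes the session on the master has the SUPER privilege (needed to change sql_log_bin) and that url carries no NOT NULL, DEFAULT, or character set attributes. MODIFY replaces the whole column definition, so check SHOW CREATE TABLE item first and restate any attributes that exist.

```sql
-- On each slave in turn, after taking it out of rotation:
ALTER TABLE item MODIFY url VARCHAR(512);

-- On the master, once every slave has been altered:
SET @@session.sql_log_bin = 0;   -- this session's statements are no longer written to the binary log
ALTER TABLE item MODIFY url VARCHAR(512);
SET @@session.sql_log_bin = 1;   -- resume normal binary logging for this session
```

Only the session that issues the SET is affected; other clients on the master keep writing to the binary log as usual.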

With Galera / PXC, you would use "RSU" (Rolling Schema Upgrade) and have zero downtime (but you have to do it manually, node by node, in a similar manner).

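You should also consider upgrading to 5.6 or later. With it, many ALTERs on InnoDB tables can use the fast ALGORITHM=INPLACE. Whether your case qualifies needs checking: the table is MyISAM, and widening a VARCHAR past 255 bytes changes its length prefix from one byte to two, which generally forces a table copy.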
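
One way to find out whether a given ALTER can run in place, assuming a 5.6 or later server and an InnoDB table (neither is true of the setup in the question), is to request the algorithm explicitly. If the operation cannot be done in place, the statement fails with an error instead of quietly falling back to a full table copy.

```sql
-- Assumes MySQL 5.6+ and that item has been converted to InnoDB.
-- Fails with an error if this change cannot be performed in place.
ALTER TABLE item
    MODIFY url VARCHAR(512),
    ALGORITHM = INPLACE,
    LOCK = NONE;
```

Running this against a copy of the table on a test server shows which path the real change would take before anything touches production.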

Related Solutions

MySQL – ALTER TABLE on Large Table with Indexed Column

If you are a little adventurous, you could take matters into your hands by performing the ALTER TABLE in stages you can see. Suppose the table you want to change is called WorkingTable. You could perform the changes in stages like this:

#
#  Script 1
#  Alter table structure of a single column of a large table
#
CREATE TABLE WorkingTableNew LIKE WorkingTable;
ALTER TABLE WorkingTableNew MODIFY BigColumn VARCHAR(50);
INSERT INTO WorkingTableNew SELECT * FROM WorkingTable;
ALTER TABLE WorkingTable RENAME WorkingTableOld;
ALTER TABLE WorkingTableNew RENAME WorkingTable;
DROP TABLE WorkingTableOld;

You can perform this on all slaves. What about the master? How do you prevent this from replicating to the slaves? Simple: don't send the SQL into the master's binary logs. Shut off binary logging in the session before doing the ALTER TABLE stuff:

#
#  Script 2
#  Alter table structure of a single column of a large table
#  while preventing it from replicating to slaves
#
SET SQL_LOG_BIN = 0;
CREATE TABLE WorkingTableNew LIKE WorkingTable;
ALTER TABLE WorkingTableNew MODIFY BigColumn VARCHAR(50);
INSERT INTO WorkingTableNew SELECT SQL_NO_CACHE * FROM WorkingTable;
ALTER TABLE WorkingTable RENAME WorkingTableOld;
ALTER TABLE WorkingTableNew RENAME WorkingTable;
DROP TABLE WorkingTableOld;
SET SQL_LOG_BIN = 1;

But wait! What about any new data that comes in while processing these commands? Renaming the table at the beginning of the operation should do the trick: writes aimed at WorkingTable then fail until the final rename instead of landing in the old table and being lost. Let's alter this code a little to prevent new data from entering in that respect:

#
#  Script 3
#  Alter table structure of a single column of a large table
#  while preventing it from replicating to slaves
#  and preventing new data from entering into the old table
#
SET SQL_LOG_BIN = 0;
ALTER TABLE WorkingTable RENAME WorkingTableOld;
CREATE TABLE WorkingTableNew LIKE WorkingTableOld;
ALTER TABLE WorkingTableNew MODIFY BigColumn VARCHAR(50);
INSERT INTO WorkingTableNew SELECT SQL_NO_CACHE * FROM WorkingTableOld;
ALTER TABLE WorkingTableNew RENAME WorkingTable;
DROP TABLE WorkingTableOld;
SET SQL_LOG_BIN = 1;

Script 1 can be executed on any slave that does not have binary logs enabled
Script 2 can be executed on any slave that does have binary logs enabled
Script 3 can be executed on a master or anywhere else
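
As a variation on the last three statements of Script 1 or Script 2 (not part of the original scripts), a single RENAME TABLE statement can swap both names in one atomic step, and a row count comparison before the DROP is a cheap sanity check. The table names are the ones used in the scripts above.

```sql
-- Compare row counts before swapping; they should match if no writes arrived during the copy.
SELECT COUNT(*) FROM WorkingTable;
SELECT COUNT(*) FROM WorkingTableNew;

-- Swap both names in one statement so there is no moment when WorkingTable does not exist.
RENAME TABLE WorkingTable    TO WorkingTableOld,
             WorkingTableNew TO WorkingTable;

-- Confirm the new column definition, then drop the old copy.
SHOW CREATE TABLE WorkingTable;
DROP TABLE WorkingTableOld;
```

The swap does not solve the problem of rows written to the old table while the INSERT ... SELECT was running; it only removes the gap between the two renames.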

Give it a Try !!!

MySQL – What are the conditions under which MySQL replication might break

Given that MySQL Replication is dual-threaded, it is important to recognize how Replication looks when it is broken. There are four main topics in this area:

SQL Thread Dies

The SQL Thread is responsible for

Getting the Next SQL Statement from the Relay Logs
Executing the SQL Statement
Rotating Relay Logs by Deleting any Relay Log that had all its SQL Entries Executed

If any SQL error happens, the SQL Thread simply dies and the following is posted to its Slave Status:

Error Number
Error Message
SQL statement that experienced the Error
Current database
Master Log File where the SQL Originated
Master Log Position where the SQL Originated

This gives an opportunity to troubleshoot, skip the error, run the SQL statement by hand, start replication back up. Sometimes it may be a SQL-based error, such as error 1062 (Duplicate Key). Other times, it may be related to the Storage Engine or the OS.
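
A sketch of that troubleshooting loop, run on the Slave. Skipping an event is only appropriate once you have confirmed that the Slave's data is correct without it; otherwise the Slave silently diverges from the Master.

```sql
-- Inspect the error: Last_Errno, Last_Error, Relay_Master_Log_File, Exec_Master_Log_Pos
SHOW SLAVE STATUS\G

-- Skip the one event that failed, then restart replication.
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;

-- Slave_SQL_Running should now report Yes.
SHOW SLAVE STATUS\G
```

The \G terminator is a mysql command-line client feature that prints one column per line; in other clients, end the statement with a semicolon instead.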

To figure out if an SQL statement will break replication, you should take any DML (INSERT, UPDATE, or DELETE) and make a corresponding SELECT using the WHERE clause of the DML. Then, run that SELECT to see if the data you are about to manipulate really exists or not.
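
For example, with a hypothetical orders table (the table, columns, and values here are made up for illustration), the DML is shown as a comment and the SELECT beneath it is the check to run first:

```sql
-- DML about to run on the Master:
--   UPDATE orders SET status = 'shipped' WHERE order_id = 1001;
-- Check on the Slave that the row it targets really exists:
SELECT order_id, status FROM orders WHERE order_id = 1001;

-- DML about to run on the Master:
--   INSERT INTO orders (order_id, status) VALUES (1002, 'new');
-- Check on the Slave that the key is not already taken (a nonzero count means error 1062):
SELECT COUNT(*) FROM orders WHERE order_id = 1002;
```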

I/O Thread Dies

The I/O Thread is responsible for four(4) things:

Downloading SQL from the Binary Log Entries of a Master
Recording SQL into its Local Relay Logs as a FIFO queue
Acknowledging Communication Failure
Attempting to Reestablish the I/O Thread's Connection

Any network latency may cause the I/O Thread to simply die and retry the connection. Once in a while under those circumstances, the Slave's viewpoint of the Master's log file and position (as logged in its relay logs) may be out of sync with what the Master actually recorded in its binary logs.
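
A quick way to compare the two viewpoints is to look at the coordinates each side reports. This is only a sketch of the check; on a busy Master the numbers move between the two commands, so compare them with that in mind.

```sql
-- On the Master: current binary log file and position.
SHOW MASTER STATUS;

-- On the Slave:
--   Master_Log_File / Read_Master_Log_Pos        = how far the I/O Thread has read
--   Relay_Master_Log_File / Exec_Master_Log_Pos  = how far the SQL Thread has executed
--   Slave_IO_Running                             = whether the I/O Thread is connected
SHOW SLAVE STATUS\G
```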

Other side effects may include corrupt relay log entries:

caused by bad network transmission, which can be corrected by running CHANGE MASTER TO from the last SQL statement from the Master that the Slave executed.
caused by corrupt binary log entries on the Master that were successfully transmitted to the relay logs, which can be corrected by
- running RESET MASTER; on the Master to zap all binary logs
- setting up replication from the new current binary log
- using pt-table-sync to correct differences
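
For the first case, a sketch of the CHANGE MASTER TO step, run on the Slave. The file name and position below are placeholders; the real values come from Relay_Master_Log_File and Exec_Master_Log_Pos in the Slave's own status output.

```sql
STOP SLAVE;

-- Note Relay_Master_Log_File and Exec_Master_Log_Pos: the last Master event the Slave executed.
SHOW SLAVE STATUS\G

-- Placeholder values; substitute the two fields noted above.
-- This discards the existing relay logs and fetches again from that point.
CHANGE MASTER TO
    MASTER_LOG_FILE = 'mysql-bin.000123',
    MASTER_LOG_POS  = 4567;

START SLAVE;
```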

Temporary Table Usage

Troubleshooting this is like playing "pin the tail on the donkey". Most developers are unaware of this until it happens, and then try to fix it without realizing where the problem began. Here is the scenario: If you use CREATE TEMPORARY TABLE on a Master, it will replicate to the Slave. During the time the table is in use, it will be kept in existence in the SQL Thread. If you issue STOP SLAVE;, the SQL Thread is voluntarily killed along with all temporary tables the SQL Thread was holding. You do not realize that this has occurred until you issue START SLAVE; and the SQL Thread dies again because the needed temp table no longer exists.

To fix this, you have to perform surgery on the master's binary logs and replication as follows:

Step 01) Locate the exact log file and position the CREATE TEMPORARY TABLE was issued on the Master
Step 02) Locate the name of the database that the CREATE TEMPORARY TABLE was meant for, then create the table in that database using CREATE TABLE instead of CREATE TEMPORARY TABLE
Step 03) Run CHANGE MASTER TO using the file and position from Step 01
Step 04) Run START SLAVE; until Replication catches up or another table's nonexistence (due to CREATE TEMPORARY TABLE) breaks replication for this same issue
Step 05) If replication breaks again because of CREATE TEMPORARY TABLE on a different table, go back to Step 01
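
Steps 01 and 02 might look like the following sketch. The binary log name, the database appdb, and the table tmp_totals with its columns are all hypothetical; the real column list has to be copied from the CREATE TEMPORARY TABLE statement found in the Master's binary log.

```sql
-- Step 01, on the Master: list events in the suspect binary log and look for
-- the CREATE TEMPORARY TABLE statement. The Pos column gives its position.
SHOW BINLOG EVENTS IN 'mysql-bin.000123' LIMIT 200;

-- Step 02, on the Slave: create a regular table with the same definition
-- in the database the temporary table was meant for.
USE appdb;
CREATE TABLE tmp_totals (
    customer_id INT NOT NULL,
    total       DECIMAL(10,2) NOT NULL
);
```

Because the stand-in is a permanent table, remember to drop it by hand once replication has moved past the statements that use it.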

Network Inconsiderations

Once upon a time, there was a tendency for MySQL to say Replication was running when, in fact, it was not. This can happen when the network has intermittency that delays the transmission of binary logs but is not severe enough to time out the I/O Thread. Since the MySQL process can be inconsiderate by being a little insensitive to the network, I affectionately call this "Network Inconsideration". While the bug report on this is closed, it is good to have multiple ways to check MySQL Replication as to its ability to run, especially the I/O Thread. Using MySQL 5.5, you could adjust the sensitivity of the I/O Thread using the heartbeat and timeout parameters centered around Semisynchronous Replication.
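
A sketch of the heartbeat and timeout settings on a MySQL 5.5 Slave. The values of 10 and 60 seconds are arbitrary examples, not recommendations; the heartbeat period should stay below slave_net_timeout.

```sql
STOP SLAVE;

-- Ask the Master to send a heartbeat whenever it has been idle for 10 seconds.
CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD = 10;

-- Treat the connection as dead after 60 seconds without any data or heartbeat.
SET GLOBAL slave_net_timeout = 60;

START SLAVE;

-- Verify that heartbeats are configured and arriving.
SHOW GLOBAL STATUS LIKE 'Slave_heartbeat_period';
SHOW GLOBAL STATUS LIKE 'Slave_received_heartbeats';
```

If Slave_received_heartbeats stops increasing while the Master is idle, the I/O Thread is not really connected, whatever Slave_IO_Running says.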