I have set up MySQL replication. Is there a way to skip replicating one or more columns from a few tables?
MySQL Replication ignore few columns
master-master-replication, multi-master, MySQL, replication
Related Solutions
Given that MySQL Replication is dual-threaded, it is important to recognize how Replication looks when it is broken. There are four main topics in this area.
SQL Thread Dies
The SQL Thread is responsible for
- Getting the Next SQL Statement from the Relay Logs
- Executing the SQL Statement
- Rotating Relay Logs by Deleting any Relay Log that had all its SQL Entries Executed
If any SQL error happens, the SQL Thread simply dies and the following is posted to its Slave Status:
- Error Number
- Error Message
- SQL statement that experienced the Error
- Current database
- Master Log File where the SQL Originated
- Master Log Position where the SQL Originated
This gives you an opportunity to troubleshoot: skip the error, run the SQL statement by hand, and start replication back up. Sometimes it may be a SQL-based error, such as error 1062 (Duplicate Key). Other times, it may be related to the Storage Engine or the OS.
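The usual recovery loop for a SQL-level error can be sketched as follows (this assumes classic file/position replication, not GTID-based, where `SQL_SLAVE_SKIP_COUNTER` is honored):

```sql
-- Inspect the error details posted in the Slave status
SHOW SLAVE STATUS\G           -- read Last_Errno, Last_Error, Exec_Master_Log_Pos

-- If the event is safe to skip (e.g., a duplicate-key INSERT that
-- already exists on the Slave), skip exactly one event and resume:
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
```

Skipping should always come after the troubleshooting, never before it, since each skipped event is a potential source of data drift.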
To figure out if an SQL statement will break replication, you should take any DML (INSERT, UPDATE, or DELETE) and make a corresponding SELECT using the WHERE clause of the DML. Then, run that SELECT to see if the data you are about to manipulate really exists or not.
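As a sketch of that pre-check technique (table and column names here are hypothetical):

```sql
-- DML the SQL Thread is about to execute:
UPDATE orders SET status = 'shipped' WHERE order_id = 12345;

-- Corresponding SELECT on the Slave, reusing the WHERE clause:
SELECT COUNT(*) FROM orders WHERE order_id = 12345;
-- A count of 0 means the UPDATE would match nothing on the Slave,
-- a hint that the Slave's data has already diverged from the Master.
```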
I/O Thread Dies
The I/O Thread is responsible for four(4) things:
- Downloading SQL from the Binary Log Entries of a Master
- Recording SQL into its Local Relay Logs as a FIFO queue
- Acknowledging Communication Failure
- Attempting to Reestablish the I/O Thread Connection
Any network latency may cause the I/O Thread to simply die and retry the connection. Once in a while under those circumstances, the Slave's viewpoint of the Master's log file and position (as logged in its relay logs) may be out-of-sync with what the Master actually recorded in its binary logs.
Other side effects may include corrupt relay log entries:
- caused by bad network transmission, which can be corrected by running CHANGE MASTER TO from the last SQL statement from the Master that the Slave executed.
- caused by corrupt binary log entries on the Master that were successfully transmitted to the relay logs, which can be corrected by:
  - running RESET MASTER; on the Master to zap all binary logs
  - setting up replication from the new current binary log
  - using pt-table-sync to correct differences
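For the bad-transmission case, re-pointing the Slave looks roughly like this (the file name and position are placeholders; they come from the last coordinates the Slave is known to have executed):

```sql
STOP SLAVE;
CHANGE MASTER TO
  MASTER_LOG_FILE = 'mysql-bin.000123',   -- last Master binlog file executed
  MASTER_LOG_POS  = 456789;               -- last position executed
START SLAVE;
```

This discards the Slave's existing relay logs and re-downloads everything from the given coordinates, replacing any corrupted relay log entries.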
Temporary Table Usage
Troubleshooting this is like playing "pin the tail on the donkey". Most developers are unaware of this until it happens, and you end up trying to fix it without realizing where the cause began. Here is the scenario: if you use CREATE TEMPORARY TABLE on a Master, it will replicate to the Slave. While the table is in use, it is kept in existence by the SQL Thread. If you issue STOP SLAVE;, the SQL Thread is voluntarily killed along with all temporary tables it was holding. You do not realize that this has occurred until you issue START SLAVE; and the SQL Thread dies again because the needed temp table no longer exists.
To fix this, you have to perform surgery on the master's binary logs and replication as follows:
- Step 01) Locate the exact log file and position where the CREATE TEMPORARY TABLE was issued on the Master
- Step 02) Locate the name of the database that the CREATE TEMPORARY TABLE was meant for, and create the table using CREATE TABLE instead of CREATE TEMPORARY TABLE
- Step 03) Run CHANGE MASTER TO using the file and position from Step 01
- Step 04) Run START SLAVE; until Replication catches up or another table's nonexistence (due to CREATE TEMPORARY TABLE) breaks replication for this same issue
- Step 05) If replication breaks again because of CREATE TEMPORARY TABLE on a different table, go back to Step 01
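Steps 02 through 04 can be sketched like this on the Slave (the database, table, and coordinates are all hypothetical stand-ins for what Step 01 uncovers):

```sql
-- Step 02: recreate the lost temp table as a real table so the
-- replicated statements that reference it can execute:
CREATE TABLE reporting.tmp_results (id INT, total DECIMAL(10,2));

-- Steps 03-04: resume from the coordinates found in Step 01:
CHANGE MASTER TO
  MASTER_LOG_FILE = 'mysql-bin.000200',
  MASTER_LOG_POS  = 1042;
START SLAVE;
```

Remember to drop the stand-in table once replication has moved past all statements that reference it.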
Network Inconsiderations
Once upon a time, there was a tendency for MySQL to say Replication was running when, in fact, it was not. This can happen when the network has intermittency that delays data transmission of binary logs, but not severely enough to time out the I/O Thread. Since the MySQL process can be inconsiderate by being a little insensitive to the network, I affectionately call this "Network Inconsideration". While the bug report on this is closed, it is good to have multiple ways to check MySQL Replication's ability to run, especially the I/O Thread. Using MySQL 5.5, you could adjust the sensitivity of the I/O Thread using the heartbeat and timeout parameters centered around Semisynchronous Replication.
You could be suffering from what is known as data drift.
QUERIES
This can happen if there are queries that are unsafe for replication.
One of the more common types is running UPDATE or DELETE using LIMIT. Using LIMIT on DML can work just fine on a Master. On a Slave, the rows selected (and perhaps certain ORDER BY choices) may not be the same set being updated or deleted as the set on the Master. See the MySQL Documentation for a comprehensive description of unsafe statements that can affect MySQL Replication.
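To illustrate (table and column names are hypothetical), under statement-based replication the Slave replays the statement text verbatim, so a nondeterministic row set diverges:

```sql
-- Without a deterministic ordering, the 10 rows this hits on the
-- Slave may not be the 10 rows it hit on the Master:
DELETE FROM session_log WHERE expired = 1 LIMIT 10;

-- Safer: order by a unique key so both servers pick the same rows:
DELETE FROM session_log WHERE expired = 1
ORDER BY session_id LIMIT 10;
```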
Baron Schwartz once dealt with this and had to refactor his query to get around it.
The following hypothetical scenario illustrates one way to introduce data drift:
BINLOGS
Master
- 20 DB Connections writing changes (INSERTs, UPDATEs, DELETEs)
- The Master has to serialize the SQL coming from the 20 DB Connections into its binary logs
- The order in which the queries are serialized into the binary logs may differ from the order in which each DB Connection executed its change.
- sync_binlog set to 0 (default), which leaves the responsibility of flushing binlogs to disk in the hands of the OS
Slave
- I/O Thread reads binlog events in the order the Master wrote them
- SQL Thread executes binlog events in the order the Master wrote them
Observation
If binlogs are not flushed to disk in a timely, predictable manner, any binlog events the Slave needs could easily be bypassed. This could cause data simply not to exist on the Slave. Depending on which data is recorded or not recorded, Replication's SQL Thread could break because of missing data, or because of data that should be missing.
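The durability knob in question can be tightened so the OS is taken out of the picture (at a cost in write throughput):

```sql
-- Default is 0: the OS decides when the binary log hits disk.
-- Setting it to 1 forces an fsync of the binary log after every
-- commit, so no committed event can be lost before a Slave reads it.
SET GLOBAL sync_binlog = 1;
```

The same setting can be made permanent with `sync_binlog = 1` in the `[mysqld]` section of my.cnf.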
EPILOGUE
Not every Slave can be affected this way. A Master keeps a list of all Slave I/O Threads and transmits binlog events to the Slaves in order by their ProcessIDs on the Master. I can see later Slaves being victimized first.
If sync_binlog is indeed an issue, perhaps all Slaves have data drift and we just don't know of it yet.
The only way to tell is to download a table checksum tool (such as pt-table-checksum from Percona Toolkit, whose companion pt-table-sync was mentioned above) and checksum everything on every Slave against the Master. You may find more data drift problems than you think. Just run the sync scripts to correct them.
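A hypothetical invocation of the Percona Toolkit checksum/sync pair (host names and credentials are placeholders; pt-table-checksum writes its results into the `percona.checksums` table by default, which pt-table-sync then reads):

```shell
# Checksum every table on the Master and compare against all Slaves
pt-table-checksum h=master.example.com,u=repl_check,p=secret

# Print (rather than execute) the SQL that would repair any drift
pt-table-sync --print --replicate percona.checksums \
    h=master.example.com,u=repl_check,p=secret
```

Running with `--print` first lets you review the repair statements before letting the tool execute them.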
CAVEAT
You suggested that network latency could be an issue. With binlogs not yet flushed by the OS, any disconnect and reconnect of MySQL Replication due to latency or dropped packets is worth looking over as well. It could also be a major contributor to data drift.
Also worth noting is the network route the new Slave uses to communicate back to the Master. If it is not the same route the older Slaves use (perhaps passing through a different switch, over public IP, etc.), it needs to be investigated.
Related Question
- Mysql – re-enabling a table for MySQL replication
- Mysql – Upgrade MySQL from 5.0 to 5.5 in master-master configuration leads to broken replication
- Mysql – Should I exclude the mysql database from multi master replication
- MySQL – Internal Database in Replication
- MySQL Semi-synchronous replication with Multi-Master
- MySQL – Master to Master Replication Without Downtime
- MySQL – How to Change Master-Master Replication to Master-Slave
Best Answer
Replicate it! Else you won't be able to fail over to a Slave if the Master dies. Deal with access control in some of the following ways:
- Use GRANT to restrict what each user can see.
- Use VIEWs to avoid showing the sensitive columns.
- Segregate the sensitive columns into other tables (vertical partitioning), so that the GRANT applies to tables instead of columns and VIEWs aren't needed.
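The first two options can be sketched together like this (the schema, column, and user names are hypothetical):

```sql
-- Hide the sensitive column behind a view...
CREATE VIEW hr.employees_public AS
    SELECT emp_id, name, department   -- salary deliberately omitted
    FROM hr.employees;

-- ...and grant access to the view, not the base table:
GRANT SELECT ON hr.employees_public TO 'reporting'@'%';

-- MySQL also supports column-level grants directly, without a view:
GRANT SELECT (emp_id, name) ON hr.employees TO 'reporting'@'%';
```

Either way, every column still replicates to the Slave, so failover remains possible; only what each user can read is restricted.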