MySQL – How to make MySQL replication reliable

innodb, myisam, mysql, replication

Master version: 5.5.13-1
Slave version: 5.5.14-1
Binary log format: MIXED

My Slave database (~ 40GB) has been out of sync with the Master. I cannot find anything interesting in the error log. Google gives me a very helpful link.

I'm going to re-sync the database by following these instructions, to minimize downtime on the Master. Before doing that, I want to make sure this situation is less likely to happen again. Going through the points in the link above, here is what I have done:

The Slave database was configured with the read-only option
There are some unsafe queries. Do they cause problems with MIXED based replication?
I replicated all databases
I use both the InnoDB and MyISAM storage engines
Developers use a lot of temporary tables

Should I:

Avoid the unsafe queries
Ask developers to put all the temporary tables into a separate database

Is there anything else? If the databases fall out of sync, is mk-table-sync reliable enough to re-sync them automatically? Does anyone use it in production?

UPDATE: Tue Feb 28 23:27:13 ICT 2012

My Slave database (~ 40GB) has been out of sync with the Master. I cannot find anything interesting in the error log.

To get more information about what was happening, the Slave should be started with --log-warnings=2.
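As a minimal sketch of the same setting applied without a restart: in MySQL 5.5, log_warnings is a dynamic global variable, so it can also be raised on a running Slave from an account with the SUPER privilege. A value set this way is lost at the next restart unless the startup option is added as well.

```sql
-- Raise warning verbosity on the running Slave (needs the SUPER privilege).
SET GLOBAL log_warnings = 2;

-- Confirm the value now in effect.
SHOW GLOBAL VARIABLES LIKE 'log_warnings';
```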

Best Answer

OBSERVATION #1

You mentioned: "Ask developers to put all the temporary tables into a separate database"

If your developers are using CREATE TEMPORARY TABLE commands to create temporary tables, they need to use CREATE TABLE instead. Here is why:

When MySQL Replication processes a temporary table, this is what occurs:

1) The Master runs CREATE TEMPORARY TABLE
2) The command is written to the binary log
3) Replication copies it over to the Slave's Relay Logs via the I/O Thread
4) The Slave SQL Thread runs CREATE TEMPORARY TABLE
5) The Slave processes data with that temp table

Once in a while, someone may run STOP SLAVE; to run a backup. If STOP SLAVE; is issued just after step 4, the temp table that was just created disappears, and so does its data. When you run START SLAVE;, Replication breaks instantly, complaining that the table does not exist. This is normal: when a DB Connection terminates, deliberately or accidentally, all temp tables created with CREATE TEMPORARY TABLE in that DB session are dropped. Running STOP SLAVE; kills the SQL thread that was holding the temp table open.

The only workaround for this is to create the table using CREATE TABLE instead of CREATE TEMPORARY TABLE. When STOP SLAVE; is run, a table created as a normal table does not disappear.
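A minimal sketch of that workaround follows. The table and column names (scratch_report_totals, orders, customer_id, amount) are hypothetical and only illustrate the pattern: a regular table used as scratch space and dropped explicitly.

```sql
-- Hypothetical names throughout; adapt them to the real schema.

-- Regular table used as scratch space. Unlike a TEMPORARY table, it
-- survives the Slave SQL thread being stopped and restarted.
CREATE TABLE scratch_report_totals (
    customer_id INT UNSIGNED  NOT NULL,
    total       DECIMAL(12,2) NOT NULL,
    PRIMARY KEY (customer_id)
) ENGINE=InnoDB;

-- Fill it the same way the temporary table would have been filled.
INSERT INTO scratch_report_totals (customer_id, total)
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;

-- A regular table is not dropped automatically when the session ends,
-- so clean up explicitly once the work is done.
DROP TABLE scratch_report_totals;
```

Because a regular table is visible to every connection, concurrent jobs would need distinct table names, for example with a per-job suffix.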

I have seen this happen maybe 10 times in my DBA career. The fix was to use the binary logs to find the names of the temp tables, create those tables using CREATE TABLE, and then start replication up again. That was the only repair possible short of brute-force copying the master.

OBSERVATION #2

mk-table-sync only works on tables with primary keys and/or unique keys. It works maybe 99% of the time. I have seen instances where the checksums of a table on the master and the slave were different; I would run mk-table-sync, and there were still differences. (Of course, I was running mk-table-sync in circular replication with 3 masters, which can be a little dangerous. Using it in Master/Slave is far more stable.)

OBSERVATION #3

You mentioned: "There are some unsafe queries. Do they cause problems with MIXED based replication?"

It depends. The most common unsafe query is any UPDATE or DELETE that uses ORDER BY ... LIMIT. With SBR, this could cause MySQL to UPDATE or DELETE rows on the Slave in a different order than on the Master. With RBR, I believe the exact row changes are identified explicitly, so the Slave applies the UPDATE or DELETE to the same rows.

SOLUTION : Avoid using unsafe queries. Then, you will not worry !!!
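To make this concrete, here is a minimal sketch using a hypothetical job_queue table. MySQL flags statements that use LIMIT as unsafe for statement-based logging, and with binlog_format=MIXED the server logs a statement it detects as unsafe in row format instead. Rewriting the statement so that the WHERE clause alone picks the rows removes the ambiguity.

```sql
-- Hypothetical table: job_queue(id INT PRIMARY KEY, status VARCHAR(16))

-- Unsafe for statement-based logging: which 1000 rows are deleted
-- depends on the order in which each server happens to read them.
DELETE FROM job_queue
WHERE status = 'done'
LIMIT 1000;

-- Deterministic alternative: the rows are identified by a key range,
-- so Master and Slave remove exactly the same set of rows.
DELETE FROM job_queue
WHERE status = 'done'
  AND id BETWEEN 1 AND 1000;
```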

OBSERVATION #4

I just read your second link. ROFL !!! I am familiar with the poster of the answer.

Related Solutions

MySQL Bug required remedy : [Warning] Invalid (old?) table or database name '#sql-'

There appears to be a bug in this case, for two reasons:

111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p1' 
111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p2' 
111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p3' 
111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p4' 
111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p5' 
111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p6' 
111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p7' 
111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p8' 
111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p9' 
111116 10:48:29 [Warning] Invalid (old?) table or database name '#sql-3z8g_122a#P#p10'

REASON #1 : Any table starting with #sql is a temp table. If these tables are still present in any database folder and their datetime stamps show they are stale, simply delete them.

REASON #2 : Look carefully at the suffix of each table. #P#p? resembles a partition tag. This would indicate an attempt to create a temp table using partitions. That sounds insane. There was a bug report on this on Feb 16, 2006 for MySQL 5.1.7-beta (closed Mar 15, 2006). The bug report is based on trying to do this manually. Is mysql attempting to do this internally?

IMHO I would upgrade mysql away from MySQL 5.1 up to MySQL 5.5

MySQL – LOAD DATA (400k rows) INFILE takes about 7 minutes, cannot kill the “logging slow query” process

The query under process ID 180233 looks like it is in distress.

Here is the query itself

SELECT COUNT(DISTINCT A.`campaignid`)  INTO _c
FROM `ox_campaigns` A 
INNER JOIN `selfserving_users` B ON B.`user_id` = A.`uid`
INNER JOIN `v3_cam_date` C ON C.`campaignid` = A.`campaignid`
WHERE A.`revenue_type` = 5 AND A.`deleted` = 0
AND A.`expire` = DATE_ADD(CURRENT_DATE, INTERVAL 1 DAY) 
AND A.`isExpired` = 0
AND IF( NAME_CONST('_permitid',3) = -1, 1=1,
IF( NAME_CONST('_permitid',3) = 0, A.`uid` IN
(SELECT C.`user_id` FROM `selfserving_users` C
WHERE C.`groupid` =  NAME_CONST('_groupid',12) ) ,
A.`uid` =  NAME_CONST('userid',388)));

The scary part about the query is the self-reference.

You have selfserving_users acting in a self serving manner against itself.

Sometimes, the MySQL Query Optimizer will play bait-and-switch, smoke-and-mirrors games with data, especially with a self-reference, in order to formulate the best EXPLAIN plan possible. While mysql is very capable of completing sub-SELECTs, they can still be expensive.

However, this is just a symptom that manifested because of Process ID 97. What is really the issue here?

LOAD DATA INFILE against an InnoDB table could make mysqld a little punch drunk. I don't believe (or at least I don't exercise full confidence) you can encapsulate it as a normal transaction although this was addressed back in MySQL 5.0.

Just picture it:

You are hammering the InnoDB Buffer
Some memory swapping may be going on
Possible full table locking issues that are affecting data pages outside the v3_zone_date table (as with the selfserving_users table)

There may be a way to throttle the LOAD DATA INFILE process on an InnoDB table. I cannot give you a solid answer on this one, but try this link from Baron Schwartz.

UPDATE 2012-02-22 12:00 EST

There is an open bug report against MySQL 5.5.7 called Deadlock when DDL under LOCK TABLES WRITE, READ + PREPARE. At the bottom of the report, a person complained about a blocking problem caused by the explicit LOCK TABLES.

Launching a COMMIT on locked rows in a table would hang because of trying to unravel the MVCC data associated with the locked rows. Based on the InnoDB Status you have shown, there would exist 6933 row locks on the table you are importing. I know that in Oracle, when introducing new rows to a table, MVCC is still generated because the previous version of the newly inserted row is a nonexistent row. The same must be occurring for InnoDB.

UPDATE 2012-02-22 12:42 EDT

In your question you stated the following about your .NET process

LOCK TABLES;
SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
LOAD DATA;
COMMIT;
UNLOCK TABLES;
SET autocommit=1;
SET unique_checks=1;

All of these events are running within the same DB Session. This is also happening within one DB Connection. Thus, this is not a deadlock in the traditional sense. It is just a case of blocking your COMMIT within a given DB Connection/Session because the tables were locked within the same DB Connection/Session.

UPDATE 2012-02-23 19:00 EDT

I would change the sequence to be this:

SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
LOCK TABLES;
LOAD DATA;
UNLOCK TABLES;
COMMIT;
SET autocommit=1;
SET unique_checks=1;
SET foreign_key_checks=1;

Please remember, a COMMIT cannot proceed if you have the tables locked in serial fashion. Therefore, UNLOCK TABLES must precede COMMIT.
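Spelled out as full statements, the reordered sequence might look like the sketch below. It assumes v3_zone_date is the table being loaded; the file path and the CSV field and line terminators are hypothetical placeholders.

```sql
-- Session settings first, before any table lock is taken.
SET autocommit = 0;
SET unique_checks = 0;
SET foreign_key_checks = 0;

-- Assumed load target; substitute the real table name.
LOCK TABLES v3_zone_date WRITE;

-- Hypothetical file path and file format.
LOAD DATA INFILE '/tmp/v3_zone_date.csv'
INTO TABLE v3_zone_date
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

-- Release the table lock before committing. Per the MySQL manual,
-- UNLOCK TABLES implicitly commits an open transaction when the locks
-- came from LOCK TABLES, so the COMMIT below may have nothing left to do.
UNLOCK TABLES;
COMMIT;

-- Restore the session defaults.
SET autocommit = 1;
SET unique_checks = 1;
SET foreign_key_checks = 1;
```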