MySQL – InnoDB Master-Master replication goes inconsistent after power-failure tests

failover, innodb, MySQL, PHP, replication

Afternoon gents,

I'm currently stress-testing a Master-Master replication setup that uses InnoDB as the storage engine.

We're using the simple script below for testing, which we run from the Linux CLI on a remote server.

<?php

// Simple stress test: connect, insert a random row, disconnect, repeat.
// The loop runs until the script is killed. Note that the legacy
// mysql_* functions report errors via return values, not exceptions,
// so the catch block below is effectively dead code.
while (true) {
    try {
        $conn = mysql_connect('10.0.10.210', 'test', 'test');
        if ($conn) {
            mysql_select_db('testdb');
            $random = rand(0, 1000);
            $res = mysql_query("INSERT INTO test VALUES(0, 'test', $random)");
            if ($res) {
                echo "\n inserted " . microtime();
            } else {
                echo "\n not inserted " . microtime();
            }
            mysql_close($conn);
        } else {
            echo "\n can not connect";
        }
    } catch (Exception $ex) {
        echo "\n can not insert" . microtime();
    }
}

The issue we're facing appears when we shut off one of the hosts using nothing but the power plug, a hard power-off that is.

We're also using MySQL-MMM for failover purposes. That has nothing to do with the issue we're facing, but I'll explain the procedure we're using anyway:

1) Master-Master replication is working perfectly; server1 holds the virtual IP 10.0.10.210 and serves both writes and reads.

2) We shut off server1 by unplugging the power cable; the virtual IP moves to server2, everything keeps working, and inserts continue after ~20 seconds of downtime.

3) We start server1 again; it comes back up, reclaims the virtual IP address, and inserts continue after 1–2 seconds of downtime.

The problem is that we lose all the inserts that happened during server1's downtime, and if I run STOP SLAVE; START SLAVE; I get this error:

[ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position', Error_code: 1236

And if I check the binary log manually at the offset that mysqld.log reports:

[root@db1 mysql]# mysqlbinlog --offset=623435 db1-mysql-bin.000001
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#121030 12:55:16 server id 1  end_log_pos 106   Start: binlog v 4, server v 5.1.61-log created 121030 12:55:16 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
BINLOG '
VOqPUA8BAAAAZgAAAGoAAAABAAQANS4xLjYxLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAABU6o9QEzgNAAgAEgAEBAQEEgAAUwAEGggAAAAICAgC
'/*!*/;
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 112, event_type: 2
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
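
The only way I've found to get the slave past this is to re-point it at a valid coordinate on the other server, which of course does nothing to bring back the missing writes. A sketch using standard commands; the file name and position below are placeholders for whatever SHOW MASTER STATUS reports on the current master:

-- Re-point the stuck slave at a known-good binlog coordinate.
-- Placeholders: take the real file/position from SHOW MASTER STATUS
-- on the other server (position 4 is the start of a binlog file).
-- Skipping ahead abandons any events left in the truncated binlog,
-- so the lost inserts stay lost.
STOP SLAVE;
CHANGE MASTER TO
    MASTER_LOG_FILE = 'db1-mysql-bin.000002',
    MASTER_LOG_POS  = 4;
START SLAVE;
SHOW SLAVE STATUS\G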

I understand that the binary log file isn't being closed properly, but isn't InnoDB supposed to take care of this? Surely a hard power-off is not something that's very rare, at least not in my mind. I'm running the EXT4 filesystem.
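
For reference, these are the settings I compare on both servers when checking durability behaviour (standard MySQL variables; how they interact is summarised in the comments as I understand it):

-- Check the durability-related settings on both masters.
SHOW VARIABLES LIKE 'sync_binlog';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- innodb_flush_log_at_trx_commit = 1 makes InnoDB's own redo log
-- durable at every commit, but the binary log that replication reads
-- is flushed separately, which is what sync_binlog controls.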

So far this is nothing but a lab setup; in reality we run this in state-of-the-art (not submerged …) data centers with all the necessary precautions.

Any light shed on the matter would be greatly appreciated, thank you.

my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
symbolic-links=0
sync_binlog=1

# REPLICATION SETTINGS
server_id = 2
replicate-same-server-id = 0     # skip events that carry our own server id
auto-increment-increment = 2     # two masters: step auto-increment by 2
auto-increment-offset = 2        # this master takes the even values
replicate-do-db = test           # only apply replicated statements for `test`
binlog-ignore-db = mysql         # don't binlog changes made in the mysql schema

log-bin=db2-mysql-bin
relay-log=db2-relay-log
relay-log-index=relay-log-index

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

UPDATE

I've now switched the filesystem from EXT4 to XFS, and that did indeed take care of the data loss, but now I have another problem, albeit a small one that should be easy to solve.

After I go through the procedure (shut off server1, fail over to server2, start server1, fail back to server1), everything keeps working brilliantly and server1 picks up exactly where server2 left off. The only problem is that server2 stops syncing from server1, so the break is now in the other direction.

If I run STOP SLAVE; START SLAVE; on server2, it starts syncing and within a few seconds is identical to server1 again, but why doesn't it do this automatically?
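
For what it's worth, this is how I check where server2 is stuck before the manual restart; the commands are standard, and which fields matter is my own reading of the situation:

-- On server2, after server1 comes back:
SHOW SLAVE STATUS\G
-- Fields to watch: Slave_IO_Running, Slave_SQL_Running, Last_IO_Error.
-- A slave that merely lost its connection retries on its own, but an
-- I/O thread that stopped with an error stays stopped until kicked:
START SLAVE;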

Best Answer

Replication and binary logging happen independently of InnoDB, which can unfortunately cause problems.

Check out: http://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_sync_binlog

From what you describe, I suspect that sync_binlog is set to 0 on your servers. Leaving it at 0 means MySQL relies on the filesystem to handle flushing to disk. Effectively, this means the binlog data will often sit in the filesystem cache; the kernel flushes that to disk at some interval, but in the case of a power failure anything still in the cache is lost.

Setting sync_binlog to 1 forces MySQL to flush each binlog event to disk using fdatasync after every commit. This is safer (you will lose at most one transaction in a power failure) but creates a lot more disk activity. Benchmark it and see what the impact is for your workload; knowing the trade-offs of both settings will hopefully help you make an informed decision.
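
If you want to test this quickly, both of the relevant knobs are dynamic variables, so you can flip them at runtime before persisting anything in my.cnf (a sketch; the values shown are the safe-but-slower settings):

-- Toggle durability settings at runtime to measure their cost.
SET GLOBAL sync_binlog = 1;                     -- fsync binlog at every commit
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- fsync InnoDB redo log at every commit
-- Re-run the insert loop against each combination, compare
-- inserts/second, then persist the chosen values in my.cnf.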

Hope that helps.