MySQL – Master-Master Replication with MySQL Cluster – Duplicate Writes in Both Directions

Tags: high-availability, multi-master, MySQL, mysql-cluster, ndbcluster

I have created two MySQL Clusters, each located in a separate geographic location. I have set these clusters up for master-master, geographic replication, so that data can be written to either of the two SQL nodes in each data center (four SQL nodes total across the two clusters) and will be created in both clusters, no matter where it is inserted.

The issue I am seeing after enabling both master and slave functionality on all SQL nodes is that when I create data in one cluster, it is written twice to the other cluster.

For instance:

[sql_node_1_in_cluster_1] mysql> insert into numbers values (3,30),(4,40);
[sql_node_1_in_cluster_1] mysql> select * from numbers order by num1;
+------+------+
| num1 | num2 |
+------+------+
|    1 |   10 |
|    2 |   20 |
|    3 |   30 |
|    4 |   40 |
+------+------+
4 rows in set (0.00 sec)

This data looks as expected from the first MySQL Cluster, but when I run a query in the other database cluster I see this:

[sql_node_1_in_cluster_2] mysql> select * from numbers order by num1;
+------+------+
| num1 | num2 |
+------+------+
|    1 |   10 |
|    2 |   20 |
|    3 |   30 |
|    3 |   30 |
|    4 |   40 |
|    4 |   40 |
+------+------+
6 rows in set (0.01 sec)

The results are similar if I do inserts from the second cluster: the data looks normal when I query it from the cluster it was originally written to (the master), but when I query the other cluster, I see that it was written twice.

Method for Setting Up Replication:

After creating a MySQL Cluster in each data center, I enabled log-bin and gave each SQL node a unique server-id in its my.cnf file. I then created replication grants on each SQL node: SQL Node 1 in Data Center 1 grants REPLICATION SLAVE to SQL Node 1 in Data Center 2, and SQL Node 1 in Data Center 2 grants REPLICATION SLAVE to SQL Node 1 in Data Center 1, so that the two nodes act as slaves of one another. The same method was carried out for SQL Node 2 in each MySQL Cluster.
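Roughly, the my.cnf settings looked like the following on each SQL node (the exact server-id value is illustrative; every node got its own unique value):

[mysqld]
# existing NDB settings (ndbcluster, connect string, etc.) left unchanged
log-bin
server-id=11    # example value; must be unique on every SQL node (the output further down shows 11 in use on sqlnode1_cluster2)

Then, on each SQL node, the peer node in the other cluster was allowed to replicate from it. For example, on SQL Node 1 in Data Center 1 (the rep1 user, password, and host names match the CHANGE MASTER statements below):

[sql_node_1_in_cluster_1] mysql> GRANT REPLICATION SLAVE ON *.*
-> TO 'rep1'@'sqlnode1_cluster2' IDENTIFIED BY 'password';

With that in place, I changed masters on each SQL node, like so: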

Example for First Bi-Directional Replication Channel:

SQL Node 1 in Data Center 1:

[sql_node_1_in_cluster_1] mysql> CHANGE MASTER TO
-> MASTER_HOST = 'sqlnode1_cluster2',
-> MASTER_USER = 'rep1',
-> MASTER_PASSWORD = 'password',
-> MASTER_LOG_FILE = 'sqlnode1_cluster2-bin.000007',
-> MASTER_LOG_POS = 120;
Query OK, 0 rows affected, 2 warnings (0.02 sec)

[sql_node_1_in_cluster_1] mysql> start slave;

SQL Node 1 in Data Center 2:

[sql_node_1_in_cluster_2] mysql> CHANGE MASTER TO
-> MASTER_HOST = 'sqlnode1_cluster1',
-> MASTER_USER = 'rep1',
-> MASTER_PASSWORD = 'password',
-> MASTER_LOG_FILE = 'sqlnode1_cluster1-bin.000007',
-> MASTER_LOG_POS = 120;
Query OK, 0 rows affected, 2 warnings (0.02 sec)

[sql_node_1_in_cluster_2] mysql> start slave;

This same process was completed for the second SQL node in each data center. As a result, I expect each SQL node to be both a master and a slave, with SQL Node 1 in each cluster forming the first bi-directional replication channel and SQL Node 2 in each cluster forming the second bi-directional replication channel.
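For reference, the second channel on SQL Node 2 was configured as the mirror image of the first; roughly (the sqlnode2_* host names, log file, and position here are placeholders following the same naming pattern, not the exact values used):

[sql_node_2_in_cluster_1] mysql> CHANGE MASTER TO
-> MASTER_HOST = 'sqlnode2_cluster2',
-> MASTER_USER = 'rep1',
-> MASTER_PASSWORD = 'password',
-> MASTER_LOG_FILE = 'sqlnode2_cluster2-bin.000007',
-> MASTER_LOG_POS = 120;

[sql_node_2_in_cluster_1] mysql> start slave;

with the equivalent statements on SQL Node 2 in Data Center 2 pointing back at sqlnode2_cluster1.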

The output of show master status and show slave status\G looks good to me for each slave and its respective master, with each slave pointing at its master's current binary log file and position:

SQL Node 1, Cluster 2

[sql_node_1_in_cluster_2] mysql> show master status;
+------------------------------+----------+--------------+------------------+-------------------+
| File                         | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------------------+----------+--------------+------------------+-------------------+
| sqlnode1_cluster2-bin.000011 |     8868 |              |                  |                   |
+------------------------------+----------+--------------+------------------+-------------------+
1 row in set (0.01 sec)

SQL Node 1, Cluster 1

[sql_node_1_in_cluster_1] mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: sqlnode1_cluster2
                  Master_User: rep1
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: sqlnode1_cluster2-bin.000011
          Read_Master_Log_Pos: 8868
               Relay_Log_File: sqlnode1_cluster1-relay-bin.000002
                Relay_Log_Pos: 295
        Relay_Master_Log_File: sqlnode1_cluster2-bin.000011
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 8868
              Relay_Log_Space: 475
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 11
                  Master_UUID: 2ac24025-84ae-11e5-becf-0050568daf33
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

ERROR: 
No query specified

To reiterate the issue at hand: when I create data in one cluster, it is created twice in the other cluster. Does anyone have a suggestion as to why this is happening?

Best Answer

The problem lies in starting the slaves on SQL Node 2 in each data center. MySQL Cluster does support multi-master replication, as seen in the first diagram on the documentation page; notice how only one SQL node in each cluster participates in the circular replication topology.
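In your topology, that means the second channel (SQL Node 2 <-> SQL Node 2) should stay configured but not running while the first channel is healthy. A minimal sketch, assuming SQL Node 2 carries the secondary channel exactly as you described:

[sql_node_2_in_cluster_1] mysql> STOP SLAVE;   -- keep the CHANGE MASTER configuration, just stop the channel
[sql_node_2_in_cluster_2] mysql> STOP SLAVE;   -- same on the other end of the secondary channel

After that, rows written to one cluster should arrive in the other cluster only once, over the single running channel.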

The duplicate rows themselves are explained by this warning from the documentation page on setting up two replication channels (emphasis mine):

Warning Only the primary channel is to be started at this point. The secondary replication channel is to be started only in the event that the primary replication channel fails, as described in Section 18.6.8, “Implementing Failover with MySQL Cluster Replication”. Running multiple replication channels simultaneously can result in unwanted duplicate records being created on the replication slaves.
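If the primary channel does fail, the documented approach is to find the most recent epoch applied on the secondary channel's slave and the matching binlog position on its master before starting that slave. A rough sketch of the idea (table and column names are from the MySQL Cluster replication chapter; the exact query differs between versions, and the values have to be carried between servers by hand):

-- 1) On the secondary channel's slave: most recent epoch applied from the cluster.
mysql> SELECT @latest := MAX(epoch) FROM mysql.ndb_apply_status;

-- 2) On the secondary channel's master: binlog file and position for the first
--    epoch not yet applied (substitute the value of @latest found in step 1).
mysql> SELECT @file := SUBSTRING_INDEX(File, '/', -1), @pos := Position
-> FROM mysql.ndb_binlog_index
-> WHERE epoch > @latest
-> ORDER BY epoch ASC LIMIT 1;

-- 3) Back on the slave: point the secondary channel at that file/position
--    (again substituting the actual values found above) and start it.
mysql> CHANGE MASTER TO MASTER_LOG_FILE = '<file>', MASTER_LOG_POS = <pos>;
mysql> START SLAVE;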

Finally, I would caution you about enabling master-master replication between multiple MySQL Clusters at all. Replication in this manner is still asynchronous, which means that at some point you are going to run into conflicts between writes made on the two clusters.
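If you do keep writing to both clusters, look at NDB's conflict detection and resolution functions (NDB$EPOCH() and related), which are driven by rows in the mysql.ndb_replication control table. A minimal sketch, assuming your test table is test.numbers; the table definition follows the conflict-resolution section of the manual, and binlog_type 7 (full row images, logged as updates) follows its examples, so double-check both against your version:

-- Control table for per-table binary logging and conflict resolution (create if absent).
mysql> CREATE TABLE IF NOT EXISTS mysql.ndb_replication (
-> db VARBINARY(63),
-> table_name VARBINARY(63),
-> server_id INT UNSIGNED,
-> binlog_type INT UNSIGNED,
-> conflict_fn VARBINARY(128),
-> PRIMARY KEY USING HASH (db, table_name, server_id)
-> ) ENGINE=NDB PARTITION BY KEY(db, table_name);

-- Request epoch-based conflict detection on test.numbers (server_id 0 = all SQL nodes).
mysql> INSERT INTO mysql.ndb_replication
-> VALUES ('test', 'numbers', 0, 7, 'NDB$EPOCH()');

With NDB$EPOCH(), one cluster acts as the primary whose changes win conflicts, so it is worth deciding up front which data center that should be.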