Percona XtraDB Cluster – Replication Stops Working, Cluster in ‘Synced’ State

galera mariadb replication

We are trying to populate an empty Galera cluster using myloader. We use the latest version of myloader from the repo trunk, with the following command:

/root/mydumper/myloader -d /root/dump/export-20140609-133618/ -h localhost -q 1 -u root -p XXXXXXXXX -t 8 -o -v 3

Our cluster config is:

[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address            = 0.0.0.0
key_buffer              = 16M
max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8
myisam-recover         = BACKUP
#query_cache_limit      = 1M
query_cache_size        = 0
binlog_format           = ROW
innodb_autoinc_lock_mode=2
log_error = /var/log/mysql/error.log
expire_logs_days        = 10
max_binlog_size         = 100M

innodb_buffer_pool_size         = 4096M
innodb_flush_log_at_trx_commit  = 2
innodb_log_file_size    = 128M
innodb_log_buffer_size  = 32M
default-storage-engine  = InnoDB
innodb_file_per_table = 1
max_connections = 250
tmp_table_size=128M
max_heap_table_size=128M
wsrep_provider=/usr/lib/galera/libgalera_smm.so 
wsrep_provider_options = "evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; gcache.size=256M"
wsrep_cluster_address=gcomm://
#wsrep_cluster_address=gcomm://10.0.2.230,10.0.3.66,10.0.4.124
wsrep_cluster_name="cluster" 
wsrep_node_address="10.0.2.230" 
wsrep_node_name="cluster_1" 
wsrep_sst_method=xtrabackup 
wsrep_sst_auth="root:XXXXXXXXX" 

DDL statements seem to be replicated properly, but replication effectively stops as soon as myloader starts to load data into the tables. There is nothing relevant in the MySQL error logs. The cluster status looks like this:

MariaDB [XXXXX]> show status like 'wsrep%';
+------------------------------+--------------------------------------+
| Variable_name                | Value                                |
+------------------------------+--------------------------------------+
| wsrep_local_state_uuid       | 79929fd2-f06f-11e3-bd73-1a98fe06672f |
| wsrep_protocol_version       | 5                                    |
| wsrep_last_committed         | 6155                                 |
| wsrep_replicated             | 1                                    |
| wsrep_replicated_bytes       | 265                                  |
| wsrep_repl_keys              | 2                                    |
| wsrep_repl_keys_bytes        | 39                                   |
| wsrep_repl_data_bytes        | 162                                  |
| wsrep_repl_other_bytes       | 0                                    |
| wsrep_received               | 6171                                 |
| wsrep_received_bytes         | 2948301                              |
| wsrep_local_commits          | 0                                    |
| wsrep_local_cert_failures    | 0                                    |
| wsrep_local_replays          | 0                                    |
| wsrep_local_send_queue       | 0                                    |
| wsrep_local_send_queue_avg   | 0.000000                             |
| wsrep_local_recv_queue       | 0                                    |
| wsrep_local_recv_queue_avg   | 0.347107                             |
| wsrep_local_cached_downto    | 1                                    |
| wsrep_flow_control_paused_ns | 0                                    |
| wsrep_flow_control_paused    | 0.000000                             |
| wsrep_flow_control_sent      | 0                                    |
| wsrep_flow_control_recv      | 0                                    |
| wsrep_cert_deps_distance     | 1.000000                             |
| wsrep_apply_oooe             | 0.000000                             |
| wsrep_apply_oool             | 0.000000                             |
| wsrep_apply_window           | 1.000000                             |
| wsrep_commit_oooe            | 0.000000                             |
| wsrep_commit_oool            | 0.000000                             |
| wsrep_commit_window          | 1.000000                             |
| wsrep_local_state            | 4                                    |
| wsrep_local_state_comment    | Synced                               |
| wsrep_cert_index_size        | 1537                                 |
| wsrep_causal_reads           | 0                                    |
| wsrep_cert_interval          | 0.000000                             |
| wsrep_incoming_addresses     | ,10.0.4.124:3306,10.0.2.230:3306     |
| wsrep_cluster_conf_id        | 4                                    |
| wsrep_cluster_size           | 3                                    |
| wsrep_cluster_state_uuid     | 79929fd2-f06f-11e3-bd73-1a98fe06672f |
| wsrep_cluster_status         | Primary                              |
| wsrep_connected              | ON                                   |
| wsrep_local_bf_aborts        | 0                                    |
| wsrep_local_index            | 2                                    |
| wsrep_provider_name          | Galera                               |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>    |
| wsrep_provider_version       | 25.3.5-wheezy(rXXXX)                 |
| wsrep_ready                  | ON                                   |
+------------------------------+--------------------------------------+
47 rows in set (0.00 sec)

The output above is from a slave node. On the master node, the status looks like this:

MariaDB [XXXXX]> show status like 'wsrep%';
+------------------------------+--------------------------------------+
| Variable_name                | Value                                |
+------------------------------+--------------------------------------+
| wsrep_local_state_uuid       | 79929fd2-f06f-11e3-bd73-1a98fe06672f |
| wsrep_protocol_version       | 5                                    |
| wsrep_last_committed         | 6155                                 |
| wsrep_replicated             | 6120                                 |
| wsrep_replicated_bytes       | 2931465                              |
| wsrep_repl_keys              | 12232                                |
| wsrep_repl_keys_bytes        | 238616                               |
| wsrep_repl_data_bytes        | 2301169                              |
| wsrep_repl_other_bytes       | 0                                    |
| wsrep_received               | 3                                    |
| wsrep_received_bytes         | 273                                  |
| wsrep_local_commits          | 0                                    |
| wsrep_local_cert_failures    | 0                                    |
| wsrep_local_replays          | 0                                    |
| wsrep_local_send_queue       | 0                                    |
| wsrep_local_send_queue_avg   | 0.000162                             |
| wsrep_local_recv_queue       | 0                                    |
| wsrep_local_recv_queue_avg   | 0.000000                             |
| wsrep_local_cached_downto    | 36                                   |
| wsrep_flow_control_paused_ns | 0                                    |
| wsrep_flow_control_paused    | 0.000000                             |
| wsrep_flow_control_sent      | 0                                    |
| wsrep_flow_control_recv      | 0                                    |
| wsrep_cert_deps_distance     | 1.000000                             |
| wsrep_apply_oooe             | 0.000000                             |
| wsrep_apply_oool             | 0.000000                             |
| wsrep_apply_window           | 1.000000                             |
| wsrep_commit_oooe            | 0.000000                             |
| wsrep_commit_oool            | 0.000000                             |
| wsrep_commit_window          | 1.000000                             |
| wsrep_local_state            | 4                                    |
| wsrep_local_state_comment    | Synced                               |
| wsrep_cert_index_size        | 1537                                 |
| wsrep_causal_reads           | 0                                    |
| wsrep_cert_interval          | 0.000000                             |
| wsrep_incoming_addresses     | ,10.0.4.124:3306,10.0.2.230:3306     |
| wsrep_cluster_conf_id        | 4                                    |
| wsrep_cluster_size           | 3                                    |
| wsrep_cluster_state_uuid     | 79929fd2-f06f-11e3-bd73-1a98fe06672f |
| wsrep_cluster_status         | Primary                              |
| wsrep_connected              | ON                                   |
| wsrep_local_bf_aborts        | 0                                    |
| wsrep_local_index            | 1                                    |
| wsrep_provider_name          | Galera                               |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>    |
| wsrep_provider_version       | 25.3.5-wheezy(rXXXX)                 |
| wsrep_ready                  | ON                                   |
+------------------------------+--------------------------------------+
47 rows in set (0.00 sec)
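
The two status dumps above are long, and the differences are easy to miss by eye. As a rough aid, the following sketch parses pasted `SHOW STATUS LIKE 'wsrep%'` output and lists only the variables whose values differ between two nodes. The function names `parse_status` and `diff_status` are hypothetical helpers, not part of any MySQL or Galera tooling, and the sample rows are a small excerpt of the outputs above.

```python
import re

# Matches one data row of the mysql client's boxed output: "| name | value |".
ROW = re.compile(r"^\|\s*(\S+)\s*\|\s*(.*?)\s*\|\s*$")


def parse_status(text):
    """Return {variable_name: value} from SHOW STATUS output pasted as text."""
    status = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        # Skip border lines, prompts, and the "Variable_name | Value" header row.
        if match and match.group(1) != "Variable_name":
            status[match.group(1)] = match.group(2)
    return status


def diff_status(first, second):
    """Return {name: (first_value, second_value)} for variables that differ."""
    names = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }


# Excerpts copied from the slave and master outputs above.
slave_sample = """
| Variable_name                | Value                                |
| wsrep_last_committed         | 6155                                 |
| wsrep_replicated             | 1                                    |
| wsrep_received               | 6171                                 |
"""
master_sample = """
| Variable_name                | Value                                |
| wsrep_last_committed         | 6155                                 |
| wsrep_replicated             | 6120                                 |
| wsrep_received               | 3                                    |
"""

differences = diff_status(parse_status(slave_sample), parse_status(master_sample))
for name, (slave, master) in differences.items():
    print(f"{name}: slave={slave} master={master}")
```

Run against the full outputs, this kind of comparison makes it visible that wsrep_last_committed is 6155 on both nodes, while counters such as wsrep_replicated and wsrep_received differ between them.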

We have tried two different Percona XtraDB Cluster versions, with wsrep_provider_version 2.8(r165) and 2.10(r175), as well as MariaDB Galera Cluster with wsrep_provider_version 25.3.5 (I believe this is the latest version of the wsrep provider according to http://galeracluster.com/downloads/). We also set up two completely separate environments with different network latencies, but to no avail.
Please help us to identify and solve the problem.

Best Answer

We were able to resolve this by using mysqldump instead of mydumper/myloader. Apparently there is a bug in myloader or in Galera itself that prevents replication from flowing normally.
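
The answer does not say which mysqldump options were used, so the following is only a minimal sketch of what such a dump-and-load could look like, not the exact procedure. It assumes credentials come from an option file such as ~/.my.cnf, and the function names, the database name, and the host arguments are all hypothetical. `--single-transaction` and `--databases` are standard mysqldump options; the latter makes the dump include the CREATE DATABASE and USE statements, so the mysql client needs no database argument.

```python
import subprocess


def build_dump_command(database, user, host="localhost"):
    """Build the mysqldump argv (assumption: password comes from an option file)."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--databases",
        database,
    ]


def build_load_command(user, host="localhost"):
    """Build the mysql client argv that reads the dump from stdin."""
    return ["mysql", f"--host={host}", f"--user={user}"]


def dump_and_load(database, user, source_host, target_host):
    """Pipe mysqldump on source_host straight into mysql on target_host."""
    dump = subprocess.Popen(
        build_dump_command(database, user, source_host),
        stdout=subprocess.PIPE,
    )
    load = subprocess.Popen(
        build_load_command(user, target_host),
        stdin=dump.stdout,
    )
    # Close our copy of the pipe so mysqldump sees a broken pipe if mysql exits early.
    dump.stdout.close()
    load_status = load.wait()
    dump_status = dump.wait()
    if dump_status != 0 or load_status != 0:
        raise RuntimeError(
            f"dump exited with {dump_status}, load exited with {load_status}"
        )
```

Calling `dump_and_load("mydb", "root", "source.example", "10.0.2.230")` would stream the dump into one cluster node, where "mydb" and "source.example" are placeholders. Loading through a single node in this way sends all writes through the normal client path, so the other nodes should receive them as ordinary replicated transactions.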