MariaDB – My Galera Cluster does not connect across AWS regions

aws galera mariadb

  • I am building a MariaDB Galera Cluster on AWS EC2, using MariaDB 10.5 and Galera 4.
  • I have two nodes in the California region, named galera_ca and galera_ca2, and a third node in Oregon named galera_or
  • The security groups between the servers are wide open, and the ping time between OR and CA is 12ms
  • I started galera_ca using galera_new_cluster
  • I started galera_ca2 using systemctl start mariadb

The two California nodes connect and share data just fine.
wsrep_cluster_size indicates that I have two nodes in the cluster.

  • I started galera_or using systemctl start mariadb

The Oregon node then hangs on startup for a very long time. wsrep_cluster_size shows three nodes, but I cannot connect to the Oregon server.

Eventually systemctl start mariadb returns this message:
Job for mariadb.service failed because a fatal signal was delivered to the control process. See "systemctl status mariadb.service" and "journalctl -xe" for details.

systemctl status mariadb.service doesn't show anything exceptional.

journalctl -xe shows:
requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily unavailable

wsrep_cluster_size still shows three nodes, but MariaDB is not running in OR.
If I try to shut down either of the California servers with systemctl stop mariadb, the command never returns.

All three servers have basically the same configuration:

[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so

wsrep_node_name='galera_ca'
wsrep_node_address="INTERNAL ADDRESS OF THIS NODE"

wsrep_cluster_name='galera-experiment'
wsrep_cluster_address="gcomm://EXTERNAL ADDRESS OF NODE CA, EXTERNAL ADDRESS OF NODE CA2, EXTERNAL ADDRESS OF NODE OR"

wsrep_provider_options="gcache.size=300M; gcache.page_size=300M"
wsrep_slave_threads=4
wsrep_sst_method=rsync

binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

If I try just one CA node and the OR node, the result is the same.

Best Answer

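On Amazon EC2, set wsrep_node_address to the instance's public DNS name instead of its internal IP address. wsrep_node_address is the address a node advertises to the rest of the cluster, and an internal address is not reachable from another region. That would explain why the Oregon node shows up in wsrep_cluster_size, having reached the group through the external addresses in wsrep_cluster_address, while the state transfer to it never starts.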
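
As a rough illustration, the change for the Oregon node might look like the fragment below. The hostname is a hypothetical placeholder in the usual EC2 public DNS format, not a value from the question; each node would use its own public DNS name and its own wsrep_node_name.

```ini
[galera]
# This node's name; galera_or is the Oregon node from the question.
wsrep_node_name='galera_or'

# Hypothetical placeholder: replace with this instance's own public DNS name.
# The question had this node's internal address here.
wsrep_node_address="ec2-203-0-113-25.us-west-2.compute.amazonaws.com"
```

All other settings stay as shown in the question. The same change is made on galera_ca and galera_ca2, and MariaDB must be restarted on each node for it to take effect.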