MariaDB – Node Unable to Join Galera Cluster

Tags: centos-7, galera, mariadb, mariadb-10.1

I'm trying to set up a third node for my MariaDB cluster. I copied over the server.cnf file from my second node (which joins the cluster and can also start its own cluster). The first two nodes work fine with a similar file structure: I've tested replication, failing one server so the other takes over, and restarting that server. This is only a test cluster to help me understand how MariaDB/Galera works. I am using MariaDB 10.1.24 on three CentOS 7 virtual servers. In case it matters, I bootstrapped and used the first two servers before I had even received the third virtual server.

Problem

However, copying the file from the second server to the third does not appear to work, even after changing the node's name and address. I also edited the other two files so that the address was "gcomm://node1,node2,node3" (with each nodeX replaced by that server's IP address). I then started the node after bootstrapping the lead node, but it did not join the cluster. I could (and still can) access MySQL and do some testing, but when I checked the cluster size, I got this:

[root@node3 ~]# mysql -uroot -e "show status like 'wsrep_cluster_size';"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 0     |
+--------------------+-------+

The primary node does not report the new node joining the cluster either:

[root@node1 ~]# mysql -uroot -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+
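To check the cluster size from a script rather than by eye, a small helper can run the same query in batch mode and parse the result. This is a sketch assuming Python 3.7+ and a local mysql client that can log in as root without a password, as in the commands above; parse_status and cluster_size are hypothetical helper names. The -N and -B flags make mysql print plain tab-separated rows instead of the boxed table.

```python
import subprocess


def parse_status(batch_output):
    """Parse `mysql -N -B` output (one tab-separated name/value row per line) into a dict."""
    status = {}
    for line in batch_output.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("\t")
        status[name] = value
    return status


def cluster_size(user="root"):
    """Run SHOW STATUS for wsrep_cluster_size on the local server and return it as an int."""
    # -N skips the column-name header, -B prints tab-separated rows instead of a boxed table.
    result = subprocess.run(
        ["mysql", "-u" + user, "-N", "-B", "-e", "SHOW STATUS LIKE 'wsrep_cluster_size'"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(parse_status(result.stdout).get("wsrep_cluster_size", "0"))
```

Once all three nodes have joined, cluster_size() should return 3 on every node; a value of 0, as on node3 here, suggests that node is not part of any cluster component at all.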

What have I done before coming here?

I looked at several posts from people with similar issues, but confirmed their causes did not apply to mine. I checked that wsrep_on=ON, that the cluster address correctly points to the other nodes, that the node's name is unique among its cluster peers, and so on.
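Checks like these can be run across all three files at once. The sketch below is illustrative only: load_galera and check_cluster are hypothetical helper names, and it assumes each file is an INI-style server.cnf with a [galera] section like the ones shown below, and that wsrep_cluster_address lists bare IP addresses without ports or extra options.

```python
import configparser


def load_galera(text):
    """Parse server.cnf-style text and return its [galera] options with surrounding quotes removed."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    if not parser.has_section("galera"):
        return {}
    # Options given without a value (e.g. innodb_file_per_table) come back as None.
    return {k: (v.strip("'\"") if v is not None else None) for k, v in parser.items("galera")}


def check_cluster(configs):
    """configs maps a label (e.g. "node3") to that node's file text; returns a list of problems."""
    parsed = {label: load_galera(text) for label, text in configs.items()}
    addresses = {label: opts.get("wsrep_node_address") for label, opts in parsed.items()}
    problems = []
    seen_names = {}
    for label, opts in parsed.items():
        # wsrep_on must be enabled for the node to replicate at all.
        if (opts.get("wsrep_on") or "").upper() not in ("ON", "1"):
            problems.append(f"{label}: wsrep_on is not ON")
        # Assumption: the gcomm:// list holds bare IPs, no ports or ?options.
        cluster = (opts.get("wsrep_cluster_address") or "").replace("gcomm://", "", 1)
        members = [m.strip() for m in cluster.split(",")]
        for other, addr in addresses.items():
            if addr and addr not in members:
                problems.append(f"{label}: cluster address does not list {other} ({addr})")
        # Node names must be unique across the cluster.
        name = opts.get("wsrep_node_name")
        if not name:
            problems.append(f"{label}: wsrep_node_name is not set")
        elif name in seen_names:
            problems.append(f"{label}: node name {name!r} duplicates {seen_names[name]}")
        else:
            seen_names[name] = label
    return problems
```

Feeding it the text of all three server.cnf files, keyed by node name, should return an empty list for a consistent setup.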

I compared the two working files with the third node's file and they were identical (apart from each node's IP address and name, of course). To test whether the problem was with the cluster itself, I set this node up as its own cluster, but bootstrapping it still did nothing. I also experimented with the original cluster's server.cnf files, with no change, and have since reverted them to the versions shown below.

On CentOS 7, I have confirmed that SELinux is set to permissive and that firewalld is stopped and disabled for testing. I know these can cause trouble: when I set up the first server, it wouldn't bootstrap because of SELinux and firewalld, so this was one of the first things I checked.
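Since firewalld caused problems before, it can help to confirm from one node that the others accept TCP connections on the Galera ports. This sketch assumes the default ports are not overridden: 3306 for MySQL clients and 4567 for Galera group communication. Ports 4444 (SST) and 4568 (IST) normally only listen on a joiner while a transfer is in progress, so they are left out. port_open and check_node are hypothetical helper names.

```python
import socket

# Default ports (assumption: not overridden in server.cnf).
# 4444 (SST) and 4568 (IST) usually only listen on a joiner during a transfer, so they are omitted.
GALERA_PORTS = (3306, 4567)


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError) and unreachable hosts.
        return False


def check_node(host, ports=GALERA_PORTS, timeout=2.0):
    """Map each port to whether `host` accepts TCP connections on it."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, check_node("10.32.18.92") run from node1 should report True for both ports once node3's mysqld is running with Galera enabled.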

Node File

Here is the problem node's configuration file:

[mysqld]
bind-address=0.0.0.0
#
# * Galera-related settings
#
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
binlog_format=ROW
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
default_storage_engine=InnoDB
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_cluster_address="gcomm://10.32.18.90,10.32.18.91,10.32.18.92"
#wsrep_cluster_address="gcomm://"
wsrep_cluster_name='galera'
wsrep_node_address='10.32.18.92'
wsrep_node_name='node3'
wsrep_sst_method=rsync
#wsrep_sst_auth=test:PASS

And here is the second node's file, which the third node's file is based on:

[mysqld]
bind-address=0.0.0.0

#
# * Galera-related settings
#
[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
binlog_format=ROW
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
default_storage_engine=InnoDB
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_cluster_address="gcomm://10.32.18.90,10.32.18.91,10.32.18.92"
#wsrep_cluster_address="gcomm://"
wsrep_cluster_name='galera'
wsrep_node_address='10.32.18.91'
wsrep_node_name='node2'
wsrep_sst_method=rsync
#wsrep_sst_auth=test:PASS

Output of systemctl status mariadb.service -l

mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: active (running) since Mon 2017-06-26 10:38:50 EDT; 25s ago
  Process: 5488 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 5448 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
  Process: 5445 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 5460 (mysqld)
   Status: "Taking your SQL requests now..."
   CGroup: /system.slice/mariadb.service
           └─5460 /usr/sbin/mysqld
Jun 26 10:38:50 node3.libvirt mysqld[5460]: 2017-06-26 10:38:50 140610510633216 [Note] InnoDB: Highest supported file format is Barracuda.
Jun 26 10:38:50 node3.libvirt mysqld[5460]: 2017-06-26 10:38:50 140610510633216 [Note] InnoDB: 128 rollback segment(s) are active.
Jun 26 10:38:50 node3.libvirt mysqld[5460]: 2017-06-26 10:38:50 140610510633216 [Note] InnoDB: Waiting for purge to start
Jun 26 10:38:50 node3.libvirt mysqld[5460]: 2017-06-26 10:38:50 140610510633216 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.36-82.0 started; log sequence number 1616839
Jun 26 10:38:50 node3.libvirt mysqld[5460]: 2017-06-26 10:38:50 140609750824704 [Note] InnoDB: Dumping buffer pool(s) not yet started
Jun 26 10:38:50 node3.libvirt mysqld[5460]: 2017-06-26 10:38:50 140610510633216 [Note] Plugin 'FEEDBACK' is disabled.
Jun 26 10:38:50 node3.libvirt mysqld[5460]: 2017-06-26 10:38:50 140610510633216 [Note] Server socket created on IP: '::'.
Jun 26 10:38:50 node3.libvirt mysqld[5460]: 2017-06-26 10:38:50 140610510633216 [Note] /usr/sbin/mysqld: ready for connections.
Jun 26 10:38:50 node3.libvirt mysqld[5460]: Version: '10.1.24-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
Jun 26 10:38:50 node3.libvirt systemd[1]: Started MariaDB database server.
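systemctl status only prints the last few journal lines, and none of the lines above mention WSREP. When the Galera provider is loaded, mysqld normally logs messages prefixed with "WSREP:", so searching more of the journal for that prefix is a quick way to check whether the [galera] settings took effect at all. This is a sketch assuming Python 3.7+ and that the service logs to the systemd journal as shown above; wsrep_lines and journal_for_mariadb are hypothetical helper names.

```python
import subprocess


def wsrep_lines(log_text):
    """Return the log lines that carry the 'WSREP:' prefix used by the Galera provider."""
    return [line for line in log_text.splitlines() if "WSREP:" in line]


def journal_for_mariadb(lines=500):
    """Fetch the last `lines` journal entries for mariadb.service."""
    # -u selects the unit, -n limits the number of entries, --no-pager writes straight to stdout.
    result = subprocess.run(
        ["journalctl", "-u", "mariadb.service", "-n", str(lines), "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

If wsrep_lines(journal_for_mariadb()) comes back empty right after a restart, mysqld most likely never saw the wsrep settings.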

TL;DR

Problem/Question: The third node will not join the existing cluster, nor can it be bootstrapped as its own cluster. Server 3 uses essentially the same configuration as servers 1 and 2, yet both original servers work and can do everything a MariaDB cluster needs to do. What could I have done wrong during setup?

What I've Tried (no particular order):

  • Rebooting the server
  • Checking SELinux and firewalld to make sure they were permissive and down, respectively
  • Making sure the wsrep settings are correct (and on)
  • Making sure rsync was installed and at the newest version
  • Starting the problem node as its own cluster
  • Uninstalling and reinstalling MariaDB
  • This question from Server Fault
  • Galera's own documentation
  • MariaDB's own documentation
  • Up to, and including, the ninth page of Google, using different versions of the same question

If any more information is needed, please do say so. I want to make this as easy a process as I can for all parties.

Best Answer

So I managed to figure out the cause of the issue. I had apparently deleted /etc/my.cnf (not the /etc/my.cnf.d directory, just the my.cnf file). I didn't realize this until today, when I was trying to find the error log for eroomydna. The log wasn't being written because that file, which tells mysqld which directory to include, wasn't there, so the settings in /etc/my.cnf.d/server.cnf were never read. If anyone else has this issue and is also missing that file, create /etc/my.cnf and put this in it:

[client-server]
!includedir /etc/my.cnf.d

I realized the file was missing when I happened to open a shell on the first server: when I pressed Tab to autocomplete /etc/my.cnf.d/server.cnf, it didn't complete straight to my.cnf.d, because my.cnf also exists there.
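The fix above is easy to script so it can be checked on every node. This is a minimal sketch, assuming the default /etc/my.cnf location, root privileges for writing there, and the exact directive shown above; ensure_my_cnf is a hypothetical helper that creates the file only if it is missing and otherwise just reports whether the includedir line is present, without editing an existing file.

```python
from pathlib import Path

INCLUDE_LINE = "!includedir /etc/my.cnf.d"
DEFAULT_CONTENT = "[client-server]\n" + INCLUDE_LINE + "\n"


def ensure_my_cnf(path="/etc/my.cnf"):
    """Create my.cnf with the includedir directive if it is missing.

    Returns "created", "ok" (directive present) or "missing-includedir";
    an existing file is never modified.
    """
    cnf = Path(path)
    if not cnf.exists():
        cnf.write_text(DEFAULT_CONTENT)
        return "created"
    lines = [line.strip() for line in cnf.read_text().splitlines()]
    # Exact match only: a trailing slash or a different directory is reported as missing.
    return "ok" if INCLUDE_LINE in lines else "missing-includedir"
```

After it returns "created", restarting mariadb.service should make mysqld pick up /etc/my.cnf.d/server.cnf again.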