I'm using 10.1.16-MariaDB-1~xenial from the official MariaDB apt repository for 10.1 [stable], via the University of Texas mirror.
I had a perfectly functioning MariaDB Galera cluster setup on 3 Ubuntu 16.04 servers.
Then I upgraded them. Now I have nothing.
The upgrade to 10.1.16 failed, and quickly brought down the whole cluster. I don't have the output, but dpkg failed on setting up mariadb-server and mariadb-server-10.1.
I have backups, so I purged all traces of MariaDB/MySQL/Galera from my servers (including removing /var/lib/mysql/, /etc/mysql/, and /var/log/mysql/) and started over. However, now, with a clean install on each server, none of the standard system startup scripts work. I suspect this is why the upgrade process through apt failed, too.
I've tried each of the following on my first node:
galera_new_cluster
service mysql bootstrap
service mysql bootstrap --wsrep-new-cluster
service mysql bootstrap --wsrep-cluster-address="gcomm://"
service mysql start
service mysql start --wsrep-new-cluster
service mysql start --wsrep-cluster-address="gcomm://"
systemctl start mariadb
systemctl start mariadb --wsrep-new-cluster
systemctl start mariadb --wsrep-cluster-address="gcomm://"
Every single one gives me the same output:
Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.
systemctl status mariadb.service:
● mariadb.service - MariaDB database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: failed (Result: exit-code) since Fri 2016-07-22 13:29:45 CDT; 42s ago
Process: 10799 ExecStartPre=/bin/sh -c VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=1/FAILURE)
Process: 10794 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 16865 (code=exited, status=0/SUCCESS)
Jul 22 13:29:41 sql2 systemd[1]: Starting MariaDB database server...
Jul 22 13:29:45 sql2 mysqld[10799]: WSREP: Failed to recover position: '2016-07-22 13:29:41 140110745778432 [Note] /usr/sbin/mysqld (mysqld 10.1.16-MariaDB-1~xenial) starting as process 11080 ...'
Jul 22 13:29:45 sql2 systemd[1]: mariadb.service: Control process exited, code=exited status=1
Jul 22 13:29:45 sql2 systemd[1]: Failed to start MariaDB database server.
Jul 22 13:29:45 sql2 systemd[1]: mariadb.service: Unit entered failed state.
Jul 22 13:29:45 sql2 systemd[1]: mariadb.service: Failed with result 'exit-code'.
The only way I can start my servers now is by manually executing:
sudo -u mysql mysqld --wsrep-cluster-address='gcomm://'
on the first node, and then:
sudo -u mysql mysqld --wsrep-cluster-address='gcomm://ip1,ip2,ip3'
on the other two nodes. That works, and I have a working cluster again. But now, systemd/systemctl have no idea the service is running. It seems like the systemd startup scripts can't use the wsrep-cluster-address setting in my configuration files at all. Specifying it on the service or systemctl command line does not work either.
How am I supposed to start mariadb?
Best Answer
There was a bug in the galera_recovery.sh script: https://jira.mariadb.org/browse/MDEV-10396
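To see why a problem in that script keeps the service from starting at all, look at the first ExecStartPre line in the status output from the question: it runs /usr/bin/galera_recovery and exits with status 1 if the script returns non-zero, so systemd gives up before mysqld is ever launched. Here is a minimal sketch of that gate. The function name `prestart_gate` and the placeholder position string are made up for illustration, and `echo` stands in for the real `systemctl set-environment` call:

```shell
#!/bin/sh
# Sketch of the ExecStartPre gate shown in the unit's status output.
# prestart_gate is a hypothetical name; "$@" stands in for /usr/bin/galera_recovery.
prestart_gate() {
    # Capture the recovery command's stdout, as the unit does with VAR=`...`.
    VAR=$("$@")
    # The real unit runs "systemctl set-environment _WSREP_START_POSITION=$VAR"
    # on success and "exit 1" otherwise; echo and return stand in for those here.
    [ $? -eq 0 ] && echo "_WSREP_START_POSITION=$VAR" || return 1
}

# A recovery command that succeeds lets startup continue.
# "placeholder-position" is an illustrative value, not real galera_recovery output.
prestart_gate echo "placeholder-position"
echo "gate status: $?"

# A recovery command that fails blocks startup before mysqld is launched.
prestart_gate false
echo "gate status: $?"
```

This matches what the question shows: every `service` and `systemctl` variant goes through the same unit, so each one hits the same failing pre-start step, while running mysqld by hand bypasses it.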