MariaDB – Automatically and robustly restarting a Galera cluster


Galera clusters, somewhat famously, require nodes to be shut down and brought back up in a certain order.

There are a few questions here and on other sites about recovering from unclean shutdowns, which can cause split-brain situations in the cluster. For manual restarts, I usually follow a procedure that relies on checking files and sometimes even database state, and which is, frankly, too fragile and too much manual work for my taste.

I think it should be possible to encode this manual procedure right into the systemd unit files of a cluster, so that when the failure condition is resolved and machines boot back up, the cluster members also come back up in the correct sequence automatically. This will definitely involve listing all cluster member nodes in some central location, but should be re-usable apart from that.

So, is there a commonly used, best-practice way to re-start a Galera cluster after any sort of shutdown, automatically during system startup?

Best Answer

I would say the best practice is to make sure that one node is always still running while the others are stopped normally (systemctl stop mysql or systemctl stop mariadb). Then start one of the previously stopped nodes again. Once that node reports in its log file that it has connected to the old, still-running node and is ready to serve connections, you can restart the old node as well.
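If you'd rather not watch the log, the same "has it joined and is it ready" check can be made from the wsrep status variables. Below is a minimal Python sketch, assuming you capture them on the restarted node with something like mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" (tab-separated name/value lines). The function names and the minimum cluster size of 2 are my own illustrative choices, not anything MariaDB provides.

```python
from __future__ import annotations


def parse_wsrep_status(text: str) -> dict[str, str]:
    """Parse tab-separated 'name<TAB>value' lines into a dict."""
    status: dict[str, str] = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("\t", 1)
        status[name.strip()] = value.strip()
    return status


def node_is_synced(status: dict[str, str], min_cluster_size: int = 2) -> bool:
    """True if the node is in the Primary component, Synced, ready,
    and sees at least `min_cluster_size` members (illustrative threshold)."""
    try:
        size = int(status.get("wsrep_cluster_size", "0"))
    except ValueError:
        return False
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
        and status.get("wsrep_ready") == "ON"
        and size >= min_cluster_size
    )


if __name__ == "__main__":
    sample = (
        "wsrep_cluster_size\t2\n"
        "wsrep_cluster_status\tPrimary\n"
        "wsrep_local_state_comment\tSynced\n"
        "wsrep_ready\tON\n"
    )
    print(node_is_synced(parse_wsrep_status(sample)))
```

Only once this returns True on the restarted node would I restart the old one; a node that is still "Joining" or "Joined" is not a safe partner yet.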

If all nodes have been stopped, you need to figure out manually which node has the newest state: on Ubuntu, run galera_recovery on each node and note which one reports the highest sequence number (seqno). You may even have to edit the safe_to_bootstrap line in that node's grastate.dat file (set it to 1).
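Once you have the galera_recovery output from every node, picking the newest one is mechanical. Below is a minimal Python sketch, assuming the output contains a --wsrep_start_position=<uuid>:<seqno> string and that you collect that output from each node yourself. The node names and helper functions are hypothetical.

```python
from __future__ import annotations

import re

# Matches e.g. "--wsrep_start_position=5981f182-a4cc-11e6-98cc-77fabedd360d:1234"
POSITION_RE = re.compile(r"--wsrep_start_position=([0-9a-f-]+):(-?\d+)")


def parse_start_position(output: str) -> tuple[str, int]:
    """Extract (cluster UUID, seqno) from one node's galera_recovery output."""
    match = POSITION_RE.search(output)
    if match is None:
        raise ValueError("no wsrep_start_position found")
    return match.group(1), int(match.group(2))


def pick_bootstrap_node(outputs: dict[str, str]) -> str:
    """Return the node with the highest recovered seqno.

    `outputs` maps node name -> galera_recovery output. All nodes must
    report the same cluster UUID; otherwise we refuse to choose.
    """
    positions = {node: parse_start_position(out) for node, out in outputs.items()}
    uuids = {uuid for uuid, _ in positions.values()}
    if len(uuids) != 1:
        raise ValueError(f"nodes disagree on cluster UUID: {sorted(uuids)}")
    return max(positions, key=lambda node: positions[node][1])


def mark_safe_to_bootstrap(grastate: str) -> str:
    """Return grastate.dat contents with safe_to_bootstrap set to 1."""
    return re.sub(r"(?m)^safe_to_bootstrap:\s*\d+", "safe_to_bootstrap: 1", grastate)
```

mark_safe_to_bootstrap only rewrites the text; I'd write the result back to grastate.dat on the chosen node only, with MariaDB stopped on all nodes.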

That node then needs to bootstrap the cluster, i.e. start a new cluster for the other nodes to connect to (on Ubuntu: galera_new_cluster).

The remaining nodes are then simply started with 'systemctl start mariadb'.
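The question asks for this to happen automatically at boot. One way to phrase the procedure above as a per-node decision rule is sketched below in Python. It assumes the central member list the question mentions, plus some mechanism (not shown, and not part of MariaDB) for each node to learn its peers' recovered seqno and whether they are already running. decide_startup_action and Action are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    """What a booting node should do (command strings from the answer)."""
    JOIN = "systemctl start mariadb"
    BOOTSTRAP = "galera_new_cluster"
    WAIT = "wait"  # retry later; not safe to act yet


def decide_startup_action(
    self_name: str,
    seqnos: dict[str, int | None],
    peers_running: set[str],
) -> Action:
    """Decide how `self_name` should start.

    seqnos: recovered seqno per cluster member (None = unknown/unreachable).
    peers_running: members already running and part of the cluster.
    """
    # Another member is already up: just join it with a normal start.
    if peers_running - {self_name}:
        return Action.JOIN
    # Without every member's position we cannot know who is newest.
    if not seqnos or any(seqno is None for seqno in seqnos.values()):
        return Action.WAIT
    highest = max(seqnos.values())
    # Deterministic tie-break (lowest name) so exactly one node bootstraps.
    bootstrapper = min(name for name, seqno in seqnos.items() if seqno == highest)
    return Action.BOOTSTRAP if self_name == bootstrapper else Action.WAIT
```

With seqnos {db1: 1200, db2: 1234, db3: 1190} and nothing running, db2 gets Action.BOOTSTRAP while db1 and db3 get Action.WAIT; once db2 is up, they get Action.JOIN. The WAIT branch for unknown positions is deliberately conservative: as long as any member's state is unknown, nobody bootstraps, trading availability for not risking a split brain.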

Sounds as if there's a reason they haven't built that into systemd yet ;)