Data Warehousing (your app) and OLTP are rather different, so I question the relevance of the benchmark. OLTP, if I am not mistaken, involves very short transactions, which can suffer from the cross-node syncing that Galera's wsrep does at COMMIT. Your insertion of new datapoints, on the other hand, could be done in batches of, say, 1000 rows, and not have to worry about that commit overhead.
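A minimal sketch of that batching idea: build one multi-row INSERT per 1000 datapoints, so the cluster pays Galera's COMMIT-time cost once per batch instead of once per row. The table and column names (sensor_data, ts, sensor_id, value) are hypothetical placeholders, not from your schema:

```python
# Sketch: group rows into multi-row INSERT statements so 1000 datapoints
# commit together, amortizing Galera's per-COMMIT certification overhead.
# Table/column names here are hypothetical placeholders.

def build_batch_inserts(rows, batch_size=1000):
    """Yield multi-row INSERT statements, at most batch_size rows each."""
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        values = ",".join("(%r,%r,%r)" % (ts, sid, val) for ts, sid, val in batch)
        yield "INSERT INTO sensor_data (ts, sensor_id, value) VALUES %s;" % values

# 2500 incoming datapoints -> 3 statements (1000 + 1000 + 500 rows)
rows = [(1700000000 + n, 7, 0.5 * n) for n in range(2500)]
stmts = list(build_batch_inserts(rows))
```

In a real client you would send each statement (or use the driver's executemany) inside its own transaction.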
With 60GB of RAM, innodb_buffer_pool_size should be more like 45G. And set innodb_buffer_pool_instances=16.
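As a my.cnf sketch (the 45G value assumes the 60GB box described; tune for your workload):

```ini
[mysqld]
# roughly 70-75% of RAM on a dedicated 60GB server
innodb_buffer_pool_size      = 45G
# split the pool to reduce mutex contention on a large pool
innodb_buffer_pool_instances = 16
```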
On the surface, 19K/sec (per node?) seems faster -- 19K*3 > 54K. But maybe I am misreading things.
Let's see your SHOW CREATE TABLE for the Fact table. There are a number of mistakes that can cause severe performance problems in your type of application.
When do you do the "consolidating"? And how? A Summary table (which I assume is what you are talking about) is key to efficiency. Sometimes it is even better to store the summary data, yet toss the raw data.
Check out my discussions of High speed ingestion and Summary tables. My point in those is that there are significant speedups that can (and should) be made aside from the underlying structure (hardware, Galera, etc.).
Your Galera settings are in the right direction; there may be extra tips in Galera.
More questions (which will lead to more discussion):
- How many clients are pumping data into the table?
- How many rows are inserted at once?
- How easy would it be to gather the rows and batch-insert them?
- Is any Normalization occurring during the insertion?
- Does it make sense to do the summarization in parallel with the insertion?
- Did you intend to have clients writing to all 3 nodes?
- Do you have some form of Proxy between the clients and the nodes?
As of MariaDB 10.1, where Galera is included, you need to start the first node with the --wsrep-new-cluster option,
e.g.
/etc/init.d/mysql start --wsrep-new-cluster
All other nodes start as usual:
/etc/init.d/mysql start
To determine which node needs to be bootstrapped, compare the wsrep_last_committed value on all DB nodes:
KVM-1> SHOW STATUS LIKE 'wsrep_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_last_committed | 21568       |
...
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
KVM-2> SHOW STATUS LIKE 'wsrep_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_last_committed | 1359        |
...
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
KVM-3> SHOW STATUS LIKE 'wsrep_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_last_committed | 537         |
...
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
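The comparison above amounts to picking the node with the highest wsrep_last_committed. A minimal sketch, with the node names and sequence numbers taken from the output above:

```python
# Sketch: pick the node to promote/bootstrap by comparing the
# wsrep_last_committed sequence numbers gathered from each node.
seqnos = {"KVM-1": 21568, "KVM-2": 1359, "KVM-3": 537}

# The node with the highest committed seqno holds the most up-to-date data.
bootstrap_node = max(seqnos, key=seqnos.get)
# -> "KVM-1"
```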
KVM-1 has the most up-to-date data. In this case, all Galera nodes are already started, so you don't necessarily need to bootstrap the cluster again; you just need to promote KVM-1 to be a Primary Component:
KVM-1> SET GLOBAL wsrep_provider_options="pc.bootstrap=1";
The remaining nodes will then reconnect to the Primary Component (KVM-1) and resync their data from this node.
Best Answer
As far as I have read, the right way is to use the init.d/systemd scripts of MySQL/MariaDB to stop a node. This informs WSREP/Galera that replication must stop, and the last transaction number the node has committed is saved to a file on disk in the MySQL/MariaDB data folder, so that on the next boot the nodes can compare their last committed transactions and sync.
You have to stop a node and wait a little (a minute or so) before stopping the next one, to be sure the other nodes have seen it leave.
After that, it is very important to start the nodes again in the reverse order, so that the last node you shut down, the one with the most up-to-date data, is the first to boot. The next nodes will then be able to sync from it.
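The rule above can be sketched as: boot nodes in the reverse of their shutdown order, so the node stopped last (the most up-to-date one) comes up first. The node names here are illustrative:

```python
# Sketch: the node stopped last holds the newest data, so it boots first.
shutdown_order = ["node1", "node2", "node3"]   # order in which nodes were stopped

# Restart in reverse: node3 bootstraps, then node2 and node1 join and sync.
startup_order = list(reversed(shutdown_order))
```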
Using wsrep_on='OFF' doesn't sound right to me. WSREP is embedded in MySQL, so when MySQL stops gracefully, WSREP knows what it has to do.