MySQL – Question about a two-node MariaDB Galera Cluster

galera mariadb mysql-cluster

Hypothetical situation:

Node 1 is where the app server writes its data … it is also the node the cluster was bootstrapped from. Node 2 is the second node in the cluster. Both nodes run in the same data center.

Let's say there was a network event; maybe a sysadmin pulled out a switch he wasn't supposed to pull … we don't know!

Now both nodes are momentarily offline, and both come back online after, say, 5 minutes of downtime, once the sysadmin realizes his mistake and hastily puts the switch back in.

Since both nodes were affected and came back online at about the same time … the cluster automatically corrected itself after spewing several messages like this:

WSREP: Quorum: No node with complete state

re-bootstrapping prim from partitioned components

WSREP: Full re-merge of primary 9j421w44-0e4c-12r5-7fa2-3b63f2c92b0p found

Nobody notices this slight hiccup and everyone assumes everything is fine.

Now, after a few days, one of the devs begins noticing that data being written to node 1 is being overwritten by older data. They pore through the app logs and finally the MySQL logs, and see the messages above.

They run this query on both nodes:

SHOW STATUS LIKE 'wsrep_last_committed';

They find that the value for node 1 lags behind the value for node 2 and conclude that node 2 is overwriting data from node 1.
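A single counter read at two different moments is thin evidence, since wsrep_last_committed keeps moving on a busy cluster and the two queries are never run at exactly the same instant. As a rough sketch of a fuller check (not something from the original post), the query below could be run on each node and the results compared side by side. The variable names are standard Galera status variables; the notes on what a healthy result looks like are my assumptions about a typical setup.

```sql
-- Run on node 1 and node 2, ideally while writes are paused.
SHOW GLOBAL STATUS
WHERE Variable_name IN (
    'wsrep_cluster_status',       -- 'Primary' if this node is in the primary component
    'wsrep_cluster_size',         -- number of nodes this node currently sees (2 expected here)
    'wsrep_cluster_conf_id',      -- membership change counter; should match on both nodes
    'wsrep_cluster_state_uuid',   -- cluster state UUID; should match on both nodes
    'wsrep_local_state_comment',  -- e.g. 'Synced' once the node has caught up
    'wsrep_last_committed'        -- seqno of the last transaction committed on this node
);
```

If both nodes report the same wsrep_cluster_state_uuid and wsrep_cluster_conf_id, a Primary status, and Synced, a small and shrinking gap in wsrep_last_committed is more likely to be timing than divergence.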

The database admin says "HOGWASH! THIS IS NOT POSSIBLE IN A SYNCHRONOUS REPLICATION CLUSTER SETUP!!"

My questions: is this hypothetical scenario possible? If it is, could the dev be correct in implying that node 2 is overwriting data from node 1?

Asking for a friend (who knows that having a 2 node cluster is just asking for split brain) …

Best Answer

I'd say highly unlikely. During the network partition, when the switch was pulled, quorum cannot be achieved (on an equally weighted quorum calculation) and the two-node cluster loses primary component status. That is exactly the reason not to run an equally weighted two-node setup. When there's no primary component, there's no DML. After the situation is fixed, the nodes find each other, form a primary component again, and continue synchronous replication.
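To make the quorum arithmetic concrete, here is a sketch assuming Galera's documented weighted-quorum rule and the default weight of 1 per node. The symbols are my own labels: p_i are the members of the last primary component, l_i the members known to have left gracefully, m_i the members of the component a node can currently see, and w_i their weights. The unequal-weight case at the end is purely hypothetical and is not part of the setup described in the question.

```latex
% A component stays primary only if it holds strictly more than half the weight:
\[
  \frac{\sum_i p_i w_i \;-\; \sum_i l_i w_i}{2} \;<\; \sum_i m_i w_i
\]

% Two equally weighted nodes, partitioned, neither left gracefully:
%   last primary weight = 1 + 1 = 2, each side sees only itself (weight 1)
\[
  \frac{2 - 0}{2} = 1 \;\not<\; 1
  \quad\Rightarrow\quad \text{neither node is primary, so neither accepts writes}
\]

% Hypothetical unequal weights (node 1 weight 2, node 2 weight 1):
\[
  \text{node 1: } \frac{3}{2} < 2 \ \text{(primary)}, \qquad
  \text{node 2: } \frac{3}{2} \not< 1 \ \text{(non-primary)}
\]
```

Under the equal-weight assumption, both sides fail the strict inequality, which is why the partition stops writes on both nodes instead of letting them diverge.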

So it is highly unlikely that they get out of sync this way.