MySQL – Galera Cluster Setup – Primary and Secondary Site Scenario

galera, high-availability, mariadb, multi-master, mysql

I'm very new to Galera Cluster and am exploring a potential setup with reasonable resiliency to node failure and network failure. Looking at the very bottom of this documentation, the Weighted Quorum for a Primary and Secondary Site scenario looks quite promising. For ease of reading, I've extracted the setup from the document as follows:

When configuring quorum weights for primary and secondary sites, use
the following pattern:

Primary Site:
  node1: pc.weight = 2
  node2: pc.weight = 2

Secondary Site:
  node3: pc.weight = 1
  node4: pc.weight = 1

Under this pattern, some nodes are located at the primary site while
others are at the secondary site. In the event that the secondary site
goes down or if network connectivity is lost between the sites, the
nodes at the primary site remain the Primary Component. Additionally,
either node1 or node2 can crash without the rest of the nodes becoming
non-primary components.
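For reference, a weight like this is set through `wsrep_provider_options`; a minimal sketch of the relevant my.cnf fragment for a primary-site node (file location and surrounding wsrep options omitted):

```ini
# my.cnf on node1 / node2 (primary site)
[mysqld]
wsrep_provider_options="pc.weight=2"
```

Since `pc.weight` is a dynamic option, it should also be possible to change it at runtime with `SET GLOBAL wsrep_provider_options='pc.weight=2';`.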

But there seem to be two drawbacks:

  1. If two nodes fail and one of them happens to be on the primary site, the surviving weight is <= 50% of the total, and the remaining two nodes become non-primary components.
  2. Although pc.weight is a dynamic option that can be changed while the server is running, flipping the roles of the primary and secondary sites requires modifying all nodes, which is a bit troublesome.
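The arithmetic behind drawback 1 can be sketched as follows. This is a simplified model (Galera's actual quorum calculation also accounts for the membership of the last Primary Component); the node names and weights are taken from the quoted example:

```python
# Weights from the quoted primary/secondary site example.
weights = {"node1": 2, "node2": 2, "node3": 1, "node4": 1}

def has_quorum(survivors, weights):
    """True if the surviving nodes hold a strict majority
    (> 50%) of the total cluster weight."""
    total = sum(weights.values())
    return sum(weights[n] for n in survivors) > total / 2

# One primary-site node and one secondary-site node fail:
# surviving weight 3 of 6 is not a strict majority.
print(has_quorum({"node2", "node4"}, weights))  # False

# Only the secondary site is lost:
# surviving weight 4 of 6 is a strict majority.
print(has_quorum({"node1", "node2"}, weights))  # True
```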

So I've come up with another idea – leave the weight at 1 for all nodes, and add a Galera Arbitrator at the primary site. In this case:

  • The primary site will remain the Primary Component on network issue,
    just like the original setup.
  • The cluster still functions even if two nodes fail.
  • Flipping between primary and secondary site just requires moving the Galera Arbitrator.

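For context, running the arbitrator on a primary-site host would look roughly like this (the cluster name and node addresses are placeholders; check your distribution's garbd packaging for the service wrapper):

```shell
# Join the cluster as an arbitrator: it votes in quorum
# calculations but stores no data.
garbd --group my_galera_cluster \
      --address "gcomm://node1,node2,node3,node4" \
      --daemon
```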
May I know if there's anything wrong with my idea, or if there would be any practical difficulties? I'd appreciate it if you could share your thoughts.

Best Answer

"Weighting" was added late in the game, when they realized that a 2-datacenter setup was too vulnerable. (3 datacenters is resilient, and can use garbd in one of them.) The example you quote is resilient to any single server, datacenter, or network outage.

As I read the last sentence of the quote, node1 or node2 died but the other three nodes are alive and talking to each other. That is, there is a Quorum, and the system is still reliable.

However, I agree that the sentence is ambiguous -- it can also be read as: after the network died, node1 or node2 died. This leaves three clumps: (node1), (node2), (node3, node4), each with a weight of 2. None should be considered "Primary" because none has a quorum (no clump holds more than half of the total weight of 6).

You bring up garbd, yet it is not in the example you quoted. And where would you put it?

You should not be changing the configuration while the system is hobbled -- you should be fixing the broken components.

The main goal is to tolerate a single point of failure -- a single node, the network, or a data center. It would take a really large and complex system to survive two simultaneous failures. For example, I think it would require 5 datacenters to survive 2 network failures.

So, focus on a single point of failure.