1) Is a two node configuration safe/supported?
Galera will still run in a two node setup. However, there is always the threat of a split-brain scenario. For example, suppose DB1 and DB2 form a two node Galera Cluster. If DB1 goes down, you need to fail over to DB2. While DB1 is undergoing maintenance to bring it back up, DB2 receives all your changes. By the time DB1 is back up, an IST (Incremental State Transfer) is virtually impossible, because DB1 has fallen too far behind. You would have to perform a full SST (State Snapshot Transfer) from DB2. The easiest way to force this is to delete the Galera cache files on DB1 before starting it up. During the SST, DB2 is in a read-only state (as a donor, at least with a blocking SST method such as rsync) and cannot process any inserts, updates, or deletes. When there is a third node in the Cluster, at least one server can collect inserts, updates, and deletes.
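The IST-versus-SST decision described above can be sketched roughly as follows. This is a simplified model, not Galera's actual code: the function name and its parameters are made up for illustration, and it assumes the only inputs that matter are the cluster state UUID, the joiner's last committed seqno, and the oldest seqno the donor still holds in its write-set cache.

```python
def choose_state_transfer(joiner_uuid, joiner_seqno, donor_uuid, gcache_first_seqno):
    """Return "IST" or "SST" for a rejoining node (simplified, hypothetical model)."""
    # A different state UUID, or an unknown position (-1), means the joiner's
    # history cannot be trusted, so only a full snapshot is safe.
    if joiner_uuid != donor_uuid or joiner_seqno < 0:
        return "SST"
    # IST needs every write-set after joiner_seqno to still be cached on the donor.
    if joiner_seqno + 1 >= gcache_first_seqno:
        return "IST"
    # The donor has already discarded write-sets the joiner is missing.
    return "SST"


# DB1 was down briefly: the donor still caches everything after seqno 100.
print(choose_state_transfer("cluster-a", 100, "cluster-a", 90))
# DB1 was down for a long time: the donor's cache now starts at seqno 500.
print(choose_state_transfer("cluster-a", 100, "cluster-a", 500))
```

The longer DB1 stays down while DB2 takes writes, the more likely the second case becomes, which is why the answer above expects a full SST.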
You will also find "Percona XtraDB Cluster: Failure Scenarios with only 2 nodes" informative.
2) In a 3 node setup, if the third node dies, it becomes a two node setup. Is this now unsafe, meaning I should really have a 4 node setup?
A two node Cluster is still operational. When you introduce the third node for the first time, or bring it back after it has been down, only one of the two operating nodes enters a read-only state (as a donor) to help the third node catch up.
As I said in the first answer: when there is a third node in the Cluster, at least one server can collect inserts, updates, and deletes.
3) What exactly is a Galera "arbiter"?
I think you mean Arbitrator (see the subheading "Using an arbitrator"). It is a lightweight member (the garbd daemon) that takes part in the cluster's quorum votes without storing any data, so two data nodes plus an arbitrator can still form a majority when one node becomes unreachable. That helps mitigate split-brain scenarios. More can be found in the Galera Documentation.
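To see why an extra voter helps against split-brain, here is a minimal quorum sketch. It assumes every member, arbitrator included, carries one equal vote, and it ignores weighted votes and graceful departures, so treat it as an illustration of the majority rule rather than Galera's exact calculation. The function name is hypothetical.

```python
def has_quorum(reachable_voters, total_voters):
    """A partition keeps running only with strictly more than half of the votes."""
    return 2 * reachable_voters > total_voters


# Two nodes, link between them lost: each side sees 1 of 2 votes.
print(has_quorum(1, 2))  # False: neither side can safely continue
# Two nodes plus an arbitrator: the side that still reaches the arbitrator has 2 of 3.
print(has_quorum(2, 3))  # True
# The isolated node in that same cluster sees only itself.
print(has_quorum(1, 3))  # False
```

With an even split no side holds a majority, which is the situation a third voter is meant to prevent.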
As far as I have read, the right way to stop a node is to use the init.d/systemd scripts of MySQL/MariaDB. This informs WSREP/Galera that replication must stop, and the node saves the number of the last transaction it committed to a file in the MySQL/MariaDB data folder (so that the next time they boot, the nodes can compare their last committed transactions and sync).
You have to stop one node and wait a little (a minute or so) before stopping the next one, to be sure the other nodes see the change.
After that, it is very important to start the nodes again in the reverse order, so that the last node you shut down is the first one to boot, since it has the most up-to-date data.
Then the next nodes will be able to sync with the first one.
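If you are unsure which node was shut down last, the saved state files can be compared. The sketch below assumes the file in question is Galera's grastate.dat and that its contents have already been read into strings. The helper names, node names, and sample UUID are made up for illustration.

```python
SAMPLE_GRASTATE = """# GALERA saved state
version: 2.1
uuid:    6b1a2c3d-0000-0000-0000-000000000000
seqno:   1500
safe_to_bootstrap: 1
"""


def parse_grastate(text):
    """Turn the 'key: value' lines of a grastate.dat file into a dict of strings."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def start_order(grastate_by_node):
    """Order node names so that the node with the highest saved seqno starts first."""
    def seqno(node):
        return int(parse_grastate(grastate_by_node[node]).get("seqno", "-1"))
    return sorted(grastate_by_node, key=seqno, reverse=True)


print(parse_grastate(SAMPLE_GRASTATE)["seqno"])
```

A seqno of -1 usually means the node did not shut down cleanly, so this comparison is only meaningful after graceful stops like the ones described above.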
Using wsrep_on='OFF' doesn't sound right to me. WSREP is embedded in MySQL, so when MySQL stops gracefully, WSREP knows what it has to do.
Best Answer
For your first question:
There are several things you must know before using Galera Cluster:
It is virtually synchronous multi-master replication.
You need a minimum of 3 nodes/servers (or 2 nodes and 1 arbitrator) to avoid split-brain.
So if you want to use Galera Cluster, I think this article is a good place to start.