I was speaking with my boss about load balancing a 2-node Galera Cluster, and we weren't sure whether there was any reason to do it.
For writes, his argument was that even if we balance the writes, every write still has to be applied on each server for replication.
For reads, we could balance the reads across the servers, but would this really save time if everything now goes through a single VM on another server?
We have two dedicated SQL Servers which are in an Active-Active Galera setup.
The only way I can think of to set up HAProxy would be a 3rd VM on another server. Is the performance gain really worth having everything go through this one VM, which will be on a server congested with other traffic?
Is it possible, and would it make sense, to put HAProxy right on the SQL server(s) and load balance the reads? Traffic would still go through the primary server running HAProxy to get to Server B.
Just looking for some general thoughts and advice for this simple setup.
Best Answer
Your question touches on several different topics, and the answers come with many "ifs" that depend on your specific workload and architecture. Let's start with the things you are right about:
There are some buts:
* I had some clients that claimed better write performance, probably because of horrible SQL queries combined with Galera requiring row-based replication. In some special cases you could get some gains from that (if your writes take 30 seconds but only change a few records, you will get some extra scalability). That is normally very rare, and you should fix your queries first; I am just pointing out a (very) specific exception.
The meat of your question is whether adding a proxy in the middle will cost more than the improvement you get from load balancing queries. To answer that, you need to know 6 things:
This will tell you whether you are interested in using a proxy for performance reasons.
You can find out the first 3 by using ping, and the last 3 by profiling the actions. Normally, query time is much larger than round-trip time within a datacenter, but that depends on which queries you are running and how far apart (physically) those VMs are. To cancel out some of that time, people install the proxy on the same machine as the clients, so most of the overhead disappears. Also, since HAProxy here acts mostly as a simple network-level (TCP) proxy, its own overhead is very low.
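To make the comparison concrete, here is a minimal sketch of the arithmetic, not a definitive model. The five inputs are my own guess at a reasonable breakdown (they are not the list of 6 things above), and all numbers are hypothetical placeholders; you would replace them with your own ping and profiling results.

```python
def compare_paths(rtt_direct_ms, rtt_client_proxy_ms, rtt_proxy_server_ms,
                  query_direct_ms, query_balanced_ms):
    """Compare total time per query with and without a proxy.

    All parameter names are hypothetical. Round-trip times would come
    from ping, query times from profiling the real workload.
    Returns (direct_total_ms, proxied_total_ms, proxy_is_faster).
    """
    # Without a proxy: one network round trip plus the query itself.
    direct_total = rtt_direct_ms + query_direct_ms
    # With a proxy: two network hops, but the query may run on a less
    # loaded server thanks to load balancing.
    proxied_total = rtt_client_proxy_ms + rtt_proxy_server_ms + query_balanced_ms
    return direct_total, proxied_total, proxied_total < direct_total


# Hypothetical numbers: sub-millisecond hops, queries taking 12 to 20 ms.
direct, proxied, faster = compare_paths(0.3, 0.3, 0.3, 20.0, 12.0)
print(f"direct={direct:.1f} ms, proxied={proxied:.1f} ms, proxy faster: {faster}")
```

If the proxy runs on the client machine, `rtt_client_proxy_ms` is close to zero, which is the reason for that setup.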
Now, if your servers are not very loaded, you may not get any advantage in latency. Querying both servers can up to double your read throughput; whether that has an impact on latency will depend on your current load.
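As a rough illustration of why load matters, here is a sketch that assumes each server behaves like a simple M/M/1 queue, where mean latency is 1 / (service rate - arrival rate). That model is my assumption for illustration only, it applies to reads (writes are applied on every node anyway), and the rates are hypothetical.

```python
def mean_latency_ms(arrival_qps, service_qps):
    """Mean time in system for an M/M/1 queue, in milliseconds.

    Assumed model, not a measurement: arrival_qps is the query rate
    hitting one server, service_qps is the rate it can sustain.
    """
    if arrival_qps >= service_qps:
        raise ValueError("server is saturated; the model does not apply")
    return 1000.0 / (service_qps - arrival_qps)


CAPACITY_QPS = 1000  # hypothetical capacity of one node

for total_load in (100, 900):
    one_node = mean_latency_ms(total_load, CAPACITY_QPS)
    # With balancing, each of the two nodes sees half of the read load.
    two_nodes = mean_latency_ms(total_load / 2, CAPACITY_QPS)
    print(f"load={total_load} qps: one node {one_node:.2f} ms, "
          f"balanced {two_nodes:.2f} ms")
```

Under this model, a lightly loaded server gains almost nothing from balancing, while a server close to saturation gains a lot.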
There is usually a more important reason to use a proxy, which is high availability: HAProxy will allow you to switch automatically to a secondary Galera node if the active one goes down. It will also simplify manual switchovers. Of course, the proxy itself can be a single point of failure.
I hope that helps you decide, but the most important advice is: try it yourself and measure!
Edit: BTW, with only 2 nodes, I hope you use Galera either as a replication solution rather than as a cluster (this requires special configuration) or together with garbd. If you don't, and a node goes down, the second will stop serving queries too in order to avoid a split brain (no node has 50%+ quorum).
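The quorum arithmetic behind that warning can be sketched as follows. This is a simplified model that assumes every member has one vote and that garbd counts as a voting member; `has_quorum` is a hypothetical helper, not a Galera API.

```python
def has_quorum(surviving_votes, total_votes):
    """True if the survivors hold strictly more than half of the votes."""
    # Integer comparison avoids floating-point issues with "more than 50%".
    return 2 * surviving_votes > total_votes


# Two data nodes, one crashes: the survivor holds exactly 50%, so no quorum.
print("2 nodes, 1 left:", has_quorum(1, 2))

# Two data nodes plus garbd, one data node crashes: 2 of 3 votes remain.
print("2 nodes + garbd, 2 left:", has_quorum(2, 3))
```

The first case is the 2-node problem described above; the second shows why adding an arbitrator lets the remaining node keep working.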