HAProxy Load-Balancing – Performance Gain in 2 Node Setup


I was speaking with my boss about load-balancing a 2-node Galera Cluster, and we weren't sure whether there was any reason to do it.

For writes, his argument was that even if we balance the writes, each write still has to be applied on every server for the replication.

For reads, we could balance them across the servers, but would this really save time if everything now goes through a single VM on another server?

We have two dedicated SQL Servers which are in an Active-Active Galera setup.

The only way I can think of setting up HAProxy would be on a 3rd VM on another server. Is the performance gain really worth having everything go through this one VM, which will be on a server congested with other traffic?

Is it possible, and would it make sense, to put HAProxy directly on the SQL server(s) and load balance the reads? Traffic for Server B would still have to go through the primary server running HAProxy.

Just looking for some general thoughts and advice for this simple setup.

Best Answer

Your question touches on several different topics, and the answer comes with many "if"s depending on your specific workload and architecture. Let's start with the things you are right about:

  • It is true that Galera by itself should not give you better write performance: it is a shared-nothing, non-sharded cluster, which means every write has to be applied everywhere. The only way to significantly improve write performance is to split the write load between several nodes, while in your case you are duplicating it. In fact, because it is a shared-nothing architecture, the more servers you add, the slower the whole usually becomes, as the slowest or farthest server becomes the bottleneck of your replication. If you need write performance, Galera is not for you (it will usually have lower throughput than a single server). You normally set up Galera to get high availability (survival of data and service if a node goes down), not better write performance.
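To make the write-everywhere point concrete, here is a toy model in Python. It is a deliberate simplification, not how Galera is implemented internally: it assumes a commit waits roughly one round trip to the farthest node, and that sustained write capacity is capped by the slowest node because every node has to apply every write. The function names and all numbers are hypothetical.

```python
from __future__ import annotations


def commit_latency_ms(local_exec_ms: float, peer_rtts_ms: list[float]) -> float:
    """Toy estimate: local execution plus one round trip to the farthest peer.

    Assumption: the write-set must reach every node before the commit
    returns, so the slowest link dominates.
    """
    return local_exec_ms + max(peer_rtts_ms, default=0.0)


def cluster_write_capacity(node_writes_per_s: list[float]) -> float:
    """Every node applies every write, so the slowest node caps the cluster."""
    return min(node_writes_per_s)


if __name__ == "__main__":
    # Hypothetical: 2 ms of local work, one peer on the same LAN...
    print(commit_latency_ms(2.0, [0.3]))
    # ...then a third node added in a remote location.
    print(commit_latency_ms(2.0, [0.3, 8.0]))
    # Two identical nodes, then a weaker third node added.
    print(cluster_write_capacity([5000.0, 5000.0]))
    print(cluster_write_capacity([5000.0, 5000.0, 3000.0]))
```

Neither number improves as nodes are added: latency can only grow with the farthest peer, and capacity can only shrink to the weakest node.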

There is one caveat: I have had some clients report better write performance, probably because of horrible SQL queries combined with Galera requiring row-based replication. In some special cases you can get gains from that: if a write takes 30 seconds to execute but only changes a few records, the other nodes only have to apply those few row changes instead of re-running the query, so you get some extra scalability. That is very rare, though, and you should fix your queries first; I am just pointing out a (very) specific exception.
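The 30-second example can be put into numbers with another toy model. The function names and timings are hypothetical, and it ignores locking and certification conflicts. The assumption is that writes are spread evenly over the nodes, so each node executes its share and only applies the resulting row changes for the rest. Expensive writes that change few rows then scale, while cheap writes do not.

```python
def per_node_seconds_per_write(exec_s: float, apply_s: float, nodes: int) -> float:
    """Average busy time per write on each node when writes are spread evenly.

    Each node executes 1/nodes of the writes and, for the others, only
    applies the replicated row changes (assumed to cost apply_s each).
    """
    return exec_s / nodes + apply_s * (nodes - 1) / nodes


def speedup_vs_single_server(exec_s: float, apply_s: float, nodes: int) -> float:
    """Write capacity relative to a single server that executes every write."""
    return exec_s / per_node_seconds_per_write(exec_s, apply_s, nodes)


if __name__ == "__main__":
    # Expensive query that changes few rows: close to 2x on two nodes.
    print(speedup_vs_single_server(exec_s=30.0, apply_s=0.01, nodes=2))
    # Cheap write whose row changes cost as much to apply as to execute: no gain.
    print(speedup_vs_single_server(exec_s=0.002, apply_s=0.002, nodes=2))
```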

The meat of your question is whether the extra hop of adding a proxy in the middle is worth the improvement you get from load balancing queries. To answer that, you need to know 6 things:

  • What is your current average round-trip time between the client and the servers?
  • What is your average round-trip time between the client and the proxy?
  • What is your average round-trip time between the proxy and the servers?
  • What is the processing overhead of the proxy?
  • What is the local latency of the actual MySQL query (its execution time on the server)?
  • How loaded is your average client?

This will tell you whether a proxy makes sense for performance reasons.
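Once you have those numbers, the comparison is simple addition. Here is a minimal Python sketch; the field names are illustrative, the example values are hypothetical, and `query_ms_balanced` is an assumed estimate of the query time once reads are split across both nodes (how much lower it is than today depends on your load). It assumes each query is one request/response, so the proxied path pays both hops plus the forwarding cost.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    """The six numbers from the list above, in milliseconds (names are illustrative)."""
    client_server_rtt_ms: float  # client <-> server today, without a proxy
    client_proxy_rtt_ms: float   # client <-> proxy
    proxy_server_rtt_ms: float   # proxy <-> server
    proxy_overhead_ms: float     # time the proxy spends forwarding
    query_ms_now: float          # server-side query time today
    query_ms_balanced: float     # estimated query time once reads are split (depends on load)


def direct_ms(m: Measurements) -> float:
    """Per-query latency today: one hop to the server plus the query itself."""
    return m.client_server_rtt_ms + m.query_ms_now


def proxied_ms(m: Measurements) -> float:
    """Per-query latency through the proxy: two hops, forwarding cost, query."""
    return (m.client_proxy_rtt_ms + m.proxy_server_rtt_ms
            + m.proxy_overhead_ms + m.query_ms_balanced)


if __name__ == "__main__":
    # Hypothetical numbers, with the proxy on the client machine (tiny first hop).
    m = Measurements(0.5, 0.05, 0.5, 0.05, 5.0, 4.0)
    print(direct_ms(m), proxied_ms(m))
```

If balancing does not reduce the query time (for example, when the servers are idle), the proxied path can only be slower, by the proxy hop and overhead.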

You can find out the first 3 with ping and the last 3 by profiling. Normally, query time is much larger than round-trip time within a datacenter, but that depends on which queries you run and how far apart (physically) those VMs are. To eliminate some of those times, people install the proxy on the same machine as the clients, so most of the extra hop disappears. Also, since HAProxy works mostly as a TCP-level proxy (it forwards the connection without parsing the SQL), its overhead is very low.
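ping is the simplest way to get the round-trip times. If ICMP is blocked between your VMs, timing a TCP handshake is a rough substitute, since it costs about one round trip. Here is a minimal sketch using only the Python standard library; the function name is made up. Point it at a port such as SSH rather than at MySQL itself, since repeated connections that never complete the MySQL handshake can count towards the server's `max_connect_errors` limit.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host: str, port: int, samples: int = 10, timeout: float = 2.0) -> float:
    """Median wall-clock time (ms) to open a TCP connection to host:port.

    A TCP handshake costs about one network round trip, so this
    approximates what ping reports. Raises OSError if unreachable.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # close immediately; only the handshake time matters
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

For example, run `tcp_connect_rtt_ms("10.0.0.5", 22)` from the client against the proxy VM and from the proxy VM against each database VM (the address is a placeholder).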

Now, if your servers are not very loaded, you may not get any latency advantage. Querying both servers will double your potential read throughput, but whether that has any impact on latency depends on your current load.
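The effect of load can be illustrated with the textbook M/M/1 queueing formula, where mean response time is 1 / (mu - lambda), with mu the service rate and lambda the arrival rate. Treating each server as a single M/M/1 queue is a strong simplification (MySQL serves many connections in parallel), and the numbers are hypothetical, but the shape of the result carries over: splitting reads over two servers barely helps when they are nearly idle and helps a lot when they are close to saturation.

```python
from __future__ import annotations


def mm1_response_ms(arrivals_per_s: float, service_ms: float) -> float:
    """Mean time in system (queueing + service) for an M/M/1 queue, in ms."""
    mu = 1000.0 / service_ms  # queries per second one server can complete
    if arrivals_per_s >= mu:
        return float("inf")   # saturated: the queue grows without bound
    return 1000.0 / (mu - arrivals_per_s)


def compare(total_qps: float, service_ms: float) -> tuple[float, float]:
    """(one server takes all reads, each of two servers takes half)."""
    return (mm1_response_ms(total_qps, service_ms),
            mm1_response_ms(total_qps / 2, service_ms))


if __name__ == "__main__":
    print(compare(100, 1.0))  # lightly loaded: about 1.11 ms vs 1.05 ms
    print(compare(900, 1.0))  # heavily loaded: 10 ms vs about 1.82 ms
```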

There is usually a more important reason to use a proxy: high availability. HAProxy will let you switch to a secondary Galera node automatically if the active one goes down. It will also simplify manual switchovers. Of course, the proxy itself can become a single point of failure.
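The automatic switchover works because the proxy health-checks each node and stops routing to one that fails. In HAProxy this is typically done by adding `check` to each `server` line and the `backup` keyword to the standby node, so the backup only receives traffic when no regular server is healthy. The Python sketch below models just that routing decision for a pool with one active node and one standby. It illustrates the behaviour and is not HAProxy code; the class and function names are hypothetical, and it does not model load balancing across several active servers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool          # result of the latest health check
    backup: bool = False   # standby: used only if no regular server is healthy


def pick_server(servers: list[Server]) -> str | None:
    """Route to the first healthy regular server, else the first healthy backup."""
    for want_backup in (False, True):
        for s in servers:
            if s.healthy and s.backup == want_backup:
                return s.name
    return None  # nothing healthy: connections fail


if __name__ == "__main__":
    pool = [Server("node1", healthy=True), Server("node2", healthy=True, backup=True)]
    print(pick_server(pool))  # node1 is active
    pool[0].healthy = False   # node1 fails its health check
    print(pick_server(pool))  # traffic moves to node2
```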

I hope that helps you decide, but the most important advice is: try it yourself and measure!

Edit: by the way, with only 2 nodes, I hope you are either using Galera as a replication solution rather than as a cluster (which requires special configuration), or running it with garbd, because otherwise, if one node fails, the other will stop serving queries too in order to avoid a split brain (neither node has more than 50% of the quorum).
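The split-brain rule comes from Galera's quorum calculation: a partition stays Primary (keeps serving queries) only if it holds more than half of the votes of the last Primary Component, not counting nodes that left gracefully. Here is a simplified Python version, assuming every member has the default weight of 1 (the function name is hypothetical). It shows why two nodes alone cannot survive a failure while two nodes plus garbd can.

```python
def keeps_primary(surviving: int, last_primary: int, left_gracefully: int = 0) -> bool:
    """Simplified Galera quorum check with all node weights equal to 1.

    The surviving partition needs strictly more than half of the last
    Primary Component's members, after discounting clean shutdowns.
    """
    return 2 * surviving > last_primary - left_gracefully


if __name__ == "__main__":
    print(keeps_primary(1, 2))                     # 2 nodes, one crashes: False
    print(keeps_primary(2, 3))                     # 2 nodes + garbd, one node crashes: True
    print(keeps_primary(1, 2, left_gracefully=1))  # 2 nodes, one shut down cleanly: True
```

The last case is why a planned shutdown of one node is fine. The problem is an unexpected failure or a network partition, where the surviving node holds exactly 50%, which is not a majority.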