MySQL cluster – how many hosts are needed for full HA

high-availability, MySQL

I need some clarification on MySQL-Cluster (NDB) setup

I want to achieve full HA, but I'm on a tight server budget and can't afford to rent a dozen servers for the simple solution I'm working on.

What's the minimum number of hosts needed to achieve full HA in case of a single node failure?

I mean a situation where the cluster continues to work with no errors when any single physical server fails. The failed server could be running any of the MySQL Cluster components (management, SQL or data node).

I don't need extreme performance, just fully automated handling of server failures on a small number of servers.

From the MySQL Cluster documentation I know that I need at least two full copies of the data (replicas). Does one replica need to be on more than one host?

Can HA be achieved with 4 servers: 2 management nodes (small VPSes) + 2 physical servers, each running an SQL node and a data node?
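For reference, the 4-host layout I have in mind would look roughly like this in the management nodes' config.ini (the hostnames and DataDir path are just placeholders, not a tested configuration):

```
[ndbd default]
NoOfReplicas=2                      # two copies of the data, one per data node

[ndb_mgmd]                          # management node on small VPS #1
NodeId=1
HostName=mgmt1.example.com
DataDir=/var/lib/mysql-cluster

[ndb_mgmd]                          # management node on small VPS #2
NodeId=2
HostName=mgmt2.example.com
DataDir=/var/lib/mysql-cluster

[ndbd]                              # data node on physical server #1
NodeId=3
HostName=db1.example.com
DataDir=/var/lib/mysql-cluster

[ndbd]                              # data node on physical server #2
NodeId=4
HostName=db2.example.com
DataDir=/var/lib/mysql-cluster

[mysqld]                            # SQL node co-located on physical server #1
NodeId=5
HostName=db1.example.com

[mysqld]                            # SQL node co-located on physical server #2
NodeId=6
HostName=db2.example.com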

I'm aware of MySQL master-slave replication, but it doesn't provide automated failover and requires an application rewrite.

Best Answer

You really need to define what level of "HA" you are looking for quantitatively -- one man's "can-sleep-at-night" is another man's "this-thing-is-a-house-of-cards".


The minimum number of systems is two (2) -- an active and a standby replica (with something like Heartbeat or home-grown scripts to handle the failover).

With a MySQL Cluster this means at least two SQL nodes and two data nodes (to continue serving requests in the event of a failure of any one node). If you also need redundancy for the management server's functions, you would need two of those as well.
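As a rough sketch (hostnames are placeholders, adjust to your environment), each SQL node's my.cnf would point at both management servers, so losing one management node doesn't leave the SQL and data nodes without a management connection:

```
[mysqld]
ndbcluster                                              # enable the NDB storage engine
ndb-connectstring=mgmt1.example.com,mgmt2.example.com   # list both management nodes

[mysql_cluster]
ndb-connectstring=mgmt1.example.com,mgmt2.example.com   # also read by ndbd and the ndb_* tools
```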

The key part here is testing the failover in a development environment -- which means you need at least two more machines (or virtual machines). You also want to test upgrade and maintenance processes to ensure they won't trigger unintended consequences (ideally you should do nothing in production that hasn't been tested and proven in development).

If you fail to test properly you may trigger an unintended failover, which means you incur the procedural (and possibly business) cost of a failover -- typically having to rebuild the former active server as the new standby.

This protects against hardware failures (power supply, NIC, disk, and switch, if the nodes are on separate switches).

Note that this doesn't just apply to your DB servers -- you need two of *everything*: web servers, DB servers, firewalls, DNS servers...
Redundancy in one component is meaningless if you still have a bunch of single points of failure elsewhere in your stack.


The next level of protection is against network failures ("What if my ISP goes down?") -- this requires replicating your whole redundant environment above to a remote datacenter.

What's important here is that you diversify network connections, power, etc. -- you don't want your standby datacenter across the street, fed by the same power and fiber as your main facility.
A company I consulted for had a requirement that any remote facility used for DR be "at least 15 degrees of longitude away" (i.e. in the next time zone). Common practice in the US is East Coast/West Coast, or pairings like NY/Chicago and LA/Texas.


The next level above that is truly distributed resources (think Google), which requires a database system that supports replication and sharding (think MongoDB).
If implemented properly there's almost no chance of a true "outage", though service may occasionally be degraded and recovery can take a while.