MySQL Sharding vs MySQL Cluster – Key Differences

MySQLndbcluster

Considering performance only, can a MySQL Cluster beat a custom data sharding MySQL solution? sharding = horizontal partitioning

When I refer to sharding, I'm considering sharding made in the application layer, for instance, distributing records evenly across independent MySQL instances. For two servers, it could be (key mod 2).

Best Answer

Disclosure: I am a MySQL employee, working on MySQL Cluster.

I would say that MySQL Cluster could achieve higher throughput / host than sharded MySQL+InnoDB provided that :

Queries are simple
All data fits in-memory

In terms of latency, MySQL Cluster should have more stable latency than sharded MySQL. Actual latency for purely in-memory data could be similar.

As queries become more complex, and data is stored on disk, the performance comparison becomes more confusing. To get a more specific answer, you need to describe more about your application and the queries you perform, as well as the number of hosts and volume of data. MySQL Cluster has recently gained parallel localised query execution (AQL) which means it can be competitive with standalone MySQLD despite having data distributed across multiple hosts.

MySQL Cluster is currently limited to 'sharding' over 48 hosts. Sharded MySQL in theory has no limit. However, for a given target throughput, fewer MySQL Cluster hosts may be needed than sharded MySQL hosts.

More interesting differences are when you look at areas other than performance :

MySQL Cluster supports arbitrary queries across all shards
MySQL Cluster supports arbitrary transactions across all shards
MySQL Cluster supports synchronous replication of shards with automatic failover and recovery
MySQL Cluster supports online add node (cluster expansion)
Sharded MySQL is more 'roll your own'

Having sharding built into your application gives you maximum scaling potential, but adds complexity and limits your flexibility in terms of cross-shard queries and operations. If your sharding is premature then it may be the root of some problems for you. MySQL Cluster lets you get some of the benefits of sharding without having to constrain your application to be single-shard only.

Regarding the previous answer, a few clarifications :

"Although MySQL Cluster is ACID-complaint, it does not provide a suitable storage engine for data with compound keys."

MySQL Cluster supports compound primary and secondary keys. Not sure what's not 'suitable' about it. Perhaps the previous poster can explain?

"In order to have data with the same key characteristics stored in a particular set of data nodes, you could doing the following:

Take all data nodes offline, leaving only those data nodes you want to house data with the same key characteristics.

Load your data into the MySQL Cluster, which populates only your select data nodes

Bring all data nodes back online"

This is incorrect. Data distribution is independent of which nodes happen to be online at any time. MySQL Cluster supports various data distribution schemes to support the optimisations you describe. I describe data distribution in MySQL Cluster in a blog post here : Data distribution in MySQL Cluster

Best Answer

Related Solutions

Mysql – Any reasons at all *not* to use MySQL in this scenario

There are political reasons for and against this setup.

Mysql – Solution for deploying multiple instances of Django Server with Database replication

Related Question

Mysql – Any reasons at all not to use MySQL in this scenario