Considering performance only, can a MySQL Cluster beat a custom data sharding MySQL solution? sharding = horizontal partitioning
When I refer to sharding, I'm considering sharding made in the application layer, for instance, distributing records evenly across independent MySQL instances. For two servers, it could be (key mod 2).
Best Answer
Disclosure: I am a MySQL employee, working on MySQL Cluster.
I would say that MySQL Cluster could achieve higher throughput / host than sharded MySQL+InnoDB provided that :
In terms of latency, MySQL Cluster should have more stable latency than sharded MySQL. Actual latency for purely in-memory data could be similar.
As queries become more complex, and data is stored on disk, the performance comparison becomes more confusing. To get a more specific answer, you need to describe more about your application and the queries you perform, as well as the number of hosts and volume of data. MySQL Cluster has recently gained parallel localised query execution (AQL) which means it can be competitive with standalone MySQLD despite having data distributed across multiple hosts.
MySQL Cluster is currently limited to 'sharding' over 48 hosts. Sharded MySQL in theory has no limit. However, for a given target throughput, fewer MySQL Cluster hosts may be needed than sharded MySQL hosts.
More interesting differences are when you look at areas other than performance :
Having sharding built into your application gives you maximum scaling potential, but adds complexity and limits your flexibility in terms of cross-shard queries and operations. If your sharding is premature then it may be the root of some problems for you. MySQL Cluster lets you get some of the benefits of sharding without having to constrain your application to be single-shard only.
Regarding the previous answer, a few clarifications :
MySQL Cluster supports compound primary and secondary keys. Not sure what's not 'suitable' about it. Perhaps the previous poster can explain?
This is incorrect. Data distribution is independent of which nodes happen to be online at any time. MySQL Cluster supports various data distribution schemes to support the optimisations you describe. I describe data distribution in MySQL Cluster in a blog post here : Data distribution in MySQL Cluster