I have never dipped my hands in MySQL Cluster and want to know how it can benefit me. When is it best to use it, and does it have performance benefits?
MySQL Cluster – Benefits and Advantages
MySQLndbcluster
Related Solutions
Please keep in mind that OPTIMIZE TABLE does not perform defragmentation. Internally, OPTIMIZE TABLE perform several operations (copying data to a temp file, recreate indexes, recompute index statistics). In fact, the example I have can be performed manually as shown.
Example: If you optimize mydb.mytable
, you enter this command:
OPTIMIZE TABLE mydb.mytable;
Note that mysql performs something the following under the hood:
CREATE TABLE mydb.mytable2 LIKE mydb.mytable;
ALTER TABLE mydb.mytable2 DISABLE KEYS;
INSERT INTO mydb.mytable2 SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable2 ENABLE KEYS;
DROP TABLE mydb.mytable;
ALTER TABLE mydb.mytable2 RENAME mydb.mytable;
ANALYZE TABLE mydb.mytable;
This is quite useful for tables that experience a high volume of UPDATEs and DELETEs
Performing this can accomplish two things
Prevent mysql from looking through fragments in a table in an attempt to load data into the right sized fragments. Eliminating these fragments will reduce this operation.
Having the index statistics recomputed helps the MySQL Query Optimizer construct better EXPLAIN plans. Otherwise, queries may deteriorate in execution time because the MySQL Query Optimizer decided to take bad guesses at the EXPLAIN plan. This would be a definite symptom of a table that has had a high volume of UPDATEs and DELETEs.
CAVEAT
With regard to caching, caching takes a dive quickly because of doing a full table scan. For MyISAM index pages flow in and out of the MyISAM Key Cache. For InnoDB, data and index pages flow in and out of the InnoDB Buffer Pool.
Disclosure: I am a MySQL employee, working on MySQL Cluster.
I would say that MySQL Cluster could achieve higher throughput / host than sharded MySQL+InnoDB provided that :
- Queries are simple
- All data fits in-memory
In terms of latency, MySQL Cluster should have more stable latency than sharded MySQL. Actual latency for purely in-memory data could be similar.
As queries become more complex, and data is stored on disk, the performance comparison becomes more confusing. To get a more specific answer, you need to describe more about your application and the queries you perform, as well as the number of hosts and volume of data. MySQL Cluster has recently gained parallel localised query execution (AQL) which means it can be competitive with standalone MySQLD despite having data distributed across multiple hosts.
MySQL Cluster is currently limited to 'sharding' over 48 hosts. Sharded MySQL in theory has no limit. However, for a given target throughput, fewer MySQL Cluster hosts may be needed than sharded MySQL hosts.
More interesting differences are when you look at areas other than performance :
- MySQL Cluster supports arbitrary queries across all shards
- MySQL Cluster supports arbitrary transactions across all shards
- MySQL Cluster supports synchronous replication of shards with automatic failover and recovery
- MySQL Cluster supports online add node (cluster expansion)
- Sharded MySQL is more 'roll your own'
Having sharding built into your application gives you maximum scaling potential, but adds complexity and limits your flexibility in terms of cross-shard queries and operations. If your sharding is premature then it may be the root of some problems for you. MySQL Cluster lets you get some of the benefits of sharding without having to constrain your application to be single-shard only.
Regarding the previous answer, a few clarifications :
"Although MySQL Cluster is ACID-complaint, it does not provide a suitable storage engine for data with compound keys."
MySQL Cluster supports compound primary and secondary keys. Not sure what's not 'suitable' about it. Perhaps the previous poster can explain?
"In order to have data with the same key characteristics stored in a particular set of data nodes, you could doing the following:
- Take all data nodes offline, leaving only those data nodes you want to house data with the same key characteristics.
- Load your data into the MySQL Cluster, which populates only your select data nodes
- Bring all data nodes back online"
This is incorrect. Data distribution is independent of which nodes happen to be online at any time. MySQL Cluster supports various data distribution schemes to support the optimisations you describe. I describe data distribution in MySQL Cluster in a blog post here : Data distribution in MySQL Cluster
Best Answer
MySQL Cluster is designed around 3 core principles:
High availability (shared nothing, local and geographic replication, integrated heart-beating, automated failover and self-healing recovery, etc.)
Write-scalability and real time performance (auto-sharding, in-memory optimizations, etc)
Multiple database interfaces (SQL and NoSQL)
These are discussed in more detail in the following paper (note, registration required): http://mysql.com/why-mysql/white-papers/mysql_wp_scaling_web_databases.php
Adaptive Query Localization is a feature of the current MySQL Cluster 7.2 development release and rapidly improves performance of JOIN operations by pushing them down to the data nodes, thereby significantly reducing network traffic - to address the point Rolando makes above. You can read more about it here: http://dev.mysql.com/tech-resources/articles/mysql-cluster-labs-dev-milestone-release.html
MySQL Cluster is licensed under the GPL and can be downloaded from dev.mysql.com, with on-line help via the forums: forums.mysql.com/list.php?25