Mongodb – Sharded key-value store using MongoDB

mongodbnosql

Would like to set up a key-value store that is sharded across multiple machines.

We are currently using MongoDB, is there a reason why we shouldn't use MongoDB for this purpose?

We also use Redis, however for this use case, we would like to use the hard drive and Redis is in-RAM only.

Best Answer

There are a number of reasons not to use MongoDB as a pure key-value store, and there are some reasons to consider it. Mongo is optimized as a document store - it indexes all the fields in a document, and has rich primitives for JSON objects and hierarchies. You can use it as a key-value store, but the single-threaded nature means you won't be getting good performance out of your hardware. Storing simple blobs removes a number of the benefits of Mongo. Mongo has algorithms where it splits data chunks as you insert, which can create lag. Monogo's system for re-partitioning is cumbersome, as well. The benefit of a key-value system is it should be really simple and really fast, so you can scale up and keep server and management costs down.

Other systems are more tuned for key-value use. You mention Redis, one of the best key-value stores, but the repartitioning/clustering in Redis is still alpha-level, and there is a requirement of DRAM. Some people build their own shard layers and partitioning layers on Redis - this is very common among some of the larger Chinese social networks.

Cassandra is sometimes used as a key-value store. This isn't the best use of Cassandra, as Cassandra's "super column families" provide rich indexing. Cassandra isn't as fast as databases written in C like Redis and Mongo, but does have strong clustering capabilities.

One store you should strongly consider in this area is Aerospike. Aerospike has very flexible cluster management - adding a single node by just bringing it up - as well as support for both DRAM and SSD/Flash - and easy replication for HA. It's in use at very high levels of scale by advertising platform companies who need huge key value stores. Aerospike has a free version that supports node sizes to 200G.

CoucheBase (was MemBase) is another system to look at for key-value use. It provides some clustering primitives, and is focused more around in-memory use.